sql joins and query optimization
Embed Size (px)
DESCRIPTIONA visual explanation of how to join database tables and tips on how to make your queries run faster, better, stronger.
- 1. SQL Joinsand Query OptimizationHow it worksby Brian Gallagherwww.BrianGallagher.com G
2. WhyWh JOIN? Combine data from multiple tables Reduces the number of queries needed Moves processing burden to databaseserver Improves data integrity (over combiningdata ind t i program code)d ) 3. Types of JOINsTf JOIN INNER the most common, return all rows with matching records, g SELECT * FROM T1 INNER JOIN T2 ON T1.fld1 = T2.fld2 LEFT or LEFT OUTER return all rows on the left (first) table and right (second) If no matching record on the right side, NULL-values for each field are returned SELECT * FROM T1 LEFT OUTER JOIN T2 ON T1 fld1 = T2 fld2T1.fld1 T2.fld2 FULL or FULL OUTER return all rows on the left (first) table and right (second) If no matching record on the left or right side, NULL-values for each field are returned S SELECT * FROM T1 FULL OU C O U OUTER JO JOIN T2 ON T1.fld1 = T2.fld2O . d. d CROSS the least common, return all possible row combinations SELECT * FROM T1, T2 RIGHT or RIGHT OUTER Dont use them without a very good reason. They do not add any functionality over a LEFT JOIN and make code more confusing Works the same as the LEFT and LEFT OUTER JOINs, but the second table is the one which all rows are returned from. These two queries are functionally identical: SELECT * FROM T1 LEFT JOIN T2 ON T1.fld1 = T2.fld2 SELECT * FROM T2 RIGHT JOIN T1 ON T1.fld1 = T2.fld2 4. Sample T t DataS l Test D t Sample Customer and Job records 5. INNER JOIN 6. LEFT (OUTER) JOIN ( ) 7. One-To-Many LEFT JOINy 8. RIGHT (OUTER) JOIN( ) 9. FULL (OUTER) JOIN ( ) 10. FULL (OUTER) JOIN FULL JOIN is not supported in MS-Access Can be emulated using UNION queries TheyTh are rarely used l d 11. CROSS JOINsThe Join you never useexcept every time 12. Making BigM ki it Bi A CROSS JOIN combines every row on the lefttable with every row in the right table Resulting recordset will b th t t l of b th rowR lti d t ill be the total f bothcounts multiplied together Leftrow has 10 000 records 10,000 Right row has 30,000 records Resulting recordset has 300,000,000 records If no JOIN condition is specified, a CROSS JOINwill be executed 13. JoinsJ i each record t th other t bl hd to the th table 14. CROSS JOINs resulting set of allJOIN srecord combinations 15. How CH Cross J i M k IJoins Make Inner J i Joins Tables are joined using a CROSS JOIN JOIN criteria is evaluated, removing all recordswhich d t match th ON conditionhi h dontt h the ditiSELECT *FROM Customers CINNER JOIN Job JON C.CustID = J.CustID The remaining rows are the recordset returnedfor the INNER JOIN 16. Using ON criteria to select records 17. One-to-Many CROSS JOINy 18. One to Many: Cross Join to Inner Join 19. Unpredictable RUdi t bl Record O dd Orders Databases are not required to return records inany particular order Records may be returned by the order they arestored on disk or may be unordered, dependingon how the query engine handles data If you need data in a particular order, use andd t i ti ldORDER BY clause Some databases may return data presorted byPrimary key or Clustered index DO NOT DEPEND on this behavior It is not reliably portable across different databases Do not make a program rely on a behavior notspecified by the program bad programming style 20. IndexesUnique, clustered, etc. 21. WhatWh t are Indexes I d Indexes are a pre-sorted list of databasefield values This list speeds up queries A LOT An index is much smaller than the fullrecordset An index tracks only the fields specified whencreated 22. Indexes and P fI d d Performance Indexes GREATLY speed up queries on thefields being indexed Indexes SLIGHTLY slow dI dl down INSERTSINSERTS,UPDATES and DELETES The indexes need to be updated anytime a recordchanges This adds a small bit of overhead to the system The increase in query speed usually greatlyoffsets the slight extra load on INSERTS,UPDATES and DELETES Mostdatabase use is typically reading (instead ofwriting) 23. Types of IndexesTfI d Simple Index Indexesthe field specified Can have multiple simple indexes Speeds up queries using the indexed field Composite Index Indexescombinations of fields Can have multiple composite indexes Speeds up queries using the specifiedcombination of fi ld bi ti f fields 24. Simple IndexesSi l I d An Index contains a list of all field valuesand pointers to the record with that valuep 25. How Indexes help SELECTs p 26. Composite I dC it Indexes Useful only when you will be querying multiplefields simultaneously Most selective (resulting in the fewest recordsmatching) fields should be listed first Slight performance gain over multiple SimpleIndexes when used correctly No performance gain (possibly even aperformance loss) when used incorrectly. 27. Composite Index p 28. Searching I dShi Indexes There are high performance algorithms forhigh-performancesearching pre-sorted dataI dIndex data stored i t d t t d internally as bi llbinary ttrees(typically) Searching throughbinary tree muchfaster than readingthrough table 29. Design StrategyMake it workMake it fastMake it prettyp y 30. Focus on whats neededF h t d d Make your query as specific as possible Makesjoins simpler (and therefore faster) Returns no unwanted data Increases response time Reduces network and server load Put as much as possible in the WHEREclause J i your fi ld properlyJoin fieldsl 31. Maximum S l ti it / SM i Selectivity Specificityifi it If differents parts of the WHERE clausewill result in smaller data sets than otherones, list them the ones that will mostggreatly reduce the data firsty List the others in the same order 32. Optimizing QueriesFaster is Better 33. Horizontal P titi iH it l Partitioning Avoid extremely wide records (recordswith lots of fields) ) Split data into two tables with a commonID field All the record data is rarely used in allqueries,queries so it reduces traffic and speedsup processing and disk reads 34. Add I dIndexes Make sure all fields searched on regularlyare indexed 35. Define Clustered IndexD fi Cl t d I d The most-likely criteria to be sorted onshould be defined as your clustered index y Clustered index defines the order therecords will physically be stored on disk 36. Normalize D tNli Data No redundant data Link related data Dont go crazy 37. Denormalize D tDli Data If joining more 4 tables or more in a singleq y,query, consider de-normalizing data g(combining data from external tables backinto the main table)) Canreturn faster query results Risks include having to manage duplicatedata in database and larger disk usage 38. Size records to fit database pagesize SQL Server 7.0, 2000, and 2005 data pages are8K (8192 bytes) in size. Oftheth 8192 available b t i each d t page, only il bl bytes inh data l8060 bytes are used to store a row. The rest is usedfor overhead. If you have a row that was 4031 bytes long, then onlyone row could fit into a data page, and the remaining4029 bytes left in the page would be emptyyp g py Making each record 1 byte shorter would halve the disk access required to read the table 39. Use Primary KU aPi Key Make sure you have a primary key defined Ideally, it should be a single unique field You can define composite keys, but they aregenerally not needed if the database is designedp p yproperly Most tables should have a unique TableNameIDfield This allows you to identify any row by a single uniquevalue 40. UseU an IDENTITY C lColumn If there is: no Primary Key on the table no unique index on the table Then Add an IDENTITY (unique value) column tothe table Optionally, index the IDENTITY column if youwill query it regularly 41. Move TEXT, NTEXT, and IMAGEdata into table These types normally stored outside thetable (uses a pointer to the data in the (pfield) If these types will be searched frequentlyfrequently,consider moving them into the databasetable stables storage with:sp_tableoption tablename, text in row, onorsp_tableoption tablename, text in row, size 42. Consider R li ti i advanceC id Replication in d If Replication is to be used, factor thedecision into the original design of theg gdatabase 43. Use Built-in Referential I t itU B ilt i R f ti l Integrity Use foreign keys and validationconstraints built into the database Dont manage it in the application Benefits: Faster execution Cant mess it up with application errors 44. Re-test hR t t when changing serversh i Retest and rebenchmark applications whenmoving to a new server, such as: Development Staging Production Problems often caused bP blftd by: More rows in test data on servers closer toproduction Server configurations different Tables not indexed the same way 45. Constraints are F tC t i t Fast Constraints on fields are faster than Triggers Rules Defaults 46. Dont duplicate ff tD t d li t effort Dont check for the same thing twice (duh, but it happens) Dont use a trigger and a constraint to do thesame thing g Same for constraints and defaults Same for constraints and rules 47. LimitLi it records t b JOIN d d to be JOINed Use WHERE clause to minimize rows tobe JOINed. Particularly in the OUTER table of an OUTER JOIN 48. Index F i KI d Foreign Keys Fields in a table that are a foreign key arenot indexed automatically jy just becausethey are a foreign key. Add these indexes manually 49. Minimize DuplicateMi i i D li t JOIN fi ld d t field data JOINs are slower when there are fewdifferent keys in the jy joining tableg 50. JOIN on unique indexed fi ldii d d fields JOINs will perform fastest when joiningindexed fields with no duplicate data p 51. JOIN Numeric Fi ld Ni Fields JOINs on numeric fields perform muchfaster than JOINs on other datatype fields.yp 52. JOIN th exact same d t t thetdatatypes JOINs should be to the exact samedatatype for best p yp performance Same type of field Same field length Same encoding (ASCII vs. Unicode, etc) 53. UseU ANSI JOIN syntax t Improves readability Less likely to cause programmer errors No Aliases:SELECT fname, lname, departmentFROM names INNER JOIN departments ONnames.employeeidnames employeeid = departments employeeid departments.employeeid Microsoft syntax example:SELECT fname, lname, departmentFROM names departments names,WHERE names.employeeid = departments.employeeid Code is more portable between databases 54. Use Table AliasesU T bl Ali Shortens code Makes it easier to follow, especially with long queries Identifies which table each field is coming from No table aliases:SELECT fname, lname, departmentFROM names INNER JOIN departments ONnames.employeeid = departments.employeeid Microsoft syntax example:SELECT N.fname, N.lname, D.departmentFROM names N INNER JOIN departments D ONN.employeeid = D.employeeid 55. DontD t use SELECT * Requires additional parsing on the server toextract field names Returns unneccesary data (unless you areactually using every field) Returns duplicate data on JOINs (do you reallyneed two copies of the RecordID field?) Can cause errors (in some databases) if JOINedtables have fields with the same name 56. Store i separate fil in filStint files i filegroup For very large joins placing tables to bejjoined in separate p ypphysical files within thesame filegroup can improve performance SQL Server can spawn separate threadsfor processing each file 57. DontD t use CROSS JOINs JOIN Unless actually needed (rarely) do not usea CROSS JOIN (returns all combinations (of records on both sides of JOIN) People sometimes will do a CROSS JOINand then use DISTINCT and GROUP BYto eliminate all the duplication Dont do that. 58. JOINsJOIN vs. Subquery S b Depending on the specifics, either could be faster. Writeboth and test performance to be sure which is best. JOIN:SELECT a.*FROM Table1 a INNER JOIN Table2 bON a.Table1ID = b.Table1IDWHERE b.Table2ID = SomeValue Subquery:SELECT a.* FROM Table1 a WHERE Table1ID IN a.(SELECT Table1ID FROM Table2 WHERE Table2ID =SomeValue) 59. Avoid Subqueries unless neededA id S b i l d d Avoid subqueries unless actually required Most subqueries can be expressed asJOINs Subqueries are generally harder to readand understand (for humans) and,therefore,therefore maintain 60. Use an Indexed View(Enterprise 2000 and later only) An Indexed View maintains an updatedrecord of how the tables are joined via a jclustered index This slows INSERTs, UPDATEs andINSERTsDELETEs a bit, so consider the tradeofffor the faster queries 61. Use Databases PerformanceDatabase sOptimization Tools Most major databases have methods formonitoring and optimizing database g pgperformance Google databaseName optimizing fordatabaseName optimizingtons of links 62. AvoidA id DISTINCT when possible h ibl DISTINCT clauses are frequently (and usuallyunintentionally) used to hide an incorrect JOIN Properly normalized database will not frequentlyneed DISTINCT clauses Look for incorrect JOINs or create more explicitWHERE clauses to avoid needing DISTINCT Of course, use it when appropriate 63. The End (for now)Google: Query Optimizationfor lots more information