SQL Outer Joins for Fun and Profit
Bill Karwin, Proprietor/Chief Architect
[email protected]  www.karwin.com

Upload: karwin-software-solutions-llc

Posted on 27-Jan-2015


DESCRIPTION

Many questions on database newsgroups and forums can be answered using outer joins. Outer joins are part of the standard SQL language and are supported, at least in part, by all major RDBMS brands. Many programmers are expected to use SQL in their work, but few know how to use outer joins effectively. Learn to use this powerful feature of SQL, increase your employability, and amaze your friends! Karwin will explain outer joins, show examples, and demonstrate a Sudoku puzzle solver implemented in a single SQL query.

TRANSCRIPT

Page 1: SQL Outer Joins for Fun and Profit



2006-07-27 OSCON 2006 2

Introduction

- Overview of SQL joins: inner and outer
- Applications of outer joins
- Solving Sudoku puzzles with outer joins


Joins in SQL

Joins:
- The SQL way to express relations between data in tables
- Form a new row in the result set from matching rows in each joined table
- As fundamental to using a relational database as a loop is in other programming languages


Inner joins refresher

- ANSI SQL-89 syntax:
    SELECT ... FROM products p, orders o
    WHERE p.product_id = o.product_id;

- ANSI SQL-92 syntax:
    SELECT ... FROM products p JOIN orders o
    ON p.product_id = o.product_id;
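A quick way to check that the two syntaxes are interchangeable is to run both against the example tables. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The price and date columns are assumptions standing in for the slides' "product attributes" and "order attributes", and dates are stored as ISO strings (2006-02-01) so that they compare correctly as text.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Load the example products and orders into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (product_id TEXT PRIMARY KEY, price TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id TEXT, date TEXT);
        INSERT INTO products VALUES ('Abc', '$10.00'), ('Def', '$5.00'), ('Efg', '$17.00');
        INSERT INTO orders VALUES (10, 'Abc', '2006-02-01'),
                                  (11, 'Abc', '2006-03-10'),
                                  (9, 'Def', '2005-05-02');
    """)
    return conn


# SQL-89 style: tables listed with commas, join condition in WHERE.
SQL_89 = """
    SELECT p.product_id, p.price, o.order_id, o.date
    FROM products p, orders o
    WHERE p.product_id = o.product_id
    ORDER BY p.product_id, o.order_id"""

# SQL-92 style: explicit JOIN ... ON.
SQL_92 = """
    SELECT p.product_id, p.price, o.order_id, o.date
    FROM products p JOIN orders o ON p.product_id = o.product_id
    ORDER BY p.product_id, o.order_id"""

conn = make_db()
rows_89 = conn.execute(SQL_89).fetchall()
rows_92 = conn.execute(SQL_92).fetchall()
for row in rows_92:
    print(row)  # Efg has no orders, so the inner join leaves it out
```

Both queries return the same three rows; Efg does not appear in either.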


Inner join example

Products
  product_id
  Abc
  Def
  Efg

Orders
  product_id  order_id
  Abc         10
  Abc         11
  Def         9


Inner join example

Query result set
  product_id  product attributes  order_id  order attributes
  Abc         $10.00              10        2006/2/1
  Abc         $10.00              11        2006/3/10
  Def         $5.00               9         2005/5/2

SELECT ... FROM products p JOIN orders o ON p.product_id = o.product_id;


Outer joins

- Returns all rows from one table, but only the matching rows from the joined table. Where no row matches, the joined table's columns are NULL.
- Not supported in SQL-89
- SQL-92 syntax:
    SELECT ... FROM products p LEFT OUTER JOIN orders o
    ON p.product_id = o.product_id;
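The same kind of sketch (Python sqlite3, in-memory, with the assumed price and date columns) shows the LEFT OUTER JOIN keeping Efg. SQL NULL comes back in Python as None.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Load the example products and orders into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (product_id TEXT PRIMARY KEY, price TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id TEXT, date TEXT);
        INSERT INTO products VALUES ('Abc', '$10.00'), ('Def', '$5.00'), ('Efg', '$17.00');
        INSERT INTO orders VALUES (10, 'Abc', '2006-02-01'),
                                  (11, 'Abc', '2006-03-10'),
                                  (9, 'Def', '2005-05-02');
    """)
    return conn


# Every product is kept; order columns are NULL where no order matches.
LEFT_JOIN = """
    SELECT p.product_id, p.price, o.order_id, o.date
    FROM products p LEFT OUTER JOIN orders o ON p.product_id = o.product_id
    ORDER BY p.product_id, o.order_id"""

conn = make_db()
rows = conn.execute(LEFT_JOIN).fetchall()
for row in rows:
    print(row)
```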


Types of outer joins

- LEFT OUTER JOIN: returns all rows from the table on the left. Returns NULLs in the columns of the right table where no row matches.
- RIGHT OUTER JOIN: returns all rows from the table on the right. Returns NULLs in the columns of the left table where no row matches.
- FULL OUTER JOIN: returns all rows from both tables. Returns NULLs in the columns of either table where no row matches.
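Where RIGHT or FULL OUTER JOIN is unavailable, both can usually be rewritten using LEFT OUTER JOIN alone. This is a common workaround rather than something from the slides. The sketch (Python sqlite3, in-memory) assumes a hypothetical order 12 for a product 'Xyz' that is not in products, so that each table has an unmatched row. A RIGHT join becomes a LEFT join with the tables swapped, and a FULL join becomes a LEFT join plus UNION ALL of the right-hand rows that matched nothing.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (product_id TEXT PRIMARY KEY);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id TEXT);
        INSERT INTO products VALUES ('Abc'), ('Def'), ('Efg');
        -- Order 12 for 'Xyz' is hypothetical: an order with no matching product.
        INSERT INTO orders VALUES (10, 'Abc'), (11, 'Abc'), (9, 'Def'), (12, 'Xyz');
    """)
    return conn


# "products p RIGHT OUTER JOIN orders o", written as a LEFT join with the tables swapped.
RIGHT_AS_LEFT = """
    SELECT p.product_id, o.product_id, o.order_id
    FROM orders o LEFT OUTER JOIN products p ON p.product_id = o.product_id"""

# FULL OUTER JOIN emulation: all products (matched or not), plus the orders
# that matched no product. The second SELECT only returns rows the first cannot,
# so UNION ALL adds no duplicates.
FULL_EMULATED = """
    SELECT p.product_id, o.product_id, o.order_id
    FROM products p LEFT OUTER JOIN orders o ON p.product_id = o.product_id
    UNION ALL
    SELECT NULL, o.product_id, o.order_id
    FROM orders o LEFT OUTER JOIN products p ON p.product_id = o.product_id
    WHERE p.product_id IS NULL"""

conn = make_db()
right_rows = set(conn.execute(RIGHT_AS_LEFT).fetchall())
full_rows = set(conn.execute(FULL_EMULATED).fetchall())
print(sorted(full_rows, key=repr))
```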


Support for OUTER JOIN

Open-source RDBMS products: MySQL, PostgreSQL, Firebird, SQLite, Hypersonic HSQLDB, Apache Derby, Ingres R3

- LEFT OUTER JOIN: ✓ ✓ ✓ ✓ ✓ ✓ ✓
- RIGHT OUTER JOIN: ✓ ✓ ✓ ✓ ✓ ✓ ✓
- FULL OUTER JOIN: ✓ ✓ ✓ ✓


Outer join example

Products
  product_id
  Abc
  Def
  Efg

Orders
  product_id  order_id
  Abc         10
  Abc         11
  Def         9
  NULL        NULL


Outer join example

Query result set
  product_id  product attributes  order_id  order attributes
  Abc         $10.00              10        2006/2/1
  Abc         $10.00              11        2006/3/10
  Def         $5.00               9         2005/5/2
  Efg         $17.00              NULL      NULL

SELECT ... FROM products p LEFT OUTER JOIN orders o ON p.product_id = o.product_id;


So what?

- The difference seems trivial and uninteresting.
- SQL works with sets and relations.
- Operations on sets combine in powerful ways (just like operations on numbers, strings, or booleans).

[Diagram: INNER JOIN, LEFT OUTER JOIN, RIGHT OUTER JOIN, FULL OUTER JOIN]


Solutions using outer joins

- Extra join conditions
- Subtotals per day
- Localization
- Mimic NOT IN (subquery)
- Greatest row per group
- Top three per group
- Finding attributes in EAV tables (entity-attribute-value)
- Sudoku puzzle solver


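Extra join conditions

- Problem: match products only with orders created this year.
- Put extra conditions on the joined (right-hand) table into the ON clause. They are then applied before the join, so products without a qualifying order still appear, with NULLs. The same condition in the WHERE clause would be applied after the join and would discard those rows.

    SELECT ... FROM products p LEFT OUTER JOIN orders o
    ON p.product_id = o.product_id AND o.date >= '2006-01-01';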
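A minimal sketch (Python sqlite3, in-memory, dates stored as ISO strings so that >= compares chronologically) comparing the condition in the ON clause with the same condition moved to the WHERE clause:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id TEXT PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id TEXT, date TEXT);
    INSERT INTO products VALUES ('Abc'), ('Def'), ('Efg');
    INSERT INTO orders VALUES (10, 'Abc', '2006-02-01'),
                              (11, 'Abc', '2006-03-10'),
                              (9, 'Def', '2005-05-02');
""")

# Date filter in ON: applied while matching, so unmatched products survive.
IN_ON = """
    SELECT p.product_id, o.order_id
    FROM products p LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id AND o.date >= '2006-01-01'
    ORDER BY p.product_id, o.order_id"""

# Date filter in WHERE: applied after the join; NULL dates fail the test.
IN_WHERE = """
    SELECT p.product_id, o.order_id
    FROM products p LEFT OUTER JOIN orders o
      ON p.product_id = o.product_id
    WHERE o.date >= '2006-01-01'
    ORDER BY p.product_id, o.order_id"""

on_rows = conn.execute(IN_ON).fetchall()
where_rows = conn.execute(IN_WHERE).fetchall()
print(on_rows)
print(where_rows)
```

Only the ON version keeps Def and Efg, the products with no order in 2006; the WHERE version behaves like an inner join.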


Extra join conditions

Products
  product_id
  Abc
  Def
  Efg

Orders
  product_id  order_id  date
  Abc         10        2006/2/1
  Abc         11        2006/3/10
  Def         9         2005/5/2
  NULL        NULL      NULL


Extra join conditions

Query result set
  product_id  product attributes  order_id  order attributes
  Abc         $10.00              10        2006/2/1
  Abc         $10.00              11        2006/3/10
  Def         $5.00               NULL      NULL
  Efg         $17.00              NULL      NULL

SELECT ... FROM products p LEFT OUTER JOIN orders o ON p.product_id = o.product_id AND o.date >= '2006-01-01';


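Subtotals per day

- Problem: show every day with its number of orders, including days with zero orders.
- Requires an additional table containing every date in the desired range.
- Count o.order_id rather than *: COUNT(column) skips NULLs, so a day with no matching order counts as 0, while COUNT(*) would count its NULL-extended row as 1.

    SELECT d.date, COUNT(o.order_id) FROM days d
    LEFT OUTER JOIN orders o ON o.date = d.date
    GROUP BY d.date;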
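A sketch of the days-table approach in Python sqlite3, assuming a small hypothetical range from 2006-01-30 to 2006-02-03 and filling the helper table from Python with datetime. It also runs the query with COUNT(*) to show why counting o.order_id matters.

```python
import sqlite3
from datetime import date, timedelta


def make_db(first: date, last: date) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, date TEXT);
        INSERT INTO orders VALUES (9, '2005-05-02'), (10, '2006-02-01'), (11, '2006-03-10');
        CREATE TABLE days (date TEXT PRIMARY KEY);
    """)
    # Helper table: one row per calendar day in [first, last], as ISO strings.
    n = (last - first).days + 1
    conn.executemany("INSERT INTO days VALUES (?)",
                     [((first + timedelta(days=i)).isoformat(),) for i in range(n)])
    return conn


SUBTOTALS = """
    SELECT d.date, COUNT(o.order_id)
    FROM days d LEFT OUTER JOIN orders o ON o.date = d.date
    GROUP BY d.date ORDER BY d.date"""

# Same query counting rows instead of matched orders.
SUBTOTALS_STAR = SUBTOTALS.replace("COUNT(o.order_id)", "COUNT(*)")

conn = make_db(date(2006, 1, 30), date(2006, 2, 3))
per_day = conn.execute(SUBTOTALS).fetchall()
per_day_star = conn.execute(SUBTOTALS_STAR).fetchall()
for day, count in per_day:
    print(day, count)
```

With COUNT(*), every day in the range reports 1, including the days with no orders.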


Subtotals per day

Days
  date
  2005/5/2
  ...
  2006/2/1
  ...
  2006/3/10
  ...

Orders
  date       order_id
  2005/5/2   9
  2006/2/1   10
  2006/3/10  11
  NULL       NULL


Subtotals per day

Query result set
  date       COUNT(o.order_id)
  2005/5/2   1
  ...        0
  2006/2/1   1
  ...        0
  2006/3/10  1
  ...        0

SELECT d.date, COUNT(o.order_id) FROM days d LEFT OUTER JOIN orders o ON o.date = d.date GROUP BY d.date;


Localization

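- Problem: show translated messages, or the message in the default language if a translation is not available.

    SELECT en.message_id, COALESCE(sp.message, en.message)
    FROM messages AS sp RIGHT OUTER JOIN messages AS en
    ON sp.message_id = en.message_id AND sp.language = 'sp'
    WHERE en.language = 'en';

- COALESCE() returns its first non-null argument.
- The condition on the optional table (sp) belongs in the ON clause, but the condition on the preserved table (en) must go in the WHERE clause. Placed in the ON clause it filters nothing, and the Spanish rows would come back as "default" messages too.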
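The sketch below loads the example messages into an in-memory SQLite database through Python's sqlite3 module. It uses the equivalent LEFT OUTER JOIN with the tables swapped, since SQLite versions before 3.39 lack RIGHT OUTER JOIN, and compares the en.language filter in the WHERE clause with the same filter placed in the ON clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (message_id INTEGER, language TEXT, message TEXT);
    INSERT INTO messages VALUES (123, 'en', 'Thank you'),
                                (123, 'sp', 'Gracias'),
                                (456, 'en', 'Hello');
""")

# en is the preserved table: its filter goes in WHERE; sp's filter stays in ON.
FILTER_IN_WHERE = """
    SELECT en.message_id, COALESCE(sp.message, en.message)
    FROM messages AS en LEFT OUTER JOIN messages AS sp
      ON sp.message_id = en.message_id AND sp.language = 'sp'
    WHERE en.language = 'en'
    ORDER BY en.message_id"""

# The en filter in ON removes no en rows, so the Spanish row itself is also
# returned, falling back to its own message.
FILTER_IN_ON = """
    SELECT en.message_id, COALESCE(sp.message, en.message)
    FROM messages AS en LEFT OUTER JOIN messages AS sp
      ON sp.message_id = en.message_id AND sp.language = 'sp'
     AND en.language = 'en'
    ORDER BY en.message_id"""

in_where = conn.execute(FILTER_IN_WHERE).fetchall()
in_on = conn.execute(FILTER_IN_ON).fetchall()
print(in_where)
print(in_on)
```

With the filter in ON, message 123 appears twice.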


Localization

messages
  message_id  language  message
  123         en        Thank you
  123         sp        Gracias
  456         en        Hello
  NULL


Localization

Query result set
  message_id  message
  123         Gracias
  456         Hello

SELECT en.message_id, COALESCE(sp.message, en.message)
FROM messages AS sp RIGHT OUTER JOIN messages AS en
ON sp.message_id = en.message_id AND sp.language = 'sp'
WHERE en.language = 'en';


Mimic NOT IN subquery

- Problem: find rows for which there is no match.
- Often implemented using NOT IN (subquery):
    SELECT ... FROM products p
    WHERE p.product_id NOT IN (SELECT o.product_id FROM orders o);


Mimic NOT IN subquery

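- Can also be implemented using an outer join:
    SELECT ... FROM products p LEFT OUTER JOIN orders o
    ON p.product_id = o.product_id
    WHERE o.product_id IS NULL;
- Useful when subqueries are not supported (e.g. MySQL 4.0)
- The two forms differ if the subquery returns any NULL: NOT IN then matches no rows at all, while the outer join still finds the unmatched products.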
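Both forms can be compared in a short sketch (Python sqlite3, in-memory). The second half inserts a hypothetical order 12 whose product_id is NULL, to show how the two forms diverge: the NOT IN test against a list containing NULL can never be true, so it returns nothing, while the outer join still finds Efg.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id TEXT PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id TEXT);
    INSERT INTO products VALUES ('Abc'), ('Def'), ('Efg');
    INSERT INTO orders VALUES (10, 'Abc'), (11, 'Abc'), (9, 'Def');
""")

NOT_IN = """
    SELECT p.product_id FROM products p
    WHERE p.product_id NOT IN (SELECT o.product_id FROM orders o)"""

# Anti-join: keep products whose outer-joined order columns are NULL.
ANTI_JOIN = """
    SELECT p.product_id FROM products p
    LEFT OUTER JOIN orders o ON p.product_id = o.product_id
    WHERE o.product_id IS NULL"""


def run(sql):
    return [row[0] for row in conn.execute(sql)]


before = (run(NOT_IN), run(ANTI_JOIN))
# Hypothetical order 12 with no product_id.
conn.execute("INSERT INTO orders VALUES (12, NULL)")
after = (run(NOT_IN), run(ANTI_JOIN))
print(before, after)
```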


Mimic NOT IN subquery

Products
  product_id
  Abc
  Def
  Efg

Orders
  product_id  order_id
  Abc         10
  Abc         11
  Def         9
  NULL        NULL


Mimic NOT IN subquery

Query result set
  product_id  product attributes  order_id  order attributes
  Efg         $17.00              NULL      NULL

SELECT ... FROM products p LEFT OUTER JOIN orders o ON p.product_id = o.product_id WHERE o.product_id IS NULL;


Greatest row per group

- Problem: find the row in each group with the greatest value in one column (here, the most recent order for each product).

    SELECT ... FROM products p
    JOIN orders o1 ON p.product_id = o1.product_id
    LEFT OUTER JOIN orders o2
      ON p.product_id = o2.product_id AND o1.date < o2.date
    WHERE o2.product_id IS NULL;

- I.e., show the rows for which no other row exists with the same product_id and a greater date. If two orders tie for the greatest date, both are returned.
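A sketch of the greatest-row-per-group query against the example orders (Python sqlite3, in-memory; the date column stores ISO strings so that < compares chronologically):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id TEXT PRIMARY KEY);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, product_id TEXT, date TEXT);
    INSERT INTO products VALUES ('Abc'), ('Def'), ('Efg');
    INSERT INTO orders VALUES (10, 'Abc', '2006-02-01'),
                              (11, 'Abc', '2006-03-10'),
                              (9, 'Def', '2005-05-02');
""")

# o2 looks for a later order of the same product; keep o1 only when none exists.
GREATEST_PER_GROUP = """
    SELECT p.product_id, o1.order_id, o1.date
    FROM products p
    JOIN orders o1 ON p.product_id = o1.product_id
    LEFT OUTER JOIN orders o2
      ON p.product_id = o2.product_id AND o1.date < o2.date
    WHERE o2.product_id IS NULL
    ORDER BY p.product_id"""

latest = conn.execute(GREATEST_PER_GROUP).fetchall()
print(latest)
```

Efg is absent because the inner join to o1 requires at least one order.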


Greatest row per group

Orders o2
  product_id  order_id  date
  Abc         10        2006/2/1
  Abc         11        2006/3/10
  Def         9         2005/5/2

Orders o1
  product_id  order_id  date
  Abc         10        2006/2/1
  Abc         11        2006/3/10
  Def         9         2005/5/2

Products
  product_id
  Abc
  Def
  Efg         NULL


Greatest row per group

Query result set
  product_id  product attributes  order_id  order attributes
  Abc         $10.00              11        2006/3/10
  Def         $5.00               9         2005/5/2

SELECT ... FROM products p JOIN orders o1 ON p.product_id = o1.product_id LEFT OUTER JOIN orders o2 ON p.product_id = o2.product_id AND o1.date < o2.date WHERE o2.product_id IS NULL;


Top three per group

- Problem: list the three largest cities per US state.

    SELECT c.state, c.city_name, c.population
    FROM cities AS c LEFT JOIN cities AS c2
      ON c.state = c2.state AND c.population <= c2.population
    GROUP BY c.state, c.city_name, c.population
    HAVING COUNT(*) <= 3
    ORDER BY c.state, c.population DESC;

- I.e., show the cities for which at most three cities in the same state (counting the city itself) have a population greater than or equal to its own. Ties in population can make a state return fewer than three rows.
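The top-three query can be checked against the California rows from the example. This sketch uses Python sqlite3 with populations stored as integers in thousands (3485 for 3485K) rather than the slide's K notation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (state TEXT, city_name TEXT, population INTEGER)")
# Populations in thousands.
conn.executemany("INSERT INTO cities VALUES (?, ?, ?)", [
    ("CA", "Los Angeles", 3485),
    ("CA", "San Diego", 1110),
    ("CA", "San Jose", 782),
    ("CA", "San Francisco", 724),
])

# COUNT(*) = number of same-state cities at least as large, including c itself.
TOP_THREE = """
    SELECT c.state, c.city_name, c.population
    FROM cities AS c LEFT JOIN cities AS c2
      ON c.state = c2.state AND c.population <= c2.population
    GROUP BY c.state, c.city_name, c.population
    HAVING COUNT(*) <= 3
    ORDER BY c.state, c.population DESC"""

top = conn.execute(TOP_THREE).fetchall()
print(top)
```

San Francisco's count is 4 (itself plus the three larger cities), so it is filtered out.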


Top three per group

Cities c2
  state  city_name      population
  CA     Los Angeles    3485K
  CA     San Diego      1110K
  CA     San Jose       782K
  CA     San Francisco  724K

Cities c
  state  city_name      population
  CA     Los Angeles    3485K
  CA     San Diego      1110K
  CA     San Jose       782K
  CA     San Francisco  724K


Top three per group

Query result set
  state  city_name    population
  CA     Los Angeles  3485K
  CA     San Diego    1110K
  CA     San Jose     782K

SELECT c.state, c.city_name, c.population FROM cities AS c LEFT JOIN cities AS c2 ON c.state = c2.state AND c.population <= c2.population GROUP BY c.state, c.city_name, c.population HAVING COUNT(*) <= 3 ORDER BY c.state, c.population DESC;


Fetching EAV attributes

- Entity-Attribute-Value table structure for dynamic attributes
- Not a normalized schema design
- Lacks integrity enforcement
- Not scalable
- Nevertheless, EAV is widely used and is sometimes the only solution when attributes evolve quickly


Fetching EAV attributes

Attributes
  product_id  attribute  value
  Abc         Media      DVD
  Abc         Discs      2
  Abc         Format     Widescreen
  Abc         Length     108 min.

Products
  product_id
  Abc
  Def
  Efg


Fetching EAV attributes

- Need an outer join per attribute:

    SELECT p.product_id, media.value AS media, discs.value AS discs,
           format.value AS format, length.value AS length
    FROM products AS p
    LEFT OUTER JOIN attributes AS media
      ON p.product_id = media.product_id AND media.attribute = 'Media'
    LEFT OUTER JOIN attributes AS discs
      ON p.product_id = discs.product_id AND discs.attribute = 'Discs'
    LEFT OUTER JOIN attributes AS format
      ON p.product_id = format.product_id AND format.attribute = 'Format'
    LEFT OUTER JOIN attributes AS length
      ON p.product_id = length.product_id AND length.attribute = 'Length'
    WHERE p.product_id = 'Abc';
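A sketch of the EAV query in Python sqlite3, using the example attribute rows. The product_id is passed as a query parameter (an assumption, in place of the literal 'Abc') so the same query can also be run for Def, which has no attributes and comes back with all NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id TEXT PRIMARY KEY);
    CREATE TABLE attributes (product_id TEXT, attribute TEXT, value TEXT);
    INSERT INTO products VALUES ('Abc'), ('Def'), ('Efg');
    INSERT INTO attributes VALUES ('Abc', 'Media', 'DVD'), ('Abc', 'Discs', '2'),
                                  ('Abc', 'Format', 'Widescreen'),
                                  ('Abc', 'Length', '108 min.');
""")

# One LEFT OUTER JOIN per attribute; each alias selects one attribute name,
# so a missing attribute yields NULL instead of dropping the product.
PIVOT = """
    SELECT p.product_id, media.value AS media, discs.value AS discs,
           format.value AS format, length.value AS length
    FROM products AS p
    LEFT OUTER JOIN attributes AS media
      ON p.product_id = media.product_id AND media.attribute = 'Media'
    LEFT OUTER JOIN attributes AS discs
      ON p.product_id = discs.product_id AND discs.attribute = 'Discs'
    LEFT OUTER JOIN attributes AS format
      ON p.product_id = format.product_id AND format.attribute = 'Format'
    LEFT OUTER JOIN attributes AS length
      ON p.product_id = length.product_id AND length.attribute = 'Length'
    WHERE p.product_id = ?"""

abc_row = conn.execute(PIVOT, ("Abc",)).fetchone()
def_row = conn.execute(PIVOT, ("Def",)).fetchone()
print(abc_row)
print(def_row)
```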


Fetching EAV attributes

Query result set
  product_id  media  discs  format      length
  Abc         DVD    2      Widescreen  108 min.

SELECT p.product_id, media.value AS media, discs.value AS discs, format.value AS format, length.value AS length FROM products AS p LEFT OUTER JOIN attributes AS media ON p.product_id = media.product_id AND media.attribute = 'Media' LEFT OUTER JOIN attributes AS discs ON p.product_id = discs.product_id AND discs.attribute = 'Discs' LEFT OUTER JOIN attributes AS format ON p.product_id = format.product_id AND format.attribute = 'Format' LEFT OUTER JOIN attributes AS length ON p.product_id = length.product_id AND length.attribute = 'Length' WHERE p.product_id = 'Abc';


Sudoku puzzles

[Figure: an example Sudoku puzzle]


Sudoku schema

CREATE TABLE one_to_nine (
  value INTEGER NOT NULL
);
INSERT INTO one_to_nine (value) VALUES
  (1), (2), (3), (4), (5), (6), (7), (8), (9);

-- column and row are reserved words in MySQL, so quote them where unqualified
CREATE TABLE sudoku (
  `column` INTEGER NOT NULL,
  `row`    INTEGER NOT NULL,
  value    INTEGER NOT NULL
);

INSERT INTO sudoku (`column`, `row`, value) VALUES
  (6,1,3), (8,1,5), (9,1,1),
  (1,2,1), (2,2,4), (5,2,7), (7,2,6),
  (2,3,8), (3,3,5), (4,3,9), (7,3,4), (9,3,2),
  (3,4,2), (4,4,3), (7,4,1), (9,4,7),
  (1,5,5), (2,5,3), (8,5,6),
  (1,6,9), (4,6,8), (5,6,6), (6,6,4), (8,6,2),
  (2,7,5), (4,7,1), (6,7,2), (8,7,8),
  (1,8,6), (3,8,7), (4,8,5), (8,8,9),
  (6,9,7), (7,9,3), (8,9,1);


Showing puzzle state

SELECT GROUP_CONCAT(COALESCE(s.value, '_') ORDER BY x.value SEPARATOR ' ')
         AS `Puzzle_state`
FROM one_to_nine AS x
INNER JOIN one_to_nine AS y
LEFT OUTER JOIN sudoku AS s ON s.column = x.value AND s.row = y.value
GROUP BY y.value
ORDER BY y.value;

+-------------------+
| Puzzle_state      |
+-------------------+
| _ _ _ _ _ 3 _ 5 1 |
| 1 4 _ _ 7 _ 6 _ _ |
| _ 8 5 9 _ _ 4 _ 2 |
| _ _ 2 3 _ _ 1 _ 7 |
| 5 3 _ _ _ _ _ 6 _ |
| 9 _ _ 8 6 4 _ 2 _ |
| _ 5 _ 1 _ 2 _ 8 _ |
| 6 _ 7 5 _ _ _ 9 _ |
| _ _ _ _ _ 7 3 1 _ |
+-------------------+
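The puzzle-state query relies on MySQL features (GROUP_CONCAT ... SEPARATOR, INNER JOIN without ON). As a sketch of the same idea in SQLite via Python's sqlite3 module, the code below uses CROSS JOIN, double-quotes the column and row identifiers, and assembles each line in Python, because older SQLite versions do not accept ORDER BY inside GROUP_CONCAT. The puzzle is loaded from the grid printed above.

```python
import sqlite3

# The example puzzle; "_" marks an empty cell.
PUZZLE = [
    "_ _ _ _ _ 3 _ 5 1",
    "1 4 _ _ 7 _ 6 _ _",
    "_ 8 5 9 _ _ 4 _ 2",
    "_ _ 2 3 _ _ 1 _ 7",
    "5 3 _ _ _ _ _ 6 _",
    "9 _ _ 8 6 4 _ 2 _",
    "_ 5 _ 1 _ 2 _ 8 _",
    "6 _ 7 5 _ _ _ 9 _",
    "_ _ _ _ _ 7 3 1 _",
]


def make_db(puzzle):
    """Create one_to_nine and sudoku tables and load the given puzzle."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE one_to_nine (value INTEGER NOT NULL);
        INSERT INTO one_to_nine VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9);
        CREATE TABLE sudoku ("column" INTEGER NOT NULL, "row" INTEGER NOT NULL,
                             value INTEGER NOT NULL);
    """)
    cells = [(x, y, int(tok))
             for y, line in enumerate(puzzle, start=1)
             for x, tok in enumerate(line.split(), start=1)
             if tok != "_"]
    conn.executemany('INSERT INTO sudoku ("column", "row", value) VALUES (?, ?, ?)', cells)
    return conn


# Every (x, y) position, with the cell value or '_' when the outer join finds nothing.
STATE = """
    SELECT y.value, COALESCE(s.value, '_')
    FROM one_to_nine AS x CROSS JOIN one_to_nine AS y
    LEFT OUTER JOIN sudoku AS s ON s."column" = x.value AND s."row" = y.value
    ORDER BY y.value, x.value"""


def render(conn):
    lines = {}
    for y, cell in conn.execute(STATE):
        lines.setdefault(y, []).append(str(cell))
    return [" ".join(lines[y]) for y in sorted(lines)]


conn = make_db(PUZZLE)
for line in render(conn):
    print(line)
```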


Revealing possible values

SELECT x_loop.value AS x, y_loop.value AS y,
       GROUP_CONCAT(cell.value ORDER BY cell.value) AS possibilities
FROM (one_to_nine AS x_loop
      INNER JOIN one_to_nine AS y_loop
      INNER JOIN one_to_nine AS cell)
LEFT OUTER JOIN sudoku AS occupied
  ON (occupied.column = x_loop.value AND occupied.row = y_loop.value)
LEFT OUTER JOIN sudoku AS num_in_col
  ON (num_in_col.column = x_loop.value AND num_in_col.value = cell.value)
LEFT OUTER JOIN sudoku AS num_in_row
  ON (num_in_row.row = y_loop.value AND num_in_row.value = cell.value)
LEFT OUTER JOIN sudoku AS num_in_box
  ON (CEIL(x_loop.value/3) = CEIL(num_in_box.column/3)
  AND CEIL(y_loop.value/3) = CEIL(num_in_box.row/3)
  AND cell.value = num_in_box.value)
WHERE COALESCE(occupied.value, num_in_col.value,
               num_in_row.value, num_in_box.value) IS NULL
GROUP BY x_loop.value, y_loop.value;

- Cartesian product: loop x over columns 1..9, loop y over rows 1..9, loop cell over values 1..9.
- occupied: is there any value already in cell (x, y)?
- num_in_col: does the value appear in column x?
- num_in_row: does the value appear in row y?
- num_in_box: does the value appear in the sub-square containing (x, y)?
- WHERE: select the cases where all four outer joins find no match.
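The candidate query can be ported to SQLite in the same way. This sketch (Python sqlite3) swaps CEIL(v/3) for the integer division (v - 1) / 3, which groups the values 1 to 9 into the same three boxes, and collects each cell's candidates in Python instead of with GROUP_CONCAT.

```python
import sqlite3

# The example puzzle; "_" marks an empty cell.
PUZZLE = [
    "_ _ _ _ _ 3 _ 5 1",
    "1 4 _ _ 7 _ 6 _ _",
    "_ 8 5 9 _ _ 4 _ 2",
    "_ _ 2 3 _ _ 1 _ 7",
    "5 3 _ _ _ _ _ 6 _",
    "9 _ _ 8 6 4 _ 2 _",
    "_ 5 _ 1 _ 2 _ 8 _",
    "6 _ 7 5 _ _ _ 9 _",
    "_ _ _ _ _ 7 3 1 _",
]


def make_db(puzzle):
    """Create one_to_nine and sudoku tables and load the given puzzle."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE one_to_nine (value INTEGER NOT NULL);
        INSERT INTO one_to_nine VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9);
        CREATE TABLE sudoku ("column" INTEGER NOT NULL, "row" INTEGER NOT NULL,
                             value INTEGER NOT NULL);
    """)
    cells = [(x, y, int(tok))
             for y, line in enumerate(puzzle, start=1)
             for x, tok in enumerate(line.split(), start=1)
             if tok != "_"]
    conn.executemany('INSERT INTO sudoku ("column", "row", value) VALUES (?, ?, ?)', cells)
    return conn


# Keep (x, y, value) only when all four outer joins find no match.
CANDIDATES = """
    SELECT x_loop.value, y_loop.value, cell.value
    FROM one_to_nine AS x_loop
    CROSS JOIN one_to_nine AS y_loop
    CROSS JOIN one_to_nine AS cell
    LEFT OUTER JOIN sudoku AS occupied
      ON occupied."column" = x_loop.value AND occupied."row" = y_loop.value
    LEFT OUTER JOIN sudoku AS num_in_col
      ON num_in_col."column" = x_loop.value AND num_in_col.value = cell.value
    LEFT OUTER JOIN sudoku AS num_in_row
      ON num_in_row."row" = y_loop.value AND num_in_row.value = cell.value
    LEFT OUTER JOIN sudoku AS num_in_box
      ON (x_loop.value - 1) / 3 = (num_in_box."column" - 1) / 3
     AND (y_loop.value - 1) / 3 = (num_in_box."row" - 1) / 3
     AND cell.value = num_in_box.value
    WHERE COALESCE(occupied.value, num_in_col.value,
                   num_in_row.value, num_in_box.value) IS NULL
    ORDER BY x_loop.value, y_loop.value, cell.value"""


def candidates(conn):
    """Map each empty (x, y) cell to the sorted list of values it could hold."""
    result = {}
    for x, y, v in conn.execute(CANDIDATES):
        result.setdefault((x, y), []).append(v)
    return result


conn = make_db(PUZZLE)
possible = candidates(conn)
print(possible[(1, 1)])
```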


Revealing singleton values

SELECT x_loop.value AS x, y_loop.value AS y, MAX(cell.value) AS value
FROM (one_to_nine AS x_loop
      INNER JOIN one_to_nine AS y_loop
      INNER JOIN one_to_nine AS cell)
LEFT OUTER JOIN sudoku AS occupied
  ON (occupied.column = x_loop.value AND occupied.row = y_loop.value)
LEFT OUTER JOIN sudoku AS num_in_col
  ON (num_in_col.column = x_loop.value AND num_in_col.value = cell.value)
LEFT OUTER JOIN sudoku AS num_in_row
  ON (num_in_row.row = y_loop.value AND num_in_row.value = cell.value)
LEFT OUTER JOIN sudoku AS num_in_box
  ON (CEIL(x_loop.value/3) = CEIL(num_in_box.column/3)
  AND CEIL(y_loop.value/3) = CEIL(num_in_box.row/3)
  AND cell.value = num_in_box.value)
WHERE COALESCE(occupied.value, num_in_col.value,
               num_in_row.value, num_in_box.value) IS NULL
GROUP BY x_loop.value, y_loop.value
HAVING COUNT(*) = 1;

- Limit the groups to those with only one value remaining. Each such group holds a single cell value, so MAX(cell.value) simply returns it; using an aggregate keeps the query valid when MySQL's ONLY_FULL_GROUP_BY mode is enabled.


Updating the puzzle

INSERT INTO sudoku (`column`, `row`, value)
SELECT x_loop.value AS x, y_loop.value AS y, MAX(cell.value) AS value
FROM (one_to_nine AS x_loop
      INNER JOIN one_to_nine AS y_loop
      INNER JOIN one_to_nine AS cell)
LEFT OUTER JOIN sudoku AS occupied
  ON (occupied.column = x_loop.value AND occupied.row = y_loop.value)
LEFT OUTER JOIN sudoku AS num_in_col
  ON (num_in_col.column = x_loop.value AND num_in_col.value = cell.value)
LEFT OUTER JOIN sudoku AS num_in_row
  ON (num_in_row.row = y_loop.value AND num_in_row.value = cell.value)
LEFT OUTER JOIN sudoku AS num_in_box
  ON (CEIL(x_loop.value/3) = CEIL(num_in_box.column/3)
  AND CEIL(y_loop.value/3) = CEIL(num_in_box.row/3)
  AND cell.value = num_in_box.value)
WHERE COALESCE(occupied.value, num_in_col.value,
               num_in_row.value, num_in_box.value) IS NULL
GROUP BY x_loop.value, y_loop.value
HAVING COUNT(*) = 1;

- Insert these singletons back into the table, then run the query again.
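The full loop (query the singletons, insert them, try again) can be sketched the same way in Python sqlite3. Here the rows are fetched first and then inserted with executemany, which avoids relying on how a database handles INSERT ... SELECT from the table being inserted into. The loop stops when a pass finds no singleton; whether that fills the whole grid depends on the puzzle, so the code only reports how many cells were filled and checks that no row, column, or box repeats a value.

```python
import sqlite3

# The example puzzle; "_" marks an empty cell.
PUZZLE = [
    "_ _ _ _ _ 3 _ 5 1",
    "1 4 _ _ 7 _ 6 _ _",
    "_ 8 5 9 _ _ 4 _ 2",
    "_ _ 2 3 _ _ 1 _ 7",
    "5 3 _ _ _ _ _ 6 _",
    "9 _ _ 8 6 4 _ 2 _",
    "_ 5 _ 1 _ 2 _ 8 _",
    "6 _ 7 5 _ _ _ 9 _",
    "_ _ _ _ _ 7 3 1 _",
]

INSERT_CELL = 'INSERT INTO sudoku ("column", "row", value) VALUES (?, ?, ?)'


def make_db(puzzle):
    """Create one_to_nine and sudoku tables and load the given puzzle."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE one_to_nine (value INTEGER NOT NULL);
        INSERT INTO one_to_nine VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9);
        CREATE TABLE sudoku ("column" INTEGER NOT NULL, "row" INTEGER NOT NULL,
                             value INTEGER NOT NULL);
    """)
    cells = [(x, y, int(tok))
             for y, line in enumerate(puzzle, start=1)
             for x, tok in enumerate(line.split(), start=1)
             if tok != "_"]
    conn.executemany(INSERT_CELL, cells)
    return conn


# Cells with exactly one candidate; MAX() returns that single value.
SINGLETONS = """
    SELECT x_loop.value, y_loop.value, MAX(cell.value)
    FROM one_to_nine AS x_loop
    CROSS JOIN one_to_nine AS y_loop
    CROSS JOIN one_to_nine AS cell
    LEFT OUTER JOIN sudoku AS occupied
      ON occupied."column" = x_loop.value AND occupied."row" = y_loop.value
    LEFT OUTER JOIN sudoku AS num_in_col
      ON num_in_col."column" = x_loop.value AND num_in_col.value = cell.value
    LEFT OUTER JOIN sudoku AS num_in_row
      ON num_in_row."row" = y_loop.value AND num_in_row.value = cell.value
    LEFT OUTER JOIN sudoku AS num_in_box
      ON (x_loop.value - 1) / 3 = (num_in_box."column" - 1) / 3
     AND (y_loop.value - 1) / 3 = (num_in_box."row" - 1) / 3
     AND cell.value = num_in_box.value
    WHERE COALESCE(occupied.value, num_in_col.value,
                   num_in_row.value, num_in_box.value) IS NULL
    GROUP BY x_loop.value, y_loop.value
    HAVING COUNT(*) = 1"""


def fill_singletons(conn, max_passes=81):
    """Insert singletons pass by pass until none are found; return each pass's cells."""
    passes = []
    for _ in range(max_passes):
        found = conn.execute(SINGLETONS).fetchall()
        if not found:
            break
        # Fetch first, then insert, so the query never reads rows it is adding.
        conn.executemany(INSERT_CELL, found)
        passes.append(found)
    return passes


def is_consistent(conn):
    """True if no value repeats within a row, a column, or a 3x3 box."""
    seen = set()
    for x, y, v in conn.execute('SELECT "column", "row", value FROM sudoku'):
        keys = {("row", y, v), ("col", x, v), ("box", (x - 1) // 3, (y - 1) // 3, v)}
        if keys & seen:
            return False
        seen |= keys
    return True


conn = make_db(PUZZLE)
passes = fill_singletons(conn)
filled = conn.execute("SELECT COUNT(*) FROM sudoku").fetchone()[0]
print(f"{len(passes)} passes, {filled} of 81 cells filled")
```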


Finish

- Outer joins are an indispensable part of SQL programming.

Thank you!