sql wksht-6

SQL JOIN statement – Worksheet 6

Prof. Mukesh N. Tekwani

SQL Queries - Basics

Worksheet - 6

JOIN STATEMENT

1 Summarize the rules for single-table query processing.

To generate the query results for a SELECT statement, follow these steps:

1. Start with the table named in the FROM clause.
2. If there is a WHERE clause, apply its search condition to each row of the table, retaining those rows for which the search condition is TRUE and discarding those rows for which it is FALSE or NULL.
3. For each remaining row, calculate the value of each item in the select list to produce a single row of query results. For each column reference, use the value of the column in the current row.
4. If SELECT DISTINCT is specified, eliminate any duplicate rows of query results that were produced.
5. If there is an ORDER BY clause, sort the query results as specified.

The rows generated by this procedure comprise the query results.
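The five steps above can be sketched in plain Python. This is only an illustration of the processing order, not how a database engine is actually implemented; the `orders` rows, the column names and the helper `run_query` are hypothetical.

```python
# Hypothetical sample table; None stands in for SQL NULL.
orders = [
    {"order_num": 1, "cust": 10, "amount": 250.0},
    {"order_num": 2, "cust": 11, "amount": 190.0},
    {"order_num": 3, "cust": 10, "amount": None},
    {"order_num": 4, "cust": 10, "amount": 400.0},
    {"order_num": 5, "cust": 12, "amount": 90.0},
]

def amount_over_100(row):
    # A comparison against NULL is neither TRUE nor FALSE; model it as None.
    if row["amount"] is None:
        return None
    return row["amount"] > 100

def run_query(table, search_condition, select_list, distinct=False, order_by=None):
    rows = list(table)                                          # Step 1: FROM
    rows = [r for r in rows if search_condition(r) is True]     # Step 2: WHERE keeps TRUE only
    results = [tuple(r[c] for c in select_list) for r in rows]  # Step 3: select list
    if distinct:                                                # Step 4: SELECT DISTINCT
        results = list(dict.fromkeys(results))
    if order_by is not None:                                    # Step 5: ORDER BY
        results.sort(key=lambda t: t[select_list.index(order_by)])
    return results

# Roughly: SELECT DISTINCT cust FROM orders WHERE amount > 100 ORDER BY cust
distinct_customers = run_query(orders, amount_over_100, ["cust"],
                               distinct=True, order_by="cust")
print(distinct_customers)
```

Order 3 is discarded in step 2 because its search condition evaluates to NULL rather than TRUE, and order 5 because it evaluates to FALSE.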

2 Summarize the rules for two-table query processing.

Consider a query based on two tables, such as:

"List all orders, showing the order number and amount, and the name and credit limit of the customer who placed it."



The ORDERS table contains the order number and amount of each order, but doesn't have customer names or credit limits. The CUSTOMERS table contains the customer names and credit limits, but it does not have any information about orders. However, there is a link between these two tables. In each row of the ORDERS table, the CUST column contains the customer number of the customer who placed the order, which matches the value in the CUST_NUM column in one of the rows in the CUSTOMERS table. Hence, the SELECT statement that handles the request must use this link between the tables to generate its query results.

Here is the procedure to query both these tables:

1. Start by writing down the four column names for the query results. Then move to the ORDERS table, and start with the first order.
2. Look across the row to find the order number (112961) and the order amount ($31,500.00) and copy both values to the first row of query results.
3. Look across the row to find the number of the customer who placed the order (2117), and move to the CUSTOMERS table to find customer number 2117 by searching the CUST_NUM column.
4. Move across the row of the CUSTOMERS table to find the customer's name ("J.P. Sinclair") and credit limit ($35,000.00), and copy them to the query results table.

You have now generated one row of query results. Move back to the ORDERS table and go to the next row. Repeat the process, starting with Step 2, until you run out of orders.

Each row of query results draws its data from a specific pair of rows, one from the ORDERS table and one from the CUSTOMERS table. The pair of rows is found by matching the contents of corresponding columns from the two tables.
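The manual procedure above amounts to a lookup loop, which can be sketched in Python as follows. The first order and the "J.P. Sinclair" customer row use the values from the text; the second order, the second customer, and the column names ORDER_NUM, AMOUNT, COMPANY and CREDIT_LIMIT are assumptions added for illustration.

```python
orders = [
    {"ORDER_NUM": 112961, "CUST": 2117, "AMOUNT": 31500.00},  # values from the text
    {"ORDER_NUM": 900001, "CUST": 9001, "AMOUNT": 150.00},    # hypothetical row
]
customers = [
    {"CUST_NUM": 9001, "COMPANY": "Example Co.", "CREDIT_LIMIT": 1000.00},     # hypothetical row
    {"CUST_NUM": 2117, "COMPANY": "J.P. Sinclair", "CREDIT_LIMIT": 35000.00},  # values from the text
]

query_results = []
for order in orders:                                # Steps 1-2: walk the ORDERS table
    for customer in customers:                      # Step 3: search the CUST_NUM column
        if customer["CUST_NUM"] == order["CUST"]:   # the link between the two tables
            # Step 4: copy values from both rows into one row of query results
            query_results.append((order["ORDER_NUM"], order["AMOUNT"],
                                  customer["COMPANY"], customer["CREDIT_LIMIT"]))

for row in query_results:
    print(row)
```

Each result tuple is built from exactly one ORDERS row and one CUSTOMERS row, paired by the matching customer number.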

3 What is a JOIN?

Usually a query will have to refer to two or more tables to find all the information it requires. This happens because in a relational database, data is intentionally split up into multiple tables in order to achieve modularization or normalization of data.

To deal with this fragmentation of data, we need a JOIN statement in SQL. A JOIN statement combines data from two or more tables into a single result set. The tables are not actually merged; they only appear to be merged in the rows returned by the query. Multiple joins can be used to consolidate data from many tables.

There are two types of joins: inner join and outer join. The major difference between the two is that an outer join includes rows in the result set even when the condition specified in the JOIN statement is not met, whereas an inner join does not return rows that fail the join condition. When the join condition in a (left) outer join is not met, columns from the first table are returned normally, but columns from the second table are returned with no value, that is, as NULLs.

4 INNER JOIN

The INNER JOIN keyword returns rows when there is at least one match in both tables. In the example below, rows in "Persons" that have no match in "Products" will NOT be listed.

SELECT column_name(s)
FROM table_name1 INNER JOIN table_name2
ON table_name1.column_name = table_name2.column_name

Example:

SELECT Persons.LastName, Persons.FirstName, Products.OrderNo
FROM Persons INNER JOIN Products
ON Persons.P_Id = Products.P_Id
ORDER BY Persons.LastName

Example of INNER JOIN

Consider a database XYZLTD. The tables in this database are as follows:

Table: Customers
CustomerNumber int NOT NULL,
LastName char(30) NOT NULL,
FirstName char(30) NOT NULL,
StreetAddress char(30) NOT NULL,
City char(20) NOT NULL,
State char(3) NOT NULL,
PinCode char(6) NOT NULL

Table: Orders
OrderNumber int NOT NULL,
OrderDate datetime NOT NULL,
CustomerNumber int NOT NULL,
ItemNumber int NOT NULL,
Amount numeric(9, 2) NOT NULL

Table: Items
ItemNumber int NOT NULL,
Description char(30) NOT NULL,
Price numeric(9, 2) NOT NULL

Suppose we wish to join the tables Orders and Customers. There are two different syntaxes to join these two tables. The first method is called the legacy (old) method and is as follows:

Old Method:

SELECT customers.CustomerNumber, orders.Amount
FROM customers, orders
WHERE customers.CustomerNumber = orders.CustomerNumber

This is an inner join. If no order exists for a given customer, that customer is omitted completely from the list.

The ANSI / SQL-92 syntax is as follows, and this is preferable:

SELECT customers.CustomerNumber, orders.Amount
FROM customers JOIN orders
ON (customers.CustomerNumber = orders.CustomerNumber)

Consider the following example, using the old syntax, where we join 3 tables:

SELECT customers.CustomerNumber, orders.Amount, items.Description
FROM customers, orders, items
WHERE customers.CustomerNumber = orders.CustomerNumber
AND orders.ItemNumber = items.ItemNumber

We write the ANSI/SQL-92 version of the same as follows:

SELECT customers.CustomerNumber, orders.Amount, items.Description
FROM customers JOIN orders
ON (customers.CustomerNumber = orders.CustomerNumber)
JOIN items
ON (orders.ItemNumber = items.ItemNumber)
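One way to check that the legacy and ANSI forms of the three-table join return the same rows is to run both against a small in-memory SQLite database from Python. This is a sketch under assumptions: the tables are trimmed to the columns the queries need, Amount and Price are declared REAL instead of numeric(9, 2), all sample rows are invented, and an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerNumber INTEGER NOT NULL, LastName TEXT NOT NULL);
CREATE TABLE Items (ItemNumber INTEGER NOT NULL, Description TEXT NOT NULL, Price REAL NOT NULL);
CREATE TABLE Orders (OrderNumber INTEGER NOT NULL, CustomerNumber INTEGER NOT NULL,
                     ItemNumber INTEGER NOT NULL, Amount REAL NOT NULL);
-- Hypothetical rows; customer 3 has placed no orders.
INSERT INTO Customers VALUES (1, 'Rao'), (2, 'Shah'), (3, 'Iyer');
INSERT INTO Items VALUES (10, 'Pen', 5.00), (20, 'Notebook', 40.00);
INSERT INTO Orders VALUES (100, 1, 10, 50.00), (101, 1, 20, 80.00), (102, 2, 20, 40.00);
""")

legacy_sql = """
SELECT customers.CustomerNumber, orders.Amount, items.Description
FROM customers, orders, items
WHERE customers.CustomerNumber = orders.CustomerNumber
AND orders.ItemNumber = items.ItemNumber
ORDER BY orders.OrderNumber
"""
ansi_sql = """
SELECT customers.CustomerNumber, orders.Amount, items.Description
FROM customers JOIN orders
ON (customers.CustomerNumber = orders.CustomerNumber)
JOIN items
ON (orders.ItemNumber = items.ItemNumber)
ORDER BY orders.OrderNumber
"""
legacy_rows = conn.execute(legacy_sql).fetchall()
ansi_rows = conn.execute(ansi_sql).fetchall()
print(legacy_rows)
print(legacy_rows == ansi_rows)
```

Customer 3 appears in neither result, because both forms are inner joins.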

7 A simple example of JOIN statement

Consider two tables as shown below:

Customers:

CustomerID  FirstName  LastName  Email              DOB         Phone
1           John       Smith     [email protected]  2/4/1968    626 222
2           Steven     Goldfish  [email protected]  4/4/1974    323 455
3           Paula      Brown     [email protected]  5/24/1978   416 323
4           James      Smith     [email protected]  20/10/1980  416 327

Sales:

CustomerID  Date       SaleAmount
2           5/6/2004   100.22
1           5/7/2004   99.95
3           5/7/2004   122.95
3           5/13/2004  100.00
4           5/22/2004  555.55

The SQL JOIN clause is used whenever we have to select data from 2 or more tables. To use the JOIN clause to extract data from 2 (or more) tables, we need a relationship between certain columns in these tables. Here, the two tables have a common field called CustomerID, so we can extract information from both tables by matching their CustomerID columns.

Consider the following SQL statement:

SELECT Customers.FirstName, Customers.LastName,

SUM(Sales.SaleAmount) AS SalesPerCustomer

FROM Customers, Sales

WHERE Customers.CustomerID = Sales.CustomerID

GROUP BY Customers.FirstName, Customers.LastName

The SQL statement above selects each distinct customer (first and last name) together with the total amount that customer has spent. The join condition is specified in the WHERE clause and says that the 2 tables have to be matched by their respective CustomerID columns.

Here is the result of this SQL statement:

FirstName  LastName  SalesPerCustomer
John       Smith     99.95
Steven     Goldfish  100.22
Paula      Brown     222.95
James      Smith     555.55
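The totals in this result can be reproduced with a short Python script using an in-memory SQLite database. It is a sketch under assumptions: the Email, DOB and Phone columns are left out, the Date column is renamed SaleDate and stored as text, an ORDER BY is added to fix the row order, and the sums are rounded to two decimals in Python to hide floating-point noise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER, FirstName TEXT, LastName TEXT);
CREATE TABLE Sales (CustomerID INTEGER, SaleDate TEXT, SaleAmount REAL);
INSERT INTO Customers VALUES (1, 'John', 'Smith'), (2, 'Steven', 'Goldfish'),
                             (3, 'Paula', 'Brown'), (4, 'James', 'Smith');
INSERT INTO Sales VALUES (2, '5/6/2004', 100.22), (1, '5/7/2004', 99.95),
                         (3, '5/7/2004', 122.95), (3, '5/13/2004', 100.00),
                         (4, '5/22/2004', 555.55);
""")

query = """
SELECT Customers.FirstName, Customers.LastName,
       SUM(Sales.SaleAmount) AS SalesPerCustomer
FROM Customers, Sales
WHERE Customers.CustomerID = Sales.CustomerID
GROUP BY Customers.FirstName, Customers.LastName
ORDER BY SalesPerCustomer
"""
# Round each total so that sums such as 122.95 + 100.00 display as 222.95.
totals = [(first, last, round(total, 2)) for first, last, total in conn.execute(query)]
for row in totals:
    print(row)
```

Paula Brown's two sales are combined into one row because the GROUP BY clause groups the joined rows by first and last name.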

The SQL statement above can be re-written using the SQL JOIN clause like this:

SELECT Customers.FirstName, Customers.LastName,
SUM(Sales.SaleAmount) AS SalesPerCustomer
FROM Customers JOIN Sales
ON Customers.CustomerID = Sales.CustomerID
GROUP BY Customers.FirstName, Customers.LastName


There are 2 types of SQL JOINS: INNER JOINS and OUTER JOINS. If we don't put the INNER or OUTER keyword in front of the JOIN keyword, then INNER JOIN is used.

The INNER JOIN selects all rows from both tables as long as there is a match between the columns we are matching on. If a customer in the Customers table has not yet placed any orders (there are no entries for this customer in the Sales table), this customer will not be listed in the result of our SQL query above.

If the Sales table has the following rows:


CustomerID  Date      SaleAmount
2           5/6/2004  $100.22
1           5/6/2004  $99.95

And we use the same SQL JOIN statement from above, we get the result as follows:


FirstName  LastName  SalesPerCustomer
John       Smith     $99.95
Steven     Goldfish  $100.22

Even though Paula and James are listed as customers in the Customers table, they won't be displayed because they haven't purchased anything yet.

But what if we want to display all the customers and their sales, whether or not they have ordered something? We can do that with the help of the SQL OUTER JOIN clause.

SQL OUTER JOIN:

The second type of SQL JOIN is called SQL OUTER JOIN. The two sub-types covered here are LEFT OUTER JOIN and RIGHT OUTER JOIN (standard SQL also defines FULL OUTER JOIN).

The LEFT OUTER JOIN, or simply LEFT JOIN, selects all the rows from the first table listed after the FROM keyword, whether or not they have matches in the second table.

If we slightly modify our last SQL statement to:

SELECT Customers.FirstName, Customers.LastName,
SUM(Sales.SaleAmount) AS SalesPerCustomer
FROM Customers LEFT JOIN Sales
ON Customers.CustomerID = Sales.CustomerID
GROUP BY Customers.FirstName, Customers.LastName


and the Sales table still has the following rows:


CustomerID  Date      SaleAmount
2           5/6/2004  100.22
1           5/6/2004  99.95

The result will be the following:


FirstName  LastName  SalesPerCustomer
John       Smith     99.95
Steven     Goldfish  100.22
Paula      Brown     NULL
James      Smith     NULL

Thus, we have selected everything from Customers (the first table). For all rows from Customers that don't have a match in Sales (the second table), the SalesPerCustomer column is NULL.
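The difference between the inner join and the LEFT JOIN on the reduced Sales table can be checked with the following Python sketch, which uses an in-memory SQLite database. The queries include the SUM column and the GROUP BY clause that produce SalesPerCustomer; the trimmed column list and the renamed SaleDate column are assumptions made to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER, FirstName TEXT, LastName TEXT);
CREATE TABLE Sales (CustomerID INTEGER, SaleDate TEXT, SaleAmount REAL);
INSERT INTO Customers VALUES (1, 'John', 'Smith'), (2, 'Steven', 'Goldfish'),
                             (3, 'Paula', 'Brown'), (4, 'James', 'Smith');
INSERT INTO Sales VALUES (2, '5/6/2004', 100.22), (1, '5/6/2004', 99.95);
""")

template = """
SELECT Customers.FirstName, Customers.LastName,
       SUM(Sales.SaleAmount) AS SalesPerCustomer
FROM Customers {join_type} Sales
ON Customers.CustomerID = Sales.CustomerID
GROUP BY Customers.FirstName, Customers.LastName
"""

def totals_by_customer(join_type):
    # Map "First Last" to the summed amount; SQL NULL arrives in Python as None.
    rows = conn.execute(template.format(join_type=join_type)).fetchall()
    return {f"{first} {last}": total for first, last, total in rows}

inner_result = totals_by_customer("JOIN")
left_result = totals_by_customer("LEFT JOIN")
print(inner_result)
print(left_result)
```

Only the LEFT JOIN result contains Paula Brown and James Smith, and their totals are None, which is how the sqlite3 module represents NULL.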

The RIGHT OUTER JOIN, or just RIGHT JOIN, behaves like the LEFT JOIN, except that it returns all rows from the second table (the right table in our SQL JOIN statement), whether or not they have matches in the first table.
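A RIGHT JOIN can always be rewritten as a LEFT JOIN with the two tables swapped, which the sketch below uses because older SQLite versions do not accept the RIGHT JOIN keyword. The query is the equivalent of "Customers RIGHT JOIN Sales": it keeps every Sales row. The sale for CustomerID 9, which has no matching customer, is a hypothetical row added to show the NULL, and SaleDate is an assumed column name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (CustomerID INTEGER, FirstName TEXT, LastName TEXT);
CREATE TABLE Sales (CustomerID INTEGER, SaleDate TEXT, SaleAmount REAL);
INSERT INTO Customers VALUES (1, 'John', 'Smith'), (2, 'Steven', 'Goldfish'),
                             (3, 'Paula', 'Brown'), (4, 'James', 'Smith');
-- The last row is hypothetical: no customer has CustomerID 9.
INSERT INTO Sales VALUES (2, '5/6/2004', 100.22), (1, '5/6/2004', 99.95),
                         (9, '5/8/2004', 10.00);
""")

# Same rows as: FROM Customers RIGHT JOIN Sales ON Customers.CustomerID = Sales.CustomerID
right_join_sql = """
SELECT Sales.CustomerID, Sales.SaleAmount, Customers.FirstName
FROM Sales LEFT JOIN Customers
ON Customers.CustomerID = Sales.CustomerID
ORDER BY Sales.CustomerID
"""
right_join_rows = conn.execute(right_join_sql).fetchall()
for row in right_join_rows:
    print(row)
```

Every sale is listed, including the one without a customer, while Paula and James are absent because they have no rows in Sales.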

5 Explain non-equi join.



The term join applies to any query that combines data from two tables by comparing the values in a pair of columns from the tables. Joins based on equality between matching columns (equi-joins) are by far the most common joins, but SQL also allows you to join tables based on other comparison operators. Here's an example where a greater than (>) comparison test is used as the basis for a join:

Example 1:

List all combinations of salespeople and offices where the salesperson's quota is more than the office's target.

SELECT NAME, QUOTA, CITY, TARGET
FROM SALESREPS, OFFICES
WHERE QUOTA > TARGET
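The non-equi join above can be tried out on a small in-memory SQLite database from Python. The salespeople, offices and figures below are invented for illustration, the tables contain only the four columns the query uses, and an ORDER BY is added to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SALESREPS (NAME TEXT, QUOTA INTEGER);
CREATE TABLE OFFICES (CITY TEXT, TARGET INTEGER);
-- Hypothetical sample data.
INSERT INTO SALESREPS VALUES ('Alice', 300000), ('Bob', 150000);
INSERT INTO OFFICES VALUES ('Denver', 200000), ('Chicago', 100000);
""")

non_equi_sql = """
SELECT NAME, QUOTA, CITY, TARGET
FROM SALESREPS, OFFICES
WHERE QUOTA > TARGET
ORDER BY NAME, CITY
"""
combinations = conn.execute(non_equi_sql).fetchall()
for row in combinations:
    print(row)
```

Every salesperson is compared with every office, and a pair is kept only when the quota exceeds the target, so Bob is paired with Chicago but not with Denver.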

6 What is meant by self-join? Explain with an example.

Some multi-table queries involve a relationship that a table has with itself. For example, suppose you want to list the names of all salespeople and their managers. Each salesperson appears as a row in the SALESREPS table, and the MANAGER column contains the employee number of the salesperson's manager. It would appear that the MANAGER column should be a foreign key for the table that holds data about managers. In fact it is—it's a foreign key for the SALESREPS table itself!

If we tried to express this query like any other two-table query involving a foreign key/primary key match, it would look like this:

SELECT NAME, NAME

FROM SALESREPS, SALESREPS

WHERE MANAGER = EMPL_NUM

This SELECT statement is illegal because of the duplicate reference to the SALESREPS table in the FROM clause. You might also try eliminating the second reference to the SALESREPS table:

SELECT NAME, NAME

FROM SALESREPS

WHERE MANAGER = EMPL_NUM


This query is legal, but it won't do what you want it to do! It's a single-table query, so SQL goes through the SALESREPS table one row at a time, applying the search condition:



MANAGER = EMPL_NUM

The rows that satisfy this condition are those where the two columns have the same value, that is, rows where a salesperson is their own manager. There are no such rows, so the query would produce no results, which is not the data that the English-language statement of the query requested.

To understand how SQL solves this problem, imagine there were two identical copies of the SALESREPS table, one named EMPS, containing employees, and one named MGRS, containing managers, as shown in the figure below. The MANAGER column of the EMPS table would then be a foreign key for the MGRS table, and the following query would work:

Example: List the names of salespeople and their managers.

SELECT EMPS.NAME, MGRS.NAME
FROM EMPS, MGRS
WHERE EMPS.MANAGER = MGRS.EMPL_NUM

Because the columns in the two tables have identical names, all of the column references are qualified. SQL takes this same approach without a physical second copy: the FROM clause names SALESREPS twice and gives each reference a different alias (FROM SALESREPS EMPS, SALESREPS MGRS), and the rest of the query stays the same.
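A self-join needs no second physical copy of the table: the FROM clause can name SALESREPS twice under the aliases EMPS and MGRS. The Python sketch below runs both the single-table attempt and the aliased self-join on an in-memory SQLite database. The four salespeople and their employee numbers are hypothetical, and the ORDER BY is added only for a predictable output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SALESREPS (EMPL_NUM INTEGER, NAME TEXT, MANAGER INTEGER);
-- Hypothetical data: Ann has no manager (NULL); Ben and Cara report to Ann; Dev reports to Ben.
INSERT INTO SALESREPS VALUES (101, 'Ann', NULL), (102, 'Ben', 101),
                             (103, 'Cara', 101), (104, 'Dev', 102);
""")

# Single-table attempt: only matches a salesperson who is their own manager.
single_table_rows = conn.execute(
    "SELECT NAME FROM SALESREPS WHERE MANAGER = EMPL_NUM").fetchall()

# Self-join: the same table appears twice, under two different aliases.
self_join_sql = """
SELECT EMPS.NAME, MGRS.NAME
FROM SALESREPS EMPS, SALESREPS MGRS
WHERE EMPS.MANAGER = MGRS.EMPL_NUM
ORDER BY EMPS.NAME
"""
pairs = conn.execute(self_join_sql).fetchall()
print(single_table_rows)
print(pairs)
```

Ann does not appear as an employee in the self-join result, because her MANAGER value is NULL and therefore matches no EMPL_NUM.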

6 What is a table alias?

As described in the previous section, table aliases are required in queries involving self-joins. However, you can use an alias in any query. For example, if a query refers to another user's table, or if the name of a table is very long, the table name can become tedious to type as a column qualifier. Consider this query, which references the BIRTHDAYS table owned by the user named SAM:

Example:

List names, quotas, and birthdays of salespeople.

SELECT SALESREPS.NAME, QUOTA, SAM.BIRTHDAYS.BIRTH_DATE
FROM SALESREPS, SAM.BIRTHDAYS
WHERE SALESREPS.NAME = SAM.BIRTHDAYS.NAME



This becomes easier to read and type when the aliases S and B are used for the two tables:

List names, quotas, and birthdays of salespeople.

SELECT S.NAME, S.QUOTA, B.BIRTH_DATE
FROM SALESREPS S, SAM.BIRTHDAYS B
WHERE S.NAME = B.NAME

The FROM clause specifies the tag that is used to identify the table in qualified column references within the SELECT statement. If a table alias is specified, it becomes the table tag; otherwise, the table's name, exactly as it appears in the FROM clause, becomes the tag.
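To confirm that the fully qualified query and the aliased query return the same rows, both can be run from Python against SQLite. This is a sketch under assumptions: SQLite has no user-owned tables, so an attached in-memory database named SAM stands in for the user SAM, and the sample names, quotas and birth dates are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in for "the BIRTHDAYS table owned by SAM": a second database attached as SAM.
conn.execute("ATTACH DATABASE ':memory:' AS SAM")
conn.executescript("""
CREATE TABLE SALESREPS (NAME TEXT, QUOTA INTEGER);
CREATE TABLE SAM.BIRTHDAYS (NAME TEXT, BIRTH_DATE TEXT);
-- Hypothetical sample data.
INSERT INTO SALESREPS VALUES ('Ann', 300000), ('Ben', 150000);
INSERT INTO SAM.BIRTHDAYS VALUES ('Ann', '1970-03-14'), ('Ben', '1982-11-02');
""")

long_form_sql = """
SELECT SALESREPS.NAME, QUOTA, SAM.BIRTHDAYS.BIRTH_DATE
FROM SALESREPS, SAM.BIRTHDAYS
WHERE SALESREPS.NAME = SAM.BIRTHDAYS.NAME
ORDER BY SALESREPS.NAME
"""
aliased_sql = """
SELECT S.NAME, S.QUOTA, B.BIRTH_DATE
FROM SALESREPS S, SAM.BIRTHDAYS B
WHERE S.NAME = B.NAME
ORDER BY S.NAME
"""
long_form_rows = conn.execute(long_form_sql).fetchall()
aliased_rows = conn.execute(aliased_sql).fetchall()
print(aliased_rows)
print(long_form_rows == aliased_rows)
```

The aliases S and B act as the table tags in the second query, so every column reference can be qualified with a single letter.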
