
Upload: joanna-patterson

Post on 26-Dec-2015


TRANSCRIPT

Database Systems – SQL

SQL SELECT - AGGREGATION FUNCTIONS

It is far more efficient to aggregate data at the database server than to pull the raw data over the network and compute it in an application program.

The following are a few, but not all, of the aggregation functions:

AVG
MIN
MAX
SUM
COUNT

While they are traditionally used with the GROUP BY clause, they can be used without it. So first we will see how to use them on their own before introducing the GROUP BY clause.

If we want to determine the total number of items sold from a sales table, we could use the following query:

SELECT SUM(quantity) AS total_items_sold FROM sales;

Always rename your aggregation to a readable and relevant name.
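To try the query above without a database server, here is a minimal sketch using Python's built-in sqlite3 module. The rows in the sales table (and its item column) are invented for illustration; the slides do not show that table's contents.

```python
import sqlite3

# Hypothetical sample data: the slides never show the sales table itself.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, quantity INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("pen", 3), ("notebook", 5), ("stapler", 2)],
)

# The database does the arithmetic; only a single row comes back.
total_items_sold = conn.execute(
    "SELECT SUM(quantity) AS total_items_sold FROM sales"
).fetchone()[0]
print(total_items_sold)  # 10
```

Only one value crosses the connection here, however many rows the table holds.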


All aggregate functions other than COUNT(*) ignore values that are NULL. COUNT(*) tallies the record whether it contains NULL values or not.

GROUP BY

While being able to aggregate data across an entire table is useful, the true power of the aggregation functions is achieved by combining records into collections and aggregating each collection independently. This is accomplished with a GROUP BY clause, which groups records and aggregates fields using the aggregation functions.

The syntax for a GROUP BY clause is as follows:

SELECT field-list-1 FROM tablename GROUP BY field-list-2;

Each field in field-list-1 must either be listed in field-list-2 or have an aggregation function applied to it when it is returned in the result set.


For example, to select the total assets for all banks by city, you could use the following query:

SELECT city, SUM(assets) AS total_assets FROM branches GROUP BY city;

branches table

branch_name        city          assets
Park Slope         Brooklyn      3,500,000
Bay Parkway        Brooklyn      7,500,000
Cropsy Ave         Brooklyn      10,000,000
Medford            Medford       1,250,000
North East Philly  Philadelphia  1,000,000
Center City        Philadelphia  5,000,000

result set

city          total_assets
Brooklyn      21,000,000
Medford       1,250,000
Philadelphia  6,000,000

• branch_name is not included because it is neither listed in the GROUP BY clause nor aggregated.

• city is listed without an aggregation function as it is listed in the GROUP BY clause.

• assets can only be listed by using an aggregation function and is renamed.
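The grouping above can be reproduced with a small sketch, again assuming Python's sqlite3 module as a stand-in for the database server. The rows are the ones from the branches table; the ORDER BY city is an addition made only so the printed order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE branches (branch_name TEXT, city TEXT, assets INTEGER)")
conn.executemany(
    "INSERT INTO branches VALUES (?, ?, ?)",
    [
        ("Park Slope", "Brooklyn", 3_500_000),
        ("Bay Parkway", "Brooklyn", 7_500_000),
        ("Cropsy Ave", "Brooklyn", 10_000_000),
        ("Medford", "Medford", 1_250_000),
        ("North East Philly", "Philadelphia", 1_000_000),
        ("Center City", "Philadelphia", 5_000_000),
    ],
)

# city is in the GROUP BY list; assets is only reachable through SUM().
totals_by_city = conn.execute(
    "SELECT city, SUM(assets) AS total_assets "
    "FROM branches GROUP BY city ORDER BY city"
).fetchall()
for city, total_assets in totals_by_city:
    print(f"{city:<13}{total_assets:>12,}")
```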


We can also add an ordering to the result by using an ORDER BY clause after the GROUP BY clause.

For example, to select the total assets for all banks by city and order the results by total_assets, you could use the following query:

SELECT city, SUM(assets) AS total_assets FROM branches GROUP BY city ORDER BY total_assets;

Applied to the same branches table as in the previous example, the query returns:

result set

city          total_assets
Medford       1,250,000
Philadelphia  6,000,000
Brooklyn      21,000,000
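As a quick check of the ordering, here is a sketch with Python's sqlite3 module and the same branches rows. The DESC variant is not in the slides; it is shown only to illustrate that the alias can be sorted in either direction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE branches (branch_name TEXT, city TEXT, assets INTEGER)")
conn.executemany(
    "INSERT INTO branches VALUES (?, ?, ?)",
    [
        ("Park Slope", "Brooklyn", 3_500_000),
        ("Bay Parkway", "Brooklyn", 7_500_000),
        ("Cropsy Ave", "Brooklyn", 10_000_000),
        ("Medford", "Medford", 1_250_000),
        ("North East Philly", "Philadelphia", 1_000_000),
        ("Center City", "Philadelphia", 5_000_000),
    ],
)

base = "SELECT city, SUM(assets) AS total_assets FROM branches GROUP BY city "

# ORDER BY comes after GROUP BY and may refer to the aggregate's alias.
ascending = conn.execute(base + "ORDER BY total_assets").fetchall()
descending = conn.execute(base + "ORDER BY total_assets DESC").fetchall()

print([city for city, _ in ascending])   # ['Medford', 'Philadelphia', 'Brooklyn']
print([city for city, _ in descending])  # ['Brooklyn', 'Philadelphia', 'Medford']
```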


GROUP BY clauses can be added to the result of a JOIN.

Observe the query to join the websites table with the hit_count table to return the total number of hits per website.

SELECT website, SUM(hit_count) AS total_hit_count FROM (websites INNER JOIN hit_count ON websites.id_website = hit_count.id_website) GROUP BY website;

websites table (partial view)

id_website  website
1           www.zojjed.com
2           www.racewalk.com

hit_count table (partial view)

id_website  hit_count
1           1000
1           500
2           500
2           1000
1           2000

result set

website           total_hit_count
www.zojjed.com    3500
www.racewalk.com  1500
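The join-then-group query can be run against the partial tables shown above. This is a sketch assuming Python's sqlite3 module; the table is named hit_count throughout, and the ORDER BY is added only to fix the output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE websites (id_website INTEGER PRIMARY KEY, website TEXT);
    CREATE TABLE hit_count (id_website INTEGER, hit_count INTEGER);
    INSERT INTO websites VALUES (1, 'www.zojjed.com'), (2, 'www.racewalk.com');
    INSERT INTO hit_count VALUES (1, 1000), (1, 500), (2, 500), (2, 1000), (1, 2000);
""")

# The join produces one row per hit_count record; GROUP BY then
# collapses those rows to one per website.
totals = conn.execute("""
    SELECT website, SUM(hit_count) AS total_hit_count
    FROM websites INNER JOIN hit_count
         ON websites.id_website = hit_count.id_website
    GROUP BY website
    ORDER BY total_hit_count DESC
""").fetchall()
print(totals)  # [('www.zojjed.com', 3500), ('www.racewalk.com', 1500)]
```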


COUNT(expr) - Returns a count of the number of non-NULL values of expr in the rows retrieved by a SELECT statement. The result is a BIGINT value.

COUNT(*) is somewhat different in that it returns a count of the number of rows retrieved, whether or not they contain NULL values.

Therefore, the following two statements could result in a different count if there are NULL values in the website field.

SELECT COUNT(*) AS record_count FROM websites;

SELECT COUNT(website) AS record_count FROM websites;
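To see the two counts diverge, here is a small sketch with Python's sqlite3 module. The third row, with a NULL website, is a hypothetical record added for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE websites (id_website INTEGER, website TEXT)")
conn.executemany(
    "INSERT INTO websites VALUES (?, ?)",
    [
        (1, "www.zojjed.com"),
        (2, "www.racewalk.com"),
        (3, None),  # hypothetical record whose website is NULL
    ],
)

count_all = conn.execute(
    "SELECT COUNT(*) AS record_count FROM websites"
).fetchone()[0]
count_named = conn.execute(
    "SELECT COUNT(website) AS record_count FROM websites"
).fetchone()[0]

print(count_all)    # 3: every row is counted
print(count_named)  # 2: the NULL website is skipped
```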


SQL SELECT – HAVING CLAUSE

It is very useful to be able to filter results based upon the values of the aggregated fields in the result set.

A HAVING clause accomplishes this and should not be confused with the WHERE clause, which filters based upon the condition of rows in the table before they are grouped.

A HAVING clause is used similarly to the WHERE clause. The only difference in syntax is that the fields compared in the HAVING clause are the results of aggregation.

Observe adding a HAVING clause to the previous query so that it now only returns the websites with a total hit count greater than 1,500.

SELECT website, SUM(hit_count) AS total_hit_count FROM (websites INNER JOIN hit_count ON websites.id_website = hit_count.id_website) GROUP BY websites.website HAVING total_hit_count > 1500;

result set

website         total_hit_count
www.zojjed.com  3500

Notice the racewalk.com record was removed from the result set: its total of 1,500 is not greater than 1,500.


In contrast, if we applied the same logic but used a WHERE clause comparing the hit_count field, the results would be quite different.

SELECT website, SUM(hit_count) AS total_hit_count FROM (websites INNER JOIN hit_count ON websites.id_website = hit_count.id_website) WHERE hit_count > 1500 GROUP BY websites.website;

Using the same websites and hit_count tables as before:

result set

website         total_hit_count
www.zojjed.com  2000

With the WHERE clause, only records with a hit_count greater than 1,500 are passed on to the aggregation, and there is only one such record.
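The difference between the two filters is easiest to see by running them side by side. This sketch assumes Python's sqlite3 module and the partial tables shown earlier. The aliases w and hc are a convenience, and the HAVING clause repeats SUM(...) rather than the alias, since not every database server accepts an alias there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE websites (id_website INTEGER PRIMARY KEY, website TEXT);
    CREATE TABLE hit_count (id_website INTEGER, hit_count INTEGER);
    INSERT INTO websites VALUES (1, 'www.zojjed.com'), (2, 'www.racewalk.com');
    INSERT INTO hit_count VALUES (1, 1000), (1, 500), (2, 500), (2, 1000), (1, 2000);
""")

# HAVING filters the groups after the sums have been computed.
having_rows = conn.execute("""
    SELECT w.website, SUM(hc.hit_count) AS total_hit_count
    FROM websites AS w INNER JOIN hit_count AS hc
         ON w.id_website = hc.id_website
    GROUP BY w.website
    HAVING SUM(hc.hit_count) > 1500
""").fetchall()

# WHERE filters individual rows before any grouping takes place.
where_rows = conn.execute("""
    SELECT w.website, SUM(hc.hit_count) AS total_hit_count
    FROM websites AS w INNER JOIN hit_count AS hc
         ON w.id_website = hc.id_website
    WHERE hc.hit_count > 1500
    GROUP BY w.website
""").fetchall()

print(having_rows)  # [('www.zojjed.com', 3500)]
print(where_rows)   # [('www.zojjed.com', 2000)]
```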


SQL SELECT - UNION

Sometimes it is necessary to combine the results of multiple queries into a single result set. As long as the result sets have the same number of columns and the corresponding columns have the same domains, they can be combined using the UNION operator. The syntax follows:

SELECT …
UNION
SELECT …

Imagine you have two tables, non_sale_items and discount_items, that store prices. Then imagine you wish to combine them into a single result set. The following query will accomplish this:

SELECT product_name, retail_price AS price FROM non_sale_items
UNION
SELECT product_name, sales_price AS price FROM discount_items;

By default, duplicate rows are removed from the result set. UNION ALL allows duplicate rows to be included.
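The effect of UNION versus UNION ALL shows up as soon as both tables contain the same row. Here is a sketch with Python's sqlite3 module; the products and prices are invented for illustration, and the ORDER BY is added only to fix the output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE non_sale_items (product_name TEXT, retail_price INTEGER);
    CREATE TABLE discount_items (product_name TEXT, sales_price INTEGER);
    -- Hypothetical rows; ('mug', 5) deliberately appears in both tables.
    INSERT INTO non_sale_items VALUES ('lamp', 20), ('mug', 5);
    INSERT INTO discount_items VALUES ('mug', 5), ('rug', 15);
""")

query = """
    SELECT product_name, retail_price AS price FROM non_sale_items
    {operator}
    SELECT product_name, sales_price AS price FROM discount_items
    ORDER BY product_name
"""
union_rows = conn.execute(query.format(operator="UNION")).fetchall()
union_all_rows = conn.execute(query.format(operator="UNION ALL")).fetchall()

print(union_rows)      # [('lamp', 20), ('mug', 5), ('rug', 15)]
print(union_all_rows)  # [('lamp', 20), ('mug', 5), ('mug', 5), ('rug', 15)]
```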


SQL SELECT - SUBQUERIES

SQL allows you to nest one query within another.

The most common subqueries are IN and NOT IN. I will show you these queries although you already know how to obtain the same results using LEFT OUTER JOINs.

While an IN or NOT IN query is easier to read than a LEFT OUTER JOIN, it usually has poorer performance: the JOIN version relies on the indexes of a properly designed table, whereas the IN or NOT IN query relies on temporary results, which may or may not be optimized.

Observe how we can combine queries to indicate the websites that have a daily hit count of at least 1,000 on any day.

SELECT website FROM websites WHERE id_website IN (SELECT id_website FROM hit_count WHERE hit_count >= 1000);


websites table (partial view)

id_website  website
1           www.zojjed.com
2           www.racewalk.com
3           www.greattreks.com

hit_count table (partial view; records selected by the subquery are marked with *)

id_website  hit_count
1           1000  *
1           500
2           500
2           1000  *
1           2000  *

result set of subquery

id_website
1
2
1

result set

website
www.zojjed.com
www.racewalk.com

The subquery returns three records containing only two distinct website ids (1 and 2); the outer query then selects the names of the websites corresponding to those ids.


Observe how we can combine queries to indicate the websites that do not have a daily hit count of at least 1,000 on any day.

SELECT website FROM websites WHERE id_website NOT IN (SELECT id_website FROM hit_count WHERE hit_count >= 1000);

Using the same websites and hit_count tables, the subquery selects the same records as in the IN example.

result set of subquery

id_website
1
2
1

result set

website
www.greattreks.com

The subquery again returns only the ids 1 and 2; the outer query then selects the names of the websites not corresponding to those ids.
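Since the JOIN formulations were mentioned as the alternative, the sketch below runs the IN and NOT IN queries next to one possible JOIN rewrite of each. It assumes Python's sqlite3 module and the three-website partial tables. The JOIN versions are one way to write them, not necessarily the form the author has in mind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE websites (id_website INTEGER PRIMARY KEY, website TEXT);
    CREATE TABLE hit_count (id_website INTEGER, hit_count INTEGER);
    INSERT INTO websites VALUES (1, 'www.zojjed.com'), (2, 'www.racewalk.com'),
                                (3, 'www.greattreks.com');
    INSERT INTO hit_count VALUES (1, 1000), (1, 500), (2, 500), (2, 1000), (1, 2000);
""")

def names(sql):
    """Run a one-column query and return its values as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

in_rows = names("""
    SELECT website FROM websites WHERE id_website IN
        (SELECT id_website FROM hit_count WHERE hit_count >= 1000)
""")
# Inner join; DISTINCT removes the repeat caused by website 1 matching twice.
in_as_join = names("""
    SELECT DISTINCT w.website
    FROM websites AS w INNER JOIN hit_count AS hc
         ON w.id_website = hc.id_website
    WHERE hc.hit_count >= 1000
""")

not_in_rows = names("""
    SELECT website FROM websites WHERE id_website NOT IN
        (SELECT id_website FROM hit_count WHERE hit_count >= 1000)
""")
# Left outer join; websites with no qualifying record come back with NULLs.
not_in_as_join = names("""
    SELECT w.website
    FROM websites AS w LEFT OUTER JOIN hit_count AS hc
         ON w.id_website = hc.id_website AND hc.hit_count >= 1000
    WHERE hc.id_website IS NULL
""")

print(in_rows)      # ['www.racewalk.com', 'www.zojjed.com']
print(not_in_rows)  # ['www.greattreks.com']
```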


Observe how the result changes when the subquery compares with > instead of >=, so that we indicate the websites that do not have a daily hit count greater than 1,000 on any day.

SELECT website FROM websites WHERE id_website NOT IN (SELECT id_website FROM hit_count WHERE hit_count > 1000);

Using the same websites and hit_count tables, only one record of the hit_count table (id_website 1, hit_count 2000) is selected by the subquery.

result set of subquery

id_website
1

result set

website
www.racewalk.com
www.greattreks.com

The subquery returns a single record containing the id 1; the outer query then selects the names of the websites not corresponding to that id.


Are the following two queries equivalent?

SELECT website FROM websites WHERE id_website NOT IN (SELECT id_website FROM hit_count WHERE hit_count > 1000);

and

SELECT website FROM websites WHERE id_website IN (SELECT id_website FROM hit_count WHERE hit_count <= 1000);


Evaluating the second query on the same websites and hit_count tables:

SELECT website FROM websites WHERE id_website IN (SELECT id_website FROM hit_count WHERE hit_count <= 1000);

result set of subquery (distinct ids)

id_website
1
2

result set

website
www.zojjed.com
www.racewalk.com

So, clearly they are not equivalent: the NOT IN query returned www.racewalk.com and www.greattreks.com, while the IN query returns www.zojjed.com and www.racewalk.com.
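The comparison can be checked mechanically. This is a sketch assuming Python's sqlite3 module and the same partial tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE websites (id_website INTEGER PRIMARY KEY, website TEXT);
    CREATE TABLE hit_count (id_website INTEGER, hit_count INTEGER);
    INSERT INTO websites VALUES (1, 'www.zojjed.com'), (2, 'www.racewalk.com'),
                                (3, 'www.greattreks.com');
    INSERT INTO hit_count VALUES (1, 1000), (1, 500), (2, 500), (2, 1000), (1, 2000);
""")

# "No day above 1,000": excludes website 1, keeps website 3 (no records at all).
not_in_gt = {row[0] for row in conn.execute("""
    SELECT website FROM websites WHERE id_website NOT IN
        (SELECT id_website FROM hit_count WHERE hit_count > 1000)
""")}

# "Some day at or below 1,000": includes website 1, cannot include website 3.
in_le = {row[0] for row in conn.execute("""
    SELECT website FROM websites WHERE id_website IN
        (SELECT id_website FROM hit_count WHERE hit_count <= 1000)
""")}

print(not_in_gt == in_le)  # False
```

The first query asks that no record exceed 1,000, while the second only asks that some record be at or below 1,000, so they disagree on websites with mixed records and on websites with no records.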


A better use of a subquery is to have it return a single value and to use that value as a scalar in the WHERE clause of another query.

Let’s select the name(s) of the website(s) that have the greatest hit count on a single day. This requires data from both the websites table and the hit_count table, and requires a subquery to determine the largest hit count on a single day.

SELECT website FROM (websites AS w INNER JOIN hit_count AS hc ON w.id_website = hc.id_website) WHERE hc.hit_count = (SELECT MAX(hit_count) AS max_hit_count FROM hit_count);

Using the same websites and hit_count tables, the subquery returns 2000, which matches a single record of the hit_count table (id_website 1).

result set

website
www.zojjed.com
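A minimal sketch of the scalar-subquery pattern, assuming Python's sqlite3 module and the same partial tables:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE websites (id_website INTEGER PRIMARY KEY, website TEXT);
    CREATE TABLE hit_count (id_website INTEGER, hit_count INTEGER);
    INSERT INTO websites VALUES (1, 'www.zojjed.com'), (2, 'www.racewalk.com'),
                                (3, 'www.greattreks.com');
    INSERT INTO hit_count VALUES (1, 1000), (1, 500), (2, 500), (2, 1000), (1, 2000);
""")

# The inner query yields exactly one value, so it can sit on the
# right-hand side of an ordinary = comparison.
max_hit_count = conn.execute("SELECT MAX(hit_count) FROM hit_count").fetchone()[0]

busiest = [row[0] for row in conn.execute("""
    SELECT website
    FROM websites AS w INNER JOIN hit_count AS hc
         ON w.id_website = hc.id_website
    WHERE hc.hit_count = (SELECT MAX(hit_count) FROM hit_count)
""")]

print(max_hit_count)  # 2000
print(busiest)        # ['www.zojjed.com']
```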


Another use of a subquery is when you want to aggregate an aggregated field.

Suppose you wish to determine the average of the total hit counts per website. You might want to do the following:

SELECT AVG(SUM(hit_count)) FROM hit_count GROUP BY id_website;

However, you cannot place one aggregate function within another (at least on most database servers). You must use a subquery as follows. Note that many servers require the subquery in the FROM clause to be given an alias, here site_totals:

SELECT AVG(sum_hit_count) FROM (SELECT SUM(hit_count) AS sum_hit_count FROM hit_count GROUP BY id_website) AS site_totals;
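Both forms can be tried with Python's sqlite3 module, used here as one example of a server that rejects the nested aggregate. The data is the partial hit_count table from the earlier slides, and site_totals is an illustrative alias.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hit_count (id_website INTEGER, hit_count INTEGER);
    INSERT INTO hit_count VALUES (1, 1000), (1, 500), (2, 500), (2, 1000), (1, 2000);
""")

# Attempt the nested aggregate and record whether the engine accepts it.
try:
    conn.execute("SELECT AVG(SUM(hit_count)) FROM hit_count GROUP BY id_website")
    nested_allowed = True
except sqlite3.Error as exc:
    nested_allowed = False
    print("nested aggregate rejected:", exc)

# Inner query: one sum per website. Outer query: the average of those sums.
avg_of_sums = conn.execute("""
    SELECT AVG(sum_hit_count)
    FROM (SELECT SUM(hit_count) AS sum_hit_count
          FROM hit_count
          GROUP BY id_website) AS site_totals
""").fetchone()[0]
print(avg_of_sums)  # 2500.0
```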


Here’s where things get interesting. Now imagine you wish to select the ids of websites that have a total hit count greater than the average total hit count of all websites. Note that if one website has fewer days of data than another, its hit count is not scaled up proportionally before it is compared with the average.

First, you must compute the sum of each website’s hit counts.

Second, you must compute the average of those per-website sums.

Finally, you must select those websites that have a total hit count higher than the average computed in the second step.

The query follows:

SELECT id_website FROM hit_count GROUP BY id_website HAVING SUM(hit_count) > (SELECT AVG(sum_hit_count) FROM (SELECT SUM(hit_count) AS sum_hit_count FROM hit_count GROUP BY id_website) AS site_totals);


SELECT id_website FROM hit_count GROUP BY id_website HAVING SUM(hit_count) > (SELECT AVG(sum_hit_count) AS avg_sum_hit_count FROM (SELECT SUM(hit_count) AS sum_hit_count FROM hit_count GROUP BY id_website) AS site_totals);

Imagine you had the following data:

hit_count table

id_website  hit_count
1           20
1           30
2           10
3           30
3           30
3           30

Results of 1st subquery (innermost)

sum_hit_count
50
10
90

Results of 2nd subquery

avg_sum_hit_count
50

Final results

id_website
3
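The three stages can be reproduced one at a time with the sample data above. This is a sketch assuming Python's sqlite3 module; site_totals is an illustrative alias for the innermost subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hit_count (id_website INTEGER, hit_count INTEGER);
    INSERT INTO hit_count VALUES (1, 20), (1, 30), (2, 10), (3, 30), (3, 30), (3, 30);
""")

# Stage 1: one sum per website.
sums = conn.execute("""
    SELECT SUM(hit_count) AS sum_hit_count
    FROM hit_count GROUP BY id_website ORDER BY id_website
""").fetchall()

# Stage 2: the average of those sums.
average = conn.execute("""
    SELECT AVG(sum_hit_count) AS avg_sum_hit_count
    FROM (SELECT SUM(hit_count) AS sum_hit_count
          FROM hit_count GROUP BY id_website) AS site_totals
""").fetchone()[0]

# Stage 3: keep the websites whose own sum exceeds that average.
above_average = [row[0] for row in conn.execute("""
    SELECT id_website FROM hit_count
    GROUP BY id_website
    HAVING SUM(hit_count) >
        (SELECT AVG(sum_hit_count)
         FROM (SELECT SUM(hit_count) AS sum_hit_count
               FROM hit_count GROUP BY id_website) AS site_totals)
""")]

print(sums)           # [(50,), (10,), (90,)]
print(average)        # 50.0
print(above_average)  # [3]
```

Website 1 totals exactly 50, which is not greater than the average of 50, so only website 3 is returned.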


SQL SELECT – MULTI VALUE SUB SELECTS

SELECT fieldlist FROM t1 WHERE (1,2) = (SELECT column1, column2 FROM t2);

SELECT fieldlist FROM t1 WHERE ROW(1,2) = (SELECT column1, column2 FROM t2);

The conditions in both queries are TRUE if table t2 has a row where column1 = 1 and column2 = 2.

NOTE: the subquery must return a single row or you get an error.

The expressions (1,2) and ROW(1,2) are sometimes called row constructors. The two are equivalent. They are legal in other contexts as well. For example, the following two statements are equivalent:

SELECT fieldlist FROM t1 WHERE (column1,column2) = (1,1);
SELECT fieldlist FROM t1 WHERE column1 = 1 AND column2 = 1;

The normal use of row constructors is for comparisons with subqueries that return two or more columns. For example, the following query answers the request, “find all rows in table t1 that also exist in table t2”:

SELECT column1,column2,column3 FROM t1 WHERE (column1,column2,column3) IN (SELECT column1,column2,column3 FROM t2);
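Row constructors can be tried with Python's sqlite3 module, assuming the bundled SQLite is version 3.15 or later, which is where row values were introduced. SQLite accepts the parenthesized form but not the ROW keyword, so only the former is used. The contents of t1 and t2 are invented for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (column1 INTEGER, column2 INTEGER, column3 INTEGER);
    CREATE TABLE t2 (column1 INTEGER, column2 INTEGER, column3 INTEGER);
    -- Hypothetical rows; only (1, 2, 3) exists in both tables.
    INSERT INTO t1 VALUES (1, 1, 1), (1, 2, 3), (4, 5, 6);
    INSERT INTO t2 VALUES (1, 2, 3), (7, 8, 9);
""")

# "Find all rows in t1 that also exist in t2."
in_both = conn.execute("""
    SELECT column1, column2, column3 FROM t1
    WHERE (column1, column2, column3) IN
        (SELECT column1, column2, column3 FROM t2)
""").fetchall()

# A row-constructor comparison and its column-by-column equivalent.
row_form = conn.execute(
    "SELECT * FROM t1 WHERE (column1, column2) = (1, 1)"
).fetchall()
and_form = conn.execute(
    "SELECT * FROM t1 WHERE column1 = 1 AND column2 = 1"
).fetchall()

print(in_both)               # [(1, 2, 3)]
print(row_form == and_form)  # True
```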


SQL DELETE

If you wish to remove records from a table, use the DELETE query. It is very similar to a SELECT query other than the first keyword. The syntax for a single table delete is as follows:

DELETE FROM tablename WHERE predicate;

NOTE: DELETE appears to be case sensitive on the table name. In MySQL, whether table names are case sensitive generally depends on the server's file system and the lower_case_table_names setting, so write table names exactly as they were created.

If you do not specify a WHERE clause, all records are removed from the table. However, removing all the records without removing the structure is better accomplished using the TRUNCATE TABLE tablename command.

To delete the hit counts with a website id of 1 from the hit_count table, use the following query:

DELETE FROM hit_count WHERE id_website = 1;

Here are a few other examples of DELETE statements:

DELETE FROM account WHERE branch = 'Northeast Philly';
DELETE FROM account WHERE branch IN (SELECT branch FROM branches WHERE city = 'Brooklyn');
DELETE FROM account WHERE balance < (SELECT AVG(balance) FROM account);
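As a rough illustration of the last statement, the sketch below runs it against an in-memory SQLite database through Python's sqlite3 module. The account rows and the account_number column are assumptions made for the example. Some servers (MySQL in particular) may refuse a subquery on the same table that is being deleted from, so treat this as a SQLite-only demonstration.

```python
import sqlite3

def delete_below_average():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE account (account_number TEXT, branch TEXT, balance REAL);
        INSERT INTO account VALUES
            ('A-1', 'Northeast Philly', 100.0),
            ('A-2', 'Brooklyn', 200.0),
            ('A-3', 'Brooklyn', 900.0);
    """)
    # The average balance is 400, so A-1 and A-2 fall below it.
    cur = conn.execute(
        "DELETE FROM account WHERE balance < (SELECT AVG(balance) FROM account)"
    )
    deleted = cur.rowcount
    remaining = conn.execute(
        "SELECT account_number FROM account ORDER BY account_number"
    ).fetchall()
    conn.close()
    return deleted, remaining
```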

SQL DELETE

A simple form of a multi table delete can be achieved using the IN operator.

Observe how to delete all records in the hit_count table that contain the id of the website for www.racewalk.com:

DELETE FROM hit_count WHERE id_website IN (SELECT id_website FROM websites WHERE website = 'www.racewalk.com');

Observe how to delete all records in the hit_count table that contain hit counts lower than the average hit count:

DELETE FROM hit_count WHERE hit_count < (SELECT AVG(hit_count) FROM hit_count);
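The IN-subquery delete can be sketched as follows, again assuming Python's sqlite3 module and an in-memory database. The table contents and the helper name delete_hits_for are hypothetical; only the shape of the DELETE statement comes from the slide.

```python
import sqlite3

def delete_hits_for(site):
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE websites (id_website INTEGER PRIMARY KEY, website TEXT);
        CREATE TABLE hit_count (id_website INTEGER, hit_count INTEGER);
        INSERT INTO websites VALUES (1, 'www.racewalk.com'), (2, 'www.yankees.com');
        INSERT INTO hit_count VALUES (1, 10), (1, 20), (2, 30);
    """)
    # The subquery looks up the id; the outer DELETE removes the matching hits.
    cur = conn.execute(
        "DELETE FROM hit_count WHERE id_website IN "
        "(SELECT id_website FROM websites WHERE website = ?)",
        (site,),
    )
    deleted = cur.rowcount
    remaining = conn.execute("SELECT id_website, hit_count FROM hit_count").fetchall()
    conn.close()
    return deleted, remaining
```

The website name is passed as a bound parameter rather than pasted into the SQL text, which avoids quoting problems.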

MULTI TABLE DELETE QUERY

Sometimes you wish to delete records from one or more tables with a query spanning multiple tables. MySQL provides two multiple-table syntaxes for this.

For the first multiple-table syntax, only matching rows from the tables listed before the FROM clause are deleted.

For the second multiple-table syntax, only matching rows from the tables listed in the FROM clause (before the USING clause) are deleted.

The effect is that you can delete rows from many tables at the same time and have additional tables that are used only for searching:

DELETE t1, t2 FROM t1, t2, t3 WHERE t1.id=t2.id AND t2.id=t3.id;

Or:

DELETE FROM t1, t2 USING t1, t2, t3 WHERE t1.id=t2.id AND t2.id=t3.id;

These statements use all three tables when searching for rows to delete, but delete matching rows only from tables t1 and t2.

The preceding examples show inner joins that use the comma operator, but multiple-table DELETE statements can use any type of join allowed in SELECT statements, such as LEFT JOIN.

MULTI TABLE DELETE QUERY

Assume you have the following two tables:

test1 table
C1    C2    C3
R1V1  R1V2  R1V3
R2V1  R2V2  R2V3
R3V1  R3V2  R3V3

test2 table
C1    C2    C3
R1V1  R1V2  R1V3
R2V1  R2V2  R2V3
R3V1  R3V2  R3V3
R4V1  R4V2  R4V3

What do you think the tables look like after the following query is executed?

DELETE FROM test1 USING test1, test2 WHERE test1.C1=test2.C1;

MULTI TABLE DELETE QUERY

DELETE FROM test1 USING test1, test2 WHERE test1.C1=test2.C1;

Every row of test1 matches a row of test2 on C1, so test1 is left empty. test2 is used only for searching, so it is unchanged.

test1 table
C1    C2    C3
(no rows)

test2 table
C1    C2    C3
R1V1  R1V2  R1V3
R2V1  R2V2  R2V3
R3V1  R3V2  R3V3
R4V1  R4V2  R4V3

MULTI TABLE DELETE QUERY

Assume you have the following two tables:

test1 table
C1    C2    C3
R1V1  R1V2  R1V3
R2V1  R2V2  R2V3
R3V1  R3V2  R3V3

test2 table
C1    C2    C3
R1V1  R1V2  R1V3
R2V1  R2V2  R2V3
R3V1  R3V2  R3V3
R4V1  R4V2  R4V3

What do you think the tables look like after the following query is executed?

DELETE FROM test1, test2 USING test1, test2 WHERE test1.C1=test2.C1;

MULTI TABLE DELETE QUERY

DELETE FROM test1, test2 USING test1, test2 WHERE test1.C1=test2.C1;

Rows R1 through R3 match in both tables and are deleted from both. Only the unmatched R4 row of test2 remains.

test1 table
C1    C2    C3
(no rows)

test2 table
C1    C2    C3
R4V1  R4V2  R4V3
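The multi-table DELETE ... USING form is MySQL syntax, so it cannot be run as written on every database. The sketch below is one portable way to get the same end state, assuming Python's sqlite3 module: it first records the matching C1 values in a temporary table (the name matched is an invention for this example) and then deletes from each table in turn. It is an emulation, not the MySQL statement itself.

```python
import sqlite3

def delete_matching_from_both():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE test1 (C1 TEXT, C2 TEXT, C3 TEXT);
        CREATE TABLE test2 (C1 TEXT, C2 TEXT, C3 TEXT);
        INSERT INTO test1 VALUES
            ('R1V1', 'R1V2', 'R1V3'), ('R2V1', 'R2V2', 'R2V3'), ('R3V1', 'R3V2', 'R3V3');
        INSERT INTO test2 VALUES
            ('R1V1', 'R1V2', 'R1V3'), ('R2V1', 'R2V2', 'R2V3'),
            ('R3V1', 'R3V2', 'R3V3'), ('R4V1', 'R4V2', 'R4V3');

        -- Capture the matching keys before either table is changed.
        CREATE TEMP TABLE matched AS
            SELECT test1.C1 AS C1 FROM test1 JOIN test2 ON test1.C1 = test2.C1;

        DELETE FROM test1 WHERE C1 IN (SELECT C1 FROM matched);
        DELETE FROM test2 WHERE C1 IN (SELECT C1 FROM matched);
    """)
    test1 = conn.execute("SELECT * FROM test1").fetchall()
    test2 = conn.execute("SELECT * FROM test2").fetchall()
    conn.close()
    return test1, test2
```

Capturing the keys first matters: if test1 were emptied before test2 was examined, the join would no longer find any matches.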

SQL INSERT

To insert data into a table using explicitly specified values, use the following syntax:

INSERT INTO tablename (columnlist) VALUES (valuelist);

Therefore, to insert values into the websites table, use the following command:

INSERT INTO websites (website, organization, first_year, category) VALUES ('www.yankees.com', 'NY Yankees', 1990, 'Sports');

The id_website field is filled in automatically because it has an auto increment qualifier.

It is allowable to use the following format, but I do not recommend it, as the table structure may change and then your code may break:

INSERT INTO tablename VALUES (valuelist);

With this format a value must be supplied for every column, in table order, including id_website. Therefore, to insert values into the websites table, use the following command (DEFAULT asks the database to generate the id):

INSERT INTO websites VALUES (DEFAULT, 'www.yankees.com', 'NY Yankees', 1990, 'Sports');
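Here is a minimal sketch of the recommended column-list form, assuming Python's sqlite3 module. SQLite has no SERIAL type, so the example uses INTEGER PRIMARY KEY AUTOINCREMENT as a stand-in for the auto increment qualifier; the column types are likewise simplified.

```python
import sqlite3

def insert_website():
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE websites (
            id_website   INTEGER PRIMARY KEY AUTOINCREMENT,
            website      TEXT,
            organization TEXT,
            first_year   INTEGER,
            category     TEXT)
    """)
    # id_website is left out of the column list, so the database assigns it.
    cur = conn.execute(
        "INSERT INTO websites (website, organization, first_year, category) "
        "VALUES (?, ?, ?, ?)",
        ("www.yankees.com", "NY Yankees", 1990, "Sports"),
    )
    new_id = cur.lastrowid
    row = conn.execute(
        "SELECT id_website, website, first_year FROM websites"
    ).fetchone()
    conn.close()
    return new_id, row
```

Because the statement names its columns, it keeps working if a column is later added to the table with a default value.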

SQL INSERT - FROM OTHER TABLES

To insert data into a table using other tables as a source, use the following syntax:

INSERT INTO tablename (columnlist) query;

Therefore, to insert values into the websites table from a table called old_websites, use the following command:

INSERT INTO websites (website, organization, first_year, category) SELECT web_site, organ, f_year, cat FROM old_websites;

Notice that I purposely picked different field names: the values are matched to the column list by position, not by name. If the column list is omitted, the SELECT must supply a value for every column of the insertion table, in the order the columns are defined, with compatible types.
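The positional matching described above can be checked with a short sketch, assuming Python's sqlite3 module. The old_websites rows are invented for the example, and INTEGER PRIMARY KEY AUTOINCREMENT stands in for the auto-increment id.

```python
import sqlite3

def copy_old_websites():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE old_websites (web_site TEXT, organ TEXT, f_year INTEGER, cat TEXT);
        CREATE TABLE websites (
            id_website INTEGER PRIMARY KEY AUTOINCREMENT,
            website TEXT, organization TEXT, first_year INTEGER, category TEXT);
        INSERT INTO old_websites VALUES
            ('www.yankees.com', 'NY Yankees', 1990, 'Sports'),
            ('www.racewalk.com', 'Racewalk', 1995, 'Sports');

        -- Source and target column names differ; only the positions line up.
        INSERT INTO websites (website, organization, first_year, category)
            SELECT web_site, organ, f_year, cat FROM old_websites;
    """)
    rows = conn.execute(
        "SELECT website, organization, first_year FROM websites ORDER BY website"
    ).fetchall()
    conn.close()
    return rows
```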

SQL UPDATE – SINGLE TABLE

To update data within a table using explicitly specified values, use the following syntax:

UPDATE tablename SET col = val WHERE condition;

SQL UPDATE – SINGLE TABLE

Individual values can be changed by using the UPDATE command.

The syntax for updating values explicitly in a single table is as follows:

UPDATE tablename SET field = value;

If we wanted to raise the retail price of all products in the products table by 5%, we could use the following command:

UPDATE products SET retail_price = retail_price * 1.05;

In addition, if we want to limit the update to only certain rows, we could use the following syntax:

UPDATE tablename SET field = value WHERE condition;

If we wanted to raise the retail_price by 5% for all products in the products table priced below $29.99, we could use the following command:

UPDATE products SET retail_price = retail_price * 1.05 WHERE retail_price < 29.99;

SQL UPDATE – SINGLE TABLE

What if we wanted to raise all prices of products under $29.99 by 6% and all prices greater than or equal to $29.99 by 5%?

You could issue two UPDATE commands as follows:

UPDATE products SET retail_price = retail_price * 1.06 WHERE retail_price < 29.99;
UPDATE products SET retail_price = retail_price * 1.05 WHERE retail_price >= 29.99;

Will this work?

It will not. Prices just under $29.99 (from about $28.30 up) reach $29.99 or more after the 6% increase, so the second statement raises them again. We could run the two updates in the reverse order, or use a CASE expression within a single UPDATE as follows:

UPDATE products SET retail_price = CASE
    WHEN retail_price >= 29.99 THEN retail_price * 1.05
    ELSE retail_price * 1.06
END;
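To see the double update happen, the sketch below runs both approaches on the same three made-up prices, assuming Python's sqlite3 module. The $29.00 product is the interesting one: it crosses the $29.99 threshold after the first statement. Prices are rounded to cents only for display.

```python
import sqlite3

PRICES = [(1, 10.00), (2, 29.00), (3, 40.00)]  # hypothetical sample data

def _fresh_products():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id_product INTEGER PRIMARY KEY, retail_price REAL)"
    )
    conn.executemany("INSERT INTO products VALUES (?, ?)", PRICES)
    return conn

def _prices(conn):
    cur = conn.execute("SELECT retail_price FROM products ORDER BY id_product")
    return [round(price, 2) for (price,) in cur]

def two_updates():
    conn = _fresh_products()
    conn.execute(
        "UPDATE products SET retail_price = retail_price * 1.06 WHERE retail_price < 29.99"
    )
    # 29.00 is now 30.74, so this statement raises it a second time.
    conn.execute(
        "UPDATE products SET retail_price = retail_price * 1.05 WHERE retail_price >= 29.99"
    )
    return _prices(conn)

def case_update():
    conn = _fresh_products()
    # Each row is evaluated once against its original price.
    conn.execute("""
        UPDATE products SET retail_price = CASE
            WHEN retail_price >= 29.99 THEN retail_price * 1.05
            ELSE retail_price * 1.06
        END
    """)
    return _prices(conn)
```

With two statements the middle product ends up at 32.28; with the single CASE update it ends up at the intended 30.74.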

SQL UPDATE – WITH QUERY TABLE

When issuing an UPDATE command you can use a subquery in the WHERE clause just as you would in a SELECT query.

Observe how you can raise the retail_price in the products table by 5% wherever the retail price is less than the average retail price:

UPDATE products SET retail_price = retail_price * 1.05 WHERE retail_price < (SELECT AVG(retail_price) FROM products);
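A small sketch of the below-average update, assuming Python's sqlite3 module and three made-up prices whose average is 30. Note that some servers (MySQL, for one) may reject a subquery that reads the table being updated, so this demonstrates the idea on SQLite only.

```python
import sqlite3

def raise_below_average():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id_product INTEGER PRIMARY KEY, retail_price REAL)"
    )
    conn.executemany(
        "INSERT INTO products VALUES (?, ?)", [(1, 10.0), (2, 20.0), (3, 60.0)]
    )
    # Average is 30, so only the first two products are raised by 5%.
    conn.execute("""
        UPDATE products SET retail_price = retail_price * 1.05
        WHERE retail_price < (SELECT AVG(retail_price) FROM products)
    """)
    cur = conn.execute("SELECT retail_price FROM products ORDER BY id_product")
    prices = [round(price, 2) for (price,) in cur]
    conn.close()
    return prices
```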

SQL UPDATE – WITH MULTIPLE TABLES

Similar to updating a single table, you can use multiple tables and update one table from another.

UPDATE table_references SET col1 = value1 WHERE condition;

Imagine you wanted to update the products table with a list of sale prices. The sale prices could be stored in another table called discount_products, and then you could update the products table using the following:

UPDATE products, discount_products SET products.retail_price = discount_products.sale_price WHERE products.id_product = discount_products.id_product;
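The comma-separated table list in UPDATE is MySQL syntax. As a hedged alternative that should run on most databases, the sketch below gets the same result with a correlated subquery, assuming Python's sqlite3 module; the sample prices are invented.

```python
import sqlite3

def apply_sale_prices():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (id_product INTEGER PRIMARY KEY, retail_price REAL);
        CREATE TABLE discount_products (id_product INTEGER PRIMARY KEY, sale_price REAL);
        INSERT INTO products VALUES (1, 10.0), (2, 20.0);
        INSERT INTO discount_products VALUES (2, 15.0);
    """)
    # Only products that have a discount row are touched; the inner query
    # looks up the sale price for the row currently being updated.
    conn.execute("""
        UPDATE products
        SET retail_price = (SELECT sale_price FROM discount_products
                            WHERE discount_products.id_product = products.id_product)
        WHERE id_product IN (SELECT id_product FROM discount_products)
    """)
    rows = conn.execute(
        "SELECT id_product, retail_price FROM products ORDER BY id_product"
    ).fetchall()
    conn.close()
    return rows
```

Without the outer WHERE clause, products that have no discount row would have their price set to NULL.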

SQL - CREATE TABLE Command

CREATE [TEMPORARY] TABLE [IF NOT EXISTS] tbl_name (create_definition,...) [table_option ...]

There are some restrictions on the characters that may appear in identifiers:

No identifier can contain ASCII 0 (0x00) or a byte with a value of 255.

The use of identifier quote characters in identifiers is permitted, although it is best to avoid doing so if possible.

Database, table, and column names should not end with space characters.

Database names cannot contain "/", "\", ".", or characters that are not allowed in a directory name.

Table names cannot contain "/", "\", ".", or characters that are not allowed in a filename.

The length of the identifier is in bytes, not characters. If you use multi-byte characters in your identifier names, then the maximum length will depend on the byte count of all the characters used.

SQL CREATE TABLE

CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE table_name (
    [ { column_name data_type [ DEFAULT default_expr ] [ column_constraint [ ... ] ]
      | table_constraint
      | LIKE parent_table [ { INCLUDING | EXCLUDING } DEFAULTS ] }
      [, ... ]
    ]
)
[ INHERITS ( parent_table [, ... ] ) ]
[ WITH OIDS | WITHOUT OIDS ]
[ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
[ TABLESPACE tablespace ]

where column_constraint is:

[ CONSTRAINT constraint_name ]
{ NOT NULL
  | NULL
  | UNIQUE [ USING INDEX TABLESPACE tablespace ]
  | PRIMARY KEY [ USING INDEX TABLESPACE tablespace ]
  | CHECK (expression)
  | REFERENCES reftable [ ( refcolumn ) ] [ MATCH FULL | MATCH PARTIAL | MATCH SIMPLE ]
      [ ON DELETE action ] [ ON UPDATE action ] }
[ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]

and table_constraint is:

[ CONSTRAINT constraint_name ]
{ UNIQUE ( column_name [, ... ] ) [ USING INDEX TABLESPACE tablespace ]
  | PRIMARY KEY ( column_name [, ... ] ) [ USING INDEX TABLESPACE tablespace ]
  | CHECK ( expression )
  | FOREIGN KEY ( column_name [, ... ] ) REFERENCES reftable [ ( refcolumn [, ... ] ) ]
      [ MATCH FULL | MATCH PARTIAL | MATCH SIMPLE ] [ ON DELETE action ] [ ON UPDATE action ] }
[ DEFERRABLE | NOT DEFERRABLE ] [ INITIALLY DEFERRED | INITIALLY IMMEDIATE ]

SQL CREATE TABLE

Numeric Data Types

Name              Storage Size  Description                      Range
smallint          2 bytes       small-range integer              -32768 to +32767
integer           4 bytes       usual choice for integer         -2147483648 to +2147483647
bigint            8 bytes       large-range integer              -9223372036854775808 to 9223372036854775807
decimal           variable      user-specified precision, exact  no limit
numeric           variable      user-specified precision, exact  no limit
real              4 bytes       variable-precision, inexact      6 decimal digits precision
double precision  8 bytes       variable-precision, inexact      15 decimal digits precision
serial            4 bytes       autoincrementing integer         1 to 2147483647
bigserial         8 bytes       large autoincrementing integer   1 to 9223372036854775807

Character Types

Name                              Description
character varying(n), varchar(n)  variable-length with limit
character(n), char(n)             fixed-length, blank padded
text                              variable unlimited length

SQL CREATE TABLE

Time/Date Data Types

Name                                     Storage Size  Description                         Low Value         High Value       Resolution
timestamp [ (p) ] [ without time zone ]  8 bytes       both date and time                  4713 BC           5874897 AD       1 microsecond / 14 digits
timestamp [ (p) ] with time zone         8 bytes       both date and time, with time zone  4713 BC           5874897 AD       1 microsecond / 14 digits
interval [ (p) ]                         12 bytes      time intervals                      -178000000 years  178000000 years  1 microsecond / 14 digits
date                                     4 bytes       dates only                          4713 BC           5874897 AD       1 day
time [ (p) ] [ without time zone ]       8 bytes       times of day only                   00:00:00          24:00:00         1 microsecond / 14 digits
time [ (p) ] with time zone              12 bytes      times of day only, with time zone   00:00:00+1459     24:00:00-1459    1 microsecond / 14 digits

SQL CREATE TABLE

Simple table declaration example:

CREATE TABLE websites
(website CHAR (50),
 organization CHAR (30),
 first_year SMALLINT,
 category CHAR (20));

SQL CREATE TABLE - PRIMARY KEYS

A primary key is a unique identifier for each row of the table. When you declare a primary key, the database will not allow null values in the field, nor will it allow duplicates to be inserted.

To add a primary key to a table, just add the key words PRIMARY KEY and then a list of the fields you want as the primary key. This can be a single field or a series of fields (known as a compound primary key).

The following is the declaration of the websites table with the website field defined as a primary key:

CREATE TABLE websites
(website CHAR (50),
 organization CHAR (30),
 first_year SMALLINT,
 category CHAR (20),
 PRIMARY KEY (website));
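The duplicate check can be observed with a brief sketch, assuming Python's sqlite3 module, which accepts the declaration above unchanged. Only the duplicate case is shown, because SQLite is looser than most servers about NULLs in non-integer primary keys.

```python
import sqlite3

def insert_duplicate_key():
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE websites
        (website CHAR (50),
         organization CHAR (30),
         first_year SMALLINT,
         category CHAR (20),
         PRIMARY KEY (website))
    """)
    row = ("www.yankees.com", "NY Yankees", 1990, "Sports")
    conn.execute("INSERT INTO websites VALUES (?, ?, ?, ?)", row)
    try:
        # Same website value again: the primary key should refuse it.
        conn.execute("INSERT INTO websites VALUES (?, ?, ?, ?)", row)
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    conn.close()
    return rejected
```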

SQL CREATE TABLE - PRIMARY KEYS

The following is the declaration of a depositor table with the customer_name and account_number fields defined as a compound primary key:

CREATE TABLE depositor
(customer_name CHAR (20),
 account_number CHAR (10),
 PRIMARY KEY (customer_name, account_number));

SQL CREATE TABLE - PRIMARY KEYS

While it is OK to make the primary key a field like website, it is better to use a unique number to represent the record so that it can be referenced from other tables. The best way to do this is to create an ID field that contains a non-null integer that automatically increments, so that you are ensured a unique number for each record inserted into the table.

Therefore, observe the new websites table definition:

CREATE TABLE websites
(id_website SERIAL,
 website CHAR(50),
 organization CHAR(30),
 first_year SMALLINT,
 category CHAR (20),
 PRIMARY KEY (id_website));

Note that many texts use ID fields named just ID. I find this confusing, as you can't identify what the ID stands for from the name. Always use a name of the form id_identifier (for example, id_website) as the field name.

SQL CREATE TABLE - INDEXES

One complication of using a unique ID as the primary key is that you have extra work to ensure that fields like website, which should not be repeated, do not have duplicates (other than NULL, if NULL is allowed).

To combat this, we must create a unique index on the website field.

The syntax for creating a unique index is:

CREATE UNIQUE INDEX index_name ON table_name (field_list);

Therefore, to create a unique index on the websites table on the website field:

CREATE UNIQUE INDEX website_index ON websites (website);
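A quick way to see the unique index at work is the sketch below, assuming Python's sqlite3 module and a trimmed-down websites table. It also shows the NULL exception mentioned above, which holds in SQLite; other databases may treat repeated NULLs differently.

```python
import sqlite3

def insert_duplicate_website():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE websites (
            id_website INTEGER PRIMARY KEY AUTOINCREMENT,
            website TEXT,
            organization TEXT);
        CREATE UNIQUE INDEX website_index ON websites (website);
        INSERT INTO websites (website, organization)
            VALUES ('www.yankees.com', 'NY Yankees');
    """)
    try:
        # A new id is generated, but the website value repeats.
        conn.execute(
            "INSERT INTO websites (website, organization) "
            "VALUES ('www.yankees.com', 'Duplicate')"
        )
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    # Two NULL websites are both accepted by the unique index.
    conn.execute("INSERT INTO websites (website) VALUES (NULL)")
    conn.execute("INSERT INTO websites (website) VALUES (NULL)")
    count = conn.execute("SELECT COUNT(*) FROM websites").fetchone()[0]
    conn.close()
    return rejected, count
```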

SQL CREATE TABLE – INDEXES

Indexes are not only created to prevent duplicates, but more commonly are created to speed searches.

If you wish to create an index that is not unique, use the following syntax:

CREATE INDEX index_name ON table_name (field_list);

If we wished to add an index to the category field of the websites table, we would define the index as follows:

CREATE INDEX category_index ON websites (category);
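One way to check that a search can use the new index is to ask the database for its query plan. The sketch below assumes Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement; the exact wording of the plan text varies between SQLite versions, so the code only returns it for inspection.

```python
import sqlite3

def plan_for_category_search():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE websites (
            id_website INTEGER PRIMARY KEY,
            website TEXT,
            category TEXT);
        CREATE INDEX category_index ON websites (category);
    """)
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT website FROM websites WHERE category = 'Sports'"
    ).fetchall()
    conn.close()
    # The human-readable description is the last column of each plan row.
    return " ".join(row[-1] for row in plan)
```

If the plan text names category_index, the WHERE clause is being answered through the index rather than by scanning the whole table.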

Database Systems – SQLDatabase Systems – SQL

SQL CREATE TABLE – INDEXESSQL CREATE TABLE – INDEXES

The decision of when to create an index requires some knowledge of how the data in The decision of when to create an index requires some knowledge of how the data in the table is going to be accessed.the table is going to be accessed.

As a general rule of thumb, create each table with a primary key. If a row in that table is As a general rule of thumb, create each table with a primary key. If a row in that table is referenced from another table, create a unique ID field to represent the row and assign referenced from another table, create a unique ID field to represent the row and assign that as the primary key.that as the primary key.

If a field in a table is commonly searched on, create an index on the field. If a field in a table is commonly searched on, create an index on the field.

If a field in a table should not contain duplicates, create an index on the field.If a field in a table should not contain duplicates, create an index on the field.

So why not create indexes on every field?So why not create indexes on every field?

Page 47: Database Systems – SQL SQL SELECT - AGGREGATION FUNCTIONS It is far more efficient to aggregate data at the database server than to pull the raw data over

Database Systems – SQLDatabase Systems – SQL

SQL CREATE TABLE – INDEXES SQL CREATE TABLE – INDEXES

So why not create indexes on every field?So why not create indexes on every field?

Indexing a field speeds up searches when the indexed field is included in the WHERE clause. However, every indexed field slows the insertion of a row, because each index must be updated as well. This is why many websites/applications do not show inserted values immediately.

The same issue exists for deletes and updates.

An index, like a primary key, may be created on a single field or multiple fields.
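The difference between a single-field and a multiple-field index can be sketched as follows. The page_views table and the index names are hypothetical and exist only for this example.

```sql
-- Hypothetical table used only for this illustration.
CREATE TABLE page_views
  (id_view   SERIAL,
   page_name CHAR(40),
   view_date DATE,
   PRIMARY KEY (id_view));

-- Single-field index.
CREATE INDEX page_views_date_index ON page_views (view_date);

-- Multiple-field index: most helpful when the WHERE clause
-- constrains page_name, the leading field.
CREATE INDEX page_views_name_date_index ON page_views (page_name, view_date);

-- Every INSERT now has to update the primary key index
-- plus both indexes created above.
INSERT INTO page_views (page_name, view_date) VALUES ('home', '2015-01-15');

-- An index that is not earning its keep can be removed.
DROP INDEX page_views_date_index;
```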

SQL CREATE TABLE – FOREIGN KEYS

Whenever a field is defined in one table and referenced in another table, a good database developer creates a foreign key relationship from the child table to the parent table.

This forces the parent value to exist before it may be entered into the child table.

The syntax for the Foreign Key constraint is as follows:

FOREIGN KEY (child table field) REFERENCES parent table(field list)


Let’s create the hitcounts table that references the id_website field created in the websites table.

CREATE TABLE hitcounts
  (id_website INTEGER,
   hit_date DATE,
   hit_count INTEGER,
   PRIMARY KEY (id_website, hit_date),
   FOREIGN KEY (id_website) REFERENCES websites(id_website));
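To see the constraint at work, here is a small sketch using hypothetical demo_sites and demo_hits tables. They mirror the websites/hitcounts pair but are named differently so they do not clash with the course tables, and the row values are made up.

```sql
-- Hypothetical parent and child tables for this illustration only.
CREATE TABLE demo_sites
  (id_site SERIAL,
   site    CHAR(50),
   PRIMARY KEY (id_site));

CREATE TABLE demo_hits
  (id_site   INTEGER,
   hit_date  DATE,
   hit_count INTEGER,
   PRIMARY KEY (id_site, hit_date),
   FOREIGN KEY (id_site) REFERENCES demo_sites(id_site));

-- Create a parent row with a known id.
INSERT INTO demo_sites (id_site, site) VALUES (1, 'example.org');

-- Accepted: parent row 1 exists.
INSERT INTO demo_hits (id_site, hit_date, hit_count) VALUES (1, '2015-01-15', 42);

-- Rejected with a foreign key violation: there is no parent row 999.
INSERT INTO demo_hits (id_site, hit_date, hit_count) VALUES (999, '2015-01-15', 7);
```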

NOTE: the table being referenced must already be created in order for the foreign key to be allowed.

Let’s also create the customer table:

CREATE TABLE Customer
  (id_customer SERIAL,
   first_name CHAR(20),
   last_name CHAR(20),
   PRIMARY KEY (id_customer));

CREATE UNIQUE INDEX name_index ON Customer (last_name, first_name);


Now let’s define a products table. Although there is nothing particularly new about it, we will need it for another table we wish to create. We’ll keep the products table simple and only record the item name, price, and an identifier for it.

CREATE TABLE products
  (id_product SERIAL,
   product_name CHAR(40),
   retail_price DECIMAL,
   PRIMARY KEY (id_product));

CREATE UNIQUE INDEX product_name_index ON products (product_name);

Next, we will create a sales table that records the website the purchase was made from, the customer, the product, the quantity bought, and the date.

What should the primary key be?

What should the foreign keys be?

What, if any, indexes should you create?


The sales table should use the IDs of the website, customer, and product so we do not repeat information.

The primary key of the table should be a unique identifier for sales, because you can sell the same product, from the same website, to the same customer more than once per day. The alternative is to store the date and time of the sale and create a compound primary key from id_website, id_customer, and the date/time of the sale. I do not like the latter solution however.

CREATE TABLE sales
  (id_sale SERIAL,
   id_website SMALLINT,
   id_customer SMALLINT,
   id_product SMALLINT,
   sales_date DATE,
   quantity SMALLINT,
   PRIMARY KEY (id_sale),
   FOREIGN KEY (id_website) REFERENCES websites(id_website),
   FOREIGN KEY (id_customer) REFERENCES Customer(id_customer),
   FOREIGN KEY (id_product) REFERENCES products(id_product));
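As for the question of which indexes to create, one possible answer is sketched below. It assumes the sales table defined above, and the index names are illustrative. PostgreSQL builds an index for the primary key automatically but not for foreign key fields, so the fields you expect to join or search on are candidates. Which ones are worth their insert cost depends on the queries you actually run.

```sql
-- Assumes the sales table defined above. Index names are illustrative.
-- Candidate indexes on the foreign key fields, useful for joins and lookups.
CREATE INDEX sales_website_index  ON sales (id_website);
CREATE INDEX sales_customer_index ON sales (id_customer);
CREATE INDEX sales_product_index  ON sales (id_product);

-- Helps reports that filter or group by date.
CREATE INDEX sales_date_index ON sales (sales_date);
```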


Notice the sales table refers to three different tables with foreign keys. There is no problem with this. However, the more foreign keys you define, the slower inserts, deletes, and updates to the fields listed in foreign keys will be.

Also, depending upon how you set up the database, a record in a parent table may or may not be deletable while child tables hold rows whose foreign keys refer to that record.
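The delete behaviour is chosen per foreign key with an ON DELETE clause. The sketch below contrasts the default with ON DELETE CASCADE in PostgreSQL. The demo_authors, demo_books, and demo_reviews tables and their rows are hypothetical.

```sql
-- Hypothetical tables used only for this illustration.
CREATE TABLE demo_authors
  (id_author SERIAL,
   name      CHAR(40),
   PRIMARY KEY (id_author));

-- Default behaviour (NO ACTION): an author with books cannot be deleted.
CREATE TABLE demo_books
  (id_book   SERIAL,
   id_author INTEGER,
   PRIMARY KEY (id_book),
   FOREIGN KEY (id_author) REFERENCES demo_authors(id_author));

-- ON DELETE CASCADE: deleting an author also deletes that author's reviews.
CREATE TABLE demo_reviews
  (id_review SERIAL,
   id_author INTEGER,
   PRIMARY KEY (id_review),
   FOREIGN KEY (id_author) REFERENCES demo_authors(id_author) ON DELETE CASCADE);

INSERT INTO demo_authors (id_author, name) VALUES (1, 'A'), (2, 'B');
INSERT INTO demo_books (id_author) VALUES (1);
INSERT INTO demo_reviews (id_author) VALUES (2);

-- Succeeds; the review row for author 2 is removed too.
DELETE FROM demo_authors WHERE id_author = 2;

-- Rejected; demo_books still refers to author 1.
DELETE FROM demo_authors WHERE id_author = 1;
```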

SQL CREATE TABLE – CHECK CONSTRAINT

It is a good practice, although it will slow inserts and updates, to add a CHECK constraint to ensure certain conditions are not violated.

The CHECK syntax is as follows:

CHECK (expression)

For instance, if we wished to add a constraint to ensure that the first year of a website was not before the year the World Wide Web started, we could use the following table definition:

CREATE TABLE websites
  (website CHAR(50),
   organization CHAR(30),
   first_year SMALLINT,
   category CHAR(20),
   PRIMARY KEY (website),
   CHECK (first_year > 1990));
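The effect of the constraint can be sketched with two inserts. This assumes the websites table exactly as defined above. The row values are made up for illustration.

```sql
-- Assumes the websites table defined above; the row values are made up.
-- Accepted: 1995 satisfies first_year > 1990.
INSERT INTO websites (website, organization, first_year, category)
VALUES ('www.example.org', 'Example Org', 1995, 'reference');

-- Rejected with a check constraint violation: 1985 fails first_year > 1990.
INSERT INTO websites (website, organization, first_year, category)
VALUES ('www.example.net', 'Example Net', 1985, 'reference');
```

Note that a row with a NULL first_year would be accepted, because a CHECK only rejects rows for which the expression evaluates to false. Add NOT NULL to the field if that is not what you want.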

SQL CREATE TABLE

You can create one table from another by adding a SELECT statement at the end of the CREATE TABLE statement:

CREATE TABLE new_tbl SELECT * FROM orig_tbl;

When creating a table with CREATE ... SELECT, make sure to alias any function calls or expressions in the query. If you do not, the CREATE statement might fail or result in undesirable column names.

CREATE TABLE artists_and_works
  SELECT artist.name, COUNT(work.artist_id) AS number_of_works
  FROM artist LEFT JOIN work ON artist.id = work.artist_id
  GROUP BY artist.id;

Use LIKE to create an empty table based on the definition of another table, including any column attributes and indexes defined in the original table:

CREATE TABLE new_tbl LIKE orig_tbl;
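The CREATE TABLE ... SELECT and CREATE TABLE ... LIKE forms above follow MySQL's syntax. On a Postgres server the same two operations are spelled slightly differently. The sketch below uses a hypothetical orig_tbl, and the new table names are placeholders.

```sql
-- Hypothetical source table used only for this illustration.
CREATE TABLE orig_tbl
  (id    SERIAL,
   label CHAR(20),
   PRIMARY KEY (id));

-- Copy structure and data. PostgreSQL requires the AS keyword.
CREATE TABLE new_tbl AS SELECT * FROM orig_tbl;

-- Copy only the definition. INCLUDING ALL also copies defaults,
-- constraints, and indexes; without it only the column names,
-- types, and NOT NULL settings are copied.
CREATE TABLE empty_tbl (LIKE orig_tbl INCLUDING ALL);
```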

SQL DROP TABLE

To remove a table from the database, use the DROP TABLE command.

To remove the websites table, issue the following command:

DROP TABLE websites;

DETERMINE WHAT TABLES ARE IN YOUR DATABASE

To get a list of the tables you created, issue the following SQL command:

SELECT table_name FROM information_schema.tables WHERE table_schema = 'public';

SQL MULTIPLE DATABASES (Note we do not have permission for this)

Often you wish to separate a collection of tables. You do this by creating and using individual databases. I believe as students you are restricted to one database.

The syntax of the create database command is as follows:

CREATE DATABASE dbname;

If you wanted to create a database called “CS461” use the following statement:

CREATE DATABASE CS461;

To switch between databases use the USE command, which has the following syntax:

USE dbname;

Therefore to switch to using the “CS461” database use the following command:

USE CS461;
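USE is MySQL syntax. In Postgres a session is tied to a single database, and the psql client switches by reconnecting with its \c (or \connect) meta-command. A sketch, assuming you have permission to create databases:

```sql
-- Unquoted names are folded to lowercase, so this creates "cs461".
CREATE DATABASE CS461;

-- In psql, reconnect to the new database with the \c meta-command
-- (a psql client command, not SQL, so it takes no semicolon):
--   \c cs461

-- Confirm which database the current session is using.
SELECT current_database();
```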

LOGGING INTO POSTGRES

To log in and use Postgres, log into tux and use the following command:

psql -h wander -p 5432 -U EnterMyUserName

password: EnterMyPassword