introduction to sql structured query language martin egerhill

15
d University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27 Introduction to SQL Structured Query Language Martin Egerhill

Upload: randell-parker

Post on 25-Dec-2015

239 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Introduction to SQL Structured Query Language Martin Egerhill

Introduction to SQLStructured Query Language

Martin Egerhill

Page 2: Introduction to SQL Structured Query Language Martin Egerhill

Lund University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27

Disposition

• Databases and their components• What is SQL?• The basics• JOIN-predicate logic• Advanced functionality• Subqueries• Regular expressions• Inserting and updating data• GCP – Good Coding Practice

Page 3: Introduction to SQL Structured Query Language Martin Egerhill

Lund University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27

Databases and their components

• Databases are containers for data, relational databases have connections between data points.

• Database Management Systems (DBMS) handle interactions with the database.

• Tables are structured arrangements of specific data

• Schemas define the structures and parameters of the tables.

• Columns are single fields in a table containing specific information (columns have defined data types).

• Data types define the content format.

• Rows represent the records in the table and contain the information for the record in the respective column.

• Keys define the relational logic.

Page 4: Introduction to SQL Structured Query Language Martin Egerhill

Lund University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27

What is SQL?

Database communication language with very few commands.

Pros:

• Intuitively constructed syntax (case- and break-insensitive).

• Widely supported.

• Supports both data retrieval and manipulation

• Fast

Cons:

• Several dialects

• Lack of advanced data manipulation

• Can be slow if not optimized

Page 5: Introduction to SQL Structured Query Language Martin Egerhill

Lund University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27

The basics• SELECT-statement

– Defines your output data. Rename columns with “AS”.

• FROM-statement

– Defines your data sources and relations.

• WHERE-clause

– Defines dependencies and restrictions on selection.

• ORDER BY-statement

– Defines the sort order (columns delimited by “,”). Can be either ascending (asc) or descendng (desc). Different columns can have different directions!

• GROUP BY-statement

– Used to group calculated columns, for example counts, by data level. Can have multiple levels (delimited by “,”).

• HAVING-clause

– Same functionality as the WHERE-clause but for groups.

Page 6: Introduction to SQL Structured Query Language Martin Egerhill

Lund University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27

JOIN-predicate logic

There are two main types of joins; inner and outer.

There are two ways of writing inner joins:

Explicit: Full specification of predicate using JOIN and ON keywords.

Implicit: Simply states columns to be joined in the WHERE-clause.

Page 7: Introduction to SQL Structured Query Language Martin Egerhill

Lund University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27

JOIN-predicate logic (continued)

Different types of inner joins are:

Equi-join: Uses equality between columns as comparator. For columns with the same name the shorthand USING (column name) can be used (not supported by SQL server and Sybase).

Natural join: Not recommended as it implicitly joins columns with the same name from both tables.

Cross join: Returns the Cartesian product of the two tables joined (rows which combine each row from the first table with each row from the second table).

Page 8: Introduction to SQL Structured Query Language Martin Egerhill

Lund University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27

JOIN-predicate logic (continued)

Different types of outer joins are:

Left (outer) join: Preserves all the records in the “left” table while finding matches in the “right” table.

Right (outer) join: Preserves all the records in the “right” table while finding matches in the “left” table.

Full (outer) join: Retrieves all records regardless of match (combines the effects of both the left and right outer joins).

Note: No implicit join-notation for outer joins exists in standard SQL.

FROM MASTER ALEFT JOIN LM_SEQ_LENGTH B ON A.SEQ_LENGTH_ID = B.SEQ_LENGTH_ID

Page 9: Introduction to SQL Structured Query Language Martin Egerhill

Lund University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27

Advanced functionalityCommonly used functions in the SELECT-statement:

• COUNT(column): counts the number of rows (not NULL values). Should be combined with GROUP BY for stratification.

• DISTINCT: Selects only unique values. Used on the first selected column if not nested; COUNT(DISTINCT column).

• SUM(column): Returns the total of values in a column.

• MIN(column), MAX(column): returns the min/max value.

• CASE: Converts the returned data using logic:

• NVL: Converts NULL values, other values are left untouched:

• DECODE: Specifies conversion of (multiple) string(s):

CASE WHEN x=w1 THEN r1 WHEN x=w2 THEN r2 ELSE r3 END

NVL(column, replacement_value)

DECODE(column [, string , match], default)

Page 10: Introduction to SQL Structured Query Language Martin Egerhill

Lund University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27

Advanced functionality (continued)

Commonly used functions in the WHERE-clause:

• LIKE: Matches string in desired column. The % is used as wildcard:

• EXISTS: Boolean operator used together with subqueries. Is negated with NOT:

• IN: Used to specify several equality conditions (limited to 1000):

• AND, OR: Standard logical operators to combine conditions.

• BETWEEN: Selects a range of data between two values. This argument can behave differently depending on database!

WHERE country_name LIKE ‘%JAPAN%’

WHERE (NOT) EXISTS (SELECT * FROM master)

WHERE country_id IN (63856, 37465, 98364)

WHERE country_id BETWEEN 63856 AND 98364

Page 11: Introduction to SQL Structured Query Language Martin Egerhill

Lund University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27

SubqueriesSubqueries are used to compile a subset of data instead of actually creating a new table. The “virtual” subset-table can then be referenced. This can be especially useful when many operands and longer conditions are needed or when calculated conditions are applied. It is not recommended to use more than 2 nested layers.

SELECTCASE

WHEN A.LINEAGE_NAME IN (SELECT LINEAGE_NAME FROM

(SELECT DISTINCT LINEAGE_NAME FROM HOSTS_AND_SITES WHERE COUNTRY_REGION_ID = 133))

THEN 1 END AS HAWAIFROMmaster A...

Page 12: Introduction to SQL Structured Query Language Martin Egerhill

Lund University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27

Regular expressions

• Can be used with SQLite3• Requires third party installation• Requires session initialization• Syntax is slightly different than Perl

Page 13: Introduction to SQL Structured Query Language Martin Egerhill

Lund University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27

Inserting and updating data

Commonly used functions in the WHERE-clause:

• INSERT: Inserts specified data into specified table and columns:

• UPDATE: Will update existing values with possible condition:

• DELETE: Used to remove specified entries from specified table:

There are other modifiers like CREATE, ALTER and DROP but that will not be covered in this presentation.

INSERT INTO enterprise (col1, col2, col3, col4) VALUES (‘James T.’, ‘Kirk’, ‘Captain’, 1)

UPDATE enterprise SET col1 = ‘Jean-Luc’, col2 = ‘Picard’ WHERE col4 = 1

DELETE FROM enterprise WHERE col3 = ‘Redshirt’

Page 14: Introduction to SQL Structured Query Language Martin Egerhill

Lund University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27

GCP – Good Coding Practice

• Use descriptive table and column names.• Use descriptive table aliases.• Avoid arbitrary column names or identical column names.• Comment your code if necessary.• Don’t mix joins if not needed (use explicit outer joins).• Keep table relations in the FROM-clause.• Use Primary Keys, Secondary Keys and Foreign Keys.• Use list maintenance tables.• Do not overuse subqueries.• Split the code up if too complicated.• Always try to avoid writing to the database when retrieving data.

Page 15: Introduction to SQL Structured Query Language Martin Egerhill

Lund University / Faculty of Science / Department of Biology / MEMEG / SQL / 2012-11-27

For further reading visit http://www.w3schools.com/sql/