Introduction to SQL, Session 1: Retrieving Data From a Single Table

Upload: daniel-lewis

Post on 22-Dec-2015


Page 1: Introduction to SQL Session 1 Retrieving Data From a Single Table

Introduction to SQL

Session 1
Retrieving Data From a Single Table

Page 2

What is SQL?

Definition of SQL:

• The original Structured Query Language was designed at an IBM research center in 1974-75 and introduced commercially by Oracle in 1979.
• There are different dialects of SQL, but it remains as close to a standard query language as you will get.
• Some standard SQL commands are as follows:

o SELECT    DELETE
o INSERT    CREATE
o UPDATE    DROP

Page 3

What is SQL?

Uses:

PROC SQL is used for the following tasks:
• Generate reports
• Generate summary statistics
• Retrieve data from tables or views
• Combine data from tables or views
• Create tables, views, and indexes
• Update the data values in PROC SQL tables
• Update and retrieve data from database management system (DBMS) tables

Page 4

What is SQL?

Definition of SAS SQL:

• Structured Query Language (SQL) is a language that is used to retrieve and update data. SQL uses relational tables and databases.
• The SQL procedure is simply the implementation of SQL as a SAS procedure, PROC SQL.
• PROC SQL can replace many SAS procedures or DATA steps.
• There are many options, functions, informats, and formats that can be used within PROC SQL.
• This course focuses on using SQL inside of SAS.

Page 5

What is a Table?

As review, a SAS data file is a type of SAS data set that contains both the data values and the descriptor information.

SQL works from tables.

Definition:

A PROC SQL table is the same as a SAS data file. It can be thought of as having two dimensions:
• Rows - Observations
• Columns - Variables

Page 6

TABLE OF CONTENTS

Retrieving Data From a Single Table
  Clauses
    SELECT and FROM
    WHERE
    ORDER BY
    GROUP BY
    HAVING

Useful Tools
  Eliminating Duplicate Rows
  Determining Structure
  Adding Text To Output
  Calculating Values
  Specific Clauses

Page 7

Retrieving Data From Single Table:

Clauses
1. SELECT and FROM

The SELECT statement is the primary tool of PROC SQL. A PROC SQL query must contain a SELECT clause and a FROM clause. The following is sufficient for a workable procedure:

    proc sql;
    select Name
        from sql.FileName;
    quit;

where Name is the desired variable identifier.

Notice: There is no semicolon after the SELECT clause. The statement, including its FROM clause, is read in its entirety, so the semicolon goes only at the very end of the statement.
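As a minimal sketch of the SELECT and FROM pattern, the following can be run without the course library. The table work.demo and its two rows are hypothetical stand-ins, not the course data.

```sas
/* Hypothetical sample data, invented for illustration only. */
data work.demo;
    input Name $ Plant $;
    datalines;
Ann Dallas
Bob Austin
;
run;

proc sql;
    /* One statement: a SELECT clause plus a FROM clause,
       ended by a single semicolon after the FROM clause. */
    select Name
        from work.demo;
quit;
```

The query prints the Name column for both rows. Placing a semicolon directly after "select Name" would end the statement early and produce a syntax error.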

Page 8

Retrieving Data From Single Table:

Clauses
SELECT

• SELECT names the columns (variables) you want to retrieve.

FROM

• FROM names the table (data set) that the columns come from.

*Let's use the table (data set), Widgeone, for this course.

Page 9

Retrieving Data From Single Table:

Clauses
2. WHERE

The WHERE clause is a restriction tool used to limit the amount of retrieved data. Using WHERE will produce a table with only those rows that satisfy the condition of the clause:

    proc sql;
    title 'Widgeone Table';
    select Plant
        from widget.widgeone
        where position = "HRLY" and jobgrade = 5;
    quit;

Note: Both conditions must be satisfied for a row to be printed in the report. The conditions are joined by the keyword AND, which shows that more than one condition limits the data. Also, widget is the library (libref) created for the tables; once it has been assigned, it does not need to be assigned again.
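A small self-contained sketch of a two-condition WHERE clause follows. The table work.demo, its four rows, and the position code SLRY are hypothetical; only the column names come from the slides.

```sas
/* Hypothetical sample data; SLRY is an invented position code. */
data work.demo;
    input Plant $ position $ jobgrade;
    datalines;
Dallas HRLY 5
Dallas SLRY 5
Austin HRLY 6
Austin HRLY 5
;
run;

proc sql;
    title 'Hourly Employees in Job Grade 5';
    /* A row is kept only if BOTH conditions are true. */
    select Plant, position, jobgrade
        from work.demo
        where position = "HRLY" and jobgrade = 5;
quit;
title;  /* clear the title so it does not carry over to later output */
```

With this data, only the first and last rows satisfy both conditions. Replacing "and" with "or" would keep all four rows, since each row meets at least one of the two conditions.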

Page 10

Retrieving Data From Single Table:

Clauses
3. ORDER BY

The ORDER BY clause sorts the rows of the result by one or more columns, in ascending or descending order (alphabetical order for character values, numerical order for numeric values).

    select Plant, jobgrade, gender, Pre_Training_Productivity
        from widget.widgeone
        where jobgrade in (5,6,7)
        order by plant;

Note: If ordering by multiple columns, separate the variables with commas (order by jobgrade, Pre_Training_Productivity desc;)

4. GROUP BY

This clause enables the user to break table results into subsets of rows:

    group by Plant;
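The multi-column ORDER BY from the note can be tried on a tiny table. This is a sketch with hypothetical rows in work.demo, not the course data.

```sas
/* Hypothetical sample data, invented for illustration only. */
data work.demo;
    input Plant $ jobgrade Pre_Training_Productivity;
    datalines;
Dallas 5 40
Austin 6 55
Dallas 6 62
Austin 5 48
;
run;

proc sql;
    /* Sort by jobgrade ascending (the default), then by
       productivity descending within each job grade. */
    select Plant, jobgrade, Pre_Training_Productivity
        from work.demo
        order by jobgrade, Pre_Training_Productivity desc;
quit;
```

The rows come back as (Austin, 5, 48), (Dallas, 5, 40), (Dallas, 6, 62), (Austin, 6, 55). The keyword desc applies only to the column it follows, so jobgrade is still sorted in ascending order.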

Page 11

Retrieving Data From Single Table:

Clauses
5. HAVING

The HAVING clause is another way to restrict the query results, and it works together with the GROUP BY clause. GROUP BY forms the groups, and HAVING keeps only the groups that satisfy its condition. For example,

        from widget.widgeone
        group by jobgrade
        having jobgrade gt 5;
    quit;

Page 12

Retrieving Data From Single Table:

SAS Code

    proc sql;
    title 'Widgeone Table';
    create table sql.table as
    select Plant,
           SUM(Post_Training_Productivity) format=comma14. as TotalProductivity,
           count(*) as Count
        from widget.widgeone
        group by plant
        having count(*) gt 17;
    quit;

By inserting the CREATE TABLE clause, the results go to the library sql under the name Table. This is a good way to put the results in a more permanent place. Also, count(*) is an aggregate function that counts the rows in each group, so the HAVING clause returns only the plants with more than 17 rows. In this case, it will return Dallas only. Notice that several variables can be listed in the SELECT statement, including formatted functions.

Page 13

USEFUL TOOLS

• Eliminating duplicate rows from the table
To get query results with only unique values, add the word "distinct" to the SELECT clause:

    proc sql;
    select distinct Plant,
           Avg(Post_Training_Productivity) as Average format=8.2
        from Widget.WidgeOne;
    quit;

Note: Without a GROUP BY clause, Avg() is computed over the whole table, so every plant is listed with the same overall average.

• Determining the structure of a table
The DESCRIBE TABLE statement allows the user to see information about the structure of the table (its columns and their attributes). The description is written to the SAS log.

    proc sql;
    describe table Widget.WidgeOne;
    quit;
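The two tools above can be sketched together on a small table. The table work.demo and its rows are hypothetical.

```sas
/* Hypothetical sample data, invented for illustration only. */
data work.demo;
    input Plant $ Gender $;
    datalines;
Dallas Female
Dallas Male
Dallas Female
Austin Male
;
run;

proc sql;
    /* The four input rows collapse to three distinct
       Plant/Gender combinations. */
    select distinct Plant, Gender
        from work.demo;

    /* Writes a CREATE TABLE statement describing work.demo
       to the SAS log, not to the output window. */
    describe table work.demo;
quit;
```

DISTINCT applies to the whole row of selected columns, not just the first one, which is why both Dallas rows with different Gender values are kept.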

Page 14

USEFUL TOOLS

• Adding text to output
o By putting string text within single quotation marks, you can add text to your tables within the rows:

    select 'Factory', Plant, 'in Production'
        from Widget.WidgeOne;

where Plant is a variable name.

o By putting in a TITLE statement, you can add a title to your table:

    proc sql;
    title 'Widgets Work!';
    select plant, Gender
        from sql.filename;

Page 15

USEFUL TOOLS: CALCULATING VALUES

• Calculated data within PROC SQL:
o By putting the calculation inside the procedure, you can compute new values from the data set and have the results appear in the output table.

o We have seen an example already from Slide 13 using Average.

o Here is another example with a Title:

    proc sql;
    title 'What is the Sum of Training Hours?';
    select distinct Plant,
           sum(Post_Training_Productivity) as Sum format=8.2
        from Widget.WidgeOne;
    quit;

Page 16

USEFUL TOOLS: SUMMARIZING DATA

• AVG, MEAN         mean or average of data
• COUNT, FREQ, N    number of nonmissing values
• MAX / MIN         largest / smallest value
• NMISS             number of missing values
• RANGE             range of values
• STD               standard deviation
• SUM               sum of values
• T                 Student's t for H0: population mean = 0
• CSS               corrected sum of squares
• VAR               variance of data
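A few of the summary functions listed above can be compared side by side. This is a sketch on a hypothetical table work.demo that includes one missing value; the alias names are also invented.

```sas
/* Hypothetical sample data; the period is a missing numeric value. */
data work.demo;
    input Plant $ Post_Training_Productivity;
    datalines;
Dallas 40
Dallas 60
Dallas .
Austin 50
;
run;

proc sql;
    select Plant,
           count(Post_Training_Productivity) as NonMissing,
           nmiss(Post_Training_Productivity) as NMissing,
           mean(Post_Training_Productivity)  as Average format=8.2,
           range(Post_Training_Productivity) as ValueRange
        from work.demo
        group by Plant;
quit;
```

For Dallas the query reports 2 nonmissing values, 1 missing value, an average of 50.00, and a range of 20. For Austin it reports 1, 0, 50.00, and 0. The missing value is left out of the average rather than counted as zero.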

Page 17

USEFUL TOOLS: SUMMARIZING DATA

• How to use summary functions, two examples:
o MEAN

    proc sql;
    select distinct Plant,
           Avg(Post_Training_Productivity) as Average format=8.2
        from Widget.WidgeOne;
    quit;

Note: Notice that the variables are inside parentheses, as with all functions. If you reference a calculated column by its alias elsewhere in the same query, put the word "calculated" before the alias. A summary value such as Average can be referenced this way in a HAVING clause but not in a WHERE clause, because summary functions are not allowed in WHERE.

o COUNT

    proc sql;
    select count(distinct Plant) as Count
        from Widget.WidgeOne;

where "distinct" is used to count non-duplicates.
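The keyword "calculated" can be tried on a column computed in the SELECT clause. This is a sketch only: the table work.demo, its rows, and the alias Gain are hypothetical.

```sas
/* Hypothetical sample data, invented for illustration only. */
data work.demo;
    input Plant $ Pre_Training_Productivity Post_Training_Productivity;
    datalines;
Dallas 40 52
Dallas 62 60
Austin 48 59
;
run;

proc sql;
    /* Gain is not a column of work.demo; it exists only in this query,
       so the WHERE clause must refer to it as "calculated Gain". */
    select Plant,
           Post_Training_Productivity - Pre_Training_Productivity as Gain
        from work.demo
        where calculated Gain gt 0;
quit;
```

The gains are 12, -2, and 11, so the first and third rows are returned. Writing "where Gain gt 0" without the keyword fails, because WHERE looks for a column named Gain in the source table.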

Page 18

USEFUL TOOLS: THE WHERE CLAUSE

• The WHERE clause is very flexible and can be used to select your data in very specific ways.
o To include or exclude missing data, use IS MISSING or IS NOT MISSING, which test whether a value is missing.

o Used in conjunction with AND/OR to combine conditions:

    select Plant, Post_Training_Productivity
        from widget.widgeone
        where Post_Training_Productivity lt 50 and Plant is not missing;

Note: Both conditions must be met. (lt means "less than.") The WHERE clause tests individual rows, so it refers to a column of the table rather than to a summary value such as a sum.

o Used in conjunction with the LIKE operator:

    select Name
        from widget.widgeone
        where Plant like 'D%';

Note: The condition is satisfied if Plant begins with "D." The % is a wildcard that matches any sequence of zero or more characters.
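Both WHERE tools, the missing-value test and the LIKE operator, can be sketched on a small table. The table work.demo and its rows are hypothetical; the last row deliberately has no Plant value.

```sas
/* Hypothetical sample data; MISSOVER leaves Plant missing
   on the last line instead of reading past the end of the line. */
data work.demo;
    infile datalines missover;
    input Name $ Plant $;
    datalines;
Ann Dallas
Bob Denver
Cy Austin
Dee
;
run;

proc sql;
    /* Keeps Ann, Bob, and Cy; Dee has a missing Plant. */
    select Name, Plant
        from work.demo
        where Plant is not missing;

    /* Keeps Ann and Bob, whose Plant values begin with D. */
    select Name, Plant
        from work.demo
        where Plant like 'D%';
quit;
```

The LIKE comparison is case sensitive, so the pattern 'd%' would return no rows with this data.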

Page 19

USEFUL TOOLS: GROUPING

• Grouping by multiple columns
o To group by multiple columns (variables), separate the column names with commas within the GROUP BY clause.

        where Plant is not missing
        group by Plant, Post_Training_Productivity;

Page 20

USEFUL TOOLS: HAVING VERSUS WHERE

• The HAVING clause with the GROUP BY clause affects groups in a way that is similar to the way in which a WHERE clause affects individual rows. PROC SQL displays only the groups that satisfy the HAVING clause.

• It is helpful to know when to use HAVING and when to use WHERE.

• Unlike the WHERE clause, the HAVING clause can use aggregate functions (summarizing functions):

    select Plant,
           SUM(Post_Training_Productivity) format=comma14. as TotalProductivity,
           count(*) as Count
        from widget.widgeone
        group by plant
        having count(*) gt 17;

Note: SQL adds the count of rows in each group to the table (* means to count every row, including rows with missing values). However, the HAVING clause limits the results to the groups whose count is greater than 17.

Page 21

USEFUL TOOLS: HAVING VERSUS WHERE

HAVING                                          WHERE
__________________________________________________________________________________________
• Includes groups of rows                       Includes individual rows
• Must follow GROUP BY if used with GROUP BY    Must precede any GROUP BY used
• When no GROUP BY, acts like WHERE             Is not affected by GROUP BY clause
• Is processed after GROUP BY and any           Is processed before GROUP BY clause, if
  summations                                    there is one, and before summations
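The contrast between the two clauses can be seen in a single query that uses both. This is a sketch on a hypothetical table work.demo; the thresholds are chosen only to fit the five sample rows.

```sas
/* Hypothetical sample data, invented for illustration only. */
data work.demo;
    input Plant $ jobgrade Post_Training_Productivity;
    datalines;
Dallas 5 40
Dallas 6 62
Dallas 7 70
Austin 5 48
Austin 6 55
;
run;

proc sql;
    /* WHERE removes individual rows (job grade 5) before grouping.
       HAVING then removes whole groups with fewer than 2 rows left. */
    select Plant,
           sum(Post_Training_Productivity) as TotalProductivity,
           count(*) as Count
        from work.demo
        where jobgrade gt 5
        group by Plant
        having count(*) ge 2;
quit;
```

After the WHERE clause, Dallas has two rows (62 and 70) and Austin has one (55). The HAVING clause drops Austin, so the result is a single row: Dallas, 132, 2.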

Page 22

PROC SQL Vs. SAS Program

Total Post Training Hours Grouped by Plant

• Let's see the difference between using PROC SQL and SAS procedures to find the total hours grouped by the two different plants.

Page 23

PROC SQL Vs. SAS Program

Total Post Training Hours Grouped by Plant

• PROC SQL

    proc sql;
    select Plant,
           sum(Post_Training_Hours) as Total format=comma15.
        from sql.WidgeOne
        where Gender = 'Female'
        group by Plant
        order by Total;
    quit;

Page 24

PROC SQL Vs. SAS Program

Total Post Training Hours Grouped by Plant

• SAS Program

    proc summary data = sql.WidgeOne;
        where Gender = 'Female';
        class Plant;
        var Post_Training_Hours;
        output out = SumPost sum = Total;
    run;

    proc sort data = SumPost;
        by Total;
    run;

    proc print data = SumPost noobs;
        var Plant Total;
        format Total comma15.;
        where _type_ = 1;
    run;

Page 25

PROC SQL VS. SAS Program

• The example shows that PROC SQL can achieve the same results as base SAS software but often with fewer and shorter statements.

• The SELECT statement that is shown performs summation, grouping, sorting, and row selection. It also displays results without PROC PRINT.

Page 26

SUMMARY

• The following code represents a summary of the clauses we discussed.

    proc sql;
    title 'Widgeone Table';
    create table widget.table as
    select Plant,
           sum(Post_Training_Productivity) format=comma14. as Total
        from widget.widgeone
        where Plant is not missing;
    quit;

Page 27

SUMMARY CONTINUED

• Now we use HAVING and GROUP BY as well. Note that GROUP BY must come before HAVING.

    proc sql;
    title 'Widgeone Table';
    create table widget.table as
    select Plant,
           sum(Post_Training_Productivity) format=comma14. as Total,
           jobgrade
        from widget.widgeone
        group by plant
        having jobgrade gt 5;
    quit;