create table students - courses · 1999. 10. 23. · cs 5614: basic data definition and...
CS 5614: Basic Data Definition and Modification in SQL 67
Introduction to SQL
- Structured Query Language (‘Sequel’)
  - Serves as DDL as well as DML
- Declarative
  - Say what you want without specifying how to do it
  - One of the main reasons for commercial success of DBMSs
- Many standards and implementations
  - ANSI SQL
  - SQL-92/SQL-2 (Null operations, Outerjoins etc.)
  - SQL3 (Recursion, Triggers, Objects)
  - Vendor specific implementations
- “Bag Semantics” instead of “Set Semantics”
  - Used in commercial RDBMSs
CS 5614: Basic Data Definition and Modification in SQL 68
Example:
- Create a Relation/Table in SQL
    CREATE TABLE Students(
        sid CHAR(9),
        name VARCHAR(20),
        login CHAR(8),
        age INTEGER,
        gpa REAL
    );
- Support for Basic Data Types
  - CHAR(n)
  - VARCHAR(n)
  - BIT(n)
  - BIT VARYING(n)
  - INT/INTEGER
  - FLOAT
  - REAL, DOUBLE PRECISION
  - DECIMAL(p,d)
  - DATE, TIME etc.
CS 5614: Basic Data Definition and Modification in SQL 69
More Examples
- And one for Courses
    CREATE TABLE Courses(
        courseid CHAR(6),
        department CHAR(20)
    );
- And one for their relationship!
    CREATE TABLE takes(
        sid CHAR(9),
        courseid CHAR(6)
    );
- Why?
- Can also provide default values
    CREATE TABLE Students(
        sid CHAR(9),
        ....
        age INTEGER DEFAULT 21,
        gpa REAL
    );
CS 5614: Basic Data Definition and Modification in SQL 70
Examples Contd.
- DATE and TIME
  - Implementations vary widely
  - Typically treated as strings of a special form
  - Allows comparisons of an ordinal nature (<, > etc.)
- DATE Example
  - '1999-03-03' (No Y2K problems)
- TIME Examples
  - '15:30:29'
  - '15:30:29.3875'
- Deleting a Relation/Table in SQL
    DROP TABLE Students;
CS 5614: Basic Data Definition and Modification in SQL 71
Modifying Relation Schemas
- ‘Drop’ an attribute (column)
    ALTER TABLE Students DROP login;
- ‘Add’ an attribute (column)
    ALTER TABLE Students ADD phone CHAR(7);
- What happens to the new entry for the old records?
  - Default is NULL, or say
    ALTER TABLE Students ADD phone CHAR(7) DEFAULT 'unknown';
- Always begin with ‘ALTER TABLE <TABLE_Name>’
- Can use DEFAULT even with regular definition (as in Slide 69)
CS 5614: Basic Data Definition and Modification in SQL 72
How do you enter/modify data?
- INSERT command
    INSERT INTO Students
    VALUES ('53688', 'Mark', 'mark2345', 23, 3.9)
  - Cumbersome (use bulk loading; described later)
- DELETE command
    DELETE FROM Students S
    WHERE S.name = 'Smith'
- UPDATE command
    UPDATE Students S
    SET S.age = S.age + 1, S.gpa = S.gpa - 1
    WHERE S.sid = '53688'
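The three commands above can be tried end to end. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in DBMS; the second student row is made up for illustration, and the table aliases from the slide are dropped because not every system accepts them in UPDATE and DELETE.

```python
import sqlite3

# In-memory database; SQLite accepts the CHAR/VARCHAR/INTEGER/REAL type names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students("
    "sid CHAR(9), name VARCHAR(20), login CHAR(8), "
    "age INTEGER DEFAULT 21, gpa REAL)"
)

# INSERT: one row giving every column, one row relying on the DEFAULT for age.
conn.execute("INSERT INTO Students VALUES ('53688', 'Mark', 'mark2345', 23, 3.9)")
conn.execute(
    "INSERT INTO Students (sid, name, login, gpa) "
    "VALUES ('53650', 'Smith', 'smith', 3.2)"  # hypothetical sample row
)
default_age = conn.execute(
    "SELECT age FROM Students WHERE name = 'Smith'"
).fetchone()[0]

# UPDATE: same change as on the slide, without the alias S.
conn.execute("UPDATE Students SET age = age + 1, gpa = gpa - 1 WHERE sid = '53688'")

# DELETE: remove every student named Smith.
conn.execute("DELETE FROM Students WHERE name = 'Smith'")

rows = conn.execute("SELECT sid, name, age, gpa FROM Students").fetchall()
print(default_age, rows)
```

Only Mark's row should remain afterwards, one year older and one grade point lower.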
CS 5614: Basic Data Definition and Modification in SQL 73
Domains
- Domains: Similar to Structs and other user-defined types
    CREATE DOMAIN Email AS CHAR(8) DEFAULT 'unknown';
    ....
    login Email    -- instead of login CHAR(8) DEFAULT 'unknown'
- Advantages: can be reused
    junkaddress Email,
    fromaddress Email,
    toaddress Email,
    ....
- Can DROP DOMAINS too!
    DROP DOMAIN Email;
  - Affects only future declarations
CS 5614: Basic Data Definition and Modification in SQL 74
Keys
- To Specify Keys
  - Use PRIMARY KEY or UNIQUE
  - Declare alongside attribute
  - For multiattribute keys, declare as a separate line
    CREATE TABLE takes(
        sid CHAR(9),
        courseid CHAR(6),
        PRIMARY KEY (sid,courseid)
    );
- What's the difference between PRIMARY KEY and UNIQUE?
  - Typically only one PRIMARY KEY but any number of UNIQUE keys
  - Implementor allowed to attach special significance
CS 5614: Basic Data Definition and Modification in SQL 75
Creating Indices/Indexes
- Why?
  - Speeds up query processing time
- For Students
    CREATE INDEX indexone ON Students(sid);
    CREATE INDEX indextwo ON Students(login);
- How to decide attributes to place indices on?
  - One is (typically) created by default on PRIMARY KEY
  - Creation of indices on UNIQUE attributes is implementation-dependent
  - In general, physical database design/tuning is very difficult!
  - Use Tools: Microsoft SQL Server has an index selection Wizard
- Why not place indices on all attributes?
  - Too cumbersome for insertions/deletions/updates
- Like all things in computer science, there is a tradeoff! :-)
CS 5614: Basic Data Definition and Modification in SQL 76
Other Properties
- ‘NOT NULL’ instead of DEFAULT
    CREATE TABLE Students(
        sid CHAR(9),
        name VARCHAR(20),
        login CHAR(8),
        age INTEGER,
        gpa REAL
    );
  - Can insert a tuple without a value for gpa
  - NULL will be inserted
- If we had specified
    gpa REAL NOT NULL);
  - insert cannot be made!
In this module, we will study three different query languages/representations used in conjunction with the relational model. The first is relational algebra, an algebraic and procedural way for creating new relations from given ones. The second is Datalog, which is logical and declarative in nature (students familiar with the PROLOG programming language will find this all too natural). In fact, there are close correspondences between database systems and PROLOG. PROLOG can be thought of as a database system where all the data fits into main memory. In contrast, a distinguishing feature of RDBMSs is that they operate on secondary storage. Other than that (and some differences in query processing), there are excellent analogs to both cultures. Both PROLOG and RDBMSs are declarative. What we know to be relations are referred to as ‘predicates’ in PROLOG. A tuple is called a ground fact in PROLOG. A table is called an ‘extensional definition’ in PROLOG and so on. (Do not get bogged down by these specifics; we include them here just so that you can make the connection, if you are already familiar with PROLOG. Else, nothing to worry.) The third query representation is, of course, SQL that was introduced earlier.

(Footnote: No brownie points for guessing why we switched to a different document style for Module 2.)

In the remainder of this document, we will introduce basic operations and manipulations that we can perform on relations. For each such basic operation, we will show how it is represented in each of the three different notations.

1. Union of Relations: The union of two relations R and S is the set of elements that are in R or in S or both. We assume that the schemas of R and S are alike (of course) and that their columns are also ordered alike (of course, again). We now give the three representations of the union relation:

    R ∪ S    (simple, right?)

    T(x,y,z,w) <- R(x,y,z,w).
    T(x,y,z,w) <- S(x,y,z,w).

Notice that the variables x,y,z,w are merely ‘placeholders’ used for pattern matching; we could have written the above two as:

    T(a,b,c,d) <- R(a,b,c,d).
    T(e,f,g,h) <- S(e,f,g,h).

    (SELECT * FROM R)
    UNION
    (SELECT * FROM S)

2. Difference of Relations: The difference R - S of two relations R and S is the set of elements that are in R but not in S. As usual, we assume that the schemas of R and S are alike and that their columns are also ordered alike. Moreover, notice that R - S is not (generally) the same as S - R.

    R - S

    T(x,y,z,w) <- R(x,y,z,w), NOT S(x,y,z,w).

    (SELECT * FROM R)
    EXCEPT
    (SELECT * FROM S)

3. Intersection of Relations: The intersection R ∩ S of two relations R and S is the set of elements that are in both R and S. Again, we assume that the schemas of R and S are alike and that their columns are also ordered alike. Notice that R ∩ S = R - (R - S).

    R ∩ S

    T(x,y,z,w) <- R(x,y,z,w), S(x,y,z,w).

    (SELECT * FROM R)
    INTERSECT
    (SELECT * FROM S)

4. Projection: Operates on a single relation and removes some of the columns. Useful for restricting information. Assume that we want only the name and address from relation R.

    π_{name,address}(R)

    T(x,y) <- R(x,y,z,w).

Thus z,w become irrelevant attributes. We could instead reinforce this by writing

    T(x,y) <- R(x,y,_,_).

    SELECT name, address
    FROM R

5. Selection: Operates on a single relation and removes some of the rows. The removal is based on some condition specified by the user. For example, suppose we want all the tuples from R where the name is ‘Michael’.

    σ_{name='Michael'}(R)

    T(x,y,z,w) <- R(x,y,z,w), x = 'Michael'.

    SELECT *
    FROM R
    WHERE name = 'Michael'

6. More Selections: Selections become complicated when the conditions get longer, particularly with Datalog (the other two forms are pretty straightforward). Consider for example, when we want all the tuples from R where the name is ‘Michael’ OR when the gender is ‘M’. We write this in Datalog as:

    T(x,y,z,w) <- R(x,y,z,w), x = 'Michael'.
    T(x,y,z,w) <- R(x,y,z,w), z = 'M'.

Notice that we ‘split’ the condition across two rules, just as we do in the union case (Come to think of it, the OR is indeed the union of two conditions). Similarly, the comma in each of the above datalog rules models the AND condition. Let's consider a more complicated condition: Select all the tuples from R that are neither male nor have the name ‘Fox’.

    T(x,y,z,w) <- R(x,y,z,w), z <> 'M', x <> 'Fox'.

While selecting the tuples from R that are not both male and have the name ‘Fox’ is achieved by (why?):

    T(x,y,z,w) <- R(x,y,z,w), z <> 'M'.
    T(x,y,z,w) <- R(x,y,z,w), x <> 'Fox'.

7. Cartesian Product: This is the set of ‘pairs’ formed by choosing the first element from R and the second element from S. In case of confusions among attribute names, disambiguate them by prefixing them with the relation name. The cartesian product of two relations R and S is given by:

    R × S

    T(x1,y1,z1,w1,x2,y2,z2,w2) <- R(x1,y1,z1,w1), S(x2,y2,z2,w2).

    SELECT R.name, R.address, R.gender, R.birthdate,
           S.name, S.address, S.gender, S.birthdate
    FROM R, S

Notice how we disambiguate attributes in the SQL version. Also notice that if R has m tuples and S has n tuples, then R × S will have mn tuples.

8. Theta-Join: This is just like the cartesian product but goes a step further. After forming the mn tuples, it selects only a subset of them to include in its answer, based on some condition. Thus, the theta-join of two relations has the same number of columns as the cartesian product but not necessarily the same number of rows (some of the rows will be removed because they dissatisfy some condition). Consider for example, that we want to find ‘pairs’ of students such that the first person in the pair is always ‘Michael’. We get:

    R ⋈_{R.name='Michael'} S

    T(x1,y1,z1,w1,x2,y2,z2,w2) <- R(x1,y1,z1,w1), S(x2,y2,z2,w2), x1 = 'Michael'.

    SELECT R.name, R.address, R.gender, R.birthdate,
           S.name, S.address, S.gender, S.birthdate
    FROM R, S
    WHERE R.name = 'Michael'

Notice that we introduce the ‘bow-tie’ symbol ⋈ suffixed by the condition for indicating the theta-join. Also, realize that

    R ⋈_θ S = σ_θ(R × S)

9. Natural Join: This is just a cleverer way to combine two relations into one. The basic idea is that if the two relations have some column(s) in common, then we can ‘collapse’ them into the same column in the final output. Moreover, we can do this only if the two tuples from the two relations agree in those common columns. Thus, it is similar to the cartesian product, but we ‘join’ only those pairs that match in their common attributes. Consider that we want to find the name, address, gender, gpa and birthdate of students in a single relation. Notice that gpa is available from L but the other four attributes are present in R. So, we need a way to intelligently combine these two relations:

    R ⋈ L

    T(x,y,z,g,w) <- R(x,y,z,w), L(x,y,g).

    SELECT R.name, R.address, R.gender, L.gpa, R.birthdate
    FROM R, L
    WHERE R.name = L.name AND R.address = L.address

10. Renaming: This is just a cool thing, in case we have too many naming conflicts and confusions arising. We can use this operator to rename a relation's name and/or one or more of its attributes. For example, assume we want to rename R and make its columns to be called by three new, numbered names. This is most useful with relational algebra, like so:

    [renaming expression applied to R; the new relation and column names are illegible in the source]
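The relational-algebra operations covered in this handout (union, difference, intersection, projection, selection, cartesian product, theta-join and natural join) map directly onto set operations. The sketch below is illustrative only: it assumes relations are Python sets of tuples with the schema (name, address, gender, birthdate), and every sample row, as well as the contents of L(name, address, gpa), is invented.

```python
# Relations as sets of (name, address, gender, birthdate) tuples (sample data).
R = {("Michael", "1 Elm St", "M", "1975-01-01"),
     ("Fox", "2 Oak St", "F", "1976-02-02")}
S = {("Fox", "2 Oak St", "F", "1976-02-02"),
     ("Amy", "3 Ash St", "F", "1977-03-03")}

union = R | S                      # R union S
difference = R - S                 # R minus S
intersection = R & S               # R intersect S

# Projection onto (name, address): drop the other columns.
projection = {(n, a) for (n, a, g, b) in R}

# Selection: keep only rows whose name is 'Michael'.
selection = {t for t in R if t[0] == "Michael"}

# Cartesian product: every row of R paired with every row of S.
product = {r + s for r in R for s in S}

# Theta-join: the product filtered by a condition on the combined row.
theta_join = {r + s for r in R for s in S if r[0] == "Michael"}

# Natural join of R with a hypothetical L(name, address, gpa):
# rows must agree on the shared columns, which appear once in the output.
L = {("Michael", "1 Elm St", 3.5)}
natural_join = {(n, a, g, gpa, b)
                for (n, a, g, b) in R
                for (n2, a2, gpa) in L
                if n == n2 and a == a2}

print(len(product), natural_join)
```

Python sets eliminate duplicates, so this follows set semantics rather than the bag semantics SQL uses by default.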
CS 5614: Misc. SQL Stuff, Safety in Queries 81
What is still to be covered (and will be)
- Declaring constraints
  - Domain Constraints
  - Referential Integrity (Foreign Keys)
- More SQL Stuff
  - Subqueries
  - Aggregation
- SQL Peculiarities
  - Strange Phenomena
  - More on Bag Semantics
  - Ifs and Buts
- Embedding SQL in a Programming Environment
  - Accessing DBs from within a PL
  - (will be covered in Module 3)
CS 5614: Misc. SQL Stuff, Safety in Queries 82
What will be mentioned (but not covered in detail)
- Triggers
  - Read Cow Book or Boat Book
- More SQL Gory Details
- Recursive Queries (SQL3)
  - Why do we need these?
- Security
- Authorization and Privacy
- Trends towards Object Oriented DBMSs
CS 5614: Misc. SQL Stuff, Safety in Queries 83
Tuple-Based Domain Constraints
- Already Seen
  - NOT NULL
  - UNIQUE, PRIMARY KEY etc.
- In General
    CREATE TABLE Students(
        sid CHAR(9),
        name VARCHAR(20),
        login CHAR(8),
        age INTEGER,
        gpa REAL,
        CHECK (gpa >= 0.0)
    );
- Note: Implementations vary, but this is the general idea
- Other Complicated Forms
  - Constraints on whole relations, Assertions
CS 5614: Misc. SQL Stuff, Safety in Queries 84
Referential Integrity Constraints
- Foreign Keys
  - An attribute a of R1 is a foreign key if it “references” the primary key (say b) of another relation R2
  - In addition, there is a referential integrity constraint from R1 to R2.
- Example
  - login is a FOREIGN KEY for Students
    CREATE TABLE Students(
        sid CHAR(9) PRIMARY KEY,
        name VARCHAR(20),
        login CHAR(8) REFERENCES Accounts(acct),
        age INTEGER,
        gpa REAL
    );
    CREATE TABLE Accounts(
        acct CHAR(8) PRIMARY KEY
    );
CS 5614: Misc. SQL Stuff, Safety in Queries 85
Alternatively
- Can use “FOREIGN KEY” construct
    CREATE TABLE Students(
        sid CHAR(9) PRIMARY KEY,
        name VARCHAR(20),
        login CHAR(8),
        age INTEGER,
        gpa REAL,
        FOREIGN KEY (login) REFERENCES Accounts(acct)
    );
    CREATE TABLE Accounts(
        acct CHAR(8) PRIMARY KEY
    );
- Note: acct should be declared as PRIMARY KEY for Accounts!
  - in both cases
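To see a referential integrity constraint being enforced, here is a small sketch assuming Python's built-in sqlite3 module. Two things are specific to SQLite and not to the slides: foreign keys are only checked after `PRAGMA foreign_keys = ON`, and the sample rows are invented. Accounts is created first so that the referenced table already exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("CREATE TABLE Accounts(acct CHAR(8) PRIMARY KEY)")
conn.execute(
    "CREATE TABLE Students("
    "sid CHAR(9) PRIMARY KEY, name VARCHAR(20), login CHAR(8), "
    "age INTEGER, gpa REAL, "
    "FOREIGN KEY (login) REFERENCES Accounts(acct))"
)

# A student whose login exists in Accounts is accepted.
conn.execute("INSERT INTO Accounts VALUES ('mark2345')")
conn.execute("INSERT INTO Students VALUES ('53688', 'Mark', 'mark2345', 23, 3.9)")

# A student whose login has no matching account violates the constraint.
try:
    conn.execute("INSERT INTO Students VALUES ('53700', 'Amy', 'nosuch', 20, 3.5)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

student_count = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
print(rejected, student_count)
```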
CS 5614: Misc. SQL Stuff, Safety in Queries 86
SQL Subqueries
- Given
    Students(sid,name,login,age,gpa)
    HasCar(sid,carname)
- Find
  - the car of the student with login='mark'
- Traditional Way
    SELECT carname
    FROM Students, HasCar
    WHERE Students.login = 'mark'
    AND Students.sid = HasCar.sid;
- The ‘Subway’
    SELECT carname
    FROM HasCar
    WHERE sid =
        (SELECT sid
         FROM Students
         WHERE login = 'mark');
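Both formulations should return the same answer. A quick check, assuming Python's sqlite3 module and made-up sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students(sid CHAR(9), name VARCHAR(20), login CHAR(8),
                          age INTEGER, gpa REAL);
    CREATE TABLE HasCar(sid CHAR(9), carname VARCHAR(20));
    INSERT INTO Students VALUES ('53688', 'Mark', 'mark', 23, 3.9);
    INSERT INTO Students VALUES ('53650', 'Amy', 'amy', 21, 4.0);
    INSERT INTO HasCar VALUES ('53688', 'Civic');
    INSERT INTO HasCar VALUES ('53650', 'Beetle');
""")

# Join formulation: pair every student with every car, then filter.
join_result = conn.execute(
    "SELECT carname FROM Students, HasCar "
    "WHERE Students.login = 'mark' AND Students.sid = HasCar.sid"
).fetchall()

# Subquery formulation: look up the sid first, then use it.
subquery_result = conn.execute(
    "SELECT carname FROM HasCar "
    "WHERE sid = (SELECT sid FROM Students WHERE login = 'mark')"
).fetchall()

print(join_result, subquery_result)
```

The `sid = (subquery)` form relies on the subquery producing a single value, which holds here because only one student has the login 'mark'.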
CS 5614: Misc. SQL Stuff, Safety in Queries 87
Aggregation
- Given
    Students(sid,name,login,age,gpa)
- Find
  - the average of the ages of all the students
- Solution
    SELECT AVG(age)
    FROM Students;
- Other Operations
  - SUM (summation of all the values in a column)
  - MIN (least value)
  - MAX (highest value)
  - COUNT (the number of values), e.g.
    SELECT COUNT(*)
    FROM Students;
  - COUNTs the number of Students!
CS 5614: Misc. SQL Stuff, Safety in Queries 88
Ordering
- Given
    Students(sid,name,login,age,gpa)
- List
  - the students in (ascending) alphabetical order of name
- Solution
    SELECT *
    FROM Students
    ORDER BY name;
- and that for DESCending ORDER is
    SELECT *
    FROM Students
    ORDER BY name DESC;
- Default is ASC
CS 5614: Misc. SQL Stuff, Safety in Queries 89
Grouping
- Given
    Students(sid,name,login,age,gpa)
- Find
  - the names of students with gpa=4.0 and
  - group people with like names together
- Solution
    SELECT name
    FROM Students
    WHERE gpa = 4.0
    GROUP BY name;
CS 5614: Misc. SQL Stuff, Safety in Queries 90
More on Grouping
- Given
    Students(sid,name,login,age,gpa)
- Find
  - the names of students with gpa=4.0 and
  - group people with like names together and
  - show only those groups that have more than 2 students in them
- Solution
    SELECT name
    FROM Students
    WHERE gpa = 4.0
    GROUP BY name
    HAVING COUNT(*) > 2;
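The effect of HAVING is easiest to see on a tiny table. This sketch assumes Python's sqlite3 module and invented rows; the COUNT(*) column and the ORDER BY in the first query are additions for readability and are not part of the slide's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students(sid CHAR(9), name VARCHAR(20), login CHAR(8), "
    "age INTEGER, gpa REAL)"
)
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?, ?, ?)",
    [
        ("1", "Lee", "lee1", 20, 4.0),
        ("2", "Lee", "lee2", 21, 4.0),
        ("3", "Lee", "lee3", 22, 4.0),
        ("4", "Kim", "kim1", 20, 4.0),
        ("5", "Kim", "kim2", 23, 4.0),
        ("6", "Lee", "lee4", 24, 3.0),  # filtered out by WHERE before grouping
        ("7", "Roy", "roy1", 24, 3.1),  # filtered out by WHERE before grouping
    ],
)

# WHERE runs first, then GROUP BY forms one group per name.
grouped = conn.execute(
    "SELECT name, COUNT(*) FROM Students "
    "WHERE gpa = 4.0 GROUP BY name ORDER BY name"
).fetchall()

# HAVING then discards the groups that are too small.
large_groups = conn.execute(
    "SELECT name FROM Students "
    "WHERE gpa = 4.0 GROUP BY name HAVING COUNT(*) > 2"
).fetchall()

print(grouped, large_groups)
```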
CS 5614: Misc. SQL Stuff, Safety in Queries 91
Summary of SQL Syntax
- General Form
    SELECT <attribute(s)>
    FROM <relation(s)>
    WHERE <condition(s)>
    GROUP BY <attribute(s)>
    HAVING <grouping condition(s)>
- Order of Execution
  - FROM
  - WHERE
  - GROUP BY
  - HAVING
  - SELECT
CS 5614: Misc. SQL Stuff, Safety in Queries 92
Views
- Can be viewed as temporary relations
  - do not exist physically BUT
  - can be queried and modified (sometimes) just like normal relations
- Example:
    CREATE VIEW GoodStudents(id,name) AS
        SELECT sid, name
        FROM Students
        WHERE gpa = 4.0;
    SELECT *
    FROM GoodStudents
    WHERE name = 'Mark';
- Can we update the original relation using the GoodStudents VIEW?
CS 5614: Misc. SQL Stuff, Safety in Queries 93
Beginning of Weird Stuff
- SQL uses Bag Semantics
  - meaning: does not normally eliminate duplicates
  - e.g. the SELECT clause
- BUT (a big BUT)
- this doesn’t apply to
  - UNION, INTERSECT and EXCEPT (difference)
- Either way, it provides facilities to do whatever we want
- If you want duplicates eliminated in SELECT clause
  - use SELECT DISTINCT .....
- If you want to prevent elimination of duplicates in UNION etc.
  - use (SELECT ...) UNION ALL (SELECT ...)
  - Likewise for INTERSECT and EXCEPT
CS 5614: Misc. SQL Stuff, Safety in Queries 94
... and that’s just the tip of the iceberg
- What happens with the following code?
    SELECT R.A
    FROM R, S, T
    WHERE R.A = S.A OR R.A = T.A
  - when R(A) has {2,3}, S(A) has {3,4} and T(A) is {}
- Confusion Reigns!
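The puzzle can be run directly. Because FROM first forms the product of R, S and T, an empty T leaves no rows for WHERE to test, so the query returns nothing even though 3 appears in both R and S. A sketch assuming Python's sqlite3 module; the value 99 inserted into T is an arbitrary choice to show the contrast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R(A INTEGER);
    CREATE TABLE S(A INTEGER);
    CREATE TABLE T(A INTEGER);
    INSERT INTO R VALUES (2);
    INSERT INTO R VALUES (3);
    INSERT INTO S VALUES (3);
    INSERT INTO S VALUES (4);
""")

query = "SELECT R.A FROM R, S, T WHERE R.A = S.A OR R.A = T.A"

# T is empty, so R x S x T is empty and nothing survives.
empty_t = conn.execute(query).fetchall()

# Any row in T, even an unrelated one, makes the S match visible again.
conn.execute("INSERT INTO T VALUES (99)")
nonempty_t = conn.execute(query).fetchall()

print(empty_t, nonempty_t)
```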
CS 5614: Misc. SQL Stuff, Safety in Queries 95
Safety in Queries
- Some queries are inherently “unsafe”
  - should not be permitted in DB access
- Example
  - Given only the following relation
    Students(id)
  - Find all those who are not students
- Easy to distinguish unsafe queries via common-sense
  - Final result is not closed
  - Is there an automatic way to determine “safety”?
CS 5614: Misc. SQL Stuff, Safety in Queries 96
Answer: Yes!
- Easiest to spot when written in Datalog
    Answer(id) <- NOT Students(id).
- Golden Rule
  - Any variable that appears anywhere must also appear in a non-negated body part
  - In this case, id causes the query to be unsafe
- Example of a Safe Query
    Answer(id) <- People(id), NOT Students(id).
- This produces all those people who are NOT students
  - safe because the People relation provides a reference point
  - id which appears in a negated body part also appears non-negated
CS 5614: Misc. SQL Stuff, Safety in Queries 97
More Dangers
- Problem not restricted to negated body parts
  - occurs even with arithmetic body parts (why?)
- Given
  - only the following relation
    Students(id,age)
- Find all those numbers that are greater than the age of some student
    Answer(x) <- Students(id,age), x > age.
- Extension to previous rule
  - Any variable that appears anywhere must also appear in a non-negated, non-arithmetic body part
  - In this case, x causes the query to be unsafe because it doesn’t appear in a non-negated, non-arithmetic part
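The extended safety rule is mechanical enough to automate. The sketch below is one possible encoding, not part of the course material: a rule is described by its head variables, its positive (non-negated, non-arithmetic) atoms, its negated atoms, and the variables used in arithmetic comparisons. The function name and argument layout are assumptions made for illustration.

```python
def unsafe_variables(head_vars, positive_atoms, negated_atoms=(), arithmetic_vars=()):
    """Return, sorted, the variables that break the safety rule.

    Each atom is a (predicate_name, [variables]) pair. A variable is safe
    only if it occurs in at least one positive atom of the body.
    """
    bound = {v for _, variables in positive_atoms for v in variables}
    used = set(head_vars) | set(arithmetic_vars) | bound
    used |= {v for _, variables in negated_atoms for v in variables}
    return sorted(used - bound)


# Answer(id) <- NOT Students(id).
rule1 = unsafe_variables(["id"], [], [("Students", ["id"])])

# Answer(id) <- People(id), NOT Students(id).
rule2 = unsafe_variables(["id"], [("People", ["id"])], [("Students", ["id"])])

# Answer(x) <- Students(id,age), x > age.
rule3 = unsafe_variables(["x"], [("Students", ["id", "age"])],
                         arithmetic_vars=["x", "age"])

print(rule1, rule2, rule3)
```

An empty list means the rule is safe; otherwise the list names the offending variables.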
CS 5614: Misc. SQL Stuff, Safety in Queries 98
One More Example
- Given
  - a relation Composite(x)
  - which lists all the composite numbers
- Write a query to find
  - the prime numbers
- Wrong Way
  - Prime(x) <- NOT Composite(x).
- Right Way
  - Prime(x) <- Number(x), NOT Composite(x).
- Safety in Other Notations
  - Relational Algebra: via the subtraction operator
  - SQL: via the EXCEPT construct
- Notice how SQL and Relational Algebra do not allow unsafe queries
  - because there is no way to write such queries with the given constructs
  - how clever, eh? :-)
  - It is always amazing how “languages” force you to think in a certain manner
  - a problem long studied by philosophers
CS 5614: Misc. SQL Stuff, Safety in Queries 99
Recursion in Queries
- Used to specify an indefinite number of “applications” of a relation
- Example
  - Given only the following relation
    Person(name,parent)
  - Find all the ancestors of “Mark”
- Easy to find an ancestor at a predefined level
  - parent: Use Person
  - grandparent: Join Person with Person
  - great-grandparent: Join Person with Person with Person
  - and so on.
- To find an ancestor at no predefined level
  - Need to join Person with Person an “indefinite” number of times
- SQL3 provides support for recursive definitions
CS 5614: Misc. SQL Stuff, Safety in Queries 100
Solution in Datalog
- First, the base case
    Ancestor(x,y) <- Person(x,y).
- Then, the inductive step
    Ancestor(x,y) <- Person(x,z), Ancestor(z,y).
- Can also write the previous rule as
    Ancestor(x,y) <- Ancestor(x,z), Ancestor(z,y).
  - why?
CS 5614: Misc. SQL Stuff, Safety in Queries 101
Recursion in SQL3
- Use the “WITH RECURSIVE ... SELECT” construct
- Example
    WITH RECURSIVE Ancestor(name,ans) AS
        (SELECT * FROM Person)
        UNION
        (SELECT Person.name, Ancestor.ans
         FROM Person, Ancestor
         WHERE Person.parent = Ancestor.name)
    SELECT * FROM Ancestor;
- Use with caution: Some kinds of recursive queries will not be allowed!
  - example: the following Datalog query might not be allowed in SQL3
    Ancestor(x,y) <- Ancestor(x,z), Ancestor(z,y).
  - because the rule involves 2 applications of the recursively defined predicate
  - “Linear recursion” allows only one (as in the SQL code above)
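Linear recursion of this kind runs on current systems. The sketch below assumes Python's sqlite3 module (SQLite has supported WITH RECURSIVE since version 3.8.3) and an invented family tree. Compared with the slide, the whole definition is wrapped in one pair of parentheses and the two branches are left unparenthesized, because that is the form SQLite accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person(name VARCHAR(20), parent VARCHAR(20));
    INSERT INTO Person VALUES ('Mark', 'Ann');
    INSERT INTO Person VALUES ('Ann', 'Bob');
    INSERT INTO Person VALUES ('Bob', 'Cy');
    INSERT INTO Person VALUES ('Zed', 'Yan');
""")

rows = conn.execute("""
    WITH RECURSIVE Ancestor(name, ans) AS (
        SELECT name, parent FROM Person            -- base case
        UNION
        SELECT Person.name, Ancestor.ans           -- inductive step
        FROM Person, Ancestor
        WHERE Person.parent = Ancestor.name
    )
    SELECT ans FROM Ancestor WHERE name = 'Mark' ORDER BY ans
""").fetchall()

ancestors_of_mark = [ans for (ans,) in rows]
print(ancestors_of_mark)
```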
CS 5614: Misc. SQL Stuff, Safety in Queries 102
Final Example
- Be careful when combining negation, aggregation and recursion
  - perfect recipe for disaster!
- Mutual Recursion
  - Odd(x) <- Number(x), NOT Even(x).
  - Even(x) <- Number(x), NOT Odd(x).
- What are the problems?
  - Notice that the query appears “safe” (per Slide 96)
  - cycles indefinitely!; no proper base cases
- Illegal in SQL3
  - not because of mutual recursion
  - but due to the fact that there is no “unique interpretation” to the query
  - E.g.: 6 could be either in Odd or in Even; both are acceptable!
- Sometimes mutual recursion is good and fruitful, if written properly
  - with proper limiting constraints and base cases
CS 5614: Misc. SQL Stuff, Safety in Queries 103
Introduction to Deductive DBMSs
- Intersection of traditional RDBMSs and Logic Programming
- Example Systems
  - CORAL (Univ. Wisc.)
  - LDL++ (MCC)
  - XSB Systems (SUNY, Stony Brook)
- Can be viewed as
  - extending PROLOG-type systems with secondary storage
  - extending RDBMSs with deductive functionality
- Mappings: Commonalities between PROLOG and DBMSs
  - Predicate: Relation
  - Argument: Attribute
  - Ground Fact: Tuple
  - Extensional Definition: Table (defined by data)
  - Intensional Definition: Table (defined by a view)
CS 5614: Misc. SQL Stuff, Safety in Queries 104
PROLOG vs. RDBMSs
- Characteristics of PROLOG
  - Tuple-at-a-time
  - Backward Chaining
  - Top-Down
  - Goal-Oriented
  - Fixed-Evaluation Strategy (Depth-First)
- Characteristics of RDBMSs
  - Set-at-a-time (recall relational algebra)
  - Forward Chaining
  - Bottom-Up
  - Query Optimizer figures a good evaluation strategy
- Example
  - ancestor(X,X). parent(amy,bob).
  - ancestor(X,Y) <- parent(X,Z), ancestor(Z,Y).
- Query
  - Find the ancestors of bob: ancestor(X,bob)?
CS 5614: Misc. SQL Stuff, Safety in Queries 105
PROLOG Pitfalls
- Previous Example
  - Linear Recursion
  - Tail Recursion
- What if we reverse the order of clauses in
  - ancestor(X,Y) <- parent(X,Z), ancestor(Z,Y).
  - PROLOG goes into an infinite loop (why?)
- What if we make it
  - ancestor(X,Y) <- ancestor(X,Z), ancestor(Z,Y).
  - “Not Linear” Recursion
- Inference = Resolution + Unification
  - Entailment in First Order Logic is Semi-decidable
CS 5614: Misc. SQL Stuff, Safety in Queries 106
Example of Deductive Query Optimization
- Same-Generation: Hello World of DDBMSs
  - sg(X,Y) <- flat(X,Y).
  - sg(X,Y) <- up(X,U),sg(U,V),down(V,Y).
- Magic: A Rewriting Technique
  - Rewrite query such that advantages of bottom-up evaluation and goal-oriented behavior are combined
- Example: For the query
  - sg(john,Z)?
- Magic produces
  - sg(X,Y) <- magic_sg(X),flat(X,Y).
  - sg(X,Y) <- magic_sg(X),up(X,U),sg(U,V),down(V,Y).
  - magic_sg(john).
- How do you know when to stop?
  - Iterative Fixpoint Evaluation (when the answer stops changing)
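Iterative fixpoint evaluation can be sketched in a few lines. The code below is a naive bottom-up evaluator for the two original same-generation rules only; it does not implement the magic rewriting, and the flat/up/down facts are invented. Relations are assumed to be Python sets of pairs.

```python
def same_generation(flat, up, down):
    """Naive bottom-up evaluation of
         sg(X,Y) <- flat(X,Y).
         sg(X,Y) <- up(X,U), sg(U,V), down(V,Y).
    Repeats the recursive rule until no new sg facts appear (the fixpoint)."""
    sg = set(flat)  # base rule
    while True:
        derived = {
            (x, y)
            for (x, u) in up
            for (u2, v) in sg
            if u2 == u
            for (v2, y) in down
            if v2 == v
        }
        new_sg = sg | derived
        if new_sg == sg:  # the answer stopped changing
            return sg
        sg = new_sg


# Hypothetical facts: p1 and p2 are siblings; john and mary are their children;
# kid and tot are the grandchildren.
flat = {("p1", "p2")}
up = {("john", "p1"), ("kid", "john")}
down = {("p2", "mary"), ("mary", "tot")}

result = same_generation(flat, up, down)
print(sorted(result))
```

This evaluator computes every sg fact, whether or not it is relevant to a query such as sg(john,Z); avoiding that wasted work is what the magic rewriting is for.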
CS 5614: Misc. SQL Stuff, Safety in Queries 107
SQL in a Programming Environment
- Incorporating SQL in a complete application
- Why?
  - There are some things we cannot do with SQL alone
    - e.g. preserving complex states, looping, branching etc.
  - Typically embed SQL in a host-language interface
- Problems: Impedance Mismatch
  - SQL operates on sets of tuples
  - Languages such as C, C++ operate on an individual basis
- Solution
  - easy when SELECT returns only one row
- When more than one row is returned
  - design an iterator to “run” over the results
  - called a “cursor”
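As an illustration of a cursor in a host language, here is a sketch assuming Python's sqlite3 module, whose cursor objects can be iterated one row at a time. The table contents and the 3.5 threshold are made up; the branching inside the loop stands in for host-language logic that SQL alone does not express conveniently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students(sid CHAR(9), name VARCHAR(20), gpa REAL);
    INSERT INTO Students VALUES ('1', 'Mark', 3.9);
    INSERT INTO Students VALUES ('2', 'Amy', 4.0);
    INSERT INTO Students VALUES ('3', 'Bob', 2.8);
""")

# The query describes a set of rows; the cursor hands them over one by one.
cursor = conn.execute("SELECT name, gpa FROM Students ORDER BY name")

honor_roll = []
for name, gpa in cursor:
    if gpa >= 3.5:  # arbitrary host-language logic applied per row
        honor_roll.append(name)

print(honor_roll)
```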
CS 5614: Misc. SQL Stuff, Safety in Queries 108
How are these implemented?
- Vendor-Specific Implementations
  - ORACLE: PL/SQL (procedural extensions to SQL)
- Open Database Connectivity Standard
  - Provides a standard API for transparent database access
  - used when “database independence” is important
  - used when required to “connect” to diverse data sources
CS 5614: Misc. SQL Stuff, Safety in Queries 109
Tradeoffs
- ODBC
  - originated by Microsoft in 1991
  - adds one more abstraction layer
  - not as fast as a native API (does not exploit “special features”)
  - least-common denominator approach
  - constantly evolving
- PL/SQL etc.
  - “tailored” to the details of the underlying DBMS
  - might not extend to heterogeneous domains
  - modeled after a specific programming language (e.g. Ada for PL/SQL)
CS 5614: Misc. SQL Stuff, Safety in Queries 110
In Between: Stored Procedures
- Used for developing “tightly-coupled” applications
  - “push computations” selectively into the database system
  - avoid performance degradation
  - work in database address space instead of application address space
- Advantages
  - No sending SQL statements to and fro
  - eliminate pre-processing
  - speedup by an order of magnitude
- Example Applications
  - Database Administration
  - Integrity Maintenance and Checks
  - Database Mining
- Disadvantages
  - Non-standard implementation
  - Difficult to enforce transactional synchronization
  - Without traditional SQL optimization, can lead to performance degradation
CS 5614: Misc. SQL Stuff, Safety in Queries 111
Introduction to Query Optimization
- Helps attain declarativeness of RDBMSs
  - One of the main reasons for commercial success of DBMSs
- A motivating example
  - Find all students with 4.0 gpa enrolled in CS5614
    SELECT name
    FROM Students, Classroll
    WHERE Students.name = Classroll.studentname
    AND Students.gpa = 4.0
    AND Classroll.coursename = 'CS5614'
- Two Strategies
  - Do join and then filter out the ones with gpa <> 4.0 and course <> CS5614
  - Filter first the ones with gpa <> 4.0 and course <> CS5614 and then Join
- Which is Better?
  - Always good to “push selections” as far down into the query parse tree as possible
CS 5614: Misc. SQL Stuff, Safety in Queries 112
How does a Query Optimizer work?
- Three Requirements
  - A Search Space of “Plans”
  - A Cost Model (for Plan evaluation)
  - An Enumeration Algorithm
- Ideally
  - Search Space: contains both good and efficient plans
  - Cost Models: cheap to compute and accurate
  - Enumeration Algorithm: efficient (not a monkey-typewriter algorithm)
- Example of a Search Space
  - See Previous Slide
- Examples of Cost Models
  - #(tuples) evaluation
  - #(main memory locations) etc.
- Example of an enumeration algorithm
  - Sequential enumeration of a lattice of plans
  - Dynamic Programming vs. Greedy Approaches
CS 5614: Misc. SQL Stuff, Safety in Queries 113
A Simple Measure of “Cost”
- #(Tuples) in a query
- Easiest to compute for
  - Cartesian Product: #(R X S) = #(R)#(S)
  - Projection: #(Pi(R)) = #(R)
- A Notation for Other Operations
  - V(R,A) = Number of distinct values of attribute “A” in R
  - formulas assume that all values of “A” are equally likely in R
  - Holds in average case for most distributions (e.g. Zipf)
- Selectivity Factors for Selection Operations
  - Equality Tests: Use 1/V(R,A)
  - < or > Tests: Use 1/3
  - “<>” Test: Use (V(R,A)-1)/V(R,A)
  - AND Conditions: Multiply Selectivity Factors
  - OR Conditions: Three Choices
    - Sum of results from individual selectivity factors
    - Min(sum, total size of relation): why? (the result can never be larger than the relation itself)
    - n(1-(1-m1/n)(1-m2/n)) formula, where n = #(R) and m1, m2 are the sizes estimated for the two conditions: most accurate
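The selectivity factors translate into one-line functions. The sketch below uses invented function names and an invented example (a relation of 10,000 tuples with 50 distinct values of A, and the condition "A = 10 OR B < 20"); it only restates the formulas listed above.

```python
def equality_selectivity(distinct_values):
    """Selectivity of A = constant: 1/V(R,A)."""
    return 1 / distinct_values


def range_selectivity():
    """Selectivity of a < or > test: the rule-of-thumb 1/3."""
    return 1 / 3


def not_equal_selectivity(distinct_values):
    """Selectivity of A <> constant: (V(R,A)-1)/V(R,A)."""
    return (distinct_values - 1) / distinct_values


def and_estimate(n, *selectivities):
    """Size of an AND of conditions: multiply the selectivity factors."""
    size = n
    for s in selectivities:
        size *= s
    return size


def or_estimates(n, m1, m2):
    """The three choices for an OR of two conditions with sizes m1 and m2."""
    total = m1 + m2
    return {
        "sum": total,
        "capped": min(total, n),  # never more than the whole relation
        "independent": n * (1 - (1 - m1 / n) * (1 - m2 / n)),
    }


n = 10000
m1 = and_estimate(n, equality_selectivity(50))  # A = 10
m2 = and_estimate(n, range_selectivity())       # B < 20
estimates = or_estimates(n, m1, m2)
print(m1, m2, estimates)
```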
CS 5614: Misc. SQL Stuff, Safety in Queries 114
Estimating the Size of a Join
- Assume: R(X,Y) Join S(Y,Z)
- Range of Values
  - Minimum: 0
  - In-between: #(R) (if Y is a foreign key for R and a key for S)
  - Maximum: #(R)#(S) (if Y’s in R and S are all the same)
- Assumptions for Join Size Estimation
  - Containment of Value Sets
  - Preservation of Value Sets
- Containment of Value Sets
  - If V(R,Y) <= V(S,Y) then the Y’s in R are a subset of the Y’s in S
  - Satisfied when Y is a foreign key in R and a key in S
- Preservation of Value Sets
  - V(R Join S,X) = V(R,X)
  - V(R Join S,Z) = V(S,Z)
  - why is this reasonable?
CS 5614: Misc. SQL Stuff, Safety in Queries 115
The Actual Estimate
- Assume that V(R,Y) <= V(S,Y)
  - Every tuple in R has a chance of 1/V(S,Y) of joining with a given tuple of S
  - So every tuple in R is expected to join with #(S)/V(S,Y) tuples of S
  - All tuples in R together are expected to produce #(R)#(S)/V(S,Y) joined tuples
- What if V(S,Y) <= V(R,Y)
  - Answer: #(R)#(S)/V(R,Y)
  - In general: #(R)#(S)/(max (V(S,Y),V(R,Y)))
- What if there are multiple join attributes
  - Have a “max” factor in the denominator for each such attribute!
- How to Estimate #(R Join S Join T)?
  - Does it matter which we do first?
- Surprise!
  - Estimation formula preserves associativity of Joins!
  - In other words, “it takes care of itself!”
- Thus, for a Join attribute appearing > 2 times
  - 3 times: Use two highest values
  - 4 times: Use three highest values etc.
CS 5614: Misc. SQL Stuff, Safety in Queries 116
More on Join Associativity and Commutativity
- Which is better: (R Join S) or (S Join R)
  - Good to put the smaller relation on the left
  - Why? Most Join algorithms are asymmetric
- Example:
  - Construct a “good query tree” for the following
    SELECT movietitle
    FROM Actors, ActedIn
    WHERE Actors.name = ActedIn.actorname
    AND Actors.age = 23
- Number of Possible Trees for n Relations
  - Arises from the shape of the trees: T(n)
  - Arises from permuting the leaves: n!
  - Total choices: n! T(n)
CS 5614: Misc. SQL Stuff, Safety in Queries 117
What is T(n)?
- Sample Values
  - 1: 1
  - 2: 1
  - 3: 2
  - 4: 5
  - 5: 14
- A formula
  - T(1) = 1 (Basis)
  - T(n) = T(1)T(n-1) + T(2)T(n-2) + ..... + T(n-1)T(1)
- Classifications
  - Left-Deep Trees: All right children are leaves
  - Right-Deep Trees: All left children are leaves
  - Bushy Trees: Neither Left-Deep nor Right-Deep
- Choosing a Join Order: Restricted to Left-Deep Trees
  - By Dynamic Programming: O(n!)
  - Greedy Approach: Make local selections
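The recurrence for T(n) can be checked against the sample values with a short memoized function. The function names are chosen here for illustration.

```python
from functools import lru_cache
from math import factorial


@lru_cache(maxsize=None)
def tree_shapes(n):
    """T(n): number of binary tree shapes with n leaves.
    T(1) = 1; T(n) = T(1)T(n-1) + T(2)T(n-2) + ... + T(n-1)T(1)."""
    if n == 1:
        return 1
    return sum(tree_shapes(i) * tree_shapes(n - i) for i in range(1, n))


def join_trees(n):
    """Total choices: n! ways to place the leaves times T(n) shapes."""
    return factorial(n) * tree_shapes(n)


shapes = [tree_shapes(n) for n in range(1, 6)]
print(shapes, join_trees(4))
```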
CS 5614: Misc. SQL Stuff, Safety in Queries 118
Example
- Consider
  - R(a,b): #(R) = 1000, V(R,a) = 100, V(R,b) = 200
  - S(b,c): #(S) = 1000, V(S,b) = 100, V(S,c) = 500
  - T(c,d): #(T) = 1000, V(T,c) = 20, V(T,d) = 50
- Possible Join Orders
  - (R Join S) Join T
  - (S Join R) Join T (same as above; why?)
  - (R Join T) Join S
  - (T Join R) Join S (same as above)
  - (S Join T) Join R
  - (T Join S) Join R (same as above)
- Cost Estimation = Sizes of Intermediate Relations
  - (R Join S) Join T: 5000
  - (R Join T) Join S: 1000000
  - (S Join T) Join R: 2000
- Best Plan = (S Join T) Join R
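The three intermediate sizes follow from the estimate #(R)#(S)/max(V(R,Y),V(S,Y)), applied once per shared attribute, with a plain cartesian product when nothing is shared. A sketch that recomputes them from the statistics in the example; the dictionary layout and function name are assumptions made for illustration.

```python
from itertools import combinations

# Statistics from the example: relation size and V(relation, attribute).
stats = {
    "R": {"size": 1000, "V": {"a": 100, "b": 200}},
    "S": {"size": 1000, "V": {"b": 100, "c": 500}},
    "T": {"size": 1000, "V": {"c": 20, "d": 50}},
}


def join_size(left, right):
    """Estimated size of the natural join of two relations.
    Divides by max(V(left,Y), V(right,Y)) for each shared attribute Y;
    with no shared attribute the join is a cartesian product."""
    size = left["size"] * right["size"]
    for attr in left["V"].keys() & right["V"].keys():
        size /= max(left["V"][attr], right["V"][attr])
    return size


# Size of the intermediate relation for each choice of first join.
intermediate = {
    pair: join_size(stats[pair[0]], stats[pair[1]])
    for pair in combinations(stats, 2)
}
best_first_join = min(intermediate, key=intermediate.get)
print(intermediate, best_first_join)
```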
CS 5614: Misc. SQL Stuff, Safety in Queries 119
Database Tuning: Why?
- Two Families of Queries
  - OLTP (Access small number of records)
  - OLAP (Summarize from a large number of records)
- Sources of Poor Performance
  - Imprecise Data Searches
  - Random vs. Sequential Disk Accesses
  - Short Bursts of Database Interaction
  - Delays due to Multiple Transactions
- What can be done?
  - Tune Hardware Architecture
  - Tune OS
  - Tune Data Structures and Indices
CS 5614: Misc. SQL Stuff, Safety in Queries 120
Examples of Database Tuning
- To normalize or not to
  - Sacrificing Redundancy Elimination
  - Sacrificing Dependency Elimination
- Several Choices of Normalized Schemas
  - Vertical Partitioning Applications
- Recomputing Indices
  - Histograms etc. might be outdated
- Restricting Uses of Subqueries
  - Unnesting query blocks by Joins
- Declining the Use of Indices
  - Table Scans for Small Tables
  - Rule-based optimization: Rewrite A=6 as “A+0=6”
� Provide Redundant Tables� Decision-Support/ Data Mining Queries