chapter 5 algebraic and logical query languages pp.54 is added pp 61 updated
Post on 21-Dec-2015
Chapter 5
Algebraic and Logical Query Languages
pp. 54 is added; pp. 61 updated
5.1 Relational Operations on Bags
• What is a bag?
Bags
• What is a bag?
• A bag is a relation that may (or may not) have duplicate tuples.
• Example:
A B
1 2
3 4
1 2
1 2
Bags continue
• Since the tuple (1, 2) appears three times, this relation is a bag and not a set.
5.1.1 Why Bags?
Speed
• Example: Suppose I want the projection onto A and B of the following relation.
A B C
1 2 5
3 4 6
1 2 7
1 2 8
I simply cut attribute C and get the result
A B
1 2
3 4
1 2
1 2
I create a table and copy A and B into it. Simple and fast!
Now suppose I wanted a set with no duplicates.
I would have to take the first tuple and put it in the result.
A B
1 2
I will then have to read the second tuple and compare it against the first.
A B
1 2
3 4
• Since they are different I will include this tuple in the result and get
A B
1 2
3 4
Now I will read the third tuple and compare it to the first.
A B
1 2
3 4
1 2
• Since they are the same I will not include this tuple.
• The point is that I had to do a lot of work. Each new tuple has to be compared with the tuples already in the result before I can add it to the set. Hence eliminating duplicates is time consuming.
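The cost difference described above can be sketched in Python. This is only an illustration: the names project_bag and project_set are made up for this example, and the naive compare-with-everything loop mirrors the slides rather than what a real DBMS does (real systems typically sort or hash to remove duplicates).

```python
# Relation from the example above, as a list of (A, B, C) tuples.
R = [(1, 2, 5), (3, 4, 6), (1, 2, 7), (1, 2, 8)]

def project_bag(rows):
    """Bag projection onto (A, B): cut C and keep every row."""
    return [(a, b) for (a, b, _c) in rows]

def project_set(rows):
    """Set projection onto (A, B): compare each new tuple with the result so far."""
    result = []
    comparisons = 0
    for (a, b, _c) in rows:
        duplicate = False
        for kept in result:
            comparisons += 1
            if kept == (a, b):
                duplicate = True
                break
        if not duplicate:
            result.append((a, b))
    return result, comparisons

bag_result = project_bag(R)
set_result, comparisons = project_set(R)
print(bag_result)   # four rows, duplicates kept
print(set_result)   # two rows, duplicates removed
print(comparisons)  # extra tuple comparisons the set version needed
```

The bag version touches each row once, while the set version does extra comparisons that grow with the size of the result.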
Another reason to use bags
• Suppose I would like to calculate the average of attribute A. Suppose further that A = revenue in millions of dollars.
A B C
1 2 5
3 4 6
1 2 7
1 2 8
Then the average over the set of A values {1, 3} is 2, while the actual average over the bag is (1+3+1+1)/4 = 1.5. This is a substantial difference.
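A quick check of the two averages, as a minimal Python sketch. The variable names are chosen for this illustration only.

```python
# Column A of the relation above, kept as a bag (list) and as a set.
a_bag = [1, 3, 1, 1]
a_set = set(a_bag)

avg_bag = sum(a_bag) / len(a_bag)  # every occurrence counts
avg_set = sum(a_set) / len(a_set)  # duplicates were thrown away first

print(avg_bag, avg_set)
```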
• 5.1.2 Union, Intersection, and Difference of Bags
• R ∪ S is the same as the regular union, except that duplicates are kept: if the tuple t appears n times in R and m times in S, then t appears n + m times in R ∪ S.
• Suppose we have two grocery bags. One has two boxes of Oreos, the second has five boxes of Oreos. I consolidate the two bags into one bag with 2+5=7 Oreo boxes.
Union of bags
Intersection of Bags R ∩ S
• If the tuple t appears n times in R and m times in S, then t appears min(m, n) times in the intersection.
• That is because the intersection is what R and S have in common, and the two relations have exactly min(m, n) copies of t in common.
The difference of R and S
• Each occurrence of t in S cancels one occurrence of t in R. The output is the "left over" copies of t: if t appears n times in R and m times in S, then t appears max(0, n − m) times in R − S.
Examples of union, intersection and difference on bags
• Let R be the relation (bag)
A B
1 2
3 4
1 2
1 2
• Let S be the relation (bag) below.
A B
1 2
3 4
3 4
5 6
Then R ∪ S as a bag is simply the two tables written together.
A B
1 2
3 4
1 2
1 2
1 2
3 4
3 4
5 6
Intersection of bags R ∩ S
A B
1 2
3 4
The difference of bags R and S, R − S
A B
1 2
1 2
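The three results above can be reproduced with Python's collections.Counter, which behaves like a bag: + adds the counts, & takes the minimum, and - subtracts and drops anything that would go to zero or below. Representing each relation as a Counter of tuples is an assumption made for this sketch.

```python
from collections import Counter

# Each key is a tuple (A, B); each value is how many times it occurs.
R = Counter({(1, 2): 3, (3, 4): 1})
S = Counter({(1, 2): 1, (3, 4): 2, (5, 6): 1})

union = R + S          # n + m copies of each tuple
intersection = R & S   # min(n, m) copies of each tuple
difference = R - S     # max(0, n - m) copies of each tuple

print(sorted(union.elements()))
print(sorted(intersection.elements()))
print(sorted(difference.elements()))
```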
5.1.3 Projection of Bags
It has been explained previously (simply cut the unwanted attributes).
• 5.1.4 Selection on Bags
Selection on bags
• Let R be the bag
A B C
1 2 5
3 4 6
1 2 7
1 2 8
σ C>=6 (R) gives:
A B C
3 4 6
1 2 7
1 2 8
Since it is a bag, we allow duplicates in the result.
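Selection on a bag is just a filter applied to each tuple independently. A minimal Python sketch, assuming the relation is stored as a list of (A, B, C) tuples:

```python
# Bag R from the selection example, as (A, B, C) tuples.
R = [(1, 2, 5), (3, 4, 6), (1, 2, 7), (1, 2, 8)]

# Keep every tuple whose C value is at least 6; no duplicate check is made.
selected = [row for row in R if row[2] >= 6]

print(selected)
```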
• 5.1.5 Product of Bags
Product on bags R × S
Bag R
A B
1 2
1 2
Bag S
B C
2 3
4 5
4 5
Product on bags
• As we learned earlier each row from R has to be paired with ALL rows in S.
A R.B S.B C
1 2 2 3
1 2 2 3
1 2 4 5
1 2 4 5
1 2 4 5
1 2 4 5
Product on bags continue
• Notice that in the above bag we again used the naming convention for attributes. B appears in both relations, so we call the two columns R.B and S.B.
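The pairing of every row of R with every row of S can be sketched with itertools.product. Storing the bags as Python lists of tuples is an assumption of this example.

```python
from itertools import product

R = [(1, 2), (1, 2)]            # (A, B)
S = [(2, 3), (4, 5), (4, 5)]    # (B, C)

# Each result tuple is (A, R.B, S.B, C); duplicates are kept.
RxS = [r + s for r, s in product(R, S)]

for row in RxS:
    print(row)
```

With 2 rows in R and 3 rows in S the product has 2 × 3 = 6 rows.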
• Equivalent Relation•
• 5.1.6 Joins of Bags
Joins of bags ⋈
• We compare each tuple of one relation with each tuple of the other, decide whether or not this pair of tuples joins successfully, and if so we put the resulting tuple in the answer. When constructing the answer we permit duplication.
Example of joins in bags ⋈
Relation R
A B
1 2
1 2
Relation S
B C
2 3
4 5
4 5
Result of R ⋈ S
A B C
1 2 3
1 2 3
Please notice that, unlike in the product, we do not write B once for each relation. We are "naturally" joining the relations on their common attribute B, so B appears only once in the result. Think of it as the transitive rule.
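The natural join above can be sketched as a nested loop that keeps a pair only when the B values agree. This is an illustration with the relations stored as lists of tuples, not how a DBMS would necessarily evaluate the join.

```python
R = [(1, 2), (1, 2)]            # (A, B)
S = [(2, 3), (4, 5), (4, 5)]    # (B, C)

# Compare each tuple of R with each tuple of S; B is written only once.
joined = [(a, b, c) for (a, b) in R for (b2, c) in S if b == b2]

print(joined)
```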
Theta-join in bags
1. Find the product of the two relations.
2. Select only those tuples that satisfy the condition.
3. Allow duplicates.
The symbol is ⋈ with the condition C written beneath it.
Example theta join
Relation R
A B
1 2
1 2
Relation S
B C
2 3
4 5
4 5
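The theta-join of R and S with the condition R.B < S.B
Of the six tuples in the product R × S, the two with S.B = 2 are rejected (2 < 2 is false) and the four with S.B = 4 are selected (2 < 4):
A R.B S.B C
1 2 4 5
1 2 4 5
1 2 4 5
1 2 4 5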
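The theta-join steps (product, then selection, duplicates allowed) can be sketched in Python as follows. The list-of-tuples representation is an assumption of this example.

```python
R = [(1, 2), (1, 2)]            # (A, B)
S = [(2, 3), (4, 5), (4, 5)]    # (B, C)

# Step 1: product. Step 2: keep only pairs with R.B < S.B. Step 3: keep duplicates.
theta = [(a, rb, sb, c) for (a, rb) in R for (sb, c) in S if rb < sb]

for row in theta:
    print(row)
```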
1. Exercise from previous section
• (Team 3/7) P52: upload Fig 2.20-21 into your Oracle (submit the source code, i.e. the create and insert statements, to the grader)
5.1.7 Exercises for Section 5.1
Ex 5.1.1 (3/7)
Ex 5.1.4 Assigned in the class
List them in the algebraic law table
5.2 Extended Operators of Relational Algebra
• 5.2.1 Duplicate Elimination
The duplicate-elimination operator δ
• Turns a bag into a set.
• Eliminates all but one copy of each tuple.
Relation R
A B
1 2
3 4
1 2
1 2
Apply the duplicate-elimination operator to R: δ(R)
A B
1 2
3 4
(δ is the lowercase Greek letter delta)
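A minimal Python sketch of δ, assuming the bag is stored as a list of tuples. dict.fromkeys keeps the first occurrence of each tuple and preserves the original order.

```python
R = [(1, 2), (3, 4), (1, 2), (1, 2)]

# Keep exactly one copy of each tuple, in order of first appearance.
delta_R = list(dict.fromkeys(R))

print(delta_R)
```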
• 5.2.2 Aggregation Operators
Aggregation operators
• Aggregation operators apply to attributes (columns) of relations. Examples of aggregation operators are sums and averages.
Example of aggregation operators
Relation R
A B
1 2
3 4
1 2
1 2
1. SUM(B) = 2+4+2+2 = 10
2. AVG(A) = (1+3+1+1)/4 = 1.5
3. MIN(A) = 1
4. MAX(B) = 4
5. COUNT(A) = 4 (the number of elements in A)
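The five aggregations can be checked with Python built-ins. The column lists and variable names below are assumptions of this sketch.

```python
R = [(1, 2), (3, 4), (1, 2), (1, 2)]   # (A, B)

col_a = [a for (a, _b) in R]
col_b = [b for (_a, b) in R]

sum_b = sum(col_b)
avg_a = sum(col_a) / len(col_a)
min_a = min(col_a)
max_b = max(col_b)
count_a = len(col_a)   # counts every occurrence, duplicates included

print(sum_b, avg_a, min_a, max_b, count_a)
```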
• 5.2.3 Grouping
Grouping
• Grouping of tuples according to their value in one or more attributes has the effect of partitioning the tuples of a relation into groups.
Example of grouping
Studio name Length
Disney 123
MGM 345
Century fox 678
Century fox 900
MGM 23
Suppose we use the aggregation, sum(length). This aggregation will give us the sum of the whole column.
Example of grouping continue
• But suppose we want to know the total number of minutes of movies produced by each studio.
• Then we must have sub-tables within the table. Each sub-table represents a studio.
• We will do that by grouping by studio name.
• Now we can apply the aggregation operator sum(length) to each group.
Example of grouping continue
Studio name Length
MGM 23
MGM 345
Century fox 678
Century fox 900
Disney 123
Now the table is grouped by studio name and we can apply the aggregation operator sum(length)
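Grouping followed by sum(length) can be sketched with a dictionary keyed by studio name. This is only an illustration using the rows from the table above.

```python
from collections import defaultdict

movies = [
    ("Disney", 123),
    ("MGM", 345),
    ("Century fox", 678),
    ("Century fox", 900),
    ("MGM", 23),
]

# Without grouping: one sum over the whole column.
total_length = sum(length for (_studio, length) in movies)

# With grouping: one running sum per studio.
length_by_studio = defaultdict(int)
for studio, length in movies:
    length_by_studio[studio] += length

print(total_length)
print(dict(length_by_studio))
```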
• 5.2.4 The Grouping Operator
The grouping operator γ
• Given the schema StarsIn(title, year, starName)
• We would like to find the starName of each star who appeared in at least three movies, and the earliest year in which they appeared.
• How can we approach this problem?
Grouping operator continue
• We must first group by starName. It is very intuitive: we want to partition the table by star, and then we can do all the tests for each star.
• In relational algebra we write γ starName
• Below is the table grouped by starName
Group by
MOVIETITLE MOVIEYEAR STARNAME
Blood Diamond 2006 Leonardo Dicaprio
The Quick and the Dead 1995 Leonardo Dicaprio
Titanic 1997 Leonardo Dicaprio
The Departed 2006 Leonardo Dicaprio
Body of lies 2008 Leonardo Dicaprio
Inception 2010 Leonardo Dicaprio
Somersault 2004 Samuel Henry
Macbeth 2006 Samuel Henry
Love my Way 2006 Samuel Henry
The Great Raid 2005 Samuel Henry
Terminator Salvation 2009 Samuel Henry
Avatar 2009 Samuel Henry
Perseus 2010 Samuel Henry
Autumn in 2000 Vera A Farmiga
Dust 2001 Vera A Farmiga
Mind the Gap 2004 Vera A Farmiga
• Notice that in the above table there are three groups, one for each star.
• Now for each group we are interested in the first year in which the star appeared, and we would like to know whether the star played in 3 or more movies.
• We will use the aggregations min(year) and count(title), and then keep only the groups with count(title) >= 3.
The grouping operator continue
How do these aggregations work?
• In each group separately we look for the minimum year.
• In each group we count the number of titles in this group.
• If the number of titles in a group is at least 3, then this group is sent to the output; otherwise this group is eliminated.
Final statement in the grouping operator
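• γ starName, min(year)->minYear, count(title)->ctTitle (StarsIn)
Reading the subscript from left to right:
1. starName: group by starName.
2. min(year)->minYear: then find the minimum year of each group.
3. count(title)->ctTitle: count the number of titles in each group.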
Final statement in the grouping operator
• π starName, minYear (σ ctTitle>=3 (γ starName, min(year)->minYear, count(title)->ctTitle (StarsIn)))
See Fig 5.5 for the expression tree
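The grouping, selection, and projection steps can be sketched in Python using the rows of the StarsIn table shown earlier. The dictionary-based grouping and the variable names are assumptions of this illustration, not the textbook's algorithm.

```python
# (title, year, starName) rows copied from the grouped table above.
stars_in = [
    ("Blood Diamond", 2006, "Leonardo Dicaprio"),
    ("The Quick and the Dead", 1995, "Leonardo Dicaprio"),
    ("Titanic", 1997, "Leonardo Dicaprio"),
    ("The Departed", 2006, "Leonardo Dicaprio"),
    ("Body of lies", 2008, "Leonardo Dicaprio"),
    ("Inception", 2010, "Leonardo Dicaprio"),
    ("Somersault", 2004, "Samuel Henry"),
    ("Macbeth", 2006, "Samuel Henry"),
    ("Love my Way", 2006, "Samuel Henry"),
    ("The Great Raid", 2005, "Samuel Henry"),
    ("Terminator Salvation", 2009, "Samuel Henry"),
    ("Avatar", 2009, "Samuel Henry"),
    ("Perseus", 2010, "Samuel Henry"),
    ("Autumn in", 2000, "Vera A Farmiga"),
    ("Dust", 2001, "Vera A Farmiga"),
    ("Mind the Gap", 2004, "Vera A Farmiga"),
]

# Grouping step: one (minYear, ctTitle) pair per starName.
groups = {}
for _title, year, star in stars_in:
    min_year, ct_title = groups.get(star, (year, 0))
    groups[star] = (min(min_year, year), ct_title + 1)

# Selection step: keep groups with at least 3 titles.
# Projection step: output only starName and minYear.
result = [
    (star, min_year)
    for star, (min_year, ct_title) in groups.items()
    if ct_title >= 3
]

print(result)
```

With this data all three stars qualify; a star with exactly three titles is kept only because the condition is "at least three".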
• 5.2.5 Extending the Projection Operator
Extending the projection operator
• We can include renaming and arithmetic operators in a projection.
Example: π A, B+C->X
• Project attribute A.
• Add the values in B and C.
• Rename the sum to X.
Extending the projection operator continue
Relation R
A B C
0 1 2
0 1 2
3 4 5
Relation S
A X
0 3
0 3
3 9
B+C=X
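The extended projection π A, B+C->X can be sketched as a list comprehension that computes the new column for each row. Storing R as a list of (A, B, C) tuples is an assumption of this example.

```python
R = [(0, 1, 2), (0, 1, 2), (3, 4, 5)]   # (A, B, C)

# Keep A, and output B + C under the new name X; duplicates are kept.
S = [(a, b + c) for (a, b, c) in R]     # (A, X)

print(S)
```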
• 5.2.6 The Sorting Operator τ
The sorting operator τ turns a relation into a list of tuples, sorted according to one or more attributes.
External Merge Sort
• How do you sort 4000 students using only one classroom (which can hold only 40 students)?
1. Fill the classroom with 40 students and let them line up alphabetically. Repeat for every batch of 40.
2. So we have 100 sorted groups.
3. Line up two groups in front of the classroom.
4. Of the two "head" students, the one who comes first alphabetically walks into the classroom and sits at the first free seat.
External Merge Sort
1. Once the 40 seats are full, let them go out.
2. Students continue to walk in until all seats are occupied.
3. Move them out; now we have a group of sorted students.
4. Continue . . .
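The classroom analogy can be sketched in Python as sorted runs followed by repeated two-way merges. This is a simplified in-memory illustration: ROOM_SIZE, the function names, and the use of integers in place of student names are assumptions of this sketch, and a real external sort would read and write the runs on disk instead of keeping them in lists.

```python
import heapq
import random

ROOM_SIZE = 40  # how many "students" fit in the classroom at once

def make_sorted_runs(items, room_size=ROOM_SIZE):
    """Phase 1: sort one classroom-full at a time."""
    return [sorted(items[i:i + room_size])
            for i in range(0, len(items), room_size)]

def merge_pass(runs):
    """Phase 2, one pass: merge the runs two at a time."""
    merged = []
    for i in range(0, len(runs), 2):
        pair = runs[i:i + 2]  # the last pair may hold a single run
        # heapq.merge repeatedly takes the smaller of the two "head" items.
        merged.append(list(heapq.merge(*pair)))
    return merged

def external_merge_sort(items, room_size=ROOM_SIZE):
    runs = make_sorted_runs(items, room_size)
    while len(runs) > 1:
        runs = merge_pass(runs)
    return runs[0] if runs else []

students = list(range(4000))
random.Random(0).shuffle(students)

runs = make_sorted_runs(students)
result = external_merge_sort(students)
print(len(runs), result[:5])
```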
• 5.2.7 Outerjoins
Outer join
• YouTube link
• http://www.youtube.com/watch?v=L5sKDSgPt7M
Outerjoins
A B C
B C D
Simple outer join
• First find all tuples that agree on the common attributes and pair them. Notice that, unlike in the product, the attributes that match are not repeated.
• Next we deal with tuples that do not pair with anything. We call these dangling tuples.
• Add the dangling tuples, and fill whatever is missing with NULL.
example:
A B C
1 2 3
4 5 6
7 8 9
B C D
2 3 10
2 3 11
6 7 12
If I were doing a natural join I would be done here: (1, 2, 3) pairs with (2, 3, 10) and with (2, 3, 11), and these are the only matching tuples.
A B C
1 2 3
4 5 6
7 8 9
B C D
2 3 10
2 3 11
6 7 12
But what about the dangling tuples? In the outer join we have to account for them too.
Outer joins
A B C D
1 2 3 10
1 2 3 11
4 5 6 NULL
7 8 9 NULL
NULL 6 7 12
Left outer join
• The easier way of thinking of it is that we must keep all the tuples from the left relation.
A B C
1 2 3
4 5 6
7 8 9
B C D
2 3 10
2 3 11
6 7 12
Example of left outer join
• Step one: do the natural join. That is, write all the tuples that pair correctly.
A B C D
1 2 3 10
1 2 3 11
Left outer join
• Next look at the left relation and see that its second and third tuples were not used. We must use them, padding D with NULL.
A B C D
1 2 3 10
1 2 3 11
4 5 6 NULL
7 8 9 NULL
I have used the left relation fully.
A B C
1 2 3
4 5 6
7 8 9
B C D
2 3 10
2 3 11
6 7 12
Example of right outer join
• Step one: do the natural join. That is, write all the tuples that pair correctly. This is exactly the same step as in the left outer join. I actually copied and pasted it.
A B C D
1 2 3 10
1 2 3 11
Example right outer join continue
• Now look for the tuples that were not used in the right relation. That is the third tuple, (6, 7, 12). Add this tuple, padded with NULL for A, to complete the right outer join.
A B C D
1 2 3 10
1 2 3 11
NULL 6 7 12
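The full, left, and right outer joins of this example can be sketched with one Python function. The function name outer_join and its keep_left / keep_right flags are made up for this illustration, and None stands in for NULL.

```python
left = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]         # (A, B, C)
right = [(2, 3, 10), (2, 3, 11), (6, 7, 12)]     # (B, C, D)

def outer_join(left_rows, right_rows, keep_left=True, keep_right=True):
    """Natural join on (B, C), then pad dangling tuples with None."""
    result = []
    matched_right = set()
    for (a, b, c) in left_rows:
        found = False
        for j, (b2, c2, d) in enumerate(right_rows):
            if (b, c) == (b2, c2):
                result.append((a, b, c, d))
                matched_right.add(j)
                found = True
        if keep_left and not found:
            result.append((a, b, c, None))       # dangling left tuple
    if keep_right:
        for j, (b2, c2, d) in enumerate(right_rows):
            if j not in matched_right:
                result.append((None, b2, c2, d))  # dangling right tuple
    return result

full_outer = outer_join(left, right)
left_outer = outer_join(left, right, keep_right=False)
right_outer = outer_join(left, right, keep_left=False)

for name, rows in [("full", full_outer), ("left", left_outer), ("right", right_outer)]:
    print(name, rows)
```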
• 5.2.8 Exercises for Section 5.2
1. Show the commutative law for Cartesian product by example in pp. 25
2. Exercise 5.2.1 (a), (b)
5.3 A Logic for Relations
• 5.3.1 Predicates and Atoms
Relational Algebra
• Thanks
• 5.3.2 Arithmetic Atoms
• 5.3.3 Datalog Rules and Queries
• 5.3.4 Meaning of Datalog Rules
• 5.3.5 Extensional and Intensional Predicates
• 5.3.6 Datalog Rules Applied to Bags
• 5.3.7 Exercises for Section 5.3
5.4 Relational Algebra and Datalog
• 5.4.1 Boolean Operations
• 5.4.2 Projection
• 5.4.3 Selection
• 5.4.4 Product
• 5.4.5 Joins
• 5.4.6 Simulating Multiple Operations with Datalog
• 5.4.7 Comparison Between Datalog and Relational Algebra
• 5.4.8 Exercises for Section 5.4
5.5 Summary of Chapter 5
5.6 References for Chapter 5