oracle optimizer. combining output from multiple index scans and-equal: –select * from sailors...
Post on 22-Dec-2015
![Page 1: Oracle Optimizer. Combining Output From Multiple Index Scans AND-EQUAL: –select * from sailors where sname = 'Jim' and rating = 10 Suppose we have 2 indexes:](https://reader036.vdocuments.mx/reader036/viewer/2022062715/56649d805503460f94a63f8a/html5/thumbnails/1.jpg)
Oracle Optimizer
Combining Output From Multiple Index Scans
• AND-EQUAL:
  – select * from sailors
    where sname = 'Jim' and rating = 10
• Suppose we have 2 indexes: sname, rating
    TABLE ACCESS BY ROWID
      AND-EQUAL
        INDEX RANGE SCAN Sailors(sname)
        INDEX RANGE SCAN Sailors(rating)
• Suppose we also have an index on (sname, rating)
  – How should the query be performed?
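The AND-EQUAL step can be pictured as intersecting the rowid lists that the two index scans return. The following is only a minimal sketch under stated assumptions: the dictionaries `sname_index` and `rating_index` and the integer rowids are hypothetical stand-ins, and each scan is assumed to return its rowids in sorted order. It is not a description of Oracle's internal implementation.

```python
# Hypothetical single-column indexes: key value -> sorted list of rowids.
sname_index = {"Jim": [3, 8, 15, 21], "Joe": [4, 9]}
rating_index = {10: [2, 8, 11, 21, 30], 7: [3, 4]}


def and_equal(rowids_a, rowids_b):
    """Merge two sorted rowid lists, keeping only rowids present in both."""
    i = j = 0
    result = []
    while i < len(rowids_a) and j < len(rowids_b):
        if rowids_a[i] == rowids_b[j]:
            result.append(rowids_a[i])
            i += 1
            j += 1
        elif rowids_a[i] < rowids_b[j]:
            i += 1
        else:
            j += 1
    return result


# sname = 'Jim' and rating = 10
matching_rowids = and_equal(sname_index["Jim"], rating_index[10])
print(matching_rowids)  # [8, 21]: the rows to fetch by rowid
```

Only the rowids that survive the merge need a table access, which is why the plan places TABLE ACCESS BY ROWID above AND-EQUAL.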
Operations that Manipulate Data Sets
• Up until now, all operations returned the rows as they were found
• There are operations that must find all rows before returning a single row
• Try to avoid these operations for online users!
  – SORT ORDER BY: query with order by
      select sname, age
      from Sailors
      order by age;
Operations that Manipulate Data Sets
  – SORT UNIQUE: sorting records while eliminating duplicates,
    e.g., query with distinct; query with minus, intersect or union
      select DISTINCT age from Sailors;
  – SORT AGGREGATE, SORT GROUP BY: queries with aggregate or grouping
    functions (like MIN, MAX)
Is the table always accessed?
What if there is no index?
Operations that Manipulate Data Sets
• Consider the query:
  – select sname from sailors
    union
    select bname from boats;
Operations that Manipulate Data Sets
• Consider the query:
  – select sname from sailors
    minus
    select bname from boats;
How do you think that Oracle implements intersect? union all?
Operations that Manipulate Data Sets
• Select age, COUNT(*)
  from Sailors
  GROUP BY age
    SORT GROUP BY
      TABLE ACCESS FULL
Distinct
• What should Oracle do when
processing the query (assuming that
sid is the primary key):
– select distinct sid
from Sailors
Join Methods
• Select * from Sailors, Reserves
where Sailors.sid = Reserves.sid
• Oracle can use an index on Sailors.sid
or on Reserves.sid (note that both will
not be used)
• Join Methods: MERGE JOIN, NESTED
LOOPS, HASH JOIN
Nested Loops Joins
• Block nested loop join
    NESTED LOOPS
      TABLE ACCESS FULL OF our_outer_table
      TABLE ACCESS FULL OF our_inner_table
• Index nested loop join
    NESTED LOOPS
      TABLE ACCESS FULL OF our_outer_table
      TABLE ACCESS BY ROWID OF our_inner_table
        INDEX RANGE SCAN OF inner_table_index
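To make the difference between the two plans concrete, here is a small sketch that joins two in-memory lists on their first field. Everything in it is an assumption made for illustration: the sample rows, the function names, and the dictionary that stands in for the inner table's index. The first function is a simplified tuple-at-a-time loop rather than a true block-at-a-time join.

```python
from collections import defaultdict

sailors = [(1, "Jim"), (2, "Joe"), (3, "Ann")]   # (sid, sname), outer table
reserves = [(1, 101), (3, 102), (1, 103)]        # (sid, bid), inner table


def nested_loops_full(outer, inner):
    """Scan the whole inner table once for every outer row."""
    return [(o, i) for o in outer for i in inner if o[0] == i[0]]


def nested_loops_index(outer, inner):
    """Probe an index on the inner join column instead of scanning."""
    index = defaultdict(list)          # stands in for inner_table_index
    for row in inner:
        index[row[0]].append(row)
    return [(o, i) for o in outer for i in index.get(o[0], [])]


full_result = nested_loops_full(sailors, reserves)
index_result = nested_loops_index(sailors, reserves)
print(full_result == index_result)  # True: same rows, far fewer comparisons
```

In both variants the first matching row can be returned as soon as it is found, which fits the earlier point about operations that do not need to see all rows first.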
When Are Nested Loops Joins Used?
• If tables are of unequal size
• If results should be returned online
Hash Join
// Partition R into k partitions using hash function h
foreach tuple r in R do                 // flush buffer page when it fills
    read r and add it to buffer page h(r_i)
// Partition S into k partitions using the same h
foreach tuple s in S do                 // flush buffer page when it fills
    read s and add it to buffer page h(s_j)
for l = 1..k
    // Build in-memory hash table for R_l using h2
    foreach tuple r in R_l do
        read r and insert into hash table with h2
    // Probe the hash table with the tuples of S_l
    foreach tuple s in S_l do
        read s and probe table using h2
        output matching pairs <r,s>
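The pseudocode above can be sketched in runnable form as follows. This is an illustration under assumptions, not Oracle's implementation: partitions are kept as in-memory lists instead of being flushed to disk pages, Python's built-in `hash` plays the role of h, and a dictionary plays the role of the second hash function h2. The names `partition`, `hash_join`, `r_key` and `s_key` are hypothetical.

```python
from collections import defaultdict


def partition(rows, key, k):
    """Phase 1: split rows into k partitions using the hash function h."""
    parts = [[] for _ in range(k)]
    for row in rows:
        parts[hash(key(row)) % k].append(row)
    return parts


def hash_join(R, S, r_key, s_key, k=4):
    """Partitioned hash join of R and S on r_key(r) == s_key(s)."""
    result = []
    for R_l, S_l in zip(partition(R, r_key, k), partition(S, s_key, k)):
        # Phase 2: build an in-memory table for R_l (the dict stands in for h2).
        table = defaultdict(list)
        for r in R_l:
            table[r_key(r)].append(r)
        # Probe the table with every tuple of the matching S partition.
        for s in S_l:
            for r in table.get(s_key(s), []):
                result.append((r, s))
    return result


sailors = [(1, "Jim"), (2, "Joe"), (3, "Ann")]   # (sid, sname)
reserves = [(1, 101), (3, 102), (1, 103)]        # (sid, bid)
pairs = hash_join(sailors, reserves, lambda r: r[0], lambda s: s[0])
print(sorted(pairs))
```

Because both inputs are partitioned with the same h, tuples that can join always land in partitions with the same number, so each pair of partitions can be joined independently.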
Hash Join Plan
HASH JOIN
  TABLE ACCESS FULL OF table_A
  TABLE ACCESS FULL OF table_B
When Are Hash Joins Used?
• If tables are small
• If results should be returned online
Sort-Merge Join Plan
MERGE JOIN
  SORT JOIN
    TABLE ACCESS FULL OF table_A
  SORT JOIN
    TABLE ACCESS FULL OF table_B
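The plan above sorts both inputs and then merges them. A minimal sketch of that idea, assuming in-memory lists and a hypothetical function name `sort_merge_join` (the two `sorted` calls correspond to the two SORT JOIN steps):

```python
def sort_merge_join(A, B, a_key, b_key):
    """Sort both inputs on the join key, then merge them in one pass."""
    A = sorted(A, key=a_key)   # SORT JOIN over table_A
    B = sorted(B, key=b_key)   # SORT JOIN over table_B
    result = []
    i = j = 0
    while i < len(A) and j < len(B):
        ka, kb = a_key(A[i]), b_key(B[j])
        if ka < kb:
            i += 1
        elif ka > kb:
            j += 1
        else:
            # Find the run of B rows sharing this key ...
            j_end = j
            while j_end < len(B) and b_key(B[j_end]) == ka:
                j_end += 1
            # ... and pair it with every A row that has the same key.
            while i < len(A) and a_key(A[i]) == ka:
                for b in B[j:j_end]:
                    result.append((A[i], b))
                i += 1
            j = j_end
    return result


table_a = [(2, "Joe"), (1, "Jim"), (3, "Ann")]   # (sid, sname)
table_b = [(3, 102), (1, 101), (1, 103)]         # (sid, bid)
joined = sort_merge_join(table_a, table_b, lambda a: a[0], lambda b: b[0])
print(joined)
```

Note that no output can be produced until both sorts have finished, which is the "must find all rows before returning a single row" behaviour described earlier.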
When Are Sort/Merge Joins Used?
• Performs badly when tables are
of unequal size. Why?
Hints
• You can give the optimizer hints about how to perform query evaluation
• Hints are written in /*+ */ right after the select keyword
• Note: These are only hints. The Oracle optimizer can choose to ignore your hints
Examples
Select /*+ FULL (sailors) */ sid
From sailors
Where sname = 'Joe';

Select /*+ INDEX (sailors) */ sid
From sailors
Where sname = 'Joe';

Select /*+ INDEX (S s_ind) */ S.sid
From sailors S, reserves R
Where S.sid = R.sid AND sname = 'Joe';
More Examples
Select /*+ USE_NL (S) */ S.sid
From sailors S, reserves R
Where S.sid = R.sid AND sname = 'Joe';
  (the table named in the USE_NL hint is used as the inner table)

Select /*+ USE_MERGE (S R) */ S.sid
From sailors S, reserves R
Where S.sid = R.sid AND sname = 'Joe';

Select /*+ USE_HASH */ S.sid
From sailors S, reserves R
Where S.sid = R.sid AND sname = 'Joe';
Information Retrieval and DB
CONTAINS
• Introduce text search in SQL
• CONTAINS operator
    select Name
    from article
    where CONTAINS(abstract, 'play') > 0;
• Can combine OR, AND
Stemming
• Given the "stem" of a word, Oracle will expand the list of words to search for to include all words having the same stem
  – Stem of plays, played, playing, playful: play
  – where CONTAINS(abstract, '$play') > 0;
Ranking
• We need to rank the retrieved tuples according to their relevance
  – Open challenge
  – Several implementations for Oracle
The following slides are based on those of Dr. Sara Cohen
The Vector Space Model
• The Vector Space Model (VSM) is a way of representing text data through the words that it contains
• It is a standard technique in Information Retrieval
• In the following, we call each piece of text data a document (as in classical IR)
• The VSM allows decisions to be made about which documents are similar to each other and to keyword queries
How Does it Work?
• Each document is represented as a vector which contains a value for each word in the vocabulary
  – this value is 0 if the word does not appear in the document
• Similarly, a query is represented as a vector
• The rank of the document with respect to the query is the distance between their vectors
Example: Boolean Value
• P1 = "I live in a green house with a green roof"
• P2 = "There is no life form on Mars"
• P3 = "Men love green cars"
• P4 = "I saw some little green men yesterday"

1 if the word appears, 0 otherwise:

         P1  P2  P3  P4
a         1   0   0   0
cars      0   0   1   0
green     1   0   1   1
house     1   0   0   0
I         0   0   0   1
is        1   1   0   0
life      0   1   0   0
little    1   0   0   1
love      0   0   1   0
mars      0   1   0   0
men       0   0   1   1
my        1   0   0   0
no        0   1   0   0
on        1   1   0   0
roof      1   0   0   0
saw       0   0   0   1
there     1   1   0   0
Each column of the table is a document vector; for example, the P1 column is the vector for P1.
Example: Boolean Value
• Q = green OR men OR mars

         Query
a          0
cars       0
green      1
house      0
I          0
is         0
life       0
little     0
love       0
mars       1
men        1
my         0
no         0
on         0
roof       0
saw        0
there      0
Distance Between Vectors
• For two vectors d and d', the cosine distance between d and d' is given by:
  $$\cos(d, d') = \frac{d \cdot d'}{|d|\,|d'|}$$
• d · d' is the scalar product of d and d', calculated by multiplying corresponding values together and summing the products
• |d| is the norm of d
• The "cosine measure" calculates the cosine of the angle between the vectors in a high-dimensional virtual space
Distance Between Documents
[Figure: document vectors d1 to d5 drawn in a space with term axes t1, t2 and t3; θ and φ label angles in the figure]
Example
• Consider the query Q = "green men" and the document P3 = "Men love green cars"

         P3  Query
cars      1    0
green     1    1
love      1    0
men       1    1
(Only dimensions that are non-zero in one of the vectors are shown)

• The cosine distance:
  – scalar product: 1*0 + 1*1 + 1*0 + 1*1 = 2
  – norms: $\sqrt{1^2 + 1^2 + 1^2 + 1^2} = 2$ and $\sqrt{0^2 + 1^2 + 0^2 + 1^2} = \sqrt{2}$
  – Similarity: $2 / (2 \cdot \sqrt{2}) = 1/\sqrt{2}$
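The calculation in this example can be checked with a few lines of code. This is a minimal sketch; the function name `cosine` and the list layout (one entry per dimension, in the order cars, green, love, men) are choices made here for illustration.

```python
import math


def cosine(d, q):
    """Scalar product of d and q divided by the product of their norms."""
    dot = sum(x * y for x, y in zip(d, q))
    norm_d = math.sqrt(sum(x * x for x in d))
    norm_q = math.sqrt(sum(y * y for y in q))
    return dot / (norm_d * norm_q)


# Dimensions: cars, green, love, men
p3 = [1, 1, 1, 1]       # "Men love green cars"
query = [0, 1, 0, 1]    # "green men"
similarity = cosine(p3, query)
print(round(similarity, 4))  # 0.7071, i.e. 1/sqrt(2)
```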
Defining Vector Values: TF
• Instead of a boolean value, put the word frequency (called tf, for "term frequency")
• What effect does this have?
• Sometimes a normalized version is used:
  – term frequency / number of words in the document

         P1  P2  P3  P4
a         1   0   0   0
cars      0   0   1   0
green     2   0   1   1
house     1   0   0   0
I         0   0   0   1
is        1   1   0   0
life      0   1   0   0
little    1   0   0   1
love      0   0   1   0
mars      0   1   0   0
men       0   0   1   1
my        1   0   0   0
no        0   1   0   0
on        1   1   0   0
roof      1   0   0   0
saw       0   0   0   1
there     1   1   0   0
Normalized TF

         P1    P2      P3    P4
a        0.1   0       0     0
cars     0     0       0.25  0
green    0.2   0       0.25  0.2
house    0.1   0       0     0
I        0     0       0     0.2
is       0.1   0.1667  0     0
life     0     0.1667  0     0
little   0.1   0       0     0.2
love     0     0       0.25  0
mars     0     0.1667  0     0
men      0     0       0.25  0.2
my       0.1   0       0     0
no       0     0.1667  0     0
on       0.1   0.1667  0     0
roof     0.1   0       0     0
saw      0     0       0     0.2
there    0.1   0.1667  0     0

Always: the values in each column sum to 1
Another Option: Defining Vector Values as IDF
• We can combine TF with IDF, inverse document frequency
  – 1 / (number of documents containing the word)
• What is the effect?

         P1      P2   P3      P4
a        1       0    0       0
cars     0       0    1       0
green    0.3333  0    0.3333  0.3333
house    1       0    0       0
I        0       0    0       1
is       0.5     0.5  0       0
life     0       1    0       0
little   0.5     0    0       0.5
love     0       0    1       0
mars     0       1    0       0
men      0       0    0.5     0.5
my       1       0    0       0
no       0       1    0       0
on       0.5     0.5  0       0
roof     1       0    0       0
saw      0       0    0       1
there    0.5     0.5  0       0
Normalized IDF
• Sometimes a normalized version is used:
  $$\mathrm{idf}_w = \log \frac{N}{n_w}$$
  where N is the number of documents and n_w is the number of documents in which w appears
• The logarithm gives less influence to IDF when TF and IDF are combined
• What is the value for a word that appears in all documents? Why?

Values for N = 4, using base-10 logarithms:

         P1      P2      P3      P4
a        0.6021  0       0       0
cars     0       0       0.6021  0
green    0.1249  0       0.1249  0.1249
house    0.6021  0       0       0
I        0       0       0       0.6021
is       0.301   0.301   0       0
life     0       0.6021  0       0
little   0.301   0       0       0.301
love     0       0       0.6021  0
mars     0       0.6021  0       0
men      0       0       0.301   0.301
my       0.6021  0       0       0
no       0       0.6021  0       0
on       0.301   0.301   0       0
roof     0.6021  0       0       0
saw      0       0       0       0.6021
there    0.301   0.301   0       0
Standard Measure is TF-IDF
• Use normalized TF times normalized IDF
• Note: Once the values are chosen (using any of the schemes considered), we use cosine distance to compare the document and query

         P1       P2      P3      P4
a        0.06021  0       0       0
cars     0        0       0.1505  0
green    0.02498  0       0.0312  0.025
house    0.06021  0       0       0
I        0        0       0       0.1204
is       0.0301   0.0502  0       0
life     0        0.1003  0       0
little   0.0301   0       0       0.0602
love     0        0       0.1505  0
mars     0        0.1003  0       0
men      0        0       0.0753  0.0602
my       0.06021  0       0       0
no       0        0.1003  0       0
on       0.0301   0.0502  0       0
roof     0.06021  0       0       0
saw      0        0       0       0.1204
there    0.0301   0.0502  0       0
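The TF-IDF values can be recomputed from raw term counts. The sketch below is an illustration under assumptions: the `counts` dictionary is copied from the term-frequency table shown earlier (zero entries omitted), the function name `tf_idf` is hypothetical, and a base-10 logarithm is used because that choice reproduces the tabulated IDF values.

```python
import math

# Term counts per document, copied from the TF table (zeros omitted).
counts = {
    "P1": {"a": 1, "green": 2, "house": 1, "is": 1, "little": 1,
           "my": 1, "on": 1, "roof": 1, "there": 1},
    "P2": {"is": 1, "life": 1, "mars": 1, "no": 1, "on": 1, "there": 1},
    "P3": {"cars": 1, "green": 1, "love": 1, "men": 1},
    "P4": {"green": 1, "I": 1, "little": 1, "men": 1, "saw": 1},
}


def tf_idf(counts):
    """Normalized TF times log10(N / n_w) for every (document, word)."""
    n_docs = len(counts)
    # n_w: number of documents in which each word appears
    doc_freq = {}
    for terms in counts.values():
        for word in terms:
            doc_freq[word] = doc_freq.get(word, 0) + 1
    weights = {}
    for doc, terms in counts.items():
        total = sum(terms.values())          # words counted in the document
        weights[doc] = {
            word: (count / total) * math.log10(n_docs / doc_freq[word])
            for word, count in terms.items()
        }
    return weights


weights = tf_idf(counts)
print(round(weights["P1"]["green"], 5))  # compare with the P1 entry for green
```

Words missing from a document's dictionary have weight 0, which matches the zero cells in the table.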
XML (Extensible Markup Language) and the Semi-Structured Data Model
Motivation
• We have seen that relational databases
are very convenient to query. However:
– There is a LOT of data not in relational
databases!!
• Perhaps the most widely accessed
database is the web, and it certainly
isn’t a relational database.
Querying the Web
• The web can be queried using a search engine, however, we can't ask questions like:
  – What is the lowest price for which a Jaguar is sold on the web?
• Problems:
  – There are no facilities for asking complex questions, such as aggregation of data
Understanding the Web
• In order to query the web, we must be
able to understand it.
• 2 Computer Science Approaches:
– Artificial Intelligence Approach
– Database Approach
Database Approach
“The web is unstructured and we will structure it”
• Sometimes problems that are very difficult can be solved easily by enforcing a standard
• Encourage the use of XML as a standard for data exchange on the web
Example XML Document
<addresses>
  <person friend="yes">
    <name> Jeff Cohen</name>
    <tel> 04-828-1345 </tel>
    <tel> 054-470-778 </tel>
    <email> [email protected] </email>
  </person>
  <person friend="no">
    <name> Irma Levy</name>
    <tel> 03-426-1142 </tel>
    <email>[email protected]</email>
  </person>
</addresses>
Labels in the figure: Opening Tag, Attribute, Element, Closing Tag
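To show how elements and attributes in such a document can be read by a program, here is a small sketch using Python's standard `xml.etree.ElementTree` module. The embedded document is a shortened copy of the example (the email elements are left out), and the variable names are chosen here for illustration.

```python
import xml.etree.ElementTree as ET

doc = """
<addresses>
  <person friend="yes">
    <name> Jeff Cohen</name>
    <tel> 04-828-1345 </tel>
    <tel> 054-470-778 </tel>
  </person>
  <person friend="no">
    <name> Irma Levy</name>
    <tel> 03-426-1142 </tel>
  </person>
</addresses>
"""

root = ET.fromstring(doc.strip())

# Attribute access: names of the persons marked friend="yes".
friends = [p.findtext("name").strip()
           for p in root.findall("person")
           if p.get("friend") == "yes"]

# Element access: an element may repeat, so each name maps to a list of numbers.
phones = {p.findtext("name").strip(): [t.text.strip() for t in p.findall("tel")]
          for p in root.findall("person")}

print(friends)
print(phones)
```

Unlike a relational row, a person here may carry any number of tel elements, which is one sense in which the data is semi-structured.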
Very Unstructured XML
<?xml version="1.0"?>
<DamageReport>
  The insured's <Vehicle Make="Toyota"> Corolla </Vehicle> broke through the guard rail and plummeted into the ravine. The cause was determined to be <Cause>faulty brakes </Cause>. Amazingly there were no casualties.
</DamageReport>
XML Vs. HTML
• XML and HTML are brothers. They are both
special cases of SGML.
• HTML has specific tag and attribute names.
These are associated with a specific meaning
• XML can have any tag and attribute name.
These are not associated with any meaning
• HTML is used to specify visual style
• XML is used to specify meaning
A Different Data Model

                 Relational       Semi-Structured
Abstract Model   Sets of tuples   Labeled Directed Graph
Concrete Model   Tables           XML Documents
Standard for     Storing Data     Data Exchange; Separating Content from Style
Data Exchange
• Problem: Many data sources, each of a different type (different vendor), with a different schema.
  – How can the data be combined and used together?
  – How can different companies collaborate on their data?
  – What format should be used to exchange the data?
Separating Content from Style
• Web sites develop over time
• Important to separate style from data in order to allow changes to the site structure and appearance
• Using XML, we can store data alone
• CSS separates style from data only in a limited way
• Using XSL, this data can be translated into HTML
• The data can be translated differently as the site develops
Write Once Use Everywhere
XML Data
  → XSL → WML (hand-held devices)
  → XSL → HTML (web browser)
  → XSL → TEXT (Excel)
Using XML
• Querying and Searching XML: There are query languages and search engines that query XML and return XML. Examples: XPath, XQuery/SQL4X, EquiX, XSEarch
• Displaying XML: An XML document can have an associated style-sheet which specifies how the document should be translated to HTML. Examples: CSS, XSL
DTD: Document Type Definitions
• Document Type Definitions (DTDs) impose structure on an XML document
• There is some relationship
between a DTD and a schema