Management of XML and Semistructured Data Lecture 5: Query Languages Wednesday, 4/1/2001


Management of XML and Semistructured Data

Lecture 5: Query Languages

Wednesday, 4/1/2001


Strudel and StruQL

• Strudel = a Web site management tool

• Idea: separate the following three tasks:
  – Management of data
      • use some database
  – Management of the site’s structure
      • use StruQL
  – Management of the site’s presentation
      • use HTML templates (this was before XML...)


Example: Bibliography Data

Input data:

{Bib: { paper: { author: "Jones",
                 author: "Smith",
                 title: "The Comma",
                 year: 1994 },
        paper: { author: "Jones",
                 title: "The Dot",
                 year: 1998 },
        paper: { author: "Mark",
                 .... },
        . . .
      }
}

[Figure: the input data drawn as a tree. A Bib edge leads from the root to a node with several paper edges; each paper has author, title, and year edges ending in leaf values such as "Jones", "Smith", "The Comma", .....]


Simple Website Definition in StruQL

StruQL query:

WHERE  Root -> "Bib.paper.author" -> A
CREATE Root(), HomePage(A)
LINK   Root() -> "person" -> HomePage(A),
       HomePage(A) -> "name" -> A,
       HomePage(A) -> "home" -> Root()

Result:

[Figure: the output graph. Root() has a "person" edge to each of HomePage("Smith"), HomePage("Jones"), and HomePage("Mark"); each HomePage node has a "name" edge to its author string ("Smith", "Jones", "Mark") and a "home" edge back to Root().]

Root(), HomePage(A) = Skolem functions (more later)
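A minimal Python sketch of how the simple site definition could be evaluated, assuming the input graph is stored as a list of (source, label, target) edges. The node ids (`root`, `b`, `p1`, ...) and the helpers `follow_path`, `skolem`, and `simple_site` are hypothetical names for this illustration, not part of Strudel. Skolem terms are modelled as tuples, so equal arguments always denote the same node.

```python
# Input graph as labelled edges (node ids are made up for this sketch).
EDGES = [
    ("root", "Bib", "b"),
    ("b", "paper", "p1"), ("b", "paper", "p2"), ("b", "paper", "p3"),
    ("p1", "author", "Jones"), ("p1", "author", "Smith"),
    ("p1", "title", "The Comma"), ("p1", "year", 1994),
    ("p2", "author", "Jones"), ("p2", "title", "The Dot"), ("p2", "year", 1998),
    ("p3", "author", "Mark"),
]


def follow_path(edges, start, path):
    """Return all nodes reachable from `start` along a dotted label path."""
    frontier = {start}
    for label in path.split("."):
        frontier = {t for (s, l, t) in edges if s in frontier and l == label}
    return frontier


def skolem(name, *args):
    """A Skolem term: equal arguments give the same (hashable) node id."""
    return (name, *args)


def simple_site(edges):
    nodes, links = set(), set()
    root = skolem("Root")
    nodes.add(root)                                   # CREATE Root()
    # WHERE Root -> "Bib.paper.author" -> A
    for a in follow_path(edges, "root", "Bib.paper.author"):
        home = skolem("HomePage", a)                  # CREATE HomePage(A)
        nodes.add(home)
        links.add((root, "person", home))             # LINK clauses
        links.add((home, "name", a))
        links.add((home, "home", root))
    return nodes, links


nodes, links = simple_site(EDGES)
```

Although "Jones" is bound twice in the WHERE clause, `HomePage("Jones")` is created only once, because the Skolem term is determined by its argument.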


Complex Website Definition in StruQL

WHERE  Root -> "Bib" -> X,
       X -> "paper" -> P,
       P -> "author" -> A,
       P -> "title" -> T,
       P -> "year" -> Y
CREATE Root(), HomePage(A), YearPage(A,Y), PubPage(P)
LINK   Root() -> "person" -> HomePage(A),
       HomePage(A) -> "yearentry" -> YearPage(A,Y),
       YearPage(A,Y) -> "publication" -> PubPage(P),
       PubPage(P) -> "author" -> HomePage(A),
       PubPage(P) -> "title" -> T


Example: A Complex Web Site

[Figure: the generated site graph. Root() has "person" edges to HomePage("Smith"), HomePage("Jones"), and HomePage("Mark"). The HomePage nodes have "yearentry" edges to YearPage("Smith",1994), YearPage("Smith",1996), YearPage("Jones",1994), YearPage("Jones",1998), and YearPage("Mark",1996). The YearPage nodes have "publication" edges to the publication pages, drawn as PubPage("The Comma") and PubPage("The Dot"). Each PubPage has "author" edges back to HomePage nodes and a "title" edge to its title string, "The Comma" or "The Dot".]
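The complex site definition can be sketched the same way. The Python below is an illustration under assumptions, not Strudel's implementation: the paper ids `p1` and `p2` and the function `complex_site` are hypothetical, and only the two fully shown papers are included. It shows how a Skolem term with several arguments, such as YearPage(A,Y), groups bindings into one page per (author, year) pair.

```python
# The two papers that are fully shown in the example data.
PAPERS = {
    "p1": {"author": ["Jones", "Smith"], "title": "The Comma", "year": 1994},
    "p2": {"author": ["Jones"], "title": "The Dot", "year": 1998},
}


def complex_site(papers):
    root = ("Root",)
    links = set()
    for p, rec in papers.items():            # X -> "paper" -> P
        t, y = rec["title"], rec["year"]     # P -> "title" -> T, P -> "year" -> Y
        for a in rec["author"]:              # P -> "author" -> A
            home = ("HomePage", a)
            year_page = ("YearPage", a, y)
            pub = ("PubPage", p)
            # One LINK edge per clause; the set removes repeated edges.
            links |= {
                (root, "person", home),
                (home, "yearentry", year_page),
                (year_page, "publication", pub),
                (pub, "author", home),
                (pub, "title", t),
            }
    return links


links = complex_site(PAPERS)
# Pages are the Skolem terms (tuples); plain strings are data values.
pages = {n for (s, _, t) in links for n in (s, t) if isinstance(n, tuple)}
```

"The Comma" has two authors, so its single PubPage is reachable from two different YearPage nodes.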


Skolem Functions

• Maier, 1986
  – in OO systems

• Kifer et al., 1989
  – F-logic

• Hull and Yoshikawa, 1990
  – deductive db (ILOG)

• Papakonstantinou et al., 1996
  – semistructured db (MSL)


Skolem Functions in Logic

Origins: First Order Logic

The satisfiability problem: given a formula φ, does it have a model?


Skolem Functions in Logic

• Example: does φ have a model?

    φ = ∀x.∀y.(R(x,y) ⇒ ∃z.(R(x,z) ∧ R(y,z)))

Skolem functions: replace ∃ with functions, drop ∀

    φ' = R(x,y) ⇒ (R(x,f(x,y)) ∧ R(y,f(x,y)))

Fact: φ has a model iff φ' "has a model"
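One direction of this fact can be checked by brute force on a finite structure. The sketch below assumes the example formula is φ = ∀x.∀y.(R(x,y) ⇒ ∃z.(R(x,z) ∧ R(y,z))) and its Skolemized form is φ' = R(x,y) ⇒ (R(x,f(x,y)) ∧ R(y,f(x,y))). The function names are hypothetical. Given a finite model of φ, it builds a Skolem function f as a lookup table and verifies φ' under that f.

```python
from itertools import product


def satisfies_phi(domain, R):
    """phi: for all x, y, R(x,y) implies some z with R(x,z) and R(y,z)."""
    return all(
        (x, y) not in R
        or any((x, z) in R and (y, z) in R for z in domain)
        for x, y in product(domain, repeat=2)
    )


def skolem_witness(domain, R):
    """Build f(x,y) as a dict of witnesses, or return None if phi fails."""
    f = {}
    for x, y in product(domain, repeat=2):
        if (x, y) in R:
            z = next((z for z in domain if (x, z) in R and (y, z) in R), None)
            if z is None:
                return None          # no witness: phi is false here
            f[(x, y)] = z
        else:
            f[(x, y)] = domain[0]    # any value works; the implication is vacuous
    return f


def satisfies_phi_prime(domain, R, f):
    """phi': R(x,y) implies R(x,f(x,y)) and R(y,f(x,y)), for all x, y."""
    return all(
        (x, y) not in R
        or ((x, f[(x, y)]) in R and (y, f[(x, y)]) in R)
        for x, y in product(domain, repeat=2)
    )


domain = [1, 2, 3]
R_ok = {(1, 2), (2, 2)}      # z = 2 is a witness for both pairs
R_bad = {(1, 2)}             # no z with R(1,z) and R(2,z)

f = skolem_witness(domain, R_ok)
```

The converse direction is immediate: if φ' holds for some interpretation of f, then f(x,y) is itself the witness z required by φ.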


Skolem Functions in Databases

Recall Datalog:

    Answer(title, author) :- Paper(author, title, year)

Means:

    ∀author.∀title.∀year.(Answer(title, author) ⇐ Paper(author, title, year))


Skolem Functions in Databases

Now consider:

    Answer(author, x) :- Paper(author, title, year)

I want to "create a new object x". What meaning?

    ∀author.∀title.∀x.∀year.(Answer(author, x) ⇐ Paper(author, title, year))

    ∀author.∀title.∃x.∀year.(Answer(author, x) ⇐ Paper(author, title, year))

    ∀author.∃x.∀title.∀year.(Answer(author, x) ⇐ Paper(author, title, year))


Skolem Functions in Databases

Better: use Skolem functions directly in Datalog

Choices:

    Answer(author, NewObj(author)) :- Paper(author, title, year)

    Answer(author, NewObj(author,title)) :- Paper(author, title, year)

    Answer(author, NewObj(title,year)) :- Paper(author, title, year)

    Answer(author, NewObj()) :- Paper(author, title, year)
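The four choices differ only in which variables the new object depends on, and therefore in how many distinct objects get created. A small Python sketch, assuming a flat Paper relation with one author per tuple (the three tuples and the helper names `answer` and `new_objects` are illustrative assumptions):

```python
# Paper(author, title, year), flattened to one author per tuple.
PAPER = [
    ("Jones", "The Comma", 1994),
    ("Smith", "The Comma", 1994),
    ("Jones", "The Dot", 1998),
]


def answer(paper, key):
    """Evaluate Answer(author, NewObj(key(...))) :- Paper(author, title, year).

    `key` picks the Skolem function's arguments; the term is a tuple,
    so equal arguments yield the same object.
    """
    return {(a, ("NewObj", *key(a, t, y))) for (a, t, y) in paper}


def new_objects(ans):
    """The set of distinct objects created by a rule."""
    return {obj for (_, obj) in ans}


per_author = answer(PAPER, lambda a, t, y: (a,))          # NewObj(author)
per_author_title = answer(PAPER, lambda a, t, y: (a, t))  # NewObj(author,title)
per_title_year = answer(PAPER, lambda a, t, y: (t, y))    # NewObj(title,year)
single_object = answer(PAPER, lambda a, t, y: ())         # NewObj()
```

With this data, `NewObj(author)` creates one object per author, `NewObj(author,title)` one per (author, title) pair, `NewObj(title,year)` one per paper shared by its authors, and `NewObj()` a single object shared by everyone.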


Skolem Functions in StruQL

StruQL’s semantics:

• Input graph: (Node, Edge)

• Output graph: (Node', Edge')

Example:

WHERE  Root -> "Bib.paper.author" -> A
CREATE Root(), HomePage(A)
LINK   Root() -> "person" -> HomePage(A),
       HomePage(A) -> "name" -> A,
       HomePage(A) -> "home" -> Root()

Node'(Root()) :-
Node'(HomePage(A)) :- Edge(Root,Bib,X), Edge(X,paper,Y), Edge(Y,author,A)
Edge'(Root(),person,HomePage(A)) :- Edge(Root,Bib,X), Edge(X,paper,Y), Edge(Y,author,A)
Edge'(HomePage(A),name,A) :- Edge(Root,Bib,X), Edge(X,paper,Y), Edge(Y,author,A)
Edge'(HomePage(A),home,Root()) :- Edge(Root,Bib,X), Edge(X,paper,Y), Edge(Y,author,A)


XPath

• http://www.w3.org/TR/xpath (11/99)

• Building block for other W3C standards:
  – XSL Transformations (XSLT)
  – XML Link (XLink)
  – XML Pointer (XPointer)
  – XML Query

• Was originally part of XSL


Example for XPath Queries

<bib>
  <book>
    <publisher> Addison-Wesley </publisher>
    <author> Serge Abiteboul </author>
    <author> <first-name> Rick </first-name>
             <last-name> Hull </last-name>
    </author>
    <author> Victor Vianu </author>
    <title> Foundations of Databases </title>
    <year> 1995 </year>
  </book>
  <book price="55">
    <publisher> Freeman </publisher>
    <author> Jeffrey D. Ullman </author>
    <title> Principles of Database and Knowledge Base Systems </title>
    <year> 1998 </year>
  </book>
</bib>


Data Model for XPath

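[Figure: the document as a tree. The root is the topmost node; its child is the root element bib, which has two book children; the first book has children publisher, author, . . . , with text leaves "Addison-Wesley", "Serge Abiteboul", . . .]

Much like the XQuery data model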


XPath: Simple Expressions

/bib/book/year

Result: <year> 1995 </year>

<year> 1998 </year>

/bib/paper/year

Result: empty (there were no papers)
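These two queries can be tried with Python's standard `xml.etree.ElementTree`, which implements a subset of XPath. In this sketch the whitespace padding of the example document is dropped, and the paths are written relative to the `<bib>` element, since ElementTree evaluates a path from the element it is called on rather than from the document root.

```python
import xml.etree.ElementTree as ET

BIB_XML = """
<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>
"""

bib = ET.fromstring(BIB_XML)          # `bib` is the <bib> element

# /bib/book/year, written relative to <bib>
years = [y.text for y in bib.findall("book/year")]

# /bib/paper/year: there are no paper elements, so nothing matches
paper_years = bib.findall("paper/year")
```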


XPath: Restricted Kleene Closure

//author

Result: <author> Serge Abiteboul </author>
        <author> <first-name> Rick </first-name>
                 <last-name> Hull </last-name> </author>
        <author> Victor Vianu </author>
        <author> Jeffrey D. Ullman </author>

/bib//first-name

Result: <first-name> Rick </first-name>
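A short sketch of the descendant step using Python's `xml.etree.ElementTree`. It assumes the example document with whitespace padding removed. ElementTree only accepts `//` in a relative path, so `//author` is written as `.//author` starting from `<bib>`.

```python
import xml.etree.ElementTree as ET

BIB_XML = """
<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>
"""

bib = ET.fromstring(BIB_XML)

# //author: every author element at any depth, in document order
authors = bib.findall(".//author")

# /bib//first-name: every first-name element below <bib>
first_names = [e.text for e in bib.findall(".//first-name")]
```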


XPath: Text Nodes

/bib/book/author/text()

Result: Serge Abiteboul
        Victor Vianu
        Jeffrey D. Ullman

Rick Hull doesn’t appear because his author element contains first-name and last-name elements rather than text

Functions in XPath:
  – text() = matches the text value
  – node() = matches any node (= * or @* or text())
  – name() = returns the name of the current tag
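Python's `xml.etree.ElementTree` has no `text()` step, so the sketch below emulates it with the `.text` attribute of each matched element. This is an approximation made for illustration, not a full XPath evaluation. It assumes the example document with whitespace padding removed.

```python
import xml.etree.ElementTree as ET

BIB_XML = """
<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>
"""

bib = ET.fromstring(BIB_XML)

# Emulates /bib/book/author/text(): keep authors that directly contain text.
author_texts = [
    a.text.strip()
    for a in bib.findall("book/author")
    if a.text and a.text.strip()
]

# name() corresponds to the element's tag in ElementTree.
child_names = [child.tag for child in bib.find("book")]
```

The author element for Rick Hull contributes nothing, because its content consists of child elements only.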


XPath: Wildcard

//author/*

Result: <first-name> Rick </first-name>

<last-name> Hull </last-name>

* Matches any element


XPath: Attribute Nodes

/bib/book/@price

Result: "55"

@price means that price has to be an attribute
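In Python's `xml.etree.ElementTree`, a path cannot end in an attribute step, so this sketch selects the books that carry a `price` attribute with the predicate `[@price]` and then reads the value with `get`. The trimmed two-book document is an assumption made to keep the example short.

```python
import xml.etree.ElementTree as ET

BIB_XML = """
<bib>
  <book>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>
"""

bib = ET.fromstring(BIB_XML)

# Emulates /bib/book/@price: attribute values of books that have a price.
prices = [b.get("price") for b in bib.findall("book[@price]")]

# A book without the attribute yields None rather than a match.
first_price = bib.find("book").get("price")
```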


XPath: Qualifiers

/bib/book/author[first-name]

Result: <author> <first-name> Rick </first-name>
                 <last-name> Hull </last-name>
        </author>


XPath: More Qualifiers

/bib/book/author[first-name][address[.//zip][city]]/last-name

Result: <last-name> … </last-name>
        <last-name> … </last-name>


XPath: More Qualifiers

/bib/book[@price < "60"]

/bib/book[author/@age < "25"]

/bib/book[author/text()]
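Qualifiers act as filters on the nodes selected so far. The sketch below uses Python's `xml.etree.ElementTree`, which supports existence predicates such as `[first-name]` and `[@price]` but not comparisons with `<`, so the comparison is done in Python. XPath 1.0 converts both operands of `<` to numbers, which the `float` call mirrors. The example document is used with whitespace padding removed.

```python
import xml.etree.ElementTree as ET

BIB_XML = """
<bib>
  <book>
    <publisher>Addison-Wesley</publisher>
    <author>Serge Abiteboul</author>
    <author><first-name>Rick</first-name><last-name>Hull</last-name></author>
    <author>Victor Vianu</author>
    <title>Foundations of Databases</title>
    <year>1995</year>
  </book>
  <book price="55">
    <publisher>Freeman</publisher>
    <author>Jeffrey D. Ullman</author>
    <title>Principles of Database and Knowledge Base Systems</title>
    <year>1998</year>
  </book>
</bib>
"""

bib = ET.fromstring(BIB_XML)

# /bib/book/author[first-name]: authors that have a first-name child
structured = bib.findall("book/author[first-name]")

# /bib/book[@price < "60"]: existence test in the path, comparison in Python
cheap = [b for b in bib.findall("book[@price]") if float(b.get("price")) < 60]

# /bib/book[author/text()]: books with at least one author containing text
with_text_author = [
    b for b in bib.findall("book")
    if any(a.text and a.text.strip() for a in b.findall("author"))
]
```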


XPath: Summary

bib                  matches a bib element
*                    matches any element
/                    matches the root
/bib                 matches a bib element under root
bib/paper            matches a paper in bib
bib//paper           matches a paper in bib, at any depth
//paper              matches a paper at any depth
paper|book           matches a paper or a book
@price               matches a price attribute
bib/book/@price      matches price attribute in book, in bib
bib/book[@price<"55"]/author/lastname    matches…


XPath: More Details

• An XPath expression, p, establishes a relation between:
  – A context node, and
  – A node in the answer set

• In other words, p denotes a function:
  – S[p] : Nodes -> {Nodes}

• Examples:
  – author/firstname
  – . = self
  – .. = parent
  – part/*/*/subpart/../name = part/*/*[subpart]/name


The Root and the Root

• <bib> <paper> 1 </paper> <paper> 2 </paper> </bib>

• bib is the “document element”

• The “root” is above bib

• /bib = returns the document element

• / = returns the root

• Why ? Because we may have comments before and after <bib>; they become siblings of <bib>

• This is advanced xmlogy
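The distinction between the root and the document element can be observed with Python's standard `xml.dom.minidom`, which keeps top-level comments. In this sketch the two comments around `<bib>` are made up for illustration. The `Document` object plays the role of the root, and `documentElement` is the `bib` element.

```python
from xml.dom import minidom

doc = minidom.parseString(
    "<!-- before --><bib><paper>1</paper><paper>2</paper></bib><!-- after -->"
)

# Children of the root: the comments are siblings of <bib>.
top_level = [node.nodeName for node in doc.childNodes]

# The document element is the single element child of the root.
document_element = doc.documentElement.tagName
papers = [p.firstChild.data for p in doc.documentElement.childNodes]
```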


XPath: More Details

• We can navigate along 13 axes:
    ancestor
    ancestor-or-self
    attribute
    child
    descendant
    descendant-or-self
    following
    following-sibling
    namespace
    parent
    preceding
    preceding-sibling
    self


XPath: More Details

• Examples:
  – child::author/child::lastname = author/lastname
  – child::author/descendant::zip = author//zip
  – child::author/parent::* = author/..
  – child::author/attribute::age = author/@age

• What does this mean?
  – paper/publisher/parent::*/author
  – /bib//address[ancestor::book]
  – /bib//author/ancestor::*//zip
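Python's `xml.etree.ElementTree` does not implement named axes, but the parent and ancestor axes can be emulated with a child-to-parent map. The sketch below is an illustration under assumptions: the small document, including its `address` and `zip` values, is invented, and `ancestors` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

DOC = """
<bib>
  <book>
    <author>
      <lastname>Hull</lastname>
      <address><zip>98195</zip></address>
    </author>
  </book>
  <paper>
    <address><zip>10001</zip></address>
  </paper>
</bib>
"""

bib = ET.fromstring(DOC)

# ElementTree elements do not store their parent, so build the map once.
parent = {child: p for p in bib.iter() for child in p}


def ancestors(elem):
    """Emulate the ancestor:: axis by walking up the parent map."""
    while elem in parent:
        elem = parent[elem]
        yield elem


# /bib//address[ancestor::book]: addresses that have a book above them
in_books = [
    a for a in bib.iter("address")
    if any(x.tag == "book" for x in ancestors(a))
]

# child::author/descendant::zip = author//zip, with a book as context node
book = bib.find("book")
zips = [z.text for z in book.findall("author//zip")]

# child::author/parent::* = author/.. returns the context node itself
parents_of_authors = {parent[a] for a in book.findall("author")}
```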


XPath: Even More Details

• name() = the name of the current node
  – /bib//*[name()="book"] same as /bib//book

• What does this mean?
  – /bib//*[ancestor::*[name()!="book"]]
  – In a different notation: bib.[^book]*._

• Navigation axes give us strictly more power!