
Post on 21-Dec-2015






  • 1 Statistics
      – XML:
          – Altavista: 800,000 pages returned.
          – Amazon.com: 242 books.
      – In comparison:
          – God: 12,000 books, 7 million pages.
          – Bible: 32,000 books, 4.6 million pages.
      – More comparisons:
          – Alon Levy + XML: 132 pages (770 without "Alon").
          – XML-QL: 509 pages.
          – Levy + God: 12,000 (Alon Levy + God: 1, but not me).
          – Levy + Bible: 10,000 (Alon Levy + Bible: 3, one of which is me).
  • 2 What Is XML?
      – An emerging format for data exchange on the web and between applications.
      – eXtensible Markup Language:
  • 3 Attributes and References
      – XML distinguishes attributes from sub-elements.
      – IDs and IDREFs are used to reference objects.
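As a small illustration, the Python sketch below (standard-library `xml.etree.ElementTree`) reads an attribute and a sub-element of the same element, then follows an IDREF to the element carrying the matching ID. The document, its element names, and the convention that an attribute literally named `id` acts as an ID are all assumptions: without a DTD the parser has no way of knowing which attributes are declared as ID or IDREF.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "year" is an attribute, <title> is a sub-element,
# and the book's "author" attribute is used as an IDREF to <person id=...>.
DOC = """
<bib>
  <person id="p1"><name>A. Author</name></person>
  <book year="1999" author="p1">
    <title>Sample Book</title>
  </book>
</bib>
""".strip()


def resolve_refs(root):
    """Map each book title to the name of the person its 'author' IDREF points to."""
    # Assumption: any attribute named "id" plays the role of an ID.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    result = {}
    for book in root.iter("book"):
        target = by_id[book.get("author")]  # follow the reference
        result[book.findtext("title")] = target.findtext("name")
    return result


root = ET.fromstring(DOC)
book = root.find("book")
print(book.get("year"))        # attribute value
print(book.findtext("title"))  # text of a sub-element
print(resolve_refs(root))
```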
  • 4 Document Type Definitions (DTDs)
      – Sort of like a schema, but not really.
      – Won't stay around for very long, either.
      – The first in a long series of 3-letter acronyms.
  • 5 Origin of XML
      – Comes from SGML (a very nasty language).
      – Principle: separate the data from its graphical presentation.
  • 6 XML, After the Roots
      – A format for sharing data.
      – Applications:
          – EDI (electronic data interchange): transactions between banks; producers and suppliers sharing product data (auctions).
          – Extranets: building relationships between companies.
          – Scientists sharing data about experiments.
          – Sharing data between different components of an application.
          – Format for storing all data in Office 2000.
      – Basis for data sharing and integration.
  • 7 Why Do People Like It So Much?
      – It's easy to learn.
      – It's human-readable.
      – No need for proprietary formats anymore.
      – It's very flexible:
          – Data is self-describing.
          – Attributes can be added easily.
          – Data can be irregular.
      – Note: without common DTDs, data sharing is not solved!
  • 8 Why Are We DBers Interested?
      – It's data, stupid. That's us.
      – Proof by Altavista: database + XML returns 40,000 pages.
      – Database issues:
          – How are we going to model XML? (graphs)
          – How are we going to query XML? (XML-QL)
          – How are we going to store XML? (in a relational database? object-oriented?)
          – How are we going to process XML efficiently? (uh well..., um..., ah..., get some good grad students!)
  • 9 3-Letter Acronyms
      – XML, DTD, W3C
      – DOM (Document Object Model)
      – XML-schemas
      – XQL (a very early query language)
      – RDF (Resource Description Framework)
      – Today, in New Jersey, a W3C committee is meeting to discuss a standard query language.
  • 10 XML Data Model (Graph)
      – Issues:
          – Should we distinguish between attributes and sub-elements?
          – Should we preserve order?
      – Think of the labels as names of binary relations.
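One way to read "labels as names of binary relations" is sketched below in Python: every element becomes a node, and every label maps to a list of (source, target) pairs. The node ids (`&0`, `&1`, ...), the `@` prefix that keeps attributes apart from sub-elements, and the choice to point text-only leaf elements directly at their text are our own assumptions, each one answering a modeling question on the slide in a particular way.

```python
import itertools
import xml.etree.ElementTree as ET
from collections import defaultdict


def to_relations(xml_text):
    """Encode an XML tree as an edge-labeled graph: one binary relation per label."""
    root = ET.fromstring(xml_text)
    relations = defaultdict(list)
    ids = itertools.count()

    def new_node():
        return f"&{next(ids)}"

    def visit(elem, node):
        # Attributes become edges too; the "@" prefix keeps them distinguishable
        # from sub-elements that happen to share the same name.
        for name, value in elem.attrib.items():
            relations["@" + name].append((node, value))
        for child in elem:
            text = (child.text or "").strip()
            if len(child) == 0 and not child.attrib and text:
                # Text-only leaf element: the edge ends at the atomic value.
                relations[child.tag].append((node, text))
            else:
                # Inner element: new node (text mixed with children is ignored here).
                child_node = new_node()
                relations[child.tag].append((node, child_node))
                visit(child, child_node)

    visit(root, new_node())
    return dict(relations)


doc = '<bib><book year="1999"><title>T</title><author>A</author><author>B</author></book></bib>'
for label, pairs in to_relations(doc).items():
    print(label, pairs)
```

Within each relation the pairs keep document order, but once the data is split across relations the order among different labels is lost; keeping it would need an extra position column, which is the "should we preserve order?" question in concrete form.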
  • 11 Querying XML
      – Requirements:
          – Query a graph, not a relation.
          – The result should be a graph (representing an XML document), not a relation.
          – No schema: we may not know much about the data, so we need to navigate the XML.
  • 12 Query Languages
      – First, there was XQL (from Microsoft). People very quickly realized that it was very limited.
      – Then a bunch of database researchers looked at XML and invented XML-QL.
      – XML-QL comes from the nicer StruQL language.
      – Many people got excited and formed a committee.
  • 13 Extracting Data by Query
      – Matching data using element patterns:
          WHERE Addison-Wesley $t $a IN www.a.b.c/bib.xml
          CONSTRUCT $a
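For readers who want to see what the pattern does, here is a rough Python equivalent of this query using `xml.etree.ElementTree`. The element names (`book`, `publisher/name`, `title`, `author`) and the inline sample document standing in for `www.a.b.c/bib.xml` are assumptions chosen for illustration. Each match of the pattern binds `$t` and `$a`; only `$a` is returned.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for www.a.b.c/bib.xml; element names are assumptions.
BIB = """<bib>
  <book year="1995">
    <publisher><name>Addison-Wesley</name></publisher>
    <title>Book One</title>
    <author>Ann</author>
    <author>Bob</author>
  </book>
  <book year="1998">
    <publisher><name>Other Press</name></publisher>
    <title>Book Two</title>
    <author>Carl</author>
  </book>
</bib>"""


def authors_published_by(root, publisher):
    """Return one binding of $a per author of every book matching the pattern."""
    result = []
    for book in root.iter("book"):
        # The pattern requires a title ($t) and the given publisher name.
        if book.find("title") is None:
            continue
        if book.findtext("publisher/name") != publisher:
            continue
        for author in book.findall("author"):
            result.append(author.text)
    return result


root = ET.fromstring(BIB)
print(authors_published_by(root, "Addison-Wesley"))
```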
  • 14 Constructing XML Data
          WHERE Addison-Wesley $t $a IN www.a.b.c/bib.xml
          CONSTRUCT $a $t
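A Python sketch in the same spirit builds new XML from the bindings instead of returning bare values: one `<result>` element holding an `<author>` and a `<title>` per match. The `results` wrapper and `result` element names, like the sample data and its element names, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element names are assumptions.
BIB = """<bib>
  <book><publisher><name>Addison-Wesley</name></publisher>
    <title>Book One</title><author>Ann</author><author>Bob</author></book>
  <book><publisher><name>Other Press</name></publisher>
    <title>Book Two</title><author>Carl</author></book>
</bib>"""


def construct_results(root, publisher):
    """Build one <result><author/><title/></result> per ($t, $a) binding."""
    out = ET.Element("results")  # hypothetical wrapper so the answer is one document
    for book in root.iter("book"):
        if book.findtext("publisher/name") != publisher:
            continue
        title = book.findtext("title")
        for author in book.findall("author"):
            res = ET.SubElement(out, "result")
            ET.SubElement(res, "author").text = author.text
            ET.SubElement(res, "title").text = title
    return out


answer = construct_results(ET.fromstring(BIB), "Addison-Wesley")
print(ET.tostring(answer, encoding="unicode"))
```

Because every (title, author) binding yields its own `<result>`, a book's title is repeated once per author.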
  • 15 Grouping with Nested Queries
          WHERE $t, Addison-Wesley CONTENT_AS $p IN www.a.b.c/bib.xml
          CONSTRUCT $t
              WHERE $a IN $p
              CONSTRUCT $a
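The grouping can be sketched in Python under the same assumed element names: the outer loop plays the role of the outer WHERE (binding `$t` and `$p`), and the inner loop runs only over the content of the matched book, so each title appears once together with all of its authors. The `results`/`result` names and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element names are assumptions.
BIB = """<bib>
  <book><publisher><name>Addison-Wesley</name></publisher>
    <title>Book One</title><author>Ann</author><author>Bob</author></book>
  <book><publisher><name>Other Press</name></publisher>
    <title>Book Two</title><author>Carl</author></book>
</bib>"""


def group_authors_by_title(root, publisher):
    """One <result> per matching book: its <title> followed by all its <author>s."""
    out = ET.Element("results")  # hypothetical wrapper element
    for book in root.iter("book"):  # outer query: binds $t and $p
        if book.findtext("publisher/name") != publisher:
            continue
        res = ET.SubElement(out, "result")
        ET.SubElement(res, "title").text = book.findtext("title")
        # Inner query: ranges only over $p, the content of this one book.
        for author in book.findall("author"):
            ET.SubElement(res, "author").text = author.text
    return out


answer = group_authors_by_title(ET.fromstring(BIB), "Addison-Wesley")
print(ET.tostring(answer, encoding="unicode"))
```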
  • 16 Joining Elements by Value
      – Find all articles whose writers also published a book after 1995:
          WHERE $f $l ELEMENT_AS $e IN www.a.b.c/bib.xml
                $f $l IN www.a.b.c/bib.xml, y > 1995
          CONSTRUCT $e
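The value join can be sketched in Python as follows. The layout (authors with `firstname`/`lastname` children, a `year` attribute on books) and the sample data are assumptions. The join is evaluated as a simple hash join, first building the set of qualifying author names and then probing it with each article; that strategy is our choice, not something XML-QL prescribes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element and attribute names are assumptions.
BIB = """<bib>
  <article>
    <author><firstname>Ann</firstname><lastname>Lee</lastname></author>
    <title>Article A</title>
  </article>
  <article>
    <author><firstname>Bob</firstname><lastname>Ray</lastname></author>
    <title>Article B</title>
  </article>
  <book year="1998">
    <author><firstname>Ann</firstname><lastname>Lee</lastname></author>
  </book>
  <book year="1990">
    <author><firstname>Bob</firstname><lastname>Ray</lastname></author>
  </book>
</bib>"""


def author_keys(elem):
    """The (firstname, lastname) pairs of elem's authors: the join values $f, $l."""
    return {
        (a.findtext("firstname"), a.findtext("lastname"))
        for a in elem.findall("author")
    }


def articles_by_recent_book_authors(root, after_year=1995):
    # Build side: names of authors of books with year > after_year.
    recent = set()
    for book in root.iter("book"):
        year = book.get("year")
        if year is not None and int(year) > after_year:
            recent |= author_keys(book)
    # Probe side: keep each article ($e) sharing at least one author name.
    return [art for art in root.iter("article") if author_keys(art) & recent]


root = ET.fromstring(BIB)
print([art.findtext("title") for art in articles_by_recent_book_authors(root)])
```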
  • 17 Tag Variables
      – Find all articles whose writers have done something (of any element type) after 1995:
          WHERE $f $l ELEMENT_AS $e IN www.a.b.c/bib.xml
                $f $l IN www.a.b.c/bib.xml, y > 1995
          CONSTRUCT $e
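In a Python sketch, the difference from the join on the previous slide is that the element name becomes a variable: every element carrying a `year` attribute is considered, whatever its tag, and the tag that matched is recorded as the binding of the tag variable. The `thesis` element, the other element names, and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; element and attribute names are assumptions.
BIB = """<bib>
  <article>
    <author><firstname>Ann</firstname><lastname>Lee</lastname></author>
    <title>Article A</title>
  </article>
  <article>
    <author><firstname>Bob</firstname><lastname>Ray</lastname></author>
    <title>Article B</title>
  </article>
  <book year="1998">
    <author><firstname>Ann</firstname><lastname>Lee</lastname></author>
  </book>
  <thesis year="1997">
    <author><firstname>Bob</firstname><lastname>Ray</lastname></author>
  </thesis>
</bib>"""


def author_keys(elem):
    """The (firstname, lastname) pairs of elem's authors."""
    return {
        (a.findtext("firstname"), a.findtext("lastname"))
        for a in elem.findall("author")
    }


def recent_work_by_author(root, after_year=1995):
    """Map each author name to the set of tags (the tag variable) of their recent work."""
    recent = {}
    for elem in root.iter():  # tag variable: no element name is fixed in advance
        year = elem.get("year")
        if year is not None and int(year) > after_year:
            for key in author_keys(elem):
                recent.setdefault(key, set()).add(elem.tag)
    return recent


def articles_by_active_authors(root, after_year=1995):
    recent = recent_work_by_author(root, after_year)
    # Keep an article if any of its authors appears among the recent authors.
    return [art for art in root.iter("article")
            if not author_keys(art).isdisjoint(recent)]


root = ET.fromstring(BIB)
print(recent_work_by_author(root))
print([art.findtext("title") for art in articles_by_active_authors(root)])
```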
  • 18 Regular Path Expressions
      – Find all parts whose brand is Ford, at whatever level of the hierarchy they appear:
          WHERE $r Ford IN "www.a.b.c/bib.xml"
          CONSTRUCT $r
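With `xml.etree.ElementTree`, the descendant step `.//` gives a similar "any depth" search, and its limited XPath support includes the `[tag='text']` predicate. The sketch assumes a parts document in which each `<part>` has `name` and `brand` children and may contain further parts; that layout and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical parts hierarchy: parts nest inside parts to arbitrary depth.
PARTS = """<parts>
  <part><name>engine</name><brand>Ford</brand>
    <part><name>piston</name><brand>Acme</brand>
      <part><name>ring</name><brand>Ford</brand></part>
    </part>
  </part>
  <part><name>seat</name><brand>Acme</brand></part>
</parts>"""


def ford_part_names(root):
    # ".//part" matches <part> at any depth; "[brand='Ford']" keeps only parts
    # that have a <brand> child whose text is exactly "Ford".
    return [p.findtext("name") for p in root.findall(".//part[brand='Ford']")]


print(ford_part_names(ET.fromstring(PARTS)))
```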
  • 19 Regular Path Expressions
          WHERE $r IN "www.a.b.c/parts.xml"
          CONSTRUCT $r
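A general regular path expression constrains the whole sequence of labels leading to an element, not just "any depth". A naive way to evaluate one, sketched below in Python, is to walk the tree, spell out each label path as a string such as `part.part.name`, and test it against an ordinary regular expression. The patterns and the sample document are hypothetical examples, and the sketch assumes tag names contain no `.` characters.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical parts document.
PARTS = """<parts>
  <part><name>engine</name>
    <subpart><name>piston</name></subpart>
    <part><name>valve</name></part>
  </part>
</parts>"""


def match_path(root, pattern):
    """Return elements, in document order, whose label path matches `pattern`.

    The label path excludes the root tag and joins tags with '.', for example
    'part.part.name'. Assumes no tag name itself contains '.'.
    """
    regex = re.compile(pattern)
    hits = []

    def walk(elem, path):
        for child in elem:
            child_path = f"{path}.{child.tag}" if path else child.tag
            if regex.fullmatch(child_path):
                hits.append(child)
            walk(child, child_path)

    walk(root, "")
    return hits


root = ET.fromstring(PARTS)
# Hypothetical expressions: one or more <part> steps, optionally a <subpart>, then <name>.
print([e.text for e in match_path(root, r"(part\.)+name")])
print([e.text for e in match_path(root, r"(part\.)+(subpart\.)?name")])
```

This visits every element and runs one match per element; query processors typically drive the traversal with an automaton built from the expression instead, so that subtrees which cannot lead to a match are skipped.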
  • 20 XML Data Integration
      – A query can access more than one XML document:
          WHERE ELEMENT_AS $n $ssn IN www.a.b.c/data.xml
                $ssn ELEMENT_AS $I IN www.irs.gov/taxpayers.xml
          CONSTRUCT $n $I
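Here is a rough Python rendering of the integration query, with two inline documents standing in for `www.a.b.c/data.xml` and `www.irs.gov/taxpayers.xml`. The element names (`person`, `name`, `ssn`, `taxpayer`, `income`) and the data are assumptions. `$ssn` acts as the join value, while `$n` and `$I` are bound to whole elements, which are copied into each result.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the two source documents.
DATA = """<people>
  <person><name>Ann</name><ssn>111</ssn></person>
  <person><name>Bob</name><ssn>222</ssn></person>
</people>"""

TAXPAYERS = """<taxpayers>
  <taxpayer><ssn>111</ssn><income>50000</income></taxpayer>
</taxpayers>"""


def integrate(data_xml, tax_xml):
    people = ET.fromstring(data_xml)
    taxpayers = ET.fromstring(tax_xml)
    # Index the second document on the join value ($ssn).
    income_by_ssn = {}
    for t in taxpayers.iter("taxpayer"):
        income = t.find("income")
        if income is not None:
            income_by_ssn[t.findtext("ssn")] = income
    out = ET.Element("results")  # hypothetical wrapper element
    for person in people.iter("person"):
        name = person.find("name")
        income = income_by_ssn.get(person.findtext("ssn"))
        if name is None or income is None:
            continue  # the pattern does not match this person
        res = ET.SubElement(out, "result")
        for elem in (name, income):  # $n and $I are whole elements
            copied = copy.deepcopy(elem)
            copied.tail = None  # drop whitespace that followed it in the source
            res.append(copied)
    return out


print(ET.tostring(integrate(DATA, TAXPAYERS), encoding="unicode"))
```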
  • 21 Query Processing for XML
      – Approach 1: store XML in a relational database.
          – Translate an XML-QL query into a set of SQL queries.
          – Leverage 20 years of research and development.
      – Approach 2: store XML in an object-oriented database system.
          – The OO model is closest to XML, but these systems do not perform well and are not widely accepted.
      – Approach 3: build an entire DBMS tailored to XML.
          – Still in the research phase.
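To make Approach 1 concrete, the sketch below shreds an XML document into a single SQLite "edge" table (one row per parent-child edge, node ids assigned in document order) and answers the Addison-Wesley query from the extraction slide with a hand-written SQL join. The table layout is only one of several possible mappings, attributes are ignored for brevity, the sample data is hypothetical, and the SQL is written by hand rather than produced by an XML-QL translator.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import count

# Hypothetical sample document; element names are assumptions.
BIB = """<bib>
  <book><publisher><name>Addison-Wesley</name></publisher>
    <title>Book One</title><author>Ann</author><author>Bob</author></book>
  <book><publisher><name>Other Press</name></publisher>
    <title>Book Two</title><author>Carl</author></book>
</bib>"""


def shred(conn, xml_text):
    """Store every parent-child edge as one row: edge(source, label, target, value)."""
    conn.execute(
        "CREATE TABLE edge (source INTEGER, label TEXT, target INTEGER, value TEXT)"
    )
    ids = count(1)  # node 0 is the root element; others are numbered in document order

    def visit(elem, node):
        for child in elem:
            child_id = next(ids)
            text = (child.text or "").strip() or None  # leaf text, if any
            conn.execute(
                "INSERT INTO edge VALUES (?, ?, ?, ?)",
                (node, child.tag, child_id, text),
            )
            visit(child, child_id)

    visit(ET.fromstring(xml_text), 0)


# Hand translation: authors of books whose publisher name is Addison-Wesley.
QUERY = """
SELECT a.value
FROM edge AS b
JOIN edge AS p ON p.source = b.target AND p.label = 'publisher'
JOIN edge AS n ON n.source = p.target AND n.label = 'name'
              AND n.value = 'Addison-Wesley'
JOIN edge AS a ON a.source = b.target AND a.label = 'author'
WHERE b.label = 'book'
ORDER BY a.target
"""

conn = sqlite3.connect(":memory:")
shred(conn, BIB)
authors = [row[0] for row in conn.execute(QUERY)]
print(authors)
```

Each step of the XML pattern turns into one self-join on the edge table, which illustrates why translated XML queries tend to produce many joins and why efficient processing is still an open question on these slides.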