![Page 1: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/1.jpg)
Representing and Storing Structured Data
LBSC 690: Jordan Boyd-Graber
October 15, 2012
Adapted from Jimmy Lin’s Slides
LBSC 690: Jordan Boyd-Graber () Representing and Storing Structured Data October 15, 2012 1 / 75
![Page 2: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/2.jpg)
Goals
Metadata: XML
Databases
![Page 3: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/3.jpg)
Outline
1 The joys and sorrows of metadata
2 XML: a framework for data representation
3 New and interesting things
4 Relational Databases
5 Relational Algebra
![Page 4: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/4.jpg)
![Page 5: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/5.jpg)
![Page 6: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/6.jpg)
![Page 7: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/7.jpg)
Take-Away Messages
Metadata makes data useful
XML is a way to encode data and metadata
XML allows computers to exchange information in new and interesting ways
![Page 8: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/8.jpg)
What’s going on here? How do I use this?
![Page 9: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/9.jpg)
Metadata
Literally “data about data”
“a set of data that describes and gives information about other data” –Oxford English Dictionary
![Page 10: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/10.jpg)
Dublin Core
![Page 11: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/11.jpg)
What is the Dublin Core?
A metadata standard for describing digital resources
An initiative to create a library card catalog for the Web
Dublin Core fields:
Title, Creator, Subject
Description, Publisher, Contributor
Date, Type, Format
Identifier, Source, Language
Relation, Coverage, Rights
![Page 12: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/12.jpg)
Encoding Metadata
Language for encoding metadata should be:
- Universal: so all can understand
- Flexible: to incorporate different types
- Extensible: flexible to custom types
- Simple: to encourage adoption
- Modular: so that schemes can be mixed, extended
![Page 13: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/13.jpg)
How do we encode data for interoperability?
Challenges
January 31, 2001
31 janvier 2001
2001-01-31
01-31-2000
980942400
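The challenge can be made concrete with a small sketch. It assumes Python's standard `datetime` module and parses three of the spellings listed above (the French form is left out because it needs locale-specific handling). The variable names are illustrative only.

```python
from datetime import datetime, timezone

# Three spellings of the same calendar day, each needing its own parsing rule.
us_style = datetime.strptime("January 31, 2001", "%B %d, %Y")
iso_style = datetime.strptime("2001-01-31", "%Y-%m-%d")
unix_style = datetime.fromtimestamp(980942400, tz=timezone.utc)  # seconds since 1970

# Once parsed, all three agree on the date.
print(us_style.date(), iso_style.date(), unix_style.date())

# ISO 8601 is one common choice for an unambiguous interchange form.
print(unix_style.isoformat())
```

Every receiver has to know which convention the sender used, which is the problem a shared encoding tries to remove.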
![Page 14: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/14.jpg)
Outline
1 The joys and sorrows of metadata
2 XML: a framework for data representation
3 New and interesting things
4 Relational Databases
5 Relational Algebra
![Page 15: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/15.jpg)
What is XML?
XML = eXtensible Markup Language
XML is a standard for exchanging structured data
- Provides standardization at the syntactic level
- Does not provide “meaning” for the tags
- XML is a standard recommended by the W3C
![Page 16: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/16.jpg)
Goals of XML
Easy to use
Easy to extend and adapt
Easy to write programs that use XML
Support a wide variety of applications
Should be human legible
Formal and concise
![Page 17: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/17.jpg)
Refresher: Elements and Attributes
Attribute
<person age="28" />
Element
<person><age>28</age></person>
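To see how the two encodings differ for a program reading them, here is a minimal sketch using Python's standard `xml.etree.ElementTree` parser. The variable names are illustrative; the two documents are the ones from the slide.

```python
import xml.etree.ElementTree as ET

# The same fact, once as an attribute and once as a child element.
as_attribute = ET.fromstring('<person age="28" />')
as_element = ET.fromstring('<person><age>28</age></person>')

# Attributes are read with .get(); child element text with .findtext().
age_from_attribute = as_attribute.get("age")
age_from_element = as_element.findtext("age")

print(age_from_attribute, age_from_element)
```

Both calls return the string `"28"`; XML itself does not say that the value is a number.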
![Page 18: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/18.jpg)
The Basic Rules
XML is case sensitive
All start tags must have end tags
Elements must be properly nested
XML declaration is the first statement
<?xml version="1.0"?>
Every document must contain a root element
Attribute values must have quotation marks
<item id="33905">
Certain characters are reserved for parsing
&lt; = “<”
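A parser enforces these rules mechanically. The sketch below assumes Python's standard `xml.etree.ElementTree` and `xml.sax.saxutils.escape`; the helper `is_well_formed` and the sample strings are hypothetical, written only to exercise the rules above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

good = '<?xml version="1.0"?><item id="33905">1 &lt; 2</item>'
bad_nesting = "<a><b></a></b>"          # elements must be properly nested
bad_case = "<Item></item>"              # XML is case sensitive
bad_quotes = "<item id=33905></item>"   # attribute values need quotation marks
bad_reserved = "<item>1 < 2</item>"     # a raw "<" is reserved for markup

for sample in (good, bad_nesting, bad_case, bad_quotes, bad_reserved):
    print(is_well_formed(sample), sample)

# escape() rewrites reserved characters so text can be embedded safely.
print(escape("1 < 2"))
```

Only the first sample parses; unlike a forgiving HTML browser, an XML parser rejects the rest outright.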
![Page 19: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/19.jpg)
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description
    rdf:about="http://media.example.com/audio/guide.ra">
    <dc:creator>Rose Bush</dc:creator>
    <dc:title>A Guide to Growing Roses</dc:title>
    <dc:description>Describes process for planting and nurturing
      different kinds of rose bushes.</dc:description>
    <dc:date>2001-01-20</dc:date>
  </rdf:Description>
</rdf:RDF>
![Page 20: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/20.jpg)
What does XML do?
. . .nothing
Syntax vs. semantics
XML vs HTML
![Page 22: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/22.jpg)
Historic Perspective: Three Core Technologies
HTTP - HyperText Transfer Protocol
- A protocol for transferring data between machines on the Internet
URL - Uniform Resource Locator
- A scheme for referencing the specific location of a resource
HTML - HyperText Markup Language
- A markup language for encoding information to be read by humans
HTTP and URLs have stood the test of time. But by 1996, HTML was already showing signs of age . . .
![Page 23: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/23.jpg)
HTML
Started with very few tags
Language evolved as more tags were added:
- Forms
- Tables
- Fonts
- Frames
- . . .
![Page 24: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/24.jpg)
Problems with HTML
I want personalized tags
- HTML can’t be extended
I want to incorporate other types of data
- Mathematics, database entries, literary text, poems, purchase orders . . .
- HTML can’t accommodate other types of data
I want to process pages automatically with software
- HTML is too messy and inconsistent
- Browsers are too forgiving
![Page 25: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/25.jpg)
Back to Basics
HTML was defined using SGML
- Standard Generalized Markup Language
- A meta-language for defining languages
- Complex, sophisticated, powerful . . . too difficult to use
- Idea: create a simpler version of SGML . . . the birth of XML!
![Page 28: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/28.jpg)
XML Languages
XML can be used to define other languages
Many XML languages, optimized for different roles
- XHTML: HTML by XML rules
- MathML: for mathematics
- EPUB: for creating eBooks
- RSS: for news feeds
- Civ IV: Create your own game
- SVG: Create graphics
![Page 29: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/29.jpg)
XHTML: Cleaning up HTML
<?xml version="1.0" encoding="iso-8859-1"?>
<html xmlns="http://www.w3.org/TR/xhtml1">
  <head>
    <title> Title of text XHTML Document </title>
  </head>
  <body>
    <div class="myDiv">
      <h1> Heading of Page </h1>
      <p>A paragraph this one with an
        <img src="image.gif" alt="waste of time" />
        image, and a <br /> line break. </p>
    </div>
  </body>
</html>
What’s new?
New preamble to tell us what’s here, and tags must have explicit ends.
![Page 30: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/30.jpg)
MathML
An XML language for defining mathematical formulas
$x^2 + 4x + 4 = 0$
<mrow>
  <mrow>
    <msup><mi>x</mi><mn>2</mn></msup>
    <mo>+</mo>
    <mrow>
      <mn>4</mn><mo>&InvisibleTimes;</mo><mi>x</mi>
    </mrow>
    <mo>+</mo>
    <mn>4</mn>
  </mrow>
  <mo>=</mo>
  <mn>0</mn>
</mrow>
![Page 31: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/31.jpg)
EPUB
Format for putting books on mobile readers (except Kindles)
Divide up a book into XHTML files
Create two additional XML files
- opf (open packaging format)
  - Metadata (using Dublin Core)
  - All the files needed
  - Linear reading order
- ncx (navigation control file for XML)
  - Hierarchical organization of content (for easy navigation)
![Page 32: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/32.jpg)
RSS
RSS = Really Simple Syndication or Rich Site Summary
An XML format for distributing news headlines on the Web
![Page 33: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/33.jpg)
And Others . . .
CML: Chemical Markup Language
CellML: biological models
BSML: bioinformatic sequences
MAGE-ML: Microarray Gene Expression
XSTAR: for archaeological research
MARCXML: MARC in XML
AML: astronomy markup language
SportsML: for sharing sports data
List goes on and on and on . . .
![Page 34: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/34.jpg)
The XML Family Tree
![Page 35: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/35.jpg)
Mixing XML Dialects
XML is designed to support the integration of multiple standards
Allows users to mix elements from different standards
- Snapping together XML dialects like Lego pieces
- Based on the notion of “namespaces”
![Page 36: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/36.jpg)
Example
<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rss="http://purl.org/rss/1.0/"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rss:channel rdf:about="http://www.xml.com/xml/news.rss">
    <rss:title>XML.com</rss:title>
    <rss:link>http://xml.com/pub</rss:link>
    <dc:description>
      XML.com features a rich mix of
      information and services for the XML community.
    </dc:description>
    <dc:subject>XML, RDF, metadata, information
      syndication services</dc:subject>
    <dc:identifier>http://www.xml.com</dc:identifier>
    <dc:publisher>O'Reilly &amp; Associates, Inc.</dc:publisher>
    <dc:rights>Copyright 2000, O'Reilly &amp;
      Associates, Inc.</dc:rights>
  </rss:channel>
</rdf:RDF>
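A program reading a mixed-dialect document tells the vocabularies apart by their namespace URIs, not by the prefixes. The following is a minimal sketch with Python's standard `xml.etree.ElementTree`, run on a shortened copy of the example above. The `NS` dictionary and variable names are illustrative choices.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI map used for lookups; the prefixes here are local to this script.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

doc = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rss="http://purl.org/rss/1.0/"
  xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rss:channel rdf:about="http://www.xml.com/xml/news.rss">
    <rss:title>XML.com</rss:title>
    <dc:publisher>O'Reilly &amp; Associates, Inc.</dc:publisher>
  </rss:channel>
</rdf:RDF>"""

root = ET.fromstring(doc)
channel = root.find("rss:channel", NS)

# Elements from two different vocabularies sit side by side in one channel.
title = channel.findtext("rss:title", namespaces=NS)
publisher = channel.findtext("dc:publisher", namespaces=NS)

# Namespaced attributes are addressed as {uri}localname.
about = channel.get("{%s}about" % NS["rdf"])

print(title, "|", publisher, "|", about)
```

The parser expands `rss:title` to the full URI plus local name, so two dialects that both define a `title` element cannot collide.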
![Page 37: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/37.jpg)
Another Example
<?xml version="1.0" encoding="iso-8859-1"?>
<html xmlns="http://www.w3.org/TR/xhtml1">
  <head>
    <title> Title of XHTML Document </title>
  </head>
  <body>
    <div class="myDiv">
      <h1> Heading of Page </h1>
      <math xmlns="http://www.w3.org/1998/Math/MathML">
        . . . MathML markup . . .
      </math>
      <p> more html stuff goes here </p>
      <smil xmlns="http://www.w3.org/TR/smil1">
        . . . SMIL markup . . .
      </smil>
    </div>
  </body>
</html>
![Page 38: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/38.jpg)
Outline
1 The joys and sorrows of metadata
2 XML: a framework for data representation
3 New and interesting things
4 Relational Databases
5 Relational Algebra
![Page 39: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/39.jpg)
Interoperability
What does it mean and what’s the role of XML?
XML: universal format for data interchange
Software exchanges data as XML-format messages
Advantages?
- Eliminates proprietary data formats
- Promotes interoperability
- Encourages cooperation
- Leverages lots of existing XML processing software
![Page 41: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/41.jpg)
XML Messaging
![Page 43: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/43.jpg)
What’s in it for me?
Webapps
- Lower overhead
- Richer data
- More portability
Mashups
Syntax vs. Semantics
![Page 44: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/44.jpg)
Mashups
![Page 46: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/46.jpg)
XML Schema
Defines what a valid XML document should look like
- Fields
- Attributes
- Number of entries
Has filename extension “xsd”
There are plenty of XML validators out there
Won’t go into details . . . think of it like a rulebook
![Page 47: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/47.jpg)
Extensible Stylesheet Language Transformations
XSLT transforms one XML document into another
Often used to display XML to a user
- Webpage
- Graphics
Syntax varies, semantics are fixed
![Page 48: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/48.jpg)
Business Card: Source Data
<?xml version="1.0"?>
<card type="simple">
  <name>John Doe</name>
  <title>CEO, Widget Inc.</title>
  <email>john.doe@widget.com</email>
  <phone>(202) 456-1414</phone>
</card>
![Page 49: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/49.jpg)
Business Card: XSLT Transformation
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0"
  xmlns="http://www.w3.org/1999/xhtml">
  <xsl:template match="card">
    <html>
      <head><title>business card</title></head>
      <body>
        <xsl:apply-templates select="name"/>
        <xsl:apply-templates select="title"/>
        <xsl:apply-templates select="email"/>
        <xsl:apply-templates select="phone"/>
      </body>
    </html>
  </xsl:template>
![Page 50: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/50.jpg)
Business Card: XSLT Transformation (cont.)
  <xsl:template match="name">
    <h1><xsl:value-of select="text()"/></h1>
  </xsl:template>
  <xsl:template match="title">
    <b>Title:</b> <xsl:value-of select="text()"/> <br />
  </xsl:template>
  <xsl:template match="email">
    <b>Email:</b> <a href="mailto:{text()}"><tt>
      <xsl:value-of select="text()"/></tt></a><br />
  </xsl:template>
  <xsl:template match="phone">
    <b>Phone:</b> <xsl:value-of select="text()"/> <br />
  </xsl:template>
</xsl:stylesheet>
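For readers without an XSLT processor at hand, the effect of the stylesheet can be approximated in ordinary code. The sketch below is not XSLT: it is a hand-written Python stand-in, using only the standard library, that maps each element name to an HTML fragment much as the templates above do. `TEMPLATES` and `render_card` are hypothetical names.

```python
import xml.etree.ElementTree as ET
from html import escape

CARD = """<card type="simple">
  <name>John Doe</name>
  <title>CEO, Widget Inc.</title>
  <email>john.doe@widget.com</email>
  <phone>(202) 456-1414</phone>
</card>"""

# One rendering rule per element name, mirroring the stylesheet's templates.
TEMPLATES = {
    "name": lambda t: f"<h1>{t}</h1>",
    "title": lambda t: f"<b>Title:</b> {t} <br />",
    "email": lambda t: f'<b>Email:</b> <a href="mailto:{t}"><tt>{t}</tt></a><br />',
    "phone": lambda t: f"<b>Phone:</b> {t} <br />",
}

def render_card(xml_text: str) -> str:
    """Turn a <card> document into an HTML page, one fragment per field."""
    card = ET.fromstring(xml_text)
    parts = []
    for field in ("name", "title", "email", "phone"):
        for node in card.findall(field):
            parts.append(TEMPLATES[field](escape(node.text or "")))
    body = "\n".join(parts)
    return ("<html><head><title>business card</title></head><body>\n"
            f"{body}\n</body></html>")

page = render_card(CARD)
print(page)
```

The source data never changes; only the rules that present it do, which is the separation the stylesheet is meant to provide.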
![Page 51: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/51.jpg)
Card with Style
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="cardstyle1.xml"?>
<card type="simple">
  <name>John Doe</name>
  <title>CEO, Widget Inc.</title>
  <email>john.doe@widget.com</email>
  <phone>(202) 456-1414</phone>
</card>
![Page 52: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/52.jpg)
In a browser . . .
![Page 53: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/53.jpg)
XML isn’t all there is
S-Expressions
- Based on logical statements
- Not used outside academia (not in it too much, either)
Protocol Buffers
- Blazingly fast
- More constrained than XML (have to specify data types, ranges)
JSON
- Designed specifically for web applications
- Lighter weight than XML
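As a rough illustration of the "lighter weight" point, the sketch below re-encodes a shortened business card as JSON using Python's standard `json` module. The mapping from XML to JSON keys is an assumption made for this example; there is no single standard conversion.

```python
import json
import xml.etree.ElementTree as ET

card_xml = ('<card type="simple"><name>John Doe</name>'
            '<email>john.doe@widget.com</email></card>')
root = ET.fromstring(card_xml)

# Assumed mapping: attributes and child elements both become JSON keys.
card = dict(root.attrib)
card.update({child.tag: child.text for child in root})

card_json = json.dumps(card)
print(card_json)
print(len(card_json), "characters as JSON,", len(card_xml), "as XML")
```

JSON has no closing tags, so the same record is shorter, but it also has no built-in counterpart to namespaces or attributes.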
![Page 54: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/54.jpg)
Take-Away Messages
Metadata makes data useful
XML is a way to encode data and metadata
XML allows computers to exchange information in new and interesting ways
![Page 55: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/55.jpg)
Outline
1 The joys and sorrows of metadata
2 XML: a framework for data representation
3 New and interesting things
4 Relational Databases
5 Relational Algebra
![Page 56: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/56.jpg)
Take-Away Messages
Databases are suitable for storing structured information
Databases are important tools to organize, manipulate, and access structured information
Databases are integral components of modern Web applications
![Page 57: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/57.jpg)
Definitions
Structured Information
What you put in a database (e.g. from XML)
Database
What you put structured information in.
Database Management System (DBMS)
Software system designed to store, manage, and facilitate access to databases
![Page 58: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/58.jpg)
What’s a database?
An integrated collection of data organized according to some model . . .
![Page 59: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/59.jpg)
What’s a relational database?
An integrated collection of data organized according to a relational model. . .
![Page 60: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/60.jpg)
Databases (try to) model reality. . .
Entities: things in the world (Example: airlines, tickets, passengers)
Relationships: how different things are related (Example: the tickets each passenger bought)
“Business Logic”: rules about the world (Example: fare rules)
![Page 61: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/61.jpg)
Components of a Relational Database
Field: an “atomic” unit of data
Record: a collection of related fields
Table: a collection of related records
- Each record is a row in the table
- Each field is a column in the table
Database: a collection of tables
![Page 62: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/62.jpg)
A Simple Example
![Page 63: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/63.jpg)
Components of a Relational Database
Why “Relational?”
View of the world in terms of entities and relations between them
Tables represent “relations”
Each row in the table is sometimes called a “tuple”
Each tuple is “about” an entity
Fields can be interpreted as “attributes” or “properties” of the entity
Data is manipulated by “relational algebra”:
Defines things you can do with tuples
Expressed in SQL (Structured Query Language, next week)
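Before SQL is introduced, two basic relational-algebra operations can be sketched directly on tuples. This is a toy illustration in plain Python, not how a DBMS implements them; the `students` rows are made-up sample data and `select` and `project` are hypothetical helper names.

```python
# A relation as a list of tuples, each written as a dict of field -> value.
students = [
    {"student_id": 1, "first_name": "Ada", "department": "LBSC"},
    {"student_id": 2, "first_name": "Grace", "department": "CMSC"},
    {"student_id": 3, "first_name": "Alan", "department": "LBSC"},
]

def select(relation, predicate):
    """Selection: keep the tuples (rows) that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, fields):
    """Projection: keep only the named fields, dropping duplicate tuples."""
    seen, result = set(), []
    for row in relation:
        key = tuple(row[f] for f in fields)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(fields, key)))
    return result

lbsc_students = select(students, lambda row: row["department"] == "LBSC")
lbsc_names = project(lbsc_students, ["first_name"])
departments = project(students, ["department"])

print(lbsc_names)
print(departments)
```

Each operation takes a relation and returns a relation, which is what lets them be chained.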
![Page 64: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/64.jpg)
The Registrar Example
What do we need to know?
- Something about the students (e.g., first name, last name, email, department)
- Something about the courses (e.g., course ID, description, enrolled students, grades)
- Which students are in which courses
How do we capture these things?
![Page 65: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/65.jpg)
A first stab . . .
What’s wrong with this?
![Page 67: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/67.jpg)
Goals of “Normalization”
Save space
- Save each fact only once
More rapid updates
- Every fact only needs to be updated once
More rapid search
- Finding something once is good enough
Avoid inconsistency
- Changing data once changes it everywhere
![Page 68: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/68.jpg)
Updated Organization
![Page 69: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/69.jpg)
![Page 70: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/70.jpg)
Keys
“Primary Key” uniquely identifies a record
  - e.g., student ID in the student table
“Foreign Key” is the primary key of another table
  - It need not be unique in this table
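A minimal sketch of the two kinds of key, using Python's built-in sqlite3 module. The table and column names (student, enrollment, student_id, course_id) and the sample rows are assumptions made up for illustration; they are not the schema shown on the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript("""
CREATE TABLE student (
    student_id INTEGER PRIMARY KEY,   -- primary key: one row per student
    last_name  TEXT NOT NULL
);
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student(student_id),  -- foreign key
    course_id  TEXT NOT NULL
);
""")

conn.execute("INSERT INTO student VALUES (1, 'Lee')")
# The same student_id may appear many times as a foreign key.
conn.executemany("INSERT INTO enrollment VALUES (?, ?)",
                 [(1, "LBSC 690"), (1, "LBSC 670")])

# A second student row with the same primary key is rejected.
try:
    conn.execute("INSERT INTO student VALUES (1, 'Kim')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

enrollment_count = conn.execute(
    "SELECT COUNT(*) FROM enrollment WHERE student_id = 1").fetchone()[0]
print(duplicate_rejected, enrollment_count)
```

The student ID is unique in the student table but repeats freely in the enrollment table, which is how one student can be in many courses.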
![Page 71: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/71.jpg)
Approaches to Normalization
For simple problems:
  - Start with the entities you’re trying to model
  - Group together fields that “belong together”
  - Add keys where necessary to connect entities in different tables
For more complicated problems:
  - Entity-relationship modeling (LBSC 670)
![Page 72: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/72.jpg)
Entity Relationship Modeling
![Page 73: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/73.jpg)
![Page 74: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/74.jpg)
Database Integrity
Registrar database must be internally consistent
  - All enrolled students must have an entry in the student table
  - All courses must have a name
  - Grades can’t be negative
What happens:
  - When a student withdraws from the university?
  - When a course is taken off the books?
![Page 75: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/75.jpg)
Integrity Constraints
Conditions that must be true of the database at any time
  - Specified when the database is designed
  - Checked when the database is modified
RDBMS ensures that integrity constraints are always kept
  - So that database contents remain faithful to the real world
  - Helps avoid data entry errors
Where do integrity constraints come from?
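One way to see constraints being "specified at design time and checked on modification" is a small sketch with Python's sqlite3 module. The three registrar rules (enrolled students must exist, courses must have a name, grades can't be negative) come from the slides; the table layout, column names, and sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite

# Constraints are declared once, when the tables are designed.
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE course  (course_id TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    course_id  TEXT    NOT NULL REFERENCES course(course_id),
    grade      REAL CHECK (grade IS NULL OR grade >= 0)
);
INSERT INTO student VALUES (1, 'Lee');
INSERT INTO course  VALUES ('LBSC 690', 'Example course');
""")

def violates(sql):
    """Return True if the database rejects the statement."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Constraints are checked every time the data is modified.
results = [
    violates("INSERT INTO enrollment VALUES (2, 'LBSC 690', 3.5)"),   # unknown student
    violates("INSERT INTO course VALUES ('LBSC 670', NULL)"),         # course without a name
    violates("INSERT INTO enrollment VALUES (1, 'LBSC 690', -1.0)"),  # negative grade
    violates("INSERT INTO enrollment VALUES (1, 'LBSC 690', 3.5)"),   # valid row
]

# A student who is still enrolled cannot simply be deleted.
withdrawal_blocked = violates("DELETE FROM student WHERE student_id = 1")
print(results, withdrawal_blocked)
```

In this sketch the withdrawal is refused because enrollment rows still point at the student. Whether a real registrar database should refuse, cascade the delete, or keep the history is a design decision, not something the database can decide for you.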
![Page 76: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/76.jpg)
Outline
1 The joys and sorrows of metadata
2 XML: a framework for data representation
3 New and interesting things
4 Relational Databases
5 Relational Algebra
![Page 77: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/77.jpg)
Relational Operations
![Page 78: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/78.jpg)
![Page 79: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/79.jpg)
![Page 80: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/80.jpg)
Relational Operations
Joining tables: JOIN
Choosing columns: SELECT
  - Based on their labels (field names)
Choosing rows: WHERE
  - Based on their contents
These can be specified together
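The three operations can be combined in a single query. The sketch below uses Python's sqlite3 module; the student and enrollment tables, their columns, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, last_name TEXT, department TEXT);
CREATE TABLE enrollment (student_id INTEGER, course_id TEXT, grade REAL);
INSERT INTO student VALUES (1, 'Lee', 'iSchool'), (2, 'Kim', 'History');
INSERT INTO enrollment VALUES
    (1, 'LBSC 690', 3.7), (2, 'LBSC 690', 3.3), (2, 'HIST 600', 4.0);
""")

rows = conn.execute("""
    SELECT student.last_name, enrollment.grade                      -- choose columns
    FROM student
    JOIN enrollment ON student.student_id = enrollment.student_id   -- join tables
    WHERE enrollment.course_id = 'LBSC 690'                         -- choose rows
    ORDER BY student.last_name
""").fetchall()
print(rows)
```

The JOIN pairs each student with their enrollments, WHERE keeps only the rows for one course, and SELECT keeps only two of the columns.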
![Page 81: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/81.jpg)
How is a database more than a spreadsheet?
![Page 82: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/82.jpg)
Database in the “Real World”
Typical database applications:
  - Banking (e.g., saving/checking accounts)
  - Trading (e.g., stocks)
  - Traveling (e.g., airline reservations)
  - Networking (e.g., Facebook)
Characteristics:
  - Lots of data
  - Lots of concurrent operations
  - Must be fast
  - “Mission critical” (well . . . sometimes)
![Page 83: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/83.jpg)
Operational Requirements
Must hold a lot of data
  - Use lots of computers, each with a small slice
  - So which machine has your data?
Must be reliable
  - Use lots of computers with duplicate copies
  - How do you keep copies consistent?
Must be fast
  - Use lots of computers
  - Share the load
Must support concurrent operations
  - This is hard
  - But often not needed
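One common answer to "which machine has your data?" is to compute the machine from the record's key. The sketch below shows simple hash partitioning in Python. It is an illustration, not something the slides prescribe, and the cluster size, the key format, and the function name `machine_for` are all hypothetical.

```python
import hashlib

NUM_MACHINES = 4  # hypothetical cluster size

def machine_for(key: str, num_machines: int = NUM_MACHINES) -> int:
    """Map a record key to a machine index in [0, num_machines)."""
    # A stable hash gives the same answer on every computer and every run.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_machines

keys = ["student:1", "student:2", "course:LBSC 690"]
placement = {k: machine_for(k) for k in keys}
print(placement)
```

Any computer can find a record by recomputing the hash, with no central directory. The trade-off is that changing the number of machines moves most keys, so larger systems tend to use more careful schemes.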
![Page 84: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/84.jpg)
Database Transactions
Transaction = sequence of database actions grouped together
  - e.g., transfer $500 from checking to savings
ACID properties:
  - Atomicity: all-or-nothing
  - Consistency: each transaction must take the DB between consistent states
  - Isolation: concurrent transactions must appear to run in isolation
  - Durability: results of transactions must survive even if systems crash
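The $500 transfer makes a convenient sketch of atomicity and consistency. The example uses Python's sqlite3 module; the account table, the starting balances, and the rule that balances can't go negative are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (
    name    TEXT PRIMARY KEY,
    balance INTEGER NOT NULL CHECK (balance >= 0)  -- consistency rule
);
INSERT INTO account VALUES ('checking', 800), ('savings', 100);
""")

def transfer(amount, source, target):
    """Move money in one transaction; return True if it committed."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, target))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, source))
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))

first = transfer(500, "checking", "savings")   # checking 800 -> 300, commits
second = transfer(500, "checking", "savings")  # checking would go negative, rolled back
print(first, second, balances())
```

In the second transfer the credit to savings succeeds before the debit fails. Because both updates belong to one transaction, the credit is undone as well: all or nothing.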
![Page 85: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/85.jpg)
Making Transactions
Idea: keep a log (history) of all actions carried out while executing transactions
  - Before a change is made to the database, the corresponding log entry is forced to a safe location
Recovering from a crash:
  - Effects of partially executed transactions are undone
  - Effects of committed transactions are redone
  - Trickier than it sounds!
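A toy sketch of the undo/redo idea, in plain Python. It is far simpler than what a real database does. The log record layout, the `recover` function, and the sample values are all hypothetical, and the sketch assumes that the whole log survived the crash and that no two unfinished transactions touched the same key.

```python
def recover(disk, log):
    """Return the database state after crash recovery.

    disk: dict of values found on disk after the crash (may be partly stale).
    log:  list of ("update", txn, key, old, new) and ("commit", txn) records,
          each written before the corresponding change was applied.
    """
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    db = dict(disk)
    # Undo: walk the log backwards, restoring the old values written by
    # transactions that never committed.
    for rec in reversed(log):
        if rec[0] == "update" and rec[1] not in committed:
            _, _, key, old, _ = rec
            db[key] = old
    # Redo: walk the log forwards, reapplying the new values written by
    # committed transactions.
    for rec in log:
        if rec[0] == "update" and rec[1] in committed:
            _, _, key, _, new = rec
            db[key] = new
    return db

log = [
    ("update", "T1", "checking", 800, 300),
    ("update", "T1", "savings", 100, 600),
    ("commit", "T1"),
    ("update", "T2", "checking", 300, 0),  # T2 never committed
]
# After the crash, T1's savings update had not reached the disk yet,
# but T2's checking update had.
disk = {"checking": 0, "savings": 100}
recovered = recover(disk, log)
print(recovered)
```

Because every change was logged before it was applied, recovery does not need to know which changes reached the disk; it can rebuild a consistent state from the log alone.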
![Page 86: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/86.jpg)
Discussion Question
RideFinder
Design a database to match drivers with passengers (e.g., for road trips)
Drivers post available seats; they want to know about interested passengers
Passengers call up looking for rides: they want to know about available rides (they don’t get to post “rides wanted” ads)
These things happen in no particular order
![Page 87: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/87.jpg)
Discussion Goals
Design the tables you will need
  - First decide what information you need to keep track of
  - Then design tables to capture this information
Design queries (using join, project, and restrict)
  - What happens when a passenger comes looking for a ride?
  - What happens when a driver comes to find out who his passengers are?
Role play!
![Page 88: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/88.jpg)
Exercise solution: tables
Ride: Ride ID, Driver ID, Origin, Destination, Departure Time, Arrival Time, Available Seats
Passenger: Passenger ID, Name, Address, Phone Number
Driver: Driver ID, Name, Address, Phone Number
Booking: Ride ID, Passenger ID
![Page 89: Representing and Storing Structured Data](https://reader036.vdocuments.mx/reader036/viewer/2022070121/62bd2f6385ccf00c5f4eaf86/html5/thumbnails/89.jpg)
Exercise solution: queries
Passenger calls: Can I get a ride?
  - Join: Ride, Driver
  - Project: Departure Time, Name, Phone Number
  - Restrict: Origin, Destination, Available Seats > 0
Driver calls: Who are my passengers?
  - Join: Ride, Passenger, Booking
  - Project: Name, Phone Number
  - Restrict: (Driver) Name, Origin, Destination, Departure Time
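The two queries can be sketched against the four tables of the exercise solution using Python's sqlite3 module. The column types, the snake_case column names, and all sample rows are assumptions for illustration. The driver query here restricts on Driver ID so that it needs only the three tables listed above; restricting on the driver's name would also require joining the Driver table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE driver    (driver_id INTEGER PRIMARY KEY, name TEXT, address TEXT, phone TEXT);
CREATE TABLE passenger (passenger_id INTEGER PRIMARY KEY, name TEXT, address TEXT, phone TEXT);
CREATE TABLE ride (
    ride_id INTEGER PRIMARY KEY,
    driver_id INTEGER REFERENCES driver(driver_id),
    origin TEXT, destination TEXT,
    departure_time TEXT, arrival_time TEXT,
    available_seats INTEGER
);
CREATE TABLE booking (
    ride_id INTEGER REFERENCES ride(ride_id),
    passenger_id INTEGER REFERENCES passenger(passenger_id)
);
-- Made-up sample rows
INSERT INTO driver VALUES (1, 'Dana', '1 Main St', '555-0100');
INSERT INTO passenger VALUES (10, 'Pat', '2 Elm St', '555-0101');
INSERT INTO ride VALUES
    (100, 1, 'College Park', 'New York', '2012-10-19 09:00', '2012-10-19 13:00', 2),
    (101, 1, 'College Park', 'New York', '2012-10-20 09:00', '2012-10-20 13:00', 0);
INSERT INTO booking VALUES (100, 10);
""")

# Passenger calls: can I get a ride?
available = conn.execute("""
    SELECT ride.departure_time, driver.name, driver.phone            -- project
    FROM ride JOIN driver ON ride.driver_id = driver.driver_id       -- join
    WHERE ride.origin = ? AND ride.destination = ?
      AND ride.available_seats > 0                                   -- restrict
""", ("College Park", "New York")).fetchall()

# Driver calls: who are my passengers?
passengers = conn.execute("""
    SELECT passenger.name, passenger.phone                           -- project
    FROM ride
    JOIN booking   ON booking.ride_id = ride.ride_id                 -- join
    JOIN passenger ON passenger.passenger_id = booking.passenger_id
    WHERE ride.driver_id = ? AND ride.departure_time = ?             -- restrict
""", (1, "2012-10-19 09:00")).fetchall()

print(available)
print(passengers)
```

The ride with no available seats is filtered out of the passenger's answer, and the Booking table is what links a ride to the people on it.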