xml – a data sharing standard dsc340 mike pangburn

20
XML – a data sharing standard DSC340 Mike Pangburn

Upload: catherine-golden

Post on 23-Dec-2015

223 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: XML – a data sharing standard DSC340 Mike Pangburn

XML – a data sharing standard

DSC340

Mike Pangburn

Page 2: XML – a data sharing standard DSC340 Mike Pangburn

Data-sharing challenge

Companies need to “talk” across different locations/systems and software

Most database systems (and other software applications, e.g., Excel) can read text files formatted as XML.

Page 3: XML – a data sharing standard DSC340 Mike Pangburn

Blind Men and Elephants

Page 4: XML – a data sharing standard DSC340 Mike Pangburn

The ol’ standby: tab (or comma) delimited data…

Who: authored it? to contact about data?

What: are contents of database?

When: was it collected? processed? finalized? Where: was the study done?

Why: was the data collected?

How: were data collected? processed? Verified?

… can be pretty useless!

Page 5: XML – a data sharing standard DSC340 Mike Pangburn

Metadata

Literally, “data about data” a set of data that describes and gives

information about other data

― Oxford English Dictionary

Page 6: XML – a data sharing standard DSC340 Mike Pangburn

Early Example of Metadata

Page 7: XML – a data sharing standard DSC340 Mike Pangburn

Is HTML a good way to share data?

<h1> Bibliography </h1>

<p> <i> Foundations of Databases </i>

Abiteboul, Hull, Vianu

<br> Addison Wesley, 1995

<p> <i> Data on the Web </i>

Abiteoul, Buneman, Suciu

<br> Morgan Kaufmann, 1999

Page 8: XML – a data sharing standard DSC340 Mike Pangburn

XML: Tags provide meaning (metadata)

<bibliography>

<book> <title> Foundations… </title>

<author> Abiteboul </author>

<author> Hull </author>

<author> Vianu </author>

<publisher> Addison Wesley </publisher>

<year> 1995 </year>

</book>

</bibliography>

Page 9: XML – a data sharing standard DSC340 Mike Pangburn

XML vs. HTML

HTML<h1> DELTA </h1><h2> 101 </h2> <h3> Atlanta </h3><h3> Brussels </h3><p> flight information </p>

XML<airline>DELTA </airline> <number>101 </number><from> Atlanta </from><to> Brussels </to> <description> flight information </description>

Page 10: XML – a data sharing standard DSC340 Mike Pangburn

Another example: iTunes Library is XML<dict>

<key>Track ID</key><integer>617</integer>

<key>Name</key><string>Take Five</string>

<key>Artist</key><string>Dave Brubeck Quartet</string>

<key>Album</key><string>Time Out</string>

<key>Genre</key><string>Jazz</string>

<key>Kind</key><string>AAC audio file</string>

<key>Size</key><integer>5892093</integer>

<key>Total Time</key><integer>363578</integer>

….

</dict>

Page 11: XML – a data sharing standard DSC340 Mike Pangburn

Acct./Fin XML example

XBRL (eXtensible Business Reporting Language) is a language for the electronic communication of business and financial data.

For example, company net profit has its own unique tag.

Edgar Online (SEC company information) uses XBRL The solution will allow users to request XBRL formatted

data for all US equities from within Microsoft Excel, custom templates, and the web.

Page 12: XML – a data sharing standard DSC340 Mike Pangburn

HTML vs. XML

HTML started with very few tags, but… HTML now has many tags, and more keep being added

over time Messy, yet not customizable

XML has very few standard tags You add custom tags that are particular to your data

needs

Page 13: XML – a data sharing standard DSC340 Mike Pangburn

XML helps solves the “Is this an elephant?” problem

Page 14: XML – a data sharing standard DSC340 Mike Pangburn

The following code is legal in HTML:<p>This is a paragraph<p>This is another paragraph

In XML all elements must have a closing tag like this:

<p>This is a paragraph</p><p>This is another paragraph</p>

Opening and closing tags must have the same case:

<message>This is correct</message><Message>This is incorrect</message>

More strict tagging rules than HTML

Page 15: XML – a data sharing standard DSC340 Mike Pangburn

In HTML, improperly nested tags like the following are frowned upon, but will not cause an error:

<b><i>This text is bold and italic</b></i>

In XML all elements must be properly nested within each other like this

<b><i>This text is bold and italic</i></b>

More strict tagging rules than HTML

Page 16: XML – a data sharing standard DSC340 Mike Pangburn

Visualizing XML data: a “Tree”

<data><person id=“o555” >

<name> Mary </name><address>

<street> Maple </street> <no> 345 </no> <city> Seattle </city>

</address></person><person>

<name> John </name><address> Thailand </address><phone> 23456 </phone>

</person></data>

data

Mary

person

person

name address

name address

street no city

Maple 345 Seattle

JohnThai

phone

23456

id

o555

Elementnode

Valuenode

Attributenode

Page 17: XML – a data sharing standard DSC340 Mike Pangburn

Database data vs. XML Data

<persons> <row> <name>John</name> <phone> 3634</phone></row> <row> <name>Sue</name> <phone> 6343</phone> <row> <name>Dick</name> <phone>

6363</phone></row>

</persons>

row row row

name name namephone phone phone

“John” 3634 “Sue” “Dick”6343 6363

XML:persons

Database table:

Page 18: XML – a data sharing standard DSC340 Mike Pangburn

XML data can usually fit a DBMS table…

Consider the case of: missing attributes

An acceptable fit withtable Even though blanks are

deemed a bit undesirable in database tables

<person> <name> John</name> <phone>1234</phone> </person>

<person> <name>Joe</name></person>

no phone !

name phone

John 1234

Joe -

Page 19: XML – a data sharing standard DSC340 Mike Pangburn

…but the fit is not always good

Problematic case: Repeated attributes

A poor fit with table, because database-design rule is that you should not have two columns with the same name:

<person> <name> Mary</name> <phone>2345</phone> <phone>3456</phone></person>

two phones !

name phone phone

Mary 2345 3456

If XML data haddistinct “home” and “cell” columns, the issue goes away

Page 20: XML – a data sharing standard DSC340 Mike Pangburn

Formatting (visualizing) XML data

Formatting (visualizing in a web page) XML data requires a separate file

What is that separate file? It’s called a “style sheet”

There are a couple different standards for style sheets XML-specific standard: XSL Flexible standard: CSS

CSS can be used with both .xml and .html files We will look at CSS after XML