complex data and xml

30
1 Complex Data and XML

Upload: rhett

Post on 12-Jan-2016

37 views

Category:

Documents


0 download

DESCRIPTION

Complex Data and XML. Week 10 Review. Trees and the places where tree structures are used in computing and knowledge representation Nomenclature in trees Tree-based – root, node, branch, leaf Family-based – parent, child, sibling The Document Object Model as a tree DOM access in Javascript - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Complex Data and XML

1

Complex Data and XML

Page 2: Complex Data and XML

2

Week 10 Review

• Trees and the places where tree structures are used in computing and knowledge representation

• Nomenclature in trees– Tree-based – root, node, branch, leaf– Family-based – parent, child, sibling

• The Document Object Model as a tree– DOM access in Javascript

• An example of DOM manipulation – DHTML – accessing and updating the DOM with Javascript– AJAX -– server communication from JavaScript

Page 3: Complex Data and XML

3

Tree and questions

• Look at the tree in the diagram

• Answer the following questions:– Name the root of the tree– Name the leaves of the tree– Using path notation in Windows or Unix

• What is the location of J relative to the root?• What is the location of the root if you are at J?• What is the location of K if you are at J?

Page 4: Complex Data and XML

4

A

L

D K NM

JGB

R C

Page 5: Complex Data and XML

5

The rationale for XML

• Complex data– StudentsOnline– Modules and Courses

• Export and import – ‘self-documenting’ data– Flickr– TransXChange

• XML-based tools– XForms

Page 6: Complex Data and XML

6

Year 1 Oracle example

• Students on year 1 Database module modelled a much bigger data model concerning modules, students, staff and courses.

• Oracle SQL DDL was reverse-engineered to QSEE, then the code regenerated to MySQL, installed and executed.

• Query interface using PHP

Page 7: Complex Data and XML

7

oracle

departmentdeptName:VARCHARphone:VARCHARfaxno:VARCHARlocation:VARCHARmgr_startDate:DATE

staffstaffNo:NUMERICfName:VARCHARlName:VARCHARaddress:VARCHARphone:VARCHARofficeNo:VARCHARsex:VARCHARsalary:NUMERICpost:VARCHARcomputerId:VARCHAR

worksin

mgr

coursecCode:VARCHARtitle:VARCHARduration:VARCHAR

leader

modulemCode:VARCHARtitle:VARCHARstartDate:DATEendDate:DATEcoursework:VARCHARexam:VARCHAR

studentmatricNo:NUMERICfName:VARCHARlName:VARCHARtown:VARCHARstreet:VARCHARpostcode:VARCHARdob:DATEsex:VARCHARloan:NUMERIC

teacheshours:NUMERIC

undertakeperformance:VARCHAR

nextOfKinname:VARCHARphone:VARCHARrelationship:VARCHAR

texttext:VARCHAR

qualificationqualification:VARCHAR

Foreign key

identifies

Weak entity

Page 8: Complex Data and XML

8

Lessons

• Problems with data types and data formats

• ER model as Conceptual Model

• Foreign keys in QSEE

• Many of the entities are ‘weak’

Page 9: Complex Data and XML

9

Data types

• Data types differ between the two databases – QSEE uses a more general set of datatypes and then outputs DDL dependant on the target.– Data typing is a big problem in databases

• Data types too small for the data stored, so data might be lost

• Date formats differ 2006-12-08 in MySQL, 08-dec-2006 in Oracle

Page 10: Complex Data and XML

10

Conceptual ER model• No foreign keys in the ER model.

– foreign keys are the realisation (implementation) of the relationships defined in the ER Model in a RDBMS

• Case tool can then– removes redundancy - there is no point in defining both the

relationship AND the foreign keys [one thing in one place]– it emphasises the conceptual nature of the model, which may be

implemented in other ways– the data type of the primary key and all foreign keys must be the

same - but you may need to change it so better to change it in one place and regenerate the schema

– all the necessary integrity constraints are also generated

Page 11: Complex Data and XML

11

Foreign Keys in QSEE

• The bar across a relationship indicates that the foreign key is to be part of the primary key. – As well as creating a foreign key, the primary key constraint will be

generated appropriately. • A dotted boundary indicates that the whole primary key is made up

of foreign keys• Any entity in which a foreign key is part of the primary key is 'weak' -

it depends for its existence on another entity.• Foreign key naming

– default - fk1_ , fk2_ prefix– My policy is to

• remove the prefix globally• set the prefix on specific relationships where required (manager, leader)

Page 12: Complex Data and XML

12

Foreign keys in RDBMS

• In MySQL, action depends on which file model is being used– ISAM (Index Sequential access Method) does not

enforce foreign key integrity– INNODB does

• Loading data is a problem– Need to load data in the right order to prevent foreign

key errors – roughly parent before child– Not possible if any cycles, so RDBMS allows it to be

turned off for the loading– Special programs for bulk loading from files such as

CSV– easier than using INSERT statements

Page 13: Complex Data and XML

13

Questions about the model

• What foreign keys would be added to the Course table in an RDBMS? To the Teaches table?

• Explain the optionality on the mgr relationship between staff and department? Is it on the right end?

• Which two tables would be impossible to put in order to ensure foreign key constraints are not violated?

Page 14: Complex Data and XML

14

Non-first normal form

• First normal form specifies that all fields must be atomic – single valued

• Look at the qualification table. The bar indicates that it is dependant on staff. If repeated fields were allowed in tables then we could simply add:

qualifications : set Varchar• to the staff table

• We would need to be able to search for individual qualifications

• Tags in the photo album site are a similar problem.• Which other table could be removed if non-first normal

form tables were allowed?• MySQL has sets and operators to add, remove and

search (but these all add complexity)

Page 15: Complex Data and XML

15

Harder cases

• What additional structure would be required to get rid of the nextOfKin table?

• How could we get rid of the ‘teaches’ and ‘undertakes’ tables? How would this data be handled in an Object-oriented model?

• If all the weak entities were removed, how much would this simplify the model?

Page 16: Complex Data and XML

16

• NextOfKin– Requires a sub-record of name, phone and

relationship which can be repeated

• Teaches– Decide which side this data belongs to and

add a set of sub-records

• Which side would you place the data from teaches and undertakes?

Page 17: Complex Data and XML

17

Complex data

• Real data is much more complex than this• ISIS the student record system has over 200

tables, many them like the 5 weak tables in the simpler model – so the same ratio might be assumed to apply

• Consider an actual module specification • Note the number of repeated sub-records?• How many tables would be required to model

this data?

Page 18: Complex Data and XML

18

Communication• Applications which stand alone and cannot interchange

data with other applications are called ‘silos’• An RBMS must export data as files of INSERT

statements or as a comma separated variable file• In both cases, you can’t make any sense of the data

without knowing the Database schema

INSERT INTO teaches VALUES ( 11, 'UFCEMQ-20-1', 5);INSERT INTO teaches VALUES ( 11, 'UFEEHV-20-1', 5);INSERT INTO teaches VALUES ( 13, 'UFQEGC-20-1', 6);INSERT INTO teaches VALUES ( 8, 'UFCEKN-20-1', 10);

1, 'David', 'Johnson', '109 Mount View Road, Kingswood, Bristol BS15 8UB', '0117 9676505', '3A31', 'Male', 32000, 'Professor', 'dpjohnso', 'Applied Sciences'2, 'Carrie', 'Ford', '15 Heron Way, Chipping Sodbury, Bristol BS37 6NT', '01454 854089', '1C29', 'Female', 29000, 'Senior Lecturer', 'caford', 'Bristol Business School'

Page 19: Complex Data and XML

19

'BUWE B80 GF78', 'PgDIT', '3 Years', 13, 'CIS''BUWE B80 M221', 'Commercial Law', '3 Years', 7, 'Law''BUWE B80 B230', 'Pharmaceutical Sciences', '4 Years', 1, 'Applied Sciences''BUWE B80 G500', 'Business Decision Analysis', '4 Years', 9, 'Bristol Business School''BUWE B80 X300', 'Education Studies', '3 Years', 5, 'Education''BUWE B80 G451', 'Multimedia Computing', '4 Years', 10, 'Computing, Engineering and Mathematical Sciences''BUWE B80 L300', 'Sociology', '3 Years', 6, 'Humanities, Languages and Social Sciences'

CREATE TABLE course(cCode VARCHAR(30) NOT NULL,title VARCHAR(50),duration VARCHAR(20),deptName VARCHAR(50) NOT NULL,leader_staffNo NUMERIC(4,0) NOT NULL,

-- Specify the PRIMARY KEY constraint for table "course".-- This indicates which attribute(s) uniquely identify each row of data.CONSTRAINT pk_course PRIMARY KEY (cCode)

) TYPE=INNODB;

Page 20: Complex Data and XML

20

XML

• XML is tagged data following some simple rules on the formation and relationship between tags – ‘well-formed’

• XML ‘documents’ are trees and thus support non-first normal form data

• The tags in the document provide –some- information about the meaning of the data

• The tags and the common tree structure allow general tools to do much of the processing without a separate schema.

Page 21: Complex Data and XML

21

<?xml version="1.0" encoding="UTF-8"?><StaffList> <Staff staffNo="1"> <fName>David</fName> <lName>Johnson</lName> <address>109 Mount View Road, Kingswood, Bristol BS15 8UB</address> <phone>0117 9676505</phone> <officeNo>3A31</officeNo> <sex>Male</sex> <salary>32000</salary> <post>Professor</post> <computerid>dpjohnso</computerid> <department>Applied Sciences</department> <teaching> <teaches module="UGDFNP-20-1" hours="6"/> <teaches module="UGDF58-20-1" hours="7"/> </teaching> <qualification>PhD</qualification> </Staff></StaffList>

Staff as XML Optional processing instruction

Root node

element

Element with children

attribute

value

Page 22: Complex Data and XML

22

Processes required for XML

• Store XML (native XML database)

• Query XML (XQuery)

• Transform XML into other XML, RDF or text (XQuery, XSLT)

• Transform structured text to XML (?)

• Define deeper semantics for an XML file (XSchema, RelaxNG)

• User Interface (XForms)

Page 23: Complex Data and XML

23

Applications of XML

• XHTML – HTML as well-formed XML which conforms to standard

• SVG – Structured Vector Graphics• kml – GoogleEarth overlay files• XForms – UI definition• Flickr API responses• RSS news syndication• ….

Page 24: Complex Data and XML

24

Well-formed XML

• ‘Well-formed’ – absolute property of an XML document – must obey rules for well-formedness

• Check by – Opening in Word 2003– Submitting to an online validator– Check using an XML tool

• XML Spy• Stylus Studio• Oxygen

• Check out the rules in the Wikipedia entry

Page 25: Complex Data and XML

25

Well-formed XML

Every XML document must be well-formed and must therefore adhere to the following rules (among others):

1. Every start-tag must have a matching end tag.

2. Elements may nest but must not overlap. <name>Anna<em>Coffey</em></name> - √ <name><em>Anna</name>Coffey</em> - ×

3. There must be exactly one root element.

4. Attribute values must be quoted.

5. An element must not be quoted.

6. Comments and processing instructions may not appear inside tags.

7. No unescaped < or & signs may occur in the character data of an element.

Page 26: Complex Data and XML

26

Well-formed XML documents

Element names are case sensitive - <NAME>, <name>, <Name> & <NaMe> are four different element types.

No white spaces in element name - <First Name> not allowed; <First_Name> OK.

Element names cannot start with the letters “XML” or “xml” – reserved terms.

Element names must start with a letter or a underscore. Element names cannot start with a number but numbers may be embedded within an element name - <2you> not allowed; <me2you> is OK.

Attribute names are constrained by the above rules for element names.

Entity references are used to substitute specific characters. There are five predefined entities built into XML:

Entity Char Notes

&amp; & Do not use inside processing instructions

&lt; < Use inside attribute values quoted with “.

&gt; > Use after ]] in normal text and inside processing instruction.

&quot; “ Use inside attribute values quoted with “.

&apos; ‘ Use inside attribute values quoted with ‘.

Page 27: Complex Data and XML

27

Spot the mistakes

<?xml version="1.0" encoding="UTF-8"><StaffList> <Staff staffNo="1“ temporary> <fName>Davd</fName> <lName>Johnson</lname> <address>109 Mount View Road, Kingswood, Bristol BS15 8UB</addess> <phone>0117 9676505</phone> <officeNo>3A31</officeNo> <sex=Male/> <salary>32000</SALARY> <POST>Profesor</POST> <computerid>dpjohnso</computerid> <department>Applied Sciences</department> <teaching> <teaches module=UGDFNP-20-1 hours="6"/> <teaches module="UGDF58-20-1" hours="7"> </teaching> <qualification>PhD</qualification></StaffList>

Page 28: Complex Data and XML

28

‘Valid’

• Validity is relative - valid with respect to a specified schema which defines:– The sequence and nesting of elements– The allowable attributes of an element– Whether elements are optional or repeated– The type of data in an element or attribute

• Different languages for expressing these rules– DTD – Document Type Definition– XML Schema (xsd) – itself an XML vocabulary– Relax NG– Schematron – permissive

• Document can be linked to a specific schema– Too inflexible

Page 29: Complex Data and XML

29

Tutorial• Review the exercises from this lecture• Take the module specification and work out how many tables would

be required to hold this data• Take one of the TransXChange files and try to develop a data

model. – Use a browser to view the file, don’t try to print it out. – Use ‘find’ to follow references.

• Later– Read the wikipedia article on XML– Work through the w3.schools tutorial– XML dialects have developed in many fields of knowledge. Investigate

the XML dialects which have been created in one particular area of interest e.g. Multimedia, Family History

Page 30: Complex Data and XML

30

Next Week

• Normalisation Exercise

• Processing Language

• Prize for the best Photo Album as judged by you and us.