
Native XML Database for Information Systems

Chris Wallace
SMRG Seminar

Feb 2006


Exploring the design space

• “design as a conversation with the materials in the situation” (Schön)

• Native XML database (NXD)
  – Storing, querying and updating XML documents without mapping into relations
  – Schema-free
  – Trees are to NXD what tables are to RDBMS
  – Tables are trees

• Information Systems
  – Focus on semi-structured data (mixture of simple data items, text and complex nested structures)
  – Searching, derived data, visualisation
  – Process support
  – Large problem space variously supported by spreadsheets, Word documents, ad-hoc databases and, increasingly, web-integrated data
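The claim that "tables are trees" can be illustrated with a minimal Python sketch using the standard library's ElementTree. A table becomes a root element, each row a child and each cell a grandchild. The helper name `table_to_tree`, the column names and the values are hypothetical; none of them come from the FOLD.

```python
import xml.etree.ElementTree as ET

def table_to_tree(table_name, row_name, columns, rows):
    """Build an XML tree with one child element per row and one grandchild per cell."""
    root = ET.Element(table_name)
    for row in rows:
        row_el = ET.SubElement(root, row_name)
        for column, value in zip(columns, row):
            ET.SubElement(row_el, column).text = str(value)
    return root

# Hypothetical table: column names and values are illustrative only.
tree = table_to_tree(
    "rooms", "room",
    ["code", "capacity"],
    [("3P14", 30), ("2Q50", 24)],
)
xml_text = ET.tostring(tree, encoding="unicode")
```

The reverse mapping does not hold in general: a tree with nested or repeating children has no single flat table, which is where the NXD design questions on the later slides come from.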


eXist Native XML Database

• Open source Java
• European team of developers led by Wolfgang Meier
• Documents (files) are organised in collections (folders) in a file store
  – XML documents stored in an efficient B+ tree structure with indexes
  – Non-XML resources (XQuery, CSS, JPEG, etc.) can be stored as binary
• Deployable in different ways
  – Embedded in a Java application
  – Part of a Cocoon pipeline
  – As a web application in Apache/Tomcat
  – With embedded Jetty HTTP server (as on stocks)
• Multiple interfaces
  – REST – to Java servlet
  – SOAP
  – XML-RPC


NXD case studies

• FOLD
  – modules, programmes, scheme operations, staff, organisational structures, events

• Family photos and history
  – Integration of meta-data on family photos with family history (births, deaths and marriages)

• ISD3 Assignment – a web-based calculator
  – e.g. a currency converter


Research Work

• Development of the FOLD (Faculty OnLine Data) - a pilot project for UWE

• Teaching students and staff in XML languages (XML Schema, XSLT, XQuery) and NXD database design

• Links with other eXist projects
• SPA2006 Workshop on NXD
• XML Prague (eXist)


Research Areas

• Design practice for NXD
  – ‘Pattern language’ to help map from conceptual model to multiple XML schemes
  – Identifier design
  – Structuring documents by responsibility and versions

• NXD in organisational use
  – Social effects of distributed responsibility
  – Visualisation of complex relationships
  – Handling integrity problems – accept inconsistency as a way of life
  – Management of veracity


The FOLD

• Faculty OnLine Data
• Technologies
  – eXist
  – (Java) – not yet
  – XQuery
  – XSLT
  – CSS
  – PHP – to be eliminated


The FOLD (2)

• Scope
  – Module and Programme specifications
  – Modular Schema operations (runs)
  – Staff
  – Organisational structure
  – Events

• Functionality
  – Highly linked
  – (Integrating UWE sources)
  – (Personalized Interface)


FOLD - Modules and Programmes

[UML class diagram: The FOLD. Classes, attributes and association role names recoverable from the diagram:]

• Module
  – moduleCode : String
• Module Specification
  – version : Year
  – faculty : Faculty
  – field : Field
  – title : String
  – credits : CreditsType
  – level : LevelType
  – syllabus : RestrictedHTML
  – readingStrategy : RestrictedHTML
• ProgrammeStructure
  – version : Year
• Programme
  – programmeCode : String
  – ucasCode : String [0..1]
• Stage
• OptionGroup
  – id : String
  – comment : String [0..1]
  – minCredits : int
  – maxCredits : int
• Core
• Option
• Module Combination
  – comment : String
• Learning Outcome
  – assessed in Comp A : boolean
  – assessed in Comp B : boolean
  – specification : RestrictedHTML
  – outcomeType : Learning Outcome
• Reading item
• Book
  – authors : String
  – title : String
  – year : String
  – source : String
• WebSite
  – url : URL
  – text : String

• Association role names: definition, structure, core, optional, pre-requisite, co-requisite, expression, Excluded
• Multiplicities shown on the associations: 1..1, 1..*, 0..1, 1..* {ordered}
• Note on “expression”: This is a boolean expression such as (m1 and m2 and (m4 or (m5 and m6)))
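The boolean pre-requisite expression noted on the diagram could be checked against the modules a student has passed. The Python sketch below is one hypothetical way to do that and is not taken from the FOLD. It assumes module codes contain no spaces or parentheses, that the operators are written `and` / `or`, and that `and` binds tighter than `or`.

```python
import re

def evaluate(expression, passed):
    """Evaluate a pre-requisite expression against the set of module codes passed."""
    tokens = re.findall(r"\(|\)|[^\s()]+", expression)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():
        value = parse_and()
        while peek() == "or":
            take()
            right = parse_and()
            value = value or right
        return value

    def parse_and():
        value = parse_atom()
        while peek() == "and":
            take()
            right = parse_atom()
            value = value and right
        return value

    def parse_atom():
        token = take()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return value
        # Any other token is treated as a module code.
        return token in passed

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: " + tokens[pos])
    return result

expr = "(m1 and m2 and (m4 or (m5 and m6)))"
```

Calling `evaluate(expr, {"m1", "m2", "m4"})` checks one student's record. Storing the expression as text inside the Module Combination keeps the complex side of the relationship in one place.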


FOLD Design Issues

• Conceptual Modelling
• Conceptual – Logical – Physical mapping
• Identifiers
• Relationships and links
• Versioning
• Editing
• Views
• Responsibilities
• Processes


Mapping from Conceptual model to the Logical and Physical layers

• What criteria to use in breaking up the whole model into
  – Logical
    • Entity – a logical compound structure
  – Physical
    • Documents – a physical aggregation of entity instances
    • Collections – a physical aggregation of documents

• Examples
  – Module Specification [moduleCode]
    • Module Spec is an Entity
    • Each Module Spec is a Document
  – Module Run [moduleCode/year/runNo]
    • Module Run is an Entity
    • Set of Module Runs for a Field is a Document

• Issues
  – Where to develop Schemas?
  – No logical data in the physical – purely for convenience


Conceptual Modelling

• Conventional normalised data model
• Generality issue, e.g. Module Run
  – Roles as Attributes
    • <ModuleLeader>Stewart Green</ModuleLeader>
  – Roles as Entities
    • <role><title>Module Leader</title><person>Stewart Green</person></role>
  – Entities enable meta-data, but defeat use of tables for data entry
    • Need views

• Attributes v elements – a Conceptual/logical mapping issue
  – <Module code="UFIEKG-20-3" level="3">…
  – <Module><ModuleCode>UFIEKG-20-3</ModuleCode>..
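The generality trade-off between the two role encodings can be seen in how each is queried. In this Python sketch, the wrapping `ModuleRun` element and the two helper functions are assumptions made for illustration; only the two role fragments come from the slide.

```python
import xml.etree.ElementTree as ET

# Roles as attributes (in the conceptual sense): one element name per role.
run_a = ET.fromstring(
    "<ModuleRun><ModuleLeader>Stewart Green</ModuleLeader></ModuleRun>"
)

# Roles as entities: one generic <role> structure, with the role name held as data.
run_b = ET.fromstring(
    "<ModuleRun>"
    "<role><title>Module Leader</title><person>Stewart Green</person></role>"
    "</ModuleRun>"
)

def roles_as_attributes(run, known_roles):
    # The query must know every role element name in advance.
    return {name: run.findtext(name) for name in known_roles
            if run.find(name) is not None}

def roles_as_entities(run):
    # The query is generic: a new role needs no change to the code.
    return {r.findtext("title"): r.findtext("person") for r in run.findall("role")}

leaders_a = roles_as_attributes(run_a, ["ModuleLeader", "InternalModerator"])
leaders_b = roles_as_entities(run_b)
```

The first form maps directly onto a spreadsheet column per role. The second does not, which is why the slide concludes that views are needed.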


Conceptual Modelling Tools

• UML class model closest to suitable conceptual model
  – Allows multi-valued attributes
  – Distinguished relationship kinds
    • Composition
    • Bi-directional associations
    • Uni-directional associations (for multiplicity resolution)
  – QSEE/Rose
    • No identifiers (primary keys) ??
    • No indication of mapping to attributes or elements
    • No mapping into Entities
    • No mapping into Documents and Collections


Identifiers

• Principle adopted – use naturally occurring identifiers wherever possible
  – Persons : “Ian Beeson”
  – Rooms : “3P14”

• Plus
  – Reduces gap between real-world (RW) domain and system
  – Names in minutes of meetings, on spreadsheets are readable

• Minus
  – Duplicates
    • Duplicates not tolerable in the RW either, resolved through RW negotiation within a RW namespace e.g. the Faculty
    • Mergers generate duplicates
  – Aliases
  – Not all entities have unique identifiers
    • Programmes – ISIS Primary Award and UCAS are candidates but don’t work

• ?
  – All names need namespace – “Ian Beeson” at CEMS at UWE
  – Need to replace multiple naming conventions with a single naming scheme (e.g. initials)
  – URNs and semantic web


Alias handling

– Problem handling aliases in staff data
  • Currently a person can have multiple names
    – first is the prime
  • Better is a separate alias table
    – Look up the base table
    – If not found, try the alias table
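The two-step lookup can be sketched in Python as follows. The table contents, the alias spelling and the function name `find_person` are hypothetical, and in the FOLD both tables would be XML documents, not dictionaries.

```python
# Hypothetical staff data: canonical names in the base table,
# alternative spellings in a separate alias table.
base_table = {
    "Ian Beeson": {"note": "hypothetical staff record"},
}
alias_table = {
    "I. Beeson": "Ian Beeson",
}

def find_person(name):
    """Return (canonical_name, record), or None if the name is unknown."""
    # Step 1: look up the base table.
    if name in base_table:
        return name, base_table[name]
    # Step 2: if not found, try the alias table and follow it to the base table.
    canonical = alias_table.get(name)
    if canonical is not None and canonical in base_table:
        return canonical, base_table[canonical]
    return None
```

Keeping aliases in their own table means the base record holds exactly one prime name, and an alias that points at a missing person simply resolves to nothing.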


Relationships and Links

• Relationships need to be implemented
  – One – Many
    • RDBMS – primary key on the One side becomes foreign key on the Many side
    • NXD – choose which side on the basis of complexity and responsibility
      – Sequence (modules in a stage)
      – Complex (pre-requisite expression)
  – Many – Many
    • RDBMS – intersection table
    • NXD – as for one-many
    • or either side as appropriate – Groups and subgroups

• Issues
  – Referential integrity
    • RDBMS – ‘eager’
      – data not allowed in unless links OK, links maintained through updates
      – integrity failures transient, repair outside database
    • NXD – ‘lazy’
      – store the data and provide on-demand or on-trigger validation
      – integrity failures can be persisted (XLinkit) and repair is inside database
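The ‘lazy’ approach can be sketched as an on-demand check that reports dangling links as data instead of rejecting the update. In this Python sketch the element names, module codes and function names are all hypothetical; it shows the idea only and is neither XLinkit nor the FOLD implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: element names and codes are illustrative only.
modules = ET.fromstring(
    "<modules>"
    "<module><moduleCode>M1</moduleCode></module>"
    "<module><moduleCode>M2</moduleCode></module>"
    "</modules>"
)
stage = ET.fromstring(
    "<stage>"
    "<core><moduleCode>M1</moduleCode></core>"
    "<core><moduleCode>M9</moduleCode></core>"
    "</stage>"
)

def dangling_links(source, target):
    """Return module codes referenced in `source` that `target` does not define."""
    defined = {m.findtext("moduleCode") for m in target.findall("module")}
    referenced = [c.findtext("moduleCode") for c in source.findall("core")]
    return [code for code in referenced if code not in defined]

def integrity_report(source, target):
    # Failures are recorded as data, so the report itself could be stored
    # in the database and worked through later.
    report = ET.Element("integrityReport")
    for code in dangling_links(source, target):
        ET.SubElement(report, "missingModule").text = code
    return report

report = integrity_report(stage, modules)
```

Because the stage document was stored despite the bad link, the repair (adding the module or correcting the code) happens inside the database.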


Versioning

• Based on yearly cycle
  – Base Year set in user’s session
  – Default set in system config

• Two different approaches
  – Module Run, Coursework Elements..
    • Explicit version identifier
      – ModuleCode/Year/RunNo
      – Selection is explicit [Year = $year]
  – Module Specification, Programme Structure
    • Implicit version defined by sequence of versions


Implicit Versioning

[Figure: timeline of versions 2002, 2005 and 2007. For Year=2004 the latest version is 2002; for Year=2006 the latest version is 2005.]


Implicit Versioning

let $specPath := "/db/versionTest",
    $currentYear := "2005",
    $moduleCode := request:request-parameter("moduleCode", ""),
    $year := request:request-parameter("year", $currentYear),

    (: get the set of possible versions for this module :)
    $modspecs := collection($specPath)/moduleSpecification
                   [ModuleCode = $moduleCode]
                   [Version <= $year],

    (: select the version with the highest version number :)
    $modspec := $modspecs[Version = max($modspecs/Version)]
return $modspec
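For readers less familiar with XQuery, the same selection rule can be sketched in Python: keep the versions no later than the requested year, then take the highest. The function name `latest_version` is an assumption; the version years are the ones on the timeline slide.

```python
def latest_version(versions, year):
    """Return the highest version that is <= year, or None if there is none."""
    candidates = [v for v in versions if v <= year]
    return max(candidates) if candidates else None

# Version years taken from the timeline slide.
versions = [2002, 2005, 2007]
selected_2006 = latest_version(versions, 2006)
selected_2004 = latest_version(versions, 2004)
```

A year earlier than the first version yields no specification at all, a case the caller has to handle.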


Editing

• Table-structured document editing
  – Allows maintenance using familiar spreadsheet tools (Excel 2003)
  – Schema is induced by Excel
  – Accommodations
    • Multi-valued fields as concatenated values
      – XPath string-join() and tokenize() functions
      – Embedded separator problem (a name with ‘,’ as a legitimate character)
      – Defeats indexing
    • Optional elements increase table width
    • Formatting choices not maintained (e.g. Freeze-Window)

• Structured document editing
  – Allows maintenance with Word without a schema
    • With difficulty – no schema awareness
  – Use InfoPath to create desktop form based on schema
    • Need to redo if schema changes

• In-situ updates
  – With XQuery-generated forms and update
  – With XForms
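The embedded separator problem with concatenated multi-valued fields is easy to reproduce. The Python sketch below first shows the naive join and split losing a value boundary, then one possible accommodation using CSV-style quoting. The names in the list and the `pack` / `unpack` helpers are illustrative assumptions, not part of the FOLD.

```python
import csv
import io

# Illustrative values: the first name contains the separator character.
names = ["Green, Stewart", "Ian Beeson"]

# Naive approach: join on ',' and split again.
cell = ",".join(names)
naive = cell.split(",")  # the embedded comma produces an extra, spurious value

# One possible accommodation: quote values with the csv module so that
# an embedded separator survives the round trip.
def pack(values):
    buf = io.StringIO()
    csv.writer(buf).writerow(values)
    return buf.getvalue().rstrip("\r\n")

def unpack(packed):
    return next(csv.reader([packed]))

round_trip = unpack(pack(names))
```

Quoting fixes the round trip but not the indexing point on the slide: the individual values are still hidden inside one text node.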


Views

• Views arise from the need for de-normalisation
  – Coursework Element
    • As a simple element
      – Key : moduleCode/Year/runNo/elementNo
      – Data : due date
    • As a derived complex element
      – SuggestedHours (computed from Hours table)
      – Late date (computed from UWE calendar)
      – Weightings (extracted from relevant specification)
      – Module Leader (extracted from Module Run)

• Views as transient or materialised
• View definition
• View maintenance


declare function fold:courseworkElement($moduleCode, $year, $runNo, $elementNo) {
  let $mod := fold:moduleSpecification($moduleCode, $year),
      $run := fold:moduleRun($moduleCode, $year, $runNo),
      $elementRun := fold:elementRun($moduleCode, $year, $runNo, 'B', $elementNo),
      $elementSpec := $mod/Assessment/FirstAttempt/Components/ComponentB/Element[position() = $elementNo],
      $dueDate := $elementRun/DueDate,
      $returnDate := fold:workingDays($dueDate, 20),
      $componentWeight := $mod/Assessment/Weighting/ComponentWeightB,
      $weightInComponent := data($elementSpec/Weight),
      $weightInModule := round($weightInComponent * $componentWeight div 100),
      $load := fold:load($mod/Level),
      $hrs := round(data($mod/UWERating) div data($load/Credits) * $weightInModule div 100 * data($load/Hours))
  return
    <CourseworkElement>
      <ModuleCode>{$moduleCode}</ModuleCode>
      {$mod/Title}
      <RunNo>{$runNo}</RunNo>
      {$run/ModuleLeader}
      {$run/InternalModerator}
      {$run/ExternalExaminer}
      <Component>CW</Component>
      <ElementNo>{$elementNo}</ElementNo>
      {$elementSpec/Description}
      <SuggestedHours>{$hrs}</SuggestedHours>
      <WeightInComponent>{$weightInComponent}</WeightInComponent>
      <WeightInModule>{$weightInModule}</WeightInModule>
      <DueDate>{data($dueDate)}</DueDate>
      <ReturnDate>{data($returnDate)}</ReturnDate>
    </CourseworkElement>
};
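The two derived figures in the function above, the element's weight in the module and the suggested hours, are plain arithmetic. The Python sketch below mirrors the two formulas with invented input figures; the parameter names follow the XQuery variables, and the numbers are not real FOLD data.

```python
import math

def xquery_round(x):
    """Round half up, as XQuery's round() does (Python's round() rounds half to even)."""
    return math.floor(x + 0.5)

def weight_in_module(weight_in_component, component_weight):
    # Mirrors: round($weightInComponent * $componentWeight div 100)
    return xquery_round(weight_in_component * component_weight / 100)

def suggested_hours(uwe_rating, load_credits, load_hours, weight_in_module_pct):
    # Mirrors: round(UWERating div Credits * $weightInModule div 100 * Hours)
    return xquery_round(uwe_rating / load_credits * weight_in_module_pct / 100 * load_hours)

# Invented figures, for illustration only: an element worth 50% of a
# component that is itself worth 50% of the module.
w = weight_in_module(50, 50)
hrs = suggested_hours(20, 120, 1200, w)
```

Because the value is derived on demand from the specification and the load table, a change to either shows up in the view without any update to the coursework data.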


Process support

• Short term – Process support
  – Form generation
  – Linkage to process documentation

• Medium term – Process monitoring
  – Online capture of significant dates
    • Coursework hand-in date
    • Date exam sent to moderator
    • Date coursework returned to students
  – Derived information
    • Workload prediction based on coursework schedule and student numbers
    • Display of latest coursework returned and SMS message to students

• Long term – Process management
  – Workflow
  – Process enactment software


Short-term

• Session-based logins to personalise the interface and specify parameters (currentYear)
• Form generation as passive documents
  – Update through the form an obvious extension
• Extend operational data with date-based status
  – Date-returned-to-students
    • If set (work has been returned)
      – Date used to generate page of coursework recently returned
      – Date used to monitor conformance to target return date(!)
• Link forms to textual/graphical process description
  – Coursework from setting to field board
  – How to specialise a generic description?
    • By level
    • By module
    • By field
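Monitoring conformance to the target return date needs a working-day calculation. The Python sketch below is a hypothetical stand-in for that calculation, not the FOLD's own function. It assumes weekends and an optional set of holiday dates are skipped, and it takes the 20-day target from the `fold:workingDays($dueDate, 20)` call in the coursework element function.

```python
from datetime import date, timedelta

def add_working_days(start, days, holidays=frozenset()):
    """Return the date `days` working days after `start`, skipping weekends and holidays."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        # weekday() is 0..4 for Monday..Friday
        if current.weekday() < 5 and current not in holidays:
            remaining -= 1
    return current

def returned_on_time(due, returned, target_days=20, holidays=frozenset()):
    """True if work was returned on or before the target return date."""
    return returned <= add_working_days(due, target_days, holidays)

# Illustrative due date: Monday 6 February 2006.
target = add_working_days(date(2006, 2, 6), 20)
```

With the date-returned-to-students value captured online, `returned_on_time` could be evaluated per coursework element to produce the conformance page.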


Responsibilities

• Responsibility allocation
  – Admin / architect decision
  – Physical level design for responsibility
    • All Module Runs in a Field in one document
    • Modules and Programme Structures in Field Collections (within Year)
  – Group access rights
    • For IS Field - ISAdmin
      – Anne Moggridge
      – Peter Rawlings
      – Lilly Cooke
      – Tracey Davis

• Need for check-in check-out of documents
  – WebDAV (Web Folders)


Conclusion

• Slide from prototype to production
• Pluses and minuses of user enthusiasm
• Go for ‘low-hanging fruit’
• Pay attention to the learning process
  – XQuery, XSLT are non-trivial languages because deeply unlike Java/PHP
• Reflection forced by presentations and workshops