LORE Light Object Repository by Othman Chhoul CSC5370 Fall 2003

Post on 21-Dec-2015

LORE

Light Object Repository

by

Othman Chhoul

CSC5370 Fall 2003

Outline

Introduction

What is Lore?

History

Lore’s Forensic

Conclusion

Questions

Demo

Introduction

Limitations faced by traditional databases:

They force all data to adhere to an explicitly specified schema.

Data elements may change.

Structures may change along the execution path of an application.

Deciding on a fixed schema for irregular or unstable data is a headache.

SemiStructured Data

Semistructured data is widespread: “self-describing”, “schemaless”.

Examples:

Data from the web. The overall site structure may change often. It would be nice to be able to query a web site.

Data integrated from multiple, heterogeneous data sources. Information sources change, or new sources are added.

What is Lore?

Lore is a DBMS designed specifically for managing semistructured information, such as XML

It was among the pioneers in this domain.

History

Built, from scratch, by the DB Group at Stanford University, with research funding from DARPA, NASA and others.

Introduced in 1995 with the first version of its query language, Lorel, and with OEM as its data model.

A lightweight system, because it was designed for single-user, read-only access.

In 1999 it was changed to support XML.

Lore’s Forensic

Lore’s Data model

Lore’s Query Language

Lore’s General Architecture

When XML gets into action

OEM (Object Exchange Model)

Simple, self-describing, nested object model for semi structured data (XML???)

Data in this model can be thought of as a labeled directed graph

Vertices in the graph are objects. Each object has a unique object identifier (oid), such as &5.

Atomic objects have no outgoing edges and are of types such as int, real, string, gif, etc.

All other objects, which have outgoing edges, are called complex objects.

OEM (Summary)

An OEM object has:

Label: a character string, object aliases

OID: unique object identifier

Type: atomic (int, real, string) or complex

Value: if it is a complex object, a list of OIDs; if it is an atomic object, an atomic value of type int, real, string…
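
The OEM summary can be sketched in a few lines of Python. This is only an illustration, not Lore's storage format: the class name `OEMObject`, the helper `children`, and the sample oids are hypothetical, and the sketch assumes labels sit on the edges leading to an object, in line with the labeled directed graph view.

```python
from dataclasses import dataclass
from typing import Union

Atomic = Union[int, float, str]


@dataclass
class OEMObject:
    oid: str                    # unique identifier, written like "&5"
    value: Union[Atomic, list]  # atomic value, or list of (label, oid) edges

    @property
    def is_complex(self) -> bool:
        # Complex objects hold outgoing edges; atomic objects hold a value.
        return isinstance(self.value, list)


# Hypothetical sample database, keyed by oid.
db = {
    "&1": OEMObject("&1", [("Member", "&2"), ("Member", "&3")]),
    "&2": OEMObject("&2", [("Name", "&4"), ("Age", "&5")]),
    "&3": OEMObject("&3", [("Name", "&6")]),
    "&4": OEMObject("&4", "Smith"),
    "&5": OEMObject("&5", 46),
    "&6": OEMObject("&6", "Jones"),
}


def children(db, oid, label):
    """Return the oids reached from `oid` over edges carrying `label`."""
    obj = db[oid]
    if not obj.is_complex:
        return []  # atomic objects have no outgoing edges
    return [child for lbl, child in obj.value if lbl == label]
```

Calling `children(db, "&1", "Member")` walks the two Member edges of the root object, while any call on an atomic object returns an empty list.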

OEM (Example)

Lorel (Lore’s Query Language)

Lorel is an extension of OQL.

Lorel supports path expressions for traversing graph data.

A simple path expression is a name followed by a sequence of labels.

DBGroup.Member.Office: the set of objects that can be reached starting with the DBGroup object, following edges labeled Member and then Office.
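
A simple path expression can be evaluated by following one label at a time from a named object. The sketch below shows that idea on a small hypothetical graph; `EDGES`, `NAMES`, and `eval_path` are illustrative names, not part of Lore.

```python
# oid -> list of (label, child oid); atomic objects have no entry.
EDGES = {
    "&1": [("Member", "&2"), ("Member", "&3")],
    "&2": [("Name", "&4"), ("Office", "&5")],
    "&3": [("Name", "&6"), ("Office", "&7")],
    "&7": [("Building", "&8"), ("Room", "&9")],
}
NAMES = {"DBGroup": "&1"}  # named entry points into the graph


def eval_path(path: str) -> list:
    """Return the oids reached by a simple path expression like 'A.b.c'."""
    name, *labels = path.split(".")
    frontier = [NAMES[name]]
    for label in labels:
        reached = []
        for oid in frontier:
            for lbl, child in EDGES.get(oid, []):
                if lbl == label and child not in reached:
                    reached.append(child)
        frontier = reached
    return frontier
```

A path that does not exist in the data, such as `DBGroup.Member.Salary`, simply yields an empty list rather than an error, which suits data without a fixed schema.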

Lorel

Range variables can be assigned to path expressions.

Path expressions are used directly in queries in an SQL style:

select DBGroup.Member.Office

where DBGroup.Member.Age > 30

Lorel

Result:

Office “Gates252”

Office
    Building “CIS”
    Room “411”

Lorel (Behind the scenes)

The previous query is rewritten in OQL style:

select O
from DBGroup.Member M, M.Office O
where exists y in M.Age : y > 30

The comparison on Age is transformed into an existential condition: a user can ask DBGroup.Member.Age < 30 regardless of whether Age is single-valued, set-valued, or unknown.
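
The existential reading of the comparison can be illustrated in Python. This is a sketch under assumptions: members are modeled as plain dictionaries, every Age is stored as a list so that missing, single, and multiple values look alike, and numeric-looking strings are coerced to numbers. The function names are hypothetical.

```python
def to_number(value):
    """Return `value` as a float if it looks numeric, else None."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def exists_greater(values, threshold):
    """True if some element of `values` is numerically above `threshold`."""
    for v in values:
        n = to_number(v)
        if n is not None and n > threshold:
            return True
    return False


members = [
    {"Name": "Smith", "Age": [46], "Office": ["Gates252"]},      # single-valued
    {"Name": "Jones", "Age": ["28", 31], "Office": ["CIS411"]},  # set-valued, mixed types
    {"Name": "Clark", "Office": ["Gates100"]},                   # Age missing
]


def offices_of_members_older_than(members, threshold):
    result = []
    for m in members:                                    # from DBGroup.Member M
        if exists_greater(m.get("Age", []), threshold):  # exists y in M.Age : y > threshold
            result.extend(m.get("Office", []))           # M.Office O
    return result
```

The member without an Age neither matches nor raises an error, and the member with two ages matches because one of them satisfies the condition.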

Lorel (More examples)

select DBGroup.Member.Name
where DBGroup.Member.Office(.Room%)? like “%252”

Result:

Name “Jones”

Name “Smith”

Update:

update P.Member += ( select DBGroup.Member where DBGroup.Member.Name = "Clark" )
from DBGroup.Project P
where P.Title = "Lore" or P.Title = "Tsimmis"
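
One way to read the path `Office(.Room%)?` and the `like “%252”` condition is as patterns over strings. The sketch below assumes that `%` stands for any run of characters (within a single label when used in a path) and that `?` makes the parenthesized part optional. The functions `path_pattern_to_regex` and `like` are hypothetical helpers built on Python's `re` module, not Lore code.

```python
import re


def path_pattern_to_regex(pattern: str):
    """Translate a path pattern such as 'Office(.Room%)?' into a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(r"[^.]*")      # any characters, but stay inside one label
        elif ch in "()?":
            parts.append(ch)            # grouping and optionality carry over as-is
        else:
            parts.append(re.escape(ch))  # literal character (including '.')
    return re.compile("".join(parts))


def like(value: str, pattern: str) -> bool:
    """SQL-style match where '%' stands for any run of characters."""
    regex = ".*".join(re.escape(piece) for piece in pattern.split("%"))
    return re.fullmatch(regex, value, re.DOTALL) is not None


office = path_pattern_to_regex("Office(.Room%)?")
```

Under these assumptions `office.fullmatch("Office")` and `office.fullmatch("Office.Room")` both succeed, so the query finds the value whether the office is stored as a single string or as a nested object with a Room sub-object.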

Lore’s General Architecture

Lore’s General Architecture

Query and Update Processing

External Data

DataGuides

Query and Update Processing

Queries

Data Engine

(A Set of OEM objects)

Query Plan Generator

select O
from DBGroup.Member M, M.Office O
where exists y in M.Age : y > 30

Query Iterators

Use a recursive iterator approach:

Execution begins at the top of the query plan.

Each node in the plan requests a tuple at a time from its children and performs some operation on the tuple(s).

Each node passes its result tuples up to its parent.
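
The iterator approach can be sketched as a small pull-based plan in Python. The classes `Source`, `Select`, and `Project` and the function `run` are hypothetical stand-ins, and tuples are plain dictionaries rather than Lore's internal structures.

```python
class Source:
    """Leaf node: hands out tuples from a list, one per call."""

    def __init__(self, rows):
        self._rows = iter(rows)

    def next(self):
        return next(self._rows, None)  # None signals "no more tuples"


class Select:
    """Passes up only the child tuples that satisfy a predicate."""

    def __init__(self, child, predicate):
        self.child, self.predicate = child, predicate

    def next(self):
        while (row := self.child.next()) is not None:
            if self.predicate(row):
                return row
        return None


class Project:
    """Keeps only the requested fields of each child tuple."""

    def __init__(self, child, columns):
        self.child, self.columns = child, columns

    def next(self):
        row = self.child.next()
        if row is None:
            return None
        return {c: row[c] for c in self.columns}


def run(root):
    """Drive the plan from the top, pulling one tuple at a time."""
    results = []
    while (row := root.next()) is not None:
        results.append(row)
    return results


rows = [
    {"Name": "Smith", "Age": 46, "Office": "Gates252"},
    {"Name": "Jones", "Age": 28, "Office": "CIS411"},
]
plan = Project(Select(Source(rows), lambda r: r["Age"] > 30), ["Office"])
```

Calling `run(plan)` starts at the `Project` node, which asks `Select` for a tuple, which in turn asks `Source`; no node materializes its whole input.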

Tuples (Object Assignment)

An object assignment (OA) is a data structure containing slots for range variables, with additional slots depending on the query.

Each slot within an OA holds the oid of a vertex on a path being considered by the query engine.

At the end of a query we should end up with complete OAs.

Query Operators

The Scan operator returns all oids that are sub-objects of a given object following a specified path expression:

Scan (StartingOASlot, Path_expression, TargetOASlot)

For each oid in StartingOASlot, check whether an object satisfies Path_expression and, if so, place its oid into TargetOASlot.

For each OA returned by the left child, the Join operator calls the right child exhaustively until no more OAs are returned.
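
Scan and Join over object assignments can be sketched with Python generators. The sketch simplifies the path expression to a single label and models an OA as a dictionary from slot name to oid; `EDGES`, `scan`, and `join` are hypothetical names used only for illustration.

```python
# oid -> list of (label, child oid)
EDGES = {
    "&1": [("Member", "&2"), ("Member", "&3")],
    "&2": [("Name", "&4"), ("Office", "&5")],
    "&3": [("Name", "&6"), ("Office", "&7")],
}


def scan(oa, start_slot, label, target_slot):
    """Yield a copy of `oa` for each child of oa[start_slot] reached via `label`."""
    for lbl, child in EDGES.get(oa[start_slot], []):
        if lbl == label:
            yield {**oa, target_slot: child}  # fill the target slot


def join(left, right):
    """For each OA from `left`, drain the right child built for that OA."""
    for left_oa in left:
        yield from right(left_oa)


def member_office_plan():
    root = {"DBGroup": "&1"}
    members = scan(root, "DBGroup", "Member", "M")                 # DBGroup.Member M
    return join(members, lambda oa: scan(oa, "M", "Office", "O"))  # M.Office O
```

Each OA produced by `member_office_plan()` has the `DBGroup`, `M`, and `O` slots filled, which corresponds to a complete OA for the from clause of the example query.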

Query Operators (cont)

The aggregation operator (Aggr) adds the result of the aggregation to the target slot.

The Join, Project, and Select operators are almost identical to their corresponding relational operators.

Other operators: CreateSet, GroupBy, ArithOp

Query Operators (Visualize the Words)

Query Operators (Visualize the Words)

Query Optimizer

Does only a few optimizations:

Pushes selection operators down the query tree.

Eliminates or combines redundant query operators.

Explores query plans that use indexes when possible. Two kinds of indexes:

Lindex (link index): returns the OIDs of all parents of a given OID via a label; implemented with hashing.

Vindex (value index): returns all atomic objects of a label that satisfy a condition; implemented as B+-trees.
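
The two index kinds can be sketched as follows. This is an illustration only: a Python dictionary stands in for the hash-based Lindex, and a sorted list searched with `bisect` stands in for the B+-tree behind a Vindex. The class and method names are hypothetical.

```python
import bisect
from collections import defaultdict


class Lindex:
    """Link index: (child oid, label) -> oids of parents."""

    def __init__(self, edges):
        self._parents = defaultdict(list)
        for parent, label, child in edges:
            self._parents[(child, label)].append(parent)

    def parents(self, child, label):
        return list(self._parents.get((child, label), []))


class Vindex:
    """Value index for one label: sorted (value, oid) pairs."""

    def __init__(self, label, atoms):
        self.label = label
        self._entries = sorted(atoms)  # atoms: iterable of (value, oid)
        self._keys = [value for value, _ in self._entries]

    def greater_than(self, bound):
        """Return oids of atomic objects whose value exceeds `bound`."""
        start = bisect.bisect_right(self._keys, bound)
        return [oid for _, oid in self._entries[start:]]


edges = [("&1", "Member", "&2"), ("&2", "Age", "&5")]
lindex = Lindex(edges)
age_index = Vindex("Age", [(46, "&5"), (28, "&8"), (31, "&9")])
```

The sorted list gives the same ordered range lookup a B+-tree would provide, which is why a condition such as Age > 30 becomes a single binary search followed by a slice.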

Vindexes

Because of the non-strict typing system, there are a String Vindex, a Real Vindex, and a String-coerced-to-real Vindex.

Separate B-trees of each type are constructed for each label.

Using a Vindex for a comparison:

If the type is string, do a lookup in the String Vindex.

If it can be converted to real, then do a lookup in the String-coerced-to-real Vindex.

If the type is real or int, do almost the same thing.

Vindexes (cont)

Arg1 \ Arg2    String          Real            Int
String         --              String→real     Both→real
Real           String→real     --              Int→real
Int            Both→real       Int→real        --
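
The coercion table above can be read as a small function over pairs of values. The sketch below assumes that "real" corresponds to Python's `float` and that a string which cannot be converted makes the comparison fail; `to_real` and `coerce_pair` are hypothetical names.

```python
def to_real(x):
    """Convert a value to float, or return None if that is not possible."""
    try:
        return float(x)
    except ValueError:
        return None


def coerce_pair(a, b):
    """Return (a, b) coerced for comparison, or None if coercion fails."""
    if isinstance(a, str) and isinstance(b, str):
        return a, b  # String vs String: no coercion
    if isinstance(a, float) and isinstance(b, float):
        return a, b  # Real vs Real: no coercion
    if isinstance(a, int) and isinstance(b, int):
        return a, b  # Int vs Int: no coercion
    # Every mixed pair ends up compared as reals; float() leaves reals unchanged.
    a_real, b_real = to_real(a), to_real(b)
    if a_real is None or b_real is None:
        return None
    return a_real, b_real
```

For example, `coerce_pair("30", 25)` converts both arguments to reals, matching the String/Int cell of the table, while `coerce_pair("abc", 1)` reports that no comparison is possible.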

Index Query plans

If the user’s query contains a comparison between a path expression and a value, and an appropriate Vindex and Lindex exist, an index query plan is generated.

Previous query:

select O
from DBGroup.Member M, M.Office O
where exists y in M.Age : y > 30
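
An index plan for this query can be pictured as working bottom-up: find the matching Age values first, then climb to the owning members, then descend to their offices. The sketch below is a guess at that shape on hypothetical data; a linear filter stands in for the Vindex lookup and a dictionary for the Lindex, and all names are illustrative.

```python
# (parent, label, child) edges and atomic values of a small sample database.
EDGES = [
    ("&1", "Member", "&2"), ("&1", "Member", "&3"),
    ("&2", "Age", "&4"), ("&2", "Office", "&5"),
    ("&3", "Age", "&6"), ("&3", "Office", "&7"),
]
ATOMS = {"&4": 46, "&5": "Gates252", "&6": 28, "&7": "CIS411"}
ROOT = "&1"  # the object named DBGroup

# Lindex stand-in: (child, label) -> parents.
lindex = {}
for parent, label, child in EDGES:
    lindex.setdefault((child, label), []).append(parent)


def vindex_greater(label, bound):
    """Stand-in for a Vindex lookup: atoms under `label` with value > bound."""
    hits = []
    for _, lbl, child in EDGES:
        value = ATOMS.get(child)
        if lbl == label and isinstance(value, (int, float)) and value > bound:
            hits.append(child)
    return hits


def index_plan(bound):
    """Return Office oids of members that have some Age above `bound`."""
    offices = []
    for age_oid in vindex_greater("Age", bound):             # Vindex: Age > bound
        for member in lindex.get((age_oid, "Age"), []):      # Lindex: up the Age edge
            if ROOT in lindex.get((member, "Member"), []):   # Lindex: up the Member edge
                offices += [c for p, lbl, c in EDGES
                            if p == member and lbl == "Office"]
    return offices
```

The benefit of this shape is that members without a qualifying Age are never visited at all, whereas the top-down plan has to scan every member.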

Index Query plans (cont)

Update Query plans

update P.Member += ( select DBGroup.Member where DBGroup.Member.Name = "Clark" )
from DBGroup.Project P
where P.Title = "Lore" or P.Title = "Tsimmis"

External Data

Enables retrieval of information from other data sources, transparent to the user.

An external object in Lore is a “placeholder” for the external data and specifies how Lore interacts with an external data source.

External Data

During query processing, the Scan operator notifies the external data manager whenever an external object is encountered.

The specification for an external object includes:

The location of a wrapper program to fetch and convert data to OEM.

A timeout interval.

A set of arguments used to limit the information fetched from the external source.
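
The three parts of an external object specification map naturally onto a small record. The slide does not say what the timeout interval governs; the sketch below assumes it bounds how long fetched data is reused before the wrapper is called again. The class `ExternalObject`, its `resolve` method, and the Python callable standing in for the wrapper program are all hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ExternalObject:
    wrapper: Callable[..., list]  # fetches external data, returns OEM-like records
    timeout: float                # seconds a fetched result is reused (assumption)
    args: tuple = ()              # arguments that limit what is fetched
    _cached: list = field(default=None, repr=False)
    _fetched_at: float = field(default=None, repr=False)

    def resolve(self, now=None):
        """Return the external data, calling the wrapper only when needed."""
        now = time.monotonic() if now is None else now
        expired = (self._fetched_at is None
                   or now - self._fetched_at >= self.timeout)
        if expired:
            self._cached = self.wrapper(*self.args)
            self._fetched_at = now
        return self._cached
```

In this reading, a scan that meets the placeholder calls `resolve()` and continues with whatever records come back, so the query itself does not need to know the data lives elsewhere.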

DataGuides

A DataGuide is a concise and accurate summary of the structure of an OEM database (stored as an OEM database itself, somewhat like the system catalog).

Very helpful:

With no explicit database schema, it is difficult to formulate meaningful queries.

The query processor may perform unnecessary work with no knowledge of the database structure. What if a path expression doesn’t exist? The work is wasted.

Each possible path expression is encoded once.

DataGuides (cont)

DataGuides are dynamically generated and maintained over an existing database.

Statistics can be stored in the DataGuide, for example the number of atomic objects of each type reachable by a path p.
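
A DataGuide in which every label path appears exactly once can be built by grouping, for each path, all the objects that path reaches. The sketch below does this with a worklist over sets of oids, in the style of a subset construction. It is an illustration under assumptions, not Lore's maintenance algorithm, and the names `build_dataguide` and `targets` are hypothetical.

```python
# oid -> list of (label, child oid)
EDGES = {
    "&1": [("Member", "&2"), ("Member", "&3")],
    "&2": [("Name", "&4"), ("Age", "&5")],
    "&3": [("Name", "&6"), ("Office", "&7")],
}


def build_dataguide(root):
    """Return (start node, guide), where guide maps node -> {label: node}.

    Each node is the frozenset of database oids reached by some label path.
    """
    start = frozenset([root])
    guide = {}
    todo = [start]
    while todo:
        node = todo.pop()
        if node in guide:
            continue  # already expanded; also stops on cyclic data
        by_label = {}
        for oid in node:
            for label, child in EDGES.get(oid, []):
                by_label.setdefault(label, set()).add(child)
        guide[node] = {lbl: frozenset(kids) for lbl, kids in by_label.items()}
        todo.extend(guide[node].values())
    return start, guide


def targets(start, guide, labels):
    """Return the oids reached by a label path, or an empty set if absent."""
    node = start
    for label in labels:
        node = guide[node].get(label)
        if node is None:
            return frozenset()
    return node
```

A query processor can check `targets(start, guide, ["Member", "Salary"])` before touching the data and skip the query when the result is empty, and `len(targets(...))` is the kind of per-path statistic the DataGuide could carry.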

DataGuides (example)

When XML gets into Action

A little reminder: Lore was first proposed in 1995.

XML: a new standard for data representation and data exchange over the WWW.

Public class XML_data extends Semi_structured_data

Lore was among the pioneers in integrating XML into a DBMS architecture.

From Semistructured Data to XML

Data Model

Query Language

DataGuides

Changes in The Data Model

Similar to an OEM object, an XML element in Lore is a pair <EID, VALUE>:

EID: a unique element identifier.

VALUE: either an atomic text string or a complex value containing:

A string value: the XML tag.

An ordered list of attribute-name/atomic-value pairs.

An ordered list of crosslink subelements of the form <label, EID>, reachable via IDREF or IDREFS.

An ordered list of subelements of the form <label, EID>.

Changes in The Data Model (cont)

Comments are ignored.

When an XML document is mapped into this new data model, it can be seen as a directed labeled graph.
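
The mapping from an XML document to <EID, VALUE> pairs can be sketched with Python's standard `xml.etree.ElementTree` parser. The sketch makes several assumptions: the ID attribute is called `id`, the caller names the IDREF attributes (here a made-up `advisor` attribute), and element text is kept as a field of the value instead of as a separate atomic element. The function `load_xml` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from itertools import count


def load_xml(text, id_attr="id", idref_attrs=("advisor",)):
    """Map an XML document to (root EID, {EID: VALUE})."""
    root = ET.fromstring(text)  # the default parser drops comments
    ids = count(1)
    eid_of = {}
    for elem in root.iter():    # first pass: one EID per element, document order
        eid_of[elem] = f"&{next(ids)}"
    by_id = {e.get(id_attr): eid_of[e] for e in root.iter() if e.get(id_attr)}

    store = {}
    for elem in root.iter():    # second pass: build each VALUE
        store[eid_of[elem]] = {
            "tag": elem.tag,
            "attributes": list(elem.attrib.items()),
            "crosslinks": [(name, by_id[ref])
                           for name in idref_attrs
                           for ref in elem.get(name, "").split()
                           if ref in by_id],
            "subelements": [(child.tag, eid_of[child]) for child in elem],
            "text": (elem.text or "").strip(),
        }
    return eid_of[root], store


doc = """
<DBGroup>
  <!-- comments are ignored -->
  <Member id="m1" Name="Smith"><Name>John Smith</Name></Member>
  <Member id="m2" advisor="m1"><Name>Jones</Name></Member>
</DBGroup>
"""
```

The subelement and crosslink lists together form the edges of the directed labeled graph, with the tag or attribute name serving as the edge label.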

Example

Query Language

Path expressions are extended to distinguish between subelements and attributes by using qualifiers:

DBGroup.Member.>Name returns &6; use > to specify a subelement.

DBGroup.Member.@Name returns “Smith”; use @ to specify an attribute.

DBGroup.Member.Name returns &6 and “Smith”; when no @ or > qualifier is used, both attributes and subelements are matched.
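
The effect of the two qualifiers on a single path step can be sketched as follows. The element is modeled as a dictionary with hypothetical contents chosen to mirror the results quoted above, and `step` is an illustrative helper, not a Lore function.

```python
# A Member element with a Name attribute and a Name subelement (sample data).
member = {
    "attributes": {"Name": "Smith"},
    "subelements": [("Name", "&6"), ("Office", "&7")],
}


def step(element, label):
    """Evaluate one path step such as '>Name', '@Name', or 'Name'."""
    want_subelements = not label.startswith("@")  # '@' restricts to attributes
    want_attributes = not label.startswith(">")   # '>' restricts to subelements
    name = label.lstrip("@>")
    results = []
    if want_subelements:
        results += [eid for lbl, eid in element["subelements"] if lbl == name]
    if want_attributes and name in element["attributes"]:
        results.append(element["attributes"][name])
    return results
```

With this data, `step(member, ">Name")` yields only the subelement, `step(member, "@Name")` yields only the attribute value, and `step(member, "Name")` yields both.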

DataGuides

Provide a DTD, from which Lore builds the corresponding DataGuide.

Otherwise, if no DTD is provided, a DataGuide is generated from the XML document.

Problems when updating:

When a DTD is provided, validity is assured.

With no DTD, the DataGuide is updated as the XML document is updated.

Conclusion

Lore was originally developed for the OEM data model, starting in 1995; XML was integrated later, in 1999.

Lore provided a clear and robust solution for storing, querying, and updating semistructured data (XML came after).

The Lore project was declared pretty much out of business in 2000 by the Stanford Database Group.

Questions?