A

Project Report

On

Conceptual Design of Data warehouse from XML using CCM

submitted in partial fulfillment of the requirement

for the award of the

Degree of

Bachelor of Engineering

in

Computer Science

(University of Rajasthan, Jaipur)

Under the supervision of

Mr. Jarnail Singh

Submitted by

Ankita Agarwal (CS)

Deepika Raipuria (CS)

June 2006

Mody Institute of Technology and Science

(A Deemed University under Section 3 of the UGC Act 1956)

Formerly known as Mody College of Engineering and Technology

Lakshmangarh-332311 (Dist.-Sikar)

Mody Institute of Technology and Science

CERTIFICATE

This is to certify that Ms. Ankita Agarwal, a student of B.E. VIII semester (CS), has submitted her project entitled “Conceptual Design of Data warehouse from XML using CCM” under my guidance.

Guide

Mr. Jarnail Singh


Project Approval

The Project entitled “Conceptual Design of Data warehouse from XML using CCM” by Ms. Ankita Agarwal, Ms. Deepika Raipuria, is approved in partial fulfillment of the requirement of the Degree of Bachelor of Engineering (CS) of the University of Rajasthan, Jaipur.

HOD

Dr. S.N. Puri (Dean, FET)

Examiner (Internal)

Examiner (External)


ABSTRACT

XML is a meta-markup language that provides a format for describing semi-structured data. Today, a large amount of data in XML format is used in decision making. Traditionally, a data warehouse is designed to support decision making. Thus, it becomes necessary to include XML data while designing a data warehouse.

We are concerned with developing a semi-automated method for designing a multidimensional model from an XML schema. While specifying composition, a choice among sub-elements can be specified in an XML schema. Our attempt is to carry this information forward to the data warehouse design. To achieve this, we first create a canonic conceptual model (CCM) schema. Subsequently, the CCM schema is converted into an attribute tree, and finally a multidimensional model is arrived at. The CCM makes our approach easy for the user to understand. In this way we introduce user intervention into the data warehouse design, which is one of the major concerns in the conceptual phase of data warehouse design.

In this project we show how a star schema can be built from an XML schema. This is a new approach developed by us. Earlier attempts have been made to arrive at an attribute tree from an XML schema, but no attempt has been made to arrive at the attribute tree using a CCM that incorporates various object-oriented features.


We have successfully tested our approach on two W3C standard XML schemas taken from the web:

1. Purchase Order.

2. Road Event.


ACKNOWLEDGEMENT

We would like to express our sincere and profound gratitude to our guide, Mr. Jarnail Singh, for his invaluable guidance, continuous encouragement and wholehearted support at every stage of this project. It is our privilege and honour to have worked under his supervision. It is indeed difficult to put his contribution into a few words.

We owe our greatest thanks to Mr. Arun Kumar for his esteemed guidance. His teaching, judgment, keen personal interest and enthusiasm enabled us to complete the entire work successfully. We sincerely thank him for helping us solve our problems at various stages. He was always a source of strength, inspiration and motivation. Without his keen interest and constant supervision this work would have been very difficult. He gave us the motivation to start the project and suggested its topic to us. His contribution in the initial stages of the project is invaluable.

We are thankful to all the other people who were directly or indirectly involved in this project.

ANKITA AGARWAL

DEEPIKA RAIPURIA


TABLE OF CONTENTS

Chapter 1: Overview

1.1 Overview of project

Chapter 2: Extensible Markup Language (XML)

2.1 XML

2.2 XML Schema

Chapter 3: Canonical Conceptual Model (CCM) & Attribute Tree

3.1 CCM

3.2 Attribute Tree

3.3 Resolving Relationships

Chapter 4: Star Schema

4.1 Abstract

4.2 Dimension Modeling

Chapter 5: Design Approach

5.1 Converting XML Schema to CCM

5.2 Converting CCM to Attribute Tree

5.3 Converting Attribute Tree to Star Schema

Chapter 6: Physical implementation

6.1 Strategy

6.2 PL/SQL

6.3 Oracle XML Components

6.4 XSU PL/SQL API

6.5 DBMS XML Query Package

6.6 DBMS XML Save Package

Chapter 7: Case Studies

7.1 Case Study 1: PurchaseOrder

7.2 Case Study 2: RoadEvent

Appendix

Conclusion

Bibliography

CHAPTER # 1

OVERVIEW

1.1 Overview

Today companies need strategic information to counter fiercer competition, extend market share and improve profitability. They therefore need an information delivery system that is subject oriented, integrated, nonvolatile and time variant. A data warehouse is a viable solution: it is an integrated repository of data generated from many sources and used by the entire enterprise.

A star schema is a data model used for data warehousing systems. The structure consists of a number of tables known as dimensions (typically denormalized) surrounding a large central table called the fact table. Drawn schematically, the model resembles a star and is therefore called the star schema. The star schema has several advantages: it is easy for users to understand, it is well suited to query processing, and it enables schema-specific performance optimizations.

The web constitutes the largest body of information accessible to a person and keeps growing at a very fast rate. XML is increasingly used for data exchange on the web. XML is a meta-markup language that provides a format for describing semi-structured data. XML is valuable to the Internet as well as to large corporate intranet environments because it provides interoperability using a flexible, open, standards-based format, with new ways of accessing databases and delivering data to web clients. With XML, applications can be built more quickly, are easier to maintain, and can easily provide multiple views of the semi-structured data.

Attempts have been made to build a data warehouse from XML data. For example, a semi-automated methodology has been proposed for designing a web warehouse from XML sources modeled by XML schemas. In this methodology, design is carried out by first creating a schema graph and then navigating its arcs in order to derive a correct multidimensional representation. The approach is implemented in a prototype that reads an XML schema and produces as output the logical schema of the data warehouse. However, none of these approaches considers the choice among sub-elements that can be specified in an XML schema.

In this project we show how, starting from an XML source, it is possible to arrive at a multidimensional model. The major issue of concern is the choice among sub-elements that can be specified in an XML schema. The choice is clearly defined to mean that exactly one of the sub-elements will appear in an XML document corresponding to the schema. But if we look at a problem domain, we find that a richer set of semantic relationships exists between elements and sub-elements. The relationship can be a disjoint association among sub-elements, or it can be an inheritance relationship between the element and the sub-element. For example, if we look at the types of clients that exist for a purchase order, the clients that represent companies are disjoint from those that represent individuals. This association tends to get represented as a choice element in an XML schema. The complexType typeOfClient of Figure 1(a) represents the disjoint association as a choice.

As an example of an inheritance relationship, consider the address to which orders are shipped. Orders can be shipped to an address in the same country or to an address in a different country. However, both kinds of address are in an is-a relationship with an element that represents the common attributes of the address. An inheritance relationship, too, gets represented as a choice among sub-elements in an XML schema. The complexType countryToShip of Figure 1(a) represents the inheritance relationship as a choice. In our approach, the different relationships expressed by a choice construct are carried forward so that they can ultimately be represented in a multidimensional model.
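As a rough illustration of the two choice constructs discussed above, the following Python sketch lists the alternatives of every xs:choice in a small schema fragment. The fragment is a hypothetical stand-in for Figure 1(a): only the type names typeOfClient and countryToShip come from the text, while the sub-element names and their types are assumptions made for the example. Note that the schema itself does not say whether a choice is a disjoint association or an inheritance relationship; that distinction has to be supplied by the designer.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema fragment; sub-element names and types are assumptions.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="typeOfClient">
    <xs:choice>
      <xs:element name="company" type="xs:string"/>
      <xs:element name="individualCustomer" type="xs:string"/>
    </xs:choice>
  </xs:complexType>
  <xs:complexType name="countryToShip">
    <xs:choice>
      <xs:element name="sameCountry" type="xs:string"/>
      <xs:element name="differentCountry" type="xs:string"/>
    </xs:choice>
  </xs:complexType>
</xs:schema>
"""


def choices(schema_text):
    """Map each complexType name to the element names inside its xs:choice."""
    root = ET.fromstring(schema_text)
    result = {}
    for ctype in root.iter(XS + "complexType"):
        for choice in ctype.iter(XS + "choice"):
            result[ctype.get("name")] = [
                element.get("name") for element in choice.findall(XS + "element")
            ]
    return result


CHOICES = choices(SCHEMA)
for type_name, alternatives in CHOICES.items():
    print(type_name, "->", " | ".join(alternatives))
```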

Towards this, we first translate the XML schema to a schema of the Canonical Conceptual Model (CCM). We have chosen the CCM because it permits specification of these relationships. A fact is chosen among the non-lexical concepts of the CCM. An attribute tree is built around the fact and is subsequently used to arrive at the multidimensional model. In this report we show the manner in which we can arrive at an attribute tree from the XML schema.


figure 1(a)


CHAPTER # 2

EXtensible Markup Language (XML)

2.1 Extensible Markup Language (XML)

Extensible Markup Language (XML) is a simple, very flexible text format derived from SGML (ISO 8879). Originally designed to meet the challenges of large-scale electronic publishing, XML is also playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere.

Some facts about XML

XML stands for EXtensible Markup Language
XML is a markup language much like HTML
XML was designed to describe data
XML tags are not predefined. You must define your own tags
XML uses a Document Type Definition (DTD) or an XML Schema to describe the data
XML with a DTD or XML Schema is designed to be self-descriptive
XML is a W3C Recommendation

XML is a framework for defining markup languages:

1. There is no fixed collection of markup tags - we may define our own tags, tailored for our kind of information
2. Each XML language is targeted at its own application domain, but the languages will share many features
3. There is a common set of generic tools for processing documents

XML is designed to:

1. Separate syntax from semantics to provide a common framework for structuring information (browser rendering semantics is completely defined by style sheets)
2. Allow tailor-made markup for any imaginable application domain
3. Support internationalization (Unicode) and platform independence
4. Be the future of structured information, including databases

XML and HTML were designed with different goals:

XML was designed to describe data and to focus on what data is. HTML was designed to display data and to focus on how data looks. HTML is about displaying information, while XML is about describing information.

The main difference between XML and HTML

XML was designed to carry data. XML is not a replacement for HTML. In future Web development it is most likely that XML will be used to describe the data, while HTML will be used to format and display the same data.

The best description of XML is this: XML is a cross-platform, software- and hardware-independent tool for transmitting information.

Important Thing about XML

XML does not DO anything

XML was not designed to DO anything. Maybe it is a little hard to understand, but XML does not DO anything. XML was created to structure, store and to send information.

The following example is a note to Tove from Jani, stored as XML:

<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

The note has a header and a message body. It also has sender and receiver information. But still, this XML document does not DO anything. It is just pure information wrapped in XML tags. Someone must write a piece of software to send, receive or display it.

Some features of XML

XML is free and extensible

XML tags are not predefined. You must "invent" your own tags.

The tags used to mark up HTML documents and the structure of HTML documents are predefined. The author of HTML documents can only use tags that are defined in the HTML standard (like <p>, <h1>, etc.).

XML allows the author to define his own tags and his own document structure. The tags in the example above (like <to> and <from>) are not defined in any XML standard. These tags are "invented" by the author of the XML document.

XML in future Web development

XML is going to be everywhere.

It is strongly believed that XML will be as important to the future of the Web as HTML has been to the foundation of the Web, and that XML will be the most common tool for all data manipulation and data transmission.

Must Do Things In XML:

The syntax rules of XML are very simple and very strict. The rules are very easy to learn, and very easy to use. Because of this, creating software that can read and manipulate XML is very easy.

An example XML document

XML documents use a self-describing and simple syntax.

<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

The first line in the document - the XML declaration - defines the XML version and the character encoding used in the document. In this case the document conforms to the 1.0 specification of XML and uses the ISO-8859-1 (Latin-1/West European) character set.

The next line describes the root element of the document (like it was saying: "this document is a note"):

<note>

The next 4 lines describe 4 child elements of the root (to, from, heading, and body):

<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>

And finally the last line defines the end of the root element:

</note>

Can you detect from this example that the XML document contains a note to Tove from Jani? Don't you agree that XML is pretty self-descriptive?
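The structure described above (one root element with four children) can be confirmed with any XML parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the variable names are chosen only for this illustration.

```python
import xml.etree.ElementTree as ET

# The note document from the text, given as bytes so that the declared
# ISO-8859-1 encoding is honoured by the parser.
NOTE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""

root = ET.fromstring(NOTE)

# Collect (tag, text) for every child element of the root, in document order.
children = [(child.tag, child.text) for child in root]

print("root element:", root.tag)
for tag, text in children:
    print(tag, "=", text)
```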

All XML elements must have a closing tag

With XML, it is illegal to omit the closing tag.

In HTML some elements do not have to have a closing tag. The following code is legal in HTML:

<p>This is a paragraph
<p>This is another paragraph

In XML all elements must have a closing tag, like this:

<p>This is a paragraph</p>
<p>This is another paragraph</p>

Note: You might have noticed from the previous example that the XML declaration did not have a closing tag. This is not an error. The declaration is not a part of the XML document itself. It is not an XML element, and it should not have a closing tag.

XML tags are case sensitive

Unlike HTML, XML tags are case sensitive.

With XML, the tag <Letter> is different from the tag <letter>.

Opening and closing tags must therefore be written with the same case:

<Message>This is incorrect</message>

<message>This is correct</message>

All XML elements must be properly nested

Improper nesting of tags makes no sense to XML.

In HTML some elements can be improperly nested within each other like this:

<b><i>This text is bold and italic</b></i>

In XML all elements must be properly nested within each other like this:

<b><i>This text is bold and italic</i></b>

All XML documents must have a root element

All XML documents must contain a single tag pair to define a root element.

All other elements must be within this root element. All elements can have sub elements (child elements). Sub elements must be correctly nested within their parent element:

<root>
<child>
<subchild>.....</subchild>
</child>
</root>

Attribute values must always be quoted

With XML, it is illegal to omit quotation marks around attribute values.

XML elements can have attributes in name/value pairs just like in HTML. In XML the attribute value must always be quoted. Study the two XML documents below. The first one is incorrect, the second is correct:

<?xml version="1.0" encoding="ISO-8859-1"?>
<note date=12/11/2002>
<to>Tove</to>
<from>Jani</from>
</note>

<?xml version="1.0" encoding="ISO-8859-1"?>
<note date="12/11/2002">
<to>Tove</to>
<from>Jani</from>
</note>

The error in the first document is that the date attribute in the note element is not quoted.

This is correct: date="12/11/2002". This is incorrect: date=12/11/2002.
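The syntax rules above (closing tags, matching case, proper nesting, quoted attribute values) are exactly what a parser checks when it decides whether a document is well-formed. The sketch below, which assumes Python's standard xml.etree.ElementTree parser, feeds it one violation of each rule plus one correct document; the helper name is_well_formed and the sample strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


SAMPLES = {
    "missing closing tag": "<note><to>Tove</note>",
    "case mismatch": "<Message>This is incorrect</message>",
    "improper nesting": "<b><i>bold and italic</b></i>",
    "unquoted attribute": "<note date=12/11/2002><to>Tove</to></note>",
    "correct document": '<note date="12/11/2002"><to>Tove</to></note>',
}

RESULTS = {label: is_well_formed(text) for label, text in SAMPLES.items()}

for label, ok in RESULTS.items():
    print(label, "->", "well-formed" if ok else "rejected")
```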

With XML, white space is preserved

With XML, the white space in your document is not truncated.

This is unlike HTML. With HTML, a sentence like this:

Hello           my name is Tove,

will be displayed like this:

Hello my name is Tove,

because HTML reduces multiple, consecutive white space characters to a single white space.

With XML, CR / LF is converted to LF

With XML, a new line is always stored as LF.

The convention goes back to the typewriter, a mechanical device used in the last century to produce printed documents. After typing one line of text on a typewriter, you had to manually return the printing carriage to the left margin position and manually feed the paper up one line.

In Windows applications, a new line is normally stored as a pair of characters: carriage return (CR) and line feed (LF). The character pair bears some resemblance to the typewriter actions of setting a new line. In Unix applications, a new line is normally stored as a LF character. Older Macintosh applications used only a CR character to store a new line.
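Both behaviours described above, preserved white space and line endings normalized to LF, can be observed with a parser. The following sketch assumes Python's standard xml.etree.ElementTree module (which is built on the Expat parser); the element names are arbitrary.

```python
import xml.etree.ElementTree as ET

# Consecutive spaces inside element content are kept as they are.
spaced = ET.fromstring("<greeting>Hello      my name is Tove,</greeting>")

# A Windows-style CR LF pair in the input is handed to the application as LF.
windows = ET.fromstring("<body>line one\r\nline two</body>")

print(repr(spaced.text))
print(repr(windows.text))
```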

Comments in XML

The syntax for writing comments in XML is similar to that of HTML:

<!-- This is a comment -->


Requirements for XML:

There is nothing special about XML. It is just plain text with the addition of some XML tags enclosed in angle brackets.

Software that can handle plain text can also handle XML. In a simple text editor, the XML tags will be visible and will not be handled specially.

In an XML-aware application however, the XML tags can be handled specially. The tags may or may not be visible, or have a functional meaning, depending on the nature of the application.

XML Elements are Extensible

XML documents can be extended to carry more information. Look at the following XML NOTE example:

<note>
<to>Tove</to>
<from>Jani</from>
<body>Don't forget me this weekend!</body>
</note>

Let's imagine that we created an application that extracted the <to>, <from>, and <body> elements from the XML document to produce this output:

MESSAGE
To: Tove
From: Jani

Don't forget me this weekend!

Imagine that the author of the XML document added some extra information to it:

<note>
<date>2002-08-01</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>

Should the application break or crash?

No. The application should still be able to find the <to>, <from>, and <body> elements in the XML document and produce the same output.

XML documents are Extensible.
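The imagined application can be sketched in a few lines. The version below is only an illustration in Python (the function name render is hypothetical): because it looks elements up by name rather than by position, the extra <date> and <heading> elements do not affect its output.

```python
import xml.etree.ElementTree as ET

ORIGINAL = """<note>
<to>Tove</to>
<from>Jani</from>
<body>Don't forget me this weekend!</body>
</note>"""

EXTENDED = """<note>
<date>2002-08-01</date>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>"""


def render(xml_text):
    """Build the MESSAGE output from the to, from and body elements only."""
    note = ET.fromstring(xml_text)
    return "MESSAGE\nTo: {}\nFrom: {}\n\n{}".format(
        note.findtext("to"), note.findtext("from"), note.findtext("body")
    )


print(render(EXTENDED))
```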

XML Elements:

Elements have Content

Elements can have different content types.

An XML element is everything from (including) the element's start tag to (including) the element's end tag.

An element can have element content, mixed content, simple content, or empty content. An element can also have attributes.

Element Naming

XML elements must follow these naming rules:

Names can contain letters, numbers, and other characters
Names must not start with a number or punctuation character
Names must not start with the letters xml (or XML or Xml ..)
Names cannot contain spaces

Take care when you "invent" element names and follow these simple rules:

Any name can be used, no words are reserved, but the idea is to make names descriptive. Names with an underscore separator are nice.

Examples: <first_name>, <last_name>.

Avoid "-" and "." in names. For example, if you name something "first-name," it could be a mess if your software tries to subtract name from first. Or if you name something "first.name," your software may think that "name" is a property of the object "first."

Element names can be as long as you like, but don't exaggerate. Names should be short and simple, like this: <book_title> not like this: <the_title_of_the_book>.

XML documents often have a corresponding database, in which fields exist corresponding to elements in the XML document. A good practice is to use the naming rules of your database for the elements in the XML documents.

Non-English letters like éòá are perfectly legal in XML element names, but watch out for problems if your software vendor doesn't support them.

The ":" should not be used in element names because it is reserved to be used for something called namespaces (more later).

XML Attributes:

XML elements can have attributes.

From HTML you will remember this: <IMG SRC="computer.gif">. The SRC attribute provides additional information about the IMG element.

In HTML (and in XML) attributes provide additional information about elements:

<img src="computer.gif">
<a href="demo.asp">

Attributes often provide information that is not a part of the data. In the example below, the file type is irrelevant to the data, but important to the software that wants to manipulate the element:

<file type="gif">computer.gif</file>

Quote Styles, "female" or 'female'?

Attribute values must always be enclosed in quotes, but either single or double quotes can be used. For a person's sex, the person tag can be written like this:

<person sex="female">

or like this:

<person sex='female'>

Note: If the attribute value itself contains double quotes it is necessary to use single quotes, like in this example:

<gangster name='George "Shotgun" Ziegler'>

Note: If the attribute value itself contains single quotes it is necessary to use double quotes, like in this example:

<gangster name="George 'Shotgun' Ziegler">

2.2 XML Schema

XML Schema is an XML based alternative to DTD.

An XML schema describes the structure of an XML document.

The XML Schema language is also referred to as XML Schema Definition (XSD).

What is an XML Schema?

The purpose of an XML Schema is to define the legal building blocks of an XML document, just like a DTD.

An XML Schema:

defines elements that can appear in a document
defines attributes that can appear in a document
defines which elements are child elements
defines the order of child elements
defines the number of child elements
defines whether an element is empty or can include text
defines data types for elements and attributes
defines default and fixed values for elements and attributes

XML Schemas are the Successors of DTDs

We think that very soon XML Schemas will be used in most Web applications as a replacement for DTDs. Here are some reasons:

XML Schemas are extensible to future additions
XML Schemas are richer and more useful than DTDs
XML Schemas are written in XML
XML Schemas support data types
XML Schemas support namespaces

There are a number of reasons why XML Schema is better than DTD.

XML Schema has Support for Data Types

One of the greatest strengths of XML Schemas is the support for data types. With the support for data types:

It is easier to describe permissible document content
It is easier to validate the correctness of data
It is easier to work with data from a database
It is easier to define data facets (restrictions on data)
It is easier to define data patterns (data formats)
It is easier to convert data between different data types

XML Schemas use XML Syntax

Another great strength of XML Schemas is that they are written in XML. Because XML Schemas are written in XML:

You don't have to learn another language
You can use your XML editor to edit your Schema files
You can use your XML parser to parse your Schema files
You can manipulate your Schema with the XML DOM
You can transform your Schema with XSLT

XML Schemas Secure Data Communication

When data is sent from a sender to a receiver it is essential that both parties have the same "expectations" about the content.

With XML Schemas, the sender can describe the data in a way that the receiver will understand. A date like this: "03-11-2004" will, in some countries, be interpreted as 3 November and in other countries as 11 March, but an XML element with a data type like this:

<date type="date">2004-03-11</date>

ensures a mutual understanding of the content because the XML data type date requires the format YYYY-MM-DD.

XML Schemas are Extensible

XML Schemas are extensible, just like XML, because they are written in XML.

With an extensible Schema definition you can:

Reuse your Schema in other Schemas
Create your own data types derived from standard types
Reference multiple schemas from the same document

Well-Formed is not Enough

A well-formed XML document is a document that conforms to the XML syntax rules:

should begin with the XML declaration (recommended, although XML 1.0 does not strictly require it)
must have one unique root element
all start tags must match end-tags
XML tags are case sensitive
all elements must be closed
all elements must be properly nested
all attribute values must be quoted
XML entities must be used for special characters

Even if documents are well-formed they can still contain errors, and those errors can have serious consequences. Think of this situation: you order 5 gross of laser printers, instead of 5 laser printers. With XML Schemas, most of these errors can be caught by your validating software.

What is a Simple Element?

A simple element is an XML element that can contain only text. It cannot contain any other elements or attributes. However, the "only text" restriction is quite misleading. The text can be of many different types. It can be one of the types that are included in the XML Schema definition (boolean, string, date, etc.), or it can be a custom type that you can define yourself. You can also add restrictions (facets) to a data type in order to limit its content, and you can require the data to match a defined pattern.

How to Define a Simple Element

The syntax for defining a simple element is:

<xs:element name="xxx" type="yyy"/>

where xxx is the name of the element and yyy is the data type of the element. Here are some XML elements:

<lastname>Refsnes</lastname>
<age>34</age>
<dateborn>1968-03-27</dateborn>

And here are the corresponding simple element definitions:

<xs:element name="lastname" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>
<xs:element name="dateborn" type="xs:date"/>

Common XML Schema Data Types

XML Schema has a lot of built-in data types. Here is a list of the most common types:

xs:string
xs:decimal
xs:integer
xs:boolean
xs:date
xs:time

Declare Default and Fixed Values for Simple Elements:

Simple elements can have a default value OR a fixed value set.

A default value is automatically assigned to the element when no other value is specified. In the following example the default value is "red":

<xs:element name="color" type="xs:string" default="red"/>

A fixed value is also automatically assigned to the element. You cannot specify another value. In the following example the fixed value is "red":

<xs:element name="color" type="xs:string" fixed="red"/>
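The difference between default and fixed can be summarized in a small function. The following Python sketch is only an illustration of the rule stated above (the function name effective_value is hypothetical, and real schema processors handle more cases): a default fills in a missing value, whereas a fixed value rejects any other value.

```python
def effective_value(text, default=None, fixed=None):
    """Return the value an element ends up with under a default or fixed declaration."""
    if fixed is not None:
        # A fixed value is assigned automatically; any other value is an error.
        if text not in (None, "", fixed):
            raise ValueError("value must be {!r}, got {!r}".format(fixed, text))
        return fixed
    if text in (None, "") and default is not None:
        # A default value applies only when no other value is specified.
        return default
    return text


print(effective_value(None, default="red"))    # red
print(effective_value("blue", default="red"))  # blue
print(effective_value(None, fixed="red"))      # red
```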

CHAPTER # 3

Canonical Conceptual Model (CCM) & Attribute Tree

3.1 Canonic Conceptual Model

To reduce the complexity of the integration process, the schema is converted into the canonic conceptual model.

The Canonic Conceptual Model (CCM) is a conceptual model used for representing semi-structured schemata.

By using the conceptual model, users can:

restructure the existing XML documents
improve their quality
integrate various XML documents


The Canonic Conceptual Model (CCM) is basically built on two models:

Object with Role Model (ORM)
Entity Relationship Model (E-R Model)

CCM contains the following:

Root: It is the starting point of a CCM and is usually shown by a rectangle with thick lines.

Lexical Concepts: These are the elements that have no further children and are represented by dotted rectangles in CCM.

Non Lexical Concepts: The elements included in this category have further sub nodes and are represented by solid rectangles.

Directed relationship: Arrows are used in CCM to show directed relationships.


Choice Concept: The choice in a CCM can either be a disjoint association or an inheritance relationship. If it is a disjoint association it is represented by an “X” circled symbol and if it is the latter one it is represented by a triangle.
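The building blocks listed above can be captured in a small data structure. The sketch below is one possible in-memory representation in Python, not the representation used in this project: the class name Concept and its fields are assumptions, and the concept names are taken from the purchase order example used in this chapter. A concept without children is lexical, a concept with children is non-lexical, and a choice concept records whether it is a disjoint association or an inheritance relationship.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Concept:
    name: str
    children: List["Concept"] = field(default_factory=list)
    choice: Optional[str] = None  # "disjoint", "inheritance", or None

    @property
    def is_lexical(self) -> bool:
        # Lexical concepts have no further children.
        return not self.children


def lexical_names(concept):
    """Collect the names of all lexical concepts below (and including) concept."""
    if concept.is_lexical:
        return [concept.name]
    names = []
    for child in concept.children:
        names.extend(lexical_names(child))
    return names


# Root of the CCM; a partial, assumed reading of the purchase order example.
ROOT = Concept("Purchase Order", [
    Concept("Order Date"),
    Concept("Client Type", [
        Concept("Company", [Concept("Turn Over"), Concept("Profile")]),
        Concept("Individual Customer", [Concept("Name"), Concept("Id-No.")]),
    ], choice="disjoint"),
    Concept("Ship To", [
        Concept("Same Country"),
        Concept("Different Country"),
    ], choice="inheritance"),
])

print(lexical_names(ROOT))
```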

For example, the CCM corresponding to the XML schema given in figure 1(a) is shown in figure 3-a.

figure 3-a: CCM schema for the purchase order example. Legend: root, lexical concept, non-lexical concept, relationship, choice concept (disjoint association or inheritance).


3.2 Attribute Tree

The attribute tree is constructed from the CCM. First, a vertex F is created which corresponds to the fact chosen in the CCM.

Further vertices are added to the attribute tree by traversing the CCM schema. If a vertex E in the CCM has no further sub-nodes, it is a lexical concept, and E is simply added as a node to the attribute tree.


Vertex F: Purchase Order

If a vertex has further sub-nodes, it is a non-lexical concept. It is added as a node to the attribute tree and is then explored further.

The constructs that are represented in the CCM graph are disjoint associations and inheritance relationships. These relationships are carried forward to the attribute tree. Disjoint association is represented using ‘__’. Inheritance is graphically shown by ‘=’ in the attribute tree.
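The traversal described above can be sketched as a short recursive function. This is an illustration in Python under stated assumptions, not the project's implementation: the CCM is given as nested (name, choice, children) tuples, the concept names are an assumed fragment of the purchase order example, and printing the markers '__' and '=' in front of the alternatives of a choice is just one way of rendering the notation in text.

```python
# Each concept is (name, choice, children); choice is None, "disjoint" or "inheritance".
CCM = ("Purchase Order", None, [
    ("Order Date", None, []),
    ("Client Type", "disjoint", [
        ("Company", None, []),
        ("Individual Customer", None, []),
    ]),
    ("Ship To", "inheritance", [
        ("Same Country", None, []),
        ("Different Country", None, []),
    ]),
])

# Text markers for the two relationships carried forward from the CCM.
MARK = {None: "", "disjoint": "__ ", "inheritance": "= "}


def attribute_tree(concept, depth=0, mark=""):
    """Return the attribute tree rooted at concept as a list of indented lines."""
    name, choice, children = concept
    lines = ["  " * depth + mark + name]
    # A lexical concept has no children, so the loop simply adds nothing;
    # a non-lexical concept is explored further.
    for child in children:
        lines.extend(attribute_tree(child, depth + 1, MARK[choice]))
    return lines


TREE = attribute_tree(CCM)
print("\n".join(TREE))
```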


figure 3-b: attribute tree fragment showing vertex F (Purchase orders) with the lexical concept Order Date attached.

Grafting: A node is said to be grafted when the node is deleted and its children become children of its parent.

Pruning: A node is said to be pruned when the node and all its children are deleted.
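The two operations can be sketched as follows. This is a minimal Python illustration of the definitions above; the Node class and the small example tree are assumptions made for the example.

```python
class Node:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []


def graft(parent, name):
    """Delete the child called name; its children become children of parent."""
    new_children = []
    for child in parent.children:
        if child.name == name:
            new_children.extend(child.children)
        else:
            new_children.append(child)
    parent.children = new_children


def prune(parent, name):
    """Delete the child called name together with all of its children."""
    parent.children = [child for child in parent.children if child.name != name]


fact = Node("Purchase Order", [
    Node("Order Date"),
    Node("Client Type", [Node("Company"), Node("Individual Customer")]),
    Node("Item", [Node("Product Name")]),
])

graft(fact, "Client Type")
after_graft = [child.name for child in fact.children]

prune(fact, "Item")
after_prune = [child.name for child in fact.children]

print(after_graft)
print(after_prune)
```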

3.3 Resolving Disjoint & is-a relationship

It can be seen that there exists a disjoint association between the node Company and the node Individual Customer. Thus, according to our approach, the node Client Type is grafted: it is deleted, and its children Company and Individual Customer are attached directly to the fact node Purchase Order. With Purchase Order as the fact and Company as one of its dimensions, a new star schema is created.


Figure: attribute tree for Purchase orders before the relationships are resolved. Nodes shown: Total Amount, Order Date, Item (Product Name, US Price, Quantity), Ship To, Country Name, Same Country, Different Country, Distribution Office, Custom Duty, Client Type, Individual Customer (Name, Id-No.), Company (Turn Over, Profile). Legend: lexical concept, non-lexical concept, disjoint, inheritance.

‘is-a’ relationship:

It can also be seen in the figure that there exists an is-a relationship involving Different Country and Same Country. Thus, according to our approach, the node Country Name, which is common to both Different Country and Same Country, is moved down to each of them individually. With Purchase Order as the fact, one star schema is created with Different Country as one of its dimensions, and similarly another star schema is created with Same Country as one of its dimensions. All the other dimensions of Purchase Order are present in both star schemas.
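The resolution of the is-a relationship can be sketched as a function that produces one star schema per subtype. The Python sketch below is an illustration only: the function name and the dictionary layout are assumptions, and so is the assignment of Distribution Office to Same Country and of Custom Duty to Different Country, which is read off the figure.

```python
def star_schemas(fact, shared_dimensions, common_attributes, subtypes):
    """Build one star schema per subtype, moving the common attributes down."""
    schemas = []
    for subtype, own_attributes in subtypes.items():
        # Every schema keeps all the other dimensions of the fact.
        dimensions = dict(shared_dimensions)
        # The common attributes are copied into each subtype's dimension.
        dimensions[subtype] = common_attributes + own_attributes
        schemas.append({"fact": fact, "dimensions": dimensions})
    return schemas


SCHEMAS = star_schemas(
    fact="Purchase Order",
    shared_dimensions={"Item": ["Product Name", "US Price", "Quantity"]},
    common_attributes=["Country Name"],
    subtypes={
        "Same Country": ["Distribution Office"],  # assumed attribute placement
        "Different Country": ["Custom Duty"],     # assumed attribute placement
    },
)

for schema in SCHEMAS:
    print(schema["fact"], "->", schema["dimensions"])
```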

After these steps, the attribute tree looks as shown in figure 3-c.

figure 3-c: attribute tree for Purchase orders after grafting. Nodes shown: Total Amount, Order Date, Item (Product Name, US Price, Quantity), Country Name, Same Country, Different Country, Distribution Office, Custom Duty, Individual Customer (Name, Id-No.), Company (Turn Over, Profile).


CHAPTER # 4

Star Schema

4.1 Abstract

In recent years, data warehousing has emerged as the primary method of analyzing sales and marketing data for a competitive advantage. As the number of knowledge workers using the data warehouse/data mart grows and the amount of data increases daily, performance problems have become a major concern of both the Information Systems staff and the users.

Many options have been tried in an attempt to solve the performance problems, from bigger hardware to different software or database redesign and tuning for ROLAP (Relational Online Analytical Processing). However, all have limitations, either in functionality or in terms of cost, and their strengths are almost inevitably outstripped by users' demands.

Many companies have opted to perform ROLAP using a relational database (RDBMS) that they already own. Typically, the data warehouse/data mart is designed with a star or snowflake schema in order to obtain the multidimensional access that is so critical to data analysis.

NEED FOR MULTIDIMENSIONAL ANALYSIS

If we take the business model of a large retail operation as an example, we realize that sales are interrelated with many business dimensions. The daily sales are meaningful only when they are related to the dates of the sales, the products, the distribution channels, the stores, the sales territories, the promotions, and a few more dimensions. Multidimensional views are inherently representative of any business model. Very few models are limited to three dimensions or fewer.

For planning and making strategic decisions, managers and executives probe into business data through scenarios. For example, they compare actual sales against targets and against sales in prior periods. They examine the breakdown of sales by product, by store, by sales territory, by promotion, and so on.

Decision makers are no longer satisfied with one-dimensional queries. The user continues to ask for further comparisons with similar products, comparisons among territories, and views of the results obtained by rotating the presentation between columns and rows.

For effective analysis, decision makers must be able to analyze data along any number of dimensions, at any level of aggregation, with the capability of viewing the hierarchies of every dimension. Hence we need multidimensional analysis. Without a solid system for true multidimensional analysis, our data warehouse is incomplete.

4.2 Dimension Modeling

Dimensional modeling is the design methodology of the data warehouse. All the data are stored in two types of table:

1. Fact Table
2. Dimension Table

FACT TABLE:

It contains the measures or the facts of the business processes. The measures may be, for example, the ratings of question categories such as Excellent, Very Good, Good, Fair, and Undecided. The aggregated values of these measures are captured and stored in the fact table. In addition to the measures, the only other thing the fact table contains is foreign keys to the dimension tables. For example, SALES is a fact table.

DIMENSION TABLES

The context of the measurements is represented in the dimension tables, that is, when, what, and where the measurements are taken. For example, the dimensions can be Time, Store, Product, Promotion, etc. Each dimension may have different hierarchy levels, such as Fiscal Year, Quarter, and Month.

There are two ways to connect the dimension tables with the fact table.



1) THE STAR SCHEMA

The star schema is very easy to understand, provides better performance, and is easily extensible. A star schema consists of one fact table that contains measurable or countable fact data. The fact table contains foreign keys to join to dimension data in a metric dimension table and in one or more component dimension tables (hereafter referred to as dimension tables). The data warehouse uses two types of fact tables: measurement fact tables and event fact tables.

The figure shows a typical star schema with one fact table, one metric dimension table, and dimension tables for four components.


[Figure: FACT TABLES are of two types, MEASUREMENT FACT TABLE and EVENT FACT TABLE.]

http://publib.boulder.ibm.com/infocenter/tiv3help/topic/com.ibm.tivoli.tdwi.doc/srfmst158.htm#starschema

2) SNOW FLAKE SCHEMA:

If a dimension is normalized rather than kept in a single table, two or more related tables represent that one dimension. This group of tables is linked to the fact table as a unit.

STAR SCHEMA EXAMPLE

An example of a central Sales fact table surrounded by dimension tables for Store, Product, and Time:

This schema has three dimensions, namely product, time, and store. The fact table contains sales. The figure shows the schema and a three-dimensional representation of the model as a cube, with products on the X-axis, time on the Y-axis, and stores on the Z-axis.

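STAR SCHEMA

Sales (fact table): Product Key, Time Key, Store Key, Fixed Costs, Variable Costs, Indirect Sales, Direct Sales, Profit Margin

Product (dimension table): Product Key, Product Name, Sub-Category, Category, Product Line, Department

Store (dimension table): Store Key, Store Name, Territory, Region

Time (dimension table): Time Key, Date, Month, Quarter, Year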

THREE-DIMENSIONAL CUBE

In this star schema, time is one of the dimensions and month is one of the attributes of the time dimension. Values of the attribute month are represented on the Y-axis. Similarly, values of the attributes product name and store name are represented on the other two axes. From the attributes of the dimension tables, pick product name from the product dimension, month from the time dimension, and store name from the store dimension. The cube then represents the values of these attributes along the primary edges of the physical cube.

We can visualize the sales of coats in the month of January at the New York store as lying at the intersection of the three lines representing product: coats, month: January, and store: New York. If we display the sales data along these three dimensions on a spreadsheet, the columns may display the product names, the rows the months, and the pages the data along the third dimension of store names. The page displayed on the screen shows one slice of the cube.
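The idea of a cell at the intersection of three dimension values, and of a page as one slice of the cube, can be illustrated with a small Python sketch. The sales figures and the helper names `cell` and `page` are invented for illustration only.

```python
# Sales cube: (product, month, store) -> sales amount (made-up figures).
cube = {
    ("Coats", "January", "New York"): 120,
    ("Coats", "February", "New York"): 95,
    ("Hats", "January", "New York"): 40,
    ("Coats", "January", "Boston"): 70,
}

def cell(product, month, store):
    """Value at the intersection of the three dimension values (0 if absent)."""
    return cube.get((product, month, store), 0)

def page(store):
    """Slice of the cube for one store: rows are months, columns are products."""
    sheet = {}
    for (product, month, cube_store), sales in cube.items():
        if cube_store == store:
            sheet.setdefault(month, {})[product] = sales
    return sheet

print(cell("Coats", "January", "New York"))
print(page("New York"))
```

Fixing the store and letting the other two dimensions vary is what the spreadsheet page does; fixing all three gives a single cell.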

HYPERCUBE

Now we add another dimension, Promotion, to the model. This results in the three original dimensions plus the Promotion dimension. The problem is how to represent these four dimensions with a three-dimensional cube. To remove this type of problem we use an MDS diagram. We need not try to perceive four-dimensional data as lying along the edges of a three-dimensional cube. All we have to do is draw four straight lines, one per dimension, to represent the data as an MDS. This intuitive representation is called a HYPERCUBE.

An example of a central Sales fact table surrounded by dimension tables for Store, Promotion, Product, and Time:

Figure: Star Schema Example

MDS FOR FOUR DIMENSIONS

ADVANTAGES OF STAR SCHEMA

A star schema is very easy to understand, even for non-technical business managers.

A star schema provides better performance and shorter query times.

A star schema is easily extensible and will handle future changes easily.

MULTIDIMENSIONAL QUERY

A multidimensional query accesses data by more than one dimension, i.e. by more than one column or criterion. In a data warehousing environment, users rarely want to access data by only one column or dimension, such as finding the number of customers in the state of CA. They more commonly want to ask complex questions, such as how many customers in the state of CA purchased products B and C in the last year, and how that compares to the year before. Over 90 percent of data warehousing queries are multidimensional in nature, using multiple criteria against multiple columns.

For the star schema example shown earlier, one possible multidimensional query would be to find the number of sales for stores in California for holiday promotions for products with the word “book” in the description for the month of December.

[Figure: MDS for four dimensions. STORE: New York, Dallas, San Jose, Denver, Cleveland, Boston. PRODUCT: Hats, Jackets, Coats, Dresses, Shirts, Slacks. TIME: January, February, Mar, April, May, June, July, Aug, Sep, Oct, Nov, Dec. PROMOTION: Holiday.]

MULTIDIMENSIONAL QUERY EXAMPLE

A typical SQL query template for the Sales schema looks like this:

Select the measurements that you want to aggregate, using the SUM function:

SELECT P.Category, SUM(F.Sales)

Join the fact table with the dimension tables:

FROM Sales F, Time T, Product P, Location L
WHERE F.TM_Dim_Id = T.Dim_Id
AND F.PR_Dim_Id = P.Dim_Id
AND F.LOC_Dim_Id = L.Dim_Id

Constrain the dimension attributes:

AND T.Month='Jan' AND T.Year='2003' AND L.Country_Name='USA'

Finally, the GROUP BY clause identifies the aggregation level. In this example you are aggregating all sales within a product category, so the select list names P.Category to match the GROUP BY column:

GROUP BY P.Category
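To see the template run end to end, the following sketch builds a tiny version of the Sales star schema in an in-memory SQLite database using Python's standard `sqlite3` module. SQLite stands in for the RDBMS purely for illustration, and the rows are invented. The select list uses P.Category so that it agrees with the GROUP BY clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Time     (Dim_Id INTEGER PRIMARY KEY, Month TEXT, Year TEXT);
CREATE TABLE Product  (Dim_Id INTEGER PRIMARY KEY, Name TEXT, Category TEXT);
CREATE TABLE Location (Dim_Id INTEGER PRIMARY KEY, Country_Name TEXT);
CREATE TABLE Sales    (TM_Dim_Id INTEGER, PR_Dim_Id INTEGER,
                       LOC_Dim_Id INTEGER, Sales REAL);
INSERT INTO Time     VALUES (1, 'Jan', '2003'), (2, 'Feb', '2003');
INSERT INTO Product  VALUES (1, 'Coat', 'Outerwear'), (2, 'Jacket', 'Outerwear'),
                            (3, 'Hat', 'Accessories');
INSERT INTO Location VALUES (1, 'USA'), (2, 'India');
INSERT INTO Sales    VALUES (1, 1, 1, 100), (1, 2, 1, 50), (1, 3, 1, 20),
                            (2, 1, 1, 70), (1, 1, 2, 30);
""")

# Join the fact table to its dimensions, constrain the dimension
# attributes, and aggregate the measure per product category.
rows = conn.execute("""
    SELECT P.Category, SUM(F.Sales)
    FROM Sales F, Time T, Product P, Location L
    WHERE F.TM_Dim_Id = T.Dim_Id
      AND F.PR_Dim_Id = P.Dim_Id
      AND F.LOC_Dim_Id = L.Dim_Id
      AND T.Month = 'Jan' AND T.Year = '2003' AND L.Country_Name = 'USA'
    GROUP BY P.Category
    ORDER BY P.Category
""").fetchall()

for category, total in rows:
    print(category, total)
```

Only the January 2003 sales in the USA contribute to the totals; the February row and the row for India are filtered out by the dimension constraints.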

CHAPTER # 5

Design Approach

Conceptual Design of Data Warehouses from XML Schemas using CCM


figure 5-a

5.1 Conceptual Design from XML sources

Starting with the XML schema the following steps are performed:

1. Convert XML schema to CCM schema.
2. Choose facts.
3. For each fact:
   a) Build an attribute tree from the CCM schema.
   b) Define dimensions and measures.


[Figure: XML Schema (XML tags) -> Converted To -> CCM Schema -> Converted To -> Attribute Tree -> Converted To -> Multidimensional Model (star schema)]

Converting XML schema to CCM Schema

As a first step, the XML schema is converted to a schema represented in CCM. A conversion process is adopted that defines the mapping from XML schema to CCM. The mapping is as follows:

(a) An element in the XML schema that has further sub elements is mapped to a non lexical concept in the CCM schema. For example, in Figure 1(a) ‘Item’ is a non lexical concept.

(b) An element with no sub elements in the XML schema is mapped to a lexical concept. For example, an item like ‘US Price’ is considered a lexical concept.

(c) The choice construct of an XML schema can be either a disjoint association or an inheritance relationship.

If an element has no sub elements other than a choice element, then the association is taken to be a disjoint association. For example, in Figure 1(a) ‘Client Type’ can be either a ‘Company’ or an ‘Individual Customer’, thus representing a disjoint association.

If an element has other sub elements and/or attributes in addition to the choice element, then the relationship is considered to be an inheritance relationship. For example, goods can be shipped either to a different country or to the same country. An inheritance relationship, represented graphically by a triangle, exists between the two. They share a common attribute, ‘Country Name’.

(d) Relationships in the XML schema with minOccurs and maxOccurs options specify cardinality in one direction. These can be used to unambiguously specify the cardinalities in one direction in the CCM schema.
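The mapping rules above can be sketched with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch, not the project's converter: the sample schema, the element names in it, and the helpers `own_parts` and `classify` are assumptions made for illustration, and only inline complexType declarations are handled.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical purchase-order schema used only for this illustration.
XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xs:element name="PurchaseOrder"><xs:complexType><xs:sequence>
  <xs:element name="OrderDate" type="xs:date"/>
  <xs:element name="Item" maxOccurs="unbounded"><xs:complexType><xs:sequence>
   <xs:element name="USPrice" type="xs:decimal"/>
  </xs:sequence></xs:complexType></xs:element>
  <xs:element name="ClientType"><xs:complexType><xs:choice>
   <xs:element name="Company" type="xs:string"/>
   <xs:element name="IndividualCustomer" type="xs:string"/>
  </xs:choice></xs:complexType></xs:element>
  <xs:element name="ShipTo"><xs:complexType><xs:sequence>
   <xs:element name="CountryName" type="xs:string"/>
   <xs:choice>
    <xs:element name="SameCountry" type="xs:string"/>
    <xs:element name="DifferentCountry" type="xs:string"/>
   </xs:choice>
  </xs:sequence></xs:complexType></xs:element>
 </xs:sequence></xs:complexType></xs:element>
</xs:schema>
"""

def own_parts(node, in_choice=False):
    """Yield (element, in_choice) for sub elements declared directly in node,
    without descending into nested xs:element declarations."""
    for child in node:
        if child.tag == XS + "element":
            yield child, in_choice
        elif child.tag == XS + "choice":
            yield from own_parts(child, True)
        elif child.tag in (XS + "sequence", XS + "all"):
            yield from own_parts(child, in_choice)

def classify(element):
    """Map one xs:element to CCM terms following the rules in the text."""
    info = {"name": element.get("name"), "kind": "lexical", "choice": None,
            "cardinality": (element.get("minOccurs", "1"),
                            element.get("maxOccurs", "1")),
            "children": []}
    ctype = element.find(XS + "complexType")
    parts = list(own_parts(ctype)) if ctype is not None else []
    if not parts:
        return info                      # no sub elements: lexical concept
    info["kind"] = "non-lexical"
    has_choice = any(flag for _, flag in parts)
    has_other = (any(not flag for _, flag in parts)
                 or ctype.find(XS + "attribute") is not None)
    if has_choice:                       # choice only: disjoint, otherwise inheritance
        info["choice"] = "inheritance" if has_other else "disjoint"
    info["children"] = [classify(sub) for sub, _ in parts]
    return info

ccm = classify(ET.fromstring(XSD).find(XS + "element"))
for child in ccm["children"]:
    print(child["name"], child["kind"], child["choice"], child["cardinality"])
```

The cardinality pair is read straight from minOccurs and maxOccurs, with the XML Schema default of 1 when an option is absent.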

Choosing Facts


After the XML schema has been converted to a CCM schema, the next step is to choose a fact in the CCM. The fact has to be chosen by the user.

figure 5-b

5.2 Conversion of CCM schema to an attribute tree

The attribute tree is constructed from the CCM. First, a vertex F is created which corresponds to the fact chosen in the CCM. Further vertices are added to the attribute tree by traversing the CCM schema. Let E be the current concept in the CCM. If E has further sub nodes, then E is a non lexical concept: E is added as a node to the attribute tree and is further explored. If E has no further sub nodes, then it is a lexical concept: E is simply added as a node to the attribute tree.

The other constructs represented in the CCM graph are disjoint associations and inheritance relationships. These relationships are carried forward to the attribute tree. A disjoint association is represented using ‘__’ and inheritance is shown graphically by ‘=’ in the attribute tree. The attribute tree obtained from the CCM of Figure 3(a) is shown in Figure 3(b).
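The traversal just described can be sketched as a short recursive function in Python. The adjacency-mapping representation of the CCM, the `relationships` table, and the concept hierarchy shown are assumptions chosen for illustration, not the data structures of the project.

```python
# CCM schema as an adjacency mapping: concept -> list of sub concepts.
# Concepts without an entry are lexical (they have no sub nodes).
ccm = {
    "Purchase Order": ["Order Date", "Total Amount", "Item", "Client Type"],
    "Item": ["Product Name", "US Price", "Quantity"],
    "Client Type": ["Company", "Individual Customer"],
}
# Relationship among a concept's sub nodes: "disjoint" or "is-a".
relationships = {"Client Type": "disjoint"}

def build_attribute_tree(concept):
    """Depth-first traversal of the CCM, starting at the chosen fact."""
    node = {"name": concept, "children": []}
    subs = ccm.get(concept, [])
    if subs:  # non lexical concept: add it and explore it further
        if concept in relationships:
            node["relationship"] = relationships[concept]  # carried forward
        node["children"] = [build_attribute_tree(sub) for sub in subs]
    return node  # lexical concept: simply added as a leaf

def show(node, depth=0):
    """Print the tree with indentation, marking carried-over relationships."""
    mark = f" [{node['relationship']}]" if "relationship" in node else ""
    print("  " * depth + node["name"] + mark)
    for child in node["children"]:
        show(child, depth + 1)

tree = build_attribute_tree("Purchase Order")
show(tree)
```

The root of the returned tree plays the role of the vertex F; the relationship marker stands in for the graphical ‘__’ and ‘=’ notation.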

Defining Dimensions and measures

There are several nodes surrounding the fact. These are termed dimensions.

Nodes at distance 1 from the fact node are referred to as D1N1, D1N2, ..., D1Nj. They may or may not have further nodes. If they have, then the next node in the tree is at distance 2 from the fact. The nodes at distance 2 are represented by D2N1, D2N2, ..., D2Nk. Continuing this way, further consecutive nodes from the fact node are denoted by DiNj, DiNj+1, DiNj+2, ..., DiNj+m, where Di refers to distance i from the fact node F.


[Figure: XML SCHEMA -> CCM SCHEMA -> CHOOSE FACTS]

figure 5-c

Consider the nodes at distance 1 from the fact node. These nodes may become measures or dimensions. Without user intervention it is not possible to determine exactly whether a node is a dimension or a measure. We propose that the nodes which have no further sub nodes are measures and the nodes which have further sub nodes are dimensions. However, this default can be changed by user intervention. The nodes with no further sub nodes correspond to lexical concepts, and the nodes with further sub nodes correspond to non lexical concepts of the CCM.

The nodes designated as dimensions can exhibit relationships among themselves, which are carried over to the attribute tree from the CCM. These relationships can be either a disjoint relationship or an ‘is-a’ relationship.
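The DiNj labelling and the default measure/dimension proposal can be sketched in Python as follows. The tree literal and the function names `label_nodes` and `default_roles` are hypothetical, and the numbering order within one distance level is an assumption (breadth-first, left to right).

```python
from collections import deque

# Attribute tree as an adjacency mapping: node -> list of sub nodes.
tree = {
    "Purchase Order": ["Total Amount", "Order Date", "Client Type"],
    "Client Type": ["Company", "Individual Customer"],
}

def label_nodes(fact):
    """Label each node DiNj: i is the distance from the fact and j is a
    running number among the nodes found at that distance."""
    labels, counters = {}, {}
    queue = deque((child, 1) for child in tree.get(fact, []))
    while queue:
        node, dist = queue.popleft()
        counters[dist] = counters.get(dist, 0) + 1
        labels[node] = f"D{dist}N{counters[dist]}"
        queue.extend((child, dist + 1) for child in tree.get(node, []))
    return labels

def default_roles(fact):
    """Default proposal for distance-1 nodes: no sub nodes means measure,
    sub nodes mean dimension."""
    return {node: ("dimension" if tree.get(node) else "measure")
            for node in tree.get(fact, [])}

labels = label_nodes("Purchase Order")
roles = default_roles("Purchase Order")
roles["Order Date"] = "dimension"  # example of a user overriding the default

print(labels)
print(roles)
```

The last assignment shows the user intervention mentioned in the text: a leaf such as Order Date can be turned into a dimension after the default has been proposed.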


[Figure: attribute tree fragment. Fact: Purchase Orders. Nodes at distance 1 (labelled D1N...): Total Amount, Order Date, Client Type. Nodes at distance 2 (D2N1, D2N2): Company, Individual Customer.]

5.3 Reaching from Attribute Tree to Star Schema

To reach the star schema from the attribute tree, we follow this procedure:

1. All the nodes directly connected to the fact node are made dimensions.
2. The nodes which are children of a dimension become its attributes.
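The two rules can be sketched in a few lines of Python. The adjacency-mapping form of the attribute tree and the function name `to_star_schema` are assumptions made for this illustration.

```python
# Attribute tree as an adjacency mapping: node -> list of sub nodes.
tree = {
    "Purchase Order": ["Order Date", "Item", "Client Type"],
    "Item": ["Product Name", "US Price", "Quantity"],
    "Client Type": ["Company", "Individual Customer"],
}

def to_star_schema(tree, fact):
    """Rule 1: nodes directly connected to the fact become dimensions.
    Rule 2: children of a dimension become its attributes."""
    dimensions = {dim: list(tree.get(dim, [])) for dim in tree.get(fact, [])}
    return {"fact": fact, "dimensions": dimensions}

star = to_star_schema(tree, "Purchase Order")
for dimension, attributes in star["dimensions"].items():
    print(dimension, attributes)
```

A node without children, such as Order Date here, still becomes a dimension under rule 1; it simply has no attributes.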


CHAPTER # 6

Physical Implementation

6.1 Implementation strategy taken for data transformation from source database to fact schema:


Platform: ORACLE 9i Release 2, XML Development Kit, PL/SQL.

DBMS_XMLQuery Package: used to generate XML data. The XML data is generated in two forms, BFILE and CLOB.

DBMS_XMLSave Package: used to save XML data into the database. It takes XML data as input in the form of a CLOB or BFILE, matches the tags with the target table’s column names, and transfers the data into the destination table.

6.2 PL/SQL

PL/SQL is Oracle’s procedural language (PL) superset of the Structured Query Language (SQL). You can use PL/SQL to do such things as codify your business rules through the creation of stored procedures and packages, trigger database events to occur, or add programming logic to the execution of SQL commands.

PL/SQL Overview

PL/SQL code is grouped into structures called blocks. If you create a stored procedure or package, you give the block of PL/SQL code a name; if the block of PL/SQL code is not given a name, then it is called an anonymous block. A block of PL/SQL code contains three sections:

Section              Description
Declarations         Defines and initializes the variables and cursors used in the block.
Executable Commands  Uses flow-control commands (such as if commands and loops) to execute the commands and assign values to the declared variables.
Exception Handling   Provides customized handling of error conditions.

Within a PL/SQL block, the first section is the Declarations section. Within the Declarations section, you define the variables and cursors that the block will use. The Declarations section starts with the keyword declare and ends when the Executable Commands section starts, indicated by the keyword begin. The Executable Commands section is followed by the Exception Handling section; the exception keyword signals the start of the Exception Handling section. The PL/SQL block is terminated by the end keyword.

Note on the two XML data forms of section 6.1: a CLOB can be stored in the database itself, so database operations such as searching and indexing can be applied to it. Therefore, CLOBs are chosen.

The structure of a typical PL/SQL block is shown in the following listing:

declare
  <declarations section>
begin
  <executable commands>
exception
  <exception handling>
end;

PROCEDURES

Sophisticated business rules and application logic can be stored as procedures within Oracle. Stored procedures, which are groups of SQL, PL/SQL, and Java statements, enable you to move code that enforces business rules from your application to the database. As a result, the code will be stored once for use by multiple applications. Because Oracle supports stored procedures, the code within your applications should become more consistent and easier to maintain.

You may experience performance gains when using procedures, for two reasons:

1. The processing of complex business rules may be performed within the database – and therefore by the server. In the client-server or three-tier applications, shifting complex processing from the application (on the client) to the database (on the server) may drastically improve performance.

2. Because the procedural code is stored within the database and is fairly static, you may also benefit from the reuse of the same queries within the database. The shared SQL area in the System Global Area (SGA) will store the parsed versions of the executed commands. Thus, the second time a procedure is executed, it may be able to take advantage of the parsing that was previously performed, improving the performance of the procedure’s execution.

In addition to these advantages, your application development effort may also benefit. Business rules that are consolidated within the database no longer need to be written into each application, thus saving you time during application creation and simplifying the maintenance process.

6.3 Oracle XML Components: Overview

Oracle9i provides several components, utilities, and interfaces you can use to take advantage of XML technology in building your Web-based database applications. Which components you use depends on your application requirements, programming preferences, development, and deployment environments. Starting with XDK 9.0.2 (shipped with iAS v2) and XDK 9.2 (shipped with Oracle9i Release 2), XSLStylesheet is thread-safe and can be used across threads in multiple XSLProcessor.processXSL calls. But XSLProcessor, a light-weight object, will not be made thread safe. The following XML components are provided with Oracle9i and Oracle9i Application Server:

XML Developer’s Kits (XDKs). There are Oracle XDKs for Java, C, C++, and PL/SQL. These development kits contain building blocks for reading, manipulating, transforming, and viewing XML documents. Oracle XDKs are fully supported and come with a commercial redistribution license.

XML SQL Utility (XSU). This utility, for Java and PL/SQL, generates and stores XML data to and from the database from SQL queries, result sets, or tables. It achieves data transformation by canonically mapping any SQL query result to XML and vice versa.

XDK for PL/SQL

XDK for PL/SQL is composed of the following:

XML Parser for PL/SQL. Creates and parses XML using Internet standard DOM and SAX interfaces. Includes an XSL Transformation (XSLT) Processor that transforms XML to XML or other text-based formats, such as HTML.

XML Schema Processor for PL/SQL. Supports simple and complex types.

XML SQL Utility (XSU) for PL/SQL. Enables you to transform data retrieved from object-relational database tables or views into XML, extract data from an XML document, and:
– Use canonical mapping to insert data into appropriate columns or attributes of a table or a view
– Apply this data to update or delete values of the appropriate columns or attributes

XML Parsers.

XSLT Processor. Transforms or renders XML into other text-based formats such as HTML and WML.

XML Schema Processor. Supports simple and complex types.


figure 6-a: Oracle XML Component and e-Business solutions


6.4 Oracle XML SQL Utility (XSU)

Oracle XML SQL Utility (XSU) supports Java and PL/SQL.

XML SQL Utility consists of core Java class libraries for automatically and dynamically rendering the results of arbitrary SQL queries into canonical XML. It includes the following features:

– Supports queries over richly-structured user-defined object types and object views.

– Supports automatic XML Insert of canonically-structured XML into any existing table, view, object table, or object view. By combining with XSLT transformations, virtually any XML document can be automatically inserted into the database.

XML SQL Utility Java classes can be used for the following tasks:

– Generate from an SQL query or Result set object a text or XML document, a Document Object Model (DOM), Document Type Definition (DTD), or XML Schema.

– Load data from an XML document into an existing database schema or view.

XML SQL Utility for PL/SQL consists of a PL/SQL package that wraps the XML SQL Utility for Java.

Figure 6-b: Oracle XML SQL Utility

XML SQL Utility for Java consists of a set of Java classes that perform the following tasks:

– Pass a query to the database and generate an XML document (text or DOM) from the results, or the DTD which can be used for validation.
– Write XML data to a database table.

Generating XML from Query Results

Figure 6-c shows how XML SQL Utility processes SQL queries and returns the results as an XML document.

figure 6-c

XML Document Structure: Columns Are Mapped to Elements

The structure of the resulting XML document is based on the internal structure of the database schema that returns the query results:

– Columns are mapped to top level elements
– Scalar values are mapped to elements with text-only content
– Object types are mapped to elements with attributes appearing as sub-elements
– Collections are mapped to lists of elements

XSU Generates the XML Document as a String or DOM Element Tree

The XML SQL Utility (XSU) generates either of the following:

– A string representation of the XML document. Use this representation if you are returning the XML document to a requester.
– An in-memory XML DOM tree of elements. Use this representation if you are operating on the XML programmatically, for example, transforming it using the XSLT Processor or using DOM methods to search or modify the XML in some way.
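The canonical mapping of query results to XML can be imitated in a few lines of Python, which may help in reading the generated documents. This is not XSU and does not call any Oracle API; the function `rows_to_xml` is a hypothetical stand-in that covers scalar columns only, and leaving out NULL columns is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

def rows_to_xml(columns, rows, row_tag="ROW", rowset_tag="ROWSET"):
    """Build <ROWSET><ROW num="1"><COL>value</COL>...</ROW>...</ROWSET>:
    one element per row, one text-only sub element per scalar column."""
    rowset = ET.Element(rowset_tag)
    for number, row in enumerate(rows, start=1):
        row_element = ET.SubElement(rowset, row_tag, num=str(number))
        for column, value in zip(columns, row):
            if value is not None:  # assumption: NULL columns are left out
                ET.SubElement(row_element, column).text = str(value)
    return ET.tostring(rowset, encoding="unicode")

xml_text = rows_to_xml(
    ["EMPNO", "ENAME", "DEPTNO"],
    [(7369, "Smith", 20), (7499, "Allen", None)],
)
print(xml_text)
```

Passing different values for `row_tag` and `rowset_tag` mirrors the idea of renaming the row and document elements of the generated XML.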

XSU Generates a DTD Based on Queried Table’s Schema


You can also use the XML SQL Utility (XSU) to generate a DTD based on the schema of the underlying table or view being queried. You can use the generated DTD as input to the XML Class Generator for Java or C++. This generates a set of classes based on the DTD elements. You can then write code that uses these classes to generate the infrastructure behind a Web-based form. Based on this infrastructure, the Web form can capture user data and create an XML document compatible with the database schema. This data can then be written directly to the corresponding database table or object view without further processing.

Using Oracle XML Components to Generate XML Documents: PL/SQL

Figure 6-d shows the XDK for PL/SQL components used to generate an XML document:

– XML Parser for PL/SQL, Version 2, including XSLT
– XML SQL Utility (XSU) for PL/SQL

In the PL/SQL environment, when a user, client, or application sends a SQL query, there are two possible ways of processing the query using the Oracle XML components:

– Directly by JDBC, which then accesses the XML Parser
– Through XML SQL Utility (XSU)

figure 6-d: Generating XML document using XDK for PL/SQL

XSU PL/SQL API

XML SQL Utility (XSU) PL/SQL API reflects the Java API in the generation and storage of XML documents from and to a database. DBMS_XMLQuery and DBMS_XMLSave are the two packages that reflect the functions in the Java classes OracleXMLQuery and OracleXMLSave. Both of these packages have a context handle associated with them. Create a context by calling one of the constructor-like functions to get the handle and then use the handle in all subsequent calls.

6.5 Generating XML with DBMS_XMLQuery()

Generating XML results in a CLOB that contains the XML document. To use DBMS_XMLQuery and the XSU generation engine, follow these steps:

1. Create a context handle by calling the DBMS_XMLQuery.getCtx function and supplying it the query, either as a CLOB or a VARCHAR2.
2. Bind possible values to the query using the DBMS_XMLQuery.bind function. The binds work by binding a name to the position. For example, the query can be select * from emp where empno = :EMPNO_VAR. Here you are binding the value for the EMPNO_VAR using the setBindValue function.
3. Set optional arguments like the ROW tag name, the ROWSET tag name, or the number of rows to fetch, and so on.
4. Fetch the XML as a CLOB using the getXML() functions. getXML() can be called to generate the XML with or without a DTD or schema.
5. Close the context.

Here are some examples that use the DBMS_XMLQuery PL/SQL package.

XSU Generating XML Example 1: Generating XML from Simple Queries (PL/SQL)

In this example, you select rows from table emp, and obtain an XML document as a CLOB. First get the context handle by passing in a query and then call the getXML routine to get the CLOB value. The document is in the same encoding as the database character set.

declare
  queryCtx DBMS_XMLQuery.ctxType;
  result CLOB;
begin
  -- set up the query context
  queryCtx := DBMS_XMLQuery.newContext('select * from emp');
  -- get the result
  result := DBMS_XMLQuery.getXML(queryCtx);
  -- now you can use the result to put it in tables or send it as messages
  printClobOut(result);
  DBMS_XMLQuery.closeContext(queryCtx); -- you must close the query handle
end;
/

XSU Generating XML Example 2: Printing CLOB to Output Buffer

printClobOut() is a simple procedure that prints the CLOB to the output buffer. If you run this PL/SQL code in SQL*Plus, the result of the CLOB is printed to screen. Set serveroutput to on in order to see the results.

CREATE OR REPLACE PROCEDURE printClobOut(result IN OUT NOCOPY CLOB) is
  xmlstr varchar2(32767);
  line varchar2(2000);
  pos pls_integer;
begin
  -- read up to 32767 characters of the CLOB
  xmlstr := dbms_lob.SUBSTR(result,32767);
  loop
    exit when xmlstr is null;
    pos := instr(xmlstr,chr(10));
    if pos = 0 then
      -- no further line break: print the remainder and stop
      dbms_output.put_line('| '||xmlstr);
      exit;
    end if;
    line := substr(xmlstr,1,pos-1);
    dbms_output.put_line('| '||line);
    xmlstr := substr(xmlstr,pos+1);
  end loop;
end;
/

XSU Generating XML Example 3: Changing ROW and ROWSET Tag Names

With the XSU PL/SQL API you can also change the ROW and the ROWSET tag names. These are the default names placed around each row of the result, and around the whole document, respectively. The procedures setRowTag and setRowSetTag accomplish this as shown in the following example:

-- Setting the ROW tag names
declare
  queryCtx DBMS_XMLQuery.ctxType;
  result CLOB;
begin
  -- set the query context
  queryCtx := DBMS_XMLQuery.newContext('select * from emp');
  DBMS_XMLQuery.setRowTag(queryCtx,'EMP'); -- sets the row tag name
  DBMS_XMLQuery.setRowSetTag(queryCtx,'EMPSET'); -- sets the rowset tag name
  result := DBMS_XMLQuery.getXML(queryCtx); -- get the result
  printClobOut(result); -- print the result
  DBMS_XMLQuery.closeContext(queryCtx); -- close the query handle
end;
/

The resulting XML document has an EMPSET document element. Each row is separated using the EMP tag.

XSU Generating XML Example 4: Using setMaxRows() and setSkipRows()

The results from the query generation can be paginated by using:

– setMaxRows function. This sets the maximum number of rows to be converted to XML. This is relative to the current row position from which the last result was generated.
– setSkipRows function. This specifies the number of rows to skip before converting the row values to XML.

For example, to skip the first 3 rows of the emp table and then print out the rest of the rows 10 at a time, you can set the skipRows to 3 for the first batch of 10 rows and then set skipRows to 0 for the rest of the batches.

As in the case of XML SQL Utility’s Java API, call the keepObjectOpen() function to ensure that the state is maintained between fetches. The default behavior is to close the state after a fetch. For multiple fetches, you must determine when there are no more rows to fetch. This can be done by setting the setRaiseNoRowsException(). This causes an exception to be raised if no rows are written to the CLOB. This can be caught and used as the termination condition.

-- Pagination of results
declare
  queryCtx DBMS_XMLQuery.ctxType;
  result CLOB;
begin
  -- set up the query context
  queryCtx := DBMS_XMLQuery.newContext('select * from emp');
  DBMS_XMLQuery.setSkipRows(queryCtx,3); -- set the number of rows to skip
  DBMS_XMLQuery.setMaxRows(queryCtx,10); -- set the max number of rows per fetch
  result := DBMS_XMLQuery.getXML(queryCtx); -- get the first result
  printClobOut(result); -- print the result out; this is your own routine
  DBMS_XMLQuery.setSkipRows(queryCtx,0); -- from now on, do not skip any more rows
  DBMS_XMLQuery.setRaiseNoRowsException(queryCtx,true); -- raise the no-rows exception
  begin
    loop -- loop until the exception is raised
      result := DBMS_XMLQuery.getXML(queryCtx); -- get the next batch
      printClobOut(result); -- print the next batch of 10 rows
    end loop;
  exception
    when others then
      -- dbms_output.put_line(sqlerrm);
      null; -- termination condition, nothing to do
  end;
  DBMS_XMLQuery.closeContext(queryCtx); -- close the handle
end;
/


Setting Stylesheets in XSU (PL/SQL)

The XSU PL/SQL API provides the ability to set stylesheets on the generated XML documents as follows:

– Set the stylesheet header in the result XML. To do this, use the setStylesheetHeader() procedure to set the stylesheet header in the result. This simply adds the XML processing instruction to include the stylesheet.
– Apply a stylesheet to the result XML document, before generation. This method is a huge performance win, since otherwise the XML document has to be generated as a CLOB, sent to the parser again, and then have the stylesheet applied. XSU generates a DOM document, calls the parser, applies the stylesheet, and then generates the result. To apply the stylesheet to the resulting XML document, use the useStyleSheet() procedure. This uses the stylesheet to generate the result.

Binding Values in XSU (PL/SQL)

The XSU PL/SQL API provides the ability to bind values to the SQL statement. The SQL statement can contain named bind variables. The variables must be prefixed with a colon (:) to declare that they are bind variables. To use bind variables, follow these steps:

1. Initialize the query context with the query containing the bind variables. For example, the following statement registers a query to select the rows from the emp table with the where clause containing the bind variables :EMPNO and :ENAME. You will bind the values for employee number and employee name later.

queryCtx := DBMS_XMLQuery.getCtx('select * from emp where empno = :EMPNO and ename = :ENAME');

2. Set the list of bind values. The clearBindValues() clears all the bind variables set. The setBindValue() sets a single bind variable with a string value. For example, you will set the empno and ename values as shown here:

DBMS_XMLQuery.clearBindValues(queryCtx);
DBMS_XMLQuery.setBindValue(queryCtx,'EMPNO',20);
DBMS_XMLQuery.setBindValue(queryCtx,'ENAME','John');

3. Fetch the results. This will apply the bind values to the statement and then get the result corresponding to the predicate empno = 20 and ename = 'John'.

result := DBMS_XMLQuery.getXML(queryCtx);

4. Re-bind values if necessary. For example, to change the ENAME alone to Scott and re-execute the query:

DBMS_XMLQuery.setBindValue(queryCtx,'ENAME','Scott');

The rebinding of ENAME will now use Scott instead of John.

XSU Generating XML Example 5: Binding Values to the SQL Statement

The following example illustrates the use of bind variables in the SQL statement:

declare
  queryCtx DBMS_XMLQuery.ctxType;
  result CLOB;
begin
  queryCtx := DBMS_XMLQuery.newContext('select * from emp where empno = :EMPNO and ename = :ENAME');
  -- No longer needed:
  -- DBMS_XMLQuery.clearBindValues(queryCtx);
  DBMS_XMLQuery.setBindValue(queryCtx,'EMPNO',7566);
  DBMS_XMLQuery.setBindValue(queryCtx,'ENAME','JONES');
  result := DBMS_XMLQuery.getXML(queryCtx);
  -- printClobOut(result);
  DBMS_XMLQuery.setBindValue(queryCtx,'ENAME','Scott'); -- re-bind ENAME only
  result := DBMS_XMLQuery.getXML(queryCtx);
  -- printClobOut(result);
  DBMS_XMLQuery.closeContext(queryCtx); -- close the query handle
end;
/

6.6 Storing XML in the Database Using DBMS_XMLSave

To use DBMS_XMLSave() and the XML SQL Utility storage engine, follow these steps:

1. Create a context handle by calling the DBMS_XMLSave.getCtx function and supplying it the table name to use for the DML operations.
2. Set the columns to use:
   For inserts. You can set the list of columns to insert into using the setUpdateColNames function. The default is to insert values into all the columns.
   For updates. The list of key columns must be supplied. Optionally the list of columns to update may also be supplied. In this case, the tags in the XML document matching the key column names will be used in the WHERE clause of the update statement and the tags matching the update column list will be used in the SET clause of the update statement.
   For deletes. The default is to create a WHERE clause to match all the tag values present in each ROW element of the document supplied. To override this behavior you can set the list of key columns. In this case only those tag values whose tag names match these columns will be used to identify the rows to delete (in effect used in the WHERE clause of the delete statement).
3. Supply an XML document to the insertXML, updateXML, or deleteXML functions to insert, update, and delete respectively.
4. You can repeat the last operation any number of times.
5. Close the context.

Use the same examples as for the Java case, OracleXMLSave class examples.

Insert Processing Using XSU (PL/SQL API)

To insert a document into a table or view, simply supply the table or the view name and then the XML document. XSU parses the XML document (if a string is given) and then creates an INSERT statement, into which it binds all the values. By default, XSU inserts values into all the columns of the table or view, and an absent element is treated as a NULL value.

The following code shows how the document generated from the emp table can be put back into it with relative ease.

XSU Inserting XML Example 6: Inserting Values into All Columns (PL/SQL)

This example creates a procedure, insProc, which takes in an XML document as a CLOB and a table name to put the document into, and then inserts the XML document into the table:

create or replace procedure insProc(xmlDoc IN CLOB, tableName IN VARCHAR2) is
  insCtx DBMS_XMLSave.ctxType;
  rows number;
begin
  insCtx := DBMS_XMLSave.newContext(tableName); -- get the context handle
  rows := DBMS_XMLSave.insertXML(insCtx,xmlDoc); -- this inserts the document
  DBMS_XMLSave.closeContext(insCtx); -- this closes the handle
end;
/

This procedure can now be called with any XML document and a table name. For example, a call of the form:


insProc(xmlDocument, 'scott.emp');

generates an INSERT statement of the form:

insert into scott.emp (EMPNO, ENAME, JOB, MGR, SAL, DEPTNO) VALUES(?,?,?,?,?,?);

and the element tags in the input XML document matching the column names will be matched and their values bound. For the code snippet shown earlier, if you send it the following XML document:

<?xml version='1.0'?>
<ROWSET>
  <ROW num="1">
    <EMPNO>7369</EMPNO>
    <ENAME>Smith</ENAME>
    <JOB>CLERK</JOB>
    <MGR>7902</MGR>
    <HIREDATE>12/17/1980 0:0:0</HIREDATE>
    <SAL>800</SAL>
    <DEPTNO>20</DEPTNO>
  </ROW>
</ROWSET>

you would have a new row in the emp table containing the values (7369, Smith, CLERK, 7902, 12/17/1980, 800, 20). Any element absent inside the ROW element is considered a NULL value.
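The mapping from ROW elements to a bound INSERT statement can be sketched in a few lines of Python. This is an illustration only, not how XSU is implemented: it assumes sqlite3 in place of Oracle, an untyped emp table, a hypothetical helper insert_xml, and a shortened document in which MGR and HIREDATE are deliberately absent so that the NULL behaviour is visible.

```python
import sqlite3
import xml.etree.ElementTree as ET

def insert_xml(conn, table, xml_doc):
    """Insert one table row per ROW element; absent elements become NULL."""
    # Column names come from the table itself, as in the "all columns" default.
    columns = [info[1] for info in conn.execute(f"pragma table_info({table})")]
    placeholders = ", ".join("?" for _ in columns)
    sql = f"insert into {table} ({', '.join(columns)}) values ({placeholders})"
    count = 0
    for row in ET.fromstring(xml_doc).iter("ROW"):
        # findtext returns None when the tag is missing, which binds as NULL.
        values = [row.findtext(col) for col in columns]
        conn.execute(sql, values)
        count += 1
    return count

conn = sqlite3.connect(":memory:")
conn.execute("create table emp (EMPNO, ENAME, JOB, MGR, HIREDATE, SAL, DEPTNO)")
doc = """<ROWSET><ROW num="1"><EMPNO>7369</EMPNO><ENAME>Smith</ENAME>
<JOB>CLERK</JOB><SAL>800</SAL><DEPTNO>20</DEPTNO></ROW></ROWSET>"""
inserted = insert_xml(conn, "emp", doc)
stored = conn.execute("select EMPNO, ENAME, MGR from emp").fetchone()
print(inserted, stored)
```

In this sketch every value is bound as text, so no date or number conversion takes place; XSU itself applies the configured date format.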

XSU Inserting XML Example 7: Inserting Values into Certain Columns (PL/SQL)

In certain cases, you may not want to insert values into all columns. This might be true when the values that you are getting are not the complete set and you need triggers or default values to be used for the rest of the columns. The code that appears later shows how this can be done.

Assume that you are getting the values only for the employee number, name, and job, and that the salary, manager, department number, and hiredate fields are filled in automatically. You create a list of column names that you want the insert to work on and then pass it to DBMS_XMLSave. These values are set by calling the setUpdateColumn() procedure repeatedly, passing in one column name each time. The column name settings can be cleared using clearUpdateColumnList().

create or replace procedure testInsert(xmlDoc IN clob) is
  insCtx DBMS_XMLSave.ctxType;
  doc clob;
  rows number;
begin
  insCtx := DBMS_XMLSave.newContext('scott.emp'); -- get the save context
  DBMS_XMLSave.clearUpdateColumnList(insCtx); -- clear the update settings
  -- set the columns to be updated as a list of values
  DBMS_XMLSave.setUpdateColumn(insCtx,'EMPNO');
  DBMS_XMLSave.setUpdateColumn(insCtx,'ENAME');
  DBMS_XMLSave.setUpdateColumn(insCtx,'JOB');
  -- Now insert the doc. This will only insert into EMPNO, ENAME and JOB columns
  rows := DBMS_XMLSave.insertXML(insCtx, xmlDoc);
  DBMS_XMLSave.closeContext(insCtx);
end;
/

If you call the procedure passing in a CLOB as a document, an INSERT statement of the form:

insert into scott.emp (EMPNO, ENAME, JOB) VALUES (?, ?, ?);

is generated. Note that in the earlier example, if the inserted document contains values for the other columns (SAL, HIREDATE, and so on), those are ignored. Also, an insert is performed for each ROW element that is present in the input. These inserts are batched by default.

Update Processing Using XSU (PL/SQL API)

Now that you know how to insert values into the table from XML documents, let us see how to update only certain values. If you get an XML document to update the salary of an employee and also the department that she works in:

<ROWSET>
  <ROW num="1">
    <EMPNO>7369</EMPNO>
    <SAL>1800</SAL>
    <DEPTNO>30</DEPTNO>
  </ROW>
  <ROW>
    <EMPNO>2290</EMPNO>
    <SAL>2000</SAL>
    <HIREDATE>12/31/1992</HIREDATE>
  </ROW>
</ROWSET>

you can call the update processing to update the values. In the case of an update, you need to supply XSU with the list of key column names. These form part of the WHERE clause in the UPDATE statement. In the emp table shown earlier, the employee number (EMPNO) column forms the key and you use that for updates.

XSU Updating XML Example 8: Updating XML Document Key Columns (PL/SQL)

Consider the PL/SQL procedure:

create or replace procedure testUpdate(xmlDoc IN clob) is
  updCtx DBMS_XMLSave.ctxType;
  rows number;
begin
  updCtx := DBMS_XMLSave.newContext('scott.emp'); -- get the context
  DBMS_XMLSave.clearUpdateColumnList(updCtx); -- clear the update settings
  DBMS_XMLSave.setKeyColumn(updCtx,'EMPNO'); -- set EMPNO as key column
  rows := DBMS_XMLSave.updateXML(updCtx,xmlDoc); -- update the table
  DBMS_XMLSave.closeContext(updCtx); -- close the context
end;
/

In this example, when the procedure is executed with a CLOB value that contains the document described earlier, two UPDATE statements are generated. For the first ROW element, XSU generates an UPDATE statement to update the SAL and DEPTNO fields as shown:

UPDATE scott.emp SET SAL = 1800, DEPTNO = 30 WHERE EMPNO = 7369;

and for the second ROW element,

UPDATE scott.emp SET SAL = 2000, HIREDATE = 12/31/1992 WHERE EMPNO = 2290;
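How the key column list splits each ROW element into a SET clause and a WHERE clause can be sketched as follows. This Python fragment is an assumption-laden illustration rather than XSU's algorithm: update_statements is a hypothetical helper, it only builds the statement text and bind list, and it executes nothing.

```python
import xml.etree.ElementTree as ET

def update_statements(table, key_columns, xml_doc):
    """Return one (sql, binds) pair per ROW element."""
    statements = []
    for row in ET.fromstring(xml_doc).iter("ROW"):
        values = {child.tag: child.text for child in row}
        # Tags that are not key columns go into the SET clause.
        set_cols = [tag for tag in values if tag not in key_columns]
        set_part = ", ".join(f"{col} = ?" for col in set_cols)
        where_part = " AND ".join(f"{col} = ?" for col in key_columns)
        sql = f"UPDATE {table} SET {set_part} WHERE {where_part}"
        binds = [values[c] for c in set_cols] + [values[c] for c in key_columns]
        statements.append((sql, binds))
    return statements

doc = ("<ROWSET>"
       "<ROW num='1'><EMPNO>7369</EMPNO><SAL>1800</SAL><DEPTNO>30</DEPTNO></ROW>"
       "<ROW><EMPNO>2290</EMPNO><SAL>2000</SAL><HIREDATE>12/31/1992</HIREDATE></ROW>"
       "</ROWSET>")
stmts = update_statements("scott.emp", ["EMPNO"], doc)
for sql, binds in stmts:
    print(sql, binds)
```

Because the two ROW elements carry different tags, the sketch produces two different statement texts, which is why a fixed update column list lets one statement be reused.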

XSU Updating XML Example 9: Specifying a List of Columns to Update (PL/SQL)

You may want to specify the list of columns to update. This speeds up the processing, since the same UPDATE statement can be used for all the ROW elements. You can also ignore other tags that occur in the document. Note that when you specify a list of columns to update, an element corresponding to one of the update columns, if absent, is treated as NULL.

If you know that all the elements to be updated are the same for all the ROW elements in the XML document, then you can use the setUpdateColumn() procedure to set the column names to update.

create or replace procedure testUpdate(xmlDoc IN CLOB) is
  updCtx DBMS_XMLSave.ctxType;
  rows number;
begin
  updCtx := DBMS_XMLSave.newContext('scott.emp');
  DBMS_XMLSave.setKeyColumn(updCtx,'EMPNO'); -- set EMPNO as key column
  -- set list of columns to update
  DBMS_XMLSave.setUpdateColumn(updCtx,'SAL');
  DBMS_XMLSave.setUpdateColumn(updCtx,'JOB');
  rows := DBMS_XMLSave.updateXML(updCtx,xmlDoc); -- update the table from the XML document
  DBMS_XMLSave.closeContext(updCtx); -- close the handle
end;
/

Delete Processing Using XSU (PL/SQL API)

For deletes, you can set the list of key columns. These columns will be put as part of the WHERE clause of the DELETE statement. If the key column names are not supplied, then a new DELETE statement will be created for each ROW element of the XML document, where the list of columns in the WHERE clause of the DELETE will match those in the ROW element.

XSU Deleting XML Example 10: Deleting Operations for Each Row (PL/SQL)

Consider the delete example shown here:

create or replace procedure testDelete(xmlDoc IN clob) is
  delCtx DBMS_XMLSave.ctxType;
  rows number;
begin
  delCtx := DBMS_XMLSave.newContext('scott.emp');
  -- no key columns are set, so every tag in a ROW element is used in the WHERE clause
  rows := DBMS_XMLSave.deleteXML(delCtx,xmlDoc);
  DBMS_XMLSave.closeContext(delCtx);
end;
/

If you use the same XML document shown for the update example, you would end up with two DELETE statements,

DELETE FROM scott.emp WHERE empno=7369 and sal=1800 and deptno=30;

DELETE FROM scott.emp WHERE empno=2290 and sal=2000 and hiredate=12/31/1992;

The DELETE statements were formed based on the tag names present in each ROW element in the XML document.

XSU Example 11: Deleting by Specifying the Key Values (PL/SQL)

If instead you want the delete to only use the key values as predicates, you can use the setKeyColumn procedure to set this.

create or replace package testDML AS
  saveCtx DBMS_XMLSave.ctxType := null; -- a single static variable
  procedure insertXML(xmlDoc in clob);
  procedure updateXML(xmlDoc in clob);
  procedure deleteXML(xmlDoc in clob);
end;
/

create or replace package body testDML AS
  rows number;
  procedure insertXML(xmlDoc in clob) is
  begin
    rows := DBMS_XMLSave.insertXML(saveCtx,xmlDoc);
  end;
  procedure updateXML(xmlDoc in clob) is
  begin
    rows := DBMS_XMLSave.updateXML(saveCtx,xmlDoc);
  end;
  procedure deleteXML(xmlDoc in clob) is
  begin
    rows := DBMS_XMLSave.deleteXML(saveCtx,xmlDoc);
  end;
begin
  saveCtx := DBMS_XMLSave.newContext('scott.emp'); -- create the context once
  DBMS_XMLSave.setKeyColumn(saveCtx, 'EMPNO'); -- set the key column name
end;
/

Here a single delete statement of the form,

DELETE FROM scott.emp WHERE EMPNO=?


will be generated and used for all ROW elements in the document.
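The difference between the two delete modes can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: delete_statements is a hypothetical helper that builds statement text from the tags of each ROW element, and it is not taken from XSU.

```python
import xml.etree.ElementTree as ET

def delete_statements(table, xml_doc, key_columns=None):
    """Return the distinct DELETE statements needed for the ROW elements."""
    statements = []
    for row in ET.fromstring(xml_doc).iter("ROW"):
        tags = [child.tag for child in row]
        # With key columns only those tags are used; otherwise every tag is.
        where_cols = [t for t in tags if t in key_columns] if key_columns else tags
        sql = f"DELETE FROM {table} WHERE " + " AND ".join(
            f"{c} = ?" for c in where_cols)
        if sql not in statements:
            statements.append(sql)
    return statements

doc = ("<ROWSET>"
       "<ROW num='1'><EMPNO>7369</EMPNO><SAL>1800</SAL><DEPTNO>30</DEPTNO></ROW>"
       "<ROW><EMPNO>2290</EMPNO><SAL>2000</SAL><HIREDATE>12/31/1992</HIREDATE></ROW>"
       "</ROWSET>")
per_row = delete_statements("scott.emp", doc)
keyed = delete_statements("scott.emp", doc, key_columns=["EMPNO"])
print(per_row)
print(keyed)
```

Without key columns the two ROW elements need two different statements; with EMPNO as the key a single statement text serves both rows.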

XSU Deleting XML Example 12: Reusing the Context Handle (PL/SQL)

In all the three cases described earlier (insert, update, and delete), the same context handle can be used to do more than one operation. That is, you can perform more than one insert using the same context, provided all of those inserts are going to the same table that was specified when creating the save context. The context can also be used to mix updates, deletes, and inserts.

For example, the following code shows how one can use the same context and settings to insert, delete, or update values depending on the user's input. The example uses a PL/SQL package static variable to store the context so that the same context can be used for all the function calls.

create or replace package testDML AS
  saveCtx DBMS_XMLSave.ctxType := null; -- a single static variable
  -- INSERT, UPDATE and DELETE are reserved words, so the procedures carry an XML suffix
  procedure insertXML(xmlDoc in clob);
  procedure updateXML(xmlDoc in clob);
  procedure deleteXML(xmlDoc in clob);
end;
/

create or replace package body testDML AS
  rows number; -- rows affected by the last operation
  procedure insertXML(xmlDoc in clob) is
  begin
    rows := DBMS_XMLSave.insertXML(saveCtx, xmlDoc);
  end;
  procedure updateXML(xmlDoc in clob) is
  begin
    rows := DBMS_XMLSave.updateXML(saveCtx, xmlDoc);
  end;
  procedure deleteXML(xmlDoc in clob) is
  begin
    rows := DBMS_XMLSave.deleteXML(saveCtx, xmlDoc);
  end;
begin
  saveCtx := DBMS_XMLSave.newContext('scott.emp'); -- create the context once
  DBMS_XMLSave.setKeyColumn(saveCtx, 'EMPNO'); -- set the key column name
end;
/

In the earlier package, you create a context once for the whole package (and thus the session) and then reuse the same context for performing inserts, updates, and deletes.

Users of this package can now call any of the three routines to update the emp table:

testDML.insertXML(xmlclob);
testDML.deleteXML(xmlclob);
testDML.updateXML(xmlclob);

All of these calls would use the same context. This would improve the performance of these operations, particularly if these operations are performed frequently.

XSU Exception Handling in PL/SQL

Here is an XSU PL/SQL exception handling example:

declare
  queryCtx DBMS_XMLQuery.ctxType;
  result clob;
  errorNum NUMBER;
  errorMsg VARCHAR2(200);
begin
  queryCtx := DBMS_XMLQuery.newContext('select * from emp where df = dfdf');
  -- set the raise exception to true
  DBMS_XMLQuery.setRaiseException(queryCtx, true);
  DBMS_XMLQuery.setRaiseNoRowsException(queryCtx, true);
  -- set propagate original exception to true to get the original exception
  DBMS_XMLQuery.propagateOriginalException(queryCtx, true);
  result := DBMS_XMLQuery.getXML(queryCtx);
exception
  when others then
    -- get the original exception
    DBMS_XMLQuery.getExceptionContent(queryCtx, errorNum, errorMsg);
    dbms_output.put_line(' Exception caught ' || TO_CHAR(errorNum) || errorMsg);
end;
/

6.5 DBMS_XMLQuery Package

Description of DBMS_XMLQuery

This API provides DB_to_XML type functionality.


Types of DBMS_XMLQuery

ctxType : The type of the query context handle. This is the return type of newContext().

Constants of DBMS_XMLQuery

Constant Description

DB_ENCODING Used to signal that the DB character encoding is to be used.

DEFAULT_ROWSETTAG The tag name for the element enclosing the XML generated from the result set (that is, for most cases the root node tag name) -- ROWSET

DEFAULT_ERRORTAG The default tag to enclose raised errors -- ERROR

DEFAULT_ROWIDATTR The default name for the cardinality attribute of XML elements corresponding to db. records -- NUM

DEFAULT_ROWTAG The default tag name for the element corresponding to db. records -- ROW

DEFAULT_DATE_FORMAT Default date mask -- 'MM/dd/yyyy HH:mm:ss'

ALL_ROWS Indicates that all rows are needed in the output.

NONE Specifies that the output should not contain any XML metadata (for example, no DTD or Schema).

DTD Specifies that the generation of the DTD is desired.

SCHEMA Specifies that the generation of the XML Schema is desired.

LOWER_CASE Use lower case tag names.

UPPER_CASE Use upper case tag names.

Functions and Procedures of DBMS_XMLQuery

Summary of Functions and Procedures of DBMS_XMLQuery

Functions/Procedures Description

newContext() Creates a query context and it returns the context handle.

closeContext() Closes/deallocates a particular query context.

setRowsetTag() Sets the tag to be used to enclose the XML dataset.

setRowTag() Sets the tag to be used to enclose the XML element corresponding to a db.

setErrorTag() Sets the tag to be used to enclose the XML error docs.

setRowIdAttrName() Sets the name of the id attribute of the row enclosing tag.

setCollIdAttrName() Sets the name of the id attribute of the collection

useTypeForCollElemTag() Tells the XSU to use the collection element's type name as the collection element tag name.

setTagCase() Specifies the case of the generated XML tags.

setDateFormat() Sets the format of the generated dates in the XML doc.

setMaxRows() Sets the max number of rows to be converted to XML.

setSkipRows() Sets the number of rows to skip.

setStylesheetHeader() Sets the stylesheet header.

setXSLT() Registers a stylesheet to be applied to generated XML.


setXSLTParam() Sets the value of a top-level stylesheet parameter.

removeXSLTParam() Removes a particular top-level stylesheet parameter.

setBindValue() Sets a value for a particular bind name.

setMetaHeader() Sets the XML meta header.

setDataHeader() Sets the XML data header.

setEncodingTag() Sets the encoding processing instruction in the XML document.

setRaiseException() Tells the XSU to throw the raised exceptions.

setRaiseNoRowsException() Tells the XSU to throw or not to throw an OracleXMLNoRowsException in the case when, for one reason or another, the XML doc generated is empty.

setSQLToXMLNameEscaping() Turns on or off escaping of XML tags in the case that the SQL object name, which is mapped to a XML identifier, is not a valid XML identifier.

propagateOriginalException() Tells the XSU that if an exception is raised, and is being thrown, the XSU should throw the very exception raised, rather than wrapping it with an OracleXMLSQLException.

getExceptionContent() Returns the thrown exception's error code and error message.

getDTD() Generates the DTD.

getNumRowsProcessed() Returns the number of rows processed for the query.

getVersion() Prints the version of the XSU in use.

getXML() Generates the XML document.

6.6 DBMS_XMLSave Package

Description of DBMS_XMLSave


This API provides XML_to_DB type functionality.

Types of DBMS_XMLSave

Type Description

ctxType The type of the save context handle. This is the return type of newContext().

Constants of DBMS_XMLSave

Constant Description

DEFAULT_ROWTAG The default tag name for the element corresponding to db. records -- ROW

DEFAULT_DATE_FORMAT Default date mask -- 'MM/dd/yyyy HH:mm:ss'

MATCH_CASE Used to specify that when mapping XML elements to DB. entities the XSU should be case sensitive.

IGNORE_CASE Used to specify that when mapping XML elements to DB. entities the XSU should be case insensitive.

Functions and Procedures of DBMS_XMLSave

Summary of Functions and Procedures of DBMS_XMLSave

Functions/Procedures Description

newContext() Creates a save context, and returns the context handle.

closeContext() Closes/deallocates a particular save context.

setRowTag() Names the tag used in the XML doc., to enclose the XML elements corresponding to db.

setIgnoreCase() The XSU does mapping of XML elements to db.

setDateFormat() Describes to the XSU the format of the dates in the XML document.

setBatchSize() Changes the batch size used during DML operations.

setCommitBatch() Sets the commit batch size.

setSQLToXMLNameEscaping() Turns on or off escaping of XML tags in the case that the SQL object name, which is mapped to a XML identifier, is not a valid XML identifier.

setUpdateColumn() Adds a column to the "update column list".

clearUpdateColumnList() Clears the update column list.

setPreserveWhitespace() Tells the XSU whether to preserve whitespace or not.

setKeyColumn() Adds a column to the "key column list".

clearKeyColumnList() Clears the key column list.

setXSLT() Registers an XSL transform to be applied to the XML to be saved.

setXSLTParam() Sets the value of a top-level stylesheet parameter.

removeXSLTParam() Removes the value of a top-level stylesheet parameter.

insertXML() Inserts the XML document into the table specified at the context creation time.

updateXML() Updates the table given the XML document.

deleteXML() Deletes records specified by data from the XML document, from the table specified at the context creation time.

propagateOriginalException() Tells the XSU that if an exception is raised, and is being thrown, the XSU should throw the very exception raised, rather than wrapping it with an OracleXMLSQLException.

getExceptionContent() Returns the thrown exception's error code and error message through its arguments.

CHAPTER # 7

Case Studies

7.1 CASE STUDY 1: PurchaseOrder

PurchaseOrder is a W3C standard XML Schema taken from the web. The schema is shown in figure 1-a.

The root (PurchaseOrder) of the XML Schema has been taken as the root of the CCM, and thus only the forward cardinalities have been taken. The reverse cardinalities could be resolved by querying the data. The <choice> present in the XML schema has been carried forward to the CCM, generating two different notations: disjoint and is-a. The corresponding CCM is shown in figure 3-a.

The CCM has then been converted into an attribute tree, preserving its object-oriented features, as shown in figure 3-b. The relationships have been resolved using pruning and grafting of nodes, as shown in figure 3-c.

The attribute tree has then been converted to a star schema. The facts chosen for the fact table are as follows:

1. number of orders
2. number of items
3. amount

NOTE: The facts chosen should be measurable in nature so that they can play a role in decision making.

The corresponding star schema is shown in the figure 7-a below:

Figure 7-a. Star schema of PurchaseOrder

Fact table PurchaseOrder: No of orders, No of items, Amount, Orderdateid, Totalamountid, Companyid, Individualcustid, differentcid, Samecid, Itemid

Dimension Order date: orderdateid, date

Dimension Total amount: Totalamountid, amount

Dimension Same country: Samecid, Countryname, localtaxes

Dimension Different country: differentcid, Countryname, Customduty, distributionoffice

Dimension Company: companyid, Companyname, Profile, Turn over

Dimension Individual Customer: Individualcustid, name

Dimension Item: Itemid, Productname, USprice, Quantity
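To make the role of the facts and dimensions concrete, the following Python sketch builds a cut-down version of this star schema in SQLite and aggregates the facts by company. It is an illustration only: the table names f_purchaseorder, d_company and d_item, the column types, and all sample rows are assumptions, and only two of the seven dimensions are included.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table d_company (companyid integer primary key, companyname text);
create table d_item (itemid integer primary key, productname text,
                     usprice real, quantity integer);
-- The fact table holds the measures plus one foreign key per dimension.
create table f_purchaseorder (
    no_of_orders integer, no_of_items integer, amount real,
    companyid integer references d_company(companyid),
    itemid integer references d_item(itemid));
insert into d_company values (1, 'Acme'), (2, 'Globex');
insert into d_item values (10, 'Lawnmower', 148.95, 1),
                          (11, 'Baby Monitor', 39.98, 1);
insert into f_purchaseorder values (1, 1, 148.95, 1, 10),
                                   (1, 1, 39.98, 1, 11),
                                   (1, 1, 39.98, 2, 11);
""")

# A typical decision-support query: roll the measures up along one dimension.
totals = conn.execute("""
    select c.companyname, sum(f.no_of_items), round(sum(f.amount), 2)
    from f_purchaseorder f join d_company c on c.companyid = f.companyid
    group by c.companyname order by c.companyname
""").fetchall()
for name, items, amount in totals:
    print(name, items, amount)
```

The query touches only the fact table and one dimension, which is the access pattern a star schema is meant to keep simple.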

PL/SQL CODE:

The PL/SQL code written here automatically extracts, transforms, and loads data from the XML data files into our star schema.

-- PL/SQL Procedure for Purchase Order Schema for data transformation

declare
  qCtx DBMS_XMLQuery.ctxType;
  result CLOB;
begin
  -- order date dimension
  qCtx := DBMS_XMLQuery.newContext('select * from orderdate');
  result := DBMS_XMLQuery.getXML(qCtx);
  insProc(result,'PROJECT.D_ORDERDATE');
  DBMS_XMLQuery.closeContext(qCtx);

  -- total amount dimension
  qCtx := DBMS_XMLQuery.newContext('select * from totalamount');
  result := DBMS_XMLQuery.getXML(qCtx);
  insProc(result,'PROJECT.D_TOTALAMOUNT');
  DBMS_XMLQuery.closeContext(qCtx);

  -- item dimension
  qCtx := DBMS_XMLQuery.newContext('select * from item');
  result := DBMS_XMLQuery.getXML(qCtx);
  insProc(result,'PROJECT.D_ITEM');
  DBMS_XMLQuery.closeContext(qCtx);

  -- company dimension
  qCtx := DBMS_XMLQuery.newContext('select * from company');
  result := DBMS_XMLQuery.getXML(qCtx);
  insProc(result,'PROJECT.D_COMPANY');
  DBMS_XMLQuery.closeContext(qCtx);

  -- individual customer dimension
  qCtx := DBMS_XMLQuery.newContext('select * from individualCustomer');
  result := DBMS_XMLQuery.getXML(qCtx);
  insProc(result,'PROJECT.D_INDIVIDUALCUSTOMER');
  DBMS_XMLQuery.closeContext(qCtx);

  -- different country dimension (country name comes from shipto)
  qCtx := DBMS_XMLQuery.newContext('select differentcountry.differentcid as differentcid, customduty, distributionoffice, shipto.countryname
    from differentcountry, shipto
    where differentcountry.differentcid=shipto.differentcid');
  result := DBMS_XMLQuery.getXML(qCtx);
  insProc(result,'PROJECT.D_DIFFERENTCOUNTRY');
  DBMS_XMLQuery.closeContext(qCtx);

  -- same country dimension (country name comes from shipto)
  qCtx := DBMS_XMLQuery.newContext('select samecountry.samecid as samecid, localtaxes, shipto.countryname
    from samecountry, shipto
    where samecountry.samecid=shipto.samecid');
  result := DBMS_XMLQuery.getXML(qCtx);
  insProc(result,'PROJECT.D_SAMECOUNTRY');
  DBMS_XMLQuery.closeContext(qCtx);

  -- fact table: join the purchase order with its client type and ship-to records
  qCtx := DBMS_XMLQuery.newContext('select orderdateid, totalamountid, purchaseorder.clienttypeid as clienttypeid, itemid, purchaseorder.shiptoid as shiptoid,
    clienttype.companyid as companyid, clienttype.individualcustid as individualcustid,
    shipto.differentcid as differentcid, shipto.samecid as samecid
    from purchaseorder, clienttype, shipto
    where purchaseorder.clienttypeid=clienttype.clienttypeid and purchaseorder.shiptoid=shipto.shiptoid');
  result := DBMS_XMLQuery.getXML(qCtx);
  insProc(result,'PROJECT.D_PURCHASEORDER');
  DBMS_XMLQuery.closeContext(qCtx);
end;
/

-- Procedure to insert data into the destination table from XML data stored in the form of a CLOB

create or replace procedure insProc(xmlDoc IN CLOB, tableName IN VARCHAR2) is
  insCtx DBMS_XMLSave.ctxType; -- variable to store the context handle
  rows number; -- variable that stores the number of rows inserted
begin
  insCtx := DBMS_XMLSave.newContext(tableName); -- get the context handle for the destination table
  rows := DBMS_XMLSave.insertXML(insCtx,xmlDoc); -- insert the data from the CLOB into the destination table
  DBMS_XMLSave.closeContext(insCtx); -- close the context of the table
end;
/

7.2 CASE STUDY 2: RoadEvent

RoadEvent is a W3C standard XML Schema taken from the web. The schema is shown in figure 7-b.

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xsd:schema targetNamespace="http://www.govtalk.gov.uk/LocalGovernment/RoadEvent" xmlns:xsd="http://www.w3.org/2001/XMLSchema" version="1.0" id="RoadEvent">

<!-- | | Schema For Road Works / Events version 1.0

|-->

<xsd:annotation><xsd:documentation>Schema developed by Manchester City Council to enable

Road Works / Events information to be distributed.</xsd:documentation>

</xsd:annotation>

<xsd:annotation>

<xsd:appinfo xmlns:gms="http://www.govtalk.gov.uk/CM/gms">

<Metadata xmlns="http://www.govtalk.gov.uk/CM/gms-xs">

<Audience>e-service developers</Audience>

<Contributor>Deepak Sharma, Manchester City Council (mailto:[email protected])</Contributor>

<Contributor>Deepak Sharma, (mailto:[email protected])</Contributor>

<Creator>Manchester City Council, Environment and Operations</Creator>

<Date><Created>2004-01-11</Created></Date>

<Date><Modified>2004-02-12</Modified></Date>

<Date><Modified>2004-04-28</Modified></Date>

<Description>Schema For Road Works / Events </Description>

<Format><MediaType>text/xml</MediaType>

<Syntax>http://www.w3.org/2001/XMLSchema</Syntax>

<Description>XML schema, W3C Recommendation 2001</Description>

</Format>

<Identifier>RoadEventMessage</Identifier><Language>[ISO 639-2/B] ENG</Language><Publisher>Manchester City Council, Pink Bank Lane, Manchester, M12 5QN,UK</Publisher>

<Rights>Unclassified

<Copyright>Crown Copyright 2004</Copyright>

</Rights><Subject> <Category>Transport, People</Category></Subject><Subject>

<Project>UK GovTalk</Project></Subject>

<Title>Road Events Works message</Title>

<Type>message</Type>

</Metadata>

</xsd:appinfo>

</xsd:annotation>

<xsd:import namespace="http://www.govtalk.gov.uk/people/bs7666" schemaLocation="BS7666-v1-3.xsd"/>

<xsd:import namespace="http://www.govtalk.gov.uk/people/AddressAndPersonalDetails" schemaLocation="PersonalDetailsTypes-v1-3.xsd"/><xsd:import namespace="http://www.govtalk.gov.uk/core" schemaLocation="CommonSimpleTypes-v1-3.xsd"/>



<xsd:complexType name="RoadEventStructure"><xsd:annotation><xsd:documentation>This element contains the details of a single event.</xsd:documentation>

</xsd:annotation>

<xsd:sequence>
<xsd:element name="Start_Time" type="EventTimeStructure"/>
<xsd:element name="End_Time" type="EventTimeStructure"/>
<xsd:element name="Publisher" type="PublisherStructure"/>
<xsd:element name="Promoter" type="PromoterStructure"/>
<xsd:element name="Location" type="LocationStructure" minOccurs="0" maxOccurs="999"/>
<xsd:element name="Restriction" type="RestrictionStructure"/>

</xsd:sequence>

<xsd:attribute name="Status" type="EventStatusType" use="required"/>

<xsd:attribute name="GenerationTimeStamp" type="xsd:dateTime" use="required"/>

<xsd:attribute name="UniqueReference" type="core:PopulatedStringType"/>

</xsd:complexType>



<xsd:complexType name="PublisherStructure">

<xsd:annotation>

<xsd:documentation>This element is used to describe the source system that published the event</xsd:documentation>

</xsd:annotation>

<xsd:sequence>

<xsd:element name="Org_Name" type="OrganisationNameType"/>
<xsd:element name="Org_Section_Name" type="OrganisationNameType"/>
<xsd:element name="System_Name" type="xsd:string"/>
<xsd:element name="Contact" type="ContactStructure" minOccurs="0" maxOccurs="20"/>

</xsd:sequence>

</xsd:complexType>



<xsd:complexType name="PromoterStructure">

<xsd:annotation>

<xsd:documentation>This element is used to detail the organisation that booked or reported the event</xsd:documentation>

</xsd:annotation>

<xsd:sequence>

<xsd:element name="Org_Name" type="OrganisationNameType"/> <xsd:element name="OrganisationId"

type="OrganisationNameType"/>

<xsd:element name="Personal_Contact" type="PersonalContactStructure" minOccurs="0" maxOccurs="20"/>

</xsd:sequence>

</xsd:complexType>



<xsd:complexType name="PersonalContactStructure">

<xsd:annotation>

<xsd:documentation>This element is used to describe an individual person</xsd:documentation>

</xsd:annotation>

<xsd:sequence>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="contactdetail" type="xsd:string"/>
<xsd:element name="Address" type="xsd:string"/>
</xsd:sequence>

</xsd:complexType>



<xsd:complexType name="WGS8hStructure">

<xsd:annotation>

<xsd:documentation>WGS8h element is used to describe the position or location of the event using World Geodetic System. The Expansion radius should be set in metres</xsd:documentation>

</xsd:annotation>

<xsd:sequence>

<xsd:element name="Longitude" type="xsd:positivenumber"/><xsd:element name="Latitude" type="xsd:positivenumber"/><xsd:element name="Expansion_Radius" type="xsd:positivenumber"

minOccurs="0" maxOccurs="20"/>

</xsd:sequence></xsd:complexType>

<xsd:complexType name="LocationStructure">

<xsd:annotation>

<xsd:documentation>Location element is used to describe the position or location of the event</xsd:documentation> </xsd:annotation>

<xsd:choice>
<xsd:element name="Loc_Desc" type="xsd:string"/>
<xsd:element name="Location_WGS8h" type="xsd:string"/>
</xsd:choice>

</xsd:complexType>



<xsd:complexType name="EventTimeStructure">

<xsd:annotation><xsd:documentation>This date / time element has a mandatory attribute to

indicate if the time is estimated or not</xsd:documentation></xsd:annotation>

<xsd:sequence><xsd:element name="Date" type="xsd:date"/><xsd:element name="Time" type="xsd:time"/>

</xsd:sequence><xsd:attribute name="Estimate" type="xsd:boolean " use="required"/>

</xsd:complexType>

<xsd:complexType name="RestrictionStructure">

<xsd:annotation><xsd:documentation>This element describes information about any road

restriction</xsd:documentation></xsd:annotation> <xsd:sequence><xsd:element name="Description" type="xsd:string"/>

</xsd:sequence><xsd:attribute name="Traffic_Management_Code" type="TrafficManagementType" use="required"/><xsd:attribute name="Type" type="EventType" use="required"/>

</xsd:complexType>

<xsd:simpleType name="EventType">

<xsd:annotation>

<xsd:documentation>This type is used to code the type of road event</xsd:documentation>

</xsd:annotation><xsd:restriction base="xsd:NMTOKEN">

<xsd:enumeration value="RoadWork"/>

<xsd:enumeration value="RoadTrafficAccident"/><xsd:enumeration value="CarriagewayHazard"/><xsd:enumeration value="Flooding"/><xsd:enumeration value="PublicEvent"/>

<xsd:enumeration value="TrafficLightFaults"/><xsd:enumeration value="TemporaryTrafficOrder"/>

<xsd:enumeration value="WinterMaintenance"/><xsd:enumeration value="HighwayRoadResurfacing"/><xsd:enumeration value="HighwayRoutineMaintenance"/>

<xsd:enumeration value="HighwayMaintenance"/><xsd:enumeration value="HighwayRoadImprovements"/>

<xsd:enumeration value="HighwayTrafficCalming"/><xsd:enumeration value="HighwaySafetyIssues"/>

<xsd:enumeration value="UtilityMaintenance"/><xsd:enumeration value="UtilityServiceProvision"/><xsd:enumeration value="UtilityRepairsToServices"/><xsd:enumeration value="UtilityNewBuild"/>

</xsd:restriction></xsd:simpleType>

<xsd:simpleType name="TrafficManagementType">

<xsd:annotation><xsd:documentation>This type is used to code the type of traffic management used.</xsd:documentation></xsd:annotation>

<xsd:restriction base="xsd:NMTOKEN">

<xsd:enumeration value="None"/><xsd:enumeration value="SigningOnly"/><xsd:enumeration value="TrafficControlStopGoBoards"/><xsd:enumeration value="TrafficControlTwoWaySignals"/><xsd:enumeration value="TrafficControlMultiWaySignals"/>

<xsd:enumeration value="TrafficControlGiveAndTake"/><xsd:enumeration value="TrafficControlPriorityWorking"/><xsd:enumeration value="TrafficControlConvoyWorking"/><xsd:enumeration value="LaneClosure"/><xsd:enumeration value="ContraFlow"/><xsd:enumeration value="RoadClosure"/><xsd:enumeration value="FootwayClosure"/><xsd:enumeration value="AgreedScheme"/>

</xsd:restriction></xsd:simpleType>

<xsd:annotation>

<xsd:documentation>This type is used to code the current status of the Event </xsd:documentation>

</xsd:annotation>

<xsd:simpleType name="EventStatusType"><xsd:restriction base="xsd:NMTOKEN">

<xsd:enumeration value="Active"/><xsd:enumeration value="InActive"/>

<xsd:enumeration value="Cancelled"/>

</xsd:restriction>

</xsd:simpleType>



<xsd:complexType name="ContactStructure"><xsd:annotation>

<xsd:documentation>This element is used to describe an individual person</xsd:documentation>

</xsd:annotation><xsd:sequence>

<xsd:element name="Name" type="xsd:string"/><xsd:element name="ContactDetails"

type="apd:CitizenContactDetailsStructure"/>

<xsd:choice>

<xsd:element name="PrimaryContact" type="xsd:string"/>
<xsd:element name="SecondaryContact" type="xsd:string"/>

</xsd:choice></xsd:sequence>

</xsd:complexType>

<xsd:simpleType name="OrganisationNameType">

<xsd:restriction base="xsd:string">

<xsd:minLength value="1"/> <xsd:maxLength value="255"/>

</xsd:restriction>

</xsd:simpleType></xsd:schema>

The root (RoadEvent) of the XML Schema has been taken as the root of the CCM, and thus only the forward cardinalities have been taken. The reverse cardinalities could be resolved by querying the data. The <choice> present in the XML schema has been carried forward to the CCM, generating two different notations: disjoint and is-a. The corresponding CCM is shown in figure 7-c.
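Locating the <choice> groups is the step that can be automated most directly. The Python sketch below is a minimal illustration, assuming a well-formed schema and a hypothetical helper choice_groups; it only lists the alternatives of each choice per complex type and does not build the CCM itself.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

def choice_groups(schema_text):
    """Map each complexType name to the lists of alternatives in its choices."""
    groups = {}
    for ctype in ET.fromstring(schema_text).iter(XSD + "complexType"):
        for choice in ctype.iter(XSD + "choice"):
            names = [el.get("name") for el in choice.findall(XSD + "element")]
            groups.setdefault(ctype.get("name"), []).append(names)
    return groups

# A two-type excerpt modelled on the RoadEvent schema.
schema = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:complexType name="LocationStructure">
    <xsd:choice>
      <xsd:element name="Loc_Desc" type="xsd:string"/>
      <xsd:element name="Location_WGS8h" type="xsd:string"/>
    </xsd:choice>
  </xsd:complexType>
  <xsd:complexType name="EventTimeStructure">
    <xsd:sequence>
      <xsd:element name="Date" type="xsd:date"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:schema>"""
found = choice_groups(schema)
print(found)
```

Each list of alternatives found this way is a candidate for a disjoint association; types without a choice, such as EventTimeStructure here, do not appear in the result.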

The CCM has then been converted into an attribute tree, preserving its object-oriented features. The relationships have been resolved using pruning and grafting of nodes, as shown in figure 7-d.

The attribute tree has then been converted to a star schema. The facts chosen for the fact table are as follows:

1. NO_OF_EVENTS
2. NO_OF_PROMOTER
3. NO_OF_PUBLISHER

CONCLUSION

In this project we have proposed a semi-automated approach for arriving at a star schema from an XML schema using CCM. We have considered disjoint association and inheritance in the XML schema. The approach adopted by us converts an XML schema to a CCM schema, which is further converted to an attribute tree. We have identified different associations in the XML schema and carried them over to the attribute tree, so the attribute tree retains all these features. This tree finally leads to a data warehouse design.

Scope of improvement:



Future work can be done on the project to enhance its quality and properties:

1. Object-oriented: More object-oriented features can be added, such as generalization, specialization, derivability and polymorphism.

2. Reverse cardinality: A mechanism can be added to find the reverse cardinalities in the CCM.

3. Galaxy: The concept of a galaxy can be incorporated into the star schema.

BIBLIOGRAPHY

[1] W3C. “Extensible Markup Language (XML) 1.0”. http://www.w3c.org/TR/REC-XML

[2] W. A. Giovinazzo. “Object-Oriented Data Warehouse Design”. Prentice Hall, 2000.

[3] M. Golfarelli, D. Maio and S. Rizzi. “Conceptual Design of Data Warehouses from E/R Schemes”. Proc. HICSS-31, Vol. VII, Kona, Hawaii, pp. 334-343, 1998.

[4] M. Golfarelli, D. Maio and S. Rizzi. “The Dimensional Fact Model: A Conceptual Model for Data Warehouses”. Int. Journal of Cooperative Information Systems 7, 2 & 3, 1998.

[5] S. Abiteboul, P. Buneman and D. Suciu. “Data on the Web”. Morgan Kaufmann, California, 2000.

[6] B. Nguyen, S. Abiteboul, G. Cobena and M. Preda. “Monitoring XML Data on the Web”. In ACM SIGMOD, 2001.

[7] M. Golfarelli, S. Rizzi and B. Vrdoljak. “Data Warehouse Design from XML Sources”. Proc. DOLAP’01, Atlanta, pp. 40-47, 2001.

[8] M. Golfarelli, S. Rizzi and B. Vrdoljak. “Integrating XML Sources into a Data Warehouse Environment”.

[9] B. Vrdoljak, M. Banek and S. Rizzi. “Designing Web Warehouses from XML Schemas”.

[10] W. H. Inmon. “Building the Data Warehouse”. John Wiley & Sons, Second Edition, 1996.

[11] R. D. S. Mello and C. A. Heuser. “A Bottom-up Approach for Integration of XML Sources”.

[12] Il-Yeol Song, W. Rowen, C. Medsker and E. Ewen. “An Analysis of Many-to-Many Relationships Between Fact and Dimension Tables in Dimensional Modeling”. In Proceedings of the 3rd International Workshop on Design and Management of Data Warehouses (DMDW’2001), CEUR-WS (http://www.ceur-ws.org), 2001.

[13] R. Kimball. “The Data Warehouse Toolkit”. John Wiley & Sons, 1996.

[14] M. Golfarelli and S. Rizzi. “A Methodological Framework for Data Warehouse Design”. In Proceedings of the 1st International Workshop on Data Warehousing and OLAP (DOLAP 98), pages 3-9. ACM, 1998.




[Figure labels: Location, Time, Promoter, Date1, Location_wgs8h, Lattitude, Expansion_radius, Estimate, Org_id, Org_Name]