systematic approach for information systems reengineering shi-ming huang
Post on 21-Dec-2015
Systematic Approach for Information Systems Reengineering
Shi-Ming Huang
References
T. Cheung, J. Fong, and B. Siu, “Database Reengineering and Interoperability”, Plenum, 1995, ISBN 0-306-45288-X
R.S. Arnold, “Software Reengineering”, IEEE Press 1993, ISBN 0-8186-3272-0
Fong and S. Huang, “Information Systems Reengineering”, Springer Verlag, 1997, ISBN 981-3083-15-8
Database Reengineering
[Figure: classification of database reengineering techniques]
Schema Translation: Direct Translation, Indirect Translation
Data Conversion: Physical Conversion, Logical Conversion, Bridge Program
Program Conversion: Rewrite, Bridge Program, Emulation, Decompilation, Co-existence
Database Reengineering: Schema Translation
Direct translation – A nonrelational schema can be translated directly into a relational schema. However, such a translation may lose information, because this primitive method cannot recover all the semantics of the original nonrelational schema. Certain advanced semantics are lost once they are mapped from a conceptual schema, such as an entity-relationship model, to a logical schema, such as a hierarchical or network schema. User input is therefore needed to recover the lost semantics.
Database Reengineering: Schema Translation
Indirect translation – An indirect translation maps a logical hierarchical or network schema into a conceptual entity-relationship model schema by reverse engineering. The translated conceptual schema must carry all the semantics of the original logical schema, so users must supply the advanced semantics that the logical schema does not record. The conceptual schema can then be mapped automatically to a logical relational schema. Similarly, to translate a relational schema to an object-oriented schema, we can map the entity-relationship model (a conceptual model for the relational schema) to OMT (a conceptual model for the object-oriented model) in a peer-to-peer translation. The OMT model can then be mapped automatically to an object-oriented model (database).
Database Reengineering: Data Conversion
Physical conversion – The physical data of the nonrelational database is converted directly to the physical data of the relational database. This can be done with an interpreter approach or a generator approach. The former translates each data item directly into another. The latter provides a generator that produces a program to carry out the physical data conversion.
Logical conversion – The logical approach unloads the nonrelational database to sequential files in a logical sequence similar to the relational model. The sequential files can then be reloaded into a target relational database. This approach is concerned with the logical sequence of the data rather than the physical attributes of each data item.
Bridge program – Each nonrelational file requires a bridge program to convert it to a relational file.
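The logical conversion described above can be sketched in a few lines. This is only an illustration under stated assumptions: the nested `accounts` data, the table names, and the helper names `unload` and `reload_relational` are hypothetical, CSV text stands in for the sequential files, and SQLite stands in for the target relational DBMS.

```python
import csv
import io
import sqlite3

# Hypothetical nonrelational data: each account record owns its bill records.
accounts = [
    {"acct": "A1", "name": "Lee",
     "bills": [{"billmo": "1996-07", "net_charge": 120.0}]},
    {"acct": "A2", "name": "Chan", "bills": []},
]

def unload(records):
    """Unload parent and child records into two sequential (CSV) files."""
    parent_file, child_file = io.StringIO(), io.StringIO()
    parents = csv.writer(parent_file, lineterminator="\n")
    children = csv.writer(child_file, lineterminator="\n")
    for rec in records:
        parents.writerow([rec["acct"], rec["name"]])
        for bill in rec["bills"]:
            # The parent key is carried into every child row.
            children.writerow([rec["acct"], bill["billmo"], bill["net_charge"]])
    return parent_file.getvalue(), child_file.getvalue()

def reload_relational(parent_text, child_text):
    """Reload the sequential files into a target relational database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (acct TEXT PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE bill (acct TEXT REFERENCES account(acct), "
        "billmo TEXT, net_charge REAL, PRIMARY KEY (acct, billmo))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)",
                     csv.reader(io.StringIO(parent_text)))
    conn.executemany("INSERT INTO bill VALUES (?, ?, ?)",
                     csv.reader(io.StringIO(child_text)))
    return conn

conn = reload_relational(*unload(accounts))
```

The sketch works on the logical sequence of records only; it makes no attempt to preserve the physical layout of the source data.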
Database Reengineering: Database Program Translation
Rewrite – One can translate the nonrelational schema into a relational schema, map the nonrelational database into a relational database, and rewrite all the application programs to run on the relational database.
Bridge program – One can map the nonrelational schema into a relational schema, and then add a relational interface software layer on top of the nonrelational DBMS. The relational interface layer translates relational DML statements into nonrelational DML statements that access the existing nonrelational database. To users the interface appears to be a relational DBMS, but the physical database is still nonrelational.
Database Reengineering: Database Program Translation
Emulation – This technique provides software or firmware in the target system that maps source program commands into functionally equivalent commands in the target system. Each nonrelational DML statement is substituted by relational DML statements that access the converted relational database.
Decompilation – One can translate the schema from nonrelational to relational, convert the data from nonrelational to relational, and then convert the application programs from nonrelational to relational by decompilation. Decompilation is the process of transforming a program written in a low-level language into an equivalent but more abstract version, followed by implementation of the new programs to meet the new environment, database file, and DBMS requirements.
Data Model
A data model is a general structure for data organization.
It enables us to capture, partially, the meaning of data as related to the complete meaning of the world.
It is the primary tool for designing a database.
The basic components of such a data model include:
1. a set of rules (i.e. schema description) to describe the structure and meaning of data in a database;
2. the atomic operations (i.e. data language) that may be performed on the data in that database.
Data Model
Schema is a term used to represent the name of a class together with the properties of that class.
The schema description includes two parts:
1. the structure specification, which represents objects, attributes, and the relationships between objects;
2. the rule specification for the inferences and constraints.
Data Model: Hierarchical Data Model
There is a set of relationships connecting all record types in one data structure diagram.
The relationships expressed in the data structure diagram form a tree with all edges pointing towards the leaves.
Each relationship is 1:n and it is total. That is, if Ri is the parent of Rj in the hierarchy, then for every record occurrence of Rj there is exactly one Ri record connected to it.
The linkage between record types is in automatic fixed set membership.
The database access path of a hierarchical database follows the hierarchical path from parent to child record. The default path is a hierarchical sequence of top-to-bottom, left-to-right and front-to-back.
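The top-to-bottom, left-to-right part of the hierarchical sequence corresponds to a preorder walk of the tree. The following is a minimal sketch; the `Segment` class and the nesting of the loan record types used as sample data are assumptions made for illustration, not taken from the slides.

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    """A record type (segment) in a hierarchical schema."""
    name: str
    children: list = field(default_factory=list)

def hierarchical_sequence(root):
    """Return record type names in top-to-bottom, left-to-right order."""
    order = [root.name]
    for child in root.children:  # children are stored left to right
        order.extend(hierarchical_sequence(child))
    return order

# Hypothetical nesting: the rate records are assumed to hang under LoanInterest.
schema = Segment("LoanContracts", [
    Segment("LoanDrawdown"),
    Segment("LoanInterest", [Segment("FixedRate"), Segment("IndexRate")]),
    Segment("LoanRepayment"),
    Segment("LoanBalance"),
])
sequence = hierarchical_sequence(schema)
```

The front-to-back dimension (multiple occurrences of the same record type under one parent) is left out; it would add one more loop over occurrences at each node.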
Data Model: Hierarchical Data Model
[Figure: hierarchical schema with record types LoanContracts, LoanDrawdown, LoanInterest, LoanRepayment, LoanBalance, FixedRate, and IndexRate]
Hierarchical Data Manipulation Language
Hierarchical data manipulation language (HDML) is a record-at-a-time language for manipulating hierarchical databases.
The commands of an HDML must be embedded in a general-purpose programming language, called the host language.
Hierarchical Data Manipulation Language
The following is the syntax of a hierarchical DML of IMS (Information Management System, a hierarchical DBMS). There are four parameters in IMS DML. They are:
Function Code, which defines the database access function;
Program Control Block, which defines the external subschema access path;
I-O-Area, which is a target segment address; and
Segment Search Argument, which defines the target segment selection criteria, as follows:
    CALL "CBLTDLI" USING FUNCTION-CODE PCB-MASK I-O-AREA SSA-1 … SSA-n.
Hierarchical Data Manipulation Language
Retrieval commands:
1. Get Unique (GU)
2. Get Next (GN)
3. Get Next Within Parent (GNP)
Modification commands:
1. Insert (ISRT)
2. Replace (REPL)
3. Delete (DLET)
Example:
    CALL "CBLTDLI" USING GU PCB-MASK I-O-AREA LOAN_CONTRACT# = 277988.
    BALANCE_DATE = 19960722.
    BALANCE_AMOUNT = 1000000.
    CALL "CBLTDLI" USING ISRT PCB-MASK LOAN_BALANCE.
NETWORK (CODASYL) MODEL
[Figure: network schema owned by SYSTEM, with record types Department (dept#, dept-name), Instructor (inst-name, inst-addr), Course (course#, course-location), Prerequisite (prerequisite#, prerequisite-title), Student (student#, s-name), Section (section#), and Grade (grade), connected by sets]
NETWORK (CODASYL) MODEL
Data Item – It is an occurrence of the smallest unit of named data. It is represented in the database by a value. A data item may be used to build other, more complicated data constructs. This corresponds to an attribute in the ER data model.
Data Aggregate – It is an occurrence of a named collection of data items within a record.
NETWORK (CODASYL) MODEL
Record – It is an occurrence of a named collection of data items or data aggregates. This collection conforms to the record type definition specified in the database schema.
Set – It is an occurrence of a named collection of records. A set occurrence conforms to the set type definition specified in the database schema. Each set type consists of one owner record type and at least one member record type.
Area – The notion of an area is used to identify the partition of record occurrences. An area is a named collection of records which need not preserve owner-member relationships. An area may contain occurrences of one or more record types, and a record type may have occurrences in more than one area.
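The record and set constructs above can be pictured with a small data structure. This is a sketch only; the class names `Record` and `SetOccurrence`, the set type name `HIRE`, and the sample values are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    """A record occurrence: a named collection of data items."""
    record_type: str
    items: dict  # data item name -> value

@dataclass
class SetOccurrence:
    """A set occurrence: one owner record and any number of member records."""
    set_type: str
    owner: Record
    members: list = field(default_factory=list)

    def connect(self, member):
        """Link a member record to this set's owner."""
        self.members.append(member)

dept = Record("Department", {"dept#": "D01", "dept-name": "Marketing"})
inst = Record("Instructor", {"inst-name": "John Doe", "inst-addr": "1 Main St, HK"})
hire = SetOccurrence("HIRE", owner=dept)
hire.connect(inst)
```

Navigation in a network database follows these owner-member links rather than comparing key values, which is why the translation steps later in the deck have to turn sets into foreign keys.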
Relational Model
A Publishing Company Relational Database Schema:
[Figure: relational schema diagram. Each relation is listed below as name (key attributes; other attributes), following the compartments of the figure. Relations shown in the figure without a separate key compartment are listed without a semicolon.]
titleauthor (au_id (FK), title_id (FK); au_ord, royaltyper)
titles (title_id; title, type, pub_id (FK), price, advance, royalty, ytd_sales, notes, pubdate)
authors (au_id; au_lname, au_fname, phone, address, city, state, zip, contract)
publishers (pub_id; pub_name, city, state, country)
pub_info (pub_id (FK); logo, pr_info)
employee (emp_id; fname, minit, lname, job_id (FK), job_lvl, pub_id (FK), hire_date)
jobs (job_id; job_desc, min_lvl, max_lvl)
stores (stor_id; stor_name, stor_address, city, state, zip)
sales (stor_id (FK), ord_num; ord_date, qty, payterms, title_id (FK))
discounts (discounttype, stor_id (FK), lowqty, highqty, discount)
roysched (title_id (FK), lorange, hirange, royalty)
Relationship lines in the figure are labeled with the linking keys: pub_id, title_id, job_id, stor_id, and au_id.
Legend: primary key; foreign key (B refers to A).
Relational Model
The relational model is a logical schema in the form of tables (relations), each corresponding to the representation of an entity type.
A column (attribute) of the table represents the extension of an attribute in the entity.
A row (tuple) of the table represents an instance of the entity.
Such a table is commonly called a record type and has a primary key, an attribute with non-null values that uniquely identifies a tuple.
The parent-child relationship between relations is represented by a foreign key in the child relation that references the primary key of the parent relation.
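As an illustration of primary and foreign keys, the sketch below builds two of the publishing relations in an in-memory SQLite database. The column subset, the sample rows, and the helper `insert_title` are assumptions for the example; SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.execute(
    "CREATE TABLE publishers (pub_id TEXT PRIMARY KEY NOT NULL, pub_name TEXT)"
)
conn.execute(
    "CREATE TABLE titles (title_id TEXT PRIMARY KEY NOT NULL, title TEXT, "
    "pub_id TEXT REFERENCES publishers(pub_id))"
)
conn.execute("INSERT INTO publishers VALUES ('P1', 'Example Press')")
conn.execute("INSERT INTO titles VALUES ('T1', 'Sample Title', 'P1')")

def insert_title(title_id, title, pub_id):
    """Try to insert a child tuple; report whether the key constraints allow it."""
    try:
        conn.execute("INSERT INTO titles VALUES (?, ?, ?)", (title_id, title, pub_id))
        return True
    except sqlite3.IntegrityError:
        return False
```

A title that references an existing publisher is accepted, while one with an unknown `pub_id` or a repeated `title_id` is rejected by the foreign key and primary key constraints respectively.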
OBJECT-ORIENTED Model
[Figure: Department (Dept#, Dept-name, hire) and Instructor (Inst-name, Inst-addr) represented as classes. Class Instructor: OID xxx, Inst-name John Doe, Inst-addr 1 Main St, HK, hired-by. Class Department: OID yyy, Dep# D01, Dept-name Marketing, hire. Labels: class-defining object (OID zzz); OIDs of Instructor.]
OBJECT-ORIENTED Model
An object is an instance value of a class. A collection of similar objects forms a class. A class has attributes and methods. The attributes of a class describe its properties. The methods of a class describe its operations.
A class must support encapsulation (i.e. hiding operations from the users) such that object = data + program, where data = values of attributes and program = methods that operate on the state.
Object attributes can be either simple or complex. The value of a complex attribute is a reference to an instance of another class. In other words, an object can be a nested object such that the value of an object is another object.
Object attributes can be single-valued or multi-valued.
Objects are uniquely identified by object identifiers (OIDs) that are assigned by the system.
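These ideas can be sketched with ordinary classes. The base class `DatabaseObject`, the counter used to hand out OIDs, and the method `hire_instructor` are illustrative assumptions; a real object-oriented DBMS assigns and manages OIDs itself.

```python
import itertools

_next_oid = itertools.count(1)  # stand-in for system-assigned identifiers

class DatabaseObject:
    def __init__(self):
        self.oid = next(_next_oid)  # assigned by the "system", not by the user

class Instructor(DatabaseObject):
    def __init__(self, inst_name, inst_addr):
        super().__init__()
        self.inst_name = inst_name  # simple attribute
        self.inst_addr = inst_addr  # simple attribute
        self.hired_by = None        # complex attribute: reference to a Department

class Department(DatabaseObject):
    def __init__(self, dept_no, dept_name):
        super().__init__()
        self.dept_no = dept_no
        self.dept_name = dept_name
        self.hire = []              # multi-valued complex attribute

    def hire_instructor(self, instructor):
        """Method (program part of the object) that updates both references."""
        self.hire.append(instructor)
        instructor.hired_by = self

marketing = Department("D01", "Marketing")
john = Instructor("John Doe", "1 Main St, HK")
marketing.hire_instructor(john)
```

The values of `hire` and `hired_by` are references to other objects rather than key values, which is the main structural difference from the foreign keys of the relational model.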
Direct translation from a Network Model to a Relational Model
Step 1 Derive relations
Map each network record type to a relation in a one-to-one manner.
Step 2 Derive relation keys
Map each record key of the network schema to a primary key in a relational table. However, if the existing network record key is not unique, it must be concatenated with its owner record key to be mapped as a primary key. The owner record key is also mapped as a foreign key in the relational table to link the parent and child records. If the set membership in the logical network schema is manual, then the record key of the member record is mapped as a candidate key in the relational table to link the parent and child records. For instance, Figure 3-1 is the network schema for a US President.
Direct translation from a Network Model to a Relational Model
[Figure 3-1: network schema owned by SYSTEM, with record types holding (Plname, Pfname, Party, Collg), (Eyear, Winvotes), (Adm#, Iny, Inm, Ind), (Cngr#, Hd, Hr, Sd, Sr), and (Sname, Cap, Yad), connected by sets]
Relations:
PRESIDENT (Plname, Pfname, Party, Collg, *Sname)
ADMINISTRATION (Adm#, Iny, Inm, Ind, *Plname, *Pfname)
STATE (Sname, Cap, Pln, Pfn, Adm#, Yad)
ELECTION (Eyear, Winvotes, *Plname, *Pfname)
LINK (*Plname, *Pfname, Cngr#)
CONGRESS (Cngr#, Hd, Hr, Sd, Sr)
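The two steps of the direct translation can be sketched as a small function. The input format (`fields`, `key`, `key_unique`), the function name `derive_relations`, and the assumption that PRESIDENT owns ELECTION through a set are all illustrative; the rule for manual set membership (candidate keys) is not modeled here.

```python
def derive_relations(record_types, set_types):
    """Map network record types and sets to relations with keys.

    record_types: name -> {"fields": [...], "key": [...], "key_unique": bool}
    set_types: list of (owner record type, member record type) pairs
    """
    # Step 1: one relation per record type.
    relations = {
        name: {
            "attributes": list(rec["fields"]),
            "primary_key": list(rec["key"]),
            "foreign_keys": {},
        }
        for name, rec in record_types.items()
    }
    # Step 2: owner keys become foreign keys in the member relation.
    for owner, member in set_types:
        owner_key = record_types[owner]["key"]
        rel = relations[member]
        for attr in owner_key:
            if attr not in rel["attributes"]:
                rel["attributes"].append(attr)
        rel["foreign_keys"][owner] = list(owner_key)
        # A non-unique member key is concatenated with the owner key.
        if not record_types[member]["key_unique"]:
            extra = [a for a in owner_key if a not in rel["primary_key"]]
            rel["primary_key"] = extra + rel["primary_key"]
    return relations

record_types = {
    "PRESIDENT": {"fields": ["Plname", "Pfname", "Party", "Collg"],
                  "key": ["Plname", "Pfname"], "key_unique": True},
    "ELECTION": {"fields": ["Eyear", "Winvotes"],
                 "key": ["Eyear"], "key_unique": True},
}
relations = derive_relations(record_types, [("PRESIDENT", "ELECTION")])
```

With this input the ELECTION relation gains Plname and Pfname as foreign key attributes, in line with the ELECTION relation listed above.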
Direct translation from a hierarchical model to a relational model
Step 1 Derive relations
Map each record type to a relation.
Step 2 Derive relation keys
The record key of a hierarchical schema is mapped as a primary key of a relation. However, if the record type of the hierarchical schema is a child record, then the primary key is derived by concatenating it with its parent record key. The parent record key is also mapped as a foreign key in the child relation (Quizon, 1990).
Direct translation from a hierarchical model to a relational model
[Figure: hierarchical schema with parent record GAA (acct#, name) and child records GAB (meter#) and GAC (billmo, net_charge)]
Mapped relational schema
Relations:
GAA (acct#, name)
GAB (acct#, meter#)
GAC (acct#, billmo, net_charge)
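The key derivation for child records can be written as a short recursive function. This is a sketch: the dictionary layout and the function name `hierarchy_to_relations` are hypothetical, and treating billmo as the record key of GAC is an assumption, since the figure does not mark the keys.

```python
def hierarchy_to_relations(record, parent_key=()):
    """Map a hierarchical record type and its descendants to relations.

    record: {"name": str, "key": [...], "fields": [...], "children": [...]}
    parent_key: the full primary key inherited from the parent record
    """
    full_key = tuple(parent_key) + tuple(record["key"])
    attributes = list(parent_key) + [
        f for f in record["fields"] if f not in parent_key
    ]
    relations = {
        record["name"]: {
            "attributes": attributes,
            "primary_key": list(full_key),   # parent key + own record key
            "foreign_key": list(parent_key),  # links the child to its parent
        }
    }
    for child in record.get("children", []):
        relations.update(hierarchy_to_relations(child, full_key))
    return relations

schema = {
    "name": "GAA", "key": ["acct#"], "fields": ["acct#", "name"],
    "children": [
        {"name": "GAB", "key": ["meter#"], "fields": ["meter#"]},
        {"name": "GAC", "key": ["billmo"], "fields": ["billmo", "net_charge"]},
    ],
}
relations = hierarchy_to_relations(schema)
```

Applied to the sample schema, the function yields the same three attribute lists as the mapped relational schema above.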
Indirect translation from a network model to a relational model
[Figure: hierarchical or network schema → (reverse engineering from logical model to conceptual model) → conceptual EER model → (forward engineering from conceptual model to logical model) → relational schema]
Reverse engineering from network schema to conceptual EER model
Step 1 Derive implied relationships: A duplicate key found in one record type implies a 1:n relationship; duplicate keys found in the record types on both sides of the relationship imply a 1:1 relationship. User input is sought to confirm that such a semantic exists.
Reverse engineering from network schema to conceptual EER model
Step 1 Derive implied relationships
[Figure: nonrelational record types with one duplicate key. CUSTOMER has Customer# (record key) and Loan# (duplicate key); LOAN has Loan# (record key). Implied relationship: Customer to Loan, N:1.]
[Figure: nonrelational record types with two duplicate keys. CUSTOMER has Customer# (record key) and Loan# (duplicate key); LOAN has Loan# (record key) and Customer# (duplicate key). Implied relationship: Customer to Loan, 1:1.]
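The duplicate-key test of Step 1 can be sketched as follows. The record representation and the function name `implied_relationship` are hypothetical, and the result is only a candidate: in the method itself the user still has to confirm it.

```python
def implied_relationship(a, b):
    """Guess the cardinality between record types a and b from duplicate keys.

    a, b: {"name": str, "key": str, "fields": set of field names}
    Returns the cardinality read as a:b, or None if no duplicate key is found.
    """
    a_holds_b_key = b["key"] in a["fields"]
    b_holds_a_key = a["key"] in b["fields"]
    if a_holds_b_key and b_holds_a_key:
        return "1:1"   # duplicate keys on both sides
    if a_holds_b_key:
        return "n:1"   # many a records may carry the same b key
    if b_holds_a_key:
        return "1:n"
    return None

customer = {"name": "CUSTOMER", "key": "Customer#",
            "fields": {"Customer#", "Loan#"}}
loan = {"name": "LOAN", "key": "Loan#", "fields": {"Loan#"}}
loan_with_customer = {"name": "LOAN", "key": "Loan#",
                      "fields": {"Loan#", "Customer#"}}
```

For the two figures above, `implied_relationship(customer, loan)` gives the N:1 case and `implied_relationship(customer, loan_with_customer)` gives the 1:1 case.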
Reverse engineering from network schema to conceptual EER model
Step 2 Derive multiple (alternative) relationships
In a network schema, a set of record types that forms a circuit (loop) may carry different navigational semantics. It is thus up to the user to confirm the original database designer's intent for each alternative path. If the user confirms the existence of a navigational semantic, then the record types and sets in the alternative paths are mapped to different network subschemas (one subschema for each path) before being translated to the relational schema.
Reverse engineering from network schema to conceptual EER model
Step 2 Derive multiple (alternative) relationships
[Figure: network schema owned by SYSTEM, with record types CITIES (city, city-headquarter), STORES (store, store-address), and ITEMS (item, qty) connected by sets that form alternative paths]
Reverse engineering from network schema to conceptual EER model
Step 3 Derive unary relationships.
[Figure: Map unary 1:n relationship from network to EER model. Network schema: record Employee and a dummy record connected by two sets. Corresponding EER model: entity Employee with a 1:n unary relationship "manages".]
Reverse engineering from network schema to conceptual EER model
Step 4 Derive binary relationships
[Figure: Map 1:n and m:n relationship from network to EER model. First network schema: RECORD DEPARTMENT and RECORD EMPLOYEE connected by a set (1:N); corresponding EER model: ENTITY DEPARTMENT HAS ENTITY EMPLOYEE (1:N). Second network schema: RECORD SUPPLIER and RECORD PARTS each connected by a set (1:N) to RECORD QTY; corresponding EER model: ENTITY SUPPLIER and ENTITY PARTS joined by relationship SUPPLY (N:N) with attribute QTY.]
Reverse engineering from network schema to conceptual EER model
Step 5 Derive entities of n-ary relationships
[Figure: Map n-ary relationship to EER model. Network schema: record Skill-used connected by sets to records Skill, Project, and Employee. Corresponding EER model: entities Project, Skill, and Employee joined by one relationship with cardinalities m and n (the relationship is labeled Text-book-used in the figure).]
Reverse engineering from network schema to conceptual EER model
Step 6 Derive aggregation, generalization and categorization
[Figure: Map set of relationships to aggregation in EER model. Network schema: RECORD CLASS, RECORD SECTION, RECORD LECTURER, and RECORD STUDENT connected by sets. Translated EER model: ENTITY CLASS, SECTION, and ENTITY LECTURER form an aggregate that is related to ENTITY STUDENT through ATTENDED BY.]
Map is-a relationships to overlap generalization
[Figure: Network schema: record Person connected by sets to records Employee, Alumnus, and Student. Corresponding EER model: Person, with attributes Employee-flag, Alumnus-flag, and Student-flag, is an overlap (o) generalization of Employee, Alumnus, and Student.]
Map is-a relationships to categorization in EER model
[Figure: Network schema: records Company, Person, and Bank connected by sets to record Owner. Corresponding EER model: Owner is a category (u) of Person, Company, and Bank.]
Reverse engineering from network schema to conceptual EER model
Step 7 Derive entity keys and other constraints.
[Figure: Map network schema with fully internally identified records to relational. Records Customer (Customer#), Loan (Loan#), and Collateral (Collateral#) connected by sets map to entities Customer, Loan, and Collateral.]
Record identifier (Customer#)
Record identifier (Loan#)
Record identifier (Collateral#)
Reverse engineering from network schema to conceptual EER model
Step 7 Derive entity keys and other constraints.
[Figure: Map network schema with partially internally identified records to relational. Records Customer (Customer#), Loan (Loan#), and Collateral (Collateral#) connected by sets map to entities Customer, Loan, and Collateral.]
Record identifier (Customer#)
Record identifier (Customer#, Loan#)
Record identifier (Customer#, Loan#, Collateral#)
Reverse engineering from network schema to conceptual EER model
Step 7 Derive entity keys and other constraints.
[Figure: Map network schema with internally unidentified records to relational. Records Customer (Customer#), Loan (Loan#), and Collateral (Collateral#) connected by sets map to entities Customer, Loan, and Collateral.]
Record identifier (Customer#)
Record identifier (Customer#, Loan#)
Record identifier (Customer#, Loan#, Sequence#)
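The three identifier cases of Step 7 can be summarized in one helper. The function name `derive_identifier` and the `kind` labels are hypothetical; the attribute name Sequence# follows the figure.

```python
def derive_identifier(own_key, owner_identifier, kind):
    """Derive a record identifier from its own key and its owner's identifier.

    kind: "fully"        - the record key alone identifies the record
          "partially"    - the record key is unique only within its owner
          "unidentified" - the record has no usable key of its own
    """
    if kind == "fully":
        return list(own_key)
    if kind == "partially":
        return list(owner_identifier) + list(own_key)
    if kind == "unidentified":
        # A sequence number within the owner replaces the missing key.
        return list(owner_identifier) + ["Sequence#"]
    raise ValueError(f"unknown identifier kind: {kind}")

customer_id = derive_identifier(["Customer#"], [], "fully")
loan_id = derive_identifier(["Loan#"], customer_id, "partially")
collateral_partial = derive_identifier(["Collateral#"], loan_id, "partially")
collateral_unidentified = derive_identifier(["Collateral#"], loan_id, "unidentified")
```

Chaining the calls from Customer down to Collateral gives the identifier lists shown in the partially identified and internally unidentified figures.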
Figure: network schema dependency relationship translation
[Figure, first case: set membership MANUAL-OPTIONAL, MANUAL-FIXED, MANUAL-MANDATORY, or AUTOMATIC-OPTIONAL. Network schema: Record A (a) owns Record B (b) through SET AB. Corresponding EER model: ENTITY A (a) and ENTITY B (b, a) connected by relationship R.]
[Figure, second case: set membership AUTOMATIC-FIXED or AUTOMATIC-MANDATORY. Network schema: Record A (a) owns Record B (b) through SET AB. Corresponding EER model: ENTITY A (a) and ENTITY B (b, a) connected by relationship R, with the dependencies below.]
FD: B.b -> A.a
ID: B.a ⊆ A.a
Reverse engineering from relational model to conceptual EER model
Step 1. Define each relation, key and field
• Primary relation. These relations describe entities.
• Primary relation - Type 1 (PR1). This is a relation whose primary key does not contain a key of another relation.
• Primary relation - Type 2 (PR2). This is a relation whose primary key does contain a key of another relation.
• Secondary relation. This is a relation whose primary key is fully or partially formed by concatenation of primary keys of other relations.
Reverse engineering from relational model to conceptual EER model
Step 1. Define each relation, key and field
• Secondary relation - Type 1 (SR1). If the key of the secondary relation is formed fully by concatenation of primary keys of primary relations, it is of Type 1, or SR1.
• Secondary relation - Type 2 (SR2). Secondary relations that are not of Type 1.
• Key attribute - Primary (KAP). This is an attribute in the primary key of a secondary relation that is also a key of some primary relation.
• Key attribute - General (KAG). These are all the other primary key attributes in a secondary relation that are not of the KAP type.
• Foreign key attribute (FKA). This is a non-primary key attribute of a primary relation that is a foreign key.
• Nonkey attribute (NKA). The rest of the non-primary-key attributes.
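The relation classification of Step 1 can be approximated from primary keys alone. The sketch below is one possible reading, not the authors' algorithm: it assumes that a key containing exactly one other relation's key marks a PR2, while a key containing two or more marks a secondary relation. The function name `classify_relations` and the Enrolment relation are hypothetical.

```python
def classify_relations(primary_keys):
    """Classify relations as PR1, PR2, SR1, or SR2.

    primary_keys: relation name -> set of primary-key attributes (non-empty)
    """
    def contained(name):
        # Other relations whose whole primary key appears in this key.
        key = primary_keys[name]
        return [other for other, k in primary_keys.items()
                if other != name and k <= key]

    kinds = {}
    for name in primary_keys:
        count = len(contained(name))
        # Assumption: one embedded key -> PR2, two or more -> secondary.
        kinds[name] = "PR1" if count == 0 else "PR2" if count == 1 else "secondary"

    for name in primary_keys:
        if kinds[name] != "secondary":
            continue
        covered = set()
        for other in contained(name):
            if kinds[other] in ("PR1", "PR2"):
                covered |= primary_keys[other]
        # SR1: the key is fully formed from keys of primary relations.
        kinds[name] = "SR1" if covered == primary_keys[name] else "SR2"
    return kinds

kinds = classify_relations({
    "Department": {"Dept#"},
    "Instructor": {"Dept#", "Inst_name"},
    "Course": {"Course#"},
    "Student": {"Student#"},
    "Section": {"Dept#", "Inst_name", "Course#", "Section#"},
    "Enrolment": {"Student#", "Course#"},  # hypothetical relation
})
```

Under these assumptions Instructor comes out as PR2 and Section as SR2, which agrees with how the later steps of the deck treat them.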
Reverse engineering from relational model to conceptual EER model
Step 2. Map each PR1 into entity
[Figure: Map primary relations to entities. Entities: Department (Dept#, Dept_name), Prerequisite (Pre#, prer_title), Student (Student#, Student_name), Course (Course#, Course_Location).]
Step 3. Map each PR2 into weak entity.
[Figure: Map PR2 into EER model. Department (Dept#, Dept_name) is connected through relationship hire to weak entity Instructor (Dept#, Inst_name, Inst_addr).]
Reverse engineering from relational model to conceptual EER model
Step 4. Map SR1 into binary/n-ary relationship.
[Figure: Map SR1 into EER model. Student (Student#, Student_name) and Section (Section#) are connected by relationship grade.]
Step 5. Map SR2 into binary/n-ary relationship
[Figure: Map SR2 into EER model. Section (Dept#, Inst_Name, Course#, Section#) is connected to Instructor (Dept#, Inst_name, Inst_addr) by relationship teach, and to Course (Course#, Course_Location) by relationship has.]