Post on 19-Jan-2022


Database Planning & Design

Brendan Tierney

• Do we need a Data Model ?

• Why ?

• What about un-structured data ?

• What about data stored as JSON ?


Homework - 1

§ SMS Text Messaging Application on your phone

§ What are the Data requirements ?

§ What data do we need to exist before we use the application ?

§ What data will be captured ?

§ What are the data rules ?

Homework - 2

§ Purchasing a train ticket

§ What data do we need to exist before we use the application ?

§ What data will be captured ?

§ What are the data rules ?


Mixing (Relational) Tables and JSON Objects

CREATE TABLE json_documents (
  id    RAW(16) NOT NULL,
  data  CLOB,
  CONSTRAINT json_documents_pk PRIMARY KEY (id),
  CONSTRAINT json_documents_json CHECK (data IS JSON (STRICT))
);

INSERT INTO json_documents (id, data)
VALUES (SYS_GUID(),
        '{
           "FirstName"      : "John",
           "LastName"       : "Doe",
           "Job"            : "Clerk",
           "Address"        : {
                                "Street"   : "99 My Street",
                                "City"     : "My City",
                                "Country"  : "UK",
                                "Postcode" : "A12 34B"
                              },
           "ContactDetails" : {
                                "Email"    : "[email protected]",
                                "Phone"    : "44 123 123456",
                                "Twitter"  : "@johndoe"
                              },
           "DateOfBirth"    : "01-JAN-1980",
           "Active"         : true
         }');

https://oracle-base.com/articles/12c/json-support-in-oracle-database-12cr1

Mixing (Relational) Tables and JSON Objects

SELECT a.data.FirstName,
       a.data.LastName,
       a.data.Address.Postcode AS Postcode,
       a.data.ContactDetails.Email AS Email
FROM   json_documents a
ORDER BY a.data.FirstName, a.data.LastName;

FIRSTNAME       LASTNAME        POSTCODE   EMAIL
--------------- --------------- ---------- -------------------------
Jayne           Doe             A12 34B    [email protected]


Check out Relational/JSON resources

• Tim Hall – www.oracle-base.com
  – https://oracle-base.com/articles/12c/json-support-in-oracle-database-12cr1
  – Check out his other articles on using JSON in Oracle

• Online tutorials
  – https://livesql.oracle.com
  – JSON tutorial

• You will need to create a login


Need to design for

§ Capturing & storing the Data

§ How the data will be used

§ Need to define appropriate structures

§ Need to define & embed all rules, syntax & semantics in DB

§ Ensure all users are forced to use all of these.

§ Ensure the DB is optimised to use all of these

§ Minimizes work that Developers have to do

Database Application Lifecycle

§ Database planning

§ System definition

§ Requirements collection and analysis

§ Database design

§ DBMS selection (optional)

§ Application design

§ Prototyping (optional)

§ Implementation

§ Data conversion and loading

§ Testing

§ Operational maintenance


What are the similarities to the Software Development Life Cycle ?

Database Planning

Management activities that allow stages of database application lifecycle to be realized as efficiently and effectively as possible.

§ Must be integrated with overall IS strategy of the organization.

§ Database planning should also include development of standards that govern:
  § how data will be collected,
  § how the format should be specified,
  § what necessary documentation will be needed,
  § how design and implementation should proceed.

§ Data Management
§ Master Data Management
§ Data Governance

These topics are increasing in importance in organisations

System Definition

Describes scope and boundaries of database application and the major user views.

§ User view defines what is required of a database application from perspective of:
  § a particular job role (such as Manager or Supervisor) or
  § enterprise application area (such as marketing, personnel, or stock control).

§ Database application may have one or more user views.

§ Identifying user views helps ensure that no major users of the database are forgotten when developing requirements for new application.

§ User views also help in development of complex database application allowing requirements to be broken down into manageable pieces.

[Figure: Representation of a Database Application with Multiple User Views]

Requirements Collection and Analysis

Process of collecting and analyzing information about the part of organization to be supported by the database application, and using this information to identify users’ requirements of new system.

§ Information is gathered for each major user view including:
  § a description of data used or generated;
  § details of how data is to be used/generated;
  § any additional requirements for new database application.

§ Information is analyzed to identify requirements to be included in new database application.

§ Another important activity is deciding how to manage database application with multiple user views.

§ Three main approaches:
  § centralized approach;
  § view integration approach;
  § combination of both approaches.

Requirements Collection and Analysis

§ Centralized approach

§ Requirements for each user view are merged into a single set of requirements.

§ A global data model is created based on the merged requirements (which represents all user views).

Requirements Collection and Analysis

§ View integration approach

§ Requirements for each user view are used to build a separate data model.

§ Data model representing single user view is called a local data model, composed of diagrams and documentation describing requirements of a particular user view of database.

§ Local data models are then merged to produce a global data model, which represents all user views for the database.

[Figure: View Integration Approach to Managing Multiple User Views]

Database Design

Process of creating a design for a database that will support the enterprise’s operations and objectives.

§ Major aims:
  § Represent data and relationships between data required by all major application areas and user groups.
  § Provide data model that supports any transactions required on the data.
  § Specify a minimal design that is appropriately structured to achieve stated performance requirements for the system (such as response times).

§ Approaches include:
  § Top-down
  § Bottom-up
  § Inside-out
  § Mixed

Database Design

§ Main purposes of data modeling include:
  § to assist in understanding the meaning (semantics) of the data;
  § to facilitate communication about the information requirements.

§ Building data model requires answering questions about entities, relationships, and attributes.

§ A data model ensures we understand:
  - each user’s perspective of the data;
  - nature of the data itself, independent of its physical representations;
  - use of data across user views.

[Figure: Criteria to Produce an Optimal Data Model]

Database Design

§ Three phases of database design:
  § Conceptual database design
  § Logical database design
  § Physical database design.

Conceptual Database Design

Process of constructing a model of the information used in an enterprise, independent of all physical considerations.

§ Data model is built using the information in users’ requirements specification.

§ Source of information for logical design phase.

Logical Database Design

Process of constructing a model of the information used in an enterprise based on a specific data model (e.g. relational), but independent of a particular DBMS and other physical considerations.

§ Conceptual data model is refined and mapped on to a logical data model.

§ Major outputs are
  § ER Diagram
  § Validations (Domains)
  § Constraints
  § Rules
  § Assumptions
  § Limitations

§ We will explore these in more detail when we cover Database Design

All feeds into the Application Design

Physical Database Design

Process of producing a description of the database implementation on secondary storage.

§ Describes storage structures and access methods used to achieve efficient access to data.

§ Tailored to a specific DBMS system.
  § Specific DBMS ways of implementing things
  § Location of the Data
  § Indexes
  § Buffer sizes
  § DBA will do a lot of tuning here

§ How you are going to use and implement the various Database (specific) features and (paid for) Database Options
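The effect of one physical design choice, adding an index, can be sketched as follows. This is only an illustration: it uses SQLite through Python's standard sqlite3 module as a stand-in for the Oracle database discussed in these slides, and the customer table, its columns and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, last_name TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customer (last_name, city) VALUES (?, ?)",
    [(f"Name{i}", f"City{i % 50}") for i in range(1000)],
)

def plan(sql):
    """Return the plan descriptions SQLite reports for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT id, last_name, city FROM customer WHERE last_name = 'Name42'"

# Logical design unchanged; no access structure on last_name yet.
plan_before = plan(query)

# Physical design decision: add an index to support this access path.
conn.execute("CREATE INDEX customer_last_name_idx ON customer (last_name)")
plan_after = plan(query)

print(plan_before)  # a scan of the whole table
print(plan_after)   # a search that names customer_last_name_idx
```

The query text and its result are identical before and after; only the access method chosen by the optimizer changes, which is why this decision belongs to physical rather than logical design.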

Three-Level ANSI-SPARC Architecture and Phases of Database Design

[Figure: End Users see External View A, External View B and External View N at the External Level; an external/conceptual mapping links these views to the Conceptual Schema at the Conceptual Level; a conceptual/internal mapping links the Conceptual Schema to the Physical schema at the Internal Level, which describes the Stored Database.]

DBMS Selection

Selection of an appropriate DBMS to support the database application.

§ Undertaken at any time prior to logical design provided sufficient information is available regarding system requirements.

§ Main steps to selecting a DBMS:
  § define Terms of Reference of study;
  § shortlist two or three products;
  § evaluate products;
  § recommend selection and produce report.

Application Design

Design of user interface and application programs that use and process the database.

§ Database and application design are parallel activities.

§ Includes two important activities:
  § transaction design;
    § ER design – Validations (Domains), Constraints & Rules
  § user interface design.

Application Design - Transactions

An action, or series of actions, carried out by a single user or application program, which accesses or changes content of the database.

§ Should define and document the high-level characteristics of the transactions required.

§ Important characteristics of transactions:
  § data to be used by the transaction;
  § functional characteristics of the transaction;
  § output of the transaction;
  § importance to the users;
  § expected rate of usage.

§ Three main types of transactions:
  § Insert / New
  § Retrieval
  § Mixed – Retrieval + Update

§ Need to decide what should and should not be updated
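The three transaction types can be sketched against the Copy/Print credit scenario used in the exercises. This is a minimal illustration, assuming SQLite via Python's sqlite3 module rather than Oracle; the print_account and print_job tables, the function names and the price per page are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE print_account (
        student_id   TEXT PRIMARY KEY,
        credit_cents INTEGER NOT NULL CHECK (credit_cents >= 0)
    );
    CREATE TABLE print_job (
        job_id     INTEGER PRIMARY KEY,
        student_id TEXT NOT NULL REFERENCES print_account (student_id),
        pages      INTEGER NOT NULL CHECK (pages > 0)
    );
""")

def open_account(student_id, credit_cents):
    """Insert / New: adds a row that did not exist before."""
    with conn:
        conn.execute(
            "INSERT INTO print_account VALUES (?, ?)", (student_id, credit_cents)
        )

def current_balance(student_id):
    """Retrieval: reads data and changes nothing."""
    row = conn.execute(
        "SELECT credit_cents FROM print_account WHERE student_id = ?",
        (student_id,),
    ).fetchone()
    return row[0]

def print_pages(student_id, pages, cents_per_page=5):
    """Mixed: records the job and deducts credit as one unit of work."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "INSERT INTO print_job (student_id, pages) VALUES (?, ?)",
                (student_id, pages),
            )
            conn.execute(
                "UPDATE print_account SET credit_cents = credit_cents - ? "
                "WHERE student_id = ?",
                (pages * cents_per_page, student_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # not enough credit: neither change is kept

open_account("S1", 100)
first = print_pages("S1", 10)    # costs 50 of the 100 available
second = print_pages("S1", 20)   # costs 100 with only 50 left
jobs = conn.execute("SELECT COUNT(*) FROM print_job").fetchone()[0]
```

The second print request violates the CHECK on credit_cents, so the whole transaction is rolled back and the job row inserted a moment earlier disappears with it. The rule lives in the database, not in the application code.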

Prototyping

Building working model of a database application.

§ Purpose
  § to identify features of a system that work well, or are inadequate;
  § to suggest improvements or even new features;
  § to clarify the users’ requirements;
  § to evaluate feasibility of a particular system design.

Implementation

§ But what about

§ Data Conversion

§ Testing the DB for Load & Volume

Data Conversion and Loading

Transferring any existing data into new database and converting any existing applications to run on new database.

§ Only required when new database system is replacing an old system.

§ DBMS normally has utility that loads existing files into new database.

§ May be possible to convert and use application programs from old system for use by new system.
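A conversion-and-load step might look like the sketch below. It is an assumption-laden illustration, not a DBMS loader utility: the legacy export, its column names and its DD/MM/YYYY date format are invented, and SQLite via Python's sqlite3 module stands in for the target database.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical export from the old system (normally a file on disk).
legacy_export = io.StringIO(
    "cust_no,name,joined\n"
    "17,Ann Byrne,03/02/2019\n"
    "18,Tom Daly,21/11/2020\n"
)

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        full_name   TEXT NOT NULL,
        joined_on   TEXT NOT NULL      -- ISO 8601 date, YYYY-MM-DD
    )
""")

def convert(row):
    """Map one legacy record onto the new table's columns and formats."""
    joined = datetime.strptime(row["joined"], "%d/%m/%Y").date().isoformat()
    return int(row["cust_no"]), row["name"].strip(), joined

rows = [convert(r) for r in csv.DictReader(legacy_export)]

with conn:  # load everything in one transaction: all rows or none
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)

loaded = conn.execute("SELECT * FROM customer ORDER BY customer_id").fetchall()
print(loaded)
```

Keeping the conversion in one small function makes it easy to test each mapping rule against sample legacy records before the real load is run.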

Testing

Process of executing application programs with intent of finding errors.

§ Use carefully planned test strategies and realistic data.

§ Testing cannot show absence of faults; it can show only that software faults are present.

§ Demonstrates that database and application programs appear to be working according to requirements.

§ Load & Volume testing
§ Query Retrieval Timings
§ Backup & Recovery
§ Security

Implementation

§ Physical realisation of the database and application designs.

§ Use DDL to create database schemas and empty database files.
  § Create the conceptual schema

§ Use DDL to create any specified user views.
  § Create the external schema

§ Create the Physical/Storage schema
  § Based on the design considerations

§ Use 3GL or 4GL to create the application programs. This will include the database transactions implemented using the DML, possibly embedded in a host programming language.
  § Need to tune/optimise the code that runs in the database
  § Add additional physical/storage structures
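The implementation steps above can be sketched end to end in a few lines. This is a simplified illustration using SQLite through Python's sqlite3 module, not Oracle; the staff table, the staff_directory view and the index are hypothetical, and Python plays the role of the host programming language.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual schema: base tables created with DDL.
    CREATE TABLE staff (
        staff_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        dept     TEXT NOT NULL,
        salary   INTEGER NOT NULL
    );

    -- External schema: a user view that hides salary from general users.
    CREATE VIEW staff_directory AS
        SELECT staff_id, name, dept FROM staff;

    -- Physical/storage schema: an additional access structure.
    CREATE INDEX staff_dept_idx ON staff (dept);
""")

# Application program: DML embedded in the host language.
with conn:
    conn.executemany(
        "INSERT INTO staff (name, dept, salary) VALUES (?, ?, ?)",
        [("Ann", "Marketing", 52000), ("Tom", "Personnel", 48000)],
    )

cursor = conn.execute("SELECT * FROM staff_directory")
view_columns = [column[0] for column in cursor.description]
marketing = conn.execute(
    "SELECT name FROM staff_directory WHERE dept = ?", ("Marketing",)
).fetchall()
```

Users who query staff_directory see only the columns their user view calls for, while the base table and its index can be changed later without altering that view's definition.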

Operational Maintenance

Process of monitoring and maintaining system following installation.

§ Monitoring performance of system.
  § If performance falls, may require tuning or reorganization of the database.

§ Maintaining and upgrading database application (when required).

§ Incorporating new requirements into database application.

§ Load & Volume testing
§ Query Retrieval Timings
§ Backup & Recovery
§ Disaster Recovery
§ Space Management
§ Upgrades
§ Security

CASE Tools

§ Support provided by CASE tools includes:
  - data dictionary to store information about database application’s data;
  - design tools to support data analysis;
  - tools to permit development of corporate data model, and conceptual and logical data models;
  - tools to enable prototyping of applications.

§ Provide following benefits:
  § standards;
  § integration;
  § support for standard methods;
  § consistency;
  § automation.

ER Data Modelling Tools

[Figure: ER Diagram in Oracle Data Modeller]

ER Database Design

A very quick refresher


E-R Model

• Entity-Relationship model is a set of concepts and graphical symbols that can be used to create conceptual schemas.

• Versions
  – Original E-R model: Peter Chen (1976).
  – Extended E-R model: extensions to the Chen model.
  – Information Engineering (IE): James Martin (1990); it uses “crow’s foot” notation, is easier to understand and we will use it.
  – IDEF1X: a national standard developed by the National Institute of Standards and Technology
  – Unified Modeling Language (UML): the Object Management Group; it supports object-oriented methodology


Entities

• An entity represents a thing that has meaning in a given context and about which there is a need to record data.

• Something that can be identified and the users want to track
  – Entity class: a collection of entities of a given type
  – Entity instance: the occurrence of a particular entity

• There are usually many instances of an entity in an entity class.

Attributes

• Attributes describe an entity’s characteristics.
• All entity instances of a given entity class have the same attributes, but vary in the values of those attributes.
• Originally shown in data models as ellipses.
• Data modeling products today commonly show attributes in rectangular form.


Identifiers

• Identifiers are attributes that name, or identify, entity instances.
• The identifier of an entity instance consists of one or more of the entity’s attributes.
• Composite identifiers: Identifiers that consist of two or more attributes
• Identifiers in data models become keys in database designs:
  – Entities have identifiers.
  – Tables (or relations) have keys.
• Candidate Keys
• Alternate Keys
• Primary Key
  – CK = PK + AK
• Foreign Keys

An Identifier does not have to be a Sequence Number

An identifier should have some meaning/semantics

It is OK to have a composite Key

The identifier and other Keys are very important: Data Integrity, Internal DB processing & DB Optimizer
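How the kinds of key listed above turn into declarations can be sketched as follows. The example assumes SQLite via Python's sqlite3 module as a stand-in for Oracle, and the student and enrolment tables, their columns and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (
        student_no TEXT PRIMARY KEY,       -- primary key: the chosen candidate key
        email      TEXT NOT NULL UNIQUE,   -- alternate key: the other candidate key
        name       TEXT NOT NULL
    );
    CREATE TABLE enrolment (
        student_no  TEXT NOT NULL REFERENCES student (student_no),  -- foreign key
        module_code TEXT NOT NULL,
        year        INTEGER NOT NULL,
        PRIMARY KEY (student_no, module_code, year)                 -- composite key
    );
""")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked

def accepted(sql, params):
    """Return True if the row is stored, False if a key constraint rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    accepted("INSERT INTO student VALUES (?, ?, ?)", ("D001", "ann@example.com", "Ann")),
    # Same email again: the alternate key rejects it.
    accepted("INSERT INTO student VALUES (?, ?, ?)", ("D002", "ann@example.com", "Anne")),
    accepted("INSERT INTO enrolment VALUES (?, ?, ?)", ("D001", "DB101", 2022)),
    # Same student, module and year again: the composite key rejects it.
    accepted("INSERT INTO enrolment VALUES (?, ?, ?)", ("D001", "DB101", 2022)),
    # Unknown student: the foreign key rejects it.
    accepted("INSERT INTO enrolment VALUES (?, ?, ?)", ("D999", "DB101", 2022)),
]
print(results)
```

None of these keys is a sequence number: each one carries meaning from the problem domain, and each rejection is a data integrity rule the application does not have to code.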


Relationship Cardinality

• Cardinality means “count,” and is expressed as a number.

• Maximum cardinality is the maximum number of entity instances that can participate in a relationship.
  – There are three types of maximum cardinality:
    • One-to-One [1:1]
    • One-to-Many [1:N]
    • Many-to-Many [N:M]

• Minimum cardinality is the minimum number of entity instances that must participate in a relationship.
  – defines the participation : Optional or Mandatory
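One common way of carrying maximum and minimum cardinality into tables is sketched below: a foreign key gives 1:N, adding UNIQUE to it gives 1:1, and NOT NULL versus nullable expresses mandatory versus optional participation. The example assumes SQLite via Python's sqlite3 module, and the region, office and region_manager tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
    CREATE TABLE region (
        region_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL
    );
    -- 1:N, mandatory for the child: every office belongs to exactly one region.
    CREATE TABLE office (
        office_id INTEGER PRIMARY KEY,
        region_id INTEGER NOT NULL REFERENCES region (region_id)
    );
    -- 1:1, optional for the child: a manager may have no region,
    -- and a region can be linked to at most one manager.
    CREATE TABLE region_manager (
        manager_id INTEGER PRIMARY KEY,
        region_id  INTEGER UNIQUE REFERENCES region (region_id)
    );
""")

def accepted(sql, params):
    """Return True if the row is stored, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False

checks = [
    accepted("INSERT INTO region VALUES (?, ?)", (1, "North")),
    accepted("INSERT INTO office VALUES (?, ?)", (1, 1)),
    accepted("INSERT INTO office VALUES (?, ?)", (2, 1)),     # second office, same region: N side
    accepted("INSERT INTO office VALUES (?, ?)", (3, None)),  # mandatory participation violated
    accepted("INSERT INTO office VALUES (?, ?)", (4, 99)),    # no such region
    accepted("INSERT INTO region_manager VALUES (?, ?)", (1, 1)),
    accepted("INSERT INTO region_manager VALUES (?, ?)", (2, 1)),     # maximum of one exceeded
    accepted("INSERT INTO region_manager VALUES (?, ?)", (3, None)),  # optional participation
]
print(checks)
```

A minimum cardinality on the parent side, such as "every region must have at least one office", cannot be expressed by the foreign key alone and would need to be recorded as a business rule.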

Creating & Reading an ER Diagram

[Figure: ER diagram example with the entities Student and Region and the relationship Manages, annotated “Also known as a Parent / Child relationship”.]

Create Relationships: N:M Strong Entity Relationships

• The solution is to create an intersection table that stores data about the corresponding rows from each entity

• The intersection table consists only of the primary keys of each table which form a composite primary key

• Each table’s primary key becomes a foreign key linking back to that table

COMPANY_PART_INT (CompanyName, PartNumber)
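The intersection table can be sketched as runnable DDL. The slide names only COMPANY_PART_INT and its two columns; the COMPANY and PART parent tables, the sample rows and the use of SQLite via Python's sqlite3 module are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
    CREATE TABLE company (CompanyName TEXT PRIMARY KEY);
    CREATE TABLE part    (PartNumber  INTEGER PRIMARY KEY);

    -- Intersection table: only the two parent keys, which together form
    -- the composite primary key; each is also a foreign key to its parent.
    CREATE TABLE company_part_int (
        CompanyName TEXT    NOT NULL REFERENCES company (CompanyName),
        PartNumber  INTEGER NOT NULL REFERENCES part (PartNumber),
        PRIMARY KEY (CompanyName, PartNumber)
    );

    INSERT INTO company VALUES ('Acme'), ('Bolt Ltd');
    INSERT INTO part    VALUES (100), (200);
    INSERT INTO company_part_int VALUES
        ('Acme', 100), ('Acme', 200), ('Bolt Ltd', 100);
""")

# One company supplies many parts ...
parts_for_acme = [row[0] for row in conn.execute(
    "SELECT PartNumber FROM company_part_int "
    "WHERE CompanyName = ? ORDER BY PartNumber", ("Acme",))]

# ... and one part is supplied by many companies.
companies_for_100 = [row[0] for row in conn.execute(
    "SELECT CompanyName FROM company_part_int "
    "WHERE PartNumber = ? ORDER BY CompanyName", (100,))]
```

The N:M relationship has become two 1:N relationships, each of which the database can enforce with an ordinary foreign key.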

Exercise

• Discuss and complete the following

Analysing Text in ER Questions

• A simplified version of what is in the text is

1. Identify the entities
   – Are there any sub-types
2. Identify the relationships
3. Identify the degree
4. Identify the participation
   – Mandatory / Optional
5. Identify the attributes
6. Identify the Primary Keys
7. Identify the data constraints
8. Identify the business rules
   These are usually built up when doing 2-5 above
9. Identify the limitations

Constraints

• The constraints record any conditions that, if not enforced, would permit the existence of inconsistent data in the database. Only those constraints that cannot be shown in the E–R model are detailed.

• Data Constraints
  – Validation rules
    • Data size, formats, etc
    • Valid values
    • Look-up values
    • Domains
  – Participation conditions (null / not null)

• Business Rules
  – What rules need to be applied to the data that require additional processing
    • Eg. Person should be older than 18
  – Rules between attributes in the same table (Intra-relation constraints)
  – Rules with data in other tables (Inter-relation constraints)
    • The StartDate of a WorksOn entity must be on or after the StartDate of the Project to which the WorksOn entity is related via StaffedBy
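The kinds of constraint listed above can be sketched as declarations. This is one possible implementation under stated assumptions: SQLite via Python's sqlite3 module stands in for Oracle, the column names and status values are invented, and the inter-relation rule is enforced with a trigger on insert only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE project (
        project_id INTEGER PRIMARY KEY,
        -- Data constraint: valid values.
        status     TEXT NOT NULL CHECK (status IN ('PLANNED', 'ACTIVE', 'CLOSED')),
        start_date TEXT NOT NULL,   -- ISO 8601 dates compare correctly as text
        end_date   TEXT,            -- participation: may be null
        -- Intra-relation rule: two attributes of the same row.
        CHECK (end_date IS NULL OR end_date >= start_date)
    );
    CREATE TABLE works_on (
        staff_id   INTEGER NOT NULL,
        project_id INTEGER NOT NULL REFERENCES project (project_id),
        start_date TEXT NOT NULL,
        PRIMARY KEY (staff_id, project_id)
    );
    -- Inter-relation rule: compares with a row in another table.
    CREATE TRIGGER works_on_start_chk
    BEFORE INSERT ON works_on
    WHEN NEW.start_date <
         (SELECT start_date FROM project WHERE project_id = NEW.project_id)
    BEGIN
        SELECT RAISE(ABORT, 'WorksOn.StartDate precedes Project.StartDate');
    END;
""")

def accepted(sql, params):
    """Return True if the row is stored, False if a rule rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.DatabaseError:
        return False

outcomes = [
    accepted("INSERT INTO project VALUES (?, ?, ?, ?)", (1, "ACTIVE", "2022-02-01", None)),
    accepted("INSERT INTO project VALUES (?, ?, ?, ?)", (2, "UNKNOWN", "2022-02-01", None)),
    accepted("INSERT INTO project VALUES (?, ?, ?, ?)", (3, "ACTIVE", "2022-02-01", "2022-01-01")),
    accepted("INSERT INTO works_on VALUES (?, ?, ?)", (7, 1, "2022-03-01")),
    accepted("INSERT INTO works_on VALUES (?, ?, ?)", (8, 1, "2022-01-15")),
]
print(outcomes)
```

A complete implementation would also need a matching trigger for updates to works_on and to project; the sketch covers inserts only. The composite primary key on works_on reflects the assumption that an employee is allocated only once to a project.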

Assumptions

• These record the working assumptions the analyst has made while developing the model.
• When working with stakeholders in the ‘real world’ these assumptions would be resolved,
  – but in the academic environment we use the assumptions to record any specific decisions made about places where there may be ambiguity in the requirements.

• In the ‘real world’, assumptions will be replaced by definite statements of requirements when the refinement process is completed
  – Are Employees allocated only once to a Project then remain with it until the end of the Project? I have assumed so. (If this is not the case then additional work will be required to resolve the resulting many to many relationship between Employee and Project.)


Limitations

• These record the scope of the model.

• The list of limitations represents a description of expectations about the ‘reality’ of the situation being modelled that indicates the limits under which the entire model is meaningful.

• They are intended to ensure that the model is only used for purposes that are compatible with the domain of discourse for which it was produced.

  – No historic records are held; the recorded data is for a moment in time

The Importance of GOOD DB Design

• Represents the Data in your Application or Enterprise
• Everything has a meaning
• The KEYS have a meaning
• You always define PK->FK relations
• Define all data behavior in the DB
  • There are many ways to access the database
  • Need to ensure data integrity all the time
• Helps the Database to run efficiently
• Helps the Database to Store and Select the data efficiently
• Helps the DB Optimizer to understand and process your data
• ……
• Reduces the coding that the developer has to do
• Ensures the security of the data
• Utilizes the performance and scalability of the Database Server
• Reuses data and query results for other end-users

Exercises – Extend Work

§ What are the data requirements for a text messaging application ?
  § Groups of 3
  § 10 minutes
  § 1 person to be spokesperson
  § What are the data issues ?

§ What are the data requirements for a train ticket ?
  § Groups of 3
  § 10 minutes
  § 1 person to be spokesperson
  § What are the data issues ?

§ What are the data issues?


Exercise

§ Have you used the DIT Copy/Print Centres?

§ Design an ER diagram for the data that is captured.

§ Think through your interactions with this service
  § Topping up/Adding Credit
  § Using the copy/print service
  § Getting a statement or current balance

§ What are the data issues ?
§ What systems/applications does it interact with ?

§ What issues have you experienced as an end user?

Oracle SQL Developer Demo & Videos


Creating a Logical ER Data Model

Create the Physical Model and DDL


Home Work – To be completed for next week

Complete all ER Diagram exercises in SQL Developer.
Generate the Logical models
Generate the Physical models
Generate the DDL