Data Management: Databases and Organizations, Richard Watson

Summary of Chapter 7 and Basic Structures, prepared by Kirk Scott

Post on 23-Feb-2016

Data Modeling and SQL

• Chapter 7. Data Modeling

• Reference: Basic Structures

Chapter 7. Data Modeling

• The building blocks of data modeling should be familiar to you:

• Entities

• Attributes

• Relationships

• Identifiers (keys)

• The next five overheads, taken from chapter 7, review the ER notation for these things

• A model is the starting point for creating a database

• No table need be created before the model is complete

• Quality of the data model is essential

• The model should be well formed: It should follow the basic rules for entities, attributes, relationships, and keys

• The following overhead summarizes the characteristics of a well formed model

• A quality data model should be high-fidelity

• This means that it has to accurately and completely model the situation in the problem domain

• A model which is well formed but does not model the problem domain is useless from a practical point of view

• The phrase “quality improvement” in the context of data models means this:

• It is unrealistic to assume that a good data model can be created on the first try

• A data model will evolve as technical mistakes are caught

• More importantly, it will evolve as a result of interaction with users as the problem domain and requirements are more completely understood

The Stock Example

• A simple data model for nations and stocks is given on the next overhead

• Superficially, it seems OK

• It could be verbally summarized as “Nations have stocks”

• The book now introduces the following additional textual information

• Stocks are listed on stock exchanges (a new entity)

• A nation may have more than one stock exchange

• A given stock may be listed on more than one exchange, but it has exactly one home exchange

• Stocks can be listed on the exchanges of more than one country

• Notice that the abstraction of a listing is repeated in this description

• That suggests that a listing itself will be an entity

• The next overhead shows a revised model that takes into account the new assumptions
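The revised diagram is not reproduced in this transcript, so the following is only a sketch of how the assumptions above might translate into tables. All table and column names (nation, exchange, stock, listing, home_exchange, and so on) and the sample rows are made up for illustration; they are not taken from the book. The sketch uses Python's built-in sqlite3 module.

```python
import sqlite3

# Hypothetical schema: names are assumptions, not the book's.
SCHEMA = """
CREATE TABLE nation   (natcode TEXT PRIMARY KEY, natname TEXT NOT NULL);
-- A nation may have more than one exchange (1:m).
CREATE TABLE exchange (exchcode TEXT PRIMARY KEY,
                       exchname TEXT NOT NULL,
                       natcode  TEXT NOT NULL REFERENCES nation(natcode));
-- Each stock has exactly one home exchange.
CREATE TABLE stock    (stkcode TEXT PRIMARY KEY,
                       stkfirm TEXT NOT NULL,
                       home_exchange TEXT NOT NULL REFERENCES exchange(exchcode));
-- Listing is the table in the middle between stock and exchange (m:n).
CREATE TABLE listing  (stkcode  TEXT NOT NULL REFERENCES stock(stkcode),
                       exchcode TEXT NOT NULL REFERENCES exchange(exchcode),
                       PRIMARY KEY (stkcode, exchcode));
"""

def build_stock_db():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO nation VALUES (?, ?)",
                     [("UK", "United Kingdom"), ("US", "United States")])
    conn.executemany("INSERT INTO exchange VALUES (?, ?, ?)",
                     [("LSE", "London", "UK"),
                      ("NYSE", "New York", "US"),
                      ("NDQ", "Nasdaq", "US")])
    conn.execute("INSERT INTO stock VALUES ('ABC', 'Example Firm', 'LSE')")
    conn.executemany("INSERT INTO listing VALUES (?, ?)",
                     [("ABC", "LSE"), ("ABC", "NYSE")])
    return conn

def nations_listing(conn, stkcode):
    """Return the codes of the nations on whose exchanges a stock is listed."""
    rows = conn.execute(
        """SELECT DISTINCT e.natcode
           FROM listing l JOIN exchange e ON e.exchcode = l.exchcode
           WHERE l.stkcode = ?
           ORDER BY e.natcode""",
        (stkcode,))
    return [r[0] for r in rows]
```

Calling nations_listing(build_stock_db(), "ABC") walks from the stock through its listings to the exchanges and their nations, which is the path that the simpler “Nations have stocks” model could not represent.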

The Geography Example

• Next the book gives a simple example that’s supposed to model the relationships between nations, administrative units (states), and cities

• See the next overhead for a straightforward model of this

• The book next observes that exceptions are the bane of a good model

• If you presume to model nations, administrative units, and cities globally, then your model should accommodate all possible situations

• The book asks, “How many errors can you find in the initial data model?”

• See the table on the following overheads for answers

• The next overhead shows a nations, administrative units, cities data model that has been revised to take into account these exceptional cases/errors in the initial model

• This revised model may seem needlessly complex

• However, the complexity is not needless

• This is an accurate model of the situation that covers all cases

• The initial model was insufficiently complex

• It was wrong

The Women, Men, Marriage, and People Examples

• This topic was touched on all the way back in unit one

• Capturing the relationships among people is a very common problem that leads to some familiar challenges and design/model choices

• On the following overhead is an ER diagram of the relationship between married men and women

• The foregoing model is obviously hilariously limited in the kind of relationship it can capture

• In addition, the book points out the following characteristics of the model which might indicate that a different model would be better

• 1. The labeling of the model indicates that this is a marriage, but there is nothing in the fields that spells this out

• In particular, you might think that there would be a date field, a marriage license number, something among the fields that was specific to marriage

• 2. The Man and Woman tables have the same set of attributes, differing only in whether the fields are named manX or womanX

• This might suggest that we are dealing with one entity type, person, rather than two distinct entity types, man and woman

• 3. The last observation concerns the fields manoname and womanoname

• These stand for “other” name

• As the model stands, a person can only have one other name

• Alternatively, if the other name field is text, it might be filled with multiple values, which is not an ideal solution

• A complete treatment of people and other names might introduce another table so that there could be a one-to-many relationship between people and their various names

• The book doesn’t solve all of these problems, but it does come up with a second model

• If there were two types to begin with and you combine into one, you frequently get a new field in the result

• A person now has a gender field

• Also, the labeling of the relationship could be made more generic

• See the following overhead

• Next the book tackles the topic of multiple marriages

• If you’re dealing with a Person table, then the table is in a many-to-many relationship with itself

• To distinguish between multiple marriages, potentially between the same partners, beginning and ending date fields can be added to the table in the middle

• See the next overhead for the third version of the model
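Since the diagram itself is not in this transcript, here is a minimal sketch of what a Person table in a many-to-many relationship with itself could look like. The names (person, marriage, partner1, partner2, begindate, enddate) are assumptions, and making the beginning date part of the key is one possible design choice, not necessarily the book's.

```python
import sqlite3

def build_marriage_db():
    """Create an in-memory database with a hypothetical person/marriage schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
    CREATE TABLE person (
        personid INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        gender   TEXT
    );
    -- Table in the middle: person is in an m:n relationship with itself.
    -- The begin date is part of the key (an assumption), so the same two
    -- partners can appear in more than one marriage row.
    CREATE TABLE marriage (
        partner1  INTEGER NOT NULL REFERENCES person(personid),
        partner2  INTEGER NOT NULL REFERENCES person(personid),
        begindate TEXT NOT NULL,
        enddate   TEXT,            -- NULL while the marriage is ongoing
        PRIMARY KEY (partner1, partner2, begindate)
    );
    """)
    return conn

def marriages_of(conn, personid):
    """Return all marriage rows involving one person, oldest first."""
    rows = conn.execute(
        "SELECT partner1, partner2, begindate, enddate FROM marriage "
        "WHERE partner1 = ? OR partner2 = ? ORDER BY begindate",
        (personid, personid))
    return rows.fetchall()
```

Note that the query has to check both partner columns, which hints at the data integrity questions a model like this raises: nothing in the schema stops the same couple from being entered once as (1, 2) and again as (2, 1).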

• In the long run, some sort of arbitrary numbering scheme for identifying marriages might be desirable

• A marriage license number might work, but the book points out that legally speaking it might also be desirable to record common law marriages

• Notice in general that a lot of data integrity questions start to arise with a model like this

• See the next overhead for the fourth version of the model

• Next the book considers adding children to the model

• Children are modeled as the result of marriage

• Of course, this is not always the case

• As long as the marriageno field in the person table can be null, the model accommodates that

• Still, it doesn’t allow you to record who a person’s parents are if the person wasn’t the result of marriage

• See the next overhead for the fifth version of the model

• The person model could be developed even further

• This model barely scratches the surface of the variety of human relationships

• It is already moderately complex but could become more complex

• A model is complete when it contains everything needed in practice for a given problem

• The model is unsuitable if it isn’t complex enough

• It is also unsuitable if it contains detail that isn’t needed

The Book Example

• The book entitles this “When’s a book not a book?”

• In other words, the example is an invitation to clarify what you mean when you refer to entities in a design

• Are you referring to individual objects?

• Are you referring to kinds of objects?

• What elements of a design make it possible to distinguish between these meanings?

• A simplistic initial design is given on the next overhead

• The book observes that a library may have more than one copy of a book

• You might be tempted to model this by adding a copy number to the book record

• The problem with that solution is that the basic book information would be repeated for every copy

• The solution is to treat a “book” as an abstract entity and a copy as a separate, concrete entity

• Such a design is shown on the next overhead
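As a rough sketch of the abstract/concrete split described above, the following uses two tables with assumed names (book, copy, bookid, copyno); the book's actual attribute names are not visible in this transcript. The title is stored once, and each physical copy is a row in a dependent table.

```python
import sqlite3

def build_library_db():
    """Create an in-memory database with a hypothetical book/copy schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
    -- The abstract "book": facts that are the same for every copy.
    CREATE TABLE book (
        bookid INTEGER PRIMARY KEY,   -- arbitrary identifier (assumed)
        title  TEXT NOT NULL,
        isbn   TEXT                   -- allowed to be NULL
    );
    -- The concrete copy: one row per physical item.
    CREATE TABLE copy (
        bookid INTEGER NOT NULL REFERENCES book(bookid),
        copyno INTEGER NOT NULL,
        PRIMARY KEY (bookid, copyno)
    );
    """)
    return conn

def copies_per_title(conn):
    """Return (title, number of copies) pairs, including titles with no copies."""
    rows = conn.execute(
        """SELECT b.title, COUNT(c.copyno)
           FROM book b LEFT JOIN copy c ON c.bookid = b.bookid
           GROUP BY b.bookid, b.title
           ORDER BY b.title""")
    return rows.fetchall()
```

With this layout, adding a second copy means adding one short row to copy; the title and other book-level information are not repeated.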

• You may have noticed that although the ISBN should be a unique identifier for a book (not a copy) it is not used as a primary key in these designs

• The problem is that books published before a certain date did not have ISBNs

• Also, you may have hand-crafted modern books that weren’t commercially published and don’t have ISBNs

The Employment History Example

• This example starts out simply enough

• A given company has divisions

• The divisions have departments

• Departments have employees

• This is shown in the ER diagram on the next overhead

• Next, the author observes that over time a given employee may hold different positions

• These positions may be in different departments

• Like marriages, the distinguishing features of positions may include a beginning and ending date

• This is shown in the ER diagram on the next overhead

• Next the author introduces the concept of a payslip into the record-keeping that the model includes

• It’s not fully fleshed out in the next example, but when you look at the diagram you may have an inkling that the treatment of payslips is reminiscent of the treatment of line items

• This is shown in the ER diagram on the next overhead

• The final design treats payslips exactly like the line item example

• A payslip is like a bill of sale

• Pay slip text is like an item

• The table in the middle, PaySlipLine, is like LineItem

• The pk of PaySlipLine is the concatenation of the pk of Payslip embedded as a fk, plus a pay slip number (payslipno)

• The pk of PslText is embedded separately as a fk

• This is shown in the ER diagram on the next overhead

The Aircraft Leasing Example

• In the previous set of overheads the first design containing a cycle cropped up

• This example also contains a cycle

• There are three base tables and three tables in the middle

• Each of the base tables is in a many-to-many relationship with each of the others

• Overall, the tables are in a many-to-many-to-many relationship

• This is shown in the ER diagram on the next overhead

• How to properly model a situation becomes an important question in the next chapter, on normalization

• In the meantime, the following observation can be made:

• An aircraft lease is an abstract entity that seems to be part of the business problem

• However, it doesn’t appear in the design

• This isn’t just a problem in a theoretical sense

• First of all it’s clear that in order to get complete information about a lease from this design a 6-way join would be needed

• That’s inconvenient

• Also, leases themselves may have attributes like starting and ending dates

• There is no place to record them

• An improved, star-like design for the problem is shown on the next overhead

The Project Management Example

• This example addresses the question of where something could or should be modeled

• It impinges on the question of how the model has to be changed to capture a more detailed business situation

• The first model is given on the next overhead

• It should be relatively self-explanatory

• Now consider the altered model on the following overhead

• The planned hours attribute has been moved from the Activity entity to the Daily Work entity

• This small change in location of a field has a clear and logical outcome

• The planning of project hours is done on a daily basis, not an activity basis

Cardinality and Modality

• Cardinality refers to the count of the number of instances of entities in a relationship

• Modality refers to whether an instance of an entity in a relationship is mandatory or optional; optional is a fancy way of saying that there can be 0 entities in a relationship

• In other words, one end of a relationship is optional

• This condition obtains, for example, when a pk in one table has no fk entries in another

• It also obtains when a fk value is null

• The book gives the table shown on the next overhead summarizing cardinality and modality

• The author now enhances the notation for ER diagrams

• Unlike UML, it is not customary to mark actual digits at the ends of crows’ feet

• Instead, a short vertical bar marks the end of a relationship where an instance of an entity is mandatory

• An “o” marks the end of a relationship where an instance of an entity is optional

The Nation and Stock Example

• The following diagram of the 1-m relationship between nations and stocks illustrates this new notation

• Nation has a bar

• Stock has an o

• Every stock has to have a nation

• The nation code field in the Stock table can’t be null

• A nation doesn’t have to have a stock

• There can be nation code values in the Nation table where no such nation code appears in the Stock table
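The two ends of this relationship can be illustrated with a small sketch. The mandatory end becomes a NOT NULL foreign key, and the optional end simply means that some nation rows have no matching stock rows. The column names and sample rows here are assumptions chosen for illustration.

```python
import sqlite3

def build_nation_stock_db():
    """Create an in-memory database with made-up nation and stock rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
    CREATE TABLE nation (natcode TEXT PRIMARY KEY, natname TEXT NOT NULL);
    CREATE TABLE stock (
        stkcode TEXT PRIMARY KEY,
        -- Bar at the Nation end: every stock must have a nation.
        natcode TEXT NOT NULL REFERENCES nation(natcode)
    );
    INSERT INTO nation VALUES ('UK', 'United Kingdom'), ('US', 'United States');
    INSERT INTO stock VALUES ('ABC', 'UK');
    """)
    return conn

def stock_without_nation_rejected(conn):
    """Try to insert a stock with a null nation code; report whether it failed."""
    try:
        conn.execute("INSERT INTO stock VALUES ('XYZ', NULL)")
    except sqlite3.IntegrityError:
        return True
    return False

def nations_without_stocks(conn):
    """The o at the Stock end: list nations that have no stock rows at all."""
    rows = conn.execute(
        """SELECT n.natcode
           FROM nation n LEFT JOIN stock s ON s.natcode = n.natcode
           WHERE s.stkcode IS NULL
           ORDER BY n.natcode""")
    return [r[0] for r in rows]
```

In this made-up data, the insert with a null nation code is rejected, while the query for nations without stocks returns one perfectly legal row.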

The Sale, Item, and Lineitem Example

• The following diagram of the m-n relationship between sales and items also illustrates this new notation

• A sale has to have at least one line item

• A line item has to belong to a sale

• A line item has to have an item

• An item doesn’t have to be part of a sale

• These verbal statements can be translated into null/not null and existence/non-existence requirements for fields and rows in tables

• The new thing illustrated by this example is that if you have a row for a sale in the Sale table, the ER diagram now states that it has to have a corresponding record in the Lineitem table

• This is not something that can be enforced by the database using referential integrity, for example

• It is a new kind of data integrity constraint
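To make the point concrete: foreign keys guarantee that every line item points at an existing sale, but nothing in them forces a sale to have a line item. One way to deal with this, sketched below under assumed table and column names (sale, item, lineitem, saleno, lineno, itemno), is to check for offending rows with a query. This is only an after-the-fact check, not a declarative constraint.

```python
import sqlite3

def build_sale_db():
    """Create an in-memory database with a hypothetical sale/item/lineitem schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
    CREATE TABLE sale (saleno INTEGER PRIMARY KEY, saledate TEXT);
    CREATE TABLE item (itemno INTEGER PRIMARY KEY, itemname TEXT NOT NULL);
    -- Referential integrity covers these two rules:
    --   a line item has to belong to a sale, and has to have an item.
    CREATE TABLE lineitem (
        saleno INTEGER NOT NULL REFERENCES sale(saleno),
        lineno INTEGER NOT NULL,
        itemno INTEGER NOT NULL REFERENCES item(itemno),
        PRIMARY KEY (saleno, lineno)
    );
    """)
    return conn

def sales_without_lines(conn):
    """Find sales that violate 'a sale has to have at least one line item'."""
    rows = conn.execute(
        """SELECT saleno FROM sale
           WHERE NOT EXISTS (SELECT 1 FROM lineitem
                             WHERE lineitem.saleno = sale.saleno)
           ORDER BY saleno""")
    return [r[0] for r in rows]
```

The database accepts a sale row with no line items without complaint; only the explicit query reveals it.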

The Department and Employee Example

• The following diagram of the 1-m and 1-1 relationships between departments and employees also illustrates this new notation

• An employee has to have a department

• A department doesn’t have to have employees

• A department has to have a boss

• An employee doesn’t have to be the boss of a department

• It is worth paying close attention to the 1-1 relationship

• It looks a little odd to have a line with no crow’s foot with a bar at one end and an o at the other

• Recall that in order to reduce the number of nulls, the 1-1 relationship was captured by embedding the pk of Employee as a fk in Department

• The notation means that the fk can’t be null

• It also means that not every pk of Employee has to appear as a fk value

• Recall that when presented earlier, the Employee-Department diagram grew to include the (recursive) relationship telling which employee was which other employee’s boss

• In the following diagram, this line has o’s at both ends

• This means it’s possible to have employees who are not bosses

• It also means that the embedded fk field can be null

• In other words, there can be employees who don’t have bosses

The Monarch Example

• The monarch succession relationship can also be marked for modality

• The first monarch would have no predecessor

• The current monarch would have no successor (yet)

• Both ends of the relationship are optional

• This is shown in the ER diagram on the following overhead

The Product Assembly Example

• Modality can also be added to the product-assembly example

• If there is an assembly entry, it has to have a super-product

• Likewise, if there is an assembly entry, it has to have a sub-product

• On the other hand, there can be products that are neither super-products nor sub-products

• It is interesting to note that in this situation the vertical bars repeat information that can be inferred from the rest of the diagram

• The + signs on the crow’s feet mean that the embedded foreign keys are also primary keys

• As primary keys, they can’t be null

• As foreign keys, referential integrity states that their values have to occur in the corresponding primary key table

• Therefore, the corresponding super-product or sub-product entry has to exist

• It is mandatory
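The reasoning above can be tried out in a small sketch. The names (product, assembly, prodid, subprodid, quantity) and the sample rows are assumptions. NOT NULL is spelled out on the key columns because SQLite, unlike most systems, does not automatically forbid nulls in a composite primary key.

```python
import sqlite3

def build_assembly_db():
    """Create an in-memory database with a hypothetical product/assembly schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
    CREATE TABLE product (prodid INTEGER PRIMARY KEY, proddesc TEXT NOT NULL);
    -- Both halves of the concatenated pk are also fks to product.
    CREATE TABLE assembly (
        prodid    INTEGER NOT NULL REFERENCES product(prodid),  -- super-product
        subprodid INTEGER NOT NULL REFERENCES product(prodid),  -- sub-product
        quantity  INTEGER NOT NULL,
        PRIMARY KEY (prodid, subprodid)
    );
    INSERT INTO product VALUES (1, 'hypothetical kit'),
                               (2, 'hypothetical part'),
                               (3, 'standalone product');
    INSERT INTO assembly VALUES (1, 2, 4);
    """)
    return conn

def try_insert_assembly(conn, prodid, subprodid, quantity=1):
    """Attempt to add an assembly row; return True on success, False if rejected."""
    try:
        conn.execute("INSERT INTO assembly VALUES (?, ?, ?)",
                     (prodid, subprodid, quantity))
        return True
    except sqlite3.IntegrityError:
        return False
```

An assembly row that names a nonexistent sub-product, or leaves the super-product null, is rejected. Product 3 starts out as neither a super-product nor a sub-product, which the schema allows.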

Entity Types

• The author categorizes entities into the following types:

• Independent

• Weak or dependent

• Associative

• Aggregate

• Subordinate

Independent entities

• The following ER diagram shows two independent entities

• Instances of each can exist regardless of the existence of matching instances of the other

• Although a pk is embedded as a fk, the pk may have no matches and the fk may be null

• Independent entities are usually the easiest base tables to recognize in a problem domain

Weak or Dependent Entities

• A weak entity is one where the pk of another table is embedded as a fk in it

• And the fk is part of the primary key of the dependent table

• Because the pk of the weak entity can’t be null, in its role as a fk, that field has to have a corresponding pk value in the other table

• In other words, an instance of a dependent entity simply can’t exist without the existence of a matching instance in the other table

• In the following ER diagram, cities can’t exist without their corresponding regions

Associative Entities

• Associative entities have already been explained

• This is an alternative name for the table in the middle

• The table in the middle may or may not have a concatenated key

• It may or may not have attributes of its own

• If the table in the middle does have attributes, in practice they are frequently date or time attributes

• This makes it possible to keep track of multiple pairings of the same base entities over time

• The following ER diagram gives Position as the table in the middle

Aggregate Entities

• The book doesn’t have a diagram for this concept

• It explains it verbally

• Customers and suppliers are two different entities which might both have addresses

• Address information could be broken out of both

• Once it is broken out, there is no reason to have two different kinds of address

• An address is an address, and both the Customer and Supplier tables could be in a relationship with an address (address line…) table

• The reason there is no diagram is that once model analysis is complete, the aggregate table simply becomes another base table

Subordinate Entities

• Subordinate entities are entities which are a more detailed kind of some other entity

• In other words, the main entity holds attributes common to all different kinds

• The subordinate entity holds attributes for a specific kind

• You know you have a subordinate entity when the pk of one table is the pk of the other

• The following ER diagram illustrates the idea with animals

• Notice that the relationships are one-to-one with a + sign
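A minimal sketch of the shared-pk idea follows. The attribute names (woolgrade, height) and sample rows are invented for illustration, since the diagram's actual attributes are not in this transcript. The key point is that the pk of each subordinate table is also a fk to the main table.

```python
import sqlite3

def build_animal_db():
    """Create an in-memory database with a hypothetical animal hierarchy."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
    -- Main entity: attributes common to all kinds of animal.
    CREATE TABLE animal (animalid INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Subordinate entities: the pk is the pk of animal, embedded as a fk.
    CREATE TABLE sheep (
        animalid  INTEGER PRIMARY KEY REFERENCES animal(animalid),
        woolgrade TEXT
    );
    CREATE TABLE horse (
        animalid INTEGER PRIMARY KEY REFERENCES animal(animalid),
        height   REAL
    );
    INSERT INTO animal VALUES (1, 'hypothetical sheep'), (2, 'hypothetical horse');
    INSERT INTO sheep VALUES (1, 'fine');
    INSERT INTO horse VALUES (2, 15.2);
    """)
    return conn

def sheep_details(conn):
    """Join the common attributes with the sheep-specific ones."""
    rows = conn.execute(
        """SELECT a.animalid, a.name, s.woolgrade
           FROM animal a JOIN sheep s ON s.animalid = a.animalid
           ORDER BY a.animalid""")
    return rows.fetchall()
```

Because the join is on the shared key, each sheep row matches exactly one animal row, which is the one-to-one relationship marked with a + sign.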

Generalization and Aggregation

• First, keep in mind that the use of the term aggregation here is different from its use in the phrase “aggregate entity”

• Also note that the author is now introducing object-oriented ideas

• This makes it possible to compare ER notation with UML notation

• The following UML diagram captures the relationship between animals, sheep, and horses that was illustrated in the previous ER diagram

• Animal is a generalization of the other two kinds of animals

• Together, the different kinds of animals form a hierarchy of the type “is-a” or “is-a-kind-of” which should be familiar from the object-oriented world

Aggregation

• Aggregation and composition are usually treated together in object-orientation

• Aggregation captures a “has-a” or containment relationship

• In UML it is symbolized by a diamond

• The diamond is less intuitive than the crow’s foot, but they are roughly equivalent

• This is illustrated on the next overhead

Aggregation and Composition; One-to-Many and Many-to-Many Relationships

• In UML, the term aggregation is usually described as a simple “has-a” relationship and is symbolized with a white diamond

• Composition is usually described by a phrase like, “the parts can’t exist without the whole” and is symbolized by a black diamond

• These concepts translate at least in part into relational database concepts and ER diagrams

• The translation between object-oriented and relational isn’t perfect though

• Object-oriented code can have references

• All relationships in the relational model are captured by the values of fields

• The diagrams on the next overhead show the relationship between classes and students

• They will be followed by commentary

• In the UML diagram a white diamond is used

• Students can exist without enrolling in any classes

• More importantly, in the UML diagram there is no Enrollment class

• A many-to-many relationship can be captured using references alone
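To illustrate what “references alone” can mean in object-oriented code, here is a small Python sketch. The class and attribute names are hypothetical, and the course is called Course rather than Class only because class is a reserved word in Python. Each object holds a list of references to the objects on the other side, and there is no enrollment object.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)  # compare by identity; the objects refer to each other
class Student:
    name: str
    classes: list = field(default_factory=list)   # references to Course objects

@dataclass(eq=False)
class Course:
    title: str
    students: list = field(default_factory=list)  # references to Student objects

def enroll(student, course):
    """Record the many-to-many link on both sides, using references only."""
    student.classes.append(course)
    course.students.append(student)
```

A Student with an empty classes list is still a valid object, which matches the white diamond. In the relational model there are no references, so the same pairing has to be stored as field values in a table in the middle.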

• In the ER diagram there is an Enrollment entity to capture the many-to-many relationship

• This is a classic table in the middle with a concatenated primary key, indicated by the + signs

• As such, enrollment records are dependent

• Enrollment records cannot exist without corresponding student and class records

• If there were an Enrollment class in the O-O model, it would be an example of composition, not aggregation

• For further explanation, see the next example

• Next the book illustrates the relationship between students and aptitude tests

• The key to the diagrams is the relationship label “taken”

• Aptitude tests themselves can exist without students

• However, the class/entity labeled Aptitude test actually means a specific aptitude test score

• The diagram is given on the next overhead

• More explanations follow it

• The point is that specific aptitude test scores can’t exist without the student who took the test and got that score

• In the UML diagram the relationship is shown with a black diamond

• The part can’t exist without the whole

• In the ER diagram there is a crow’s foot with a + sign

• This means that an aptitude test record can’t exist without a corresponding student record

Data Modeling Hints

• The book next addresses these subpoints:

• 1. The rise and fall of a data model

• 2. Identifier

• 3. Position and order

• 4. Attributes and consistency

• 5. Names and addresses

• 6. Single instance entities

• 7. Picking words

• 8. Synonyms

• 9. Homonyms

• 10. Exception hunting

• 11. Relationship labeling

• 12. Keeping the data model in shape

• 13. Used entities

The rise and fall of a data model

• The book points out that a model will both grow and shrink as it develops

• Discovering new entities will cause it to grow

• Trying to handle greater specificity will cause it to grow

• Generalization happens when you recognize useful commonality

• This will cause a model to shrink in a useful way

• Consider the diagrams on the following overheads

Growth, specificity

Generalization, shrinkage

Identifier

• The basic rule, except in cases where a simple concatenated key works:

• If there is no obvious identifier (pk), simply make up an arbitrary one

• Consecutive numbering by entry order would be a simple choice

• Notice that packages like Access have features like this

• There is an irony to such “helpful” features

• They are most likely to be used by people who don’t even know what a pk is, and they will end up making a confusing mess

• For more informed users, the feature isn’t really necessary, and they’re more likely to want full control over the values entered anyway
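For comparison, here is what consecutive numbering by entry order looks like in SQLite, used here only as a convenient example system; the table and column names are assumptions. A column declared INTEGER PRIMARY KEY is filled in automatically when no value is supplied, and an explicit value can still be given when full control is wanted.

```python
import sqlite3

def build_customer_db():
    """Create an in-memory table whose pk is an arbitrary, system-assigned number."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (custid INTEGER PRIMARY KEY, custname TEXT NOT NULL)")
    return conn

def add_customer(conn, name):
    """Insert a customer without supplying a key; return the key that was assigned."""
    cur = conn.execute("INSERT INTO customer (custname) VALUES (?)", (name,))
    return cur.lastrowid
```

On an empty table the first two calls to add_customer return 1 and 2.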

Position and Order

• The concepts of position and order apply to both the ER diagram of the model and the contents of tables

• The general rule for presenting a model is to be organized

• The most important base entities might appear in the center, at the top, or at the left: anywhere that they aren’t hidden as afterthoughts

• Also, arranging things so that lines don’t cross is important for understanding

• The important point is that all entities and relationships be correctly identified

• Similarly, there is no required order to the attributes in an entity

• However, common sense dictates being consistent, putting the pk first, listing more important attributes nearer the top

• In the author’s notation, fk’s don’t appear

• I still recommend that they be included, marked with fk so that they can’t be overlooked

• As usual, the rows in a table are not stored in sorted order

• When picking fields for a table you want to keep in mind any ordering that you might eventually want produced by a query

• There has to be a field for the ORDER BY if you want the data presented in that order
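As a small sketch of this point (the task table and its columns are assumptions made up for illustration), the rows below are entered in no particular order, and the due_date field is what makes a sorted presentation possible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task ("
    " task_id INTEGER PRIMARY KEY,"
    " task_name TEXT NOT NULL,"
    " due_date TEXT NOT NULL)"  # ISO-format dates sort correctly as text
)
conn.executemany(
    "INSERT INTO task (task_name, due_date) VALUES (?, ?)",
    [
        ("report", "2024-03-01"),
        ("budget", "2024-01-15"),
        ("audit", "2024-02-10"),
    ],
)

# The ordering comes from the query, not from how the rows are stored.
tasks_by_due_date = [
    row[0]
    for row in conn.execute("SELECT task_name FROM task ORDER BY due_date")
]
print(tasks_by_due_date)
```

If due_date had been left out of the design, no query could produce this order.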

• On the next overhead the monarch data model is shown again

• There is an implicit ordering to the data based on the values

• It is interesting to consider whether you could write a query that would show the monarchs in succession order

• It seems that this would have to be procedural, like the recursive query to find all products in a given product

• The solution to the problem would be a design where the monarchs were simply numbered in order
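The two approaches can be compared in a rough sketch. The table layout here is an assumption, not the book's monarch model: predecessor is a hypothetical column pointing at the previous monarch, and succession_no is the "simply number them" design. The first query follows the chain with a recursive common table expression, which SQLite and some other SQL dialects support; the second just sorts on the number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE monarch ("
    " mon_name TEXT PRIMARY KEY,"
    " predecessor TEXT REFERENCES monarch(mon_name),"  # hypothetical column
    " succession_no INTEGER NOT NULL UNIQUE)"          # hypothetical column
)
# Rows are entered out of order on purpose.
conn.executemany(
    "INSERT INTO monarch VALUES (?, ?, ?)",
    [
        ("George V", "Edward VII", 3),
        ("Victoria", None, 1),
        ("Elizabeth II", "George VI", 6),
        ("Edward VII", "Victoria", 2),
        ("George VI", "Edward VIII", 5),
        ("Edward VIII", "George V", 4),
    ],
)

# Approach 1: follow the predecessor chain recursively, counting steps.
chain_order = [
    row[0]
    for row in conn.execute(
        """
        WITH RECURSIVE succession(mon_name, depth) AS (
            SELECT mon_name, 1 FROM monarch WHERE predecessor IS NULL
            UNION ALL
            SELECT m.mon_name, s.depth + 1
            FROM monarch AS m
            JOIN succession AS s ON m.predecessor = s.mon_name
        )
        SELECT mon_name FROM succession ORDER BY depth
        """
    )
]

# Approach 2: the numbered design needs only an ORDER BY.
numbered_order = [
    row[0]
    for row in conn.execute(
        "SELECT mon_name FROM monarch ORDER BY succession_no"
    )
]
print(chain_order == numbered_order)
```

Both queries give the same order, but the numbered design is far easier to query, which is the point of the last bullet above.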

Attributes and consistency

• In simple terms, if you use the same field name in different tables, it should have the same meaning in the different tables

• Within a single table, the field should also have exactly the same meaning for every row

• The book’s example of what shouldn’t be done with attributes is outlined on the next overhead

• Let there be an attribute “stock info” that stores either the stock’s price-to-earnings ratio or its return on investment

• Which value is held in that field is determined by whether the value 1 or 2 appears in another field named “stock info code”

• This is very unfortunate

• One field in a table now depends on another

• The meaning of the dependent field varies from record to record
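One way out of the problem, sketched here with assumed table and column names, is to give each fact its own field. Every column then means the same thing in every row, and a null simply marks a value that has not been recorded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# pe_ratio and roi replace the overloaded "stock info" / "stock info code"
# pair. The names are illustrative assumptions, not the book's.
conn.execute(
    "CREATE TABLE stock ("
    " stock_code TEXT PRIMARY KEY,"
    " pe_ratio REAL,"
    " roi REAL)"
)
conn.executemany(
    "INSERT INTO stock VALUES (?, ?, ?)",
    [("AAA", 12.5, None), ("BBB", None, 0.08), ("CCC", 20.0, 0.05)],
)

# Aggregates now work directly; AVG skips the null entries.
avg_pe = conn.execute("SELECT AVG(pe_ratio) FROM stock").fetchone()[0]
print(avg_pe)
```

With the coded design, the same average would need a WHERE clause on the code field, and forgetting it would silently mix two different measures.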

Names and addresses

• Names and addresses frequently occur in databases

• Although not incredibly hard, their treatment is usually a little more complex than what you might think at first glance

• There are several basic rules that apply

• Have you subdivided into sufficiently small fields?

• Can you handle multiple occurrences?

• Can you construct something that consists of multiple parts?

• Although SQL has string operators that allow you to form queries based on subparts of fields, it is not wise to depend on this

• For example, in the long run it is easier and more logical to have first name, middle name, and last name fields in place of one monolithic name field
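A short sketch of the difference, with a hypothetical person table: separate fields allow an exact test on the last name and can still be glued back together into a full name, while a substring search over the whole name picks up unwanted matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person ("
    " person_id INTEGER PRIMARY KEY,"
    " first_name TEXT NOT NULL,"
    " middle_name TEXT,"
    " last_name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO person (first_name, middle_name, last_name) VALUES (?, ?, ?)",
    [("Mary", "Ann", "Smith"), ("Smith", None, "Jones"), ("John", None, "Smithson")],
)

# Exact match on one small field.
smiths = [
    row[0]
    for row in conn.execute(
        "SELECT first_name FROM person WHERE last_name = 'Smith'"
    )
]

# The parts can be assembled into a full name when one is needed.
FULL_NAME = "first_name || ' ' || COALESCE(middle_name || ' ', '') || last_name"
full_names = [
    row[0]
    for row in conn.execute(f"SELECT {FULL_NAME} FROM person ORDER BY person_id")
]

# A substring search over the assembled name matches all three people.
like_matches = conn.execute(
    f"SELECT COUNT(*) FROM person WHERE {FULL_NAME} LIKE '%Smith%'"
).fetchone()[0]
print(smiths, full_names, like_matches)
```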

• The book also mentions the question of including titles with names (Mr., Mrs., etc.)

• There is also the question of suffixes (Jr., III, and so on)

• There are those people who have multiple given names (George Herbert Walker Bush)

• There are also those people whose married names differ from the names they had when they were single

• Or there are people who have changed their names legally or simply use aliases

• The point is that a complete design will handle all of these cases

• The question of addresses is not really more difficult, but it is somewhat less familiar than names

• The fundamental problem is that addresses can take many different forms

• Depending on the organization, an address may be many lines long

• Depending on the country, an address might come in an uncustomary order

• The handling of addresses was mentioned earlier

• It has something in common with line items and pay slips in its most complete treatment

• The book gives the model on the next overhead as a reminder

• Finally, people might have more than one address

• A home address vs. a school address

• A mailing address vs. a residential address

• This doesn’t add a great deal of complexity; it’s just a one-to-many relationship

• The book gives the ER diagram on the following overhead to illustrate
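The one-to-many idea can be sketched as two tables, with the foreign key on the address side. The table names, column names, and data are assumptions for illustration only and are not taken from the book's diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces fks only when asked
conn.execute(
    "CREATE TABLE person ("
    " person_id INTEGER PRIMARY KEY,"
    " person_name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE address ("
    " address_id INTEGER PRIMARY KEY,"
    " person_id INTEGER NOT NULL REFERENCES person(person_id),"  # fk to the one side
    " address_kind TEXT NOT NULL,"  # e.g. home, school, mailing
    " street TEXT NOT NULL,"
    " city TEXT NOT NULL)"
)
conn.execute("INSERT INTO person VALUES (1, 'Pat Lee')")
conn.executemany(
    "INSERT INTO address (person_id, address_kind, street, city)"
    " VALUES (?, ?, ?, ?)",
    [(1, "home", "12 Oak St", "Springfield"), (1, "school", "1 College Rd", "Shelbyville")],
)

# One person, many addresses: a plain join recovers them.
address_kinds = [
    row[0]
    for row in conn.execute(
        "SELECT a.address_kind FROM person AS p"
        " JOIN address AS a ON a.person_id = p.person_id"
        " WHERE p.person_name = 'Pat Lee' ORDER BY a.address_kind"
    )
]
print(address_kinds)
```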

Single Instance Entities

• The moral of this story is that single instance entities—one row tables—are not a crime

• In the example shown below there would be one firm listed

• Quite simply, this allows firm information to be stored

• It also makes it easy to hold information about more than one firm if there happens to be a merger
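A one-row table is nothing special in SQL. In this small sketch (the firm table, its columns, and the data are made up for illustration), the table starts with a single firm, and a merger is handled by adding a row rather than changing the design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE firm ("
    " firm_id INTEGER PRIMARY KEY,"
    " firm_name TEXT NOT NULL,"
    " firm_city TEXT)"
)

# Normally there is exactly one row: the firm itself.
conn.execute("INSERT INTO firm (firm_name, firm_city) VALUES ('Acme Ltd', 'Springfield')")
firm_count_before = conn.execute("SELECT COUNT(*) FROM firm").fetchone()[0]

# After a merger, a second firm is just another row.
conn.execute("INSERT INTO firm (firm_name, firm_city) VALUES ('Bolt Inc', 'Shelbyville')")
firm_count_after = conn.execute("SELECT COUNT(*) FROM firm").fetchone()[0]
print(firm_count_before, firm_count_after)
```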

Picking words

• The moral of the story here is to base the model on the vocabulary of the users

• It is important to root out inconsistencies in the users’ vocabulary if there are any

• However, cramming a different vocabulary down their throat won’t work

Synonyms

• Synonyms in data modeling are just like regular synonyms

• Different users or different groups of users use different words for the same thing

• Synonyms are not a technical problem

• You may get users to agree on one word

• You may also provide different views with different vocabularies
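The views option might look like this sketch, where one group says “customer” and another says “client”. The names are hypothetical. The base table uses one vocabulary and a view presents the same data under the other.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " customer_id INTEGER PRIMARY KEY,"
    " customer_name TEXT NOT NULL)"
)

# Same data, different words: the view renames the columns.
conn.execute(
    "CREATE VIEW client AS"
    " SELECT customer_id AS client_id, customer_name AS client_name"
    " FROM customer"
)
conn.execute("INSERT INTO customer VALUES (1, 'Acme Ltd')")

cur = conn.execute("SELECT * FROM client")
client_columns = [col[0] for col in cur.description]
client_rows = cur.fetchall()
print(client_columns, client_rows)
```

The data is stored once, so the two vocabularies cannot drift apart.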

Homonyms

• Homonyms in data modeling are just like regular homonyms

• Different users or different groups of users use the same words for different things

• Homonyms are not a technical problem, but they are a big practical problem

• They cause ambiguity and confusion and have to be tracked down

• Once identified, they are easy to fix

• Qualify or expand the names of things so that they are distinguished from each other

Exception hunting

• When working with clients (including yourself) ask these questions:

• Is it always like this?

• Would there be any situations where this could be an m:m relationship?

• Have there ever been any exceptions?

• Are things likely to change in the future?

• A good data model should be able to handle exceptional cases

Relationship Labeling

• The book recommends avoiding relationship labels because they tend to clutter up an ER diagram

• It is true that most 1-m relationships should be clear

• However, it’s customary to label 1-1 relationships because they aren’t inherently clear

• If any relationship is unclear, it should be labeled

Keeping the data model in shape

• An illustrative example of this idea is very simple

• As you work on a model, you might add an entity

• If you do so, do not forget to work out its identifier, attribute(s), and relationship(s) before moving on to something else

• Making incomplete additions will quickly turn a model into a mess

Used entities

• Developing models is just like writing code

• If you have an earlier model or someone else’s model (that you trust) as a starting point, work from there

• There is no need to start from scratch every time

Meaningful identifiers

• This is the next major subsection

• It mentions some things to avoid and some things worth trying when picking identifiers, that is, when setting up primary key fields

• The phrase “meaningful identifier” means that you can read the key value and find out something useful about the record

• For example, id numbers for blue items always start with the characters “BLU”

• The complete opposite would be randomly generated identifiers

• In simple cases, meaningful identifiers might seem like an attractive option

• They would be memorable for users

• They might be simple to administer

• However, they have disadvantages

• If the reality you’re modeling becomes complex, the identifiers are no longer easy to remember or administer

• If they are based on ranges of values, you may exhaust the available ranges

• If the underlying reality changes, previously meaningful identifiers lose their meaning

• Some large organizations have embedded codes into identifiers

• VINs (vehicle identification numbers) contain certain identifiable parts

• UPC codes also contain identifiable parts

• If organizations choose to do this, it’s their business

• However, no independent organization has to go down this path

• The general rule is that the disadvantages of meaningful identifiers outweigh the advantages

• Everything that could be coded into an identifier could be, and probably is, recorded in an attribute field

• This means that you’ve returned to a situation where you have redundancy and one field is dependent on another

• The possible result is inconsistency between the information coded in the identifier and the data recorded in the fields

• Whether random, entry order, or some other scheme, non-meaningful identifiers are preferable
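The inconsistency is easy to reproduce in a sketch. The item table, its id format, and the colour column are assumptions, following the “BLU” idea mentioned earlier. Once the attribute changes, the code embedded in the key is wrong, and the key cannot be changed without touching every reference to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item ("
    " item_id TEXT PRIMARY KEY,"  # meaningful id: starts with a colour code
    " colour TEXT NOT NULL)"
)
conn.execute("INSERT INTO item VALUES ('BLU001', 'blue')")

# The underlying reality changes: the item is now made in red.
conn.execute("UPDATE item SET colour = 'red' WHERE item_id = 'BLU001'")

# The identifier and the attribute now disagree.
mismatched = conn.execute(
    "SELECT item_id, colour FROM item"
    " WHERE substr(item_id, 1, 3) = 'BLU' AND colour <> 'blue'"
).fetchall()
print(mismatched)
```

With an arbitrary identifier there is nothing in the key to contradict the attribute.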

The seven habits of highly effective data modelers

• This sounds like a bunch of management bullshit, but if you have to do modeling for clients, these are worthwhile hints:

• 1. Immerse

• As a computer person, when modeling for someone else, you have to learn their problem domain and terminology before you can make a good model.

• 2. Challenge

• This means challenge the assumptions and find the exceptions

• 3. Generalize

• This means, when possible, to merge entities together so the model doesn’t proliferate

• 4. Test

• Have structured walk-throughs at a detailed level

• Check entities, identifiers, attributes, and especially, relationships

• 5. Limit

• This means let the project drive the modeling

• Don’t do modeling for the sake of modeling.

• The model is supposed to lead to practical results, not theoretically perfect results.

• Model elements of no use to the users are simply of no use at all

• If necessary, use the 80-20 rule in order to control the process and limit the amount of time spent.

• 6. Integrate

• This means that modeling doesn’t happen in a vacuum.

• If an organization has existing systems, fit the new model into the existing one.

• 7. Complete

• Whatever limits you’ve set for yourself, complete the model within those limits.

• Few things are more wasteful and worthless than a model that hasn’t been finished.

Reference: Basic Structures

• The next set of overheads will be given without textual commentary

• This is a review of the concepts that have been raised in chapters 3 through 7 in this set of overheads and the previous one

• Some of the items are repetitions of examples

• Others are new examples illustrating an idea

• This review of basic structures will be followed by a brief review of the book’s exercises for this chapter

No relationships

A recursive 1:1 relationship

A recursive 1:m relationship

A recursive m:m relationship

A 1:1 relationship

A 1:m relationship

An m:m relationship

A weak or dependent entity

An associative entity

A tree structure

Another approach to a tree structure

Exercises

• The basic structures chapter ends with exercises where the directions are to write the SQL CREATE statements for the designs shown

• The designs came up in the chapters

• They are repeated here just for reference

• You should recognize the structures and know what they represent

• The practical SQL part of the course is over, but in theory you should know how to write the SQL CREATE statements, including all needed constraints
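As a reminder of what such statements look like, here is a sketch for a 1:m relationship and an m:m relationship resolved through an associative table. The table and column names are illustrative assumptions, not the exercise designs, and the statements are run through Python's sqlite3 module only so that they can be checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces fks only when asked
conn.executescript(
    """
    -- 1:m: one nation has many stocks; the fk goes on the "many" side
    CREATE TABLE nation (
        nat_code TEXT PRIMARY KEY,
        nat_name TEXT NOT NULL
    );
    CREATE TABLE stock (
        stk_code TEXT PRIMARY KEY,
        stk_firm TEXT NOT NULL,
        nat_code TEXT NOT NULL REFERENCES nation(nat_code)
    );

    -- m:m: sale and item are linked through an associative table
    CREATE TABLE sale (
        sale_no INTEGER PRIMARY KEY,
        sale_date TEXT NOT NULL
    );
    CREATE TABLE item (
        item_no INTEGER PRIMARY KEY,
        item_name TEXT NOT NULL
    );
    CREATE TABLE lineitem (
        sale_no INTEGER NOT NULL REFERENCES sale(sale_no),
        item_no INTEGER NOT NULL REFERENCES item(item_no),
        quantity INTEGER NOT NULL CHECK (quantity > 0),
        PRIMARY KEY (sale_no, item_no)
    );
    """
)

table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# The fk constraint rejects a stock that points at a nation that isn't there.
try:
    conn.execute("INSERT INTO stock VALUES ('XX', 'Ghost Co', 'ZZ')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(table_names, orphan_rejected)
```

The pk of the associative table is the concatenation of the two fks, which is the simple concatenated key case mentioned under identifiers.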

The End

• Ignore the remainder of the overheads

• Material has been taken from another book and included here

• However, it will not be covered

• It is simply kept here for future reference

Developing Software with UML: Object-Oriented Analysis and Design in Practice

• Bernd Oestereich

• Chapter 2, Object-Orientation for Beginners

• Section 2.13, Persistence

• Synopsis

– Persistence is the storing of objects on a non-volatile medium

– There is no one-to-one mapping to relational databases

The End