an introduction to database concepts and vocabulary csc 240 (blum) 1

49
An introduction to database concepts and vocabulary CSC 240 (Blum) 1

Upload: emory-craig

Post on 05-Jan-2016

222 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

An introduction to database concepts and vocabulary

CSC 240 (Blum)

1

Page 2: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

Mauchly and Eckert designed the ENIAC during World War II to perform calculations (shell trajectories).

After the ENIAC was built, it was used to do thermonuclear chain reaction calculations.

But when Mauchly and Eckert went into business, their first customer was the census bureau.

And ever since computers have played an important role in filing and record keeping.

Suffice it to say Databases are very important. Databases are all around us.

CSC 240 (Blum)

2

Page 3: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

CSC 240 (Blum)

3

Page 4: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

Desired Features of our database Storage:

We want to store data efficiently, have it centralized (or at least seemingly centralized).

Centralization (integration) are subject to bottlenecks and single-point-of-failure issues.

Retrieval: We want to have the data at our fingertips when

we want it. Querying:

We want to ask various questions about the data (and get answers in a timely manner).

(These desires are to some extent in conflict.)

CSC 240 (Blum)

4

Page 5: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

We would like to have the computer perform the tedious aspects of such tasks.

An outdated approach would be to use a file-based system, that is, to have the data stored in various (flat, simple text) files and write a program that reads the files, parses the information, does the required searching, sorting, correlating, etc.

CSC 240 (Blum)

5

Page 6: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

Even in the file-based approach, one must identify units of information that will be contained in a single file. These are known as the entities.

An entity is somewhat similar to an object in programming, it collects data that belongs together in some immediate way.

Entities also separate the data into distinct units. Database entities often reflect real

objects/entities (persons, buildings, courses, etc.)

CSC 240 (Blum)

6

Page 7: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

The lower-level pieces of data gathered together to form an entity are known as fields or attributes or properties. The Person entity might consist of fields like

FirstName, LastName, JobType, SocSecNum, etc. Fields are analogous to properties of an object.

Fields have a type (Text, Number, Yes/No, Memo, Date/Time, etc.) which indicate how the information is to be stored and interpreted.

CSC 240 (Blum)

7

Page 8: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

The various entities may be distinct, but they are not completely disconnected. E.g. a Customer places an order

An association between two entities is known as a relationship. The Customer-places-Order relationship

was realized in Access by using the Lookup Wizard to ensure that the two tables had a common field (CustomerID).

CSC 240 (Blum)

8

Page 9: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

One can visualize the entities and their relationship using an Entity-Relationship (ER) diagram. The entities are represented by rectangles. The relationships are represented by

arrows between the rectangles. The arrow may include a verb to capture the

nature of the relationship (as well as other notations).

CSC 240 (Blum)

9

Page 10: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

CSC 240 (Blum)

10

CustomerCustomerID

CustomerFirstName

OrderOrderID

ShippingCost

ItemItemID

ItemDescription

Places

Is part of

Page 11: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

In a file-based approach, there would be a file corresponding to each entity. (There may be more files than entities since

some relationships are realized through their own tables/files.)

These files must be located, read, parsed. The data is then used to initialize some variables and/or objects which are then analyzed (searched, sorted, etc.) by the remainder of the program.

CSC 240 (Blum)

11

Page 12: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

The programmer must have information about the data files. For example: where they are to be found the order in which the fields occur the length of the fields and/or the delimiter

used Changing the length of a field or adding a

field may require that all of the corresponding programs be rewritten.

Such features of the file-based approach are called program-data dependence.

CSC 240 (Blum)

12

Page 13: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

Since Reading is reading Parsing is parsing Searching is searching Sorting is sorting

Why have programmers continually repeating these tedious tasks?

Automate and/or generalize the process. These are some of the aspects of a

database management system (DBMS).

CSC 240 (Blum)

13

Page 14: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

The generalized routines for reading, parsing, searching, sorting etc. are in the DBMS.

But information specific to a particular case (number of fields, their type, size and so on) is still required. This data is placed together with the “actual” data in the database.

CSC 240 (Blum)

14

Page 15: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

This data about the data is known as meta-data. Meta: a prefix meaning: after, along with

or beyond The meta-data describes the actual

data, and so databases are sometimes called self-describing.

Related terms include: data dictionary, system catalog and schema.

CSC 240 (Blum)

15

Page 16: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

CSC 240 (Blum)

16

Page 17: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

The inclusion of the meta-data (the self-describing aspect of a database) allows a separation of the data from the processing, providing program-data independence.

Another way to say this is that there is a separation between the database (specific) and the database management system (generic).

CSC 240 (Blum)

17

Page 18: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

CSC 240 (Blum)

18

Database

Raw-data and meta-data

DBMS

User Application

Users and applications interact with a database only through the DBMS.

Page 19: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

Control of data redundancy

Data consistency More info from same

data Sharing of data Improved data integrity Improved security Enforcement of

standards

Economy of scale Balancing of conflicting

requirements Improved accessibility

and responsiveness Improved maintenance

through data independence

Increased concurrency Improved backup and

recovery services

CSC 240 (Blum)

19

Page 20: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

Control of data redundancy and consistency If the same data is entered more than once, it is said

to be redundant. An obvious point is that this wastes space. If the data is updated, it must be updated in several

places or the data will be inconsistent. Relationships are realized through repeated data,

but one tries to use something like an ID# (a name might change but an ID# does not have to).

(Redundancy reduction and query simplicity can be at odds, sometimes one sacrifices redundancy in order to make querying easier, e.g. in data mining. )

CSC 240 (Blum)

20

Page 21: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

More information, sharing of data and standardization Because databases facilitate querying,

they can yield more information. A database approach often centralizes

(integrates) the records of different departments, making more (raw) data and information available to the users

Integration often leads to standardization, consistent naming schemes, consistent report formats, etc.

CSC 240 (Blum)

21

Page 22: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

Improved data integrity An old computing axiom says garbage in,

garbage out (GIGO). If the raw data is bad, so too is the resulting information.

In the database approach, one can apply constraints to help ensure that the data is reliable.

Access’s lookup table for the foreign keys is an example. A foreign key is supposed to match an entry from another table, the lookup table helps ensure that.

We also saw that we could “Enforce Referential Integrity.”

We also mentioned masks, which are another integrity check.

CSC 240 (Blum)

22

Page 23: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

Improved security Part of the meta-data can be used to

authenticate users who are allowed to access the data.

Different users may have different access Data is often not entered directly into a

Table using the DataSheet but by using Views and/or Forms, which can hide sensitive data from certain users.

CSC 240 (Blum)

23

Page 24: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

Economy of scale A benefit of an organization centralizing

(integrating) its record-keeping efforts is the money applied to individual departments is pooled.

Not only is duplication of effort reduced or eliminated, but so too is duplication of hardware and software.

Balancing of conflicting requirements Integration can lead to a resolution or at least

a balancing of different departments, which may have conflicting goals.

CSC 240 (Blum)

24

Page 25: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

Improved data accessibility and responsiveness and increased productivity Because nitty-gritty details (reading,

parsing, sorting, searching, etc.) are built into the DBMS, the database staff work at a higher level closer to the users, responding to their particular needs.

Again with fewer details to attend to, more work can be accomplished.

CSC 240 (Blum)

25

Page 26: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

Improved maintenance through data independence Change of a field’s type or size or

introduction of a new field changes only the database and not the DBMS.

This layering yields independence which simplifies maintenance. Changing the database does not require changing the DBMS, which was not the case in the file-based approach.

CSC 240 (Blum)

26

Page 27: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

Increased concurrency The DBMS can handle multi-users using

and even updating the database. There are built-in mechanisms to prevent

two users from changing the data in conflicting ways.

Improved backup and recovery services Backing up and recovering the database

may be handled by the DBMS (that is, they are integrated services) rather than externally.

CSC 240 (Blum)

27

Page 28: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

Complexity Size Cost of DBMS Additional hardware costs Cost of conversion Performance Higher impact of failure

CSC 240 (Blum)

28

Page 29: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

Complexity and Size Because so many features have been

integrated into the DBMS’s, they have become complicated software packages. One must understand these features to utilize them properly.

Integrating information from various departments makes the database more complicated. Good design is crucial.

Integration of features into the DBMS and data into the database means that both may become quite large.

CSC 240 (Blum)

29

Page 30: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

Cost OF DBMSs and the hardware Again the size and complexity of the

software means that such packages are expensive.

The larger, more complex software requires more powerful hardware to run on.

It also requires a knowledgeable, well-trained (hopefully high paid) staff.

CSC 240 (Blum)

30

Page 31: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

Conversion cost Legacy system

Performance More complexity may slow down some

tasks. Higher impact of failure.

Integrating (centralizing) the information can mean that everything is lost at once.

CSC 240 (Blum)

31

Page 32: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

In addition to the data, there’s Hardware that stores and

manipulates the data Software to

Interface with the hardware (actually the operating system which

interfaces with the BIOS which interfaces with the hardware)

Provide the data with structure Interface with the user and/or

applications People

CSC 240 (Blum)

32

Page 33: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

Could be: A single PC A mainframe and terminals A network of computers

CSC 240 (Blum)

33

Page 34: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

CSC 240 (Blum)

34

Database

Database Server

Network ServerNetwork ServerNetwork Server

A scenario

Client Client Client Client Client Client

Page 35: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

The client-server model is a way for transactions to take place. The transaction is viewed as a service. The client requests the service. The server provides the service.

For example, to query a networked database A client would request the network server(s) to

connect it to the database server The database server queries the database The result is passed from database server to network

server to client. The client-server terminology can be applied to both

software and hardware.

CSC 240 (Blum)

35

Page 36: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

In large-scale client-server interaction, there may be many intermediate client-server interactions (e.g. the network servers become clients of the database server).

The software and hardware near the beginning of the transaction (initial request) is called front-end while that near the ultimate providing of the service is known as back-end.

In the analogy of getting a meal at a restaurant, the waiter is front-end and the cook is back-end.

CSC 240 (Blum)

36

Page 37: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

The bulk of the software is contained in the database management system (DBMS). It handles everything from storage and structure to security and integrity.

There may also be application software that interfaces with the DBMS.

The DBMS allows one to interface with the database on a higher level.

CSC 240 (Blum)

37

Page 38: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

In a file-based approach, one’s program is a step-by-step procedure explicitly determining how a question will be answered Read this file, parse it this way, create these

objects, search them this way This approach is sometimes called

prescriptive Prescription originally meant a set of instructions

for preparing and/or taking a drug, only later did the word become synonymous with the drug itself.

CSC 240 (Blum)

38

Page 39: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

In the database approach, most of the nitty-gritty, step-by-step instructions are hidden in the DBMS and the user need only describe the data (the meta-data, the self-describing database) and describe what he or she wants from the data.

This approach is sometimes called descriptive.

CSC 240 (Blum)

39

Page 40: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

People talk about generations of programming languages or the level of a language. A first generation language (1GL) is machine code, that

is, a binary representation of instructions (e.g. 11001101) A second generation language (2GL) is assembly

language, that is, mnemonics for machine code (e.g. STA 13)

A third generation language (3GL) is a high-level language, which includes most compiled languages, such as Fortran, C, BASIC, Java, etc. (e.g. int a = 13)

A fourth generation language (4GL) is used to develop database applications. They are designed to be closer to natural language.

CSC 240 (Blum)

40

Page 41: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

SQL (Structured Query Language), pronounced S-Q-L or See-Quel, has become the standard language for relational databases.

SQL is part third generation and part fourth generation.

CSC 240 (Blum)

41

Page 42: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

The original version was called SEQUEL and was developed at IBM in the mid-70’s.

However, Oracle Corporation was the first company to use SQL in a commercial product in 1979.

CSC 240 (Blum)

42

Page 43: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

SQL has 3 components: Data Definition Language (DDL)

The part that allows you to establish the structure of the database

Data Manipulation Language (DML) The part that allows you to enter data,

update data and ask questions of the data (queries)

Data Control Language (DCL) The part that allows you to add security

features (e.g. user authentication), concurrency (multi-user) features, recovery features, etc.

CSC 240 (Blum)

43

Page 44: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

Data Administrator (DA): Oversees data resources. More of a hands-off role. Deals with other managers. Sets policies. Handles budgets. Plans for future.

CSC 240 (Blum)

44

Page 45: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

Database administrator (DBA): More hands-on and more technical than

the Data Administrator (DA) Oversees hardware and software design,

implementation and maintenance Responsible for security and integrity Ensures users have appropriate

accessibility Etc.

CSC 240 (Blum)

45

Page 46: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

Database Designer: Logical database designer:

Identifies entities, fields, relationships Applies high-level constraints including the business

rules E.g. A Simpsons database might have a business rule that

there must be between 10 and 30 episodes in a complete season

Physical database designer: Actually creates tables Implements constraints Introduces security measures Etc.

CSC 240 (Blum)

46

Page 47: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

Application Developer: After the overall structure of the database

is laid out and implemented, the application developer considers the more individual needs, such as what software does the payroll department need

Application may involve third-generation or fourth generation languages or a combination

E.g. a Visual Basic program could use an SQL statement to query a database

CSC 240 (Blum)

47

Page 48: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

End-Users: Naïve:

Has little to no database knowledge Uses applications that simplify interaction with

the database Cashier scanning an item’s barcode

Sophisticated: Knows something to a lot about databases May use SQL to update or query database

CSC 240 (Blum)

48

Page 49: An introduction to database concepts and vocabulary CSC 240 (Blum) 1

Database Systems Rob and Coronel Database Systems, Connolly and Begg SQL for Dummies, Taylor http://www.metacard.com/wp1a.html http://www.oracle.com/glossary/index.html?

axx.html Concepts of Database Management,

Pratt and Adamski

CSC 240 (Blum)

49