Database Management Systems
MIS
Learning Objectives
What are the problems of managing data resources in a traditional file environment?
What are the major capabilities of database management systems (DBMS) and why is a relational DBMS so powerful?
What are the principal tools and technologies for accessing information from databases to improve business performance and decision making?
Why are information policy, data administration, and data quality assurance essential for managing the firm’s data resources?
Organizing Data in a Traditional File Environment
• File organization terms and concepts
• Computer system organizes data in a hierarchy
  • Bit: Smallest unit of data; binary digit (0, 1)
  • Byte: Group of bits that represents a single character
  • Field: Group of characters as word(s) or number
  • Record: Group of related fields
  • File: Group of records of same type
  • Database: Group of related files
• Entity: Person, place, or thing on which we store information
• Attribute: Each characteristic, or quality, describing an entity
  • E.g., attributes Date or Grade belong to entity COURSE
The Data Hierarchy
A computer system organizes data in a hierarchy that starts with the bit, which represents either a 0 or a 1. Bits can be grouped to form a byte to represent one character, number, or symbol. Bytes can be grouped to form a field, and related fields can be grouped to form a record. Related records can be collected to form a file, and related files can be organized into a database.
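The hierarchy described above can be sketched in a few lines of Python. This is only an illustration: the COURSE records, their field names, and the empty STUDENT file are made-up sample data, and Python dictionaries and lists stand in for records and files.

```python
# One character is stored as one byte, and a byte is a group of eight bits.
char = "A"
byte_value = char.encode("ascii")
bits = format(byte_value[0], "08b")

# A field is a group of characters (bytes), for example a grade or a date.
field = "2024-01-15".encode("ascii")

# A record is a group of related fields (hypothetical COURSE record).
record = {"student_id": "1001", "course": "MIS", "date": "2024-01-15", "grade": "A"}

# A file is a group of records of the same type.
course_file = [
    record,
    {"student_id": "1002", "course": "MIS", "date": "2024-01-15", "grade": "B"},
]

# A database is a group of related files.
database = {"COURSE": course_file, "STUDENT": []}
```

Here `bits` holds the eight binary digits that make up the single byte for the character "A".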
Traditional File Processing
The use of a traditional approach to file processing encourages each functional area in a corporation to develop specialized applications and files. Each application requires a unique data file that is likely to be a subset of the master file. These subsets of the master file lead to data redundancy and inconsistency, processing inflexibility, and wasted storage resources.
Database
• A database is a collection of information that is organized so that it can easily be accessed, managed, and updated. In one view, databases can be classified according to types of content: bibliographic, full-text, numeric, and images.
• A database is a logically coherent collection of data with some inherent meaning, representing some aspect of the real world, which is designed, built and populated with data for a specific purpose. A database is not necessarily computerized. It can be generated and maintained manually, or it may be computerized.
Databases are used in every part of day-to-day life. Examples of common database use include depositing or withdrawing money from a bank, making a travel reservation, accessing a library catalog, and buying something over the internet. These are examples of traditional database applications, where data is stored in either textual or numeric format. Less traditional database applications that are becoming more popular include multimedia databases, which store pictures, video clips, and sounds, and Geographic Information Systems (GIS), which store maps, satellite images and weather data.
Database Management System
A database management system (DBMS) is a collection of interrelated data and a set of programs to access those data. The collection of data, usually referred to as a database, contains information relevant to an enterprise.
A database management system is a collection of programs that enable users to create and maintain a database. ----- Elmasri & Navathe
A database management system, or DBMS, is software designed to assist in maintaining and utilizing large collections of data. ----- Ramakrishnan & Gehrke
Database Systems Vs File Systems ( Why DBMS?)
An ordinary file system has a number of major drawbacks:
1. Data redundancy and inconsistency
- Multiple file formats, duplication of information in different files.
2. Difficulty in accessing data
- Need to write a new program to carry out each new task.
3. Data isolation
- Multiple files and formats.
4. Integrity problems
- Integrity constraints (e.g. account balance > 0) become part of program code.
- Hard to add new constraints or change existing ones.
5. Atomicity problems
- Failures may leave database in an inconsistent state with partial updates carried out. E.g., transfer of funds from one account to another should either complete or not happen at all.
6. Concurrent-access anomalies
- Concurrent access is needed for system performance and usability.
- Uncontrolled concurrent accesses can lead to inconsistencies. E.g. two people reading a balance and updating it at the same time.
7. Security problems
- Not every user of the database system should be able to access all the data.
Database systems offer solutions to all these problems.
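As a rough illustration of how a DBMS addresses the integrity and atomicity problems listed above, the sketch below uses Python's built-in sqlite3 module. The table name `account`, its columns, the sample balances, and the helper `transfer` are assumptions made for this example. The constraint `balance > 0` is declared once in the schema instead of in program code, and a failed transfer is rolled back as a whole.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The integrity constraint lives in the schema, not in application code.
conn.execute(
    "CREATE TABLE account ("
    " account_id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance > 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected the debit, so the credit was undone too.
        return False


ok = transfer(conn, 1, 2, 30)       # succeeds
failed = transfer(conn, 1, 2, 500)  # would overdraw account 1, so it is rolled back
balances = dict(conn.execute("SELECT account_id, balance FROM account"))
```

In the second call the credit to account 2 is applied first and then undone, so no partial update survives the failure.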
Some Commercial Database Management Software
For Personal Computers
1. Microsoft Access
2. FoxPro
3. dBase
Some Commercial Database Management Software
1. Oracle – Oracle 8i, Oracle9i, Oracle 10g, 11g
2. Microsoft SQL Server
3. IBM DB2/DB2 UDB
4. Informix
5. Sybase
6. Ingres
Some Open Source Database Management Software
1. CUBRID
2. Firebird
3. MariaDB
4. MongoDB
5. PostgreSQL
6. MySQL
7. SQLite
Database System Applications
Databases are widely used. Some representative applications are:
1. Banking: For customer information, accounts, loans and banking transactions.
2. Airlines/Railways/Road Transport: For ticket reservation, schedules and routes.
3. Universities: For student information, courses and grades (education management).
4. Credit card transactions: For purchases on credit card, monthly statement generation.
5. Telecommunication: For keeping records of calls made, generating monthly bills, maintaining balances on prepaid calling cards, storing information about the communication networks.
6. Finance: For storing information about holdings, sales, and purchases of financial instruments such as stocks and bonds.
7. Sales: For customer, product and purchase information.
8. Manufacturing: For management of supply chains and for tracking production of items in factories, inventories of items in warehouses/stores and orders for items.
9. Human resources: For information about employees, salaries, payroll taxes and benefits and for generation of paychecks.
Data Models
A data model is a collection of conceptual tools for describing data, data relationships, data semantics and consistency constraints.
A. Base Models: Describe the design of the database at the logical level.
A. 1. Entity-Relationship Model: This is a higher-level data model. It is based on a perception of the real world that consists of a collection of basic objects, called entities, and the relationships among these objects.
Entity: An entity is a “thing” or “object” in the real world that is distinguishable from all other objects. An entity has a set of properties, called attributes, and the values for some set of attributes may uniquely identify an entity. An entity may be concrete, such as a person or a book, or it may be abstract, such as a loan, a holiday, or a concept.
E.g., entities and their attributes:
customer: customer-id, customer-name, customer-street, customer-city
loan: loan-number, amount
Relationship: A relationship is an association among several entities. A depositor relationship associates a customer with each account that he or she has.
The set of all entities of the same type and the set of all relationships of the same type are termed an entity set and relationship set, respectively.
A. 2. Relational Model: This is a lower-level model. It uses a collection of tables to represent both data and relationships among those data.
Each table has multiple columns, and each column has a unique name.
The relational model is an example of a record-based model. This is because the database is structured in fixed-format records of several types. Each table contains records of a particular type. Each record type defines a fixed number of fields, or attributes. The columns of the table correspond to the attributes of the record type.
The relational model is the most widely used data model, and a vast majority of current database systems are based on it. The relational model is at a lower level of abstraction than the E-R model. Database designs are often carried out in the E-R model and then translated to the relational model.
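One possible translation of the customer and depositor example from the E-R model into relational tables is sketched below with Python's sqlite3 module. The `account` table and its columns `account_number` and `balance`, and all sample values, are assumptions for illustration. Each entity set becomes a table, and the depositor relationship set becomes a table of its own that holds the keys of the entities it connects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (               -- entity set
    customer_id     TEXT PRIMARY KEY,
    customer_name   TEXT,
    customer_street TEXT,
    customer_city   TEXT
);
CREATE TABLE account (                -- entity set; attributes are assumed
    account_number TEXT PRIMARY KEY,
    balance        INTEGER
);
CREATE TABLE depositor (              -- relationship set becomes its own table
    customer_id    TEXT REFERENCES customer(customer_id),
    account_number TEXT REFERENCES account(account_number),
    PRIMARY KEY (customer_id, account_number)
);
""")
conn.execute("INSERT INTO customer VALUES ('C-1', 'Jones', 'Main St', 'Center City')")
conn.execute("INSERT INTO account VALUES ('A-101', 500)")
conn.execute("INSERT INTO depositor VALUES ('C-1', 'A-101')")

# Follow the relationship from customers to their accounts.
rows = conn.execute(
    "SELECT c.customer_name, a.account_number "
    "FROM customer c "
    "JOIN depositor d ON c.customer_id = d.customer_id "
    "JOIN account a ON a.account_number = d.account_number"
).fetchall()
```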
B. Other Models:
B. 1. Object-oriented data model: Drawing increasing attention. It can be seen as an extension of the E-R model with notions of encapsulation, methods (functions) and object identity.
An object database (also object-oriented database management system, OODBMS) is a database management system in which information is represented in the form of objects as used in object-oriented programming. Object databases are different from relational databases, which are table-oriented.
B. 2. Object-relational data model: Combines the features of the object-oriented data model and the relational data model.
An object-relational database (ORD), or object-relational database management system (ORDBMS), is a database management system (DBMS) similar to a relational database, but with an object-oriented database model: objects, classes and inheritance are directly supported in database schemas and in the query language.
B. 3. Semi-structured data model: Permits the specification of data where individual data items of the same type may have different sets of attributes. The extensible markup language (XML) is widely used to represent semi-structured data.
The semi-structured model is a database model where there is no separation between the data and the schema, and the amount of structure used depends on the purpose. It can represent the information of some data sources that cannot be constrained by schema.
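A small sketch of semi-structured data, using Python's standard xml.etree.ElementTree module. The XML document, its element names, and its values are invented for illustration. Both items are of the same type (customer), yet each carries a different set of attributes, and the structure travels with the data instead of being fixed by a separate schema.

```python
import xml.etree.ElementTree as ET

# Two customers with different sets of child elements (hypothetical data).
doc = """
<customers>
  <customer id="C-1"><name>Jones</name><city>Center City</city></customer>
  <customer id="C-2"><name>Smith</name><phone>555-123-6666</phone><email>smith@example.com</email></customer>
</customers>
"""

root = ET.fromstring(doc)

# Collect, for each customer, the names of the attributes it actually has.
attribute_sets = {
    customer.get("id"): sorted(child.tag for child in customer)
    for customer in root.findall("customer")
}
```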
C. Historical Models: These are in little use now.
C. 1. Network data model
C. 2. Hierarchical model
The network model is a database model conceived as a flexible way of representing objects and their relationships. Its distinguishing feature is that the schema, viewed as a graph in which object types are nodes and relationship types are arcs, is not restricted to being a hierarchy or lattice.
Network DBMS:
• Depicts data logically as many-to-many relationships
The Network Data Model
TYPES OF RELATIONS
ONE-TO-ONE: STUDENT, ID
ONE-TO-MANY: CLASS to STUDENT A, STUDENT B, STUDENT C
MANY-TO-MANY: STUDENT A, STUDENT B, STUDENT C to CLASS 1, CLASS 2
A hierarchical database model is a data model in which the data is organized into a tree-like structure. The data is stored as records which are connected to one another through links. A record is a collection of fields, with each field containing only one value.
Hierarchical DBMS:
• Organizes data in a tree-like structure
• Supports one-to-many parent-child relationships
• Prevalent in large legacy systems
A Hierarchical Database for a Human Resources System
HUMAN RESOURCES DATABASE WITH MULTIPLE VIEWS
A single human resources database provides many different views of data, depending on the information requirements of the user. Illustrated here are two possible views, one of interest to a benefits specialist and one of interest to a member of the company’s payroll department.
Primary Key and Foreign Key
Each record requires a key field, or unique identifier. The best example of this is your social security number—there is only one per person. That explains in part why so many companies and organizations ask for your social security number when you do business with them.
In a relational database, each table contains a primary key, a unique identifier for each record. To make sure the tables relate to each other, the primary key from one table is stored in a related table as a foreign key. For instance, in the customer table below the primary key is the unique customer ID. That primary key is then stored in the order table as the foreign key so that the two tables have a direct relationship.
Customer Table                          Order Table
Field Name        Description           Field Name               Description
Customer Name     Self-Explanatory      Order Number             Primary Key
Customer Address  Self-Explanatory      Order Item               Self-Explanatory
Customer ID       Primary Key           Number of Items Ordered  Self-Explanatory
Order Number      Foreign Key           Customer ID              Foreign Key
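The primary key and foreign key relationship between the customer and order tables might look like the following in SQLite, accessed through Python's sqlite3 module. This is a minimal sketch: the column names, the table name `orders` (used because ORDER is a reserved word in SQL), and the sample rows are assumptions. It stores the customer ID in the order table as the foreign key and shows the DBMS rejecting an order that points to a customer who does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE customer (
    customer_id      INTEGER PRIMARY KEY,   -- primary key
    customer_name    TEXT,
    customer_address TEXT
);
CREATE TABLE orders (
    order_number  INTEGER PRIMARY KEY,      -- primary key
    order_item    TEXT,
    items_ordered INTEGER,
    customer_id   INTEGER NOT NULL REFERENCES customer(customer_id)  -- foreign key
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'John L. Jones', '111 Main St')")
conn.execute("INSERT INTO orders VALUES (100, 'Widget', 2, 1)")

# An order for a customer ID that is not in the customer table is refused.
try:
    conn.execute("INSERT INTO orders VALUES (101, 'Widget', 1, 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```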
Relational Database
There are two important points you should remember about creating and maintaining relational database tables. First, you should ensure that attributes for a particular entity apply only to that entity. That is, you would not include fields in the customer record that apply to products the customer orders. Fields relating to products would be in a separate table. Second, you want to create the smallest possible fields for each record. For instance, you would create separate fields for a customer’s first name and last name rather than a single field for the entire name. This makes it easier to sort and manipulate the records later when you are creating reports.
Wrong way:
Name            Address                              Telephone number
John L. Jones   111 Main St Center City Ohio 22334   555-123-6666
Right way:
First Name  Middle Initial  Last Name  Street       City         State  Zip    Telephone
John        L.              Jones      111 Main St  Center City  Ohio   22334  555-123-6666
THE THREE BASIC OPERATIONS OF A RELATIONAL DBMS
The select, join, and project operations enable data from two different tables to be combined and only selected attributes to be displayed.
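To make the three operations concrete, here is a minimal sketch in plain Python that treats each table as a list of dictionaries. The `part` and `supplier` tables, their columns, and the sample rows are hypothetical. Select keeps the rows that meet a condition, join combines rows from two tables that share a key value, and project keeps only the chosen columns.

```python
# Hypothetical sample tables.
part = [
    {"part_number": 137, "part_name": "Door latch", "supplier_number": 8259},
    {"part_number": 150, "part_name": "Door seal", "supplier_number": 8261},
    {"part_number": 152, "part_name": "Compressor", "supplier_number": 8263},
]
supplier = [
    {"supplier_number": 8259, "supplier_name": "CBM Inc."},
    {"supplier_number": 8261, "supplier_name": "B. R. Molds"},
]


def select(rows, predicate):
    """Keep only the rows that satisfy the condition."""
    return [row for row in rows if predicate(row)]


def join(left, right, key):
    """Combine rows from two tables that have the same value for key."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def project(rows, columns):
    """Keep only the named columns of each row."""
    return [{column: row[column] for column in columns} for row in rows]


wanted = select(part, lambda row: row["part_number"] in (137, 150))
combined = join(wanted, supplier, "supplier_number")
result = project(combined, ["part_number", "part_name", "supplier_name"])
```

A relational DBMS performs the same steps when it runs an SQL query with a WHERE clause, a JOIN, and a column list.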
Non-Relational Databases and Databases in the Cloud
Data are now stored in text messages, social media postings, maps, and the like. Non-relational database management systems are better at managing large data sets on distributed computing networks. They can easily be scaled up or down depending on the particular needs of your business at a particular time.
Cloud computing service companies provide a way for you to manage your company’s data through Internet access using a Web browser.
Non-relational databases: “NoSQL”
• More flexible data model
• Data sets stored across distributed machines
• Easier to scale
• Handle large volumes of unstructured and structured data (Web, social media, graphics)
Databases in the cloud
• Typically, less functionality than on-premises DBs
• Amazon Relational Database Service, Microsoft SQL Azure
• Private clouds
Capabilities of Database Management Systems (DBMSs)
There are three important capabilities of DBMS that traditional file environments lack: data definition, data dictionary, and a data manipulation language.
– Data definition capability: Specifies structure of database content, used to create tables and define characteristics of fields
– Data dictionary: Automated or manual file storing definitions of data elements and their characteristics
– Data manipulation language: Used to add, change, delete, retrieve data from database
  • Structured Query Language (SQL)
  • Microsoft Access user tools for generating SQL
– Many DBMS have report generation capabilities for creating polished reports (Crystal Reports)
MICROSOFT ACCESS DATA DICTIONARY FEATURES
Microsoft Access has a rudimentary data dictionary capability that displays information about the size, format, and other characteristics of each field in a database. Displayed here is the information maintained in the SUPPLIER table. The small key icon to the left of Supplier_Number indicates that it is a key field.
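The three capabilities can be tried out with SQLite through Python's sqlite3 module, as sketched below. This is not how Microsoft Access exposes them. SQLite's `PRAGMA table_info` is used here as a stand-in for a data dictionary, and the columns `Supplier_Name` and `Supplier_City` and the sample row are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Data definition: create the table and define the characteristics of its fields.
conn.execute(
    "CREATE TABLE supplier ("
    " Supplier_Number INTEGER PRIMARY KEY,"
    " Supplier_Name TEXT NOT NULL,"
    " Supplier_City TEXT)"
)

# Data dictionary: ask the DBMS to describe each field of the table.
columns = conn.execute("PRAGMA table_info(supplier)").fetchall()
dictionary = [
    (name, col_type, bool(pk))
    for _cid, name, col_type, _notnull, _default, pk in columns
]

# Data manipulation language: add, change, retrieve, and delete data with SQL.
conn.execute("INSERT INTO supplier VALUES (8259, 'CBM Inc.', 'Dayton')")
conn.execute("UPDATE supplier SET Supplier_City = 'Columbus' WHERE Supplier_Number = 8259")
rows = conn.execute("SELECT Supplier_Name, Supplier_City FROM supplier").fetchall()
conn.execute("DELETE FROM supplier WHERE Supplier_Number = 8259")
remaining = conn.execute("SELECT COUNT(*) FROM supplier").fetchone()[0]
```

Each entry in `dictionary` gives a field name, its declared type, and whether it is a key field.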
AN ACCESS QUERY
• Designing Databases
  – Conceptual (logical) design: abstract model from business perspective
  – Physical design: How database is arranged on direct-access storage devices
• Design process identifies:
  – Relationships among data elements, redundant database elements
  – Most efficient way to group data elements to meet business requirements, needs of application programs
• Normalization
  – Streamlining complex groupings of data to minimize redundant data elements and awkward many-to-many relationships
Normalization
Normalization: Database normalization is a technique of organizing the data in the database. It is a systematic approach of decomposing tables to eliminate data redundancy and undesirable characteristics such as insertion, update and deletion anomalies. It is a multi-step process that puts data into tabular form by removing duplicated data from the relation tables.
Database normalization, or simply normalization, is the process of organizing the columns (attributes) and tables (relations) of a relational database to reduce data redundancy and improve data integrity.
Through the normalization process, the collection of data in a single table is replaced by the same data distributed over multiple tables, with specific relationships set up between the tables.
It is the process of creating small, stable data structures from complex groups of data.
AN UNNORMALIZED RELATION FOR ORDER
An unnormalized relation contains repeating groups. For example, there can be many parts and suppliers for each order. There is only a one-to-one correspondence between Order_Number and Order_Date.
NORMALIZED TABLES CREATED FROM ORDER
After normalization, the original relation ORDER has been broken down into four smaller relations. The relation ORDER is left with only two attributes and the relation LINE_ITEM has a combined, or concatenated, key consisting of Order_Number and Part_Number.
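The decomposition can be sketched in plain Python. The sample rows and every column other than Order_Number, Order_Date, and Part_Number are assumptions, as is the choice of PART and SUPPLIER as the other two relations. The flat rows repeat the order date, part name, and supplier name, while the four smaller relations store each fact once.

```python
# Hypothetical unnormalized ORDER data: one flat row per part on each order.
unnormalized = [
    {"Order_Number": 1, "Order_Date": "2024-01-05", "Part_Number": 137,
     "Part_Name": "Door latch", "Quantity": 2,
     "Supplier_Number": 8259, "Supplier_Name": "CBM Inc."},
    {"Order_Number": 1, "Order_Date": "2024-01-05", "Part_Number": 150,
     "Part_Name": "Door seal", "Quantity": 1,
     "Supplier_Number": 8261, "Supplier_Name": "B. R. Molds"},
    {"Order_Number": 2, "Order_Date": "2024-01-09", "Part_Number": 137,
     "Part_Name": "Door latch", "Quantity": 5,
     "Supplier_Number": 8259, "Supplier_Name": "CBM Inc."},
]

# Four smaller relations, each keyed so that a fact is stored only once.
order, line_item, part, supplier = {}, {}, {}, {}
for row in unnormalized:
    # ORDER keeps only two attributes: Order_Number (key) and Order_Date.
    order[row["Order_Number"]] = {"Order_Date": row["Order_Date"]}
    # LINE_ITEM has a concatenated key of Order_Number and Part_Number.
    line_item[(row["Order_Number"], row["Part_Number"])] = {"Quantity": row["Quantity"]}
    part[row["Part_Number"]] = {
        "Part_Name": row["Part_Name"],
        "Supplier_Number": row["Supplier_Number"],
    }
    supplier[row["Supplier_Number"]] = {"Supplier_Name": row["Supplier_Name"]}
```

Changing a supplier's name now means updating one entry in `supplier` instead of every flat row that mentions it, which is the update anomaly normalization is meant to remove.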
Tools for Improving Business Performance and Decision Making
Business intelligence infrastructure
• Today includes an array of tools for separate systems, and big data
• Contemporary tools:
  – Data warehouses
  – Data marts
  – Hadoop
  – In-memory computing
  – Analytical platforms
Data Warehouses
A data warehouse is a large store of data accumulated from a wide range of sources within a company and used to guide management decisions.
A data warehouse is a collection of data drawn from other databases used by the business.
It is a database that stores current and historical data of potential interest to decision makers throughout the company.
• Supports reporting and query tools
• Stores current and historical data
• Consolidates data for management analysis and decision making
• Improved and easy accessibility to information
• Ability to model and remodel the data
Components of a Data Warehouse
Data Mart
The data mart is a subset of the data warehouse and is usually oriented to a specific business line or team. Whereas data warehouses have an enterprise-wide depth, the information in data marts pertains to a single department.
A data mart represents the specific data from a data warehouse which a user needs.
It is a subset of data warehouse in which a summarized or highly focused portion of the organization’s data is placed in a separate database for a specified function or group of users.
CONTEMPORARY BUSINESS INTELLIGENCE INFRASTRUCTURE
A contemporary business intelligence infrastructure features capabilities and tools to manage and analyze large quantities and different types of data from multiple sources. Easy-to-use query and reporting tools for casual business users and more sophisticated analytical toolsets for power users are included.
Hadoop
• Enables distributed parallel processing of big data across inexpensive computers
• Key services
  – Hadoop Distributed File System (HDFS): data storage
  – MapReduce: breaks data into clusters for work
  – HBase: NoSQL database
• Used by Facebook, Yahoo, NextBio
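The MapReduce idea can be illustrated with a word count written in plain Python. This is a single-machine sketch of the programming pattern only, not the Hadoop API. The function names and the sample text are made up, and in Hadoop each chunk would be processed on a different computer.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Shuffle: group all values that belong to the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the grouped values for each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


# Each chunk stands in for a block of data stored on a separate machine.
chunks = ["big data big", "data across computers"]
mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
counts = reduce_phase(shuffle(mapped))
```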
In-memory computing
• Used in big data analysis
• Uses computer’s main memory (RAM) for data storage to avoid delays in retrieving data from disk storage
• Can reduce hours/days of processing to seconds
• Requires optimized hardware
Analytic platforms
• High-speed platforms using both relational and non-relational tools optimized for large datasets
Analytical tools: Relationships, patterns, trends
– Tools for consolidating, analyzing, and providing access to vast amounts of data to help users make better business decisions
  • Multidimensional data analysis (OLAP)
  • Data mining
  • Text mining
  • Web mining
Data Mining
Data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information that can be used to increase revenue, cut costs, or both. It is a process used by companies to turn raw data into useful information.
Data mining is the analysis of data for relationships that have not previously been discovered, for example in the sales records for a particular brand of tennis equipment. It is the technique of searching for patterns in the data.
• Online analytical processing (OLAP)
  – Supports multidimensional data analysis
    • Viewing data using multiple dimensions
    • Each aspect of information (product, pricing, cost, region, time period) is a different dimension
    • Example: How many washers sold in the East in June compared with other regions?
  – OLAP enables rapid, online answers to ad hoc queries
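The washer question above can be answered with a few lines of Python over a small fact table. All figures are invented, and `slice_and_total` is a hypothetical helper, not part of any OLAP product. It fixes two dimensions (product and month) and totals the units along the third (region).

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, month, units sold).
sales = [
    ("washer", "East", "June", 120),
    ("washer", "West", "June", 95),
    ("washer", "East", "May", 80),
    ("dryer", "East", "June", 60),
    ("washer", "North", "June", 70),
]


def slice_and_total(facts, product, month):
    """Fix the product and month dimensions and total units by region."""
    totals = defaultdict(int)
    for fact_product, region, fact_month, units in facts:
        if fact_product == product and fact_month == month:
            totals[region] += units
    return dict(totals)


washers_in_june = slice_and_total(sales, "washer", "June")
east = washers_in_june["East"]
other_regions = sum(units for region, units in washers_in_june.items() if region != "East")
```

An OLAP tool precomputes and indexes such totals so that questions like this one can be answered quickly for any combination of dimensions.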
Multidimensional Databases
A multidimensional database presents the data to the user in several dimensions. A three-dimensional database might present the information by:
• Sales Region
• Season
• Product Line
MULTIDIMENSIONAL DATA MODEL: The view that is showing is product versus region. If you rotate the cube 90 degrees, the face that shows is product versus actual and projected sales. If you rotate the cube 90 degrees again, you will see region versus actual and projected sales. Other views are possible.
Data mining: Data mining technology allows a digital firm to get more information than ever before from its data.
• Finds hidden patterns, relationships in datasets
  – Example: customer buying patterns
• Infers rules to predict future behavior
• Types of information obtainable from data mining:
  – Associations
  – Sequences
  – Classification
  – Clustering
  – Forecasting
Text mining: Text mining tools help scrub text files to find data or to discern patterns and relationships.
• Extracts key elements from large unstructured data sets
  – Stored e-mails
  – Call center transcripts
  – Legal cases
  – Patent descriptions
  – Service reports, and so on
• Sentiment analysis software
  – Mines e-mails, blogs, social media to detect opinions
• Web mining
  – Discovery and analysis of useful patterns and information from Web
  – Understand customer behavior
  – Evaluate effectiveness of Web site, and so on
  – Web content mining
    • Mines content of Web pages
  – Web structure mining
    • Analyzes links to and from Web page
  – Web usage mining
    • Mines user interaction data recorded by Web server
• Databases and the Web
  – Many companies use Web to make some internal databases available to customers or partners
  – Typical configuration includes:
    • Web server
    • Application server/middleware/CGI scripts
    • Database server (hosting DBMS)
  – Advantages of using Web for database access:
    • Ease of use of browser software
    • Web interface requires few or no changes to database
    • Inexpensive to add Web interface to system
Database Users
Users are differentiated by the way they expect to interact with the system. Four different types:
1. Naive users – are unsophisticated users who interact with the system by invoking one of the permanent application programs that have been written previously.
E.g., people accessing a database over the web, bank tellers, clerical staff.
2. Application programmers – are computer professionals who write application programs. Application programmers can choose from many tools to develop user interfaces.
3. Sophisticated users – interact with the system without writing programs. Instead, they form their requests in a database query language. Examples are analysts who submit queries to explore data in the database, and engineers, scientists, and analysts who implement applications to meet their requirements.
E.g., an analyst looking at sales data (OLAP – online analytical processing) or using data mining, which finds certain kinds of patterns in data.
4. Specialized users – are sophisticated users who write specialized database applications that do not fit into the traditional data processing framework.
E.g., computer-aided design systems, knowledge-base and expert systems, and environment-modeling systems, which use complex data types.
LINKING INTERNAL DATABASES TO THE WEB
Users access an organization’s internal database through the Web using their desktop PCs and Web browser software.
Managing the Firm’s Data Resources
• Establishing an information policy
  – Firm’s rules, procedures, roles for sharing, managing, standardizing data
• Data administration
  – Establishes policies and procedures to manage data
• Data governance
  – Deals with policies and processes for managing availability, usability, integrity, and security of data, especially regarding government regulations
• Database administration
  – Creating and maintaining database
• Ensuring data quality
  – More than 25 percent of critical data in Fortune 1000 company databases are inaccurate or incomplete
  – Redundant data
  – Inconsistent data
  – Faulty input
  – Before new database in place, need to:
    • Identify and correct faulty data
    • Establish better routines for editing data once database in operation
• Data quality audit:
  – Structured survey of the accuracy and level of completeness of the data in an information system
    • Survey samples from data files, or
    • Survey end users for perceptions of quality
• Data cleansing
  – Software to detect and correct data that are incorrect, incomplete, improperly formatted, or redundant
  – Enforces consistency among different sets of data from separate information systems
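A very small data-cleansing pass might look like the sketch below, written in plain Python. The customer records, the expected telephone format, and the `audit` helper are all assumptions chosen for illustration. It flags records that are incomplete, improperly formatted, or redundant, and keeps only the records with no issues.

```python
import re

# Hypothetical customer records with typical quality problems.
records = [
    {"customer_id": 1, "name": "John L. Jones", "phone": "555-123-6666"},
    {"customer_id": 2, "name": "", "phone": "555-987-0000"},               # incomplete
    {"customer_id": 3, "name": "Ann Lee", "phone": "5551234"},             # improperly formatted
    {"customer_id": 1, "name": "John L. Jones", "phone": "555-123-6666"},  # redundant
]

# Assumed format rule: telephone numbers look like 555-123-6666.
PHONE_FORMAT = re.compile(r"^\d{3}-\d{3}-\d{4}$")


def audit(rows):
    """Return (row index, problem) pairs for every issue found."""
    issues = []
    seen = set()
    for index, row in enumerate(rows):
        if not row["name"].strip():
            issues.append((index, "incomplete"))
        if not PHONE_FORMAT.match(row["phone"]):
            issues.append((index, "improperly formatted"))
        key = (row["customer_id"], row["name"], row["phone"])
        if key in seen:
            issues.append((index, "redundant"))
        seen.add(key)
    return issues


issues = audit(records)
flagged = {index for index, _problem in issues}
clean = [row for index, row in enumerate(records) if index not in flagged]
```

Real cleansing software would also try to correct the flagged records; this sketch only detects them.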
Thank You