Post on 26-Dec-2015

Database Technology

Prof. Hyoung-Joo Kim, Internet Database Lab

School of Computer Science & Engineering, Seoul National University

Contents

Research in IDB Lab.

• A general survey of DBMS
• History of DBMS
• Database market share
• The current DBMS trend

What is a Database? (1/10)

DBMS: a software system that provides an environment for storing and retrieving massive amounts of data effectively

What is a Database? (2/10)

A large collection of data
Data + Programs

[Figure: many individual Data items being stored (STORE) in a Database]

What is a Database? (3/10)

Information about the registration and courses of 40,000 students of Seoul National University

Fields: course, term, register, grade, prof

45 courses, about 10 KB of records per student

10 KB * 40,000 = 400 MB

Others: library, health center, S-card, …

What is a Database? (4/10)

Information for SAT management

Fields: profile, answer, rate, ranking, …

About 8 KB of records per student

8 KB * 550,000 = 4.4 GB (on the order of 10^9 bytes)

Test takers: 550,000 in 2006; 570,000 in 2005

What is a Database? (5/10)

Information about mobile phone calls

Fields: phone number, station, time, …

About 60 bytes per call record

39 M subscribers * 60 bytes * 5 calls/day * 365 days ≈ 4 TB (Korea, July 2006)

China: 370 M subscribers in 2005

What is a Database? (6/10)

Information for resident registration

Fields: SSN, name, addr, domicile, …

About 10 KB per resident record

10 KB * 47 M (47 million residents) = 470 GB

What is a Database? (7/10)

Google database

Manages 8 billion web pages and 2 billion index terms

Usenet archive = 700 million messages * 20 KB/message = 14 TB
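The size estimates on these slides are all of the form "bytes per record times number of records". A minimal sketch that redoes the arithmetic, assuming decimal units (1 KB = 1,000 bytes) and the per-record sizes used in the slides' own formulas; the helper name `total_bytes` is only illustrative:

```python
# Back-of-the-envelope sizes, using decimal units (1 KB = 1,000 bytes).
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12


def total_bytes(record_bytes, count):
    """Size in bytes of `count` records of `record_bytes` bytes each."""
    return record_bytes * count


estimates = {
    # 10 KB per student, 40,000 students
    "student records": total_bytes(10 * KB, 40_000),
    # 8 KB per test taker, 550,000 test takers
    "SAT records": total_bytes(8 * KB, 550_000),
    # 60 bytes per call, 39 M phones, 5 calls a day for a year
    "mobile call records": total_bytes(60, 39_000_000 * 5 * 365),
    # 20 KB per message, 700 million messages
    "Usenet archive": total_bytes(20 * KB, 700_000_000),
}

for name, size in estimates.items():
    print(f"{name}: {size / GB:,.1f} GB")
```

The mobile-phone figure comes out at roughly 4.3 TB, which the slide rounds to 4 TB.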

What is a Database? (8/10)

Hubble Space Telescope data from Mars

Data accumulated by 2005: over 12 TB

Generates and transmits 3 to 5 GB of data daily

What is a Database? (9/10)

NCBI (National Center for Biotechnology Information)

GenBank
• Manages information on 165,000 species
• Adds 3 million new DNA sequences monthly

What is a Database? (10/10)

Genome map of Koreans

Venture company “MacroGen” and SNU Medical School

Early version: 900 GB
Final product: 15 TB

What do we do with Database? (1/2)

Record search

Retrieve the math grade of the student whose SSN is “840101-12121”

Without a DBMS: 12 ms to fetch a record and check its content

740,000 * 5 records = 3.7 M records

3.7 M * 12 ms = 44.4 K seconds = over 12 hours

With a DBMS, it takes less than 0.1 sec!

Other uses:
Statistical processing for population census
Search for the correlation between genes and diseases
Search for purchase patterns of customer groups
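The record-search numbers can be checked directly, and the reason a DBMS answers so much faster is that it looks the record up through an index instead of reading every record. A minimal sketch, assuming SQLite (via Python's standard `sqlite3` module) as a stand-in DBMS; the table name `grades`, its columns, and the sample rows are made up for illustration:

```python
import sqlite3

# Sequential-scan estimate from the slide: 12 ms per record, 3.7 M records.
records = 740_000 * 5
scan_seconds = records * 12 / 1000
scan_hours = scan_seconds / 3600

# Indexed lookup: the DBMS jumps to the matching rows instead of reading all rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (ssn TEXT, subject TEXT, grade TEXT)")
conn.executemany(
    "INSERT INTO grades VALUES (?, ?, ?)",
    [
        ("840101-12121", "math", "A"),
        ("840101-12121", "physics", "B"),
        ("850202-23232", "math", "C"),
    ],
)
conn.execute("CREATE INDEX idx_grades_ssn ON grades (ssn)")

query = "SELECT grade FROM grades WHERE ssn = ? AND subject = 'math'"
math_grade = conn.execute(query, ("840101-12121",)).fetchone()[0]

# Ask SQLite how it would run the query; the plan should mention the index.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("840101-12121",)).fetchall()

print(f"full scan: {scan_seconds:,.0f} s (about {scan_hours:.1f} hours)")
print("math grade:", math_grade)
print("plan:", plan[0][-1])
```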

What do we do with Database? (2/2)

Most (all?) computing applications use some type of database

EDPS
MIS, ERP
OLTP
Data Warehouse
CRM

[Figure: the application types above, each shown with a Database]

Database Management System (DBMS) (1/3)

[Figure: Warehouse]

Database Management System (DBMS) (2/3)

[Figure: Warehouse and warehouse keeper]

Database Management System (DBMS) (3/3)

[Figure: user, Application, DBMS, Database]

Applications: management of on-line orders, management of wages, management of manager info.

Database contents: profile, sale, stock, product, customer

DBMS Architecture

[Figure: layered DBMS architecture]

Users: naive users, application programmers, casual users, database administrator

Interfaces: application programs, system calls, query, database scheme

DBMS: data manipulation language pre-compiler, query processor, data definition language compiler, application programs object code, database manager, file manager

Disk storage


A Sample Relational Database

SQL

SQL: widely used commercial query language

E.g. find the name of the customer with customer-id 192-83-7465

    select customer.customer-name
    from customer
    where customer.customer-id = '192-83-7465'

E.g. find the balances of all accounts held by the customer with customer-id 192-83-7465

    select account.balance
    from depositor, account
    where depositor.customer-id = '192-83-7465' and
          depositor.account-number = account.account-number
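Both queries can be tried out in any SQL system. A minimal sketch using Python's standard `sqlite3` module, with two assumptions: the hyphenated names on the slide are written with underscores (hyphens are not valid in plain SQL identifiers), and the sample customer and account rows are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer  (customer_id TEXT PRIMARY KEY, customer_name TEXT);
    CREATE TABLE account   (account_number TEXT PRIMARY KEY, balance INTEGER);
    CREATE TABLE depositor (customer_id TEXT, account_number TEXT);

    -- Hypothetical sample rows
    INSERT INTO customer  VALUES ('192-83-7465', 'Johnson');
    INSERT INTO account   VALUES ('A-101', 500);
    INSERT INTO account   VALUES ('A-201', 900);
    INSERT INTO depositor VALUES ('192-83-7465', 'A-101');
    INSERT INTO depositor VALUES ('192-83-7465', 'A-201');
""")

cid = "192-83-7465"

# Query 1: name of the customer with the given id.
name = conn.execute(
    "SELECT customer.customer_name FROM customer "
    "WHERE customer.customer_id = ?",
    (cid,),
).fetchone()[0]

# Query 2: balances of all accounts held by that customer (join via depositor).
balances = sorted(
    row[0]
    for row in conn.execute(
        "SELECT account.balance FROM depositor, account "
        "WHERE depositor.customer_id = ? "
        "AND depositor.account_number = account.account_number",
        (cid,),
    )
)

print(name, balances)
```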

Major Commercial DBMS in 2006 (1/3)

Oracle 10g
Market leader
Stability
Strong handling of mass storage
Famous CEO

Major Commercial DBMS in 2006 (2/3)

Microsoft
PC-based (Windows NT)
Integration with Windows NT/XP

Major Commercial DBMS in 2006 (3/3)

IBM
Stability
Mainframe
Acquisition of Informix


Database Companies in the World

Hierarchical, Network DBMS

The early 70s

IMS (IBM), System/2000 (MRI)
DMS 1100 (Sperry), Total (Cincom)

Advantage: quick data access using links

Drawback: application programs cannot be made independent of how the data is stored

Network Database example

Query: What’s the total balance of Mr. Shiver in Bronx?

[Figure: root record, customer records, and amount records connected by links]

Customer records: (Lowery, Maple, Queens), (Hodges, SideHill, Brooklyn), (Shiver, North, Bronx)

Amount records: 900, 556, 647, 647, 801

Network DB query example

    sum := 0;
    get first customer
        where customer.name = "Shiver" and customer.city = "Bronx";
    while DB_status = 0 do
    begin
        sum := sum + customer.amount;
        get next customer
            where customer.name = "Shiver" and customer.city = "Bronx";
    end
    print(sum);
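The loop above is record-at-a-time navigation: the program itself asks for the first match, then the next, until the status says nothing is left. A rough Python sketch of the same control flow, assuming a plain in-memory list in place of a real network database; the generator `matching` is a hypothetical stand-in for the `get first` / `get next` calls, and the records mirror the customer data used in this example:

```python
# (name, street, city, amount) records, standing in for the stored data.
records = [
    ("Lowery", "Maple", "Queens", 900),
    ("Shiver", "North", "Bronx", 556),
    ("Shiver", "North", "Bronx", 647),
    ("Hodges", "SideHill", "Brooklyn", 801),
    ("Hodges", "SideHill", "Brooklyn", 647),
]


def matching(records, name, city):
    """Yield matching records one at a time, like repeated 'get next' calls."""
    for rec in records:
        if rec[0] == name and rec[2] == city:
            yield rec


cursor = matching(records, "Shiver", "Bronx")

total = 0
rec = next(cursor, None)      # get first customer
while rec is not None:        # corresponds to DB_status = 0 (a record was found)
    total += rec[3]
    rec = next(cursor, None)  # get next customer

print(total)
```

The application programmer writes the traversal by hand here, which is the point of contrast with the relational query shown later.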

Relational DBMS

The late 70s and early 80s

E. F. Codd, 1970 CACM paper, “A Relational Model of Data for Large Shared Data Banks”

Relational Algebra & Calculus
The Spartan Simplicity!
SQL: Structured Query Language
System R - 1976, IBM research prototype RDBMS
Ingres - 1976, first academic RDBMS

Relational DBMS example

    select sum(amount)
    from customer
    where customer.name = 'Shiver' and customer.city = 'Bronx';

    name     street     city       amount
    Lowery   Maple      Queens     900
    Shiver   North      Bronx      556
    Shiver   North      Bronx      647
    Hodges   SideHill   Brooklyn   801
    Hodges   SideHill   Brooklyn   647
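For comparison with the navigational loop of the network model, the whole task is a single declarative statement in the relational model. A minimal sketch that loads the table above into SQLite through Python's standard `sqlite3` module and runs the query; only the use of SQLite is an assumption here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, street TEXT, city TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?, ?)",
    [
        ("Lowery", "Maple", "Queens", 900),
        ("Shiver", "North", "Bronx", 556),
        ("Shiver", "North", "Bronx", 647),
        ("Hodges", "SideHill", "Brooklyn", 801),
        ("Hodges", "SideHill", "Brooklyn", 647),
    ],
)

# One statement; the DBMS decides how to find and add up the matching rows.
total = conn.execute(
    "SELECT SUM(amount) FROM customer "
    "WHERE customer.name = 'Shiver' AND customer.city = 'Bronx'"
).fetchone()[0]

print(total)
```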

The advent of new DB applications in the 80s (1/4)

CAD/CASE/CAM: massive design data
Artificial Intelligence: expert systems
Telecommunication
Multimedia: image, text, audio, video, etc.

Need: rich data model & DBMS functions

The advent of new DB applications in the 80s (2/4)

Massive design data in CAD/CASE/CAM

[Figure: previous data (customer table with columns name, street, city, amount) next to CAD data]

The advent of new DB applications in the 80s (3/4)

Artificial Intelligence: expert systems

[Figure: previous data (customer table with columns name, street, city, amount) next to expertise data]

Expertise data example: vehicle disorder
Control, Drive
Brake, Handle, Gearbox, Engine
Symptoms
Conclusion: engine ECU disorder

The advent of new DB applications in the 80s (4/4)

Multimedia: image, audio, video

[Figure: previous data (customer table with columns name, street, city, amount) next to multimedia data]

Advent of Object-Oriented DBMS

The mid 80s to mid 90s

Research prototypes: ORION, POSTGRES, ENCORE/ObServer

Commercial products: O2, ObjectStore, Objectivity, Versant, etc.

ODMG-93 OODB standard

Features of Object-Oriented DBMS

Object-oriented paradigm support: object, object identity, class hierarchy, inheritance
Semantic data model extension: version & composite object
Persistent programming language
Large object
Long-duration transaction
Navigational traversal: going back to the network DB?

Object-Oriented Database example

ISA relationship
Is-part-of relationship

[Figure: includes the customer table with columns name, street, city, amount]

OQL query of Object-Oriented DBMS

    select sum(customer.deposit.balance)
    from Customer customer
    where customer.name = "Shiver"
      and customer.deposit.branch.city = "Bronx";
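The dotted paths in the OQL query (customer.deposit.branch.city) follow object references rather than joining tables. A small Python sketch of the same idea, assuming hypothetical classes `Customer`, `Deposit`, and `Branch` with one deposit per customer object, and made-up objects that reuse the Shiver amounts from the earlier examples:

```python
from dataclasses import dataclass


@dataclass
class Branch:
    city: str


@dataclass
class Deposit:
    balance: int
    branch: Branch      # reference to another object, not a foreign key value


@dataclass
class Customer:
    name: str
    deposit: Deposit


bronx = Branch("Bronx")
queens = Branch("Queens")

# The "extent" of class Customer: all Customer objects known to the database.
customers = [
    Customer("Shiver", Deposit(556, bronx)),
    Customer("Shiver", Deposit(647, bronx)),
    Customer("Lowery", Deposit(900, queens)),
]

# Same shape as the OQL query: filter on path expressions, then sum a path.
total = sum(
    customer.deposit.balance
    for customer in customers
    if customer.name == "Shiver" and customer.deposit.branch.city == "Bronx"
)

print(total)
```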

Object-Relational DBMS

1980 – 1985: ORDBMS research prototypes
PostGres by UC Berkeley
System/R Engineering Extension

Relational DBMS with object-oriented functions
Extension within SQL & tables!

The early 90s: OODBMS (Illustra, UniSQL, Matisse) downfall

1997: advent of the Big 3 ORDBMS

Object-Relational Database example

[Figure: includes the customer table with columns name, street, city, amount]

Principal functions of Object-Relational DBMS

LOB (large object) support
Abstract data type support
Type inheritance support
User-defined type & stored procedure support
Application domain-specific extension support
SQL procedure extension
Rule/trigger system support

Products of Object-Relational DBMS

ORACLE-8 Universal Server

Informix Universal Server

IBM DB2 Universal Database

Sybase Adaptive Server

Microsoft Access

DBMS market share (1/2)

Worldwide market share for biggest sellers of corporate databases, 2005

    Oracle      48.6%
    IBM         22%
    Microsoft   15%

Source: Gartner Dataquest

DBMS market share (2/2)

Worldwide sales for biggest sellers of corporate databases, 2005 (billions of dollars)

    Oracle      $6.7
    IBM         $3.0
    Microsoft   $2.1

Source: Gartner Dataquest


Domestic DBMS market share

source : Report for database industry and perspective in Korea, 2004

Domestic DBMS market sales

Domestic market share for biggest sellers of corporate databases, 2004

[Bar chart, billions of won. Vendors: Oracle, IBM, Microsoft. Bar labels: ₩ 57.2, ₩ 25.1, ₩ 45.3]

Source: Gartner Dataquest, South Korea (2005)


Preference in domestic market

Others 3%

source : Report for database industry and perspective in Korea, 2004

XML Technology (1/2)

The late 90s and now

What is XML (eXtensible Markup Language)?

Developed by the W3C
Semi-structured text for dissemination and publication
Self-describing

HTML: tagging for display

    <tr> <td> <font color="red"> Name </font> </td> <td> Hong Gil-dong </td> </tr>
    <tr> <td> <b> Address </b> </td> …

XML: tagging for structure and semantics

    <person>
      <name> Hong Gil-dong </name>
      <city> Seoul </city>
      <age>20</age>
      …
    </person>

XML Technology (2/2)

Why XML?

Standard data format for storage and exchange

    <person>
      <name> Hong Gil-dong </name>
      <city> Seoul </city>
      …
    </person>
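Because XML tags describe structure rather than appearance, a program can pull the fields out by name without knowing anything about layout. A minimal sketch using Python's standard `xml.etree.ElementTree` module on a `<person>` element like the one on these slides, with the name and city written in English and the elided part left out:

```python
import xml.etree.ElementTree as ET

doc = """
<person>
  <name>Hong Gil-dong</name>
  <city>Seoul</city>
  <age>20</age>
</person>
"""

person = ET.fromstring(doc.strip())

# The tag names carry the meaning, so they can be used directly as field names.
record = {child.tag: child.text for child in person}
city = person.findtext("city")

print(record)
print(city)
```

The same extraction from the HTML version would have to rely on table position or font color, which is what "tagging for display" implies.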

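Semantic Web (1/2)

The existing web:

1) A patient searches for dental clinics in a search engine
2) The patient finds the home page of a clinic close to their location
3) The patient checks the clinic's schedule and, if the time suits, makes an appointment

Many repetitive steps are needed before the appointment is made

[Figure: Patient, search engine, clinic’s web pages, appointment schedule]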

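Semantic Web (2/2)

Semantic web:

The following information is assumed to be already available as Semantic Web data: the patient's personal schedule, each dental clinic's location, specialties, treatment …

1) The patient asks a software agent to make an appointment
2) Even if the clinics' home pages differ in content and structure, the software agent analyzes the Semantic Web data of the patient and the clinics, and books a clinic that can treat the patient given the patient's time and location

[Figure: Patient, Software Agents, clinic’s web pages (with Semantic web), appointment schedule]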

Knowledge discovery

[Figure: Database -> Data Warehouse -> knowledge discovery processing (data mining) -> useful, interesting, hidden information -> apply -> decision]

Data warehouse (1/2)

Stores data over time
Analyzes patterns over time
Summarized data
Data observed from various viewpoints
Non-volatile

Need for a new data model: the dimensional model

Data warehouse (2/2)

[Figure: cube of sales volumes with three dimensions]

Time: Jan, Feb, Mar
Product: A, B, C
Sales person: Wong, Stonebraker, DeWitt
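In a dimensional model, each fact (here a sales volume) is identified by one value per dimension, and analysis means summing the facts along some dimensions while keeping others. A small Python sketch of that roll-up, assuming made-up volumes and a hypothetical helper `roll_up`; only the three dimensions and their values come from the slide:

```python
from collections import defaultdict

DIMENSIONS = ("time", "product", "sales_person")

# Fact rows: (time, product, sales_person, volume). The volumes are made up.
facts = [
    ("Jan", "A", "Wong", 10),
    ("Jan", "B", "Stonebraker", 7),
    ("Feb", "A", "Wong", 12),
    ("Feb", "C", "DeWitt", 5),
    ("Mar", "B", "DeWitt", 8),
]


def roll_up(facts, keep):
    """Sum sales volume over every dimension that is not named in `keep`."""
    positions = [DIMENSIONS.index(dim) for dim in keep]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[pos] for pos in positions)
        totals[key] += row[3]
    return dict(totals)


by_month = roll_up(facts, ["time"])
by_product_and_person = roll_up(facts, ["product", "sales_person"])

print(by_month)
print(by_product_and_person)
```

Each call looks at the same cube from a different viewpoint, which is what the "various viewpoints" property of a data warehouse refers to.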

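Data mining (1/2)

Broad sense: covers everything from the step of extracting the target data to the step of refining and interpreting the discovered patterns and expressing them in a form people can understand (text, pictures, graphics)

Narrow sense: the use of various algorithms (data mining algorithms) or software that extract interesting, human-understandable patterns and regularities from large volumes of data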

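Data mining (2/2)

Pattern discovery
80% of people who buy bread and snacks also buy milk
74% of people who buy baby formula and diapers also buy beer

Decision making
Beer consumption affects the consumption of baby formula and diapers
A price increase for bread and snacks affects milk consumption

Business application
Display (bread, snacks, milk) and (baby formula, diapers, beer) together on the shelves
Adjust the prices of bread and snacks to control milk consumption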
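A pattern such as "80% of people who buy bread and snacks also buy milk" is the confidence of an association rule: among the baskets that contain the left-hand items, the fraction that also contain the right-hand item. A minimal sketch of that calculation, assuming a handful of made-up shopping baskets chosen so that the bread-and-snack rule comes out at 80%; the function name `confidence` is illustrative:

```python
# Each basket is the set of items in one purchase. The data is made up.
baskets = [
    {"bread", "snack", "milk"},
    {"bread", "snack", "milk", "beer"},
    {"bread", "snack", "milk"},
    {"bread", "snack", "milk"},
    {"bread", "snack"},
    {"formula", "diapers", "beer"},
]


def confidence(baskets, antecedent, consequent):
    """Fraction of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [basket for basket in baskets if antecedent <= basket]
    if not with_antecedent:
        return 0.0
    hits = sum(1 for basket in with_antecedent if consequent <= basket)
    return hits / len(with_antecedent)


conf_milk = confidence(baskets, {"bread", "snack"}, {"milk"})
conf_beer = confidence(baskets, {"formula", "diapers"}, {"beer"})

print(f"bread, snack -> milk: {conf_milk:.0%}")
print(f"formula, diapers -> beer: {conf_beer:.0%}")
```

Real data mining algorithms do the same counting, but over millions of baskets and over all candidate item combinations at once.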

The emerging challenges

Environment

Rapid spread of the Web and Internet: millions of users connected on the Web
Rapid development of H/W: disk and RAM size, access time, bandwidth
New areas emerging: sensor streams, scientific data, uncertain data, information privacy

The Emerging Challenges

Sophisticated data type support

Data types: sound, video, image, temporal, spatial
Structured data and unstructured data
New DBMS

The Emerging Challenges

Sensor streams

Battery constraint, communication cost
Rapidly changing configuration (sensors die or disconnect)
Complex forms of information integration: “Locate a person from the heat, sound and vibration sensors”

The Emerging Challenges

Reasoning about uncertain data

Scientific measurement errors
Location data for moving objects
Sequence, image and text similarity

The Emerging Challenges

Personalization

Different person, different answer

WEB CRM example
Web site entry
Page views
Event: select product
Insert item to shopping cart
Recommendation engine
Personalized view of recommendation

The Emerging Challenges

Privacy

How to support the protection of personal or sensitive information

Access by user and usage
Include purpose description in query

    Name  | income | …
    Alice | 25K    | …
    John  | 40K    | …

We just want the statistics of the income, not the personal information!
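One way to read "include purpose description in query" is that the caller states a purpose and the system runs only what that purpose permits, for example an aggregate instead of individual rows. A toy sketch of that idea, not an established privacy mechanism: it assumes SQLite via Python's standard `sqlite3` module, reads "25K" and "40K" as 25,000 and 40,000, and uses a hypothetical purpose table `ALLOWED_QUERIES` and function `run_for_purpose`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, income INTEGER)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?)",
    [("Alice", 25_000), ("John", 40_000)],
)

# Hypothetical policy: each permitted purpose maps to the one query it may run.
ALLOWED_QUERIES = {
    "statistics": "SELECT AVG(income) FROM person",
}


def run_for_purpose(purpose):
    """Run the query registered for `purpose`; refuse any other purpose."""
    if purpose not in ALLOWED_QUERIES:
        raise PermissionError(f"purpose not permitted: {purpose}")
    return conn.execute(ALLOWED_QUERIES[purpose]).fetchone()[0]


avg_income = run_for_purpose("statistics")
print(avg_income)
```

A caller with the "statistics" purpose gets the average income but never sees which income belongs to Alice or John.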
