data warehouse


Post on 23-Dec-2014








1. Data Warehouses and Decision Support Systems, Nguyễn Thanh Bình

2. Outline

  • Part 1: Overview
  • Chapter 1: Introduction
  • Chapter 2: Fundamentals
  • Chapter 3: Data warehouse architecture

3. Outline (cont.)

  • Part 2: Modeling
  • Chapter 4: Data and models
  • Chapter 5: Modeling
  • Chapter 6: Metadata
  • Chapter 7: Data warehouse methods
  • Chapter 8: The future and course summary

4. Chapter 1: Introduction

5. Problem: Heterogeneous information sources

  • The information revolution and the information explosion
  • Many information systems have been built, with:
    • Different interfaces
    • Different data representations
    • Duplicated and inconsistent information

6. Problem: Data management in large enterprises

  • Vertical fragmentation across information systems
    • Into many heterogeneous operational systems and online transaction processing (OLTP) systems

7. Goal: Unified data access

  • Collect and combine information
  • Provide an integrated view and a consistent user interface
  • Support sharing

8. Data warehouse

  • Data is integrated and organized so that the system is:
    • Easy to understand
    • Clear
    • Easy to analyze
  • Data is collected from many sources and then:
    • Cleaned
    • Integrated
    • Transformed
    • Aggregated
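
The four steps listed above (clean, integrate, transform, aggregate) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two source systems, their field names, and the sample rows are hypothetical, and the transform is applied per source before integration so that the merged rows share one schema.

```python
from collections import defaultdict

# Hypothetical rows from two source systems that use different conventions.
SALES_SYSTEM_A = [
    {"cust": " Alice ", "amount_usd": "120.50"},
    {"cust": "bob", "amount_usd": "80"},
    {"cust": "", "amount_usd": "15"},  # no customer name: removed by cleaning
]
SALES_SYSTEM_B = [
    {"customer_name": "ALICE", "amount_cents": 3000},
    {"customer_name": "Bob", "amount_cents": 2000},
]


def clean(rows, name_key):
    """Cleaning: drop rows without a customer name and trim whitespace."""
    return [
        dict(row, **{name_key: row[name_key].strip()})
        for row in rows
        if row[name_key].strip()
    ]


def transform_a(row):
    """Transformation: map system A's fields onto the warehouse schema."""
    return {"customer": row["cust"].lower(), "amount": float(row["amount_usd"])}


def transform_b(row):
    """Transformation: map system B's fields (cents) onto the same schema."""
    return {"customer": row["customer_name"].lower(), "amount": row["amount_cents"] / 100}


def integrate(*sources):
    """Integration: merge rows from every source into one list."""
    return [row for source in sources for row in source]


def aggregate(rows):
    """Aggregation: total amount per customer."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


def build_warehouse_table():
    rows_a = [transform_a(r) for r in clean(SALES_SYSTEM_A, "cust")]
    rows_b = [transform_b(r) for r in clean(SALES_SYSTEM_B, "customer_name")]
    return aggregate(integrate(rows_a, rows_b))


totals = build_warehouse_table()
```

After the run, `totals` holds one figure per customer regardless of which system the sale came from, which is the kind of unified view the slide describes.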

9. Decision support systems

  • Regarded as one part of the data warehouse
  • Provide reports, precomputed analyses, graphs, and charts
  • Allow online analysis of data
  • Support interactive exploration of data
  • Provide a variety of user interfaces
  • Make complex data analysis possible through simple means

10. Requirements of decision support systems

  • A multidimensional view of data
  • Support for data hierarchies and for drilling down into detail
  • Fast answers to queries
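
A multidimensional view with drill-down can be illustrated with a tiny in-memory example. The sketch below is not taken from the slides: the dimensions (region, store, month), the fact rows, and the helper names `roll_up` and `drill_down` are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, store, month, units_sold).
FACTS = [
    ("North", "Store A", "2014-01", 10),
    ("North", "Store A", "2014-02", 5),
    ("North", "Store B", "2014-01", 7),
    ("South", "Store C", "2014-01", 4),
]
DIMENSIONS = ("region", "store", "month")


def roll_up(facts, by):
    """Sum units_sold grouped by the chosen dimensions (coarser view)."""
    positions = [DIMENSIONS.index(name) for name in by]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in positions)
        totals[key] += row[3]
    return dict(totals)


def drill_down(facts, **fixed):
    """Keep only the rows matching the given dimension values (finer view)."""
    return [
        row
        for row in facts
        if all(row[DIMENSIONS.index(name)] == value for name, value in fixed.items())
    ]


# Top level of the hierarchy: totals per region.
by_region = roll_up(FACTS, ["region"])

# Drill down into one region and look at its stores.
north_by_store = roll_up(drill_down(FACTS, region="North"), ["store"])
```

Here `by_region` answers the coarse question and `north_by_store` answers the follow-up one level down the region-to-store hierarchy. A real system would precompute or index such summaries to meet the fast-answer requirement.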

11. History

  • Began in the 1990s
  • February 1996, according to a META Group report:
    • USD 13,000 million (hardware: 8,000; services: 5,000)
  • 1998: USD 14,600 million
  • 2001: more than USD 20,000 million

12. History

[Charts: revenue and projected growth, 1996 to 2001; installed base and current revenue by region (USA, Europe, APAC, Other)]

13. Why study data warehouses

  • A store of data, information, knowledge, and metadata
    • Consolidates all information needed for in-depth analysis
    • Separates analysis from online transaction processing
  • Turns data into information
    • Provides accurate information at the right time and in the right format

14. Why study data warehouses

  • Perform complex data analyses
  • Carry out analyses such as:
      • Trend analysis
      • Time-series analysis
      • Risk analysis
    • Explore decision support systems
    • Discover and surface hidden factors through data mining techniques

15. Characteristics of a data warehouse

  • Designed for analytical work
  • Designed for a small group of users (decision makers)
  • Read-only
  • Updated periodically: data is only appended
  • Holds historical data along the time dimension
  • Queries return large result sets and involve many joins
  • Global in scope

16. Examples

  • Online transaction processing (OLTP)
    • The quantity of Coca-Cola that has just been sold
  • Online analytical processing (OLAP)
    • The quantity of Coca-Cola sold last month at stores in the north of Thừa Thiên Huế province
    • Which store in the north of Thừa Thiên Huế province sold the most Coca-Cola last month
    • In which month of the year the most Coca-Cola was sold in Thừa Thiên Huế province
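
The contrast between the OLTP question and the three OLAP questions above can be made concrete with Python's built-in `sqlite3` module. Everything in this sketch is assumed for illustration: the `sales` table, its columns, the sample rows, and the choice of `2014-11` as "last month".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, store TEXT, area TEXT, "
    "month TEXT, product TEXT, qty INTEGER)"
)
# Hypothetical sales rows, inserted in chronological order.
rows = [
    ("Store A", "north", "2014-10", "coca cola", 40),
    ("Store A", "north", "2014-11", "coca cola", 30),
    ("Store B", "north", "2014-11", "coca cola", 50),
    ("Store C", "south", "2014-11", "coca cola", 90),
    ("Store B", "north", "2014-11", "coca cola", 5),
]
conn.executemany(
    "INSERT INTO sales (store, area, month, product, qty) VALUES (?, ?, ?, ?, ?)",
    rows,
)

LAST_MONTH = "2014-11"  # assumed value of "last month" for this example

# OLTP-style question: one recent transaction, touching a single row.
last_sale = conn.execute(
    "SELECT qty FROM sales WHERE product = 'coca cola' ORDER BY id DESC LIMIT 1"
).fetchone()[0]

# OLAP question 1: total sold last month at stores in the north.
north_total = conn.execute(
    "SELECT SUM(qty) FROM sales "
    "WHERE product = 'coca cola' AND area = 'north' AND month = ?",
    (LAST_MONTH,),
).fetchone()[0]

# OLAP question 2: the northern store that sold the most last month.
best_store = conn.execute(
    "SELECT store FROM sales "
    "WHERE product = 'coca cola' AND area = 'north' AND month = ? "
    "GROUP BY store ORDER BY SUM(qty) DESC LIMIT 1",
    (LAST_MONTH,),
).fetchone()[0]

# OLAP question 3: the month with the highest sales across the province.
best_month = conn.execute(
    "SELECT month FROM sales WHERE product = 'coca cola' "
    "GROUP BY month ORDER BY SUM(qty) DESC LIMIT 1"
).fetchone()[0]
```

The OLTP query reads one row, while each OLAP query scans and aggregates many rows over a time range. That difference in access pattern is why the two workloads are usually served by separate systems.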

17. Applications of data warehouses

    • Airlines
    • Banking
    • Health care
    • Investment
    • Insurance
    • Retail
    • Telecommunications
    • Manufacturers
    • Credit card suppliers
    • Clothing distributors

[Chart: percentage market coverage by sector (Financial, Retail, Telecom, Manufacturing, Other)]

18. Data warehouse: definitions

  • W.H. Inmon: a data warehouse is a
    • Subject-oriented,
    • Integrated,
    • Time-variant,
    • Non-volatile
    • Collection of data in support of management's decision-making process

19. Data warehouse: definitions

  • Subject-oriented
    • Shifts from an application orientation to a decision-support orientation
  • Integrated
  • Time-variant
    • Allows data to be compared over time
  • Non-volatile: data is only added, never replaced

20. Data warehouse: definitions

[Diagram: a data warehouse is Subject Oriented, Integrated, Time Variant, and Non Volatile]

21. Subject-oriented

  • Organized around major subjects such as customer, product, and sales.
  • Focuses on modeling and analyzing data for decision makers rather than on daily operations or transaction processing.
  • Provides a simple and concise view of the events surrounding each subject.

22. Subject Oriented

  • Data is categorized and stored by business subject rather than by application.

[Diagram: operational systems (Savings, Shares, Loans, Insurance, Equity Plans) versus data warehouse subject areas (Customer, Product, Sales Information)]

23. Subject Areas

  • Typical subjects:
    • Customer accounts
    • Sales
    • Customer savings
    • Insurance claims
    • Passenger reservations

24. Integrated

  • Built by integrating data from multiple heterogeneous data sources
    • Relational databases, flat files, online transaction records.
  • Data cleaning and data integration techniques are applied
    • They ensure consistency in naming conventions, encoding structures, and attribute units of measure across the different sources
      • For example, hotel price: currency, tax, breakfast covered, ...
    • Data is converted when it is moved into the data warehouse.
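
The hotel price example can be sketched as a conversion to one canonical form when data is loaded. The code below is an illustration only: the exchange rate, the tax rate, the flat breakfast value, and the choice of "USD, tax included, breakfast excluded" as the canonical form are all assumptions, not values from the slides.

```python
from dataclasses import dataclass

# Assumed, illustrative conversion rates to USD; not real market data.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.25}
# Assumed flat, tax-inclusive value of an included breakfast, in USD.
BREAKFAST_USD = 10.0


@dataclass
class HotelPrice:
    """A price as reported by one source system, with its own conventions."""
    amount: float
    currency: str
    tax_included: bool
    tax_rate: float
    breakfast_included: bool


def to_canonical(price: HotelPrice) -> float:
    """Convert to the assumed canonical form: USD, tax included, no breakfast."""
    usd = price.amount * RATES_TO_USD[price.currency]
    if not price.tax_included:
        usd *= 1 + price.tax_rate
    if price.breakfast_included:
        usd -= BREAKFAST_USD
    return round(usd, 2)


# Two hypothetical sources quoting the same room in different conventions.
source_a = HotelPrice(100.0, "USD", tax_included=False, tax_rate=0.10,
                      breakfast_included=False)
source_b = HotelPrice(96.0, "EUR", tax_included=True, tax_rate=0.10,
                      breakfast_included=True)

canonical_a = to_canonical(source_a)
canonical_b = to_canonical(source_b)
```

The raw figures (100 and 96) are not comparable, but after conversion both sources yield the same canonical price, so analyses in the warehouse compare like with like.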

25. Integrated

[Diagram: in the operational environment, the subject Customer is spread across the Savings, Current Accounts, and Loans applications; the data warehouse holds it with no application flavor]

26. Integrated Data

  • Data is consolidated from different sources
  • It forms an accurate, high-quality, and consistent body of information
  • Standardization of:
    • Naming conventions
    • Attributes
    • Units of measure
  • A cleaning and integration process

27. Time Variant

  • Data is stored as a series of snapshots, each representing a period of time.

[Diagram: data over time in the data warehouse, with snapshots for 01/97, 02/97, and 03/97 (data for January, February, and March)]

28. Time Variant

  • An important requirement for a data warehouse is a time horizon much longer than that of operational systems.
    • Operational database: data holds current values
    • Data warehouse: provides historical information (for example, the past 5-10 years)
  • The time element is stored in the database
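
The difference between current-value data and time-stamped snapshots can be shown with a minimal sketch. The account name, the balances, and the helper functions below are hypothetical; the periods follow the 01/97 to 03/97 labels used in the diagrams.

```python
# Operational store: keeps only the current value per account.
operational = {}

# Warehouse: keeps every (period, account, balance) snapshot.
warehouse = []


def record_balance(period, account, balance):
    """Overwrite the operational value, but append a snapshot to the warehouse."""
    operational[account] = balance
    warehouse.append((period, account, balance))


def balance_as_of(account, period):
    """Return the snapshot for a given period, or None if none was stored."""
    matches = [b for (p, a, b) in warehouse if a == account and p == period]
    return matches[-1] if matches else None


record_balance("1997-01", "acct-1", 100)
record_balance("1997-02", "acct-1", 140)
record_balance("1997-03", "acct-1", 90)

current = operational["acct-1"]
january = balance_as_of("acct-1", "1997-01")
```

The operational store can only say what the balance is now, while the warehouse can still answer what it was in January, because the period is stored as part of each row.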

[Diagram: data over time, with snapshots for 01/97, 02/97, and 03/97 (data for January, February, and March)]

29. Non Volatile

  • Typically, data in the data warehouse is not updated or deleted.

[Diagram: operational databases support Read, INSERT, UPDATE, and DELETE; the warehouse database supports Load and Read]

30. Non Volatile

[Diagram: operational databases feed the warehouse database through a first-time load followed by repeated refreshes, with purge or archive]

31. Non-Volatile

  • A physical store of data transformed from the operational environment.
  • Operational updates of data do not occur in the data warehouse environment.
    • No transaction processing, recovery, or concurrency control mechanisms are required.
    • Only two data access operations are required:
      • Loading data and reading data.
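
A store that offers only the two operations named above, load and read, can be sketched as a small append-only class. The class name `WarehouseTable`, its methods, and the sample rows are hypothetical; the sketch only illustrates the interface, not how a real warehouse engine is built.

```python
class WarehouseTable:
    """Append-only table exposing just two operations: load and read."""

    def __init__(self):
        self._rows = []

    def load(self, batch):
        """Append a batch of rows (initial load or a periodic refresh)."""
        self._rows.extend(dict(row) for row in batch)
        return len(batch)

    def read(self, predicate=lambda row: True):
        """Return copies of matching rows so callers cannot alter stored data."""
        return [dict(row) for row in self._rows if predicate(row)]


table = WarehouseTable()
table.load([{"month": "1997-01", "sales": 10}])  # first-time load
table.load([{"month": "1997-02", "sales": 12}])  # later refresh

# Changing a returned row does not change what the table holds.
result = table.read(lambda row: row["sales"] > 10)
result[0]["sales"] = 0
stored = table.read(lambda row: row["month"] == "1997-02")[0]["sales"]
```

Because there is no update or delete method, the class needs no locking or rollback logic, which mirrors the point that a non-volatile store can do without transaction processing, recovery, and concurrency control.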

32. Data warehouse: definitions (cont.)

  • Pandora, Swinburn University
    • A method for connecting data from many different systems.
    • A point