Applying OLAP Techniques and Data Warehousing in Financial Forecasting

Upload: do-quang-tuan

Post on 28-Oct-2015


  • 2

    VIETNAM NATIONAL UNIVERSITY, HANOI
    UNIVERSITY OF ENGINEERING AND TECHNOLOGY

    Vũ Ngọc Anh

    APPLYING OLAP TECHNIQUES AND DATA WAREHOUSING IN FINANCIAL FORECASTING

    UNDERGRADUATE GRADUATION THESIS, FULL-TIME PROGRAM

    Major: Information Systems

    HANOI - 2010

  • 3

    VIETNAM NATIONAL UNIVERSITY, HANOI
    UNIVERSITY OF ENGINEERING AND TECHNOLOGY

    Vũ Ngọc Anh

    APPLYING OLAP TECHNIQUES AND DATA WAREHOUSING IN FINANCIAL FORECASTING

    UNDERGRADUATE GRADUATION THESIS, FULL-TIME PROGRAM

    Major: Information Systems

    Supervisor: Dr. Nguyễn Hà Nam

    Co-supervisor: Nguyễn Thu Trang, MSc

    HANOI - 2010

  • 4

    Acknowledgements

    First of all, I would like to express my thanks and deep gratitude to Dr. Nguyễn Hà Nam and Nguyễn Thu Trang, MSc, who patiently advised and guided me throughout the work on this graduation thesis.

    I sincerely thank the lecturers who gave me favorable conditions for studying and doing research at the University of Engineering and Technology.

    I thank my friends in the Data Warehouse and OLAP group, who discussed and exchanged ideas with me and helped me a great deal while I was collecting materials.

    I owe boundless thanks to my family, friends and loved ones, who were always at my side and encouraged me throughout the work on this thesis.

    Thank you all sincerely!

    Student

    Vũ Ngọc Anh

  • 1

    Table of contents

    Table of contents (p. 1)
    List of figures (p. 3)
    List of abbreviations (p. 5)
    Introduction (p. 6)
    Chapter 1. Introduction to data warehouses and financial data (p. 7)
        1.1. Data in the financial domain (p. 7)
        1.2. Data warehouse (p. 8)
            1.2.1. The data warehouse (p. 8)
            1.2.2. Purposes of a data warehouse (p. 9)
            1.2.3. Benefits of a data warehouse (p. 9)
            1.2.4. Components of a data warehouse (p. 10)
            1.2.5. Structure of a data warehouse (p. 11)
            1.2.6. Entity models in a data warehouse (p. 12)
            1.2.7. Application areas of data warehouses (p. 15)
    Chapter 2. OLAP analysis techniques (p. 16)
        2.1. Introduction to OLAP (p. 16)
        2.2. The multidimensional data model (p. 16)
        2.3. OLAP cube architecture (p. 18)
        2.4. OLAP versus OLTP (p. 19)
        2.5. Components of OLAP (p. 20)
        2.6. Converting data from OLTP to OLAP (p. 21)
        2.7. Storage models supporting OLAP (p. 22)
            2.7.1. Multidimensional OLAP (MOLAP) (p. 22)
            2.7.2. Relational OLAP (ROLAP) (p. 23)
            2.7.3. Hybrid OLAP (HOLAP) (p. 24)
            2.7.4. Comparison of the models (p. 25)
    Chapter 3. The Pentaho toolset (p. 26)
        3.1. Overview (p. 26)
        3.2. Pentaho's BI capabilities (p. 26)

  • 2

        3.3. Features and benefits (p. 29)
    Chapter 4. The problem implemented on Pentaho and the results obtained (p. 33)
        4.1. Problem statement (p. 33)
        4.2. Data collection and processing (p. 33)
        4.3. Creating the data warehouse (p. 36)
        4.4. Processing the data with OLAP (p. 42)
            4.4.1. Creating the cube (p. 42)
            4.4.2. Analysis View (p. 43)
    Conclusion (p. 52)
    References (p. 53)

  • 3

    List of figures

    Figure 1. Components of a data warehouse (p. 11)
    Figure 2. Star schema (p. 13)
    Figure 3. Snowflake schema (p. 14)
    Figure 4. Fact constellation schema (p. 15)
    Figure 5. Illustration of business dimensions (p. 17)
    Figure 6. MOLAP data model (p. 22)
    Figure 7. ROLAP data model (p. 23)
    Figure 8. HOLAP data model (p. 24)
    Figure 9. Pentaho architecture (p. 26)
    Figure 10. Exchange-rate data (p. 33)
    Figure 11. Gold price data (p. 34)
    Figure 12. Oil price data (p. 35)
    Figure 13. VnIndex data (p. 35)
    Figure 14. Consolidated data (p. 36)
    Figure 15. Data warehouse model (p. 37)
    Figure 16. Spoon workspace (p. 37)
    Figure 17. Importing data in Spoon (p. 38)
    Figure 18. Combination Lookup/Update (p. 38)
    Figure 19. Changing attributes (p. 39)
    Figure 20. Database connection (p. 39)
    Figure 21. Creating the Dim_time table (p. 40)
    Figure 22. Creating the dim_factor table (p. 40)
    Figure 23. Creating the Table Output (p. 41)

  • 4

    Figure 24. Creating the fact_price table (p. 41)
    Figure 25. Loading the data (p. 42)
    Figure 26. Database connection (p. 42)
    Figure 27. Cube architecture (p. 43)
    Figure 28. Repository Login (p. 43)
    Figure 29. Database connection (p. 44)
    Figure 30. Pentaho workspace (p. 45)
    Figure 31. Selecting the schema and cube (p. 45)
    Figure 32. Schema and cube data (p. 45)
    Figure 33. Analysis content (p. 46)
    Figure 34. Selecting Measures (p. 46)
    Figure 35. Selecting the factor (p. 46)
    Figure 36. Selecting the year to analyze (p. 47)
    Figure 37. Selecting day and month detail (p. 47)
    Figure 38. Selecting the chart type (p. 48)
    Figure 39. USD/VND exchange-rate chart (p. 48)
    Figure 40. Gold price chart (p. 49)
    Figure 41. Oil price chart (p. 49)
    Figure 42. VnIndex chart (p. 50)
    Figure 43. Gold price and oil price chart (p. 50)
    Figure 44. Exchange rate and gold price chart (p. 51)
    Figure 45. Gold price and VNIndex chart (p. 51)

  • 5

    List of abbreviations

    OLAP     Online Analytical Processing

    MOLAP    Multidimensional Online Analytical Processing

    ROLAP    Relational Online Analytical Processing

    HOLAP    Hybrid Online Analytical Processing

    BI       Business Intelligence

    OLTP     Online Transaction Processing

  • 6

    Introduction

    As information technology is applied ever more widely across almost every area of life, the economy and society, the volume of data collected over time keeps growing. An essential requirement for businesses is therefore to exploit this data effectively so that it serves the business better and better.

    This thesis, titled "Applying OLAP techniques and data warehousing in financial forecasting", introduces data warehouses and the OLAP method, and applies them to analyzing movements in the oil price, the gold price and the VNIndex using the Pentaho toolset.

    The thesis consists of four chapters:

    Chapter 1, "Introduction to data warehouses and financial data", describes the characteristics of financial data and gives an overview of data warehouses: their structure, their components, how they are designed and where they are applied.

    Chapter 2, "OLAP analysis techniques", gives an overview of OLAP, the storage models that support it, the advantages and disadvantages of each model, and the steps for moving data from OLTP to OLAP.

    Chapter 3, "The Pentaho toolset", gives an overview of Pentaho: its architecture, technology and utilities.

    Chapter 4, "The problem implemented on Pentaho and the results obtained", deploys Pentaho on a real problem and applies data warehouse and OLAP techniques to carry it out.

    The conclusion summarizes the results and main contributions of the thesis.

  • 7

    Chapter 1. Introduction to data warehouses and financial data

    1.1. Data in the financial domain

    Because it computes accurately, quickly and objectively, information technology was applied widely in finance from very early on.

    Data in the financial domain has the following characteristics:

    - It changes constantly

    - It is distributed

    - Transactions overlap

    - The number of transactions is large

    An effective data storage strategy is therefore needed. Systems that can cope with these characteristics belong to the family of online transaction processing (OLTP) systems [4].

    OLTP applications give users direct access to information in a client/server arrangement. An OLTP operation is a sequence of steps: gathering the input data, processing it, and updating the existing data with the newly entered and processed data.
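    The gather, process, update sequence can be sketched as one short transaction. This is only an illustration under stated assumptions: the `account` table and the `open_bank`, `transfer` and `balances` helpers are hypothetical and do not come from the thesis, and an in-memory SQLite database stands in for a client/server OLTP system.

```python
import sqlite3

def open_bank():
    """Create an in-memory database with a hypothetical account table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """One OLTP transaction: gather the input, process it, update stored data."""
    # Gathering: the inputs are src, dst and amount; read one row by primary key.
    (balance,) = conn.execute(
        "SELECT balance FROM account WHERE id = ?", (src,)).fetchone()
    # Processing: validate the request against the current state.
    if amount <= 0 or balance < amount:
        return False
    # Updating: apply both changes atomically. The context manager commits,
    # or rolls back if an exception is raised inside the block.
    with conn:
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
    return True

def balances(conn):
    """Return {account id: balance} for inspection."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
```

    Each call touches only a couple of rows located by key, which is the access pattern the text attributes to OLTP.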

    OLTP is an effective approach when users need to:

    - Process incoming data whose volume and frequency cannot be estimated in advance.

    - Get immediate access to up-to-date data that reflects earlier transactions.

    - Change data immediately to reflect the transaction just processed.

  • 8

    Basic functions of OLTP [4]: besides access to and updating of shared data, OLTP systems give users online access, immediate access (availability), fast response, and a low cost per transaction.

    Simple business questions such as "What is this month's revenue?" or "How many products were sold this month?" concern individual products and detailed figures, and an OLTP system answers them quickly. Senior managers, however, do not need data at that level of detail. They want information that supports planning and leadership, for example: "This item sells well in this region; will it also sell well in another region?"

    Answering such questions from an OLTP system is difficult and inefficient, because OLTP data is too detailed and is stored in a distributed way. To solve this problem the data warehouse was introduced, together with OLAP and data mining techniques that help senior managers answer the questions they ask.

    1.2. Data warehouse

    1.2.1. The data warehouse

    A data warehouse is a collection of core computer-based information that is critical to the initial success of a business [1].

    A data warehouse, more accurately called an information warehouse, is a subject-oriented database designed with enterprise-wide access in mind. It provides tools that meet the information needs of business managers at every level of the organization: not only complex data requests, but also the most convenient way to retrieve information quickly and accurately. A data warehouse is designed so that users can recognize the information they want and reach it with simple tools [9].

    A data warehouse is a blend of many technologies, including multidimensional and relational databases, client/server architecture, graphical user interfaces and more. Unlike data in operational systems, data in a data warehouse is read-only and cannot be modified. Operational systems create, modify and delete the production data that feeds the warehouse. The main reason for building a data warehouse is to integrate data from many different sources into a single, dense store that supports analysis and decision making in the business.

  • 9

    For businesses in which information is a highly valuable resource, a data warehouse is rather like a goods warehouse. Operational systems produce pieces of data and load them into the warehouse. Some pieces are summarized into information components and stored there. Users of the warehouse place requests and are supplied with products assembled from the components and segments held in the warehouse.

    A well-defined, efficiently run data warehouse can become a highly valuable competitive tool in business.

    1.2.2. Purposes of a data warehouse

    The main goals of a data warehouse are to:

    - Be able to satisfy every information request from users

    - Help the organization's staff do their work well and efficiently

    - Help organizations define, manage and run projects and business operations effectively and accurately.

    - Integrate data and metadata from many different sources.

    To achieve these goals, a data warehouse must:

    - Improve data quality by cleansing the data and orienting it toward specific subjects

    - Consolidate and link data

    - Synchronize the data sources

    - Delimit and unify the operational database systems

    - Manage metadata

    - Provide information that is integrated, summarized or linked, and organized by subject

    - Be usable in decision support systems.

    1.2.3. Benefits of a data warehouse

    More cost-effective decision making. A data warehouse frees the staff and computing resources that would otherwise be needed to serve queries and reports against operational and production databases. This yields significant savings.

  • 10

    A data warehouse also removes the drain on scarce production-system resources caused by long-running programs or by complex reports and queries.

    Better business intelligence. The quality and flexibility of business analysis increase because of the multi-level data structures of the warehouse, which supply data ranging from the detailed level of the business up to higher, more general levels. Accurate and reliable data is assured because the warehouse holds only high-quality, stable data (trusted data).

    Improved customer service. A business can maintain better relationships with its customers because all customer data is correlated through a single data warehouse.

    Business process re-engineering. Allowing unlimited analysis of business information often provides insight into every aspect of how the business operates, which can yield ideas for re-engineering those processes. Only by defining precisely what is needed from the data warehouse can the limitations and goals of the business be assessed more accurately.

    Information system re-engineering. A data warehouse is the foundation for the data requirements of every business area. It offers a cost-effective way to establish both data standardization and standardization of operational systems in line with international standards.

    1.2.4. Components of a data warehouse

    Current detail

    The heart of a data warehouse is its current detail, where most of the data is stored. Current detail comes directly from the operational systems and may be stored as raw data or as aggregations of raw data.

  • 11

    Current detail is the lowest-level core of data in the warehouse. Each data entity in current detail is a snapshot, taken at one moment in time, of the point at which the data was accurate. Current detail typically covers two to five years. It is refreshed as often as necessary to support business requirements.

    System of record

    A system of record is the best, or "rightest", source of data for feeding the data warehouse. The rightest data is the most timely, most complete, most accurate data, and the data that conforms best structurally to the warehouse. It is usually closest to the point of entry in the production environment. In other cases a system of record may be a store that already contains summarized data.

    1.2.5. Structure of a data warehouse

    A data warehouse may have some of the following structural parts:

    Figure 1. Components of a data warehouse (boxes in the diagram: Source, Relational Data Store, Data Marts and Cubes, Clients)

  • 12

    Physical data warehouse

    The physical database in which all of the warehouse data is stored, along with the metadata and the processing logic for filtering, organizing, packaging and processing the detailed data.

    Logical data warehouse

    It also contains metadata, including business rules and the processing logic for filtering, organizing, packaging and processing data, but it does not contain the actual data. Instead it holds the information needed to access the data wherever it resides.

    Data mart

    A subset of an enterprise-wide data warehouse. It typically serves a large component of the organization (a division, a region, a function, and so on). In short, data marts are specialized parts of the data warehouse.

    1.2.6. Entity models in a data warehouse

    The entity-relationship (ER) model is widely used for OLTP databases. It is not well suited to data warehouse design, however, because queries have to touch too many different tables. Most data warehouses use the star schema. This model consists of exactly one fact table plus one dimension table for each dimension. The fact table holds foreign key fields that reference the primary keys of the dimension tables. An example of a star schema:

  • 13

    The star schema does not support tables with hierarchical attributes well. The snowflake schema offers a solution for the star schema when a table has hierarchical attributes.

    Fact table: OrderNo, SalespersonID, CustomerNo, ProdNo, DateKey, CityName, Quantity, TotalPrice
    Orders: OrderNo, OrderDate
    Customers: CustomerNo, CustomerName, CustomerAddress, City
    Salespersons: SalespersonID, SalespersonName, City, Quota
    Products: ProdNo, ProdName, ProdDescr, Category, CategoryDescr, UnitPrice, QOH
    Date: DateKey, Date, Month, Year
    City: CityName, State, Country

    Figure 2. Star schema
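    The star layout can be tried out on a small scale. The sketch below, using SQLite, keeps only part of the Figure 2 schema (the fact table plus the Products, City and Date dimensions) and drops several columns. The sample rows are invented for illustration, and the Date table is renamed `DateDim` here purely to keep the table name distinct from its `Date` column.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Products (ProdNo INTEGER PRIMARY KEY, ProdName TEXT, Category TEXT, UnitPrice REAL);
CREATE TABLE City     (CityName TEXT PRIMARY KEY, State TEXT, Country TEXT);
CREATE TABLE DateDim  (DateKey INTEGER PRIMARY KEY, Date TEXT, Month INTEGER, Year INTEGER);
CREATE TABLE Fact (
    OrderNo    INTEGER,
    ProdNo     INTEGER REFERENCES Products(ProdNo),
    DateKey    INTEGER REFERENCES DateDim(DateKey),
    CityName   TEXT    REFERENCES City(CityName),
    Quantity   INTEGER,
    TotalPrice REAL
);
"""

def build_star():
    """Create the reduced star schema and load a few invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Products VALUES (?, ?, ?, ?)",
                     [(1, "Pen", "Stationery", 2.0), (2, "Lamp", "Furniture", 20.0)])
    conn.executemany("INSERT INTO City VALUES (?, ?, ?)",
                     [("Hanoi", "Hanoi", "Vietnam"), ("Hue", "Thua Thien Hue", "Vietnam")])
    conn.executemany("INSERT INTO DateDim VALUES (?, ?, ?, ?)",
                     [(20100105, "2010-01-05", 1, 2010), (20100210, "2010-02-10", 2, 2010)])
    conn.executemany("INSERT INTO Fact VALUES (?, ?, ?, ?, ?, ?)",
                     [(1, 1, 20100105, "Hanoi", 10, 20.0),
                      (2, 2, 20100105, "Hue", 1, 20.0),
                      (3, 1, 20100210, "Hanoi", 5, 10.0)])
    return conn

def revenue_by_category_and_month(conn):
    """Each dimension is reached from the fact table with exactly one join."""
    return conn.execute("""
        SELECT p.Category, d.Month, SUM(f.TotalPrice)
        FROM Fact f
        JOIN Products p ON p.ProdNo = f.ProdNo
        JOIN DateDim d  ON d.DateKey = f.DateKey
        GROUP BY p.Category, d.Month
        ORDER BY p.Category, d.Month
    """).fetchall()
```

    Every analytical question takes the same form: start from the fact table, join the dimensions that matter, then group.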

  • 14

    This makes the dimension tables easier to maintain. However, the flat (denormalized) dimension tables of a star schema may be more suitable for browsing the dimensions.

    The fact constellation is an example of a more complex structure, in which there is more than one fact table. Any star schema can be built up into a fact constellation, for example by splitting the original star schema into several star schemas, each describing facts at a different level of the dimension hierarchies. A fact constellation consists of multiple fact tables that share dimension tables.

    Fact table: OrderNo, SalespersonID, CustomerNo, DateKey, CityName, ProdNo, Quantity, TotalPrice
    Orders: OrderNo, OrderDate
    Customers: CustomerNo, CustomerName, CustomerAddress, City
    Salesperson: SalespersonID, SalespersonName, City, Quota
    Products: ProdNo, ProdName, ProdDescr, Category, UnitPrice, QOH
    Category: CategoryName, CategoryDescr
    Date: DateKey, Date, Month
    Month: Month, Year
    City: CityName, State
    Year, State: further hierarchy levels shown in the diagram

    Figure 3. Snowflake schema
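    The trade-off between the star and snowflake layouts can be seen on the Products and Category tables of Figure 3. The sketch below is a reduced, hypothetical version with invented rows: the category description is stored once in `Category`, so maintaining it touches one row, but a query that needs it must follow one more join than in the star layout.

```python
import sqlite3

def build_snowflake():
    """Create a reduced snowflake: Fact -> Products -> Category."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Category (CategoryName TEXT PRIMARY KEY, CategoryDescr TEXT);
        CREATE TABLE Products (ProdNo INTEGER PRIMARY KEY, ProdName TEXT,
                               Category TEXT REFERENCES Category(CategoryName));
        CREATE TABLE Fact (OrderNo INTEGER, ProdNo INTEGER REFERENCES Products(ProdNo),
                           Quantity INTEGER);
    """)
    conn.executemany("INSERT INTO Category VALUES (?, ?)",
                     [("Stationery", "Office supplies"),
                      ("Furniture", "Home and office furniture")])
    conn.executemany("INSERT INTO Products VALUES (?, ?, ?)",
                     [(1, "Pen", "Stationery"), (2, "Pencil", "Stationery"),
                      (3, "Lamp", "Furniture")])
    conn.executemany("INSERT INTO Fact VALUES (?, ?, ?)", [(1, 1, 10), (2, 2, 4), (3, 3, 1)])
    return conn

def quantity_by_category_descr(conn):
    """Reaching the description takes two joins: Fact -> Products -> Category."""
    return conn.execute("""
        SELECT c.CategoryDescr, SUM(f.Quantity)
        FROM Fact f
        JOIN Products p ON p.ProdNo = f.ProdNo
        JOIN Category c ON c.CategoryName = p.Category
        GROUP BY c.CategoryDescr
        ORDER BY c.CategoryDescr
    """).fetchall()

def rename_category_descr(conn, name, descr):
    """Maintenance touches one row, however many products share the category."""
    cur = conn.execute("UPDATE Category SET CategoryDescr = ? WHERE CategoryName = ?",
                       (descr, name))
    return cur.rowcount
```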

  • 15

    Figure 4. Fact constellation schema

    1.2.7. Application areas of data warehouses

    Areas in which data warehouses are currently applied include:

    - E-commerce.

    - Enterprise resource planning.

    - Customer relationship management.

    - Health care.

    - Telecommunications.

  • 16

    Chapter 2. OLAP analysis techniques

    2.1. Introduction to OLAP

    OLAP is a technique that uses multidimensional representations of data, called cubes, to provide fast access to the data in a data warehouse. According to Hari Mailvaganam [5], cubes are built over the data in the dimension tables and fact tables of the warehouse, and they give client applications the ability to run sophisticated queries and analyses. While the data warehouse and data marts store the data for analysis, OLAP is the technique that lets client applications access that data efficiently. OLAP offers analysts many benefits, for example:

    - An intuitive multidimensional data model that makes it easy to select, navigate and explore the data.

    - An analytical query language with the power to explore relationships in complex business data.

    - Precomputed data for frequently issued queries, so that response times for ad hoc queries are very short.

    - Powerful tools that help users create new views of the data based on a set of special calculation functions.

    OLAP was devised to handle queries over very large volumes of data, queries that would either return no result or take a very long time if they were run on an OLTP system.

    2.2. The multidimensional data model

    Business managers tend to think multidimensionally. For example, they tend to describe what their company does like this: "We sell products in many different markets, and we measure our performance over time."

    Data warehouse designers listen carefully to these words and add their own emphasis: "We sell PRODUCTS in many different MARKETS, and we measure our performance over TIME."

  • 17

    Intuitively, the business can be pictured as a cube of data with labels on each edge (see the figure below). The points inside the cube are the intersections of the edges. For the business description above, the edges of the cube are Product, Market and Time. Most people can quickly understand and imagine that the points inside the cube are the business performance measures for each combination of Product, Market and Time values [5].

    Figure 5. Illustration of business dimensions (cube edges: Market, Time, Product)

    A data cube need not be three-dimensional; in general it can have N dimensions. The edges of the cube are called dimensions: the perspectives or entities for which the organization wants to keep records. Each dimension can have an associated dimension table that describes it. For example, a dimension table for Product may contain attributes such as Ma_sanpham (product code), Mo_ta (description), Ten_sanpham (product name) and Loai_SP (product type), which can be specified by administrators or data analysts. For dimensions without such categories, like Time, the warehouse system can generate the corresponding dimension table automatically from the data type. The Time dimension is in practice especially significant for decision support in trend analysis. It is usually desirable for it to carry some knowledge of the calendar and of other aspects of time.

  • 18

    Furthermore, a data cube in a data warehouse is mostly built to measure the company's performance. A multidimensional data model is therefore organized around a subject represented by a fact table holding several numeric measures, which are the objects of analysis. For example, a fact table may contain the number of items sold, revenue, stock on hand, budget, and so on. Each numeric measure depends on a set of dimensions that provide its context. Together, the dimensions are taken to identify the measure uniquely as a value in multidimensional space. For example, one combination of Product, Time and Market at a given moment yields a measure that is distinct from the measures of all other combinations.
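    A minimal way to picture "the dimensions identify the measure" is a mapping from a coordinate tuple to a value. The fact rows and the `build_cube` helper below are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, market, month, units_sold)
FACT_ROWS = [
    ("Pen", "Hanoi", "2010-01", 10),
    ("Pen", "Hanoi", "2010-01", 5),
    ("Pen", "Hue", "2010-01", 7),
    ("Lamp", "Hanoi", "2010-02", 1),
]

def build_cube(rows):
    """Map each (product, market, month) coordinate to its summed measure."""
    cells = defaultdict(int)
    for product, market, month, units in rows:
        cells[(product, market, month)] += units
    return dict(cells)

unit_cube = build_cube(FACT_ROWS)
```

    Two fact rows with the same coordinate fall into the same cell, so each combination of Product, Market and Time carries exactly one value.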

    Dimensions are hierarchical by nature. For example, the Time dimension can be described by the attributes Year, Quarter, Month and Day. Alternatively, the attributes of a dimension can be organized into a lattice, which defines only a partial order on the dimension. The same Time dimension can thus be organized into Year, Quarter, Month, Week and Day. With this arrangement the Time dimension is no longer a strict hierarchy, because some weeks of the year belong to more than one month.
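    The claim about weeks can be checked with the calendar itself. The sketch below assumes ISO 8601 week numbering, a choice the thesis does not specify, and the helper names are hypothetical.

```python
from datetime import date, timedelta

def months_in_iso_week(year, week):
    """Return the sorted (year, month) pairs touched by one ISO week."""
    monday = date.fromisocalendar(year, week, 1)
    days = [monday + timedelta(days=i) for i in range(7)]
    return sorted({(d.year, d.month) for d in days})

def month_of(day):
    """Day -> Month is a function: every day has exactly one parent month."""
    return (day.year, day.month)
```

    A day always rolls up to a single month, but a week may roll up to two, which is why Week and Month sit side by side in a lattice instead of forming one chain.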

    So if each dimension contains several levels of abstraction, the data can be viewed from many flexible perspectives. Typical data cube operations such as roll-up (raising the level of abstraction), drill-down (lowering the level of abstraction, that is, increasing detail), slice and dice (selection and projection) and pivot (reorienting the multidimensional view of the data) make interactive querying and analysis very convenient. These operations are known as Online Analytical Processing (OLAP).
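    The operations named above can be sketched on a cube stored as a dictionary of cells. Everything here (the cell data, the dimension names and the function names) is a hypothetical illustration and not the API of any OLAP product. Drill-down is simply `roll_up` called with a longer list of dimensions.

```python
from collections import defaultdict

DIMS = ("product", "market", "year", "quarter")
# Hypothetical cells: (product, market, year, quarter) -> units sold
CELLS = {
    ("Pen", "Hanoi", 2009, "Q4"): 8,
    ("Pen", "Hanoi", 2010, "Q1"): 10,
    ("Pen", "Hue", 2010, "Q1"): 7,
    ("Lamp", "Hanoi", 2010, "Q1"): 1,
    ("Lamp", "Hue", 2010, "Q2"): 2,
}

def roll_up(cells, keep):
    """Aggregate away every dimension not listed in keep."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for coord, value in cells.items():
        out[tuple(coord[i] for i in idx)] += value
    return dict(out)

def slice_(cells, dim, member):
    """Slice: fix one dimension to a single member."""
    i = DIMS.index(dim)
    return {c: v for c, v in cells.items() if c[i] == member}

def dice(cells, **members):
    """Dice: restrict several dimensions to sets of allowed members."""
    return {c: v for c, v in cells.items()
            if all(c[DIMS.index(d)] in allowed for d, allowed in members.items())}

def pivot(table):
    """Pivot a two-dimensional result by swapping its two axes."""
    return {(b, a): v for (a, b), v in table.items()}

by_product_year = roll_up(CELLS, ("product", "year"))
```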

    Decision makers often ask questions of the form "compute and rank the total quantity of goods sold by country (or by year)". They also want to compare two numeric measures, such as sales quantity and budget, aggregated over the same dimensions. A distinguishing feature of the multidimensional data model is therefore its emphasis on aggregating measures by one or more dimensions, which is one of the key operations for speeding up query processing.

    2.3. OLAP cube architecture

    The central object of OLAP is the cube, a multidimensional representation of detailed and summarized data. A cube consists of a data source, dimensions, measures and partitions. Cubes are designed according to users' analysis requirements. One data warehouse can support many different cubes, such as a Sales cube, an Inventory cube, and so on.

  • 19

    The data source of a cube identifies where the warehouse that supplies the cube's data is stored. Dimensions are mapped from the information in the warehouse's dimension tables onto hierarchy levels. A Geography dimension, for example, has levels such as Continent, Country and Province/City. Dimensions can be created independently and shared between cubes, which makes cubes easier to build and ensures that summary information for analysis stays consistent. For example, if a shared dimension holds a product hierarchy and is used in all cubes, the structure of the summarized product information will be consistent across every cube that uses it.

    A virtual dimension is a special kind of dimension that maps properties of the members of another dimension so that they can then be used in cubes. For example, a virtual dimension on the product size property lets a cube aggregate data such as the number of products sold by size, or the number of shirts sold by style and by size. Virtual dimensions and member properties are evaluated as needed for queries, and they do not require physical cube storage.

    Measures identify the numeric values from the fact table that are aggregated for analysis, such as selling price, cost or quantity sold.

    Partitions are the multidimensional storage containers that hold a cube's data. Every cube contains at least one partition, and a cube's data can be combined from several partitions. Each partition can draw its data from a different data source and can be stored in a separate location. The data of one partition can be updated independently of the other partitions in the cube. For example, a cube's data can be divided by time, with one partition holding the current year, another holding the previous year, and a third holding all earlier years.

    The partitions of a cube can be stored independently, in different storage modes and at different levels of summarization. Partitions are invisible to users, for whom a cube is a single object, and they provide a range of options for managing OLAP data.
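    The time-based partitioning example can be sketched as follows. The `PartitionedCube` class, the partition labels and the sample cells are hypothetical; the sketch only shows that partitions can be reloaded independently while queries still see a single cube.

```python
from collections import defaultdict

class PartitionedCube:
    """A cube whose cells live in separate partitions, keyed by a label."""

    def __init__(self):
        self.partitions = {}

    def load_partition(self, label, cells):
        # Replacing one partition leaves all the others untouched.
        self.partitions[label] = dict(cells)

    def total_by_product(self):
        # Queries see one cube: results are merged across every partition.
        totals = defaultdict(int)
        for cells in self.partitions.values():
            for (product, _year), units in cells.items():
                totals[product] += units
        return dict(totals)

sales_cube = PartitionedCube()
sales_cube.load_partition("older", {("Pen", 2007): 3, ("Pen", 2008): 4})
sales_cube.load_partition("previous_year", {("Pen", 2009): 8})
sales_cube.load_partition("current_year", {("Pen", 2010): 10, ("Lamp", 2010): 1})
```

    Refreshing only `current_year` is then a single `load_partition` call, and the historical partitions are never rewritten.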

    A virtual cube is a logical view of portions of one or more cubes. A virtual cube can be used to join different cubes that share a common dimension. For example, the Sales cube and the Inventory cube can be joined for a particular analysis while the two cubes are kept separate for simplicity. Dimensions and measures can be selected from the joined cubes for display in the virtual cube.
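    As a rough illustration, a virtual cube can be thought of as a view computed on request from two stored cubes that share coordinates. The two dictionaries and the `virtual_cube` function below are hypothetical.

```python
# Hypothetical stored cubes sharing the (product, month) dimensions.
SALES = {("Pen", "2010-01"): 10, ("Lamp", "2010-01"): 1}
INVENTORY = {("Pen", "2010-01"): 40, ("Lamp", "2010-01"): 5, ("Desk", "2010-01"): 2}

def virtual_cube(sales, inventory):
    """Logical view joining two cubes on their shared coordinates.

    The source cubes stay separate; the combined cells are built only when asked for.
    """
    view = {}
    for coord in sorted(set(sales) | set(inventory)):
        view[coord] = {
            "units_sold": sales.get(coord, 0),
            "units_in_stock": inventory.get(coord, 0),
        }
    return view
```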

    2.4. OLAP versus OLTP

    OLTP (Online Transaction Processing) applications typically automate the record-keeping tasks of an organization, such as taking orders and processing banking transactions, which are the day-to-day work of a commercial organization. These tasks read or update a few records located by their primary keys [5].

  • 20

    These tasks are structured and repetitive. They consist of short, atomic, isolated transactions and require detailed, up-to-date data. Operational databases tend to be hundreds of megabytes to gigabytes in size and store only current data. Consistency and recoverability of the database are critical, and maximizing transaction throughput is the key metric. The database is therefore designed to minimize concurrency conflicts.

    A data warehouse, in contrast, exists to support managers' decisions. Detailed, individual records matter less than historical, summarized and consolidated data. A data warehouse therefore usually contains data consolidated from one or more operational databases and collected over a long period. As a result, warehouses tend to be hundreds of gigabytes to terabytes in size, much larger than operational databases. A data warehouse must support complex queries with fast response times. Such queries may access millions of records and perform many scans, joins and aggregations. For a data warehouse, query throughput and response time matter more than transaction throughput, and OLAP is one of the tools that make these queries efficient. Because operational databases are built to support OLTP workloads well, trying to run complex OLAP queries against them results in unacceptable performance.

    2.5. Components of OLAP

    The components that OLAP uses to deliver its services include:

    - Data sources: OLTP databases and other valid data sources containing data that can be converted into OLAP data in the warehouse.

    - Staging area: where the collected data is stored and processed, then sorted, filtered and transformed into useful OLAP data.

    - Storage servers: the machines running the relational databases that hold the warehouse data, and the servers that manage the OLAP data (warehouse servers).

    - Business intelligence applications: the tools and applications that query the OLAP data and deliver reports and information to the organization's decision makers.

    - Metadata: objects such as the tables in OLTP databases, the cubes in the data warehouse, and the records that applications use to reference the different pieces of data.

  • 21

    2.6. Converting data from OLTP to OLAP

    Converting OLTP data into OLAP data in the data warehouse goes through the following processes:

    - Data consolidation: all data relating to a given kind of item (products, customers or employees) must be consolidated from multiple OLTP systems into a single OLAP system. The consolidation process must resolve encoding differences between the OLTP systems, match up common data used in both systems (for example by comparing similar fields), and convert data stored as different data types in each OLTP system into the single data type used in the OLAP system. The systems that supply input to an OLAP system need not be traditional OLTP systems. The data may be stored in many valid forms, such as Microsoft Excel worksheets in a shared file.

    - Data cleansing (scrubbing): consolidating OLTP data into a data warehouse provides an opportunity to cleanse it. Some OLTP systems spell items differently, or the consolidation process itself may introduce spelling errors. These inconsistencies must be corrected before the data can be loaded into the warehouse that serves the OLAP system.

    - Data aggregation: OLTP data records every detail of each transaction. OLAP queries only need summarized data, or data aggregated according to certain rules. For example, a query for last year's total monthly revenue per product runs faster if the database holds only daily (or hourly) revenue summary rows per product than if it has to scan every detailed record for the whole year. The degree of aggregation in the warehouse depends on a number of design factors (much as in object-oriented programming).

    - Data organization: when OLTP data is moved into the warehouse, it must be rearranged into a form better suited to decision-oriented analysis, so that less time is wasted. Setting up the warehouse includes reorganizing the OLTP data, stored in relational tables, into OLAP data stored in multidimensional cubes. The data is then loaded into the warehouse.

    - Truy cp v phn tch d liu: Khi d liu c ti vo kho lu tr, OLAP cung cp kh nng truy cp, xem, v phn tch d liu vi linh hot v hiu qu

  • 22

    cao. OLAP trnh by d liu thng qua m hnh d liu t nhin v trc quan, gip cho ngi s dng xem v hiu mt cch tt nht nhng thng tin trong kho lu tr. T cho php ngi s dng nhn bit c gi tr ca d liu.
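    The aggregation step described in this list can be sketched with a small in-memory example. This is a minimal illustration, not the procedure used in the thesis: the table names `sales_detail` and `sales_daily`, the column names, and the sample rows are all hypothetical, and SQLite (through Python's standard `sqlite3` module) stands in for the warehouse database.

```python
import sqlite3

# In-memory database standing in for the warehouse (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_detail (sold_at TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales_detail VALUES (?, ?, ?)",
    [
        ("2009-01-05 09:15", "A", 10.0),
        ("2009-01-05 14:40", "A", 15.0),
        ("2009-01-20 11:00", "A", 5.0),
        ("2009-02-03 10:30", "A", 20.0),
        ("2009-01-07 16:45", "B", 7.5),
    ],
)

# Aggregation step: collapse individual transactions into one row
# per product per day, as an ETL job would do before loading.
conn.execute(
    """
    CREATE TABLE sales_daily AS
    SELECT substr(sold_at, 1, 10) AS sale_date, product, SUM(amount) AS revenue
    FROM sales_detail
    GROUP BY sale_date, product
    """
)


def monthly_revenue(table, date_column, value_column):
    """Total revenue per (month, product), computed from the given table."""
    sql = (
        f"SELECT substr({date_column}, 1, 7) AS month, product, SUM({value_column}) "
        f"FROM {table} GROUP BY month, product ORDER BY month, product"
    )
    return conn.execute(sql).fetchall()


# Both tables give the same monthly totals; the summary table has fewer rows to scan.
from_detail = monthly_revenue("sales_detail", "sold_at", "amount")
from_daily = monthly_revenue("sales_daily", "sale_date", "revenue")
detail_rows = conn.execute("SELECT COUNT(*) FROM sales_detail").fetchone()[0]
daily_rows = conn.execute("SELECT COUNT(*) FROM sales_daily").fetchone()[0]
```

    With only five sample rows the saving is trivial, but the same query shape applies when a year of transactions is reduced to a few hundred daily rows per product.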

    2.7. Storage models that support OLAP

    OLAP services support several data storage models. Each has its own strengths and weaknesses, and the choice depends on how the data will be used.

    2.7.1. Multidimensional OLAP (MOLAP)

    The multidimensional OLAP model (MOLAP) stores the base data (the data from the tables of the data warehouse or data mart) and the aggregations (the measures computed from those tables) in multidimensional structures called cubes. These structures are stored outside the data mart or data warehouse database.

    Storing cubes in a MOLAP structure works best for frequent aggregate queries that need fast response times, for example the total number of products sold in all regions by quarter.

    Advantages of MOLAP:

    - Fast execution: MOLAP cubes retrieve data quickly and are optimized for query operations [15].

    - Complex calculations: all calculations are generated in advance, when the cube is built [15].

    Figure 6. MOLAP data model (source databases: MySQL, Oracle, others; MOLAP data; data in the OLAP environment)


    Disadvantages of MOLAP:

    - Limited data volume: because all calculations are performed when the cube is built, the cube itself cannot hold a large amount of data. This does not mean that a cube cannot be derived from a large amount of data. It can, but the cube then contains only summary-level information [15].

    - Additional investment: cube technology is often proprietary and usually does not already exist in the organization, so adopting MOLAP requires additional investment in capital and staff [15].

    2.7.2. Relational OLAP (ROLAP)

    The relational OLAP model (ROLAP) stores the base data and the aggregations in relational tables. These tables live in the same database as the tables of the data mart or data warehouse.

    Figure 7. ROLAP data model

    Storing cubes in a ROLAP structure works best for data that is queried infrequently. For example, if 80% of users query only data from the past year, data older than one year can be moved into a ROLAP structure to reduce disk usage and to avoid duplicating data.

    Advantages of ROLAP:

    - Handles large volumes of data: the size limit of ROLAP is the size limit of the underlying database. In other words, ROLAP itself places no limit on the amount of data [15].

    - Reuses the built-in functionality of the relational database: relational databases come with a great deal of functionality, which ROLAP can take advantage of, saving cost [15].

    Disadvantages of ROLAP:

    - Lower performance: each ROLAP report typically gathers data from many tables, so performance suffers when the data is large and scattered [15].

    - Limited by SQL functionality: ROLAP relies mainly on generating SQL statements to query the database, and reports based on SQL queries do not fit every need. Developers work around this by building additional tools that let users define their own functions [15].

    2.7.3. Hybrid OLAP (HOLAP)

    The hybrid OLAP model (HOLAP) combines MOLAP and ROLAP.

    Figure 8. HOLAP data model

    Storing cubes in a HOLAP structure works best for frequent aggregate queries over a large amount of base data. For example, we would store quarterly and yearly sales data in a MOLAP structure and monthly, weekly, and daily data in a ROLAP structure [15].

    Benefits of HOLAP storage:

    - Faster retrieval from the cube, thanks to MOLAP's high-speed query processing.

    - Less storage space than MOLAP.

    - No duplicated data.


    2.7.4. Comparing the models

    The following table compares the three storage models that support OLAP:

                             MOLAP      ROLAP              HOLAP
    Base data storage        Cube       Relational table   Relational table
    Aggregation storage      Cube       Relational table   Cube
    Query performance        Fastest    Slowest            Fast
    Storage consumption      High       Low                Medium
    Maintenance cost         High       Low                Medium
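    The trade-off between query speed and storage in the comparison can be illustrated with a toy example. This is only a sketch under stated assumptions: the rows, the `build_cube` and `rolap_total` helpers, and the use of a Python dictionary as the "cube" are all hypothetical and are not how an OLAP server stores data internally.

```python
from collections import defaultdict

# Hypothetical base rows: (region, quarter, units_sold).
base_rows = [
    ("North", "Q1", 120),
    ("North", "Q2", 90),
    ("South", "Q1", 200),
    ("South", "Q2", 150),
]


def rolap_total(rows, quarter):
    """ROLAP style: scan the base rows and aggregate at query time."""
    return sum(units for _, q, units in rows if q == quarter)


def build_cube(rows):
    """MOLAP style: precompute every (region, quarter) cell plus the 'All' roll-ups."""
    cube = defaultdict(int)
    for region, quarter, units in rows:
        for r in (region, "All"):
            for q in (quarter, "All"):
                cube[(r, q)] += units
    return dict(cube)


cube = build_cube(base_rows)
molap_q1 = cube[("All", "Q1")]           # single lookup in the precomputed cube
rolap_q1 = rolap_total(base_rows, "Q1")  # full scan of the base rows
```

    Both approaches return the same total, but the cube holds nine cells for four base rows, which mirrors the table above: faster lookups in exchange for more storage and a build step.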


    Chapter 3. The Pentaho tool suite

    3.1 Overview

    The Pentaho Open BI suite gives a comprehensive view of an organization's business intelligence (BI) capabilities, including reporting, analysis, dashboards, and data integration, and it is billed as the most popular open source BI suite in the world. Pentaho products are used by leading companies such as MySQL, Motorola, Terra Industries, and DivX [6]. The suite includes the following tools:

    - Report Designer
    - Design Studio
    - Aggregation Designer
    - Metadata Editor
    - Pentaho Data Integration
    - Schema Workbench

    The architecture of Pentaho:

    Figure 9. Pentaho architecture

    3.2 BI capabilities of Pentaho


    Pentaho helps users in the following areas.

    Reporting:

    Organizations use reports from many sources, so reporting is the core of business intelligence and the first capability to be put to use. Pentaho Reporting lets organizations access, format, and distribute information easily to employees, customers, and partners.

    - Flexible deployment, from standalone reports to web-based reports embedded in the organization's business intelligence.

    - Support for many data sources, such as OLAP or XML-based sources.

    - Flexible output to PDF, HTML, Microsoft Excel, Rich Text Format, or plain text.

    - A wizard that makes report design quick and easy.

    - A professional edition with additional features such as clustering, subscriptions, directory integration, and auditing.

    Analysis:

    Pentaho Analysis is a powerful analysis tool that helps users make the most effective decisions. For example, when a report shows sales trending below expectations, knowledge workers can easily find the cause by asking questions such as:

    - Does the problem affect a particular product line or region?

    - What distinguishes this combination from the combinations that show no problem?

    - Is the problem related to particular retailers? To marketing campaigns? To something else?

    Pentaho Analysis helps answer these business questions by:

    - Letting users explore business information easily by dragging and dropping, drilling into detail, or cross-tabulating the data.

    - Answering complex analytical queries quickly.

    - Resolving complex questions quickly.

    - Supporting advanced capabilities, including integrated reporting, metadata, and dashboards, through integration with the other products in the Pentaho suite.

    Dashboards:

    Pentaho Dashboards give managers immediate insight into performance at the individual, departmental, or enterprise level. By presenting metrics in a visual interface, Pentaho Dashboards give business users the information they need to understand and improve their work.

    Pentaho Dashboards support this visibility by providing:

    - Comprehensive metrics management, for defining and tracking the metrics that matter at the individual, departmental, or enterprise level.

    - Rich visual displays, so that business users can see at a glance which metrics are on track and which need attention.

    - Integration with reporting and analysis, so that users can drill through to the underlying reports and analyses and understand the factors behind success or failure.

    - Portal integration, to deliver relevant business metrics to large numbers of users directly inside their applications.

    - Integrated alerting, to monitor for exceptions continuously and notify users.

    Data mining:

    - Hidden relationships in the data can be used to optimize business processes and predict future outcomes.

    - A comprehensive range of advanced data mining algorithms is provided.

    - Results are shown to users in an easy-to-understand format.

    Workflow:

    - Automated, streamlined business processes deliver results that are dependable, efficient, and reportable for many purposes.

    - Metrics are tied directly to processes, which drives continuous improvement of the business cycle: report on the metrics, change the business, report on the results of the change, and repeat the process to optimize further.

    3.3 Features and benefits

    Provides insight into hidden patterns and relationships in your data

    A classic example of data mining is the retailer who discovered a relationship between sales of diapers and beer on Sunday afternoons. The two products have nothing to do with each other, but husbands who happened to notice beer in the store would pick up beer instead of diapers. This relationship would not have been found without data mining.

    Lets you act on correlations to improve the business

    Continuing the example, retailers often act on such relationships by placing related items near each other to stimulate purchases. Businesses can benefit in the same way, using newly discovered patterns and correlations as a basis for action to improve efficiency and effectiveness.

    Distills lessons for the future

    A well-known saying of the philosopher George Santayana holds that those who do not learn from the past are bound to repeat it. Data mining can predict outcomes from historical data, which significantly improves the quality of decisions and their consequences before the decisions are made. As a simple example, a good decision maker should take into account the periods in which customers paid on time and use that information when making decisions.

    Lets you embed recommendations in applications

    You can use data mining results not only to present a simple summary report but also to place recommendations directly in operational applications. For example, the payment screen could show: "Based on recent data, there is an 85% chance that this customer will pay late, so a 50% advance payment is recommended for this invoice." Reporting on overall results such as days sales outstanding (DSO) lets you measure how the business improves when the recommendations are followed or not followed, so that you can tune the models and recommendations for the best effect.

    Lets you take advantage of the full range of algorithms

    No single algorithm is optimal for every situation, so you should try a range of them to find the one that best fits your data.

    If several algorithms are valid, you can use all of them. For example: "Based on the analysis of three predictive models, the probability that this customer will pay late is: Model A: 95% (96% accurate), Model B: 89% (92% accurate), Model C: 76% (97% accurate)."
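    One simple way to turn the three model outputs in this example into a single figure is an accuracy-weighted average. This is an illustrative assumption only: the source does not say how Pentaho or Weka would combine models, and the `weighted_probability` helper below is hypothetical. The numbers are the ones quoted in the example.

```python
# (model name, predicted probability of late payment, model accuracy)
predictions = [
    ("A", 0.95, 0.96),
    ("B", 0.89, 0.92),
    ("C", 0.76, 0.97),
]


def weighted_probability(models):
    """Average the predicted probabilities, weighting each model by its accuracy."""
    total_weight = sum(accuracy for _, _, accuracy in models)
    if total_weight == 0:
        raise ValueError("at least one model must have non-zero accuracy")
    return sum(p * accuracy for _, p, accuracy in models) / total_weight


combined = weighted_probability(predictions)
print(f"Combined probability of late payment: {combined:.1%}")
```

    Other combination rules (majority vote, taking the most accurate model alone) are equally plausible; which one is appropriate depends on how the models were validated.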

    Applies to any BI or business process

    Integration with the other components of the Pentaho BI suite lets you apply data mining to any process in the system (such as cash flow) and to business intelligence processes (such as generating reports, invoices, and exception handling). The application is flexible and depends on the data available to the BI process being run.

    Extract, create, and mine data for deeper insight in your analysis

    This can happen when the data is generated or as part of data preparation. For example, when building a sales report you can mark out the regions that you will use for data mining later. You can also add data while preparing for data mining, such as calculated variables or other units of measure.

    How data mining works

    Choose a model

    Analysts can work visually with a range of models, including advanced forms of data mining such as clustering, segmentation, decision trees, random forests, neural networks, and principal component analysis.

    Add data

    Additional features can be added to the data. For example, you can define system variables that automatically derive new columns for analysis.

    Fit

    Each model has its own parameters that are fitted to the sample data. Analysts can let the parameters be set automatically or tune them by hand, depending on the model.

    Evaluate

    The model can be evaluated by running it on historical data and comparing its results with the actual outcomes.

    Refine

    Apply the trained model in the process. Once trained, the model is expected to give the best results for the specific business purpose it is applied to.

    Output data

    Apply the trained model in the process. Once trained, the model is expected to give the best results for the specific business purpose it is applied to.

    Technology

    Powerful data mining engine

    Provides a comprehensive set of machine learning algorithms from the Weka project, including clustering, segmentation, decision trees, random forests, neural networks, and principal component analysis.

    Integration with the Pentaho BI suite handles the automatic conversion of data into the formats that the data mining engine requires [8].

    Algorithms can be applied directly to a dataset or called from Java.

    Output can be viewed as interactive graphs or used as a data source for reports, further analysis, and other processing.

    Filters support discretization, normalization, resampling, attribute selection, and the transformation and combination of attributes.

    Classifiers provide models for predicting nominal and numeric quantities. The learning schemes include decision trees and lists, support vector machines, multilayer perceptrons, logistic regression, Bayesian networks, and other advanced techniques [9].

    The data mining engine is a complete framework for machine learning development, which helps customers integrate their own models closely.

    Inputs and outputs are tightly controlled, so developers can build fully customized solutions from the supplied components.

    Visual design tools

    The graphical tools for data mining design and administration follow Pentaho standards and are supported in Eclipse.

    They provide a graphical user interface for data preprocessing, classification, regression, clustering, association rules, and visualization.

    Security and standards compliance

    Role-based security and business rules.

    Support for Java single sign-on and LDAP, for integration with existing enterprise security.

    Support for auditing. Audit data can be reported on directly and is integrated with the process features of the Pentaho BI suite.

    Web Services, repositories, and XML definitions

    Components expose interfaces so that they can be used flexibly.

    A central repository holds reports, templates, queries, and other content.

    Content definitions are stored as XML, so they can be created and edited in ways other than through the user interface, for example by editing the XML file by hand.

    Scalability and performance

    Designed for enterprise deployment: a feature-rich application that runs on J2EE platforms such as JBoss, with scalability features such as clustering.


    Chapter 4. The problem implemented on Pentaho and the results obtained

    4.1. The problem

    To illustrate how the Pentaho tools can be used to build financial reports, I present the following concrete example:

    Examine and assess the influence of the oil price, the USD/VND exchange rate, and the VNIndex on the gold price.

    Working environment:

    - Operating system: Windows 7

    - Database management system: MySQL

    - The Pentaho tool suite

    4.2. Collecting and processing the data

    The USD/VND exchange rate data was taken from: http://www.oanda.com/currency/historical-rates

    The downloaded file is an Excel file of the following form:

    Figure 10. Exchange rate data

    The gold price data was taken from: http://www.perthmint.com.au/investment_invest_in_gold_precious_metal_prices.aspx

    The downloaded file has the following form:

    Figure 11. Gold price data

    Prices are quoted in USD per ounce.

    The oil price data was taken from: http://sdw.ecb.europa.eu/browse.do?node=2120782

    The downloaded file has the following form:

    Figure 12. Oil price data

    Prices are quoted in USD per barrel.

    The VNIndex data was downloaded from: http://www.cophieu68.com/datametastock

    The downloaded file has the following form:

    Figure 13. VNIndex data


    The collected data is not uniform. The exchange rate data covers every day, the gold price data has no values for Saturdays and Sundays, and the oil price data is available only by month. To synchronize the series, I fill each missing value with the value of the preceding day, and I set the daily oil price to the oil price of the corresponding month. After this step the series are aligned on the same dates.
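    The synchronization rule described here (carry the previous day's value forward, and repeat the monthly oil price for every day of the month) can be sketched as follows. The thesis performed this step in Excel; the function names `forward_fill` and `expand_monthly` and all the numbers below are made up for illustration.

```python
from datetime import date, timedelta


def forward_fill(daily, start, end):
    """Return one value per calendar day, repeating the last known value on gaps."""
    filled = {}
    last = None
    day = start
    while day <= end:
        if day in daily:
            last = daily[day]
        if last is not None:  # days before the first observation stay missing
            filled[day] = last
        day += timedelta(days=1)
    return filled


def expand_monthly(monthly, start, end):
    """Assign each day the value recorded for its (year, month)."""
    expanded = {}
    day = start
    while day <= end:
        key = (day.year, day.month)
        if key in monthly:
            expanded[day] = monthly[key]
        day += timedelta(days=1)
    return expanded


# Made-up sample: gold has no quotes on Saturday 9 and Sunday 10 January 2010.
gold = {date(2010, 1, 7): 1130.0, date(2010, 1, 8): 1136.5, date(2010, 1, 11): 1151.0}
oil = {(2010, 1): 76.2}

gold_daily = forward_fill(gold, date(2010, 1, 7), date(2010, 1, 11))
oil_daily = expand_monthly(oil, date(2010, 1, 7), date(2010, 1, 11))
```

    After both calls every series has one value per calendar day, so the rows can be joined on the date.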

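    Next, we combine the data into a single Excel file containing all the information, as in the following figure:

    Figure 14. Combined data

    The timekey field is added by writing the year, month, and day together as one value. At this point the data has been cleaned and redundant data removed.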
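    A timekey of this kind, together with the month, quarter, and year attributes that the time dimension needs later, could be derived as in the sketch below. Zero-padding the month and day (so that every key has eight digits) is an assumption; the source only says that year, month, and day are written together. The helper names are hypothetical.

```python
from datetime import date


def make_timekey(day):
    """Concatenate year, month, and day, e.g. 3 May 2010 -> '20100503'."""
    return day.strftime("%Y%m%d")


def quarter_of(day):
    """Calendar quarter (1 to 4) of the given date."""
    return (day.month - 1) // 3 + 1


def time_attributes(day):
    """Attributes for one row of a time dimension."""
    return {
        "timekey": make_timekey(day),
        "month": day.month,
        "quarter": quarter_of(day),
        "year": day.year,
    }


example = time_attributes(date(2010, 5, 3))
```

    Fixed-width keys sort chronologically as plain strings, which is convenient when the file is later loaded as text.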

    4.3. Creating the data warehouse

    Using Spoon, the data integration tool in the Pentaho suite, we create the data warehouse as follows.

    The data warehouse has three tables: two dimension tables and one fact table. The dimension tables are dim_time, which holds the day, month, quarter, and year, and dim_factor, which holds the factors being analyzed. The fact table, fact_price, holds the price of each factor at each point in time.

    The table structure and the relationship diagram are shown in the following figure:

    Figure 15. Data warehouse model
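    The three-table star schema can be written out as DDL. The sketch below uses SQLite through Python's `sqlite3` module so that it runs without a MySQL server; the thesis itself lets Spoon generate the MySQL statements. The columns of dim_time and dim_factor follow the names given in this chapter, while the columns of fact_price (two foreign keys and a price) and all data types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Time dimension: one row per day.
    CREATE TABLE dim_time (
        time_id INTEGER PRIMARY KEY,
        timekey TEXT NOT NULL UNIQUE,
        month   INTEGER NOT NULL,
        quarter INTEGER NOT NULL,
        year    INTEGER NOT NULL
    );
    -- Factor dimension: one row per analyzed factor.
    CREATE TABLE dim_factor (
        factor_key INTEGER PRIMARY KEY,
        factor     TEXT NOT NULL UNIQUE
    );
    -- Fact table (assumed layout): price of one factor on one day.
    CREATE TABLE fact_price (
        time_id    INTEGER NOT NULL REFERENCES dim_time(time_id),
        factor_key INTEGER NOT NULL REFERENCES dim_factor(factor_key),
        price      REAL NOT NULL
    );
    """
)

tables = sorted(
    name for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```

    Each fact row points at exactly one row in each dimension, which is what lets the cube later slice prices by factor and roll them up by month, quarter, or year.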

    Open Spoon and choose File -> New -> Transformation. The input is the Excel file saved in .csv format, which contains all the normalized data, so in the Steps panel we choose CSV file input from the Input category and drag it onto the workspace:

    Figure 16. Spoon workspace

    Double-click the step to change its properties, such as the step name, the file name (the path to the .csv data file), and the delimiter (the character that separates the fields in the .csv file). Then click Get Fields and rename the fields as appropriate:

    Figure 17. Importing data in Spoon

    For the next step we need an empty database in MySQL. We use MySQL Query Browser to create it with the query CREATE DATABASE data_price, where data_price is the name of the data warehouse to create.

    Back in Spoon, in the Data Warehouse tab of the Steps panel, drag Combination lookup/update onto the workspace. Then hold Shift and drag with the left mouse button from the CSV input step to the Combination lookup/update step to connect them.

    Figure 18. Combination lookup/update

    Double-click the Combination lookup/update step to change its properties:

    Figure 19. Changing the properties

    Under Connection, choose New if no connection exists yet:

    Figure 20. Database connection

    Choose MySQL as the Connection Type, fill in the database details and a connection name, and click Test. If the connection succeeds, click OK.

    Back in the Combination lookup/update window, fill in the parameters. In this step we create the dim_time table.


    Figure 21. Creating the dim_time table

    Click Get Fields to load the fields from the input file, remove the fields that do not belong in dim_time, set the key field for dim_time, and tick "Remove lookup fields?" so that these fields do not appear in later tables. Click SQL to view the SQL statements that create the table, then click Execute to create the table dim_time(time_id, timekey, month, quarter, year).

    In the same way, add another Combination lookup/update step and connect it after the dim_time step above:

    Figure 22. Creating the dim_factor table

    This table has only two fields: factor_key, which is generated automatically and is the primary key, and factor, which holds the name of each influencing factor.
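    Conceptually, a Combination lookup/update step looks up each incoming value in the dimension table, inserts it if it is missing, and passes the surrogate key downstream. The sketch below imitates that behaviour for dim_factor with SQLite; it is a simplified model of the idea, not Pentaho's implementation, and the `lookup_or_insert` helper and the factor names other than exchange are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_factor ("
    "factor_key INTEGER PRIMARY KEY AUTOINCREMENT, factor TEXT UNIQUE)"
)


def lookup_or_insert(factor):
    """Return the surrogate key for a factor, creating the dimension row if needed."""
    row = conn.execute(
        "SELECT factor_key FROM dim_factor WHERE factor = ?", (factor,)
    ).fetchone()
    if row is not None:
        return row[0]
    cursor = conn.execute("INSERT INTO dim_factor (factor) VALUES (?)", (factor,))
    return cursor.lastrowid


# Factor names as they might arrive from the CSV rows, with repeats.
incoming = ["gold", "oil", "gold", "exchange", "oil", "vnindex"]
keys = [lookup_or_insert(name) for name in incoming]
dimension_rows = conn.execute("SELECT COUNT(*) FROM dim_factor").fetchone()[0]
```

    Repeated names map to the same key, so the dimension ends up with one row per distinct factor while every input row still receives a key for the fact table.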

    Next we create the fact_price table. It is the output table and references the two tables above, so we drag a Table output step onto the workspace.

    Figure 23. Adding the Table output step

    Double-click Table output and adjust the parameters as appropriate:

    Figure 24. Creating the fact_price table

    Click SQL to view the SQL statement and click Execute to create the table. Save the transformation, click the corresponding toolbar button, and choose Launch to load the data into the database that was created.

    Figure 25. Loading the data

    The data warehouse all_price has now been created successfully with Spoon.

    4.4. Processing the data with OLAP

    4.4.1. Creating the cube

    To create the cube we use Schema Workbench from the Pentaho suite.

    First we create a connection to the MySQL database: in the Tools menu choose Connection, and in the window that appears fill in the MySQL connection parameters:

    Figure 26. Database connection

    We create a new schema and a cube with sum and avg measures on the price, as shown in the figure:

    Figure 27. Cube structure
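    The sum and avg measures of such a cube correspond to a grouped aggregate over the fact table joined to its dimensions. The sketch below shows the relational equivalent of one slice of the cube (factor by year) using SQLite. The fact_price layout is the assumed one (time_id, factor_key, price), and every price is a made-up value for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, timekey TEXT,
                           month INTEGER, quarter INTEGER, year INTEGER);
    CREATE TABLE dim_factor (factor_key INTEGER PRIMARY KEY, factor TEXT);
    CREATE TABLE fact_price (time_id INTEGER, factor_key INTEGER, price REAL);

    INSERT INTO dim_time VALUES
        (1, '20090105', 1, 1, 2009),
        (2, '20090106', 1, 1, 2009),
        (3, '20100105', 1, 1, 2010);
    INSERT INTO dim_factor VALUES (1, 'exchange'), (2, 'gold');
    INSERT INTO fact_price VALUES
        (1, 1, 17000.0), (2, 1, 17500.0), (3, 1, 18500.0),
        (1, 2, 850.0), (3, 2, 1120.0);
    """
)

# One cube slice: sum and average price per factor per year.
cells = conn.execute(
    """
    SELECT f.factor, t.year, SUM(p.price) AS sum_price, AVG(p.price) AS avg_price
    FROM fact_price AS p
    JOIN dim_time AS t ON t.time_id = p.time_id
    JOIN dim_factor AS f ON f.factor_key = p.factor_key
    GROUP BY f.factor, t.year
    ORDER BY f.factor, t.year
    """
).fetchall()
```

    For price series the average is usually the meaningful measure; a sum of daily prices mainly serves as a check that rows were loaded.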

    Once the cube is created, we publish it to the Pentaho system, supplying the server details and a Pentaho user account.

    Figure 28. Repository login

    We save the cube file and publish the schema and cube to the Pentaho system.

    4.4.2. Analysis View

    Pentaho provides Analysis View as its utility for applying OLAP. Alternatively, we can use Mondrian, the OLAP tool developed separately by Pentaho.

    In this thesis I show how to use Analysis View to apply OLAP.


    First, Pentaho must be connected to the MySQL database that we want to analyze. To set up this connection, open the Pentaho installation folder, go to the administration-console folder, and run start-pac.bat to start the Administration Console. Then open http://localhost:8099 in a browser. A login prompt appears; the default administrator account is user admin with password password. To create the connection to MySQL and the data warehouse created earlier, open the Database Connection tab.

    Here I use the all_price database on MySQL, so the values are entered as in the following figure:

    Figure 29. Database connection

    After filling in all the fields, click Test to check the connection. If it succeeds, click OK to save it. Pentaho is now connected to MySQL.

    Next, open http://localhost:8080 to reach the Pentaho User Console. At the login prompt, enter a user name and password, or use one of the sample accounts.

    After logging in, the following screen appears:


    Figure 30. Pentaho workspace

    Use Analysis View to choose the schema and cube created in the previous step.

    Figure 31. Choosing the schema and cube

    After clicking OK, the following window appears:

    Figure 32. Schema and cube data

    On the toolbar, we choose the button for selecting the measures, columns, rows, and filter that determine what the analysis displays.


    Figure 33. Analysis content

    To analyze the USD/VND exchange rate over the ten years from 2000 to 2010, choose avg price under Measures:

    Figure 34. Choosing the measure

    Under factor, choose exchange:

    Figure 35. Choosing the factor

    Under time, choose the years 2000 to 2010. Here we compare the average exchange rate of each year.

    Figure 36. Choosing the years to analyze

    We can also choose a finer time granularity, by quarter, month, or day, using the corresponding button:

    Figure 37. Choosing date detail
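    Moving between yearly, quarterly, and monthly views amounts to regrouping the same fact rows at a different level of the time hierarchy. The sketch below shows that idea in plain Python on (timekey, price) pairs. It is a conceptual illustration, not how Analysis View computes its results; the `level_key` and `average_by` helpers and all values are made up.

```python
from collections import defaultdict

# Hypothetical (timekey, price) rows for a single factor.
rows = [
    ("20090115", 17000.0),
    ("20090220", 17400.0),
    ("20090410", 17800.0),
    ("20100105", 18400.0),
]


def level_key(timekey, level):
    """Map a yyyymmdd timekey to its group at the requested hierarchy level."""
    year, month = int(timekey[:4]), int(timekey[4:6])
    if level == "year":
        return (year,)
    if level == "quarter":
        return (year, (month - 1) // 3 + 1)
    if level == "month":
        return (year, month)
    raise ValueError(f"unknown level: {level}")


def average_by(data, level):
    """Average price per group at the given level (the avg measure)."""
    groups = defaultdict(list)
    for timekey, price in data:
        groups[level_key(timekey, level)].append(price)
    return {key: sum(values) / len(values) for key, values in groups.items()}


by_year = average_by(rows, "year")        # coarse view
by_quarter = average_by(rows, "quarter")  # one level of drill-down
by_month = average_by(rows, "month")      # two levels of drill-down
```

    Drilling down never changes the underlying rows; it only splits each group into smaller groups, which is why the yearly average is consistent with the quarterly figures beneath it.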

    To display a chart of the exchange rate over the past ten years, choose the toolbar button for selecting the chart type:

    Figure 38. Choosing the chart type

    After setting the chart format, choose the toolbar button that displays the chart of the USD/VND exchange rate over the past ten years:

    Figure 39. USD/VND exchange rate chart

    The chart shows that the USD/VND exchange rate has changed the most since 2008 and is trending upward. The gold price chart is produced in the same way:


    Figure 40. Gold price chart

    The gold price has risen sharply over the last ten years. Since 2005 in particular it has fluctuated while rising continuously, and the chart shows that it is still trending upward.

    Oil price chart:

    Figure 41. Oil price chart

    The oil price has been volatile over the last ten years. It rises and falls erratically and is very hard to predict. It peaked around mid-2007 and is now trending upward again.

    VNIndex chart:


    Figure 42. VNIndex chart

    The VNIndex has fluctuated widely. From 2000 to 2005 it rose very slowly, but from 2005 to 2007 it climbed steadily. It then fell to its lowest point in late 2008 and early 2009, and it now shows signs of recovery and stabilization.

    Gold price and oil price chart:

    Figure 43. Gold price and oil price chart

    Gold price and USD/VND exchange rate chart:

    Figure 44. Exchange rate and gold price chart

    The chart shows a relationship between the gold price and the USD/VND exchange rate: they mostly rise and fall together.

    VNIndex and gold price chart:

    Figure 45. Gold price and VNIndex chart

    The chart shows that the gold price and the VNIndex have little relationship to each other, so it is hard to infer the trend of the gold price from the trend of the VNIndex.
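    The visual judgments above ("rise and fall together", "little relationship") could be quantified with the Pearson correlation coefficient. The thesis does not compute it, so this is an added sketch; the three short series are synthetic numbers chosen only to show one strongly correlated pair and one uncorrelated pair, not values from the warehouse.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / sqrt(var_x * var_y)


# Synthetic series for illustration only.
series_a = [1.0, 2.0, 3.0, 4.0]
series_b = [10.0, 20.0, 30.0, 40.0]  # moves exactly with series_a
series_c = [2.0, 4.0, 1.0, 3.0]      # no linear relationship with series_a

r_ab = pearson(series_a, series_b)
r_ac = pearson(series_a, series_c)
```

    A coefficient near 1 would support the reading of Figure 44, and one near 0 would support the reading of Figure 45. Because all these series trend over time, a correlation of yearly averages says nothing about causation.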


    Conclusion

    The analysis and application presented in this thesis show that applying data warehouses and OLAP techniques is likely to become a necessity and a trend that businesses will adopt.

    The thesis achieved the following results:

    - Studied and analyzed data warehouse techniques and their application in finance.

    - Studied and analyzed OLAP techniques and the storage models that support OLAP, and identified the strengths and weaknesses of each storage model.

    - Distinguished OLTP from OLAP.

    - Introduced the Pentaho Business Intelligence tool suite and applied it.

    - Analyzed the movements of the US dollar exchange rate, the gold price, and the VNIndex.


    References

    Vietnamese

    [1] Kho dữ liệu. http://vi.wikipedia.org/wiki/Kho_d%E1%BB%AF_li%E1%BB%87u

    [2] Ths. Nguyn Th Quyn. Gii thiu v kin trc khi ca OLAP. Tp ch Cng ngh thng tin & Truyn thng. http://www.tapchibcvt.gov.vn/News/PrintView.aspx?ID=15695

    English

    [3] Djoni Darmawikarta. Dimensional Data Warehousing with MySQL. Brainy Software Corp, 2007.

    [4] Don Jones. Why is OLAP Faster than OLTP. http://nexus.realtimepublishers.com/tips/Data_Warehousing/Why_Is_OLAP_Faster_Than_OLTP.php

    [5] Hari Mailvaganam. Introduction to OLAP. http://www.dwreview.com/OLAP/Introduction_OLAP.html

    [6] Kefa Rabah. Pentaho Business Intelligence BI Suite Training Manual. Global Open Versity, 2007. pp. 1-23.

    [7] Online Analytical Processing. Wikipedia.org.

    [8] Pentaho Corporation. Pentaho Training Course 2010 Edition. Pentaho Corporation, 2007. pp. 1-13.

    [9] Pentaho Corporation. Pentaho Analysis Viewer User Guide. Pentaho Corporation, 2007. pp. 1-23.

    [10] Roland Bouman, Jos van Dongen. Business Intelligence and Data Warehousing with Pentaho and MySQL: Pentaho Solutions. Wiley Publishing, Inc., 2009. pp. 3-309.

    [11] S. Nagabhushana. Data Warehousing, OLAP and Data Mining. New Age International Publishers, 2006. pp. 24-246.

    [12] Seth Grimes. MySQL V5: Ready for Prime Time Business Intelligence. Alta Plana Corporation, 2006. pp. 2-23.

    [13] Surajit Chaudhuri, Umeshwar Dayal. An Overview of Data Warehousing and OLAP Technology. pp. 2-10.

    [14] Thomas C. Hammergren, Alan R. Simon. Data Warehousing for Dummies. Wiley Publishing, Inc. pp. 9-95.

    [15] MOLAP, ROLAP, and HOLAP. http://www.1keydata.com/datawarehousing/molap-rolap.html