
Upload: narain565262384061

Post on 17-Aug-2015



DATA WAREHOUSE / ETL TESTING

Reasons for building a data warehouse:
1) Data is scattered across different places.
2) Data is inconsistent.
3) Data may be volatile or non-volatile (volatile data keeps on changing).
4) Surrogate keys, that is, artificial keys.

Data warehouse (Ralph Kimball):
A data warehouse is a relational database management system that is specifically designed for business analysis and for making decisions to achieve the business goals.
A data warehouse is designed to support the decision-making process; hence it is called a decision support system (DSS).
A data warehouse is a historical database which stores the historical business information required for analysis.
A data warehouse is a "read-only database" which lets business managers query the data required for analysis, but it is not used for business transaction processing.
A data warehouse is an integrated database which stores the data in an integrated format; the data is collected from multiple OLTP source systems.

Data warehouse (W. H. Inmon):
A data warehouse is a
1) Time-variant
2) Non-volatile
3) Subject-oriented
4) Integrated
database.

[Figure: application-oriented OLTP (transaction) systems are extracted and loaded into subject-oriented (integrated) data warehouse subjects. Labels: student admission, student fee details, student examination and student subjects feed the Student subject; current, saving and checking feed the Account subject; order application and integration application feed the Sales subject.]

Characteristic features of a data warehouse:
1) Time-variant: A data warehouse is a time-variant database which supports the business needs of end users in comparing and analyzing the business across different time periods. This is also known as time series analysis.
   [Figure: chart of sales amount (y axis) by quarter (x axis) for a product.]
2) Non-volatile:
   i. A data warehouse is non-volatile.
   ii. Once data has entered the data warehouse, it does not reflect the changes which take place in the operational database. Hence the data is "static" in the data warehouse.
3) Subject-oriented: A data warehouse is a subject-oriented database which supports the business needs of department-specific users.
   Ex: sales, accounts, HR, students, loans, etc.
   A subject is derived from multiple OLTP applications and organizes the data to meet a specific business functionality.
4) Integrated: A data warehouse is an integrated database which collects the data from multiple OLTP databases.
   [Figure: shipment, users and payment applications, together with a calendar (time) dimension of year, quarter (Q1, Q2), month, week and day, feed the Sales DWH; data is extracted from the OLTP DB in the U.K. and the OLTP DB in IND and loaded into the DWH.]

A data warehouse is a container that stores the business data.

Data warehousing:
Data warehousing is the process of building a data warehouse. The process includes:
i. Business requirement analysis
ii. Database design
iii. ETL development and testing
iv. Report development and testing

A business analyst and an onsite technical coordinator gather the business requirements and the technical requirements.
BRS (Business Requirement Specification): A BRS contains the business requirements, which are collected by an analyst.
SRS (Software Requirement Specification): An SRS contains the software and hardware requirements, which are collected by senior technical people.
The process of designing the database is called modeling or dimensional modeling. A database architect or data modeler designs the warehouse as a set of tables.

OLAP (online analytical processing):
OLAP is a technology which supports business managers in querying the data warehouse.
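The following is a minimal sketch, not taken from the notes, of the kind of analytical query an OLAP tool might send to the warehouse: a comparison of sales across quarters, as in the time series analysis mentioned above. The `sales` table, its columns and the sample rows are assumptions made for illustration, and SQLite stands in for the warehouse database.

```python
import sqlite3

# Hypothetical sample data: (year, quarter, product, sales amount).
ROWS = [
    (2014, "Q1", "X", 100.0),
    (2014, "Q2", "X", 150.0),
    (2015, "Q1", "X", 120.0),
    (2015, "Q1", "X", 30.0),
    (2015, "Q2", "X", 90.0),
]


def quarterly_sales(rows):
    """Return total sales per (year, quarter), oldest period first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (year INTEGER, quarter TEXT, product TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    # Summarize the detailed rows by time period; the query only reads data.
    result = conn.execute(
        "SELECT year, quarter, SUM(amount) FROM sales "
        "GROUP BY year, quarter ORDER BY year, quarter"
    ).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for year, quarter, total in quarterly_sales(ROWS):
        print(year, quarter, total)
```

The query never updates the stored rows, which matches the read-only, historical use of the warehouse described in these notes.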
OLAP provides the gateway between users and the data warehouse.
A data warehouse is also known as an OLAP database.
Ex: Cognos, BO (Business Objects)

Differences between an OLTP database and a data warehouse:

   OLTP database                              Data warehouse (DWH)
1) Supports business transaction processing   Supports the decision-making process
2) Volatile data                              Non-volatile data
3) Current data                               Historical data
4) Detailed data                              Summary data
5) Designed for running the business          Designed for analyzing the business
6) Normalization                              De-normalization
7) Application-oriented data                  Subject-oriented data
8) Designed for clerical operations           Designed for managerial operations
9) ER modeling                                Dimensional modeling

Enterprise data warehousing objects:
A relational database is defined as a collection of objects such as tables, views, procedures, macros, triggers, etc.
Table: A table is a two-dimensional object in which data is stored in the form of rows and columns.
View: A view is like a window into one or more tables. It provides customized access to the base tables by:
1) Restricting which columns are visible from the base tables.
2) Restricting which rows are visible from the base tables.
3) Combining rows and columns from several base tables.
A view may define a subset of the rows of a table. It may define a subset of the columns of a table.

Data warehouse RDBMS:
The following relational databases can be used to build a data warehouse:
i. Oracle
ii. SQL Server
iii. IBM DB2
iv. Teradata
v. Greenplum
vi. Netezza
vii. Sybase
viii. Red Brick
ix. Informix
One of the best RDBMS for storing massive historical information, with parallel storage and parallel retrieval, is "Teradata".

Data acquisition:
It is the process of extracting the relevant business information, transforming the data into the required business format and loading it into the target system.
Data acquisition is defined with the following types of process:
1) Data extraction
2) Data transformation
3) Data loading

There are two types of ETL used to build data acquisition:
1) Code-based ETL
2) GUI-based ETL

Code-based ETL: An ETL application can be developed using a programming language such as SQL or PL/SQL.
Ex: SAS Base, SAS/ACCESS, Teradata ETL utilities
Teradata ETL utilities:
i. BTEQ
ii. FastLoad
iii. MultiLoad
iv. TPump

GUI-based ETL: An ETL application can be designed with a simple graphical user interface, using point-and-click techniques.
Ex: Informatica, DataStage, Ab Initio, ODI (Oracle Data Integrator), Data Services, Data Manager, SSIS (SQL Server Integration Services)

Data extraction:
It is the process of reading the data from various types of source systems. The following types of sources are used to define extraction:
1) ERP sources
   i. SAP
   ii. Oracle Applications
   iii. J.D. Edwards
   iv. PeopleSoft
2) File sources
   i. [...]

Joins:
[Figure: source S1 (Empno, Ename, Sal, Deptno: 2355, smith, 4000, 10) is joined with source S2 (Deptno, Dname, Loc: 10, sales, texas) to give Ename, Sal, Deptno, Dname, Loc.]
[Figure: Sales tax = QTY * Price * 0.15; detailed data is turned into summary data with Sum() or Max().]

Vertical merging:
It is the process of merging records vertically when the two sources have the same "metadata" (union). Metadata here means the data structure: the column names of the two or three tables are the same.
Ex: two sources with the columns Empno, Ename, Sal, Deptno, one holding (2355, smith, 4000, 10) and the other (3255, allen, 2140, 20), are merged into a single staging table.

Data loading:
It is the process of inserting the data into a target system. There are two types of data load:
1) Initial load or full load
2) Incremental load or delta load

ETL client-server technology:
An ETL plan defines extraction, transformation and loading. An ETL plan is designed with the following types of metadata:
1) Source definition: the structure of the source data or source table from which the data is extracted.
2) Target definition: the structure of the target table into which the data is loaded.
3) Transformation rule: the business logic used for processing the data.

Metadata:
It defines the structure of the data, which is represented as column name, data type, precision, scale and keys (primary key, foreign key).
[Figure: source columns CID, CFName, CLName and Gender in the OLTP system are mapped to target columns CID, CName and Gender in the DWH; CID is number(4) and is the primary key.]
The length of a data type in the target system must be the same as or greater than its length in the source system; it must not be less.

Data warehouse database design:
A data warehouse is designed with the following types of schema:
1. Star schema
2. Snowflake schema
3. Galaxy schema (also called constellation schema, integrated schema, hybrid schema or multi-star schema)
The process of designing the database is known as data modeling.
A database architect or data modeler creates database schemas using a GUI-based database design tool called "ERwin". It is a product of Computer Associates.

1) Star schema:
A star schema is a database design which contains a centrally located fact table surrounded by dimension tables.
Because the design looks like a star, it is called a star schema.
In a data warehouse, facts are numeric.
A fact table contains facts.
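As a rough sketch of the layout just described, the snippet below creates one fact table surrounded by two dimension tables and runs a query that joins them. The table names, column names and sample rows are assumptions made for illustration (loosely following the Dim-customer, Dim-Time and Sale-Transaction labels of these notes), and SQLite stands in for the warehouse database.

```python
import sqlite3

# Hypothetical star schema: one fact table in the centre, two dimension tables around it.
SCHEMA_AND_DATA = """
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_time (date_key INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);
CREATE TABLE sale_transaction (
    customer_key INTEGER REFERENCES dim_customer(customer_key),
    date_key     INTEGER REFERENCES dim_time(date_key),
    amount       REAL
);
INSERT INTO dim_customer VALUES (1, 'Smith'), (2, 'Allen');
INSERT INTO dim_time VALUES (10, 2015, 'Q1'), (11, 2015, 'Q2');
INSERT INTO sale_transaction VALUES (1, 10, 4000.0), (1, 11, 1000.0), (2, 10, 2140.0);
"""


def sales_by_customer_and_quarter():
    """Join the fact table to its dimensions and total the numeric fact."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA_AND_DATA)
    rows = conn.execute(
        "SELECT c.name, t.quarter, SUM(f.amount) "
        "FROM sale_transaction f "
        "JOIN dim_customer c ON c.customer_key = f.customer_key "
        "JOIN dim_time t ON t.date_key = f.date_key "
        "GROUP BY c.name, t.quarter ORDER BY c.name, t.quarter"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for name, quarter, total in sales_by_customer_and_quarter():
        print(name, quarter, total)
```

The numeric `amount` column plays the role of the fact, while the dimension tables supply the descriptive columns used for grouping.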
Not every numeric value is a fact; only numeric values which are key performance indicators are known as facts.
Facts are business measures which are used to evaluate the performance of an enterprise.
A fact table contains the facts at the lowest level of granularity.
The fact granularity determines the level of detail.
A dimension is descriptive data which describes the key performance indicators known as facts.
A dimension table contains de-normalized data.
A fact table contains normalized data.
A fact table contains a composite key in which each component is a foreign key to a dimension table.
A dimension provides answers to the following business questions:
1) Who
2) What
3) When
4) Where

[Figure: star schema. The fact table Sale-Transaction is surrounded by the dimension tables Dim-customer (Customer_key (PK), Name, Address, Phone), Dim-Time (Date_key (PK), Year, Quarter, Month, Week, Day), a product dimension (Product_key (PK), Category, Sub category, Product) and a market dimension (Market_key (PK), Market Code, Market Name).]

[Figure: a customer source table (C_ID, C_NAME, C_ADD, C_Phone) is loaded into Dim_customer (C_key, c_id, c_name, c_add). The same customer, c_id 101, appears with three addresses (DSNR, kp, rjnr), and each row gets its own key value (1, 2, 3).]

Types of OLAP:
1. DOLAP (Desktop OLAP): An OLAP tool which queries data from a database built using desktop databases such as dBase, FoxPro, Clipper, etc.