TEST Labs 2016. Data Warehouse (DWH) Testing

Юрий Слива, Luxoft

© Luxoft Training 2012

Upload: sasha-soleev

Post on 21-Mar-2017

Course contents

1. Introduction.
2. Basic concepts and principles of DWH operation.
3. DWH testing. Where to start?
4. SQL (DDL, DML, DCL) and its use in testing.
5. Tips and tricks. QA.

DWH Testing

Introduction

Relational database

A relational database is a collection of data items organized as a set of formally described tables from which data can be accessed or reassembled in many different ways without having to reorganize the database tables.

The standard user and application program interface to a relational database is the structured query language (SQL).
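To make "reassembled in many different ways" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `customer` and `orders` tables and their contents are hypothetical; SQLite is used only because it runs without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customer(id),
                         amount INTEGER);
    INSERT INTO customer VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO orders VALUES (10, 1, 100), (11, 1, 50), (12, 2, 70);
""")

# View 1: one row per order, with the customer name attached.
orders_with_names = conn.execute("""
    SELECT o.id, c.name, o.amount
    FROM orders o JOIN customer c ON c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

# View 2: the same two tables reassembled as totals per customer.
totals_per_customer = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customer c JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

print(orders_with_names)
print(totals_per_customer)
```

Both results come from the same unchanged tables; only the query differs.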

Basic concepts and principles of DWH operation

Why Is a Data Warehouse Separated from Operational Databases?

• An operational database is constructed for well-known tasks and workloads such as searching for particular records, indexing, etc.
• In contrast, data warehouse queries are often complex and present a general form of the data.
• Operational databases support concurrent processing of multiple transactions.
• Concurrency control and recovery mechanisms are required for operational databases to ensure robustness and consistency of the database.
• An operational database query allows both read and modify operations, while an OLAP query needs only read-only access to the stored data.
• An operational database maintains current data. A data warehouse, on the other hand, maintains historical data.
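The contrast between current and historical data can be sketched as follows. This is an illustration only: the `account` (operational) and `account_history` (warehouse) tables, their columns and the dates are assumptions, and both tables live in one in-memory SQLite database purely for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Operational table: one row per account, holding only the current balance.
    CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER);
    -- Warehouse table: one row per account per day, never updated in place.
    CREATE TABLE account_history (account_id INTEGER, as_of_date TEXT, balance INTEGER);
    INSERT INTO account VALUES (1, 100);
    INSERT INTO account_history VALUES (1, '2016-01-01', 100);
""")

# Operational workload: a short transaction that overwrites the current value.
with conn:
    conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")

# Periodic warehouse load: append a new snapshot and keep the old one.
with conn:
    conn.execute("""
        INSERT INTO account_history
        SELECT id, '2016-01-02', balance FROM account
    """)

current = conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]

# Analytical (OLAP-style) query: read-only aggregation over historical rows.
history = conn.execute("""
    SELECT COUNT(*), MIN(balance), MAX(balance)
    FROM account_history WHERE account_id = 1
""").fetchone()

print(current, history)
```

The operational table only knows the latest balance, while the history table can still answer what the balance was on an earlier date.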

What Is a Data Warehouse?

• A data warehouse is a database that is kept separate from the organization's operational database.
• A data warehouse is not updated frequently.
• It holds consolidated historical data, which helps the organization analyse its business.
• A data warehouse helps executives organize, understand, and use their data to make strategic decisions.
• Data warehouse systems help integrate diverse application systems.
• A data warehouse system supports consolidated analysis of historical data.

DWH testing. Where to start?

DWH - high level

[Figure: high-level DWH architecture diagram. Labels recoverable from the diagram: source data (Feeds: Feed 1 and Feed 2 with pipe-delimited data, Feed 3 with real-time feeds via Web Services; Static Data as XLS and CSV files; DATA in an Oracle DB); transport via SFTP, JMS and a Shared Folder; Staging Area (local storage area); ETL steps; Transformation area; Schema 1 with Dimensions; transformed data from Schema 1 delivered to Application areas 1, 2 and 3, which hold business-application-specific data for Applications 1, 2 and 3 and for Reporting.]

DWH Testing Process

Test Preparation

The following tasks should be done in the test preparation phase:
- Analyse requirements
- Create test plan
- Clarify open points
- Create test pack (test cases)
- Mitigate risks

Test Execution

• Executing test scripts and test cases is the responsibility of the testers, who record the test results in the bug tracking system.
• The tester records any defects identified during test execution in the Defect Management System.
• Defects are logged in the Defect Management System according to the Defect Management process definition.

DWH - feeds testing

Legend:
- Parameter 1: system parameter, i.e. the parameter is generated by the system
- <Attribute>: parametrized XML parameter, i.e. the value of the tag is derived from one of the system fields

Line # | Xpath (open tag) | Input Parameter | Xpath (close tag) | R/O/C
1 | <?xml version="1.0" encoding="UTF-8"?> | | |
2 | <publicExecutionReport xmlns="http://www.fpml.org/FpML-5/transparency" | | |
3 | xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" fpmlVersion="5-3" | | |
4 | xsi:schemaLocation="http://www.fpml.org/FpML-5/transparency ../../xmls/SDR/transparency/fpml-main-5-3.xsd"> | | |
5 | <header> | | |
6 | <messageId messageIdScheme=" | Data prefix | "> | Required
7 | | Internal TWH Message SID | </messageId> |
8 | <sentBy> | Data value | </sentBy> | Required
9 | <sendTo>DTCCGTR</sendTo> | | |
10 | <creationTimestamp> | Message Creation Date/Time | </creationTimestamp> |
11 | </header> | | |
12 | | | |

SELECT to_char(t2.EXECUTIONDATETIME2)
FROM SCHEMA_OWNER.TABLE t1,
     XMLTABLE(
         XMLNAMESPACES('http://www.fpml.org/FpML-5/transparency' AS "ns"),
         '//ns:publicExecutionReport'
         PASSING XMLType(t1.MESSAGE_C)
         COLUMNS -- columns for parsed values
             EXECUTIONDATETIME2 VARCHAR2(200) PATH '//ns:termination/ns:executionDateTime/text()'
     ) t2
WHERE t1.id = 1
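The same kind of check can be done outside the database once the message text has been fetched. The sketch below uses Python's standard `xml.etree.ElementTree`; the message body, its values and the `extract` helper are hypothetical, and only the namespace and element names are taken from the slide. Treating `messageId` and `sentBy` as the required tags follows the "Required" marks in the specification table.

```python
import xml.etree.ElementTree as ET

# Namespace prefix mapping, mirroring XMLNAMESPACES(... AS "ns") in the query.
NS = {"ns": "http://www.fpml.org/FpML-5/transparency"}

# Hypothetical message; in a real test it would be read from the MESSAGE_C column.
MESSAGE = """<?xml version="1.0" encoding="UTF-8"?>
<publicExecutionReport xmlns="http://www.fpml.org/FpML-5/transparency" fpmlVersion="5-3">
  <header>
    <messageId messageIdScheme="http://example.com/msg-id">MSG-1</messageId>
    <sentBy>SENDER</sentBy>
    <sendTo>DTCCGTR</sendTo>
    <creationTimestamp>2016-01-01T10:00:00Z</creationTimestamp>
  </header>
  <termination>
    <executionDateTime>2016-01-01T09:59:00Z</executionDateTime>
  </termination>
</publicExecutionReport>"""


def extract(message, path):
    """Return the text of the first element matching path, or None if absent."""
    root = ET.fromstring(message.encode("utf-8"))
    node = root.find(path, NS)
    return None if node is None else node.text


execution_dt = extract(MESSAGE, ".//ns:termination/ns:executionDateTime")
send_to = extract(MESSAGE, "./ns:header/ns:sendTo")

# Tags marked "Required" in the specification must be present and non-empty.
REQUIRED = ["./ns:header/ns:messageId", "./ns:header/ns:sentBy"]
missing_required = [p for p in REQUIRED if not extract(MESSAGE, p)]

print(execution_dt, send_to, missing_required)
```

Comparing values extracted this way with the values stored in the source tables is one way to verify a generated feed against its specification.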

DWH - Staging Area

A staging area, or landing zone, is an intermediate storage area used for data processing during the extract, transform and load (ETL) process. The data staging area sits between the data source(s) and the data target(s), which are often data warehouses, data marts, or other data repositories.[1]
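As an illustration of landing source data in a staging area, the sketch below loads a pipe-delimited feed into a staging table and checks that nothing was lost on the way. The feed content, the `stg_feed` table and its columns are assumptions made for the example; SQLite stands in for the real staging database.

```python
import csv
import io
import sqlite3

# Hypothetical feed content; a real feed would arrive as a file.
FEED = "id|name|amount\n1|Alice|100\n2|Bob|70\n"

conn = sqlite3.connect(":memory:")
# Staging keeps values as raw text so nothing is altered before transformation.
conn.execute("CREATE TABLE stg_feed (id TEXT, name TEXT, amount TEXT)")

reader = csv.DictReader(io.StringIO(FEED), delimiter="|")
rows = [(r["id"], r["name"], r["amount"]) for r in reader]
with conn:
    conn.executemany("INSERT INTO stg_feed VALUES (?, ?, ?)", rows)

# Basic staging check: every feed record landed in the staging table.
feed_count = len(FEED.strip().splitlines()) - 1  # minus the header line
staged_count = conn.execute("SELECT COUNT(*) FROM stg_feed").fetchone()[0]

print(feed_count, staged_count)
```

Keeping the staging columns untyped is a design choice of this sketch: type and format problems then surface in the transformation step, where they can be rejected and reported.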

List of the most popular ETL tools

• Informatica - PowerCenter
• IBM - WebSphere DataStage (formerly known as Ascential DataStage)
• SAP - BusinessObjects Data Integrator
• IBM - Cognos Data Manager (formerly known as Cognos DecisionStream)
• Microsoft - SQL Server Integration Services
• Oracle - Data Integrator (formerly known as Sunopsis Data Conductor)
• SAS - Data Integration Studio
• Oracle - Warehouse Builder
• Ab Initio
• Information Builders - Data Migrator
• Pentaho - Pentaho Data Integration
• Embarcadero Technologies - DT/Studio
• IKAN - ETL4ALL
• IBM - DB2 Warehouse Edition
• Pervasive - Data Integrator
• ETL Solutions Ltd. - Transformation Manager
• Group 1 Software (Sagent) - DataFlow
• Sybase - Data Integrated Suite ETL
• Talend - Talend Open Studio
• Expressor Software - Expressor Semantic Data Integration System
• Elixir - Elixir Repertoire
• OpenSys - CloverETL

ETL Testing

Key points:

• Ensure that data is transformed correctly.
• Ensure that the projected data is loaded into the data warehouse without any data loss or truncation.
• Ensure that the ETL application appropriately rejects invalid data, replaces it with default values, and reports it.
• Make sure that data is loaded into the data warehouse within the prescribed and expected time frames, to confirm scalability and performance.
• All methods should have appropriate unit tests regardless of visibility.
• To measure their effectiveness, all unit tests should use appropriate coverage techniques.
• Strive for one assertion per test case.
• Create unit tests that target exceptions.

Testers' key responsibilities:

• Stage table testing
• Testing the business transformation logic applied
• Testing target table loading from a stage file or table after a transformation is applied.
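Several of the key points above translate directly into SQL checks between a stage table and a target table. The following is a minimal sketch under stated assumptions: the `stg_customer` and `dwh_customer` tables, the sample rows and the default value 'N/A' are all invented for the example, and the target data deliberately contains one truncated name so that a check has something to find.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stg_customer (id INTEGER, name TEXT, country TEXT);
    CREATE TABLE dwh_customer (id INTEGER, name TEXT, country TEXT);
    INSERT INTO stg_customer VALUES
        (1, 'Alice', 'UA'), (2, 'Bob', NULL), (3, 'Christopher', 'PL');
    -- Simulated ETL result with one defect: the name in row 3 is cut short.
    INSERT INTO dwh_customer VALUES
        (1, 'Alice', 'UA'), (2, 'Bob', 'N/A'), (3, 'Christop', 'PL');
""")

# 1. Completeness: the row counts of stage and target must match.
count_diff = conn.execute("""
    SELECT (SELECT COUNT(*) FROM stg_customer) - (SELECT COUNT(*) FROM dwh_customer)
""").fetchone()[0]

# 2. Truncation: a target value shorter than its source value is suspicious.
truncated = conn.execute("""
    SELECT s.id FROM stg_customer s JOIN dwh_customer d ON d.id = s.id
    WHERE LENGTH(d.name) < LENGTH(s.name)
""").fetchall()

# 3. Default values: a NULL source country must become the assumed default 'N/A'.
bad_defaults = conn.execute("""
    SELECT s.id FROM stg_customer s JOIN dwh_customer d ON d.id = s.id
    WHERE s.country IS NULL AND (d.country IS NULL OR d.country <> 'N/A')
""").fetchall()

print(count_diff, truncated, bad_defaults)
```

Each query returns the offending rows rather than a pass/fail flag, so a non-empty result can be attached to the defect report as is.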

Mapping

Source -> Staging

Staging -> CSV file

SQL (DDL, DML, DCL) and its use in testing

SQL (DDL, DML, DCL)

Data Definition Language (DDL) statements are used to define the database structure or schema. Examples:
CREATE - creates objects in the database
ALTER - alters the structure of the database
DROP - deletes objects from the database
TRUNCATE - removes all records from a table and releases the space allocated for them
COMMENT - adds comments to the data dictionary
RENAME - renames an object

Data Manipulation Language (DML) statements are used for managing data within schema objects. Examples:
SELECT - retrieves data from a database
INSERT - inserts data into a table
UPDATE - updates existing data within a table
DELETE - deletes records from a table (all of them if no WHERE clause is given); the space allocated for the records remains
MERGE - UPSERT operation (insert or update)
CALL - calls a PL/SQL or Java subprogram
EXPLAIN PLAN - explains the access path to the data
LOCK TABLE - controls concurrency

Data Control Language (DCL) statements are used for managing privileges. Examples:
GRANT - gives users access privileges to the database
REVOKE - withdraws access privileges given with the GRANT command
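The difference between DDL and DML can be tried out with a small script. This sketch uses SQLite through Python's `sqlite3` module, which is an assumption of the example rather than the database used in the course; SQLite has no TRUNCATE, GRANT or REVOKE, so only CREATE, ALTER, INSERT, UPDATE, DELETE and DROP are shown, and the `printer` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define and change the structure.
cur.execute("CREATE TABLE printer (code INTEGER PRIMARY KEY, model TEXT)")
cur.execute("ALTER TABLE printer ADD COLUMN price INTEGER")

# DML: manage the data inside that structure.
cur.executemany("INSERT INTO printer VALUES (?, ?, ?)",
                [(1, "1276", 259), (2, "1433", 302)])
cur.execute("UPDATE printer SET price = 270 WHERE code = 1")
cur.execute("DELETE FROM printer WHERE code = 2")
conn.commit()
remaining = cur.execute("SELECT code, model, price FROM printer").fetchall()

# DELETE without WHERE empties the table but keeps the table itself ...
cur.execute("DELETE FROM printer")
conn.commit()
rows_after_delete = cur.execute("SELECT COUNT(*) FROM printer").fetchone()[0]

# ... while DROP (DDL) removes the object from the database.
cur.execute("DROP TABLE printer")
table_exists = cur.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = 'printer'"
).fetchone()[0]

print(remaining, rows_after_delete, table_exists)
```

In a test environment the same pattern is handy for preparing and cleaning up test data: DML resets the content, DDL resets the structure.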


Tips and tricks

SEQUENCE

CREATE SEQUENCE Name [START WITH first_value]
[INCREMENT BY increment_value];
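The meaning of START WITH and INCREMENT BY can be shown with a tiny emulation. This is not database code: `create_sequence` is a hypothetical Python helper that only mimics how successive values of a sequence are produced, with defaults of 1 for both parameters as an assumption of the sketch.

```python
import itertools


def create_sequence(start_with=1, increment_by=1):
    """Return a callable that yields the next sequence value on each call."""
    counter = itertools.count(start_with, increment_by)
    return lambda: next(counter)


# Emulates: CREATE SEQUENCE test_id START WITH 100 INCREMENT BY 10;
nextval = create_sequence(start_with=100, increment_by=10)
ids = [nextval() for _ in range(3)]

print(ids)  # [100, 110, 120]
```

In testing, a sequence is a convenient source of unique keys for generated test records.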

PARTITION BY

SELECT col1, col2, SUM(col3) sum_col3
FROM Table
GROUP BY col1, col2;

SELECT id, col1, col2, SUM(col3)
OVER (PARTITION BY col1, col2) sum_col3
FROM Table;
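The two queries differ in how many rows they return: GROUP BY collapses each group into one row, while SUM(...) OVER (PARTITION BY ...) keeps every row and repeats the group total on it. A runnable sketch follows; the table name `t` and its three rows are assumptions, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, col1 TEXT, col2 TEXT, col3 INTEGER);
    INSERT INTO t VALUES (1, 'a', 'x', 10), (2, 'a', 'x', 5), (3, 'b', 'y', 7);
""")

# GROUP BY: one output row per (col1, col2) group.
grouped = conn.execute("""
    SELECT col1, col2, SUM(col3) AS sum_col3
    FROM t
    GROUP BY col1, col2
    ORDER BY col1, col2
""").fetchall()

# PARTITION BY: every input row survives, with its group total attached.
windowed = conn.execute("""
    SELECT id, col1, col2, SUM(col3) OVER (PARTITION BY col1, col2) AS sum_col3
    FROM t
    ORDER BY id
""").fetchall()

print(grouped)
print(windowed)
```

The windowed form is useful in testing when a row-level value has to be compared with its group total in a single query.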

ROW_NUMBER and RANK

SELECT *, ROW_NUMBER() OVER(ORDER BY type) num,
       RANK() OVER(ORDER BY type) rnk
FROM WORK_PRN

code  model  color  type    price  num  rnk
1     1276   n      Laser   259    3    3
2     1433   y      Jet     302    1    1
3     1434   y      Jet     243    2    1
4     1401   n      Matrix  139    5    5
5     1408   n      Matrix  280    6    5
6     1288   n      Laser   402    4    3
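The result table can be reproduced with the sketch below, which loads the six rows shown above into an in-memory SQLite database (3.25 or later). One deliberate deviation from the slide's query: ROW_NUMBER is ordered by `type, code` rather than `type` alone, because the numbering of rows that tie on `type` is otherwise not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE work_prn (code INTEGER, model INTEGER, color TEXT,
                           type TEXT, price INTEGER);
    INSERT INTO work_prn VALUES
        (1, 1276, 'n', 'Laser', 259), (2, 1433, 'y', 'Jet', 302),
        (3, 1434, 'y', 'Jet', 243), (4, 1401, 'n', 'Matrix', 139),
        (5, 1408, 'n', 'Matrix', 280), (6, 1288, 'n', 'Laser', 402);
""")

rows = conn.execute("""
    SELECT code, type,
           ROW_NUMBER() OVER (ORDER BY type, code) AS num,  -- unique, gap-free
           RANK()       OVER (ORDER BY type)       AS rnk   -- ties share a rank
    FROM work_prn
    ORDER BY code
""").fetchall()

for row in rows:
    print(row)
```

ROW_NUMBER gives every row its own number, while RANK gives tied rows the same value and then skips ahead (1, 1, 3, 3, 5, 5), which makes RANK a quick way to spot duplicates on the ordering key.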


Questions