Testing PDI solutions - Openin · 2019. 3. 14.

Twineworks

Testing PDI solutions

Slawomir Chodnicki
BI Consultant
[email protected]

The sample project

.
|-- bin            # entry point scripts
|-- environments   # environment configuration
|-- etl            # ETL solution
`-- spec           # tests and helpers

Twineworks GmbH | Helmholtzstr. 28 | 10587 Berlin | twineworks.com

Test orchestration

$ bin/robot test


Jenkins is a continuous integration server.

Its basic role is to run the test suite and build any artifacts upon changes in version control.

Example server:

http://ci.pentaho.com/


Testable solutions

Configuration management

• configure all data sources/targets and paths through kettle variables or parameters
• local environment - not in version control
• test environment - reference environment
• production environment - optional

Configuration management

environments
|-- local                      # dev environment - not in version control
|   |-- environment.sh         # shell environment variables
|   |-- my.cnf                 # database config file
|   `-- .kettle                # KETTLE_HOME
|       |-- shared.xml         # database connections
|       `-- kettle.properties  # kettle variables
|-- test                       # test environment - in version control
`-- production                 # other environments
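A minimal sketch of how a wrapper script could resolve the settings for one of these environments. The function name `environment_settings` and the returned keys are hypothetical, not taken from the sample project; the only assumption about PDI is that it reads `$KETTLE_HOME/.kettle/kettle.properties`.

```ruby
# Sketch only: resolve the settings for one self-contained environment.
# The function name and the returned keys are hypothetical.
def environment_settings(project_root, name)
  env_dir = File.join(project_root, "environments", name)
  raise ArgumentError, "unknown environment: #{name}" unless Dir.exist?(env_dir)

  {
    # PDI reads $KETTLE_HOME/.kettle/kettle.properties, so KETTLE_HOME
    # points at the directory that contains .kettle.
    "KETTLE_HOME" => env_dir,
    "kettle_properties" => File.join(env_dir, ".kettle", "kettle.properties"),
    "shell_env" => File.join(env_dir, "environment.sh")
  }
end
```

Because every path is derived from the environment directory, switching from local to test changes one argument and nothing else.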

Environments are self-contained

• share nothing
• reproducible results:
  – you can run it
  – your team can run it
  – ci-server can run it

Configuration management

$ bin/robot spoon

Initialize db

$ bin/robot db reset
clearing database [ OK ]
initializing database
2017/10/20 15:32:37 - Kitchen - Start of run.
2017/10/20 15:32:37 - reset_dwh - Start of job execution
2017/10/20 15:32:41 - Kitchen - Finished!
2017/10/20 15:32:41 - Kitchen - Processing ended after 4 seconds.
[ OK ]

Sub-systems/Phases

• Define sub-systems/phases
  – define pre-requisites
    • data expected in certain sources
  – define outcomes
    • data written to certain sinks
• A sub-system/phase of the ETL process is responsible for a small set of related side-effects

Entry points

• Define entry points with a full functional contract.
• An entry point implements an application feature.

Testing ETL solutions

Kinds of automated tests

• Computation tests
• Integration tests
• Functional tests
• Non-functional tests

Computation tests

• Single unit of ETL under test
• Performs a computation (no side-effects)
• What is a "unit" in PDI?
  – Job?
  – Transformation?
  – Sub-transformation (mapping)?

A simple computation test

Test job
spec/dwh/validate_params/validate_params_spec.kjb

The job calls
etl/dwh/validate_params.kjb
- with DATA_DATE=2016-07-01 and expects it to succeed
- with DATA_DATE=1867-12-21 and expects it to fail

The job succeeds if all expectations are met. It fails otherwise.

Test jobs

Test transformation results

Test sub-transformations

Integration tests

• ETL responsible for a set of related side-effects under test
• Most common case in ETL testing
  – Test individual phases of a batch process

Functional testing

Functional tests

• Entry point of ETL solution under test
• Assertions reflect invocation contract
  – Behavior on happy path
  – Behavior on errors
  – Behavior on incorrect invocation

Test the daily run

Non-functional tests

• Performance
  – how long does workload x take?
• Stability
  – what does it take to break it?
  – How much memory is too little?
  – What happens when loading unexpected data? (truncated file, column too long, 50MB XML in string field, badly formatted CSV reads as single field, empty files)
• Security
  – Verify configuration assumptions automatically
• Compliance
  – We must use version x of library y

Test compliance
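One way a compliance check of the "version x of library y" kind could be automated is to inspect the jar files that ship with the solution. The sketch below assumes jars are named `<library>-<version>.jar`; that naming convention and both function names are assumptions for illustration, not part of the sample project.

```ruby
# Sketch only: list the versions of a library found in a lib folder,
# assuming jars are named <library>-<version>.jar.
def jar_versions(lib_dir, library)
  pattern = /\A#{Regexp.escape(library)}-(\d[\w.\-]*)\.jar\z/
  Dir.entries(lib_dir).map { |file| file[pattern, 1] }.compact.sort
end

# Compliant only if exactly the required version is present.
def compliant?(lib_dir, library, required_version)
  jar_versions(lib_dir, library) == [required_version]
end
```

A spec could then assert that `compliant?` holds for each library the team is required to pin, so that a stray second version fails the suite.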

Scripting tests

JRuby

JRuby is the Ruby language on the JVM
http://jruby.org
Maintained by Red Hat.
Runs Rails on JBoss

RSpec

RSpec is a testing framework for Ruby
http://rspec.info/
https://relishapp.com/rspec

$ bin/robot test

• Includes helper files in spec/support
• Traverses the spec folder looking for files whose names end in _spec.rb and loads them as tests

describe "dwh clear job" do
  describe "when db is not empty" do
    before :all do
      dwh_db.load_fixture "spec/fixtures/steelwheels/steelwheels.sql"
    end
  end
end

RSpec helpers

spec/support/spec_helpers.rb

def dwh_db
  ...
end

Returns a JDBC database object.
Connects on demand, and closes automatically when the test suite ends.

In addition, dwh_db.load_fixture(path) allows loading a sql or json fixture file, and dwh_db.reset() triggers $ bin/robot db reset.
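The "connects on demand, closes automatically" behavior can be sketched with memoization plus an `at_exit` hook. This is not the helper from spec/support/spec_helpers.rb: `FakeConnection` and `DbHelper` are hypothetical stand-ins, and a real helper would open a JDBC connection where the fake one is created.

```ruby
# Sketch only: FakeConnection stands in for a real JDBC connection.
class FakeConnection
  @opened = 0
  class << self
    attr_accessor :opened   # counts how many connections were created
  end

  attr_reader :closed

  def initialize
    self.class.opened += 1
    @closed = false
  end

  def close
    @closed = true
  end
end

module DbHelper
  @connection = nil

  # Connects on demand: the factory block runs only on the first call.
  def self.connection(&factory)
    @connection ||= begin
      conn = factory.call
      at_exit { conn.close }  # close automatically when the process ends
      conn
    end
  end
end

# Every call returns the same, lazily created connection.
def dwh_db
  DbHelper.connection { FakeConnection.new }
end
```

Specs that never touch the database then never pay for a connection, and no spec has to remember to close one.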

describe "dwh clear job" do
  describe "when db is not empty" do
    before :all do
      dwh_db.load_fixture "spec/fixtures/steelwheels/steelwheels.sql"
      @result = run_job "etl/dwh/util/clear.kjb"
    end
  end
end

RSpec helpers

spec/support/spec_helpers.rb

def run_job file, params = {}
  ...
end

Runs a kettle job and returns a map
{
  :successful? => true/false,
  :log => "log text",
  :result => [row1, row2, row3, …]
}
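For illustration only, here is one way a helper with a similar return shape could be built on the Kitchen command line. It is not the helper from the talk: the name `run_job_via_cli`, the `KITCHEN` environment variable, and the injectable `runner` are assumptions, and result rows are left empty because they are not available through the command line.

```ruby
require "open3"

# Sketch only. KITCHEN defaults to "kitchen.sh" on the PATH (an assumption).
KITCHEN = ENV.fetch("KITCHEN", "kitchen.sh")

# Builds the Kitchen invocation: -file=<job> plus one -param:NAME=VALUE each.
def kitchen_command(file, params = {})
  [KITCHEN, "-file=#{file}"] + params.map { |name, value| "-param:#{name}=#{value}" }
end

# Runs the job and wraps the outcome in a map like the one described above.
# The runner is injectable so the function can be exercised without PDI.
def run_job_via_cli(file, params = {}, runner: Open3.method(:capture2e))
  log, status = runner.call(*kitchen_command(file, params))
  {
    :successful? => status.success?, # Kitchen exits with 0 on success
    :log => log,
    :result => []                    # rows are not available via the CLI
  }
end
```

Returning a plain map keeps the assertions in the specs simple: they only look up keys and never depend on how the job was launched.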

describe "dwh clear job" do
  describe "when db is not empty" do
    before :all do
      dwh_db.load_fixture "spec/fixtures/steelwheels/steelwheels.sql"
      @result = run_job "etl/dwh/util/clear.kjb"
    end

    it "completes successfully" do
      expect(@result[:successful?]).to be true
    end

    it "clears the db" do
      expect(dwh_db.query("SHOW TABLES").to_a.length).to eq 0
    end
  end
end

Test orchestration

$ bin/robot test

Running jobs as RSpec tests

spec/etl/etl_spec.rb

Recursively traverses etl/spec looking for files whose names end in _spec.kjb, and dynamically generates a describe and it block for each.

Hence all such job files are part of the test suite.

describe "ETL" do
  # find every *_spec.kjb below the current directory
  Dir.glob("./**/*_spec.kjb").each do |path|
    # one describe/it pair per test job
    describe "#{path}" do
      it "completes successfully" do
        @result = run_job path.to_s, {}
        expect(@result[:successful?]).to be true
      end
    end
  end
end

RSpec - test orchestration

RSpec runs in two phases

Phase 1: collects tests, recording the structure as given by the describe blocks.
Phase 2: filters found tests as per command line parameters and executes them
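The two phases can be mimicked with a toy runner. This is a deliberately simplified sketch and not how RSpec is implemented; `ToyRunner` and its methods are hypothetical names chosen to mirror describe and it.

```ruby
# Sketch only: a toy collect-then-run test runner.
class ToyRunner
  Example = Struct.new(:full_name, :block)

  def initialize
    @examples = []
    @context = []
  end

  # Phase 1: describe/it only record structure; no test body runs yet.
  def describe(name)
    @context.push(name)
    yield
  ensure
    @context.pop
  end

  def it(name, &block)
    @examples << Example.new((@context + [name]).join(" "), block)
  end

  # Phase 2: filter the collected examples, then execute the survivors.
  def run(filter = nil)
    selected = @examples.select { |e| filter.nil? || e.full_name.include?(filter) }
    selected.map { |e| [e.full_name, e.block.call ? :passed : :failed] }.to_h
  end
end

runner = ToyRunner.new
runner.describe("dwh clear job") do
  runner.it("completes successfully") { true }
end
runner.describe("dwh load job") do
  runner.it("completes successfully") { true }
end
selected_results = runner.run("clear") # only the "dwh clear job" example runs
```

Separating collection from execution is what makes name-based filtering possible: the full name of every example is known before any test body runs.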

RSpec - test orchestration

Run only tests containing the word 'clear' in their name or enclosing describe blocks:
$ bin/robot test --example 'clear'

Run only tests tagged 'long_running':
$ bin/robot test --tag 'long_running'

Run only tests in spec/commands:
$ bin/robot test spec/commands


Thank you!

Backup Slides

Testing in practice

Test what you run

• Verify behavior of the entity you run directly

Tools of the trade

• Helpers
  – reusable utility code/ETL components that keep tests about the what, not about the how
  – fixture loaders
  – assertion helpers
  – data comparison helpers

Data Fixtures

• Data Fixtures
  – sets of test data, encoded in a convenient way, easily loaded into data sources and sinks
• JSON, CSV, SQL, XML, YAML
  – Use whatever is easiest to maintain for the team
• Generate data fixtures through parameterized scripts if you need to generate datasets with consistent relationships
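A parameterized generator might look like the following sketch. The table names, column names, and JSON layout are hypothetical and not necessarily what dwh_db.load_fixture expects; the point is only that every generated order refers to a customer that exists in the same fixture.

```ruby
require "json"

# Sketch only: generate customers and orders with consistent foreign keys.
# Table and column names are hypothetical.
def generate_fixture(customers:, orders_per_customer:, seed: 42)
  rng = Random.new(seed) # fixed seed keeps the fixture reproducible
  customer_rows = (1..customers).map { |id| { "id" => id, "name" => "Customer #{id}" } }
  order_rows = customer_rows.flat_map do |customer|
    (1..orders_per_customer).map do |n|
      {
        "id" => (customer["id"] - 1) * orders_per_customer + n,
        "customer_id" => customer["id"],
        "amount" => rng.rand(1..500)
      }
    end
  end
  { "customers" => customer_rows, "orders" => order_rows }
end

# Writes the fixture as pretty-printed JSON so diffs stay readable.
def write_fixture(path, fixture)
  File.write(path, JSON.pretty_generate(fixture))
end
```

Scaling the dataset up for a performance test is then a matter of changing the two parameters rather than editing rows by hand.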

File Fixtures

• File Fixtures
  – sets of test files acted upon during a run
• Maintain file fixtures separate from source location expected by ETL
• If fixture files are changed as part of the test, copy them to a temporary location before running tests
• Create a unique source location per test run, if the file location is shared (like sftp)
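Copying fixtures to a unique temporary location can be sketched with Ruby's standard library. The helper name `with_fixture_copy` and the directory prefix are hypothetical; for a shared remote location such as sftp the same idea applies, but the copy step would differ.

```ruby
require "fileutils"
require "tmpdir"

# Sketch only: copy fixture files into a fresh, unique directory for one
# test run, so the originals are never modified by the ETL under test.
def with_fixture_copy(fixture_dir)
  Dir.mktmpdir("etl-fixtures-") do |tmp|
    # "dir/." copies the contents of dir rather than dir itself
    FileUtils.cp_r(File.join(fixture_dir, "."), tmp)
    yield tmp
  end # the temporary directory is removed when the block ends
end
```

A spec would point the ETL's source path at the yielded directory for the duration of the block, which keeps runs independent of each other.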