FME World Tour 2015 - FME & Data Migration, Simon McCabe

Processes undertaken in large data migration projects Simon McCabe Chief Technology Officer, IMGS


Post on 11-Aug-2015


Page 1: FME World Tour 2015 -  FME & Data Migration Simon McCabe

Processes undertaken in large data migration projects

Simon McCabe

Chief Technology Officer, IMGS

Page 2: FME World Tour 2015 -  FME & Data Migration Simon McCabe

Discussion Context

High-level case study of the processes undertaken in large electricity and water data migrations run by IMGS over the last 5 years

High-level data volumes:

Project 1:

60+ primary features

100+ secondary features

Approx. 28 million records

Project 2:

40+ primary features

Approx. 1.2 million records

Page 3: FME World Tour 2015 -  FME & Data Migration Simon McCabe

Discussion Context

Topics covered:

Understanding the legacy system and its data

Designing a migration process

Tools to use (FME…)

Reporting and Reconciliation processes

Data Cleansing

Cutover\Deployment considerations

How FME played its part

Presentation will cover the full migration lifecycle

Analysis, Design, Build, Test, Deploy

Page 4: FME World Tour 2015 -  FME & Data Migration Simon McCabe

Step 1: Understanding legacy system

Analysis Phase

Page 5: FME World Tour 2015 -  FME & Data Migration Simon McCabe

Step 1: Understanding legacy system

Forensic review of current data sets (what needs migration)

Review\define all source features (objects)

What is their makeup?

database tables, CSV, customised file formats, etc.

what are the field characteristics (definition, sparsely populated, etc.)?

how do we link graphics to attributes?

during migration, will a feature's attribute data or graphical data (or a combination of both) distinguish the feature?

What are they related to in the system?

other features/objects

how are they related to each other

should we create relationships during migration

Collate statistics

Before you leave the analysis phase, audit the data in terms of volume.
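
A volume audit of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses an in-memory SQLite database in place of the legacy system, and the table name `legacy_pole` and its columns are hypothetical. It counts the rows in a table and, per field, how many values are actually populated, which helps to spot sparsely populated fields.

```python
import sqlite3


def audit_table(conn, table):
    """Return the row count and the populated-value count per column."""
    cur = conn.cursor()
    total = cur.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    columns = [row[1] for row in cur.execute(f"PRAGMA table_info({table})")]
    populated = {}
    for col in columns:
        # COUNT ignores NULLs; NULLIF also treats empty strings as missing.
        populated[col] = cur.execute(
            f"SELECT COUNT(NULLIF({col}, '')) FROM {table}"
        ).fetchone()[0]
    return {"table": table, "rows": total, "populated": populated}


# Hypothetical legacy table used only for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE legacy_pole (id INTEGER, install_date TEXT, material TEXT)")
conn.executemany(
    "INSERT INTO legacy_pole VALUES (?, ?, ?)",
    [(1, "2001-05-14", "wood"), (2, None, "steel"), (3, "", None)],
)
report = audit_table(conn, "legacy_pole")
print(report)
```

The same counts, collected for every source feature, give the baseline figures that later reconciliation can be checked against.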

Page 6: FME World Tour 2015 -  FME & Data Migration Simon McCabe

Step 1: Understanding legacy system

Methodology: Workshops and Documentation!

Data Audit

What data is used by the legacy system and the users in the customer's department?

External Interfaces?

What legacy system data is utilised by others (outside the customer's department)?

Page 7: FME World Tour 2015 -  FME & Data Migration Simon McCabe


Step 1: Understanding legacy system

• FME assists in the data audit (volumetric\statistics)

• FME helps you discuss the data in terms of what is in the system vs what the user thinks is in there

• Use FME to migrate graphical data to staging tables (a simple migration)

• Migrate style information, level information, size information (widths and heights), etc., building up a tabular picture of the data

• Utilize Microsoft Excel\Word to provide a human readable view on the above to allow dialogue\discussion

Page 8: FME World Tour 2015 -  FME & Data Migration Simon McCabe

Step 2: Design the migration process

Design Phase

Page 9: FME World Tour 2015 -  FME & Data Migration Simon McCabe

Step 2: Design the migration process

What data is missing or unusable? E.g. do all records have a geometry?

What data is stored in a non-standard format? E.g. are dates in a column all recorded in the same format?

What data values give conflicting information?

What data is missing important relationship linkages?

What data is incorrect or out of date?

What data records are duplicated?

Common Design considerations [Data Quality]

Page 10: FME World Tour 2015 -  FME & Data Migration Simon McCabe

Step 2: Design the migration process

If geometries are missing, should they be fixed in the source system or during migration? (This can depend on time and on source system processes.)

If geometries exist with no attributes, should they be migrated or abandoned?

How do we report on inconsistencies in the data, e.g. date fields that contain a mix of DD-MM-YY and YY-MM-DD values?

Common Design considerations [Data Cleansing]

If fixed during migration, will this affect the source system?

External Interfaces

(systems get out of sync…)
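
As a rough illustration of reporting on mixed date formats, the Python sketch below labels each raw value as DD-MM-YY, YY-MM-DD, ambiguous or invalid, and then summarises the counts. The classification rules are assumptions made for this example only; note that a value such as 01-02-03 fits both layouts and cannot be resolved from the string alone, so it is reported rather than guessed.

```python
import re
from collections import Counter

# Matches exactly three two-digit groups separated by hyphens.
TWO_DIGIT_TRIPLE = re.compile(r"^(\d{2})-(\d{2})-(\d{2})$")


def classify_date(value):
    """Label a raw date string as DD-MM-YY, YY-MM-DD, ambiguous or invalid."""
    match = TWO_DIGIT_TRIPLE.match(value or "")
    if not match:
        return "invalid"
    first, middle, last = (int(part) for part in match.groups())
    if not 1 <= middle <= 12:
        return "invalid"  # the middle group is the month in both layouts
    could_be_dmy = 1 <= first <= 31  # first group is a plausible day
    could_be_ymd = 1 <= last <= 31   # last group is a plausible day
    if could_be_dmy and could_be_ymd:
        return "ambiguous"
    if could_be_dmy:
        return "DD-MM-YY"
    if could_be_ymd:
        return "YY-MM-DD"
    return "invalid"


# Hypothetical sample values from a legacy date column.
values = ["25-12-98", "98-12-25", "01-02-03", "1998/12/25", None]
summary = Counter(classify_date(v) for v in values)
print(summary)
```

The summary counts are the kind of figure that can be discussed with the data owners before deciding where the fix should happen.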

Page 11: FME World Tour 2015 -  FME & Data Migration Simon McCabe

Step 2: Design the migration process

How you address\fix the points in the last 2 slides can depend on the overall team size.

Fix in the source system

Fix during migration

Don’t fix and migrate as is (if possible)

Or abandon the migration of the problematic feature instance

A key requirement from the migration however will be to report on all issues and allow dialogue on their resolution.

Common Design considerations [Team Size]

Page 12: FME World Tour 2015 -  FME & Data Migration Simon McCabe

Step 2: Design the migration process

Walk before you run

Design\Sketch the migration workflows before starting the migration process (Microsoft Visio etc.)

Ensure all logical scenarios are covered

Document what is an abandon event vs what is an issue event

Abandon events = feature instance is not migrated

Issue event = an issue was encountered and needs to be reviewed (missing geometry, invalid date format, etc.)

Design logging processes for developers to follow (legacy system references, event type and error description)

Common Design considerations [Workflows]
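
The distinction between abandon events and issue events, and the three logging fields listed above, could be captured in a small record type. The sketch below is one possible shape, not the project's actual design: the class names, the field names and the rule that decides which event applies are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class EventType(Enum):
    ABANDON = "ABANDON"  # the feature instance is not migrated
    ISSUE = "ISSUE"      # an issue was encountered and needs review


@dataclass(frozen=True)
class MigrationEvent:
    legacy_ref: str        # reference back to the legacy system record
    event_type: EventType
    description: str       # human-readable error description


def check_feature(legacy_ref, has_geometry, has_attributes) -> Optional[MigrationEvent]:
    """Return an event for a problematic feature, or None if it is clean.

    The rules below are assumptions for illustration; the real rules are
    whatever the documented workflow agrees on.
    """
    if not has_attributes:
        return MigrationEvent(legacy_ref, EventType.ABANDON, "no attributes found")
    if not has_geometry:
        return MigrationEvent(legacy_ref, EventType.ISSUE, "missing geometry")
    return None


event = check_feature("POLE-000123", has_geometry=False, has_attributes=True)
print(event)
```

Agreeing on one record shape up front means every developer logs the same fields, so the reports built on top of the log do not need per-script handling.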

Page 13: FME World Tour 2015 -  FME & Data Migration Simon McCabe

Step 2: Design the migration process

Start up and Shut down scripts (TCL or Python)

Log start time, end time, features read/written and run status to migration_log table

Assists in flagging long running scripts (and thus in targeting performance updates)

Write errors/issues to ERROR_LOG table

Excel reports are then designed to read the error log table and provide periodic reports

FME transformers to validate geometries, dates etc.

OGC Validator

Geometry Validator

Duplicate Remover (if required)

Periodic reviews of reports feed back into FME processes and validation\logging requirements

Common Design considerations [Logging]

[Diagram: logging components. Labels: start-up script (Python, TCL), transformers (OGC Validator, GeometryValidator, Duplicate Remover), ERROR_LOG table, MIGRATION_LOG table]
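
The run logging described on this slide might look roughly like the Python sketch below. It is a stand-alone illustration that uses SQLite in place of the project database; the column names and the helper functions `log_run` and `long_running` are assumptions, and in a real workspace the values would be supplied by the FME start-up and shut-down scripts rather than passed in by hand.

```python
import sqlite3
from datetime import datetime


def create_log_table(conn):
    """Create a migration_log table (column names are hypothetical)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS migration_log (
               workspace TEXT, start_time TEXT, end_time TEXT,
               features_read INTEGER, features_written INTEGER,
               run_status TEXT, duration_seconds REAL)"""
    )


def log_run(conn, workspace, start, end, features_read, features_written, run_status):
    """Record one workspace run, including its duration in seconds."""
    conn.execute(
        "INSERT INTO migration_log VALUES (?, ?, ?, ?, ?, ?, ?)",
        (workspace, start.isoformat(), end.isoformat(),
         features_read, features_written, run_status,
         (end - start).total_seconds()),
    )


def long_running(conn, threshold_seconds):
    """List runs that took longer than the threshold, slowest first."""
    return conn.execute(
        "SELECT workspace, duration_seconds FROM migration_log "
        "WHERE duration_seconds > ? ORDER BY duration_seconds DESC",
        (threshold_seconds,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
create_log_table(conn)
log_run(conn, "migrate_poles", datetime(2015, 1, 10, 9, 0, 0),
        datetime(2015, 1, 10, 9, 0, 30), 1000, 1000, "SUCCESS")
log_run(conn, "migrate_cables", datetime(2015, 1, 10, 9, 0, 0),
        datetime(2015, 1, 10, 12, 0, 0), 500000, 499990, "SUCCESS")
slow = long_running(conn, 3600)
print(slow)
```

Storing the duration alongside the timestamps keeps the "flag long running scripts" query trivial and makes the same table usable for cutover timing estimates.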

Page 14: FME World Tour 2015 -  FME & Data Migration Simon McCabe

Step 2: Design the migration process

It is impossible to manually test all feature instances, all attributes, all relationships, etc. after they migrate to the destination system

If possible (preferable and advised):

Have a separate team (to the migration team) develop reconciliation reports

Feature Reconciliation: list source features and their counts alongside the migrated destination features and their counts

Report and explain any discrepancies

Attribute Reconciliation: per source feature, list the source attributes alongside their destination attributes; all counts must be the same

Automate, Automate, Automate….

Reconciliation reports should easily identify/flag all issues and provide input into further dialogue and issue management.

Common Design considerations [Reconciliation]
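
A feature reconciliation report of the kind described above can be approximated as follows. This is a minimal sketch: the counts are made-up sample figures, and treating logged abandon events as the "explanation" for a shortfall is an assumption of this example, not a rule stated in the presentation.

```python
def reconcile_features(source_counts, destination_counts, abandoned=None):
    """Compare per-feature counts and flag discrepancies that are not explained."""
    abandoned = abandoned or {}
    rows = []
    for feature in sorted(set(source_counts) | set(destination_counts)):
        src = source_counts.get(feature, 0)
        dst = destination_counts.get(feature, 0)
        explained = abandoned.get(feature, 0)  # features deliberately not migrated
        unexplained = src - dst - explained
        rows.append({
            "feature": feature,
            "source": src,
            "destination": dst,
            "abandoned": explained,
            "unexplained": unexplained,
            "status": "OK" if unexplained == 0 else "REVIEW",
        })
    return rows


# Hypothetical sample counts.
source = {"pole": 1000, "cable": 500, "valve": 20}
destination = {"pole": 1000, "cable": 495, "valve": 18}
abandoned = {"cable": 5}

report = reconcile_features(source, destination, abandoned)
for row in report:
    print(row)
```

In this sample the cable shortfall is fully explained by abandon events, while the two missing valves are flagged for review.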

Page 15: FME World Tour 2015 -  FME & Data Migration Simon McCabe

Step 2: Design the migration process

Reconciliation Reports

Flag long running scripts

Report on issues

Feature Reconciliation

Analyse source and destination feature counts side by side

& Report

Attribute Reconciliation

Analyse source and destination attribute counts side by side

Reconciliation Reports form an integral part of quality control, assurance and testing
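
Attribute reconciliation can be sketched in the same spirit: for each mapped attribute, count the populated values on the source side and on the destination side and compare them. The attribute names and the source-to-destination mapping below are hypothetical, and plain Python dictionaries stand in for the real source and destination records.

```python
def populated_count(records, attribute):
    """Count records where the attribute holds a non-empty value."""
    return sum(1 for record in records if record.get(attribute) not in (None, ""))


def reconcile_attributes(source_records, destination_records, mapping):
    """Return (source_attr, dest_attr, source_count, dest_count, matches) tuples."""
    report = []
    for src_attr, dst_attr in mapping.items():
        src = populated_count(source_records, src_attr)
        dst = populated_count(destination_records, dst_attr)
        report.append((src_attr, dst_attr, src, dst, src == dst))
    return report


# Hypothetical records for one source feature and its migrated counterpart.
source_records = [
    {"MATL": "wood", "INST_DT": "2001-05-14"},
    {"MATL": "steel", "INST_DT": ""},
]
destination_records = [
    {"material": "wood", "install_date": "2001-05-14"},
    {"material": None, "install_date": None},
]
mapping = {"MATL": "material", "INST_DT": "install_date"}

attribute_report = reconcile_attributes(source_records, destination_records, mapping)
for line in attribute_report:
    print(line)
```

Here the material attribute lost a value during migration, so its counts differ and the row is flagged, while the install date counts agree.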

Page 16: FME World Tour 2015 -  FME & Data Migration Simon McCabe


Step 2: Design the migration process

• Decide on FME Desktop vs FME Server

• Prototype designs where applicable

• Design process to be run with minimal intervention i.e. one play button to go from start to finish

• Ensure logging and reporting are designed into the solution and embedded within the built FME scripts

Page 17: FME World Tour 2015 -  FME & Data Migration Simon McCabe

Step 3 & 4: Build and Test

Build & Test Phase

Page 18: FME World Tour 2015 -  FME & Data Migration Simon McCabe

Step 3 & 4: Build and Test

• FME is very powerful, and provides many ways to migrate data – however developers need to regularly review\critique their workflows

• Items to keep an eye on:

• Excessive use of SQLExecutor (can add hours to a migration script). In scripts from previous projects, cutting this back reduced run times by around 50%.

• Review all database calls, i.e. any necessary SQLExecutor calls or queries used to read data, and ensure indexes are applied to the datasets (FME can only read\link the data efficiently if it is set up correctly in the database)

• If you have multiple destination (writer) connections – the first will write almost immediately, the others will be cached (and slow the migration down)

• Try to have only one writer per workspace

• In scripts from previous projects, this reduced run times by as much as 10 hours

• Use the power of the database

• Pre-process data where possible

• Cache sequences (hours can be saved here)


Page 19: FME World Tour 2015 -  FME & Data Migration Simon McCabe

Step 3 & 4: Build and Test

• Prototype where possible

• Design workflows provide the logical interpretation of the migration requirement BUT should not be used as the blueprint for the FME workspace design

• Use the principles of the workflow

• Might be faster to write database functions/procedures to pre-process some of the data

• Utilizing the power of the database

• Removes overly complex custom transformers from FME

• Simplifying the workspace

• Reducing overall time to process the data

• Examples of the above can be seen with developers who are inexperienced with or new to FME

• One script in the past read a large dataset (1.5 million records). It had 2 or 3 SQLExecutors in the workspace, which made approximately 4.6 million database calls during the migration (of one feature!)
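
The cost of one database call per feature, compared with letting the database do the work, can be shown with a small stand-alone Python example. It is not FME code: SQLite stands in for the project database, the `asset` and `owner` tables are hypothetical, and the loop only mimics what a per-feature lookup does. Both functions return the same rows, but the first issues one query per record and the second issues a single joined query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE owner (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE asset (id INTEGER PRIMARY KEY, owner_id INTEGER);
    """
)
conn.executemany("INSERT INTO owner VALUES (?, ?)", [(1, "North"), (2, "South")])
conn.executemany(
    "INSERT INTO asset VALUES (?, ?)", [(i, 1 + i % 2) for i in range(1, 1001)]
)


def per_feature_lookup(conn):
    """Look up the owner with a separate query for every asset record."""
    assets = conn.execute("SELECT id, owner_id FROM asset ORDER BY id").fetchall()
    calls = 1  # the initial read
    result = []
    for asset_id, owner_id in assets:
        name = conn.execute(
            "SELECT name FROM owner WHERE id = ?", (owner_id,)
        ).fetchone()[0]
        calls += 1
        result.append((asset_id, name))
    return result, calls


def single_join(conn):
    """Let the database resolve the relationship in one joined query."""
    rows = conn.execute(
        "SELECT a.id, o.name FROM asset a "
        "JOIN owner o ON o.id = a.owner_id ORDER BY a.id"
    ).fetchall()
    return rows, 1


slow_rows, slow_calls = per_feature_lookup(conn)
fast_rows, fast_calls = single_join(conn)
print(slow_calls, fast_calls)
```

With 1,000 records the per-feature version makes 1,001 calls against 1 for the join; the same pattern at 1.5 million records is how call counts reach the millions.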


Page 20: FME World Tour 2015 -  FME & Data Migration Simon McCabe

Step 3 & 4: Build and Test

• Use the reconciliation process to quality stamp/approve and test your builds.

• At the start of build expect data quality issues (bugs and also source data cleansing activities)

• At the end of build and test, data quality issues should disappear (bug fixing, source system remedies)

• Report analysis is critical moving from build to test to cutover.

• Issues should reduce

• All team members should be aware of open issues


Page 21: FME World Tour 2015 -  FME & Data Migration Simon McCabe

Step 3 & 4: Build and Test

• The reconciliation report timings will feed times into the cutover\deployment window planning

• Should be reviewed regularly

• Additional reading on the subject of performance improvement could be:

• FME Performance and Profiling

• Turbocharging FME: How to Improve the Performance of Your FME Workspaces

• Performance Tuning FME


Page 22: FME World Tour 2015 -  FME & Data Migration Simon McCabe

Step 3 & 4: Build and Test

Useful transformers used in the migration process

• WorkspaceRunner

• SchemaMapper

• AttributeCreator

• Tester

• TestFilter

• Aggregator

• SQLCreator

• SQLExecutor

• Counter

• GeometryRemover

• NullAttributeMapper

• Sampler

• TimeStamper

• StatisticsCalculator

• PointOnAreaOverlayer

• VertexCreator

• FeatureMerger


Page 23: FME World Tour 2015 -  FME & Data Migration Simon McCabe

Step 5: Deploy

Deployment Phase

Page 24: FME World Tour 2015 -  FME & Data Migration Simon McCabe

Step 5: Deploy

• At this stage everything is fully tested and OK to deploy.

• Cut over window

• Timings of scripts can vary from the development server to the test server to the production server (depending on the specs of those machines)

• Run dress rehearsals into the final destination environment before cutover to get a view on expected timings.

• Once the final migration has run

• Review the Feature reconciliation reports from this migration to the last – all should be the same (or better)

• Review the Attribute reconciliation reports from this migration to the last – all should be the same (or better)
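
The "same or better" check between the last rehearsal and the final run can be automated along these lines. The sketch assumes each reconciliation report has been reduced to a dictionary of open discrepancy counts per feature; that summary format and the function names are hypothetical.

```python
def compare_runs(previous, current):
    """Label each feature as better, same or worse than the previous run."""
    verdicts = {}
    for feature in sorted(set(previous) | set(current)):
        before = previous.get(feature, 0)
        after = current.get(feature, 0)
        if after < before:
            verdicts[feature] = "better"
        elif after == before:
            verdicts[feature] = "same"
        else:
            verdicts[feature] = "worse"
    return verdicts


def acceptable(verdicts):
    """True only if no feature got worse between the two runs."""
    return all(verdict != "worse" for verdict in verdicts.values())


# Hypothetical open discrepancy counts per feature.
last_rehearsal = {"pole": 0, "cable": 3, "valve": 2}
final_run = {"pole": 0, "cable": 1, "valve": 4}

verdicts = compare_runs(last_rehearsal, final_run)
print(verdicts, acceptable(verdicts))
```

In this sample the valve count went up between runs, so the comparison fails and the discrepancy would need to be explained before sign-off.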


Page 25: FME World Tour 2015 -  FME & Data Migration Simon McCabe

Thank You!

Questions?

For more information:

[email protected]

IMGS