
Page 1: Solution Blueprint - Customer 360

We operate as John Hancock in the United States, and Manulife in other parts of the world.

Customer 360 Solution Blueprint

Final

January 30th, 2015

Page 2: Solution Blueprint - Customer 360

1

Table of Contents

# Topic Page

1 Revision History 02
2 Document Distribution List 03
3 References 04
4 CCF Solution Architecture 05
5 Key Assumptions 06
6 Technical Risk and Mitigation 07
7 Technical/Non-Functional Requirements 08
8 Data Flow 15
9 Conceptual Architecture 24
10 Extract, Transform and Load (ETL) 28
11 Data Quality 59
12 Master Data Management (MDM) 72
13 Data Architecture 109
14 Data Governance 116
15 Summary & Next Steps 129
16 Appendix 131

Page 3: Solution Blueprint - Customer 360

2

Customer Centricity Foundation (CCF) As-Is Solution Architecture

Page 4: Solution Blueprint - Customer 360

3

Key Assumptions

No. Assumption

1. The data elements identified for Customer 360 are the critical data elements that will be measured for data quality and data governance.

2. Data owners and data stewards have been identified for the critical data elements of the customer domain within the Canadian Division.

3. The out-of-the-box Data Quality Information Analyzer reporting tool will be used for measuring customer critical data.

4. The Data Governance workflows will be provided as swim-lane diagrams in Visio and will not be automated using AWD by the Data Governance work stream.

5. The IBM standard data model for Party, Accounts and Product will be leveraged, with the possibility of extension by 10-15 custom entities. The estimated number of entities for this phase is around 40-45.

6. Account/Policy data will be hosted in MDM. For this phase, Account/Policy data will not be mastered; it is maintained with a relationship to Customer.

7. MDM will house product data to support customer-to-product and product-to-line-of-business relationships, but this is not considered master data.

8. Virtual and physical MDM will be leveraged for distinct match, merge, and persistence capabilities in a hybrid model.

9. The out-of-the-box MDM artifacts will be the basis for any customization or configuration (matching algorithms, UI components, and services).

10. There will be multiple points of human interaction within the MDM tool suite based on a given task (linking, data quality, reference data management, golden record management, or product data configuration).

11. Customer 360 will leverage existing extract files from the admin systems, except for MPW, where a new extract process will be developed.

Page 5: Solution Blueprint - Customer 360

4

Technical Risk and Mitigation

No. Risk Description | Risk Category | Mitigation Strategy

1. Quality of data is not high enough to effectively match and merge client data before exposing it to the client via the Customer Portal and CCT. Category: High. Mitigation: execute data profiling, remediation and monitoring.

2. Access to unmasked data is needed for effectively profiling data, assessing data quality, and identifying fields to match on along with their respective probabilities and thresholds. Category: High. Mitigation: identify individuals who need access to unmasked data and restrict access to others.

3. Aggregating all customer data from across the Canadian Division in one place increases the risk of confidential customer data being lost, stolen or exposed in the case of a security breach. Category: High. Mitigation: implement strong security measures and protocols to protect data, e.g. using SFTP and HTTPS.

4. Several technology components will be implemented for the first time at Manulife, raising a risk of integration challenges which can lead to schedule and cost impacts. Category: High. Mitigation: conduct a Proof of Technology for the Customer 360 architecture components.

Page 6: Solution Blueprint - Customer 360

5

Technical/Non-Functional Requirements

No. Criteria | Requirement | Criteria Count

1. Availability: Criteria for determining the availability of the system for service when required by the end users. (6)
2. Maintainability: Criteria pertaining to the ease of maintenance of the system with respect to needed replacement of technology and rectifying defects. (1)
3. Operability: Criteria related to the day-to-day ease of operation of the system. (2)
4. Performance: Criteria related to the speed and response of the system. (4)
5. Recoverability: Criteria to determine how soon the system would recover to the original state after a failure. (8)
6. Scalability: Criteria to determine the ability of the system to increase throughput under increased load when additional resources are added. (2)
7. Security: Criteria to determine security measures to protect the system from internal and external threats. (4)

Page 7: Solution Blueprint - Customer 360

6

Technical/Non-Functional Requirements

Availability

1. MDM - Q. Determine what time the data will be available. A. The data should be cleansed, matched and synchronized with Salesforce by 7 AM. (Owner: Jamie; Status: Complete)

2. ETL - Q. Determine when the source files arrive by. A. All source files are expected to arrive by 5 AM. (Owner: Steven; Status: Complete)

3. Web Service - Q. Determine how often ePresentment will pull data from the ePresentment stage. A. Large runs occur monthly and annually; otherwise, small runs occur nightly for notices, letters, confirms, etc. (Owner: Jamie; Status: Complete)

4. ETL - Q. Determine whether the ePresentment stage expects full data every day. A. ePresentment would expect full data every day. Deltas would suffice, but any document delivery preference changes during the day should be reflected in the ePreferences staging database. (Owner: Jamie; Status: Complete)

5. ETL - Q. Determine how soon someone should be notified of errors when something fails within ETL. A. Recommendation: ETL error notification should be sent at least once a day. (Owner: Steven; Status: Open)

6. IVR - Q. Determine whether it is acceptable for the IVR vocal password to be down for the day. A. 12 hours, to align with the overnight batch schedules. (Owner: Jamie; Status: Open)

Page 8: Solution Blueprint - Customer 360

7

Technical/Non-Functional Requirements

Maintainability

1. ETL - Q. Determine how deltas are being identified / captured within the current processing. A. For Dataphile, the source and target files are compared to capture the deltas. For the other source files the process is still to be determined. (Owner: Vishal/Steven; Status: Open)

Operability

1. ETL - Q. What are the existing file validation processes? (Owner: Jamie; Status: Open)

2. ETL - Q. Determine what happens when the source files for a system do not arrive on time. A. Given that Customer 360 should process files as they come in, no files should be held back. (Owner: Jamie; Status: Complete)

Page 9: Solution Blueprint - Customer 360

8

Technical/Non-Functional Requirements

Performance

1. ETL - Q. Determine whether all files are processed as they arrive or whether there is a queue in the process. A. The files should be processed as they arrive, with a preference for a real-time processing option. However, if there are cost or delivery-date issues, files will be processed when all files are available, or at an arbitrary time. (Owner: Jamie; Status: Open)

2. ETL - Q. Determine what volume is expected from each of the source systems (initial and incremental). (Owner: Steven; Status: Open)

3. Tech Arch - Q. Determine the daily expected volume from each of the sources / within the ePresentment stage. A. The total of new customers plus preference changes; expect fewer than 5,000 per day, ongoing (rough estimate). (Owner: Jamie; Status: Complete)

4. Tech Arch - Q. Determine the archiving requirements for the ePresentment stage. A. Archiving is not necessary for the ePresentment stage; it is not a source of record. The data is primarily transient, being staged for performance reasons. (Owner: Jamie; Status: Complete)

Page 10: Solution Blueprint - Customer 360

9

Technical/Non-Functional Requirements

Disaster Recovery and Business Continuity

1. Tech Arch - Q. Determine the failover time. A. Failover to the alternate site should be immediate, utilizing a cross-data-center clustered WAS architecture with our global load balancer. (Owner: Jamie; Status: Complete)

2. Tech Arch - Q. Determine the acceptable data loss. A. For data where Customer 360 is not the source of record, 24 hours of data loss is probably acceptable, as the data can be re-run from the admin systems. For data where Customer 360 is the source of record (preferences, for example), acceptable data loss is very small; however, it would still probably be hours' worth, given that Salesforce would capture the data and presumably the messages can be resent. (Owner: Jamie; Status: Complete)

3. Tech Arch - Q. Determine what happens when the virtual MDM is lost and how soon it can be recovered. A. The virtual repository would have to be back up within a 24-hour period. (Owner: Jamie; Status: Complete)

4. Tech Arch - Q. How often will the system be backed up? A. The system should be backed up nightly; tape backup would be the option. (Owner: Jamie; Status: Complete)

5. Tech Arch - Q. Who will be responsible for the database backup? A. The DBAs would be responsible for the database backup. (Owner: Jamie; Status: Complete)

6. Tech Arch - Q. What data must be saved in case of a disaster? A. Recommendation to be determined. (Owner: Jamie; Status: Open)

7. Tech Arch - Q. How quickly after a major disaster must the system be up and running? A. The system should be back up and running within 24 hours after a disaster. (Owner: Jamie; Status: Complete)

8. Tech Arch - Q. What is the acceptable system downtime per 24-hour period? A. Acceptable system downtime would largely be driven by dependent systems. (Owner: Jamie; Status: Open)

Page 11: Solution Blueprint - Customer 360

10

Technical/Non-Functional Requirements

Scalability

1. Tech Arch - Q. Determine the expected growth rate. A. The expected growth rate for the first year is 150%, as a result of the IIS business being added and the Standard Life acquisition. Year over year after that, the growth rate is expected to be between 10% and 50%. (Owner: Jamie; Status: Complete)

2. ETL - Q. Determine which files are batch / real-time. A. All files are batch, including MPW; however, feeds should be processed as they come in. There will be interdependencies, e.g. MLIA gets data from both iFast Seg and PPlus. (Owner: Jamie; Status: Complete)

Page 12: Solution Blueprint - Customer 360

11

Technical/Non-Functional Requirements

Security

1. MDM - Q. How should web services be secured? A. Services exposed by DataPower or IIB should be secured using two-way SSL certificates. Services on the WAS or MDM servers should be exposed to DataPower/IIB over HTTPS and utilize a user ID/password. Additionally, IP restrictions should be in place such that the services can only be called by IIB/DataPower. (Owner: Jamie; Status: Complete)

2. Data Protection - Q. How should the SIN be protected? A. The SIN should not be visible to users of the MDM/Data Quality platforms (DGO members). Only the last 3 digits should be stored in Salesforce. (Owner: Jamie; Status: Complete)

3. Data Protection - Q. How should data for customers who are also employees be handled? A. Employee customer data should only be visible to those in roles that have access to the staff plan. Additionally, there are probably requirements about how board member data and high-net-worth data should be handled. (Owner: Jamie; Status: Open)

4. Tech Arch - Q. Determine whether FTP or SFTP is used in the current state and what the future-state requirement is. A. Current state: FTP. Future state: SFTP. (Owner: Jamie; Status: Complete)
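To make requirement 1 above concrete, the sketch below shows how a consuming application could call a service exposed over HTTPS with two-way SSL and a service user ID/password. This is an illustrative Python sketch only: the endpoint URL, certificate file names and credentials are placeholders, and in the target architecture the call would be mediated by DataPower/IIB rather than issued directly like this.

```python
import requests

# Hypothetical endpoint; real services would be exposed through DataPower/IIB.
SERVICE_URL = "https://integration.example.com/customer360/party"

response = requests.post(
    SERVICE_URL,
    json={"partyId": "12345"},
    # Two-way SSL: the client presents its own certificate and private key ...
    cert=("client.crt", "client.key"),
    # ... and validates the server certificate against a trusted CA bundle.
    verify="ca_bundle.pem",
    # Service-level user ID / password required by the WAS / MDM services.
    auth=("svc_user", "svc_password"),
    timeout=30,
)
response.raise_for_status()
print(response.json())
```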

Page 13: Solution Blueprint - Customer 360

12

Data Flow

Page 14: Solution Blueprint - Customer 360

13

Systematic Data Flow

[Diagram: systematic data flow, with numbered steps 1-16 described on the following pages. Admin systems deliver source files to an FTP server; source file dependency and scheduling triggers ETL into the DMO staging area; data moves to the DQ repository and then, as OOTB XML, to the IBM MDM server. A manual data check decision routes updates either directly or through the MDM / DQ stewardship UIs before systematic customer updates are applied. WebSphere Cast Iron cloud synchronization propagates updates to Salesforce for the Contact Center Portal, and IBM WebSphere DataPower with IIB services on-demand updates back to MDM. A Dox delivery preferences staging area feeds ePresentment, IVR updates arrive via SFTP through a staging area, reference data updates are managed separately, and overarching data management processes cover data security, metadata management, data quality management and data lineage. Legend: existing process/tools/technology; in scope for Customer 360; out of scope of Customer 360; awaiting decisions or discussions.]

Page 15: Solution Blueprint - Customer 360

14

Systematic Data Flow Description (1 of 3)

Step 1 - Data extraction from admin systems: For Phase 1, the existing data feeds from the admin systems for DMO will be leveraged for MDM. Additional admin system feeds will be required for MDM, namely "MPW" for wealth management.

Step 2 - Data files are moved to an FTP server: The extracted data files will land on a common FTP server, as depicted in the systematic data flow.

Step 3 - Source file dependency and scheduling: Multiple files will arrive from some admin systems and have dependencies upon each other. The CA scheduler will wait for all the files from an admin system to arrive before calling the Informatica ETL jobs to load the staging tables.

Step 4 - Standardization and transformation of the data: Informatica PowerCenter performs a series of ETL transformations on the data within the staging tables while moving it to Pre-DQ tables within the DQ repository. IBM QualityStage then performs standardization and enrichment tasks while moving the enriched data to the finalized DQ repository.

Step 5 - Standardized data is sent to MDM in OOTB XML format: In this step the data is prepared and sent to IBM MDM Advanced Edition for ingestion into the matching and linking processes. The format is the OOTB XML interface specified by the MDM platform. The virtual (Initiate) part of MDM receives the data, and the customer updates are then matched against existing customer records in MDM.

Step 6 - Data Stewardship UI or workflow to review the updates in MDM: A few scenarios could occur, such as "false negative" or "false positive" matches where the systematic match and merge results are not fully reliable, as well as updates that may have a larger impact at a cross-business or cross-system level. These scenarios may need to be reviewed and approved before the changes are applied to the system. This can be achieved by triggering a workflow for data stewards, or for the roles defined in the Data Governance Operating Model, to look at the data and take the necessary steps to validate or update the data changes.

Page 16: Solution Blueprint - Customer 360

15

Systematic Data Flow Description (2 of 3)

Step 7 - Workflow is triggered: As mentioned in Step 6, a workflow may be triggered to review and approve such scenarios. There is an opportunity to set up Data Governance processes to review and approve very critical scenarios.

Step 8 - Record is approved and updated: In this step, the updates are reviewed, amended where needed and approved, and the updates are persisted in the MDM repository.

Step 9 - Updated data propagated to Salesforce: As customer updates happen in MDM, the relevant information is sent to Salesforce through WebSphere Cast Iron cloud synchronization.

Step 10 - Salesforce updates are now available to the Portal and CCT: The Portal and CCT will have updated customer records from Salesforce.

Step 11 - Portal/CCT updates to MDM via Salesforce: For data integration from Salesforce.com, triggered events may call services exposed from Salesforce to Manulife. These services would be exposed through the IBM DataPower appliance and serviced by IIB (IBM Integration Bus), integrating with either the staging area or QualityStage, in order to have the data written to the MDM platform on demand.

Step 12 - Salesforce, CCT and Portal updates synchronized with the admin systems: Any customer updates in Salesforce that are relevant to the admin systems will be sent directly to the admin systems through a systematic interface. Those changes, as described in steps 1 through 8, will then flow into the MDM system on a daily basis as defined in the batch. There are a few exceptions to this rule, such as the portal username, preferences, portal-updated email address and phone number, portal status, and so on; these exception data elements will be updated in MDM from the Portal through Salesforce.

Page 17: Solution Blueprint - Customer 360

16

Systematic Data Flow Description (3 of 3)

Step 13 - Life cycle management of master data / reference data: In the Manulife landscape, customer updates and reference data updates will take place in the admin systems or, in some cases, in consuming systems (systems that receive updates from MDM). These updates must be applied consistently across the landscape to enable seamless movement of data. Analysis of the information that will be hosted in MDM for this phase revealed that the frequency of reference data updates is minimal; hence, automatic reference data updates across systems are ruled out. An overarching process will be established to monitor and control the life cycle management of master data and reference data, and the data governance process will be designed based on these scenarios.

Step 14 - Updating preferences in MDM: MDM will be the repository for customer preference data. The stored customer preferences will be pushed into a messaging queue as XML and picked up by Informatica PowerCenter to load into the "Document Delivery Preferences Staging Area" for other systems (ePresentment) to consume. The purpose of this staging area is to avoid intensive resource consumption on the MDM server when customer preferences need to be sent to DSTO.

Step 15 - Propagating preferences from the supplementary database to ePresentment: The "Document Delivery Preferences Staging Area" will host printing preferences for ePresentment and other systems to consume.

Step 16 - IVR ID updated from IVR: As the Voice Print ID is updated in IVR, it will be synchronized with MDM through IBM Integration Bus.

Page 18: Solution Blueprint - Customer 360

17

Conceptual Architecture

Page 19: Solution Blueprint - Customer 360

18

Customer 360 To-Be Architecture

[Diagram: Customer 360 to-be architecture with numbered components 1-7, described on the next page.]

Page 20: Solution Blueprint - Customer 360

19

Customer 360 – Conceptual Architecture Description

Component | Functional Area | Description

1. ETL: Existing data feeds from the admin systems will be leveraged for Customer 360. The extracted data files will land on a common FTP server and will then be loaded into a staging database using Informatica PowerCenter for consumption.

2. Data Quality: Once PowerCenter transfers the data from the file(s) into the database staging area, IBM Information Analyzer and QualityStage will parse, cleanse, enrich, and standardize the data based on defined business rules.

3. MDM: IBM InfoSphere MDM will then ingest the data. The records are matched, merged, and ultimately stored in the physical MDM repository, where they become available for consumption by downstream systems.

4. SoftLayer: Consuming applications, proxy services, and compound services will use a combination of IBM tools and custom services (Cast Iron, IIB, DataPower, and Java services) to access MDM repository data.

5. Dox Delivery & Preferences Staging: The Dox delivery and preferences staging database will be loaded from MDM using PowerCenter (ETL) to allow quick and easy access by the customer preferences user interface.

6. Data Governance: The data governance organization and tools will span the Customer 360 architecture and touch all phases of the data management lifecycle.

7. Downstream Applications: Salesforce will be the primary consumer of MDM data, supporting two customer applications: CCT and the Customer Portal.

Page 21: Solution Blueprint - Customer 360

20

Key Decision: Separate Customer 360 Stage

Accenture will stage the incoming feed data one-to-one, without applying any transformation, business rules or filtering rules. The only logic embedded within these load processes would be file validation checks and the corresponding Exception Handling / ABC logic.

Benefits of the STG table(s):

Risk Mitigation: Current operational systems will not be affected; however, existing business rules will be leveraged for the Customer 360 design.

Scalability: Provides an opportunity to leverage the Stage to design future projects around data profiling and data quality services.

Measuring Integrity: Enables a true view of the source system data, providing accurate data quality measurements.

Development Complexity: Replacing hard-coded transformations with business-managed reference tables and conforming the data in DQ is an easier operation to handle within DQ.

Page 22: Solution Blueprint - Customer 360

21

Extract, Transform and Load (ETL)

Page 23: Solution Blueprint - Customer 360

22

Accenture Data Integration Framework

The Accenture Data Integration Framework encompasses a combination of technical and business processes used to combine data from disparate sources into meaningful and valuable information. A complete data integration solution encompasses the discovery, cleansing, monitoring, transformation and delivery of data from a variety of sources.

Page 24: Solution Blueprint - Customer 360

23

Requirements for file validation, change data identification, and transformation business rules will be defined for each admin system feed as it is loaded into the Customer 360 staging database.

Acquisition

Source Data Acquisition (1)
Existing source data interface mappings and processes will be leveraged for Customer 360. A copy of the existing admin data files will be created for Customer 360. Each admin data file will be loaded into its own Staging (STG) table with little to no transformation, to reflect the integrity of the supplied data feed. Various file validation rules will be used to verify that each admin feed is received with no data leakage before proceeding with the STG process. Supplied data files will be archived on the FTP server for a determined amount of time.

Change Data Identification (2)
Changed Data Identification (CDI) is a process to identify data records which have been newly created, updated, or deleted. Some files may contain one or more of the following fields within the header or record for CDI:
• Time stamps
• Status (I / U / D flags)
• Create date
• A combination of fields / a date field
Other files will be compared against historic or current data sets to determine the delta for CDI.

Filtering
General rules defined for DMO will be analyzed and leveraged where appropriate. Additional rules will be defined specifically to meet the Customer 360 requirements.

Page 25: Solution Blueprint - Customer 360

24

Source Data Acquisition

File validation errors indicate issues in the source feed which may prevent processing of a source data file. These errors will be classified as either 'Warnings' or 'Fatal / Critical Errors' and will be handled by a defined Exception Handling and ABC (Audit Balance Control) process. Below are some of the typical file validation processes that will be implemented, followed by a short sketch of how such checks could be classified.

File Validations - Process | Description | Exception Type
• Header anomalies: Instance where the business date within the header is incorrect. (Warning)
• Trailer anomalies: Instances where either the trailer record is missing or the count specified in the trailer does not match the total count of records received. (Fatal / Critical Error)
• Missing header: Instance where the header is missing. (Fatal / Critical Error)
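The following is a minimal Python sketch of how header and trailer validation could distinguish warnings from fatal errors, assuming a simple fixed-position layout (first line a header carrying a business date, last line a trailer carrying a record count); the actual rules will be implemented in the ETL tooling and may differ.

```python
from datetime import date

class FatalFileError(Exception):
    """Fatal / critical error: the file cannot be processed."""

def validate_file(lines, expected_business_date: date):
    """Return a list of warnings; raise FatalFileError for critical problems."""
    warnings = []

    if not lines or not lines[0].startswith("HDR"):
        raise FatalFileError("Missing header")                 # Fatal / Critical

    # Header anomaly: an incorrect business date is only a warning.
    header_date = lines[0][3:11]                               # assumed YYYYMMDD position
    if header_date != expected_business_date.strftime("%Y%m%d"):
        warnings.append(f"Header anomaly: business date {header_date}")

    if not lines[-1].startswith("TRL"):
        raise FatalFileError("Trailer record is missing")      # Fatal / Critical

    # Trailer anomaly: the trailer count must match the detail records received.
    trailer_count = int(lines[-1][3:12])
    detail_count = len(lines) - 2                              # exclude header and trailer
    if trailer_count != detail_count:
        raise FatalFileError(f"Trailer count {trailer_count} != {detail_count} records")

    return warnings
```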

Page 26: Solution Blueprint - Customer 360

25

CDI Options

Change Data Identification (CDI)

Customer 360 receives a mix of source data files: some are populated with only delta (changed) data, while other files contain the full contents of the customer population. Multiple strategies will be considered when determining the appropriate changed-data identification approach for each received data file.

Option - Description
• Change Indicators: Does the source file provide Changed Data Indicators (I / U / D)? Can the source include Changed Data Indicators?
• Receive deltas: Can the source send only deltas?
• DB Minus: Could a database compare (MINUS) be performed between the STG (from the current feed) and the Pre-DQ table?
• Pre-DQ table Match: Could a hash compare be performed between the STG (from the current feed) and the Pre-DQ table?
• Match against MDM: Could ETL be used to match incoming feeds against MDM?

Page 27: Solution Blueprint - Customer 360

26

CDI Scenario Database Subtraction (Minus)

This approach performs a quick database MINUS operation to identify updates and new records. An INTERSECT operation should be used in conjunction (a UNION of the MINUS and INTERSECT results) to capture the records that are common to both feeds and to identify deletes.

[Diagram: the STG table(s) from the current feed are compared against the Pre-DQ table(s) to identify Inserts / Updates / Deletes before the ETL transformations load the Pre-DQ table(s).]

Pro(s):
• Provides a simplistic approach, performance efficiency, and ease of implementation.
• Does not require retaining history within the Pre-DQ table(s).

Con(s):
• None yet identified.
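As a rough illustration of the MINUS / INTERSECT idea, the Python sketch below applies the equivalent set operations to whole records; in practice the subtraction would be pushed down to the database, and the record values shown are hypothetical.

```python
def classify_full_feed(current_rows, previous_rows):
    """Set-based equivalent of the database MINUS / INTERSECT comparison."""
    current, previous = set(current_rows), set(previous_rows)

    changed_or_new = current - previous   # MINUS: rows new to, or changed in, this feed
    unchanged = current & previous        # INTERSECT: rows identical in both feeds
    dropped = previous - current          # rows that fell off the current feed

    # Note: with whole-record comparison an update appears as its new version in
    # changed_or_new and its old version in dropped; separating true deletes from
    # updates requires a key-level comparison.
    return changed_or_new, unchanged, dropped

previous = {("C1", "Alice", "Toronto"), ("C2", "Bob", "Waterloo")}
current = {("C1", "Alice", "Ottawa"), ("C3", "Carol", "Montreal")}
print(classify_full_feed(current, previous))
```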

Page 28: Solution Blueprint - Customer 360

27

CDI Scenario Hash Compare against Pre-DQ

This approach calculates a hash of each record within the Pre-DQ table(s) and of each record in the STG table(s) holding the most recent data. The two sets of hashes are then compared to identify Inserts / Updates / Deletes.

[Diagram: a hash is computed for each record in the STG table(s) and the Pre-DQ table(s); the hashes are compared to identify I / U / D before the ETL transformations load the Pre-DQ table(s).]

Pro(s):
• Prevents processing a full load each time.

Con(s):
• Requires retaining history in the Pre-DQ table(s).
• Calculating record hashes and performing the hash comparison may impact performance, depending on record volume / size.
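A minimal Python sketch of the hash-compare classification, assuming each record carries a business key in a field named "key"; MD5 is used purely for illustration, and the actual comparison would live in the ETL tool or database.

```python
import hashlib

def record_hash(record: dict) -> str:
    """Stable hash over a record's non-key attributes."""
    payload = "|".join(f"{k}={record[k]}" for k in sorted(record) if k != "key")
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

def compare_hashes(stg_records, pre_dq_records):
    """Classify keys as Insert / Update / Delete by comparing STG and Pre-DQ hashes."""
    stg = {r["key"]: record_hash(r) for r in stg_records}
    pre = {r["key"]: record_hash(r) for r in pre_dq_records}

    inserts = [k for k in stg if k not in pre]
    updates = [k for k in stg if k in pre and stg[k] != pre[k]]
    deletes = [k for k in pre if k not in stg]
    return inserts, updates, deletes

print(compare_hashes(
    [{"key": "C1", "city": "Ottawa"}, {"key": "C3", "city": "Montreal"}],
    [{"key": "C1", "city": "Toronto"}, {"key": "C2", "city": "Waterloo"}],
))  # (['C3'], ['C1'], ['C2'])
```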

Page 29: Solution Blueprint - Customer 360

28

CDI Scenario: Match against MDM

This approach performs a hash of each record within the MDM table(s) and of each record from the incoming feed. The two sets of hashes are then compared to identify Inserts / Updates / Deletes.

[Diagram: a hash is computed for each record in the STG table(s) and the MDM table(s); the hashes are compared to identify I / U / D before the ETL transformations load the Pre-DQ table(s).]

Pro(s):
• Prevents processing a full load each time.
• Prevents having to store history in the Pre-DQ table(s) in addition to the MDM table(s).

Con(s):
• Calculating record hashes and performing the hash comparison may impact performance, depending on record volume / size.
• Increased complexity in the ETL, due to a completely different data model.

Page 30: Solution Blueprint - Customer 360

29

Change Data Identification (CDI) Recommendation

Based on the current understanding of what information the client receives from the source feeds, Accenture recommends the CDI approaches below. Depending on the source feed, either of the recommended approaches would be implemented.

• Change Indicators (Does the source file provide Changed Data Indicators (I / U / D)? Can the source include them?): Source data does not provide any Changed Data Indicators; hence CDI is performed within the existing DMO processing. The source systems cannot make this change.

• Receive deltas (Can the source send only deltas?): Source data cannot send deltas, especially now that many different processes use these full feeds.

• DB Minus (Could a database compare be performed between the STG from the current feed and the Pre-DQ table?): Possible option. This option is preferred over the other methodologies due to its simplistic approach, performance efficiency, and ease of implementation.

• Pre-DQ table Match (Could a hash compare be performed between the STG from the current feed and the Pre-DQ table?): Possible option. The performance impact based on expected volume would need to be considered.

• Match against MDM (Could ETL be used to match incoming feeds against MDM?): Avoid as much as possible. The performance impact based on expected volume would need to be considered.

Page 31: Solution Blueprint - Customer 360

30

Service Components

1. Audit & Control: Source system traceability will be enabled through Audit (A), which provides traceability of the data and all the batch processes; Balance (B), which enables checks and balances on the processed records; and Control (C), which provides workflow management throughout the batch.

2. Exception Handling: Existing exception handling and warning rules for the DMO process will be leveraged. The Customer 360 exception handling process will capture and log exceptions as they occur during the Stage (STG) load and as data is pushed through ETL to DQ.

3. ETL Parameters / Variables: Existing parameters and variables will be analyzed to evaluate what can be leveraged. Additional parameters and variables might be defined to simplify development efforts.

4. Scheduler: The CA scheduler will enable the orchestration of batch program executions, from the source system feeds through MDM and SFDC.

Page 32: Solution Blueprint - Customer 360

31

What is Audit Balance Control (ABC)?

ABC will provide business users with:

1. Visibility into audit information, including 'who' updated a record, 'what' changes were made, and 'when' the record was updated.
2. The metrics needed for operational reporting of batch processes.
3. The ability to balance back to the source feed.

Audit
• Audit provides Manulife with traceability of data and processes.
• Audit tables should be tagged with sufficient data to validate success, assess performance, and research issues.
• Enables linking each data record to the process which created or modified it.

Balance
• Independent comparison of checks and sums: checks for lost records, and checks for computation and other processing errors.

Control
• Provides workflow management to ensure that processes are run at the right times, exceptions are caught, exception notification occurs, and exception recovery occurs.

Page 33: Solution Blueprint - Customer 360

32

Audit Balance Control (ABC) Methodology

The Audit, Balance and Control (ABC) process will be implemented at two levels. At the batch level, information about the batch will be captured, while at the individual job (segment) level, detailed execution information associated with each child job will be captured.

Batch Group
• The process will create a unique BATCH_GRP_ID in the Batch Group table for every batch initiated by the scheduler or "kick-off" script, with the status "In Progress".
• Example: the iFast workflow might include many processes within it. The iFast workflow would therefore have BATCH_GRP_ID = 1 and BATCH_GRP_STATUS_CD = 0 (In Progress).

Batch Segment
• The process will create a unique BATCH_SEG_ID in the Batch Segment table for every job within the Batch Group.
• Example: the iFast workflow might have 2 file validation processes and 1 staging process. Within BATCH_GRP_ID = 1, the 2 file validation processes would get BATCH_SEG_ID = 1 and 2, the staging process would get BATCH_SEG_ID = 3, and each starts with SEG_STATUS_CD = 0 (In Progress).

Status codes: 0 = In Progress, 1 = Completed, 2 = Failed.
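The sketch below illustrates the two-level ABC structure in Python, using in-memory lists as stand-ins for the Batch Group and Batch Segment tables; the column names loosely follow the slide, and the real implementation will write these rows from the ETL jobs.

```python
from datetime import datetime
from itertools import count

# Status codes from the ABC methodology.
IN_PROGRESS, COMPLETED, FAILED = 0, 1, 2

_grp_ids, _seg_ids = count(1), count(1)
batch_group_table, batch_segment_table = [], []   # stand-ins for the ABC tables

def start_batch_group(job_name):
    row = {"BATCH_GRP_ID": next(_grp_ids), "BATCH_GRP_JOB_NAME": job_name,
           "BATCH_GRP_STATUS_CD": IN_PROGRESS, "START_DT": datetime.now()}
    batch_group_table.append(row)
    return row

def start_batch_segment(group, job_name):
    row = {"BATCH_SEG_ID": next(_seg_ids), "BATCH_GRP_ID": group["BATCH_GRP_ID"],
           "BATCH_SEG_JOB_NAME": job_name, "SEG_STATUS_CD": IN_PROGRESS,
           "START_DT": datetime.now()}
    batch_segment_table.append(row)
    return row

# Example: one iFast batch group with two file validation segments and one staging segment.
grp = start_batch_group("Wf_iFastMMF_Account_STG")
for job in ("validate_MMMF_ACCT", "validate_MMMF_ACRL", "s_iFastMMF_Account_STG"):
    seg = start_batch_segment(grp, job)
    seg["SEG_STATUS_CD"] = COMPLETED          # would be set to FAILED on error
grp["BATCH_GRP_STATUS_CD"] = COMPLETED
```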

Page 34: Solution Blueprint - Customer 360

33

Audit Fields

Each Audit / Exception Handling table will capture a common set of metadata, often identified as audit fields. These audit attributes are populated with every ETL batch execution:

• Create Date: Standard audit date/time stamp when the record was created.
• Created By: Standard audit user ID that created the record.
• Update Date: Standard audit date/time stamp when the record was updated.
• Updated By: Standard audit user ID that updated the record.

Page 35: Solution Blueprint - Customer 360

34

ABC Batch Group Metadata

At the batch group level, metadata related to the batch as a whole (not to individual jobs or segments) will be captured. Business requirements and design considerations for Customer 360 might result in modification of the recommended ABC attributes.

Batch Group attributes (with sample values):
• Batch Group ID: Primary key. (1000)
• Batch Descriptive Name: Descriptive name of the job package. (Runs iFAST MMF process)
• Batch Name: Job name of the job package. (Wf_iFastMMF_Account_STG)
• Start Date: The date/time stamp the batch process started. (01/12/2015 21:59:55)
• End Date: The date/time stamp the batch process ended. (01/12/2015 22:59:00)
• Batch Status Code: The batch group process status code. (C)
• MLAC Source System ID: MLAC's source system ID. (23)

Page 36: Solution Blueprint - Customer 360

35

ABC Batch Segment Metadata

Batch Segment attributes (with sample values):
• Segment ID: Primary key for child-level audit records. (1100)
• Batch Group ID: Foreign key to the Batch Group (parent) table. (1000)
• Process Type Name: Name of the type of process (STG, DQ, etc.). (STAGING)
• File Date: Business date of the file. (01/11/2015 09:00:00)
• Process Type Start Date/Time: Start date/time of the process type. (01/12/2015 22:00:00)
• Process Type End Date/Time: End date/time of the process type. (01/12/2015 22:45:00)
• Segment Name: Exact name of the job (session / script name, etc.). (s_iFastMMF_Account_STG)
• Source table / file name: Name of the source table or file. (iFast_MMF_Account)
• Target table / file name: Name of the target table or file. (iFastMMF_Account_STG)
• # Re-processed Errors. (2)
• # Read records. (1000)
• # Inserted records. (980)
• # Critical Error records. (20)
• # Warning records. (4)
• Net Balance Records: (Read Ct + Error Reprocessing Ct) - (Insert Ct + Error Ct).
• Balance Indicator: 'Y' if Net Balance Records = 0, else 'N'. ('Y')
• Segment Status Code: The segment process status code. (Success)
• Audit fields: The 4 ETL audit fields (already defined).
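To make the balance check concrete, here is a minimal sketch of the Net Balance Records formula and the derived Balance Indicator; the counts used in the example are illustrative only.

```python
def net_balance(read_ct, error_reprocessing_ct, insert_ct, error_ct):
    """Net Balance Records = (Read Ct + Error Reprocessing Ct) - (Insert Ct + Error Ct)."""
    return (read_ct + error_reprocessing_ct) - (insert_ct + error_ct)

def balance_indicator(net_balance_records):
    """'Y' when the segment balances back to the source feed, otherwise 'N'."""
    return "Y" if net_balance_records == 0 else "N"

# Example: 1,000 records read, 980 inserted, 20 rejected as critical errors, none re-processed.
n = net_balance(read_ct=1000, error_reprocessing_ct=0, insert_ct=980, error_ct=20)
print(n, balance_indicator(n))   # 0 Y
```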

Page 37: Solution Blueprint - Customer 360

36

What is Exception Handling?

Exceptions refer to any error which occurs when reading or processing a data file or mapping. Examples of potential errors include, but are not limited to: incorrect data formats, duplicate values, unknown codes not defined in the business dictionaries, file issues, etc. Exceptions may lead to data rejections and even an abort of an ETL process. Exceptions will be captured for reporting, in order to support root-cause, remediation and impact analysis. While informational messages and warnings are technically not errors, they will also be captured for trend analysis. The following error types will be reported:

• Informational: Not an error or a warning (acceptable data), or a processing condition that does not impact output. This is an informational event; no action needs to be taken.
• Warning: Processing of the record / file may continue with a warning message, but the condition requires attention. Look into the root cause to determine whether to take action.
• Critical: Exceptions that cause an ETL interface to abort or that violate business rules. Take action to resolve the issue.

Exception codes are grouped into the 1000, 2000 and 3000 series by severity; examples include 1009 (SIN is non-numeric or shorter than 9 digits), 1010 (active record fell off the feed), 2010 (input value is not unique), 4014 (record already exists), and file validation errors.

Page 38: Solution Blueprint - Customer 360

37

Exception Handling Metadata

ETL exception table(s) will be created to capture error types and codes. These tables will be built using Accenture common standards and extended to capture further business-requirement metadata.

Exception attributes (with sample values):
• Exception ID: Unique ID for the exception record. (1)
• Batch ID: Unique ID for the current batch. (100)
• Source table / file name: The name of the source table / file. (iFast-MMF)
• MLAC System ID: MLAC source system ID. (23)
• Exception Code: Reason code for the exception. (2010)
• Exception Comments: Generic description of the exception. (Value '23' already present in target table 'iFastMMF_Account_STG')
• Exception Type ID: Type of exception (E = Critical Error, W = Warning). (C)
• Record Field: Field / file column that caused the error / warning. (Account_ID)
• Field Data: Incoming field / file column data associated with the critical error / warning. (22)
• Session Name: Informatica session name. (s_iFastMMF_STG)
• Data Stream: Exception record from the source file. (1 | 100 | 23 | ... n)
• Exception Record Status: Run status of the exception record (New, Processed, Re-try failed). (New)
• Re-process Indicator: Flag to determine whether the error record should be re-processed. (Y)
• Re-processed Date: Date the error record was re-processed. (01/12/2015 12:00:00)
• Re-process Count: Number of times the exception record has been re-processed. (1)
• ETL Audit fields: The 4 ETL audit fields (already defined).
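A minimal Python sketch of how one exception row could be assembled and logged using the attributes above; the list used here is a stand-in for the ETL exception table, and the values echo the sample values shown.

```python
from datetime import datetime
from itertools import count

_exception_ids = count(1)
exception_table = []   # stand-in for the ETL exception table

def log_exception(batch_id, source_name, mlac_system_id, exception_code, exception_type,
                  record_field, field_data, session_name, data_stream, comments):
    """Capture one exception record with the attributes described above."""
    row = {
        "EXCEPTION_ID": next(_exception_ids),
        "BATCH_ID": batch_id,
        "SOURCE_NAME": source_name,
        "MLAC_SYSTEM_ID": mlac_system_id,
        "EXCEPTION_CODE": exception_code,        # e.g. 2010 - input value is not unique
        "EXCEPTION_TYPE_ID": exception_type,     # e.g. 'C' critical, 'W' warning
        "RECORD_FIELD": record_field,
        "FIELD_DATA": field_data,
        "SESSION_NAME": session_name,
        "DATA_STREAM": data_stream,              # the raw source record
        "EXCEPTION_RECORD_STATUS": "New",
        "REPROCESS_INDICATOR": "Y",
        "REPROCESS_COUNT": 0,
        "CREATE_DATE": datetime.now(),
        "EXCEPTION_COMMENTS": comments,
    }
    exception_table.append(row)
    return row

log_exception(100, "iFast-MMF", 23, 2010, "C", "Account_ID", "22",
              "s_iFastMMF_STG", "1 | 100 | 23 | ...",
              "Value already present in target table 'iFastMMF_Account_STG'")
```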

Page 39: Solution Blueprint - Customer 360

38

Conceptual Data Model

[Diagram: conceptual data model for ABC and exception handling. A Batch Group entity (Batch Group ID, Batch Grp Job Name, ...) has a one-to-many relationship with Batch Segment (Batch Segment ID, Batch Seg Job Name, Batch Grp ID, ...). An Exception Log entity (Exception ID, Batch ID, Exception Src Field, Exception Code, ...) references Exception Codes (code, description) and Exception Type (exception class ID, class description).]

Page 40: Solution Blueprint - Customer 360

39

Exception Handling / ABC - Data Flow

[Diagram: exception handling / ABC data flow. Input files from 14 sources are orchestrated by the CA scheduler (restartability / dependency) (1) and pass through the file validation process (duplicate filename, header anomalies, trailer anomalies, missing header). The ETL step (2) performs a 1-to-1 mapping into the Staging table(s) and transformations into the Pre-DQ table(s), raising exceptions such as a non-required missing field, an invalid SIN, or a missing key. The Data Quality step then cleanses data into the DQ table(s), raising exceptions such as a missing phone number, a missing lookup value, or a match field error. Exceptions are classified as Informational, Warning, or Fatal / Critical and written to the Audit & Error table(s) (3); errors are re-processed through auto-retry or manual error correction (4), with Data Governance oversight.]

Page 41: Solution Blueprint - Customer 360

40

Exception Handling / ABC - Data Flow Steps

Step 1 - Load source data: The file validation processes will be applied to each of the 13 admin system files.

Step 2 - Determine whether any exceptions were noted within the source feeds: 2a. If no exceptions were encountered, load the records into the STG tables. 2b. If exceptions were encountered, fail the file and abort the process, recording the issue within the Error table(s). 2c. Log the process details into the Audit table(s).

Step 3 - While pushing STG data to DQ through ETL, verify whether an exception occurred: 3a. If no fatal errors were encountered, process the records, along with any 'Informational' or 'Warning' error types, through the ETL-to-DQ process. 3b. If fatal errors were encountered, fail the record and log the details within the Error table(s). 3c. Log the process details into the Audit table(s).

Step 4 - While performing the DQ process, check whether any exceptions occurred: 4a. If no fatal errors were encountered, load the records, along with any 'Informational' or 'Warning' types, into the final DQ tables. 4b. If fatal errors were encountered, fail the record and log the details within the Error table(s). 4c. Log the process details into the Audit table(s).

Page 42: Solution Blueprint - Customer 360

41

ETL Parameters & Variables

Parameters
Parameters represent a constant value and retain the same value throughout a process run. Parameters can be used to change values such as database connections or file names from job to job. Parameters provide:
1. Portability across environments (Dev, SIT, UAT, Prod).
2. Simplification and automation of code promotion.
3. Removal of the manual step of updating the database connection(s) in a session.

Variables
Variables represent a value that can change during run time. In addition to simplifying development, some of the value they add includes:
1. Defining general properties for Integration Services, such as email addresses, log file counts, etc.
2. Evaluating task conditions to record information for downstream job execution.

Specific parameters and variables will be defined during the Gate IV / Design phase.
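As a simplified illustration of the parameter concept, the sketch below externalizes environment-specific values (database connection, source directory, notification address) into a small config file so the same job definition can be promoted across Dev, SIT, UAT and Prod. The file name, section names and keys are hypothetical; the actual implementation will use Informatica PowerCenter parameter files and workflow variables.

```python
import configparser

# etl_params.ini might contain one section per environment, for example:
# [SIT]
# db_connection = SIT_DB2_CUST360
# source_dir    = /data/sit/inbound
# notify_email  = etl-support@example.com

def load_params(env: str, path: str = "etl_params.ini") -> dict:
    """Return the parameter set for one environment (Dev, SIT, UAT or Prod)."""
    cfg = configparser.ConfigParser()
    cfg.read(path)
    return dict(cfg[env])

params = load_params("SIT")
print(params["db_connection"], params["source_dir"])
```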

Page 43: Solution Blueprint - Customer 360

42

Scheduler

A one-to-one design will be followed, such that there is one CA Scheduler job for each unique batch session.

All ETL batch jobs will be developed so that they are fully re-startable. The approach to restarting and recovering a given job will be defined depending on the volume of data and other requirements identified during Gate IV. The following will be considered while defining the restart and recovery methodology (a small sketch follows below):

1. Job dependencies.
2. Data duplication.
3. Guaranteed delivery of data.
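The sketch below shows one way the restart requirement could be honoured: a segment's ABC status is checked before it runs so that completed work is skipped on restart, and the target is cleared for the batch before reload so a re-run cannot duplicate data. This is an illustrative Python sketch only; the actual mechanism will be defined during Gate IV using the CA scheduler and the ETL tool's restart features.

```python
COMPLETED = 1   # ABC segment status code for a finished job

def run_segment(segment_name, abc_status, clear_target, load_target):
    """Run one batch segment in a restart-safe (idempotent) way."""
    if abc_status.get(segment_name) == COMPLETED:
        return "skipped"                 # already done in a previous run of this batch

    clear_target(segment_name)           # remove this batch's rows so a re-run cannot duplicate data
    load_target(segment_name)            # only mark complete after the full load succeeds
    abc_status[segment_name] = COMPLETED
    return "loaded"

# Example restart: the first segment completed earlier, so only the second is re-run.
status = {"s_iFastMMF_Account_STG": COMPLETED}
for seg in ("s_iFastMMF_Account_STG", "s_iFastMMF_Acrl_STG"):
    print(seg, run_segment(seg, status, clear_target=lambda s: None, load_target=lambda s: None))
```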

Page 44: Solution Blueprint - Customer 360

43

Source File Metadata (1 of 4)

To define the file processing strategy and CDC strategy for Customer 360, a detailed understanding of each file feed is needed. Additional information will likely be needed before Gate IV (before kicking off Design). The growth rate is expected to be 150% for all sources during the first year and 10%-50% (an average of 25%) in subsequent years; the projected volumes below apply that 150% + 25% growth to the average file volume.

iFAST-Seg Fund (Src Sys ID 18) - vendor IFDS (Timeshare); flat files (fixed position); daily (M-F); delta feeds; files arrive between 11:41 PM (earliest) and 04:55 AM (latest):
• Account Relationship / MMIF_ACRL.DT (record types: ask developer): avg. volume 4,725 over 7 days; projected volume 14,766.
• Account / MMIF_ACCT.DT: avg. volume 5,879 over 8 days; projected volume 18,372.
• Holdings (may be out of scope) / MMIF_ACCS.DT: avg. volume 4,365 over 7 days; projected volume 13,641.

iFAST-MMF (Src Sys ID 36) - vendor IFDS (Timeshare); flat files (fixed position); daily (M-F); delta feeds; files arrive between 10:59 PM (earliest) and 05:25 AM (latest):
• Account Relationship / MMMF_ACRL.DT (record types: ask developer): avg. volume 324 over 6 days; projected volume 1,013.
• Account / MMMF_ACCT.DT: avg. volume 7,365 over 7 days; projected volume 23,016.
• Holdings (may be out of scope) / MMMF_ACCS.DT: avg. volume 3,519 over 6 days; projected volume 10,997.

Page 45: Solution Blueprint - Customer 360

44

Source File Metadata (2 of 4)

Dataphile-IIROC (Src Sys ID 28) - DP_7585_DMO.DT; record types CIFDTL, PLNDTL, BNDHLD, EQTHLD, FNDHLD, TRMHLD (may be out of scope); vendor Broadridge; flat file (fixed position); avg. volume 1,444,637 over 5 days; projected volume 4,514,491; daily (M-F); full feed; arrival between 03:01 AM and 07:41 AM.

Dataphile-MFDA (Src Sys ID 30) - DP_7584_DMO.DT; record types CIFDTL, PLNDTL, BNDHLD, EQTHLD, FNDHLD, TRMHLD (may be out of scope); flat file (fixed position); avg. volume 1,296,394 over 4 days; projected volume 4,051,231; daily (M-F); full feed.

Dataphile-Seg (Src Sys ID 35) - DP_3495_DMO.DT; record types CIFDTL, PLNDTL, FNDHLD (may be out of scope); flat file (fixed position); avg. volume 35,149 over 2 days; projected volume 109,841; daily (M-F); full feed.

PS (Src Sys ID 16) - Manulife mainframe flat files (fixed position); daily (M-F); delta feeds; arrival between 00:44 AM and 05:08 AM:
• Client / pwb.pwbdpccl60.client: record types Customer, Customer Address; avg. volume 515,409 over 231 days; projected volume 1,610,653.
• Coverage (may be out of scope) / pwb.pwbdpcco.dicco60.coverage: record type Coverage Customer.
• Policy / pwb.pwbdpcpo.dipo60.policy: record types Policy, Policy Customer.

CAPSIL 4.2 (Src Sys ID 8) - Manulife mainframe flat file (fixed position); avg. volume 253,568 over 251 days; projected volume 792,400; daily; delta feed; arrival earliest 10:16 AM, latest 12:42 AM.

VAPS (Src Sys ID 37) - VAPS_DATA.txt; record types Account, Client; Manulife mainframe flat file; avg. volume 392,263 over 2 days; projected volume 1,225,822; daily; full feed; arrival earliest 09:18 PM, latest 02:47 AM.

Page 46: Solution Blueprint - Customer 360

45

Source File Metadata (3 of 4)

CATS (Src Sys ID 15) - Manulife mainframe flat files (fixed position); daily (M-F); delta feeds; arrival earliest 00:02 AM, latest 11:23 PM:
• Client: record types Customer, Customer Address; avg. volume 1,989,064 over 252 days; projected volume 6,215,825.
• Coverage (may be out of scope): record type Coverage Customer.
• Policy: record types Policy, Policy Customer.

CLAS (Src Sys ID 14) - Manulife mainframe flat files (fixed position); daily (M-F); delta feeds; arrival earliest 00:44 AM, latest 06:05 AM:
• Client: record types Customer, Customer Address; avg. volume 1,068,435 over 252 days; projected volume 3,338,860.
• Coverage (may be out of scope): record type Coverage Customer.
• Policy: record types Policy, Policy Customer.

PPlus-GIC (Src Sys ID 20) - PPLUSDATA.txt; record types Plan, Customer, Investments (may be out of scope); vendor SIT; flat file (fixed position); daily (M-Su); full feed; arrival earliest 02:51 AM, latest 10:09 PM.

PPlus-Bank - DMO.DT; record types Plan, Customer, Loan, Retail, Term (may be out of scope); vendor SIT; flat file (fixed position); daily (M-F); full feed.

Page 47: Solution Blueprint - Customer 360

46

Source File Metadata (4 of 4)

Sources /

Src_Sys_ID

Files /

Physical Filename

File – Data Streams /

Record Types

Vendor File Type Avg. File Volume

(# of Days)

Expected

Growth Rate

(150% + 25%)

Freq. Feed

Type

File Arrival

Time

MPW Customers,

Plan

NA Wealth

Manager

Oracle DB 250 customers

1000 plans

20% / yr

HR Hr.txt SQL Server DB Sometimes 0 and

sometimes few

records

150% in 1st

year due to

expected

acquisition

Daily

(M-F)

Delta Estimated:

12:00 AM

Page 48: Solution Blueprint - Customer 360

47

File Processing Strategy

The business requirement calls for processing files as they arrive, to keep the process as close to real time as possible. To define the file processing strategy for Customer 360, based on the existing source file feed information, the following questions about the existing process were considered.

1. Are there dependencies between files amongst source system feeds (for example, between files from the iFast feeds and the PPlus feeds)? Dependencies certainly exist between files amongst source system feeds (e.g. between iFast files and PPlus files); other dependencies are yet to be identified. Whether the files needed for Customer 360 amongst the source system feeds have dependencies is yet to be analyzed.

2. Are there dependencies amongst files within the same source system feed? There might be dependencies amongst files from the same source feed. Whether those dependencies exist amongst files within Customer 360's scope is yet to be identified.

3. If a file within a source system does not arrive on time, how does DMO currently handle this? Currently, if a file does not arrive, the file watch fails that batch and all other dependent batches. For example, if the iFast-MMF Account file does not arrive, the Account Relationship and other files for iFast-MMF will not be processed. The operations team works with the required teams to resolve the issue as soon as possible; in the meantime, the batch status is manually updated to 'Complete'.

4. If a file is not received, does the source send two files the next day? Sources have been asked to send only one file. Hence, if iFast-MMF was not processed, the next day's files will include the records from the day prior.

5. Does the next day's file include records from the previous day? Yes.

6. Does the source system send all of its feeds together? Yes.

Page 49: Solution Blueprint - Customer 360

48

File Processing Strategy – Data Flow

Based on the current understanding of the scenarios described on the previous slide and the existing file delivery agreements with the source systems, our recommendation is to continue the existing file processing strategy. As we look further into file dependencies and analyze file arrival times during Gate IV, this recommendation may be revised.

[Diagram: source systems deliver files to the FTP server as today (existing process); a copy of each file is taken for Customer 360 (new process), leaving the DMO copies untouched. The Customer 360 process watches for all files from each source system to arrive (1); if any file is missing, the entire batch fails and the Audit & Error table(s) are logged (2a-2c); otherwise the batch kicks off, critical file validation errors fail the batch, and clean files proceed to the ETL process (3a-3c).]

Page 50: Solution Blueprint - Customer 360

49

File Processing Strategy – Data Flow Description

Step 1 - Did the files arrive? The file watch process evaluates whether all required files for each source have been received. Since the source systems do not have dependencies upon each other, once all files for a source system are present, the ETL process for that source system will start.

Step 2 - Did the files arrive, and are they valid? 2a. If any file within a source system is not received, the entire batch for that source system should fail. 2b. If all required files for the source system have been received, the batch should kick off. 2c. Log the Audit & Error table(s).

Step 3 - Are there any critical file validation errors? 3a. If any critical error is encountered within the received source files, fail the batch and record it in the Audit & Error table(s). 3b. If no critical errors are encountered in the received source files, proceed to the ETL process. 3c. Log the Audit & Error table(s).
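A minimal Python sketch of the file-watch decision in steps 1-2, assuming a simple inbound directory and hypothetical required-file lists per source system; the actual orchestration will be handled by the CA scheduler and the ETL tool.

```python
from pathlib import Path

# Hypothetical required files per source system; all must arrive before the batch starts.
REQUIRED_FILES = {
    "iFAST-MMF": ["MMMF_ACCT.DT", "MMMF_ACRL.DT"],
    "PS": ["pwb.pwbdpccl60.client"],
}

def batch_can_start(source: str, inbound_dir: str = "/data/inbound") -> bool:
    """Steps 1-2: start the source's batch only when every expected file is present."""
    missing = [name for name in REQUIRED_FILES[source]
               if not (Path(inbound_dir) / name).exists()]
    if missing:
        # Step 2a: fail the batch for this source and log to the Audit & Error table(s).
        print(f"Fail batch for {source}: missing files {missing}")
        return False
    return True

for source in REQUIRED_FILES:
    if batch_can_start(source):
        print(f"Kick off ETL batch for {source}")   # step 2b
```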

Page 51: Solution Blueprint - Customer 360

50

ETL Development Cycle – Single Track

The single-track cycle covers: creating the workspace and project; laying out the job by arranging steps, data sources, and targets; validating the job; generating the job package; exporting the job; finalizing the project in the repository; deploying the job artifacts; validating the job in the target environment; and deploying to the next environment.

1) Developers
• Developers create connections to the repositories and folders needed for development.
• The developer lays out the mapping (data flow) by arranging data sources, transformations, and targets, then defines the links and configures the mapping.
• The developer tests the job until it is ready for promotion.
• The job code is then exported in an XML format, ready to be imported into another environment's repository.

2) Manulife Tech
• The client team verifies and approves the job.
• The job is then moved into the test environment.
• These changes can be deployed by importing the XML into the repository / environment.

3) Functional Team
• The functional test team, aided by the job owners, validates the results or recommends changes.
• Changes may include modifications to joins / transformation rules, or new transformations and error conditions.
• If there are unexpected results, the process restarts in the developer's workspace.

4) Manulife Tech
• Once the functional test team validates the results, the client teams promote the necessary artifacts forward.
• The promotion process iterates between promote and test cycles until the artifacts reach production.

Page 52: Solution Blueprint - Customer 360

51

Development Standards & Guidelines

Development standards will be defined to capture ETL development guidelines for Customer 360 that will be applicable to the design and development efforts of the project. Accenture will adopt Manulife's guidelines and standards, which may be enriched with Accenture methodologies and leading practices. These standards will encompass:

1. Naming conventions for ETL jobs.
2. Naming conventions for objects used within each ETL job.
3. Project and project object directory structures.
4. Preferred and recommended development practices.
5. Guidelines around the preferred use of re-usable components to streamline development efforts.

Page 53: Solution Blueprint - Customer 360

52

Summary & Next Steps

Page 54: Solution Blueprint - Customer 360

53

Summary and Next Steps

Summary of Plan & Analyze Outcomes
• Completed the data flow and solution architecture for Customer 360.
• Gathered and documented high-level business requirements and non-functional requirements.
• Developed the conceptual data model for Customer 360.
• Drafted the initial MDM component architecture and customer matching rules.
• Produced directional data quality profiling results on masked DMO data.
• Defined the structure for the data management and governance organization structures.

Next Steps
• Create the detailed design inventory and data mapping for the design phase.
• Run data quality profiling reports on unmasked source system data.
• Develop the component dependency and design order.
• Support the design and build of the technical architecture.
• Start identifying the gaps in governance processes and defining the formal roles and processes for data governance.