Complex Legacy System Archiving/Data Retention with MongoDB and XQuery

Post on 20-Aug-2015

Legacy System Archiving With XML, XQuery and MongoDB

Dave Watson

SVP, iWay Software

@watsondaveny

watson.dave@gmail.com

Agenda

XML Archive Overview and Business Use Cases

XML Archive Technical Discussion

Copyright 2009, Information Builders. Slide 2

iWay Archive

What is XML Archive

An extension of ESB for archiving data

Leverage ESB process-oriented integration and data federation capabilities

Long term data retention

Large repository, large index (Big Data)

Search and retrieve capabilities (High performance)

Business use examples

Satisfy regulatory requirements

e-Discovery (e.g. research, forensic)

Business analytics

Archive – Solving Business Needs

Examples of Business Requirements:

Regulations / Requirements | Example Data | Retention
Federal Record Retention Requirement | Patient health records | 75 years (after last episode of care)
FDA 21 CFR Part 11 | Clinical trials and FDA approval | 35 years
HIPAA (Healthcare) | Pediatric medical records | 21 years
Sarbanes-Oxley (public companies) | Audit | 7 years
SEC 17a-4 (Financial services) | Account records | 6 years
SEC 17a-4 (Financial services) | Corporate documentation | Life of the enterprise
Research | Life science | Long-term
Analytics | Financial / Legal | Long-term

Archive – Types of Data

Can handle all types of data, for example:

Electronic Documents

Word, Excel, EDI, HL7, XML, …

Applications

ERPs, CRMs, SAP, SFDC, …

Database Data

IMS, DB2, Oracle, Sybase, SQL Server, MUMPS, …

Electronic Files

VSAM, Unix, Logs, …

Email

Outlook, Lotus Notes

Others

Multimedia files, Paper, Blueprints, Forms, Claims, …

ESB adapter components can be used to connect to the different types of data.

Archive – Archiving Needs

Archive Requirements

Policy Based – Logical selection of DB records/transactions to be archived

Store very large amounts of data in archive

Keep data for very long periods of time

Become independent from Applications/DBMS/Systems – future proof

Protect authenticity of data – regulation and compliance

Access archived data when needed / as needed

Quickly search huge numbers of archived documents

Discard data after retention period – regulation and compliance

Examples of Archiving Requirements:
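The "discard data after retention period" requirement can be illustrated with a small check. This is a minimal sketch, not the product's policy engine: the dictionary keys, the function names, and the choice to count the period from a single start date are assumptions made for illustration. The year values are taken from the retention examples earlier in the deck.

```python
from datetime import date

# Retention periods in years, from the deck's examples.
# The keys are hypothetical labels, not identifiers used by the product.
RETENTION_YEARS = {
    "sarbanes_oxley_audit": 7,
    "hipaa_pediatric_records": 21,
    "fda_clinical_trials": 35,
    "patient_health_records": 75,
}

def add_years(start, years):
    """Return the date `years` after `start`."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # 29 February in a non-leap target year: fall back to 1 March.
        return start.replace(year=start.year + years, month=3, day=1)

def is_expired(record_type, retention_start, today):
    """True once the retention period for this record type has elapsed."""
    expiry = add_years(retention_start, RETENTION_YEARS[record_type])
    return today >= expiry

# An audit record whose retention clock started on 1 January 2008.
expired = is_expired("sarbanes_oxley_audit", date(2008, 1, 1), date(2015, 1, 1))
```

A real policy would also need to define when the clock starts (for example, "after last episode of care" for patient records), which this sketch leaves to the caller.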

Store 75 years' worth of patient data

Diverse Sources

XML

MUMPS

Oracle

HL7

Support archive, query and integration scenarios

XML to remain unchanged and exist outside the data store

Ability to query documents

Ability to retrieve original XML or part of XML using XQuery

Ability to integrate XML archived data in federated services with operational sources (e.g. MUMPS, HL7, Oracle)

Archive – Example Business Use Case

Highly scalable, high-performance document management database

Easily integrates into an ESB architecture

Multi-threaded parallel processing

Distributed processing

Just another data source along with, e.g., Oracle and MUMPS databases

Leverage ESB tools for process orchestration, process monitoring, data mapping/transformation, security and data aggregation capabilities.

Implementation and vendor neutral – archived data (e.g. XML) stored in the operating system's native file system

Archive – Example Business Requirements

XML Archive Technical Discussion

Overview

Load Channel

Reads XML documents and loads them into the document repository.

Query Channel

Handles query requests and responses against the document repository.

Test Channel

Simple visual interface displaying functionality and usage of the Query API.

Highly configurable ESB Java application that can be customized to specific needs.

Technology Involved

ESB -

iWay Service Manager (commercial)

IBM WebSphere ESB (commercial)

Oracle Service Bus (commercial)

WSO2 ESB (open source)

mongoDB - http://www.mongodb.org/

JSON - JavaScript Object Notation

XQuery - XML query language

mongoDB

“Humongous”

Scalable, high-performance, document-oriented database.

JSON-style documents.

Mirror capable.

Auto-Sharding (clustering), horizontal scaling, automatic failover, no single point of failure.

MapReduce support for complex processing. Work is distributed among the cluster.

GridFS support.

A distributed file system.

Commercial support from 10gen (OEM by iWay Software)

XQuery

A query and functional programming language for XML documents.

Is to XML documents what SQL is to databases.

“FLWOR” expressions.

FOR, LET, WHERE, ORDER BY, RETURN

Example:

for $x in /FEDREG/CNTNTS/AGCY
where $x/EAR = 'Agricultural'
order by $x ascending
return $x

Supports syntax for constructing new documents.
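For readers who do not have an XQuery processor at hand, the FLWOR example can be approximated in Python. This is only a sketch using the standard library's xml.etree.ElementTree, not XQuery itself. The sample document is invented for illustration (the HD element is hypothetical), and sorting by each element's concatenated text is an assumption about what "order by $x" compares.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; element names follow the slide's example path,
# except HD, which is made up so the results can be told apart.
SAMPLE = """
<FEDREG>
  <CNTNTS>
    <AGCY><EAR>Agricultural</EAR><HD>Forest Service</HD></AGCY>
    <AGCY><EAR>Commerce</EAR><HD>Census Bureau</HD></AGCY>
    <AGCY><EAR>Agricultural</EAR><HD>Animal Health</HD></AGCY>
  </CNTNTS>
</FEDREG>
"""

def flwor(root):
    # FOR: iterate over /FEDREG/CNTNTS/AGCY (root is the FEDREG element).
    candidates = root.findall("./CNTNTS/AGCY")
    # WHERE: keep elements whose EAR child equals 'Agricultural'.
    matches = [x for x in candidates if x.findtext("EAR") == "Agricultural"]
    # ORDER BY: sort ascending by the element's concatenated text content.
    matches.sort(key=lambda x: "".join(x.itertext()))
    # RETURN: hand back the matching elements.
    return matches

root = ET.fromstring(SAMPLE)
result = [x.findtext("HD") for x in flwor(root)]
```

The LET clause is not used in the slide's example, so it has no counterpart here.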

JSON – JavaScript Object Notation

The new data-interchange language of the web.

www.json.org

Base Loading Architecture

[Diagram labels: ESB; Listener; Flow; XML to JSON; Store JSON; Store XML; mongoDB; Binary Storage; GridFS]
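The "XML to JSON" step in the loading flow can be sketched as follows. The transcript does not show how the product maps XML onto JSON, so the convention used here (attributes prefixed with "@", element text under "#text", repeated tags collected into a list) is an assumption chosen for illustration, as are the function names and the sample document.

```python
import json
import xml.etree.ElementTree as ET

def element_to_dict(elem):
    """Convert one XML element into a JSON-style dict (illustrative convention)."""
    node = {}
    # Attributes become keys prefixed with "@" (an assumed convention).
    for name, value in elem.attrib.items():
        node["@" + name] = value
    # Child elements become nested dicts; repeated tags are collected into a list.
    for child in elem:
        converted = element_to_dict(child)
        if child.tag in node:
            existing = node[child.tag]
            if not isinstance(existing, list):
                node[child.tag] = [existing]
            node[child.tag].append(converted)
        else:
            node[child.tag] = converted
    # Non-blank text is kept under "#text" (also an assumed convention).
    text = (elem.text or "").strip()
    if text:
        node["#text"] = text
    return node

def xml_to_json_document(xml_string):
    """Parse an XML string and wrap the converted root under its tag name."""
    root = ET.fromstring(xml_string)
    return {root.tag: element_to_dict(root)}

# Hypothetical input document.
doc = xml_to_json_document('<a name="bob"><b>hello</b><c/></a>')
as_json = json.dumps(doc, sort_keys=True)
```

In the architecture described by the slide, the JSON form would be what gets stored and indexed for querying, while the original XML is stored unchanged alongside it.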

Base Query Architecture

[Diagram labels: Requester; ESB; HTTP Listener; Flow; Query DB; (Optional) Get XML; mongoDB; Binary Storage; GridFS]

Loading Modification: External Storage

[Diagram labels: ESB; Listener; Flow; XML to JSON; Store JSON; Store XML; mongoDB; File System]

Loading Modification: SAP Loading Architecture

[Diagram labels: SAP System; RFC Server; ESB; Flow; IDOC to XML; XML to JSON; Store JSON; Store XML; Store IDOC; mongoDB; Binary Storage; GridFS]

Loading Modification: Change Data Capture Loading Architecture

[Diagram labels: RDBMS; CDC Listener; ESB; Flow; XML to JSON; Store JSON; Store XML; mongoDB; Binary Storage; GridFS]

Loading Modification: Salesforce.com Loading Architecture

[Diagram labels: Salesforce System; SOAP Listener; ESB; Flow; XML to JSON; Store JSON; Store XML; mongoDB; Binary Storage; GridFS]

Loading Modification: FTP Loading Architecture

[Diagram labels: File System; FTP Server; ESB; Flow; XML to JSON; Store JSON; Store XML; mongoDB; Binary Storage; GridFS]

Query Modification: Web Service SOAP Query Architecture

[Diagram labels: Web Service Client; SOAP Listener; ESB; Flow; Query DB; (Optional) Get XML/IDOC; mongoDB; Binary Storage; GridFS]

The Test Client

Note: The archive is designed to be called from other flows or programs.

A simple AJAX based human interface for querying the XML Archive.

Provides examples of the HTTP query interface provided by the base XML Archive.

Installed with the base implementation of the XML Archive.

Simple Example

Loaded this simple XML Doc:

Displaying the Document

XML Link:

JSON Link:

Basic Query

Return all documents in which the name attribute of the <a> element equals “bob”.
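The query text from this slide did not survive in the transcript. As a rough illustration of the idea, the sketch below expresses the same condition as a MongoDB-style filter document using dot notation, and evaluates it with a tiny in-memory matcher so it runs without a server. The "@name" key (an "@" prefix for XML attributes), the sample documents, and the helper functions are all assumptions, not the archive's actual query API.

```python
def get_path(document, dotted_path):
    """Follow a dotted path such as "a.@name" through nested dicts."""
    current = document
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current

def matches(document, query):
    """Equality-only matcher that mimics a simple MongoDB filter document."""
    return all(get_path(document, path) == expected
               for path, expected in query.items())

# Hypothetical stored documents, using "@" for XML attributes (an assumption).
archive = [
    {"a": {"@name": "bob", "b": {"#text": "first"}}},
    {"a": {"@name": "alice", "b": {"#text": "second"}}},
    {"a": {"@name": "bob", "b": {"#text": "third"}}},
]

# "name attribute of the <a> element equal to bob" as a filter document.
query = {"a.@name": "bob"}
hits = [doc for doc in archive if matches(doc, query)]
```

With a running MongoDB server and the pymongo driver, the same filter document could be passed to a collection's find method instead of the in-memory matcher.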

Advanced Queries

Support for:

And

Or

Regular Expressions

Ranges

Query handler is a wrapper around the mongoDB query language.
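The wrapper's own syntax is not shown in the transcript, so the sketch below uses the native MongoDB filter operators that correspond to the four listed features: $and, $or, $regex, and the range operators $gte and $lt. The field names ("a.@name", "a.b.#text", "year") and the combine helper are hypothetical.

```python
import json

# Field names are hypothetical; the operators are standard MongoDB query operators.
and_query = {"$and": [{"a.@name": "bob"}, {"a.b.#text": "hello"}]}
or_query = {"$or": [{"a.@name": "bob"}, {"a.@name": "alice"}]}
regex_query = {"a.@name": {"$regex": "^bo"}}
range_query = {"year": {"$gte": 2005, "$lt": 2010}}

def combine(*filters):
    """AND several filters together into one MongoDB filter document."""
    return {"$and": list(filters)}

combined = combine(or_query, regex_query, range_query)
print(json.dumps(combined, indent=2))
```

Because filters are plain nested dictionaries, a wrapper such as the query handler can build them from request parameters before handing them to the database.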

Basic XQUERY

Return only the <b> element from the document.

Formatted Result:
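The slide's query and result images are missing from the transcript. The idea of returning only part of an archived document can be approximated in Python as shown below. This sketch uses xml.etree.ElementTree path lookup rather than XQuery, and the sample document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical archived document; the slide's original sample is not in the transcript.
archived_xml = '<a name="bob"><b>hello</b><c>other</c></a>'

root = ET.fromstring(archived_xml)
# Similar in spirit to an XQuery path such as /a/b: select only the <b> child.
b_element = root.find("b")
# Serialize just that element back to XML text.
fragment = ET.tostring(b_element, encoding="unicode")
```

Here fragment holds only the <b> element, while the stored document itself is left untouched.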
