Faster Sorting, Manipulation, Reporting & Test Data - Speed High-Volume File Processing In/Outside Natural

www.cosort.com  Copyright 2006, IRI, Inc.




Page 2

Who We Are: Innovative Routines International (IRI), Inc.

ISV headquartered in Melbourne, FL; founded 1978 in Manhasset, NY

World’s first commercial sorts for CP/M, DOS, UNIX, Windows, and Linux

Experts in sorting and data manipulation

Recommended by all UNIX hardware vendors

Embedded by leading ISVs such as Cincom, Clerity, EDIWatch, Experian, Fiserv, Kalido, Mereo, Sabre, SPSS, and ViPS

30+ international support offices

www.cosort.com

1-800-333-SORT

Page 3

What We Sell

CoSORT: High-Performance Sorting, Reporting, ETL

RowGen: Safe Test Data in Custom File Formats

FAst extraCT (FACT): High-Performance Oracle Unload

netCONVERT: Legacy File Migration using Copybook Layouts

Logon Security: Granular UNIX Account Access Control and Audit

x-PRESS: Comprehensive Cross-Platform Compression Suite

Permitas: Software Licensing Libraries and Management

CoSORT Platforms

All 32- and 64-bit UNIX (AIX, HP-UX, Solaris, Tru64, IRIX, MP-RAS, DG/UX, SINIX, ptx, etc.)

All 32- and 64-bit Linux (RHEL, SLES, Debian, Fedora, Mandrake, Gentoo, Turbo, WOW, AsianUX, Ubuntu, etc.)

Windows XP, NT, 2K/3, DC

IBM i, p, x & z Series

Page 4

When You’d Want:

CoSORT

Fast, Large File Transformation: Select, Sort, Join, Convert, Aggregate, Reformat

Legacy Sort Migration: From MVS/VSE JCL, SyncSort UNIX, Natural Sort, et al.

Data and File Format Conversion: e.g., EBCDIC to ASCII, MF-ISAM to CSV

Custom Reporting & Hand-off: Multi-target/format, segmented detail/summary BI

DB Loader Acceleration: Pre-sort flat files on the primary index key

ETL Tool & Application Acceleration: Sort & metadata hooks for DataStage, Informatica

RowGen

Test Data & Safe File Synthesis

FAst extraCT (FACT)

Oracle Unload, Reorg & ETL

Page 6

CoSORT Functionality

High-Performance Record Processing
– Sort, Copy, Merge, Join, Check
– Input/Output Conditional Filter/Select, De-Dupe
– Aggregate (Sum, Min, Max, Count, Average), Sequence
– Cross-Calculate (Expression Logic)
– Segment, Re-map, Re-format, Report (Custom Layouts)

File Integration & Transformation
– Sequential files (Line, Record, Variable)
– Unisys VB & Blocked format files
– MF-ISAM & ACUCOBOL Vision (Index) files
– Named & Un-named Pipes
– Records-in-memory
– Custom input, compare & output procedures (User Exits)
– Coming soon: Huge LDIF & Flat XML Sources

Page 7

CoSORT Functionality

Data Type Collation & Conversion
ASCII, Binary integer, Bit, Character, Currency, Date/timestamps, Double, EBCDIC, Edited numeric, Embedded sign, Float, IP Address, Packed decimal, Unicode, Unsigned decimal, Whole number

Miscellaneous
– Replacement and Conversion of 3rd-Party Sorts
– LOCALE (operating system defined) collation
– Thread-safe APIs
– Granular resource tuning and monitoring
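Two of the data types listed above, EBCDIC text and packed decimal, are the usual stumbling blocks when mainframe files move to UNIX. The sketch below is a generic Python illustration, not CoSORT's implementation. It assumes EBCDIC code page 037 (Python's `cp037` codec; other EBCDIC code pages map some characters differently) and the common packed-decimal sign convention in which a low nibble of 0xD or 0xB marks a negative value.

```python
from decimal import Decimal


def ebcdic_to_ascii(raw: bytes) -> str:
    """Decode EBCDIC text, assuming code page 037 (US/Canada)."""
    return raw.decode("cp037")


def unpack_decimal(raw: bytes, scale: int = 0) -> Decimal:
    """Decode an IBM packed-decimal (COMP-3) field.

    Each byte holds two 4-bit digits, except the last byte, whose low
    nibble is the sign (0xD or 0xB = negative, otherwise positive).
    `scale` is the number of implied decimal places.
    """
    if not raw:
        raise ValueError("empty packed-decimal field")
    digits = []
    for byte in raw[:-1]:
        digits.extend((byte >> 4, byte & 0x0F))
    digits.append(raw[-1] >> 4)          # last byte: one digit + sign nibble
    if any(d > 9 for d in digits):
        raise ValueError("invalid digit nibble in packed field")
    value = int("".join(map(str, digits)))
    sign = raw[-1] & 0x0F
    if sign in (0x0B, 0x0D):
        value = -value
    return Decimal(value).scaleb(-scale)  # apply implied decimal places


print(ebcdic_to_ascii(bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])))  # HELLO
print(unpack_decimal(bytes([0x12, 0x34, 0x5C]), scale=2))      # 123.45
```

Using `Decimal` rather than `float` keeps the implied decimal places exact, which matters for currency fields.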

Page 8

CoSORT User Interfaces

Sort Replacements
– Seamless drop-in services for many third-party sorts

Sort-I (Sort Interactive)
– Command-line prompt/batch program for novices

SortCL (Sort Control Language)
– Mainframe-familiar DDL/DML for JCL sort migration, data warehouse integration/staging (ETL), and reporting
– CLI, API, and Java GUI for cross-platform design/launch

SortCL Conversion Tools
– For third-party metadata and legacy sort parms

Application Programming Interfaces (APIs)
– 3 callable libraries for third-party software integration

Page 9

Leveraging What’s There

Sort Interface Support for:
– Amdocs Ensemble
– Ascential (IBM WebSphere) DataStage
– Cincom Supra SQL
– IBM DB2 UDB Loader
– Informatica PowerMart/PowerCenter
– MF COBOL (Workbench, Server Express/Net Express)
– SAS System
– Software AG Natural
– Sun Mainframe Rehosting MTP/MBM
– SyncSort UNIX (via script conversion)
– UNIX (/bin/sort)

Metadata Re-Use Support for:
– MVS and VSE JCL sort parms
– COBOL Copybooks
– Common & Extended Web Log formats
– Microsoft CSV files
– ETL, BI, XML and RDB file formats via MIMB

“The maturing IT industry can no longer propagate the notion of scrapping previous investments to adopt new technologies. Billions of dollars have already been invested in hardware, operating systems and applications. Our solutions integrate with what is already there, and thus, can deliver exceptional ROI.”

Norman Praed, CEO, Progeni Corporation

Page 10

CoSORT’s Natural SORT Replacement

1) Copy $COSORT_HOME/etc/Makefile.nat2cs into your ~sag/nat/vxxx/bin/build directory.

2) Install the replacement with:

   cd ~sag/nat/vxxx/bin/build
   mv Makefile Makefile.orig
   cp Makefile.nat2cs Makefile

3) Uncomment the Makefile LIB_COSORT entry for your O/S.

4) Link with:

   make natural cosort=yes

5) Run with (csh syntax; the braces stop csh from reading ":$" as a variable modifier):

   setenv PATH ${PATH}:${COSORT_HOME}/bin
   natural [...]

6) Enable debugging by setting the environment variable NAT2SCL_DEBUG=1 (in csh: setenv NAT2SCL_DEBUG 1).

Page 11

Coroutine SORT Architecture + Parallel CPU Exploitation

19 MB in 2 seconds on P200/2 w/ NT, 2 keys

1.0 GB in 12 seconds on IBM p690/4

1.8 GB in 67 seconds on Compaq GS140/8

2.4 GB in 39 seconds on SunFire 15K/6

5.2 GB in 20 minutes on Sun UE3000, 23 keys

272 GB in <2 hours on IBM NUMA-Q 2000/4, setting a 2000 TPC-H DSS benchmark record
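The runs above mix different hardware, key counts, and years, so they are not directly comparable, but the implied average throughput is simple arithmetic. A quick sketch, assuming 1 GB = 1,000 MB and treating "<2 hours" as exactly 2 hours (which makes that row a lower bound):

```python
# Implied throughput for the benchmark runs listed above.
# Assumptions: 1 GB = 1,000 MB; "<2 hours" is taken as exactly 2 hours,
# so the last figure is a lower bound.
RUNS = [
    ("P200/2, NT, 2 keys", 19, 2),
    ("IBM p690/4", 1_000, 12),
    ("Compaq GS140/8", 1_800, 67),
    ("SunFire 15K/6", 2_400, 39),
    ("Sun UE3000, 23 keys", 5_200, 20 * 60),
    ("IBM NUMA-Q 2000/4", 272_000, 2 * 60 * 60),
]


def mb_per_second(megabytes: float, seconds: float) -> float:
    """Average throughput in MB/s."""
    return megabytes / seconds


for label, mb, secs in RUNS:
    print(f"{label:<22}{mb_per_second(mb, secs):8.1f} MB/s")
```

Under these assumptions the p690 run works out to about 83 MB/s and the 272 GB run to at least about 38 MB/s; the 23-key UE3000 run (about 4.3 MB/s) shows how much key count and hardware generation change the picture.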

Page 12

Sort Control Language (SortCL)

[Figure: TDWI’s Customer Intelligence Lifecycle]

CoSORT’s SortCL is used for very fast data integration and staging: extract-transform-load (ETL) operations on multiple, large external (flat file) data sources.

Page 13

SortCL: Single-Pass Data Manipulations

… through many, large, differently formatted inputs:

Sort/Merge: on any number of keys in any position

Join: matching 1-1, many-1, inner/outer, left/right

Select: via record filters or conditional include/omit

Convert: translate input field data types to new types

Aggregate: min, max, average, sum, count (sub and grand)

Calculate: across rows to perform math (+ scientific functions)

Re-map: change field positions, sizes, and values

Report: to highly formatted, multi-level output targets

User Exits: for custom input, compare, and output criteria
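As a rough illustration of chaining several of these manipulations (select, a two-key sort, re-mapped report lines, and sub/grand totals), here is an in-memory Python sketch. It is not SortCL syntax and does not model how an external sort handles files larger than memory; the `Sale` record and its fields are hypothetical.

```python
from itertools import groupby
from typing import Iterable, List, NamedTuple


class Sale(NamedTuple):
    region: str
    product: str
    amount: int


def report(rows: Iterable[Sale], min_amount: int = 0) -> List[str]:
    """Select, sort on two keys, then emit detail lines, subtotals, and a grand total."""
    # Select: include only rows at or above the threshold (an "include" filter).
    selected = (r for r in rows if r.amount >= min_amount)
    # Sort: region ascending, then amount descending.
    ordered = sorted(selected, key=lambda r: (r.region, -r.amount))
    lines: List[str] = []
    grand = 0
    # Aggregate: break on region for subtotals.
    for _, group in groupby(ordered, key=lambda r: r.region):
        members = list(group)
        subtotal = sum(r.amount for r in members)
        grand += subtotal
        for r in members:
            # Re-map / report: fixed-width output layout.
            lines.append(f"{r.region:<6}{r.product:<8}{r.amount:>6}")
        lines.append(f"{'':<6}{'SUB':<8}{subtotal:>6}")
    lines.append(f"{'':<6}{'TOTAL':<8}{grand:>6}")
    return lines


sales = [Sale("EAST", "bolts", 30), Sale("WEST", "nuts", 5),
         Sale("EAST", "nuts", 70), Sale("WEST", "gears", 40)]
print("\n".join(report(sales, min_amount=10)))
```

The subtotal logic relies on the data already being sorted by the break field, which is why the sort comes before the aggregation.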

Page 14

SortCL Also Re-hosts JCL Sorts.

Consider Tetrad’s OPX: Operational Processing for UNIX

Tools and architecture to manage jobs on UNIX

Separates job definitions from job processing

Easy integration with 3rd-party tools like Natural, CoSORT, MF COBOL

Developed in Perl for multi-platform use

Not dependent on Software AG products

Page 15

How OPX Rehosts via “Job Wrappers”

OPX provides job access to programs through the use of “Wrappers”

A wrapper is a script or program that facilitates the interface between the OPX job and an external program or process

A sample set of wrappers is provided with OPX (including Natural, SORT, IEFBR14, GENER, FTP)

Page 16

OPX SORT Wrapper Integration with CoSORT

1) The SORT wrapper creates pseudo-JCL for CoSORT.

2) The CoSORT mvs2scl conversion utility translates the pseudo-JCL into Sort Control Language (.scl) statements.

3) The SORT wrapper pre-processes the converted .scl statements into the final CoSORT .scl statements that perform the sort.

4) The SORT wrapper calls the CoSORT SortCL utility.

5) CoSORT SortCL performs the .scl job as requested. Done!
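The conversion step in this flow turns mainframe sort parameters into SortCL statements. A mainframe SORT FIELDS=(p,m,f,s,...) operand lists each key's starting position, length, format, and order (A or D). The translator below is hypothetical and only illustrates the idea: it is not the mvs2scl utility, its output is modeled on the SortCL version of the job shown in this deck, it maps every key to EBCDIC only because that version does, the DESCENDING keyword is an assumption, and it ignores other operand forms such as FIELDS=(p,m,s),FORMAT=f.

```python
import re
from typing import List, NamedTuple


class SortKey(NamedTuple):
    position: int    # 1-based start column
    length: int      # field length in bytes
    fmt: str         # e.g. CH (character)
    ascending: bool


def parse_sort_fields(stmt: str) -> List[SortKey]:
    """Parse 'SORT FIELDS=(p,m,f,s,...)' into a list of keys."""
    match = re.search(r"FIELDS=\(([^)]*)\)", stmt)
    if not match:
        raise ValueError("no FIELDS=(...) operand found")
    parts = [p.strip() for p in match.group(1).split(",")]
    if len(parts) % 4:
        raise ValueError("expected groups of position,length,format,order")
    keys = []
    for i in range(0, len(parts), 4):
        pos, length, fmt, order = parts[i:i + 4]
        if order not in ("A", "D"):
            raise ValueError(f"unknown sort order {order!r}")
        keys.append(SortKey(int(pos), int(length), fmt, order == "A"))
    return keys


def to_sortcl_like(keys: List[SortKey]) -> str:
    """Emit SortCL-style text (illustrative only, not guaranteed valid SortCL)."""
    lines = [f"/FIELD=(field_{n}, POSITION={k.position}, SIZE={k.length}, EBCDIC)"
             for n, k in enumerate(keys)]
    lines.append("/SORT")
    for n, k in enumerate(keys):
        direction = "ASCENDING" if k.ascending else "DESCENDING"
        lines.append(f"/KEY=(field_{n}, {direction})")
    return "\n".join(lines)


print(to_sortcl_like(parse_sort_fields("SORT FIELDS=(40,3,CH,A,45,2,CH,A)")))
```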

Page 17

OPX SORT Wrapper Example

//SUMS     JOB
//         EXEC PGM=SORT
//STEPLIB  DD DSN=SORT.RESI.DENCE,DISP=SHR
//SYSOUT   DD SYSOUT=A
//SORTIN   DD DSN=chiefs30.votes,
//            UNIT=2400-3,VOL=SER=887766,
//            DISP=(OLD,KEEP)
//SORTOUT  DD DSN=termsums,
//            UNIT=2400-3,VOL=SER=554433,
//            DISP=(NEW,KEEP)
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,20)
//SYSIN    DD *
  SORT FIELDS=(40,3,CH,A,45,2,CH,A)
  SUM FIELDS=(23,3,CH)
/*

Can we run this on Unix?

...with OPX and CoSORT - Sure!

Page 18

OPX SORT Wrapper Example

What does the CoSORT SortCL Version Look Like?

Same job, easier to expand …

/INFILES=chiefs30.votes
/FIELD=(field_0, POSITION=40, SIZE=3, EBCDIC)
/FIELD=(field_1, POSITION=45, SIZE=2, EBCDIC)
/FIELD=(field_2, POSITION=23, SIZE=3, EBCDIC)
/CONDITION=(cond_0, TEST=(field_0 OR field_1))

/SORT
/KEY=(field_0, ASCENDING)
/KEY=(field_1, ASCENDING)

/OUTFILE=termsums
/FIELD=(field_0, POSITION=40, SIZE=3, EBCDIC)
/FIELD=(field_1, POSITION=45, SIZE=2, EBCDIC)
/FIELD=(field_2_sum, POSITION=23, SIZE=3, EBCDIC)
/SUM field_2_sum FROM field_2 BREAK cond_0
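Both versions of the job sort on the fields at position 40 (3 bytes) and position 45 (2 bytes) and sum the 3-byte field at position 23, so records with equal keys collapse into one output record carrying the total. A minimal Python sketch of that behavior, assuming the records are already decoded to text, the summed field holds unsigned digits, and the first record of each key group is the one kept (real sort products define their own rules for which record survives and for overflow):

```python
from itertools import groupby
from typing import List


def field(rec: str, pos: int, size: int) -> str:
    """Return the field at 1-based POSITION with the given SIZE."""
    return rec[pos - 1:pos - 1 + size]


def sort_and_sum(records: List[str]) -> List[str]:
    """Sort on (40,3) and (45,2); collapse equal keys, summing field (23,3)."""
    def key(rec: str):
        return (field(rec, 40, 3), field(rec, 45, 2))

    out = []
    for _, group in groupby(sorted(records, key=key), key=key):
        members = list(group)
        total = sum(int(field(r, 23, 3)) for r in members)
        if total > 999:
            # A real sort product has its own overflow handling; this sketch just stops.
            raise OverflowError("sum does not fit in a 3-byte field")
        first = members[0]                                     # keep the first record of each key
        out.append(first[:22] + f"{total:03d}" + first[25:])  # overwrite bytes 23-25
    return out
```

Python's `sorted()` is stable, so "first" here means first in input order among records with the same key.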

Page 19

Sort Control Language (SortCL) CLI/Batch Mode Example (Join)

################################################
# CoSORT/SortCL Job Spec file csumtv_join_1.spec
# gets run with $sortcl /spec=csumtv_join_1.spec
# Created 09/18/2006 15:20EST by DW_team_alpha
# Conditional indexed summary join report + calc
################################################

/STATISTICS=/warehouse/stats/csumptv_join.sta      # runtime performance log
/MEMORY-WORK="$COSORT_HOME/etc/cosortrc"            # job-specific tuning file
/SPEC=/warehouse/views/csumtv                       # metadata, condition references

/INFILE=${SQL_DATA}csumtv_temp                      # first input/join file (EV)
/ALIAS=left
/INFILE=${SQL_DATA}csumtv_ICN_temp                  # second input/join file (EV)
/ALIAS=right

/JOIN LEFT_OUTER left right WHERE left.TID==right.TID AND left.SSN==right.SSN

/OUTFILE=${SQL_DATA}csumtv.report                   # new record layout, data types

/HEADREC=">>>>> STARS Report >>>> %D, DATE, %S, USER ****"

/FIELD=(left.YDATE, SEPARATOR='~', POSITION=1, SIZE=4.2, NUMERIC)
/FIELD=(left.TID, SEPARATOR='\t', POSITION=2, SIZE=10, EBCDIC)
/DATA="~*"
/FIELD=(left.SSN, SEPARATOR=',', POSITION=3, SIZE=22, ASCII)
/FIELD=(left.sum_DAYS_SUPPLY, SEPARATOR='|', POSITION=4, SIZE=12, INTEGER)
/FIELD=(left.sum_A_AMT, SEPARATOR='~', POSITION=5, SIZE=15.2, NUMERIC)
/DATA= IF Cond1 THEN "~ 0.0" THEN IF Cond2 ELSE left.YDATE+1900

/SUM RUNNING FROM left.PERIOD WHERE COND3           # accumulating aggregate

/FIELD=(SEQUENCER+50, SEPARATOR='|', POSITION=6, SIZE=5)   # DB reindexer
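The /JOIN LEFT_OUTER statement keeps every record from the left input and attaches matching right records where both TID and SSN agree. The sketch below shows those semantics with a hash join in Python; the join strategy SortCL actually uses is not stated here, and modeling rows as dictionaries keyed by field name is an assumption made for readability.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Row = Dict[str, str]


def left_outer_join(left: List[Row], right: List[Row],
                    keys: Tuple[str, ...] = ("TID", "SSN")
                    ) -> List[Tuple[Row, Optional[Row]]]:
    """Every left row appears at least once; unmatched rows pair with None."""
    # Build phase: index the right input by its join key.
    index: Dict[Tuple[str, ...], List[Row]] = defaultdict(list)
    for r in right:
        index[tuple(r[k] for k in keys)].append(r)
    # Probe phase: look up each left row's key.
    joined: List[Tuple[Row, Optional[Row]]] = []
    for lrow in left:
        matches = index.get(tuple(lrow[k] for k in keys))
        if matches:
            joined.extend((lrow, r) for r in matches)
        else:
            joined.append((lrow, None))
    return joined
```

A left row with several matching right rows produces one output pair per match, which is how the many-to-one and one-to-one cases listed earlier both fall out of the same logic.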

Page 20

CoSORT’s Java GUI-to-SortCL (gui2scl) Client/Server Sort/ETL Application

Page 21

CoSORT’s Java GUI-to-SortCL (gui2scl) Client/Server Sort/ETL Application

Page 22

Using CoSORT SortCL with Natural for Unix

CALL shcmd sortcl /spec=file.scl

/INFILE=Natural WORK FILE

/OUTFILE=Natural WORK FILE

Benefits:
– 1-pass, multi-file integration, staging, and reporting
– No limit on the number of input/output files
– No limits on file sizes or layouts
– Same metadata as RowGen, so the same SortCL job script can be used in RowGen to generate safe test data in exactly the same file format

Page 23

CoSORT Value

Affordable ($1K - $25K)

Perpetual use (not a lease)

Volume license discounts

ISV / OEM runtime pricing

GSA schedule, state bidder

Global support (30+ offices)

Product bundling discounts:
– FAst extraCT (DB unload)
– RowGen (safe test data)
– netCONVERT (file porting)
– x-PRESS (compression)
– Logon Security (access)
– Permitas (app licensing)

Page 24

New Product!

CoSORT’s RowGen Data Synthesizer

Create Custom Files with Safe Data (Using SortCL Metadata!)

– Prototype Applications: Create the data and file formats your projects need
– Share Files with Outsourcers: Provide accurate layouts, not real data
– Specify Value Ranges: Use selection and set files, better than real data
– Simulate DB Ops: Quickly test table loading and query scenarios
– Benchmark Testing: Generate big files for hardware and software PoCs
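The idea behind safe test data is to produce files with the right layout and plausible values drawn from ranges and sets rather than from production records. A minimal Python sketch of that idea, not RowGen's syntax or algorithm: the 18-byte layout (6-character region, 5-digit sequence number, 7-digit amount) is hypothetical, and a fixed seed makes the output repeatable for regression tests.

```python
import random
from typing import List, Sequence


def synth_records(n: int, regions: Sequence[str], lo: int, hi: int,
                  seed: int = 42) -> List[str]:
    """Generate n fixed-layout test records from a value set and a range.

    Layout (hypothetical): region (6 chars, left-justified),
    sequence number (5 digits), amount (7 digits, zero-padded).
    """
    if not regions:
        raise ValueError("regions must not be empty")
    if not 0 <= lo <= hi <= 9_999_999:
        raise ValueError("amount range must fit in 7 digits")
    if not 0 <= n <= 99_999:
        raise ValueError("sequence number must fit in 5 digits")
    rng = random.Random(seed)          # seeded: same inputs -> same file
    records = []
    for seq in range(1, n + 1):
        region = rng.choice(regions)   # pick from a "set" of allowed values
        amount = rng.randint(lo, hi)   # pick from a numeric range (inclusive)
        records.append(f"{region:<6.6}{seq:05d}{amount:07d}")
    return records


for rec in synth_records(3, ["EAST", "WEST", "NORTH"], 100, 99_999):
    print(rec)
```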

Page 25

Using Oracle?

Use FACT to speed extracts and write metadata for SortCL and SQL*Loader.

Single-pass entire E-T-L operations through a pipe!
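Running E-T-L through a pipe means no intermediate files: each record flows from the extract through the transform into the load format as soon as it is read. Python generators model that streaming shape. The '|'-delimited input, the particular transform, and the comma-separated output below are assumptions for illustration, not FACT or SortCL behavior.

```python
import io
from typing import Iterable, Iterator, List, TextIO


def extract(stream: Iterable[str]) -> Iterator[List[str]]:
    """Read '|'-delimited rows one at a time, e.g. from an unload on a pipe."""
    for line in stream:
        line = line.rstrip("\n")
        if line:
            yield line.split("|")


def transform(rows: Iterable[List[str]]) -> Iterator[List[str]]:
    """Example transform: keep rows whose 3rd column is numeric, upper-case the 2nd."""
    for row in rows:
        if len(row) >= 3 and row[2].isdigit():
            yield [row[0], row[1].upper(), row[2]]


def load(rows: Iterable[List[str]], out: TextIO) -> int:
    """Write comma-separated, loader-ready lines; return the number written."""
    count = 0
    for row in rows:
        out.write(",".join(row) + "\n")
        count += 1
    return count


def run_on_text(text: str) -> str:
    """Feed a string through extract -> transform -> load in one pass."""
    out = io.StringIO()
    load(transform(extract(io.StringIO(text))), out)
    return out.getvalue()


print(run_on_text("1|abc|10\n2|def|x\n3|ghi|7\n"), end="")
```

In a real pipeline, `sys.stdin` and `sys.stdout` would be passed to `extract()` and `load()` so that an unload command could be piped straight through; because every stage is a generator, only one record is held in memory at a time.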

Page 26

Now You Know.

CoSORT is the innovator in UNIX and Windows sort software, and a key infrastructure tool for staging, integrating, manipulating, and presenting large data volumes. Since 1978, IT installations have chosen CoSORT to meet their project and performance objectives in:

Natural & JCL sort migrations

VLDB reorg (unload, sort, reload)

Data warehouse staging (ETL)

Detail and summary reporting

3rd-party sort replacements

Batch jobs and new products