Job Scheduling: A Strategic Pathway for Improved Data Warehouse/Business Intelligence Performance


ASCI Enterprise Job Scheduling Software White Paper

Contents

Introduction
The Real-Time Enterprise
Data Management Options
Increasing DBMS Efficiency
ActiveBatch: Unique Value for BI
Conclusion


Introduction

 

It’s hard to believe, in an era when data arrives at the end of a fiber optic cable and seconds-old information is considered dated, that the term Business Intelligence is more than 50 years old. As far back as 1958, businesspeople realized the need to create a computing infrastructure that would allow their organizations to systematically gather, access and analyze operational data in order to run better. Over the years, as the ability of companies to collect Business Intelligence (BI) data has grown exponentially, so has the need to efficiently organize and store it. This need has led to the notion of the data warehouse: a single, centralized repository of the data that describes an organization’s full gamut of activities. Bill Inmon, generally regarded as the “father of data warehousing”, says that information in a data warehouse must be four things: subject-oriented (organized by topic); time-variant (tracked over time); non-volatile (read-only); and integrated (complete and consistent).

Clearly, BI systems are only as effective as the quality and timeliness of the data supporting them. But as information continues to flood into the typical data warehouse from conventional sources as well as newer points of origin such as RFID readers and Web Services applications, increasingly sophisticated technologies and standards are needed to manage the swelling volume. “Time to insight,” a key goal of BI, is often compromised when IT systems cannot deliver adequate storage and processing capabilities.

Master Data Management (MDM), which seeks to apply an official, consistent set of identifiers and hierarchies to a company’s data sets, is another example of the coping disciplines currently gaining currency. Effective resource utilization, workload balancing, and scalability also become critical, both for storage and data processing resources.

Information in a Data Warehouse Must Be Four Things:

1. Subject-oriented: organized by topic
2. Time-variant: tracked over time
3. Non-volatile: read-only
4. Integrated: complete and consistent


The Real-Time Enterprise

Perhaps the biggest single evolutionary driver in the expansion of modern data warehouse management is the democratization of Business Intelligence. Where BI was once the purview of specially trained business analysts who used sophisticated software to slice and dice highly complex data sets, the advent of easy-to-use, multidimensional analytical applications has extended BI to the farthest corners of the enterprise.

Today, managers at all levels can instantly call up sales, manufacturing, finance, human resource and other kinds of data to pose queries and gain insight for better decision-making. Online Analytical Processing (OLAP), Google Docs, Service Oriented Architecture (SOA) implementations and other developments have given (and in some cases mandated) BI availability to nearly everyone whose job description involves some level of business management.

No longer is analysis one step removed from daily operations; in fact, Gartner predicts that “by the end of 2009, 90% of Global 2000 companies will have implemented some type of mission-critical dependency between the [data] warehouse and at least one revenue-supporting or cost-controlling operational application, up from less than 25% in 2007.” It also finds that the same percentage of Global 2000 companies already have analytics engines built into their operational applications, or plan to have them by the end of 2010 (“Operational Analytics and the Emerging Mission-Critical Data Warehouse”, May 2007).

With the shift from backroom strategic application to essential daily management tool comes the need for real-time, or near real-time, data collection. In fact, another Gartner study researched the need for low-latency data delivery (“Gartner Study on Data Integration Identifies Key Usage Trends”, February 2006). The report found “strong demand for low-latency data delivery on a global basis, with organizations in aggregate indicating that more than 60 percent of their data integration activities must happen with latency of one hour or less, and over 35 percent with latency of less than one minute. This significant shift in the past several years can be attributed to greater levels of competition, customer demand for rapid service in all industries, and the overall business climate.” Federal, state and local governments, financial services, manufacturing, retail, transportation and utilities were found to be the segments with the greatest real-time requirements.

[Figure: Increasing Demand for Real-Time Data Integration. Increasing business demands require that many industries perform with real-time data integration.]

The number of enterprises that can get by with simple batch-oriented, high-latency data refresh programs continues to drop. Those that still populate their data warehouses nightly are primarily using them for long-term strategic BI. More common today are refresh rates that are semi-daily, hourly, or semi-hourly.

It’s important to note that true real-time, or instantaneous, data availability is often more an ideal driven by perceived competitive pressures than a necessary or even desirable goal. While many enterprises report that users are demanding data refresh rates down to the millisecond, only those individuals who depend on “business-aware” applications, in fields like production management or transactional processing, are likely to need such immediate and fluid information. Most operational applications, and nearly all BI tools, can fulfill their tasks with periodic or near real-time data.

 

Data Management Options

To provide the relevant, timely and optimized data businesspeople need to do their jobs, it’s necessary to continually process and update the information stored in a data warehouse. A number of data management applications and architectures are available to fulfill these functions; most Database Management Systems (DBMS), and even operating systems like UNIX and Windows, offer some sort of scheduler to coordinate processing tasks. Yet because most of these are either attached to a single database server, focused on database maintenance only, or restricted to software from a specific vendor, they are often too narrow for broader, enterprise-wide data management requirements.

Dedicated job scheduling applications, on the other hand, have the power and capability to perform far beyond the constraints of OS- or DBMS-based scheduling tools. Job schedulers can trigger tasks based on events, rather than simply date or time, and can accommodate unpredictable or one-time occurrences. They can recover and restart automatically in case of job failure, generate execution reports on scheduled tasks, and provide audit trails for compliance purposes.
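The restart-and-audit behavior described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how ActiveBatch or any particular scheduler implements it; the function `run_with_restart`, its parameters, and the `flaky_load` job are all hypothetical names chosen for the example.

```python
import time
from datetime import datetime, timezone


def run_with_restart(job, name, max_attempts=3, delay_seconds=0.0, audit_log=None):
    """Run job(); on failure, restart it up to max_attempts times.

    Every attempt is appended to audit_log so there is a record of
    what ran, when it started, and how it ended.
    """
    audit_log = audit_log if audit_log is not None else []
    for attempt in range(1, max_attempts + 1):
        started = datetime.now(timezone.utc).isoformat()
        try:
            result = job()
        except Exception as exc:  # any failure leads to a restart
            audit_log.append({"job": name, "attempt": attempt, "started": started,
                              "status": "failed", "detail": repr(exc)})
            if attempt < max_attempts:
                time.sleep(delay_seconds)
            continue
        audit_log.append({"job": name, "attempt": attempt, "started": started,
                          "status": "succeeded", "detail": None})
        return result, audit_log
    raise RuntimeError(f"{name} failed after {max_attempts} attempts")


calls = {"n": 0}


def flaky_load():
    # Hypothetical job: fails on the first call, succeeds on the second.
    calls["n"] += 1
    if calls["n"] == 1:
        raise IOError("source file locked")
    return "loaded"


result, log = run_with_restart(flaky_load, "nightly_load")
```

After the call, `log` holds one failed and one succeeded entry for `nightly_load`, which is the kind of record an execution report or compliance audit would draw on.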

In an era when quick data loads are essential to BI performance, yet manual involvement is often necessary, job schedulers can completely automate the loading and execution process. Even more importantly, job schedulers can distribute and load-balance large numbers of jobs across multiple servers and storage devices, effectively increasing the efficiency of an enterprise’s IT infrastructure while also completing jobs faster, all at little or no additional cost.
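To make the load-balancing idea concrete, here is a minimal sketch of one common approach: a greedy "least-loaded server first" assignment. It assumes each job comes with a runtime estimate; the job names, server names, and the `balance_jobs` function are invented for illustration and do not describe any specific product's algorithm.

```python
import heapq


def balance_jobs(jobs, servers):
    """Assign each job to whichever server currently has the least work.

    jobs: mapping of job name -> estimated runtime in minutes.
    servers: list of server names.
    Returns a mapping of server name -> list of job names.
    """
    # Heap entries are (total assigned minutes, server name).
    heap = [(0, server) for server in servers]
    heapq.heapify(heap)
    plan = {server: [] for server in servers}
    # Placing the longest jobs first tends to give a more even spread.
    for job, minutes in sorted(jobs.items(), key=lambda item: -item[1]):
        load, server = heapq.heappop(heap)
        plan[server].append(job)
        heapq.heappush(heap, (load + minutes, server))
    return plan


jobs = {"load_sales": 40, "load_finance": 30, "load_hr": 20, "rebuild_cube": 10}
plan = balance_jobs(jobs, ["etl-01", "etl-02"])
```

With these sample estimates each server ends up with 50 minutes of work, so the whole batch finishes in roughly half the time a single server would need.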

When selecting a job scheduler for data warehouse management, it’s important to seek one that can separate different tasks (e.g., optimizing data for quick loads), yet also offer a framework for tying together many kinds of configurations and defining job streams as tightly as needed. It should also have the ability to pick up events from different machines in a multi-OS environment, and run flat file (two-dimensional) as well as multi-dimensional database jobs.

Job schedulers can allow data warehouse administrators to take advantage of processing capacity beyond the server or system in question, and maximize use of available resources. Best-of-breed schedulers also support heterogeneous OS environments (e.g., Linux, UNIX, Windows, z/OS and OpenVMS), thereby removing roadblocks to important applications and data.

 

 


Increasing DBMS Efficiency

Event-based scheduling, of the type found in Advanced Systems Concepts’ ActiveBatch® Job Scheduling and Workload Automation application, is perhaps the most distinctive and valuable advantage of job scheduling software. Event triggers can not only shrink the total daypart(s) devoted to processing, but also increase responsiveness. Furthermore, depending on how widely the scheduler’s agents are implemented, the processing environment can approach “cloud computing” status and even create a greener IT infrastructure.

Job streams can be assembled with triggers linked to dozens of events, from system startup and file creation, modification or deletion to runaway processes, job failures and much more. With the addition of active variables, it’s also possible to interrogate data as a means of triggering a task. Examples of active variables include Date Expression, File Contents, SQL Record Set, Web Service Request Result, WMI Query, XML Query, and File System Information.
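As a rough illustration of the two trigger styles just described (an event such as file creation, and a data interrogation such as a SQL record set), the following Python sketch checks trigger conditions and runs the attached job when a condition holds. It is a simplified polling model built on the standard library. The names `file_created`, `rows_returned` and `poll_once`, the drop folder, and the `orders` table are assumptions made for the example, and a production scheduler would typically react to operating system notifications rather than poll.

```python
import sqlite3
import tempfile
from pathlib import Path


def file_created(path):
    """Trigger condition: the expected file exists."""
    return lambda: Path(path).exists()


def rows_returned(conn, query):
    """Trigger condition: the query returns at least one row."""
    return lambda: conn.execute(query).fetchone() is not None


def poll_once(triggers):
    """Check each (name, condition, job) entry once; run jobs whose condition holds."""
    fired = []
    for name, condition, job in triggers:
        if condition():
            job()
            fired.append(name)
    return fired


# Hypothetical setup: a drop folder and an orders table.
drop_dir = Path(tempfile.mkdtemp())
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, loaded INTEGER)")

ran = []
triggers = [
    ("on_file", file_created(drop_dir / "sales.csv"),
     lambda: ran.append("load_file")),
    ("on_rows", rows_returned(conn, "SELECT 1 FROM orders WHERE loaded = 0"),
     lambda: ran.append("load_orders")),
]

first = poll_once(triggers)    # neither event has occurred yet
(drop_dir / "sales.csv").write_text("sku,qty\nA1,3\n")
conn.execute("INSERT INTO orders VALUES (1, 0)")
second = poll_once(triggers)   # both events have now occurred
```

The first poll fires nothing; once the file lands and an unloaded order row appears, the second poll runs both jobs.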

Event-driven scheduling, because it responds to the fluid and sometimes unpredictable pace of business, is ideal for near real-time data inserts for mission-critical applications. Because low-latency environments often necessitate the combination of more volatile data with stable master data repositories, event-based job schedulers can be used to establish holding areas at set times of day, integrating the two on a scheduled basis.
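The holding-area pattern can be sketched with an in-memory SQLite database: volatile rows land in a staging table as they arrive, and a separate, scheduled step folds them into the stable master table. The table names, columns, and the `stage` and `integrate` functions below are hypothetical, and the "last write wins" merge rule is an assumption chosen to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master_customer (id INTEGER PRIMARY KEY, name TEXT, balance REAL);
    CREATE TABLE staging_customer (id INTEGER, name TEXT, balance REAL);
    INSERT INTO master_customer VALUES (1, 'Acme', 100.0), (2, 'Globex', 250.0);
""")


def stage(conn, rows):
    """Land incoming, fast-changing rows in the holding area only."""
    conn.executemany("INSERT INTO staging_customer VALUES (?, ?, ?)", rows)


def integrate(conn):
    """Fold the holding area into the master table, then empty it."""
    with conn:  # one transaction: commit on success, roll back on error
        # Assumed merge rule: the most recently staged row for an id wins.
        conn.execute("""
            INSERT OR REPLACE INTO master_customer (id, name, balance)
            SELECT id, name, balance FROM staging_customer ORDER BY rowid
        """)
        conn.execute("DELETE FROM staging_customer")


# Volatile updates arrive during the day...
stage(conn, [(2, "Globex", 300.0), (3, "Initech", 75.0)])
# ...and the scheduler runs the integration step at a set time.
integrate(conn)

master = conn.execute(
    "SELECT id, name, balance FROM master_customer ORDER BY id").fetchall()
staged = conn.execute("SELECT COUNT(*) FROM staging_customer").fetchone()[0]
```

Keeping `stage` and `integrate` separate means the master table is only touched at the scheduled moment, while new data can keep arriving in between.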

Other pluses exist as well for the event-based model. While many IT departments use their DBMS to build processing jobs, it can be advantageous to use a job scheduler to tie together smaller packages of jobs, since it effectively separates the execution logic from the data transfer logic. In this way the job scheduler’s event-based scheme can manage even large job packages, reporting on success or failure and minimizing checkpointing. Coding and maintenance are also reduced.
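One way to picture this separation is a small runner that owns the execution logic (ordering, skipping, success/fail reporting) while each job is a plain function holding only the data transfer logic. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later). The `run_chain` function and the four sample jobs are illustrative assumptions, not a description of any vendor's job plan format.

```python
from graphlib import TopologicalSorter


def run_chain(jobs, depends_on):
    """Run jobs in dependency order and report success/fail per job.

    jobs: mapping of job name -> callable holding the data transfer logic.
    depends_on: mapping of job name -> set of jobs that must succeed first.
    """
    graph = {name: depends_on.get(name, set()) for name in jobs}
    report = {}
    for name in TopologicalSorter(graph).static_order():
        if any(report.get(dep) != "success" for dep in graph[name]):
            report[name] = "skipped"  # an upstream job did not succeed
            continue
        try:
            jobs[name]()
            report[name] = "success"
        except Exception:
            report[name] = "fail"
    return report


# Hypothetical small job packages; only 'transform' misbehaves.
def extract():
    pass


def transform():
    raise ValueError("bad row in source extract")


def load():
    pass


def archive():
    pass


jobs = {"extract": extract, "transform": transform, "load": load, "archive": archive}
depends_on = {"transform": {"extract"}, "load": {"transform"}, "archive": {"extract"}}
report = run_chain(jobs, depends_on)
```

Here `load` is skipped because `transform` failed, while `archive` still runs because it depends only on `extract`; none of the job functions needed to know about ordering or error handling.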

Of course, virtually all job schedulers can also accommodate date/time scheduling for routine database refreshes. The combination of the two, as found in ActiveBatch, creates maximum opportunity to furnish relevant and actionable data to decision makers across the enterprise.

 

    

 

      

[Figure: Job streams can be assembled with triggers linked to dozens of events]

ActiveBatch: Unique Value for BI

In the modern enterprise, data warehouses used for Business Intelligence must collect data from a bewildering array of sources. The need for fast loading of product transactions, for example, is critical; in some cases, order data must be integrated into production processes even before the sale is complete. ETL (Extract/Transform/Load) tools found in IBM WebSphere DataStage, Microsoft SQL Server Integration Services and the open source Apatar application can accomplish many of these tasks; however, the right job scheduler can simplify the creation of a job stream framework that tightly coordinates ETL workflows, ensuring that such processes are handled efficiently.

The unique Integrated Jobs Library in ActiveBatch can be used to quickly create ETL workflows without custom scripting. For example, it can create workflows that reach across multiple SQL Servers to chain multiple Data Transformation Services jobs. Alternatively, it can create job plans that pass information from one database to another or to various applications. It can also integrate management tasks with other scripts or applications.

ActiveBatch, because of its easy-to-use management environment, can add new data sources automatically, with little or no human decision-making. The elimination of manual intervention allows enterprises to configure and integrate data from more sources, more quickly than ever before. In addition, ActiveBatch’s ability to balance workloads across many servers has allowed users to improve service levels by completing jobs in less time, and with fewer errors.

Workload balancing, higher server utilization and scalability are of utmost importance to most enterprises. ActiveBatch’s strengths in these areas have given users better use of their IT resources; in fact, ActiveBatch has been proven to reliably run over one million jobs per day and connect to 2,000 servers, ensuring its efficiency and dependability in production environments of all sizes.

ActiveBatch also minimizes investment risk by maximizing utilization of existing computing power, reducing elapsed job completion time and allowing more business processes to be executed. ETL, for example, is particularly well-suited to distributed process scheduling on commodity (Intel, Windows or Linux) servers surrounding the central data warehouse.

[Figure: Cross-Platform ETL Architecture Chart]

For large enterprises with discrete data marts or operational data stores, it’s often necessary or desirable to allow multiple departments to create workflows, or to manipulate objects such as calendars, alerts, or schedules. ActiveBatch can also accommodate this need with its Virtual Root, a capability that gives individuals, work teams, departments and even business units log-in protected access to those objects, jobs and plans appropriate to their job descriptions. Permissions in the Virtual Root are controlled centrally and can be set by user, group or other established unit on a granular basis, allowing those closest to the BI or operational need to make necessary changes without being able to access, modify or even see objects or workflows pertaining to other parts of the enterprise.


Conclusion

As data warehouses become increasingly mission-critical in nature, and as BI extends its march across the modern enterprise, database administrators face the daunting task of making their repositories increasingly agile and responsive. At the same time, as any IT professional knows, global business imperatives demand that more be done with less.

Many organizations already use dedicated job schedulers like ActiveBatch to manage their data processing workloads. By leveraging their existing job schedulers for data warehouse management, IT organizations can improve their ability to handle more sophisticated database needs while also reducing the time devoted to scriptwriting and routine job management.

The inclusion of an existing job scheduler, or installation of a new job scheduler, can optimize data warehouse performance, shortening BI “time to insight” in an era when the need for timely, accurate and user-optimized business information has never been greater. Ultimately, an organization’s ability to compete is only as great as the information it has available to it, which is perhaps the best reason of all to put a best-of-breed job scheduler into the data warehouse management mix.

© Copyright Advanced Systems Concepts, Inc.

All rights reserved

