ASCI Enterprise Job Scheduling Software White Paper
Job Scheduling: A Strategic Pathway for Improved Data Warehouse/Business Intelligence Performance
Contents
Introduction
The Real-Time Enterprise
Data Management Options
Increasing DBMS Efficiency
ActiveBatch: Unique Value for BI
Conclusion
Introduction
It’s hard to believe, in an era when data arrives at the end of a fiber optic cable and seconds-old information is considered dated, that the term Business Intelligence is more than 50 years old. As far back as 1958, businesspeople realized the need to create a computing infrastructure that would allow their organizations to systematically gather, access and analyze operational data in order to run better.
Over the years, as the ability of companies to collect Business Intelligence (BI) data has grown exponentially, so has the need to efficiently organize and store it. This need has led to the notion of the data warehouse: a single, centralized repository of the data that describes an organization’s full gamut of activities. Bill Inmon, generally regarded as the “father of data warehousing”, says that information in a data warehouse must be four things: subject-oriented (organized by topic); time-variant (tracked over time); non-volatile (read-only); and integrated (complete and consistent).
Clearly, BI systems are only as effective as the quality and timeliness of the data supporting them. But as information continues to flood into the typical data warehouse from conventional sources as well as newer points of origin such as RFID readers and Web Services applications, increasingly sophisticated technologies and standards are needed to manage the swelling flood of information. “Time to insight,” a key goal of BI, is often compromised when IT systems cannot deliver adequate storage and processing capabilities.
Master Data Management (MDM), which seeks to apply an official, consistent set of identifiers and hierarchies to a company’s data sets, is another example of the coping disciplines currently gaining currency. Effective resource utilization, workload balancing, and scalability also become critical, both for storage and for data processing resources.
Information in a Data Warehouse Must Be Four Things:
1. Subject-oriented: organized by topic
2. Time-variant: tracked over time
3. Non-volatile: read-only
4. Integrated: complete and consistent
The Real-Time Enterprise
Perhaps the biggest single evolutionary driver in the expansion of modern data warehouse management is the democratization of Business Intelligence. Where BI was once the purview of specially trained business analysts who used sophisticated software to slice and dice highly complex data sets, the advent of easy-to-use, multidimensional analytical applications has extended BI to the farthest corners of the enterprise.
Today, managers at all levels can instantly call up sales, manufacturing, finance, human resources and other kinds of data to pose queries and gain insight for better decision-making. Online Analytical Processing (OLAP), Google Docs, Service Oriented Architecture (SOA) implementations and other developments have given (and in some cases mandated) BI availability to nearly everyone whose job description involves some level of business management.
No longer is analysis one step removed from daily operations; in fact, Gartner predicts that “by the end of 2009, 90% of Global 2000 companies will have implemented some type of mission-critical dependency between the [data] warehouse and at least one revenue-supporting or cost-controlling operational application, up from less than 25% in 2007.” It also finds that the same percentage of Global 2000 companies already have analytics engines built into their operational applications, or plan to have them by the end of 2010 (“Operational Analytics and the Emerging Mission-Critical Data Warehouse”, May 2007).
With the shift from backroom strategic application to essential daily management tool comes the need for real-time, or near real-time, data collection. In fact, another Gartner study examined the need for low-latency data delivery (“Gartner Study on Data Integration Identifies Key Usage Trends”, February 2006). The report found “strong demand for low-latency data delivery on a global basis, with organizations in aggregate indicating that more than 60 percent of their data integration activities must happen with latency of one hour or less, and over 35 percent with latency of less than one minute. This significant shift in the past several years can be attributed to greater levels of competition, customer demand for rapid service in all industries, and the overall business climate.” Federal, state and local governments, financial services, manufacturing, retail, transportation and utilities were found to be the segments with the highest need for real-time data.
Figure: Increasing Demand for Real-Time Data Integration
The number of enterprises that can get by with simple batch-oriented, high-latency data refresh programs continues to drop. Those that still populate their data warehouses nightly are primarily using them for long-term strategic BI. More common today are refresh rates of twice daily, hourly, or every half hour.
It’s important to note that true real-time, or instantaneous, data availability is often more an ideal driven by perceived competitive pressures than a necessary or even desirable goal. While many enterprises report that users are demanding data refresh rates down to the millisecond, only those individuals who depend on “business-aware” applications, in fields like production management or transactional processing, are likely to need such immediate and fluid information. Most operational applications, and nearly all BI tools, can fulfill their tasks with periodic or near real-time data.
Data Management Options
To provide the relevant, timely and optimized data businesspeople need to do their jobs, it’s necessary to continually process and update the information stored in a data warehouse. A number of data management applications and architectures are available to fulfill these functions; most Database Management Systems (DBMS), and even operating systems like UNIX and Windows, offer some sort of scheduler to coordinate processing tasks. Yet because most of these are either attached to a single database server, focused on database maintenance only, or restricted to software from a specific vendor, they are often too narrow in scope for broader, enterprise-wide data management requirements.
Dedicated job scheduling applications, on the other hand, have the necessary power and capability to perform far beyond the constraints of OS- or DBMS-based scheduling tools. Job schedulers can trigger tasks based on events, rather than simply date or time, and can accommodate unpredictable or one-time occurrences. They can recover or restart automatically in case of job failure, generate execution reports on scheduled tasks, and provide audit trails for compliance purposes.
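The restart-and-audit behavior described above can be illustrated with a minimal Python sketch. It is not taken from any scheduler product: the function `run_with_restart`, the `AuditEntry` record and the simulated failing load are all hypothetical names introduced here, and a production scheduler would add delays between attempts, persistence and alerting.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AuditEntry:
    """One line of a hypothetical audit trail."""
    job: str
    attempt: int
    status: str  # "success" or "failure"
    detail: str = ""


def run_with_restart(name: str, task: Callable[[], None],
                     max_attempts: int, audit: List[AuditEntry]) -> bool:
    """Run a task, restarting it after a failure, and record every attempt."""
    for attempt in range(1, max_attempts + 1):
        try:
            task()
        except Exception as exc:  # a real scheduler would be more selective
            audit.append(AuditEntry(name, attempt, "failure", str(exc)))
        else:
            audit.append(AuditEntry(name, attempt, "success"))
            return True
    return False


def make_flaky_load(failures_before_success: int) -> Callable[[], None]:
    """Simulated warehouse load that fails a fixed number of times."""
    state = {"calls": 0}

    def load() -> None:
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise RuntimeError("source system unavailable")

    return load


audit_trail: List[AuditEntry] = []
ok = run_with_restart("nightly_load", make_flaky_load(2), 3, audit_trail)
for entry in audit_trail:
    print(entry.job, entry.attempt, entry.status, entry.detail)
```

Keeping the audit list separate from the retry loop means the same records can feed both an execution report and a compliance trail.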
In an era when quick data loads are essential to BI performance, yet manual involvement is often necessary, job schedulers can completely automate the loading and execution process. Even more importantly, job schedulers can distribute and load-balance large numbers of jobs across multiple servers and storage devices, effectively increasing the efficiency of an enterprise’s IT infrastructure while also completing jobs faster, all at little or no additional cost.
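To make the load-balancing claim concrete, here is a small Python sketch of one common heuristic: take jobs longest first and give each to the server with the least work so far. The job names, durations and server names are invented for illustration, and this greedy rule is an assumption, not a description of how any particular scheduler balances work.

```python
import heapq
from typing import Dict, List, Tuple


def balance_jobs(job_minutes: Dict[str, int],
                 servers: List[str]) -> Dict[str, List[str]]:
    """Assign each job to whichever server currently has the least work."""
    heap: List[Tuple[int, str]] = [(0, server) for server in servers]
    heapq.heapify(heap)
    plan: Dict[str, List[str]] = {server: [] for server in servers}
    # Longest jobs first; ties are broken by name so the result is repeatable.
    ordered = sorted(job_minutes.items(), key=lambda kv: (-kv[1], kv[0]))
    for job, minutes in ordered:
        load, server = heapq.heappop(heap)
        plan[server].append(job)
        heapq.heappush(heap, (load + minutes, server))
    return plan


def elapsed_minutes(plan: Dict[str, List[str]],
                    job_minutes: Dict[str, int]) -> int:
    """Wall-clock time if every server runs its own jobs back to back."""
    return max(sum(job_minutes[j] for j in jobs) for jobs in plan.values())


jobs = {"load_sales": 40, "load_finance": 30, "load_hr": 20, "rebuild_cube": 30}
plan = balance_jobs(jobs, ["etl-a", "etl-b"])
print(plan)
print(elapsed_minutes(plan, jobs), "minutes instead of", sum(jobs.values()))
```

With these sample numbers, two servers finish in 60 minutes what a single server would need 120 minutes to run serially.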
When selecting a job scheduler for data warehouse management, it’s important to seek one that can separate different tasks (for example, optimizing data for quick loads) yet also offer a framework for tying together many kinds of configurations and defining job streams as tightly as needed. It should also be able to pick up events from different machines in a multi-OS environment, and run flat file (two-dimensional) as well as multi-dimensional database jobs.
Job schedulers can allow data warehouse administrators to take advantage of processing capacity beyond the server or system in question, and maximize use of available resources. Best-of-breed schedulers also support heterogeneous OS environments (e.g., Linux, UNIX, Windows, z/OS and OpenVMS), thereby removing roadblocks to important applications and data.
Increasing DBMS Efficiency
Event-based scheduling, of the type found in Advanced Systems Concepts’ ActiveBatch® Job Scheduling and Workload Automation application, is perhaps the most distinctive and valuable advantage of job scheduling software. Event triggers can not only shrink the total portion of the day devoted to processing, but also increase responsiveness. Furthermore, depending on how widely the scheduler’s agents are implemented, the processing environment can approach “cloud computing” status and even create a greener IT infrastructure.
Job streams can be assembled with triggers linked to dozens of events, from system startup and file creation, modification or deletion to runaway processes, job failures and much more. With the addition of active variables, it’s also possible to interrogate data as a means of triggering a task. Examples of active variables include Date Expression, File Contents, SQL Record Set, Web Service Request Result, WMI Query, XML Query, and File System Information.
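As a rough illustration of file-based event triggers, the Python sketch below compares two snapshots of a drop directory and maps the resulting created and modified events to jobs. Everything in it is hypothetical: the snapshot format, the `TRIGGERS` mapping and the job names are assumptions made for the example and do not reflect ActiveBatch’s own trigger mechanism.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, float]  # file path -> last-modified time


def file_events(before: Snapshot, after: Snapshot) -> List[Tuple[str, str]]:
    """Compare two directory snapshots and report what changed."""
    events: List[Tuple[str, str]] = []
    for path in sorted(after.keys() - before.keys()):
        events.append(("created", path))
    for path in sorted(before.keys() - after.keys()):
        events.append(("deleted", path))
    for path in sorted(before.keys() & after.keys()):
        if before[path] != after[path]:
            events.append(("modified", path))
    return events


# Hypothetical mapping from event type to the job it should trigger.
TRIGGERS = {"created": "load_new_extract", "modified": "reload_extract"}


def jobs_to_trigger(events: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (job, path) pairs for every event that has a trigger defined."""
    return [(TRIGGERS[kind], path) for kind, path in events if kind in TRIGGERS]


earlier = {"/drop/orders.csv": 100.0, "/drop/old.csv": 90.0}
later = {"/drop/orders.csv": 160.0, "/drop/returns.csv": 150.0}
events = file_events(earlier, later)
triggered = jobs_to_trigger(events)
print(events)
print(triggered)
```

In practice the snapshots would come from polling the file system or from operating system notifications; the comparison logic stays the same either way.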
Event-driven scheduling, because it responds to the fluid and sometimes unpredictable pace of business, is ideal for near real-time data inserts for mission-critical applications. Because low-latency environments often necessitate the combination of more volatile data with stable master data repositories, event-based job schedulers can be used to establish holding areas at set times of day, integrating the two on a scheduled basis.
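One way to picture the holding-area pattern is the following Python sketch, which uses an in-memory SQLite database purely for illustration. Volatile rows accumulate in a holding table as events arrive, and a scheduled job later folds them into the master table in a single transaction. The table names, columns and functions are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master_orders (order_id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE holding_orders (order_id INTEGER PRIMARY KEY, amount REAL);
    INSERT INTO master_orders VALUES (1, 100.0), (2, 250.0);
""")


def stage(db: sqlite3.Connection, order_id: int, amount: float) -> None:
    """Event-driven step: park a volatile row in the holding area."""
    db.execute("INSERT OR REPLACE INTO holding_orders VALUES (?, ?)",
               (order_id, amount))


def integrate(db: sqlite3.Connection) -> int:
    """Scheduled step: merge the holding area into the master table."""
    with db:  # commits on success, rolls back if any statement fails
        moved = db.execute("SELECT COUNT(*) FROM holding_orders").fetchone()[0]
        db.execute("INSERT OR REPLACE INTO master_orders "
                   "SELECT order_id, amount FROM holding_orders")
        db.execute("DELETE FROM holding_orders")
    return moved


stage(conn, 2, 275.0)  # correction to an existing order
stage(conn, 3, 80.0)   # brand-new order
moved = integrate(conn)
rows = conn.execute(
    "SELECT order_id, amount FROM master_orders ORDER BY order_id").fetchall()
print(moved, rows)
```

Because the merge and the cleanup share one transaction, a failure partway through leaves both tables as they were, which keeps a restart simple.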
The event-based model has other advantages as well. While many IT departments use their DBMS to build processing jobs, it can be advantageous to use a job scheduler to tie together smaller packages of jobs, since it effectively separates the execution logic from the data transfer logic. In this way the job scheduler’s event-based scheme can manage even large job packages, reporting on success or failure and minimizing checkpointing. Coding and maintenance are also reduced.
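The separation of execution logic from data transfer logic can be sketched in Python as a small job stream runner. Each job is an opaque callable that reports success or failure, while the runner alone decides ordering and what to skip. The job names and the dependency map are hypothetical, and the standard-library `graphlib` module (Python 3.9 or later) stands in for a scheduler’s own dependency engine.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set


def run_stream(jobs: Dict[str, Callable[[], bool]],
               depends_on: Dict[str, Set[str]]) -> Dict[str, str]:
    """Run jobs in dependency order and report a status for each one."""
    status: Dict[str, str] = {}
    for name in TopologicalSorter(depends_on).static_order():
        upstream = depends_on.get(name, set())
        if any(status[dep] != "success" for dep in upstream):
            status[name] = "skipped"  # an upstream job did not succeed
            continue
        status[name] = "success" if jobs[name]() else "failed"
    return status


# Data transfer logic lives inside the jobs; these stand-ins just return a flag.
jobs = {
    "extract": lambda: True,
    "transform": lambda: False,  # simulate a failed package
    "load": lambda: True,
    "report": lambda: True,
}
depends_on = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}
result = run_stream(jobs, depends_on)
print(result)
```

Because the runner never looks inside a job, a package can be rewritten or replaced without touching the stream definition.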
Of course, virtually all job schedulers can also accommodate date/time scheduling for routine database refreshes. The combination of the two, as found in ActiveBatch, creates maximum opportunity to furnish relevant and actionable data to decision makers across the enterprise.
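A minimal Python sketch of combining the two trigger types might look like the following. The function `trigger_reason` and its rule (a pending event wins, otherwise the refresh interval decides) are assumptions chosen for illustration, not a description of ActiveBatch’s behavior.

```python
from datetime import datetime, timedelta
from typing import List, Optional


def trigger_reason(now: datetime, last_run: datetime, interval: timedelta,
                   pending_events: List[str]) -> Optional[str]:
    """Decide whether a refresh job should start, and why."""
    if pending_events:
        return "event"      # something happened; run without waiting
    if now - last_run >= interval:
        return "schedule"   # the routine refresh is due
    return None             # nothing to do yet


last = datetime(2024, 1, 1, 8, 0)
hourly = timedelta(hours=1)

print(trigger_reason(datetime(2024, 1, 1, 8, 20), last, hourly, ["file_created"]))
print(trigger_reason(datetime(2024, 1, 1, 9, 0), last, hourly, []))
print(trigger_reason(datetime(2024, 1, 1, 8, 30), last, hourly, []))
```

The routine schedule acts as a safety net: even if no event ever arrives, the warehouse is still refreshed once per interval.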
ActiveBatch: Unique Value for BI
In the modern enterprise, data warehouses used for Business Intelligence must collect data from a bewildering array of sources. The need for fast loading of product transactions, for example, is critical; in some cases, order data must be integrated into production processes even before the sale is complete. ETL (Extract/Transform/Load) tools found in IBM WebSphere DataStage, Microsoft SQL Server Integration Services and the open source Apatar application can accomplish many of these tasks; however, the right job scheduler can simplify the creation of a job stream framework that tightly coordinates ETL workflows, ensuring that such processes are handled efficiently.
The unique Integrated Jobs Library in ActiveBatch can be used to quickly create ETL workflows without custom scripting. For example, it can be used to build workflows that reach across multiple SQL Servers to chain together multiple Data Transformation Services jobs. Alternatively, it can create job plans that pass information from one database to another or to various applications. It can also integrate management tasks with other scripts or applications.
ActiveBatch, because of its easy-to-use management environment, can add new data sources automatically, with little or no human decision-making. The elimination of manual intervention allows enterprises to configure and integrate data from more sources, more quickly than ever before. In addition, ActiveBatch’s ability to balance workloads across many servers has allowed users to improve service levels by completing jobs in less time, and with fewer errors.
Workload balancing, higher server utilization and scalability are of utmost importance to most enterprises. ActiveBatch’s strengths in these areas have given users better use of their IT resources; in fact, ActiveBatch has been proven to reliably run over one million jobs per day and connect to 2,000 servers, demonstrating its efficiency and dependability in production environments of all sizes.
ActiveBatch also minimizes investment risk by maximizing utilization of existing computing power, reducing elapsed job completion time and allowing more business processes to be executed. ETL, for example, is particularly well-suited to distributed process scheduling on commodity (Intel, Windows or Linux) servers surrounding the central data warehouse.
Figure: Cross-Platform ETL Architecture Chart
For large enterprises with discrete data marts or operational data stores, it’s often necessary or desirable to allow multiple departments to create workflows, or to manipulate objects such as calendars, alerts, or schedules. ActiveBatch can also accommodate this need with its Virtual Root, a capability that gives individuals, work teams, departments and even business units log-in protected access to those objects, jobs and plans appropriate to their job descriptions. Permissions in the Virtual Root are controlled centrally and can be set by user, group or other established unit on a granular basis, allowing those closest to the BI or operational need to make necessary changes without being able to access, modify or even see objects or workflows pertaining to other parts of the enterprise.
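The idea of scoping each group to its own part of an object hierarchy can be sketched in a few lines of Python. This is only a conceptual illustration: the object paths, group names and the `visible_objects` function are invented here and say nothing about how ActiveBatch’s Virtual Root is actually implemented.

```python
from typing import Dict, List

# Hypothetical scheduler objects, addressed by path.
OBJECTS: List[str] = [
    "/Finance/Plans/MonthEnd",
    "/Finance/Calendars/Fiscal",
    "/Sales/Plans/DailyLoad",
    "/Sales/Alerts/LoadFailed",
]

# Hypothetical, centrally managed mapping from group to its root.
VIRTUAL_ROOTS: Dict[str, str] = {
    "finance_team": "/Finance",
    "sales_team": "/Sales",
}


def visible_objects(group: str) -> List[str]:
    """List only the objects under the group's root, relative to that root."""
    root = VIRTUAL_ROOTS[group].rstrip("/") + "/"
    return [path[len(root):] for path in OBJECTS if path.startswith(root)]


print(visible_objects("finance_team"))
print(visible_objects("sales_team"))
```

Returning paths relative to the root means a group never sees the names of other parts of the hierarchy, not even as path prefixes.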
Conclusion
As data warehouses become increasingly mission-critical in nature, and as BI extends its march across the modern enterprise, database administrators face the daunting task of making their repositories increasingly agile and responsive. At the same time, as any IT professional knows, global business imperatives demand that more be done with less.
Many organizations already use dedicated job schedulers like ActiveBatch to manage their data processing workloads. By leveraging their existing job schedulers for data warehouse management, IT organizations can improve their ability to handle more sophisticated database needs while also reducing the time devoted to scriptwriting and routine job management.
The inclusion of an existing job scheduler, or installation of a new one, can optimize data warehouse performance, shortening BI “time to insight” in an era when the need for timely, accurate and user-optimized business information has never been greater. Ultimately, an organization’s ability to compete is only as great as the information it has available to it, which is perhaps the best reason of all to put a best-of-breed job scheduler into the data warehouse management mix.
© Copyright Advanced Systems Concepts, Inc.
All rights reserved