madsen wp open source data integration

Upload: rob-latty

Post on 07-Apr-2018

TRANSCRIPT

  • 8/6/2019 Madsen WP Open Source Data Integration

    1/16

    The Role of Open Source in Data Integration January, 2009

    Page 1 Third Nature

    TABLE OF CONTENTS

    Introduction
      Open Source and the Future of Data Integration
        Spending Priorities Emphasize Need for Data Integration
        The Drive Toward Open Source Data Integration

    Understanding Data Integration
      The Difference Between Application Integration and Data Integration
      Operational Data Integration vs. Analytic Data Integration
      Three Approaches for Data Integration
        Consolidation
        Propagation
        Federation
      Creating Solutions for Operational Data Integration Problems
        The Most Common Practice: Custom Coding
        The Standard Option: Buy a Data Integration Product
        The Third Alternative: Open Source

    The Benefits of Open Source for Data Integration
      Flexibility
      Vendor Independence
      Optimal Price

    Recommendations

    www.ThirdNature.net Page 2

    Introduction

    Open Source and the Future of Data Integration

    Data integration (DI) has seen minimal automation over the past decade despite many technology advances. Most companies still hand code data integration between applications (operational data integration) using techniques that would be familiar to a programmer from the 1980s. In business intelligence, 40% of extract, transform and load (ETL) processes are still hand coded.

    In the next few years it's likely that there will be stable to declining investment in new applications due to economic factors, but increased need for data integration technology. Integration consumes a significant portion of the IT budget and is coming under heavier scrutiny as a cost to control.

    New vendors are addressing this gap in data integration, but face adoption challenges in IT. Data integration is a developer task and an infrastructure item, making new tools hard to justify. Fortunately, there are open source products in this market capable of supplying the much-needed automation.

    Open source data integration tools can provide the cost advantages of hand coding with the productivity advantages of traditional data integration software. They are established in the developer tools market, which has been the traditional stronghold of open source software. Expect open source to be a key component of data integration (and especially of operational data integration) in the near future, similar to the way it is a key component of application development environments today.

    Spending Priorities Emphasize Need for Data Integration

    Business intelligence appears consistently as the top item in surveys of business and IT management priorities. More business intelligence means greater need for data integration tools. BI market surveys show that roughly 40% of companies are hand coding their ETL processes, leaving room for growth.

    A CIO Insight IT spending survey shows that standardizing and consolidating IT infrastructure is the number one priority in the coming year for large firms, and number two in medium and small firms. Across all firms, improving information quality shows up as the number three priority.

    Results from a survey by Oracle on data integration showed that 30% of their customers are buying tools for operational data integration today. The top needs of these customers were low-latency data access, doing migrations, and creating data services.

    These statistics clearly highlight the new focus on data integration. Only 60% of the business intelligence market is using data integration tools, so there is still room for growth. With 70% to 85% of companies still hand coding operational data integration, it's clear that this is an area ready for automation, with early adopters already using these tools.

    The Drive Toward Open Source Data Integration

    Open source has become a standard part of the infrastructure in IT organizations. Most are using Linux and open source development tools, and many are running open source databases. The majority of enterprise web infrastructure is built using open source.

    This growing familiarity with open source led to increased adoption rates across all categories of tools and applications. Venture capital flooded into open source startups over the past several years, resulting in an explosion of enterprise-ready tools and applications.

    Open source data integration vendors are creating challenges for both traditional vendors in the DI market who are trying to introduce new tools, and for new non-open-source vendors of operational data integration tools. The existence of open source tools in a market raises barriers to entry that are hard for vendors to address. This is the scenario that played out in the web server, application server, and Java development tools markets.

    Enterprise customers are demanding project-sized data integration tools that can be scaled up to enterprise use. They don't want complex, expensive DI products that are not a fit with the distributed nature of the application environment. With such a large market need, the future direction of data integration is sure to have a large open source component.

    Understanding Data Integration

    The Difference Between Application Integration and Data Integration

    Data integration (DI) and enterprise application integration (EAI) are not the same thing, though vendors sometimes obscure the difference to broaden the appeal of their tools. Application integration focuses on managing the flow of events (transactions or messages) between applications. Data integration focuses on managing the flow of data and providing standardized ways to access the information.

    Application integration addresses transaction programming problems, allowing one to directly link one application to another at a functional level. The functions are exposed to external applications via the tool's API, thus hiding all applications behind a common interface.

    Data integration addresses a different set of problems. DI standardizes the data rather than the transaction or service call, providing a better abstraction for dealing with information that is common across systems. DI tools abstract the connectors, transport and, more importantly, manipulation, not just the system endpoints. When done properly, DI ensures the quality of data as it is being integrated across applications.

    The type and level of abstraction are what differentiate the two classes of integration. EAI tools are a transport technology that requires the developer to write code at the endpoints to access and transform data. These tools treat data as a byproduct. This makes functions reusable at the expense of common data representations.

    Data integration tools use a higher level of abstraction, hiding the physical data representation and manipulation as well as the access and transport. The tools provide data portability and reusability by focusing on data and ignoring transaction semantics. Because they are working at the data layer there is no need to write code at individual endpoints, and all data transformation and validation is done within the tool.

    The key point in differentiating DI and EAI is to know that there are two distinct types of integration with separate approaches, methods and tools. Each has its role, one for managing transactions and one for managing the data transactions operate on.

    Operational Data Integration vs. Analytic Data Integration

    There are two different ways of using data integration tools based on the type of systems being integrated: transactional applications or business intelligence systems. These uses affect the approach, methods and tools that are best for the job. Extract, transform and load (ETL) is the term used in analytic systems. The industry is settling on the term operational data integration, or OpDI, when referring to data integration for applications.

    Business intelligence has been the primary driver of data integration products for the past decade. BI systems are most often loaded in batch cycles according to a fixed schedule, bringing data from many systems to one central repository. They have relatively large volumes of data to process in a short time, but have little concurrent loading activity. Most products were originally designed to meet the specific needs of the analytic data integration market.

    The nature of operational data integration problems is different. Data integration is a small element of an application project, unlike a data warehouse where DI may consume 80% of the project budget and timeline.

    Most application integration projects need data from one or two other systems, not the many sources and tables feeding a data warehouse. The scope is usually smaller, with lower data volumes and narrower sets of data being transferred with minimal transformation.

    A key challenge for OpDI is that the data is usually needed more frequently than one batch per night, unlike most analytic environments. Traditional ETL products for the data warehouse market don't handle low-latency requirements as well as other integration tools. This makes ETL a poorer fit for some types of operational data integration.

    The differences in frequency of execution, data volume, latency and scope are technical elements that differentiate operational and analytic data integration. The other characteristic that separates them is usage scenarios. How people integrate data in operational environments is different.

    Three Approaches for Data Integration

    The data integration scenarios commonly encountered in projects can be mapped to one of three underlying approaches: consolidation, propagation or federation. Consolidation implies moving data to a single central repository where it can be accessed. With propagation the data is copied from the sources to the application's local data store. Federation leaves data in place while centralizing the access mechanisms so the data appears to consuming applications as if it were consolidated.

    [Figure: Consolidation, Propagation, Federation]

    Consolidation

    The concept of consolidation is to move the data wholesale from one or more systems to another. All integration and transformation is done before it is loaded in the target system. This is most often seen in business intelligence, where ETL is used to centralize data from many systems into a single data warehouse or operational data store. Outside of analytic environments, a single centrally accessed repository is most likely to be found in master data management and CRM projects.

    In the world of operational data integration there are several other scenarios that fit within a consolidation approach. System migrations, upgrades and consolidations all require large-scale movement of data from one system to another.

    [Figure: Consolidation]

    Consider a merger or acquisition where there are redundant systems between the two companies. If the companies are running multiple instances of the same software they can reduce the cost of software maintenance and operations by consolidating these into one instance.

    Merging the data from several instances of an ERP system is not a trivial task. There can be thousands of tables to copy and merge, and that's the simple part. Data quality issues are usually discovered in the process. The solution may require deduplicating customer records, merging vendors, or reassigning and cross-referencing product numbers.
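As a rough illustration of the deduplication and cross-referencing step, the sketch below merges customer records from two hypothetical ERP instances. The matching rule (normalized name plus postal code), the field names and the sample records are assumptions made for the example; real matching rules are usually far more involved.

```python
from collections import defaultdict


def normalize(record):
    """Build a matching key from name and postal code (an illustrative rule only)."""
    name = " ".join(record["name"].lower().split())
    postal = record["postal_code"].replace(" ", "").upper()
    return (name, postal)


def deduplicate(records):
    """Group records by matching key, keep the first record of each group as the
    survivor, and return a cross-reference from every source ID to its survivor ID."""
    groups = defaultdict(list)
    for rec in records:
        groups[normalize(rec)].append(rec)

    survivors, xref = [], {}
    for group in groups.values():
        survivor = group[0]
        survivors.append(survivor)
        for rec in group:
            xref[rec["id"]] = survivor["id"]
    return survivors, xref


# Hypothetical customer records from two ERP instances ("A" and "B").
customers = [
    {"id": "A-1", "name": "Acme  Corp", "postal_code": "80301"},
    {"id": "B-7", "name": "acme corp", "postal_code": "80301"},
    {"id": "B-9", "name": "Globex", "postal_code": "97201"},
]
survivors, xref = deduplicate(customers)
```

The cross-reference table matters as much as the merged records: orders and invoices in the retired instance still point at the old identifiers and have to be remapped.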

    The advantage of not physically copying data means that there are no databases to create or tables to manage, speeding development.

    The same tasks and problems occur in a single company when migrating from one vendor's application to another, for example when moving from an internal CRM system to a hosted application. Even packaged application upgrades can involve a level of data migration. Deploying new applications almost always involves importing data and setting up data feeds to and from other systems.

    Propagation

    Unlike the one-time job of an upgrade, migration, or consolidation, propagation is an ongoing activity. Propagation is the most popular approach used for repetitive data integration because it's the simplest to implement. When an application needs data from another system, an automated program or database tool is used to copy the data. Data transformation, if any, is done as part of the process before loading the data into the target.

    Depending on the tools, propagation can be scheduled as a batch activity or triggered by events. Most of the time it is done as a push model from the source to the target, but it can also be implemented as a pull model driven by the application.

    The data movement may be one-way or bidirectional. One-way data movement is common in scenarios where an application needs periodic data feeds or refreshes of reference data. For example, a product pricing system needs to send price updates to a website, an order entry system and a customer service system.
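A minimal sketch of one-way, push-style propagation for the pricing example, using in-memory SQLite databases to stand in for the real systems. The table names, column names and the cents-to-currency transformation are assumptions for illustration, not a description of any particular product.

```python
import sqlite3


def make_db(ddl):
    """Create an in-memory database holding one table (stand-in for a real system)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    return conn


# Source: the product pricing system (hypothetical schema, prices stored in cents).
source = make_db("CREATE TABLE prices (sku TEXT PRIMARY KEY, price_cents INTEGER)")
source.executemany("INSERT INTO prices VALUES (?, ?)", [("X100", 1999), ("X200", 4950)])

# Targets: each consuming application keeps its own local copy.
TARGET_DDL = "CREATE TABLE product_price (sku TEXT PRIMARY KEY, price REAL)"
targets = {name: make_db(TARGET_DDL)
           for name in ("website", "order_entry", "customer_service")}


def propagate_prices(source, targets):
    """Copy all prices from the source to every target, transforming on the way."""
    rows = source.execute("SELECT sku, price_cents FROM prices").fetchall()
    transformed = [(sku, cents / 100) for sku, cents in rows]  # transform before load
    for conn in targets.values():
        with conn:  # one transaction per target; commits on success
            conn.executemany(
                "INSERT OR REPLACE INTO product_price (sku, price) VALUES (?, ?)",
                transformed,
            )
    return len(transformed)


count = propagate_prices(source, targets)
```

Running `propagate_prices` on a schedule gives the batch variant; calling it from a change notification in the source system gives the event-triggered variant.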

    [Figure: Propagation]

    Synchronizing data between systems is more challenging because it is bidirectional and can involve more than two systems. As the number of systems goes up, the number of possible connections explodes. Customer data is a common case where synchronization is used.
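To put a number on that growth, assume every system may need to exchange data with every other system directly. Then n systems have n(n-1)/2 possible point-to-point links, and twice that many one-way data flows if each link is bidirectional. The function name below is hypothetical; the formula is the standard count of pairs.

```python
def point_to_point_links(n):
    """Number of distinct pairs among n systems: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# Links and one-way flows for a few system counts.
for n in (2, 5, 10, 20):
    links = point_to_point_links(n)
    print(f"{n} systems: {links} links, {2 * links} one-way flows")
```

Under this assumption, going from 5 to 20 systems multiplies the possible links from 10 to 190, which is why hand-maintained synchronization becomes hard to manage.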

    Many applications can touch customer data, for example order entry, accounts payable, CRM and SFA systems. Some changes, like credit status, customer contacts or refunds, should be represented across all the systems when they occur. Because the systems are independent, it isn't possible to centralize the data. Instead, the data needs to be synchronized so changes in one location are reflected in other locations.

    Propagation often leads to the need for synchronization because data is being copied and later changed in downstream systems. Data multiplies and discrepancies appear, leading to disagreements about which information is correct.

    Dealing with these problems at enterprise scale can be overwhelming because of the tangle of hand-coded integration that evolved over the years with the applications. Propagation is an easy and expedient solution without tools, but creates data management problems.

    Data integration tools can help solve these problems. The common toolset and information collected in the tool metadata make it easier to understand and manage the flow of data. This in turn simplifies maintenance tasks and speeds both changes and new projects that require access to existing data.

    Federation

    Federation is a method for centralizing data without physically consolidating it first. This can be thought of as centrally mediated access or on-demand data integration. The data access and integration are defined as part of a model, and that model is invoked when an application requests the data.

    Federated data appears to an application as if it were physically integrated in one place as a table, file or web service call. In the background a process accesses the source data in the remote systems, applies any required transformations and presents the results, much like a SQL query but without the restriction that all of the data originate in a relational database.
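A minimal sketch of the idea, assuming two hypothetical sources: a relational CRM table (in-memory SQLite here) and a billing extract available only as CSV text. The function `federated_customer_view` plays the role of the model; it is resolved at request time and nothing is copied into a new store. All names and sample data are illustrative.

```python
import csv
import io
import sqlite3

# Source 1: a relational system (stand-in for a CRM database).
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme Corp"), (2, "Globex")])

# Source 2: a non-relational source (stand-in for a billing file or service response).
BILLING_CSV = "customer_id,balance\n1,250.00\n2,0.00\n"


def federated_customer_view(crm_conn, billing_text):
    """Yield unified customer rows by reading both sources when called."""
    balances = {
        int(row["customer_id"]): float(row["balance"])  # transform text to numbers
        for row in csv.DictReader(io.StringIO(billing_text))
    }
    query = "SELECT customer_id, name FROM customers ORDER BY customer_id"
    for customer_id, name in crm_conn.execute(query):
        yield {"customer_id": customer_id, "name": name,
               "balance": balances.get(customer_id)}


rows = list(federated_customer_view(crm, BILLING_CSV))
```

The consuming application sees one set of rows with a single shape, while each value is still read from the system that owns it.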

    Because federation is a view imposed on top of external sources, it's generally a one-way flow of information. It can't be used to synchronize or migrate data between two systems. This makes federation appropriate for a different class of problems, such as making data from multiple systems appear as if it came from a single source, or providing access to data that shouldn't be copied for security or privacy reasons.

    [Figure: Federation]

    Federation is a useful approach in scenarios where it would be too costly to create and manage a database for the integrated data. For example, in a customer self-service portal there might be a dozen possible sources of data the customer could access.

    Pulling the required data from many systems into a single database is possible in this scenario. The challenge is providing real-time delivery of this information. A change in any of a dozen systems must be immediately replicated to this database, a challenging and expensive task. By federating access or constructing a data service layer, the application developers can build the portal against a unified model without the need to copy data. The data is accessed directly from the source so there is no problem with delivering out-of-date or incorrect information.

    Creating Solutions for Operational Data Integration Problems

    Regardless of the data integration model, the final decision is usually governed by the project budget, timeline and what the developers are familiar with. IT focus on budget is at an all-time high, making it hard to justify the investment needed for data integration tools. This is an area where open source can help.

    The Most Common Practice: Custom Coding

    Industry surveys show that operational data integration is built by hand for more than three quarters of the application projects in production today.

    Products get better over time. Hand-written code gets worse.

    Hand coding is common because DI is not thought of in terms of infrastructure and data management, but in terms of glue for applications. While copying data from one place to another isn't optimal, it's still workable in the context of a single application. The price is paid in the overall complexity of integration spread throughout the enterprise.

    Hand-coded integration is going to change, due in large part to the new emphasis on external integration, for example with automated business processes involving outside companies or with the increasing use of SaaS applications.

    Database administrators have no easy way to move data outside the company. The standard DBA tools do not allow them to send data to, or receive data from, the web service interfaces used by most SaaS applications, nor do DBAs have the expertise to program to these interfaces.

    Application developers have the skills to send and receive remote data and to program to web services. The problem is that operational data integration is more than the core tasks of extracting and moving data. Reliable production support means creating components to deal gracefully with exceptions, handle errors, and tie into scheduling, monitoring and notification systems. The additional work is enough to constitute its own project.

    Migrations, upgrades and consolidations are a slightly different problem. The complexity and scale of mapping hundreds to thousands of tables makes the labor of hand coding a poor choice. Beyond the amount of work, problems are hard to debug and there is no traceability for the data. The lack of easy traceability can create compliance and audit headaches after the new system is in production.

    Hand coding for operational data integration is a dead-end investment. Products usually improve over time, but hand-written code rarely does: extending it or fixing minor problems is a low priority relative to other IT needs. Since the code is written for a specific project it can rarely be reused on other projects the way a tool can be reused.

    The Standard Option: Buy a Data Integration Product

    Integration code is single-purpose, tools are multi-purpose. You should always go with tools when you can afford them.

    Companies are recognizing the problems associated with hand-coded integration and are starting to evaluate and use data integration products. Coding requires proficiency with the operating system, data formats and language for every platform being accessed. DI tools improve productivity by abstracting work away from the underlying platforms. This allows the developer to focus on the logic rather than unimportant platform details.

    There are a number of different tools available that can work for operational DI problems. Companies with a data warehouse are extending their use of ETL tools into this space. Tasks involving consolidation are particularly well suited to ETL tools because the problem domain matches their capabilities for large batch movement of data. The use is one-time so there is little danger of needing to pay for more licenses.

    The large ETL vendors are shifting their product strategies to address operational DI needs and now call themselves data integration vendors. Their initial focus has been migrations and consolidations, although all have been reworking the tools to function better in propagation and synchronization scenarios where low-latency data access is more important.

    ETL tools are still a poor fit for propagation and synchronization because of their inability to address high-concurrency, low-latency needs. Other problems with many of the products are their complexity, deployment architecture, and cost.

    Most are designed as centralized servers. This forces all integration jobs onto a single server or cluster which must then be shared with other users. It is possible to run smaller independent servers for different applications, but the cost of doing this is prohibitive because of the server-based licensing model.

    Companies need tools that can be deployed in a distributed manner at the point of use, and that can be given to any application developer who needs them. Enterprise server licensing for ETL, DI and SOA tools often prevents this.

    The Third Alternative: Open Source

    Open source offers a third alternative to the traditional buy versus build decision. When looking for tools directed at developers, the first step should always be to look for open source software. Assuming there is an acceptable solution, it's clear that you will save time and money over custom development.

    Open source avoids the pitfalls of coding and gains the advantages of using tools.

    Given the three quarters of companies hand coding integration, it's time to revisit the buy versus build decision. Open source data integration tools can address the shortcomings of hand coding. As full-featured tools, they offer the error handling, operational support and availability features that must be built in manual coding environments.

    An advantage not often discussed is the productivity these tools bring to application developers. Aside from the standard integration tasks, they offer a significant improvement when dealing with heterogeneous systems and databases. DI tools expand the ability to do data integration to a developer audience who would otherwise lack the necessary platform skills.

    Open source also has advantages over the current crop of DI tools on the market when it comes to operational data integration. The ability to expand traditional ETL tools for use in operational DI is limited because their centralized architecture and costly licensing models are a mismatch for distributed OpDI needs.

    The budget for application projects can't absorb the high cost of enterprise DI software, which makes it hard to justify the purchase of a tool. Spending on infrastructure goes against project-based budgeting models and the ROI is hard to measure.

    Data integration is, and will continue to be, viewed as application glue, so organizations need an alternative. If IT can't afford to fund an enterprise DI tool as an infrastructure item then the alternative is either hand coding or open source.

    Open source data integration tools provide the cost advantages of hand coding with the productivity advantages of traditional data integration software. This is the real reward for using open source development tools.

    The Benefits of Open Source for Data Integration

    People often misunderstand or misrepresent the benefits open source provides. The same could be said of packaged software in general.

    The sad truth of most software is that it is non-differentiating. It does not confer any competitive benefit to the company because a competitor can acquire identical software. Likewise, data integration tools are themselves not a differentiator. The difference is that these tools allow developers the freedom to do better customized integration. They are an enabling technology that allows a company to differentiate how it configures systems and the flow of information. This is the point where differentiation occurs.

    For this reason, most development tools have been taken over by open source. No company outcompetes another by developing its own data integration software, any more than it would by building its own general ledger. With development tools, everyone wins by pooling their collective resources. The part they keep to themselves is what gets done with those tools, because that's where the value is.

    Data integration software is squarely in the crosshairs of open source vendors and venture capitalists because it fits the same profile as compilers, languages, and other development tools. In all of these cases the shared development and distribution model removed cost, improved tool quality, and benefited everyone.

    There are two open source models for sharing development and distribution costs. One is project-based or community-based open source. The other is commercial open source software, or COSS for short.

    Most people are familiar with project-based open source. This model typically involves some sort of nonprofit foundation or corporation to own the copyright, and people contribute their efforts to development and maintenance. They may even be full-time employees, but the project does not operate in the same way a traditional software company does.

    Commercial open source evolved with recognition that companies are willing to pay for support, service, and other less tangible items like indemnification or certifying interoperability. A commercial open source vendor operates just like a traditional software vendor, except that the source code is not shrouded in secrecy. This enables more and deeper interaction between the community of customers and developers, making the open source model more user-focused than the traditional model.

    In contrast to the majority of projects, commercial open source vendors employ most of the core developers for their project and expect to make a profit while doing so. They provide the same services and support that traditional vendors do, and frequently with more flexibility and lower cost. Some COSS vendors borrow elements of the proprietary vendors, like building non-open-source add-on components or features that can be purchased in place of, or in addition to, the free open source version of the software.

    The difference between COSS vendors and traditional vendors is as much about business practices as it is about the code. Proprietary vendors can't open the doors and invite bug fixes, design suggestions or feature additions, nor should they. The key force driving many open source projects is not innovative intellectual property, but the commodity nature of development software.

    Studies on open source adoption have noted that other benefits can outweigh the cost advantages of open source. Not all projects are justified based on financial benefit. While advantages may be translated into financial terms, value can come from solving a particular problem sooner, enabling work that was previously not possible, or providing efficiency that allows people to be deployed to other tasks.

    According to several market surveys over the past few years of companies adopting open source, the following three benefits rise to the top of the list.

    Flexibility

    The challenge with flexibility is defining it. Respondents use this term to mean a number of different elements of flexibility. These three are the most frequently mentioned:

    Evaluation. Organizations can try open source tools at their own pace according to their own timeline. Some companies evaluate all tools in a proof of concept and allot the same amount of time to each. Others try open source first and run extended trials which evolve into prototypes or production use. Unlike traditional software, there are no non-disclosure agreements or trial licenses that limit the duration or extent of use, nor is there a presales consultant breathing down the neck of the evaluator.

    Deployment. As noted above, a successful trial installation can be easily put into production. There are few, if any, limitations regarding deployment. For example, one firm may choose to centralize data integration while another chooses to distribute it closer to applications. Scaling up by adding servers is not usually limited with open source the way it is with traditional software models where more licenses must be purchased. The unbundling of license, support, and service means decisions about these items can be made later.

    Adaptability. Open source tools may be used in unrestricted ways, for purposes that the project might never have intended. Buying a six-figure ETL tool for a small integration problem or a one-time migration is overkill, but an open source ETL tool can be easily adapted for use. Another side of adaptability is customization. Most companies will rarely, if ever, look at a project's source code. It's still nice to know that the software can be tailored to fit a situation if the need arises, for example with the addition of customized connectors.

    Vendor Independence

    A benefit of open source mentioned by many customers is vendor independence. There are two aspects to vendor dependence. One is being beholden to a given vendor for the use and support of the software. The other is the problem of technology lock-in.

    The open source license is the key difference for open source software. Even if a customer contracts with a COSS vendor in order to get support or other services, there is no requirement to continue with that vendor. This opens up the possibility of using third parties for the same services, or forgoing those services but continuing to use the software.

    The problem of technology lock-in is much less likely to happen with open source software. Open source projects tend to adhere to open standards. There is more motivation to use existing open standards and reuse other open source code than to try to create new standards. The fact that the code is visible to everyone is an additional incentive to the developers to write better code. Studies have shown that many open source projects have lower defect rates than comparable proprietary offerings.

    Proprietary vendors sometimes avoid open standards because proprietary standards ensure control over their working environment and the customer base. Some products are closely tied to vendor technology stacks, an obvious example being database-supplied ETL tools. Open source tools are much less likely to be tied to a specific platform or technology stack, partly because of how the software is developed and partly due to the diversity of the developer and user communities who quickly port useful code to their platform of choice.

    Optimal Price

    It's important to distinguish between cost savings and paying the right price. While the
    open source production and distribution model has cost advantages that translate
    directly into lower license price, this does not guarantee that a company will save
    money by using open source.

    It's hard to justify even the lowest cost tools for a system migration because they
    become shelfware at the end of the project.

    More important than trying to evaluate cost savings is looking at paying the right price
    at the right time. Open source gives a company the option to pay nothing, pay
    incrementally, or pay up front. The choice depends on factors like budget for initial
    project startup, how important support is during development and anticipated growth
    once in production.

    The greatest savings opportunities will come from new projects where the high cost of
    data integration tools favors open source. Startup costs for a project using proprietary
    DI tools can be exceptionally high and you can't defer purchase or support costs with
    traditional software.

    The next biggest savings come when scaling for growth. As the number of servers, data
    sources and targets, or CPUs grows, the license cost of proprietary tools keeps pace. Scaling
    up can quickly become cost prohibitive.

    Open source is delivered in ways that allow for low-cost or even zero-cost scaling. Some
    COSS vendors charge for support based on fixed attributes or simple subscription
    pricing. Others charge per developer rather than on a per-server basis.
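
    The difference between these pricing models can be made concrete with a little
    arithmetic. The sketch below is illustrative only: the Deployment class, the three
    function names and every price in it are hypothetical assumptions, not figures from
    any vendor or from this paper. It shows how a per-server license grows with the
    number of servers, while a per-developer or flat subscription stays level as the
    deployment scales out.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """Hypothetical description of a data integration deployment."""
    servers: int
    developers: int


def per_server_license(d: Deployment, price_per_server: int) -> int:
    # Cost grows linearly with the number of servers running the tool.
    return d.servers * price_per_server


def per_developer_subscription(d: Deployment, price_per_developer: int) -> int:
    # Cost depends only on how many developers use the tool.
    return d.developers * price_per_developer


def flat_subscription(d: Deployment, annual_fee: int) -> int:
    # Cost is fixed regardless of deployment size.
    return annual_fee


if __name__ == "__main__":
    # All prices are made-up placeholders chosen for easy arithmetic.
    PRICE_PER_SERVER = 10_000
    PRICE_PER_DEVELOPER = 5_000
    ANNUAL_FEE = 20_000

    for label, d in [("pilot", Deployment(servers=2, developers=3)),
                     ("scaled out", Deployment(servers=20, developers=3))]:
        print(label,
              per_server_license(d, PRICE_PER_SERVER),
              per_developer_subscription(d, PRICE_PER_DEVELOPER),
              flat_subscription(d, ANNUAL_FEE))
```

    With these assumed prices, growing from 2 to 20 servers multiplies the per-server
    cost by ten, while the other two models do not change because the team size is the
    same.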

    For operational data integration, this translates into a significant advantage for open
    source. Most operational DI software is distributed across the enterprise, not in a few
    centralized servers. This poses serious cost obstacles for proprietary vendors.


    The cost benefits of open source tools can be even higher for data consolidation tasks
    where the challenge is to justify the purchase of a tool that will be used once.
    Traditional enterprise data integration tools are not priced for one-time use, putting
    them out of reach for most projects.

    It's hard for a manager to justify even the lowest cost tools because at the end of the
    project they become shelfware. For cases like this when expensive mainstream IT
    software is out of reach, open source can save the day.


    Recommendations

    The way organizations plan and budget for data integration is not going to change any
    time soon. Most operational data integration will continue to be paid for as part of
    individual projects, continuing the largely ad hoc DI infrastructure. This means the single
    high-cost enterprise licensing model for new operational DI tools isn't likely to fit most
    IT organizations.

    Operational DI is not the same as ETL or analytic DI. Keep this in mind when
    evaluating tools.

    IT managers and developers need a way to make the integration job easier, repeatable
    and more productive. Open source is one way to accomplish these goals. People
    responsible for selecting and maintaining tools for data integration can benefit from the
    following guidelines.

    Differentiate between analytic data integration and operational data integration.
    Business intelligence environments have specific needs like large batch volumes,
    many-to-one consolidation and specialized table constructs. While applicable to
    consolidation projects, ETL tools designed for the data warehouse market won't
    provide a complete set of features for operational data integration.

    Discourage hand-coded data integration. There are many different tools which can be
    used to solve data integration problems, and newer tools specifically designed for
    operational data integration. Encourage developers on application development and
    package implementation projects to look at these tools. The benefits over manual
    coding are obvious.

    Use the right data integration model for the problem. Determine whether the
    integration problem requires consolidation, federation or propagation. Each of these
    is different in both approach and required tools or features. Select the technology
    that best fits with the approach to avoid mismatches that will lead to problems during
    implementation.

    Make open source the default option for data integration tools. When in an
    environment with few or no tools, open source should be the first alternative. It is the
    simplest, fastest and likely the least expensive route to solve the problem. It's the
    logical next step after manual coding. Look to proprietary tools only when open
    source tools can't do the job, or when you have them in-house already and the
    licensing issues are not an obstruction.

    Augment existing data integration infrastructure with open source. There will be
    many cases where it is not effective to extend current data integration tools to a new
    project. This may be due to lack of specific features, poor fit with the application
    architecture, or extended cost due to licensing or the need for additional components.
    Many proprietary data integration tools will charge extra for options like application
    connectors, data profiling or data cleansing. In these cases, open source can be used
    to augment the existing infrastructure.


    About the Author

    MARK MADSEN is president of Third Nature, a consulting and technology research firm
    focused on information management. Mark is an award-winning architect and former
    CTO whose work has been featured in numerous industry publications. He is an
    international speaker, a contributing editor at Intelligent Enterprise, and manages the
    open source channel at the Business Intelligence Network. For more information or to
    contact Mark, visit http://ThirdNature.net.

    About the Sponsor

    Talend is the recognized market leader in open source data integration. Hundreds of
    paying customers around the globe use the Talend Integration Suite of products and
    services to optimize the costs of data integration, ETL and data quality. With over 3.3
    million lifetime downloads and 700,000 core product downloads, Talend's solutions are
    the most widely used and deployed data integration solutions in the world. The
    company has major offices in North America, Europe and Asia, and a global network of
    technical and services partners. For more information and to download Talend's
    products, please visit http://www.talend.com.

    About Third Nature

    Third Nature is a research and consulting firm focused on new practices and emerging
    technology for business intelligence, data integration and information management.
    Our goal is to help companies learn how to take advantage of new information-driven
    management practices and applications. We offer consulting, education and research
    services to support business and IT organizations as well as technology vendors.
