Big Data, Big Opportunity: A Primer for Understanding the Big Data Frontier
![Page 1: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/1.jpg)
Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier
Sanjai Marimadaiah @SanjaiM1
Michael Harer @MikeHarer
Hiren Mandalia @hiren0210
CA Technologies, Product Management, Office of the CTO, Big Data Management
Mainframe
MFX01E
#CAWorld
![Page 2: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/2.jpg)
© 2015 CA. ALL RIGHTS RESERVED. @CAWORLD #CAWORLD
Abstract
Big Data environments are now business-critical for any organization. Learn the basics of Big Data and some of the emerging technologies targeting the Big Data space.
Sanjai Marimadaiah
Michael Harer
Hiren Mandalia
CA Technologies, Product Management, Office of the CTO, Big Data Management
![Page 3: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/3.jpg)
Agenda
1. What is Big Data?
2. Big Data use cases
3. Hadoop basics
4. NoSQL basics
5. Cassandra basics
6. MongoDB basics
![Page 4: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/4.jpg)
How do I deliver a flawless experience every time an application touches the mainframe?
In the application economy it’s all about your customers. You need to think about your mainframe reframed.
• Connect mobile-to-mainframe applications
• Create mainframe infrastructure flexibility for the future
• Unleash the power of data on the mainframe
![Page 5: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/5.jpg)
What is Big Data?
Data sets whose volume, velocity, variety and complexity exceed the ability of commonly used software tools to capture, process, store, manage, and analyze them.
Information sources: mobile, transactional data, search, texts, CRM/SCM/ERP, images, email, social media, IT ops, audio, video
Big Data dimensions: velocity, volume, variety, complexity
![Page 6: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/6.jpg)
Evolution of Data Management Solutions
Relational databases are not suited for Big Data
[Figure: timeline from 1960 to 2010, moving from structured data to unstructured data. Hierarchical data models (IBM IMS), then relational data models (Sybase, Informix, Oracle, IBM), then document data models (Google, Hadoop).]
![Page 7: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/7.jpg)
State of Database Workloads
Big Data workloads enable broader OLAP workloads
• Database (RDBMS): online transaction processing
• Data warehouse: online analytical processing
• Big Data: Big Data workloads
How the workloads feed each other:
• Collect historical transactional data for analytics
• Better analytics for higher-value transactions
• Adding more complete data enhances analytics
• Enhanced insights from operational workloads & information access applications
Big Data sources: multimedia; web logs; social data; sensor data: images; RFID; text data: emails
![Page 8: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/8.jpg)
What is driving Big Data Solutions
Cost efficiency and a standardized platform are fostering innovation
Scale-out architecture (100’s of inexpensive servers)
• Protects investment: just add more servers to expand capacity
• Lower cost of infrastructure: less expensive commodity servers (x86 based)
Open-source software (Hadoop, Cassandra)
• Standardization leads to innovation: a common programming interface is enabling innovation up the SW stack
• Lower software cost: open source software is lowering software cost
![Page 9: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/9.jpg)
Adoption of Big Data Solutions
2X INCREASE in number of organizations that have deployed/implemented data-driven projects since 2014
40% of organizations are still planning to implement data projects
30% of organizations are still planning to implement data projects
Key Trends
• Greater priority on structured data initiatives
• Top vendor criteria
  - Integration with existing infrastructure
  - Security
  - Ease of use
• Necessary skill sets: business analysts, data architects, data analysts & data visualizers
Source: 2015 CA Sponsored Research: Vanson Bourne Global Big Data User Survey
![Page 10: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/10.jpg)
Overall Big Data Market
§ The Big Data market was $27.36B in 2014, up from $19.6B in 2013.
§ 89% of business leaders believe Big Data will revolutionize business ops the same way the Internet did.
§ 83% have pursued Big Data projects in order to seize a competitive edge.
Wikibon projects the Big Data market will top $84B in 2026, attaining a 17% Compound Annual Growth Rate (CAGR) for the forecast period 2011 to 2026.
Source: 2015 Wikibon Big Data Market Forecast
![Page 11: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/11.jpg)
Database for Big Data
Overall Big Data database market is projected to grow at 33% CAGR until 2017
[Chart: Big Data Market Database Projection, 2011-2017 ($US billions). Source: © Wikibon Big Data Model 2011-2017]
• Big Data database market will grow at approx. 60% from 2011-2017 (6-year)
• Market for NoSQL databases was $0.2B in 2012, growing to $1.6B in 2017.
• Technology progression in Data-in-DRAM-Memory and Data-in-Flash-Memory will improve the scalability of SQL databases.
• Applications are easier to program and require lower maintenance if SQL is used; NoSQL has greater scalability and lower technology costs for very large big-data applications.
Source: 2015 Wikibon Big Data Model 2011-2017
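The growth figures quoted on the market slides can be sanity-checked with the standard compound annual growth rate formula, CAGR = (end / start)^(1 / years) - 1. The sketch below is only an illustration: the function names `cagr` and `project` are made up for this example, and the inputs are the dollar figures quoted on the slides, not additional data.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate as a fraction (0.17 means 17%)."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


def project(start_value, rate, years):
    """Value after compounding `rate` for `years` years."""
    return start_value * (1.0 + rate) ** years


# NoSQL database market: $0.2B in 2012 growing to $1.6B in 2017 (five years).
nosql_cagr = cagr(0.2, 1.6, 2017 - 2012)

# Overall Big Data market: $19.6B in 2013 to $27.36B in 2014 (one year).
growth_2013_2014 = cagr(19.6, 27.36, 1)

print(f"NoSQL implied CAGR 2012-2017: {nosql_cagr:.1%}")
print(f"Big Data market growth 2013-2014: {growth_2013_2014:.1%}")
```

An eightfold increase over five years works out to roughly 52% per year, which shows how quickly compounding adds up even when the starting market is small.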
![Page 12: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/12.jpg)
Vendor Landscape – Broader Participants
BIG DATA MARKET SEGMENT
HARDWARE
• Servers (chips): HP, Dell, Intel
• Storage: EMC/Dell, NetApp, Fusion-io
• Networking: Cisco, Arista Networks, Infineta Systems
SOFTWARE
• Hadoop: Hortonworks, Cloudera, MapR, Hadapt, EMC Greenplum
• NoSQL: Cassandra, MongoDB, Couchbase, DataStax, 10gen
• NGDW*: HP Vertica, EMC Greenplum, Teradata Aster, IBM Netezza, SAP
• Analytics & BI: Digital Reasoning, Revolution Analytics, Jaspersoft, Dataeet, Pentaho
• Management solutions: CA Big Data Control Center, Informatica, VMware, IBM BigInsights, HP HAVEn, Zettaset, BlueData EPIC, Syncsort, StackIQ, BMC Control-M
SERVICES
• Cloud services: Amazon, Google, MapR, IBM, Microsoft
• Technical services: Hortonworks, Cloudera, Cloudwick, EMC, IBM
• Professional services: Think Big Analytics, IBM, EMC, Accenture, Deloitte
* NGDW = Next Generation Data Warehouse
Core infrastructure: Hadoop, Cassandra, MongoDB, Amazon Big Data, MapR, Elasticsearch
![Page 13: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/13.jpg)
Big Data Use Case Studies
![Page 14: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/14.jpg)
Media & Entertainment Use Case
PROBLEM
§ A company’s streaming business has expanded from thousands of members watching occasionally to millions of members watching over two billion hours every month.
§ A collection of events describing what is being viewed must be gathered. Given that viewing is what members spend most of their time doing, what’s needed is a robust and scalable architecture to manage and process this.
§ Certain things will break the architecture that processes billions of viewing-related events per day.
SOLUTION
§ Focus on the minimum viable set of use cases
§ Availability over consistency - our primary use cases can tolerate eventually consistent data, so design from the start favoring availability rather than strong consistency in the face of failures.
POTENTIAL BENEFITS
§ By focusing on the minimum viable set of use cases, rather than building a generic all-encompassing solution, we have been able to build a simple architecture that scales.
§ The company’s viewing data architecture is designed for a variety of use cases, ranging from user experiences to data analytics. The following are three key use cases, all of which affect the user experience:
![Page 15: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/15.jpg)
Health Care Use Case
PROBLEM
§ Relapses in cardiac patients
§ “One size fits all” treatment
§ Medicare readmission penalties
§ Sensitive patient data on z Systems VSAM files
§ No efficient way to offload
SOLUTION
§ Identify risk factors by analyzing patient data*
§ Factors used to predict likely outcomes
POTENTIAL BENEFITS
§ Reduction in readmissions
§ Savings in no penalty fees
§ No manual intervention
§ No increase in staffing
* System z VSAM database requires special skills to access without vStorm Connect Data Streaming for Big Data
![Page 16: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/16.jpg)
Retail Use Case
PROBLEM
§ Streams of user data not correlated
§ e.g. store purchases, website usage pattern, card usage, historical customer data
§ Historical customer data is System z VSAM & DB2 based – no efficient, secure offload
SOLUTION
§ HDFS securely populated with historical customer data, card usage, store purchases, website logs
§ Splunk scores customers based on the various data streams
§ High-scoring customers offered coupons, special deals on website
POTENTIAL BENEFITS
§ Increase in online sales in the middle of retail slowdown
§ Improved conversion rate of website browsing customers (shopping cart to sales)
§ Elimination of data silos – since analytics now cover all data, no more reliance on multiple reports/formats
![Page 17: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/17.jpg)
Hadoop Basics
![Page 18: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/18.jpg)
What is Hadoop?
Hadoop is… open-source software designed to be highly scalable, fault tolerant and highly distributed
Key elements:
1. Distributed processing of Big Data (e.g. MapReduce)
2. Distributed storage (Hadoop Distributed File System or HDFS)
Hadoop 1.0: HDFS (distributed reliable storage); MapReduce (resource management & data processing)
Hadoop 2.0: HDFS (distributed reliable storage); YARN (resource management); MapReduce (dist. programming); Spark (in memory); HBase (NoSQL store); Hive (query); Pig (scripting); Oozie (workflow)
Components covered on the following slides: (1) MapReduce, (2) HDFS, (3) YARN, (4) HBase, (5) Hive
![Page 19: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/19.jpg)
MapReduce – Core Hadoop (1)
A distributed computing framework
§ Hadoop’s MapReduce framework involves two phases:
1. Map phase: distributes the data set among multiple servers and operates on the data locally.
2. Reduce phase: recombines the partial results.
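The two phases can be illustrated with a single-process word count. This is a minimal sketch of the programming model only, not Hadoop's Java API: the function names and the two hard-coded input splits are assumptions made for the example, and the `shuffle` step stands in for the grouping that the framework performs between map and reduce.

```python
from collections import defaultdict


def map_phase(split):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for line in split for word in line.split()]


def shuffle(mapped_outputs):
    """Group the values emitted by all mappers under their key."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reduce: recombine the partial results into one count per word."""
    return {key: sum(values) for key, values in grouped.items()}


# Two hypothetical splits, as if each lived on a different server.
splits = [
    ["big data big opportunity"],
    ["big data frontier"],
]

mapped = [map_phase(split) for split in splits]  # would run in parallel
counts = reduce_phase(shuffle(mapped))
print(counts)
```

In a real cluster each call to `map_phase` would run on the node that already holds the split, which is why the map step is described as operating on the data locally.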
![Page 20: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/20.jpg)
MapReduce – Core Hadoop (1)
A distributed computing framework
• JobTracker - one of the core Hadoop services; it manages the jobs and the resources in the cluster (TaskTrackers). The JobTracker tries to schedule a “map” as close as possible to the actual data being processed.
• TaskTracker – deployed on the data nodes and responsible for running the map and reduce tasks as instructed by the JobTracker
MapReduce processes large jobs in parallel across many nodes and combines the results.
[Diagram: the JobTracker on the master node dispatches Job-1 through Job-5 to the TaskTrackers running on the data nodes (slave nodes).]
![Page 21: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/21.jpg)
Hadoop Distributed File System (HDFS) (2)
Self-healing, high-bandwidth clustered storage
• NameNode - one of the core Hadoop services; it maintains the namespace – knows where data is and manages blocks on data nodes
• DataNode - servers that actually store the data on their local disks.
• Secondary NameNode - performs periodic checkpoints of the primary NameNode to serve as a backup in case of failure
HDFS breaks incoming files into blocks and stores them redundantly across the cluster.
[Diagram: the primary NameNode on the master node is periodically checkpointed to the secondary NameNode. Five data nodes (slave nodes, which also run the TaskTrackers for Job-1 through Job-5) hold blocks 2/4/5, 1/2/5, 1/3/4, 2/3/5 and 1/3/4, so each of the five blocks is stored on three nodes.]
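The block splitting and redundant storage described above can be sketched in a few lines. This is a toy model under stated assumptions: the round-robin placement is an invention for the example and is much simpler than HDFS's actual rack-aware placement policy, the block size is tiny so the output stays readable, and the node names are hypothetical. The `placement` dictionary plays the role of the block map that the NameNode keeps.

```python
def split_into_blocks(data, block_size):
    """Break a byte string into fixed-size blocks (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Returns a dict mapping block index -> list of node names, which is
    roughly the kind of metadata a NameNode tracks.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def readable_after_failure(placement, failed_node):
    """True if every block still has at least one replica on a live node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


nodes = ["datanode1", "datanode2", "datanode3", "datanode4", "datanode5"]
file_bytes = b"x" * 1000
blocks = split_into_blocks(file_bytes, block_size=256)
placement = place_replicas(len(blocks), nodes, replication=3)

for block, replicas in placement.items():
    print(block, len(blocks[block]), replicas)
```

Because every block lives on three different nodes, losing any single data node leaves the file fully readable, which is the sense in which the storage is called self-healing once the missing replicas are copied elsewhere.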
![Page 22: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/22.jpg)
YARN (3)
Yet Another Resource Negotiator
YARN is…
§ Resource management
§ Next-generation MapReduce (MRv2)
§ Splits JobTracker into:
– Resource Manager
– Scheduling / Monitoring
What does YARN do?
§ Provides a cluster-level resource manager for improved resource management & scaling
§ Forms the new system for managing applications in a distributed manner
§ Provides slots for jobs other than Map/Reduce
§ Improves resource utilization
Resource management moves into YARN
![Page 23: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/23.jpg)
HBase (4)
A non-relational (NoSQL) database that runs on top of HDFS
What is it?
§ A Hadoop open source (Java) NoSQL database
§ Provides real-time read/write access to large data sets
§ Distributed with automatic failover
Why use it?
§ Provides a natural data storage mechanism for all kinds of data (especially unstructured)
§ For random, real-time access to data in Hadoop
§ When the project goal is to host very large tables, i.e. billions of rows and millions of columns
§ Combines data sources that use a wide variety of different structures and schemas
§ Great for: storing semi-structured data like log data
[Figure: Logical View of Customer Contact Information in HBase]
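Since the figure itself did not survive extraction, the sketch below gives one way to picture a logical view of customer contact data: each row key maps to column families, and each family holds an open-ended set of column qualifiers. The class `ToyHBaseTable`, the family names and the sample customers are all hypothetical, and cell versioning by timestamp, which real HBase also supports, is left out.

```python
class ToyHBaseTable:
    """In-memory stand-in for a wide-column table (not the HBase client API)."""

    def __init__(self, column_families):
        # Column families are fixed when the table is created.
        self.column_families = set(column_families)
        self.rows = {}

    def put(self, row_key, family, qualifier, value):
        """Store one cell; qualifiers need not be declared in advance."""
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        self.rows.setdefault(row_key, {}).setdefault(family, {})[qualifier] = value

    def get(self, row_key, family, qualifier, default=None):
        """Random read of a single cell by row key."""
        return self.rows.get(row_key, {}).get(family, {}).get(qualifier, default)

    def scan(self, start_key, stop_key):
        """Return rows in sorted key order with start_key <= key < stop_key."""
        return [(k, self.rows[k]) for k in sorted(self.rows) if start_key <= k < stop_key]


customers = ToyHBaseTable(["name", "contact"])
customers.put("cust-0001", "name", "first", "Ada")
customers.put("cust-0001", "contact", "email", "ada@example.com")
customers.put("cust-0002", "name", "first", "Lin")
customers.put("cust-0002", "contact", "phone", "555-0100")

print(customers.get("cust-0001", "contact", "email"))
print(customers.get("cust-0002", "contact", "email"))  # missing cell, no error
```

The second customer has a phone number but no email, and nothing has to be altered in the table definition to allow that. This sparseness is what makes the model a natural fit for semi-structured data.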
![Page 24: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/24.jpg)
Hive (5)
A data warehouse infrastructure built on Hadoop
What is it?
§ A query engine wrapper built on MapReduce
§ Treated as a data warehouse tool for the Hadoop ecosystem
§ Primarily for users with SQL skills
§ Provides HiveQL (similar to SQL)
§ Stores data in HDFS
Why use it?
§ Data analysis and reporting purposes
§ Hides Hadoop complexity from end users
§ Can be used within an ELT function – i.e. to convert Structured Query Language to unstructured MapReduce jobs to run on a Hadoop cluster
§ Good for: batch processing tasks (logs, text mining, document indexing, customer BI)
§ Not good for: online transaction processing, real-time queries.
![Page 25: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/25.jpg)
Cross-Industry Use Case – Apache Hadoop ELT
PROBLEM
§ Traditional data warehousing resources are EXPENSIVE (e.g. transactional mainframe systems)
§ Need to reduce costs associated with storage, CPU capacity and 3rd party ETL tools
§ Current systems cannot scale (i.e. process …)
§ Lack efficient tools
§ Tools typically only handle structured data (RDBMS) but Big Data insight is derived from all types of data (structured, unstructured, semi-structured)
SOLUTION
§ Apache Hadoop tools to:
1. Perform ETL functions
2. Handle all of the specific data types
3. Shift away from traditional ETL to ELT (extract, load, and transform). This shift is mainly driven by big data, which follows the “store first, analyze later” model that is becoming the new standard.
BENEFITS
§ Compared to traditional transactional systems, Hadoop provides fast, low-cost processing
§ New value can be derived from the ability to handle structured and non-structured data
§ Greater flexibility & choice: e.g. the Transform function can use MapReduce, Hive, Pig, R, shell scripts, Java… etc.
§ Vast support model: open source developer community
[Diagram, traditional ETL process: Web, CRM and ERP sources are extracted, transformed and loaded into the DWH, which serves data mining, reporting and OLAP analysis.]
[Diagram, Hadoop ELT: structured sources (Web, CRM, ERP) and unstructured sources (social media, sensor logs) are extracted and loaded with Flume and Sqoop into HDFS (Hadoop Distributed File System), transformed there with Pig, MapReduce and Hive, and then used for data mining, reporting and analytics.]
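The difference between the two pipelines comes down to where the transform step sits. The sketch below contrasts them in plain Python; the sample events, the field names and the `transform` rule are assumptions made up for this illustration, and a list stands in for both the warehouse and HDFS.

```python
import json

# Hypothetical raw input: two well-formed events and one that is not JSON.
RAW_EVENTS = [
    '{"user": "a", "amount": "10.5"}',
    '{"user": "b", "amount": "3"}',
    'not json at all',
]


def transform(line):
    """Parse one raw line into a typed record, or None if it does not fit."""
    try:
        record = json.loads(line)
        return {"user": record["user"], "amount": float(record["amount"])}
    except (ValueError, KeyError, TypeError):
        return None


def etl(lines):
    """ETL: transform first, load only what fits the target schema."""
    warehouse = []
    for line in lines:
        row = transform(line)
        if row is not None:
            warehouse.append(row)
    return warehouse


def elt(lines):
    """ELT: load everything as-is, transform later at analysis time."""
    raw_store = list(lines)  # "store first"

    def query():  # "analyze later"
        return [row for row in map(transform, raw_store) if row is not None]

    return raw_store, query


warehouse = etl(RAW_EVENTS)
raw_store, query = elt(RAW_EVENTS)
print(len(warehouse), len(raw_store), len(query()))
```

Both paths give the same answer for today's question, but only the ELT path still holds the line that failed to parse, so a later, smarter transform can revisit it without going back to the source systems.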
![Page 26: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/26.jpg)
Commercial Distributions of Hadoop
Pure open source – Open core – Compatible
[Logos: Apache Hadoop, Cloudera, Hortonworks, MapR; the components shown include HDFS and Oozie]
![Page 27: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/27.jpg)
The Evolving Hadoop Ecosystem
Components: Description
• Mahout, R: Data mining/machine learning tools used against Hadoop data to detect patterns and trends
• Pig: Scripting language for analyzing large data sets. Compiles to MapReduce jobs
• MapReduce, YARN: Programming model for processing large data sets. YARN performs overall resource mgmt
• Oozie: A workflow scheduler tool to manage Hadoop MapReduce jobs
• Sqoop, Hive, Drill: Enable SQL for Hadoop data. Sqoop - data transfer between Hadoop and structured data stores. Hive - data warehouse for Hadoop. Drill - open source, low latency SQL query engine for Hadoop and NoSQL.
• ZooKeeper: Coordination of config. data, naming and synchronization of Hadoop projects
• BigTop: Packaging services for Hadoop projects to ease testing and deployment
• HBase: A non-relational, distributed database that runs on top of HDFS
• Thrift / Avro: Schema-based data serialization system using RPC calls
• Solr, Nutch, Elasticsearch: Indexing and search tools for data stored in HDFS for Hadoop
• Kafka / Flume: Collect, aggregate, and move streaming data from multiple sources into Hadoop
• Spark: App dev tool for Hadoop apps combining batch, streaming, and interactive analytics
• Ambari, Chukwa: Monitoring & management of Hadoop clusters and nodes
![Page 28: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/28.jpg)
NoSQL Basics
![Page 29: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/29.jpg)
NoSQL Databases Overview
§ Far better at handling semi-structured and unstructured data
§ Database consistency is compromised for availability and ease of partitioning
§ Supports object-oriented programming that is easy to use and flexible
§ Efficient, scale-out architecture instead of expensive, monolithic architecture
![Page 30: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/30.jpg)
NoSQL types
Type: Database examples
• Column data model: HBase, Cassandra, Accumulo
• Document data model: MongoDB
• Key-value data model: OpenTSDB, Redis
• Graph data model: Neo4j, ArangoDB
![Page 31: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/31.jpg)
Cassandra Basics
![Page 32: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/32.jpg)
Cassandra – History
• BigTable, 2006 (Google)
• Dynamo, 2007 (Amazon)
• Open source, 2008
• Cassandra DSE – Dec 2011 (DataStax)
![Page 33: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/33.jpg)
Cassandra is Ideal For…
§ Massive, linear scaling
§ Extremely heavy writes
§ High availability
[Logos: CERN, Barracuda, Cisco, BlueMountain, Comcast, Netflix, SoundCloud]
![Page 34: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/34.jpg)
Cassandra – Data Model
Benefits of Cassandra data model:
§ Easily add new columns without downtime
§ Schema-free / schemaless database
§ Compression permits columnar operations (MIN, MAX, SUM etc.) rapidly
[Figure: Column Family (similar to RDBMS table) alongside Column Family - JSON format]
![Page 35: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/35.jpg)
Cassandra Architecture
§ All nodes the same
§ Data partitioned among all nodes in cluster
§ Each node communicates with other nodes using Gossip protocol
§ A commit log is used on each node to capture write activity for data durability
[Diagram: a client connects to the ring of nodes. Storage: Cassandra File System. Processing: Cassandra Query Language (CQL)]
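One common way to partition data among identical nodes is a hash ring: the partition key is hashed to a token, and the row is stored on the first node clockwise from that token plus the next few nodes as replicas. The sketch below illustrates the idea under stated assumptions. It uses MD5 from the standard library purely as a convenient hash, gives each node a single token, and ignores racks and data centers, so it is a simplification and not Cassandra's actual partitioner or replication strategy. The class name `Ring` and the node names are hypothetical.

```python
import bisect
import hashlib


def token(key):
    """Hash a string to an integer position on the ring."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big")


class Ring:
    """Toy hash ring: every node is equal and owns one token range."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        self.ring = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, partition_key):
        """Nodes that store this key: the owner plus its clockwise successors."""
        start = bisect.bisect_right(self.tokens, token(partition_key)) % len(self.ring)
        count = min(self.replication_factor, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


nodes = ["node1", "node2", "node3", "node4", "node5"]
ring = Ring(nodes, replication_factor=3)
print(ring.replicas("member:42"))
```

Any node can compute the replica list for a key from the ring alone, which is what lets every node accept client requests and removes the need for a master.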
![Page 36: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/36.jpg)
Cassandra – Key features
§ No single point of failure
§ Multi-data center and zone support
§ Pure peer-to-peer cluster setup
§ Allows for “tunable consistency”
§ Cassandra Query Language (CQL)
§ Cassandra File System (CFS)
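“Tunable consistency” means each read and write can choose how many replicas must respond. A widely used rule of thumb is that a read is guaranteed to overlap the latest acknowledged write when the write and read replica counts together exceed the replication factor (W + R > RF). The helper names below are made up for this sketch, and the level names ONE, QUORUM and ALL are used only as labels for replica counts.

```python
def quorum(replication_factor):
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def replicas_required(level, replication_factor):
    """Translate a consistency level label into a replica count."""
    counts = {
        "ONE": 1,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return counts[level]


def reads_see_latest_write(write_level, read_level, replication_factor):
    """True when every read set must intersect every write set (W + R > RF)."""
    w = replicas_required(write_level, replication_factor)
    r = replicas_required(read_level, replication_factor)
    return w + r > replication_factor


for write_level, read_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ALL", "ONE")]:
    print(write_level, read_level, reads_see_latest_write(write_level, read_level, 3))
```

With three replicas, ONE/ONE favors availability and latency and accepts eventual consistency, while QUORUM/QUORUM still tolerates one unavailable replica and gives overlapping reads and writes.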
![Page 37: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/37.jpg)
Cassandra at Netflix
Use cases:
§ What titles have I watched?
§ What titles are recommended for me?
§ Where did I leave off last?
§ What else is being watched?
§ Measure member engagement
§ Inform product & content decisions
Solution:
§ Capture all ‘view’ events in scalable Cassandra clusters
Challenges:
§ Ability to scale billion write events/day
§ Provide responsive title browsing exp.
Source: techblog.netflix.com
![Page 38: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/38.jpg)
MongoDB Basics
![Page 39: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/39.jpg)
• 2007: Founded (10gen)
• 2009: MongoDB 1.0, open-sourced (10gen)
• 2012: MongoDB 2.0 (10gen)
• 2013: MongoDB Inc. (MongoDB)
• 2015: MongoDB 3.0 (MongoDB)
![Page 40: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/40.jpg)
MongoDB is ideal for…
§ RDBMS replacement for web applications
§ Semi-structured content management
§ Real-time analytics and high-speed logging
§ Caching and high scalability
Web 2.0, Media, SaaS, Gaming
Health Care, Finance, Telecom, Government
Not so great for – high transactional databases
[Logos: Disney, Eventbrite, Intuit, IGN, Craigslist]
![Page 41: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/41.jpg)
MongoDB – Data model
RDBMS vs. document-oriented
Benefits of document-oriented DBMS:
• Database schema is optional
• Flexible in dealing with change and optional values
Present:
{"streetnum": "123", "streetname": "Main St.", "unit": "456", "City": "Mountain View", "State": "California", "zip": "65432"}
Future:
{"streetnum": "123", "streetname": "Main St.", "unit": "456", "City": "Mountain View", "State": "California", "County": "Santa Clara", "zip": "65432"}
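The present and future address documents can sit side by side in the same collection, and application code simply treats the new field as optional. The sketch below uses plain Python dictionaries and a list as a stand-in for a collection, so it does not touch a real MongoDB driver; the helper `county_of` and the default value are assumptions for this example.

```python
# The "present" document from the slide, as a Python dict.
present = {
    "streetnum": "123",
    "streetname": "Main St.",
    "unit": "456",
    "City": "Mountain View",
    "State": "California",
    "zip": "65432",
}

# The "future" document adds a County field; no schema migration is needed.
future = dict(present, County="Santa Clara")

# A list stands in for a collection that holds both shapes at once.
collection = [present, future]


def county_of(document, default="unknown"):
    """Read an optional field without assuming every document has it."""
    return document.get("County", default)


for document in collection:
    print(document["City"], county_of(document))
```

In a relational table the same change would normally mean altering the table and back-filling or nulling the new column for every existing row.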
![Page 42: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/42.jpg)
MongoDB – Sharding
![Page 43: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/43.jpg)
Sharded Production Cluster Setup
Image source: mongodb.org
§ Shards store the data. To provide high availability and data consistency, in a production sharded cluster, each shard is a replica set
§ Replica set: a cluster of MongoDB servers that implements master-slave replication and automated failover
§ Query routers, or mongos instances, interface with client applications and direct operations to the appropriate shard or shards.
§ Config servers store the cluster’s metadata. This data contains a mapping of the cluster’s data set to the shards.
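The division of labor between config servers, query routers and shards can be sketched with a range-based lookup on the shard key. Everything here is a toy model under stated assumptions: the class names, the split points, the shard names and the `posting_id` field are hypothetical, each shard is a plain list rather than a replica set, and nothing in the sketch uses the real MongoDB API.

```python
import bisect


class ToyConfigServer:
    """Holds the metadata: which shard-key range lives on which shard."""

    def __init__(self, split_points, shards):
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        self.split_points = list(split_points)
        self.shards = list(shards)

    def shard_for(self, key):
        """Keys below the first split point go to the first shard, and so on."""
        return self.shards[bisect.bisect_right(self.split_points, key)]


class ToyQueryRouter:
    """Plays the role of a query router: clients talk only to this object."""

    def __init__(self, config):
        self.config = config
        self.stores = {shard: [] for shard in config.shards}

    def insert(self, document):
        self.stores[self.config.shard_for(document["posting_id"])].append(document)

    def find_by_key(self, posting_id):
        """Targeted query: only the shard that owns the key is consulted."""
        shard = self.config.shard_for(posting_id)
        return [d for d in self.stores[shard] if d["posting_id"] == posting_id]

    def find_where(self, predicate):
        """Query without the shard key: scatter to all shards, gather results."""
        return [d for store in self.stores.values() for d in store if predicate(d)]


config = ToyConfigServer(split_points=[1000, 2000], shards=["shardA", "shardB", "shardC"])
router = ToyQueryRouter(config)
for posting_id, city in [(999, "Austin"), (1000, "Boston"), (2500, "Austin")]:
    router.insert({"posting_id": posting_id, "city": city})

print(router.find_by_key(1000))
print(router.find_where(lambda d: d["city"] == "Austin"))
```

Queries that include the shard key touch a single shard, while queries on other fields fan out to every shard. That is one reason the choice of shard key matters so much for performance.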
![Page 44: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/44.jpg)
MongoDB – Key Features
§ Scalable, high-performance, open-source, document-oriented database
§ Built for speed
§ Rich document format allows for easy readability
§ Full index support for high performance
§ Replication and failover for high availability
§ Auto-sharding for easy scalability
§ Map/Reduce for aggregation
![Page 45: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/45.jpg)
MongoDB at Craigslist
Use cases:
§ Create new posts
§ Browse all my posts
§ Allow for post classification
§ Search relevant posts
Solution:
§ Migrate from MySQL to MongoDB
Challenges:
§ Archive billions of records in multiple formats
§ Query/report on archives at runtime
§ Need continuous availability mandated for regulatory compliance
§ Support 700 sites in 70 different countries
Craigslist Environment
• 5 billion documents
• Avg size: 2 KB
• 3 replica sets / 3 servers each
• 2 data centers
• Sharding key – Posting ID
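The environment figures allow a rough back-of-envelope storage estimate. The calculation below is a sketch under stated assumptions: it takes 1 KB as 1,000 bytes, assumes each of the three replica sets acts as one shard with an even share of the documents, assumes every server in a replica set holds a full copy of that share, and ignores indexes, padding and compression. None of these assumptions come from the slide.

```python
DOCUMENTS = 5_000_000_000      # 5 billion documents (from the slide)
AVG_DOC_BYTES = 2 * 1000       # 2 KB average size, taking 1 KB = 1000 bytes
REPLICA_SETS = 3               # assumed to act as three shards
SERVERS_PER_SET = 3            # each assumed to hold a full copy of its shard

TB = 1000 ** 4

raw_bytes = DOCUMENTS * AVG_DOC_BYTES          # one logical copy of the data
per_server_bytes = raw_bytes // REPLICA_SETS   # share held by each server
total_bytes = raw_bytes * SERVERS_PER_SET      # all copies across the cluster

print(f"one logical copy: {raw_bytes / TB:.1f} TB")
print(f"per server:       {per_server_bytes / TB:.2f} TB")
print(f"all replicas:     {total_bytes / TB:.1f} TB")
```

Under these assumptions the archive is about 10 TB for one logical copy, a little over 3 TB per server, and about 30 TB once replication is counted.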
![Page 46: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/46.jpg)
Closing
![Page 47: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/47.jpg)
CA Big Data Control Center – Vision
End-to-end management of Big Data environments for the application economy
• Bring efficiency to root-cause analysis at all levels of the Big Data solution stack
• Simplify management by abstracting the complexities of underlying Big Data technologies
• Holistically meet the needs of DevOps by managing the lifecycle of applications, data and services
Personas (primary and secondary): LOB / biz analysts; app dev. / data sci.; data eng. / data admin; IT ops / IT mgmt.; Big Data / sys admin
[Diagram: CA Big Data Control Center spans applications, data, services, data sources, IT solutions and Big Data technologies]
![Page 48: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/48.jpg)
Manage Big Data With A Unified View
• Job monitoring
• Heterogeneous system management
• Intelligent alert management
• Resource reporting
• Cluster / job / node management
![Page 49: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/49.jpg)
Unified View – Details
![Page 50: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/50.jpg)
Recommended Sessions
SESSION #: TITLE (DATE/TIME, LOCATION)
• MFT05S: Big Iron + Big Data = BIG DEAL! Unlock The Power of Your Mainframe Data (11/18/2015 at 2:00 pm, Mainframe Theater)
• MFX15S: Predicting When Your Applications Will Go Off the Rails! Managing DB2 Application Performance using Analytics (11/18/2015 at 4:30 pm, Breakers I)
• MFT15T: New Mainframe IT Analytics: Actionable Insight into Root Cause Analysis of Performance Issues (11/18/2015 at 3:45 pm, Mainframe Area Tech Talk)
• MFX06S: CA's Strategy and Vision for Mainframe Data Management and Analytics (11/18/2015 at 1:00 pm, Breakers I)
• MFT01S: The Big Data, Big Picture: Can You See It? (11/19/2015 at 3:45 pm, Mainframe Theater)
![Page 51: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/51.jpg)
Must See Demos
• See the Future of Big Data Management: CA Big Data Control Center. App Economy Area, Station: APPECN001
• Unleash the Power of Mainframe Data: vStorm Connect Data Streaming for Big Data. Mainframe Area, Station: MNFSE001
• Maximize Your Mainframe Database Value: CA IDMS / CA Datacom. Mainframe Area, Station: MNFSE002
• Performance Analytics for DB2: DB2 Analytics. Mainframe Area, Station: MNFSE004
![Page 52: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/52.jpg)
Follow On Conversations At…
• Smart Bar: DB2 Tools and Performance Analytics. Mainframe Area on Expo Floor
• Tech Talks: Five Steps to Powerful Database Experience. Mainframe Area on Expo Floor
![Page 53: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/53.jpg)
Influencing Our Roadmap
Winning with CA
Take the opportunity to influence our product development. Help ensure that what we deliver is what you need and want.
CA Communities Ideation
§ Submit your ideas on communities.ca.com
§ Vote & comment on ideas that are important to you
§ CA Product Management reviews ideas and updates status as they move through the lifecycle
§ “Currently Planned” idea status indicates inclusion in Agile Backlog or Product Roadmap
Agile Development
Customer Validation
§ Register to participate in:
– Live demos / end-of-sprint reviews
– Private, members-only online community
– Pre-release onsite testing and support (Beta)
– Upgrade support from SWAT team
§ How to register: https://validate.ca.com
![Page 54: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/54.jpg)
Agile Development Transformation
Driving Significant Business Value for our Customers!
Speed, Quality, Performance
• 251 unique customers participated in 56 product releases during a year
• UK customer Standard Life benefits from CA agile process
• 99.5% reduction in cost
• 98% reduction in month-end cycle time
• 45 products released against zero defect policy
• 20% decrease in support issues
![Page 55: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/55.jpg)
For Informational Purposes Only
Terms of this Presentation
© 2015 CA. All rights reserved. All trademarks referenced herein belong to their respective companies. The presentation provided at CA World 2015 is intended for information purposes only and does not form any type of warranty. Some of the specific slides with customer references relate to customer's specific use and experience of CA products and solutions so actual results may vary.
Certain information in this presentation may outline CA’s general product direction. This presentation shall not serve to (i) affect the rights and/or obligations of CA or its licensees under any existing or future license agreement or services agreement relating to any CA software product; or (ii) amend any product documentation or specifications for any CA software product. This presentation is based on current information and resource allocations as of November 18, 2015, and is subject to change or withdrawal by CA at any time without notice. The development, release and timing of any features or functionality described in this presentation remain at CA’s sole discretion.
Notwithstanding anything in this presentation to the contrary, upon the general availability of any future CA product release referenced in this presentation, CA may make such release available to new licensees in the form of a regularly scheduled major product release. Such release may be made available to licensees of the product who are active subscribers to CA maintenance and support, on a when and if-available basis. The information in this presentation is not deemed to be incorporated into any contract.
![Page 56: Big Data, Big Opportunity: A Primer for Understanding The Big Data Frontier](https://reader033.vdocuments.mx/reader033/viewer/2022052606/58f2c7341a28ab447f8b4593/html5/thumbnails/56.jpg)
Q&A