2012 10 bigdata_overview

Upload: jdijcks

Post on 13-Jan-2015

Category:

Technology


TRANSCRIPT

  • 1. Big Data (Jean-Pierre Dijcks)

  • 2. Agenda
      • Big Data
      • Strategy
      • Technology
      • Use Cases

  • 3. Big Data
      Copyright 2011, Oracle and/or its affiliates. All rights reserved.

  • 4. Big Data
      • From "React to an Event" to "Pro-Actively Change Outcomes"
      • "Technology presents the opportunity to transform business."* (Mark Hurd, President, Oracle)
      * Oracle Profit Magazine, Volume 17, Number 1

  • 5. Big Data's Key Ingredient
      • "Improvement merely lets you hit the numbers. Creativity is what transforms."* (Ron Johnson, CEO, JCPenney)
      • Chart:
          • "Big Data transforms our business": 5%
          • "Big Data improves our business": 20%
          • "What is Big Data?": 75%
      * Fortune Magazine, Vol. 165, No. 4

  • 6. Big Data Extends the Breadth and Speed of Data
      • Information architectures today: decisions based on database data
          • Transactions
      • Big Data: decisions based on all your data
          • Video and Images
          • Documents
          • Social Data
          • Machine-Generated Data

  • 7. Big Data Extends the Depth of Analytics
      • Query and Reporting
      • Statistics
      • Data Mining
      • Graph Analytics
      • Spatial Analytics
      • Text Analytics

  • 8. Big Data Defined
      • Big Data: Techniques and Technologies that Enable Enterprises to Effectively and Economically Analyze All of their Data

  • 9. Strategy

  • 10. Strategic Transformations
      • From Transactional Data to All Data
      • From Reporting to Analytics
      • From Rear-view Mirror to Autonomous Actions

  • 11. Oracle's Big Data Solution
      • Stages: Acquire, Organize & Discover, Analyze, Decide
      • Diagram labels: Oracle Big Data Appliance, Oracle Big Data Connectors, Oracle Exadata, Oracle Exalytics, Endeca Information Discovery, Oracle CEP, Real-Time Decisions; InfiniBand links between the engineered systems

  • 12. Oracle Big Data Strategy
      • Diagram labels: BI Tools, Data Discovery Tools, CEP, RTD, Data Management, Advanced Analytics, Semantic, Text, Graph, Spatial, Management, Infrastructure
      • Approach: Build, Acquire, Adopt, Engineer

  • 13. Technology

  • 14. Big Data Appliance
      • Hardware:
          • 288 CPU cores with 1152 GB RAM
          • 648 TB of raw disk storage
          • 40 Gb/s InfiniBand
      • Integrated Software:
          • Oracle Linux
          • Oracle Java VM
          • Cloudera Distribution of Apache Hadoop (CDH)
          • Cloudera Manager
          • Open-source distribution of R
          • NoSQL Database Community Edition
      • All integrated software (except NoSQL DB CE) is supported as part of Premier Support for Systems and Premier Support for Operating Systems

  • 15. Oracle Big Data Appliance
      • File System Mount: FUSE-DFS
      • UI Framework: HUE
      • SDK: HUE SDK
      • Workflow: Apache Oozie
      • Scheduling: Apache Oozie
      • Metadata: Apache Hive
      • Languages / Compilers: Apache Pig, Apache Hive, Apache Mahout
      • Data Integration: Apache Flume, Apache Sqoop
      • Fast Read/Write Access: Apache HBase
      • HDFS, MapReduce
      • Coordination: Apache ZooKeeper

  • 16. Why Cloudera?
      • Includes Open Source Apache Hadoop
      • Fast evolution in critical features
      • Proven at very large scale
      • Managed Distribution
      • Components certified to work together in regular updates
      • Cloudera Manager provides Management GUI
      • Most popular distribution in the market

  • 17. Oracle and Cloudera
      • All Cloudera software pre-installed and pre-configured on BDA
      • Engineered with Cloudera
      • All Cloudera assets included
      • Single Oracle Product SKU for HW & SW
      • Single Oracle Support SKU for HW & SW (life of the machine)
      • Oracle is the single point of contact for the solution

  • 18. Price comparison

      Oracle Big Data Appliance
                                        Year 1     Year 2    Year 3    Total
        BDA cost                        $450,000
        Support cost                    $54,000    $54,000   $54,000
        On-site installation            $14,150
        Total                           $518,150   $54,000   $54,000   $626,150

      Build-Your-Own (HP hardware and Cloudera)
                                        Year 1     Year 2    Year 3    Total
        Servers and switches            $428,220
        Support cost                    $136,233   $72,000   $72,000
        Installation & configuration    not included
        Total                           $564,453   $72,000   $72,000   $708,453

      Full details at https://blogs.oracle.com/datawarehousing/entry/price_comparison_for_big_data

  • 19. Oracle NoSQL Database
      A distributed, scalable key-value database
      • Simple Data Model
          • Key-value pair with major+sub-key paradigm
          • Read/insert/update/delete operations
      • Scalability
          • Dynamic data partitioning and distribution
          • Optimized data access via intelligent driver
      • High availability
          • One or more replicas
          • Disaster recovery through location of replicas
          • Resilient to partition master failures
          • No single point of failure
      • Transparent load balancing
          • Reads from master or replicas
          • Driver is network topology & latency aware
      • Diagram labels: Application, NoSQLDB Driver, Storage Nodes (Data Center A, Data Center B)

  • 20. Big Data Connectors
      Optimized integration of Hadoop with Oracle Database and Oracle Exadata
      • Oracle Loader for Hadoop
      • Oracle Direct Connector for Hadoop Distributed File System (HDFS)
      • Oracle Data Integrator Application Adapter for Hadoop
      • Oracle R Connector for Hadoop
      • Does not require Big Data Appliance; can be licensed for Hadoop running on non-Oracle hardware

  • 21. Oracle Loader for Hadoop
      Use The Cluster
      • Last stage in MapReduce workflow
      • Partitioned and non-partitioned tables
      • Online and offline loads
      • Diagram labels: MAP, SHUFFLE/SORT, REDUCE, Oracle Loader for Hadoop

  • 22. Oracle Direct Connector for HDFS
      Direct Access from Oracle Database
      • SQL access to HDFS
      • External table view
      • Data query or import
      • Diagram labels: Oracle Database, SQL Query, External Table, DCH, HDFS Client, InfiniBand, HDFS

  • 23. Oracle Data Integrator
      Simplifying MapReduce
      • Automatically generates MapReduce code
      • Manages the process
      • Loads into Data Warehouse
      • Diagram labels: Oracle Data Integrator, Oracle Loader for Hadoop

  • 24. What is Data Discovery? Simplified
      • Quickly explore all relevant data
      • Structured, semi-structured, unstructured
      • Messy data
      • Beyond the data warehouse
      • Relationships undefined or unknown
      • No pre-defined model required
      • Rapid, iterative change
      • Advanced search, faceted navigation, analytics

  • 25. Business Intelligence and Data Discovery
      Complementary Solutions, Integrated Business Processes
      • Business Intelligence: Proven Answers to Known Questions
          • Known & Clearly Defined Questions: Who, What, When?
          • Modeled Data: Conforms to a Single Model
      • Data Discovery: Fast Answers to New Questions
          • Uncertain or Open-Ended Questions: Why, How, What Else?
          • Un-modeled Data: Diverse and Changing Models
      • Insights yield mature models and KPIs
      • New questions require new data, exploration

  • 26. Oracle Endeca Information Discovery
      A platform for data discovery applications across the enterprise
      • Endeca Information Discovery (EID) helps organizations quickly explore all relevant data
      • Combine structured & unstructured data from disparate systems
      • Rapidly assemble easy-to-use analysis applications
      • Automatically organize information for search, discovery & analysis

  • 27. Big Data: Why Deeper Analytics?
      • Communications: Enhanced churn prediction with social network analytics
          • Consider each customer's value as part of their social network
          • Focus retention campaigns on high-value social networks
          • Identify new prospective high-value customers
          • Target promotions for upselling and cross-selling to key social network influencers
          • Identify rotational churners and exclude from retention offers
      • Insurance: Automated deep analytics for fraud and abuse in insurance claims processing
          • Enhance fraud analytics by considering text data (assessors' reports, police reports, witness interviews) in addition to transaction data
          • Investigate claims that have the highest expected risk (based on likelihood of fraud and claim size)
          • Focus scarce investigative resources and create feedback loop for automated analysis
      • Retail: Identify and respond to shifts in behavior
          • Combine past and most recent point-of-sale data with customer information
          • Track and monitor shifts in individual customer behaviors and household purchases
          • Anticipate new up-sell and cross-sell opportunities
      2012 Oracle Corporation

  • 28. Deeper Analytics: Oracle Advanced Analytics
      • Oracle Advanced Analytics extends Oracle Database into a comprehensive analytical platform
      • Predictive analytics, data mining, text mining, statistical analysis, advanced numerical computations
      • Scalable and parallel: analyze huge volumes of data
      • Tightly integrated with SQL: share results of analytics throughout enterprise
      • Built for data analysts

  • 29. Oracle Advanced Analytics: Data Mining
      • 12 cutting-edge machine-learning algorithms
      • Parallel model creation
      • Data transformation and preparation for data mining
      • Scalable model creation
      • Efficient scoring for large volumes
      • Data Miner GUI to build and evaluate data mining models
      • Data Mining can provide valuable results:
          • Predict customer behavior (Classification)
          • Predict or estimate a value (Regression)
          • Segment a population (Clustering)
          • Identify factors more associated with a business problem (Attribute Importance)
          • Find profiles of targeted people or items (Decision Trees)
          • Determine important relationships and market baskets within the population (Associations)
          • Find fraudulent or rare events (Anomaly Detection)

  • 30. Oracle Advanced Analytics: Oracle R Enterprise
      Oracle R Enterprise brings R's statistical functionality closer to the Oracle Database
      1. Eliminate R's memory constraint by enabling R to work directly & transparently on database objects
          • Allows R to run on very large data sets
      2. Architected for Enterprise production infrastructure
          • Automatically exploits database parallelism without requiring parallel R programming
          • Build and immediately deploy
      3. Oracle R leverages the latest R algorithms and packages
          • R is an embedded component of the DBMS server

  • 31. Use Cases

  • 32. Big Data Architecture Pattern
      • Capture: Front End, Back End, Data Handlers
      • Acquire: Real-time & Batch Feeds; low value density data
      • Organize: Algorithms (Filter, Index, ETL, Classify, Correlate, Store); high value data
      • Analyze: Text Analytics, Statistics, Data Mining, Graph Analytics, Spatial Analytics
      • Integrate into Applications, Operational Systems, Real-time Event Detection
      • Storage labels: HDFS, NoSQL, Relational, Semantic, Spatial

  • 33. Big Data Examples
      • Insurance: Individualize auto-insurance policies based on newly captured vehicle telemetry data
          • Insurer gains insight into customers' driving habits, delivering:
          • More accurate assessments of risks
          • Individualized pricing based on actual individual customer driving habits
          • Guide and motivate individual customers to improve their driving habits
      • Travel: Optimize buying experience through web log and social media data analysis
          • Travel site gains insight into customer preferences and desires
          • Up-selling products by correlating current sales with (subsequent) browsing behavior
          • Increase browse-to-buy conversions via customized offers and packages
          • Deliver personalized travel recommendations based on social media data
      • Games: Collect gaming data to optimize spend within and across games
          • Games company gains insight into likes, dislikes and relationships of its users
          • Enhance games to drive customer spend within games
          • Recommend other content based on analysis of player connections and similar likes
          • Create special offers or packages based on browsing and (non-)buying behavior

  • 34. Big Data Use Case: Smart Mall
      • Customer enters mall area, based on Cell Phone location data
      • Customer Profile: Jane Doe, 32, Married, 2 kids (2 & 4 yrs); uses our coupons
      • Send Coupon: 20% of item when used in the next 15 minutes
      • Point of Sale Capture: coupon used; 3 items bought (up 1); increased spend (up $10)

  • 35. Big Data Technology Pattern
      • Diagram labels: Identify User, Deliver Coupon, Collection & Decision Points, Filter, Enrich, Oracle CEP, RTD, Big Data Appliance, Map Reduce, Big Data Connectors, Models, Scores, Analyze, Streaming, Social Feeds
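
The three-year totals in the price comparison on slide 18 can be re-derived from the individual line items. The sketch below only transcribes the dollar figures shown on that slide; the dictionary layout and the function name `yearly_totals` are illustrative choices, and the build-your-own installation cost is left out because the slide lists it as "not included".

```python
# Cost figures transcribed from slide 18, in US dollars, for years 1 to 3.
bda = {
    "bda_cost": [450_000, 0, 0],
    "support": [54_000, 54_000, 54_000],
    "on_site_installation": [14_150, 0, 0],
}

# Build-your-own column (HP hardware and Cloudera). Installation and
# configuration is "not included" on the slide, so it is omitted here.
build_your_own = {
    "servers_and_switches": [428_220, 0, 0],
    "support": [136_233, 72_000, 72_000],
}


def yearly_totals(costs):
    """Add up all cost categories for each year."""
    return [sum(year) for year in zip(*costs.values())]


bda_years = yearly_totals(bda)
byo_years = yearly_totals(build_your_own)
bda_total = sum(bda_years)
byo_total = sum(byo_years)
difference = byo_total - bda_total

print("BDA per year:", bda_years, "three-year total:", bda_total)
print("Build-your-own per year:", byo_years, "three-year total:", byo_total)
print("Difference over three years:", difference)
```

The computed totals match the "Total" row on the slide, which is a useful sanity check on the reconstructed table.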
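
Slide 19 describes the Oracle NoSQL Database data model as a key-value pair with a major+sub-key paradigm and read/insert/update/delete operations. The following is a toy in-memory sketch of that idea, not the Oracle NoSQL Database API: the class `ToyKeyValueStore` and all of its method names are hypothetical, and grouping records under their major key is an assumption made for illustration.

```python
from collections import defaultdict


class ToyKeyValueStore:
    """In-memory stand-in for a store addressed by (major key, sub-key)."""

    def __init__(self):
        # major key -> {sub-key -> value}; records that share a major key
        # are kept together so they can be fetched in one call.
        self._data = defaultdict(dict)

    def put(self, major, sub, value):
        """Insert a new record or update an existing one."""
        self._data[major][sub] = value

    def get(self, major, sub):
        """Read one record; returns None when it does not exist."""
        return self._data.get(major, {}).get(sub)

    def delete(self, major, sub):
        """Delete one record; returns True if something was removed."""
        records = self._data.get(major, {})
        if sub in records:
            del records[sub]
            return True
        return False

    def multi_get(self, major):
        """Read every record stored under one major key."""
        return dict(self._data.get(major, {}))


store = ToyKeyValueStore()
store.put("customer/jane", "profile", {"age": 32})
store.put("customer/jane", "coupons", ["20% item"])
store.put("customer/jane", "profile", {"age": 33})  # update in place
removed = store.delete("customer/jane", "coupons")
remaining = store.multi_get("customer/jane")
```

A real deployment would add the partitioning, replication and topology-aware driver listed on the slide; none of that is modeled here.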
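
The insurance example on slide 27 suggests investigating the claims with the highest expected risk, based on likelihood of fraud and claim size. One simple reading, assumed here and not stated on the slide, is to multiply the two and sort. The claim records below are made-up illustrative values, and the field names are hypothetical.

```python
# Made-up example claims; fraud_probability would come from a fraud model.
claims = [
    {"id": "C1", "fraud_probability": 0.02, "claim_size": 50_000},
    {"id": "C2", "fraud_probability": 0.30, "claim_size": 4_000},
    {"id": "C3", "fraud_probability": 0.10, "claim_size": 20_000},
]


def expected_risk(claim):
    """Assumed scoring rule: likelihood of fraud times claim size."""
    return claim["fraud_probability"] * claim["claim_size"]


# Highest expected risk first, so investigators start at the top of the list.
ranked = sorted(claims, key=expected_risk, reverse=True)
ranked_ids = [claim["id"] for claim in ranked]
print(ranked_ids)
```

Note that the largest claim is not ranked first here: a mid-sized claim with a higher fraud likelihood carries more expected risk under this rule.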