Transcript
  • BIG DATA ANALYTICS FOR REAL TIME SYSTEMS

    Kamalika Dutta, Manasi Jayapal

  • Overview

    Introduction

    Big Data Analytics

    Real Time Systems

    Challenges of Real Time Analytics

    Technologies

    Tools

    Use Cases

    Future Work and Conclusion

    2 Big Data Analytics for Real Time Systems

  • Where does Big Data come from?

    Courtesy: http://goo.gl/JWswfj

  • What makes it Big Data?

    Courtesy: Oracle

    VARIABILITY

  • Evolution of Big Data

    [Timeline figure. Recoverable labels: 1960s; 1967; Automatic Data Compression; 1997; Information Explosion; Our Literature Survey!]

  • Big Data Analytics

    Big data analytics is the process of examining large data sets to uncover hidden patterns, unknown correlations, market trends, customer preferences, and other useful business information.

    Predictive Analysis

    Text Analysis

    Data Mining

    Statistical Analysis

    Courtesy: smartdatacollective.com

  • Sample Systems

  • Analytics & 3 Vs

    Courtesy: watalon.com

  • Real Time Systems

    A real-time system processes information and produces a response within a specified time; missing that deadline risks severe consequences, sometimes including failure.

    Telecommunication Systems

    Anti-Lock Brakes in a Car

    Air Traffic Control System

    Weather Forecasting System

    Courtesy: yourdon.com
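
    The "response within a specified time" requirement can be illustrated with a minimal sketch. It only measures whether a handler met its deadline after the fact; it does not enforce the deadline the way a real-time operating system would. The controller functions and the deadline values are hypothetical.

```python
import time


def run_with_deadline(handler, payload, deadline_s):
    """Run handler(payload) and report whether it met the deadline.

    Returns (result, elapsed_seconds, met_deadline).
    """
    start = time.perf_counter()
    result = handler(payload)
    elapsed = time.perf_counter() - start
    return result, elapsed, elapsed <= deadline_s


def brake_controller(wheel_speed_kmh):
    # Hypothetical control rule: release brake pressure when the wheel locks.
    return "release" if wheel_speed_kmh == 0 else "hold"


def slow_controller(wheel_speed_kmh):
    time.sleep(0.05)  # simulate a computation that takes too long
    return brake_controller(wheel_speed_kmh)


if __name__ == "__main__":
    print(run_with_deadline(brake_controller, 0, deadline_s=0.5))
    print(run_with_deadline(slow_controller, 0, deadline_s=0.01))
```

    In this sketch the slow controller still returns the right answer, but too late, which is exactly the failure mode the definition describes.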

  • Real-Time Analytics of Big Data

    [Figure: "What is Happening?" Labels: Kilobytes/Sec, Megabytes/Sec, Gigabytes, Terabytes, Petabytes, Exabytes; Milliseconds, Seconds, Minutes, Hours; regions "Big Data" and "Real Time".]

    Courtesy: infochimps.com

  • Challenges of Real Time Analytics

    Expensive

    Complex Architecture, Batch Processing

    Semi-structured and Unstructured Data: New sources are unpredictable; relational databases cannot handle them, leaving us hamstrung.

    Market too Dynamic to Predict: Subscribers' preferences change; competition accelerates the change.

    Scalability: Requires sub-second response times at a load greater than a single server can handle.

  • Thinking Beyond Hadoop!

    Manage and store huge volumes of any data: Hadoop File System, MapReduce

    Manage streaming data: Stream Computing

    Analyze unstructured data: Text Analytics Engine

    Structure and control data: Data Warehousing

    Integrate and govern all data sources: Integration, Data Quality, Security, Lifecycle Management, MDM

    Understand and navigate federated big data sources: Federated Discovery and Navigation

    Courtesy: IBM

  • Our Solution

    Do the impossible: Incorporate any kind of data

    Scale Big: Scale without any complexity

    Not Time Consuming: Seconds to minutes

    Real Time: Try to analyze data without expensive data warehouse loads

    Powerful Analytics, In Place, In Real Time.

    Courtesy: slideshare.com

  • In-Memory Computing

    In-memory computing primarily relies on keeping data in a server's RAM as a means of processing at faster speeds. It uses a type of middleware software that allows one to store data in RAM, across a cluster of computers, and process it in parallel.

    Courtesy: Stratecast
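
    A minimal sketch of the idea, assuming a single process in which plain Python lists stand in for the RAM of each cluster node and threads stand in for the nodes themselves. The function names and the choice of hash partitioning are illustrative assumptions, not the API of any in-memory data grid.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, num_nodes):
    """Hash-partition (key, value) records into per-node in-memory lists."""
    nodes = [[] for _ in range(num_nodes)]
    for key, value in records:
        nodes[hash(key) % num_nodes].append((key, value))
    return nodes


def local_sum(node_records):
    """Aggregate one node's records entirely in memory."""
    totals = Counter()
    for key, value in node_records:
        totals[key] += value
    return totals


def cluster_sum(records, num_nodes=4):
    """Sum values per key by aggregating each partition, then merging."""
    nodes = partition(records, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        partials = list(pool.map(local_sum, nodes))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds counts together
    return dict(merged)


if __name__ == "__main__":
    print(cluster_sum([("a", 1), ("b", 2), ("a", 3)]))
```

    Because every record for a given key lands on the same partition, each node can aggregate independently and the merge step is a simple sum.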

  • Stream Processing

    Courtesy: EMC

    Stream-processing systems operate on continuous data streams, e.g. click streams on web pages, user request/query streams, monitoring events, and notifications.

    Stream processing delivers real-time analytic processing on constantly changing data in motion.

    Analyse first, store later!
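
    The "analyse first, store later" ordering can be sketched with a generator. The click events and field names below are made up for illustration, and a real stream would be unbounded rather than a three-item list.

```python
from collections import Counter


def click_stream():
    """Hypothetical click events; a real stream would be unbounded."""
    yield from [
        {"user": "u1", "page": "/home"},
        {"user": "u2", "page": "/cart"},
        {"user": "u1", "page": "/cart"},
    ]


def process(stream, store):
    """Update the running analysis for each event, then store the raw event."""
    page_views = Counter()
    for event in stream:
        page_views[event["page"]] += 1  # analyse first
        store.append(event)             # store later
        yield dict(page_views)          # a fresh result is available per event


if __name__ == "__main__":
    archive = []
    for snapshot in process(click_stream(), archive):
        print(snapshot)
```

    The analysis result is available as soon as each event arrives, without waiting for the data to be loaded into storage and queried back.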

  • Complex Event Processing

    Complex Event Processing (CEP) processes multiple event streams generated within the enterprise to construct data abstractions and identify meaningful patterns among those streams.

    Analytics across both real-time and historical data.

    Real-time event capture, filtering, pattern detection, matching, and aggregation.
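
    As a rough sketch of pattern detection across more than one event stream, the example below merges a login stream and a transfer stream and flags a transfer that follows repeated failed logins by the same user. The event layout, the 60-second window, and the threshold of three failures are all assumptions chosen for illustration; a CEP engine would express such a rule in its own query language.

```python
import heapq
from collections import defaultdict, deque

WINDOW_S = 60      # assumed pattern window in seconds
MIN_FAILURES = 3   # assumed threshold


def detect(login_events, transfer_events):
    """Yield (user, timestamp) when a transfer follows repeated failed logins.

    Each event is a (timestamp, user, kind) tuple, and both inputs are
    sorted by timestamp.
    """
    recent_failures = defaultdict(deque)
    merged = heapq.merge(login_events, transfer_events)  # one time-ordered stream
    for ts, user, kind in merged:
        failures = recent_failures[user]
        # Filtering: drop failures that have fallen out of the window.
        while failures and ts - failures[0] > WINDOW_S:
            failures.popleft()
        if kind == "login_failed":
            failures.append(ts)
        elif kind == "transfer" and len(failures) >= MIN_FAILURES:
            yield user, ts  # pattern matched across both streams


if __name__ == "__main__":
    logins = [(0, "alice", "login_failed"), (10, "alice", "login_failed"),
              (20, "alice", "login_failed"), (25, "bob", "login_failed")]
    transfers = [(30, "alice", "transfer"), (40, "bob", "transfer"),
                 (200, "alice", "transfer")]
    print(list(detect(logins, transfers)))
```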

  • Tools for Real Time Analytics

    Big Data is NOT new, the Tools ARE!

    IBM InfoSphere Streams

  • Kafka

    A high-performance distributed publish-subscribe messaging system.

    Designed for processing real-time activity stream data.

    Initially developed at LinkedIn, now part of Apache.

    Kafka works in combination with Apache Storm, Apache HBase and Apache Spark for real-time analysis and rendering of streaming data.

    Fast

    Scalable

    Durable

    Fault-tolerant
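
    The publish-subscribe model can be sketched with a toy in-memory broker. This is not the Kafka client API: `ToyBroker`, `publish`, and `poll` are hypothetical names, and the sketch ignores partitions, replication, and persistence to disk. It only shows the idea of an append-only log per topic that several consumer groups read at their own pace.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a publish-subscribe log; not the Kafka API."""

    def __init__(self):
        self.logs = defaultdict(list)    # topic -> append-only message log
        self.offsets = defaultdict(int)  # (topic, group) -> next offset to read

    def publish(self, topic, message):
        self.logs[topic].append(message)
        return len(self.logs[topic]) - 1  # offset of the new message

    def poll(self, topic, group, max_messages=10):
        """Return unread messages for this consumer group and advance its offset."""
        start = self.offsets[(topic, group)]
        batch = self.logs[topic][start:start + max_messages]
        self.offsets[(topic, group)] = start + len(batch)
        return batch


if __name__ == "__main__":
    broker = ToyBroker()
    broker.publish("clicks", "a")
    broker.publish("clicks", "b")
    print(broker.poll("clicks", "dashboard"))
    print(broker.poll("clicks", "archiver", max_messages=1))
```

    Messages stay in the log after they are read, so a second consumer group can still receive everything from the beginning.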

  • Storm

    A distributed real-time computation system.

    Created at BackType, which was acquired by Twitter; Twitter later open-sourced it.

    Twitter claims over a million tuples processed per second per node.

    Fast, Scalable, Reliable and Fault-tolerant.

    Stream: Unbounded sequence of tuples

    Primitives

    Spouts: Pull messages

    Bolts: Perform core functions of stream computing
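
    The spout and bolt primitives can be imitated with Python generators. This is a single-process sketch of the concepts only, not the Storm API; the word-count topology and all function names are illustrative assumptions.

```python
from collections import Counter


def sentence_spout():
    """Spout: source of the stream. A real spout would never run dry."""
    for sentence in ["big data", "real time data", "big streams"]:
        yield (sentence,)


def split_bolt(stream):
    """Bolt: consume sentence tuples and emit one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)


def count_bolt(stream):
    """Bolt: keep a running count and emit (word, count) tuples."""
    counts = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


def run_topology():
    """Wire spout -> split bolt -> count bolt and collect the emitted tuples."""
    return list(count_bolt(split_bolt(sentence_spout())))


if __name__ == "__main__":
    for emitted in run_topology():
        print(emitted)
```

    Each tuple flows through the whole chain as soon as the spout emits it, which mirrors the message-at-a-time processing model described for Storm.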

  • Spark Streaming

    Developed in the AMPLab at UC Berkeley.

    In-memory computing capabilities deliver speed.

    Low latency

    High throughput

    Fault tolerant

    New programming model: Discretized streams (DStreams), Resilient Distributed Datasets (RDDs)

    Spark Streaming uses micro-batching to support continuous stream processing. It is an extension of Spark, which is a batch-processing system.

    Courtesy: Apache Spark
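
    Micro-batching can be sketched without Spark: chop a time-ordered stream into fixed-length intervals and run an ordinary batch function on each slice. The function names and the event layout are assumptions for illustration and do not correspond to the Spark Streaming API.

```python
from itertools import groupby


def micro_batches(events, batch_interval_s):
    """Group (timestamp, value) events into consecutive fixed-length batches.

    Events must be ordered by timestamp. Yields (batch_start, [values]).
    Intervals with no events produce no batch in this sketch.
    """
    def batch_index(event):
        return int(event[0] // batch_interval_s)

    for index, group in groupby(events, key=batch_index):
        yield index * batch_interval_s, [value for _, value in group]


def process_stream(events, batch_interval_s, batch_fn):
    """Apply an ordinary batch function to every micro-batch."""
    return [(start, batch_fn(values))
            for start, values in micro_batches(events, batch_interval_s)]


if __name__ == "__main__":
    events = [(0.1, 5), (0.7, 3), (1.2, 4), (3.5, 1)]
    print(process_stream(events, 1, sum))
```

    The batch interval sets a floor on latency: a result for an event is only available once its batch closes, which is the trade-off against message-at-a-time systems.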

  • Spring XD (XD=eXtreme Data)

    Spring XD is a unified, distributed, and extensible system for data ingestion, real-time analytics, batch processing, and data export.

    The Spring XD framework supports streams for the ingestion of event-driven data from a source to a sink, passing through any number of processors.

    Courtesy: Infoq
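
    The source, processor, and sink arrangement can be sketched as plain function composition. This is not Spring XD code (Spring XD is Java-based and has its own stream definition language); `run_stream` and the example modules are hypothetical names used only to show the data flow.

```python
def run_stream(source, processors, sink):
    """Push every item from source through each processor in order, then into sink.

    A processor may drop an item by returning None.
    """
    for item in source:
        for processor in processors:
            item = processor(item)
            if item is None:
                break
        else:
            # Reached only when no processor dropped the item.
            sink(item)


def parse_int(text):
    """Processor: convert text to int, dropping lines that are not numbers."""
    try:
        return int(text)
    except ValueError:
        return None


def square(number):
    """Processor: a stand-in for any transformation step."""
    return number * number


if __name__ == "__main__":
    received = []
    run_stream(["42", "oops", "7"], [parse_int, square], received.append)
    print(received)
```

    Any number of processors, including none, can sit between the source and the sink without changing either end.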

  • Comparison of Tools (1)

    Definition
      Spark Streaming: A fast and general-purpose cluster computing system.
      Apache Storm: A distributed real-time computation system.
      Spring XD: A unified, distributed, and extensible system for data ingestion, real-time analytics, batch processing, and data export.

    Implemented in
      Spark Streaming: Scala
      Apache Storm: Clojure
      Spring XD: Java

    Programming API
      Spark Streaming: Scala, Java, Python
      Apache Storm: Java API, and usable with any programming language.
      Spring XD: Java

    Development
      Spark Streaming: A full top-level Apache project.
      Apache Storm: Undergoing Apache project.
      Spring XD: Spring project by Pivotal.

    Processing Model
      Spark Streaming: Batch processing framework that also does micro-batching.
      Apache Storm: Stream processing framework that processes and dispatches messages as soon as they arrive.
      Spring XD: Unified platform for stream processing.

    Fault Tolerance
      Spark Streaming: Recovery of lost work and restart of workers via the resource manager.
      Apache Storm: Restart of workers and supervisors like nothing ever happened.
      Spring XD: Reassignment of work to a working container.

  • Comparison of Tools (2)

    Data processing
      Spark Streaming: Messages are not lost and are delivered once. (Small-scale batching)
      Apache Storm: Keeps track of each and every record.
      Spring XD: Unacknowledged messages are retried until the container comes back.

    Use Cases
      Combines batch and stream processing (Lambda Architecture).
      Machine Learning: Improve performance of iterative algorithms.
      Power real-time dashboards.
      Prevention of: securities fraud, compliance violations, security breaches, network outage.
      Stream tweets to Hadoop for sentiment analysis.
      High-throughput distributed data ingestion into HDFS from a variety of input sources.
      Real-time analytics at ingestion time, e.g. gathering metrics and counting values.
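
    The "unacknowledged messages are retried" behaviour in the data processing row can be sketched as a small redelivery loop. The `deliver` function, the acknowledgement convention (handler returns True), and the attempt limit are assumptions for illustration and are not taken from any of the three tools.

```python
from collections import deque


def deliver(messages, handler, max_attempts=5):
    """Redeliver each message until the handler acknowledges it.

    handler(message) returns True to acknowledge. Returns the messages that
    were never acknowledged within max_attempts.
    """
    pending = deque((message, 1) for message in messages)
    dead = []
    while pending:
        message, attempt = pending.popleft()
        if handler(message):
            continue                                # acknowledged
        if attempt < max_attempts:
            pending.append((message, attempt + 1))  # retry later
        else:
            dead.append(message)                    # give up on this message
    return dead


if __name__ == "__main__":
    seen = []
    failed_once = set()

    def flaky(message):
        """Hypothetical handler that rejects "b" the first time it sees it."""
        seen.append(message)
        if message == "b" and message not in failed_once:
            failed_once.add(message)
            return False
        return True

    print(deliver(["a", "b", "c"], flaky), seen)
```

    Retrying gives at-least-once delivery: the handler may see the same message more than once, so delivering exactly once additionally needs idempotent handling or de-duplication.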

  • Which tools are right for you?

  • Lambda Architecture

    In 2013, Nathan Marz and James Warren proposed the Lambda Architecture, a methodology for building Big Data systems.

    Such a system balances latency, throughput, and fault tolerance by using batch processing to provide comprehensive and accurate pre-computed views, while simultaneously using real-time stream processing to provide dynamic views.

    Marz, Nathan, and James Warren. Big Data: Principles and best practices of scalable real-time data systems. Manning Publications, 2015.

    Courtesy: Trivadis
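
    A toy sketch of the three layers, assuming a page-view counter as the workload. The class and method names are hypothetical, and the batch recomputation here is synchronous, whereas in a real deployment it runs continuously in the background and the hand-off between views needs more care.

```python
from collections import Counter


class LambdaCounter:
    """Toy page-view counter with a batch layer, speed layer, and serving query."""

    def __init__(self):
        self.master = []                # append-only master dataset
        self.batch_view = Counter()     # precomputed from the master dataset
        self.realtime_view = Counter()  # events seen since the last batch run

    def ingest(self, page):
        self.master.append(page)        # input to the batch layer
        self.realtime_view[page] += 1   # speed layer updates immediately

    def run_batch(self):
        """Recompute the batch view from scratch and reset the real-time view."""
        self.batch_view = Counter(self.master)
        self.realtime_view.clear()

    def query(self, page):
        """Serving layer: merge the batch and real-time views."""
        return self.batch_view[page] + self.realtime_view[page]


if __name__ == "__main__":
    views = LambdaCounter()
    views.ingest("/home")
    views.ingest("/home")
    views.run_batch()
    views.ingest("/home")
    print(views.query("/home"))
```

    Queries stay accurate between batch runs because the real-time view covers exactly the events the batch view has not yet absorbed.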

  • Lambda Architecture Example

    Marz, Nathan, and James Warren. Big Data: Principles and best practices of scalable real-time data systems. Manning Publications, 2015.

    Courtesy: Trivadis

  • Use Cases

    Healthcare

    Capture and analyze real-time data from medical monitors, alerting hospital staff to potential health problems before patients manifest clinical signs of infection or other issues.

    Analyze privacy-protected streams of medical device data to detect early signs of disease and identify correlations among multiple patients.

    Finance

    Analyze ticks, tweets, satellite imagery, weather trends, and any other type of data to inform trading algorithms in real time.

    Apply fraud insights to take action in real time. Use analytics on streaming data to confidently identify legitimate actions, while preventing or interrupting suspicious actions and responding immediately to criminal patterns and activities.

  • Use Cases

    Government

    Identify social program fraud within seconds based on program history, citizen profile, and geospatial data.

    Identify items or patterns for deeper investigation in cybersecurity.

    Transport

    Traffic managers can now respond quickly and accurately to relevant insights from real-time analytics drawn from data feeds and reports.

    Telematics can provide data-in-motion such as vehicle speed, data relating to the transmission control system, braking, air bags, tire pressure and wiper speed as well as geospatial and current environmental conditions data. Hence, automotive companies can strengthen customer relationships

  • Use Cases

    Telecommunication

    Improve customer profitability analysis, end-to-end visibility for new product rollouts, and real-time analysis to better serve network customers.

    Perform capacity planning for mobile networks as new high-bandwidth services are introduced. Improve customer experience.

    Retail

    See a product recurring in abandoned shopping carts. Run a promotion to close more sales of that product.

    Evaluate sales performance in real time. Take measures now to achieve sales quotas.

    An electronic coupon delivery service sends e-mails to customers with recommendations matched to their interests, derived from their location information, membership information, and information on nearby stores.

  • 37 Big Data Analytics for Real Time Systems

    Courtesy: SAP

  • Future Work

    Increased Level of Merging

    Application of Social and Digital Media

    New Technologies

    Further Development of Telemetric Data

    Self Learning Systems

    Complex Statistical Methods

  • Conclusion

    Resources

    Privacy

    Security

    Time

    Cost

    "Consumer data will be the biggest differentiator in the next two to three years. Whoever unlocks the reams of data and uses it strategically will win."

    - Angela Ahrendts, CEO, Burberry

    ?
