big data analytics solution for real time
BIG DATA ANALYTICS FOR REAL-TIME SYSTEMS
Where does big data come from?
Big Data is often boiled down to three main varieties:
• Transactional data: this includes data from invoices, payment orders, storage records, and delivery records.
• Machine data: this can be data gathered from industrial equipment (for example, the latest generation of aircraft produces several terabytes of data on a single transatlantic flight), real-time data from sensors (including sensors on your smartphone or your heart rate monitor, not to mention the 4 million CCTV cameras around the UK), and web logs that track user behavior online.
• Social data: this could be data coming from social media services, such as Facebook Likes, Tweets and YouTube views.
In many cases, this data on its own is meaningless. Real business value often comes from combining these Big Data ‘feeds’ with ‘traditional’ (relational) data such as customer records, sales location data, and revenue figures to generate new insights, decisions and actions.
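As a rough illustration of combining a Big Data feed with relational data, the sketch below joins a small list of web-log events to a customer table and counts page views per sales region. The field names, the sample records and the `views_by_region` helper are all hypothetical; a production system would run the same kind of join inside a database or a distributed engine.

```python
from collections import defaultdict

# Hypothetical relational data: customer records keyed by customer ID.
customers = {
    "c1": {"name": "Alice", "region": "North"},
    "c2": {"name": "Bob", "region": "South"},
}

# Hypothetical machine data: web-log events that carry only a customer ID.
web_events = [
    {"customer_id": "c1", "page": "/pricing"},
    {"customer_id": "c2", "page": "/home"},
    {"customer_id": "c1", "page": "/checkout"},
    {"customer_id": "c9", "page": "/home"},  # no matching customer record
]


def views_by_region(events, customer_table):
    """Join each event to its customer record and count page views per region."""
    counts = defaultdict(int)
    for event in events:
        record = customer_table.get(event["customer_id"])
        region = record["region"] if record else "unknown"
        counts[region] += 1
    return dict(counts)


region_counts = views_by_region(web_events, customers)
print(region_counts)
```

On its own, the web log only says which pages were viewed; after the join it can answer a business question such as which region generates the most traffic.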
What makes it big data?
Big data refers to extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions.
Evolution of Big Data
Big Data Analytics
Big data analytics is the process of examining large data sets to uncover hidden patterns, unknown correlations, market trends, customer preferences and other useful business information.
Various Kinds of Analytics
Predictive Analytics
Predictive analytics is the branch of advanced analytics that is used to make predictions about unknown future events. It draws on techniques from data mining, statistics, modeling, machine learning, and artificial intelligence to analyze current data and make predictions about the future.
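One of the simplest predictive techniques is a least-squares trend line. The sketch below fits a line to five months of made-up sales figures and extrapolates to month six. The data and the `fit_line` helper are illustrative assumptions; real predictive models are usually far richer than a straight line.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical data: sales for months 1 to 5.
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

slope, intercept = fit_line(months, sales)

# Use the fitted line to predict the not-yet-observed month 6.
forecast = slope * 6 + intercept
print(forecast)
```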
Real Time Analytics
A real-time system is one that processes information and produces a response within a specified time; if it does not, it risks severe consequences, sometimes including failure.
Real-time big data analytics, or real-time business intelligence (RTBI), is the process of delivering information about business operations as they occur. Real time means near-zero latency and access to information whenever it is required.
Real-time Processing Systems
In this context, real time means a delay ranging from a few milliseconds to a few seconds after the business event has occurred. While traditional business intelligence presents historical data for manual analysis, real-time business intelligence compares current business events with historical patterns to detect problems or opportunities automatically. This automated analysis capability enables corrective actions to be initiated and/or business rules to be adjusted to optimize business processes.
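A minimal sketch of comparing current events with a historical pattern is shown below: it summarizes past readings as a mean and standard deviation, then flags any new value that lies more than a chosen number of standard deviations from that baseline. The readings, the threshold of 3 and the `make_detector` helper are assumptions made for illustration, not part of any particular RTBI product.

```python
from statistics import mean, stdev


def make_detector(history, threshold=3.0):
    """Build a check that flags values far from the historical baseline."""
    baseline = mean(history)
    spread = stdev(history)  # needs at least two historical points

    def is_anomaly(value):
        if spread == 0:
            # Perfectly flat history: any deviation is unusual.
            return value != baseline
        return abs(value - baseline) / spread > threshold

    return is_anomaly


# Hypothetical historical pattern, e.g. orders per minute on a normal day.
history = [100, 102, 98, 101, 99, 100, 103, 97]
is_anomaly = make_detector(history)

# Hypothetical current events arriving one at a time.
current = [101, 99, 140, 100]
alerts = [value for value in current if is_anomaly(value)]
print(alerts)
```

In a live system the check would run as each event arrives, and an alert could trigger the corrective action or rule adjustment described above.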
Tools For Real Time Analytics
1. Apache Spark
2. Apache Storm
3. Apache Kafka
Apache Spark
Apache® Spark™ is a powerful open source processing engine built around speed, ease of use, and sophisticated analytics. It was originally developed at UC Berkeley in 2009.
Benefits
• Speed
• Ease of Use
• A Unified Engine
Engineered from the bottom up for performance, Spark can be up to 100x faster than Hadoop MapReduce for large-scale data processing by exploiting in-memory computing and other optimizations. Spark is also fast when data is stored on disk, and in 2014 it set a world record for large-scale on-disk sorting.
Spark has easy-to-use APIs for operating on large datasets. This includes a collection of over 100 operators for transforming data and familiar data frame APIs for manipulating semi-structured data.
Spark comes packaged with higher-level libraries, including support for SQL queries, streaming data, machine learning and graph processing. These standard libraries increase developer productivity and can be seamlessly combined to create complex workflows.
StreamAnalytix Solution with Apache Spark
Impetus Technologies has announced StreamAnalytix™ 2.0, which adds support for Apache Spark Streaming to its existing support for Apache Storm. The platform will provide enterprises with the advantages of the industry's first open-source based, enterprise-grade, multi-engine platform for rapid and easy development of real-time streaming analytics applications.
Among stream processing engines, Spark Streaming is gaining popularity, while Apache Storm has been in production deployments for many years and is a robust, proven, widely used option. StreamAnalytix 2.0 builds on its existing visual integrated development and application-monitoring environment to provide abstraction over multiple streaming engines. It can also accommodate newer engines as they gain market acceptance. This approach allows developers and data analysts to use drag-and-drop operators to create real-time analytics applications by choosing the most suitable engine for each use case.
StreamAnalytix 2.0 builds upon the successful adoption of version 1.0, which is used by leading Fortune 1000 companies that are taking advantage of streaming data for improved business outcomes. In addition to support for Spark Streaming, this release includes a number of important functional enhancements:
• Spark Streaming
• Rich array of drag-and-drop Spark data transformations.
• Support for Spark SQL and MLlib operations.
• Platform Enhancements
• Ability to interconnect subsystems, which individually use different streaming engines.
• Embedded complex event processing engine enhanced for high-availability support.
• Built-in operators for predictive models including inline model-test feature.
• Additional support for industry standard message queue systems, including Amazon Kinesis and Simple Storage Service (S3), Apache ActiveMQ, IBM MQ and TIBCO.
• Enhanced self-service, real-time dashboarding with editable widgets for various chart types.
• Multi-tenancy controls with the ability to restrict resources for specific tenants and pipelines.
• Ability to create multiple versions of real-time pipelines and choose the active version.
• Rich array of real-time data processing functions for string, time, date, numeric and other data types.
• Code-free enrichment and blending of streaming data with static data with lookups and MVEL expressions.
• Extensibility of stream-processing operators and libraries with user-defined functions.
Apache Storm
Apache Storm is a free and open source distributed realtime computation system. Storm makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. Storm is simple, can be used with any programming language, and is a lot of fun to use!
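To give a feel for what processing an unbounded stream means, here is a small sketch that uses plain Python generators. It is not the Storm API and it runs on one machine; it only mimics the shape of a streaming pipeline, where a source emits records forever and each downstream stage handles them one at a time. The function names and sample sentences are hypothetical.

```python
from collections import Counter
from itertools import islice


def sentence_source(sentences):
    """Yield sentences endlessly, standing in for an unbounded stream."""
    while True:
        for sentence in sentences:
            yield sentence


def split_words(stream):
    """Turn each incoming sentence into a stream of lowercase words."""
    for sentence in stream:
        for word in sentence.lower().split():
            yield word


def running_counts(stream):
    """Emit (word, count so far) for every word as soon as it arrives."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


source = sentence_source(["storm processes streams", "streams never end"])
pipeline = running_counts(split_words(source))

# The stream never ends, so take only the first six results to look at.
first_six = list(islice(pipeline, 6))
print(first_six)
```

Unlike a batch job, nothing here waits for the input to finish: every stage produces output as soon as a record arrives, which is the property a distributed engine such as Storm provides at cluster scale.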
StreamAnalytix Solution with Apache Storm
Ease of Development
A powerful visual designer interface makes it extremely easy to build applications quickly using built-in operators.
Abstraction over Complex Technologies
Lets you focus on your business logic rather than worrying about the underlying infrastructure.
Apache Kafka
Apache Kafka is publish-subscribe messaging rethought as a distributed commit log. A single Kafka broker can handle hundreds of megabytes of reads and writes per second from thousands of clients. Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of co-ordinated consumers. Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.
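The idea of a partitioned commit log can be sketched in a few lines of Python. The toy class below is an in-memory model, not the Kafka client API: it appends each message to one of several partitions chosen by hashing the message key, and lets a reader replay a partition from any offset. The class name, the use of CRC32 for partitioning and the sample messages are all assumptions made for illustration.

```python
import zlib


class PartitionedLog:
    """Toy in-memory model of an append-only log split into partitions."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        """Append value to the partition chosen by the key; return (partition, offset)."""
        # Assumption: CRC32 stands in for whatever hash a real system uses.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[index].append(value)
        return index, len(self.partitions[index]) - 1

    def read(self, partition, offset):
        """Return every message in a partition from the given offset onward."""
        return self.partitions[partition][offset:]


log = PartitionedLog(num_partitions=3)

# Messages with the same key land in the same partition, in order.
p1, o1 = log.append("sensor-7", "temp=20")
p2, o2 = log.append("sensor-7", "temp=21")
log.append("sensor-8", "temp=19")

# A consumer can re-read a partition from any offset it remembers.
replayed = log.read(p1, o1)
print(p1, replayed)
```

Because messages are only ever appended and readers keep their own offsets, several consumers can read the same partition independently, and different partitions can live on different machines.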
Contact
http://streamanalytix.com
720 University Avenue, Suite 130
Los Gatos, CA 95032
4082133310