The current process of big data analytics involves a considerable human element in the form of data scientists and analysts, who are:
Difficult to find because of their unique skill set.
Expensive.
Prone to the errors common to any human, and bound by a limited but well-defined set of rules and algorithms that operate within a narrow scope of learning.
Can we reduce the involvement of data scientists and analysts by using artificially intelligent systems for big data processing?
An intelligent big data engine can…
Process and predict based on huge volumes of data.
Learn from the data.
Identify patterns and cause-and-effect relationships.
Utilize a combinatorial computational model to overcome the limitations of a human working on the same problem.
Facebook AI analysis system
Google’s Deep Learning
Big Data Analytics in the Medical Field
IBM Watson Labs
Facebook aims to:
Use AI to analyze profiles semantically from user activities.
A data scientist would limit the pace by deciding which pattern to apply.
The engine uses its computational power to find a pattern, learn that pattern, and apply the same pattern to other profiles.
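The find-learn-apply loop above can be sketched as a tiny nearest-centroid classifier. This is an illustrative assumption, not Facebook's actual system: all profile activities, labels, and function names below are invented for the example.

```python
# Hypothetical sketch: learn a pattern from one set of profiles,
# then apply the same pattern to other profiles.
from collections import Counter

def profile_vector(activities):
    """Represent a profile as a bag-of-words count over its activity terms."""
    return Counter(term for act in activities for term in act.lower().split())

def centroid(vectors):
    """Average several bag-of-words vectors into one learned 'pattern'."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}

def similarity(v, pattern):
    """Dot-product similarity between a profile and a learned pattern."""
    return sum(v[t] * w for t, w in pattern.items())

# Learn a pattern from profiles assumed to be labeled "sports fan".
labeled = [["watched football highlights", "shared match scores"],
           ["liked football page", "posted about match tickets"]]
pattern = centroid([profile_vector(p) for p in labeled])

# Apply the same pattern to an unseen profile.
new_profile = profile_vector(["commented on football match"])
print(similarity(new_profile, pattern) > 0)  # → True
```

The point of the sketch: the "pattern" is computed by the engine, not chosen by an analyst, and applying it to new profiles is mechanical.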
Image: www.theverge.com
16,000 computers, 10 million images from YouTube video frames, and three days to see a cat?
“We don’t understand how our deep-learning decision-making computer systems have made themselves so good at recognizing things in photos. This means that we may need fewer experts in future as it can instead rely on its semi-autonomous, semi-smart machines to solve problems all on their own.”
--Quoc V. Le, Google software engineer, machine learning conference, San Francisco.
Source: http://www.theregister.co.uk/2013/11/15/google_thinking_machines/
What does it mean for you?
NuPIC :: www.numenta.org
GROK :: www.numenta.com
Quill :: www.narrativescience.com
Yseop :: www.yseop.com
Image: www.groksolutions.com
Case study: monitoring cloud servers
• Stream data from Amazon CloudWatch.
• Build hundreds of models automatically and identify the best one.
• Get insights, take action.
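The monitoring loop above can be sketched in a few lines: watch a metric stream, model recent behavior, and surface the points worth acting on. This is a minimal illustration, not GROK's actual algorithm; the CPU values below simulate a CloudWatch metric stream (no AWS calls are made).

```python
# Minimal anomaly-detection sketch for a simulated metric stream.
from statistics import mean, stdev

def detect_anomalies(stream, window=10, threshold=3.0):
    """Flag values more than `threshold` standard deviations away from
    the rolling mean of the previous `window` samples."""
    history, anomalies = [], []
    for i, value in enumerate(stream):
        if len(history) >= window:
            recent = history[-window:]
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                anomalies.append((i, value))   # the "insight" to act on
        history.append(value)
    return anomalies

# Simulated CPU-utilization stream: steady load, then one spike.
cpu = [50, 51, 49, 50, 52, 48, 50, 51, 49, 50, 51, 95]
print(detect_anomalies(cpu))  # → [(11, 95)]
```

A production system would replace the simulated list with a live metric feed and try many such models, keeping whichever fits the stream best.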
Image: www.groksolutions.com
Image: Big Data: What Does It Really Cost? A WinterCorp Report
How much does your “conventional” Big Data solution cost?
$740 million to implement an Enterprise Data Warehouse on Hadoop over 5 years for 500 TB of data!
“$219 spent on Analysis”
Image: http://www.kdnuggets.com/2013/02/salary-analytics-data-mining-data-science-professionals.html
How much does your organization spend on Data Scientists?
200 TB = need for 50 Data Scientists
Average salary: $120,000 - $180,000 ≈ $150,000/annum
Total cost = 50 x $150,000 = $7,500,000 ($7.5 million)/annum
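The staffing arithmetic above can be checked back-of-the-envelope:

```python
# Reproduce the slide's cost estimate for 200 TB of data.
scientists = 50                         # data scientists needed at this scale
avg_salary = (120_000 + 180_000) // 2   # midpoint of the quoted salary range
annual_cost = scientists * avg_salary
print(avg_salary, annual_cost)          # → 150000 7500000
```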
Image: http://www.creditcards.com/credit-card-news/consumers-getting-smarter-about-credit_scores-1270.php
Need to Get Smarter, Faster…
Forbes survey of 211 senior marketers:
84% of agencies and non-agencies indicated it as critical for the success of their marketing campaigns.
Fast, automated systems collect and analyze data, which is critical for:
Maintaining data quality
Optimizing processes
Generating a good return on investment (ROI)
Image: http://www.wired.com/autopia/2009/03/fedex-gets-mad/
Leverage analytics at every level:
Analysis: report on route congestion
Predictive: how much time and cost to destination?
Prescriptive/Proactive: impact after taking a particular route?
Image: http://www.ngdata.com/wp-content/uploads/multi_target_prediction.pdf
• Branch of Artificial Intelligence
• Self-aware and self-learning system
• Solves complicated problems where multiple predictions are required
Image annotation retrieval scenario
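Multi-target prediction in the image-annotation scenario means one input yields several labels at once. A hedged toy sketch, using a 1-nearest-neighbor rule over invented feature vectors and tags (not a real annotation dataset or the cited paper's method):

```python
# One image feature vector maps jointly to SEVERAL tags (targets).
def predict_tags(query, training):
    """Return the tag set of the nearest training image; all targets
    are predicted together from the same neighbor."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(training, key=lambda item: dist(item[0], query))
    return nearest[1]

# (feature vector, tags) pairs -- tags are multiple simultaneous targets.
training = [((0.9, 0.1), {"beach", "sea", "sand"}),
            ((0.1, 0.9), {"forest", "tree"})]

print(sorted(predict_tags((0.8, 0.2), training)))  # → ['beach', 'sand', 'sea']
```

Predicting the tags jointly, rather than one classifier per tag, is what distinguishes multi-target prediction.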
Image: http://thedesigninspiration.com/articles/40-clever-advertising-campaigns-of-mcdonalds/
Advertising Campaigns
Identifies the right time and communication medium to market a product
Performs real-time analysis on big data and accounts for variable change (feedback mechanism)
Utilizes streaming-analytics techniques to identify data for advertisement targeting
Takeaway: imagine the cost involved if data scientists carried out all these tasks
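The streaming-analytics step can be sketched with a sliding window over recent user events: the system continuously re-picks the most active channel as the current advertising target. The events and channel names are invented for illustration.

```python
# Sliding-window channel targeting over a simulated event stream.
from collections import Counter, deque

class ChannelTargeter:
    def __init__(self, window_size=5):
        self.window = deque(maxlen=window_size)  # only recent events count

    def observe(self, channel):
        """Feed one streaming event; the window adapts as behavior changes."""
        self.window.append(channel)

    def best_channel(self):
        """Channel with the most activity in the current window."""
        return Counter(self.window).most_common(1)[0][0]

targeter = ChannelTargeter()
for event in ["email", "mobile", "mobile", "web", "mobile"]:
    targeter.observe(event)
print(targeter.best_channel())  # → mobile
```

Because old events fall out of the window automatically, the target shifts as user behavior shifts: the feedback mechanism the slide describes.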
Image:http://www.govtech.com/computing/Baltimore-Weaves-New-Infrastructure-with-Fabric-Based-Computing.html
Increased Computational Power
The Large Hadron Collider (LHC) generates 5 trillion bits of data every second.
Increasing computational power is NOT just about adding processors.
Use past data sets to train the system for future data sets.
Chop data into bits and distribute it across fixed processors for machine learning.
Takeaway: imagine the ROI and performance on achieving even 5% of computational power similar to the LHC.
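"Chop data into bits and distribute across fixed processors" is the classic split-process-combine pattern. A minimal sketch using Python's standard `multiprocessing` pool, with a toy per-chunk function standing in for a real training step:

```python
# Split a dataset into chunks and process them in a worker pool.
from multiprocessing import Pool

def train_on_chunk(chunk):
    """Toy stand-in for fitting a model partition: sum the chunk."""
    return sum(chunk)

def distribute(data, workers=4):
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(workers) as pool:
        partials = pool.map(train_on_chunk, chunks)  # chunks run in parallel
    return sum(partials)  # combine the partial results

if __name__ == "__main__":
    print(distribute(list(range(100))))  # → 4950
```

The same shape underlies real distributed-learning systems: the hard part is choosing a per-chunk computation whose partial results combine correctly.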
• Akshay Wattal: Analyzing the cost-effectiveness and efficiency of working with intelligent big data with fewer data scientists.
• Mohana Kumaran S: Presenting the current Big Data infrastructure and justifying the need for intelligent Big Data systems.
• Mohul Kaila: Introduction to Big Data and its evolution.
• Shashank Garg: Identifying solutions to achieve intelligent Big Data systems and the current state of the art.
QUESTIONS