data science, self learning algorithms (by alexander frimout & max nie)
3.2 Simple sensors: case ‘smart metering’
CONFIDENTIAL1
Template presentation Innovation Day 2016 CONFIDENTIAL
Max Nie, Coordinator digital lab & project [email protected]
Alexander Frimout, Consultant [email protected]
TRACK 3: EVOLVING ARCHITECTURES
DATA SCIENCE: SELF LEARNING ALGORITHMS
EVOLVING ARCHITECTURES
Learning machines in a new data world
A TRADITIONAL HARDWARE PRODUCT IS “MATURE”
Everything you need (and will ever need) in one handy box
WHAT HAPPENS TO THE BOX ONCE IT LEAVES THE COMPANY?
“We have no idea”
Sounds familiar?
Maintenance?
How is it used?
How long does it last?
What goes wrong?
Are people happy with our product?
THE PRODUCT IS EXPECTED TO ALWAYS PERFORM TO ITS STANDARD
Sometimes an “error” with a product doesn’t show until later…
…or users deviate from the intended use of a product…
…and the only option is to fix/improve it in a next generation
SOFTWARE DEVELOPMENT TAKES A DIFFERENT APPROACH
EVEN SO-CALLED “MATURE” SOFTWARE IS NEVER TRULY FINISHED
NEW, CONNECTED PRODUCTS ALSO HAVE THIS POSSIBILITY
Self learning machines can add enormous value!
• Personal experience tailored to the user
• Evolving products that promise more
• Better understanding of your own product
• Reduced costs for user & manufacturer
ML CAN DO VERY COOL STUFF (BUT WE DON’T FULLY UNDERSTAND WHY)
• The difference between 95% and 99% accuracy in speech recognition is game changing
• Training a speech recognition app requires:
  • $100 of electricity
  • 1 supercomputer to run a neural net with 100 billion connections
  • 10^19 floating point operations on thousands of parallel GPUs
  • 4 TB of training data
THE CATCH: ML REQUIRES TRULY MASSIVE AMOUNTS OF TRAINING DATA
ML IS NOT NEW TECHNOLOGY, THE BREAKTHROUGH IS IN THE SCALE
Can you ignore this?
How do you play this game?
ML IS DRIVEN BY VERY BIG TECH WITH VERY BIG DATA
A new class of software interfaces that interacts at our own messy level
• Pictures
• Speech
• Text
• Expressions
• Behavior
ML ENABLES PRODUCTS THAT UNDERSTAND AND INTERACT WITH US
IMAGE RECOGNITION
SPEECH AND NATURAL LANGUAGE RECOGNITION
1. In this ecosystem, volumes of training data are the currency
• Your added value is determined by how much you really know
2. Artificial Intelligence is the next computing platform
• New value chains and classes of products will emerge
3. Software and CPUs are cheap; training data is not
• Algorithms and hardware are not a source of differentiation
• Building training data is the basis for ROI
4. The performance of a smart product grows continuously, driven by the flywheel of user-generated data feedback
• Through machine learning
• Through superior user insights
IMPLICATIONS
1. What data assets do you own?
2. What data assets could you create?
YOUR DATA ASSETS
HOW TO PLAY
Principles of smart product innovation
Buying not just a product, but a promise
Some services are only possible after a sufficiently large data set/user base
Our world is evolving fast, we expect our products to evolve with us
Making the most out of data to improve your product
INCREASING VALUE WITH LEARNING & ADAPTIVE PRODUCTS
Tesla cars have driven over 150 million miles autonomously
Don’t just start collecting data without first knowing WHY
Build the use cases for your product/service:
• What is the added value of this solution? What advantages or improvements am I offering my users?
• Is this a good fit with my product? Can I do this technically?
• What market am I targeting? Can I make a profit with this?
BUILDING THE RIGHT USE CASES
Involve experts from all fields
Typically during a pressure cooker or sprint session
Keep an open mind and build a wide range of diverging cases
Select the right ones by objectively criticizing all aspects
BUILDING USE CASES REQUIRES A MULTIDISCIPLINARY APPROACH!
SOME EXAMPLES OF USE CASES
Example 1: elevator
IF … (What is my trigger?) I detect the performance of my elevator dropping
THEN … (What action can I perform?) I want to dispatch a technician early
BECAUSE … (What is the underlying driver?) I want to prevent it from malfunctioning later. Getting stuck in an elevator causes huge dissatisfaction with my hotel guests.
Example 2: transformer
IF … (What is my trigger?) I can monitor the heat profile and exact hotspot of a transformer
THEN … (What action can I perform?) I can set up cooling much more rapidly and efficiently
BECAUSE … (What is the underlying driver?) Better heat management can increase operating life by several years. Uniform cooling is inefficient.
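The IF / THEN / BECAUSE template can be written down as a small data structure, which makes it easier to collect and compare a wide range of candidate use cases. This is only a minimal sketch in Python; the class name `UseCase` and its field names are illustrative assumptions and do not come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UseCase:
    """One IF / THEN / BECAUSE use case (hypothetical helper type)."""
    trigger: str  # IF: what is my trigger?
    action: str   # THEN: what action can I perform?
    driver: str   # BECAUSE: what is the underlying driver?

    def describe(self) -> str:
        # Render the use case as a single readable sentence.
        return f"IF {self.trigger} THEN {self.action} BECAUSE {self.driver}"


# The two examples from the slide, rephrased to fit the template.
use_cases = [
    UseCase(
        trigger="I detect the performance of my elevator dropping",
        action="I dispatch a technician early",
        driver="getting stuck in an elevator causes huge dissatisfaction",
    ),
    UseCase(
        trigger="I monitor the heat profile and hotspot of a transformer",
        action="I set up cooling more rapidly and efficiently",
        driver="better heat management can increase operating life",
    ),
]

for case in use_cases:
    print(case.describe())
```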
3 THINGS ARE NEEDED FOR MACHINE LEARNING
1. Training Data which has been tagged, categorized, or otherwise sorted by humans.
2. Software libraries which build the machine learning models by evaluating training data.
3. Hardware CPUs and GPUs which run the software’s calculations.
More and more becoming commodities:
• Computation in the cloud
• Low powered networking
• Low powered CPU
• Minimal storage
1. Product performance increases as more training data is fed
2. New user growth from ever increasing performance
3. Unique insights from product data drive product evolution and revolution
ONGOING PRODUCT PERFORMANCE IMPROVEMENT DRIVEN BY DATA
More users
More data
Better product/service
Better algorithms
1. Training data
Unique data can allow you to provide a unique service or product
1. Many different factors can make your data unique!
2. You don’t have to generate all data yourself
3. Putting together all the right pieces of the puzzle is important
DATA IS NOT A COMMODITY!
Unique location
Established base
Always-on machinery
Product data
User data
Infrastructure access
Pre-installed sensor
You have more access to unique data than you think!
Product usage
Unique technology
Financial data
IntelligentX: The beer that’s continuously getting better
Market data
R&D testing
• Hardware to collect & process data can & should be cheap!
• Cheap sensors
• Computation in the cloud is mandatory to exploit big data assets
• Low powered networking
• Low powered CPU
• Minimal storage
• Hardware design must enable data collection for the right use cases and contexts
• Think beyond mobile apps to wearables and other devices
• Form factor and price will drive hardware innovation, not performance
IMPACT ON HARDWARE
• Hiring people to produce training data is too expensive
• So you must acquire an audience and let them create your training data
• The ideal data driven application creates training data and delivers value, powered by the data captured
Offer value or meaning in return for data
THE IDEAL DATA DRIVEN APPLICATION
1. It tracks your steps
2. It tracks your distance run by GPS
3. It loads your Spotify playlists
4. It connects to online services
5. It syncs data with fitness apps
6. Its SDK allows 3rd party development
7. It’s an Alexa powered PA:
• “Alexa, play my workout list”
• “Alexa, what will the weather/traffic be?”
• “Alexa, what’s the latest news?”
• “Alexa, add milk to my shopping list”
• “Alexa, set the house temperature to 22°”
• “Alexander, you have one meeting today”
PEBBLE CORE: A $69 SMARTPHONE REPLACEMENT (FOR RUNNING)
Hardware suppliers can become service providers!
Transformation process for the organization
Requires you to consider alternative business models!
THIS WILL IMPACT YOUR ORGANIZATION & BUSINESS MODEL!
From … buying a car
To … subscribing to a flexible transportation service
Who will handle user communication? Do we need an IT department? Who are our new stakeholders?
How will we handle data?
Map your new ecosystem!
Try to think of use cases for improving a product with data for…
EXERCISE
…a pillow
Hint: you can include an actuator!
WHAT DOES IT TAKE
Developing self learning algorithms
DATA DRIVEN INNOVATION PROCESS
Create smart concept (use case)
Solve the data science problem
Develop & introduce product
INNOVATE
Have training data
INSIGHTS, PERFORMANCE
DATA SCIENCE PROCESS
DATA QUESTION → DATA PROCESSING → TIDY DATA → DATA ANALYSIS → DATA PRODUCT
DATA QUESTION
• Answerable with data
• Data is obtainable
• Business & user validated
• Descriptive, exploratory, predictive, inferential, causal, mechanistic
DATA PROCESSING
• Sneakernet, manual download, scraping, custom scripts
• Explore
• Clean
• Transform
• Combine
DATA ANALYSIS
• Descriptive analysis
• Exploratory analysis
• Inferential analysis
• Predictive analysis
• Prescriptive analysis
DATA PRODUCT
• Post
• Visualization
• App
• Audience analysis, premises, conclusion(s)
‘SUPERVISED LEARNING’ BASED ON TRAINING DATA SETS
Regression problems / Classification problems
Hypothesis function / Cost function
Source: http://www.andrewng.org/
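To make the terms hypothesis function and cost function concrete, here is a minimal sketch for the simplest regression case. It assumes a univariate linear hypothesis and a squared-error cost, which is the usual textbook choice; the formulas on the slide itself are not part of this transcript, and the toy numbers are made up.

```python
import numpy as np


def hypothesis(theta, x):
    """Linear hypothesis h(x) = theta[0] + theta[1] * x."""
    return theta[0] + theta[1] * x


def cost(theta, x, y):
    """Squared-error cost J(theta) = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(x)
    residuals = hypothesis(theta, x) - y
    return float(np.sum(residuals ** 2) / (2 * m))


# Toy training set (made-up numbers) that follows y = 1 + 2x exactly.
x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.array([1.0, 3.0, 5.0, 7.0])

perfect = cost(np.array([1.0, 2.0]), x_train, y_train)  # parameters that fit exactly
poor = cost(np.array([0.0, 0.0]), x_train, y_train)     # parameters that predict 0 everywhere
print(perfect, poor)
```

A hypothesis that fits the training data better gets a lower cost, which is what training tries to achieve.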
THE OPTIMAL HYPOTHESIS MINIMIZES THE COST FUNCTION
Iterative convergence
Source: http://www.andrewng.org/
GRADIENT DESCENT ALGORITHM FOR COST FUNCTION MINIMIZATION
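A minimal sketch of batch gradient descent, assuming the same univariate linear regression setting with a squared-error cost. The learning rate, iteration count and toy data are illustrative assumptions, not values from the presentation.

```python
import numpy as np


def gradient_descent(x, y, learning_rate=0.1, iterations=2000):
    """Minimize J(theta) = 1/(2m) * sum((theta0 + theta1*x - y)^2)."""
    theta = np.zeros(2)
    m = len(x)
    for _ in range(iterations):
        residuals = theta[0] + theta[1] * x - y
        # Partial derivatives of the cost with respect to theta0 and theta1.
        gradient = np.array([residuals.sum() / m, (residuals * x).sum() / m])
        # Step against the gradient to reduce the cost.
        theta = theta - learning_rate * gradient
    return theta


# Toy training set (made-up numbers) that follows y = 1 + 2x exactly.
x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.array([1.0, 3.0, 5.0, 7.0])

theta_fit = gradient_descent(x_train, y_train)
print(theta_fit)  # approaches [1, 2] for this toy data
```

Because this cost function is convex, the iteration converges to the single global minimum as long as the learning rate is small enough.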
THE JOY OF CONVEX COST FUNCTIONS
UNFORTUNATELY REAL PROBLEMS ARE NONLINEAR
THIS REQUIRES A MORE FLEXIBLE APPROACH
NON LINEAR CLASSIFICATION
MODELING THE XNOR FUNCTION WITH A NEURAL NETWORK
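XNOR is not linearly separable, so a single sigmoid unit cannot model it, but a network with one hidden layer can. The sketch below uses hand-picked weights in the style of the well-known course example: one hidden unit computes AND, the other computes NOR, and the output unit computes OR of the two. Whether the slide used these exact values is an assumption.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Each row is [bias, weight for x1, weight for x2].
HIDDEN_WEIGHTS = np.array([
    [-30.0, 20.0, 20.0],   # hidden unit 1: x1 AND x2
    [10.0, -20.0, -20.0],  # hidden unit 2: (NOT x1) AND (NOT x2)
])
# [bias, weight for hidden unit 1, weight for hidden unit 2]: logical OR.
OUTPUT_WEIGHTS = np.array([-10.0, 20.0, 20.0])


def xnor(x1, x2):
    """Forward pass through the two-layer network, thresholded at 0.5."""
    inputs = np.array([1.0, x1, x2])  # prepend the bias input
    hidden = sigmoid(HIDDEN_WEIGHTS @ inputs)
    output = sigmoid(OUTPUT_WEIGHTS @ np.concatenate(([1.0], hidden)))
    return int(output > 0.5)


truth_table = {(a, b): xnor(a, b) for a in (0, 1) for b in (0, 1)}
print(truth_table)
```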
BUILDING COMPLEXITY AND SCALE WITH NEURAL NETWORKS
TRAINING STEP 1: DEFINE NEURAL NETWORK ARCHITECTURE
TRAINING STEP 2: EVALUATE COST AND PARTIAL DERIVATIVE FUNCTIONS
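As a sketch of what evaluating the cost together with its partial derivatives involves, the snippet below uses a single sigmoid unit with a cross-entropy cost rather than a full neural network (a simplifying assumption; backpropagation extends the same idea layer by layer). It also checks the analytic derivatives against finite differences, a common sanity check. All data values are made up.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def cost_and_gradient(w, X, y):
    """Cross-entropy cost of one sigmoid unit and its partial derivatives."""
    p = sigmoid(X @ w)
    cost = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    gradient = X.T @ (p - y) / len(y)
    return cost, gradient


def numerical_gradient(w, X, y, step=1e-6):
    """Central finite-difference approximation of the same derivatives."""
    gradient = np.zeros_like(w)
    for i in range(len(w)):
        shift = np.zeros_like(w)
        shift[i] = step
        upper = cost_and_gradient(w + shift, X, y)[0]
        lower = cost_and_gradient(w - shift, X, y)[0]
        gradient[i] = (upper - lower) / (2 * step)
    return gradient


# Made-up data: the first column is the bias input.
X = np.array([[1.0, 0.5], [1.0, -1.5], [1.0, 2.0], [1.0, 0.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])
w = np.array([0.1, -0.2])

cost_value, analytic = cost_and_gradient(w, X, y)
numeric = numerical_gradient(w, X, y)
print(cost_value, analytic, numeric)
```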
TRAINING STEP 3: MINIMIZE NON CONVEX COST FUNCTION
HOW DEEP LEARNING OVERCOMES THE BIAS VARIANCE TRADE-OFF
1. Example: set benchmark for speech recognition at human error rate of 1%
2. If training error is too high, e.g. 5%, then you have a bias issue: run a bigger neural network
3. If validation set error is too high, e.g. 6%, then you have a variance issue: get more data
4. Otherwise you’re done.
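The four steps above can be sketched as a small decision function. The function name `next_step` and the `margin` that decides when an error counts as "too high" are assumptions made for this illustration; the slide only gives example percentages.

```python
def next_step(train_error, validation_error, target_error=0.01, margin=0.005):
    """Return the recommended action for the given error rates.

    target_error is the benchmark (e.g. a human error rate of 1%);
    margin is a hypothetical tolerance above that benchmark.
    """
    if train_error > target_error + margin:
        return "bias issue: run a bigger neural network"
    if validation_error > target_error + margin:
        return "variance issue: get more data"
    return "done"


print(next_step(train_error=0.05, validation_error=0.06))
print(next_step(train_error=0.012, validation_error=0.06))
print(next_step(train_error=0.01, validation_error=0.012))
```

The order of the checks matters: the validation error is only meaningful as a variance signal once the training error is already close to the benchmark.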
• Machine learning libraries: Theano, Keras, NumPy,…
• Big data tooling: Hadoop, MapReduce, Spark,…
• MLaaS by Amazon, Google, IBM, Microsoft,…
• And their cloud APIs for Speech, Vision, Natural Language, Translation
TAKE ADVANTAGE OF OPEN SOURCE AND CLOUD
EXAMPLE CODE TO MODEL AND FIT A NN USING KERAS AND NUMPY
import keras
from keras.models import Sequential
from keras.layers import Dense
import numpy

# Fix the random seed for reproducibility (seeds Python, NumPy and the backend)
seed = 7
keras.utils.set_random_seed(seed)

# Load the Pima Indians diabetes dataset (8 input columns, 1 label column)
dataset = numpy.loadtxt("pima-indians-diabetes.csv", delimiter=",")

# Split into input (X) and output (Y) variables
X = dataset[:, 0:8]
Y = dataset[:, 8]

# Create the model: 8 inputs, two hidden layers (12 and 8 units), 1 sigmoid output
model = Sequential()
model.add(keras.Input(shape=(8,)))
model.add(Dense(12, kernel_initializer='random_uniform', activation='relu'))
model.add(Dense(8, kernel_initializer='random_uniform', activation='relu'))
model.add(Dense(1, kernel_initializer='random_uniform', activation='sigmoid'))

# Compile the model for binary classification
model.compile(loss='binary_crossentropy', optimizer='adam',
              metrics=['accuracy'])

# Fit the model
model.fit(X, Y, epochs=150, batch_size=10)

# Evaluate the model (on the training data, so the score is optimistic)
scores = model.evaluate(X, Y)
print("accuracy: %.2f%%" % (scores[1] * 100))
1. Commercial data partners
• Big tech companies
• Data brokers in all industry domains
• High resolution satellite data
2. Public open data
• Government agencies
• Academic institutions
• International organizations
• NGOs
• Space agencies
ENRICH YOUR DATA ASSETS WITH OPEN DATA FOR UNIQUE INSIGHTS
1. Large scale supervised machine learning enables adaptive, self learning products.
2. Large volumes of training data are a key competitive advantage: find or make your own data assets!
3. Finding the right use cases and answering the right data questions is critical, and requires a multidisciplinary effort.
4. Algorithms and computing are becoming commoditized. Leverage open source and cloud computing and focus on strategic differentiation based on unique data.
CONCLUSIONS
VERHAERT CONNECT / CWI CASE: ILLEGAL PARKING PREDICTION
• Training data asset: several years of scan car data
• Application concept: a heat map showing illegal parking probabilities
• Data question: predict illegal parking probabilities for each city neighborhood
• Modeling approach: discrete choice regression model
• Cost function: TBD
• Algorithms: maximum likelihood estimators
• R&D plan: 3 months
• Product development plan: 3 months
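The slide names a discrete choice regression model fitted with maximum likelihood estimators and leaves the cost function as TBD. Purely as an illustration, the sketch below fits a binary logit model, one common discrete choice model, by maximizing the log-likelihood with gradient ascent. The data is synthetic, the single feature is hypothetical, and nothing here represents the project's actual model, data or cost function.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def log_likelihood(beta, X, y):
    """Log-likelihood of a binary logit model."""
    p = sigmoid(X @ beta)
    eps = 1e-12  # guards against log(0)
    return float(np.sum(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))


def fit_logit(X, y, learning_rate=0.1, iterations=5000):
    """Maximum likelihood estimate via gradient ascent on the mean log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(iterations):
        gradient = X.T @ (y - sigmoid(X @ beta)) / len(y)
        beta = beta + learning_rate * gradient
    return beta


# Synthetic data: one hypothetical neighborhood feature plus an intercept.
rng = np.random.default_rng(0)
n = 2000
feature = rng.normal(size=n)
X = np.column_stack([np.ones(n), feature])
true_beta = np.array([-1.0, 2.0])  # made-up parameters used to simulate outcomes
y = (rng.random(n) < sigmoid(X @ true_beta)).astype(float)

beta_hat = fit_logit(X, y)
probabilities = sigmoid(X @ beta_hat)  # predicted probability per observation
print(beta_hat)
```

Predicted probabilities of this kind, computed per neighborhood, are what a heat map like the one in the application concept would display.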
VERHAERT CONNECT: A NATURAL PROPOSITION
SENSOR FUSION
TECHNOLOGY
INTEGRATION
ALGORITHMS
CONTEXT SENSITIVE
USER CENTRIC
BIG DATA
ADDED VALUE
MULTIDISCIPLINARY
Innovation Day is an initiative of Masters in Innovation, the umbrella brand of the Verhaert Group which aims to connect, train and accelerate professional innovators.