Predicting behaviour with Machine Learning

Post on 15-Feb-2017

Optimizing ECommerce experiences with machine learning and game theory, on Cassandra, Elasticsearch and Spark

Predicting Behaviour
Elasticsearch, Cassandra, Machine Learning, and Spark

Jamie Turner, @pcajamie

You won't have heard of us, but you will have used us!

5.5bn, 1500tps, 15m

Search + Service
Hard to scale
Hard to maintain
Expensive

Background to the company + the product

Tin v Skin
Cost
Capacity
Consistency

Context
Coverage
Conscious

Challenges of being online + worldwide

Balancing the efficiency of self service with the value of full service but without the cost

Tin over skin: maintaining small business service with big business efficiency and consistency

Technology (tin) is cheap and repeatable, but it is easy for it to be repeatedly dumb. People are (normally) smart and understand context. Can we blend human-like intelligence with machine efficiency?

Triggar
Internal
Traditional stack
Small data
#scale #fail

Built in-house using traditional tech: .NET, SQL, Windows, etc.

When considering external potential, we realised that this can't scale, either technically or financially.

Sensors: Analyse

Sensors capture information from a variety of systems (internal, external, production, etc.) = coverage

HD image of activity

Games: Predict

Animation sequence: events are analysed into streams that represent journeys, and similar journeys are then grouped together. These are organised into a network that we describe as a game, similar to a board game, with good areas, bad areas, etc.

STOCHASTIC MODEL allows prediction
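The slides do not say which stochastic model is used. As a minimal sketch, assuming a first-order Markov chain (an assumption, not the speaker's stated method), the "game" can be read as a network whose edges carry transition probabilities estimated from observed journeys. The journeys and state names below are hypothetical.

```python
from collections import Counter, defaultdict


def build_transition_model(journeys):
    """Estimate first-order transition probabilities from observed journeys."""
    counts = defaultdict(Counter)
    for journey in journeys:
        # Count each consecutive (current, next) pair of states.
        for current, nxt in zip(journey, journey[1:]):
            counts[current][nxt] += 1
    model = {}
    for state, nexts in counts.items():
        total = sum(nexts.values())
        model[state] = {nxt: n / total for nxt, n in nexts.items()}
    return model


def predict_next(model, state):
    """Return the most likely next state, or None if the state was never seen."""
    nexts = model.get(state)
    if not nexts:
        return None
    return max(nexts, key=nexts.get)


# Hypothetical journeys, purely for illustration.
journeys = [
    ["home", "search", "basket", "checkout"],
    ["home", "search", "basket", "abandon"],
    ["home", "search", "basket", "checkout"],
    ["home", "help", "abandon"],
]
model = build_transition_model(journeys)
print(predict_next(model, "basket"))
```

A state with a high probability of leading to "abandon" would be a "bad area" of the board in this reading.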

What's special: significant terms!
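Elasticsearch exposes this kind of analysis through its significant_terms aggregation, which surfaces terms that are unusually frequent in a subset of documents compared with the index as a whole. The sketch below only builds a request body; the field names ("outcome", "page") and the aggregation name are hypothetical, and the talk does not show the actual queries used.

```python
import json


def significant_terms_query(filter_field, filter_value, terms_field, size=10):
    """Build a search body asking which terms stand out in a filtered subset."""
    return {
        "size": 0,  # only the aggregation is wanted, not the matching documents
        "query": {"term": {filter_field: filter_value}},
        "aggs": {
            "significant": {
                "significant_terms": {"field": terms_field, "size": size}
            }
        },
    }


# Hypothetical example: which pages are over-represented in abandoned journeys?
body = significant_terms_query("outcome", "abandoned", "page", size=5)
print(json.dumps(body, indent=2))
```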

REALTIME

OPACITY OF WEB = behaviour not demographics

Behaviour = everyone therefore 96:4

Interventions: Improve

Something's wrong, but what to do about it? We need to optimise, but which intervention works?

Evidence based

Problem

Volume

Velocity

Variety

Classic problem: this only works if you can collect loads of data, process it very fast and manage constant change. It needs to be robust and fast, and it will grow (fast, if necessary).

Research
.NET friendly?
Flexible
Durable, Scalable, Reliable

Tested all the main competitors:
Straight-line performance
Comparable performance (Mongo's crazy no-ack driver)
Under network and node failure
Deployability
Scalability
Support
Clarity (is it obvious what's going on, and do you as the dev have clear levers to trade off consistency, durability, etc.)

Options
CouchDB, Riak, Redis, HBase, Couchbase, Neo4j, Dynamo, XAP, Aerospike, BigTable, Keyspace, LevelDB, Accumulo
MySQL
MongoDB
Cassandra

MySQL is familiar and sort of scalable, but needs care

Mongo has a great interface but is an architectural car crash, and sharding is a dark art

Cassandra seemed mature and pretty SQL-like, plus stable and supported

And the others: it's like the 80s with PCs again

Solution
Elasticsearch
Cassandra
Spark
.NET

Elasticsearch = conf in SF

Didn't like Solr

Cassandra too rigid but counters + consistency
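The "consistency" lever in Cassandra is set per request: each read or write names how many replicas must acknowledge it. A small sketch of the usual quorum arithmetic follows; the function names are hypothetical, and the rule is the standard replica-overlap argument, not something stated in the slides.

```python
def quorum(replication_factor):
    """Number of replicas in a quorum: a strict majority of the replicas."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, write_replicas, read_replicas):
    """True when every read set must overlap every write set in at least one replica."""
    return write_replicas + read_replicas > replication_factor


rf = 3
print(quorum(rf))                                           # replicas needed for QUORUM at RF=3
print(reads_see_latest_write(rf, quorum(rf), quorum(rf)))   # quorum writes + quorum reads
print(reads_see_latest_write(rf, 1, 1))                     # single-replica writes + reads
```

Lower levels on either side buy latency and availability at the cost of possibly stale reads, which is the kind of explicit trade-off the evaluation criteria asked for.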

Solution/Flow

We have live events flowing through our .NET cluster and being stored in C*. Batch processes periodically run on Spark to create and update the games, storing the parameters of the models back into C*.

Spark Streaming takes live event streams and runs some of the same code run in the batch jobs as near real-time micro-batching processes to determine when a player needs to be sent an intervention.

The event data is stored in Elasticsearch to make it available for multi-faceted querying, aggregation and analytics such as significant term analysis.

These outputs, along with the graphical representations of the games, are sent to the UI to enable users to fully understand how the way that customers are using our site is changing over time, and to show us where to apply improvements.
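The idea of running the same code in batch and in near real-time micro-batches can be sketched in plain Python. This is not the Spark Streaming API: the scoring rule, threshold, batch size and event stream below are all hypothetical stand-ins, chosen only to show one shared function being applied to small groups of live events.

```python
from collections import defaultdict


def score_journey(events, risky_states):
    """Shared logic (usable by batch and streaming): share of events in risky states."""
    if not events:
        return 0.0
    return sum(1 for e in events if e in risky_states) / len(events)


def micro_batches(stream, batch_size):
    """Group an event stream into small fixed-size batches."""
    batch = []
    for item in stream:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # emit the final, possibly smaller, batch


def players_needing_intervention(stream, risky_states, threshold, batch_size=3):
    """Re-score each player seen in a micro-batch and flag those over the threshold."""
    history = defaultdict(list)
    flagged = []
    for batch in micro_batches(stream, batch_size):
        for player, state in batch:
            history[player].append(state)
        for player in sorted({p for p, _ in batch}):
            score = score_journey(history[player], risky_states)
            if player not in flagged and score >= threshold:
                flagged.append(player)
    return flagged


# Hypothetical (player, state) events.
stream = [
    ("a", "home"), ("b", "home"), ("a", "error"),
    ("a", "error"), ("b", "basket"), ("b", "checkout"),
]
flagged = players_needing_intervention(stream, {"error"}, threshold=0.5)
print(flagged)
```

Keeping the scoring logic in one function is what lets a periodic batch job and the streaming path agree on what counts as a player in trouble.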

Moore's law has made us complacent: hardware has scaled with most problems, so single server / SQL has largely kept up. But it fails for bigger problems.

This takes us back to engineering choice and compromise: you need to know what's happening under the hood.

Get it wrong and it's costly; understand the architectures and you'll fly.

Architect for scale: size / speed and ability. The latter is vital.

Elasticsearch: what we love
It works!
Super expressive
NEST
Aggs

Fast / Scalable blah blah blah

Thanks for listening!
