Predicting behaviour with Machine Learning

Post on 15-Feb-2017

Category:

Technology

TRANSCRIPT

<p>Optimizing e-commerce experiences with machine learning and game theory, on Cassandra, Elasticsearch and Spark</p>
<p>Predicting Behaviour: Elasticsearch, Cassandra, machine learning, and Spark</p>
<p>Jamie Turner, @pcajamie</p>
<p>1</p>
<p>You won't have heard of us, but you will have used us!</p>
<p>5.5bn, 1500 tps, 15m</p>
<p>2</p>
<p>Search + Service: hard to scale, hard to maintain, expensive</p>
<p>Background to the company + the product</p>
<p>3</p>
<p>Tin v Skin: cost, capacity, consistency</p>
<p>Context, coverage, conscious</p>
<p>Challenges of being online + worldwide</p>
<p>Balancing the efficiency of self-service with the value of full service, but without the cost</p>
<p>Tin over skin: maintaining small-business service with big-business efficiency and consistency</p>
<p>Technology (tin) is cheap and repeatable, but it is easy for it to be repeatedly dumb. People are (normally) smart and understand context. Can we blend human-like intelligence with machine efficiency?</p>
<p>4</p>
<p>Triggar: internal, traditional stack, small data, #scale #fail</p>
<p>Built in-house using traditional tech: .NET, SQL, Windows etc.</p>
<p>When considering its external potential, we realised that this can't scale, either technically or financially.</p>
<p>5</p>
<p>Sensors: Analyse</p>
<p>Sensors capture information from a variety of systems (internal, external, production etc.) = coverage</p>
<p>HD image of activity</p>
<p>6</p>
<p>Games: Predict</p>
<p>Animation sequence: events are analysed into streams that represent journeys, and similar journeys are then grouped together. These are organised into a network that we describe as a game, similar to a board game, with good areas, bad areas etc.</p>
<p>STOCHASTIC MODEL allows prediction</p>
<p>What's special: significant terms!</p>
<p>REALTIME</p>
<p>OPACITY OF WEB = behaviour, not demographics</p>
<p>Behaviour = everyone, therefore 96:4</p>
<p>7</p>
<p>Interventions: Improve</p>
<p>Something's wrong, but what to do about it? 
Need to optimise, but which works?</p>
<p>Evidence based</p>
<p>8</p>
<p>Problem</p>
<p>Volume</p>
<p>Velocity</p>
<p>Variety</p>
<p>Classic problem: this only works if you can collect loads of data, process it very fast and manage constant change. It needs to be robust and fast, and it will grow (and fast if necessary).</p>
<p>9</p>
<p>Research: .NET friendly? Flexible; durable, scalable, reliable</p>
<p>Tested all the main competitors: straight-line performance; comparable performance (Mongo's crazy no-ack driver); under network and node failure; deployability; scalability; support; clarity (is it obvious what's going on, and do you as the dev have clear levers to trade off consistency + durability etc.)</p>
<p>10</p>
<p>Options: CouchDB, Riak, Redis, HBase, Couchbase, Neo4j, Dynamo, XAP, Aerospike, BigTable, Keyspace, LevelDB, Accumulo; MySQL; MongoDB; Cassandra</p>
<p>MySQL is familiar and sort of scalable, but needs care</p>
<p>Mongo has a great interface but is an architectural car crash, and sharding is a dark art</p>
<p>Cassandra seemed mature and pretty SQL-like + stable &amp; supported</p>
<p>And the others: it's like the 80s with PCs again</p>
<p>11</p>
<p>Solution: Elasticsearch, Cassandra, Spark, .NET</p>
<p>Elasticsearch = conf in SF</p>
<p>Didn't like Solr</p>
<p>Cassandra: too rigid, but counters + consistency</p>
<p>12</p>
<p>Solution/Flow</p>
<p>We have live events flowing through our .NET cluster and being stored in C*. 
Batch processes periodically run on Spark to create and update the games, storing the parameters of the models back into C*. Spark Streaming takes live event streams and runs some of the same code as the batch jobs, as near-real-time micro-batches, to determine when a player needs to be sent an intervention. The event data is stored in Elasticsearch to make it available for multi-faceted querying, aggregation and analytics such as significant terms analysis. These outputs, along with the graphical representations of the games, are sent to the UI so that users can fully understand how the way customers use our site is changing over time, and to show us where to apply improvements.</p>
<p>13</p>
<p>Moore's law has made us complacent: hardware has scaled with most problems, so single server / SQL has largely kept up. But it fails for bigger problems.</p>
<p>This takes us back to engineering choice and compromise: you need to know what's happening under the hood.</p>
<p>Get it wrong and it's costly; understand the architectures and you'll fly.</p>
<p>Architect for scale: size / speed and ability. The latter is vital.</p>
<p>14</p>
<p>Elasticsearch, what we love: It works! Super expressive. NEST. Aggs.</p>
<p>Fast / scalable, blah blah blah</p>
<p>15</p>
<p>Thanks for listening!</p>
<p>16</p>
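<p>Editor's sketch: the slides only say that journeys are organised into a "game" and that a stochastic model allows prediction. One simple way to picture that is a first-order Markov chain over page states. This is an assumption made for illustration, not the speaker's actual model; the state names, the function names (fit_transitions, reach_probability, needs_intervention) and the threshold are all hypothetical.</p>

```python
from collections import Counter, defaultdict


def fit_transitions(journeys):
    """Estimate first-order transition probabilities from observed journeys.

    Each journey is an ordered list of state names (hypothetical here).
    """
    counts = defaultdict(Counter)
    for journey in journeys:
        for src, dst in zip(journey, journey[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


def reach_probability(transitions, start, target, absorbing, sweeps=100):
    """Approximate the chance of eventually reaching `target` from `start`.

    States in `absorbing` (for example an exit) end the journey with
    probability 0 of reaching the target. Repeated sweeps propagate the
    target's probability of 1.0 backwards through the network.
    """
    states = set(transitions) | set(absorbing) | {start, target}
    for dsts in transitions.values():
        states.update(dsts)
    prob = {s: 0.0 for s in states}
    prob[target] = 1.0
    for _ in range(sweeps):
        for s in states:
            if s == target or s in absorbing:
                continue
            prob[s] = sum(p * prob[d] for d, p in transitions.get(s, {}).items())
    return prob[start]


def needs_intervention(transitions, state, target, absorbing, threshold):
    """Flag a player whose chance of reaching the target is below threshold.

    The same function could be called from a batch job and from a
    micro-batch stream, which is the code-sharing idea in the flow notes.
    """
    return reach_probability(transitions, state, target, absorbing) < threshold


# Hypothetical sample journeys, not data from the talk.
JOURNEYS = [
    ["home", "search", "basket", "purchase"],
    ["home", "search", "exit"],
    ["home", "basket", "purchase"],
    ["home", "search", "basket", "exit"],
]

if __name__ == "__main__":
    model = fit_transitions(JOURNEYS)
    print(reach_probability(model, "home", "purchase", {"exit"}))
    print(needs_intervention(model, "search", "purchase", {"exit"}, 0.5))
```

<p>In the flow described in the notes, the fitted transition table would play the role of the model parameters stored back into C* by the batch job, and the intervention check would run against each live event. How the real system groups similar journeys and scores players is not stated in the slides.</p>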