Machine Learning in Magento 2

Post on 06-Jan-2017



Data & Analytics



<p>Machine Learning in Magento 2</p>
<p>WHAT</p>
<p>Customer retention: recommendations during product selection; cross-selling during the purchasing process; realtime personal discounts; personalized search results.</p>
<p>Customer returns: predict the next sales and offer them to the customer; personalized discounts and hot-sale e-mails; work with abandoned baskets; after-sale support by e-mail or phone.</p>
<p>Customer behavior analytics: customer segmentation; automatic clustering; searching for hidden behavior patterns; signaling unusual customer activity.</p>
<p>Customer behavior prediction: predict customer preferences; predict unknown data about the customer; predict future purchases; predict lost customers; predict anything for which you can find correlations.</p>
<p>The main goal: personalization</p>
<p>Convert visitors to HAPPY buyers: analyze the visitor; predict the visitor's needs; determine the visitor's behavior pattern; inject into the sales flow the most effective additional points of influence, personalized for the visitor; suggest exactly what the visitor wants; make the visitor a happy buyer; thank them for the purchase and suggest more.</p>
<p>HOW</p>
<p>[Diagram: Data flow. Labels: Magento 2.0 instance; Data sources; Events flow; Generate events; ES; Persist events to datastore; Event consumers; Machine learning standalone service; Data required for prediction models; Consumer asks by SOAP for prediction results; Create sub-events; Calls to API to act on ML decisions; Communicates with visitors; Realtime calls to ML service API to obtain predicted data; Hadoop; Spark ML; Batch and realtime long-term history analysis, heavy reporting; Customer activities, internal data changes.]</p>
<p>Data sources: product catalog, inventory; page visit logs; purchases, abandoned baskets; ratings, reviews; external data sources such as Twitter, Amazon, public datasets, etc.; time series with the history of product price changes and the customer activity log.</p>
<p>Events flow: a common event bus 
using RabbitMQ for small customers and Apache Kafka for large ones. It is a horizontally scalable solution. All data carried in events should reach the persistent datastore according to the consumers' rules. After that, consumers may trigger sub-events for the ML algorithms that depend on the changed data. If an ML algorithm needs to call an API method in Magento (for example, to add a customer to a new segment), it publishes an event for the appropriate consumer. At each step, external systems can be integrated into the process flow through the event bus.</p>
<p>Persistent datastore: three levels are needed. An in-memory datastore caches operational data for realtime queries; an operational datastore persists all data needed by the machine learning algorithms; an analytical datastore holds all historical data for heavy reporting and deep ML analysis. Because ML algorithms are probabilistic by nature, the datastore architecture can sacrifice Consistency from the CAP theorem and guarantee Availability and Partition tolerance. As a starting point for discussion, I propose Redis (or VoltDB, Aerospike, Tarantool), Elasticsearch (or Solr, MySQL, HBase) and Hadoop.</p>
<p>Machine learning service: implemented as a standalone service; binary/SOAP/REST protocols over an HTTP/TCP transport layer; direct read-only access to all data sources; ACL checks should be implemented on the clients; horizontally scalable shared-nothing architecture; calculated models are synced using a binary protocol without a master node (ZooKeeper); each node has its own memory pool to store internal datasets for calculations.</p>
<p>Hadoop + Spark: should be implemented only for extremely large stores. Hadoop is a very slow datastore, but Hadoop and Spark together allow machine learning algorithms to run in a near-realtime, distributed manner. Using the event bus, we can write all the data to Hadoop and run ML tasks on unlimited volumes of data: reporting; batch clustering; searching for patterns and outliers.</p>
<p>USE 
CASES</p>
<p>Realtime recommendations: using all historical data about user activity and the internal data sources, we can predict customer needs. User activities: page view log with the duration of each page view; visitor returns; registrations; ratings and reviews; purchases; abandoned shopping carts. Internal data sources: product prices with their changes; discounts; customer segments.</p>
<p>Personal discounts: build behavior patterns and detect the cases in which the merchant should give a customer a personal discount on a particular product. Discounts are shown to the customer in realtime during catalog browsing. If the customer does not purchase the discounted product, the algorithm should take this into account in further work with this customer.</p>
<p>Personalized product catalog: besides product recommendations, ML algorithms can determine a customer's preferences and generate the product catalog page accordingly. The product list may naturally include starred products from the predicted list. Another option is to sort the list with the best choices first.</p>
<p>MACHINE LEARNING ALGORITHMS</p>
<p>Collaborative filtering</p>
<p>Boosted random forests</p>
<p>Hidden Markov decision trees</p>
<p>k-Nearest Neighbors classification</p>
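<p>As an illustration of the first algorithm in the list above, here is a minimal sketch of item-based collaborative filtering with cosine similarity. It is not the presentation's implementation: the function names, the sample ratings and the scoring rule (similarity weighted by the customer's own rating) are assumptions chosen for brevity, and a real store would read ratings from the operational datastore rather than a dictionary.</p>

```python
from collections import defaultdict
from math import sqrt

# Hypothetical sample data: customer -> {product: rating}.
sample_ratings = {
    "alice": {"tent": 5, "stove": 4},
    "bob": {"tent": 5, "stove": 5, "lamp": 4},
    "carol": {"tent": 4, "lamp": 5},
    "dave": {"stove": 3, "mug": 5},
}


def item_similarities(ratings):
    """Return cosine similarity for every pair of items sharing a rater."""
    # Invert the data to item -> {customer: rating}.
    by_item = defaultdict(dict)
    for customer, items in ratings.items():
        for item, score in items.items():
            by_item[item][customer] = score

    norms = {
        item: sqrt(sum(s * s for s in scores.values()))
        for item, scores in by_item.items()
    }

    sims = defaultdict(dict)
    for a in by_item:
        for b in by_item:
            if a == b:
                continue
            shared = by_item[a].keys() & by_item[b].keys()
            if not shared:
                continue
            dot = sum(by_item[a][c] * by_item[b][c] for c in shared)
            sims[a][b] = dot / (norms[a] * norms[b])
    return sims


def recommend(ratings, customer, k=3):
    """Rank items the customer has not rated by similarity-weighted score."""
    sims = item_similarities(ratings)
    seen = ratings.get(customer, {})
    scores = defaultdict(float)
    for item, score in seen.items():
        for other, sim in sims.get(item, {}).items():
            if other not in seen:
                scores[other] += sim * score
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda i: (-scores[i], i))[:k]


print(recommend(sample_ratings, "alice"))
```

<p>The similarity table is recomputed on every call here; in a realtime setting it would more plausibly be computed in a batch job and cached in the in-memory datastore.</p>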
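<p>The events flow described under HOW (a consumer persists the event, triggers a sub-event for the ML service, and the ML service publishes an event back for Magento) can be sketched with an in-process stand-in for the bus. Everything here is hypothetical: the topic names, the "three page views" rule and the list used as a datastore are placeholders, and a real deployment would use RabbitMQ or Apache Kafka as the slides propose.</p>

```python
from collections import defaultdict


class EventBus:
    """In-process stand-in for the message broker (assumption, not RabbitMQ/Kafka)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        # Deliver synchronously to every handler registered for the topic.
        for handler in list(self._subscribers[topic]):
            handler(payload)


bus = EventBus()
datastore = []   # stand-in for the operational datastore
segments = {}    # stand-in for Magento customer segments


def persist_page_view(event):
    """Consumer: persist the event, then trigger a sub-event for the ML service."""
    datastore.append(event)
    bus.publish("ml.page_view_stored", event)


def ml_on_page_view(event):
    """ML stand-in: a toy rule in place of a trained model."""
    customer_id = event["customer_id"]
    views = sum(1 for e in datastore if e["customer_id"] == customer_id)
    if views >= 3:
        # The ML side does not call Magento directly; it publishes an event.
        bus.publish(
            "magento.add_to_segment",
            {"customer_id": customer_id, "segment": "frequent_visitor"},
        )


def magento_add_to_segment(event):
    """Consumer that would call the Magento API; here it updates a dict."""
    segments.setdefault(event["segment"], set()).add(event["customer_id"])


bus.subscribe("catalog.page_view", persist_page_view)
bus.subscribe("ml.page_view_stored", ml_on_page_view)
bus.subscribe("magento.add_to_segment", magento_add_to_segment)

for page in ("/tents", "/stoves", "/lamps"):
    bus.publish("catalog.page_view", {"customer_id": 42, "page": page})

print(segments)
```

<p>Because every step goes through the bus, an external system could be attached by subscribing one more handler to any of the topics.</p>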