Machine Learning in Magento 2


Post on 06-Jan-2017




Machine Learning in Magento 2


Customer retention
- Recommendations during product choosing
- Cross-sell during the purchasing process
- Realtime personal discounts
- Personalized search results

Customer returns
- Predict the customer's next purchases and offer them to the customer
- Personalized discounts and hot-sales e-mails
- Work with abandoned baskets
- After-sale support by e-mail or phone

Customer behavior analytics
- Customer segmentation
- Automatic clustering
- Searching for hidden behavior patterns
- Alerting on unusual customer activity

Customer behavior prediction
- Predict customer preferences
- Predict unknown data about the customer
- Predict future purchases
- Predict lost customers
- Predict anything for which you can find correlations

The main goal: personalization

Convert visitors to HAPPY buyers
- Analyze the visitor
- Predict the visitor's needs
- Determine the visitor's behavior pattern
- Inject into the sales flow the most effective additional points of influence, personalized for the visitor
- Suggest exactly what the visitor wants
- Make the visitor a happy buyer
- Thank the visitor for the purchase and suggest more


Data flow
[Architecture diagram. Labels recovered from the slide:]
Components:
- Magento 2.0 instance
- Data sources
- Events flow
- Event consumers
- Machine learning standalone service
- Hadoop, Spark ML
Flow labels:
- Customer activities, internal data changes
- Generate events
- Persist events to the datastore
- Create sub-events
- Data required for prediction models
- Consumer asks by SOAP for prediction results
- Calls to the API in order to apply ML decisions
- Communicates with visitors
- Realtime calls to the ML service API to obtain predicted data
- Batch and realtime long-term history analysis, heavy reporting

Data sources
- Product catalog, inventory
- Page visit logs
- Purchases, abandoned baskets
- Ratings, reviews
- External data sources like Twitter, Amazon, public datasets, etc.
- Time series with:
  - history of product price changes
  - customer activity log

Events flow
- Common event bus using RabbitMQ for small customers and Apache Kafka for large ones
- It is a horizontally, highly scalable solution
- All data inside events should reach the persistent datastore according to the consumers' rules
- After that, consumers may trigger sub-events for the ML algorithms that depend on the changed data
- If an ML algorithm needs to call some API method in Magento (for example, to add a customer to a new segment), it publishes an event for the appropriate consumer
- At each step we have the opportunity to integrate any external system into our process flow through the event bus
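The event flow above can be sketched with a tiny in-process bus. This is only an illustration under stated assumptions: the `EventBus` class, the topic names, and the "3 purchases means loyal" rule are hypothetical, and a real deployment would use a RabbitMQ or Kafka client instead of synchronous in-memory delivery.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, object]
Handler = Callable[[Event], None]


class EventBus:
    """Hypothetical in-process stand-in for a RabbitMQ/Kafka event bus."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        # Synchronous delivery; a real broker would deliver asynchronously.
        for handler in list(self._handlers[topic]):
            handler(event)


bus = EventBus()
datastore: List[Event] = []      # stand-in for the persistent datastore
segment_calls: List[Event] = []  # stand-in for calls to the Magento API


def persist_consumer(event: Event) -> None:
    # Persist the event first, then trigger a sub-event for the ML side.
    datastore.append(event)
    bus.publish("ml.data_changed", {"customer_id": event["customer_id"]})


def ml_consumer(event: Event) -> None:
    # Toy "ML decision" (assumption): 3 or more purchases means "loyal".
    purchases = [e for e in datastore
                 if e["customer_id"] == event["customer_id"]]
    if len(purchases) >= 3:
        bus.publish("magento.add_to_segment",
                    {"customer_id": event["customer_id"], "segment": "loyal"})


def magento_api_consumer(event: Event) -> None:
    # The consumer that would call the Magento API on behalf of the ML service.
    segment_calls.append(event)


bus.subscribe("shop.purchase", persist_consumer)
bus.subscribe("ml.data_changed", ml_consumer)
bus.subscribe("magento.add_to_segment", magento_api_consumer)

for order_id in range(3):
    bus.publish("shop.purchase", {"customer_id": 42, "order_id": order_id})
```

The ML consumer never calls Magento directly; it only publishes another event, so any external system can subscribe to the same topics.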

Persistent datastore
- The datastore should have three levels:
  - In-memory datastore to cache operational data for realtime queries
  - Operational datastore to persist all data relevant to the machine learning algorithms
  - Analytical datastore for all historical data, used for heavy reporting and deep ML analysis
- Due to the probabilistic nature of ML algorithms, the datastore architecture can sacrifice Consistency (in terms of the CAP theorem) and guarantee Availability and Partition tolerance
- As a first step for discussion, I propose Redis (or VoltDB, Aerospike, Tarantool), ElasticSearch (or Solr, MySQL, HBase), and Hadoop
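A minimal sketch of the three levels, using plain Python containers in place of real stores. The `TieredStore` class and its method names are hypothetical, and mapping Redis, ElasticSearch, and Hadoop to the three levels in the comments is an assumption about how the proposal is meant to be read.

```python
from typing import Dict, List, Optional, Tuple


class TieredStore:
    """Hypothetical three-level datastore backed by in-process containers."""

    def __init__(self) -> None:
        self.cache: Dict[str, dict] = {}        # in-memory level (assumed: Redis)
        self.operational: Dict[str, dict] = {}  # operational level (assumed: ElasticSearch)
        self.analytical: List[Tuple[str, dict]] = []  # history (assumed: Hadoop)

    def write(self, key: str, value: dict) -> None:
        # Every change is appended to the analytical history ...
        self.analytical.append((key, dict(value)))
        # ... the operational level keeps only the latest state ...
        self.operational[key] = dict(value)
        # ... and the cached copy is dropped so the next read refreshes it.
        self.cache.pop(key, None)

    def read(self, key: str) -> Optional[dict]:
        # Realtime queries hit the cache first, then fall back.
        if key in self.cache:
            return self.cache[key]
        value = self.operational.get(key)
        if value is not None:
            self.cache[key] = value
        return value


store = TieredStore()
store.write("customer:42", {"segment": "new"})
first = store.read("customer:42")
store.write("customer:42", {"segment": "loyal"})
second = store.read("customer:42")
```

This sketch invalidates the cache on every write. A deployment that gives up consistency, as the slide suggests, could instead let cached entries expire on a timer and tolerate briefly stale reads.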

Machine learning service
- Will be implemented as a standalone service
- Binary/SOAP/REST protocols over an HTTP/TCP transport layer
- Direct read-only access to all data sources
- ACL checks should be implemented on the clients
- Horizontally scalable shared-nothing architecture
- Calculated models will be synced using a binary protocol without a master node (ZooKeeper)
- Each node has its own memory pool to store internal datasets for calculations

Hadoop + Spark
- Should be implemented only for extremely large stores
- Hadoop is a very slow datastore
- But Hadoop and Spark together allow us to run machine learning algorithms in a near-realtime, distributed manner
- Using the event bus, we can write all the data to Hadoop and run ML tasks on unlimited volumes of data:
  - Reporting
  - Batch clustering
  - Searching for patterns and outliers


Realtime recommendations
- Using all historical data about user activity and the internal data sources, we can predict customer needs
- User activities:
  - Page view log with the duration of each page view
  - Visitor returns
  - Registrations
  - Ratings and reviews
  - Purchases
  - Abandoned shopping carts
- Internal data sources:
  - Product prices with their changes
  - Discounts
  - Customer segments

Personal discounts
- Create behavior patterns and detect cases when the merchant should give a customer a personal discount on a particular product
- Discounts will be shown to the customer in realtime during catalog browsing
- If the customer didn't purchase the discounted product, the algorithm should take this into account in further work with this customer

Personalized product catalog
- Besides product recommendations, ML algorithms can determine the customer's preferences and generate the product catalog page according to them
- The product list may keep its natural order and star the products from the predicted list
- Another way is to sort the list by best choice
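The two catalog options above (starring predicted products versus sorting by best choice) might look like the following sketch. The function names, the score dictionary, and the 0.5 threshold are all hypothetical; the scores stand in for whatever the ML service would return.

```python
from typing import Dict, List, Tuple


def star_predicted(catalog: List[str], scores: Dict[str, float],
                   threshold: float = 0.5) -> List[Tuple[str, bool]]:
    # Keep the catalog order; flag products whose predicted score is high.
    return [(sku, scores.get(sku, 0.0) >= threshold) for sku in catalog]


def sort_by_best_choice(catalog: List[str],
                        scores: Dict[str, float]) -> List[str]:
    # Reorder the catalog so the highest predicted scores come first.
    # Products without a score are treated as 0.0 and keep their relative order.
    return sorted(catalog, key=lambda sku: scores.get(sku, 0.0), reverse=True)


catalog = ["sku-a", "sku-b", "sku-c"]
predicted = {"sku-b": 0.9, "sku-c": 0.4}  # assumed output of the ML service

starred = star_predicted(catalog, predicted)
ranked = sort_by_best_choice(catalog, predicted)
```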


Collaborative filtering
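The slide names the technique without showing a variant, so the following is one possible reading: item-based collaborative filtering with cosine similarity, in plain Python. The function names and the sample ratings are made up for illustration.

```python
from math import sqrt
from typing import Dict, List

Ratings = Dict[str, Dict[str, float]]  # customer -> {product: rating}


def item_vectors(ratings: Ratings) -> Dict[str, Dict[str, float]]:
    # Invert the data: product -> {customer: rating}.
    vectors: Dict[str, Dict[str, float]] = {}
    for customer, items in ratings.items():
        for product, rating in items.items():
            vectors.setdefault(product, {})[customer] = rating
    return vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    # Cosine similarity between two sparse rating vectors.
    dot = sum(a[c] * b[c] for c in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(ratings: Ratings, customer: str, top_n: int = 3) -> List[str]:
    vectors = item_vectors(ratings)
    seen = ratings.get(customer, {})
    scores: Dict[str, float] = {}
    for candidate, vec in vectors.items():
        if candidate in seen:
            continue
        # Weight each similarity by the customer's own rating of the seen item.
        score = sum(cosine(vec, vectors[p]) * r for p, r in seen.items())
        if score > 0:
            scores[candidate] = score
    return sorted(scores, key=lambda p: scores[p], reverse=True)[:top_n]


sample: Ratings = {
    "alice": {"tent": 5, "stove": 4},
    "bob": {"tent": 5, "stove": 5, "lantern": 4},
    "carol": {"novel": 5, "lamp": 4},
    "dave": {"tent": 4},
}
suggestions = recommend(sample, "dave")
```

Here `dave` has only rated the tent, so products rated by customers who also rated the tent rank first, and products with no overlapping customers are left out.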

Boosted random forests

Hidden Markov decision trees

k-Nearest Neighbors Classification
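As a small illustration of k-nearest-neighbors classification, the sketch below labels a customer by majority vote among the k closest training points. The features (visits and purchases per month) and the labels are hypothetical, chosen to echo the "predict lost customers" use case.

```python
from collections import Counter
from math import dist
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], str]  # (feature vector, label)


def knn_classify(train: List[Sample], query: Sequence[float],
                 k: int = 3) -> str:
    # Take the k training points closest to the query (Euclidean distance).
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    # Majority vote among their labels.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical features: (visits per month, purchases per month).
training: List[Sample] = [
    ((1, 0), "at_risk"),
    ((2, 0), "at_risk"),
    ((1, 1), "at_risk"),
    ((8, 3), "active"),
    ((9, 4), "active"),
    ((7, 3), "active"),
]

frequent_visitor = knn_classify(training, (8, 4))
rare_visitor = knn_classify(training, (2, 1))
```

In practice the features would need scaling so that no single one dominates the distance.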

