comparison of machine learning algorithms for e commerce

AVINASH SHENOI

PREDICTIVE ANALYTICS: WAY TO BOOST E-COMMERCE BUSINESS

http://www.instaclique.com/

A comparison of machine learning algorithms for E-Commerce

Background: In this paper we compare two alternative machine-learning techniques from the Apache Mahout stable: Apache Spark's spark-itemsimilarity and its Apache Hadoop MapReduce counterpart. We compare them both qualitatively and quantitatively in the context of two ecommerce stores with different behavior, to determine which one is more effective and efficient in a given context.

Subjects: The subjects under test are two ecommerce stores.

1. Sample Store 1:
   a. Traffic: Approximately 3 M unique visitors per month
   b. Transactions: 2500 transactions daily

2. Sample Store 2:
   a. Traffic: Approximately 1 M unique visitors per month
   b. Transactions: 250 transactions daily

Data Gathering and setup: Relevant clickstream data for both subjects was collected. This comprises user behavior, namely view and buy events. Based on this data, predictive analytics for item similarity was run using Apache Spark and Apache Hadoop MapReduce, with log likelihood in both cases. The subjects were observed for one week to gather both quantitative and qualitative results.
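To illustrate the log-likelihood score used for item similarity, the following is a minimal Python sketch that computes the log-likelihood ratio for one pair of items from a 2x2 table of co-occurrence counts. It is not the Mahout implementation: the function names, and the way the counts are derived from sets of users, are assumptions made for this example.

```python
import math


def x_log_x(x):
    # Convention: 0 * ln(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy: N*ln(N) - sum(k*ln(k)).
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    # k11: users with both items, k12: item A only,
    # k21: item B only, k22: neither item.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def item_pair_llr(users_a, users_b, total_users):
    # Hypothetical helper: users_a and users_b are sets of user ids that
    # viewed or bought item A and item B; total_users is the user count.
    k11 = len(users_a & users_b)
    k12 = len(users_a - users_b)
    k21 = len(users_b - users_a)
    k22 = total_users - k11 - k12 - k21
    return log_likelihood_ratio(k11, k12, k21, k22)
```

A score near zero means the two items co-occur about as often as chance would predict. A larger score marks a co-occurrence rate that departs from chance, so a recommender additionally has to check that the pair co-occurs more often than expected before treating it as a candidate "similar item" pair.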

Quantitative Analysis: We gathered data for both stores and plotted the following data points hourly for a one-week period. The hourly granularity explains the peaks and troughs: activity peaks during the day and goes down at night.

1. Total products viewed (blue)
2. Recommendations available from Apache Hadoop MapReduce log likelihood (LLR) (red)
3. Recommendations available from Apache Spark (Spark) (grey)
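The three series above can be derived from the clickstream along the following lines. This is a sketch under stated assumptions, not the tooling used in the study: it assumes view events arrive as (hour, product id) pairs, that "total products viewed" means distinct products per hour, and that each engine exposes the set of products for which it has at least one recommendation. The function and parameter names are hypothetical.

```python
from collections import defaultdict


def hourly_coverage(views, recs_by_engine):
    # views: iterable of (hour, product_id) view events.
    # recs_by_engine: {engine name: set of product ids that have
    # at least one recommendation from that engine} (assumed shape).
    viewed = defaultdict(set)
    for hour, product_id in views:
        viewed[hour].add(product_id)

    coverage = {}
    for hour in sorted(viewed):
        row = {"viewed": len(viewed[hour])}
        for engine, covered in recs_by_engine.items():
            # Viewed products for which this engine has recommendations.
            row[engine] = len(viewed[hour] & covered)
        coverage[hour] = row
    return coverage
```

When an engine's count tracks the "viewed" count closely, that engine has recommendations for nearly every product visitors look at; a persistent gap means many viewed products have none.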


Observations:

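[Figure: Sample Store 1, hourly products viewed and recommendations available from LLR and Spark]

[Figure: Sample Store 2, hourly products viewed and recommendations available from LLR and Spark]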

For Sample Store 2, which has fewer transactions and fewer visitors, Spark yields far fewer results (i.e. recommendations) than for Sample Store 1, which has a large number of transactions and more traffic. In Sample Store 1, the total product views, the products for which we have recommendations from LLR, and the products for which we have recommendations from Spark are almost identical, which shows that both Spark and LLR have recommendations for almost all products that are viewed. In Sample Store 2, the total product views and the products for which we have recommendations from LLR are almost identical, but the recommendations from Spark lag behind significantly.

Inference: Hence we conclude that when there is a large number of transactions, Spark and LLR are quantitatively almost equivalent in terms of the number of recommendations they yield.

Qualitative Analysis: We gathered data for both stores and plotted the following data points hourly for a one-week period.

1. Total products bought (purple)
2. Products recommended by Apache Hadoop MapReduce log likelihood (LLR) that were bought (blue)
3. Products recommended by Apache Spark (Spark) that were bought (grey)
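The text does not state how a purchase is attributed to a recommendation. One plausible reading, sketched below in Python, counts a buy as "recommended" when the product appeared in the recommendations for any product the same user viewed earlier. The event format, the attribution rule, and all names here are assumptions for illustration only.

```python
def recommended_buys(events, recs):
    # events: time-ordered (user, action, product_id) tuples, where
    # action is "view" or "buy" (assumed event format).
    # recs: {product_id: list of product ids recommended alongside it}.
    shown_to_user = {}  # user -> products recommended to them so far
    hits = 0
    total_buys = 0
    for user, action, product_id in events:
        shown = shown_to_user.setdefault(user, set())
        if action == "view":
            # Viewing a product exposes the user to its recommendations.
            shown.update(recs.get(product_id, ()))
        elif action == "buy":
            total_buys += 1
            if product_id in shown:
                hits += 1
    return hits, total_buys
```

Running this once per engine (with that engine's recommendations as `recs`) and bucketing the events by hour would give the two "recommended and bought" series alongside the total products bought.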


Observations:

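[Figure: Sample Store 1, hourly products bought and bought products that were recommended by LLR and Spark]

[Figure: Sample Store 2, hourly products bought and bought products that were recommended by LLR and Spark]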

Sample Store 1: The total products bought and the products that were recommended by Spark and bought are almost equal, which suggests that most buys were for products that were recommended by Spark. However, the products recommended by LLR that were bought lag behind significantly.

Sample Store 2: The total products bought and the products that were recommended by Spark or LLR and bought are further apart than in Sample Store 1, which suggests that most buys were for products that were not recommended by Spark or LLR. While Spark still does marginally better than LLR, the two are comparable, and both deviate from the products that were actually bought.

Inference: Hence we conclude that when there is a large number of transactions, Spark is qualitatively significantly better than LLR: almost all products that are bought were also recommended by Spark, while LLR lags behind significantly. When there are fewer transactions, Spark is still marginally better than LLR qualitatively, but the products that are actually bought differ from the ones recommended by both Spark and LLR.

Avinash Shenoi, Founder & Director, InstaClique
