altima: a kbb-like reference pricing system

Post on 11-Apr-2017

Category:

Data & Analytics

A KBB-like Reference Pricing System: Using Machine Learning

Team: Altima (Hao Zhu, Yingqi Yang)

Product

A KBB-like Reference Pricing System

The end product could be integrated with online rental listings (apartments, rooms, houses, etc.) to give people looking for rental housing a reference point for rent negotiation.

Rental Details   Asking Rent   Reference Price   % Above Reference
……               1000          800               25%
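The "% Above Reference" column can be read as the gap between asking rent and reference price, relative to the reference price. A minimal sketch, assuming that definition (the function name is illustrative, not from the slides):

```python
def percent_above_reference(asking_rent: float, reference_price: float) -> float:
    """Return how far the asking rent sits above the reference price, as a fraction."""
    if reference_price <= 0:
        raise ValueError("reference_price must be positive")
    return (asking_rent - reference_price) / reference_price


# Example row from the table: asking rent 1000, reference price 800.
print(f"{percent_above_reference(1000, 800):.0%}")  # prints "25%"
```

A negative result means the listing is priced below its reference.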

Business Model

• Build our own service by scraping online rental listings and applying this system

• Cooperate with online rental listing providers such as Craigslist and provide this system as a value-added service

• Promote this system to other similar web services, such as eBay auctions, to predict closing prices

Approach

Data Set

Source: Seattle apartment/house rent prices, downloaded from GitHub

Size

• Total of 2313 entries from Nov. 2014

• Training/Validation: 75/25

Attributes

• Response: Price

• 14 predictors (Number of Bedrooms, Room Size, Listing Title, ……)
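A 75/25 training/validation split can be sketched as a shuffled cut of the rows. The slides do not say how the split was drawn, so the random shuffle, the seed, and the function name below are assumptions:

```python
import random


def train_validation_split(rows, train_fraction=0.75, seed=0):
    """Shuffle the rows and cut them into training and validation parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in row ids for the 2313 entries mentioned above.
train, validation = train_validation_split(range(2313))
print(len(train), len(validation))  # prints "1734 579"
```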

Data Exploration

Outlier: Price < 600 or Price > 3100
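The outlier rule above translates directly into a filter. This sketch assumes outliers are dropped before modeling, which the slides do not state explicitly; the sample prices are made up:

```python
LOW, HIGH = 600, 3100  # thresholds from the outlier rule


def is_outlier(price: float) -> bool:
    """A listing is an outlier if its price falls outside [LOW, HIGH]."""
    return price < LOW or price > HIGH


prices = [450, 600, 1250, 3100, 4200]  # hypothetical prices, not from the data set
kept = [p for p in prices if not is_outlier(p)]
print(kept)  # prints "[600, 1250, 3100]"
```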

[Figures: Rent Price Distribution; Rent Price Histogram]

Text Mining on Listing Title Variable
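The slides do not describe the text-mining step, so the following is only one plausible sketch: split titles into lowercase tokens, count how many titles mention each term, and turn chosen keywords into 0/1 predictors. The titles, keywords, and function names are all hypothetical.

```python
import re
from collections import Counter


def tokenize(title: str) -> list[str]:
    """Lowercase the title and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", title.lower())


def keyword_flags(title: str, keywords: list[str]) -> dict[str, int]:
    """Return a 0/1 indicator per keyword, usable as extra predictors."""
    tokens = set(tokenize(title))
    return {k: int(k in tokens) for k in keywords}


# Hypothetical listing titles, for illustration only.
titles = [
    "Spacious 2BR with view near downtown",
    "Cozy studio, parking included",
    "Renovated 1BR w/ parking and view",
]

# Number of titles that mention each term.
term_counts = Counter(tok for t in titles for tok in set(tokenize(t)))
print(term_counts["parking"], term_counts["view"])  # prints "2 2"
print(keyword_flags(titles[1], ["parking", "view"]))  # {'parking': 1, 'view': 0}
```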

Model Selection

(1) K Nearest Neighbors – Regression Model (KNN)

An algorithm that stores all available cases and predicts the numerical target based on a similarity measure (e.g., distance functions).

• Numeric variables: Euclidean distance

• Categorical variables: Hamming distance
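The two distance measures can be written in a few lines. This is a generic sketch of the standard definitions, not the team's Alteryx workflow:

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hamming(a, b):
    """Hamming distance: number of positions where the categories differ."""
    return sum(x != y for x, y in zip(a, b))


print(euclidean([0, 0], [3, 4]))     # prints "5.0"
print(hamming(["98115"], ["98104"]))  # prints "1"
```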

     Size   Beds   Zip code   Price
1    1710   4      98115      2500
2    2200   2      98199      2895
3    1420   2      98117      2150

Step 1: Standardize Data Set

     Size    Beds    98104   98115   98117   Price
1    0.564   0.212   0       1       0       2500
2    0.731   0.091   1       0       0       2895
3    0.465   0.091   0       0       1       2150
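The slides do not name the scaling formula. A minimal sketch, assuming min-max scaling for numeric columns and one-hot encoding for zip codes; because it scales over only the three sample rows rather than the full data set, its numbers will not match the table above:

```python
def min_max_scale(values):
    """Rescale numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column carries no information
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(value, categories):
    """Encode a categorical value as 0/1 indicators, one per known category."""
    return [int(value == c) for c in categories]


ZIPS = ["98104", "98115", "98117"]  # indicator columns shown in the table

scaled_sizes = min_max_scale([1710, 2200, 1420])
print([round(s, 3) for s in scaled_sizes])  # prints "[0.372, 1.0, 0.0]"
print(one_hot("98115", ZIPS))               # prints "[0, 1, 0]"
```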

Step 2: Give Reasonable Weights

Variable   Size   Bath   Bed   Zip Code   ……
Weight     5      4      3     2          ……
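Applying the weights amounts to multiplying each standardized value by its variable's weight. The sketch below does this for the numeric columns of sample row 1; how the zip-code weight enters the distance is not shown in the slides, so it is left out here.

```python
# Weights for the numeric variables, as listed in the table.
WEIGHTS = {"size": 5, "bath": 4, "beds": 3}


def apply_weights(row, weights):
    """Multiply each standardized value by the weight of its variable."""
    return {name: value * weights[name] for name, value in row.items()}


# Standardized size and beds for sample row 1.
row1 = {"size": 0.564, "beds": 0.212}
weighted = apply_weights(row1, WEIGHTS)
print({k: round(v, 3) for k, v in weighted.items()})  # {'size': 2.82, 'beds': 0.636}
```

These values agree with the weighted figures used in the distance step (2.819 and 0.636), presumably up to rounding of the standardized inputs.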

Step 3: Calculate Distance

Forecast (weighted values): Size 2.053, Beds 0.273, Zip code 98104, Price ?

     Size    Beds    98104   98115   98117   Price   Distance   Rank (1 = nearest)
1    2.819   0.636   0       1       0       2500    0.8        2
2    3.654   0.273   1       0       0       2895    1.6        3
3    2.325   0.273   0       0       1       2150    0.3        1

K = 1: Price = 2150

K = 2: Price = (2150 + 2500) / 2 = 2325
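The forecast step can be sketched end to end with the weighted sample rows. The distance below combines squared numeric differences with a 0/1 zip-code mismatch under one square root, following the formulas in the appendix; that combination is an assumption, and row 2 is taken to be in zip 98104 as in the Step 1 table. The resulting distances need not equal the Distance column above, although the neighbor ordering and the K = 1 and K = 2 prices do.

```python
import math

# Weighted sample rows (size, beds) with their zip codes and prices.
NEIGHBORS = [
    {"id": 1, "size": 2.819, "beds": 0.636, "zip": "98115", "price": 2500},
    {"id": 2, "size": 3.654, "beds": 0.273, "zip": "98104", "price": 2895},
    {"id": 3, "size": 2.325, "beds": 0.273, "zip": "98117", "price": 2150},
]
QUERY = {"size": 2.053, "beds": 0.273, "zip": "98104"}


def distance(a, b):
    """Euclidean part for numeric fields plus a Hamming term for the zip code."""
    numeric = (a["size"] - b["size"]) ** 2 + (a["beds"] - b["beds"]) ** 2
    mismatch = 0 if a["zip"] == b["zip"] else 1
    return math.sqrt(numeric + mismatch)


def knn_predict(query, rows, k):
    """Average the prices of the k rows closest to the query."""
    nearest = sorted(rows, key=lambda r: distance(query, r))[:k]
    return sum(r["price"] for r in nearest) / k


print(knn_predict(QUERY, NEIGHBORS, k=1))  # prints "2150.0"
print(knn_predict(QUERY, NEIGHBORS, k=2))  # prints "2325.0"
```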

(2) Other Models

• Decision Tree Model

• Forest Model

• Spline Model

• Support Vector Machine Model (SVM)

Model Comparison

Model Name             MAPE      RMSE
KNN Regression Model   0.17963   20.53814
Decision Tree Model    0.15522   334.49524
Forest Model           0.12895   287.84426
Spline Model           0.16774   408.67882
* SVM Model            0.15726   336.83526

* Could not be implemented in Alteryx Designer; developed in R instead
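For reference, the two comparison metrics can be computed as follows. This uses the standard definitions of MAPE (as a fraction) and RMSE; the sample numbers are made up and are not validation results from the project.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, returned as a fraction."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the prices."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical actual vs. predicted rents, for illustration only.
actual = [1000, 2000]
predicted = [900, 2200]
print(round(mape(actual, predicted), 3))  # prints "0.1"
print(round(rmse(actual, predicted), 2))  # prints "158.11"
```

MAPE is scale-free, while RMSE is in rent units and penalizes large misses more heavily, which is why the two can rank models differently.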

Result: Ensemble Model

Demo

Baths   Beds   Size   Zip Code   Price   Reference Price   % Above Reference
1       1      828    98121      2,055   2,038             0.01
1       2      900    98117      1,800   1,700             0.06
1       1      583    98121      2,395   1,395             0.72
1       1      577    98121      1,398   1,595             -0.12

Model Improvement

• Train on a larger dataset to make the model more robust

• Add attributes such as availability of a pool, security guard, etc.

• Include the contents of the listings in the text mining

• Distinguish between houses and apartments

• Add time component to the model to handle trend and seasonality in rent price

• Do more research on the variables to get better weights for KNN Regression Model

Q & A

Appendix

K Nearest Neighbors – Regression Model (KNN)

Step 3: Calculate Distance

Forecast (weighted values): Size 2.053, Beds 0.273, Zip code 98104, Price ?

     Size    Beds    98104   98115   98117   Price   Distance   Rank (1 = nearest)
1    2.819   0.636   0       1       0       2500    0.8        2
2    3.654   0.273   1       0       0       2895    1.6        3
3    2.325   0.273   0       0       1       2150    0.3        1

D_1 = \sqrt{(2.053 - 2.819)^2 + (0.273 - 0.636)^2 + 1}

D_2 = \sqrt{(2.053 - 3.654)^2 + (0.273 - 0.273)^2 + 0}

K = 1: F_Price = 2150

K = 2: F_Price = (2150 + 2500) / 2
