PREDICT HOUSE PRICES IN TAICHUNG TO CREATE AN ONLINE SERVICE FOR THE REAL ESTATE MARKET

Andi M Rizki, Guerman Alexei, Thimmaraju
Team: 7
Institute of Service Science, National Tsing Hua University, 2015


TABLE OF CONTENTS

EXECUTIVE SUMMARY
TECHNICAL SUMMARY
    PROBLEM DESCRIPTION
    DATA DESCRIPTION
    DATA PREPARATION FOR ANALYSIS
    DATA MINING SOLUTIONS
    CONCLUSIONS
APPENDICES
    APPENDIX A: VARIABLES USED FOR ANALYSIS
    APPENDIX B: EXTERNAL VARIABLES USED FOR ANALYSIS
    APPENDIX C: MODEL OUTPUT
    APPENDIX D: BOX PLOT VALIDATION ERROR

   


 EXECUTIVE  SUMMARY    

The real estate market has become very popular, and the housing recovery has pushed up home prices nearly everywhere. Taichung, Taiwan, has been no exception: its real estate market has expanded over the past couple of years to the point where it attracts interest not only from other parts of Taiwan but also from other parts of the world. An accurate prediction of house prices is important to prospective homeowners, developers, investors, appraisers, tax assessors and other real estate market participants, such as mortgage lenders and insurers. People who are looking to buy a new place tend to be conservative with their budget.

   

The goal of our project is to use previous transaction data to predict house prices, giving consumers the information they need, helping professionals build their businesses, and creating additional value in adjacent markets. The days of calling a local Realtor or hiring an expensive appraiser just to find out what a home is worth are passing. As a first step, we obtained data from the Ministry of the Interior of Taiwan. We then visualized the raw data and examined the relationships between predictors. Next, we selected variables using domain knowledge and added external data such as national economic growth, the latitude and longitude of each house, the distance to the nearest planned MRT station, and other derived variables.

 

We considered several methods in search of the lowest error: first, multiple linear regression (MLR) and the k-nearest neighbors (KNN) algorithm; second, Random Trees and Neural Nets. The first pair performed better, so we continued improving those methods. The low average error of the MLR model suggests a promising opportunity for an accurate prediction system, and we believe the model can be improved further with more data. Since online services are becoming the new business platform, we encourage deploying this system as an automated service with customer interaction.

 


TECHNICAL  SUMMARY    

PROBLEM  DESCRIPTION  

The business goal is to predict house prices in Taichung in order to create an online service for the real estate market. The data mining goal is to create a predictive algorithm that helps customers (buyers and sellers) arrive at a fair price close to the current market price.

 

DATA  DESCRIPTION  

The data file contains information on home sale prices in Taichung from 2009 to 2014, provided by the government's official website http://plvr.land.moi.gov.tw/Index. The data consist of 2000 rows and 27 columns. The purchase price of a house depends on its characteristics, including its physical properties, such as its size and number of bedrooms, as well as the type of neighborhood in which it is located. The variables used for the analysis are:

• Transaction_land_building          

• Building_pattern  

• Total_building_area  

• Number_of_rooms  

• Number_of_bathrooms  

• Price_persquare_meter  

   

   

 


• Age_of_building  

• Floor_bin  

• Longtude_latitude  

• Distance_to_MRT  

• Area/average  


DATA PREPARATION FOR ANALYSIS

The data were translated into English, and data from the different sources were merged to fill in missing predictor values and obtain a more robust and complete data set. Our initial data exploration and the output of certain models prompted us to remove records with missing data, which brought the data set down to 995 records and 11 columns. In addition, atypical properties that might distort the series were removed from the data set.

 

The data were partitioned into 60% training and 40% validation. We used external predictors, and we note that the records discarded from this analysis could be used to build separate models for the other transaction types (e.g., land, parking space) and building types (e.g., undefined, commercial use). We assume that the price-per-square-meter predictor is based on a historical per-area computation and is not a function of the selling price. To avoid discarding data, we tried modeling the log of price (the outcome) so that large values that might otherwise have been treated as outliers could be kept, but in the end we used the untransformed price because it gave a lower error. We wrote a Python program to convert addresses to longitude and latitude, and an R program to calculate the distance from each house to the nearest planned MRT station (see Appendix B).
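Comparing a price outcome with a log-price outcome is only fair if the log model's predictions are converted back to prices before the error is computed. The following is a minimal sketch of that comparison, assuming ordinary least squares and a random 60/40 split. The helper names, the seed and the synthetic data are illustrative assumptions, not the report's actual code or data. With 995 records the split gives 597 training and 398 validation rows.

```python
import numpy as np


def partition(n, train_frac=0.6, seed=1):
    """Randomly split row indices 0..n-1 into training and validation sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    cut = int(round(train_frac * n))
    return idx[:cut], idx[cut:]


def fit_ols(X, y):
    """Ordinary least-squares coefficients, intercept first."""
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict(beta, X):
    return np.column_stack([np.ones(len(X)), X]) @ beta


def rmse(y, yhat):
    return float(np.sqrt(np.mean((y - yhat) ** 2)))


def compare_outcomes(X, price, seed=1):
    """Validation RMS error, in price units, of a price model and a log(price) model."""
    tr, va = partition(len(price), seed=seed)
    # Model 1: regress the price itself.
    rmse_raw = rmse(price[va], predict(fit_ols(X[tr], price[tr]), X[va]))
    # Model 2: regress log(price), then back-transform with exp() so both
    # errors are measured on the same (price) scale.
    b_log = fit_ols(X[tr], np.log(price[tr]))
    rmse_log = rmse(price[va], np.exp(predict(b_log, X[va])))
    return rmse_raw, rmse_log


# Synthetic example (NOT the Taichung data): 995 records and one hypothetical
# predictor standing in for floor area in square meters.
rng = np.random.default_rng(0)
X = rng.uniform(20, 200, size=(995, 1))
price = 300_000 + 50_000 * X[:, 0] + rng.normal(0, 100_000, size=995)
rmse_raw, rmse_log = compare_outcomes(X, price)
print(f"price model RMSE: {rmse_raw:,.0f}   log-price model RMSE: {rmse_log:,.0f}")
```

Note that exp() of a predicted log price estimates something closer to a median than a mean, so back-transformed predictions tend to run low; a smearing correction is one common remedy if the log model is revisited.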

 

DATA  MINING  SOLUTIONS  

The following two models had the lowest overall error. Based on our performance criteria, both were more accurate than the naïve rule on the validation set. All selected variables were available and relevant at the time of prediction.
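The report benchmarks both models against "the naïve rule" without defining it. The sketch below assumes the usual definition for numerical prediction: predict the training-set mean price for every record. It also shows why the average error (mean residual) should be read together with the RMS error. Positive and negative misses cancel in the average but not in the RMS. The function name and the toy numbers are illustrative only.

```python
from math import sqrt
from statistics import mean


def naive_rule_errors(train_prices, valid_prices):
    """Score the naive rule, which predicts the training mean for every record.

    Returns (average error, RMS error) on the validation prices, where an
    error is actual minus predicted.
    """
    guess = mean(train_prices)
    residuals = [y - guess for y in valid_prices]
    avg_error = mean(residuals)                         # signed: errors can cancel
    rms_error = sqrt(mean([r * r for r in residuals]))  # unsigned: they cannot
    return avg_error, rms_error


# Toy numbers, not from the report: the two misses cancel in the average
# error, while the RMS error shows each prediction is off by 50.
print(naive_rule_errors([100, 200, 300], [150, 250]))
```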

MULTIPLE LINEAR REGRESSION

This model gave the lowest average error (mean residual), USD -24.902, compared with the naïve-rule error of USD 253,885.5 on the validation data set; its validation RMS error was 1,221,441. The model was fitted using best subset selection to choose the significant variables. See Appendix C for the detailed output.
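The report selected the MLR variables with best subset selection but does not state the selection criterion. Here is a minimal sketch of exhaustive best-subset search, assuming adjusted R² as the criterion (Mallows' Cp or validation error are common alternatives). The predictor names and the synthetic data are hypothetical.

```python
from itertools import combinations

import numpy as np


def adjusted_r2(X, y):
    """Adjusted R-squared of an OLS fit of y on X (intercept added)."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = float(np.sum((y - A @ beta) ** 2))
    tss = float(np.sum((y - y.mean()) ** 2))
    return 1 - (rss / tss) * (n - 1) / (n - p - 1)


def best_subset(X, y, names):
    """Try every non-empty subset of columns; keep the best adjusted R-squared.

    The search visits 2**p - 1 subsets, so it is only practical for a
    modest number of candidate predictors.
    """
    best_score, best_cols = -np.inf, ()
    for k in range(1, X.shape[1] + 1):
        for cols in combinations(range(X.shape[1]), k):
            score = adjusted_r2(X[:, list(cols)], y)
            if score > best_score:
                best_score, best_cols = score, cols
    return best_score, [names[c] for c in best_cols]


# Synthetic example with hypothetical predictor names: only "area" and
# "floor" actually drive y.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 2] + rng.normal(scale=0.5, size=200)
score, chosen = best_subset(X, y, ["area", "age", "floor", "rooms"])
print(round(score, 3), chosen)
```

Adjusted R² penalizes each extra predictor, so a weak noise variable can still slip in occasionally. This is one reason to confirm the chosen subset on the validation partition.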

 

 


K  NEAREST  NEIGHBORS  

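KNN cannot help us select the important variables, so we used knowledge from data exploration and the previous model to select the input variables. With k = 2, the best value on the validation data, this model gave an average error of USD -70.7963, compared with the naïve-rule error of USD 253,885.5; its validation RMS error was 2,193,350.8 (see Appendix C).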
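The "validation error log for different k" in Appendix C comes from fitting KNN for k = 1 to 10 and scoring each k on the validation set. A minimal sketch of that procedure is shown below. It assumes Euclidean distance on predictors standardized with the training mean and standard deviation; the report does not say whether its inputs were normalized. The data and helper names are illustrative.

```python
import numpy as np


def zscore(train, other):
    """Standardize both sets with the training mean and standard deviation."""
    mu, sd = train.mean(axis=0), train.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant columns unscaled
    return (train - mu) / sd, (other - mu) / sd


def knn_predict(X_train, y_train, X_query, k):
    """Average the outcomes of the k nearest training records (Euclidean)."""
    diff = X_query[:, None, :] - X_train[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))  # shape (n_query, n_train)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


def choose_k(X_train, y_train, X_valid, y_valid, ks=range(1, 11)):
    """Validation RMS error for each k, plus the k with the smallest error."""
    Xtr, Xva = zscore(X_train, X_valid)
    errors = {}
    for k in ks:
        pred = knn_predict(Xtr, y_train, Xva, k)
        errors[k] = float(np.sqrt(np.mean((y_valid - pred) ** 2)))
    return min(errors, key=errors.get), errors


# Synthetic example, not the report's data.
rng = np.random.default_rng(7)
X = rng.uniform(0, 10, size=(300, 2))
y = 1_000_000 + 200_000 * X[:, 0] + rng.normal(0, 50_000, size=300)
best_k, errors = choose_k(X[:180], y[:180], X[180:], y[180:])
print(best_k, round(errors[best_k]))
```

Standardizing matters because KNN distances would otherwise be dominated by whichever predictor has the largest numeric range (for example, area in square meters versus a 0/1 dummy).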

 

RECOMMENDATIONS  

We make the following recommendations for our model:

• Monitor the system. It may be necessary to restate, revise or remove data from the index.

• Run the model monthly with updated data.

• Create an alternative data source by giving customers the option to upload their home information.

• Split the data according to transaction type.

• Try additional external data to increase accuracy, and automate the system through the online page.


APPENDIX A: VARIABLES USED FOR ANALYSIS

| Chinese | English | Description | Type |
|---|---|---|---|
| 號 | record # | index | numerical |
| 鄉鎮市區 | District | | categorical |
| | Subject of the transaction | type: land, house, parking | categorical |
| 土地區段位置或建物區門牌 | location | address | categorical |
| 土地移轉總面積平方公尺 | total land area | square meters | numerical |
| 都市土地使用分區 | Urban Land Use Zoning | residential, office, farm, factory, other | categorical |
| 交易年月 | Transaction date | | date |
| 交易筆棟數 | Number of subdivisions | quantity of space, room and parking lot | numerical |
| 移轉層次 | floor location | floor # | categorical |
| 總樓層數 | Total number of floors | | numerical |
| 建物型態 | Building patterns | biz building, 11+ floor residential building with elevator, 10+ floor simple residential building with elevator, 5 floor apartment building no elevator, other, suite 1 room 1 hall 1 bathroom, shop, house, office building, warehouse, factory, farmhouse | categorical |
| 主要用途 | main purpose | | categorical |
| 主要建材 | main building materials | brick, reinforced brick, concrete, steel | categorical |
| 建築完成年月 | Construction completion date | | date |
| 建物移轉總面積平方公尺 | total building area | square meters | numerical |
| 建物現況格局-房 | number of rooms | | numerical |
| 建物現況格局-廳 | number of halls | | numerical |
| 建物現況格局-衛 | number of bathrooms | | numerical |
| 有無管理組織 | management | has a management organization | binary |
| 總價元 | total price | NT$ | numerical |
| 單價每平方公尺 | Price per square meter | NT$/m² | numerical |
| 車位類別 | Parking type | ramp, lift machine, first floor | categorical |
| 車位移轉總面積平方公尺 | total area of parking space | square meters | numerical |
| 車位總價元 | total price of parking | NT$ | numerical |


APPENDIX B: EXTERNAL VARIABLES

Converting addresses to longitude and latitude

#!/usr/bin/env python3
# Geocode each house address with the geocoder package (pip install geocoder).
import csv
import logging

import geocoder

logging.basicConfig(level=logging.INFO)

rows = []
with open('taizhong_houses.csv', newline='', encoding='utf-8') as f:
    reader = csv.DictReader(f)
    for line in reader:
        address = line['address']
        # Google's geocoding service now requires an API key;
        # 'YOUR_API_KEY' is a placeholder, not a working key.
        g = geocoder.google(address, key='YOUR_API_KEY')
        if g.ok:
            lat, lon = g.latlng
            rows.append((lat, lon))
            logging.info('SUCCESS: %s', address)
        else:
            # Write an empty row so output rows stay aligned with input rows.
            rows.append(('', ''))
            logging.warning('Geocoding ERROR: %s', address)

# One (lat, lon) row per house, with a header line.
with open('/Users/admin6/python/mindis.csv', 'w', newline='', encoding='utf-8') as outfile:
    w = csv.writer(outfile)
    w.writerow(['lat', 'lon'])
    w.writerows(rows)


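Calculating the nearest distance to MRT stations

# x1: house coordinates (column 1 = latitude, column 2 = longitude, in degrees)
# x2: planned MRT station coordinates (same layout)
x1 <- read.csv("zhonghouses.csv", header = FALSE)
x2 <- read.csv("zhongmrt.csv", header = FALSE)

# Distance (km) from each house to its nearest station, using the
# spherical law of cosines on a sphere of radius R.
mindis <- function(x1, x2) {
  deg2rad <- function(deg) deg * pi / 180
  R <- 6371  # mean Earth radius in km
  m <- numeric(nrow(x1))
  lat2 <- deg2rad(x2[, 1])
  long2 <- deg2rad(x2[, 2])
  for (i in seq_len(nrow(x1))) {
    lat1 <- deg2rad(x1[i, 1])
    long1 <- deg2rad(x1[i, 2])
    cosang <- sin(lat1) * sin(lat2) + cos(lat1) * cos(lat2) * cos(long2 - long1)
    # clamp to [-1, 1] so rounding error cannot make acos() return NaN
    d <- acos(pmin(pmax(cosang, -1), 1)) * R
    m[i] <- min(d)
  }
  m
}

mininos <- mindis(x1, x2)
write.table(mininos, "mininos.txt", sep = "\t")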
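The R script above uses the spherical law of cosines, which can lose precision when two points are very close together (the cosine is then almost exactly 1). The haversine formula gives the same great-circle distance on a sphere and is better conditioned for short separations. Below is a minimal Python sketch of the nearest-station computation written that way. It keeps the R script's 6371 km radius and (latitude, longitude) order in degrees. It is an alternative formulation, not the authors' code, and the sample coordinates are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # same mean Earth radius as the R script


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(min(1.0, sqrt(a)))


def nearest_station_km(houses, stations):
    """For each (lat, lon) house, the distance to the closest (lat, lon) station."""
    return [
        min(haversine_km(h_lat, h_lon, s_lat, s_lon) for s_lat, s_lon in stations)
        for h_lat, h_lon in houses
    ]


# Hypothetical coordinates for illustration only.
houses = [(24.15, 120.65), (24.20, 120.70)]
stations = [(24.14, 120.66), (24.18, 120.69)]
print([round(d, 2) for d in nearest_station_km(houses, stations)])
```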

 


APPENDIX C: MODEL OUTPUT

K NEAREST NEIGHBORS

Validation error log for different k

| Value of k | Training RMS Error | Validation RMS Error |
|---|---|---|
| 1 | 2210.3255 | 2256777.4 |
| 2 | 3687.2424 | 2193350.8 (best k) |
| 3 | 4174.7158 | 2221855.4 |
| 4 | 4391.3622 | 2230585.1 |
| 5 | 4513.3547 | 2285595.8 |
| 6 | 4591.5384 | 2340164.7 |
| 7 | 4645.9033 | 2389848.1 |
| 8 | 4685.8901 | 2305930.1 |
| 9 | 4716.5343 | 2275397.9 |
| 10 | 4740.7669 | 2211266 |

Training Data Scoring - Summary Report (for k = 2)

| Total sum of squared errors | RMS Error | Average Error |
|---|---|---|
| 8116666667 | 3687.2424 | -1.56E-12 |

Validation Data Scoring - Summary Report (for k = 2)

| Total sum of squared errors | RMS Error | Average Error |
|---|---|---|
| 1.91469E+15 | 2193350.8 | -2123.903 (USD -70.7968) |


         

MULTIPLE LINEAR REGRESSION USING BEST SUBSET

Training Data Scoring - Summary Report

| Total sum of squared errors | RMS Error | Average Error |
|---|---|---|
| 9.25995E+14 | 1245424 | -3.03642E-05 |

Validation Data Scoring - Summary Report

| Total sum of squared errors | RMS Error | Average Error |
|---|---|---|
| 5.93783E+14 | 1221441 | -747.0611615 (USD -24.902) |

| Input Variables | Coefficient | Std. Error | t-Statistic | P-Value | CI Lower | CI Upper |
|---|---|---|---|---|---|---|
| Intercept | -2561147 | 2469034.688 | -1.0373 | 0.3 | -7411019.472 | 2288724.967 |
| distance_mrt | -119871 | 98568.4806 | -1.2161 | 0.2245 | -313486.9626 | 73744.969 |
| area/avg | 7425829 | 177654.5376 | 41.7993 | 0 | 7076866.121 | 7774792.165 |
| age | 14074.38 | 9286.7429 | 1.5155 | 0.1302 | -4167.3669 | 32316.1353 |
| Price per square meter | 152.4028 | 5.5929 | 27.2495 | 0 | 141.4169 | 163.3887 |
| pinyin_bei3qu1 | -2771531 | 1562715.578 | -1.7735 | 0.0767 | -5841140.393 | 298077.4218 |
| pinyin_bei3tun2qu1 | -2608015 | 1374541.222 | -1.8974 | 0.0583 | -5307997.141 | 91966.916 |
| pinyin_da4jia3ou1 | 994909.2 | 1823252.67 | 0.5457 | 0.5855 | -2586467.121 | 4576285.608 |
| pinyin_da4li3ou1 | -2276106 | 1187920.026 | -1.916 | 0.0559 | -4609512.409 | 57299.5731 |
| pinyin_da4ya3ou1 | -618740.1 | 1640064.308 | -0.3773 | 0.7061 | -3840283.462 | 2602803.277 |
| pinyin_dong1qu1 | -2221918 | 1574174.562 | -1.4115 | 0.1587 | -5314035.949 | 870199.1393 |
| pinyin_dong1shi4ou1 | -57794.04 | 1826395.962 | -0.0316 | 0.9748 | -3645344.708 | 3529756.622 |
| pinyin_feng1yuan2ou1 | -1367368 | 1560643.553 | -0.8762 | 0.3813 | -4432906.418 | 1698171.326 |
| pinyin_long2jin3gou1 | -681842.4 | 1687925.741 | -0.404 | 0.6864 | -3997398.999 | 2633714.125 |
| pinyin_nan2qu1 | -3768672 | 1170992.18 | -3.2184 | 0.0014 | -6068826.964 | -1468517 |
| pinyin_nan2tun2qu1 | -1896256 | 1271070.041 | -1.4919 | 0.1363 | -4392992.285 | 600479.3022 |
| pinyin_sha1lu4ou1 | 41740.12 | 1692849.272 | 0.0247 | 0.9803 | -3283487.628 | 3366967.869 |
| pinyin_tai4ping2qu1 | -2444140 | 1537064.563 | -1.5901 | 0.1124 | -5463363.476 | 575082.8534 |
| pinyin_tan2zi3ou1 | -1793127 | 1459282.338 | -1.2288 | 0.2197 | -4659564.31 | 1073310.087 |
| pinyin_wai4bu4ou1 | 1480040 | 2022187.453 | 0.7319 | 0.4645 | -2492099.984 | 5452179.456 |
| pinyin_wu1ri4ou1 | -1276329 | 1435200.904 | -0.8893 | 0.3742 | -4095463.771 | 1542805.328 |
| pinyin_wu2qi1ou1 | 218781.7 | 1684032.02 | 0.1299 | 0.8967 | -3089126.488 | 3526689.931 |
| pinyin_wu4fen1gou1 | -887091 | 1774947.348 | -0.4998 | 0.6174 | -4373582.24 | 2599400.259 |
| pinyin_xi1qu1 | -2238096 | 1204538.376 | -1.8581 | 0.0637 | -4604144.769 | 127953.3547 |
| pinyin_xi1tun2qu1 | -2404988 | 1507565.885 | -1.5953 | 0.1112 | -5366267.528 | 556291.5534 |
| pinyin_zhong1qu1 | -2938748 | 1756059.335 | -1.6735 | 0.0948 | -6388138.327 | 510641.5304 |
| tran_pin_labu | 528136.9 | 171108.0053 | 3.0866 | 0.0021 | 192033.1402 | 864240.7568 |
| floorbin_1 | -2138095 | 402007.951 | -5.3185 | 0 | -2927750.758 | -1348439.46 |
| floorbin_2 | -2088456 | 414515.3956 | -5.0383 | 0 | -2902679.58 | -1274232.06 |
| floorbin_3 | -1828651 | 443781.4031 | -4.1206 | 0 | -2700361.426 | -956940.721 |
| floorbin_5 | -1350030 | 536380.9898 | -2.5169 | 0.0121 | -2403632.177 | -296428.682 |
| floorbin_6 | -1127701 | 603830.2482 | -1.8676 | 0.0624 | -2313791.893 | 58389.884 |
| patt_pinyin_Apartment | 302097.3 | 419285.1782 | 0.7205 | 0.4715 | -521495.6769 | 1125690.203 |
| patt_pinyin_Landmark | -39326.76 | 367964.1007 | -0.1069 | 0.9149 | -762110.8039 | 683457.2789 |
| patt_pinyin_ResBuild | -348085.9 | 343597.36 | -1.0131 | 0.3115 | -1023006.882 | 326835.0612 |
| number of rooms_2 | -573007.4 | 319882.7061 | -1.7913 | 0.0738 | -1201346.223 | 55331.342 |
| number of rooms_3 | -426779 | 390751.5931 | -1.0922 | 0.2752 | -1194324.016 | 340766.0381 |
| number of rooms_4 | -799902.7 | 434909.5487 | -1.8392 | 0.0664 | -1654186.244 | 54380.8725 |
| number of rooms_5 | -1529880 | 672810.9339 | -2.2739 | 0.0234 | -2851468.629 | -208292.27 |
| number of rooms_6 | -2913212 | 1.4989E+22 | 0 | 1 | -2.94425E+22 | 2.94425E+22 |
| number of bathrooms_1 | -986749.7 | 263062.9511 | -3.751 | 0.0002 | -1503478.673 | -470020.776 |
| number of bathrooms_2 | -1285738 | 266013.4412 | -4.8334 | 0 | -1808262.093 | -763213.026 |
| number of bathrooms_3 | -1441042 | 440034.7213 | -3.2748 | 0.0011 | -2305393.024 | -576691.373 |
| number of bathrooms_4 | -2202877 | 866778.8557 | -2.5415 | 0.0113 | -3905472.087 | -500281.618 |
| number of bathrooms_5 | -2063665 | 805872.8372 | -2.5608 | 0.0107 | -3646623.643 | -480705.962 |
| number of bathrooms_6 | -2913212 | 1.4989E+22 | 0 | 1 | -2.94425E+22 | 2.94425E+22 |

| Residual DF | R² | Adjusted R² | Std. Error Estimate | RSS |
|---|---|---|---|---|
| 551 | 0.9525 | 0.9486 | 1296368.454 | 9.25995E+14 |
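The summary figures in the regression output are linked by standard formulas, and a worked check helps when reading them. The check assumes a training partition of n = 597 records (60% of 995) and a validation partition of 398. It also assumes p = 45 predictors (the 45 non-intercept rows of the coefficient table) plus an intercept, which is consistent with the reported residual degrees of freedom, 597 - 45 - 1 = 551. Under these assumptions the reported values are reproduced to the precision shown:

```latex
\begin{aligned}
\text{RMS error}_{\text{train}} &= \sqrt{\mathrm{SSE}_{\text{train}}/n_{\text{train}}}
  = \sqrt{9.25995\times10^{14}/597} \approx 1.2454\times10^{6} \\
\text{RMS error}_{\text{valid}} &= \sqrt{\mathrm{SSE}_{\text{valid}}/n_{\text{valid}}}
  = \sqrt{5.93783\times10^{14}/398} \approx 1.2214\times10^{6} \\
\text{Std. error estimate} &= \sqrt{\mathrm{RSS}/(n_{\text{train}}-p-1)}
  = \sqrt{9.25995\times10^{14}/551} \approx 1.2964\times10^{6} \\
R^2_{\text{adj}} &= 1-(1-R^2)\,\frac{n_{\text{train}}-1}{n_{\text{train}}-p-1}
  = 1-(1-0.9525)\,\frac{596}{551} \approx 0.9486
\end{aligned}
```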


APPENDIX D: BOX PLOT VALIDATION ERROR