USS-POSCO Final Project Report
Data Mining UPI Market Opportunities

Brian Joubran, Aaron Poole, Stefan State, Travis Swenson

MGB 269 – BUSINESS INTELLIGENCE TECHNOLOGIES DATA MINING

June 8, 2015


Introduction & Company Background

USS-POSCO Industries (UPI) is a steel finishing plant located in Pittsburg, California. The plant has been in continuous operation since 1901 and has undergone several ownership changes. The current ownership structure was created in 1986 as a joint venture between the United States Steel Corporation, headquartered in Pittsburgh, PA, and POSCO of Seoul, South Korea. The company employs approximately 750 workers, the majority of whom are represented by the United Steelworkers union.

UPI and the entire steel industry have struggled since the financial crisis of 2008. During the five-year period from 2009 to 2013, the company had combined losses of nearly $200 million and saw its owners' equity decrease by $100 million. Production at the facility has dropped significantly, and the plant is currently operating at about 60% of capacity. These challenges are in no way unique to USS-POSCO Industries but rather represent the difficulties faced by the entire industry.

Because of these challenges, UPI has sought to expand its product offering in order to reach new customers and expand sales with existing customers. The company has begun doing this by modifying existing facilities to produce a wider range of products. The company has also partnered with one of its owners in order to obtain and resell products that UPI is not currently capable of producing.

Expanding into these new markets or new areas of sales presents a new challenge for the company. In the past, sales were stable, or at least well understood by the firm. That is to say, UPI had a well-established pool of customers and understood the market in which it competed. UPI felt it had a good understanding of who its customers and potential customers were and what products they were willing to buy. However, as UPI expands into new markets, in some of which it has no previous experience, the firm needs to be able to use the data it has available to best focus its sales efforts.

Competitor Landscape & New Product Opportunities

UPI produces flat-rolled carbon steel. Its products are classified into three categories: Cold Rolled Annealed (CRA), Hot Dipped Galvanized Steel, and Electroplated Tinplate. UPI customers use these products to produce office furniture, tubing for electrical conduit, computer cases, tin cans for food packaging, and oil filters. The company markets the majority of its products in the 13 western United States and British Columbia.

Of the three products, the market for tinplate is the most highly concentrated. Customers for tinplate buy only that product and no other, because the end use of tinplate is concentrated among manufacturers that produce tin cans. In UPI's marketplace there are only half a dozen customers for this product, with one customer accounting for 80% of all tinplate sales. Currently, UPI enjoys a 90% share of tinplate sales in its market. This is mostly because the company is the only tinplate producer located in the marketplace; all other producers are located in the Midwest or overseas and must bear the cost of transporting their material into the market. Because of this competitive advantage, tinplate is currently UPI's most profitable product. However, this product faces stagnant demand. As explained previously, tinplate is most often used in making tin cans for food, and the demand for this product has been unchanged for many years.

The market for Cold Rolled Annealed (CRA) and Galvanized is much more diverse and subject to competition. Customers of these products produce a much wider range of end products and are more sensitive to price. This market and its opportunities can best be explained by describing the suppliers in the market and the customers they compete for. Figure 1 below shows UPI's sales mix and its share of the West Coast tin market; Figure 2 shows market shares for steel products.

Figure 1. UPI Tin Sales and Market Share

 

Suppliers & Customers

There are four major providers of flat-rolled carbon steel on the West Coast: UPI; California Steel Industries (CSI), located in Fontana, California; mills located in the Midwest of the United States; and foreign imports. Before the economic crisis, UPI and CSI dominated the market. Both of these suppliers are located on the West Coast and were able to secure lower shipping costs and shorter lead times, which enabled them to deliver lower-priced goods faster than foreign imports and Midwest mills. However, as the economy has recovered and demand for steel has begun to increase, imports have grown to capture the largest share of the market, about 50% (see Figure 2). Price has been the primary force behind this transition. Recently, foreign mills have been quoting prices 30% lower than UPI's. Many customers in the market have modified their business plans to accommodate the longer lead time required to import material in order to take advantage of the lower-priced imports. Competition for customers between UPI and CSI has grown increasingly aggressive.

[Figure 1 chart data. UPI Sales: Tin 30%, Steel 70%. Tin Market Shares (West Coast): UPI 90%, Other 10%.]

Figure 2. UPI Market Shares for Steel Products Pre/Post-2008

   

The customers of the major providers fall into two categories: manufacturers and resellers. Manufacturers use steel in the production of their end products. Their level of consumption is such that they require a direct supply chain to a steel producer. These customers buy hundreds if not thousands of tons annually. They typically purchase products within a certain specification range and order quantities that follow their business cycle. Manufacturers also require short lead times so that they can react to changes in market demand for their products.

Resellers, or Service Centers, are customers that buy products from the steel producers in bulk and then split them into smaller lots for smaller manufacturers or sheet metal facilities. Service Centers are also capable of cutting coils to the dimensions required by their customers for a service fee. Resellers and Service Centers buy almost exclusively based on price. Their business model requires them to hold inventory ready to resell, and in order to secure acceptable margins they must purchase the lowest-priced material possible.

Business Opportunity of New Products

As the market has become more competitive, UPI has searched for new opportunities. One strategy is to expand its product line beyond what it is currently capable of producing. UPI has discussed the opportunity of buying finished product from POSCO and then reselling it to its existing customers. While this would be a lower-margin product, it would increase overall sales and provide a higher level of customer service.

UPI is looking at expanding its product line by offering coils that are wider than what it can produce on its current equipment. The commercial group has reported multiple times in the past that this is the item most requested by customers.

[Figure 2 chart data. Pre-2008: UPI 33%, CSI 33%, Midwest 17%, International 17%. Post-2008: UPI 16%, CSI 17%, Midwest 17%, International 50%.]

Variables for Analysis

The company collects considerable data on the material it sells to its customers. We were able to obtain a file that the company uses in its own analysis of sales. The original file detailed sales for the last ten years, had over 50 variables, and contained over 1.2 million records. We were quickly able to filter this file down to a more manageable size. First, we eliminated records over five years old, because only the last five years represent the current market and customer base. We also eliminated many variables that were not relevant to an analysis of offering new products. Finally, we eliminated records for tin sales and for customers that require their product to be sourced from U.S. suppliers. We did this because the company does not believe there are any increased sales opportunities in the tin market, and the offer to supply extended finished product has come only from POSCO, which is in South Korea.
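The filtering steps described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually run on the company file: the sample rows are invented, the "TIN" product code and the US_Source_Required field are hypothetical names, and Ship_Date is assumed to be the date field used for the five-year cutoff.

```python
from datetime import date

# Hypothetical sample rows. The "TIN" product code and the
# "US_Source_Required" field are assumptions made for illustration.
records = [
    {"Coil": "A1", "Product": "GALV", "Ship_Date": date(2014, 3, 1), "US_Source_Required": False},
    {"Coil": "A2", "Product": "TIN", "Ship_Date": date(2014, 5, 9), "US_Source_Required": False},
    {"Coil": "A3", "Product": "CRA", "Ship_Date": date(2008, 7, 2), "US_Source_Required": False},
    {"Coil": "A4", "Product": "CRA", "Ship_Date": date(2013, 1, 15), "US_Source_Required": True},
]


def years_before(day, years):
    """Return the same calendar day `years` earlier (Feb 29 falls back to Feb 28)."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)


def filter_records(rows, as_of, max_age_years=5):
    """Keep recent, non-tin rows from customers without a U.S.-sourcing requirement."""
    cutoff = years_before(as_of, max_age_years)
    return [
        row for row in rows
        if row["Ship_Date"] >= cutoff
        and row["Product"] != "TIN"
        and not row["US_Source_Required"]
    ]


kept = filter_records(records, as_of=date(2015, 6, 8))
print([row["Coil"] for row in kept])
```

Applying the three filters in a single pass keeps the logic easy to audit; dropping irrelevant columns would be a separate step over the surviving rows.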

Once we filtered this data out of the file, we were left with 21 variables and just over 300 thousand records. Table 1 below lists and describes the final variables used. This file was then downloaded to Excel and used for our analysis.

Table 1. Description of Variables Used for Data Analysis

Variable Field             Description
Coil_Weight                Total coil weight purchased (lbs). Used as interval values.
Coating_Type               Type of coating placed on coils: Regular Spangle, Galvaneal, Redi-Kote, Mini-Spangle.
Coating_Weight             Weight of coating. A total of 18 different nominal types.
Coil                       A unique value attributed to each coil purchased. Used as ID in SAS.
Customer                   Name of customer. Several contain the same customer name but different IDs (e.g. R&R Trading-123456, R&R Trading 987654, etc.).
Customer_Spec_Description  Example: ASTM A653-05A CS TYPE A.
Finish                     Type of finish placed on steel coil: NS, Not Temper Rolled, Extra Smooth Finish, and Rough Matte.
Industry                   Description of the customer's industry.
Oiling                     Type of oiling used on steel, a nominal value.
Ordered_Gauge              Type of gauge used, a nominal value.
Ordered_Width              Width of steel coil purchased. Interval values ranging from 26 to 60 inches.
Ordered_Width_Bin          Binned categories of each coil width purchased. Binned as nominal values in intervals of 2 inches.
Orders_Near_Capacity       Binary values. Orders below 50 inches receive a 0, and orders above 50 inches receive a 1.
Product                    6 nominal values (GALV, CRA, CRFH, CRHS, GLHR, HRP).
Ship Date                  Date coil is shipped from facility.
Ship_Method                Method by which coil was shipped (Rail, Truck, Rail Truck, Customer Truck).
Ship_To_City               City to which coil was shipped.
Ship_To_Postal_Code        Postal code of city where coil was shipped.
Ship_To_State              State to which coil was shipped.
Steel_Grade                Over 20 types of steel grade (nominal value).
Steel_Type                 1 of 9 nominal types of steel shipped with order.
Temper                     Two types of temper used with steel: Full Hard or NS.


Data Mining Analysis

Exploring Key Variables and Modifying the Data

Before any analysis could be performed, our team needed to explore the data and modify values in ways that would help identify key customers likely to purchase coil greater than 60 inches wide. To do this, we binned the Coil Width values to simplify the large range, reducing the number of width levels from 585 to 19. This would allow SAS to more easily cluster and profile likely customers. Likely customers in this case were those who purchased coils with a width near the facility's production capacity, in other words, those who purchased coil near 60 inches wide. An additional binary column was created to identify purchases of coils within a range of 50 to 60 inches. Those above the 50-inch threshold received a value of 1, and those below the threshold received a value of 0.
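The two derived columns can be illustrated with a minimal Python sketch. It rests on assumptions that the report does not state: bins are taken to start on even inches and are labeled by their edges, and an order of exactly 50 inches is treated as near capacity. The function names are hypothetical.

```python
import math

BIN_SIZE = 2.0    # inches, per the Ordered_Width_Bin description in Table 1
THRESHOLD = 50.0  # inches, the cutoff used for Orders_Near_Capacity


def width_bin(width):
    """Label the 2-inch bin containing `width` (assumes bins start on even inches)."""
    lower = math.floor(width / BIN_SIZE) * BIN_SIZE
    return f"{lower:g}-{lower + BIN_SIZE:g}"


def near_capacity(width):
    """Return 1 for orders at or above the threshold, else 0 (boundary handling is assumed)."""
    return 1 if width >= THRESHOLD else 0


for width in (47.25, 49.875, 50.0, 59.5):
    print(width, width_bin(width), near_capacity(width))
```

Binning collapses hundreds of distinct widths into a handful of levels, while the binary flag reduces the business question to a single yes/no target.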

The final modified data set was input into SAS, and key variables were explored using the "Explore" option under "Edit Variables." Interesting observations were made, particularly with the Coil Width variable, which showed that the current width distribution is concentrated in the range of 46 to 48 inches (see Figure 3). Only a small share of transactions, approximately 13 percent of the sample, showed purchases with a coil width between 50 and 60 inches. Knowing these trends, we were able to formulate several models to identify classifications, predictions, and segmentation of the data.

Figure 3. Distribution of Coil Width Purchased
[Chart residue: transaction counts from 0 to 200,000 by width bin, with bins labeled 24.5 through 58.5 inches.]

Modeling

We created three model sets to help make sense of the data: classification, prediction, and segmentation model sets in SAS, each with different modifications to the data. The following describes the diagram setup for each model.

Classification Modeling

We imported the data into the classification arm of the model, targeting the Orders_Near_Capacity variable. We connected the file import node to a sample node to take a random sample of the data as a means to decrease processing runtime. We then partitioned the data into 50 percent training and 50 percent validation. The data was then run through a decision tree, several neural network nodes, and finally a model comparison node. The neural network variations included some with just variable selection, another with variable selection and an imputation node, and another with no modifications at all. Figure 4 below shows the model diagram.

Figure 4. Classification Model Diagram

After running the model, the decision tree was identified as the model with the lowest misclassification rate (see Exhibit 1). Essentially, there were no surprises in the results. The classification model for this business problem only highlights what we already knew, splitting our customers into two groups on the variable Ordered_Width: those who purchase less than 50 inches and those who purchase greater than 50 inches (see Exhibit 2).
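One way to see why this result held no surprises: the target flag is derived directly from Ordered_Width, so a single split on that input can reproduce it. The Python sketch below illustrates the idea with synthetic widths and a one-split rule standing in for the decision tree. It is not the SAS model, and the data, the seed, and the function names are all assumptions.

```python
import random


def misclassification(rows, threshold):
    """Share of rows where the rule 'width >= threshold' disagrees with the flag."""
    wrong = sum((width >= threshold) != bool(flag) for width, flag in rows)
    return wrong / len(rows)


def fit_stump(rows):
    """Pick the observed width that minimizes training misclassification."""
    candidates = sorted({width for width, _ in rows})
    return min(candidates, key=lambda t: misclassification(rows, t))


rng = random.Random(0)
widths = [rng.uniform(26.0, 60.0) for _ in range(1000)]  # synthetic, not UPI data
rows = [(w, 1 if w >= 50.0 else 0) for w in widths]      # flag derived from width
rng.shuffle(rows)
train, valid = rows[:500], rows[500:]                    # 50/50 partition

threshold = fit_stump(train)
train_error = misclassification(train, threshold)
valid_error = misclassification(valid, threshold)
print(threshold, train_error, valid_error)
```

Because the input and the target carry the same information, the learned split lands at roughly 50 inches and the error is close to zero, which says little about which customers would buy a wider product.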

Prediction Modeling

We imported the data into a prediction model diagram using the Coil_Weight variable as the target in an attempt to predict how much steel each customer would purchase. We again used the sample node to decrease processing runtime. The data was then split into two sets of models, one with transformations and one without, in an attempt to improve normality. Each set of models included a decision tree, neural network, regression, and memory-based reasoning (MBR) node. A data partition was applied to both data sets, with 50 percent used for training and the other 50 percent for validation before modeling. Lastly, one model set included a variable selection node after partitioning as a means to reduce the number of input variables for modeling. Figure 5 below shows the prediction model diagram created.

All models were compared, and the decision tree was found to have the least misclassification; however, the results did little to advance our business goal, as the decision tree told us what we already knew about the customers who purchase the most from UPI. While this may be useful to know for current products, it does not help identify who would likely purchase our new product. See Exhibits 3 and 4 for SAS results of the model comparison and decision tree.

Figure 5. Prediction Model Diagram
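The report does not say which transformations were applied to improve normality. As an illustration only, the sketch below assumes a log transform of a right-skewed weight variable and compares sample skewness before and after. The weights are synthetic draws, not UPI data, and the parameters are arbitrary.

```python
import math
import random


def skewness(values):
    """Sample skewness: third central moment divided by variance to the power 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


rng = random.Random(1)
# Synthetic right-skewed weights in pounds (hypothetical parameters).
coil_weight = [rng.lognormvariate(10.0, 0.6) for _ in range(2000)]
log_weight = [math.log(w) for w in coil_weight]

skew_raw = skewness(coil_weight)
skew_log = skewness(log_weight)
print(round(skew_raw, 2), round(skew_log, 2))
```

A skewness near zero after the transform suggests a more symmetric target, which is the usual motivation for transforming before regression or neural network modeling.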

 

Segmentation Modeling

To segment our customers and identify those likely to purchase our new product, we created a segmentation diagram that included two cluster and segment profile model sets. We imported our pared-down data file and set all variables to input. Again, to reduce processing runtime, we sampled the data using a sample node. We ran the first model set through a variable selection node to reduce the number of input variables and keep only those that were significant. In the cluster node for each model set, we changed the "use" value for all variables from "default" to "no," except for the Orders_Near_Capacity variable, which was changed to "yes" (see Table 2 below). This ensured that clustering would focus only on width when segmenting. The same was done for the segment profile node. Figure 6 below shows the segmentation and clustering diagram created.

Table 2. Cluster Variable Modifications

Name                       Use  Report  Role   Level
Coating_Type               No   No      Input  Nominal
Coating_Weight             No   No      Input  Nominal
Coil_Weight                No   No      Input  Interval
Customer_Spec_Description  No   No      Input  Nominal
Finish                     No   No      Input  Nominal
Industry                   No   No      Input  Nominal
Oiling                     No   No      Input  Nominal
Ordered_Width              No   No      Input  Interval
Ordered_Width_Bin          No   No      Input  Ordinal
Orders_Near_Capacity       Yes  No      Input  Binary
Product                    No   No      Input  Nominal
Ship_Method                No   No      Input  Nominal
Ship_To_State              No   No      Input  Nominal
Steel_Grade                No   No      Input  Nominal
Steel_Type                 No   No      Input  Nominal
Temper                     No   No      Input  Nominal
_dataobs_                  No   No      ID     Interval

Figure 6. Segmentation Model Diagram

   

Overall, the model set that did not have a variable selection process returned a runtime error, suggesting there were too many variables to compute. This model set was eventually abandoned. The cluster and segment profile results with the variable selection node provided a general picture of what UPI customers look like (see Exhibits 5 and 6). Two segments were identified: those who purchased steel at a width greater than 50 inches and those who purchased less than 50 inches. Approximately 86.9% of customers fell into segment 1, and 13.1% fell into segment 2. Looking over the segment profile, we can see which customers are overrepresented in a number of key variables (see Exhibit 7).
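The idea behind a segment profile can be sketched as a ratio: how common a category is inside a segment divided by how common it is overall. The Python below is a simplified illustration of that ratio, not the SAS Segment Profile node's actual calculation, and the rows and industry labels are invented.

```python
from collections import Counter

# Hypothetical (Orders_Near_Capacity, Industry) pairs, not UPI data.
rows = [
    (0, "Service Center"), (0, "Service Center"), (0, "Office Furniture"),
    (0, "Office Furniture"), (0, "Service Center"), (0, "Tubing"),
    (1, "Culvert"), (1, "Service Center"),
]


def segment_shares(rows):
    """Fraction of all rows that falls into each segment."""
    counts = Counter(segment for segment, _ in rows)
    return {segment: n / len(rows) for segment, n in counts.items()}


def over_representation(rows, segment):
    """Ratio of a category's share inside the segment to its overall share."""
    overall = Counter(value for _, value in rows)
    inside = Counter(value for seg, value in rows if seg == segment)
    n_all, n_seg = len(rows), sum(inside.values())
    return {v: (inside[v] / n_seg) / (overall[v] / n_all) for v in inside}


shares = segment_shares(rows)
ratios = over_representation(rows, segment=1)
print(shares, ratios)
```

A ratio above 1 marks a category that is more common in the segment than in the data as a whole; a ratio of 1 means the segment looks like everyone else on that value.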

Analysis Results & Conclusions

Upon review of the results of the three models that we established, we determined that our question was focused on segmenting our customers to identify those that purchased near the edge of the plant's capabilities. The reason for understanding this customer segment is that these customers are likely in the market for wider sheets of steel and may be getting it from our competitors. If we can learn the specifics of this customer base, we can aim to meet their needs from our own plant and capture a greater market share. Nonetheless, we reviewed the results from all three models to gain insight into the business answers they could provide. We found that classification modeling was less relevant to this question, and that the prediction model results provided little business insight into identifying our target customer segment. We found that just over 13% of our customers purchased near the edge of the plant's capabilities.

Some of the differences we saw between groups centered on coating and steel grade: those that ordered near the plant's production capabilities were the only ones ordering steel grade GR706 and coating weight category CULV. Culvert & concrete pipe companies and windows & doors companies were unique to the sample that purchased at or near the plant's capabilities. The more we can identify about this segment, the better we can target their specific needs and, we hope, target new clients that are looking for similar products that we could now offer in our plant.
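Finding values that appear only in the near-capacity group amounts to a set difference. The sketch below shows that check in Python on invented rows. GR706 and CULV come from the findings above; the other values ("GRADE_A", "G60") and the function name are hypothetical placeholders.

```python
# Hypothetical order rows; only GR706 and CULV are taken from the report.
orders = [
    {"Orders_Near_Capacity": 1, "Steel_Grade": "GR706", "Coating_Weight": "CULV"},
    {"Orders_Near_Capacity": 1, "Steel_Grade": "GRADE_A", "Coating_Weight": "G60"},
    {"Orders_Near_Capacity": 0, "Steel_Grade": "GRADE_A", "Coating_Weight": "G60"},
    {"Orders_Near_Capacity": 0, "Steel_Grade": "GRADE_A", "Coating_Weight": "G60"},
]


def values_unique_to_near_capacity(rows, field):
    """Values of `field` seen on near-capacity orders and on no other orders."""
    near = {r[field] for r in rows if r["Orders_Near_Capacity"] == 1}
    other = {r[field] for r in rows if r["Orders_Near_Capacity"] == 0}
    return near - other


unique_grades = values_unique_to_near_capacity(orders, "Steel_Grade")
unique_coatings = values_unique_to_near_capacity(orders, "Coating_Weight")
print(unique_grades, unique_coatings)
```

Running the same check over each categorical variable gives a quick list of attributes that distinguish the target segment, which can then be reviewed with the commercial group.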

It is important to note that while a business problem was identified and data was transformed to provide valuable insight into current customers for a potential new product, more needs to be done to ensure UPI achieves business gains. For example, this report does not outline an action plan for incorporating the insights into current business practices. Further investigation is needed to determine whether the data insights conflict with current practices. Additionally, once a proper action plan is created, it is advisable to develop measurements of success to ensure that data mining efforts can be justified. In other words, a sound system that tracks customers likely to buy the new UPI product must be incorporated into the sales data and evaluated continuously. Performing all four of these steps (identifying the business problem, transforming data, acting on the data, and measuring the results) completes the virtuous cycle of successful data mining.

Exhibits

Exhibit 1. Classification Model – Comparison Results

Exhibit 2. Classification Model – Decision Tree Results

Exhibit 3. Prediction Model – Comparison Results

Exhibit 4. Prediction Model – Decision Tree Results

Exhibit 5. Cluster Model Results

Exhibit 6. Segment Profile Model Results

Exhibit 7. Segment Profile Key Variable Results