uss-posco final project report
Final Project Report Data Mining UPI Market Opportunities
Brian Joubran, Aaron Poole, Stefan State, Travis Swenson
MGB 269 – BUSINESS INTELLIGENCE TECHNOLOGIES DATA MINING
June 8, 2015
Introduction & Company Background
USS-POSCO Industries (UPI) is a steel finishing plant located in Pittsburg, California. The plant has been in continuous operation since 1901 and has undergone several ownership changes. The current ownership structure was created in 1986 as a joint venture between the United States Steel Corporation, headquartered in Pittsburgh, PA, and POSCO of Seoul, South Korea. The company employs approximately 750 workers, the majority of whom are represented by the United Steelworkers union.
UPI and the entire steel industry have struggled since the financial crisis of 2008. During the five-year period from 2009 to 2013, the company had combined losses of nearly $200 million and saw its owners' equity decrease by $100 million. Production has dropped significantly, and the facility is currently operating at about 60% of capacity. These challenges are in no way unique to USS-POSCO Industries but rather represent the difficulties faced by the entire industry. Because of these challenges, UPI has sought to expand its product offering in order to reach new customers and expand sales with existing customers. The company has begun doing this by modifying existing facilities to produce a wider range of products. It has also partnered with one of its owners in order to obtain and resell products that UPI is not currently capable of producing.
Expanding into these new markets or new areas of sales presents a new challenge for the company. In the past, sales were stable, or at least well understood by the firm. UPI had a well-established pool of customers and understood the market in which it competed. The firm felt it had a good understanding of who its customers and potential customers were and what products they were willing to buy. However, as UPI expands into new markets, in some of which it has no previous experience, the firm needs to be able to use the data it has available to best focus its sales efforts.
Competitor Landscape & New Product Opportunities
UPI produces flat rolled carbon steel. Its products are classified into three categories: Cold Rolled Annealed (CRA), Hot Dipped Galvanized Steel, and Electroplated Tinplate. UPI customers use these products to produce office furniture, tubing for electrical conduit, computer cases, tin cans for food packaging, and oil filters. The company markets the majority of its products in the 13 western United States and British Columbia.
Of the three products, the market for tinplate is the most highly concentrated. Customers for tinplate buy only that product and no other, because the end use of tinplate is concentrated among manufacturers that produce tin cans. In UPI's marketplace there are only half a dozen customers for this product, with one customer accounting for 80% of all tinplate sales. Currently, UPI enjoys a 90% share of tinplate sales in its market. This is mostly because the company is the only tinplate producer located in the marketplace; all other producers are located in the Midwest or overseas and must bear the cost of transporting their material into the market. Because of this competitive advantage, tinplate is currently UPI's most profitable product. However, this product faces stagnant demand. As explained previously, tinplate is most often used in making tin cans for food, and the demand for this product has been unchanged for many years.
The market for Cold Rolled Annealed (CRA) and Galvanized is much more diverse and subject to competition. Customers of these products produce a much wider range of end products and are more sensitive to price. This market and its opportunities are best explained by describing the suppliers in the market and the customers they compete for. Figures 1 and 2 below provide a picture of UPI sales and market shares on the West Coast.
Figure 1. UPI Tin Sales and Market Share
Suppliers & Customers
There are four major providers of flat rolled carbon steel on the West Coast: UPI, California Steel Industries (CSI) located in Fontana, California, mills located in the Midwest of the United States, and foreign imports. Before the economic crisis, UPI and CSI dominated the market. Both of these suppliers are located on the West Coast and were able to secure lower shipping costs and short lead times, which enabled them to deliver lower-priced goods faster than foreign imports and Midwest mills. However, as the economy has recovered and demand for steel has begun to increase, imports have grown to capture a majority of the market (see Figure 2). Price has been the primary force behind this transition. Recently, foreign mills have been quoting prices 30% lower than UPI's. Many of the customers in the market have modified their business plans to account for the longer lead time required to import the material in order to take advantage of the lower-priced imports. Competition for customers between UPI and CSI has grown increasingly aggressive.
[Figure 1 chart data. UPI Sales: Tin 30%, Steel 70%. Tin Market Shares (West Coast): UPI 90%, Other 10%.]
Figure 2. UPI Market Shares for Steel Products Pre/Post-2008
The customers for the major providers fall into two categories: manufacturers and resellers. Manufacturers use steel in the production of their end product. Their level of consumption is such that they require a direct supply chain to a steel producer. These customers buy hundreds if not thousands of tons annually. They typically purchase products within a certain specification range and order quantities following their business cycle. Manufacturers also require a short lead time so that they are able to react to changes in the market demand for their product.
Resellers, or Service Centers, are customers that buy products from the steel producers in bulk and then split them into smaller lots for smaller manufacturers or sheet metal facilities. Service Centers are also capable of cutting coils into the dimensions required by their customers for a service fee. Resellers and Service Centers buy almost exclusively based on price. Their business model is such that they have to hold inventory ready to resell, and in order to secure acceptable margins they must purchase the lowest-priced material possible.
Business Opportunity of New Products
As the market has become more competitive, UPI has searched for new opportunities. One strategy is to expand its product line beyond what it is currently capable of producing. UPI has discussed the opportunity of buying finished product from POSCO and then reselling it to its existing customers. While this would be a lower-margin product, it would increase overall sales and provide a higher level of customer service.
UPI is looking at expanding its product line by offering coils that are wider than what it can produce on its current equipment. The commercial group has reported multiple times in the past that this is the item customers request most.
[Figure 2 chart data. Pre-2008: UPI 33%, CSI 33%, Midwest 17%, International 17%. Post-2008: UPI 16%, CSI 17%, Midwest 17%, International 50%.]
Variables for Analysis
The company collects considerable data on the material it sells to its customers. We were able to obtain a file that the company uses in its own analysis of sales. The original file detailed sales for the last ten years. It had over 50 variables and contained over 1.2 million records. We were quickly able to filter this file down to a more manageable size. First, we eliminated records over five years old, because only the last five years represent the current market and customer base. We also eliminated many variables that were not relevant to an analysis of offering new products. Finally, we eliminated records for tin sales and for customers that required their product to be sourced from U.S. suppliers. We did this because the company does not see any increased sales opportunities in the tin market, and the only offer to supply the extended finished product has come from POSCO, which is in South Korea.
Once we had filtered the file, we were left with 21 variables and just over 300,000 records. Table 1 below lists and describes the final variables used. This file was then downloaded to Excel and used for our analysis.
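The filtering steps described above might be sketched as follows. This is only a minimal illustration in Python, not the team's actual workflow (the report states only that the file was filtered and exported to Excel). The sample records, the `TINPLATE` product code, the `domestic_only` flag, and the cutoff date are all assumptions made for the example.

```python
from datetime import date

# Hypothetical sample records. "Product" and "Ship Date" appear in Table 1;
# the "domestic_only" flag and the "TINPLATE" code are assumed for illustration.
records = [
    {"Coil": "A1", "Product": "GALV", "Ship Date": date(2014, 3, 1), "domestic_only": False},
    {"Coil": "A2", "Product": "TINPLATE", "Ship Date": date(2013, 7, 9), "domestic_only": False},
    {"Coil": "A3", "Product": "CRA", "Ship Date": date(2008, 5, 2), "domestic_only": False},
    {"Coil": "A4", "Product": "CRA", "Ship Date": date(2012, 11, 20), "domestic_only": True},
    {"Coil": "A5", "Product": "CRFH", "Ship Date": date(2015, 1, 15), "domestic_only": False},
]

CUTOFF = date(2010, 6, 8)    # assumed: five years before the report date
TIN_PRODUCTS = {"TINPLATE"}  # assumed product code for tin sales


def keep(record):
    """Apply the three record-level filters described in the text."""
    return (
        record["Ship Date"] >= CUTOFF               # drop records over five years old
        and record["Product"] not in TIN_PRODUCTS   # drop tin sales
        and not record["domestic_only"]             # drop U.S.-sourcing-only customers
    )


filtered = [r for r in records if keep(r)]
kept_coils = [r["Coil"] for r in filtered]
print(kept_coils)
```

Keeping the three conditions in one predicate makes it easy to check each business rule against the text and to count how many records each rule removes.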
Table 1. Description of Variables Used for Data Analysis
Variable                   Field Description
Coil_Weight                Total Coil Weight Purchased (lbs). Used as interval values.
Coating_Type               Type of coating placed on coils: Regular Spangle, Galvaneal, Redi-Kote, Mini-Spangle.
Coating_Weight             Weight of coating. A total of 18 different nominal types.
Coil                       A unique value attributed to each coil purchased. Used as ID in SAS.
Customer                   Name of customer. Several contain same customer name but different IDs (e.g. R&R Trading-123456, R&R Trading 987654, etc.).
Customer_Spec_Description  Example: ASTM A653-05A CS TYPE A.
Finish                     Type of finish placed on steel coil: NS, Not Temper Rolled, Extra Smooth Finish, and Rough Matte.
Industry                   Description of Industry of Customer.
Oiling                     Type of Oiling used on steel, a nominal value.
Ordered_Gauge              Type of gauge used, a nominal value.
Ordered_Width              Width of steel coil purchased. Interval values ranging from 26 to 60 inches.
Ordered_Width_Bin          Binned categories of each coil width purchased. Binned as nominal values in intervals of 2 inches.
Orders_Near_Capacity       Binary values. Those purchased below 50 inches receive a 0, and those above 50 inches receive a 1.
Product                    6 Nominal Values (GALV, CRA, CRFH, CRHS, GLHR, HRP).
Ship Date                  Date coil is shipped from facility.
Ship_Method                Method in which coil was shipped (Rail, Truck, Rail Truck, Customer Truck).
Ship_To_City               City to which coil was shipped.
Ship_To_Postal_Code        Postal code of city where coil was shipped.
Ship_To_State              State to which coil was shipped.
Steel_Grade                Over 20 types of steel grade (nominal value).
Steel_Type                 1 of 9 nominal types of steel shipped with order.
Temper                     Two types of temper used with steel: Full Hard or NS.
Data Mining Analysis
Exploring Key Variables and Modifying the Data
Before any analysis could be performed, our team needed to explore the data and modify values in ways that would help identify key customers likely to purchase coil greater than 60 inches wide. To do this, we binned the coil width values to simplify the large range, reducing the number of width levels from 585 to 19. This would allow SAS to more easily cluster and profile likely customers. Likely customers in this case were those who purchased coils with a width near the facility's production capacity, in other words, those who purchased coil near 60 inches wide. An additional binary column was created to identify purchases of coils within the range of 50 to 60 inches. Those above the 50-inch threshold received a value of 1, and those below the threshold received a value of 0.
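The two derived columns described above can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions: the report does not give the exact bin edges, so the bins here are assumed to start at even inches, and a width of exactly 50 inches is assumed to receive a 0. The sample widths are hypothetical.

```python
import math

BIN_WIDTH = 2.0        # the report bins widths in 2-inch intervals
NEAR_CAPACITY = 50.0   # threshold stated in the report


def width_bin(width):
    """Lower edge of the 2-inch bin containing `width`.

    Bin edges at even inches are an assumption; the report does not state them.
    """
    return math.floor(width / BIN_WIDTH) * BIN_WIDTH


def near_capacity(width):
    """1 if the coil is wider than 50 inches, else 0 (exactly 50 is assumed to be 0)."""
    return 1 if width > NEAR_CAPACITY else 0


# Hypothetical ordered widths in inches.
widths = [26.0, 47.25, 48.0, 50.0, 53.875, 60.0]
rows = [(w, width_bin(w), near_capacity(w)) for w in widths]

for width, bin_edge, flag in rows:
    print(f"{width:>7} in.  bin {bin_edge:>4}  near_capacity={flag}")
```

Collapsing several hundred distinct widths into a small set of bins keeps a nominal input manageable, while the binary flag gives the later models a single, easily interpreted target.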
The final modified data set was input into SAS, and key variables were explored using the “Explore” option under “Edit Variables.” Interesting observations were made, particularly with the Coil Width variable, whose distribution is most concentrated in the range of 46 to 48 inches (see Figure 3). Only a small percentage of transactions were for widths greater than 50 inches: approximately 13 percent of the sample transactions had a coil width between 50 and 60 inches. Knowing these trends, we were able to formulate several models to classify, predict, and segment the data.
Figure 3. Distribution of Coil Width Purchased
Modeling
We created three model sets in SAS to help make sense of the data: classification, prediction, and segmentation, each with different modifications to the data. The following describes the diagram setup for each model.
Classification Modeling
We imported the data into the classification arm of the model, targeting the Orders_Near_Capacity variable. We connected the file import node to a sample node to take a random sample of the data as a means of decreasing the processing runtime. We then partitioned the data into 50 percent training and 50 percent validation. The data was then run through a decision tree and several neural network nodes, and finally a model comparison node. The neural network variations included one with just a variable selection node, another with variable selection and imputation nodes, and another with no variations at all. Figure 4 below shows the model diagram.
Figure 4. Classification Model Diagram
After running the model, the decision tree was identified as the model with the lowest misclassification rate (see Exhibit 1). There were essentially no surprises in the results. The classification model for this business problem only highlights what we already knew, splitting our customers into two groups on the variable Ordered_Width: those who purchase less than 50 inches and those who purchase greater than 50 inches (see Exhibit 2).
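One way to see why the tree held no surprises is that Orders_Near_Capacity is derived directly from Ordered_Width, so a single split on width can reproduce the target. The sketch below illustrates this with a 50/50 partition and a one-split "stump" standing in for the SAS decision tree node. It is not the team's SAS diagram: the data are synthetic and the stump is a deliberately simplified substitute.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Synthetic widths in half-inch steps between 26 and 60 inches (made-up data).
widths = [26 + 0.5 * rng.randrange(69) for _ in range(1000)]
# Target defined as in the report: 1 if wider than 50 inches, else 0.
data = [(w, 1 if w > 50 else 0) for w in widths]

# 50/50 partition into training and validation sets.
rng.shuffle(data)
half = len(data) // 2
train, valid = data[:half], data[half:]


def misclassification(rows, threshold):
    """Share of rows where the rule 'width > threshold' disagrees with the flag."""
    wrong = sum(1 for w, y in rows if (1 if w > threshold else 0) != y)
    return wrong / len(rows)


# One-split stump: choose the observed training width with the lowest training error.
candidates = sorted({w for w, _ in train})
best_threshold = min(candidates, key=lambda t: misclassification(train, t))

valid_error = misclassification(valid, best_threshold)
print(f"split at {best_threshold} in., validation misclassification {valid_error:.3f}")
```

Because the input nearly determines the target, a low misclassification rate here says little about which customers would buy a wider product, which is consistent with the observation that the model only confirmed what was already known.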
Prediction Modeling
We imported the data into a prediction model diagram using the Coil_Weight variable as the target, in an attempt to predict how much steel each customer would purchase. We again used the sample node to decrease the processing runtime. The data was then split into two sets of models, one with transformations and one without, in an attempt to improve normality. Each set of models included a decision tree, neural network, regression, and memory-based reasoning (MBR) node. A data partition was applied to both data sets, using 50 percent of the data for training and the other 50 percent for validation before modeling. Lastly, one model set included a variable selection node after partitioning as a means of reducing the number of input variables for modeling. Figure 5 below shows the prediction model diagram created.
All models were compared, and the decision tree was found to perform best in the model comparison. However, the results did little for our business goal, as the decision tree told us what we already knew about the customers who purchase the most from UPI. While this may be useful to know for current products, it does not help identify who would be likely to purchase our new product. See Exhibits 3 and 4 for the SAS results of the model comparison and decision tree.
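The report does not say which transformation was applied to improve normality. As a hedged illustration, the sketch below assumes a natural-log transform of Coil_Weight and compares skewness before and after on hypothetical coil weights. A skewness closer to zero indicates a more symmetric distribution.

```python
import math
import statistics


def skewness(values):
    """Population skewness: the mean cubed z-score (0 for a symmetric distribution)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical coil weights in pounds, with a long right tail (made-up data).
coil_weights = [8000, 9000, 10000, 11000, 12000, 15000, 20000, 45000]

raw_skew = skewness(coil_weights)
# Assumed transformation: natural log. The report does not name the one used.
log_skew = skewness([math.log(w) for w in coil_weights])

print(f"skewness before: {raw_skew:.2f}, after log transform: {log_skew:.2f}")
```

A log transform compresses large values more than small ones, which is why it tends to pull in a right tail. Whether it helps a given model still has to be checked on validation data.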
Figure 5. Prediction Model Diagram
Segmentation Modeling
To segment our customers and identify those likely to purchase our new product, we created a segmentation diagram that included two cluster and segment profile model sets. We imported our pared-down data file and set all variables to input. Again, to reduce the processing runtime, we sampled the data using a sample node. We ran the first model set through a variable selection node to reduce the number of input variables and keep only those that were significant. In the cluster node for each model set, we changed the “use” value for all variables from “default” to “no,” except for the Orders_Near_Capacity variable, which was changed to “yes” (see Table 2 below). This ensured that clustering would focus only on width when segmenting. The same was done for the segment profile node. Figure 6 below shows the segmentation and clustering diagram created.
Table 2. Cluster Variable Modifications
Name                       Use  Report  Role   Level
Coating_Type               No   No      Input  Nominal
Coating_Weight             No   No      Input  Nominal
Coil_Weight                No   No      Input  Interval
Customer_Spec_Description  No   No      Input  Nominal
Finish                     No   No      Input  Nominal
Industry                   No   No      Input  Nominal
Oiling                     No   No      Input  Nominal
Ordered_Width              No   No      Input  Interval
Ordered_Width_Bin          No   No      Input  Ordinal
Orders_Near_Capacity       Yes  No      Input  Binary
Product                    No   No      Input  Nominal
Ship_Method                No   No      Input  Nominal
Ship_To_State              No   No      Input  Nominal
Steel_Grade                No   No      Input  Nominal
Steel_Type                 No   No      Input  Nominal
Temper                     No   No      Input  Nominal
_dataobs_                  No   No      ID     Interval
Figure 6. Segmentation Model Diagram
The model set that did not include a variable selection node returned a runtime error, suggesting there were too many variables to compute, and it was eventually abandoned. The cluster and segment profile results from the model set with the variable selection node provided a general picture of what UPI customers look like (see Exhibits 5 and 6). Two segments were identified: those who purchased steel at a width of less than 50 inches and those who purchased at a width greater than 50 inches. Approximately 86.9% of customers fell into segment 1, and 13.1% fell into segment 2. Looking over the segment profile, we can see which customers are overrepresented on a number of key variables (see Exhibit 7).
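The idea of a category being overrepresented in a segment can be made concrete with a small calculation: compare a category's share inside the segment with its share in the whole data set. The sketch below is a simplified stand-in for the SAS segment profile output, using hypothetical records; the industries and counts are made up for illustration.

```python
from collections import Counter

# Hypothetical records: (Orders_Near_Capacity flag, Industry). Made-up data.
records = [
    (0, "Office Furniture"), (0, "Office Furniture"), (0, "Conduit"),
    (0, "Conduit"), (0, "Service Center"), (0, "Service Center"),
    (1, "Culvert"), (1, "Service Center"),
]


def segment_shares(rows):
    """Fraction of all records that fall into each segment."""
    counts = Counter(flag for flag, _ in rows)
    return {flag: n / len(rows) for flag, n in counts.items()}


def over_representation(rows, flag):
    """Ratio of each industry's share inside the segment to its overall share.

    A ratio above 1 means the industry is overrepresented in the segment.
    """
    overall = Counter(industry for _, industry in rows)
    inside = Counter(industry for f, industry in rows if f == flag)
    segment_size = sum(inside.values())
    return {
        industry: (inside[industry] / segment_size) / (overall[industry] / len(rows))
        for industry in inside
    }


shares = segment_shares(records)
lift = over_representation(records, flag=1)
print(shares)
print(lift)
```

In this toy data the "Culvert" industry appears only in the near-capacity segment, so its ratio is the highest. The same ratio applied to the real data would rank the categories worth a closer look.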
Analysis Results & Conclusions
Upon review of the results of the three models, we determined that our question was best answered by segmenting our customers to identify those that purchased near the edge of the plant's capabilities. This customer segment matters because these customers are likely in the market for wider sheets of steel, and they may be getting it from our competitors. If we can learn the specifics of this customer base, we can aim to meet their needs from our own plant and capture a greater market share. Nonetheless, we reviewed the results from all three models to gain insight into the business answers they could provide. We found that classification modeling was less relevant to this question, and that the prediction model results provided little business insight into identifying our target customer segment. We found that just over 13% of our customers purchased near the edge of the plant's capabilities.
Some of the differences we saw between groups centered on coating and steel grade: those that ordered near the plant's production capabilities were the only ones ordering steel grade GR706 and coating weight category CULV. Culvert and concrete pipe companies and window and door companies were unique to the sample that purchased at or near the plant's capabilities. The more we can identify about this segment, the better we can target its specific needs and, we hope, reach new clients looking for similar products that we could now offer from our plant.
It is important to note that while a business problem was identified and data was transformed to provide valuable insight into current customers for a potential new product, more needs to be done to ensure that UPI achieves business gains. For example, this report does not outline an action plan for incorporating the insights into current business practices. Further investigation is needed to determine whether the data insights conflict with current practices. Additionally, once a proper action plan is created, it is advisable to develop measurements of success so that the data mining efforts can be justified. In other words, a sound system that tracks customers likely to buy the new UPI product must be incorporated into the sales data and evaluated continuously. Performing all four of these steps (identifying the business problem, transforming data, acting on the data, and measuring the results) completes the virtuous cycle of successful data mining.
Exhibits
Exhibit 1. Classification Model – Comparison Results
Exhibit 2. Classification Model – Decision Tree Results
Exhibit 3. Prediction Model – Comparison Results
Exhibit 4. Prediction Model – Decision Tree Results
Exhibit 5. Cluster Model Results
Exhibit 6. Segment Profile Model Results
Exhibit 7. Segment Profile Key Variable Results