
Predicting paper making defects on-line using data mining

Robert Milne a,*, Mike Drummond a, Patrick Renoux b

a Intelligent Applications Ltd, 1 Michaelson Square, Livingston, West Lothian, Scotland, UK
b Caledonian Paper plc, Meadowhead Road, Irvine, Scotland, UK

* Corresponding author. Tel.: +44 (0)1506 47 20 47; fax: +44 (0)1506 47 22 82.

Received 16 July 1998; accepted 24 July 1998

Abstract

This paper describes an application that was jointly developed by Caledonian Paper and Intelligent Applications for the early prediction of paper defects from process data, so that corrective action can be applied before the defect becomes too significant. Correlations between process data and past faults were extracted and then programmed into an on-line predictive software model which is able to analyse current process data in real time, looking for bad patterns which may lead to defects in the paper. Depending on the degree of severity of defect that the model predicts, and the nature of the developing problem, the machine operators can take steps to prevent the defect from becoming so significant as to result in salvage. This article describes the way the application was developed and shows how data mining can be successfully applied to the paper industry. © 1998 Elsevier Science B.V. All rights reserved.

Keywords: Predicting on-line; Paper making defects; Data mining

1. Overview of the project

Discussions began a few years ago between Caledonian Paper (CP) and Intelligent Applications (IAL), focused on predicting paper salvage using process data. The two companies had worked together before on the development of a software tool to diagnose drive faults in the mill [1]. The underlying approach had been to build up a library of past faults, and then match new fault symptoms into the library to find the most similar past fault using the technology of case-based reasoning (CBR) [2]. Discussions took place between CP and IAL to investigate whether a similar technical approach could be applied to predicting salvage from process data. This case study describes what followed on from those discussions. This work received financial support from Ayrshire Enterprise.

CP, part of the UPM Group, is one of the UK's most modern paper mills, and is based on a 50-hectare site in Irvine, Scotland (see Fig. 1). With investment in excess of £200 million, it produced its first reel of Light Weight Coated (LWC) paper in April 1989. Since then, the company has become established as a leading supplier of LWC to publishers and printers throughout the world, with an annual output of over 200 000 tonnes.

Fig. 1. The paper mill at Caledonian Paper, Irvine.

IAL is a privately owned UK company, specialising in the application of advanced software techniques to industry, with a particular focus on diagnostic systems. They developed Amethyst for vibration analysis, which is believed to be the most widely used expert-system-based tool in maintenance world-wide. They also received the Queen's Award for Technological Innovation for Amethyst. They are currently working on advanced modelling techniques for turbine diagnostics, and on applications of data mining.

CP has a very small percentage of salvage in any given time period. One of their priorities is to reduce the salvage to as low a level as possible [3]. For CP, the main paper defect is deviation. This is when the customer reels build up with misaligned edges, and therefore veer off to the side, resulting in uneven edges. Corrugation also occurs occasionally, and is seen as a barring effect with finger-like rings around the circumference (see Fig. 2). Whilst there are many possible reasons for paper defects, many of which are not related to the profile information directly, it was thought at the start of this work that a significant proportion of defects could be related back to profile data.

Fig. 2. Some examples of wound customer reels showing deviation.

2. Data mining for predicting defects

The underlying technical approach behind this work for predicting salvage is based on comparing current process data with historical examples of process data for which the quality of the final customer reel was known. However, due to the volume of data to be analysed and the number of discrete data points to be compared, it was quickly recognised that this would be very difficult using conventional database techniques. An alternative to conventional data analysis techniques was to apply data mining to the problem [4].

Data mining is a set of techniques for searching through data sets, looking for hidden correlations and trends which are, for all intents and purposes, inaccessible using conventional data analysis techniques [5]. IAL have completed a range of data mining projects in other areas such as marketing, predicting competitors' bid prices in the power industry and predicting the loyalty of customers to mortgage lenders. Although, on the surface, these applications are very different to the prediction of paper defects, in reality they share many issues with predicting paper salvage. Both had large amounts of historical data. In both situations, the requirements were to model a very dynamic and complex process. In addition, IAL have developed many diagnostic systems for manufacturing, and their background expertise in this area meant that they were well placed to provide support to CP during this work.

For data mining to be successful, it is normally necessary to have at least several hundred historical examples of data records corresponding to each outcome of interest [6]. In this case, CP had many thousands of historical examples of good paper reels, and enough historical examples of defective reels to form a good basis for applying data mining. Sometimes it can be very difficult to obtain access to the historical data for a range of technical reasons. This can happen when customers' data sets are distributed across many different types of data bases, since this can make it difficult to extract a complete set of process data corresponding to a single paper reel. However, CP had invested heavily in their IT infrastructure and were well placed to provide the level of data access which was needed for this work.

The business of data mining is growing for many reasons, including the fact that most business organisations are building up vast amounts of data concerning their manufacturing processes, their customers, their markets, and other important items of data. It is estimated that the volume of stored data in the world doubles every 18 months. One of the motivations behind storing these vast amounts of data is that it is thought to contain useful business information which describes past performance or decisions. It should therefore be possible to use this historical information to improve future business or process decisions. However, a real problem in doing this is that the sheer volume of the data can often obscure the key information contained within it. For many organisations, this makes the process of manual data analysis, supported by perhaps some business graphics and statistical tools, very difficult and cumbersome, yielding less information than is potentially available from the historical data. This is where data mining comes in. Data mining is a set of techniques taken from the areas of machine learning and artificial intelligence, which address the data analysis problem in a much more effective way by semi-automatically searching through the data sets and extracting useful trends and inter-relationships. This would be extremely difficult to do using other, more manual techniques.

There are a small number of core techniques which support data mining. These include neural networks, statistical clustering, CHAID analysis and induction. It was decided to use induction for this work since it is one of the few approaches which results in a human-understandable software model. Most of the other approaches generate, in effect, a black box which is not able to provide any reasoning behind its findings. Induction, on the other hand, results in a decision tree which can be read and understood by a person. This has a significant impact on development times, since the developers can more quickly understand the combination of attributes within the process data which the data mining has identified as being relevant for predicting paper defects.

A further benefit of using this approach is that sometimes the analysis will identify some correlation in the data which was valid historically for relating paper defects to process data, but which is of no value for making predictions in the future. Using neural nets or some of the other approaches, it would have been more difficult to identify these loop-holes in the data, leading to an unduly optimistic view of the resulting predictive model.

3. The process data

CP save the manufacturing data into their quality system data base using a range of integrated data acquisition systems. At various stations in the mill, the paper properties are measured by gauges which continually traverse the moving paper web. These properties include basis weight, moisture content and caliper. Basis weight is the weight per unit area of a mass of paper; it relates to the mathematical product of the density and thickness of the paper being measured, and its units of measurement are grams per square meter (gsm). The scanner uses a radioactive source (Kr85) of beta rays and a detector/receiver to measure basis weight. Moisture represents the percentage of moisture content in the paper. Moisture measurement is carried out by an infra-red absorption method which uses the fact that water molecules absorb electromagnetic energy in different bands. Caliper represents the sheet thickness in microns. The sensor measures the thickness of a moving sheet by measuring the magnetic reluctance between upper and lower contacts of the sensors as they independently follow sheet contours. This project used data collected from the winder station (see Fig. 3).

Fig. 3. The winder station, where the paper reels are split into individual slices to match customer requirements. This is where the faults arise that are covered by this system.

The Supercalender scanner carries out a large number of scans in the cross direction for a full reel, taking 72 measurement points across the reel, which is approximately 8 m wide. For this application, the data collected from the scanner was averaged along the length of the reel, resulting in a set of 72 data points for each of caliper, basis weight and moisture.
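To make the data layout concrete, here is a minimal sketch of that averaging step in Python. The array shapes, scan count and values are assumptions for illustration; the paper does not specify the raw data format.

```python
import numpy as np

def average_profile(scans: np.ndarray) -> np.ndarray:
    """Average many cross-direction scans along the length of the reel,
    collapsing them into a single 72-point profile for one property."""
    assert scans.shape[1] == 72, "expected 72 measurement points per scan"
    return scans.mean(axis=0)

# Hypothetical example: 500 scans of caliper (microns) for one reel.
rng = np.random.default_rng(0)
caliper_scans = 55.0 + rng.normal(0.0, 0.5, size=(500, 72))
caliper_profile = average_profile(caliper_scans)
print(caliper_profile.shape)  # (72,) -- one value per cross-direction point
```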

The data from each reel could not be compared with the data from all other reels, due to the different grades of paper which are produced in the mill. It was necessary to group the historical data based on bands of paper grade to ensure that comparisons between good and defective reels were valid.
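A minimal sketch of that grouping follows; the band boundaries and the `grade` field are purely illustrative, since the paper does not state the actual grade bands used at CP.

```python
from collections import defaultdict

# Illustrative grade bands in gsm; the real boundaries are an assumption.
GRADE_BANDS = [(0, 54), (54, 60), (60, 70), (70, 90)]

def band_of(grade_gsm: float) -> tuple:
    """Return the grade band a reel falls into."""
    for lo, hi in GRADE_BANDS:
        if lo <= grade_gsm < hi:
            return (lo, hi)
    raise ValueError(f"grade {grade_gsm} gsm outside known bands")

# Group historical reels so good/defective comparisons stay within a band.
reels = [{"id": 1, "grade": 56.2}, {"id": 2, "grade": 65.0}, {"id": 3, "grade": 57.1}]
by_band = defaultdict(list)
for reel in reels:
    by_band[band_of(reel["grade"])].append(reel["id"])
print(dict(by_band))  # {(54, 60): [1, 3], (60, 70): [2]}
```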

4. Model development

Before work began on the development of the predictivemodel, time was spent performing data visualisation to

R. Milne et al. / Knowledge-Based Systems 11 (1998) 331–338334

Fig. 3. The winder station, where the paper reels are split into individual slices to match customer requirements. This is where the faults arise that are coveredby this system.

Page 5: Predicting paper making defects on-line using data mining

compare examples of good and bad profiles. Initially, thiswork was done using data which had been averaged alongthe entire length of the reels. It was clear that a lot of detailwould be lost from the data by doing this, but the IT infra-structure that was in place when the project started meantthat this was the most sensible place to start. Several monthswere spent doing visual and numeric comparisons betweengood and bad profiles, and although it was possible todeduce certain things about the differences based on theshape of the profiles, it was felt that the lack of a moreprecise method of comparing the profiles was limiting thework.

The next stage was therefore to obtain a data extract and import it into a data mining software tool to see what data mining could offer. Using the same data as had been used previously for the visual comparison work, work began on data mining using the ReMind software shell from CSI, a US company known for its work in case-based reasoning. ReMind is an MS Windows-based tool which supports induction for data mining. It was chosen partly because of the previous successful work done by Intelligent Applications using ReMind's inductive engine, and also because of its unique capabilities to support the programming of process and manufacturing knowledge to help guide the induction process. In effect, ReMind allows its users to enter qualitative information about the process being modelled (in this case the paper making process) to guide the induction. Based on the previous work done by IAL, this capability in the development tool can often make a significant difference to the accuracy of the resulting model.

Induction is straightforward to understand at the conceptual level [7]. In this context, induction analyses a number of examples of good and bad profiles, trying to determine which data attribute is most different between the good and bad examples. For example, it might determine that the difference in caliper between the edge of the reel and the next 10 adjacent data points shows the largest degree of deviation between good and bad reels. Induction would then split the data into two groups based on that attribute. It would then analyse both of these two groups and again try to find out which attribute is most significant within the sub-groups at separating out the good from the bad profiles. Again, each group would then be split further. This process continues until there are many groups, or clusters, which are deemed to be an optimal set of groupings. This is analogous to performing market segmentation for a bank, for example. The bank may have 3 million mortgage customers, but it knows that there are certain natural groupings of customers, within which there is a certain degree of similarity between customers.

The inductive process itself, once completed, can be represented by a decision tree, with splits in the tree being based on the values of attributes in the process data [8]. A partial example of such a tree is shown in Fig. 4.
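The following is a compact, hedged reconstruction of what such top-down induction looks like in code. It follows the generic recursive scheme just described, in the spirit of C4.5 [7], rather than ReMind's proprietary algorithm; the Gini impurity criterion, the stopping rule and all names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class Node:
    # Internal node: split on "feature <= threshold"; leaf: class counts.
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    counts: Optional[np.ndarray] = None  # [n_good, n_defective] at a leaf

def gini(y: np.ndarray) -> float:
    """Gini impurity of a 0/1 label vector (0 = good, 1 = defective)."""
    if len(y) == 0:
        return 0.0
    p = y.mean()
    return 2.0 * p * (1.0 - p)

def best_split(X: np.ndarray, y: np.ndarray):
    """Scan every attribute and candidate threshold; keep the split that
    best separates good from bad examples (lowest weighted impurity)."""
    best = (None, None, gini(y))
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            mask = X[:, f] <= t
            if mask.all() or not mask.any():
                continue
            score = mask.mean() * gini(y[mask]) + (1 - mask.mean()) * gini(y[~mask])
            if score < best[2]:
                best = (f, t, score)
    return best

def induce(X: np.ndarray, y: np.ndarray, min_leaf: int = 5) -> Node:
    """Recursively split the examples until the groups are pure or small."""
    f, t, _ = best_split(X, y)
    if f is None or len(y) < 2 * min_leaf or gini(y) == 0.0:
        return Node(counts=np.bincount(y, minlength=2))
    mask = X[:, f] <= t
    return Node(feature=f, threshold=t,
                left=induce(X[mask], y[mask], min_leaf),
                right=induce(X[~mask], y[~mask], min_leaf))

# Tiny usage example with synthetic data (features are illustrative only).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0.8).astype(int)
tree = induce(X, y)
```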

Fig. 4. The decision tree produced by the induction process using ReMind.

Looking at the left-hand side of the diagram, there is a data field called "m5-2" in the process data. If the actual value of this field in a new reel is less than or equal to 0.3, then the next data item to check is "m15-12". If the actual value in the new reel is greater than 0.1, then the next data item to check is "Mois STD". This process of walking through the decision tree continues until the end of a branch is reached. These "leaves" of the tree contain a group of historical reels which are predominantly either good or defective. The actual prediction for a new reel is based on the probability distribution within the final leaf in the tree that the prediction process ends up at.
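Continuing the sketch above (and reusing its Node class), classifying a new reel is a simple walk from the root to a leaf, with the prediction and its probability read off the leaf's class counts; this is an illustrative reconstruction, not ReMind's implementation.

```python
def predict(node: Node, x: np.ndarray):
    """Walk one new reel's attribute vector down the tree to a leaf;
    the prediction and its probability come from the leaf's class counts."""
    while node.counts is None:                   # descend until a leaf
        if x[node.feature] <= node.threshold:
            node = node.left
        else:
            node = node.right
    total = node.counts.sum()
    label = int(node.counts.argmax())            # 0 = good, 1 = defective
    return label, node.counts[label] / total     # leaf probability distribution

# e.g. label, prob = predict(tree, X[0])
```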

The diagram shows how induction uses various attributes in the data sets to discriminate between likely good and defective reels, and these are indicated in the boxes in the diagram. The decisions are actually based on whether the value of an attribute in a new reel is above or below a threshold value appropriate for that attribute. The threshold values themselves are automatically determined by the data mining software. This approach has the benefit that it automatically determines the relative significance of each attribute in the process data, based on the impact that each attribute has on the quality of the final reel. Additionally, the significance of any given attribute may not always be constant. A particular attribute may be more important in certain situations than in others. Therefore the significance of each attribute is dependent on the values of other attributes in the process data. The data mining technique which was used for this work is able to identify these contextual dependencies and generates different significance figures for each attribute based on the values of other attributes in the data.

Once completed, this tree can be used to classify new paper reels based on the attributes that the tree is built on (see Fig. 5). When the model goes on-line, the process data is analysed by the model by percolating it down through the tree as the paper is being made, and depending on which sub-group it ends up in, the model will indicate whether a paper defect is likely to develop or not.

Fig. 5. How the inductive filter segments the process data from many different paper reels into groups. Each of these groups has an outcome which determines whether a reel is predicted to be good, deviated or corrugated.

A further benefit of using induction for this work was that, when the final predictive model goes on-line, it can not only generate a prediction concerning likely defects, but also generate the probability of the prediction being correct. The probability is based on the probability distribution of good and bad reels in the final leaf that the model used to base its prediction on. This is very useful since it is possible to impose a threshold on the system so that only predictions which are more likely to be correct than, say, 85% should be acted upon by the operators.
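Such a threshold then reduces to a one-line filter on the leaf probability; the 85% figure below is the example given in the text, not a fixed system parameter.

```python
def actionable(label: int, prob: float, threshold: float = 0.85) -> bool:
    """Operators only act on defect predictions whose leaf-derived
    probability clears the confidence threshold."""
    return label == 1 and prob >= threshold
```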

The ReMind software has specific tools which allow the inductive process to be fine-tuned based on process and manufacturing knowledge. This can be very helpful where such knowledge exists. In this case, it was suspected that deviations in the profile near the edge of the reels were very significant, and so this information was used to guide and provide an initial focus for the inductive engine in ReMind.

The data had to be pre-processed in order to present the data mining software with an ASCII text file, which is the only format recognised by the Development Environment of ReMind. This was done using MS Access, although Lotus 123 or MS Excel would also be suitable. A range of pre-processed attributes were extracted from the data for each customer reel, including standard deviation, kurtosis and other statistics which were thought to have a bearing on what makes a bad profile. This kind of pre-processing is almost always very important for data mining work: unless a huge number of historical examples is available, the data mining tool may not be able to produce the best model possible from raw data alone. This process of using process knowledge and expertise to support induction leads to the methodology known as Knowledge-Guided Induction. It is a blend of automatic machine learning techniques and human expertise about the domain being analysed, and represents a significant step forward over normal induction, usually resulting in more accurate predictive models than might be possible using other automatic data mining techniques.
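A hedged sketch of that pre-processing step, using the 72-point profiles from earlier: standard deviation and kurtosis are named in the text, while the edge-region statistics are assumptions motivated by the suspected significance of the reel edge, and scipy's kurtosis is used purely for illustration.

```python
import csv
import numpy as np
from scipy.stats import kurtosis

def profile_features(profile: np.ndarray) -> dict:
    """Derive summary statistics from one 72-point profile."""
    return {
        "mean": float(profile.mean()),
        "std": float(profile.std()),
        "kurtosis": float(kurtosis(profile)),
        # Edge statistics (assumed): the first 10 points at the reel edge.
        "edge_std": float(profile[:10].std()),
        "edge_step": float(profile[0] - profile[1:11].mean()),
    }

def export(rows: list, path: str = "reels.csv") -> None:
    """Write one row per reel as a plain ASCII file for the mining tool."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
```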

The process of building the model is an iterative one, where an initial model is built using a sub-set of the historical examples and then tested using some of the other historical examples. By comparing what the model predicts would happen with what actually did happen with the historical test cases, it was possible to get a clear picture of the accuracy of the model. Additionally, it was possible to work out whether the errors were more or less equally distributed through each branch of the decision tree, or whether certain "branches" in the tree were responsible for a disproportionate number of mis-predictions. Once each branch was shown to be reasonably robust and correct, it would be frozen and work would begin on one of the other, less accurate branches. This process iterates round a few times, since making changes to one branch to improve it can result in an increased error rate for other branches which had previously been validated. This is where the bulk of the data mining effort was applied, and is still ongoing. A key to this development process is recognising that at some point enough of the branches will be accurate enough to enable the model to go live, and that if certain branches are not at the same level of accuracy, then any prediction which uses these branches can be either ignored by the operators, or displayed with an appropriate message to indicate that the operator should investigate further before taking corrective action based on the prediction.
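That evaluation loop can be sketched as a conventional holdout test, with each mis-prediction attributed to the branch (leaf) that produced it; the path-string leaf identifier is an illustrative device, and the Node class is reused from the induction sketch above.

```python
def predict_with_leaf(node: Node, x, path: str = ""):
    """Variant of the earlier tree walk that also returns an identifier
    for the leaf reached, so errors can be attributed to branches."""
    while node.counts is None:
        go_left = x[node.feature] <= node.threshold
        path += "L" if go_left else "R"
        node = node.left if go_left else node.right
    label = int(node.counts.argmax())
    return label, node.counts[label] / node.counts.sum(), path

def evaluate(tree: Node, X_test, y_test):
    """Holdout evaluation: overall accuracy plus mis-predictions per branch,
    so the least accurate branches can be targeted in the next iteration."""
    errors_by_leaf: dict = {}
    correct = 0
    for x, y in zip(X_test, y_test):
        label, _, leaf = predict_with_leaf(tree, x)
        if label == y:
            correct += 1
        else:
            errors_by_leaf[leaf] = errors_by_leaf.get(leaf, 0) + 1
    return correct / len(y_test), errors_by_leaf
```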

At the time of writing, the model is receiving further development effort to improve its accuracy, but it has produced the levels of accuracy summarised in Fig. 6.

Fig. 6. Summary of accuracy.

This means that 96% of the good reels were correctly predicted and 88% of the faulty reels correctly predicted (based on testing of 8640 reels). According to Roy McHattie, an Electrical/Instrument Superintendent at the mill:

These figures are very good given the quality of the data which is being processed. It is interesting to note that there are certain aspects of the predictive model which are counter-intuitive in that they indicate a likely defect based on data values which are not, on the surface, related to defects. Some of these relationships are particularly difficult to understand, but this can be useful since it can encourage a re-think of some of their previously held beliefs about the process itself.
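For concreteness, per-class rates of this kind are row-wise accuracies of a confusion matrix. The counts below are purely hypothetical (the paper reports only the two rates and the total of 8640 test reels, not the good/faulty split), chosen only to be consistent with those figures.

```python
# Hypothetical confusion matrix; rows sum to an assumed 8000 good / 640 faulty.
#                predicted good   predicted faulty
true_good   = [7680,             320]    # 7680/8000 = 96% of good reels correct
true_faulty = [77,               563]    # 563/640  ~= 88% of faulty reels correct

good_accuracy   = true_good[0]   / sum(true_good)
faulty_accuracy = true_faulty[1] / sum(true_faulty)
print(f"good: {good_accuracy:.0%}, faulty: {faulty_accuracy:.0%}")
```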

5. Putting the predictive model on-line

The management at the mill made the decision last year that the model was showing enough promise to commence the development of an interface between the model and the manufacturing process itself [9]. This was also a joint development effort between CP and IAL, since work was required to extract the process data from the mill's quality system data bases and download the process data to the PC running the model. Working together, the two companies defined the technical interface between the mill's VAX and the model which runs on a PC. CP developed the data download program, which acts as the data feed to the PC where the predictive model runs. IAL developed software to receive the process data and then feed it through the predictive model. They also developed an appropriate user interface which shows the operators which profile is currently being processed at any given time, what the model is predicting for that profile and what the probability of it being correct is. The operator is able to enter corrective actions, which can be applied in an attempt to prevent the defect from developing further. This is necessary for the ongoing monitoring of the model. Since the manufacturing process is dynamic, the accuracy of the model may degrade over time. By tracking the accuracy of the model's predictions, this can be checked and the model refined to compensate for any changes in the manufacturing process.
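The on-line side then reduces to a receive-predict-display-log loop. Everything below is an assumed sketch of the architecture just described, not the deployed software: a queue stands in for the VAX-to-PC data feed, and the function and field names are illustrative.

```python
import queue
import time

def online_loop(feed: "queue.Queue", model, log: list) -> None:
    """Receive profiles from the mill's data feed, run the predictive
    model, display the prediction with its probability, and log it so
    that the model's accuracy can be tracked over time."""
    while True:
        profile = feed.get()          # blocks until the next reel's data arrives
        if profile is None:           # sentinel: shut down cleanly
            break
        label, prob = model(profile)
        print(f"reel {profile['id']}: "
              f"{'DEFECT LIKELY' if label else 'ok'} (p={prob:.0%})")
        # Comparing logged predictions with actual outcomes later is what
        # allows drift in the manufacturing process to be detected.
        log.append({"id": profile["id"], "label": label, "prob": prob,
                    "time": time.time()})
```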

On a commercial note, the costs of putting the model on-line exceeded the costs of the model development. According to IAL, this is not uncommon. During the model development stage, the data mining work is usually performed off-line, working with data extracts. The technical issues related to interfacing the model in an on-line way depend on the existing IT systems that are already in place. These issues are to do with systems integration and are very different to the difficulties associated with the data mining task itself. However, the commercial risks associated with this type of work are minimised, since the accuracy of the model is known before the larger costs of model integration are incurred. The work on the integration does not commence until the customer knows how accurate, and therefore how cost-effective, the model is going to be.

6. Current project status

Today, the interface has been completed, and the model has commenced trials of running on a live basis as a decision support tool. Work to improve the accuracy of the model is continuing. The accuracy of the model did decrease when the system went live. This was expected, and was due to changes in the manufacturing process and also changes in the way that the data was measured between the time of the historical examples and the time the system went live.

When the system's credibility has been established in live operation, the intention is to provide it as a tool for operators.

7. Conclusion

The full benefit of using the system will come some time in the future, but more immediate benefits are in sight. CP is close to having a system which will support their manufacturing goal of reducing the level of salvage. The strategic objective of continuous improvement within CP will never disappear, but the development of this system has done the difficult job of identifying how to improve a process which already functions without error almost all of the time.

Lessons have been learnt during this work. The model development has taken longer than was originally expected. The testing and validation of the many models which have been produced during the iterative development cycles took longer than expected. This is largely due to the volume of data that was required to perform an adequate level of validation. The need to adopt a meticulous approach to documenting model development and validation also became apparent, since these tasks are so involved that it would have been easy to lose track of a few details found early on in the development process which proved very useful later.

This work is continuing. CP has plans to look further back down the process towards the Paper Machine itself. (Currently the model is interfaced to the manufacturing process at the Supercalenders.) It is more important that the model is accurate at the paper machine because, if the operator takes any corrective action based on what the model is telling him at the paper machine, it could have a very significant impact further down the line.

Finally, since the process data being analysed is based on an average of the entire length of the paper reels, work is commencing to look at using non-averaged process data. It is almost certain that a new model could be built using this non-averaged data which would result in significantly higher levels of accuracy than are currently possible.

References

[1] R. Milne, C. Nelson, Electrical drive diagnosis using case-based reasoning, in: IEE Colloquium on CBR: Prospects for Applications, London, UK, 1995.

[2] J. Kolodner, Case-Based Reasoning, Morgan Kaufmann, California, USA, 1993.

[3] P. Renoux, R. Milne, M. Drummond, Caledonian develops a new technique for predicting paper defects, Paper Technology 38 (10) (1997).

[4] U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, From data mining to knowledge discovery: an overview, in: U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, R. Uthurusamy (Eds.), Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, Cambridge, MA, 1995.

[5] K. Parsaye, M. Chignell, Intelligent Database Tools and Applications, John Wiley & Sons, 1993.

[6] W. Frawley, G. Piatetsky-Shapiro, C. Matheus, Knowledge discovery in databases: an overview, AI Magazine, Autumn 1992.

[7] J.R. Quinlan, C4.5: Programs for Machine Learning, Morgan Kaufmann, CA, 1993.

[8] C. Riesbeck, R. Schank, Inside Case-Based Reasoning, Lawrence Erlbaum, Hillsdale, NJ, 1989.

[9] R. Milne, C. Nelson, Using data mining to learn to be able to predict paper making defects on-line, in: Unicom Seminar, CBR and Data Mining: Putting the Technology to Use, London, UK, 18–19 March 1997.
