Automatic Characterization of Ignition Processes with Machine Learning Clustering Techniques

EDWARD S. BLUROCK

Combustion Physics, Lund University, SE-22100 Lund, Sweden

Received 12 March 2004; revised 8 July 2005; accepted 27 January 2006

DOI 10.1002/kin.20191. Published online in Wiley InterScience (www.interscience.wiley.com).

ABSTRACT: Machine learning clustering techniques are used to characterize and, after the training phase, to identify phases within an ignition process. For the ethanol mechanism used in this paper, four physically identifiable phases were found and characterized: the initiation phase, preignition phase, ignition phase, and the postignition phase. The clustering is done with respect to fuzzy logic predicates identifying the maxima, minima, and inflection points of the species profiles. The cluster descriptions characterize the phases found and are in human interpretable form. In addition, these descriptions are powerful enough to be used to predict the phase structure under new conditions. Cluster phases were calculated for the ethanol mechanism at an equivalence ratio of 0.5, a pressure of 3.4 bar, and the temperatures 1200, 1300, 1400, and 1500 K. The resulting cluster phase descriptions were then successfully used to predict the phase structure and ignition delay times for other temperatures in the range from 1200 to 1500 K. The effect of different fuzzy logic predicate profile descriptions is studied to emphasize that the boundaries of some phases, specifically that between the preignition and the ignition phase, are a matter of what the modeler considers important. The end of the ignition phase corresponds to the ignition delay time and was relatively independent of the predicate descriptions used to determine the phases. © 2006 Wiley Periodicals, Inc. Int J Chem Kinet 38: 621–633, 2006

INTRODUCTION

An ignition process of hydrocarbon fuels, such as that occurring in a constant volume bomb reactor, involves a complex interaction of many species and reactions [1]. Moreover, the process itself is not static, and at different times within the combustion process different sets of species and reactions are important. Different mechanisms are at work during different phases of the ignition process. Identifying and characterizing these phases provides more fundamental understanding of the process as a whole. This work tries to formalize and automate the intuitive process of an experienced chemist by reproducing, through a software implementation, the reactive phases normally identified by hand.

Correspondence to: Edward S. Blurock; e-mail: [email protected]

As the mechanisms for simulations become more complex, the amount of information that has to be analyzed increases and processing the data by simply “looking at” the results becomes prohibitive. Computational tools to automate processing this expanse of data are becoming a necessity. In addition, automation brings with it two significant advances to the analysis process. First, by automating an analysis procedure, an exact definition of the criteria used is set down and ambiguities are avoided. Second, the automated analysis can be built into larger procedures and can be used to make decisions in complex simulations.

In a previous paper [2], machine learning clustering techniques [3,4] were used to characterize the reactive phases within a simple (36 reactions) aldehyde combustion mechanism [5]. In this previous study, the mechanism was characterized by the sensitivity [6] of each reaction. A set of physically identifiable phases for a single zero-dimensional simulation was found. Both the time range and the set of reactions which characterize this range were found.

This paper extends this previous work with a high temperature ethanol mechanism [7] (57 species and 372 reactions) using a fuzzy logic [8,9] description of the extreme points of the species profiles to characterize the mechanism. In addition, the phases are characterized from a small set of temperature conditions and then used to predict phases for any temperature within a range.

In the first sections of this paper, the technique of deriving phases by clustering is explained, first in a general way and then in its specific application to the high temperature ethanol mechanism. This is followed by two sets of results: first the predictive effectiveness of the cluster phase descriptions is demonstrated, and then the effect of different fuzzy logic descriptions on the quality of the derived phases is examined.

To avoid confusion between the objects of the mathematical clustering method and the physical reality described, the term “cluster” will be used when referring to the result of the calculation and “phase” will be used to describe the correspondence to physical reality. The term “species profiles” will be used to describe the species mass fraction values versus time curves, the “raw” input to the clustering method.

IGNITION PHASES

For the combustion of fuels, such as an ethanol and oxygen mixture at an elevated temperature, one of the pronounced events in a constant volume bomb (which is being modeled in this study) is the rapid rise of temperature and pressure corresponding to autoignition. Coinciding with this event is also the rapid change of intermediate radical species concentrations. The time of the event after the initial introduction of the fuel and oxidant is the ignition delay time. Through examination of the species profiles, this ignition event can be readily identified by an experienced chemist.

An experienced chemist also recognizes that a combustion process goes through different mechanistic phases before and after this readily identifiable point. Given the species profiles and some mechanistic intuition of the process, different mechanistic phases can be identified. A chemist can look at the profiles of key species (known through mechanistic knowledge of the system) and identify the phase through which a combustion process is going. To do this, the chemist sets up criteria, based on species profiles, to establish bounds within the process to that phase. A phase has a certain set of recognizable characteristics that are true throughout the phase. In other words, these predefined significant criteria are the same or at least similar throughout the phase.

Clustering, by definition, groups together sets of similar objects. If the objects are snapshots of a combustion process over time, then combustion phases formed by clustering would be sets of snapshots having similar characteristics. If the set of characteristics encompasses the types of descriptions used by the chemist, then it is to be expected that the sets of phases determined would not significantly differ from those deduced by hand. In this paper, descriptions of the species profiles that are readily interpretable by the chemist are set up. Thus the cluster descriptions derived automatically can be tested against intuition. Furthermore, these quantitatively and qualitatively valid descriptions are tested further by their ability to predict the phases in “new” combustion processes, i.e. ignition processes with different starting conditions from those from which the clusters were derived.

This paper expands upon the basic idea presented in the previous paper [2] on clustering in two ways. First, the descriptions used are more intuitive descriptions of the species profiles and can thus readily be associated with the intuition of the chemist. The phases identified automatically can be easily associated with a physical interpretation. Second, these descriptions are used to predict the phases in processes started under different conditions. Successful prediction implies that the essential information was extracted from the raw data and reasonable general principles were established.

PROCESS PHASE DESCRIPTIONS WITH CLUSTERING

A key step in optimizing a process is to understand and characterize its inner workings as precisely as possible. Within a process over a continuous coordinate such as time, an important characterization involves knowledge about the phases through which the process passes. Not only at what point in time these phases occur is important, but also a description, through the process parameters, of how these phases differ is useful.

The field of machine learning and statistics offers a wide range of tested techniques to analyze and characterize processes [10]. One advantage of using such techniques is that they automate the analysis. They are, other than the choice of input to these methods, free of human bias. The consequence is that these methods are free not only to solidify, through sound quantitative analysis, the intuition that the expert already has, but also, through their unbiased approach, they can be used to identify new, possibly unnoticed, characteristics of the process.

One successful analysis method, applied primarily to the classification of individual objects, is clustering [11,12]. This class of methods “characterizes” a set of objects by grouping them into sets where all the objects within the set are “similar.” What constitutes similarity depends on what parameters are used to characterize the objects. If the parameters describe behavior, then the objects within the set will have similar behavior. If the parameters describe the physical properties of the object, then the objects within the set would be similar in, for example, size, shape, and color.

Clustering finds similarities between sets of “objects.” If we now define these objects as “states” within a process, then the clustering would yield sets of process states with similar characteristics. In a time-dependent process, these states would be the description of the process at a particular point in time. The state descriptions as used here have no knowledge as to the time of their occurrence. There is no knowledge (and hence no bias) that the process varies continuously from one state to the next. That the “desired” clusters should involve states that are next to each other in time, forming phases, is not biased by a time parameter within the description. However, since the states describe a continuous process, two adjacent states (in time) do not differ very much (the definition of continuity). Adjacent states are, in fact, similar. This similarity would tend to put them in the same cluster. Differentiation would occur when, as more states are put in a cluster, the difference between the first state and the last state becomes too large. At this point a new cluster would start. Hence, the intuitive notion of phases or regions within a continuous process can be quantitatively justified with clustering.

Among the available clustering methods, a variant of COBWEB [3,4] (implemented [13] by the author with a fuzzy logic extension) was chosen because of several important properties. The first is that the number of clusters need not be specified and is determined dynamically. This means that there is no additional bias of “forcing” a certain number of process phases or regions. The second is that it produces an arbitrarily complex hierarchy of clusters starting with a few “general” clusters and proceeding down to, if desired, the individual states themselves. This means that a range of process phases is determined, in which general phases are given at the upper levels and these are differentiated to finer phases in the lower levels. A third property is that the algorithm is “incremental,” meaning that the entire set of states need not be specified at the beginning of analysis. In this way, not only can finite processes (in which there is a definite beginning and end to the process) be characterized but also continuous (in principle nonending) processes.

PROCEDURE

Detecting combustion phases within a process through clustering essentially starts with the prepared combustion species profile data (calculated here using the Lund IGNITION system [14]) of a constant volume zero-dimensional calculation (energy and mass conserved). Each object is a vector of species mass fraction values for each combustion time. A description of each object is created based on (possibly fuzzy logic) predicates about the local behavior of the profiles. The method then groups these into clusters. The set of combustion times grouped together represents the phase of the combustion process. The procedure to detect phases within a continuous process has essentially five steps:

Ignition Profiles. Model calculations are made of a zero-dimensional ignition calculation. The result of each calculation is a set of vectors, one for each time period, having the mass fraction values of each species at that time. A calculation is defined by its initial conditions.

Derivatives. In order to characterize the form of the continuous curves more exactly, the derivatives of each data point are calculated. The characterization of one data point then consists of the value and its first and second derivatives.

Predicates. The clustering method does not use the values directly, but predicates of the values. In this paper, basically very simple (fuzzy) predicates are used, namely greater than, equal to, and less than zero.

Clustering. With the selected predicates, the clustering procedure creates a hierarchy of clusters. Associated with each cluster is a set of ordered descriptions (the predicates over the parameters) and subsets (one for each cluster) of the points.

Ranges. For each cluster, the ranges (over the continuous parameter) are determined. The selection criteria are the minimum number of points allowed in a range, how large “gaps” (missing points) within a range can be, and when a cluster divides into two ranges.

Ignition Profiles

Species, temperature, and pressure profiles within a thermal ignition process can be approximated with a homogeneous (zero-dimensional) model where the conservation of energy and mass is observed. The source term kinetics is represented with Arrhenius expressions. In this study, the ethanol mechanism of Marinov [7] is used. The calculations were done assuming adiabatic and constant volume conditions. The initial temperature of each calculation ranged from 1200 to 1500 K. The initial pressure was always 3.4 bar, and the initial equivalence ratio was always 0.5 (as in Fig. 12 of the Marinov paper [7]). The result of the calculation is the species composition, the temperature and the pressure evolution over time. For the calculations done in this paper, the output of a single calculation is a matrix of 200 vectors, one vector representing the composition at a given time. Each vector represents the “state” of the process at that given time (under the given starting conditions). The time points are not evenly spaced; their density is determined by how fast the species profiles are changing. It is a natural consequence of the numerical integration used to calculate the ignition process. The highest density of time points is during ignition. It is important for the clustering process that the species mixture fraction vectors do not change significantly from one time step to the next.

Derivatives

Each point in time has associated with it a set of parameters describing the “state” of the process at that point. Though ANALYSIS [13] allows more general descriptions, most commonly this is a vector of floating point numbers. In the ignition process modeling, the parameters are the mixture fractions of each species at that point in the reaction process. This vector represents the “state” of the process at that time.

When examining a curve, such as species mixture fraction versus time in an ignition process, often general information about the curvature or “shape” of the profile, i.e. whether it has maximum, minimum, or inflection points, whether it is increasing or decreasing and where these occur, is important. For example, in an ignition process, ignition itself is characterized by the rapid rise or decline of intermediate species mixture fractions. The actual values of the maximum can be of secondary importance, especially in determining when the processes start and stop. For this reason, this study utilizes differentials, first and second, of the species profiles as fundamental parameters.
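The text does not say how the derivatives are evaluated numerically. As a minimal sketch, assuming three-point finite differences on the unevenly spaced time grid (the function name `finite_differences` and the end-point handling are illustrative choices, not taken from the paper), the first and second derivatives of one species profile could be estimated as follows:

```python
def finite_differences(times, values):
    """Estimate first and second derivatives on an unevenly spaced time grid.

    Interior points use the standard three-point formulas for non-uniform
    spacing. The two end points copy their nearest interior neighbour
    (an assumption made only to keep the sketch short).
    """
    n = len(times)
    if n < 3 or len(values) != n:
        raise ValueError("need at least three matching time/value points")
    first = [0.0] * n
    second = [0.0] * n
    for i in range(1, n - 1):
        h_left = times[i] - times[i - 1]     # spacing to the previous point
        h_right = times[i + 1] - times[i]    # spacing to the next point
        y_prev, y_here, y_next = values[i - 1], values[i], values[i + 1]
        first[i] = (
            -h_right / (h_left * (h_left + h_right)) * y_prev
            + (h_right - h_left) / (h_left * h_right) * y_here
            + h_left / (h_right * (h_left + h_right)) * y_next
        )
        second[i] = 2.0 * (
            y_prev / (h_left * (h_left + h_right))
            - y_here / (h_left * h_right)
            + y_next / (h_right * (h_left + h_right))
        )
    first[0], second[0] = first[1], second[1]
    first[-1], second[-1] = first[-2], second[-2]
    return first, second


# Toy profile y = t^2 on an uneven grid; the formulas are exact for quadratics.
toy_times = [0.0, 0.1, 0.3, 0.6, 1.0]
toy_values = [t * t for t in toy_times]
toy_first, toy_second = finite_differences(toy_times, toy_values)
```

Because the integrator places more time points where the profiles change quickly, formulas that account for unequal spacing are a natural fit here; a uniform-grid difference would distort the derivatives around ignition.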

Predicates

The determination of the cluster phases can be thought of as a decision process, i.e. is a given point within a cluster or not. In making this “decision,” questions can be asked about the species profiles. In fact, these questions can be quite general, such as is the profile increasing or is it around a maximum. More specific information, such as the rate of increase in a given value, can be more detailed than needed. Asking such general questions is the more “intuitive” method of examining a curve, i.e. as if one is just “looking” at it.

In the procedure used here, the questions are in the form of “predicates” such as “equal to,” “less than,” and “greater than.” The characteristics of the profile shown in Fig. 1 are determined with predicates with respect to derivatives. For example, “negative slope” is equivalent to the first derivative being less than zero and “maximum” is equivalent to the first derivative being zero and the second derivative being negative. The clustering phases are thus characterized by descriptions such as, for example, one set of species profiles is increasing and another is decreasing in mixture fraction. Such a description is close to the intuitive notions for describing species profiles. This makes the final clustering results derived from these predicates easily interpretable.

Figure 1 The values and the derivatives give the important characteristics of the curve. By simply examining whether the first and second derivatives are greater than, equal to, or less than zero, the essential properties of the curvature are found.

Figure 2 The fuzzy membership function “equal to” zero defines a “region” (of decreasing importance as one moves away from zero) around the zero value.
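The mapping from derivative signs to shape descriptions can be illustrated with crisp (non-fuzzy) predicates. This is only a sketch: the function names and the label strings are hypothetical, and the optional `tolerance` is an added convenience rather than part of the paper's procedure.

```python
def sign(value, tolerance=0.0):
    """Return +1 or -1, or 0 when |value| <= tolerance."""
    if value > tolerance:
        return 1
    if value < -tolerance:
        return -1
    return 0


def describe_shape(first, second, tolerance=0.0):
    """Name the local shape of a profile from its first and second derivatives."""
    slope = sign(first, tolerance)
    curvature = sign(second, tolerance)
    if slope == 0:
        # Stationary point: the sign of the second derivative decides its kind.
        return {-1: "maximum", 1: "minimum", 0: "flat"}[curvature]
    label = "positive slope" if slope > 0 else "negative slope"
    if curvature == 0:
        # Second derivative zero marks an inflection point.
        label += ", inflection point"
    return label
```

With exact comparisons a sampled profile almost never hits a derivative of exactly zero, which is the practical motivation for the fuzzy predicates discussed next.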

Since the maximum, minimum, and inflection points are single points in space, they would rarely be captured in the predicate description (numerically, the discrete time points the algorithm is working with may not fall exactly on these points). For this reason “fuzzy” predicates, as in Fig. 2, are used to allow values that are “close to” these points. The fuzzy predicate is a function which gives a number between zero, which corresponds to absolutely false, and one, which corresponds to absolutely true. The values in between represent how close the value is to true or false. Completely undecided is halfway between true and false, 0.5 (see Fig. 3). In addition, the use of fuzzy predicates gives a smooth transition between the boundaries and thus no “sharp” transitions are encountered. The gradual transition, introduced by the fuzzy predicate functions, also transfers some of the “continuous” character in the input parameter.

Figure 3 The fuzzy predicate description for the fuzzy predicates equal to, greater than, and less than. The “width” of the fuzzy functions is 0.1. The value of each of the predicates is shown for −0.08. The input −0.08 is “barely” in the range of the equal to predicate giving a value of 0.2. The number is definitely not greater than zero, so the greater than predicate gives a value of 0.0. The number is negative, but not very large, so the less than predicate gives a value of 0.8. Note that the sum of all three predicates is always one. This is important for the “fuzzy counting” property.

Though the implementation of the procedure allows more general fuzzy predicates, in the experiments, simple classical ramp functions are used. In transforming the data, only the boundary, i.e. the position of the ramp, and how smooth the transition is, i.e. the slope of the ramp, need be specified (see Fig. 2).
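A minimal sketch of such ramp predicates is given below. The linear ramp shape is an assumption inferred from the values quoted in the Fig. 3 caption (width 0.1, input −0.08 giving 0.2, 0.0, and 0.8); the function names are illustrative and are not those of the ANALYSIS implementation.

```python
def equal_to_zero(x, width=0.1):
    """Membership 1 at x == 0, falling linearly to 0 at |x| >= width."""
    return max(0.0, 1.0 - abs(x) / width)


def greater_than_zero(x, width=0.1):
    """Membership 0 for x <= 0, rising linearly to 1 at x >= width."""
    if x <= 0.0:
        return 0.0
    return min(1.0, x / width)


def less_than_zero(x, width=0.1):
    """Membership 0 for x >= 0, rising linearly to 1 at x <= -width."""
    if x >= 0.0:
        return 0.0
    return min(1.0, -x / width)


def fuzzy_partition(x, width=0.1):
    """All three memberships for one value; by construction they sum to one."""
    return {
        "equal": equal_to_zero(x, width),
        "greater": greater_than_zero(x, width),
        "less": less_than_zero(x, width),
    }


# The case quoted in the Fig. 3 caption: approximately 0.2, 0.0 and 0.8.
example = fuzzy_partition(-0.08)
```

The single `width` parameter plays the role of the ramp slope: a smaller width makes the predicates approach the crisp comparisons, while a larger width widens the region treated as "close to" zero.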

Clustering

The clustering method used in this paper is based on the COBWEB method of Fisher [3,4]. The implementation used extended the original formulation with fuzzy logic. The characteristics of this method that are important to data analysis are as follows:

Unsupervised. This means that no knowledge of a goal result, the time of the point or the phases of the process, is given as input to the method. The goal results are determined completely by the parameters and, if these are chosen properly, the goal results will reflect what “intuitively” is desired.

Flexible Number of Clusters. The number of clusters is not specified as input to the procedure. Clusters are formed as needed by the data, i.e., when a point is different enough, it gets assigned to a new cluster, otherwise it goes in an existing one. In this respect, the number of clusters is also unsupervised.

Hierarchical. Clustering, which is synonymous with similarity, is not a flat concept, but one of differing degrees. In this cluster implementation, a hierarchy of similarity clusters is created. The depth of the hierarchy can go down to the most detailed level, i.e., the individual objects.

Incremental. Each point is given individually to the method and placed appropriately in the hierarchy. The consequence of this is that the whole set of points need not be present during the analysis. In principle, this means that the clusters can be formed “dynamically,” as the points are “generated” (which could be useful for “nonending” processes and dynamic analysis).
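The text does not state the criterion by which a point is judged "different enough." In Fisher's original COBWEB, candidate placements are compared with the category utility of the resulting partition. The sketch below computes that score from fuzzy counts; it is an illustration of the general idea only, the data layout (one dictionary of memberships per attribute) is an assumption, and it should not be read as the author's fuzzy extension.

```python
from collections import defaultdict


def value_probabilities(objects):
    """P(attribute = value): memberships are summed, then divided by the object count."""
    totals = defaultdict(float)
    for obj in objects:
        for attribute, memberships in obj.items():
            for value, degree in memberships.items():
                totals[(attribute, value)] += degree
    return {key: total / len(objects) for key, total in totals.items()}


def category_utility(clusters):
    """Category utility of a partition given as a list of non-empty object lists."""
    all_objects = [obj for cluster in clusters for obj in cluster]
    # Expected score when guessing attribute values without knowing the cluster.
    baseline = sum(p * p for p in value_probabilities(all_objects).values())
    score = 0.0
    for cluster in clusters:
        weight = len(cluster) / len(all_objects)
        within = sum(p * p for p in value_probabilities(cluster).values())
        score += weight * (within - baseline)
    return score / len(clusters)


# Toy states with one attribute: is the CO profile rising or falling?
rising = {"d(CO)/dt": {"greater": 1.0, "less": 0.0}}
falling = {"d(CO)/dt": {"greater": 0.0, "less": 1.0}}
separated = category_utility([[rising, rising], [falling, falling]])
mixed = category_utility([[rising, falling], [rising, falling]])
```

A partition that separates rising from falling states scores higher than one that mixes them, which is the sense in which such a score rewards clusters whose members share predicate values.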

Fuzzy Counting

To count how many objects have a particular property, only the number of objects which are “true” for that property need be counted. To count with respect to fuzzy predicates, only the values of the predicates need to be added up. When absolutely true, a one is added and when absolutely false, a zero is added (not changing the count).

A fundamental step in determining whether a given point belongs in a cluster is the calculation of probabilities, specifically the probability that a given predicate is true (or not). These are calculated by simple counting of how many times a predicate is true or false for a point (over the total number of points). The use of fuzzy logic allows partial counting, i.e. one point can be “partially” true (in the sense of membership) for one predicate and partially true for its complement. This means that it would be “partially” counted in the probabilities. However, the sum of partially true and partially false has to be one.

For example, in the second derivative predicate of Fig. 3, a value of −0.08 produces “fuzzy” values for the “equal to,” “greater than,” and “less than” predicates of 0.2, 0.0, and 0.8, respectively. In the count of the number of objects which are, for example, “less than” zero, this object (representing “sort of”) would contribute 0.8 to the count. In other words, it is not clearly less than zero, so the fuzzy value is not one, but it is far enough from zero to contribute something close to one, i.e. 0.8. The sum of the values is always one.
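As a sketch of this partial counting, the snippet below sums memberships over a small set of made-up second derivative values and converts the counts to probabilities. The ramp shape and the width of 0.1 follow the Fig. 3 caption; the helper names and the sample values are illustrative assumptions.

```python
def ramp_partition(x, width=0.1):
    """Memberships for 'equal to', 'greater than' and 'less than' zero (sum to one)."""
    equal = max(0.0, 1.0 - abs(x) / width)
    rest = 1.0 - equal
    return {
        "equal": equal,
        "greater": rest if x > 0 else 0.0,
        "less": rest if x < 0 else 0.0,
    }


def fuzzy_counts(values, width=0.1):
    """Add up memberships instead of counting crisp true/false outcomes."""
    counts = {"equal": 0.0, "greater": 0.0, "less": 0.0}
    for x in values:
        for name, degree in ramp_partition(x, width).items():
            counts[name] += degree
    return counts


def fuzzy_probabilities(values, width=0.1):
    """Fuzzy counts divided by the number of objects."""
    counts = fuzzy_counts(values, width)
    return {name: count / len(values) for name, count in counts.items()}


# Made-up second derivative values; the first is the -0.08 example of Fig. 3.
sample = [-0.08, -0.5, 0.3, 0.0]
counts = fuzzy_counts(sample)        # roughly equal 1.2, greater 1.0, less 1.8
probabilities = fuzzy_probabilities(sample)
```

Because each object distributes exactly one unit of membership over the three predicates, the counts add up to the number of objects and the probabilities add up to one.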

Ranges

The clustering procedure has no knowledge of the combustion time at which the object occurred. It could happen that there are phases of similarity, hence clusters, that are separated by other activities, i.e. differing behavior that would result in the point ending up in another cluster. This means that the clusters have to undergo a further procedure to separate the different contiguous phases of similarity.

From the cluster, ranges of points in the continuous variable must be extracted because the cluster itself is only a set of points. The continuity of these individual points, forming phases, has to be determined via the continuous variable, in this case the combustion time. “Valid” ranges are determined by an implemented procedure based on two user-defined criteria:

Minimum Length. This is the minimum number of data points within a range. Some clusters have very few points (some are even isolated points), and thus it makes no sense to consider these as ranges. The requirement that the predicates come in sets of “partitions” ensures that the total number of times the point is counted is exactly one.

Maximum Gap. A few consecutive points missing (as compared to the original full set) in the cluster of points may not destroy the continuity of the range as a whole represented by that cluster (see Fig. 4).

Figure 4 The determination of ranges of data points is done by recognizing sequences of consecutive assigned cluster points with a minimal number of points that are not assigned. In this schematic, the vertical bars represent points assigned to a cluster and the dots are those not assigned. The sequences, or phases, correspond to sets of cluster assigned points with a minimal number of unassigned points interrupting them.

Using these criteria in the formation of ranges from a single cluster of points has the following consequences for interpreting where the ranges are and whether they exist (see Fig. 4):

Too Small. These ranges are too small, i.e. not enough points, to be considered “consecutive,” so they are discarded.

Range with Gaps. A contiguous range of points with only a few missing points is still considered a valid range.

Separate Ranges. If too large a gap is encountered, but beyond the “gap” the criteria of a valid range are met, then two separate ranges are produced. The first range ends at the beginning of the gap, and the next begins after the gap.

In the implementation, the only requirement of the continuous variable is that there is a strict ordering defined through a “greater than” relation. The entire set of points is ordered according to this relation. The continuity and missing points within a subset of points within a cluster are determined by comparison with these ordered points.
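The range criteria can be sketched as a single pass over the ordered points. The function name `extract_ranges`, its parameter names, and the default thresholds are hypothetical; the paper only states that the minimum length and the maximum gap are user-defined.

```python
def extract_ranges(assigned, min_length=3, max_gap=1):
    """Find contiguous ranges of cluster membership along the ordered points.

    assigned: booleans in time order, True where the point belongs to the cluster.
    Returns inclusive (start, end) index pairs. A run may contain gaps of up to
    max_gap unassigned points; runs with fewer than min_length assigned points
    are discarded.
    """
    ranges = []
    start = None
    last_hit = None
    for index, hit in enumerate(assigned):
        if not hit:
            continue
        if start is None:
            start = last_hit = index
        elif index - last_hit - 1 <= max_gap:
            last_hit = index                  # small gap: the range continues
        else:
            ranges.append((start, last_hit))  # gap too large: close the range
            start = last_hit = index
    if start is not None:
        ranges.append((start, last_hit))

    def member_count(bounds):
        return sum(1 for hit in assigned[bounds[0]:bounds[1] + 1] if hit)

    return [bounds for bounds in ranges if member_count(bounds) >= min_length]


# "1" marks a point assigned to the cluster, "0" an unassigned point.
pattern = [ch == "1" for ch in "1101110001100001111"]
found = extract_ranges(pattern, min_length=3, max_gap=1)
```

In this toy pattern the first run survives its single missing point (range with gaps), the two-point run in the middle is discarded (too small), and the long gap before the final run produces a separate range. Only the ordering of the points is used, in line with the requirement stated above.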

PHASE DESCRIPTION AND PREDICTION

Each cluster, representing a combustion phase, has essentially two sets of information. The first is the subset of “training” data which has been assigned to each cluster group. Each vector is associated with a given combustion time (from the original combustion calculation). The set of combustion times of the cluster represents the range of combustion times that the phase represents.

The second set of information is the cluster description. This represents the essential properties of the cluster. This description can also be used to “predict” in which cluster (representing a combustion phase) a particular mixture fraction vector (even one not in the original training set) belongs. A “new” mixture fraction composition can be compared to each cluster description. The mixture fraction composition belongs to the cluster with the closest description. In this way, the cluster description can be used to predict the phase of any given mixture fraction composition.

To illustrate the prediction aspect of the cluster description, suppose a set of cluster phases is determined using the combustion calculation of an ethanol mixture with an equivalence ratio of 0.5 at a pressure of 3.4 bar and a starting temperature of 1300 K. From the calculation, a vector of species mixture fractions at each combustion time is determined (typically this is a matrix of 200 vectors). From these “raw” data, the first and second derivatives are determined and then fuzzy predicate values (greater than and less than) are determined. At a given time, the “state” of the system is then represented by a vector of fuzzy true and false values (a number from 0 to 1). From these data, a set of phase clusters with corresponding cluster descriptions is formed. The resulting cluster descriptions can then be used to predict where the corresponding phases would appear in combustion processes under different starting conditions. For example, another calculation can be done under the same conditions, except with a starting temperature of 1350 K. From the new calculation, a set of mixture fraction vectors for each combustion time is determined. Each of the derived predicate value vectors is compared to the cluster descriptions and assigned to the closest. When all of the vectors for each combustion time have been assigned, the phases under these new conditions can be defined. This is done in the same way as with the original set; the mass fraction vectors and the combustion times of the vectors assigned to a particular cluster define the phase. The only difference is that with the “new” set of vectors, the cluster description and cluster structure are not updated.
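The prediction step can be sketched as a nearest-description lookup. The text does not define how "closest" is measured, so the mean absolute difference used below is purely an assumption, as are the function names, the predicate labels, and the toy numbers; each cluster description is represented here as the typical fuzzy truth value of each predicate.

```python
def closest_cluster(state, descriptions):
    """Label of the description nearest to one vector of fuzzy predicate values."""
    def distance(description):
        # Assumed measure: mean absolute difference over the described predicates.
        return sum(abs(state[name] - description[name]) for name in description) / len(description)

    return min(descriptions, key=lambda label: distance(descriptions[label]))


def predict_phases(times, states, descriptions):
    """Assign every state of a new run to a fixed cluster and report each phase's time span."""
    assigned_times = {}
    for time, state in zip(times, states):
        label = closest_cluster(state, descriptions)
        assigned_times.setdefault(label, []).append(time)
    return {label: (min(ts), max(ts)) for label, ts in assigned_times.items()}


# Toy descriptions and a toy "new" run (hypothetical values, not from Table I).
toy_descriptions = {
    "ignition": {"co_rising": 1.0, "o2_concave": 1.0},
    "postignition": {"co_rising": 0.0, "o2_concave": 0.2},
}
toy_times = [1.0e-4, 2.0e-4, 3.0e-4, 4.0e-4]
toy_states = [
    {"co_rising": 0.9, "o2_concave": 1.0},
    {"co_rising": 0.8, "o2_concave": 0.9},
    {"co_rising": 0.1, "o2_concave": 0.0},
    {"co_rising": 0.0, "o2_concave": 0.1},
]
toy_phases = predict_phases(toy_times, toy_states, toy_descriptions)
```

Note that the descriptions are only read, never modified, which mirrors the statement that the cluster description and structure are not updated for the new set of vectors.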

The correct assignment of clusters to each combustion time vector and hence the correct prediction of the reaction phases has the prerequisite that the general mechanistic behavior is the same for the original set that was used to form the clusters and the “new” set where the clusters are to be assigned. For example, if the starting conditions of the original combustion calculations and the new combustion calculations are similar (for example, similar starting temperatures), it can be expected that the mechanistic behavior will also be similar.

REACTION PHASES OVER A TEMPERATURE RANGE

In order to increase the range of validity of the clusters and hence the description of the reaction mechanics, instead of basing the cluster formation on a single combustion run with a single set of starting conditions (as was done in the previous paper), the clusters can be formed using a set of combustion runs with a set of starting conditions. The fundamental assumption is that an ignition process has distinct mechanistic properties regardless of starting condition. The similarity of the reaction phases is captured by the clustering. Different mechanistic phases are captured by the formation of different clusters.

To form the clusters, all mass fraction vectors from all of the runs are used. The clusters are formed purely from the mass fraction descriptions. The combustion time, information which is different between runs with different starting conditions, is not used. For example, two similar mass fraction compositions originating from two combustion runs of different starting temperatures would occur at different times. That with the higher starting temperature would be expected to occur earlier because the combustion process proceeds faster at a higher temperature.

To illustrate, the automatic recognition of reaction phases over a temperature range of 1200–1500 K is studied. Four independent combustion runs are made: 1200, 1300, 1400, and 1500 K. Each calculation produces 200 mass fraction vectors, one for each time step. To produce the cluster descriptions, the entire set of 800 vectors is used. These 800 vectors should be representative of the cumulative behavior over this temperature range. As explained in the procedure section, the clusters were formed using first and second derivative information and predicates about this information.
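Pooling the runs can be sketched as follows. The data layout (a dictionary from starting temperature to a list of time and state pairs) and the function name are assumptions made for illustration; the placeholder states stand in for the real predicate vectors.

```python
def pool_runs(runs):
    """Combine several runs into one training set that carries no time information.

    runs: dict mapping starting temperature (K) to a list of (time, state) pairs.
    Returns (pooled_states, provenance). Only pooled_states would be given to the
    clustering; provenance is kept aside so that cluster members can later be
    mapped back to the combustion times of their run.
    """
    pooled_states = []
    provenance = []
    for temperature, run in sorted(runs.items()):
        for time, state in run:
            pooled_states.append(state)
            provenance.append((temperature, time))
    return pooled_states, provenance


# Four toy runs of 200 placeholder states each, giving 800 pooled vectors.
toy_runs = {
    temperature: [(step * 1.0e-6, [0.0]) for step in range(200)]
    for temperature in (1200, 1300, 1400, 1500)
}
pooled, origin = pool_runs(toy_runs)
```

Keeping the time and starting temperature outside the state vectors is what allows similar compositions from different runs, which occur at different times, to fall into the same cluster.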

Four cluster phases were found with this analysis. The phases roughly correspond to an “initiation” phase, a “preignition” phase, an “ignition” phase, and a “postignition” phase. The predicted phases derived from the fixed cluster descriptions are shown in Table I.

Comparing the cluster descriptions in Table I with the mole fraction profiles (shown in Fig. 5 for calculations at a starting temperature of 1300 K) justifies the “reasons” for forming the clusters at the times found.

The most marked phase boundary is the ignition delay time. The profiles for the species CH3HCO, CH2O, and CH4 exhibit a dramatic decrease up to the boundary.


Table I Relative Size and Significant Descriptors for Each Phase^a

Initial Phase (18.5%)

| Predicate Description | Measure | True (%) |
| --- | --- | --- |
| d²(O2)/dt² < 0 | 0.482 | 99 |
| d(CH2O)/dt > 0 | 0.435 | 100 |
| d(CO)/dt > 0 | 0.417 | 85 |
| d(CH3HCO)/dt > 0 | 0.398 | 96 |
| d²(C2H5OH)/dt² < 0 | 0.393 | 97 |
| d(C2H4)/dt > 0 | 0.388 | 100 |
| d(CH3CH2O)/dt > 0 | 0.383 | 95 |
| d(CH4)/dt > 0 | 0.372 | 96 |
| d²(H2O)/dt² > 0 | 0.367 | 99 |
| d(CH3)/dt > 0 | 0.366 | 100 |

Preignition Phase (13.8%)

| Predicate Description | Measure | True (%) |
| --- | --- | --- |
| d(CO)/dt > 0 | 0.431 | 86 |
| d(CH3HCO)/dt > 0 | 0.428 | 100 |
| d(CH2O)/dt > 0 | 0.413 | 98 |
| d(CH4HCO)/dt > 0 | 0.389 | 98 |
| d(C2H4)/dt > 0 | 0.385 | 100 |
| d(CH3)/dt > 0 | 0.363 | 100 |
| d(H2)/dt > 0 | 0.362 | 98 |
| d(HO2)/dt > 0 | 0.325 | 100 |
| d(C2H5)/dt > 0 | 0.295 | 90 |
| d(HCOOH)/dt > 0 | 0.295 | 80 |

Ignition Phase (7.7%)

| Predicate Description | Measure | True (%) |
| --- | --- | --- |
| d(H2O2)/dt < 0 | 0.548 | 55 |
| d(CO)/dt > 0 | 0.548 | 100 |
| d²(CH3HCO)/dt² < 0 | 0.517 | 90 |
| d²(C2H4)/dt² < 0 | 0.512 | 90 |
| d²(CH4)/dt² < 0 | 0.48 | 85 |
| d(HCCO)/dt > 0 | 0.477 | 98 |
| d²(CH2O)/dt² < 0 | 0.47 | 88 |
| d²(O2)/dt² < 0 | 0.461 | 98 |
| d²(OH)/dt² > 0 | 0.457 | 81 |
| d(HCCOH)/dt > 0 | 0.455 | 95 |

Postignition Phase (60.0%)

| Predicate Description | Measure | True (%) |
| --- | --- | --- |
| d(CO)/dt < 0 | 0.329 | 94 |
| d²(O2)/dt² > 0 | 0.258 | 78 |
| d(CH3)/dt < 0 | 0.257 | 66 |
| d(H2)/dt < 0 | 0.255 | 68 |
| d(CH4)/dt < 0 | 0.233 | 67 |
| d(CH3H4)/dt < 0 | 0.226 | 65 |
| d(CH2O)/dt < 0 | 0.203 | 67 |
| d(CH3HCO)/dt < 0 | 0.201 | 67 |
| d²(H2O)/dt² < 0 | 0.183 | 59 |
| d(CH3OH)/dt < 0 | 0.179 | 64 |

^a With each description a measure of its significance is given. The “predicate description” column lists the set of significant descriptors of the phase. The “measure” is a relative indication of how significant that predicate is (a result of the cluster calculation). The “true” column shows the percent of the times within that cluster in which the predicate is true. The value in parentheses next to the general physical description of the phase is the percentage of the total set, the times of the four starting temperature calculations combined, that belong to that cluster. The text gives a more detailed physical interpretation of each cluster.

In the cluster description of the ignition phase, this decrease appears as the second derivatives of these species being less than zero. The boundary is also marked by the CO concentration reaching a peak. In the cluster descriptions, this is seen as the first derivative being greater than zero in the ignition phase and less than zero in the postignition phase. The O2 profile can be seen to have an inflection point at this boundary. In the cluster descriptions, this is seen as the second derivative being less than zero before ignition and greater than zero after ignition.

The second boundary, marking the transition between the preignition and the ignition phase, is characterized by more subtle changes. This is the transition between the building up of intermediate radicals and their consumption during the ignition process. The species profiles of CH2O and CH3HCO show a maximum at or around this boundary. In the cluster description, the preignition phase is described with all the intermediate species having a first derivative greater than zero. In the ignition phase, the species H2O2, CH3HCO, C2H4, and CH4 have negative first derivatives.

The third boundary marks the transition between the initial and the preignition phases. Both phases are quite similar and marked by the increase in intermediate radicals. These are characterized in the cluster descriptions as having first derivatives greater than zero. The marked difference between the description of the initial phase and the preignition phase is the description of the second derivative of the fuel, C2H5OH, and the oxidizer, O2, being less than zero.

Figure 5 These graphs illustrate the combustion phases found with respect to species concentrations for the case with a starting temperature of 1300 K. The graph on the left plots the fuel (CH3CH2OH) and the oxidizer mass fractions with two intermediates, CO and OH, as a function of ignition time. The graph on the right plots a few significant (as found in the cluster descriptions of Table I) intermediates in the combustion process: CH3HCO, CH2O, CH4, and H2O2. The units of the graphs are mass fractions. The vertical lines delineate the phase boundaries. Comparing the derivative information of the descriptions of Table I gives justification of the choice of boundaries. This is explained in more detail in the text.

PHASE PREDICTION OVER A TEMPERATURE RANGE

Four phases were determined from the data of combustion runs at four discrete starting temperatures. The previous section illustrated that the cluster phase descriptions were human interpretable and could be intuitively justified. In this section, it is shown that the descriptions are also powerful enough to describe phase behavior beyond the four conditions used to form the clusters. Assuming that the mechanistic characteristics do not change between two adjacent temperatures, the formed cluster phases can be used to predict the phases of runs with starting temperatures between the four discrete starting temperatures used to establish these phases.

For each “new” starting temperature (a starting temperature different from the original four used to form the phases), a combustion run is performed. The result is 200 mass fraction vectors representing the combustion process at 200 discrete times in the process. Each of these vectors is then compared to the existing cluster descriptions and assigned to the cluster with the closest matching description. When all the vectors have been assigned to the clusters, the reactive phase is determined by the set of combustion times within each cluster (as outlined in the “ranges” section).
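The assignment step described above can be sketched as follows. The matching measure actually used by the clustering method is not specified in this section, so this sketch substitutes a simple assumed one: each cluster is summarized by the fraction of its members for which each predicate is true, and a new vector is assigned to the cluster whose summary is closest in mean absolute difference. The names `assign_to_clusters` and `phase_ranges` and the toy numbers are hypothetical.

```python
import numpy as np

def assign_to_clusters(predicates, cluster_profiles):
    """Assign each predicate vector to the closest cluster profile.

    predicates:       (n_times, n_predicates), truth values in [0, 1]
    cluster_profiles: (n_clusters, n_predicates), fraction of cluster
                      members for which each predicate is true
    Assumed distance: mean absolute difference (not the paper's measure).
    """
    diff = np.abs(predicates[:, None, :] - cluster_profiles[None, :, :])
    return diff.mean(axis=2).argmin(axis=1)

def phase_ranges(times, labels):
    """Time range covered by each cluster: {cluster: (first, last)}."""
    return {
        int(c): (float(times[labels == c].min()), float(times[labels == c].max()))
        for c in np.unique(labels)
    }

# Toy example: two predicates, two clusters, four time steps.
profiles = np.array([[1.0, 0.0],    # cluster 0: predicate 0 mostly true
                     [0.0, 1.0]])   # cluster 1: predicate 1 mostly true
preds = np.array([[1.0, 0.0], [0.9, 0.2], [0.1, 0.8], [0.0, 1.0]])
times = np.array([0.0, 0.1, 0.2, 0.3])

labels = assign_to_clusters(preds, profiles)
print(labels.tolist())              # [0, 0, 1, 1]
print(phase_ranges(times, labels))  # {0: (0.0, 0.1), 1: (0.2, 0.3)}
```

Taking the minimum and maximum time per cluster is only adequate when each cluster occupies a single contiguous stretch of time; a cluster that reappears later in the run would need the gap handling of the “ranges” section.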

Figure 6 shows the “predicted” boundaries of the new combustion calculations as a function of starting temperature. The plot of boundaries is fairly continuous and indicates a consistency and accuracy in phase assignments. From 1300 to 1500 K, the phase boundaries imply a similar chemical mechanism is at work in this range. In the region from 1200 to 1300 K, the change in the end of the initial phase boundary implies that another mechanism is coming into play.

Figure 6 Predicted phase boundaries for “test” temperatures between the “training” temperatures of 1200, 1300, 1400, and 1500 K. The upside-down triangles mark the boundary between the postignition and the ignition phases, the triangles mark the boundary between the ignition and the preignition phases, and the dots mark the boundary between the preignition and the initial phases. The pressure and equivalence ratio used in the calculation are 3.3 bar and 0.5, respectively. The continuity of the phase boundaries shows that the mechanistic behavior is relatively consistent, and this can be predicted with the cluster descriptions. See text for more detailed explanations.

Examination of the results in Fig. 6 shows that four phases were formed. The boundary formed by the largest time division corresponds to the predicted ignition delay time and separates the ignition and the postignition phases. The next lower boundary can be thought of as the very beginning of the ignition process and separates the preignition phase from the ignition phase. The lowest time division marks the end of the initialization phase. For the entire region from 1200 to 1500 K, the predicted ignition delay time and the beginning of the ignition phase are consistent. As the starting temperature increases, the positions of the boundaries become earlier. This reflects the physical intuition that with higher starting temperatures, the process becomes faster.

For the temperature region from 1300 to 1500 K, the beginning of the preignition phase is consistent. From 1200 to 1300 K another behavior is observed. It is the similarity of the initial phase and the preignition phase which explains the behavior of the predictions between the temperatures 1200 and 1300 K. For temperatures close to 1200 K, these two phases are merged into one phase. This is exemplified by the fact that the 1300, 1400, and 1500 K sets of the training examples all have the four phases. However, the 1200 K training example only has three phases; the initial and the preignition phases are found to be in a single phase. This explains why the predictions in the range from 1300 to 1500 K were consistent, while in the range from 1200 to 1300 K a transitional behavior is observed. All four data sets had consistent postignition and ignition phases, and the prediction of these phases was consistent through the entire range. From a data analysis point of view, this exemplifies that it is important that the test cases are chosen so as to display a consistent behavior.

Table I shows that the phase descriptions are directly interpretable and correspond to physical intuition. However, in addition, the results of this section show that the description of the clusters is powerful enough to predict phase behavior under other conditions. The results given up until now have concentrated on one type of description. In the next section, the effects of changing the description are examined.

EFFECT OF FUZZY DESCRIPTIONS

The recognition of clusters, and consequently of the ignition phases, depends less on the clustering method that is used than on the parameters that are used as input, i.e., the description used. In this study, the cluster is determined from information derived from the species mass fraction values as a function of time. The descriptions were with respect to the derivative of the species mass fraction values versus time and whether these values were rising (derivative greater than zero) or falling (derivative less than zero). This means the phases are combinations of consistently rising and falling species mass fraction values. One consequence of this can be seen in Fig. 5. The phase boundary between the preignition zone and the ignition zone is placed at or at least close to the maximum of the CH3HCO, CH2O, and H2O2 profiles. On the left of the boundary, in the preignition phase, the values are rising (derivative greater than zero) and on the right, in the ignition phase, they are falling (derivative less than zero).

An interesting feature that is not considered in the previous section is the points where the mass fractions reach a maximum or minimum (or even inflection points). With the strictly less than and greater than predicates used, the wanted extrema information is only indirectly considered, through the transition between the rising and falling of mass fraction values. A phase containing an extremum (where the derivative is zero) is not considered. However, adding a “derivative equal to zero” predicate would not be enough. An extremum is a single exact point and not, as desired, a region around the extremum. Moreover, the chances that the time interval vector would have exactly this point, i.e., a derivative equal to zero, would be small.

For phase determination, the extremum point and the region “around” the point are significant. To be useful, the predicate description has to reflect that fact. One way to include this type of description is to use fuzzy logic descriptions. As can be seen in Fig. 2 (if the profile is interpreted to be the value of the first derivative), the fuzzy predicate reaches a maximum of one, or absolutely true, at the maximum (derivative equal to zero). However, as the derivative values move away from zero, the value of the predicate does not fall directly to zero, i.e., absolutely false. Instead it linearly tapers off to zero. The slope of the predicate line determines the “width” of influence around the maximum. This “width” then defines a region around the maximum.

Combining this fuzzy “equal to” predicate with fuzzy “less than” and fuzzy “greater than” predicates (see Fig. 3 for an example of all three as fuzzy predicates) expands the description from distinguishing only between rising and falling profiles to also include the reaching of a maximum. With this change in the description on which the clusters are based, it is expected that the cluster boundaries will be affected accordingly. The phases could now be not only combinations of rising and falling values, but also regions around the maximum and minimum. The behavior that the boundaries are places of transition between rising and falling values would be expected to lessen in importance. It would be expected that the boundaries would shift to include regions around the maximum and minimum.
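A minimal sketch of the three fuzzy predicates, evaluated on a derivative value normalized to the range −1 to +1, is given here. The exact shapes of the curves in Figs. 2 and 3 are not reproduced; this sketch assumes the simplest linear ramps, with `width` taken as the distance over which a predicate moves between 1.0 (true) and 0.0 (false). The function names are hypothetical.

```python
import numpy as np

def fuzzy_eq(x, width):
    """'Equal to zero': 1 at x = 0, tapering linearly to 0 at |x| = width."""
    return np.clip(1.0 - np.abs(x) / width, 0.0, 1.0)

def fuzzy_gt(x, width):
    """'Greater than zero': 0 for x <= 0, rising linearly to 1 at x = width."""
    return np.clip(x / width, 0.0, 1.0)

def fuzzy_lt(x, width):
    """'Less than zero': mirror image of fuzzy_gt."""
    return np.clip(-x / width, 0.0, 1.0)

x = np.array([-0.2, -0.05, 0.0, 0.05, 0.2])  # normalized derivative values
w = 0.1                                       # ramp width (assumed meaning)

lt, eq, gt = fuzzy_lt(x, w), fuzzy_eq(x, w), fuzzy_gt(x, w)
# lt values: 1, 0.5, 0, 0,   0
# eq values: 0, 0.5, 1, 0.5, 0
# gt values: 0, 0,   0, 0.5, 1
```

With these assumed shapes the three truth values sum to one inside the fuzzy zone, so a point slightly past a maximum is described as partly “equal to zero” and partly “less than zero” rather than flipping abruptly. As `width` shrinks toward zero, `fuzzy_lt` and `fuzzy_gt` approach the hard predicates of the preceding section.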

In the preceding section, the predicates were described by a “hard” division between the derivative being less than zero and greater than zero. In this section, a fuzzy predicate with a ramp function will be used. To create the same “fuzziness” for all the derivative parameters, the species derivatives were normalized. The purpose of this study is to examine the effect of the degree of “fuzziness,” i.e., the slope of the ramp function, on the placement of the phase boundaries. These studies were done with a single data set from a calculation with an equivalence ratio of 0.5, a pressure of 3.3 bar, and an initial temperature of 1300 K. For each type of fuzzy predicate the phases were determined. The effect of the “fuzziness” of the description, expressed for example by the width of the fuzzy zone (see Fig. 2), is then examined. The width is defined as the difference between where the predicate is 1.0 (true) and where it is 0.0 (false).

The parameter and predicate data were prepared in the following fashion:

Derivatives. Using finite differences, the first and second derivatives were determined for each species mole fraction profile.

Statistics. The positive and negative maxima are determined for each species derivative within each ignition calculation.

Normalization. Under a “zero preserving normalization,” the parameter is scaled by either the positive or the negative maximum (whichever has the highest absolute value). The range of values is −1 to +1.

Predicates. The predicates are formed around zero with a given “percentage” of fuzziness. Two sets of predicates were produced for each species derivative:

LTGT. Two fuzzy predicates representing less than zero and greater than zero.

LTEQGT. Three fuzzy predicates representing less than zero, equal to zero, and greater than zero.

Table II Zones Formed with Different Types of Fuzzy Predicates, Distinguished by the Type or the Width of the Ramp Function (See Text)

| Ramp | Type | Ignition | Flame | Other Zone(s) | Zone Structure |
|---|---|---|---|---|---|
| 0.1 | LTEQGT | 0.2171 | 0.205881 | 0.000045 | a, b, c, b |
| 0.05 | LTEQGT | 0.2173 | 0.199328 | 0.000058 | a, b, c, b |
| 0.01 | LTEQGT | 0.2175 | 0.199328 | 0.000313 | a, b, c, d |
| 0.001 | LTEQGT | 0.2180 | 0.182944 | 0.000672 | a, b, c, d |
| 0.0001 | LTEQGT | 0.2186 | 0.189498 | 0.001491 | a, b, c, d |
| 0.1 | LTGT | 0.2172 | 0.214073 | 0.130515, 0.000058 | a, b, c, d, b |
| 0.05 | LTGT | 0.2174 | 0.212435 | 0.209158, 0.104301, 0.000570 | a, b, c, d, b, d |
| 0.01 | LTGT | 0.2176 | 0.214278 | 0.189498, 0.032211 | a, b, c, d, a |
| 0.001 | LTGT | 0.2124 | 0.150176 | 0.000467 | a, b, c, d |
| 0.0001 | LTGT | 0.2124 | 0.143622 | 0.000365 | a, b, c, d |

The ignition column shows the highest zone boundary, which corresponds to the ignition delay time. The flame boundary is approximately where the ignition starts. The other zones mark the boundaries found for the initial and preignition zones. The zone structure denotes which cluster node description is used to describe the zone. In the case a, b, c, d, each zone is described with a different cluster node description. However, in the case a, b, c, b, the second and fourth zones are described by the same cluster node.
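The first three preparation steps listed above (derivatives, statistics, normalization) can be sketched as follows. The finite-difference stencil used in the original calculation is not stated; this sketch assumes the central differences provided by `numpy.gradient`. The function names and the Gaussian test profile are hypothetical.

```python
import numpy as np

def derivatives(t, y):
    """First and second time derivatives of one species profile.

    Assumes central finite differences (numpy.gradient); the paper only
    states that finite differences were used.
    """
    d1 = np.gradient(y, t)
    d2 = np.gradient(d1, t)
    return d1, d2

def zero_preserving_normalize(d):
    """Scale by the larger of |positive max| and |negative max|.

    Zero stays zero and the sign of every value is kept, so the result
    lies within [-1, +1] and "rising" versus "falling" is unchanged.
    """
    pos_max, neg_max = d.max(), d.min()          # the "statistics" step
    scale = max(abs(pos_max), abs(neg_max))
    return d / scale if scale > 0 else np.zeros_like(d)

# Hypothetical intermediate-species profile: a single peak at t = 0.4.
t = np.linspace(0.0, 1.0, 201)
y = np.exp(-((t - 0.4) / 0.1) ** 2)

d1, d2 = derivatives(t, y)
d1n = zero_preserving_normalize(d1)
d2n = zero_preserving_normalize(d2)
```

Dividing by a single positive scale, rather than shifting and rescaling to a fixed interval, is what keeps zero at zero. That matters here because the predicates are formed around zero, and a given ramp width then means the same fraction of the largest derivative for every species.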

Table II shows the effect of using two types of sets of fuzzy predicates and the effect of narrowing the ramp function. For all the cases, the ignition boundary, corresponding to the ignition delay time, is between 0.212 and 0.219. The range for the LTEQGT predicates with different ramp slopes is smaller, between 0.217 and 0.219, than the range for the LTGT predicates, between 0.212 and 0.217. In any case, these correspond with the ignition delay times defined either by where the maximum of the CO2 mixture fraction lies, 0.2197, or by the point of the maximum increase of temperature, 0.2141. The ignition point is such a distinct event that its placement is relatively independent of differences in the descriptions.
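The two reference definitions of ignition delay time mentioned above (the time of the CO2 maximum and the time of the maximum temperature increase) can be sketched as follows. The profiles here are synthetic stand-ins chosen only to make the example runnable; they are not the paper's data and do not reproduce the values 0.2197 and 0.2141. The function names are hypothetical.

```python
import numpy as np

def delay_by_species_maximum(t, y):
    """Ignition delay taken as the time at which a species profile peaks."""
    return float(t[np.argmax(y)])

def delay_by_max_heating(t, temperature):
    """Ignition delay taken as the time of the steepest temperature rise."""
    return float(t[np.argmax(np.gradient(temperature, t))])

# Synthetic stand-in profiles (assumptions, not data from the paper).
t = np.linspace(0.0, 0.4, 401)
temperature = 1300.0 + 1000.0 / (1.0 + np.exp(-(t - 0.20) / 0.005))
co2 = np.exp(-((t - 0.22) / 0.05) ** 2)

tau_T = delay_by_max_heating(t, temperature)   # close to 0.20
tau_co2 = delay_by_species_maximum(t, co2)     # close to 0.22
```

Because the two definitions mark slightly different events, they need not coincide; the text above compares the cluster-derived ignition boundary against both.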

Where the flame begins, shown in the Flame column of Table II, is closer to the ignition time for the broader fuzzy predicates (ramp size of 0.1, 0.05, and 0.01) of the LTGT predicate type and for all the cases of the LTEQGT predicate type. Examining the profiles in Fig. 5 indicates that the boundary is shifted more to the times where there is a rapid drop of intermediate species and away from their maxima. At the same time, this means that the position of the maximum is more “inside” the preignition zone. As the LTGT fuzzy predicates get “harder,” i.e., a ramp of 0.001 and 0.0001, the boundary is moved to the time where the intermediate radical hits a maximum. This is because a less than and greater than function would only distinguish between being on one side of the maximum or the other. This would tend to put the boundary at a maximum point. For the LTEQGT predicates, the range of variance for the different ramp functions is small, from 0.182 to 0.205. The placement of this “Flame” boundary is indeed dependent on the description. It is up to the modeler to decide whether, for example, reaching a maximum is still part of the preignition phase or whether it represents the transition to the ignition phase.

Table II also shows that all cases have at least one boundary between the initialization and preignition zones. For the broader ramps using the LTGT predicates, more zones are indicated, i.e., more types of preignition zones. For the LTEQGT predicates with the broader ramp functions, the initialization zone is very small and gets larger as the ramp function narrows.

As can be seen in the zone structure column of Table II, the zone descriptions are not unique for the different zones, i.e., the same zone description is used for two separate time zones. In terms of determining the set of zones as outlined in the section Ranges, the two zones would be recognized because there would be a large “gap,” mixture fraction points not belonging to the zone, between the two zones. However, in terms of predicting which zone a particular mixture fraction belongs to, there would be an ambiguity as to which zone to choose. This phenomenon occurs in both predicate types. To avoid it, the ramp has to be 0.01 or narrower for the LTEQGT predicate type and 0.001 or narrower for the LTGT predicate type.
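The zone structure strings of Table II and the ambiguity described above can be illustrated with a short sketch. It collapses a time-ordered sequence of cluster labels into contiguous zones and reports any label that describes more than one zone. This is a simplification: it splits a zone at every change of label and does not implement the gap criterion of the Ranges section. The function names and the toy label sequence are hypothetical.

```python
from itertools import groupby

def zone_structure(labels):
    """Collapse time-ordered labels into zones: (label, first_idx, last_idx)."""
    zones = []
    start = 0
    for label, group in groupby(labels):
        length = len(list(group))
        zones.append((label, start, start + length - 1))
        start += length
    return zones

def ambiguous_labels(zones):
    """Labels that describe more than one separate zone."""
    seen, repeated = set(), set()
    for label, _, _ in zones:
        if label in seen:
            repeated.add(label)
        seen.add(label)
    return repeated

# Toy label sequence giving the structure a, b, c, b.
labels = list("aabbbbccbbb")
zones = zone_structure(labels)
print(zones)
# [('a', 0, 1), ('b', 2, 5), ('c', 6, 7), ('b', 8, 10)]
print(ambiguous_labels(zones))  # {'b'}
```

In this toy case a vector matching description “b” could belong either to the second or to the fourth zone, which is the prediction ambiguity noted in the text; a structure such as a, b, c, d returns an empty set.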

For the LTEQGT predicates, a broadened fuzzy predicate tends to have an averaging effect on the zone descriptions. For the 0.1 case and the 0.05 case, the initialization zone (marked as “a” in the zone structure column of Table II) is only about 5% of the total of mixture fraction points in the data set. So, effectively, there are only three zones, with only the ignition zone being distinguishable. With the rapid fall and rise of radicals in this zone, it is clearly distinguished from the postignition and preignition zones where, on average, more gradual changes are occurring. For the LTGT cases, this averaging effect creates some “indecision” and hence more zones are formed.

In general, the use of the LTEQGT predicates gives more consistent results in terms of the position of the boundaries as a function of broadness of the ramp. The size of the zones does not change significantly. This indicates that having the extra information about positions of the maxima (through the fuzzy equal predicate) has a positive effect on zone recognition. Only after a certain degree of narrowing does the simpler LTGT set of predicates give consistent results.

The differing phases detected with differing fuzzy predicates emphasize that the exact placement of boundaries between phases is highly dependent on the exact criteria used. This is true even when deciding the boundaries by hand.

Some boundaries are fairly constant, such as the boundary representing the occurrence of ignition. There is such a dramatic change between before ignition and after ignition that a large range of descriptors yield the same result.

However, when the boundaries have a more qualitative behavior, such as the boundary between the preignition and the ignition phases, the boundaries depend on which criteria one considers important. In terms of automatic detection through clustering, it depends on which type of descriptors are being used. The modeler can, of course, use this to his advantage by choosing the descriptors so as to emphasize the important characteristics.

When building clusters with just the less than and greater than descriptors, the only type of feature detected is, for example when looking at the first derivative, rising (derivative greater than zero) or falling (derivative less than zero) species concentrations. This puts the emphasis on the transition between rising and falling slopes, and hence the boundaries would tend to be placed at extrema of the species profiles.

In contrast, if the equal to predicate is included, the description is expanded to include the region, for example with respect to the first derivative, around the extremum. The first derivative is zero at an extremum and, using fuzzy logic, when the point is “around” the extremum, the first derivative is “close to” zero. Having this extra descriptor tends to shift the boundary such that the extremum is included in the cluster.

CONCLUSION AND FUTURE WORK

The purpose of this work was to show that the clustering method can not only successfully identify physically reasonable phases in an ignition process, but can also characterize them well enough to predict the phases, given the initial setup of phase descriptions through clustering.

An ignition process has the property that at the time of ignition, there is a rapid change of radical concentrations. Some are rapidly increasing (along with the temperature) and some are rapidly decreasing due to the increase of temperature, which increases the reaction rates of reactions involving intermediate species. This point was consistently identified, independent of the predicate description, by the clustering process. This is a well-defined point physically and in terms of the species profiles and is thus easily characterized and predicted.

For the most part, the clustering yielded four phases. The first is a fast initialization phase, the initial consumption of the fuel and oxidizer. This is followed by a larger preignition phase where there is a gradual buildup of intermediate radicals which will eventually lead to ignition. The next phase, the ignition phase, is when the radicals are starting to be consumed, leading finally to ignition where very rapid changes occur. The final phase is the final conversion to end products.

The cluster method consistently produced boundaries around physically justifiable times for the initialization, preignition, and ignition phases. Though generally around the same time, their exact placement did depend on the type of predicate description that was used. This, however, would also be the case if the process was done by hand. Depending on what is being looked for or what criteria are important, different boundaries would be drawn.

The prediction process in this paper used the entire set of predicate descriptions to predict which phase a particular configuration of mixture fractions belongs to. In future studies, other machine learning techniques will be used, specifically that of decision trees [15], where a minimal set of predicates is used to “decide” which phase to assign.

A process, such as ignition in combustion, evolves through several distinct phases. This is a natural intuitive concept. Thus clustering provides an additional tool to aid in the understanding of the process. These phases are assignable through identifiable characteristics which can remain constant over a range of starting conditions. Clustering provides a methodology to identify and describe the phases. In addition, the cluster description is powerful enough to predict these phases under a new set of conditions.

Identifying and describing the phases through automatic clustering is a means to increase the understanding of the process in general through their characterization. The use of the clustering descriptions to identify when a process is in a particular phase is a means to use these phases when treating a combustion process within a higher, more complex algorithm.

The clustering method described to identify phases has as input a set of matrices where the columns represent the process parameters and the rows represent the evolution through the process. The illustration in this paper was an ignition combustion process, but, in fact, any continuous process evolving through a (single dimensional) parameter, such as time or space, can be characterized and predicted in an analogous way.

BIBLIOGRAPHY

1. Griffiths, J. F.; Barnard, J. A. Flame and Combustion, 3rd ed.; Blackie Academic and Professional: Glasgow, UK, 1995.
2. Blurock, E. S. Int J Chem Kinet 2004, 36, 107–118.
3. Fisher, D. Machine Learning 1987, 2, 139–172.
4. Gennari, J. H.; Langley, P.; Fisher, D. In Machine Learning: Paradigms and Methods; Carbonell, J. G. (ed.); MIT Press: Cambridge, MA, 1990; pp. 11–61.
5. Hochgreb, S.; Dryer, F. L. Combustion Flame 1992, 91, 257–284.
6. Vajda, S.; Valko, P.; Turanyi, T. Int J Chem Kinet 1985, 17, 55–81.
7. Marinov, N. M. Int J Chem Kinet 1999, 31, 183–220.
8. Zadeh, L. A. Inf Control 1965, 8, 338–353.
9. Kruse, R.; Gebhardt, J.; Klawonn, F. Foundations of Fuzzy Systems; Wiley: New York, 1994.
10. Mitchell, T. M. Machine Learning; McGraw-Hill: Boston, MA, 1997.
11. Tou, J. T.; Gonzalez, R. C. Pattern Recognition Principles; Addison-Wesley: Reading, MA, 1974.
12. Briscoe, G.; Caelli, T. Symbolic Machine Learning, Vol. 1 of A Compendium of Machine Learning; Ablex: Norwood, NJ, 1996.
13. Blurock, E. S. Analysis: Object oriented system for machine learning and data analysis; Combustion Physics, University of Lund: Lund, Sweden.
14. Mauss, F. Ignition: Numeric solver for ignition processes; Combustion Physics, University of Lund: Lund, Sweden.
15. Breiman, L.; Friedman, J. H.; Olshen, R. A.; Stone, C. J. Classification and Regression Trees; Wadsworth: Pacific Grove, CA, 1984.
