

8th European AEC/APC Conference - Dresden 2007

Extracting correlated sets using the chi-squared measure within n-ary relations: an implementation

A. Casali (1), C. Ernst (2), F. Gasnier (3), J. Stephan (2)

1: Université de la Méditerranée / LIF
2: École des Mines de St Étienne / CMP-GC
3: STMicroElectronics Rousset

Motivations

The field of APC (Advanced Process Control) aims at highlighting correlations between production parameters. This study focuses on the device analysis of the principal trajectories impacting the yield.

The goal is to detect correlations between data measurements structured as n-ary relations and involving (at least) one target attribute. The method uses a levelwise data mining algorithm based on both the chi-squared and the support measures.

Methodology: a KDD approach

Results

Conclusions

This approach makes it possible for STMicroElectronics Rousset to highlight previously unknown correlations between various parameters, validated by electrical and/or physical analysis. While the proposed mining method confirmed that levelwise algorithms do not provide results beyond four search levels, it proved its value for n-ary relations with a very large number of numerical attributes. The study aims at supporting the development of effective run-to-run (R2R) control loops.

Future Work

Acknowledgments

This work was initiated while the fourth author was at École des Mines de Saint-Étienne / CMP-GC, and was supported by the research project “Rousset 2003-2008”, financed by the Communauté du Pays d'Aix, the Conseil Général des Bouches du Rhône and the Conseil Régional Provence Alpes Côte d'Azur.

[Figure: KDD process flow. Raw (Excel) data measurement files → SELECTION → selected file → PREPROCESSING → preprocessed file → TRANSFORMATION → transformed file → DATA MINING]

IN:  ItemSet I, Fraction p%, Threshold mc (chi2), Threshold s (support), Target Attribute ta, Relation r
OUT: Set of minimal correlated patterns

    C2 := APrioriGen(I)    // generation of the 2-pattern candidates
    i := 2
    while Ci <> ∅ do
        Li := ∅
        for each X ∈ Ci do
            Build the contingency table of X
            if p% of the table's cells have a support ≥ s then
                if chi2(X) ≥ mc then Li := Li ∪ {X}
            endif
        end for
        Ci+1 := APrioriGen(Ci – Li)
        i := i + 1
    end while
    return ∪i Li    // limited to the patterns including one item of ta
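The levelwise search can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: rows of the relation are dicts, an item is an (attribute, value) pair, the contingency table of a k-pattern has 2^k presence/absence cells, the support threshold s is an absolute row count, p is a fraction between 0 and 1, and a pattern holds at most one item per attribute. All function names are hypothetical.

```python
from itertools import combinations, product

def contingency_table(rows, pattern):
    """Count rows for every presence/absence combination of the pattern's items."""
    items = sorted(pattern)
    table = {cell: 0 for cell in product((True, False), repeat=len(items))}
    for row in rows:
        cell = tuple(row.get(attr) == val for attr, val in items)
        table[cell] += 1
    return items, table

def chi2(rows, items, table):
    """Pearson chi-squared statistic against the independence hypothesis."""
    n = len(rows)
    # Marginal probability that each item is present in a row.
    probs = [sum(1 for row in rows if row.get(a) == v) / n for a, v in items]
    stat = 0.0
    for cell, observed in table.items():
        expected = n
        for present, prob in zip(cell, probs):
            expected *= prob if present else 1.0 - prob
        if expected > 0:
            stat += (observed - expected) ** 2 / expected
    return stat

def apriori_gen(patterns, size):
    """Build candidates of `size` items whose (size-1)-sub-patterns are all in `patterns`."""
    universe = sorted(set().union(*patterns))
    candidates = set()
    for combo in combinations(universe, size):
        if len({attr for attr, _ in combo}) < size:
            continue  # assumption: at most one item per attribute
        if all(frozenset(sub) in patterns for sub in combinations(combo, size - 1)):
            candidates.add(frozenset(combo))
    return candidates

def minimal_correlated_patterns(rows, items, p, mc, s, target):
    """Levelwise search; returns (pattern, chi2) pairs that include the target attribute."""
    found = []
    size = 2
    candidates = apriori_gen({frozenset([item]) for item in items}, size)
    while candidates:
        correlated = set()
        for x in candidates:
            ordered, table = contingency_table(rows, x)
            supported = sum(1 for count in table.values() if count >= s)
            if supported >= p * len(table):
                stat = chi2(rows, ordered, table)
                if stat >= mc:
                    correlated.add(x)
                    found.append((x, stat))
        size += 1
        # Only non-correlated candidates are extended, so results stay minimal.
        candidates = apriori_gen(candidates - correlated, size)
    return [(x, stat) for x, stat in found if any(a == target for a, _ in x)]

# Toy relation (made-up data): attribute T always equals A, B is independent.
rows = [{"A": a, "B": b, "T": a} for a in (0, 1) for b in (0, 1)] * 2
items = [("A", 1), ("B", 1), ("T", 1)]
result = minimal_correlated_patterns(rows, items, p=0.5, mc=3.84, s=1, target="T")
```

On this toy relation only the pattern {("A", 1), ("T", 1)} passes both tests. The sketch enumerates candidates by brute force, which is fine for illustration but would not scale to the very large attribute counts the poster mentions.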

Attribute removal. Criteria: attributes
- with too few distinct values
- having too many null values
- that are duplicates of another attribute (one copy is kept)
- with too small a standard deviation
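The four removal criteria above could be applied as in the following sketch. The poster gives no numeric thresholds, so `min_distinct`, `max_null_ratio` and `min_std` are illustrative assumptions, as are the function name and the column-oriented input format (attribute name mapped to a list of values, with `None` for a missing measurement).

```python
from statistics import pstdev

def select_attributes(columns, min_distinct=2, max_null_ratio=0.5, min_std=1e-6):
    """Return the names of the attributes that survive the removal criteria.

    All thresholds are illustrative defaults, not values from the study.
    """
    kept, seen = [], set()
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        if not present or len(values) - len(present) > max_null_ratio * len(values):
            continue  # too many null values
        if len(set(present)) < min_distinct:
            continue  # too few distinct values
        if pstdev(present) < min_std:
            continue  # standard deviation too small
        signature = tuple(values)
        if signature in seen:
            continue  # duplicate of an earlier attribute: the first copy is kept
        seen.add(signature)
        kept.append(name)
    return kept

# Made-up measurement columns, one per criterion.
columns = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "const": [5.0, 5.0, 5.0, 5.0],
    "sparse": [1.0, None, None, None],
    "x_copy": [1.0, 2.0, 3.0, 4.0],
    "flat": [1.0, 1.0 + 1e-9, 1.0, 1.0 + 1e-9],
}
kept = select_attributes(columns)
```

Here only `x` is kept: each of the other columns fails exactly one criterion.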

Files with a vast number of numerical attributes (and often incomplete data)

Current developments are focused on:
- the optimization of the procedure,
- the implementation of other search methods.

We plan to initiate a background procedure integrating different sets of methods, measurements and results.
→ Automatic generation of the most suitable result for each new analysis.

- Normalization
- Interval discretization / Item encoding
- Elimination of attributes with no item reaching the support threshold
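A minimal sketch of these transformation steps is given below. The poster does not say which normalization or discretization scheme was used, so min-max normalization and equal-width intervals are assumptions made for illustration. The function names, the integer item ids and the `legend` structure are hypothetical.

```python
from collections import Counter

def encode_column(name, values, n_bins, next_item_id):
    """Min-max normalize a column, cut it into equal-width intervals, assign item ids."""
    lo, hi = min(values), max(values)
    width = (hi - lo) or 1.0          # avoid dividing by zero on a constant column
    items = {}                        # interval index -> item id
    legend = {}                       # item id -> (attribute, lower bound, upper bound)
    encoded = []
    for v in values:
        x = (v - lo) / width                      # normalized to [0, 1]
        b = min(int(x * n_bins), n_bins - 1)      # the last interval is closed on the right
        if b not in items:
            items[b] = next_item_id
            legend[next_item_id] = (name, lo + b * width / n_bins,
                                    lo + (b + 1) * width / n_bins)
            next_item_id += 1
        encoded.append(items[b])
    return encoded, legend, next_item_id

def has_supported_item(encoded, s):
    """True if at least one item of the attribute occurs in s rows or more."""
    return any(count >= s for count in Counter(encoded).values())

# Made-up values for a hypothetical attribute.
encoded, legend, next_id = encode_column("temp", [0.0, 1.0, 5.0, 10.0], n_bins=2, next_item_id=100)
```

With these inputs the column is encoded as items 100 and 101, covering [0, 5] and [5, 10]. An attribute for which `has_supported_item` returns False would be dropped before mining.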

INTERPRETATION
- Item decoding
- Presentation (processing) of correlations
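Item decoding can be pictured as a lookup from item ids back to attribute intervals. The sketch below assumes a hypothetical `legend` dictionary that maps each item id to an (attribute, lower bound, upper bound) triple. The ids, attribute names and bounds are made up and are not taken from the study.

```python
def decode_pattern(pattern, legend):
    """Render a pattern of item ids as readable 'attribute in [lo, hi]' conditions."""
    parts = []
    for item in sorted(pattern):
        name, lo, hi = legend[item]
        parts.append(f"{name} in [{lo:g}, {hi:g}]")
    return " & ".join(parts)

# Hypothetical legend, as produced by an encoding step.
legend = {1: ("temp", 0.0, 5.0), 2: ("yield", 80.0, 90.0)}
decoded = decode_pattern({2, 1}, legend)
```

The decoded string can then be listed next to the chi-squared value of the pattern in the final report.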

[Figure: end of the process flow. Retrieved patterns → report (knowledge generation)]

Item1   Item2   Item3   Item4   Chi2
…       …       …       …       …
3453    3489    -       -       6.29
964     1990    3489    -       15.96
1106    1990    3489    -       23.55
1767    1990    3489    -       15.75
1962    1990    3489    -       28.55
1990    2115    3489    -       46.57
…       …       …       …       …

A complete data transformation, mining and interpretation model for correlation detection within data measurements

Attribute1                             Attribute2                          …   Target Attribute
…                                      …                                   …   …
_9592_TRAN- [-47.8, -32.7] 0.41        -                                       PCTH- [0.3, 11.8] 0.82
_2565_EPPO- [2060.6, 2076.8] 0.39      _4692_IMPT- [328.5, 373.5] 0.62         PCTH- [0.3, 11.8] 0.82
_3700_ALIX- [17.5, 23.0] 0.37          _4692_IMPT- [328.5, 373.5] 0.62         PCTH- [0.3, 11.8] 0.82
_4572_EOXR- [127.1, 136.5] 0.38        _4692_IMPT- [328.5, 373.5] 0.62         PCTH- [0.3, 11.8] 0.82
_4690_ALIY- [52.3, 75.5] 0.37          _4692_IMPT- [328.5, 373.5] 0.62         PCTH- [0.3, 11.8] 0.82
_4692_IMPT- [328.5, 373.5] 0.62        _4748_EPTE- [79.6, 81.1] 0.34           PCTH- [0.3, 11.8] 0.82
…                                      …                                   …   …