Association Rule Mining on Remotely Sensed Imagery Using Peano-trees (P-trees)

Qin Ding, Qiang Ding, and William Perrizo
Computer Science Department, North Dakota State University, USA
May 2002
(P-tree technology is patent pending by NDSU)


Page 1

Association Rule Mining on Remotely Sensed Imagery Using Peano-trees (P-trees)

Qin Ding, Qiang Ding, and William Perrizo
Computer Science Department
North Dakota State University, USA

May 2002

(P-tree technology is patent pending by NDSU)

Page 2

Outline

Concepts
- Association Rule Mining
- Market Basket Data
- Remotely Sensed Imagery (RSI) data
- Peano Count Trees (P-trees)

Association rule mining on RSI data using P-trees
Performance analysis
Conclusion

Page 3

Association Rule Mining

Originally proposed for market basket data.

Given
- A set of items I = {i1, i2, …, im} (e.g., items purchasable in a market)
- A set of transactions D (e.g., customers checking out; each transaction = id + itemset)

An association rule is X => Y, where X and Y are disjoint itemsets.
- X and Y are considered as events. E.g., X is the event that a transaction contains X.
- X => Y is the event: "if a transaction t contains X, then it contains Y".
- X is called the antecedent, Y is called the consequent.

Two measures: support (% of transactions containing both X and Y) and confidence (% of those transactions containing X which also contain Y).

Given minimum thresholds, minsup and minconf,
- Find the frequent itemsets, which have support above minsup.
- Derive all rules supported by frequent sets, with confidence above minconf.
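The two measures can be illustrated with a minimal Python sketch. The baskets and the function names `support` and `confidence` are invented for illustration and are not taken from the slides.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, x, y):
    """Fraction of the transactions containing X that also contain Y."""
    support_x = support(transactions, x)
    if support_x == 0:
        return 0.0
    return support(transactions, x | y) / support_x


# Hypothetical market baskets, invented for illustration.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
x, y = frozenset({"bread"}), frozenset({"milk"})
rule_support = support(baskets, x | y)       # 2 of the 4 baskets
rule_confidence = confidence(baskets, x, y)  # 2 of the 3 baskets with bread
print(rule_support, rule_confidence)
```

With minsup = 0.5 and minconf = 0.6, the rule bread => milk would pass both thresholds on this toy data.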

Page 4

Association rule mining on RSI data

RSI data can be viewed as a relational table
- Each band (column) is an attribute (for simplicity we assume all values are bytes).
- Each pixel (row) is a transaction.
- Each interval in each band is an item.
- Row/column or longitude/latitude is the primary key.

ARM task on RSI data
- To mine implicit relations among different bands, for example, relations among spectral bands and yield.
- Example rule (NDVI): NIR[192,255] ^ RED[0,63] => Yield[128,255]
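As a rough sketch of how the example rule could be checked when each pixel is treated as a transaction, the Python below counts the pixels that satisfy the interval items. The pixel values, the `BANDS` mapping, and the helper `holds` are hypothetical and serve only as illustration.

```python
# Hypothetical pixels as (NIR, RED, Yield) byte values, invented for illustration.
pixels = [
    (200, 40, 180),
    (210, 30, 150),
    (250, 60, 100),
    (100, 90, 60),
    (90, 200, 40),
]
BANDS = {"NIR": 0, "RED": 1, "Yield": 2}


def holds(pixel, items):
    """True if the pixel falls inside every (band, low, high) interval item."""
    return all(low <= pixel[BANDS[band]] <= high for band, low, high in items)


antecedent = [("NIR", 192, 255), ("RED", 0, 63)]
consequent = [("Yield", 128, 255)]

n_x = sum(holds(p, antecedent) for p in pixels)
n_xy = sum(holds(p, antecedent + consequent) for p in pixels)
rule_support = n_xy / len(pixels)
rule_confidence = n_xy / n_x
print(n_x, n_xy, rule_support, rule_confidence)
```

This version scans every pixel, which is the cost that P-tree root counts are meant to avoid.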

Page 5

Important ARM Algorithms

Apriori: stepwise (level-wise) algorithm that builds candidate k-itemsets from the frequent (k-1)-itemsets.

DHP (Direct Hashing and Pruning): hashes itemset counts and prunes transactions.

Partition: divides the database into small partitions such that each can be processed independently and efficiently in memory.

DIC (Dynamic Itemset Counting): overlaps the counting of candidate itemsets at different points during a scan.

FP-growth: uses a Frequent Pattern tree (FP-tree) to mine frequent itemsets without candidate generation.

Others…

Page 6

Remotely Sensed Imagery (RSI) Data

Satellite image
- TM (Thematic Mapper) imagery (6, 7 or 8 bands)
- TM is Landsat satellite imagery covering the earth every 18 days since 1972.
- ETM+ (Landsat-7) contains 8 bands:
  - 7 VIR bands (Blue, Green, Red, NIR, MIR, TIR, MIR2)
  - 1 Panchromatic band (PC)

Aerial photography
- TIFF (3 bands: Blue, Green, Red)

Ground data
- Yield, Moisture, Nitrate, Temperature, Elevation, etc.

Page 7

Precision Agriculture Dataset: TIFF Image and Related Bands (1320 × 1320)

[Figure: image panels labeled RGB, Moisture, Yield, and Nitrate.]

Page 8

As a relation

x    y    R   G   B   Y    M   N
812  445  43  60  59  146  83  188
812  446  43  58  50  146  83  188
812  447  44  60  52  146  83  187
812  448  43  63  54  146  83  186
812  449  43  69  52  146  83  186
812  450  47  73  54  146  83  185
812  451  50  68  58  146  83  184
812  452  51  65  54  146  83  183
812  453  46  63  54  146  83  182
812  454  33  53  50  146  83  182
812  455  30  49  47  146  83  181
812  456  41  55  54  146  83  180
812  457  40  55  57  146  83  179
812  458  43  56  52  146  83  178
812  459  42  52  52  146  83  177
812  460  40  58  45  146  83  176
812  461  40  66  47  146  83  176
812  462  38  59  47  145  83  175
812  463  34  51  55  145  82  175
812  464  39  53  63  145  82  174
812  465  36  54  57  145  82  173
812  466  42  57  48  145  82  173
812  467  40  59  43  145  82  172
812  468  39  68  50  145  82  172
812  469  40  56  57  145  82  172
812  470  30  45  43  145  82  172
812  471  33  57  45  145  82  172
812  472  35  58  62  145  82  173
812  473  30  54  63  145  82  173
812  474  30  57  52  145  82  173

x: Row, y: Column, R: Red, G: Green, B: Blue, Y: Yield, M: Moisture, N: Nitrate

Page 9

Spatial Data Formats

A 2 × 2 image with two bands (each pixel value is shown with its 8-bit binary form):

BAND-1
254 (1111 1110)   127 (0111 1111)
 14 (0000 1110)   193 (1100 0001)

BAND-2
 37 (0010 0101)   240 (1111 0000)
200 (1100 1000)    19 (0001 0011)

BSQ format (2 files)
Band 1: 254 127 14 193
Band 2: 37 240 200 19

Page 10

Spatial Data Formats (cont.)

BIL format (1 file)
254 127 37 240 14 193 200 19

Page 11

Spatial Data Formats (cont.)

BIP format (1 file)
254 37 127 240 14 200 193 19

Page 12

Spatial Data Formats (cont.)

bSQ format (16 files; each column below is one bit file Bij for band i, bit j, and each row is one pixel)

B11 B12 B13 B14 B15 B16 B17 B18 B21 B22 B23 B24 B25 B26 B27 B28
 1   1   1   1   1   1   1   0   0   0   1   0   0   1   0   1
 0   1   1   1   1   1   1   1   1   1   1   1   0   0   0   0
 0   0   0   0   1   1   1   0   1   1   0   0   1   0   0   0
 1   1   0   0   0   0   0   1   0   0   0   1   0   0   1   1
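The four layouts can be reproduced from the two 2 × 2 bands with a short Python sketch. The function names are hypothetical, and in-memory lists stand in for the files; bit 1 is taken as the most significant bit, which is what the bSQ table on the slide implies.

```python
# The two 2x2 bands from the slides, stored row by row.
band1 = [[254, 127], [14, 193]]
band2 = [[37, 240], [200, 19]]
bands = [band1, band2]


def to_bsq(bands):
    """Band sequential: one flat list (file) per band."""
    return [[v for row in band for v in row] for band in bands]


def to_bil(bands):
    """Band interleaved by line: row r of every band, then row r + 1."""
    n_rows = len(bands[0])
    return [v for r in range(n_rows) for band in bands for v in band[r]]


def to_bip(bands):
    """Band interleaved by pixel: all band values of one pixel together."""
    flat = to_bsq(bands)
    return [band[i] for i in range(len(flat[0])) for band in flat]


def to_bit_sequential(bands, bits=8):
    """bSQ: one bit list per (band, bit position), most significant bit first."""
    files = {}
    for b, flat in enumerate(to_bsq(bands), start=1):
        for j in range(1, bits + 1):
            files[f"B{b}{j}"] = [(v >> (bits - j)) & 1 for v in flat]
    return files


print(to_bsq(bands))
print(to_bil(bands))
print(to_bip(bands))
print(to_bit_sequential(bands)["B11"])
```

Each bSQ bit list is what a basic P-tree would later compress, one tree per band and bit position.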

Page 13

Peano Count Tree (P-tree)

A P-tree represents RSI data bit-by-bit in a recursive quadrant-by-quadrant arrangement.

P-trees are a lossless compressed representation of the original data.

Page 14

An example of a 2-D P-tree

Key terms: quadrant-based; pure (pure-1/pure-0) quadrant; Peano or Z-ordering; root count.

bSQ file:

1111111111111111111000001111001011111111111111111111111111111111

bSQ file arranged as a spatial dataset (2-D raster order):

1 1 1 1 1 1 0 0
1 1 1 1 1 0 0 0
1 1 1 1 1 1 0 0
1 1 1 1 1 1 1 0
1 1 1 1 0 0 0 0
1 1 1 1 0 0 0 0
1 1 1 1 0 0 0 0
0 1 1 1 0 0 0 0

P-tree (each node holds the count of 1-bits in its quadrant; children are listed in Peano order; pure quadrants are not expanded):

Root count: 39
Children of the root: 16 8 15 0
Children of 8: 3 0 4 1
Children of 15: 4 4 3 4
Leaf bits of the mixed 2 × 2 quadrants (counts 3, 1, 3): 1 1 1 0, 0 0 1 0, 1 1 0 1
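The quadrant counts of this example can be recomputed with a small recursive Python sketch. The `(count, children)` tuple layout and the name `build_ptree` are assumptions made for illustration, not the authors' storage format.

```python
# The 8x8 bit grid from the slide, in raster order.
GRID = [
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0, 0],
]


def build_ptree(grid, row=0, col=0, size=None):
    """Return (count, children); children is None for pure quadrants and single bits."""
    if size is None:
        size = len(grid)
    if size == 1:
        return (grid[row][col], None)
    half = size // 2
    # Peano (Z) order: upper-left, upper-right, lower-left, lower-right.
    children = [
        build_ptree(grid, row + dr, col + dc, half)
        for dr, dc in ((0, 0), (0, half), (half, 0), (half, half))
    ]
    count = sum(child[0] for child in children)
    if count == 0 or count == size * size:
        return (count, None)  # pure-0 or pure-1: children need not be stored
    return (count, children)


root = build_ptree(GRID)
level1 = [child[0] for child in root[1]]
print(root[0], level1)
```

The compression comes from the pure quadrants: the 16 and the 0 at the first level each replace sixteen stored bits.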

Page 15

Peano Mask Tree (PM-tree)

1 1 1 1 1 1 0 0
1 1 1 1 1 0 0 0
1 1 1 1 1 1 0 0
1 1 1 1 1 1 1 0
1 1 1 1 0 0 0 0
1 1 1 1 0 0 0 0
1 1 1 1 0 0 0 0
0 1 1 1 0 0 0 0

PM-tree (each node is 1 for a pure-1 quadrant, 0 for a pure-0 quadrant, and m for a mixed quadrant; children are listed in Peano order):

Root: m
Children of the root: 1 m m 0
Children of the first m: m 0 1 m
Children of the second m: 1 1 m 1
Leaf bits of the mixed 2 × 2 quadrants: 1 1 1 0, 0 0 1 0, 1 1 0 1

Truth-trees (1 if a condition is true of the quadrant, else 0)
- E.g., Pure-1 and Pure-0 trees
- All are lossless compressed representations of the dataset

Page 16

Terms: Peano or Z-ordering; pure-1/pure-0 quadrant; root count; level; fan-out; QID (Quadrant ID).

1 1 1 1 1 1 0 0
1 1 1 1 1 0 0 0
1 1 1 1 1 1 0 0
1 1 1 1 1 1 1 0
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1

P-tree:

Root count: 55
Children of the root: 16 8 15 16
Children of 8: 3 0 4 1
Children of 15: 4 4 3 4
Leaf bits of the mixed 2 × 2 quadrants: 1 1 1 0, 0 0 1 0, 1 1 0 1

Quadrant numbering: 0 1 2 3

QID example: the pixel at (7, 1), i.e. (111, 001) in binary, has QID 10.10.11 = 2.2.3.
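The QID in this example follows from pairing the row and column bits level by level. A minimal Python sketch, assuming quadrant digits 0 to 3 in Peano order (upper-left, upper-right, lower-left, lower-right) and a hypothetical helper name `qid`:

```python
def qid(row, col, levels):
    """Quadrant ID digits of a pixel: one (row bit, column bit) pair per level."""
    digits = []
    for shift in range(levels - 1, -1, -1):
        row_bit = (row >> shift) & 1
        col_bit = (col >> shift) & 1
        digits.append(2 * row_bit + col_bit)
    return digits


# Pixel (7, 1) in an 8x8 image, which has 3 levels below the root.
print(".".join(str(d) for d in qid(7, 1, 3)))
```

For (7, 1) the bit pairs are 10, 10 and 11, which matches the 2.2.3 path on the slide.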

Page 17

P-tree Operations

Trees are written in nested form, node(child 0, child 1, child 2, child 3), with children in Peano order; the bit strings are the leaf bits of mixed 2 × 2 quadrants.

P-tree: 55( 16, 8( 3(1110), 0, 4, 1(0010) ), 15( 4, 4, 3(1101), 4 ), 16 )
PM-tree: m( 1, m( m(1110), 0, 1, m(0010) ), m( 1, 1, m(1101), 1 ), 1 )

P-tree-1: m( 1, m( m(1110), 0, 1, m(0010) ), m( 1, 1, m(1101), 1 ), 1 )
P-tree-2: m( 1, 0, m( 1, 1, 1, m(0100) ), 0 )

AND-Result: m( 1, 0, m( 1, 1, m(1101), m(0100) ), 0 )
OR-Result: m( 1, m( m(1110), 0, 1, m(0010) ), 1, 1 )

Complement:
P-tree: 9( 0, 8( 1(0001), 4, 0, 3(1101) ), 1( 0, 0, 1(0010), 0 ), 0 )
PM-tree: m( 0, m( m(0001), 1, 0, m(1101) ), m( 0, 0, m(0010), 0 ), 0 )
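The three operations can be sketched recursively in Python. The representation is an assumption made for illustration: a pure quadrant is the integer 1 or 0, and a mixed quadrant is a list of its four children in Peano order; the function names are hypothetical.

```python
def normalize(children):
    """Collapse four identical pure children into a single pure node."""
    if all(c == 1 for c in children):
        return 1
    if all(c == 0 for c in children):
        return 0
    return children


def p_and(a, b):
    if a == 0 or b == 0:
        return 0
    if a == 1:
        return b
    if b == 1:
        return a
    return normalize([p_and(x, y) for x, y in zip(a, b)])


def p_or(a, b):
    if a == 1 or b == 1:
        return 1
    if a == 0:
        return b
    if b == 0:
        return a
    return normalize([p_or(x, y) for x, y in zip(a, b)])


def p_not(a):
    if isinstance(a, int):
        return 1 - a
    return [p_not(c) for c in a]


def root_count(tree, side):
    """Number of 1-bits under `tree`, which covers a side x side quadrant."""
    if isinstance(tree, int):
        return tree * side * side
    return sum(root_count(c, side // 2) for c in tree)


# P-tree-1 and P-tree-2 from the slide.
T1 = [1, [[1, 1, 1, 0], 0, 1, [0, 0, 1, 0]], [1, 1, [1, 1, 0, 1], 1], 1]
T2 = [1, 0, [1, 1, 1, [0, 1, 0, 0]], 0]
print(p_and(T1, T2))
print(p_or(T1, T2))
print(root_count(T1, 8), root_count(p_not(T1), 8))
```

A pure operand settles a whole quadrant in one step, so the recursion only descends where both trees are mixed.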

Page 18

P-tree ANDing Operation

(Trees in nested form: node(child 0, child 1, child 2, child 3), children in Peano order; bit strings are the leaf bits of mixed 2 × 2 quadrants.)

PM-tree1: m( 1, m( m(1110), 0, 1, m(0010) ), m( 1, 1, m(1101), 1 ), 1 )
PM-tree2: m( 1, 0, m( 1, 1, 1, m(0100) ), 0 )
Result: m( 1, 0, m( 1, 1, m(1101), m(0100) ), 0 )

Depth-first Pure-1 path code

PM-tree1: 0 100 101 102 12 132 20 21 220 221 223 23 3
PM-tree2: 0 20 21 22 231

PM-tree1 path(s)   PM-tree2 path   RESULT
0                  0               0
20                 20              20
21                 21              21
220 221 223        22              220 221 223
23                 231             231
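One way to read the path-code table: two pure-1 paths overlap only when one is a prefix of the other, and the overlap is the longer (smaller) quadrant. The Python sketch below applies that rule with a simple pairwise comparison; it illustrates the idea under that assumption and is not necessarily the authors' implementation. The name `and_paths` is hypothetical.

```python
def and_paths(paths1, paths2):
    """AND two trees given as lists of pure-1 quadrant path codes."""
    result = set()
    for p in paths1:
        for q in paths2:
            if p.startswith(q):
                result.add(p)   # p lies inside the pure-1 quadrant q
            elif q.startswith(p):
                result.add(q)   # q lies inside the pure-1 quadrant p
    return sorted(result)


tree1 = ["0", "100", "101", "102", "12", "132", "20", "21",
         "220", "221", "223", "23", "3"]
tree2 = ["0", "20", "21", "22", "231"]
print(and_paths(tree1, tree2))
```

Because both lists are in depth-first order, a merge-style single pass could replace the nested loops; the quadratic version is kept here for brevity.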

Page 19

Various P-trees

Basic P-trees: $P_{i,j}$
Value P-trees: $P_i(v)$
Tuple P-trees: $P(v_1, v_2, \ldots, v_n)$
Interval P-trees: $P_i(v_1, v_2)$
Cube P-trees: $P([v_{11}, v_{12}], \ldots, [v_{N1}, v_{N2}])$
Predicate P-trees: $P(p)$

[Diagram: arrows connect these tree types, each labeled with the operations (AND, OR, COMPLEMENT) used to derive one type from another.]
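As an illustration of deriving a value P-tree from basic P-trees with AND and COMPLEMENT, the Python sketch below works on uncompressed bit lists in place of actual P-trees. That substitution, the helper names, and the choice of bit 1 as the most significant bit are assumptions; the pixel values are band 1 of the spatial data formats example.

```python
band1 = [254, 127, 14, 193]  # band-1 pixel values from the data formats example


def basic_bits(values, j, bits=8):
    """Bit list for bit position j (1 = most significant) of every pixel."""
    return [(v >> (bits - j)) & 1 for v in values]


def value_bits(values, prefix, bits=8):
    """Mark pixels whose leading bits equal `prefix`, e.g. "11"."""
    result = [1] * len(values)
    for j, bit in enumerate(prefix, start=1):
        plane = basic_bits(values, j, bits)
        if bit == "0":
            plane = [1 - x for x in plane]                # COMPLEMENT
        result = [r & x for r, x in zip(result, plane)]   # AND
    return result


print(value_bits(band1, "11"))  # pixels with values in [192, 255]
print(value_bits(band1, "00"))  # pixels with values in [0, 63]
```

The number of 1s in the result plays the role of the root count of the corresponding value P-tree.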

Page 20

Association Rule Mining on RSI Data using P-trees

Admissible Itemsets (Asets)
- Asets are itemsets of the form $\mathrm{Int}_1 \times \mathrm{Int}_2 \times \cdots \times \mathrm{Int}_n = \prod_{i=1}^{n} \mathrm{Int}_i$, where $\mathrm{Int}_i$ is an interval of values in Band $i$ (some of which may be the full value range).
- Example: Aset $\{[01,01]_1, [11,11]_2\}$

P-ARM algorithm

Pruning techniques

Page 21

P-ARM algorithm

Procedure P-ARM {
    Data_Discretization;
    F_1 = {frequent 1-Asets};
    for (k = 2; F_{k-1} ≠ ∅; k++) do begin
        C_k = p-gen(F_{k-1});
        forall candidate Asets c ∈ C_k do
            c.count = AND_rootcount(c);
        F_k = {c ∈ C_k | c.count >= minsup}
    end
    Answer = ∪_k F_k
}

• F_1 is determined directly from P-tree root counts and pruning techniques rather than from a scan of the transaction database.

• The p-gen function differs from the apriori-gen function in Apriori by using some pruning techniques.

• The AND_rootcount function calculates Aset counts directly by ANDing the appropriate basic P-trees instead of scanning the transaction database.

The support count for Aset {B1[0,64), B2[64,128)} (or {[00,00]_1, [01,01]_2}) is the root count of P1(00) AND P2(01).
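The procedure can be sketched in Python under some simplifying assumptions: flat bitmasks (Python integers, one bit per pixel) stand in for compressed P-trees, discretization keeps only the two leading bits of each byte, and `p_gen` applies band-based pruning plus an Apriori-style subset check. The band values and all function names other than the slide's p-gen and AND_rootcount are invented for illustration.

```python
from itertools import combinations


def item_masks(bands, n_bits=2, total_bits=8):
    """Map each 1-Aset (band index, interval prefix) to a bitmask over pixels."""
    masks = {}
    for b, values in enumerate(bands, start=1):
        for p, v in enumerate(values):
            key = (b, v >> (total_bits - n_bits))
            masks[key] = masks.get(key, 0) | (1 << p)
    return masks


def and_rootcount(aset, masks, n_pixels):
    """Support count: number of pixels left after ANDing the item masks."""
    acc = (1 << n_pixels) - 1
    for item in aset:
        acc &= masks[item]
    return bin(acc).count("1")


def p_gen(frequent_prev, k):
    """Join (k-1)-Asets into candidate k-Asets, then prune."""
    prev = set(frequent_prev)
    candidates = set()
    for a, b in combinations(sorted(prev), 2):
        union = tuple(sorted(set(a) | set(b)))
        if len(union) != k:
            continue
        if len({band for band, _ in union}) != k:
            continue  # band-based pruning: two items from the same band
        if all(sub in prev for sub in combinations(union, k - 1)):
            candidates.add(union)
    return candidates


def p_arm(bands, minsup_count, n_bits=2):
    n_pixels = len(bands[0])
    masks = item_masks(bands, n_bits)
    frequent = {(item,) for item in masks
                if and_rootcount((item,), masks, n_pixels) >= minsup_count}
    answer, k = {}, 1
    while frequent:
        for aset in frequent:
            answer[aset] = and_rootcount(aset, masks, n_pixels)
        k += 1
        frequent = {c for c in p_gen(frequent, k)
                    if and_rootcount(c, masks, n_pixels) >= minsup_count}
    return answer


# Hypothetical two-band image with six pixels, invented for illustration.
bands = [[10, 20, 200, 210, 30, 220], [70, 80, 250, 240, 90, 100]]
result = p_arm(bands, minsup_count=3)
print(result)
```

In this toy run the only frequent 2-Aset is ((1, 0), (2, 1)), i.e. {B1[0,64), B2[64,128)}, whose count is the number of pixels set in P1(00) AND P2(01).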

Page 22

Pruning Techniques

Band-based pruning
- An itemset with two items from the same band will have support zero.

Constraint-based pruning
- E.g., specify yield as the only consequent band of interest.
- Note: in the performance comparisons we did not use this pruning technique (to maintain fairness, since it is hard to implement in other algorithms).

Bit-based pruning for multi-level rules
- If Aset [128,255] (or $[1,1]_2$) is not frequent, then the Asets [128,191] (or $[10,10]_2$) and [192,255] (or $[11,11]_2$) cannot be frequent either.

Others
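Bit-based pruning relies on each interval being a leading-bit prefix, so a longer prefix always covers a sub-range of its parent. A minimal Python sketch, with hypothetical helper names and invented support counts:

```python
def prefix_interval(prefix, total_bits=8):
    """Value range covered by a leading-bit prefix of a byte."""
    free_bits = total_bits - len(prefix)
    low = int(prefix, 2) << free_bits
    high = low + (1 << free_bits) - 1
    return low, high


def prefixes_to_refine(counts, minsup_count):
    """Return the one-bit-longer children of the frequent prefixes only."""
    refined = []
    for prefix, count in counts.items():
        if count >= minsup_count:  # children of infrequent parents are pruned
            refined += [prefix + "0", prefix + "1"]
    return refined


# Hypothetical support counts for the two one-bit intervals of one band.
counts = {"0": 70, "1": 30}
print(prefix_interval("1"), prefix_interval("10"), prefix_interval("11"))
print(prefixes_to_refine(counts, minsup_count=50))
```

With a minimum count of 50, the interval [128,255] is infrequent here, so [128,191] and [192,255] are never counted.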

Page 23

P-ARM versus Apriori

Scalability with support threshold

[Figure: run time (sec., axis 0 to 800) versus support threshold (10% to 90%) for P-ARM and Apriori, on 1,742,400 pixels (transactions).]

Page 24

P-ARM versus Apriori (cont.)

Scalability with number of transactions

[Figure: time (sec., axis 0 to 1200) versus number of transactions (K, axis 100 to 1700) for Apriori and P-ARM, at support threshold = 10%.]

Page 25

P-ARM versus FP-growth

Scalability with support threshold

[Figure: run time (sec., axis 0 to 800) versus support threshold (10% to 90%) for P-ARM and FP-growth, on 1,742,400 pixels (transactions).]

Page 26

P-ARM versus FP-growth (cont.)

Scalability with the number of transactions

[Figure: time (sec., axis 0 to 1200) versus number of transactions (K, axis 100 to 1700) for FP-growth and P-ARM, at support threshold = 10%.]

Page 27

Conclusion

A model for association rule mining on RSI data
- P-trees facilitate fast calculation of support
- P-trees facilitate significant pruning techniques

Applications other than precision agriculture
- Flood prediction and monitoring
- Community and regional planning
- Virtual archeology
- Mineral exploration
- Bioinformatics/Genomics
- VLSI design
