-
10. Data Mining
Data Mining is one aspect of Database Query Processing (on the "what if" or pattern-and-trend querying end of Query Processing, rather than the "please find" end). To say it another way, data mining queries are on the ad hoc or unstructured end of the query spectrum, rather than the standard report generation, "retrieve all records matching a criterion," or SQL side.
Still, Data Mining queries ARE queries and are processed (or will eventually be processed) by a Database Management System the same way queries are processed today, namely:
1. SCAN and PARSE (SCANNER-PARSER): a Scanner identifies the tokens or language elements of the DM query; the Parser checks for syntax or grammar validity.
2. VALIDATE: the Validator checks for valid names and semantic correctness.
3. CONVERT: the Converter converts the query to an internal representation.
4. QUERY OPTIMIZE: the Optimizer devises a strategy for executing the DM query (chooses among alternative internal representations).
5. CODE GENERATION: generate code to implement each operator in the selected DM query plan (the optimizer-selected internal representation).
6. RUNTIME DATABASE PROCESSING: run the plan code.
Developing new, efficient and effective Data Mining Query (DMQ) processors is the central need and issue in DBMS research today (far and away!).
These notes concentrate on 5, i.e., generating code (algorithms) to implement operators (at a high level):
Association Rule Mining (ARM), Clustering (CLU), Classification (CLA)
- Machine Learning is almost always based on Near Neighbor Set(s), NNS. Clustering, even density-based, identifies near neighbor cores first (round NNSs, about a center). Classification is continuity-based, and Near Neighbor Sets (NNS) are the central concept in continuity: for every ε>0 there is a δ>0 such that d(x,a)<δ implies d(f(x),f(a))<ε.
-
Query Processing and Optimization: Relational Queries to Data Mining. Most people have Data from which they want information. So, most people need DBMSs whether they know it or not. A major component of any DBMS is the Data Mining Query Processor. Queries can range from structured to unstructured. On the Data Mining end, we have barely scratched the surface. But those scratches have already made the difference between becoming the world's biggest corporation (Walmart, which got into DM for supply-chain management early) and filing for bankruptcy (KMart, which didn't!).
-
Recall the Entity-Relationship (ER) Model's notion of a Relationship. Relationship: an association among 2 [or more; the number of entities is the degree] entities. The Graph of a Relationship: a degree=2 relationship between entities T and I generates a bipartite undirected graph (bipartite means that the node set is a disjoint union of two subsets and that all edges must run from one subset to the other). A degree=2 relationship between an entity and itself, e.g., Employee Reports_To Employee, generates a uni-partite undirected graph. Relationships can have attributes too!
-
Association Rule Mining (ARM). In a relationship between entities, T is the set of Transactions an enterprise performs and I is the set of Items on which those transactions are performed. In Market Basket Research (MBR), a transaction is a checkout transaction and an item is an item in that customer's market basket (which gets checked out).
An I-Association Rule, A⇒C, relates 2 disjoint subsets of I (I-itemsets) and has 2 main measures, support and confidence (A is called the antecedent, C is called the consequent). There are also the dual concepts of T-association rules (just reverse the roles of T and I above). Examples of Association Rules include: in MBR, the relationship between customer cash-register transactions, T, and purchasable items, I (t is related to i iff i is being bought by that customer during that cash-register transaction).
In Software Engineering (SE), the relationship between Aspects, T, and Code Modules, I (t is related to i iff module i is part of aspect t).
In Bioinformatics, the relationship between experiments, T, and genes, I (t is related to i iff gene i expresses at a threshold level during experiment t).
In ER diagramming, any part-of relationship in which i∈I is part of t∈T (t is related to i iff i is part of t), and any ISA relationship in which i∈I ISA t∈T (t is related to i iff i IS A t) . . .
The support of an I-set, A, is the fraction of T-instances related to every I-instance in A. E.g., if A={i1,i2} and C={i4}, then supp(A) = |{t2,t4}| / |{t1,t2,t3,t4,t5}| = 2/5. Note: | | means set size, the count of elements in the set. I.e., T2 and T4 are the only transactions from the total transaction set, T={T1,T2,T3,T4,T5}, that are related to both i1 and i2 (buy i1 and i2 during the pertinent T-period of time).
The support of the rule A⇒C is supp(A⇒C) = supp(A∪C) = |{T2,T4}| / |{T1,T2,T3,T4,T5}| = 2/5.
The confidence of the rule A⇒C is supp(A∪C) / supp(A) = (2/5) / (2/5) = 1.
DM queriers typically want STRONG RULES: supp ≥ minsupp and conf ≥ minconf (minsupp and minconf are threshold levels). Note that conf(A⇒C) is also just the conditional probability of t being related to C, given that t is related to A.
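To make the support/confidence arithmetic concrete, here is a minimal Python sketch (not from the slides; the exact contents of each transaction beyond the facts stated above are hypothetical) that counts support and confidence directly from a transaction list:

    # Hypothetical 5-transaction set; T2 and T4 are the only ones containing both i1 and i2.
    transactions = {
        "T1": {"i1"},
        "T2": {"i1", "i2", "i4"},
        "T3": {"i3"},
        "T4": {"i1", "i2", "i4"},
        "T5": {"i2", "i5"},
    }

    def supp(itemset):
        """Fraction of transactions containing every item in itemset."""
        hits = sum(1 for items in transactions.values() if itemset <= items)
        return hits / len(transactions)

    def conf(antecedent, consequent):
        """conf(A => C) = supp(A u C) / supp(A)."""
        return supp(antecedent | consequent) / supp(antecedent)

    print(supp({"i1", "i2"}))           # 2/5 = 0.4
    print(conf({"i1", "i2"}, {"i4"}))   # 1.0, matching the example above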
-
Finding Strong Association Rules. The relationship between Transactions and Items can be expressed in a Transaction Table where each transaction is a row containing its ID and the list of the items that are related to that transaction. If minsupp is set by the querier at .5 and minconf at .75: to find frequent or Large itemsets (support ≥ minsupp), pseudocode (assume the items in Lk-1 are ordered):
Step 1: self-join Lk-1:
insert into Ck
select p.item1, p.item2, ..., p.itemk-1, q.itemk-1
from Lk-1 p, Lk-1 q
where p.item1=q.item1, ..., p.itemk-2=q.itemk-2, p.itemk-1 < q.itemk-1
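The SQL-style self-join above can be written directly. A minimal Python sketch (assumed names, not the slides' own code) of this candidate-generation step, the self-join on the first k-2 items plus the standard subset-pruning step that normally follows it, representing itemsets as sorted tuples:

    from itertools import combinations

    def apriori_gen(L_km1, k):
        """Generate candidate k-itemsets C_k from the large (k-1)-itemsets L_{k-1}."""
        L = sorted(L_km1)
        Ck = set()
        for p in L:
            for q in L:
                # self-join: agree on the first k-2 items, p's last item < q's last item
                if p[:k-2] == q[:k-2] and p[k-2] < q[k-2]:
                    cand = p + (q[k-2],)
                    # prune: every (k-1)-subset of the candidate must itself be large
                    if all(sub in L_km1 for sub in combinations(cand, k-1)):
                        Ck.add(cand)
        return Ck

    L2 = {(1, 3), (2, 3), (2, 5), (3, 5)}
    print(sorted(apriori_gen(L2, 3)))   # [(2, 3, 5)]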
-
P-tree Review: a data table, R(A1..An), containing horizontal structures (records) is processed vertically (vertical scans), then processed using multi-operand logical ANDs. Vertical basic binary Predicate-tree (P-tree): vertically partition the table; compress each vertical bit slice into a basic binary P-tree as follows. The basic binary P-tree, P1,1, for R11 = 00001011 is built top-down by recording the truth of the predicate "pure1" recursively on halves, until purity. The left half, 0000, is pure (pure0), so that branch ends.
-
Top-down construction of basic binary P-trees is good for understanding, but bottom-up is more efficient. Bottom-up construction of P11 (for the bit column R11 = 00001011) is done using in-order tree traversal and the collapsing of pure siblings, as follows:
[Figure omitted: the 8-record bit table for columns R11 R12 R13 R21 R22 R23 R31 R32 R33 R41 R42 R43 and the resulting P-tree P11.]
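A minimal Python sketch (assumed representation, not the slides' code) of constructing a basic P1-tree for a bit column such as R11 = 00001011. It is written as a recursive halving for brevity, but it performs the same compression the bottom-up description gives: pure segments collapse into single leaves.

    def p1_tree(bits):
        """Build a basic P1-tree for a bit vector.
        A node is ('leaf', 0|1) for a pure0/pure1 segment,
        or ('node', left, right) for a mixed segment split into halves."""
        if all(b == 1 for b in bits):
            return ('leaf', 1)
        if all(b == 0 for b in bits):
            return ('leaf', 0)
        mid = len(bits) // 2
        return ('node', p1_tree(bits[:mid]), p1_tree(bits[mid:]))

    r11 = [0, 0, 0, 0, 1, 0, 1, 1]
    print(p1_tree(r11))
    # ('node', ('leaf', 0), ('node', ('node', ('leaf', 1), ('leaf', 0)), ('leaf', 1)))
    # the pure0 left half ends immediately, exactly as described above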
-
R(A1 A2 A3 A4) with tuples (2,7,6,1), (6,7,6,0), (2,7,5,1), (2,7,5,7), (5,2,1,4), (2,2,1,5), (7,0,1,4), (7,0,1,4). The 2^1 level has the only 1-bit, so the 1-count = 1 * 2^1 = 2. Processing efficiencies? (prefixed leaf-sizes have been removed)
-
Database D Example ARM using uncompressed P-trees (note: I have placed the 1-count at the root of each Ptree)
Item bit columns (1 2 3 4 5): TID 100: 1 0 1 1 0; TID 200: 0 1 1 0 1; TID 300: 1 1 1 0 1; TID 400: 0 1 0 0 1.

TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5
-
L1, L2, L3: 1-ItemSets don't support Association Rules (they would have no antecedent or no consequent).
Are there any Strong Rules supported by Large 2-ItemSets (at minconf = .75)?
{1,3}: conf({1}⇒{3}) = supp{1,3}/supp{1} = 2/2 = 1 ≥ .75, STRONG. conf({3}⇒{1}) = supp{1,3}/supp{3} = 2/3 = .67 < .75.
{2,3}: conf({2}⇒{3}) = supp{2,3}/supp{2} = 2/3 = .67 < .75. conf({3}⇒{2}) = supp{2,3}/supp{3} = 2/3 = .67 < .75.
{2,5}: conf({2}⇒{5}) = supp{2,5}/supp{2} = 3/3 = 1 ≥ .75, STRONG! conf({5}⇒{2}) = supp{2,5}/supp{5} = 3/3 = 1 ≥ .75, STRONG!
{3,5}: conf({3}⇒{5}) = supp{3,5}/supp{3} = 2/3 = .67 < .75. conf({5}⇒{3}) = supp{3,5}/supp{5} = 2/3 = .67 < .75.
Are there any Strong Rules supported by Large 3-ItemSets?
{2,3,5}: conf({2,3}⇒{5}) = supp{2,3,5}/supp{2,3} = 2/2 = 1 ≥ .75, STRONG! conf({2,5}⇒{3}) = supp{2,3,5}/supp{2,5} = 2/3 = .67 < .75. conf({3,5}⇒{2}) = supp{2,3,5}/supp{3,5} = 2/3 = .67 < .75.
No subset antecedent can yield a strong rule either (i.e., no need to check conf({2}⇒{3,5}) or conf({5}⇒{2,3}), since both denominators will be at least as large and therefore both confidences will be at least as low). No need to check conf({3}⇒{2,5}) or conf({5}⇒{2,3}). DONE! (2-Itemsets and 3-Itemsets do support ARs.)
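A minimal Python sketch (assumed names, not from the slides) of the rule-generation pass just walked through: for each large itemset, try every nonempty proper subset as antecedent and keep rules whose confidence meets minconf. The support counts are those in the tables below.

    from itertools import combinations

    # supports (transaction counts) of the large itemsets
    supp = {
        frozenset({1}): 2, frozenset({2}): 3, frozenset({3}): 3, frozenset({5}): 3,
        frozenset({1, 3}): 2, frozenset({2, 3}): 2, frozenset({2, 5}): 3, frozenset({3, 5}): 2,
        frozenset({2, 3, 5}): 2,
    }

    def strong_rules(minconf=0.75):
        rules = []
        for itemset, s in supp.items():
            if len(itemset) < 2:
                continue  # 1-itemsets have no antecedent/consequent split
            for r in range(1, len(itemset)):
                for ante in map(frozenset, combinations(itemset, r)):
                    conf = s / supp[ante]
                    if conf >= minconf:
                        rules.append((set(ante), set(itemset - ante), conf))
        return rules

    for a, c, conf in strong_rules():
        print(a, "=>", c, conf)
    # {1}=>{3}, {2}=>{5}, {5}=>{2}, {2,3}=>{5}, each with confidence 1.0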
L1:  itemset   sup
     {1}       2
     {2}       3
     {3}       3
     {5}       3

L2:  itemset   sup
     {1 3}     2
     {2 3}     2
     {2 5}     3
     {3 5}     2

L3:  itemset   sup
     {2 3 5}   2
-
P-tree ARM versus Apriori on aerial photo (RGB) data together with yield data. Scalability with support threshold: 1320 x 1320 pixel TIFF-Yield dataset (total number of transactions is ~1,700,000). P-ARM is compared to Horizontal Apriori (classical) and FP-growth (an improvement of it). In P-ARM, we find all frequent itemsets, not just those containing Yield (for fairness). Aerial TIFF images (R,G,B) with synchronized yield (Y). Scalability with number of transactions: identical results; P-ARM is more scalable for lower support thresholds, and the P-ARM algorithm is more scalable to large spatial datasets.
[Charts and spreadsheet data omitted: P-ARM vs. Apriori run time (sec.) vs. support threshold; P-tree vs. FP-tree run time vs. number of bits considered at minimum support 20% and 10%; total frequent itemsets per band/bit; and scalability of P-ARM vs. Apriori run time (sec.) vs. number of transactions (K) at support threshold 10%.]
-
P-ARM versus FP-growth (see the literature for the definition). Scalability with support threshold: 17,424,000 pixels (transactions); scalability with number of transactions. FP-growth is an efficient, tree-based frequent pattern mining method (details later). For a dataset of 100K bytes, FP-growth runs very fast, but for images of large size P-ARM achieves better performance. P-ARM also achieves better performance in the case of low support thresholds.
[Charts and spreadsheet data omitted: P-ARM vs. FP-growth run time (sec.) vs. support threshold; P-tree vs. FP-tree run time vs. number of bits considered at minimum support 20% and 10%; and scalability of P-ARM vs. FP-growth run time (sec.) vs. number of transactions (K) at support threshold 10%.]
-
Other methods (other than FP-growth) to improve Apriori's efficiency (see the literature or the html notes 10datamining.html in Other Materials for more detail):
Hash-based itemset counting: a k-itemset whose corresponding hashing bucket count is below the threshold cannot be frequent.
Transaction reduction: a transaction that does not contain any frequent k-itemset is useless in subsequent scans.
Partitioning: any itemset that is potentially frequent in DB must be frequent in at least one of the partitions of DB.
Sampling: mine a subset of the given data with a lowered support threshold, plus a method to determine the completeness.
Dynamic itemset counting: add new candidate itemsets only when all of their subsets are estimated to be frequent.
The core of the Apriori algorithm: use only large (k-1)-itemsets to generate candidate large k-itemsets; use database scans and pattern matching to collect counts for the candidate itemsets.
The bottleneck of Apriori: candidate generation. 1. Huge candidate sets: 10^4 large 1-itemsets may generate 10^7 candidate 2-itemsets; to discover a large pattern of size 100, e.g., {a1, ..., a100}, we need to generate 2^100 ~ 10^30 candidates. 2. Multiple scans of the database: it needs (n + 1) scans, where n = the length of the longest pattern.
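Tying the pieces together, here is a minimal Apriori driver loop in Python (assumed names; it reuses the apriori_gen sketch given earlier) that makes the "one full scan per candidate length" bottleneck visible:

    def apriori(transactions, minsupp):
        """transactions: list of item sets; minsupp: fraction of transactions."""
        n = len(transactions)
        min_count = minsupp * n
        # L1: large 1-itemsets (first scan)
        counts = {}
        for t in transactions:
            for item in t:
                counts[(item,)] = counts.get((item,), 0) + 1
        Lk = {c for c, cnt in counts.items() if cnt >= min_count}
        large, k = set(Lk), 2
        while Lk:
            Ck = apriori_gen(Lk, k)                  # candidate generation from L_{k-1}
            counts = {c: 0 for c in Ck}
            for t in transactions:                   # one full scan per level k
                for c in Ck:
                    if set(c) <= t:
                        counts[c] += 1
            Lk = {c for c, cnt in counts.items() if cnt >= min_count}
            large |= Lk
            k += 1
        return large

    txns = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
    print(sorted(apriori(txns, 0.5), key=len))   # the L1, L2, L3 itemsets tabulated earlier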
-
Classification, 3 steps: build a Model of the TDS feature-to-class relationship, Test that Model, Use the Model (to predict the most likely class of each unclassified sample). Note: other names for this process are regression analysis, case-based reasoning, ... Other typical applications: Targeted Product Marketing (the so-called classical Business Intelligence problem) and Medical Diagnosis (so-called Computer Aided Diagnosis or CAD). Nearest Neighbor Classifiers (NNCs) use a portion of the TDS as the model (neighboring tuples vote); finding the neighbor set is much faster than building other models, but it must be done anew for each unclassified sample. (NNC is called a lazy classifier because it gets lazy and doesn't take the time to build a concise model of the relationship between feature tuples and class labels ahead of time.) Eager Classifiers (~all other classifiers) build one concise model once and for all, then use it for all unclassified samples. The model building can be very costly, but that cost can be amortized over all the classifications of a large number of unclassified samples (e.g., all RGB points in a field). Classification using a Training Data Set (TDS) in which each feature tuple is already classified (has a class value attached to it in the class column, called its class label): 1. Build a model of the TDS (the TRAINING PHASE). 2. Use that model to classify unclassified feature tuples (unclassified samples). E.g., TDS = last year's aerial image of a crop field (feature columns are the R,G,B columns together with last year's crop yields attached in a class column, e.g., class values = {Hi, Med, Lo} yield); unclassified samples are the RGB tuples from this year's aerial image. 3. Predict the class of each unclassified tuple (in the example: predict the yield for each point in the field).
-
Eager Classifiers. Training tuples:

NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no
-
Test Process (2): usually some of the Training Tuples are set aside as a Test Set and, after a model is constructed, the Test Tuples are run through the Model (% correct classifications?). The Model is acceptable if, e.g., the % correct > 60%. If not, the Model is rejected (never used).

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Associate Prof  5      yes
Joseph   Assistant Prof  7      no

Correct = 3, Incorrect = 1, i.e., 75%. Since 75% is above the acceptability threshold, accept the model!
-
Classification by Decision Tree Induction. Decision tree (instead of a simple case statement of rules, the rules are prioritized into a tree): each internal node denotes a test or rule on an attribute (the test attribute for that node); each branch represents an outcome of the test (a value of the test attribute); leaf nodes represent class label decisions (the plurality leaf class is the predicted class).
Decision tree model development consists of two phases. Tree construction: at the start, all the training examples are at the root; partition the examples recursively based on selected attributes. Tree pruning: identify and remove branches that reflect noise or outliers.
Decision tree use: classify unclassified samples by filtering them down the decision tree to their proper leaf, then predict the plurality class of that leaf (often only one class, depending upon the stopping condition of the construction phase).
-
Algorithm for Decision Tree Induction: the basic ID3 algorithm (a simple greedy top-down algorithm).
At the start, the current node is the root and all the training tuples are at the root.
Repeat, down each branch, until a stopping condition is true:
At the current node, choose a decision attribute (e.g., the one with the largest information gain). Each value of that decision attribute is associated with a link to the next level down, and that value is used as the selection criterion of that link. Each new level produces a partition of the parent training subset based on the selection value assigned to its link.
Stopping conditions: when all samples for a given node belong to the same class; when there are no remaining attributes for further partitioning (majority voting is employed for classifying the leaf); when there are no samples left.
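A minimal Python sketch (hypothetical data layout: rows are dicts with the class label under "class"; the coarse "years" binning is my own, not the slides') of the information-gain choice ID3 makes at one node:

    from collections import Counter
    from math import log2

    def entropy(rows):
        total = len(rows)
        return -sum((n / total) * log2(n / total)
                    for n in Counter(r["class"] for r in rows).values())

    def info_gain(rows, attr):
        """Expected reduction in entropy from partitioning rows on attr."""
        total = len(rows)
        remainder = sum(len(part) / total * entropy(part)
                        for value in {r[attr] for r in rows}
                        for part in [[r for r in rows if r[attr] == value]])
        return entropy(rows) - remainder

    # hypothetical training tuples in the style of the tenure example
    rows = [
        {"rank": "Assistant", "years": "<=6", "class": "no"},
        {"rank": "Assistant", "years": ">6",  "class": "yes"},
        {"rank": "Professor", "years": "<=6", "class": "yes"},
        {"rank": "Associate", "years": ">6",  "class": "yes"},
        {"rank": "Assistant", "years": "<=6", "class": "no"},
        {"rank": "Associate", "years": "<=6", "class": "no"},
    ]
    best = max(["rank", "years"], key=lambda a: info_gain(rows, a))
    print(best)   # the attribute ID3 would split on at the root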
-
Bayesian Classification (eager: the model is based on conditional probabilities; prediction is done by taking the most conditionally probable class). A Bayesian classifier is a statistical classifier, based on the following theorem, known as Bayes' theorem:
Bayes' theorem: let X be a data sample whose class label is unknown, and let H be the hypothesis that X belongs to class C. P(H|X) is the conditional probability of H given X and P(H) is the prior probability of H; then
P(H|X) = P(X|H) P(H) / P(X)
-
Naïve Bayesian Classification. Given a training set, R(f1..fn, C), where C = {C1..Cm} is the class label attribute.
A Naive Bayesian Classifier will predict the class of an unknown data sample, X = (x1..xn), to be the class Cj having the highest conditional probability, conditioned on X. That is, it will predict the class to be Cj iff P(Cj|X) ≥ P(Ci|X) for all i ≠ j (a tie-handling algorithm may be required).
From Bayes' theorem, P(Cj|X) = P(X|Cj) P(Cj) / P(X).
P(X) is constant for all classes, so we need only maximize P(X|Cj) P(Cj); the P(Cj)'s are known. To reduce the computational complexity of calculating all the P(X|Cj)'s, the naive assumption is class conditional independence: P(X|Cj) is the product of the P(xi|Cj)'s.
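A minimal Python sketch (assumed categorical features and names, not the slides' code) of this prediction rule, argmax over Cj of P(Cj) * product over i of P(xi|Cj), with probabilities estimated by relative frequency:

    from collections import Counter, defaultdict

    def train_nb(rows, class_attr="class"):
        """Estimate P(Cj) and P(xi|Cj) by relative frequency (no smoothing)."""
        class_counts = Counter(r[class_attr] for r in rows)
        cond = defaultdict(Counter)            # (class, attr) -> Counter of values
        for r in rows:
            for a, v in r.items():
                if a != class_attr:
                    cond[(r[class_attr], a)][v] += 1
        return class_counts, cond, len(rows)

    def predict_nb(model, x):
        class_counts, cond, n = model
        best, best_p = None, -1.0
        for c, cc in class_counts.items():
            p = cc / n                         # P(Cj)
            for a, v in x.items():
                p *= cond[(c, a)][v] / cc      # P(xi | Cj)
            if p > best_p:
                best, best_p = c, p
        return best

    rows = [
        {"outlook": "sunny", "windy": "no",  "class": "play"},
        {"outlook": "sunny", "windy": "yes", "class": "stay"},
        {"outlook": "rain",  "windy": "yes", "class": "stay"},
        {"outlook": "rain",  "windy": "no",  "class": "play"},
    ]
    model = train_nb(rows)
    print(predict_nb(model, {"outlook": "sunny", "windy": "no"}))   # play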
-
Neural Network Classification. A Neural Network is trained to make the prediction. Advantages: prediction accuracy is generally high; it is generally robust (works when training examples contain errors); output may be discrete, real-valued, or a vector of several discrete or real-valued attributes; it provides fast classification of unclassified samples. Criticisms: it is difficult to understand the learned function (it involves complex and almost magic weight adjustments); it is difficult to incorporate domain knowledge; long training time (for large training sets, it is prohibitive!).
-
A Neuron: the input feature vector x = (x0..xn) is mapped into the variable y by means of the scalar product with a weight vector, a bias, and a nonlinear function mapping, f (called the damping function).
-
Neural Network Training. The ultimate objective of training: obtain a set of weights that makes almost all the tuples in the training data classify correctly (usually using a time-consuming "back propagation" procedure which is based, ultimately, on Newton's method; see the literature or Other Materials - 10datamining.html for examples and alternate training techniques). Steps: initialize the weights with random values; feed the input tuples into the network; for each unit, compute the net input to the unit as a linear combination of all the inputs to the unit, compute the output value using the activation function, compute the error, and update the weights and the bias.
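A minimal Python sketch (hypothetical, not the slides' code) of the training loop just listed for a single neuron with a sigmoid damping function; back-propagation collapses to this simple gradient-descent update in the one-neuron case:

    import random
    from math import exp

    def sigmoid(z):
        return 1.0 / (1.0 + exp(-z))

    def train_neuron(samples, epochs=5000, lr=0.5):
        """samples: list of (input_vector, target in {0,1})."""
        n = len(samples[0][0])
        w = [random.uniform(-0.5, 0.5) for _ in range(n)]   # initialize weights randomly
        bias = random.uniform(-0.5, 0.5)
        for _ in range(epochs):
            for x, target in samples:
                net = sum(wi * xi for wi, xi in zip(w, x)) + bias   # net input
                out = sigmoid(net)                                  # activation function
                err = target - out                                  # compute the error
                delta = err * out * (1 - out)                       # sigmoid derivative
                w = [wi + lr * delta * xi for wi, xi in zip(w, x)]  # update the weights
                bias += lr * delta                                  # update the bias
        return w, bias

    # learn logical AND as a toy example
    data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
    w, b = train_neuron(data)
    print([round(sigmoid(sum(wi*xi for wi, xi in zip(w, x)) + b)) for x, _ in data])  # expect [0, 0, 0, 1]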
-
Neural Multi-Layer Perceptron: input nodes (input vector xi), hidden nodes (weights wij), output nodes (output vector). [Network diagram omitted.]
-
These next 3 slides treat the concept of distance in great detail. You may feel you don't need this much detail; if so, skip what you feel you don't need. For Nearest Neighbor Classification, a distance is needed (to make sense of "nearest"; other classifiers also use distance). A distance is a function, d, applied to two n-dimensional points X and Y, such that:
d(X, Y) is positive definite: if X ≠ Y, then d(X, Y) > 0; if X = Y, then d(X, Y) = 0;
d(X, Y) is symmetric: d(X, Y) = d(Y, X);
d(X, Y) satisfies the triangle inequality: d(X, Y) + d(Y, Z) ≥ d(X, Z).
-
An example: d1 ≥ d2 ≥ d∞ always (illustrated in a two-dimensional space).
-
Neighborhoods of a Point: a neighborhood (disk neighborhood) of a point, T, is a set of points, S, such that:
X ∈ S iff d(T, X) ≤ r. If X is a point on the boundary, d(T, X) = r.
-
Classical k-Nearest Neighbor Classification. Select a suitable value for k (how many Training Data Set (TDS) neighbors do you want voting on the best predicted class for the unclassified feature sample?).
Determine a suitable distance metric (to give meaning to "neighbor").
Find the k nearest training set points to the unclassified sample. Let them vote (tally up the counts of TDS neighbors for each class).
Predict the class with the highest vote (the plurality class) from among the k-nearest-neighbor set.
-
Closed-KNN example: assume 2 features (one in the x-direction and one in the y-direction). T is the unclassified sample. Using k = 3, find the three nearest neighbors; KNN arbitrarily selects one point from the boundary line shown, whereas Closed-KNN includes all points on the boundary.
Closed-KNN yields higher classification accuracy than traditional KNN (thesis of Md Maleq Khan, NDSU, 2001).
The P-tree method always produces closed neighborhoods (and is faster!).
-
k-Nearest Neighbor (kNN) Classification and Closed-k-Nearest Neighbor (CkNN) Classification: 1) select a suitable value for k; 2) determine a suitable distance or similarity notion; 3) find the [closed] k-nearest-neighbor set of the unclassified sample; 4) find the plurality class in the nearest neighbor set; 5) assign the plurality class as the predicted class of the sample T. CkNN yields higher classification accuracy than traditional kNN. At what additional cost? Actually, at negative cost (faster and more accurate!!). T is the unclassified sample; use Euclidean distance; for k = 3, find the 3 closest neighbors by moving out from T until 3 neighbors are enclosed. kNN arbitrarily selects one point from that boundary line as the 3rd nearest neighbor, whereas CkNN includes all points on that boundary line.
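A minimal Python sketch (assumed names, not the slides' P-tree implementation) contrasting plain kNN, which arbitrarily cuts the boundary ties, with closed kNN, which keeps every training point tied at the k-th distance:

    from collections import Counter
    from math import dist   # Euclidean distance (Python 3.8+)

    def knn_predict(training, sample, k=3, closed=True):
        """training: list of (point, class_label). Returns the plurality class
        of the (closed) k-nearest-neighbor set of sample."""
        ranked = sorted(training, key=lambda pc: dist(pc[0], sample))
        if closed:
            cutoff = dist(ranked[k - 1][0], sample)          # k-th smallest distance
            nbrs = [pc for pc in ranked if dist(pc[0], sample) <= cutoff]
        else:
            nbrs = ranked[:k]                                # arbitrary tie-breaking
        votes = Counter(label for _, label in nbrs)
        return votes.most_common(1)[0][0]

    training = [((0, 0), "a"), ((1, 0), "a"), ((0, 1), "b"), ((0, -1), "b"), ((-1, 0), "b")]
    print(knn_predict(training, (0, 0), k=3, closed=False))  # "a": only 2 of the 4 boundary ties vote
    print(knn_predict(training, (0, 0), k=3, closed=True))   # "b": all boundary ties vote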
-
The slides numbered 28 through 93 give great detail on the relative performance of kNN and CkNN, on the use of other distance functions, and some examples, etc. There may be more detail on these issues than you want/need; if so, just scan for what you are most interested in or skip ahead to slide 94 on CLUSTERING. Experiments were run on two sets of aerial Remotely Sensed Images of the Best Management Plot (BMP) of the Oakes Irrigation Test Area (OITA), ND.
The data contains 6 bands: Red, Green, Blue reflectance values, Soil Moisture, Nitrate, and Yield (class label). Band values range from 0 to 255 (8 bits). We consider 8 classes, or levels, of yield values.
-
Performance - Accuracy (3 horizontal methods in the middle, 3 vertical methods: the 2 most accurate and the least accurate), 1997 dataset. [Chart omitted: Accuracy (%) vs. Training Set Size (no. of pixels, 256 to 262144) for kNN-Manhattan, kNN-Euclidean, kNN-Max, kNN using HOBbit distance, P-tree Closed-KNN-max, and Closed-kNN using HOBbit distance.]
-
Performance - Accuracy (3 horizontal methods in the middle, 3 vertical methods: the 2 most accurate and the least accurate), 1998 dataset. [Chart omitted: Accuracy (%) vs. Training Set Size (no. of pixels, 256 to 262144) for kNN-Manhattan, kNN-Euclidean, kNN-Max, kNN using HOBbit distance, P-tree Closed-KNN-max, and Closed-kNN using HOBbit distance.]
-
Performance - Speed (3 horizontal methods in the middle, 3 vertical methods: the 2 fastest (the same 2) and the slowest), 1997 dataset; both axes in logarithmic scale. [Chart omitted: per-sample classification time (sec) vs. Training Set Size (no. of pixels) for kNN-Manhattan, kNN-Euclidean, kNN-Max, kNN using HOBbit distance, P-tree Closed-KNN-max, and Closed-kNN using HOBbit distance.] Hint: NEVER use a log scale to show a WIN!!!
-
Performance - Speed (3 horizontal methods in the middle, 3 vertical methods: the 2 fastest (the same 2) and the slowest), 1998 dataset; both axes in logarithmic scale. [Chart omitted: per-sample classification time (sec) vs. Training Set Size (no. of pixels) for the same six methods.] A win-win situation!! (almost never happens)
P-tree CkNN and CkNN-H are more accurate and much faster.
kNN-H is not recommended because it is slower and less accurate (because it doesn't use Closed nbr sets and it requires another step to get rid of ties (why do it?).
Horizontal kNNs are not recommended because they are less accurate and slower!
-
WALK THRU: 3NN CLASSIFICATION of an unclassified sample, a = (a5 a6 a11 a12 a13 a14) = (0 0 0 0 0 0); the relevant attributes are a5, a6, a11, a12, a13, a14. [Training table omitted: 17 training tuples t12..t75 with Boolean attributes a1..a20, where a10 = C is the class label.] HORIZONTAL APPROACH: note that only 1 of the many training tuples at distance 2 from the sample got to vote. We didn't know that distance 2 was going to be the vote cutoff until the end of the 1st scan. Finding the other distance-2 voters (the Closed 3NN set, or C3NN) requires another scan.
-
WALK THRU of the required 2nd scan to find the Closed 3NN set (same training table as above). Does it change the vote taken after the 1st scan? YES! C = 0 wins now!
-
WALK THRU: Closed 3NN classification using P-trees. [Bit-vector P-trees omitted: the class column C, its complement, and the attribute columns a5, a6, a11, a12, a13, a14 with their complements.] There are no neighbors at distance 0. First let all training points at distance 0 vote, then distance 1, then distance 2, ... until at least 3 have voted. For distance 0 (exact matches), construct the P-tree Ps, then AND it with PC and with PC' to compute the vote (black denotes a complemented attribute, red denotes uncomplemented).
-
WALK THRU: C3NN, distance=1 neighbors. [Bit-vector P-trees omitted.] Construct the P-tree PS(s,1) = OR over i of Pi, where Pi is the predicate tree for |si - ti| = 1 and |sj - tj| = 0 for j ≠ i; i.e., PS(s,1) = OR over i in {5,6,11,12,13,14} of [ PS(si,1) AND the PS(sj,0) for all j in {5,6,11,12,13,14} - {i} ].
-
WALK THRU: C3NN, distance=2 neighbors. [Bit-vector P-trees omitted for the pairwise patterns P5,6, P5,11, P5,12, P5,13, P5,14, P6,11, P6,12, P6,13, P6,14, P11,12, P11,13, P11,14, P12,13, P12,14, P13,14.] Partway through we already have 3 nearest neighbors and could quit and declare C = 1 the winner; but once we have the full C3NN set we can declare C = 0 the winner!
-
In the previous example, there were no exact matches (dis=0 neighbors or similarity=6 neighbors) for the sample.
Two neighbors were found at a distance of 1 (dis=1, or sim=5) and nine at dis=2, sim=4.
All 11 neighbors got an equal vote, even though the two sim=5 neighbors are much closer than the nine sim=4 neighbors. Also, processing the 9 is costly.
A better approach would be to weight each vote by the similarity of the voter to the sample. (We will use a vote-weight function which is linear in the similarity; admittedly, a better choice would be a function which is Gaussian in the similarity, but, so far, it has been too hard to compute.)
As long as we are weighting votes by similarity, we might as well also weight attributes by relevance (assuming some attributes are more relevant than others; e.g., the relevance weight of a feature attribute could be its correlation with the class label).
P-trees accommodate this method very well (in fact, a variation on this theme won the KDD-Cup competition in 2002: http://www.biostat.wisc.edu/~craven/kddcup/ ).
-
Association for Computing Machinery KDD-Cup 2002, NDSU Team
-
Closed Manhattan Nearest Neighbor Classifier (uses a linear function of Manhattan similarity). The sample is (0 0 0 0 0 0); the attribute weights of the relevant attributes are their subscripts. [Bit-vector P-trees omitted for C, a5, a6, a11, a12, a13, a14 and their complements; black is the attribute complement, red is uncomplemented.] The vote is even simpler than in the "equal" vote case. We just note that all tuples vote in accordance with their weighted similarity (if an ai value differs from that of (0 0 0 0 0 0) then the vote contribution is the subscript of that attribute, else zero). Thus, we can just add up the root counts of each relevant attribute, weighted by its subscript. Class=1 root counts: rc(PC^Pa5)=4, rc(PC^Pa6)=8, rc(PC^Pa11)=7, rc(PC^Pa12)=4, rc(PC^Pa13)=4, rc(PC^Pa14)=7, so the C=1 vote is 343 = 4*5 + 8*6 + 7*11 + 4*12 + 4*13 + 7*14. Similarly, the C=0 vote is 258 = 6*5 + 7*6 + 5*11 + 3*12 + 3*13 + 4*14.
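The vote arithmetic above is just a weighted sum of root counts; a tiny Python sketch (root counts copied from the slide, weights = attribute subscripts) reproduces it:

    weights = {"a5": 5, "a6": 6, "a11": 11, "a12": 12, "a13": 13, "a14": 14}
    # root counts rc(P_C ^ P_ai) from the slide, per class
    rc_c1 = {"a5": 4, "a6": 8, "a11": 7, "a12": 4, "a13": 4, "a14": 7}
    rc_c0 = {"a5": 6, "a6": 7, "a11": 5, "a12": 3, "a13": 3, "a14": 4}

    vote = lambda rc: sum(weights[a] * rc[a] for a in weights)
    print(vote(rc_c1), vote(rc_c0))   # 343 258 -> predict class C=1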
-
We note that the Closed Manhattan NN Classifier uses an influence function which is pyramidal. It would be much better to use a Gaussian influence function, but it is much harder to implement. One generalization of this method to the case of integer values rather than Boolean would be to weight each bit position in a more Gaussian shape (i.e., weight the bit positions b, b-1, ..., 0, high order to low order, using Gaussian weights). By so doing, at least within each attribute, influences are Gaussian.
We can call this method, Closed Manhattan Gaussian NN Classification.
Testing the performance of either CM NNC or CMG NNC would make a great paper for this course (thesis?).
Improving it in some way would make an even better paper (thesis).
- Machine Learning is based on Near Neighbor Set(s), NNS. Clustering, even density-based, identifies near neighbor cores first (round NNSs, about a center). Classification is continuity-based, and Near Neighbor Sets (NNS) are the central concept in continuity: for every ε>0 there is a δ>0 such that d(x,a)<δ implies d(f(x),f(a))<ε.
-
Functional Contours: given a function f:R(A1..An) → Y and a partition {Si} of Y, the contour set {f-1(Si)} is a partition of R (a clustering of R). For S ⊆ Y, contour(f,S) = f-1(S); equivalently, contour(Af,S) = SELECT A1..An FROM R* WHERE x.Af ∈ S. Graphically: a weather map, with f = barometric pressure or temperature and {Si} = an equi-width partition of the Reals; or f = local density (e.g., OPTICS: f = reachability distance, {Sk} = the partition produced by the intersection points of graph(f), plotted wrt some walk of R, and a horizontal threshold line). A grid is the intersection of dimension-projection contour partitions (see the next slide for more definitions). A Class is a contour under f:R → ClassAttr wrt the partition {Ci} of ClassAttr (where the {Ci} are the classes). An L∞-disk about a is the intersection of all ε-dimension-projection contours containing a.
-
Given f:R → Y and a partition S = {Sk} of Y, {f-1(Sk)} is the S,f-grid of R (grid cells = contours). If Y = Reals, the j.lo f-grid is produced by agglomerating over the j low-order bits of Y, with the (b-j) high-order bit pattern fixed.
The j lo bits walk the [isobars of] cells; the b-j hi bits identify cells (lo = extension / hi = intention). Let b-1, ..., 0 be the b bit positions of Y. The j.lo f-grid is the partition of R generated by f and S = {S(b-1,...,b-j)}, where S(b-1,...,b-j) = [(b-1)(b-2)...(b-j)0..0, (b-1)(b-2)...(b-j)1..1), a partition of Y = Reals.
If F = {fh}, the j.lo F-grid is the intersection partition of the j.lo fh-grids (intersection of partitions). The canonical j.lo grid is the j.lo grid of the coordinate projections {πd: R → R[Ad] | πd = the d-th coordinate projection}. j.hi gridding is similar (the b-j lo bits walk cell contents / the j hi bits identify cells).
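In bit terms, j.lo gridding simply throws away the j low-order bits of each b-bit coordinate to name the cell. A one-line Python sketch (hypothetical, for intuition only):

    def jlo_cell(point, j):
        """Cell identifier of a point under the j.lo grid: the high-order bits of each coordinate."""
        return tuple(coord >> j for coord in point)

    print(jlo_cell((0b101_10, 0b011_01), 2))   # (5, 3), i.e. high bits 101 and 011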
GRIDs: if the horizontal and vertical dimensions have bitwidths 3 and 2 respectively:
-
[Grid figures omitted: the 2.lo grid and the 1.hi grid over horizontal coordinates 000-111 and vertical coordinates 00-11.]
j.lo and j.hi gridding continued: horizontal_bitwidth = vertical_bitwidth = b iff the j.lo grid = the (b-j).hi grid; e.g., for hb = vb = b = 3 and j = 2. [Grid figures omitted.]
-
Similarity NearNeighborSets (SNNS): given a similarity s:R×R → PartiallyOrderedSet (e.g., the Reals), i.e., s(x,y) = s(y,x) and s(x,x) ≥ s(x,y) for all x,y in R, and given any C ⊆ R:
The Cardinal disks, skins and rings (PartiallyOrderedSet = Reals) are: disk(C,r) ≡ {x in R | s(x,C) ≥ r}, also the functional contour f-1([r,∞)) where f(x) = sC(x) = s(x,C); skin(C,r) ≡ disk(C,r) - C; ring(C,r2,r1) ≡ disk(C,r2) - disk(C,r1) = skin(C,r2) - skin(C,r1), also the functional contour sC-1((r1,r2]).
The Ordinal disks, skins and rings are: disk(C,k) ⊇ C with |disk(C,k) - C| = k and s(x,C) ≥ s(y,C) for all x in disk(C,k), y not in disk(C,k); skin(C,k) = disk(C,k) - C (the skin consists of C's k immediate neighbors and is a kNNS of C); ring(C,k) = cskin(C,k) - cskin(C,k-1); closeddisk(C,k) ≡ alldisk(C,k); closedskin(C,k) ≡ allskin(C,k).
L∞ skins: skin(a,k) = {x | for some d, xd is one of the k-NNs of ad} (a local normalizer?).
Note: closeddisk(C,r) is redundant, since all r-disks are closed, and closeddisk(C,k) = disk(C,s(C,y)) where y = the k-th NN of C.
-
P-trees. Formally, P-trees are defined as any of the following:
Partition-tree: a tree of nested partitions (a partition P(R) = {C1..Cn}; each component is partitioned by P(Ci) = {Ci,1..Ci,ni}, i = 1..n; each of those components is partitioned by P(Ci,j) = {Ci,j,1..Ci,j,nij}; ...).
Predicate-tree: for a predicate on the leaf-nodes of a partition-tree (it also induces predicates on interior nodes using quantifiers). Predicate-tree nodes can be truth-values (Boolean P-tree), can be quantified existentially (1 or a threshold %) or universally, or can count the number of true leaf children of that component (Count P-tree).
Purity-tree: a universally quantified Boolean Predicate-tree (e.g., if the predicate is "pure1", the Pure1-tree or P1tree). A node holds a 1-bit iff the corresponding component is pure1 (universally quantified). There are many other useful predicates, e.g., NonPure0-trees, but we will focus on P1trees.
All P-trees shown so far were 1-dimensional (recursively partition by halving bit files), but they can be 2-D (recursively quartering, e.g., used for 2-D images), 3-D (recursively eighth-ing), ..., or based on purity runs or LZW-runs. They are vertical, compressed, lossless structures that facilitate fast horizontal AND-processing.
The jury is still out on parallelization: vertical (by relation), horizontal (by tree node), or some combination? Horizontal parallelization is pretty, but network multicast overhead is huge. Use active networking? Clusters of Playstations? ...
Further observations about P-trees: Partition-trees have set nodes. Predicate-trees have either Boolean nodes (Boolean P-tree) or count nodes (Count P-tree). Purity-trees, being universally quantified Boolean Predicate-trees, have Boolean nodes (since the count is always the full count of leaves, expressing Purity-trees as count-trees is redundant). A Partition-tree can be sliced at a given level if each partition at that level is labeled with the very same label set (e.g., the Month partition of years). A Partition-tree can be generalized to a Set-graph when the siblings of a node do not form a partition.
-
0Pf,S is, equivalently, the existential R*-bit map of the predicate R*.Af ∈ S. The compressed P-tree, sPf,S, is the compression of 0Pf,S with equi-width leaf size s, as follows:
1. Choose a walk of R (converts 0Pf,S from a bit map to a bit vector). 2. Equi-width partition 0Pf,S with segment size s (s = leafsize; the last segment can be short). 3. Eliminate and mask to 0 all pure-zero segments (call this mask the NotPure0 Mask or EM). 4. Eliminate and mask to 1 all pure-one segments (call this mask the Pure1 Mask or UM). Compressing each leaf of sPf,S with leafsize s2 gives s1,s2Pf,S; recursively, s1,s2,s3Pf,S, s1,s2,s3,s4Pf,S, ... (this builds an EM tree and a UM tree).
BASIC P-trees: if Ai is Real or Binary and fi,j(x) = the j-th bit of xi, then {(*)Pfi,j,{1}, written (*)Pi,j}, j = b..0, are the basic (*)P-trees of Ai, * = s1..sk. If Ai is Categorical and fi,a(x) = 1 if xi = a, else 0, then {(*)Pfi,a,{1}, written (*)Pi,a}, a ∈ R[Ai], are the basic (*)P-trees of Ai.
Notes: the UM masks (e.g., of 2k,...,20Pi,j, with k = roof(log2|R|)) form a (binary) tree. Whenever the EM bit is 1, that entire subtree can be eliminated (since it represents a pure0 segment); then a 0-node at level k (lowest level = level 0) with no sub-tree indicates a 2^k-run of zeros. In this construction, the UM tree is redundant. We call these EM trees the basic binary P-trees. One slide shows a top-down (easy to understand) construction, and another a (much more efficient) bottom-up construction of the same; we have suppressed the leafsize prefix. (EM = existential aggregation, UM = universal aggregation.) The partitions used to create P-trees can come from functional contours (note: there is a natural duality between partitions and functions; a partition creates a function from the space of points partitioned to the set of partition components, and a function creates the pre-image partition of its domain). In Functional Contour terms (i.e., f-1(S) where f:R(A1..An) → Y, S ⊆ Y), the uncompressed P-tree or uncompressed Predicate-tree 0Pf,S = the bitmap of the set-containment predicate: 0Pf,S(x) = true iff x ∈ f-1(S).
-
Example functionals: Total Variation (TV) functionals. TV(a) = Σ_{x∈R} (x-a)o(x-a). If we use d as an index variable over the dimensions, this is Σ_{x∈R} Σ_{d=1..n} (xd^2 - 2*ad*xd + ad^2) (i, j, k index bit slices). Note that the first term does not depend upon a. Thus, the derived attribute TV - TV(μ) (eliminate the 1st term) is much simpler to compute and has identical contours (it just lowers the graph by TV(μ)). We also find it useful to post-compose a log to reduce the number of bit slices. The resulting functional is called the High-Dimension-ready Total Variation, or HDTV(a).
-
From equation 7, the Normalized Total Variation is NTV(a) ≡ TV(a) - TV(μ). Since TV(a) = Σ_{x,d,i,j} 2^(i+j) xdi xdj + |R| ( -2 Σ_d ad μd + Σ_d ad ad ), we get NTV(a) = |R| ( Σ_d (ad ad - μd μd) - 2 Σ_d (ad μd - μd μd) ) = |R| |a-μ|^2. LNTV(a) = ln( NTV(a) ) = ln( TV(a) - TV(μ) ) = ln|R| + ln|a-μ|^2. The value of LNTV(a) depends only on the length of a-μ, so isobars are hyper-circles centered at μ, and the graph of LNTV is a log-shaped hyper-funnel. For an ε-contour ring (radius ε about a), go inward and outward along a-μ by ε to the points: inner point b = μ + (1 - ε/|a-μ|)(a-μ) and outer point c = μ + (1 + ε/|a-μ|)(a-μ). Then take g(b) and g(c) as the lower and upper endpoints of a vertical interval.
Then we use the EIN formulas on that interval to get a mask P-tree for the ε-contour (which is a well-pruned superset of the ε-neighborhood of a). Thus there is a simpler function which gives us circular contours, the Log Normal TV function.
-
Use the circumscribing Ad-contour (note: Ad is not a derived attribute at all, but just Ad, so we already have its basic P-trees). As pre-processing, calculate basic P-trees for the LNTV derived attribute (or another hyper-circular contour derived attribute). To classify a: 1. Calculate b and c (they depend on a and ε). 2. Form the mask P-tree for training points with LNTV-values in [LNTV(b), LNTV(c)]. 3. Use that P-tree to prune out the candidate NNS. If the count of candidates is small, proceed to scan and assign class votes using the Gaussian vote function; else prune further using dimension projections. If the LNTV circumscribing contour of a is still too populous, we can also note that LNTV can be further simplified (retaining the same contours) using h(a) = |a-μ|. Since we create the derived attribute by scanning the training set, why not just use this very simple function? Others leap to mind, e.g., hb(a) = |a-b|.
-
Graphs of functionals with hyper-circular contours
-
Angular Variation functionals: e.g., AV(a) ≡ (1/|a|) Σ_{x∈R} x o a. With d as an index over the dimensions, this is (1/|a|) Σ_{x∈R} Σ_{d=1..n} xd ad = (1/|a|) Σ_d (Σ_x xd) ad (factor out ad), so AV(a)/(|μ||R|) = μ o a / (|μ||a|) = cos(θa). COS (and AV) have hyper-conic isobars centered on μ. COS and AV have ε-contour(a) = the space between the two hyper-cones centered on μ which just circumscribe the Euclidean ε-hyperdisk at a. (Its intersection, in pink in the figure, with the LNTV ε-contour.) Graphs of functionals with hyper-conic contours: e.g., COSb(a) for any vector b.
-
Some additional formulas: f(a)(x) = (x-a)o(x-a). With d as an index over dims, = Σ_{d=1..n} (xd^2 - 2*ad*xd + ad^2) (i, j, k index bit slices) = Σ_{d,i,j} 2^(i+j) ( xdi xdj - 2 adi xdj + adi adj ) = Σ_{d,i,j} 2^(i+j) ( xdi - adi )( xdj - adj ). Adding up the Gaussian-weighted votes for class c: collecting the (i,j,d) terms inside the exp, the coefficients are multiplied by a 1-bit or 0-bit (depending on x); for fixed i, j, d the coefficient is either x-independent (if a 1-bit) or not (if a 0-bit).
-
fd(a)(x) = |xd - ad| = | Σ_i 2^i ( xdi - adi ) | = | Σ_{i: adi=0} 2^i xdi - Σ_{i: adi=1} 2^i x'di |. Thus, for the derived attribute fd(a) = the numeric distance of xd from ad: if we remember that when adi = 1 we subtract those contributing powers of 2 (don't add), and that we use the complemented dimension-d basic P-trees there, then it should work.
The point is that we can get a set of "near basic" or "negative basic" P-trees, nbP-trees, for the derived attribute fd(a) directly from the basic P-trees for Ad, for free. Thus, the near basic P-trees for fd(a) are
the basic Ad P-trees for those bit positions where adi = 0, and the complements of the basic Ad P-trees for those bit positions where adi = 1 (called fd(a)'s nbP-trees).
Caution: subtract the contribution of the nbP-trees for positions where adi = 1. Note: nbP-trees are not predicate trees (are they? what's the predicate?). The EIN ring formulas are related to this; how?
If we are simply after easy pruning contours containing a (so that we can scan to get the actual Euclidean epsilon nbrs and/or to get Guassian weighted vote counts, we can use Hobbit-type contours (middle earth contours of a?).
See next slide for a discussion of hobbit contours.
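A minimal Python sketch (hypothetical) of the identity used above, |xd - ad| = | Σ_{i: adi=0} 2^i xdi - Σ_{i: adi=1} 2^i x'di |, computed from the bit slices of xd with a complement taken exactly where ad has a 1-bit:

    def bit(v, i):
        return (v >> i) & 1

    def slice_distance(x, a, b=8):
        """|x - a| from the b bit slices of x, complementing slice i wherever bit i of a is 1."""
        pos = sum((1 << i) * bit(x, i) for i in range(b) if bit(a, i) == 0)
        neg = sum((1 << i) * (1 - bit(x, i)) for i in range(b) if bit(a, i) == 1)
        return abs(pos - neg)

    print(slice_distance(180, 75), abs(180 - 75))   # both print 105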
-
A principle: a job is not done until the Mathematics is completed. The Mathematics of a research job includes: 0. getting to the frontiers of the area (researching, organizing, understanding and integrating everything others have done in the area up to the present moment and what they are likely to do next); 1. developing a killer idea for a better way to do something; 2. proving claims (theorems, performance evaluation, simulation, etc.); 3. simplification (everything is simple once fully understood); 4. generalization (to the widest possible application scope); and 5. insight (what are the main issues and underlying mega-truths, with full drill-down).
Therefore, we need to ask the following questions at this point: should we use the vector of medians (the only good choice of middle point in multidimensional space, since the "point closest to the mean" definition is influenced by skewness, like the mean)?
We will denote the vector of medians as med.
h(a) = |a - med| is an important functional (better than h(a) = |a - μ|?).
If we compute the median of an even number of values as the count-weighted average of the middle two values, then in binary columns μ and med coincide. (So if μ and med are far apart, that tells us there is high skew in the data, and the coordinates where they differ are the columns where the skew is found.)
-
Are they easy P-tree computations? Do they offer advantages? When? What? Why?
E.g., do they automatically normalize for us? What about the vector of standard deviations, σ? (computable with P-trees!) Do we have an improvement of BIRCH here, generating similar comprehensive statistical measures, but much faster and more focused?
We can do the same for any rank statistic (or order statistic), e.g., the vector of 1st or 3rd quartiles, Q1 or Q3, or the vector of k-th rank values (k-th ordinal values).
If we preprocessed to get the basic P-trees of μ, med, and each mixed quartile vector (e.g., in 2-D add 5 new derived attributes: med, Q1,1, Q1,2, Q2,1, Q2,2, where Qi,j is the i-th quartile of the j-th column), what does this tell us (e.g., what can we conclude about the location of core clusters)? Maybe all we need is the basic P-trees of the column quartiles, Q1..Qn? Additional Mathematics to enjoy: L∞ ordinal disks:
disk(C,k) = {x | xd is one of the k nearest neighbors of ad, for every d}.
skin(C,k), closed skin(C,k) and ring(C,k) are defined as above.
-
The Middle Earth (Hobbit) Contours of a are gotten by ANDing in the basic P-tree where ad,i = 1 and ANDing in the complement where ad,i = 0 (down to some bit-position threshold in each dimension, bptd; bptd can be the same for each d or not).
Caution: Hobbit contours of a are not symmetric about a. That becomes a problem (for knowing when you have a symmetric neighborhood in the contour), especially when many lowest-order bits of a are identical (e.g., if ad = 8 = 1000).
If the low-order bits of ad are zeros, one should union (OR) in the Hobbit contour of ad - 1 (e.g., for 8 also take 7 = 0111).
If the low-order bits of ad are ones, one should union (OR) in the Hobbit contour of ad + 1 (e.g., for 7 = 111 also take 8 = 1000).
Some need research:
Since we are looking for an easy prune to get our mask down to a scannable size (low root count), but not so much of a prune that we have too few voters within Euclidean epsilon distance of a for a good vote, how can we quickly determine an easy choice of a Hobbit prune to accomplish that? Note that there are many Hobbit contours. We can start with pruning in just one dimension and with only the lowest-order bit in that dimension and work from there; how, though?
THIS COULD BE VERY USEFUL?
-
Suppose there are two classes, red and green, and they are on the cylinder shown. Then the vector connecting medians (vcm) in YZ space is shown in purple, the unit vector in the direction of the vector connecting medians (uvcm) in YZ space is shown in blue, and the vector from the midpoint of the medians to the sample s is in orange. The inner product of the blue and the orange is the same as the inner product we would get by doing the same thing in all 3 dimensions! The point is that the x-component of the red vector of medians and that of the green are identical, so the x component of the vcm is zero. Thus, when the vcm component in a given dimension is very small or zero, we can eliminate that dimension! That's why I suggest a threshold for the inner product in each dimension first. It is a feature or attribute relevance tool.
-
DBQ versus MIKE (DataBase Querying vs. Mining through data for Information and Knowledge Extraction).
Why do we call it Mining through data for Information & Knowledge Extraction and not just Data Mining? We Mine Silver and Gold! We don't just Mine Rock. (The emphasis should be on the desired result, not the discard. The name should emphasize what we mine for, not what we mine from.)
Silver and Gold are low-volume, high-value products, found (or not) in mountains of rock (high-volume, low-value). Information and knowledge are low-volume, high-value, hiding in mountains of data (high-volume, low-value).
In both MIKE and MSG the output and substrate are substantially different in structure (chemical / data structure). Just as in Mining Silver and Gold we extract (hopefully) Silver and Gold from raw Rock, in Mining through data for Information and Knowledge we extract (hopefully) Information and Knowledge from raw Data. So Mining through data for Information and Knowledge Extraction is the correct terminology and MIKE is the correct acronym, not Data Mining (DM).
How is Data Base Querying (DBQ) different from Mining thru data for Info & Knowledge (MIKE)?
In all mining (MIKE as well as MSG) we hope to successfully mine out something of value, but failure is likely; whereas in DBQ, valuable results are likely and no result is unlikely.
DBQ should be called Data Base Quarrying, since it is more similar to Granite Quarrying (GQ), in that what we extract has the same structure as that from which we extract it (the substrate). It has higher value because of its detail and specificity. I.e., the output records of a DBQ are exactly the reduced-size set of records we demanded and expected from our query, and the output gravestones of GQ are exactly the size and shape we demanded and expected; in both cases what is left is a substance that is the same as what is taken.
In sum: DBQ = Quarrying (highly predictable output, and the output has the same structure as the substrate (sets of records)). MIKE = Mining (unpredictable output, and the output has a different structure than the substrate (e.g., T/F or a partition)).
-
Some good datasets for classification: the KDDCUP-99 Dataset (Network Intrusion Dataset): 4.8 million records, 32 numerical attributes, 6 classes, each containing >10,000 records. Class distribution: [table omitted].
Testing set: 120 records, 20 per class4 synthetic datasets (randomly generated):10,000 records (SS-I)100,000 records (SS-II)1,000,000 records (SS-III) 2,000,000 records (SS-IV)
-
Speed and Scalability. Speed (Scalability) Comparison (k=5, hs=25). Machine: Intel Pentium 4 CPU 2.6 GHz, 3.8GB RAM, running Red Hat Linux. Note: these evaluations were done when we were still sorting the derived TV attribute and before we used Gaussian vote weighting; therefore both the speed and the accuracy of SMART-TV have improved markedly! [Chart omitted: Running Time Against Varying Cardinality, time in seconds vs. Training Set Cardinality (x1000) for SMART-TV, PKNN, KNN.]
-
Dataset (Cont.): OPTICS dataset. 8,000 points, 8 classes (CL-1, CL-2, ..., CL-8), 2 numerical attributes.
Training set: 7,920 points Testing set: 80 points, 10 per class
19.4772464965.294735223.0011456774.2838562685.3597598150.5975026939.2550998374.25468812
8.32650353766.8571786220.1899563666.7367936988.4803595853.5576080639.6666748473.62670683
15.7089408667.9734663321.0067510449.9411934691.3112590851.2837375835.7757056679.503839
16.827599759.8871404718.635782368.0529483185.3125369650.7864904827.8431552685.37971363
13.3220682465.230348921.1500305776.9681457388.1211191955.7582291639.8426848783.06822421
8.09797704564.0763788621.0781183669.5162084186.4746997453.1467821834.1558300892.28066005
14.2966063663.2482462718.962675652.550824787.3421810755.2161149937.4774579676.87396389
19.3600110662.6516715720.3140584147.7526531887.0316267955.5132286236.7010644182.70472117
20.0268947764.2515212222.5841707968.2116803588.7149806950.9029111535.1250272175.58227227
11.2564053463.6528874823.5445585966.7820910289.6272385455.9088265240.7871522573.64780423
17.4262772562.6219175922.1474764171.0026896390.9710673654.1328281432.9012979871.68161419
11.7438890162.6523231315.7614928783.0960853591.0280358252.8042731341.6979079879.47663531
9.14996572563.9759599917.6082469665.7690953687.8295276853.1796759235.8991191288.6406292
15.3990333564.3281269415.9257713148.2540230490.6063347455.2245297242.6305018782.68311564
22.4646963462.5515352118.7991919657.3171352587.9378463250.34330936.9914130380.04482089
21.8994178264.8124867319.4971606848.8088322890.7736166455.6440769931.8649169683.44304651
8.55533620659.3316891122.3083896856.9229124986.3378675653.4939724233.8827279287.62453609
20.9321297860.751167815.7290566771.8615803488.9199731352.690580633.1006695485.58372465
8.40048202864.4259520712.3151557465.1793247189.9511246449.4670330633.9967428581.18305049
21.5977356967.1750751415.3426245466.0405715188.3644812355.7868692742.2923387181.73048625
10.2736797760.5170417917.0629743858.3130087189.4221782855.5971629639.295718184.2638124
19.1114635364.4546992918.2042517656.868248686.8014597953.8230555435.0685776279.86988451
15.2070874263.3054571512.7952034467.3753738690.6702751457.104944530.8711164491.55081457
20.1362467759.9196906719.5130014858.0509561487.9658706954.170130434.0158821782.54350847
17.6725688956.4596291218.4316255880.8460775887.4525475955.4930946442.7235520995.90306801
15.5669267863.7038003822.7412960850.2680451687.3331153350.4197918339.0293740886.70460393
15.7918421862.756816198.32525747664.1357724989.2767256254.9308856941.5799774773.12912795
16.8152038561.7830937419.0077848665.0179219185.8053469853.1032808833.1472250988.03040303
21.7588327259.836529421.0209709559.6182498189.8026779453.8054721428.074113575.98281618
21.1311089561.6586437326.7526617879.9937184286.0257727149.2046816439.3748043978.28099097
24.7798136863.2595362222.1633348780.8075332687.6180568652.0856840735.2209097796.34858684
20.4310272165.5454921316.948276963.3440155788.1419726851.8499401233.2922518991.49112101
13.8806619965.2481380516.8157607868.2032951190.3577178950.5255162337.3722713577.65667099
19.5515232261.1585722721.9173278675.0135076387.4801614252.0672678334.5194291872.3593858
10.8208859860.4867227421.97373861.6406390787.8564628653.8038467735.0626457282.48103914
23.7170922367.4403710424.2441046459.4755751487.5345641251.1474877940.1236473788.70213712
13.8853828165.12756669.0817968877.8646290686.0575185654.4268144433.10796276.85017063
7.57772213161.0109469717.025740562.7872793190.4459947754.1704244638.2879814283.45244903
13.1559155560.8126250922.5500931346.3755354487.1181813449.7232758738.5005830187.09227395
12.7859826359.9482542723.0342175560.0476975789.1053903454.4160220433.1165007978.06373462
18.1384290461.8655256622.7977976270.1158908991.4052416853.3168689737.8908126992.3502691
23.2971668864.4494666523.7938350456.5788306488.9854069750.0216126637.5964173279.05535602
20.2400772164.647392319.5777764760.713416884.9895517553.1492957835.8732243582.42874693
21.5470235369.8943488417.2092524849.2474429586.7337788654.4686917840.7979831887.08944628
10.1090499464.178441720.5855276771.4457821287.0656333453.1528932533.1345095191.38741049
11.1229398863.8562592621.8326174663.6352828789.7045282750.6843674236.3150073695.64888207
17.3513895863.9194296725.1833249267.4872311285.756468352.8591883335.2854099989.4553544
16.9735543562.1731273722.4409350960.3901619188.5742411952.6912510631.4944303489.96644567
19.4878583965.8587863524.7353105171.1304203487.1576303252.5427359638.285010989.31201778
18.772394659.6342822122.1792286763.9478148590.0333867956.3441528341.0845500986.62923056
18.0965781268.1292121916.0132283458.0648920384.8252620251.4928361637.3875632390.80085381
19.0226706165.4604837217.5121144469.7613999687.6205489353.7148174737.5133634781.70626913
12.7500939758.6226717518.6485213457.8570953287.9038136552.8973249735.2000184479.99051811
13.3841477960.3125283922.0035593949.3742397585.5143947150.0228260534.1146434390.99819869
12.4884945861.143045710.2232722661.0374740485.7067454952.4529380631.0786634384.92018959
10.4506469969.6215039913.4685727763.2530503488.4230003952.1529593740.8810179889.99841049
20.8243031561.6367414322.3151448875.1702070287.8365014251.6352624438.7208419788.69152401
21.069824964.6624352522.1041589572.5676784385.9280695754.5792647835.5220667775.41881398
11.7804721764.561842522.8159323979.2623689787.2329406451.1543957942.216370693.01989827
17.920465860.5207180521.8963310751.339331987.4161214453.7886376636.8158780186.70420663
14.1983413661.2436102124.5009158669.2383508489.3193714954.1227387136.1629205379.20748927
21.4428174865.0000751919.8447969264.4833458788.3405422852.5452843438.177278783.18357099
15.6189967160.1821393221.922886451.9426866886.2951179153.3591138838.5227026283.21081899
22.6422217663.7741811622.0575568377.3909560387.9585222954.3057195933.7359300574.56870796
14.196244365.2076418317.8530193969.5781009387.5561553154.0488856939.2996376476.2826525
10.2843414665.2786634213.7487974366.0846471990.880001652.6406112537.9926519494.1762044
14.7064393165.8482952220.9501549555.9886324286.0695491951.6358534435.2730008378.23397926
18.5159422260.8789528323.6724197848.4725316591.7684461753.3752735343.9809795186.77226228
24.2486620362.7034468314.2029583566.0614101685.6750017953.6781411733.6630534280.22563709
18.6567449663.1102726418.1651977776.1091309686.8033032853.7624855737.9020490386.50780563
23.787894263.8429604420.7506983780.4762542988.7172296453.4558225834.9428116781.42580116
9.36897585262.7145819121.1707105971.3067521289.0207084751.3497185538.8012155675.1422148
19.5580734163.8252253818.8099494864.0574917686.5914576357.3879538635.8179928672.94614895
9.33270862758.1436045216.1804341971.0395900291.1283152651.4705587142.0071500598.31114902
14.5715722563.4526687117.4966077477.6883915585.4163463251.9743532734.4116165179.25689781
20.4316360565.7864320412.765484372.4690182187.0310541352.5207517137.3314775390.08252161
8.42245371165.9499032415.4175258668.7424615890.7062545952.3514170937.8045945790.65855043
23.0037182963.1754080923.6146578263.3201402488.1770821251.875386436.7434539281.0450679
16.3545903964.9152509622.7717456469.6159143985.2035484450.6651843134.3846855579.17239002
10.5749322463.1687254222.8482307671.7116599789.1027303152.8937428331.6560707475.88632727
19.3205862966.778790118.1856473149.363431888.7750837251.7189772335.0596699274.3158974
13.9516200861.3419021212.0276749568.8068063787.2781158752.1162153637.3541581487.90798143
23.1745760360.3118289619.1506908483.3168244988.9556969653.7739082639.4914832893.66437843
10.3351600464.2042850818.4731002560.9782573991.2560487150.6073453733.2677348177.68367899
10.095359666.6115557522.9121777355.101888588.3519341155.8838882137.8494713271.99177991
14.4482401361.2291743417.349816682.6888029888.2827512155.6914496635.9962262888.91227251
15.9035218361.5547014526.7801301762.3068074392.4833118154.4239470941.1802994890.7566343
13.5666947662.2184944822.8203408969.9617350588.8416457850.7285389233.0381843981.7938433
14.4522788758.4000288718.2547092271.71373687.884644251.2627887940.0286150292.64430311
12.0909102566.4180793820.1289172155.7042973588.0961060851.9554838638.8863418278.81495262
11.4446951161.2854827522.2973198576.8626586989.8639460951.0839730238.0308290292.74802788
12.087242958.1600443315.074092358.2787333387.6261187252.450794932.7132766585.63104177
13.0579260764.0246011718.7661411552.3297280588.7918001556.1606933540.9767095384.87403999
19.1848386463.6305103920.9314876870.7511244690.1719868153.9589714140.9555640289.9042664
16.7037475663.6569967318.5029885251.0591710490.7889709752.7259490934.5640362586.7495234
16.9407442364.8881609818.9935060263.9488577289.1349875253.6102361838.550691382.53211479
12.0676130566.2318844523.2650981756.4513635485.9110631551.4835328836.3372512195.31891694
11.8016231462.5256895821.0826100558.9204877687.6596098951.8779072936.7607074189.11111569
19.6210290265.3688666319.2336164180.2758608886.0211189250.3104938440.1952489382.97084243
22.6805540664.750346520.0573127754.54007888.9802131454.1155994143.7684233688.06512198
19.7675353360.9113865516.2371475673.7252050988.0636349251.0959568746.3614933182.09328714
21.4903398760.2351225818.4194026980.1793678888.2821432451.5759008438.8095558286.94612308
17.8434230856.9987802614.0399074157.5165247888.8672660253.454082930.8439695483.30113136
22.4422786166.0979990117.0958682880.3159724786.508761252.4817170944.4750758680.65911866
9.75515924263.737044712.5287873945.7355384489.5763969251.6113875536.3534970387.43105941
9.49985999360.6434820915.9053851674.0991381890.3368904956.263211537.9291115879.88528341
12.978308164.9170100318.8110402252.2040783188.2549108753.6024782426.0767105593.16613558
21.3921156364.2364106615.7323370160.6550875190.4057599556.1641734138.7060723386.15936973
13.256578861.4946796117.6113860182.5671830987.8380381154.0661104936.5931266893.2234635
21.4126506265.1123506221.5479015677.6483326290.9792920551.8939057237.7956357185.39032335
22.2548381367.129511123.2204886580.5182091590.0725219449.2058452433.7511630279.53236045
9.53952609964.621924426.6428956749.5040737586.3505326150.8868170330.1657713284.041867
19.9267938264.4981292713.6874692256.9963809988.5500974451.7789365534.4136737188.32776394
8.02261054162.0127306515.4870059583.1064538687.3541969955.4614638331.6859105387.52950926
6.71753706560.000126116.9415851972.9298185190.5229512451.6076991633.5583165384.49754153
11.4426292363.2135717521.0102504670.9646692188.6415901353.1655552332.9974339290.23311607
20.1502408665.4960246712.5765969862.3619437888.1631403451.0618699730.4134493877.7549319
17.8083273766.4742233815.5038348657.9698911286.440686451.0684169936.1671931978.24971077
16.8903158462.9429283217.131040870.1193972286.5344784552.3496876838.1901723290.50265427
11.3937498260.6041940621.2016324262.6555843685.0549853153.4466536834.043841690.04010775
16.2470868255.0559040415.2299304759.0485795589.8080603257.1396634836.9488187580.13325307
19.1794141366.8956343417.4935266561.0431572786.3876890750.3169204634.8456242279.79364463
12.7780170761.9966168818.173399364.2224936688.2922700455.4303570137.9572020876.23819363
6.39813246564.0707346518.2312696153.0547032890.5099153351.4791521736.9929226175.44658405
18.0217824562.7580357514.9089081749.9698702988.642485850.6925140935.2172835186.38112866
21.3043214962.8129462315.0021787576.4689788688.8009927451.9384906142.9355235375.35240195
17.3461045264.3412467320.9543981747.6880002487.9808320854.6910141734.7070505380.00333341
17.8344561957.6260024122.6147803671.9138806390.4850908551.7709940642.3285455378.55916511
20.3911180457.0813184517.268128957.5569964790.0078798653.4527985734.2407549884.25899918
11.0428747659.8030376921.7836681850.1822943183.6106465254.6616440537.3365331985.87530043
23.5067312364.6530367623.7692389962.5256561389.9785269250.465052534.5506444878.38201735
7.47580400464.7205162918.5563058382.229520588.0746531454.1379408938.3703223473.39357617
14.0838994764.833807815.1320810578.9977721789.9359905356.1496839537.1741523390.73698866
17.5636986267.0067705924.8421503169.1691167887.912090551.7121691434.7665385877.36716189
8.95234007662.2054088821.9781373569.3977750688.2710958552.8297604935.270784387.06720477
11.1169141358.8476130316.0885713358.9914323993.4016548151.5498672132.2057064978.57393027
14.8182375264.2853289118.7426552684.468992988.1885925451.824059637.5527507588.30547703
14.8858890858.9623929915.2258824873.0227079190.7933007552.4508445135.890739192.12289609
12.8820239964.0993622118.8904389868.0304827388.380700552.592941630.3396318277.98129216
13.4659593459.1692049223.5801916261.5310442788.2243570453.1074822139.6729637585.06043604
17.8629508662.1787221715.5714973964.9962688586.4955096953.1571087834.6368954482.8368656
11.6673256666.3243620715.1994306453.4999788584.8375418555.287040336.7368868691.14721317
15.9990642961.3204639423.1101083958.2375802386.3824162353.7032627636.7122066372.04802185
20.9809769265.5819468122.3139726561.6658672187.9428016152.9574483939.1740729876.9698209
24.5442352261.8790213921.9982308553.7660581988.3230336150.59829436.521948674.4574515
14.7674546363.120061669.83769889862.0067705388.3759338454.0759508244.2544373193.40052757
10.5093937466.022978029.42167193982.2109960989.2180359251.8452085543.0437371292.1621962
18.5419767565.3143027117.6671510476.2277822387.0954774851.7623961140.6622976378.7801211
16.466132463.1300072222.0210623280.7627063989.9948703152.2148222542.6866444893.63636883
8.31058622764.4940984517.6651280159.0020989287.5129827754.7424613537.5773707381.38670792
19.0049898666.7226235211.3147695157.0140568886.4632573452.4264805837.1796384387.63403927
11.8655892566.2451964511.8081960761.8329394283.4142202850.7511747941.3366027679.17524144
10.4091417561.4324583414.4506954669.9590183487.9572084556.369411236.8908673987.35423259
13.6310262865.5051194621.1529981951.6432644687.8908724450.9662452737.9635447177.25134302
16.4309895358.791598822.4869002152.3964306385.9857822150.9603099235.9785178974.75708473
7.60923083863.471986219.5716164448.8343472987.6016801254.0552144134.9614382478.15897464
17.2823692661.9053068525.1017625846.6343994787.8957377655.8209156232.0533730179.55422296
20.293571762.563338413.1990437853.7428966388.203624750.9561374439.4793145377.75607812
16.7863892262.4228302819.472948267.0169978190.0647946351.6778591137.1306246984.34133318
11.8205309664.7633365914.0404638663.5325057687.1407803154.4596197927.8850524694.10944373
7.3824261767.8635659921.9446524758.0612470490.2182737353.618205639.770274394.50985383
17.0553611662.3803087516.7984378780.6930788489.5693984550.75965836.0317004788.65928559
15.34484358.7090705922.0763496779.4396834587.5149795449.8786355643.1308987782.07319224
13.9887337561.1514514418.4763416857.6758644787.5183765757.0361689638.7297018195.3559265
21.5391188657.2372070221.1846183149.099190186.5685897453.1832717837.8930404581.3865541
16.1086087261.3434287418.0925159181.6452443687.7308358853.4006681141.8535469280.02239947
17.7627035660.8121457227.292740565.6908555788.1224595549.8008367741.1338072282.2019511
8.24009573164.5823641118.7272390867.4174937690.4572195552.63084633.4093316788.59219069
20.2848657767.2371181724.9044007555.2800172290.0699752252.8696716439.7109738593.04764201
23.2848191860.7877036725.0122937954.4992256885.9786209357.6266262938.0459247374.08560711
13.9705978562.5189314221.9149060578.8208203285.8461419753.5839172434.1321496682.1078255
18.1957686661.4423051121.4038441381.5047359488.6265652355.2539684832.9636064995.12757784
9.9456247565.260068420.7815041177.2183933790.4648426652.0444645644.6246984778.22383421
15.2515168858.8857695115.4141659677.5353168789.0377010850.7143075136.4219829182.00308257
15.8259399962.1021250315.2924203171.6893302585.9826147953.1203266130.495938373.18892837
8.47509168361.4393194712.8841926962.0640180790.1295444654.6061010135.2101636784.07210276
22.0005535363.7875051524.0751489261.6530016885.9624467253.6187134640.9180400795.11822704
18.2553911961.692039721.660062168.4348222688.4974851249.8550687336.2438249771.99830978
24.4914094361.892569413.8982977365.2734310984.8625112952.6965238135.3712547187.80214376
23.1873299363.552990526.4855069454.0382025190.4102648553.5140981338.0914321890.14723496
23.144257465.9218814516.4398620676.0060287187.7091264254.6409944841.6468768173.12659435
20.0999043460.5926761614.612654578.9521719287.6065550954.3262055540.9690138285.22569261
14.3309410764.3169257319.0821353751.9729673385.7490020553.9379323541.4877897496.19006389
18.8030839168.4839809817.5422201366.8531320289.3175173952.3779182936.7447287176.78324933
23.263071966.1911938823.1026000264.4794675385.7729237651.3873095137.0077036173.29789073
9.17651532762.0725749621.0185546565.6831308187.0612913556.3708122843.9484060472.94363288
23.9501104262.8836526920.5529367683.7424093686.8197315452.7225311931.7619666185.44470973
11.4991681463.094725721.832584263.8430206588.2925979553.2602290440.2721534895.61275893
10.0999755860.55891813.6909753253.1633727586.8598861452.0181219139.0840208477.7245738
10.7783688566.2458783123.4904640871.3529726686.9612270553.3643639137.4153207275.28209492
8.72918234264.1706183322.220420250.3200627189.9588560355.9918090636.4762430294.01597357
22.6562306760.4149386517.0202048176.9894333485.8042633251.6770714837.1630788270.7692346
23.4172315464.4942719119.933796784.6546972489.7549342354.5240409836.9950773591.52124284
11.0056174762.6859549219.478751974.0503806487.4770794755.2242268540.0766834188.3911286
8.18716421765.4938497315.9367897259.9184178987.1200249654.4275493536.1653554671.75693597
12.6033687562.2251054520.5759933747.5925050789.0224891753.6315428337.9408253484.88717489
18.6573120262.5427688320.3654761267.3690385989.4068581252.6422600734.2960977781.6024978
18.6367170861.7771704120.8999680452.0562221886.5515224657.0151687131.5423902671.87307961
23.8425928359.9982388823.1609843845.8859500884.0306143854.6312094440.3039399486.99609323
13.690377162.5806348921.3809253682.0585339190.1035605153.6606602936.6911280790.74434345
20.7566832370.1408629420.5650956964.6924292687.3085237954.3972247838.5225076183.78264525
7.18235638760.3557876419.1674567155.4214498785.8054733151.6334951337.7972826181.71656787
22.3392804662.6751328219.4109363954.3713013486.5611938753.6352659533.5162376396.66605979
8.36066349362.4662660822.8358061560.79971686.0597267552.4059761233.0347415493.45891984
12.0972475660.5793612514.281083682.5938332786.4698114647.8959087433.67467585.55607984
12.5593096660.0900119725.974113783.2804809587.1078830352.7828113228.369829686.92439641
13.5163957362.7939705322.4664982359.0770760589.211539555.5368587439.0612198997.54617928
16.9979683462.4659410614.1478739260.5829653387.2006203454.2884838735.837764395.6333706
17.4997239263.5700515920.9357980453.5328246690.3113805951.2659239931.3633908681.32837613
21.4840737263.4631521525.2795470950.4273825289.7923542754.4161690441.3380733395.16000142
17.4966849961.040581917.5455950250.7070815689.60499252.0380325635.8071460483.18577009
18.4550755867.9267268422.787122348.390850489.0689906748.56746840.1757593578.16693144
22.3979319163.1926187923.453414683.498625185.765415652.6035821341.4819141283.06977997
14.7534010463.8987525420.7884414566.1054670788.4537762250.4228866837.5743274595.7656547
22.2561468466.8101983311.2155238367.8952896887.1934590252.0860596832.2287018680.98025006
6.54552057861.8663339612.3254184264.4697918790.0001143451.2396840445.6973762673.50967567
16.5889779662.4141415419.4862282483.4269393189.2148778253.6933827234.6558203673.60692832
11.2028289656.6176822823.6139080953.3678879887.1593291149.0635050735.329406382.96188864
8.94473323662.7004920220.8552961177.5439348889.0337368351.9596473230.5159319591.73812081
14.6335469561.3275754521.1030389983.6176585787.1011039152.6494463540.9662563492.67657537
20.0679106163.0331716712.7346813464.5764241889.0783117751.3718095441.1125869788.66941273
12.4757648261.8031732513.9427781659.4046968386.3060077349.8452826739.3882506487.72321075
16.068884959.9502556422.727672976.4237843289.1566577853.539454128.7509100994.97828065
22.0067221158.4385381521.8219951960.8581628287.9820402451.6084855832.3572233890.30085536
19.6877998259.8281678123.9698095961.3266494686.8937343554.2373162737.5108461884.84057182
14.8818045360.7715233920.7950015868.4925229588.2837997254.9036042532.8264839186.54787262
22.8622124766.2559048114.404349147.8791135686.2554535951.5314556936.4704604282.31871391
15.754920366.3875624122.3693942165.6739365391.179213751.1325060642.60979549