learning relational probability trees

Learning Relational Probability Trees

Jennifer Neville

David Jensen

Lisa Friedland

Michael Hay

Presented by Andrew Tjang

About the authors

David Jensen – Director of KDL, research asst. prof at UMass

– Research focuses on the statistical aspects and architecture of data mining systems and the evaluation of those systems for government and business applications

Authors, cont’d

Lisa Friedland – graduate student, KDL, interest in DB/data mining

Michael Hay – graduate student, KDL

Jennifer Neville – master's student at UMass in the Knowledge Discovery Lab (KDL investigates how to find useful patterns in large and complex databases)

Learning Relational Probability Trees (RPTs)

Classification trees are widely used in data mining because they are easily interpretable

– Conventional trees assume homogeneous, statistically independent data

– Much research has gone into classification trees, so when they are carried over to relational data the well-understood sources of variance can be safely "ignored" and relational effects can be put on the front burner

Why is relational data so complicated?

Traditional assumption of instance independence is violated.

3 characteristics of relational data

– Concentrated linkage

– Degree disparity

– Relational autocorrelation

All 3 potentially produce overly complex models with excess structure

– This leads to factually incorrect relations and more space/computation requirements.

What did they do?

Conventional techniques and models assume independence

– The authors developed RPTs and a class of algorithms that adjust for relational characteristics

– According to the authors, RPTs are the only statistical model that adjusts for these relational characteristics

Applied these algorithms to a set of databases (IMDb, Cora, WebKB, human genome)

– Then evaluated the accuracy of their RPT applications

Examples of relational data

Social networks, genomic data, data on interrelated people, places, things/events from text documents.

First part was to apply RPTs to WebKB

– A collection of web pages classified into categories (course, faculty, staff, student, research project)

– Approx. 4000 pages, 8000 hyperlinks

– DB contains attributes associated with each page/hyperlink (path/domain, direction of hyperlink)

Probability (sub) tree of WebKB

RPTs

Estimate probability distributions over attribute values

These algorithms differ from conventional ones: they consider the effect of related objects

– Movie example

Aggregation is used to "proportionalize" relational data

– E.g. some movies have 10 actors, some have 1000

– Aggregate as average age (either as a preprocessing step or dynamically during the learning process)
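To make the aggregation idea concrete, here is a minimal Python sketch. The movie names and ages are made-up toy data, and `aggregate_actor_age` is a hypothetical helper, not something taken from the paper. It only shows how a variable-size set of related values collapses into one value per movie.

```python
from statistics import mean

# Hypothetical toy data: each movie links to a different number of actors.
movies = {
    "movie_a": [34, 51, 29],
    "movie_b": [45, 62, 38, 70, 27, 55],
}

def aggregate_actor_age(actor_ages):
    """Collapse a variable-size set of actor ages into a single value."""
    return mean(actor_ages)

# One aggregated value per movie, regardless of how many actors it has.
avg_age = {title: aggregate_actor_age(ages) for title, ages in movies.items()}
```

After aggregation every movie carries a single "average actor age" value, so movies with 3 actors and movies with 1000 actors can be compared on the same footing.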

The Algorithm

RPT algorithm takes a collection of subgraphs as input (a single target object to be classified + the other objects/links in its relational neighborhood)

Constructs a probability estimation tree (like the previous figure) to predict the target class label given:

– The attributes of the target object

– The attributes of other objects in the neighborhood

– The degree attributes counting the objects and links in the neighborhood

Algorithm (cont’d)

Searches over a space of binary relational features to "split the data"

– E.g. MODE(actor.gender)=female

Constructs features based on the attributes of different objects in the subgraphs and aggregations of their values.

Feature scores are calculated from the class distribution after the split, using chi-squared analysis
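As an illustration of scoring a binary split, the sketch below computes a Pearson chi-squared statistic on a 2x2 table (feature true/false by class positive/negative). The counts are invented, and the function names are hypothetical. The slides do not give the exact form of the authors' scoring code, so treat this as one plausible reading.

```python
import math

def chi_squared_2x2(table):
    """Pearson chi-squared statistic for a 2x2 table [[a, b], [c, d]].

    Rows: feature true / false. Columns: positive / negative class.
    Assumes every row and column total is non-zero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

def p_value_1df(stat):
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))

# Made-up counts: the feature is true for 40 subgraphs and false for 40.
table = [[30, 10], [10, 30]]
score = chi_squared_2x2(table)
p = p_value_1df(score)
```

A candidate feature whose split leaves the class distribution unchanged would get a score near zero and a p-value near 1.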

Algorithm Part trois

The p-value from the chi-squared test reveals which features are significantly correlated with the class. The highest-scoring of these is included in the tree

The algorithm does this recursively and greedily until the class distributions no longer change significantly

– "Bonferroni adjustment" of the significance threshold + Laplace correction to improve probability estimates
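The two corrections can be sketched in a few lines of Python. Both formulas are the standard textbook versions; the slides do not say exactly how the authors apply them, so the function names and the example numbers here are assumptions for illustration.

```python
def bonferroni_threshold(alpha, num_features_tested):
    """Standard Bonferroni adjustment: divide the significance level
    by the number of features tested at this node."""
    return alpha / num_features_tested

def laplace_estimate(class_counts):
    """Laplace-corrected class probabilities at a leaf:
    (count + 1) / (total + number_of_classes)."""
    total = sum(class_counts.values())
    num_classes = len(class_counts)
    return {
        label: (count + 1) / (total + num_classes)
        for label, count in class_counts.items()
    }

# Hypothetical node: 100 candidate features tested at alpha = 0.05.
threshold = bonferroni_threshold(0.05, 100)

# Hypothetical leaf containing 3 positive and 0 negative examples.
leaf = laplace_estimate({"pos": 3, "neg": 0})
```

Without the Laplace correction the leaf above would claim a probability of exactly 1.0 for the positive class on the strength of only three examples; with it the estimate is pulled toward the uniform distribution.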

Relational features

Traditional features

– E.g. Genre=comedy

– Identify an attribute, an operator, and a value

Relational features

– E.g. Movie(x), Y = {y | ActedIn(x,y)} : Max(Age(Y)) > 65

– Identify a relation that links an object x to a set of other objects Y
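One way to picture a relational feature is as an aggregator, an operator, and a value applied to the set Y of related objects. The `RelationalFeature` class below is a hypothetical structure for illustration only, not the authors' representation, and the cast ages are invented.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Sequence

@dataclass
class RelationalFeature:
    """Binary feature over the attribute values of the related set Y."""
    aggregate: Callable[[Sequence[float]], float]  # e.g. max, min, a mean
    compare: Callable[[float, float], bool]        # e.g. operator.gt
    threshold: float

    def __call__(self, related_values):
        # Aggregate the related values first, then apply the comparison.
        # Assumes the related set is non-empty.
        return self.compare(self.aggregate(related_values), self.threshold)

# Max(Age(Y)) > 65, where Y is the set of actors linked to movie x.
max_age_over_65 = RelationalFeature(aggregate=max, compare=operator.gt, threshold=65)

older_cast = [34, 71, 29]    # hypothetical actor ages for one movie
younger_cast = [34, 51]      # hypothetical actor ages for another movie
results = (max_age_over_65(older_cast), max_age_over_65(younger_cast))
```

A traditional feature such as Genre=comedy is the degenerate case with no relation and no aggregation: just an attribute, an operator, and a value.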

Feature Selection Biases

Concentrated linkage and autocorrelation combine to reduce effective sample size and increase variance.

Degree disparity can cause spurious elements to be selected and useful elements to be missed.

Randomization Test

Used on relational data sets where conventional hypothesis testing is not as useful.

– Used to account for the biases

Computationally intensive

– Pseudo-samples are generated by permuting variables

– In this case, attribute values are permuted before aggregation

The permuted vector preserves the intrinsic links/relations but removes correlations between the attributes and the class label
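The permutation step can be sketched as follows. This is a simplified illustration under stated assumptions: the score is a toy difference in class proportions rather than the chi-squared score, the data are invented, and all function names are hypothetical. What it does show is the key idea from the slide: attribute values are shuffled across linked objects before aggregation, so each subgraph keeps its label and its number of links.

```python
import random

def score(subgraphs, feature):
    """Toy score: |P(label=1 | feature true) - P(label=1 | feature false)|."""
    true_labels = [label for label, values in subgraphs if feature(values)]
    false_labels = [label for label, values in subgraphs if not feature(values)]
    if not true_labels or not false_labels:
        return 0.0
    return abs(sum(true_labels) / len(true_labels)
               - sum(false_labels) / len(false_labels))

def permute_attribute(subgraphs, rng):
    """Shuffle attribute values across all linked objects, keeping each
    subgraph's label and number of links (its degree) unchanged."""
    pool = [v for _, values in subgraphs for v in values]
    rng.shuffle(pool)
    permuted, start = [], 0
    for label, values in subgraphs:
        permuted.append((label, pool[start:start + len(values)]))
        start += len(values)
    return permuted

def randomization_p_value(subgraphs, feature, num_samples=200, seed=0):
    """Fraction of pseudo-samples scoring at least as high as the real data."""
    rng = random.Random(seed)
    observed = score(subgraphs, feature)
    at_least = sum(
        score(permute_attribute(subgraphs, rng), feature) >= observed
        for _ in range(num_samples)
    )
    return (at_least + 1) / (num_samples + 1)

def has_actor_over_65(ages):
    return max(ages) > 65

# Hypothetical subgraphs: (class label, ages of linked actors).
data = [(1, [70, 30]), (1, [68, 40, 25]), (0, [30, 22]), (0, [45, 50, 33])]
p = randomization_p_value(data, has_actor_over_65)
```

Because every pseudo-sample has the same link structure as the real data, any score that arises purely from linkage or degree shows up in the pseudo-samples too, and so does not look significant.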

Tests

Use IMDb to predict whether a movie takes in more than $2 million in box office receipts on its opening weekend

1) IMDb structure with class labels correlated only with degree features (random data)

2) IMDb structure as well as its attributes

3) Cora: whether a paper is on the topic "neural networks"

4) Genome project: whether a gene is located in the nucleus

5) WebKB: whether a page is a student web page

Tests (cont’d)

For each of these tasks, a set of classification techniques was applied:

– RPT with chi-squared (CT)

– RPT with randomization testing (RT)

– RPT (pseudo) without randomization testing (C4.5)

– Baseline Relational Bayesian Classifier (RBC)

* marks entries with statistically significant differences

Results

Results explained

Except on the random task, none of the results showed that their algorithm performed any better than the other algorithms

On the random task they show:

– RPTs can adjust for linkage and autocorrelation.

– RTs perform better than the other 3 models

– Biased models select random attributes that hinder performance.

Conclusions (theirs and mine)

Possible to extend conventional estimation tree algorithms to relational data

RPT w/randomization and RPT w/o randomization perform about the same (smaller trees in the former)

Randomization adjusts for biases (but is this so important if they perform the same?)

Non-selective models don't have selection biases, but lose interpretability.

Future work: enrichments to RPTs

– Better ways of calculating ranking scores
