
Page 1:

Learning from positive examples

Main ideas and the particular case of CProgol4.2

Daniel Fredouille, CIG talk,11/2005

Page 2:

What is it all about?

• Symbolic machine learning.
• Learning from positive examples instead of positive and negative examples.
• The talk contains two parts:

1. General ideas and tactics to learn from positives.

2. How the particular ILP system CProgol4.2 of S. Muggleton (1997) deals with positive-only learning.

Page 3:

Disclaimer

• This talk has not been extracted from a survey or any particular article: it is more a patchwork of my experiences in the domain and how I interpret them.

• Feel free to criticize: I would like feedback on these ideas since I never shared them before.

• I would really appreciate comments on the slides marked with the ? sign.

Page 4:

Definitions

[Figure: the concept space and the instance space, related by an ordering (generality); it shows the target concept C, the inferred concept C’, and positive/negative examples of C.]

• “Is more general than” / “is less specific than”: the ordering relation between concepts (see the sketch below).
• The concept space is usually partially ordered by this relation.
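To make the ordering concrete, here is a minimal Python sketch (an illustration added here, not part of the original slides), assuming concepts are represented extensionally as sets of instances, so that “more general than” becomes the superset relation:

    # Concepts as sets of instances (an assumed, extensional representation).
    # "c1 is more general than c2" then means c1 covers everything c2 covers.
    def more_general_than(c1, c2):
        return c1 >= c2  # superset test: a partial order, not a total one

    animal = {"cat", "dog", "eagle"}
    bird = {"eagle"}
    print(more_general_than(animal, bird))   # True
    print(more_general_than(bird, animal))   # False

Two concepts can be incomparable (neither is a superset of the other), which is why the ordering is only partial.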

Page 5:

Positive and Negative Learning

Possibility 1: Discrimination of classes
• Characterise the difference between the pos/neg examples.
• No model of the positive concept!

?

Page 6:

Positive and Negative Learning

Possibility 2: Characterisation of a class
• Use negative examples to prevent over-generalisation.
• Needs negative examples “close” to the concept border.

?

Page 7:

Positive Only Learning

Aim: Characterisation of a class

Choice?

Page 8:

Positive Only Learning

• Two strategies:
1. Bias in the search space: choosing a space with a (very) strong structure.
2. Bias in the evaluation function: choose a concept with a compromise between:
– Generality/specificity of the concept
– Coverage of the positives by the concept
– Complexity of the hypothesis representing the concept

?

Page 9:

Search space bias approach

• Main idea: consider strongly organised concept spaces.
• Possible inference algorithm:
– Select the least general concept covering all examples.
– The constraints on the search space ensure there is only one such concept.

Trivial example (generally not useful), a “tree organisation” of the concept space (see the sketch below):
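A minimal Python sketch of this idea (my illustration, with an assumed parent-pointer encoding of the tree): each node of the tree is a concept covering everything below it, and the least general concept covering a set of examples is their deepest common ancestor.

    # Tree-organised concept space: parent maps each node to its parent
    # (the root maps to None). The unique least general concept covering
    # a set of examples is their deepest common ancestor.
    def least_general_cover(parent, examples):
        def ancestors(node):  # node and all concepts above it
            chain = []
            while node is not None:
                chain.append(node)
                node = parent[node]
            return chain
        common = set(ancestors(examples[0]))
        for e in examples[1:]:
            common &= set(ancestors(e))
        return max(common, key=lambda n: len(ancestors(n)))  # deepest one

    parent = {"cat": "mammal", "dog": "mammal", "eagle": "bird",
              "mammal": "animal", "bird": "animal", "animal": None}
    print(least_general_cover(parent, ["cat", "dog"]))    # mammal
    print(least_general_cover(parent, ["cat", "eagle"]))  # animal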

Page 10:

Search space bias approach

• Advantages:
– Strong theoretical convergence results possible.
– Can lead to (very) fast inference algorithms.
• Drawbacks:
– Not available for all concept spaces!
– Theorem: super-finite classes of concepts are not inferable in the limit this way (Gold 69).
Super-finite = contains all concepts covering a finite number of examples, and at least one concept covering infinitely many.

Page 11:

Heuristic Approach

• Scoring makes a compromise between:
1. Specificity of the concept
2. Coverage of the positives by the concept
3. Complexity of the concept
• Implementations:
– Ad-hoc measures of points 1, 2, 3, combined in a formula, e.g.:
Score = Coverage + Specificity – Complexity
– Minimum Message Length ideas (~MDL)

?

Page 12:

Heuristic Approach: Ad-hoc implementation

• Elements of the score:
– Coverage: counting covered instances
– Specificity: a measure of the “proportion” of the instance space covered
– Complexity: the size of the concept representation (e.g., number of rules)
• Advantages:
– Usually easy to implement
– Usually provides parameters to tune the compromise
• Disadvantages:
– No theory
– Bias not always clear
– How to combine coverage/specificity/complexity? (one possible combination is sketched below)

?
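One possible combination, as a hedged Python sketch (the three measures and the weights below are illustrative assumptions, not a published formula):

    # Ad-hoc score of the form Score = Coverage + Specificity - Complexity,
    # with weights exposed so the compromise can be tuned.
    def adhoc_score(covered_pos, n_pos, covered_instances, space_size,
                    n_rules, w_cov=1.0, w_spec=1.0, w_comp=0.1):
        coverage = covered_pos / n_pos                      # positives covered
        specificity = 1.0 - covered_instances / space_size  # small cover = specific
        complexity = n_rules                                # e.g. number of rules
        return w_cov * coverage + w_spec * specificity - w_comp * complexity

    # A concept covering 45/50 positives and 80/1000 instances, with 3 rules:
    print(adhoc_score(45, 50, 80, 1000, 3))  # 0.9 + 0.92 - 0.3 = 1.52

Tuning w_cov, w_spec and w_comp is exactly the “parameters to tune the compromise” mentioned above.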

Page 13:

Heuristic Approach: MML implementation

[Figure: two ways of sending a message through a channel.
MML for discrimination: the sender transmits Hyp., then the examples’ classes given Hyp. (a bit string such as 0100101001011010101110101).
MML for characterisation: the sender transmits Hyp., then the examples and their classes given Hyp. (a bit string such as 00101101010111011101101).]

Gain = number of bits needed to send the message without compression – number of bits needed to send the message with compression.

?
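As a toy illustration of this gain (assumed bit counts, not a real code design):

    # MML gain: bits to send the message raw, minus bits to send a
    # hypothesis plus the message encoded given that hypothesis.
    def mml_gain(raw_bits, hypothesis_bits, residual_bits):
        return raw_bits - (hypothesis_bits + residual_bits)

    # E.g. 25 raw bits vs. a 6-bit hypothesis leaving 10 residual bits:
    print(mml_gain(25, 6, 10))  # 9: positive gain, the hypothesis compresses

A positive gain means the hypothesis genuinely compresses the data; a hypothesis that merely memorises the examples yields no gain.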

Page 14:

Heuristic Approach: MML implementation

• Advantages:
– Some theoretical justifications in the works of Kolmogorov, Solomonoff, Occam, Bayes, and Chaitin.
– Absolute and meaningful score.
• Disadvantages:
– Limit of the theory: the optimal code can NOT be computed!
– Difficult implementation: the choice of encoding creates the inference bias, which is not very intuitive.

Page 15:

Positive only learning in ILP with CProgol4.2

Page 16:

Positive only learning in ILP
• The following is not a survey! It comes from what I have already encountered; I have not looked for further references.
• MML implementations:
– Muggleton [88]
– Srinivasan, Muggleton, Bain [93]
– Stahl [96]
• Other implementations:
– Muggleton, CProgol4.2 [97]
– A heuristic ad-hoc method
– Somehow based on MML, but the implementation details make it quite different.

Page 17:

CProgol4.2 uses Bayes

• Distributions: D_H over the hypothesis space H, D_I over the instance space I, and D_I|h over the instances covered by h (with h ∈ H, i ∈ I).

• Score: P(h | E) = P(h) * P(E | h) / P(E)

• Fixing the distributions and computing P(h), P(E | h), P(E).

[Figure: the hypothesis space H containing h, and the instance space I containing the examples E covered by h.]

Page 18:

Assumptions for the distributions

• P(h) = e^(-size(h))
– Large theories are less probable than small ones.
– size(h) = sum over the rules c_i of h of the number of literals in the body of c_i (computed in the sketch below).

• P(E | h) = Π_{e ∈ E} D_I|h(e) = Π_{e ∈ E} D_I(e) / D_I(h)
– Assumption that D_I and D_H give D_I|h.
– Independence assumption between examples.
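A sketch of that size measure (assuming theories are given as (head, list-of-body-literals) pairs, a representation chosen here for illustration):

    # size(h): sum over the rules of h of the number of body literals.
    def size(theory):
        return sum(len(body) for _head, body in theory)

    h = [("bird(X)", ["has_feathers(X)", "lays_eggs(X)"]),
         ("bird(X)", ["flies(X)"])]
    print(size(h))  # 3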

Page 19:

Replacing in Bayes

• P(h | E) = e^(-size(h)) * [ Π_{e ∈ E} D_I(e) / D_I(h) ] / P(E)

• As we want to compare hypotheses (the terms not depending on h are folded into a constant):
P(h | E) = [ e^(-size(h)) / D_I(h)^|E| ] * C1

• Taking the log:
ln(P(h | E)) = -size(h) + |E| * ln(1/D_I(h)) + C2

• We still have to compute D_I(h) ...
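Up to the constant C2, the score is then direct to compute; a Python sketch (D_I(h) is taken as given here, its estimation comes on the next page):

    import math

    # ln P(h | E) up to the additive constant C2:
    # -size(h) + |E| * ln(1 / D_I(h))
    def log_posterior(size_h, n_examples, d_i_h):
        return -size_h + n_examples * math.log(1.0 / d_i_h)

    # A theory of size 3 covering 1% of the instance space, with 50 examples:
    print(log_posterior(3, 50, 0.01))  # -3 + 50*ln(100) ≈ 227.26

Note the tension the formula encodes: a smaller D_I(h) (a more specific h) raises the score, while a larger size(h) lowers it.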

Page 20:

D_I(h): the weight of h in the instance space

• Computing D_I:
– Using a stochastic logic program S trained with the BK to model D_I (not included in the talk).

• Computing D_I(h):
– Generate R instances from D_I.
– h covers r of them.
– D_I(h) = (r+1) / (R+2)  (see the sketch below)
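A sketch of that estimation (the covers and sample functions are assumed placeholders for h’s coverage test and for the stochastic logic program S):

    import random

    # Estimate D_I(h) by sampling R instances from D_I and counting the
    # r covered by h, with the Laplace-style correction (r+1)/(R+2).
    def estimate_d_i_h(covers, sample, R=1000):
        r = sum(1 for _ in range(R) if covers(sample()))
        return (r + 1) / (R + 2)

    # Toy usage: instances are integers 0..99, h covers multiples of 5.
    print(estimate_d_i_h(lambda x: x % 5 == 0,
                         lambda: random.randrange(100)))  # close to 0.2

The (r+1)/(R+2) correction keeps the estimate away from 0 and 1, so the logarithms in the score stay finite even when h covers none or all of the sample.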

Page 21:

Formula for a whole theory covering E

• ln(P(h | E)) = -size(h) - |E| * ln( (r+1)/(R+2) ) + C2
(term labels: size(h) = complexity, (r+1)/(R+2) = specificity, |E| = coverage)

• Estimation of the final theory score from a partially inferred theory h’ covering p of the |E| positives:
ln(P(h’ | E)) = -|E|/p * size(h’) - |E| * ln( |E|/p * (r’+1)/(R+2) ) + C3

Page 22:

Final evaluation

• Suppression of |E| and C2:
– f(h’) = size(h’)/p + ln(p) - ln( |E| * (r’+1)/(R+2) )
(term labels: size(h’)/p = complexity, p = coverage, (r’+1)/(R+2) = specificity)

• Possible boost of the positives with a factor k:
– f(h’) = size(h’)/(k*p) + ln(k*p) - ln( |E| * (r’+1)/(R+2) )

• This formula is not written anywhere (the above one is my best guess!).

• The papers are hard to understand.
• But it seems to work ... (a sketch of f follows)
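A sketch of this guessed evaluation (to be read with the same caution as the formula itself):

    import math

    # Guessed final evaluation with the optional boost k of the positives:
    # f(h') = size(h')/(k*p) + ln(k*p) - ln(|E| * (r'+1)/(R+2))
    def final_eval(size_h, p, n_examples, r, R, k=1.0):
        kp = k * p
        return (size_h / kp + math.log(kp)
                - math.log(n_examples * (r + 1) / (R + 2)))

    # h' of size 2, covering p=10 of |E|=50 positives and r=30 of R=1000
    # sampled instances:
    print(final_eval(2, 10, 50, 30, 1000))  # ≈ 2.07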

Page 23:

Conclusion

• Learning from positives only is a real challenge, and methods for learning from positives and negatives can hardly be adapted to it.

• Some nice theoretical frameworks exist.
• When it comes to implementing heuristic frameworks:
– The theory is often lost in approximations and implementation choices.
– Useful systems can be created, but tuning and understanding the biases have to be treated as very important stages of inference.