
Solving ILP Problems in the EELA infrastructure

Inês Dutra
Departamento de Ciência de Computadores
Universidade do Porto, Portugal

Outline

• Introduction
  – ILP
  – Examples
  – Motivation
• Experiments
• Conclusions
• Future Work

Introduction

• EELA selected application
• Task 3.3: additional applications

Introduction

• What is ILP?
  – It is NOT Instruction Level Parallelism
  – It is NOT Integer Linear Programming
• So, what is it?
• ...

Introduction

• It is Inductive Logic Programming
  – Data mining
  – Machine learning
  – Knowledge/information extraction
• Where:
  – Given:
    • Set of observations (positive and negative)
    • Background knowledge (descriptions)
    • Language bias
  – Find:
    • A hypothesis (in first order language) that best explains all positive observations and none of the negatives.

Introduction

• Advantages:
  – Use of an understandable description language
  – Relational knowledge

Introduction: example

[Figure: two groups of trains, labelled "TRAINS GOING EAST" and "TRAINS GOING WEST"]

Introduction: example

short(car_12).
closed(car_12).
long(car_11).
long(car_13).
short(car_14).
open_car(car_11).
open_car(car_13).
open_car(car_14).
shape(car_11,rectangle).
shape(car_12,rectangle).
shape(car_13,rectangle).
shape(car_14,rectangle).

load(car_11,rectangle,3).
load(car_12,triangle,1).
load(car_13,hexagon,1).
load(car_14,circle,1).
wheels(car_11,2).
wheels(car_12,2).
wheels(car_13,3).
wheels(car_14,2).
has_car(east1,car_11).
has_car(east1,car_12).
has_car(east1,car_13).
has_car(east1,car_14).

Introduction: example

eastbound(T) IF has_car(T,C) AND short(C) AND closed(C)

[Figure: two groups of trains, labelled "TRAINS GOING EAST" and "TRAINS GOING WEST"]
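As a rough illustration of what the learned rule means, the sketch below checks it against the facts listed earlier for train east1. It is written in Python rather than Prolog, only the predicates the rule mentions are encoded, and the helper names `eastbound` and `witnesses` are made up for this example. The train name `west1` in the usage note is also hypothetical, since the slides list no facts for westbound trains.

```python
# Facts for train east1, copied from the slide (only the predicates the rule uses).
has_car = {
    ("east1", "car_11"),
    ("east1", "car_12"),
    ("east1", "car_13"),
    ("east1", "car_14"),
}
short = {"car_12", "car_14"}
closed = {"car_12"}


def witnesses(train):
    """Cars C such that has_car(train, C), short(C) and closed(C) all hold."""
    return sorted(c for t, c in has_car if t == train and c in short and c in closed)


def eastbound(train):
    """eastbound(T) IF has_car(T,C) AND short(C) AND closed(C)."""
    return bool(witnesses(train))


print(eastbound("east1"))  # True
print(witnesses("east1"))  # ['car_12']
```

Only car_12 is both short and closed, so it is the single car that makes the rule fire for east1. A train with no recorded facts, such as a hypothetical `west1`, is not covered.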

Another less “toyish” example: extracting knowledge from mammograms

is_malignant(A) if
    'BIRADS_category'(A,b5),
    'MassPAO'(A,present),
    'Age'(A,age6570),
    previous_finding(A,B,C),
    'MassesShape'(B,none),
    'Calc_Punctate'(B,notPresent),
    previous_finding(A,C),
    'BIRADS_category'(C,b3).

This rule states that finding (A) IS malignant IF it is:

  classified as BI-RADS 5 AND
  had a mass present
  in a patient who:
    was between the ages of 65 and 70
    had two prior mammograms (B, C)
  and prior mammogram (B):
    had no mass shape described
    had no punctate calcifications
  and prior mammogram (C) was classified as BI-RADS 3

Introduction: Motivation

• Applications:
  – Link discovery
  – Social Network Analysis
  – Equivalent identities
  – Drug design
  – Protein unfolding
  – Protein metabolism
  – Why not? Classifying grid failures
  – And... many others!

Introduction: Motivation

• Why does ILP need a grid?
  – Search space can become large very quickly
  – Need many experiments to have statistically significant results
    • Cross-validation
    • Training, tuning, testing
  – Can combine classifiers: ensembles

Introduction: Motivation

• Assume we want to run a task for one domain: find a “good” hypothesis that describes the positive examples
• Assume we run 5x4-fold cross-validation
• Assume we have 100 classifiers per fold
• # of experiments: 5 x 4 x 100 = 2,000

Introduction: Motivation

• Now assume each experiment takes 1 hour to run
• How long would it take to generate the 2,000 classifiers to be combined?
  ~ 83 days (2,000 hours run one after another)!!!
• If we consider varying learning parameters and learning algorithms, this number can be really big!!
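The arithmetic behind these figures can be reproduced with a short Python sketch. It assumes that "5x4-fold" means 5 repetitions of 4-fold cross-validation. The function `ideal_days` is a hypothetical helper that models a perfectly parallel run with identical machines and no scheduling overhead, which a real grid will not achieve.

```python
from itertools import product

REPETITIONS = 5              # assumed reading of "5x4-fold": 5 repetitions...
FOLDS = 4                    # ...of 4-fold cross-validation
CLASSIFIERS_PER_FOLD = 100
HOURS_PER_EXPERIMENT = 1

# One job per (repetition, fold, classifier) combination.
jobs = list(product(range(REPETITIONS), range(FOLDS), range(CLASSIFIERS_PER_FOLD)))

sequential_days = len(jobs) * HOURS_PER_EXPERIMENT / 24


def ideal_days(workers):
    """Wall-clock days with `workers` identical machines and zero overhead."""
    batches = -(-len(jobs) // workers)  # ceiling division
    return batches * HOURS_PER_EXPERIMENT / 24


print(len(jobs))                  # 2000
print(round(sequential_days, 1))  # 83.3
print(round(ideal_days(100), 2))  # 0.83
```

Under these idealised assumptions, 100 machines would bring the run from about 83 days down to under one day, which is the motivation for moving the experiments to a grid.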

Experiment

• Predict carcinogenicity in rodents
  – Difficult task
  – Large search space!
  – Important problem
• Phase 1:
  – Tuning using 5x4-fold cross-validation
  – Generating ensembles of up to 100 classifiers
• Aleph: well-known ILP system
• Yap: Yet another Prolog

Experiment: one of the classifiers

active(A) if atom(A,_,n,32,B), B ≤ -0.401, has_property(A,cytogen_sce,n), methyl(A,_).

Sister Chromatid Exchange (SCE)
SCE is used for the determination of mutagenicity

Experiment

• 2 submissions:
  – From LA
  – From EU

Submitting jobs from LA....

Experiment

EELA resources utilised

Resource       # of jobs
CERN               1,160
CIEMAT               279
CETA-CIEMAT          173
UniCan                98
LIP                   10
INFN                  38
UNAM                  16
BIOF.UFRJ            159
IF.UFRJ                8
UFCG                  28
Total              1,969

~ 300 resources in LA

211 jobs in LA
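The totals on this slide can be cross-checked with a few lines of Python. The job counts are copied from the table. The grouping of UNAM, BIOF.UFRJ, IF.UFRJ and UFCG as the Latin American sites is an inference, not something the slide states, although it does reproduce the slide's figure of 211 jobs. The 2,000 submitted jobs come from the experiment count given earlier.

```python
# Job counts per resource, copied from the slide's table.
jobs_per_resource = {
    "CERN": 1160,
    "CIEMAT": 279,
    "CETA-CIEMAT": 173,
    "UniCan": 98,
    "LIP": 10,
    "INFN": 38,
    "UNAM": 16,
    "BIOF.UFRJ": 159,
    "IF.UFRJ": 8,
    "UFCG": 28,
}

# Assumption: these four resources are the Latin American (LA) sites.
LA_SITES = {"UNAM", "BIOF.UFRJ", "IF.UFRJ", "UFCG"}
SUBMITTED = 2000  # 5 x 4 x 100 experiments

completed = sum(jobs_per_resource.values())
la_jobs = sum(n for site, n in jobs_per_resource.items() if site in LA_SITES)
missing = SUBMITTED - completed

print(completed)                     # 1969
print(la_jobs)                       # 211
print(f"{la_jobs / completed:.1%}")  # 10.7%
print(missing)                       # 31
```

So roughly one job in ten ran on LA resources, and 31 of the 2,000 submitted jobs do not appear in the table.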

Experiments

• Why 1,969 out of 2,000???
• 2 reasons:
  – Proxy expiration:
    • On submission (takes loooooong!!!)
    • On execution
  – Use of dynamic libraries

• Submitting jobs from EU...
• from a non-EELA site, BUT
• Using the EELA VO:
  – Jobs run only on EU resources...
• Reasons:
  – Misconfiguration?
  – Closer brokers with more machines?

Conclusions

• Happiness: EELA is working!!!
• We can run thousands of experiments!
• Frida is happy!!! (see Condor introductory tutorials, if you feel curious about Frida)
• Experiment showed good utilization of EELA resources in LA and EU
• Low failure rate (31 of 2,000 jobs, about 1.5%)
• Failures caused by:
  – Dynamic libs not available on the remote machine
  – Proxy expiration

Future work

• More detailed analysis of jobs and logs
• Full ILP experiment
• More domains
• Other kinds of experiments based on Statistical Relational Learning
• And, do not forget: ILP can help to model and diagnose errors in the grid environment!

Collaborators

• Fernando Silva (DCC-UPorto)
• Vítor Santos Costa (DCC-UPorto)
• Rui Camacho (FE-UPorto)
• Nuno Fonseca (IBMC/IBMEC, Porto)
• Beth Burnside (UW-Madison hospital)
• David Page (UW-Madison)
• Jesse Davis (UWashington)

Thanks!!!

Questions??