
Page 1: Transfer of Predictive Models for Classification of

Transfer of Predictive Models for Classification of Statutory Texts in Multi-jurisdictional Settings

Jaromir Savelka, Kevin D. Ashley

Intelligent Systems Program, University of Pittsburgh

[email protected]

ISP AI Forum, January 23, 2015

Page 2: Transfer of Predictive Models for Classification of

Presentation Overview

Motivation

Task Description

Related and Prior Work

Data from Multiple Jurisdictions

Data Processing

Framework

Experimental Setup

Evaluation and Results

Future Work

Conclusions

2

Page 3: Transfer of Predictive Models for Classification of

Presentation Overview

Motivation

Task Description

Related and Prior Work

Data from Multiple Jurisdictions

Data Processing

Framework

Experimental Setup

Evaluation and Results

Future Work

Conclusions

3

Page 4: Transfer of Predictive Models for Classification of

Ebola Patient at Texas Presbyterian Hospital

4

Page 5: Transfer of Predictive Models for Classification of

Example Network

5

Page 6: Transfer of Predictive Models for Classification of

Presentation Overview

Motivation

Task Description

Related and Prior Work

Data from Multiple Jurisdictions

Data Processing

Framework

Experimental Setup

Evaluation and Results

Future Work

Conclusions

6

Page 7: Transfer of Predictive Models for Classification of

Task Description (Manual)

1. A set of candidate statutory texts is retrieved from a legal IR system on the basis of a predefined set of search queries.

2. Expert human annotators go through the texts and identify relevant spans, i.e., parts containing relevant legal norms.

3. Each relevant span is represented as a numeric code following the guidelines provided in the codebook (a citation and 9 descriptors). [28]

NOTE: The 95% confidence interval for average inter-annotator agreement across all tasks was reported as (63.1%, 74.9%).

7

[28] PHASYS Codebook [online]

Page 8: Transfer of Predictive Models for Classification of

Example Code Assignment

Example statutory provision

The number of patients admitted to any area of the hospital shall not exceed the number for which the area is designed, equipped, and staffed except in cases of emergency, and then only in accordance with the emergency or disaster plan of the hospital. (28 Pa. Code § 101.172)

Corresponding code

28 Pa. Code § 101.172; Hospital (14); Must Do (2); Suspend (29); Rule/Regulations/Restrictions (4); For Emergency Response (2); Non-specified Disaster/Emergency (5); Public/Individuals (27); Silent (0); Silent (0)

8

Page 9: Transfer of Predictive Models for Classification of

Coding Scheme Elements

- Citation
- Relevance
- Acting PHS agent (Who is acting?)
- Prescription
- Action (Which action is being taken?)
- Goal
- Purpose (For what purpose is action being taken?)
- Type of Emergency Disaster
- Receiving PHS agent
- Timeframe (In what timeframe can/must action be taken?)
- Condition

9

[15] Grabmair et al. 2011, [22] Sweeney et al. 2014
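The scheme above, combined with the example code assignment on the previous slide, can be pictured as a simple record type. Below is a minimal sketch, assuming Python dataclasses; the field names merely mirror the listed scheme elements (they are not the codebook's own identifiers), and the mapping of the example's nine descriptors onto the elements is our reading of the slides.

```python
from dataclasses import dataclass

@dataclass
class CodeAssignment:
    # Relevance is determined in a separate step and omitted here.
    citation: str
    acting_agent: int      # Who is acting?
    prescription: int      # must do / may do / must not do ...
    action: int            # Which action is being taken?
    goal: int
    purpose: int           # For what purpose is the action being taken?
    emergency_type: int
    receiving_agent: int
    timeframe: int         # 0 = silent
    condition: int         # 0 = silent

# Example code assignment from the previous slide
example = CodeAssignment(
    citation="28 Pa. Code § 101.172",
    acting_agent=14,       # Hospital
    prescription=2,        # Must Do
    action=29,             # Suspend
    goal=4,                # Rule/Regulations/Restrictions
    purpose=2,             # For Emergency Response
    emergency_type=5,      # Non-specified Disaster/Emergency
    receiving_agent=27,    # Public/Individuals
    timeframe=0,           # Silent
    condition=0,           # Silent
)
```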

Page 10: Transfer of Predictive Models for Classification of

Problem

10


Page 12: Transfer of Predictive Models for Classification of

Task Description (Automated)

- In our work we perform the described tasks automatically, i.e.:
  1. We transform textual data into feature vectors.
  2. We classify vectors in terms of relevance for PHS analysis.
  3. We classify vectors in terms of each of nine code categories.
  4. We evaluate the performance of our system with respect to labels created by expert annotators (treated as a gold standard).

- In prior work, data sparsity was recognized as a key element limiting performance.

- We decided to focus on the use of data from other jurisdictions as one possible way to mitigate the problem of data sparsity.

- Currently, we have developed a framework for transfer of text classification models among different jurisdictions.

11

Page 13: Transfer of Predictive Models for Classification of

Presentation Overview

Motivation

Task Description

Related and Prior Work

Data from Multiple Jurisdictions

Data Processing

Framework

Experimental Setup

Evaluation and Results

Future Work

Conclusions

12

Page 14: Transfer of Predictive Models for Classification of

Related Work (AI & Law)

1. Classification of legal norms in terms of type. [3], [8], [10], [11], [13]

   We classify texts as containing, e.g., obligation ('must'), permission ('may') or prohibition ('must not').

2. Classification of legal literature and legislative texts with hierarchically organized topics. [12], [18]

   Closely related to classification of the texts in terms of relevance.

3. Rule-based techniques for extraction of specific elements. [3], [10], [11], [13], [24], [25]

   We mine texts for presence of similar elements.

4. Classification of EU documents with terms from EuroVoc. [4], [7], [20], [21]

   Close to mining texts for specific topical and functional information.

13

[3] Biagioli et al. 2005, [4] Boella 2012, [7] Daudaravicius 2012, [8] de Maat & Winkels 2007, [10] Francesconi et al. 2010, [11] Francesconi 2009, [12] Francesconi & Peruginelli 2008, [13] Francesconi & Passerini 2007, [18] Opsomer et al. 2009, [20] Pouliquen 2003, [21] Steinberger 2012, [24] Winkels & Hoekstra 2012, [25] Wyner & Peters 2011

Page 15: Transfer of Predictive Models for Classification of

Related Work (Transfer Learning)

- Transfer learning, in contrast to the traditional ML framework, allows the domains, tasks, and distributions used in training and testing to be different.

- Transfer learning aims to extract the knowledge from one or more source tasks and applies the knowledge to a target task.

14

[19] Pan & Yang 2010; [9] Evgeniou & Pontil 2004

Page 16: Transfer of Predictive Models for Classification of

Prior Work (Results)

15

task          PA→PA    FL→PA    FL+PA→PA   FL→FL    PA→FL    FL+PA→FL
Relevance     F: 0.72  F: 0.54  F: 0.73    F: 0.52  F: 0.35  F: 0.54
              P: 0.75  P: 0.62  P: 0.77    P: 0.62  P: 0.27  P: 0.55
              R: 0.70  R: 0.47  R: 0.69    R: 0.45  R: 0.50  R: 0.52
Act. agent    A: 0.49  A: 0.30  A: 0.52    A: 0.36  A: 0.25  A: 0.44
Prescription  A: 0.76  A: 0.72  A: 0.77    A: 0.77  A: 0.75  A: 0.75
Action        A: 0.29  A: 0.23  A: 0.30    A: 0.23  A: 0.18  A: 0.24
Goal          A: 0.32  A: 0.17  A: 0.32    A: 0.20  A: 0.16  A: 0.25
Purpose       A: 0.59  A: 0.53  A: 0.61    A: 0.58  A: 0.61  A: 0.62
Emg. Type     A: 0.78  A: 0.69  A: 0.80    A: 0.76  A: 0.72  A: 0.77
Rec. agent    A: 0.36  A: 0.25  A: 0.35    A: 0.25  A: 0.25  A: 0.28
Time frame    A: 0.84  A: 0.81  A: 0.85    A: 0.80  A: 0.78  A: 0.80
Condition     A: 0.77  A: 0.68  A: 0.75    A: 0.65  A: 0.65  A: 0.67

[Bar charts of the prior-work results per category (Relevance, Acting agent, Prescription, Action, Goal, Purpose, Emergency type, Receiving agent, Time frame, Condition) omitted; values as in the table above.]

[15] Grabmair et al. 2011 & [23] Savelka et al. 2014

Page 17: Transfer of Predictive Models for Classification of

Prior Work (Similar Traits in Both Jurisdictions)

Intra-jurisdictional classifiers trained for Florida (yellow) and Pennsylvania (blue) show that they both share similar traits.

16

[Bar chart comparing Florida and Pennsylvania classifier performance across the ten categories omitted.]

Page 18: Transfer of Predictive Models for Classification of

Presentation Overview

Motivation

Task Description

Related and Prior Work

Data from Multiple Jurisdictions

Data Processing

Framework

Experimental Setup

Evaluation and Results

Future Work

Conclusions

17

Page 19: Transfer of Predictive Models for Classification of

Comparison of Similar MD and FL Provisions

18

Maryland provision: COMAR 01.01.2003.18(D)(2)
CODE OF MARYLAND REGULATIONS
TITLE 01. EXECUTIVE DEPARTMENT
SUBTITLE 01. EXECUTIVE ORDERS
Establishment of the Governor's Office Of Homeland Security

The Director shall be responsible for the following activities: Advise the Governor on policies, strategies, and measures to enhance and improve the ability to detect, prevent, prepare for, protect against, respond to, and recover from, man-made emergencies or disasters, including terrorist attacks;

Code: Administrative agency [Active agent: 26] of the State [Active agent subset: 2] (homeland security) [Active agent footnote: 502] must [Prescription: 2] advise [Action: 21] the elected officials [Receiving agent: 20] on a plan [Goal: 1] for emergency preparedness, response, and recovery [Purpose: 1, 2 and 4] for an event of terrorist/bioterrorist/biohazardous emergency [Emergency type: 5, 19].

Florida provision: Fla. Stat. § 943.0312(3)
Florida Annotated Statutes
TITLE 47. CRIMINAL PROCEDURE AND CORRECTIONS
CHAPTER 943. DEPARTMENT OF LAW ENFORCEMENT
Regional domestic security task forces

The Chief of Domestic Security, in conjunction with the Division of Emergency Management, the regional domestic security task forces, and the various state entities responsible for establishing training standards applicable to state law enforcement officers and fire, emergency, and first-responder personnel shall identify appropriate equipment and training needs, curricula, and materials related to the effective response to suspected or actual acts of terrorism or incidents involving real or hoax weapons of mass destruction [...]

Code: Law enforcement agency [Active agent: 16] of the State [Active agent subset: 2] must [Prescription: 2] advise [Action: 21] the elected officials [Receiving agent: 20] on a training program, equipment and personnel [Goal: 5, 7, 16] for emergency preparedness, response, and recovery [Purpose: 1, 2 and 4] for an event of terrorist/bioterrorist/biohazardous emergency [Emergency type: 5, 19].

Page 20: Transfer of Predictive Models for Classification of

Presentation Overview

Motivation

Task Description

Related and Prior Work

Data from Multiple Jurisdictions

Data Processing

Framework

Experimental Setup

Evaluation and Results

Future Work

Conclusions

19

Page 21: Transfer of Predictive Models for Classification of

Source Data

20

Page 22: Transfer of Predictive Models for Classification of

Partitioning into Subtrees

- Statutory documents are (in comparison to other types of documents) well structured.

- A document can be viewed as a tree graph with the given spans of text as nodes and sub-part relations as edges.

- We need to divide each statutory text into smaller parts that can be referred to via citations (see the sketch below).

21
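To make the tree view concrete, here is a minimal sketch of how a statute could be partitioned into citable text units, assuming a simple node type; the class, field, and function names are illustrative and not taken from the original pipeline.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    citation: str                                    # e.g. "Fla. Stat. § 101.62(1)(a)"
    text: str                                        # text belonging to this span only
    children: list = field(default_factory=list)     # sub-parts (subtrees)

def collect_text(node):
    """Full text of a subtree: the node's own text plus that of all descendants."""
    return " ".join([node.text] + [collect_text(c) for c in node.children])

def text_units(node):
    """Yield (citation, text) for every subtree, i.e. every part of the
    statute that can be referred to via a citation."""
    yield node.citation, collect_text(node)
    for child in node.children:
        yield from text_units(child)

# Example: a trimmed version of the absentee-ballot section shown on the next slide
section = Node("Fla. Stat. § 101.62", "§ 101.62. Request for absentee ballots", [
    Node("Fla. Stat. § 101.62(1)(a)", "The supervisor shall accept a request ..."),
    Node("Fla. Stat. § 101.62(1)(b)", "The supervisor may accept a written or telephonic request ..."),
])
for citation, text in text_units(section):
    print(citation, "->", text[:60])
```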

Page 23: Transfer of Predictive Models for Classification of

Partitioning into Subtrees

Fla. Stat. § 101.62
Florida Annotated Statutes
TITLE 9. ELECTORS AND ELECTIONS (Chs. 97-107)
CHAPTER 101. VOTING METHODS AND PROCEDURE
Fla. Stat. § 101.62 (2010)
§ 101.62. Request for absentee ballots
(1)(a) The supervisor shall accept a request for an absentee ballot from an elector in person or in writing. One request shall be deemed sufficient to receive an absentee ballot for all elections through the next regularly scheduled general election, unless the elector or the elector's designee indicates at the time the request is made the elections for which the elector desires to receive an absentee ballot. Such request may be considered canceled when any first-class mail sent by the supervisor to the elector is returned as undeliverable.
(b) The supervisor may accept a written or telephonic request for an absentee ballot from the elector, or, if directly instructed by the elector, a member of the elector's immediate family, or the elector's legal guardian. For purposes of this section, the term "immediate family" has the same meaning as specified in paragraph (4)(b). The person making the request must disclose:
1. The name of the elector for whom the ballot is requested.
2. The elector's address.
3. The elector's date of birth.
4. The requester's name.
5. The requester's address.
6. The requester's driver's license number, if available.
7. The requester's relationship to the elector.

22


Page 28: Transfer of Predictive Models for Classification of

Selected Properties of Data Sets

state    # statutes    # text units    # relevant    # codes
AK              135           1965            331         386
CA             1174          19857           2296        2712
FL              464          16618           1033        1476
KS              304           5003            713        1190
MD              248           7593            687         760
ND              208           3114            458         656
PA              808          10882           1665        1873
TX              811          30474           1462        1712

- The individual text units are stored in XML files (one for each state).

- These files are the starting point for all of our experiments.

- There are 18,998 unique terms/lemmas (i.e., features) after stop-word removal.

23

Page 29: Transfer of Predictive Models for Classification of

Labels

[Histograms of label distributions omitted; panels shown for Acting agent, Emergency type, and Prescription.]

24

Page 30: Transfer of Predictive Models for Classification of

Presentation Overview

Motivation

Task Description

Related and Prior Work

Data from Multiple Jurisdictions

Data Processing

Framework

Experimental Setup

Evaluation and Results

Future Work

Conclusions

25

Page 31: Transfer of Predictive Models for Classification of

Framework: Data Sets

At minimum, the framework assumes the existence of a labeled dataset $D_{train} = \langle X_{train}, Y_{train} \rangle \in \mathcal{D}_{target}$.

In addition, there may be an arbitrary number of labeled datasets $D_{aux} = \langle X_{aux}, Y_{aux} \rangle \in \mathcal{D}_{aux} \sim \mathcal{D}_{target}$.

The goal is to train $f(\cdot)$ which performs well on unseen $x^{(i)}_{test} \in \mathcal{D}_{target}$.

The framework uses $D_{aux}$ to train $f(\cdot)$ which performs better than a predictive function trained on $D_{train}$ only.

The underlying idea is to train a number of different $f_i(\cdot)$ on different $D_i$ and decide about their usefulness in particular contexts.

26

Page 32: Transfer of Predictive Models for Classification of

Framework: Predictive Models

The framework does not rely on a specific model of $f(\cdot)$.

For different datasets, different models or combinations of models may be used.

Instead of an actual prediction for $x^{(i)}_{test}$, a probability distribution over the label space is used.

Therefore, $f(\cdot)$ should be capable of providing a probability distribution (or at least some score for each possible $y_j$):

$$f(x^{(i)}_{test}) \to \langle p(y_1), p(y_2), \ldots, p(y_m) \rangle$$

27

Page 33: Transfer of Predictive Models for Classification of

Framework: Training

We train a predictive function $f_{train}(\cdot)$ on $D_{train}$.

In addition, we train $f^{(i)}_{aux}(\cdot)$ for each available $D^{(i)}_{aux}$.

Next we generate an accuracy matrix:

$$A = \begin{pmatrix} a_{1,1} & a_{1,2} & \cdots & a_{1,n} \\ a_{2,1} & a_{2,2} & \cdots & a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} & \cdots & a_{m,n} \end{pmatrix}$$

where

$$a_{i,j} = \frac{1}{n} \sum_{k=1}^{n} \left[ f^{(i)}(x^{(k)}) = j \right]$$

28

Page 34: Transfer of Predictive Models for Classification of

Framework: Prediction

First, we generate a prediction matrix:

$$P(x^{(k)}) = \begin{pmatrix} p_{1,1} & p_{1,2} & \cdots & p_{1,n} \\ p_{2,1} & p_{2,2} & \cdots & p_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{m,1} & p_{m,2} & \cdots & p_{m,n} \end{pmatrix}$$

We can perform element-wise multiplication of $A$ and $P(x^{(k)})$ to obtain a confidence matrix for $x^{(k)}$:

$$C(x^{(k)}) = A \odot P(x^{(k)}) = \begin{pmatrix} a_{1,1} \times p_{1,1} & \cdots & a_{1,n} \times p_{1,n} \\ \vdots & \ddots & \vdots \\ a_{m,1} \times p_{m,1} & \cdots & a_{m,n} \times p_{m,n} \end{pmatrix}$$

Each $a_{i,j} \times p_{i,j}$ can be understood as our confidence that $x^{(k)}$ should be labeled with class $j$, as estimated by $f_i(\cdot)$.

29
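A minimal sketch of the training and prediction steps above, assuming scikit-learn style classifiers with predict_proba and reading the accuracy matrix as per-class accuracy of each model on held-out target-jurisdiction data (one plausible interpretation of the formula on the previous slides); all names are illustrative, not the original implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_models(datasets):
    """One probabilistic classifier per dataset (target training set + each D_aux)."""
    models = []
    for X, y in datasets:
        clf = LogisticRegression(max_iter=1000)
        clf.fit(X, y)
        models.append(clf)
    return models

def accuracy_matrix(models, X_val, y_val, classes):
    """A[i, j]: how accurate model i is on class j, estimated on held-out
    data from the target jurisdiction."""
    A = np.zeros((len(models), len(classes)))
    for i, clf in enumerate(models):
        pred = clf.predict(X_val)
        for j, c in enumerate(classes):
            mask = y_val == c
            A[i, j] = (pred[mask] == c).mean() if mask.any() else 0.0
    return A

def predict_with_transfer(models, A, x, classes):
    """C = A * P(x) element-wise; return the class behind the largest confidence.
    Assumes every model was trained on the same (sorted) label set so that the
    columns of predict_proba line up with `classes`; x is a 1-D feature vector."""
    P = np.vstack([clf.predict_proba(x.reshape(1, -1))[0] for clf in models])
    C = A * P                                   # confidence matrix
    _, j = np.unravel_index(C.argmax(), C.shape)
    return classes[j]
```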

Page 35: Transfer of Predictive Models for Classification of

Presentation Overview

Motivation

Task Description

Related and Prior Work

Data from Multiple Jurisdictions

Data Processing

Framework

Experimental Setup

Evaluation and Results

Future Work

Conclusions

30

Page 36: Transfer of Predictive Models for Classification of

Experiments

We generate the following data sets:

$D^{(i)}_{train} = \langle X^{(i)}_{train}, Y^{(i)}_{train} \rangle$ (100 times)

$D^{(i)}_{test} = \langle X^{(i)}_{test}, Y^{(i)}_{test} \rangle$ (100 times)

$D^{(i)}_{aux} = \langle X^{(i)}_{aux}, Y^{(i)}_{aux} \rangle$ (# of auxiliary states)

For each task we conduct 8 related experiments:
(AK, MD, TX, KS, CA, ND, PA)
(KS, PA, AK, ND, CA, TX, MD)
(PA, CA, ND, MD, AK, TX, KS)

In the related experiments there are 100 runs for the first and eighth experiments and 300 runs for the other experiments.

The experiments show how performance changes as we use more $D^{(i)}_{aux}$ (a sketch of this protocol follows below).

31
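The loop below sketches this incremental protocol under the stated setup; `evaluate` is a hypothetical stand-in for training on the target state's data plus the selected auxiliary states and returning a score, so the dummy lambda in the usage line is for illustration only.

```python
import random

STATES = ["AK", "CA", "FL", "KS", "MD", "ND", "PA", "TX"]

def incremental_aux_runs(target, evaluate, n_orderings=3, seed=0):
    """For a target state, add auxiliary states one at a time (0aux .. 7aux)
    under several random orderings and collect the resulting scores."""
    rng = random.Random(seed)
    aux_pool = [s for s in STATES if s != target]
    results = []                      # (ordering, [score_0aux, ..., score_7aux])
    for _ in range(n_orderings):
        order = aux_pool[:]
        rng.shuffle(order)
        scores = [evaluate(target, order[:k]) for k in range(len(order) + 1)]
        results.append((tuple(order), scores))
    return results

# Usage with a dummy evaluator; replace it with the real train-and-score step.
print(incremental_aux_runs("FL", lambda tgt, aux: round(0.40 + 0.01 * len(aux), 2)))
```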

Page 37: Transfer of Predictive Models for Classification of

Training and Test Set Vectorization

We create vectorized data sets $X^{n \times m}$ with rows as documents and columns as terms by setting each entry of the matrix to:

$$weight(t, d, D) = tf(t, d) \cdot \log(idf(t, D))$$

t: term
d: document
D: document collection
tf(t, d): number of occurrences of t in d
idf(t, D): number of d ∈ D over number of d ∈ D containing t

Each $x^{(i)} \in X^{n \times m}$ is a vector with m dimensions, where m is the number of unique terms that occur in the document collection.

Each $x^{(i)} \in X^{n \times m}$ is referenced with a unique citation connecting the vector to the text unit from which it originates.

32
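A minimal sketch implementing the weighting above directly from the formula (assuming tokenized, stop-word-filtered text units; function and variable names are illustrative):

```python
import math
from collections import Counter

def tfidf_vectorize(tokenized_docs):
    """tokenized_docs: list of token lists (stop words already removed).
    Returns one weight vector per document plus the shared vocabulary."""
    vocab = sorted({t for doc in tokenized_docs for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    n_docs = len(tokenized_docs)
    df = Counter(t for doc in tokenized_docs for t in set(doc))  # document frequency
    vectors = []
    for doc in tokenized_docs:
        tf = Counter(doc)
        vec = [0.0] * len(vocab)
        for t, count in tf.items():
            idf = n_docs / df[t]                 # |D| / |{d in D : t in d}|
            vec[index[t]] = count * math.log(idf)
        vectors.append(vec)
    return vectors, vocab

# Example with two tiny "text units"
vectors, vocab = tfidf_vectorize([["absentee", "ballot", "ballot"], ["emergency", "plan"]])
```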

Page 38: Transfer of Predictive Models for Classification of

Presentation Overview

Motivation

Task Description

Related and Prior Work

Data from Multiple Jurisdictions

Data Processing

Framework

Experimental Setup

Evaluation and Results

Future Work

Conclusions

33

Page 39: Transfer of Predictive Models for Classification of

Evaluation Metrics

Precision
Ratio of correctly retrieved instances over all instances that were retrieved.

Recall
Ratio of correctly retrieved instances over all instances that should have been retrieved.

F1 Measure
Harmonic mean of precision and recall where both measures are treated as equally important.

34

$$P(f(\cdot), D) = \sum_{i=1}^{n} \frac{\left| f(x^{(i)}) \cap y^{(i)} \right|}{\left| f(x^{(i)}) \right|}$$

$$R(f(\cdot), D) = \sum_{i=1}^{n} \frac{\left| f(x^{(i)}) \cap y^{(i)} \right|}{\left| y^{(i)} \right|}$$

$$F_1(P(f(\cdot), D), R(f(\cdot), D)) = \frac{2 \cdot P(\cdot) \cdot R(\cdot)}{P(\cdot) + R(\cdot)}$$
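A short sketch of these instance-based metrics (assumption: each prediction and each gold label is treated as a set of codes, and the sums are normalized by the number of instances; names are illustrative):

```python
def precision_recall_f1(predicted, gold):
    """predicted, gold: lists of label sets, one pair per text unit."""
    n = len(gold)
    # empty sets are skipped to avoid division by zero
    p = sum(len(f & y) / len(f) for f, y in zip(predicted, gold) if f) / n
    r = sum(len(f & y) / len(y) for f, y in zip(predicted, gold) if y) / n
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1

# Example: two text units with multi-label codes
print(precision_recall_f1([{2, 5}, {1}], [{2}, {1, 4}]))   # (0.75, 0.75, 0.75)
```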

Page 40: Transfer of Predictive Models for Classification of

Results (F1-measure)

35

        Florida                                           Maryland
task  0aux 1aux 2aux 3aux 4aux 5aux 6aux 7aux    0aux 1aux 2aux 3aux 4aux 5aux 6aux 7aux
AA    .43  .45  .45  .46  .47  .47  .48  .48     .42  .44  .45  .47  .48  .50  .50  .51
PR    .78  .80  .81  .82  .82  .82  .82  .82     .86  .89  .89  .89  .89  .90  .90  .90
AC    .21  .22  .23  .24  .24  .25  .26  .26     .24  .25  .26  .26  .27  .27  .28  .28
GL    .25  .27  .28  .28  .29  .29  .30  .30     .27  .29  .30  .31  .32  .32  .33  .33
PP    .67  .70  .71  .71  .72  .72  .72  .72     .74  .77  .78  .78  .78  .78  .79  .79
ET    .78  .79  .79  .79  .80  .80  .79  .80     .73  .76  .76  .77  .77  .77  .78  .78
RA    .30  .30  .31  .31  .32  .32  .33  .33     .30  .30  .31  .31  .31  .32  .32  .32
CN    .62  .66  .67  .67  .67  .67  .67  .67     .58  .63  .63  .63  .64  .63  .64  .64
TF    .80  .81  .82  .83  .83  .83  .83  .83     .81  .83  .84  .84  .84  .84  .85  .85

[Bar charts of the F1 scores per task (AA, PR, AC, GL, PP, ET, RA, CN, TF) for Florida and Maryland omitted; values as in the table above.]

Page 41: Transfer of Predictive Models for Classification of

Comparison to Prior Work

36

task      +0   +1   +2   +3   +4   +5   +6   +7
AA   P   .42  .42  .42  .42  .43  .43  .43  .43
     R   .45  .44  .44  .45  .45  .45  .45  .45
     F   .43  .43  .44  .44  .44  .44  .44  .44
PP   P   .66  .66  .66  .66  .66  .66  .66  .66
     R   .70  .70  .70  .70  .70  .70  .70  .70
     F   .67  .68  .68  .68  .68  .68  .68  .68
ET   P   .78  .79  .79  .80  .80  .80  .80  .80
     R   .79  .80  .80  .80  .80  .80  .81  .81
     F   .78  .79  .79  .80  .80  .80  .80  .80

task      +0   +1   +2   +3   +4   +5   +6   +7
AA   P   .42  .42  .42  .42  .42  .41  .42  .42
     R   .45  .48  .50  .52  .54  .55  .56  .57
     F   .43  .45  .45  .46  .47  .47  .48  .48
PP   P   .66  .65  .64  .65  .65  .64  .64  .64
     R   .70  .75  .78  .80  .81  .82  .83  .84
     F   .67  .70  .71  .71  .72  .72  .72  .72
ET   P   .78  .76  .75  .75  .75  .75  .75  .75
     R   .79  .82  .84  .84  .84  .85  .85  .86
     F   .78  .79  .79  .79  .80  .80  .79  .80

[Bar charts comparing the AA, PP, and ET scores omitted; values as in the tables above.]

Page 42: Transfer of Predictive Models for Classification of

Improvement Example

task   Manual   0aux   1aux   2-5aux   6aux   7aux
AA     26       20     26     26       26     26
       26
PR     2        1      1      1        2      2
       2 2
AC     21       43     21     21       21     21
       48 43 48 48 48
       48
GL     1        50     50     50       50     50
PP     1        1      1      1        1      1
       2        2      2      2        2      2
       4        4      4      4        4      4
ET     5        5      5      5        5      5
       19 19 19 19 19
RA     20       20     20     20       20     20
CN     0        7      7      0        0      0
       29 29
       30 30
       40 40
       41 41
TF     0        0      0      0        0      0
P      1        .56    .59    .78      .83    .83
R      1        .5     .78    .89      .89    .89
F1     1        .53    .67    .83      .86    .86

37

Page 43: Transfer of Predictive Models for Classification of

Presentation Overview

Motivation

Task Description

Related and Prior Work

Data from Multiple Jurisdictions

Data Processing

Framework

Experimental Setup

Evaluation and Results

Future Work

Conclusions

38

Page 44: Transfer of Predictive Models for Classification of

Future Work

- Implement a similar framework for the relevance task.

- Experiment with techniques to handle imbalanced and sparse data sets, e.g. SMOTE. [6] Chawla et al. 2002

- Experiment with an overlay framework for multi-dimensional classification. [2] Batal et al. 2013

- Generate richer text representation (automatic annotation).

- Experiment with learning tasks simultaneously (multi-task learning). [9] Evgeniou & Pontil 2004

- Experiment with other transfer learning techniques. [19] Pan & Yang 2010

- Utilize existing knowledge:
  - codebook [28] Codebook [online]
  - tables of corresponding agents from different states
  - data generated by network analysis

39

Page 45: Transfer of Predictive Models for Classification of

Presentation Overview

Motivation

Task Description

Related and Prior Work

Data from Multiple Jurisdictions

Data Processing

Framework

Experimental Setup

Evaluation and Results

Future Work

Conclusions

40

Page 46: Transfer of Predictive Models for Classification of

Conclusions

- We have presented a framework for transfer of text categorization models among different US state jurisdictions.

- The performance of most classifiers gradually improves as we use models from an increasing number of states.

- The relatedness of the domains as well as the tasks we deal with was confirmed.

- A possible way to deal with data sparsity was further explored and confirmed as promising.

- The framework's potential benefits are not limited to the context of the United States.

41

Page 47: Transfer of Predictive Models for Classification of

References I

Aggarwal, Charu & Zhai, ChengXiang (eds.). Mining Text Data. Springer, 2012.

Batal, I., Hong, C., and Hauskrecht, M., An Efficient Probabilistic Framework for Multi-Dimensional Classification. ACM Conference on Information and Knowledge Management. San Francisco (2013).

Biagioli, C., Francesconi, E., Passerini, A., Montemagni, S., and Soria, C., Automatic Semantics Extraction in Law Documents, ICAIL 2005 Proceedings, 133–140, ACM Press (2005).

Boella, G., Di Caro, L., Lesmo, L., Rispoli, D., and Robaldo, L., Multi-label Classification of Legislative Text into EuroVoc. JURIX 2012 Proceedings, pp. 21–30, B. Schafer (Ed.), IOS Press (2012).

Breiman, L., J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Boca Raton, FL: CRC Press, 1984.

Chawla, N.V., Bowyer, K.W., Hall, L.O., and Kegelmeyer, W.P., SMOTE: synthetic minority over-sampling technique. J. Artif. Int. Res. 16:321–357 (2002).

Daudaravicius, V., Automatic multilingual annotation of EU legislation with Eurovoc descriptors. EEOP2012 Workshop Proceedings (2012).

42

Page 48: Transfer of Predictive Models for Classification of

References II

de Maat, E., Winkels, R., Categorisation of norms. JURIX 2007, pp. 79–88, IOS Press (2007).

Evgeniou, Theodoros & Pontil, Massimiliano. Regularized Multi-Task Learning. KDD'04. Seattle, WA, USA, 2004.

Francesconi, E., Montemagni, S., Peters, W., and Tiscornia, D., Integrating a Bottom-Up and Top-Down Methodology for Building Semantic Resources for the Multilingual Legal Domain. In Semantic Processing of Legal Texts. LNAI 6036, pp. 95–121. Springer: Berlin (2010).

Francesconi, E., An Approach to Legal Rules Modelling and Automatic Learning. JURIX 2009 Proceedings (G. Governatori, Ed.), 59–68, IOS Press (2009).

Francesconi, E., and Peruginelli, G., Integrated Access to Legal Literature through Automated Semantic Classification. Artificial Intelligence and Law 17:31–49 (2008).

Francesconi, E., and Passerini, A., Automatic Classification of Provisions in Legislative Texts. Artificial Intelligence and Law 15:1–17 (2007).

Jones, Karen Sparck. "A statistical interpretation of term specificity and its application in retrieval." Journal of Documentation 28.1 (1972): 11–21.

43

Page 49: Transfer of Predictive Models for Classification of

References III

Grabmair, M., Ashley, K.D., Hwa, R., and Sweeney, P.M., Toward Extracting Information from Public Health Statutes using Text Classification and Machine Learning. JURIX 2011 Proceedings, pp. 73–82 (Katie M. Atkinson ed.) IOS Press 2011.

Kakwani, N., On a class of poverty measures. Econometrica, pp. 437–446 (1980).

Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S.J., and McClosky, D., The Stanford CoreNLP Toolkit. In 52nd Annual Meeting of the ACL: System Demonstrations, pp. 55–60.

Opsomer, R., De Meyer, G., Cornelis, C., van Eetvelde, G., Exploiting Properties of Legislative Texts to Improve Classification Accuracy. JURIX 2009 (G. Governatori, Ed.), 136–145, IOS Press (2009).

Pan, Sinno Jialin & Yang, Qiang. A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, 2010.

Pouliquen, B., Steinberger, R., and Ignat, C., Automatic annotation of multilingual text collections with a conceptual thesaurus. arXiv preprint cs/0609059 (2006).

Steinberger, R., Ebrahim, M., and Turchi, M., JRC EuroVoc Indexer JEX - A freely available multi-label categorisation tool. arXiv preprint arXiv:1309.5223 (2013).

44

Page 50: Transfer of Predictive Models for Classification of

References IV

Sweeney, P.M., Bjerke, E.F., Potter, M.A., Guclu, H., Keane, C.R., Ashley, K.D., Grabmair, M., Hwa, R., Network Analysis of Manually-Encoded State Laws and Prospects for Automation. In Winkels, R., Lettieri, N., Faro, S. (Eds.) Network Analysis in Law. Diritto Scienza Tecnologia (2014).

Savelka, J., Ashley, K.D., Grabmair, M., Mining Information from Statutory Texts in Multi-jurisdictional Settings. In Hoekstra, R. (Ed.) JURIX 2014. IOS Press (2014).

Winkels, R., and Hoekstra, R., Automatic Extraction of Legal Concepts and Definitions. JURIX 2012, pp. 157–166, IOS Press (2012).

Wyner, A., and Peters, W., On Rule Extraction from Regulations. JURIX 2011, pp. 113–122, IOS Press (2011).

Zhang, M., and Zhou, Z., ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition 40.7:2038–2048 (2007).

Lucene [online]. 2012 [cit. 08/27/2014]. Accessed at: http://lucene.apache.org/core/

PHASYS ARM 2 - LEIP codebook [online]. Revised 11/18/2012 [cit. 08/27/2014]. Accessed at: http://www.phasys.pitt.edu/

45

Page 51: Transfer of Predictive Models for Classification of

Thank you!

Questions, comments and suggestions are welcome now or any time at [email protected].

This work was supported by the University of Pittsburgh's University Research Council Multidisciplinary Small Grant Program. This publication was also supported in part by the Cooperative Agreement 5P01TP000304 from the Centers for Disease Control and Prevention. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the CDC.