A Pattern-Based Approach to Hyponymy Relation Acquisition for the Agricultural Thesaurus

*Makoto Nakamura, Ryusei Kobayashi, Yasuhiro Ogawa, Katsuhiko Toyama (Nagoya University, Japan)

Upload: aims-agricultural-information-management-standards

Post on 30-Jul-2015


First of All...
•  Japan Legal Information Institute, Graduate School of Law, Nagoya University, Japan
   •  Established to provide Japan’s legal information to the world
•  Tasks
   •  Base for issuing English translations of Japanese legal information
   •  Provision of Japanese legal information overseas
   •  Development of software for legislative support
   •  etc.
•  Natural language processing for legal texts

Outline
1.  Introduction
    1.  Example of a hyponymy relation
    2.  AGROVOC
    3.  Purpose
2.  Previous Works on Legal Text Processing
3.  Acquisition of Legal Terms from Legal Texts
4.  Experiments
5.  Conclusion

Introduction
The goal of this study:
•  to construct a legal ontology based on legal terms
Legal terms are:
•  special, idiomatic expressions that often describe legal matters in legal documents
•  defined by law prior to use

Example of a Hyponymy Relation in a Law

Input text:
漁業法 第六条3 「定置漁業」とは、漁具を定置して営む漁業であつて次に掲げるものをいう。 (以下略)
Fishery Act Article 6-3 The “fixed gear fishery” refers to a fishery operated with fixed gear, which falls under any of the following items. *snip*

Hyponymy Relation
Hypernym: 漁業 / fishery
Hyponym: 定置漁業 / fixed gear fishery

AGROVOC (Niu et al., 2012)
•  the world’s most comprehensive multilingual agricultural vocabulary
•  contains more than 40,000 concepts in 21 languages
•  covers topics on food, nutrition, agriculture, fisheries, forestry, environment, and other related domains
•  expressed in the Simple Knowledge Organization System (SKOS) and published as Linked Data
•  All terms and concepts have been added to the thesaurus by domain experts in different languages.
•  This laborious manual work is time-consuming and expensive.

Purpose
•  to acquire hyponymy relations from the legal corpus
•  to extend the vocabulary of AGROVOC

Assumption: Legal terms qualify for AGROVOC as long as they are related to the agricultural domain.

Example of Word Tree in AGROVOC

Labels | Status | Scope | Created | Last modified
fisheries (EN) | Descriptor (20) | n/a | 1981-01-09 | 1981-01-09 00:00:00

[Figure: AGROVOC word tree centered on "fisheries". Node labels as they appear on the slide: capture fisheries, Commercial fisheries, economic activities, fishery economics, Fishery oceanography, fishing methods, activities, Fishing industry, Inland fisheries, Marine fisheries. Edge legend: related term, used for, narrower term, broader term.]

How to Add Legal Terms to AGROVOC

[Figure: the AGROVOC word tree for "fisheries" (EN) from the previous slide, with the legal term "fixed gear fisheries" added and linked to "fisheries".]

Previous Works on Legal Text Processing
•  Legal text processing using surface pattern recognition
   •  Knowledge acquisition from itemized expressions (Kimura et al., 2008)
   •  Detection of legal definitions (Höfler et al., 2012)
   Surface pattern recognition is sufficient for boilerplate (fixed) expressions.
•  Hyponymy relation acquisition
   •  The expressions “y is a (kind of) x” and “such x as y” (Miller et al., 1990; Hearst, 1992)
   •  This approach is applicable to Japanese (Ando et al., 2004)
   Legal ontologies could be constructed automatically from legal texts containing boilerplate expressions.

Outline
1.  Introduction
2.  Previous Works on Legal Text Processing
3.  Acquisition of Legal Terms from Legal Texts
    1.  Extracting terms and their explanations in Japanese legal texts
    2.  Text processing for hyponymy relations
4.  Experiments
5.  Conclusion

Legal Corpus
•  A set of statutory sentences from laws and regulations
•  109,380 Japanese legal sentences from 241 laws and regulations
•  A wide variety of laws and regulations: Bankruptcy Act, Measurement Act, Act on Promotion of Global Warming Countermeasures, etc.

Step-1: Example of Surface Pattern Rules

Input text:
ガス事業法
第二条 この法律において「一般ガス事業」とは、一般の需要に応じ導管によりガスを供給する事業をいう。
第二条10 この法律において「ガス事業」とは、一般ガス事業、簡易ガス事業、ガス導管事業及び大口ガス事業をいう。

Gas Business Act
Article 2-1 The term “General Gas Utility Business” as used in this Act shall mean the business of supplying gas via pipelines to meet general demand.
Article 2-10 The term “Gas Business” as used in this Act shall mean a General Gas Utility Business, Community Gas Utility Business, Gas Pipeline Service Business or Large-Volume Gas Business.

Pattern of Definitions
/「(.+)」とは、(.+)(を、|をいい、|といい、|という。|とする。)/
/“(.+)”( as used in this Act)? (shall mean|means) (.+)\./

Step-1: Acquisition of Definitions and Explanations

We wrote six patterns for extracting definitions from the legal corpus.

Output definitions (Japanese):
1.  Term: 一般ガス事業
    Explanation: 一般の需要に応じ導管によりガスを供給する事業
2.  Term: ガス事業
    Explanation: 一般ガス事業、簡易ガス事業、ガス導管事業及び大口ガス事業

Output definitions (English):
1.  Term: General Gas Utility Business
    Explanation: the business of supplying gas via pipelines to meet general demand
2.  Term: Gas Business
    Explanation: a General Gas Utility Business, Community Gas Utility Business, Gas Pipeline Service Business or Large-Volume Gas Business
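
The Step-1 extraction can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it covers only the pattern shown on the slide, not all six. The ending をいう。 is added as an assumption, because both example sentences end that way and the alternatives listed in the transcribed pattern do not include it. The function name `extract_definition` is hypothetical.

```python
import re

# Japanese definition pattern from the slide. The ending "をいう。" is an
# assumption added for this sketch so that the two example sentences match.
DEFINITION_JA = re.compile(
    r"「(.+?)」とは、(.+)(?:をいう。|を、|をいい、|といい、|という。|とする。)"
)

# English counterpart; the optional phrase carries its own leading space.
DEFINITION_EN = re.compile(
    r"“(.+?)”(?: as used in this Act)? (?:shall mean|means) (.+)\."
)


def extract_definition(sentence):
    """Return (term, explanation) for a definition sentence, else None."""
    for pattern in (DEFINITION_JA, DEFINITION_EN):
        match = pattern.search(sentence)
        if match:
            return match.group(1), match.group(2)
    return None


ja = "この法律において「一般ガス事業」とは、一般の需要に応じ導管によりガスを供給する事業をいう。"
en = ("The term “General Gas Utility Business” as used in this Act shall mean "
      "the business of supplying gas via pipelines to meet general demand.")

print(extract_definition(ja))
print(extract_definition(en))
```

The term is matched lazily so that it stops at the first closing quote, while the explanation is matched greedily so that it runs up to the last definition ending in the sentence.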

Step-2: Extraction of the Hypernym from a Dependency Tree (intensive)

Explanation of ‘General Gas Utility Business’ (the business of supplying gas via pipelines to meet general demand)

[Figure: dependency tree of the explanation. Chunks: 一般の (general), 需要に (demand), 応じ (meet), 導管に (pipelines), より (via), ガスを (gas), 供給する (supply), 事業 (business). The head word is labeled as the hypernym.]

•  Intensive (hypernym), extensive (hyponym), and mixed patterns
   •  The head word(s) of the explanation become the hypernym of the term.
•  CaboCha: a Japanese dependency parser (Kudo et al., 2002)
   •  The parser is complemented with special terms and syntactic rules peculiar to the legal domain (Ogawa et al., 2011)
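
The head-word idea can be illustrated with a toy dependency structure in Python. This sketch does not call CaboCha. The chunk list mirrors the slide, but the dependency edges, the `-1` root marker, and the function names are assumptions made for illustration.

```python
# Toy dependency structure for the explanation of 「一般ガス事業」.
# Each chunk is (surface, gloss, head_index); head_index -1 marks the root.
# The edges are assumed for illustration; the transcript lists only the chunks.
CHUNKS = [
    ("一般の", "general", 1),
    ("需要に", "demand", 2),
    ("応じ", "meet", 6),
    ("導管に", "pipelines", 4),
    ("より", "via", 6),
    ("ガスを", "gas", 6),
    ("供給する", "supply", 7),
    ("事業", "business", -1),
]


def head_chunk(chunks):
    """Return the chunk that depends on no other chunk (the root)."""
    roots = [chunk for chunk in chunks if chunk[2] == -1]
    if len(roots) != 1:
        raise ValueError("expected exactly one root chunk")
    return roots[0]


def intensive_relation(term, chunks):
    """Pair the head word of the explanation (hypernym) with the term (hyponym)."""
    surface, _gloss, _head = head_chunk(chunks)
    return surface, term


print(intensive_relation("一般ガス事業", CHUNKS))
```

For this example the sketch yields 事業 (business) as the hypernym of 一般ガス事業 (General Gas Utility Business).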

Step-2: Extraction of the Hyponyms by a Cue Phrase (extensive)

Explanation of ‘Gas Business’ (General Gas Utility Business, Community Gas Utility Business, Gas Pipeline Service Business or Large-Volume Gas Business)

[Figure: the explanation split at the cue phrases 、 (,) and 及び (or) into four hyponyms: 一般ガス事業 (General Gas Utility Business), 簡易ガス事業 (Community Gas Utility Business), ガス導管事業 (Gas Pipeline Service Business), 大口ガス事業 (Large-Volume Gas Business).]

•  Classification by cue phrases (comma (,) and ‘or’)
•  Hyponymy relation: a tuple of two noun phrases and a conceptual relation

Hyponymy Relation
Hypernym: ガス事業 / Gas Business
Hyponym: 一般ガス事業 / General Gas Utility Business
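
The cue-phrase split can be sketched in Python as below. It is a simplified illustration under stated assumptions: only the cue phrases named on the slide are handled (the Japanese comma and 及び, plus "," and "or" for the English translation), leading English articles are stripped as an extra convenience, and the function names are hypothetical.

```python
import re

# Cue phrases from the slide: "、" and 及び, plus their counterparts
# "," and "or" in the English translation.
CUE_PHRASES = re.compile(r"、|及び|,\s*|\s+or\s+")

# Assumption for this sketch: drop a leading English article from each item.
LEADING_ARTICLE = re.compile(r"^(?:a|an|the)\s+")


def split_hyponyms(explanation):
    """Split an extensive explanation into its enumerated noun phrases."""
    parts = (part.strip() for part in CUE_PHRASES.split(explanation))
    return [LEADING_ARTICLE.sub("", part) for part in parts if part]


def extensive_relations(term, explanation):
    """Return (hypernym, hyponym) tuples; the defined term is the hypernym."""
    return [(term, hyponym) for hyponym in split_hyponyms(explanation)]


for relation in extensive_relations(
    "ガス事業", "一般ガス事業、簡易ガス事業、ガス導管事業及び大口ガス事業"
):
    print(relation)
```

Each enumerated item becomes a hyponym of the defined term, so the Gas Business definition yields four relations.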

Experiments
•  Purpose
   •  Acquisition of hyponymy relations qualified for AGROVOC
•  The legal corpus
   •  109,380 Japanese legal sentences
•  Classification of hyponymy relations
   •  Category (i): Neither the hypernym nor the hyponym is registered in AGROVOC.
   •  Category (ii): Only the hyponym is not registered.
   •  Category (iii): Only the hypernym is not registered.
   •  Category (iv): Both are registered.

Registration status in AGROVOC:
                 | Hypernym new | Hypernym existing
Hyponym new      | (i)          | (ii)
Hyponym existing | (iii)        | (iv)
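
The four categories amount to a lookup of both terms in the thesaurus. A minimal Python sketch follows. The set `TOY_AGROVOC` is a hypothetical stand-in for the real vocabulary, and the exact lower-cased string match is an assumption, since the slides do not say how terms were matched against AGROVOC.

```python
# Hypothetical stand-in for the AGROVOC vocabulary; not its real contents.
TOY_AGROVOC = {"business", "greenhouse gases", "carbon dioxide"}


def classify_relation(hypernym, hyponym, registered_terms):
    """Return the category (i)-(iv) of a hyponymy pair as 'i', 'ii', 'iii' or 'iv'."""
    hypernym_known = hypernym.lower() in registered_terms
    hyponym_known = hyponym.lower() in registered_terms
    if not hypernym_known and not hyponym_known:
        return "i"    # both terms are new
    if hypernym_known and not hyponym_known:
        return "ii"   # only the hyponym is new
    if not hypernym_known and hyponym_known:
        return "iii"  # only the hypernym is new
    return "iv"       # both terms are already registered


pairs = [
    ("district court", "bankruptcy court"),
    ("business", "General Gas Utility Business"),
    ("greenhouse gases", "Carbon dioxide"),
]
for hypernym, hyponym in pairs:
    print(hypernym, "/", hyponym, "->", classify_relation(hypernym, hyponym, TOY_AGROVOC))
```

Category (ii) is the case of most direct interest for extending the thesaurus: the hypernym already exists, so the new hyponym has a place to attach.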

Experimental Result

Experimental result in finding terms related to AGROVOC

Category of a hyponymy pair | # of types | Precision
Category (i)-(iv)           | 1,027      | †64.0%
  Category (ii) & (iii)     | 222        | 67.1%
    Category (ii)           | 137        | 89.1%
    Category (iii)          | 75         | 21.3%
    Unknown                 | 10         | -
  Category (iv)             | 25         | 88.0%
    Existing relations      | 9          | 88.9%
    New relations           | 16         | 87.5%

† is calculated from 100 samples chosen at random.
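
As a consistency check on the category (iv) rows, the pooled precision can be recomputed from its two sub-rows. In this Python sketch the numbers of correct relations (8 and 14) are inferred from the reported percentages and row sizes; they are not stated on the slide.

```python
# (correct, total) per group. The "correct" counts are inferred from the
# reported percentages and are not stated on the slide.
existing_relations = (8, 9)   # reported as 88.9%
new_relations = (14, 16)      # reported as 87.5%


def precision(correct, total):
    """Precision as a percentage rounded to one decimal place."""
    return round(100 * correct / total, 1)


pooled = precision(
    existing_relations[0] + new_relations[0],
    existing_relations[1] + new_relations[1],
)
print(precision(*existing_relations), precision(*new_relations), pooled)
# prints: 88.9 87.5 88.0

# The type counts for category (ii), category (iii) and Unknown add up to
# the 222 types reported for "Category (ii) & (iii)".
assert 137 + 75 + 10 == 222
```

Under these inferred counts, 22 of the 25 category (iv) pairs are correct, which matches the reported 88.0%.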

Example of Hyponymy Relations

Category | Role     | Example 1                    | Example 2
(i)      | Hypernym | district court               | *maximum limit
         | Hyponym  | bankruptcy court             | total allowable effort
(ii)     | Hypernym | business                     | *injurious plant
         | Hyponym  | General Gas Utility Business | fungus
(iii)    | Hypernym | oocyte                       | *measuring instrument
         | Hyponym  | Unfertilized Egg             | equipment
(iv)     | Hypernym | greenhouse gases             | real property
         | Hyponym  | Carbon dioxide               | land

- *common fishery -- fishery

[Slide legend: Hypernym / Hyponym; new / existing]


Conclusion
•  Since legal documents are likely to use fixed expressions, surface pattern rules work well for term extraction.
•  We found 222 candidate terms that appear to qualify for AGROVOC (categories (ii) and (iii)), with a precision of 67.1% overall and 89.1% for category (ii).
•  We detected some error-prone rules and a procedural mistake.
•  We plan to extend our method to multiple languages.
   •  As long as boilerplate expressions are used often, our simple method is applicable to any language.
   •  Another option is to use bilingual lexicons as a dictionary (Jin et al., 2012).

Thank you

Makoto Nakamura ( [email protected] )