
ORIGINAL ARTICLE

Product named entity recognition for Chinese query questions based on a skip-chain CRF model

Zhifeng Hao · Hongfei Wang · Ruichu Cai · Wen Wen

Received: 27 December 2011 / Accepted: 16 March 2012 / Published online: 11 May 2012
© Springer-Verlag London Limited 2012
Neural Comput & Applic (2013) 23:371–379. DOI 10.1007/s00521-012-0922-5

Z. Hao · H. Wang (✉)
Faculty of Applied Mathematics, Guangdong University of Technology, Guangzhou 510006, China
e-mail: [email protected]

Z. Hao · R. Cai · W. Wen
Faculty of Computer Science, Guangdong University of Technology, Guangzhou 510006, China

Abstract  As more and more commercial information can be obtained from the Internet, product named entity recognition plays an important role in market intelligence management. In this paper, a product named entity recognition method based on a skip-chain CRF model is proposed. This method considers not only the dependence between neighboring words but also the fact that product named entities are often connected by a connective; in this situation, the dependence between the words around the connective is more important than the dependence between neighboring words. As shown in the experiments, this information improves the results of product named entity recognition. Experimental results on mobile phone and digital camera corpora demonstrate that the skip-chain CRF model works well and produces better results than the linear-chain CRF model.

Keywords  Chinese query · Product named entity recognition · Skip-chain CRF · Conditional random fields

1 Introduction

With the development of information technology, more and more commercial information can be obtained from the Internet. However, most information on the Internet is unstructured or semi-structured, which is difficult to analyze directly. Thus, extracting key information has become a crucial and valuable problem in many business information extraction (IE) applications [1].

As one of the key problems in this area, product named entity recognition plays an important role in market intelligence management, enterprise content management and enterprise competitive intelligence collection [1]. As Pierre pointed out, effective product named entity recognition technology means that large collections of unstructured text documents can be processed quickly, inexpensively and with a minimum of human intervention, so that human users can concentrate on higher-level problems [2].

Compared with traditional named entity recognition [3], differences in expression forms and structures between product named entities (PRO NEs) and traditional named entities make PRO NER more difficult, and up to now comparatively little work has been done on it. In the existing work, two main approaches have been applied: rule-based approaches and statistical model-based approaches. Rule-based approaches [2, 4] have poor portability, since rules must be redefined or regenerated when they are applied to new domains. Statistical model-based approaches apply statistical models integrated with heuristic information and external knowledge bases. One category of statistical approaches adopts generative models [1, 5], but these approaches require strict independence assumptions and suffer from the label bias problem. The other category applies discriminative models such as the linear-chain CRF model [6, 7]. These approaches can represent rich overlapping features and do not require strict independence assumptions, but some important dependence information is still missed by this model.

Thus, CRF models can handle problems like the varied expressive and structural forms in PRO NER and are the most suitable among existing approaches. However, a linear-chain CRF for PRO NER assumes that the dependencies between neighboring labels are the strongest [8], which misses other, more important dependence information. Therefore, a skip-chain CRF, a model with rich dependence expression ability, is introduced for the first time in this paper to solve PRO NER.

The main contributions of this paper are as follows: (1) A skip-chain CRF model is introduced for the first time to solve the problem caused by connectives. (2) The proposed model represents not only the dependencies between neighboring labels but also the dependencies between labels in a coordinative relation, which carry much more important information. (3) A set of features is defined that expresses the rich context information around words well.

The remainder of this paper is organized as follows. Section 2 introduces and analyzes related work on PRO NER. In Sect. 3, we present the PRO NER problem and its difficulties. Section 4 introduces our solution for PRO NER: a skip-chain CRF model. The experimental results and analysis are presented in Sect. 5. Conclusions and future work are given in Sect. 6.

2 Related work

The main existing PRO NER approaches can be divided into two classes, which are described in detail in this section along with their advantages and disadvantages.

2.1 Rule-based approaches

Rule-based approaches mainly perform PRO NER with rules defined manually or generated automatically. Pierre [2] developed an English NER system capable of identifying product names in product reviews. It employed a simple Boolean classifier for identifying product names, similar to string pattern matching, but this method produces bad results when many new product names are not contained in the training corpus. Bick and his colleagues solved the named entity recognition problem using a constraint-grammar-based parser for Danish [4]. This rule-based approach is highly dependent on the performance of the Danish parser and has poor portability.

2.2 Statistical model-based approaches

Statistical model-based approaches usually apply statistical models integrated with heuristics and external knowledge bases. For example, a bootstrapping approach was presented for English NER [9] using two successive learners (a parsing-based decision list and a hidden Markov model). The major advantage of this method is that manual annotation of a sizable training corpus can be avoided, but it suffers from two problems: it is difficult to find sufficient concept-based seeds for bootstrapping, and it is highly dependent on the parser's performance. Jun Zhao and his colleagues used a hierarchical hidden Markov model-based statistical model to recognize named entities in Chinese documents about digital products and mobile phones [1]. However, the hierarchical hidden Markov model is a generative model, and generative models have an inherent disadvantage: they need strict independence assumptions, but most data sequences cannot be represented as a series of mutually independent factors, so such models cannot represent the data sequence very well. Huang et al. [6] first introduced the CRF model into product named entity recognition, applying a modified Viterbi algorithm combined with constraint rules to obtain the N-best results; further rules are then applied to filter the N-best results, which may still include bad ones, and they showed that this yields a good final result. In the work of Luo et al. [7], domain ontology features are introduced into a linear-chain CRF, and their results show that these features improve the performance. But the linear-chain CRF model in this approach only describes dependencies between neighboring labels, on the assumption that those dependencies are the strongest [8]. This assumption ignores some more important dependence information, which can readily improve the performance of CRF models.

For the advantages mentioned in Sect. 1, CRF models are more appropriate for PRO NER than other models. However, the linear-chain CRF model misses some more important dependence information in certain situations. For example, in the data set, sequences like "诺基亚 5230 和诺基亚 5233 哪个好 (which is better, Nokia 5230 or Nokia 5233)" often occur, and the words around the word "和 (and)" often belong to the same type. Obviously, in this example, the dependence between "5230" and "诺基亚 (Nokia)," which are connected by "和 (and)," is more important than the dependence between "诺基亚 (Nokia)" and "和 (and)." This kind of information helps recognition, but a linear-chain CRF cannot represent it. So, in this paper, a skip-chain CRF is proposed to include this kind of information.

3 Problem statements

3.1 A formal description of the PRO NER problem

As a subfield of named entity recognition, PRO NER can be formalized as a NER problem that consists of two steps:

(1) Modeling step. Given a labeled and processed training set, the purpose is to find an appropriate distribution $P^*(y|x)$ by maximizing the likelihood of the training set, that is:

$$P^*(y|x) = \arg\max_{P(y|x)} \prod_{(x,y) \in T} P(y|x), \quad P(y|x) \in \mathcal{P},$$

or, equivalently, in parametric form:

$$\lambda^* = \arg\max_{\lambda} \prod_{(x,y) \in T} P(y|x; \lambda), \quad \lambda \in \mathbb{R}^{l},$$

where $x$ is an observation sequence instance, $y$ is the corresponding label sequence instance, and $T$ represents the training set consisting of the pairs $(x, y)$. $\lambda$ denotes the parameter vector of the distribution $P(y|x)$, which belongs to an $l$-dimensional space of real numbers, and $\mathcal{P}$ in the first formula represents a space of distribution functions that define the conditional probability $P(y|x)$. This step provides an approach for finding a specific model that describes the relationship between the space of observation sequences and the space of label sequences well.

(2) Inference step. Given the model trained in the modeling step, the inference step aims to find the label sequence $y^* \in L$ ($L$ represents the space of label sequences) that satisfies the following condition for a given observation sequence $x$:

$$y^* = \arg\max_{y \in L} P(y|x).$$

This step provides an approach for finding the most probable label sequence given an observation sequence $x$.
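To make the inference step concrete, the following is a minimal sketch (ours, not from the paper) that realizes $y^* = \arg\max_y P(y|x)$ by brute force over the label set defined later in Sect. 4.3; the `score` argument standing in for a trained model is hypothetical, and practical decoders such as Viterbi or loopy belief propagation avoid this exponential enumeration.

```python
from itertools import product

# Label set from Sect. 4.3: brand-name parts, type-name parts, other.
LABELS = ["b1", "b2", "t1", "t2", "o"]

def brute_force_decode(x, score):
    """Toy inference step: return argmax_y score(x, y) over all
    |LABELS|**len(x) label sequences. `score` may be any unnormalized
    model score, since the argmax is unaffected by the partition
    function Z(x)."""
    best_y, best_s = None, float("-inf")
    for y in product(LABELS, repeat=len(x)):
        s = score(x, list(y))
        if s > best_s:
            best_y, best_s = list(y), s
    return best_y
```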

3.2 Difficulties of PRO NER

As a new field of NER, PRO NER has its own characteristics that distinguish it from general NER. These characteristics result in the following difficulties:

1. Few open corpus resources for PRO NER are available, because labeling a large corpus is expensive. This is one of the bottlenecks for statistical model applications. Some corpus libraries exist, such as the one built by FUJITSU R&D CENTER and the Institute of Automation, Chinese Academy of Sciences [6], but that library is not open because of commercial confidentiality.

2. Remarkable feature words that clearly indicate a product named entity cannot be found in the sentence. This is quite different from the traditional named entity recognition problem, where entities usually contain remarkable entity words like "市 (city)," "公司 (company)," "街 (street)," "公园 (garden)" and so on.

3. The same product may have different names. For example, "索爱 x10i" is an abbreviation of "索尼爱立信 x10i"; both stand for the same mobile phone.

4. The cross-language phenomenon is common in product names; for example, "HTC 渴望 (Desire)" is another form of "HTC Desire," which is hard to recognize well.

4 A skip-chain CRF model for product named entity recognition

4.1 Linear-chain CRF

The linear-chain CRF model [10] is the most basic and simplest CRF model; the skip-chain CRF used in this paper is an improved version of it. A linear-chain CRF approach has already been applied to PRO NER and shown its effectiveness [6, 7]. It models dependencies between neighboring labels, based on the assumption that those dependencies are the strongest [8]. Sometimes, however, this assumption disagrees with the facts and ignores other, more important dependencies, as shown in Sect. 2.

Because of its first-order Markov assumption among labels, a linear-chain CRF cannot represent the important dependencies between words connected by a connective. To relax this assumption, a skip-chain CRF is proposed that also takes these important dependencies into consideration. These dependencies are represented in our model by adding factors that depend on the labels of words connected by a connective. For example, in the sequence "诺基亚 5230 和诺基亚 5233 哪个好 (which is better, Nokia 5230 or Nokia 5233)," "5230" and "诺基亚 (Nokia)" are connected by the connective "和 (and)," and both are parts of a product named entity. In general, words connected by a connective are very likely to have the same labels or related labels, such as different parts of a product named entity. Our model is inspired by the skip-chain CRF model proposed by Sutton and McCallum [8], which takes advantage of the dependencies between the labels of distant but similar words.

4.2 Skip-chain model

A skip-chain CRF is essentially a linear-chain CRF with additional skip edges between words connected by a connective. Information at both endpoints of a skip edge can be used to recognize the class of either endpoint. As we have observed from the corpus, the number of skip edges in a question is small, so the additional inference difficulty introduced by the skip edges can be ignored [8].

Formally, our skip-chain CRF is defined as a general CRF with two clique templates: one for the adjacent label nodes, which are also contained in a linear-chain CRF, and the other for the label nodes on skip edges. Given an observation sequence, and assuming $S = \{(j, j+2)\}$ is the set of all pairs of label nodes on skip edges, the probability of a label sequence $Y$ conditioned on $X$ is modeled as:

$$P(Y|X) = \frac{1}{Z(X)} \prod_{i=1}^{I} \psi_i(y_i, y_{i-1}, X) \prod_{(j, j+2) \in S} \psi_{j, j+2}(y_j, y_{j+2}, X),$$

where $\psi_i$ is the factor for adjacent label nodes and $\psi_{j,j+2}$ is the factor for label nodes on skip edges. They are defined as:

$$\psi_i(y_i, y_{i-1}, X) = \exp\left\{ \sum_k \lambda_k f_k(y_i, y_{i-1}, X, i) \right\},$$

$$\psi_{j,j+2}(y_j, y_{j+2}, X) = \exp\left\{ \sum_l \eta_l f_l(y_j, y_{j+2}, X, j, j+2) \right\},$$

where $f_k(y_i, y_{i-1}, X, i)$ is a feature on the linear-chain edges and $f_l(y_j, y_{j+2}, X, j, j+2)$ is a feature on the skip edges; in our work, both are indicator functions, and the detailed features are described in Sect. 5. $\theta_1 = \{\lambda_k\}_{k=1}^{K_1}$ are the parameters of the linear-chain template, and $\theta_2 = \{\eta_l\}_{l=1}^{K_2}$ are the parameters of the skip-edge template. The factor graph of our model is shown in Fig. 1.

[Fig. 1 Factor graph of skip-chain CRFs: label nodes y_{i-2}, y_{i-1}, y_i, y_{i+1}, y_{i+2} over the observation X, with a skip edge between the labels on either side of a word that is a connective such as "和 (and)"]

In this paper, the L-BFGS algorithm [11] is applied to estimate the parameters θ of our skip-chain CRF model, and the loopy belief propagation algorithm [12] is used to perform approximate inference.
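As a concrete reading of the two clique templates, the sketch below (an illustration of ours, not the authors' MALLET implementation) scores a label sequence in log space: one sum over the adjacent factors $\psi_i$ and one over the skip factors $\psi_{j,j+2}$, with the skip-edge set $S$ built from a connect-word dictionary. The feature functions and weight lists passed in are assumed to be given, and the connective set is illustrative.

```python
CONNECTIVES = {"和", "与", "或"}  # assumed connect-word dictionary

def skip_edges(words):
    """S = {(j, j+2)}: pairs of label positions whose middle word
    is a connective."""
    return [(j, j + 2) for j in range(len(words) - 2)
            if words[j + 1] in CONNECTIVES]

def log_score(words, labels, theta1, f_lin, theta2, f_skip):
    """Unnormalized log-score of a label sequence: the linear-chain
    template plus the skip-edge template. Exponentiating and dividing
    by Z(X) would give P(Y|X)."""
    s = 0.0
    for i in range(1, len(words)):   # adjacent factors psi_i(y_i, y_{i-1}, X)
        s += sum(w * f(labels[i], labels[i - 1], words, i)
                 for w, f in zip(theta1, f_lin))
    for j, k in skip_edges(words):   # skip factors psi_{j,j+2}(y_j, y_{j+2}, X)
        s += sum(w * g(labels[j], labels[k], words, j, k)
                 for w, g in zip(theta2, f_skip))
    return s
```

For the running example ["诺基亚", "5230", "和", "诺基亚", "5233", "哪个", "好"], `skip_edges` returns [(1, 3)], linking the labels of "5230" and the second "诺基亚 (Nokia)" across the connective "和 (and)," exactly the dependence a linear chain cannot express.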

4.3 Workflow of PRO NER

In this paper, only two kinds of product named entity are considered, namely the product brand name (PBN) and the product type name (PTN), which are important parts of a full product name; the result of PBN and PTN recognition therefore directly affects the final result of product named entity recognition. Assembling complete product named entities from these parts is left as follow-up work and is not considered here.

Five labels are used in the model: b1, b2, t1, t2 and o. Here, b1 and t1 denote the starting part of a brand name and of a type name, respectively; b2 and t2 denote the remaining part of the brand name and type name, respectively; and o denotes other words.
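As an illustration of this scheme (the segmentation shown is assumed), the running example from Sect. 2 would be tagged as:

```python
# "诺基亚5230和诺基亚5233哪个好" after segmentation, under the
# five-tag scheme: whole brand names start with b1, type names with t1.
tagged = [("诺基亚", "b1"), ("5230", "t1"), ("和", "o"),
          ("诺基亚", "b1"), ("5233", "t1"), ("哪个", "o"), ("好", "o")]
```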

Our workflow is summarized in Fig. 2, and the details are as follows:

[Fig. 2 Workflow of PRO NER in our work: (a) training part — sequences in the training corpus are preprocessed (segmentation and POS tagging), features are induced, and the model is trained, yielding the skip-chain CRF model; (b) inference part — an incoming sequence is preprocessed, features are induced, and labels are tagged, yielding the label sequence]

1. Preprocessing: Word segmentation and POS tagging are performed on the raw input text using the open-source software ICTCLAS [14]. ICTCLAS allows adding a user dictionary, which leads to better results than other segmentation tools. In this paper, a specially designed user dictionary composed of English and Chinese official brand names is used.

2. Inducing features: Convert the words in a sequence into feature vectors with scalable dimensions (a sketch of this step follows the list). The feature vector contains the following features:

(a) POS tag: the POS tag of the word, with a one-size window on both sides. For example, the POS tag of the adjective "便宜 (cheap)" is adj.

(b) in_brdict: whether the word is in the brand dictionary, with a two-size window on the left. For example, the word "Nokia" is in the brand dictionary.

(c) in_lbodict: whether the current word is in the left-boundary dictionary, with a one-size window on the left. The left-boundary dictionary contains words that often occur to the left of product names, for example "买 (buy)."

(d) in_rbodict: whether the current word is in the right-boundary dictionary, with a one-size window on the right. The right-boundary dictionary contains words that often occur to the right of product names, for example "卖 (sell)."

(e) in_dict: whether the current word is in the connect-word dictionary; for example, the word "和 (and)" is in the connect-word dictionary.

(f) has_n: whether the current word contains numbers, with a three-size window on both sides, like "5230."

(g) has_l: whether the current word contains letters, also with a three-size window on both sides, like "n97."
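A minimal sketch of this feature-induction step, assuming the four dictionaries are supplied as Python sets; the window features (copies of a feature at neighboring offsets) are shown only for the POS tag, and all helper names are illustrative:

```python
def word_features(words, pos_tags, i, brdict, lbodict, rbodict, conndict):
    """Indicator features for position i, mirroring features (a)-(g)."""
    w = words[i]
    feats = {"pos": pos_tags[i]}                       # (a) POS tag
    if w in brdict:
        feats["in_brdict"] = True                      # (b) brand dictionary
    if w in lbodict:
        feats["in_lbodict"] = True                     # (c) left boundary
    if w in rbodict:
        feats["in_rbodict"] = True                     # (d) right boundary
    if w in conndict:
        feats["in_dict"] = True                        # (e) connective
    if any(c.isdigit() for c in w):
        feats["has_n"] = True                          # (f) contains numbers
    if any(c.isascii() and c.isalpha() for c in w):
        feats["has_l"] = True                          # (g) contains letters
    # One-size window for the POS feature; other window sizes are analogous.
    if i > 0:
        feats["pos[-1]"] = pos_tags[i - 1]
    if i + 1 < len(words):
        feats["pos[+1]"] = pos_tags[i + 1]
    return feats
```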

3. Training the skip-chain model: The feature vectors of the training set are input, and the L-BFGS algorithm is applied to estimate the weight of each feature.

4. Getting the label sequence: The feature vector of a test question is input into the trained model, and the loopy belief propagation algorithm is applied to find the most appropriate label sequence for the test question.
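The paper trains the skip-chain model with MALLET (see Sect. 5.1); common off-the-shelf CRF toolkits expose only the linear-chain case, so the sketch below reproduces steps 3 and 4 for the linear-chain baseline using sklearn-crfsuite (our assumption, not the authors' toolchain), which also optimizes with L-BFGS. The toy data are illustrative.

```python
import sklearn_crfsuite  # third-party package: pip install sklearn-crfsuite

# Toy corpus: each sentence is a list of per-word feature dicts (see the
# word_features sketch above); labels align one-to-one with the words.
X_train = [
    [{"pos": "nz", "in_brdict": True}, {"pos": "m", "has_n": True}],
    [{"pos": "v", "in_lbodict": True}, {"pos": "nz", "in_brdict": True}],
]
y_train = [["b1", "t1"], ["o", "b1"]]

# Linear-chain baseline only: crfsuite has no skip edges, so this
# corresponds to the comparison model, not the proposed skip-chain CRF.
crf = sklearn_crfsuite.CRF(
    algorithm="lbfgs",   # L-BFGS, the same optimizer family as in the paper
    c1=0.1, c2=0.1,      # L1/L2 regularization strengths (assumed values)
    max_iterations=50,
)
crf.fit(X_train, y_train)
print(crf.predict(X_train))  # e.g. [['b1', 't1'], ['o', 'b1']]
```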

5 Experiments and analysis

5.1 Purpose of our experiments

This paper focuses on recognizing two types of named entities: PBN and PTN. To verify the performance of the proposed method, four experiments are designed: (1) an experiment on the training corpus; (2) an experiment comparing the performance of the linear-chain CRF and the skip-chain CRF on the natural mobile phone test corpus; (3) a comparison experiment on a manual mobile phone test corpus; and (4) a portability experiment, in which the skip-chain CRF model is applied to a camera test corpus to estimate its portability.

In the experiments, precision, recall and F1 score, denoted as P, R and F1, respectively, are used to evaluate the performance. They are defined as:

$$P_i = \frac{Correct_i}{Returned_i}, \qquad R_i = \frac{Correct_i}{N_i}, \qquad F1_i = \frac{2 \cdot P_i \cdot R_i}{P_i + R_i},$$

where $Correct_i$ is the number of correctly recognized entities of label $i$, $Returned_i$ is the number of recognized entities of label $i$, and $N_i$ is the number of entities of label $i$ in the test data set.
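As a quick check of these definitions, a minimal sketch (the counts passed in are illustrative, not taken from the tables below):

```python
def prf1(correct, returned, gold):
    """Per-label precision, recall and F1 exactly as defined above."""
    p = correct / returned if returned else 0.0
    r = correct / gold if gold else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1

print(prf1(correct=47, returned=55, gold=59))  # hypothetical counts
```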

In the experiments, the MALLET implementation of CRFs [13] is used. MALLET uses a quasi-Newton method called L-BFGS to find the feature weights efficiently and the loopy belief propagation algorithm to perform approximate inference.

5.2 Data set

The experimental data set consists of 750 questions crawled from Baidu Knows (http://zhidao.baidu.com/), a famous question-answer website in China. Most of the questions are about mobile phones and the rest are about cameras. Among the questions about mobile phones, 205 randomly selected samples are used as the training corpus. Of the remaining questions, 200 randomly selected samples are used as the natural test corpus, and 200 samples that contain more than one mobile phone are manually selected from the rest as our manual test corpus. One hundred questions about cameras are also selected as our portability test corpus. The details of these four corpora are summarized in Table 1.

Table 1 The detail of each corpus (number of words per label)

Corpus                   Other (o)  Brand front (b1)  Brand end (b2)  Type front (t1)  Type end (t2)
Training corpus          1539       255               10              315              59
Natural test corpus      766        169               4               206              23
Manual test corpus       782        213               0               298              31
Portability test corpus  893        108               0               119              21

5.3 Results and analysis

5.3.1 Test of training algorithm

First, the skip-chain model is applied to the training corpus to test the performance of the training algorithm; the result is shown in Table 2.

Table 2 Training algorithm test result

      o       b1      b2      t1      t2
P     0.9782  1.0     1.0     0.9648  0.8545
R     0.9922  0.9529  0.6     0.9587  0.7966
F1    0.9851  0.9759  0.7499  0.9617  0.8245

The results show that the proposed model performs well on the training corpus: the precisions of the labels are all high, though those of t1 and t2 are a little lower, because letters and numbers appearing in times or prices may disturb the recognition.

5.3.2 Comparison of linear-chain CRF with skip-chain CRF

Next, open-test experiments are conducted to compare the performance of the linear-chain CRF model and the skip-chain CRF model on the two open corpora. First, the two models are applied to the open natural corpus; the results are shown in Figs. 3, 4 and 5, and the detailed data are given in Table 3.

[Fig. 3 The precision of skip-chain CRF and linear-chain CRF on the open natural corpus]
[Fig. 4 The recall of skip-chain CRF and linear-chain CRF on the open natural corpus]
[Fig. 5 The F1 value of skip-chain CRF and linear-chain CRF on the open natural corpus]

Table 3 The experimental results of skip-chain CRF and linear-chain CRF on the open natural corpus

Label  Indicator  Skip-chain CRF  Linear-chain CRF
o      P          0.9719          0.9690
       R          0.9802          0.9817
       F1         0.9760          0.9753
b1     P          0.9859          0.9807
       R          0.9090          0.9053
       F1         0.9459          0.9415
b2     P          1.0             1.0
       R          0.5             0.5
       F1         0.6666          0.6666
t1     P          0.9206          0.9333
       R          0.9508          0.9514
       F1         0.9354          0.9423
t2     P          0.56            0.5416
       R          0.6086          0.5652
       F1         0.5833          0.5531

The results demonstrate that the performance of the skip-chain CRF is slightly better than that of the linear-chain CRF. This is because the open natural corpus contains few sentences with two or more mobile phones, and most mobile phone names in its sentences are regular and easy to recognize without the help of skip edges; they often take the form of a regular brand name followed by a one-word type name, for example "三星 i5800 多少钱? (How much is a Samsung i5800?)." When a sentence contains more than one mobile phone name and the names are not regular, the skip-chain CRF performs better, as demonstrated next.

The skip-chain CRF and linear-chain CRF models are then applied to the open manual corpus. This corpus consists of sentences containing more than one mobile phone name; these names are often connected by connectives, and some of them are not regular, for example abbreviations of brand names that are not in the brand dictionary, such as "索爱 (abbreviation of Sony Ericsson)" and "摩托 (abbreviation of Motorola)." The results of the two models are shown in Figs. 6, 7 and 8, and the details are given in Table 4.

[Fig. 6 The precision of skip-chain CRF and linear-chain CRF on the open manual corpus]
[Fig. 7 The recall of skip-chain CRF and linear-chain CRF on the open manual corpus]
[Fig. 8 The F1 value of skip-chain CRF and linear-chain CRF on the open manual corpus]

Table 4 The experimental results of skip-chain CRF and linear-chain CRF on the open manual corpus

Label  Indicator  Skip-chain CRF  Linear-chain CRF
o      P          0.9798          0.9773
       R          0.9961          0.9923
       F1         0.9879          0.9847
b1     P          0.9901          0.9842
       R          0.9436          0.8826
       F1         0.9663          0.9306
b2     P          1.0             1.0
       R          1.0             1.0
       F1         1.0             1.0
t1     P          0.9733          0.9397
       R          0.9798          0.9429
       F1         0.9765          0.9413
t2     P          0.7307          0.55
       R          0.6129          0.7096
       F1         0.6666          0.6197

Table 4 shows that the skip-chain CRF performs much better than the linear-chain CRF. This is because some type names occur in a sentence without a brand name in front of them; they may contain letters or numbers, but this evidence alone is not enough to prove they are type names, since prices and other words may also contain letters or numbers.



So, in a linear-chain CRF, it is hard to recognize them correctly. But in a skip-chain CRF model, they can be connected by skip edges, and the evidence around one type name helps recognize the others: if one of them can be recognized, the rest can be recognized easily. For example, in "我想买诺基亚 C2-01、5130 和 2730c 中的一部手机 (I want to buy one of Nokia C2-01, 5130 and 2730c)," "C2-01" is easy to recognize because it follows the brand name "诺基亚 (Nokia)," and then "5130" and "2730c" are also easy to recognize because of the punctuation "、" and the connective "和 (and)." The improvement on b1 is due to brand names that do not exist in the brand dictionary: such a name is hard to recognize in a linear-chain CRF, while a skip-chain CRF uses the information around the words connected to it by a connective. For example, in "华为 U8800 与摩托 ME52 哪一个好 (which is better, HUAWEI U8800 or MOTO ME52)," "摩托 (MOTO)" is an abbreviation of "摩托罗拉 (MOTOROLA)."

5.3.3 Portability test

Finally, the proposed model is applied to the portability test corpus, which consists of sentences about digital cameras. The components of a digital camera name are similar to those of a mobile phone name; all that has to be modified is adding common camera brand names to the brand dictionary. The final result is shown in Table 5.

Table 5 Skip-chain CRF on the portability test corpus

      o       b1      b2    t1      t2
P     0.9897  0.8991  1.0   0.8916  0.4
R     0.9720  0.9907  1.0   0.8991  0.4761
F1    0.9807  0.9427  1.0   0.8953  0.4347



Table 5 shows that the proposed model still performs well in the camera domain, which demonstrates that the skip-chain CRF has good portability and can easily be applied to other domains with little modification. The camera corpus still has some characteristics of its own; for example, camera names and camera battery names often occur together in a sentence, and their forms are very similar, which may cause a camera battery name to be recognized as a camera name. In such cases, further modifications need to be made, such as adding boundary features that indicate battery names; these modifications are easy to make.

6 Conclusion

In this paper, the problem caused by connectives in PRO NER is considered for the first time, and a skip-chain CRF model is proposed to solve it. The proposed model not only considers the dependence information between neighboring labels but also takes advantage of the dependence information between words in a coordinating relation, and it outperforms the existing linear-chain CRF approach. Since phenomena such as nested entities often occur in product named entities and make the recognition work more complex, hierarchical CRF models with skip chains could be applied in future work to further improve the performance.

References

1. Zhao J, Liu F (2008) Product named entity recognition in Chinese text. Lang Resour Eval 42(2):197–217
2. Pierre JM (2002) Mining knowledge from text collections using automatically generated metadata. In: Proceedings of the fourth international conference on practical aspects of knowledge management
3. Yu HK, Zhang H, Liu Q, Lv X, Shi S (2006) Chinese named entity identification using cascaded hidden Markov model. J Commun 27(2):87–93 (in Chinese)
4. Bick E (2004) A named entity recognizer for Danish. In: Proceedings of the 4th international conference on language resources and evaluation, pp 305–308
5. Liu F, Zhao J, Lv B, Xu B, Yu H, Xia Y (2006) Study on product named entity recognition for business information extraction. J Chin Inf Process 20(1):7–13 (in Chinese)
6. Huang L, Liu Q (2008) Method for Chinese product name recognition based on conditional random fields. Appl Res Comput 25:1829–1831 (in Chinese)
7. Luo F, Xiao H, Chang W (2011) Product named entity recognition using conditional random fields. In: 2011 fourth international conference on business intelligence and financial engineering
8. Sutton C, McCallum A (2007) An introduction to conditional random fields for relational learning. In: Getoor L, Taskar B (eds) Introduction to statistical relational learning. MIT Press, Cambridge
9. Niu C, Li W, Ding J, Srihari RK (2003) A bootstrapping approach to named entity classification using successive learners. In: Proceedings of the 41st ACL, Sapporo, Japan, pp 335–342
10. Lafferty J, McCallum A, Pereira F (2001) Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings of the international conference on machine learning (ICML-2001)
11. McCallum A, Wellner B (2005) Conditional models of identity uncertainty with application to noun coreference. In: Saul LK, Weiss Y, Bottou L (eds) Advances in neural information processing systems, vol 17. MIT Press, Cambridge, pp 905–912




12. Taskar B, Abbeel P, Koller D (2002) Discriminative probabilistic models for relational data. In: Eighteenth conference on uncertainty in artificial intelligence (UAI02)
13. McCallum AK (2002) MALLET: a machine learning for language toolkit. http://mallet.cs.umass.edu
14. Zhang HP, Yu HK, Xiong DY, Liu Q (2003) HHMM-based Chinese lexical analyzer ICTCLAS. In: Proceedings of the second SIGHAN workshop affiliated with the 41st ACL, pp 184–187
