
Research Article: Semisupervised Learning Based Disease-Symptom and Symptom-Therapeutic Substance Relation Extraction from Biomedical Literature

Qinlin Feng,1 Yingyi Gui,2 Zhihao Yang,1 Lei Wang,3 and Yuxia Li3

1 College of Computer Science and Technology, Dalian University of Technology, Dalian 116024, China
2 School of Optoelectronics, Beijing Institute of Technology, Beijing 100081, China
3 Beijing Institute of Health Administration and Medical Information, Beijing 100850, China

Correspondence should be addressed to Zhihao Yang (yangzh@dlut.edu.cn) and Lei Wang (wangleibihami@gmail.com).

Received 24 April 2016; Revised 13 July 2016; Accepted 18 August 2016

Academic Editor: Md. Altaf-Ul-Amin

Copyright © 2016 Qinlin Feng et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

With the rapid growth of biomedical literature, a large amount of knowledge about diseases, symptoms, and therapeutic substances hidden in the literature can be used for drug discovery and disease therapy. In this paper, we present a method of constructing two models for extracting the relations between the disease and symptom and between the symptom and therapeutic substance from biomedical texts, respectively. The former judges whether a disease causes a certain physiological phenomenon, while the latter determines whether a substance relieves or eliminates a certain physiological phenomenon. These two kinds of relations can be further utilized to extract the relations between disease and therapeutic substance. In our method, first, two training sets for extracting the relations between the disease-symptom and symptom-therapeutic substance are manually annotated, and then two semisupervised learning algorithms, that is, Co-Training and Tri-Training, are applied to utilize the unlabeled data to boost the relation extraction performance. Experimental results show that exploiting the unlabeled data with both Co-Training and Tri-Training algorithms can enhance the performance effectively.

1. Introduction

In recent years, with the rapid growth of biomedical literature, the technology of information extraction (IE) has been extensively applied to relation extraction in this literature, for example, extracting the semantic relations between diseases, drugs, genes, proteins, and so forth [1–3]. The related challenges (e.g., the BioCreative II protein-protein interaction (PPI) task [4], DDIExtraction 2011 [5], and DDIExtraction 2013 [6]) have been held successfully.

In our work, we focus on extracting the relations between diseases and their symptoms and between symptoms and their therapeutic substances. These relations are defined the same as those in [4–6] and are also annotated at the sentence level. The former is the relationship between a disease and its related physiological phenomenon in a sentence. For example, the sentence "many blood- and blood vessel-related characteristics are typical for Raynaud patients: blood viscosity and platelet aggregability are high" shows that blood viscosity and platelet aggregability are physiological phenomena of Raynaud disease. The latter is the relationship between a physiological phenomenon and a therapeutic substance that can relieve it in a sentence. For example, the sentence "fish oil and its active ingredient eicosapentaenoic acid (EPA) lowered blood viscosity" shows that fish oil and EPA can relieve the physiological phenomenon (blood viscosity). These two kinds of relations can be further utilized to extract the relations between disease and therapeutic substance. As shown in the above example, it can be assumed that fish oil and EPA may relieve or heal Raynaud disease. Therefore, such information is important for drug discovery and disease treatment. Currently, a large amount of knowledge on diseases, symptoms, and therapeutic substances remains hidden in the literature and needs to be mined with IE technology.

Generally, the methods of extracting the semantic relation between biomedical entities include cooccurrence-based methods [7], pattern-based methods [8], and machine learning methods [9]. Cooccurrence-based methods use frequent cooccurrence to extract the relations between entities. This approach is simple but shows very low precision for high recall [10]. Yen et al. developed a cooccurrence approach based on an information retrieval principle to extract gene-disease relationships from text [11]. Pattern-based methods define a series of patterns in advance and use pattern matching to extract the relations between entities. Huang et al. used a dynamic programming algorithm to compute distinguishing patterns by aligning relevant sentences and key verbs that describe protein interactions [12]. Since the templates are manually defined, the generalization ability of such methods is not satisfactory. Machine learning methods, the most popular ones, use classification algorithms to extract the relations between entities from literature, such as support vector machine (SVM) [13], maximum entropy [14], and Naive Bayes [15]. Among others, kernel-based methods are widely used in relation extraction. These methods define different kernel functions to extract the relations between entities, such as the graph kernel [16], tree kernel [17], and walk path kernel [18].

The machine learning methods belong to the supervised learning ones, which need a large number of labeled examples to train the model. However, currently no corpora for the extraction of disease-symptom and symptom-therapeutic substance relations are available. In addition, even if limited labeled data are available, it is still difficult to achieve satisfactory generalization ability for a classifier. To solve the problem, we first manually annotated two training sets for extracting the relations between the disease-symptom and symptom-therapeutic substance and then introduced semisupervised learning methods to utilize the unlabeled data for training the models.

Semisupervised learning methods attempt to exploit the unlabeled data to help improve the generalization ability of a classifier trained with limited labeled data. They can be roughly divided into four categories, that is, generative parametric models [19], semisupervised support vector machines (S3VMs) [20], graph-based approaches [21], and Co-Training [22–27]. Co-Training was proposed by Blum and Mitchell [22]. This method requires two sufficient and redundant views, which do not exist in most real-world scenarios. In order to relax this constraint, Zhou and Li proposed a Tri-Training algorithm that neither requires the instance space to be described with sufficient and redundant views nor puts any constraints on the supervised learning method [28]. The algorithm uses three classifiers, which can not only tackle the problem of determining how to label the unlabeled data but also improve the generalization ability of a classifier with unlabeled data. Wang et al. conducted a series of studies on Co-Training and proved that, if two views have large diversity, Co-Training is able to improve the learning performance by exploiting the unlabeled data even with insufficient views [23–25]. Until now, Tri-Training and Co-Training have been widely used in natural language processing. Pierce and Cardie [26] applied Co-Training to noun phrase recognition. They regarded the current word and the k words which appear before it in the document as one view and the k words which appear after it as another view and then trained the classifiers on these two views with the Co-Training algorithm. Mavroeidis et al. [29] applied the Tri-Training algorithm to spam detection filtering and achieved a satisfactory result.

Meanwhile, ensemble learning methods have been proposed, which combine the outputs of several base learners to form an integrated output for enhancing the classification performance. There are three popular ensemble methods, that is, Bagging [30], Boosting [31], and Random Subspace [32]. The Bagging method uses random independent bootstrap replicates from a training dataset to construct base learners and calculates the final result by a simple vote [30]. For the Boosting method, the base learners are constructed on weighted versions of the training set, which are dependent on previous base learners' results, and the final result is calculated by a simple vote or a weighted vote [31]. The Random Subspace method uses random subspaces of the feature space to construct the base learners [32].

In our method, we regard three kernels (i.e., the feature kernel, graph kernel, and tree kernel, which will be introduced in the following section) as three different views. Co-Training and Tri-Training algorithms are then employed to exploit the unlabeled data with these views and build the disease-symptom model and symptom-therapeutic substance model. Meanwhile, in the Tri-Training process, we adopted the ensemble learning method to integrate three individual kernels and achieved a satisfactory result.

2. Methods

2.1. Feature Kernel. The core work of the feature-based method is feature selection, which has a significant impact on the performance. The following features are used in our feature-based kernel.

(1) Word Feature. The word feature uses two unordered sets of words, those between the two concept entities (diseases, symptoms, and therapeutic substances) and those surrounding the two concept entities, as the eigenvector. The features surrounding the two concept entities' names include the left M words of the first concept entity name and the right M words of the second concept entity name (in our experiments, M is set to 4).

(2) N-Gram Word Feature. In our method, we use N-gram (N = 1, 2, and 3 in our experiments) words from the left four words of the first concept entity to the right four words of the second concept entity as features. N-gram features enrich the word feature and add contextual information, which can effectively express the relation of concept entities.

(3) Position Feature. The relative position information of the word features and N-gram features with respect to the concept entities has an important influence on relation extraction and therefore is introduced into our method. For example, "E1 L feature" denotes a word feature or N-gram feature that appears to the left of the first concept entity, "E B feature" one between the two concept entities, and "E2 R feature" one to the right of the second concept entity.

(4) Interaction Word and Distance Features. Some words, such as "induce," "action," and "improve," often imply the existence of relations. Therefore, the existence of these words (we call them interaction words) is chosen as a binary feature. In addition, we found that the shorter the distance between two concept entities is, the more likely it is that the two concept entities have an interactive relationship. Therefore, the distance is chosen as a feature. For example, "DISLessThanThree" is a feature value showing that the distance between the two concept entities is less than three.

[Figure 1: An example of a dependency graph showing v-walks and e-walks along the shortest dependency path. The candidate interaction pair is marked as "ENTRY1" and "ENTRY2".]
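To make the four feature types concrete, the sketch below assembles them for one candidate entity pair. It is a minimal illustration, not the authors' implementation; the helper names, the tokenized-sentence input format, and the tiny trigger-word list are assumptions.

```python
# Minimal sketch of the feature-based kernel's features (Section 2.1).
# Hypothetical helper: the paper does not publish its extraction code.

INTERACTION_WORDS = {"induce", "action", "improve"}  # example trigger words

def extract_features(tokens, e1, e2, m=4, n_max=3):
    """tokens: word list with entities replaced by ENTRY1/ENTRY2;
    e1, e2: indices of ENTRY1 and ENTRY2 (e1 < e2)."""
    feats = set()
    left = tokens[max(0, e1 - m):e1]        # M words left of entity 1
    between = tokens[e1 + 1:e2]             # words between the entities
    right = tokens[e2 + 1:e2 + 1 + m]       # M words right of entity 2
    # (1) word features, with (3) the position encoded as a prefix
    feats.update("E1_L " + w for w in left)
    feats.update("E_B " + w for w in between)
    feats.update("E2_R " + w for w in right)
    # (2) N-gram features (N = 1, 2, 3) over the whole window
    window = left + ["ENTRY1"] + between + ["ENTRY2"] + right
    for n in range(1, n_max + 1):
        for i in range(len(window) - n + 1):
            feats.add("NGRAM " + " ".join(window[i:i + n]))
    # (4) interaction-word and distance features
    if INTERACTION_WORDS & set(between):
        feats.add("HasInteractionWord")
    if len(between) < 3:
        feats.add("DISLessThanThree")
    return feats
```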

The initial eigenvector extracted with our feature-based kernel has a high dimension and includes many sparse features. In order to reduce the dimension, we employed the document frequency method [33] to select features. Initially, the feature-based kernel method extracts 248,000 features from the disease-symptom training set, and we preserved the features with document frequencies exceeding five (a total of 12,000 features). Similarly, 345,000 features were extracted from the symptom-therapeutic substance training set, and 13,700 features were retained.
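The document-frequency cutoff itself is only a few lines; a sketch follows, assuming one feature set per training sentence as the interface.

```python
from collections import Counter

def select_by_document_frequency(feature_sets, min_df=5):
    """Keep only features whose document frequency exceeds min_df."""
    df = Counter(f for feats in feature_sets for f in set(feats))
    kept = {f for f, count in df.items() if count > min_df}
    return [feats & kept for feats in feature_sets]
```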

2.2. Convolution Tree Kernel. In our method, the convolution tree kernel $K_c(T_1, T_2)$, a special convolution kernel, is used to obtain useful structural information from substructures. It calculates the syntactic structure similarity between two parse trees $T_1$ and $T_2$ by counting the number of their common subtrees:

$$K_c(T_1, T_2) = \sum_{n_1 \in N_1,\, n_2 \in N_2} \Delta(n_1, n_2) \quad (1)$$

where $N_j$ denotes the set of nodes in the tree $T_j$ and $\Delta(n_1, n_2)$ denotes the number of common subtrees of the two parse trees rooted at $n_1$ and $n_2$.
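A small sketch of how $\Delta$ and $K_c$ in (1) can be computed with the standard common-subtree recursion (in the style of Collins and Duffy) is shown below; the tree representation and the decay factor lam are assumptions, since the paper gives only the kernel definition.

```python
class Node:
    def __init__(self, label, children=()):
        self.label, self.children = label, list(children)

def production(n):
    return (n.label, tuple(c.label for c in n.children))

def delta(n1, n2, lam=0.4):
    """Decayed count of common subtrees rooted at n1 and n2."""
    if not n1.children or not n2.children:    # leaves root no production
        return 0.0
    if production(n1) != production(n2):
        return 0.0
    result = lam                               # same production: recurse on children
    for c1, c2 in zip(n1.children, n2.children):
        result *= 1.0 + delta(c1, c2, lam)
    return result

def nodes(t):
    out, stack = [], [t]
    while stack:
        n = stack.pop()
        out.append(n)
        stack.extend(n.children)
    return out

def tree_kernel(t1, t2):
    """K_c(T1, T2) = sum over node pairs of delta(n1, n2) -- formula (1)."""
    return sum(delta(a, b) for a in nodes(t1) for b in nodes(t2))
```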

2.2.1. Tree Pruning in Convolution Kernel. In our method, the Stanford parser [34] is used to parse the sentences. Before a sentence is parsed, the concept entity pair in the sentence is replaced with "ENTRY1" and "ENTRY2," and other entities are replaced with "ENTRY." Take "gene-gene interaction between C0021764 and interleukin increases C0002395 risk" (the sentence is processed with MetaMap, and the two concept entities are represented with their CUIs) as an example. It is replaced with "gene-gene interaction between ENTRY1 and interleukin increases ENTRY2 risk." Then we use the Stanford parser to parse the sentence and obtain a Complete Tree (CT). Since a CT includes too much contextual information, which may introduce many noisy features, we used the method described in [35] to obtain the shortest path enclosed tree (SPT) and replace the CT with it. The SPT is the smallest common subtree including the two concept entities, which is a part of the CT.

2.2.2. Predicate Argument Path. The representation of a predicate argument is a graphic structure which expresses the deep syntactic and semantic relations between words. In the predicate argument structure, different substructures on the shortest path between the two concept entities carry different information. An example of a dependency graph is shown in Figure 1. In our method, v-walk and e-walk features (which are both on the shortest dependency paths) are added into the tree kernel. A v-walk contains the syntactic and semantic relations between two words; for example, in Figure 1, the relation between "ENTRY1" and "interleukin" is "NMOD," the relation between "risk" and "increases" is "OBJ," and so forth. An e-walk contains the relations between a word and its two adjacent nodes; Figure 1 shows the relation of "interleukin" with its two adjacent nodes ("NMOD" and "NMOD") and the relation of "risk" with its two adjacent nodes ("NMOD" and "OBJ").

2.3. Graph Kernel. The graph kernel method uses the syntax tree to express the graph structure of a sentence. The similarity of two graphs is calculated by comparing the relations between two public nodes (vertices). Our method uses the all-paths graph kernel proposed by Airola et al. [16]. The kernel consists of two directed subgraphs, that is, a parse graph and a graph representing the linear order of words. In Figure 2, the upper part is the parse structure subgraph and the lower part is the linear order subgraph. These two subgraphs denote the dependency structure and linear sequence of a sentence, respectively.

[Figure 2: Graph kernel with two directed subgraphs. The candidate interaction pair is marked as "ENTRY1" and "ENTRY2". In the dependency-based subgraph, all nodes on a shortest path are specialized using a posttag (IP). In the linear order subgraph, the possible tags are (B)efore, (M)iddle, and (A)fter.]

In our method, a simple weight allocation strategy is chosen; that is, the edges of the shortest path are assigned a weight of 0.9, other edges 0.3, and all edges in the linear order subgraph 0.9. The representation thus allows us to emphasize the shortest path without completely disregarding potentially relevant words outside of the path. A graph kernel calculates the similarity between two input graphs by comparing the relations between common vertices (nodes). A graph matrix $G$ is calculated as

$$G = L \left( \sum_{n=1}^{\infty} A^n \right) L^T \quad (2)$$

where $A$ is the edge matrix whose rows and columns are indexed by vertices ($A_{ij}$ is a weight if vertex $V_i$ is connected to vertex $V_j$) and $L$ is the label matrix whose rows indicate the labels and whose columns indicate the vertices ($L_{ij} = 1$ indicates that vertex $V_j$ contains the $i$th label). The graph kernel $K(G, G')$ is defined using two input graph matrices $G$ and $G'$ [15]:

$$K(G, G') = \sum_{i=1}^{L} \sum_{j=1}^{L} G_{ij} G'_{ij} \quad (3)$$
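Formulas (2) and (3) translate directly into a few lines of numpy; the sketch below is illustrative only. The infinite sum is evaluated in closed form as $(I - A)^{-1} - I$, which is valid when the spectral radius of the weighted edge matrix $A$ is below 1.

```python
import numpy as np

def graph_matrix(A, L):
    """G = L (sum_{n>=1} A^n) L^T -- formula (2).
    A: weighted edge matrix; L: label matrix with L[i, j] = 1
    iff vertex j carries label i."""
    n = A.shape[0]
    series = np.linalg.inv(np.eye(n) - A) - np.eye(n)  # sum of A^n, n >= 1
    return L @ series @ L.T

def graph_kernel(G, G2):
    """K(G, G') = sum_ij G_ij * G'_ij -- formula (3)."""
    return float(np.sum(G * G2))
```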

2.4. Co-Training Algorithm. The initial Co-Training algorithm (or standard Co-Training algorithm) was proposed by Blum and Mitchell [22]. They assumed that the training set has two sufficient and redundant views; namely, the set of attributes meets two conditions. First, each attribute set is sufficient to describe the problem; that is, if the training set is sufficient, each attribute set is able to learn a strong classifier. Second, each attribute set is conditionally independent of the other given the class label. Our Co-Training algorithm is described in Algorithm 1.

Algorithm 1 (Co-Training algorithm).

(1) Input is as follows:
    The labeled data L and the unlabeled data U.
    Initialize training sets L1, L2 (L1 = L2 = L).
    Sufficient and redundant views V1, V2.
    Iteration number N.

(2) Process is as follows:
    (2.1) Create a pool u of examples by choosing n examples at random from U; U = U − u.
    (2.2) Use L1 to train a classifier h1 on V1; use L2 to train a classifier h2 on V2.
    (2.3) Use h1 and h2 to label the examples from u.
    (2.4) Take m positive examples and m negative examples out which were consistently labeled by h1 and h2. Then take p positive examples out from the m positive examples and add them to L1 and L2, respectively. Choose 2m examples from U to replenish u; U = U − 2m; N = N − 1.
    (2.5) Repeat processes (2.2)–(2.4) until the unlabeled corpus U is empty, the number of unlabeled data in u is less than a certain number, or N = 0.

(3) Outputs are as follows:
    The classifiers h1 and h2.
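A condensed Python rendering of Algorithm 1 is given below. It is a sketch under assumptions: train and predict stand for any supervised learner applied to one view (an SVM per kernel in this paper), and the pool bookkeeping is simplified.

```python
import random

def co_training(L, U, train, predict, n=4000, m=300, p=100, N=27):
    """L: labeled (example, label) pairs; U: unlabeled examples."""
    L1, L2 = list(L), list(L)
    u = random.sample(U, min(n, len(U)))
    U = [x for x in U if x not in u]
    while N > 0 and len(u) >= 2 * m:
        h1, h2 = train(L1, view=1), train(L2, view=2)
        # (2.3)-(2.4): keep examples both classifiers label identically;
        # negatives are identified but not promoted in Algorithm 1
        pos = [x for x in u if predict(h1, x) == predict(h2, x) == 1][:m]
        neg = [x for x in u if predict(h1, x) == predict(h2, x) == 0][:m]
        for x in pos[:p]:                       # promote p positives
            L1.append((x, 1)); L2.append((x, 1))
        u = [x for x in u if x not in pos[:p]]  # drop promoted examples
        u += U[:2 * m]; U = U[2 * m:]           # replenish the pool
        N -= 1
    return train(L1, view=1), train(L2, view=2)
```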

2.5. Tri-Training Algorithm. The Co-Training algorithm requires two sufficient and redundant views. However, this constraint does not hold in most real-world scenarios. The Tri-Training algorithm neither requires the instance space to be described with sufficient and redundant views nor puts any constraints on the supervised learning algorithm [28]. In this algorithm, three classifiers are used, which can tackle the problem of determining how to label the unlabeled data and produce the final hypothesis. Our Tri-Training algorithm is described in Algorithm 2.

Table 1: The details of the two corpora.

| Corpus | Training set (positive) | Training set (negative) | Test set (positive) | Test set (negative) | Unlabeled data (total) |
| Diseases and symptoms | 299 | 299 | 249 | 250 | 19298 |
| Symptoms and therapeutic substances | 300 | 300 | 249 | 249 | 19392 |

In addition, the different classifiers calculate the similarity between the two sentences from different aspects. Combining the similarities can reduce the danger of missing important features. Therefore, in each Tri-Training round, two different ensemble strategies are used to integrate the three classifiers for further performance improvement. The first strategy integrates the classifiers with a simple voting method. The second strategy assigns each classifier a different weight. Then the normalized output $K$ of the three classifier outputs $K_m$ ($m = 1, 2, 3$) is defined as

$$K = \sum_{m=1}^{M} \sigma_m K_m, \qquad \sum_{m=1}^{M} \sigma_m = 1, \quad \sigma_m \ge 0, \; \forall m \quad (4)$$

where $M$ represents the number of classifiers ($M = 3$ in our method).
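As a worked instance of formula (4): with the 4:4:2 weight ratio used later in the experiments, the normalized weights are σ = (0.4, 0.4, 0.2), and three classifier outputs combine as below.

```python
def weighted_output(scores, weights=(0.4, 0.4, 0.2)):
    """K = sum_m sigma_m * K_m with sigma summing to 1 -- formula (4)."""
    assert abs(sum(weights) - 1.0) < 1e-9 and min(weights) >= 0
    return sum(w * k for w, k in zip(weights, scores))

# e.g. feature-, graph-, and tree-kernel outputs for one candidate pair:
print(weighted_output((0.8, 0.6, 0.3)))  # 0.4*0.8 + 0.4*0.6 + 0.2*0.3 = 0.62
```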

Algorithm 2 (Tri-Training algorithm).

(1) Input is as follows:
    The labeled data L and the unlabeled data U.
    Initialize training sets L1, L2, L3 (L1 = L2 = L3 = L).
    Selected views V1, V2, and V3.
    Iteration number N.

(2) Process is as follows:
    (2.1) Create a pool u of examples by choosing n examples at random from U; U = U − u.
    (2.2) Use L1 to train a classifier h1 on V1; use L2 to train a classifier h2 on V2; use L3 to train a classifier h3 on V3.
    (2.3) Use h1, h2, and h3 to label the examples from u.
    (2.4) Take m positive examples and m negative examples out which were consistently labeled by h1, h2, and h3. Then take p1 positive examples from the m positive examples and add them to L1, L2, and L3, respectively; take p2 negative examples from the m negative examples and add them to L1, L2, and L3, respectively. Choose 2m examples from U to replenish u; U = U − 2m; N = N − 1.
    (2.5) Repeat processes (2.2)–(2.4) until the unlabeled corpus U is empty, the number of unlabeled data in u is less than a certain number, or N = 0.

(3) Outputs are as follows:
    The classifiers h1, h2, and h3.
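The promotion step that distinguishes Tri-Training from Co-Training (steps (2.3)-(2.4) of Algorithm 2) can be sketched as follows; only unanimously labeled examples are promoted, p1 positives and p2 negatives per round. The function interface is an assumption.

```python
def tri_training_round(u, h1, h2, h3, predict, m=300, p1=100, p2=0):
    """Return newly labeled examples to append to L1, L2, and L3 alike."""
    agreed = {0: [], 1: []}
    for x in u:
        labels = {predict(h1, x), predict(h2, x), predict(h3, x)}
        if len(labels) == 1:                 # unanimous among all three
            agreed[labels.pop()].append(x)
    pos = agreed[1][:m][:p1]                 # p1 of the m agreed positives
    neg = agreed[0][:m][:p2]                 # p2 = 0 keeps only positives
    return [(x, 1) for x in pos] + [(x, 0) for x in neg]
```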

3. Experiments and Results

3.1. Experimental Datasets. In our experiments, the disease and symptom corpus data was obtained by searching the Semantic MEDLINE Database [36] using 200 concepts chosen from MeSH (Medical Subject Headings) with the semantic type "Disease or Syndrome." Since these sentences (corpus data) have been processed by SemRep [37], a rule-based natural language processing tool that identifies relationships in MEDLINE documents, the possibility of a relation between the two concept entities in the sentences is high. To limit the semantic types of the two concept entities in a sentence, we only preserved the sentences containing the concepts of the needed semantic types (i.e., biologic function, cell function, finding, molecular function, organism function, organ or tissue function, pathologic function, phenomenon or process, and physiologic function). Finally, we obtained a total of about 20,400 sentences, from which we manually constructed two labeled datasets as the initial training set T_initial (598 labeled sentences, as shown in Table 1) and the test set (499 labeled sentences), respectively.

During the manual annotation, the following criteria were applied: the disease and symptom relationship indicates that the symptom is a physiological phenomenon of the disease. If an instance in a sentence semantically expresses the disease and symptom relationship, it is labeled as a positive example. As in the example provided in Section 1, the sentence "many blood- and blood vessel-related characteristics are typical for Raynaud patients: blood viscosity and platelet aggregability are high" contains two positive examples, that is, Raynaud and blood viscosity and Raynaud and platelet aggregability. In addition, some special relationships such as "B in A" and "A can change B" are also classified as positive examples, since they show that a physiological phenomenon (B) occurs when someone has the disease (A). However, if a relation in a sentence is only a cooccurrence one, it is labeled as a negative example. Patterns such as "A is a B" and "A and B" are labeled as negative examples, since "A is a B" is an "IS A" relation and "A and B" is a coordination relation, which are not the relations we need.

The symptom-therapeutic substance corpus data was obtained as follows. First, some "Alzheimer's disease" related symptom terms were obtained from the Semantic MEDLINE Database. Then these symptom terms were used to search the database for the sentences which contain the query terms and terms belonging to the semantic types of therapeutic substances (e.g., pharmacologic substance and organic chemical). We obtained about 20,500 sentences and then manually annotated about 1,100 sentences as the symptom-therapeutic substance corpus: 600 labeled sentences are used as the initial training set and the remaining 498 labeled sentences as the test set. Similar to the disease and symptom relationship annotation, the following criteria were applied: the symptom-therapeutic substance relationship indicates that a therapeutic substance can relieve a physiological phenomenon. If an instance in a sentence semantically expresses the symptom-therapeutic substance relationship, it is labeled as a positive example. As in the example provided in Section 1, the sentence "fish oil and its active ingredient eicosapentaenoic acid (EPA) lowered blood viscosity" contains two positive examples, that is, fish oil and blood viscosity and EPA and blood viscosity.

When the manual annotation process was completed, the level of agreement was estimated. Cohen's kappa scores between the annotators on the two corpora are 0.866 and 0.903, respectively, and content analysis researchers generally regard a Cohen's kappa score of more than 0.8 as good reliability [38]. In addition, the two corpora are available for academic use (see Supplementary Material available online at http://dx.doi.org/10.1155/2016/3594937).

3.2. Experimental Evaluation. The evaluation metrics used in our experiments are precision (P), recall (R), F-score (F), and Area under ROC Curve (AUC) [39]. They are defined as follows:

$$P = \frac{TP}{TP + FP} \quad (5)$$

$$R = \frac{TP}{TP + FN} \quad (6)$$

$$F = \frac{2 \times P \times R}{P + R} \quad (7)$$

$$\mathrm{AUC} = \frac{\sum_{i=1}^{m^+} \sum_{j=1}^{m^-} H(x_i - y_j)}{m^+ m^-} \quad (8)$$

where TP denotes true interaction pairs, TN denotes true noninteraction pairs, FP denotes false interaction pairs, and FN denotes false noninteraction pairs. F-score is the balanced measure for quantifying the performance of the systems. In addition, the AUC is also used to evaluate the performance of our method. It is not affected by the distribution of the data, and it has been advocated for performance evaluation in the machine learning community [40]. In formula (8), $m^+$ and $m^-$ are the numbers of positive and negative examples, respectively; $x_1, \ldots, x_{m^+}$ are the outputs of the system for the positive examples, and $y_1, \ldots, y_{m^-}$ are the ones for the negative examples. The function $H(r)$ is defined as follows:

$$H(r) = \begin{cases} 1, & r > 0, \\ 0.5, & r = 0, \\ 0, & r < 0. \end{cases} \quad (9)$$
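Formulas (5)–(9) transcribe directly into code; the sketch below mirrors the pairwise definition of the AUC rather than using a library routine.

```python
def precision_recall_f(tp, fp, fn):
    """Formulas (5)-(7)."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)

def auc(pos_scores, neg_scores):
    """Formula (8), with the step function H of formula (9)."""
    H = lambda d: 1.0 if d > 0 else (0.5 if d == 0 else 0.0)
    pairs = sum(H(x - y) for x in pos_scores for y in neg_scores)
    return pairs / (len(pos_scores) * len(neg_scores))
```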

Table 2: The initial results on the disease-symptom test set. Method 1 integrates the three classifiers (the feature kernel, graph kernel, and tree kernel) with the same weight, while Method 2 integrates them with a weight ratio of 4:4:2.

| Method | P (%) | R (%) | F-score (%) | AUC (%) |
| Feature kernel | 91.38 | 62.11 | 73.95 | 87.13 |
| Graph kernel | 93.87 | 59.77 | 73.04 | 87.21 |
| Tree kernel | 69.10 | 62.89 | 65.85 | 73.37 |
| Method 1 | 92.05 | 63.28 | 75.00 | 89.47 |
| Method 2 | 92.81 | 60.55 | 73.29 | 89.74 |

3.3. The Initial Performance of the Disease-Symptom Model. Table 2 shows the performance of the classifiers on the initial disease-symptom test set. The feature kernel and graph kernel achieve almost the same performance, which is better than that of the tree kernel. When the three classifiers are integrated with the same weight, a higher F-score (75.00%) is obtained, while when they are integrated with a weight ratio of 4:4:2, the F-score is a bit lower than that of the feature kernel. However, in both cases the AUC performances are improved, which shows that, since different classifiers calculate the similarity between two sentences from different aspects, combining these similarities can boost the performance.

3.3.1. The Performance of Co-Training on the Disease-Symptom Test Set. In our method, the feature set for the disease-symptom model is divided into three views: the feature kernel, graph kernel, and tree kernel. In the Co-Training experiments, to compare the results of each combination of two views, the experiments are divided into three groups, as shown in Table 3. Each group uses the same experimental parameters, that is, u = 4000, m = 300, and p = 100 (u, m, and p in Algorithm 1). The performance curves of the different combinations are shown in Figures 3, 4, and 5, respectively, and their final results with different iteration times (13, 27, and 22, resp.) are shown in Table 3.

From Figures 3, 4, and 5, we can obtain the following observations. (1) With the increase of the iteration time and more unlabeled data added to the training set, the F-score shows a rising trend. The reason is that, as the Co-Training process proceeds, more and more unlabeled data are labeled by one classifier for the other, which improves the performance of both classifiers. However, after a number of iterations, the performance of the classifiers could not be improved any more, since too much noise (false positives and false negatives) may be introduced from the unlabeled data. (2) The AUCs of the classifiers show different trends with different combinations of the views. The AUCs of the feature kernel fluctuate around 88%, while those of the graph kernel fluctuate between 85% and 87%. In contrast, all of the tree kernel's AUCs show a rising trend, since the performance of the initial tree kernel classifier is relatively low and is then improved with the relatively accurate labeled data provided by the feature kernel or graph kernel.

In fact, the performance of semisupervised learning algorithms is usually not stable because the unlabeled examples may often be wrongly labeled during the learning process [28]. At the beginning of the Co-Training, the amount of noise is limited, and the unlabeled data added to the training set can help the classifiers improve their performance. However, after a number of learning rounds, more and more noise is introduced, which causes the performance to decline.

Table 3: The results obtained with Co-Training on the disease-symptom test set. The combination method integrates the three classifiers (the feature kernel, graph kernel, and tree kernel) with the same weight.

| Combination | View | P (%) | R (%) | F-score (%) | AUC (%) |
| Feature and graph kernel | Feature kernel | 88.32 | 67.97 | 76.82 | 88.01 |
| | Graph kernel | 83.26 | 71.88 | 77.15 | 87.54 |
| | Combination | 74.91 | 85.16 | 79.71 | 88.66 |
| Feature and tree kernel | Feature kernel | 86.06 | 69.92 | 77.15 | 88.51 |
| | Tree kernel | 57.80 | 92.58 | 71.17 | 74.99 |
| | Combination | 75.08 | 87.11 | 80.65 | 87.18 |
| Graph and tree kernel | Graph kernel | 84.04 | 69.92 | 76.33 | 86.04 |
| | Tree kernel | 58.10 | 95.31 | 72.19 | 78.10 |
| | Combination | 82.43 | 76.95 | 79.60 | 86.84 |

[Figure 3: Co-Training performance curves (F-score and AUC) of the feature kernel and graph kernel on the disease-symptom test set.]

[Figure 4: Co-Training performance curves (F-score and AUC) of the feature kernel and tree kernel on the disease-symptom test set.]

[Figure 5: Co-Training performance curves (F-score and AUC) of the graph kernel and tree kernel on the disease-symptom test set.]

3.3.2. The Performance of Tri-Training on the Disease-Symptom Test Set. In our method, we select three views to conduct the Tri-Training, that is, the feature kernel, graph kernel, and tree kernel. In each Tri-Training round, SVM is used to train the classifier on each view. The parameters are set as follows: u = 4000, m = 300, p1 = 100, p2 = 0, and N = 27 (u, m, p1, p2, and N in Algorithm 2). Here p2 = 0 means that only the positive examples are added into the training set. In this way, the recall of the classifier can be improved (the recall is defined as the number of true positives divided by the total number of examples that actually belong to the positive class, and usually more positive examples in the training set will improve the recall), since it is lower compared with the precision (see Table 2). The results are shown in Table 4 and Figure 6.

Table 4: The results obtained with Tri-Training on the disease-symptom test set. Method 1 integrates the three classifiers (the feature kernel, graph kernel, and tree kernel) with the same weight, while Method 2 integrates them with a weight ratio of 4:4:2.

| Method | P (%) | R (%) | F-score (%) | AUC (%) |
| Feature kernel | 83.00 | 80.08 | 81.51 | 88.80 |
| Graph kernel | 77.74 | 85.94 | 81.63 | 89.80 |
| Tree kernel | 57.38 | 94.14 | 71.30 | 76.00 |
| Method 1 | 79.79 | 87.89 | 83.64 | 91.57 |
| Method 2 | 79.93 | 85.55 | 82.64 | 90.75 |

[Figure 6: The Tri-Training performance (F-score and AUC) on the disease-symptom test set.]

Compared with the performances of the classifiers on the initial disease-symptom test set shown in Table 2, the ones achieved through Tri-Training are significantly improved. This shows that Tri-Training can exploit the unlabeled data and improve the performance more effectively. The reason is that, as mentioned in Section 1, the Tri-Training algorithm can achieve satisfactory results while neither requiring the instance space to be described with sufficient and redundant views nor putting any constraints on the supervised learning method.

[Figure 7: Co-Training performance curves (F-score and AUC) of the feature kernel and graph kernel on the symptom-therapeutic substance test set.]

In addition, when the three classifiers are integrated, either with the same weight or with a weight ratio of 4:4:2, higher F-scores and AUCs are obtained. Furthermore, comparing the performances of Co-Training and Tri-Training shown in Tables 3 and 4, we found that in most cases Tri-Training outperforms Co-Training. The reason is that, through employing three classifiers, Tri-Training is facilitated with good efficiency and generalization ability, because it can gracefully choose examples to label and use multiple classifiers to compose the final hypothesis [28].

3.4. The Performance of the Symptom and Therapeutic Substance Model. Table 5 shows the performances of the classifiers on the initial symptom-therapeutic substance test set. Similar to the results on the initial disease-symptom test set, the feature kernel achieves the best performance while the tree kernel performs the worst. One difference is that, when the three classifiers are integrated with a weight ratio of 4:4:2, a higher F-score and AUC are obtained, while when they are integrated with the same weight, the F-score and AUC are a little lower than those of the feature kernel.

3.4.1. The Performance of Co-Training on the Symptom and Therapeutic Substance Test Set. Similar to the disease-symptom experiments, the feature set for the symptom-therapeutic substance model is also divided into three views: the feature, graph, and tree kernels. The experiments are divided into three groups. Each group uses the same experimental parameters, that is, u = 4000, m = 300, and p = 100. The performance curves of the different combinations are shown in Figures 7, 8, and 9, and their final results with different iteration times (27, 26, and 9, resp.) are shown in Table 6.

Table 5: The initial results on the symptom-therapeutic substance test set.

| Method | P (%) | R (%) | F (%) | AUC (%) |
| Feature kernel | 79.30 | 90.76 | 84.64 | 87.90 |
| Graph kernel | 76.27 | 90.36 | 82.72 | 87.30 |
| Tree kernel | 68.90 | 82.73 | 75.18 | 79.94 |
| Method 1 | 75.99 | 92.77 | 83.54 | 87.59 |
| Method 2 | 77.81 | 94.38 | 85.30 | 88.94 |

From the figures, we can draw similar conclusions as from the disease-symptom experiments. In most cases, the performance can be improved through the Co-Training process, while it is usually not stable since noise is introduced during the learning process.

3.4.2. The Performance of Tri-Training on the Symptom and Therapeutic Substance Test Set. In the experiments of Tri-Training on the symptom-therapeutic substance corpus, the parameters are set as follows: u = 4000, m = 300, p1 = 100, p2 = 0, and N = 27 (u, m, p1, p2, and N in Algorithm 2). The results are shown in Table 7 and Figure 10.

Compared with the performance of the classifiers on the initial symptom-therapeutic substance test set shown in Table 5, the ones achieved through Tri-Training are also improved, as in the disease-symptom experiments. This verifies once again that the Tri-Training algorithm is effective in utilizing the unlabeled data to boost the relation extraction performance. When the three classifiers are integrated with a weight ratio of 4:4:2, a better AUC is obtained.

Comparing the performance of Co-Training and Tri-Training on the symptom-therapeutic substance test set, as shown in Tables 6 and 7, we found that in most cases Tri-Training outperforms Co-Training, which is consistent with the results achieved in the disease-symptom experiments. This is due to the better efficiency and generalization ability of Tri-Training over Co-Training.

[Figure 8: Co-Training performance curves (F-score and AUC) of the feature kernel and tree kernel on the symptom-therapeutic substance test set.]

[Figure 9: Co-Training performance curves (F-score and AUC) of the graph kernel and tree kernel on the symptom-therapeutic substance test set.]

[Figure 10: The results (F-score and AUC) of Tri-Training on the symptom-therapeutic substance test set.]

Table 6: The results obtained with Co-Training on the symptom-therapeutic substance test set.

| Combination | View | P (%) | R (%) | F (%) | AUC (%) |
| Feature kernel and graph kernel | Feature kernel | 78.00 | 93.98 | 85.25 | 88.41 |
| | Graph kernel | 71.51 | 98.80 | 82.97 | 86.44 |
| | Combination | 77.45 | 95.18 | 85.40 | 89.10 |
| Feature kernel and tree kernel | Feature kernel | 78.72 | 93.57 | 85.51 | 88.51 |
| | Tree kernel | 67.13 | 97.59 | 79.54 | 81.75 |
| | Combination | 77.51 | 96.79 | 85.66 | 88.61 |
| Graph kernel and tree kernel | Graph kernel | 74.14 | 95.58 | 83.51 | 87.71 |
| | Tree kernel | 67.82 | 94.78 | 79.06 | 80.14 |
| | Combination | 71.05 | 97.59 | 82.23 | 86.24 |

Table 7: The results of Tri-Training on the symptom-therapeutic substance test set.

| Method | P (%) | R (%) | F (%) | AUC (%) |
| Feature kernel | 78.98 | 93.57 | 85.66 | 88.94 |
| Graph kernel | 74.31 | 97.59 | 84.37 | 87.78 |
| Tree kernel | 68.01 | 94.78 | 79.19 | 81.10 |
| Method 1 | 74.77 | 98.80 | 85.12 | 88.08 |
| Method 2 | 75.62 | 98.39 | 85.51 | 89.13 |

In addition, the performances of the classifiers on the disease-symptom corpus are improved more than those on the symptom-therapeutic substance corpus. There are two reasons for that. First, on the symptom-therapeutic substance corpus the classifiers already have better initial performance; therefore, the Co-Training and Tri-Training algorithms have less room for performance improvement. Second, as the Co-Training and Tri-Training processes proceed, more unlabeled data are added into the training set, which can introduce new information for the classifiers. Therefore, the recalls of the classifiers are improved. Meanwhile, more noise is also introduced, causing the precision to decline. For the initial classifiers, the higher the precision is, the less noise is introduced in the iterative process, and the more the performance of the classifier would be improved. In summary, if the initial classifiers differ greatly, the performance can be improved through the two algorithms. In the experiments, when more unlabeled data are added to the training set, the difference between the classifiers becomes smaller. Thus, after a number of iterations, the performance could not be improved any more.

3.5. Some Examples of Disease-Symptom and Symptom-Therapeutic Substance Relations Extracted from Biomedical Literature. Some examples of disease-symptom and symptom-therapeutic substance relations extracted from biomedical literature are shown in Tables 8 and 9. Table 8 shows some symptoms of the disease C0020541 (portal hypertension). One sentence containing the relation between portal hypertension and its symptom C0028778 (block) is provided. Table 9 shows some relations between the symptom C0028778 (block) and some therapeutic substances, in which the sentences containing the relations are provided.

Table 8: Some disease-symptom relations extracted from biomedical literature.

| Disease | Symptom | Sentence |
| C0020541 (portal hypertension) | C0028778 (block) | C0020541 as C2825142 of intrahepatic C0028778 accounted for 83% of the patients (C0023891 65%, meta-C0022346 12%, and C0018920 11%) |
| | C1565860, C0035357, C0005775, C0014867, C0232338 | |

Table 9: Some symptom-therapeutic substance relations extracted from biomedical literature.

| Symptom | Therapeutic substance | Sentence |
| C0028778 (block) | C0017302 (general anesthetic agents) | Use-dependent conduction C0028778 produced by volatile C0017302 |
| | C0006400 (bupivacaine) | Epidural ropivacaine is known to produce less motor C0028778 compared to C0006400 at anaesthetic concentrations |
| | C0053241 (benzoquinone) | In contrast, C0053241 and hydroquinone led to G2-C0028778 rather than to a mitotic arrest |

4. Conclusions and Future Work

Models for extracting the relations between the disease-symptom and symptom-therapeutic substance are important for further extracting knowledge about diseases and their potential therapeutic substances. However, currently there is no corpus available to train such models. To solve the problem, we first manually annotated two training sets for extracting the relations. Then two semisupervised learning algorithms, that is, Co-Training and Tri-Training, are applied to explore the unlabeled data to boost the performance. Experimental results show that exploiting the unlabeled data with both Co-Training and Tri-Training algorithms can enhance the performance. In particular, through employing three classifiers, Tri-Training is facilitated with good efficiency and generalization ability, since it can gracefully choose examples to label and use multiple classifiers to compose the final hypothesis [28]. In addition, its applicability is wide because it neither requires sufficient and redundant views nor puts any constraint on the employed supervised learning algorithm.

In future work, we will study more effective semisupervised learning methods to exploit the numerous unlabeled data in the biomedical literature. On the other hand, we will apply the disease-symptom and symptom-therapeutic substance models to extract the relations between diseases and therapeutic substances from biomedical literature and predict potential therapeutic substances for certain diseases [41].

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this article.

Acknowledgments

This work is supported by grants from the Natural Science Foundation of China (nos. 61272373, 61070098, 61340020, 61572102, and 61572098), the Trans-Century Training Program Foundation for the Talents by the Ministry of Education of China (NCET-13-0084), the Fundamental Research Funds for the Central Universities (nos. DUT13JB09 and DUT14YQ213), and the Major State Research Development Program of China (no. 2016YFC0901902).

References

[1] D. Hristovski, B. Peterlin, J. A. Mitchell, and S. M. Humphrey, "Using literature-based discovery to identify disease candidate genes," International Journal of Medical Informatics, vol. 74, no. 2–4, pp. 289–298, 2005.
[2] M. N. Prichard and C. Shipman Jr., "A three-dimensional model to analyze drug-drug interactions," Antiviral Research, vol. 14, no. 4-5, pp. 181–205, 1990.
[3] Q.-C. Bui, S. Katrenko, and P. M. A. Sloot, "A hybrid approach to extract protein-protein interactions," Bioinformatics, vol. 27, no. 2, pp. 259–265, 2011.
[4] M. Krallinger, F. Leitner, C. Rodriguez-Penagos, and A. Valencia, "Overview of the protein-protein interaction annotation extraction task of BioCreative II," Genome Biology, vol. 9, supplement 2, article S4, 2008.
[5] I. Segura-Bedmar, P. Martinez, and D. Sanchez-Cisneros, "The 1st DDIExtraction-2011 challenge task: extraction of Drug-Drug Interactions from biomedical texts," in Proceedings of the 1st Challenge Task on Drug-Drug Interaction Extraction (DDIExtraction '11), pp. 1–9, Huelva, Spain, September 2011.
[6] I. Segura-Bedmar, P. Martínez, and M. Herrero-Zazo, SemEval-2013 Task 9: Extraction of Drug-Drug Interactions from Biomedical Texts (DDIExtraction 2013), Association for Computational Linguistics, 2013.
[7] M. Krallinger and A. Valencia, "Text-mining and information-retrieval services for molecular biology," Genome Biology, vol. 6, no. 7, article 224, 2005.
[8] T.-K. Jenssen, A. Lægreid, J. Komorowski, and E. Hovig, "A literature network of human genes for high-throughput analysis of gene expression," Nature Genetics, vol. 28, no. 1, pp. 21–28, 2001.
[9] C. Blaschke, M. A. Andrade, C. Ouzounis, and A. Valencia, "Automatic extraction of biological information from scientific text: protein-protein interactions," in Proceedings of the International Conference on Intelligent Systems for Molecular Biology (ISMB '99), pp. 60–67, 1999.
[10] P. Zweigenbaum, D. Demner-Fushman, H. Yu, and K. B. Cohen, "Frontiers of biomedical text mining: current progress," Briefings in Bioinformatics, vol. 8, no. 5, pp. 358–375, 2007.
[11] Y. T. Yen, B. Chen, H. W. Chiu, Y. C. Lee, Y. C. Li, and C. Y. Hsu, "Developing an NLP and IR-based algorithm for analyzing gene-disease relationships," Methods of Information in Medicine, vol. 45, no. 3, pp. 321–329, 2006.
[12] M. Huang, X. Zhu, Y. Hao, D. G. Payan, K. Qu, and M. Li, "Discovering patterns to extract protein-protein interactions from full texts," Bioinformatics, vol. 20, no. 18, pp. 3604–3612, 2004.
[13] I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun, "Support vector machine learning for interdependent and structured output spaces," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 823–830, Alberta, Canada, July 2004.
[14] J. Xiao, J. Su, G. Zhou, et al., "Protein-protein interaction extraction: a supervised learning approach," in Proceedings of the 1st International Symposium on Semantic Mining in Biomedicine, pp. 51–59, Hinxton, UK, April 2005.
[15] L. A. Nielsen, "Extracting protein-protein interactions using simple contextual features," in Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology, pp. 120–121, ACM, June 2006.
[16] A. Airola, S. Pyysalo, J. Björne, T. Pahikkala, F. Ginter, and T. Salakoski, "All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning," BMC Bioinformatics, vol. 9, supplement 11, article S2, 2008.
[17] L. Qian and G. Zhou, "Tree kernel-based protein-protein interaction extraction from biomedical literature," Journal of Biomedical Informatics, vol. 45, no. 3, pp. 535–543, 2012.
[18] S. Kim, J. Yoon, J. Yang, and S. Park, "Walk-weighted subsequence kernels for protein-protein interaction extraction," BMC Bioinformatics, vol. 11, no. 1, article 107, 2010.
[19] D. J. Miller and H. S. Uyar, "A mixture of experts classifier with learning based on both labelled and unlabelled data," in Advances in Neural Information Processing Systems, pp. 571–577, 1997.
[20] T. Joachims, "Transductive inference for text classification using support vector machines," in Proceedings of the 16th International Conference on Machine Learning (ICML '99), pp. 200–209, 1999.
[21] X. Zhu, Z. Ghahramani, and J. Lafferty, "Semi-supervised learning using Gaussian fields and harmonic functions," in Proceedings of the 20th International Conference on Machine Learning (ICML '03), vol. 3, pp. 912–919, Washington, DC, USA, August 2003.
[22] A. Blum and T. Mitchell, "Combining labeled and unlabeled data with co-training," in Proceedings of the 11th Annual Conference on Computational Learning Theory (COLT '98), pp. 92–100, ACM, 1998.
[23] W. Wang and Z.-H. Zhou, "Co-training with insufficient views," in Proceedings of the Asian Conference on Machine Learning, pp. 467–482, 2013.
[24] W. Wang and Z.-H. Zhou, "A new analysis of co-training," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 1135–1142, June 2010.
[25] W. Wang and Z.-H. Zhou, "Analyzing co-training style algorithms," in Machine Learning: ECML 2007, vol. 4701 of Lecture Notes in Computer Science, pp. 454–465, Springer, Berlin, Germany, 2007.
[26] D. Pierce and C. Cardie, "Limitations of co-training for natural language learning from large datasets," in Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing, pp. 1–9, Pittsburgh, Pa, USA, 2001.
[27] S. Kiritchenko and S. Matwin, "Email classification with co-training," in Proceedings of the Conference of the Center for Advanced Studies on Collaborative Research (CASCON '01), pp. 301–312, IBM, Toronto, Canada, November 2011.
[28] Z.-H. Zhou and M. Li, "Tri-training: exploiting unlabeled data using three classifiers," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 11, pp. 1529–1541, 2005.
[29] D. Mavroeidis, K. Chaidos, S. Pirillos, et al., "Using tri-training and support vector machines for addressing the ECML/PKDD 2006 discovery challenge," in Proceedings of the ECML-PKDD Discovery Challenge Workshop, pp. 39–47, Berlin, Germany, 2006.
[30] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.
[31] R. E. Schapire, "The strength of weak learnability," Machine Learning, vol. 5, no. 2, pp. 197–227, 1990.
[32] Y. Dang, Y. Zhang, and H. Chen, "A lexicon-enhanced method for sentiment classification: an experiment on online product reviews," IEEE Intelligent Systems, vol. 25, no. 4, pp. 46–53, 2010.
[33] Y. Yang and J. O. Pedersen, "A comparative study on feature selection in text categorization," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), vol. 97, pp. 412–420, Morgan Kaufmann, San Mateo, Calif, USA, 1997.
[34] D. Klein and C. D. Manning, "Accurate unlexicalized parsing," in Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, vol. 1, pp. 423–430, Association for Computational Linguistics, Sapporo, Japan, July 2003.
[35] M. Zhang, J. Zhang, J. Su, et al., "A composite kernel to extract relations between entities with both flat and structured features," in Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pp. 825–832, Association for Computational Linguistics, 2006.
[36] National Library of Medicine, "Semantic MEDLINE Database," https://skr3.nlm.nih.gov/SemMedDB.
[37] T. C. Rindflesch and M. Fiszman, "The interaction of domain knowledge and linguistic structure in natural language processing: interpreting hypernymic propositions in biomedical text," Journal of Biomedical Informatics, vol. 36, no. 6, pp. 462–477, 2003.
[38] J. Carletta, "Assessing agreement on classification tasks: the kappa statistic," Computational Linguistics, vol. 22, no. 2, pp. 248–254, 1996.
[39] J. A. Hanley and B. J. McNeil, "The meaning and use of the area under a receiver operating characteristic (ROC) curve," Radiology, vol. 143, no. 1, pp. 29–36, 1982.
[40] A. P. Bradley, "The use of the area under the ROC curve in the evaluation of machine learning algorithms," Pattern Recognition, vol. 30, no. 7, pp. 1145–1159, 1997.
[41] D. R. Swanson, "Fish oil, Raynaud's syndrome, and undiscovered public knowledge," Perspectives in Biology and Medicine, vol. 30, no. 1, pp. 7–18, 1986.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 2: Research Article Semisupervised Learning Based Disease …downloads.hindawi.com/journals/bmri/2016/3594937.pdf · 2019. 7. 30. · models for extracting the relations between the

2 BioMed Research International

Relation extraction approaches mainly include cooccurrence-based methods [7], pattern-based methods [8], and machine learning methods [9]. Cooccurrence-based methods use frequent cooccurrence to extract the relations between entities. This kind of method is simple but shows very low precision for high recall [10]. Yen et al. developed a cooccurrence approach based on an information retrieval principle to extract gene-disease relationships from text [11]. Pattern-based methods define a series of patterns in advance and use pattern matching to extract the relations between entities. Huang et al. used a dynamic programming algorithm to compute distinguishing patterns by aligning relevant sentences and key verbs that describe protein interactions [12]. Since the templates are manually defined, the generalization ability of such methods is not satisfactory. Machine learning methods, the most popular ones, use classification algorithms to extract the relations between entities from literature, such as support vector machines (SVM) [13], maximum entropy [14], and Naive Bayes [15]. Among others, kernel-based methods are widely used in relation extraction. These methods define different kernel functions to extract the relations between entities, such as the graph kernel [16], tree kernel [17], and walk path kernel [18].

Machine learning methods belong to supervised learning, which needs a large number of labeled examples to train the model. However, currently no corpora for the extraction of disease-symptom and symptom-therapeutic substance relations are available. In addition, even if limited labeled data are available, it is still difficult to achieve satisfactory generalization ability for a classifier. To solve the problem, we first manually annotated two training sets for extracting the relations between disease-symptom and symptom-therapeutic substance pairs and then introduced semisupervised learning methods to utilize the unlabeled data for training the models.

Semisupervised learning methods attempt to exploit the unlabeled data to help improve the generalization ability of a classifier trained with limited labeled data. They can be roughly divided into four categories, that is, generative parametric models [19], semisupervised support vector machines (S3VMs) [20], graph-based approaches [21], and Co-Training [22–27]. Co-Training was proposed by Blum and Mitchell [22]. This method requires two sufficient and redundant views, which do not exist in most real-world scenarios. In order to relax this constraint, Zhou and Li proposed a Tri-Training algorithm that neither requires the instance space to be described with sufficient and redundant views nor puts any constraints on the supervised learning method [28]. The algorithm uses three classifiers, which can not only tackle the problem of determining how to label the unlabeled data but also improve the generalization ability of a classifier with unlabeled data. Wang et al. conducted extensive studies on Co-Training and proved that, if the two views have large diversity, Co-Training is able to improve the learning performance by exploiting the unlabeled data even with insufficient views [23–25]. Until now, Tri-Training and Co-Training have been widely used in natural language processing. Pierce and Cardie [26] applied Co-Training to noun phrase recognition: they regarded the current word and the k words appearing before it in the document as one view and the k words appearing after it as another view and then trained the classifiers on these two views with the Co-Training algorithm. Mavroeidis et al. [29] applied the Tri-Training algorithm to spam detection filtering and achieved satisfactory results.

Meanwhile, ensemble learning methods have been proposed, which combine the outputs of several base learners to form an integrated output for enhancing the classification performance. There are three popular ensemble methods, that is, Bagging [30], Boosting [31], and Random Subspace [32]. The Bagging method uses random independent bootstrap replicates from a training dataset to construct the base learners and calculates the final result by a simple vote [30]. In the Boosting method, the base learners are constructed on weighted versions of the training set, which depend on the previous base learners' results, and the final result is calculated by a simple or weighted vote [31]. The Random Subspace method uses random subspaces of the feature space to construct the base learners [32].

In our method, we regard three kernels (i.e., the feature kernel, graph kernel, and tree kernel, which will be introduced in the following section) as three different views. Co-Training and Tri-Training algorithms are then employed to exploit the unlabeled data with these views and build the disease-symptom model and the symptom-therapeutic substance model. Meanwhile, in the Tri-Training process, we adopted the ensemble learning method to integrate the three individual kernels and achieved satisfactory results.

2. Methods

2.1. Feature Kernel. The core work of the feature-based method is feature selection, which has a significant impact on the performance. The following features are used in our feature-based kernel.

(1) Word Feature. The word feature uses two disordered sets of words, namely, the words between the two concept entities (diseases, symptoms, and therapeutic substances) and the words surrounding the two concept entities, as the eigenvector. The features surrounding the two concept entities' names include the left M words of the first concept entity name and the right M words of the second concept entity name (in our experiments, M is set to 4).

(2) N-Gram Word Feature. In our method, we use N-gram (N = 1, 2, and 3 in our experiments) words from the left four words of the first concept entity to the right four words of the second concept entity as features. N-gram features enrich the word feature and add contextual information, which can effectively express the relation between concept entities.

(3) Position Feature. The relative position information of the word features and N-gram features with respect to the concept entities has an important influence on relation extraction and is therefore introduced into our method. For example, "E1 L feature" denotes a word or N-gram feature appearing to the left of the first concept entity, "E B feature" between the two concept entities, and "E2 R feature" to the right of the second concept entity.

(4) Interaction Word and Distance Features. Some words, such as "induce", "action", and "improve", often imply the existence of relations. Therefore, the existence of these words (which we call interaction words) is chosen as a binary feature. In addition, we found that the shorter the distance between two concept entities is, the more likely the two concept entities are to have an interactive relationship. Therefore, the distance is chosen as a feature. For example, "DISLessThanThree" is a feature value showing that the distance between the two concept entities is less than three.

[Figure 1: An example of a dependency graph for the sentence "ENTRY1 and interleukin increases ENTRY2 risk", showing its v-walks and e-walks over the dependency relations (NMOD, OBJ). The candidate interaction pair is marked as "ENTRY1" and "ENTRY2".]

The initial eigenvector extracted with our feature-based kernel has a high dimension and includes many sparse features. In order to reduce the dimension, we employed the document frequency method [33] to select features. Initially, the feature-based kernel method extracts 248,000 features from the disease-symptom training set, and we preserved the features with document frequencies exceeding five (a total of 12,000 features). Similarly, 345,000 features were extracted from the symptom-therapeutic substance training set, and 13,700 features were retained.
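To make the document-frequency pruning concrete, the following is a minimal sketch (not the authors' code) of the filtering step; the threshold mirrors the description above, while the helper name and the toy feature strings are our own.

```python
from collections import Counter

def df_filter(instance_features, min_df=5):
    """Keep features whose document frequency exceeds min_df.

    instance_features: one set of string features per training instance,
    e.g., position-tagged word and N-gram features.
    """
    df = Counter()
    for feats in instance_features:
        df.update(set(feats))            # count each feature once per instance
    return {f for f, count in df.items() if count > min_df}

# Toy usage: features occurring in more than five instances survive
instances = [{"E_B_induce", "E1_L_blood"}] * 6 + [{"DISLessThanThree"}]
vocab = df_filter(instances)
pruned = [sorted(f for f in feats if f in vocab) for feats in instances]
```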

2.2. Convolution Tree Kernel. In our method, the convolution tree kernel $K_c(T_1, T_2)$, a special convolution kernel, is used to obtain useful structural information from substructures. It calculates the syntactic structure similarity between two parse trees $T_1$ and $T_2$ by counting their common subtrees:

$$K_c(T_1, T_2) = \sum_{n_1 \in N_1,\, n_2 \in N_2} \Delta(n_1, n_2), \qquad (1)$$

where $N_j$ denotes the set of nodes in the tree $T_j$ and $\Delta(n_1, n_2)$ denotes the number of common subtrees rooted at $n_1$ and $n_2$.
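The $\Delta$ function in formula (1) is typically computed with the Collins-Duffy recursion; the sketch below is our illustration of that standard recursion (with a decay factor lambda), not the exact implementation used in the paper. Trees are encoded as (label, children) tuples, with empty children marking preterminals.

```python
def delta(n1, n2, lam=0.5, cache=None):
    """Collins-Duffy style count of common subtrees rooted at n1 and n2.

    A node is (label, children); children == () marks a preterminal.
    lam is a decay factor that downweights large subtrees.
    """
    cache = {} if cache is None else cache
    key = (id(n1), id(n2))
    if key in cache:
        return cache[key]
    # productions must match: same label and same sequence of child labels
    if n1[0] != n2[0] or [c[0] for c in n1[1]] != [c[0] for c in n2[1]]:
        result = 0.0
    elif not n1[1]:                       # matching preterminals
        result = lam
    else:
        result = lam
        for c1, c2 in zip(n1[1], n2[1]):
            result *= 1.0 + delta(c1, c2, lam, cache)
    cache[key] = result
    return result

def tree_kernel(t1, t2, lam=0.5):
    """K_c(T1, T2): sum Delta over all node pairs, as in formula (1)."""
    def nodes(t):
        yield t
        for c in t[1]:
            yield from nodes(c)
    cache = {}
    return sum(delta(a, b, lam, cache) for a in nodes(t1) for b in nodes(t2))

t = ("NP", (("DT", ()), ("NN", ())))
print(tree_kernel(t, t))                  # similarity of a tree with itself
```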

2.2.1. Tree Pruning in the Convolution Kernel. In our method, the Stanford parser [34] is used to parse the sentences. Before a sentence is parsed, the concept entity pair in the sentence is replaced with "ENTRY1" and "ENTRY2", and the other entities are replaced with "ENTRY". Take "gene-gene interaction between C0021764 and interleukin increases C0002395 risk" (the sentence is processed with MetaMap, and the two concept entities are represented by their CUIs) for example: it is replaced with "gene-gene interaction between ENTRY1 and interleukin increases ENTRY2 risk". Then we use the Stanford parser to parse the sentence and get a Complete Tree (CT). Since a CT includes too much contextual information, which may introduce many noisy features, we used the method described in [35] to obtain the shortest path enclosed tree (SPT) and replace the CT with it. The SPT is the smallest common subtree including the two concept entities, which is a part of the CT.

2.2.2. Predicate Argument Path. The representation of a predicate argument is a graphic structure which expresses the deep syntactic and semantic relations between words. In the predicate argument structure, different substructures on the shortest path between the two concept entities carry different information. An example of a dependency graph is shown in Figure 1. In our method, v-walk and e-walk features (both taken from the shortest dependency paths) are added into the tree kernel. A v-walk contains the syntactic and semantic relation between two words; for example, in Figure 1, the relation between "ENTRY1" and "interleukin" is "NMOD", and the relation between "risk" and "increases" is "OBJ". An e-walk contains the relations linking a word to its two adjacent nodes; Figure 1 shows "interleukin" with its two adjacent relations "NMOD" and "NMOD", and "risk" with "NMOD" and "OBJ".
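As an illustration of the v-walk and e-walk features, a small sketch follows; the triple encoding of the dependency path and the feature string format are our own assumptions, not the paper's.

```python
def walk_features(dep_edges):
    """Extract v-walk and e-walk features from the shortest dependency path.

    dep_edges: (head, relation, dependent) triples along the path.
    """
    v_walks = [f"{head}_{rel}_{dep}" for head, rel, dep in dep_edges]
    e_walks = []
    for i in range(len(dep_edges) - 1):
        r1, r2 = dep_edges[i][1], dep_edges[i + 1][1]
        shared = dep_edges[i][0]          # the word joining two adjacent edges
        e_walks.append(f"{r1}_{shared}_{r2}")
    return v_walks, e_walks

# From Figure 1: ENTRY1 <-NMOD- interleukin <-NMOD- risk <-OBJ- increases
edges = [("interleukin", "NMOD", "ENTRY1"), ("risk", "NMOD", "interleukin"),
         ("increases", "OBJ", "risk")]
print(walk_features(edges))   # e-walks: NMOD_interleukin_NMOD, NMOD_risk_OBJ
```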

2.3. Graph Kernel. The graph kernel method uses the syntax tree to express the graph structure of a sentence. The similarity of two graphs is calculated by comparing the relations between common nodes (vertices). Our method uses the all-paths graph kernel proposed by Airola et al. [16]. The representation consists of two directed subgraphs, that is, a parse structure subgraph and a subgraph representing the linear order of the words. In Figure 2, the upper part is the parse structure subgraph and the lower part is the linear order subgraph. These two subgraphs denote the dependency structure and the linear sequence of a sentence, respectively.

[Figure 2: Graph kernel with two directed subgraphs. The candidate interaction pair is marked as "ENTRY1" and "ENTRY2". In the dependency-based subgraph, all nodes on the shortest path are specialized using a post-tag (IP). In the linear order subgraph, possible tags are (B)efore, (M)iddle, and (A)fter.]

In our method, a simple weight allocation strategy is chosen: the edges on the shortest path are assigned a weight of 0.9 and the other edges 0.3, while all edges in the linear order subgraph are assigned a weight of 0.9. The representation thus allows us to emphasize the shortest path without completely disregarding potentially relevant words outside the path. A graph kernel calculates the similarity between two input graphs by comparing the relations between common vertices (nodes). A graph matrix $G$ is calculated as

$$G = L \left( \sum_{n=1}^{\infty} A^{n} \right) L^{T}, \qquad (2)$$

where $A$ is the edge matrix whose rows and columns are indexed by the vertices; $A_{ij}$ is the edge weight if vertex $V_i$ is connected to vertex $V_j$. $L$ is the label matrix whose rows are indexed by labels and whose columns are indexed by vertices; $L_{ij} = 1$ indicates that vertex $V_j$ contains the $i$th label. The graph kernel $K(G, G')$ is defined over two input graph matrices $G$ and $G'$ [16]:

$$K(G, G') = \sum_{i=1}^{L} \sum_{j=1}^{L} G_{ij} G'_{ij}. \qquad (3)$$
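Because every edge weight is below 1, the infinite sum in formula (2) can be evaluated in closed form, sum_{n>=1} A^n = (I - A)^{-1} - I, provided the spectral radius of A is below 1. A minimal NumPy sketch (our own, with illustrative matrices) follows; both graphs must index their label matrices over the same label vocabulary so that the elementwise product in formula (3) is meaningful.

```python
import numpy as np

def graph_matrix(A, L):
    """G = L (sum_{n>=1} A^n) L^T, formula (2).

    A: (V x V) weighted adjacency matrix (0.9/0.3 edge weights).
    L: (num_labels x V) binary label matrix, L[i, j] = 1 if vertex j
       carries label i.
    """
    V = A.shape[0]
    neumann = np.linalg.inv(np.eye(V) - A) - np.eye(V)   # sum of A^n, n >= 1
    return L @ neumann @ L.T

def graph_kernel(G1, G2):
    """K(G, G') = sum_ij G_ij * G'_ij, formula (3)."""
    return float(np.sum(G1 * G2))

# Toy example: two 3-vertex path graphs over a shared 2-label vocabulary
A1 = np.array([[0, 0.9, 0], [0, 0, 0.9], [0, 0, 0]])
A2 = np.array([[0, 0.3, 0], [0, 0, 0.9], [0, 0, 0]])
L1 = np.array([[1, 0, 0], [0, 1, 1]], dtype=float)
print(graph_kernel(graph_matrix(A1, L1), graph_matrix(A2, L1)))
```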

2.4. Co-Training Algorithm. The initial Co-Training algorithm (or standard Co-Training algorithm) was proposed by Blum and Mitchell [22]. They assumed that the training set has two sufficient and redundant views; namely, the set of attributes meets two conditions. First, each attribute set is sufficient to describe the problem; that is, if the training set is sufficient, each attribute set is able to learn a strong classifier. Second, each attribute set is conditionally independent of the other given the class label. Our Co-Training algorithm is described in Algorithm 1.

Algorithm 1 (Co-Training algorithm).

(1) Input is as follows:
    the labeled data L and the unlabeled data U;
    initialized training sets L1, L2 (L1 = L2 = L);
    sufficient and redundant views V1, V2;
    iteration number N.

(2) Process is as follows:
    (2.1) Create a pool u of examples by choosing n examples at random from U; U = U − u.
    (2.2) Use L1 to train a classifier h1 on V1; use L2 to train a classifier h2 on V2.
    (2.3) Use h1 and h2 to label the examples from u.
    (2.4) Take out m positive and m negative examples that were consistently labeled by h1 and h2. Then take p positive examples from the m positive examples and add them to L1 and L2, respectively. Choose 2m examples from U to replenish u; U = U − 2m; N = N − 1.
    (2.5) Repeat steps (2.2)–(2.4) until the unlabeled corpus U is empty, the number of unlabeled examples in u is less than a certain number, or N = 0.

(3) Output is as follows:
    the classifiers h1 and h2.
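A compact sketch of one possible realization of Algorithm 1 is given below. The helper functions `train` and `co_label` are assumptions of this sketch (e.g., fitting an SVM on one view, and returning the pool examples that both classifiers label identically); the defaults mirror the parameters used later in Section 3.

```python
import random

def co_training(L, U, train, co_label, N=13, pool_size=4000, m=300, p=100):
    """Sketch of Algorithm 1. `train(view, data)` fits a classifier on one
    view; `co_label(h1, h2, pool)` returns (positives, negatives) that h1
    and h2 label consistently."""
    L1, L2 = list(L), list(L)
    U = list(U)
    random.shuffle(U)
    pool, U = U[:pool_size], U[pool_size:]
    h1, h2 = train("view1", L1), train("view2", L2)
    while N > 0 and len(pool) > 2 * m:
        pos, neg = co_label(h1, h2, pool)
        if len(pos) < m or len(neg) < m:
            break                          # not enough consistently labeled data
        L1 += pos[:p]; L2 += pos[:p]       # move p agreed positives into both sets
        used = set(map(id, pos[:m] + neg[:m]))
        pool = [x for x in pool if id(x) not in used]
        pool, U = pool + U[:2 * m], U[2 * m:]   # replenish the pool from U
        h1, h2 = train("view1", L1), train("view2", L2)
        N -= 1
    return h1, h2
```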

2.5. Tri-Training Algorithm. The Co-Training algorithm requires two sufficient and redundant views. However, this condition does not hold in most real-world scenarios. The Tri-Training algorithm neither requires the instance space to be described with sufficient and redundant views nor puts any constraints on the supervised learning algorithm [28]. In this algorithm, three classifiers are used, which can tackle the problem of determining how to label the unlabeled data and produce the final hypothesis. Our Tri-Training algorithm is described in Algorithm 2.

Table 1: The details of the two corpora.

Corpus | Training set (Positive / Negative) | Test set (Positive / Negative) | Unlabeled data (Total)
Diseases and symptoms | 299 / 299 | 249 / 250 | 19298
Symptoms and therapeutic substances | 300 / 300 | 249 / 249 | 19392

In addition, the different classifiers calculate the similarity between two sentences from different aspects. Combining these similarities can reduce the danger of missing important features. Therefore, in each Tri-Training round, two different ensemble strategies are used to integrate the three classifiers for further performance improvement. The first strategy integrates the classifiers with a simple voting method. The second strategy assigns each classifier a different weight; the normalized output $K$ of the three classifier outputs $K_m$ ($m = 1, 2, 3$) is defined as

$$K = \sum_{m=1}^{M} \sigma_m K_m, \qquad \sum_{m=1}^{M} \sigma_m = 1, \quad \sigma_m \ge 0 \;\; \forall m, \qquad (4)$$

where $M$ represents the number of classifiers ($M = 3$ in our method).
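A sketch of the weighted integration in formula (4) follows (our own illustration; the toy score lists stand in for the three kernel classifiers' outputs).

```python
import numpy as np

def weighted_ensemble(outputs, weights):
    """Combine classifier outputs K_m into K = sum_m sigma_m K_m (formula (4)),
    normalizing the weights so that sum_m sigma_m = 1 and sigma_m >= 0."""
    w = np.asarray(weights, dtype=float)
    assert (w >= 0).all() and w.sum() > 0
    w /= w.sum()
    return sum(wi * np.asarray(k, dtype=float) for wi, k in zip(w, outputs))

# Method 2 in the experiments weights the feature, graph, and tree kernels 4:4:2
k_feat, k_graph, k_tree = [0.9, -0.2], [0.7, 0.1], [0.3, -0.5]   # toy scores
print(weighted_ensemble([k_feat, k_graph, k_tree], [4, 4, 2]))
```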

Algorithm 2 (Tri-Training algorithm).

(1) Input is as follows:
    the labeled data L and the unlabeled data U;
    initialized training sets L1, L2, L3 (L1 = L2 = L3 = L);
    selected views V1, V2, and V3;
    iteration number N.

(2) Process is as follows:
    (2.1) Create a pool u of examples by choosing n examples at random from U; U = U − u.
    (2.2) Use L1 to train a classifier h1 on V1; use L2 to train a classifier h2 on V2; use L3 to train a classifier h3 on V3.
    (2.3) Use h1, h2, and h3 to label the examples from u.
    (2.4) Take out m positive and m negative examples that were consistently labeled by h1, h2, and h3. Then take p1 positive examples from the m positive examples and add them to L1, L2, and L3, respectively; take p2 negative examples from the m negative examples and add them to L1, L2, and L3, respectively. Choose 2m examples from U to replenish u; U = U − 2m; N = N − 1.
    (2.5) Repeat steps (2.2)–(2.4) until the unlabeled corpus U is empty, the number of unlabeled examples in u is less than a certain number, or N = 0.

(3) Output is as follows:
    the classifiers h1, h2, and h3.
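The key difference from Algorithm 1 is the three-way consensus in step (2.4). A sketch of that labeling step follows (our own; `predict` returning +1/-1 is an assumed classifier interface).

```python
def tri_label_step(h1, h2, h3, pool, m=300, p1=100, p2=0):
    """One labeling round of Algorithm 2. Examples receiving the same label
    from all three classifiers are candidates; p1 positives and p2 negatives
    are promoted to every training set (p2 = 0 in the experiments, to raise
    recall by adding only positive examples)."""
    pos, neg = [], []
    for x in pool:
        votes = {h.predict(x) for h in (h1, h2, h3)}
        if votes == {+1}:
            pos.append(x)
        elif votes == {-1}:
            neg.append(x)
    if len(pos) < m or len(neg) < m:
        return None                        # not enough consensus labels yet
    return pos[:p1], neg[:p2]              # examples to add to L1, L2, and L3
```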

3. Experiments and Results

3.1. Experimental Datasets. In our experiments, the disease and symptom corpus data were obtained by searching the Semantic MEDLINE Database [36] using 200 concepts chosen from MeSH (Medical Subject Headings) with the semantic type "Disease or Syndrome". Since these sentences (corpus data) have been processed by SemRep [37], a rule-based natural language processing tool that identifies relationships in MEDLINE documents, the possibility of a relation between the two concept entities in the sentences is high. To limit the semantic types of the two concept entities in a sentence, we only preserved the sentences containing concepts of the needed semantic types (i.e., biologic function, cell function, finding, molecular function, organism function, organ or tissue function, pathologic function, phenomenon or process, and physiologic function). Finally, we obtained a total of about 20,400 sentences, from which we manually constructed two labeled datasets as the initial training set T_initial (598 labeled sentences, as shown in Table 1) and the test set (499 labeled sentences), respectively.

During the manual annotation, the following criteria are applied: the disease and symptom relationship indicates that the symptom is a physiological phenomenon of the disease. If an instance in a sentence semantically expresses the disease and symptom relationship, it is labeled as a positive example. As in the example provided in Section 1, the sentence "many blood- and blood vessel-related characteristics are typical for Raynaud patients: blood viscosity and platelet aggregability are high" contains two positive examples, that is, Raynaud and blood viscosity and Raynaud and platelet aggregability. In addition, some special relationships such as "B in A" and "A can change B" are also classified as positive examples, since they show that a physiological phenomenon (B) occurs when someone has the disease (A). However, if a relation in a sentence is only a cooccurrence one, it is labeled as a negative example. Patterns such as "A is a B" and "A and B" are labeled as negative examples, since "A is a B" is an "IS-A" relation and "A and B" is a coordination relation, which are not the relations we need.

The symptom-therapeutic substance corpus data were obtained as follows. First, some "Alzheimer's disease" related symptom terms were obtained from the Semantic MEDLINE Database. Then these symptom terms were used to search the database for sentences which contain the query terms and terms belonging to the semantic types of therapeutic substances (e.g., pharmacologic substance and organic chemical). We obtained about 20,500 sentences and then manually annotated about 1,100 sentences as the symptom-therapeutic substance corpus: 600 labeled sentences are used as the initial training set and the remaining 498 labeled sentences as the test set. Similar to the disease and symptom relationship annotation, the following criteria are applied: the symptom-therapeutic substance relationship indicates that a therapeutic substance can relieve a physiological phenomenon. If an instance in a sentence semantically expresses the symptom-therapeutic substance relationship, it is labeled as a positive example. As in the example provided in Section 1, the sentence "fish oil and its active ingredient eicosapentaenoic acid (EPA) lowered blood viscosity" contains two positive examples, that is, fish oil and blood viscosity and EPA and blood viscosity.

When the manual annotation process was completed, the level of agreement was estimated. Cohen's kappa scores between the annotators of the two corpora are 0.866 and 0.903, respectively, and content analysis researchers generally regard a Cohen's kappa score of more than 0.8 as good reliability [38]. In addition, the two corpora are available for academic use (see Supplementary Material available online at http://dx.doi.org/10.1155/2016/3594937).

3.2. Experimental Evaluation. The evaluation metrics used in our experiments are precision (P), recall (R), F-score (F), and the area under the ROC curve (AUC) [39]. They are defined as follows:

$$P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad (5)$$

$$R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \qquad (6)$$

$$F = \frac{2 \times P \times R}{P + R}, \qquad (7)$$

$$\mathrm{AUC} = \frac{\sum_{i=1}^{m^{+}} \sum_{j=1}^{m^{-}} H(x_{i} - y_{j})}{m^{+} m^{-}}, \qquad (8)$$

where TP denotes the true interaction pairs, TN the true noninteraction pairs, FP the false interaction pairs, and FN the false noninteraction pairs. F-score is the balanced measure for quantifying the performance of the systems. In addition, the AUC is also used to evaluate the performance of our method; it is not affected by the distribution of the data, and its use for performance evaluation has been advocated in the machine learning community [40]. In formula (8), $m^{+}$ and $m^{-}$ are the numbers of positive and negative examples, respectively; $x_{1}, \ldots, x_{m^{+}}$ are the outputs of the system for the positive examples, and $y_{1}, \ldots, y_{m^{-}}$ are the ones for the negative examples. The function $H(r)$ is defined as follows:

$$H(r) = \begin{cases} 1, & r > 0, \\ 0.5, & r = 0, \\ 0, & r < 0. \end{cases} \qquad (9)$$

Table 2: The initial results on the disease-symptom test set. Method 1 integrates the three classifiers (the feature kernel, graph kernel, and tree kernel) with the same weight, while Method 2 integrates them with a weight ratio of 4:4:2.

Method | P (%) | R (%) | F-score (%) | AUC (%)
Feature kernel | 91.38 | 62.11 | 73.95 | 87.13
Graph kernel | 93.87 | 59.77 | 73.04 | 87.21
Tree kernel | 69.10 | 62.89 | 65.85 | 73.37
Method 1 | 92.05 | 63.28 | 75.00 | 89.47
Method 2 | 92.81 | 60.55 | 73.29 | 89.74

3.3. The Initial Performance of the Disease-Symptom Model. Table 2 shows the performance of the classifiers on the initial disease-symptom test set. The feature kernel and graph kernel achieve almost the same performance, which is better than that of the tree kernel. When the three classifiers are integrated with the same weight, a higher F-score (75.00%) is obtained, while when they are integrated with a weight ratio of 4:4:2, the F-score is a bit lower than that of the feature kernel. However, in both cases the AUC performance is improved, which shows that, since different classifiers calculate the similarity between two sentences from different aspects, combining these similarities can boost the performance.

3.3.1. The Performance of Co-Training on the Disease-Symptom Test Set. In our method, the feature set for the disease-symptom model is divided into three views: the feature kernel, graph kernel, and tree kernel. In the Co-Training experiments, to compare the results of each combination of two views, the experiments are divided into three groups, as shown in Table 3. Each group uses the same experimental parameters, that is, u = 4000, m = 300, and p = 100 (u, m, and p in Algorithm 1). The performance curves of the different combinations are shown in Figures 3, 4, and 5, respectively, and their final results with different iteration numbers (13, 27, and 22, resp.) are shown in Table 3.

From Figures 3, 4, and 5, we can make the following observations. (1) With the increase of the iteration number, as more unlabeled data are added to the training set, the F-score shows a rising trend. The reason is that, as the Co-Training process proceeds, more and more unlabeled data are labeled by one classifier for the other, which improves the performance of both classifiers. However, after a number of iterations, the performance of the classifiers can no longer be improved, since too much noise (false positives and false negatives) may be introduced from the unlabeled data. (2) The AUCs of the classifiers show different trends for different combinations of the views. The AUC of the feature kernel fluctuates around 88%, while that of the graph kernel fluctuates between 85% and 87%. In contrast, the tree kernel's AUC shows a rising trend in all cases, since the performance of the initial tree kernel classifier is relatively low and is then improved with the relatively accurate labeled data provided by the feature kernel or graph kernel.

In fact, the performance of semisupervised learning algorithms is usually not stable, because the unlabeled examples may often be wrongly labeled during the learning process [28].


[Figure 3: Co-Training performance curves (F-score and AUC, in %, versus the number of iterations) of the feature kernel and graph kernel on the disease-symptom test set.]

[Figure 4: Co-Training performance curves (F-score and AUC, in %, versus the number of iterations) of the feature kernel and tree kernel on the disease-symptom test set.]

[Figure 5: Co-Training performance curves (F-score and AUC, in %, versus the number of iterations) of the graph kernel and tree kernel on the disease-symptom test set.]


Table 3: The results obtained with Co-Training on the disease-symptom test set. "Combination" integrates the classifiers with the same weight.

Combination | View | P (%) | R (%) | F-score (%) | AUC (%)
Feature and graph kernel | Feature kernel | 88.32 | 67.97 | 76.82 | 88.01
Feature and graph kernel | Graph kernel | 83.26 | 71.88 | 77.15 | 87.54
Feature and graph kernel | Combination | 74.91 | 85.16 | 79.71 | 88.66
Feature and tree kernel | Feature kernel | 86.06 | 69.92 | 77.15 | 88.51
Feature and tree kernel | Tree kernel | 57.80 | 92.58 | 71.17 | 74.99
Feature and tree kernel | Combination | 75.08 | 87.11 | 80.65 | 87.18
Graph and tree kernel | Graph kernel | 84.04 | 69.92 | 76.33 | 86.04
Graph and tree kernel | Tree kernel | 58.10 | 95.31 | 72.19 | 78.10
Graph and tree kernel | Combination | 82.43 | 76.95 | 79.60 | 86.84

[Figure 6: Tri-Training performance curves (F-score and AUC, in %, versus the number of iterations) of the feature, graph, and tree kernels on the disease-symptom test set.]

At the beginning of the Co-Training, the amount of noise is limited, and the unlabeled data added to the training set can help the classifiers improve their performance. However, after a number of learning rounds, more and more noise is introduced, causing the performance to decline.

3.3.2. The Performance of Tri-Training on the Disease-Symptom Test Set. In our method, we select three views to conduct the Tri-Training, that is, the feature kernel, graph kernel, and tree kernel. In each Tri-Training round, SVM is used to train the classifier on each view. The parameters are set as follows: u = 4000, m = 300, p1 = 100, p2 = 0, and N = 27 (u, m, p1, p2, and N in Algorithm 2). Here, p2 = 0 means that only positive examples are added into the training set. In this way, the recall of the classifier can be improved (the recall is defined as the number of true positives divided by the total number of examples that actually belong to the positive class, and usually more positive examples in the training set will improve the recall), since it is lower compared with the precision (see Table 2). The results are shown in Table 4 and Figure 6.

Table 4: The results obtained with Tri-Training on the disease-symptom test set. Method 1 integrates the three classifiers (the feature kernel, graph kernel, and tree kernel) with the same weight, while Method 2 integrates them with a weight ratio of 4:4:2.

Method | P (%) | R (%) | F-score (%) | AUC (%)
Feature kernel | 83.00 | 80.08 | 81.51 | 88.80
Graph kernel | 77.74 | 85.94 | 81.63 | 89.80
Tree kernel | 57.38 | 94.14 | 71.30 | 76.00
Method 1 | 79.79 | 87.89 | 83.64 | 91.57
Method 2 | 79.93 | 85.55 | 82.64 | 90.75

Compared with the performances of the classifiers on the initial disease-symptom test set shown in Table 2, the ones achieved through Tri-Training are significantly improved. This shows that Tri-Training can exploit the unlabeled data and improve the performance more effectively. The reason is that, as mentioned in Section 1, the Tri-Training algorithm can achieve satisfactory results while neither requiring the instance space to be described with sufficient and redundant views nor putting any constraints on the supervised learning method.

[Figure 7: Co-Training performance curves (F-score and AUC, in %, versus the number of iterations) of the feature kernel and graph kernel on the symptom-therapeutic substance test set.]

In addition, when the three classifiers are integrated, either with the same weight or with a weight ratio of 4:4:2, higher F-scores and AUCs are obtained. Furthermore, comparing the performance of Co-Training and Tri-Training shown in Tables 3 and 4, we found that in most cases Tri-Training outperforms Co-Training. The reason is that, through employing three classifiers, Tri-Training is facilitated with good efficiency and generalization ability, because it can gracefully choose examples to label and use multiple classifiers to compose the final hypothesis [28].

3.4. The Performance of the Symptom and Therapeutic Substance Model. Table 5 shows the performances of the classifiers on the initial symptom-therapeutic substance test set. Similar to the results on the initial disease-symptom test set, the feature kernel achieves the best performance, while the tree kernel performs the worst. One difference is that, when the three classifiers are integrated with a weight ratio of 4:4:2, a higher F-score and AUC are obtained, while when they are integrated with the same weight, the F-score and AUC are a little lower than those of the feature kernel.

3.4.1. The Performance of Co-Training on the Symptom and Therapeutic Substance Test Set. Similar to the disease-symptom experiments, the feature set for the symptom-therapeutic substance model is also divided into three views: the feature, graph, and tree kernels. The experiments are divided into three groups. Each group uses the same experimental parameters, that is, u = 4000, m = 300, and p = 100. The performance curves of the different combinations are shown in Figures 7, 8, and 9, and their final results with different iteration numbers (27, 26, and 9, resp.) are shown in Table 6.

Table 5: The initial results on the symptom-therapeutic substance test set.

Method | P (%) | R (%) | F (%) | AUC (%)
Feature kernel | 79.30 | 90.76 | 84.64 | 87.90
Graph kernel | 76.27 | 90.36 | 82.72 | 87.30
Tree kernel | 68.90 | 82.73 | 75.18 | 79.94
Method 1 | 75.99 | 92.77 | 83.54 | 87.59
Method 2 | 77.81 | 94.38 | 85.30 | 88.94

From the figures, we can draw conclusions similar to those from the disease-symptom experiments: in most cases, the performance can be improved through the Co-Training process, while it is usually not stable, since noise is introduced during the learning process.

3.4.2. The Performance of Tri-Training on the Symptom and Therapeutic Substance Test Set. In the Tri-Training experiments on the symptom-therapeutic substance corpus, the parameters are set as follows: u = 4000, m = 300, p1 = 100, p2 = 0, and N = 27 (u, m, p1, p2, and N in Algorithm 2). The results are shown in Table 7 and Figure 10.

Compared with the performance of the classifiers on the initial symptom-therapeutic substance test set shown in Table 5, the results achieved through Tri-Training are also improved, as in the disease-symptom experiments. This verifies once again that the Tri-Training algorithm is effective in utilizing the unlabeled data to boost the relation extraction performance. When the three classifiers are integrated with a weight ratio of 4:4:2, a better AUC is obtained.

Comparing the performance of Co-Training and Tri-Training on the symptom-therapeutic substance test set, as shown in Tables 6 and 7, we found that in most cases Tri-Training outperforms Co-Training, which is consistent with the results achieved in the disease-symptom experiments. This is due to the better efficiency and generalization ability of Tri-Training over Co-Training.

In addition, the performances of the classifiers on the disease-symptom corpus are improved more than those on the symptom-therapeutic substance corpus.


[Figure 8: Co-Training performance curves (F-score and AUC, in %, versus the number of iterations) of the feature kernel and tree kernel on the symptom-therapeutic substance test set.]

[Figure 9: Co-Training performance curves (F-score and AUC, in %, versus the number of iterations) of the graph kernel and tree kernel on the symptom-therapeutic substance test set.]

[Figure 10: Tri-Training performance curves (F-score and AUC, in %, versus the number of iterations) of the feature, graph, and tree kernels on the symptom-therapeutic substance test set.]


Table 6: The results obtained with Co-Training on the symptom-therapeutic substance test set. "Combination" integrates the classifiers with the same weight.

Combination | View | P (%) | R (%) | F (%) | AUC (%)
Feature kernel and graph kernel | Feature kernel | 78.00 | 93.98 | 85.25 | 88.41
Feature kernel and graph kernel | Graph kernel | 71.51 | 98.80 | 82.97 | 86.44
Feature kernel and graph kernel | Combination | 77.45 | 95.18 | 85.40 | 89.10
Feature kernel and tree kernel | Feature kernel | 78.72 | 93.57 | 85.51 | 88.51
Feature kernel and tree kernel | Tree kernel | 67.13 | 97.59 | 79.54 | 81.75
Feature kernel and tree kernel | Combination | 77.51 | 96.79 | 85.66 | 88.61
Graph kernel and tree kernel | Graph kernel | 74.14 | 95.58 | 83.51 | 87.71
Graph kernel and tree kernel | Tree kernel | 67.82 | 94.78 | 79.06 | 80.14
Graph kernel and tree kernel | Combination | 71.05 | 97.59 | 82.23 | 86.24

Table 7: The results of Tri-Training on the symptom-therapeutic substance test set.

Method | P (%) | R (%) | F (%) | AUC (%)
Feature kernel | 78.98 | 93.57 | 85.66 | 88.94
Graph kernel | 74.31 | 97.59 | 84.37 | 87.78
Tree kernel | 68.01 | 94.78 | 79.19 | 81.10
Method 1 | 74.77 | 98.80 | 85.12 | 88.08
Method 2 | 75.62 | 98.39 | 85.51 | 89.13

There are two reasons for this. First, on the symptom-therapeutic substance corpus, the initial classifiers already have better performance; therefore, the Co-Training and Tri-Training algorithms have less room for improvement. Second, as the Co-Training and Tri-Training processes proceed, more unlabeled data are added into the training set, which can introduce new information for the classifiers. Therefore, the recalls of the classifiers are improved; meanwhile, more noise is also introduced, causing the precision to decline. For the initial classifiers, the higher the precision is, the less noise is introduced in the iterative process, and the more the performance of the classifier is improved. In summary, if the initial classifiers are sufficiently diverse, the performance can be improved through the two algorithms. In the experiments, as more unlabeled data are added to the training set, the difference between the classifiers becomes smaller; thus, after a number of iterations, the performance cannot be improved any more.

3.5. Some Examples of Disease-Symptom and Symptom-Therapeutic Substance Relations Extracted from Biomedical Literature. Some examples of disease-symptom and symptom-therapeutic substance relations extracted from biomedical literature are shown in Tables 8 and 9. Table 8 shows some symptoms of the disease C0020541 (portal hypertension); one sentence containing the relation between portal hypertension and its symptom C0028778 (block) is provided. Table 9 shows some relations between the symptom C0028778 (block) and some therapeutic substances, with the sentences containing the relations.

Table 8: Some disease-symptom relations extracted from biomedical literature.

Disease | Symptom | Sentence
C0020541 (portal hypertension) | C0028778 (block) | "C0020541 as C2825142 of intrahepatic C0028778 accounted for 83 of the patients (C0023891 65, meta-C0022346 12) and C0018920 11."
C0020541 (portal hypertension) | C1565860, C0035357, C0005775, C0014867, C0232338 | —

Table 9: Some symptom-therapeutic substance relations extracted from biomedical literature.

Symptom | Therapeutic substance | Sentence
C0028778 (block) | C0017302 (general anesthetic agents) | "Use-dependent conduction C0028778 produced by volatile C0017302."
C0028778 (block) | C0006400 (bupivacaine) | "Epidural ropivacaine is known to produce less motor C0028778 compared to C0006400 at anaesthetic concentrations."
C0028778 (block) | C0053241 (benzoquinone) | "In contrast, C0053241 and hydroquinone led to G2-C0028778 rather than to a mitotic arrest."

4. Conclusions and Future Work

Models for extracting the relations between disease-symptom and symptom-therapeutic substance pairs are important for further extracting knowledge about diseases and their potential therapeutic substances. However, currently there is no corpus available to train such models. To solve the problem, we first manually annotated two training sets for extracting the relations. Then two semisupervised learning algorithms, that is, Co-Training and Tri-Training, were applied to exploit the unlabeled data to boost the performance. Experimental results show that exploiting the unlabeled data with both Co-Training and Tri-Training algorithms can enhance the performance. In particular, through employing three classifiers, Tri-Training is facilitated with good efficiency and generalization ability, since it can gracefully choose examples to label and use multiple classifiers to compose the final hypothesis [28]. In addition, its applicability is wide, because it neither requires sufficient and redundant views nor puts any constraint on the employed supervised learning algorithm.

In future work, we will study more effective semisupervised learning methods to exploit the numerous unlabeled data in the biomedical literature. On the other hand, we will apply the disease-symptom and symptom-therapeutic substance models to extract the relations between diseases and therapeutic substances from biomedical literature and predict potential therapeutic substances for certain diseases [41].

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this article.

Acknowledgments

This work is supported by grants from the Natural Science Foundation of China (nos. 61272373, 61070098, 61340020, 61572102, and 61572098), the Trans-Century Training Program Foundation for the Talents by the Ministry of Education of China (NCET-13-0084), the Fundamental Research Funds for the Central Universities (nos. DUT13JB09 and DUT14YQ213), and the Major State Research Development Program of China (no. 2016YFC0901902).

References

[1] D. Hristovski, B. Peterlin, J. A. Mitchell, and S. M. Humphrey, "Using literature-based discovery to identify disease candidate genes," International Journal of Medical Informatics, vol. 74, no. 2–4, pp. 289–298, 2005.

[2] M. N. Prichard and C. Shipman Jr., "A three-dimensional model to analyze drug-drug interactions," Antiviral Research, vol. 14, no. 4-5, pp. 181–205, 1990.

[3] Q.-C. Bui, S. Katrenko, and P. M. A. Sloot, "A hybrid approach to extract protein-protein interactions," Bioinformatics, vol. 27, no. 2, pp. 259–265, 2011.

[4] M. Krallinger, F. Leitner, C. Rodriguez-Penagos, and A. Valencia, "Overview of the protein-protein interaction annotation extraction task of BioCreative II," Genome Biology, vol. 9, supplement 2, article S4, 2008.

[5] I. Segura-Bedmar, P. Martinez, and D. Sanchez-Cisneros, "The 1st DDIExtraction-2011 challenge task: extraction of Drug-Drug Interactions from biomedical texts," in Proceedings of the 1st Challenge Task on Drug-Drug Interaction Extraction (DDIExtraction '11), pp. 1–9, Huelva, Spain, September 2011.

[6] I. Segura-Bedmar, P. Martínez, and M. Herrero-Zazo, SemEval-2013 Task 9: Extraction of Drug-Drug Interactions from Biomedical Texts (DDIExtraction 2013), Association for Computational Linguistics, 2013.

[7] M. Krallinger and A. Valencia, "Text-mining and information-retrieval services for molecular biology," Genome Biology, vol. 6, no. 7, article 224, 2005.

[8] T.-K. Jenssen, A. Lægreid, J. Komorowski, and E. Hovig, "A literature network of human genes for high-throughput analysis of gene expression," Nature Genetics, vol. 28, no. 1, pp. 21–28, 2001.

[9] C. Blaschke, M. A. Andrade, C. Ouzounis, and A. Valencia, "Automatic extraction of biological information from scientific text: protein-protein interactions," in Proceedings of the International Conference on Intelligent Systems for Molecular Biology (ISMB '99), pp. 60–67, 1999.

[10] P. Zweigenbaum, D. Demner-Fushman, H. Yu, and K. B. Cohen, "Frontiers of biomedical text mining: current progress," Briefings in Bioinformatics, vol. 8, no. 5, pp. 358–375, 2007.

[11] Y. T. Yen, B. Chen, H. W. Chiu, Y. C. Lee, Y. C. Li, and C. Y. Hsu, "Developing an NLP and IR-based algorithm for analyzing gene-disease relationships," Methods of Information in Medicine, vol. 45, no. 3, pp. 321–329, 2006.

[12] M. Huang, X. Zhu, Y. Hao, D. G. Payan, K. Qu, and M. Li, "Discovering patterns to extract protein-protein interactions from full texts," Bioinformatics, vol. 20, no. 18, pp. 3604–3612, 2004.

[13] I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun, "Support vector machine learning for interdependent and structured output spaces," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 823–830, Alberta, Canada, July 2004.

[14] J. Xiao, J. Su, G. Zhou et al., "Protein-protein interaction extraction: a supervised learning approach," in Proceedings of the 1st International Symposium on Semantic Mining in Biomedicine, pp. 51–59, Hinxton, UK, April 2005.

[15] L. A. Nielsen, "Extracting protein-protein interactions using simple contextual features," in Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology, pp. 120–121, ACM, June 2006.

[16] A. Airola, S. Pyysalo, J. Björne, T. Pahikkala, F. Ginter, and T. Salakoski, "All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning," BMC Bioinformatics, vol. 9, no. 11, article S2, 2008.

[17] L. Qian and G. Zhou, "Tree kernel-based protein-protein interaction extraction from biomedical literature," Journal of Biomedical Informatics, vol. 45, no. 3, pp. 535–543, 2012.

[18] S. Kim, J. Yoon, J. Yang, and S. Park, "Walk-weighted subsequence kernels for protein-protein interaction extraction," BMC Bioinformatics, vol. 11, no. 1, article 107, 2010.

[19] D. J. Miller and H. S. Uyar, "A mixture of experts classifier with learning based on both labelled and unlabelled data," in Advances in Neural Information Processing Systems, pp. 571–577, 1997.

[20] T. Joachims, "Transductive inference for text classification using support vector machines," in Proceedings of the 16th International Conference on Machine Learning (ICML '99), pp. 200–209, 1999.

[21] X. Zhu, Z. Ghahramani, and J. Lafferty, "Semi-supervised learning using Gaussian fields and harmonic functions," in Proceedings of the 20th International Conference on Machine Learning (ICML '03), vol. 3, pp. 912–919, Washington, DC, USA, August 2003.

[22] A. Blum and T. Mitchell, "Combining labeled and unlabeled data with co-training," in Proceedings of the 11th Annual Conference on Computational Learning Theory (COLT '98), pp. 92–100, ACM, 1998.

[23] W. Wang and Z.-H. Zhou, "Co-training with insufficient views," in Proceedings of the Asian Conference on Machine Learning, pp. 467–482, 2013.

[24] W. Wang and Z.-H. Zhou, "A new analysis of co-training," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 1135–1142, June 2010.

[25] W. Wang and Z.-H. Zhou, "Analyzing co-training style algorithms," in Machine Learning: ECML 2007, vol. 4701 of Lecture Notes in Computer Science, pp. 454–465, Springer, Berlin, Germany, 2007.

[26] D. Pierce and C. Cardie, "Limitations of co-training for natural language learning from large datasets," in Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing, pp. 1–9, Pittsburgh, Pa, USA, 2001.

[27] S. Kiritchenko and S. Matwin, "Email classification with co-training," in Proceedings of the Conference of the Center for Advanced Studies on Collaborative Research (CASCON '01), pp. 301–312, IBM, Toronto, Canada, November 2001.

[28] Z.-H. Zhou and M. Li, "Tri-training: exploiting unlabeled data using three classifiers," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 11, pp. 1529–1541, 2005.

[29] D. Mavroeidis, K. Chaidos, S. Pirillos et al., "Using tri-training and support vector machines for addressing the ECML/PKDD 2006 discovery challenge," in Proceedings of the ECML-PKDD Discovery Challenge Workshop, pp. 39–47, Berlin, Germany, 2006.

[30] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.

[31] R. E. Schapire, "The strength of weak learnability," Machine Learning, vol. 5, no. 2, pp. 197–227, 1990.

[32] Y. Dang, Y. Zhang, and H. Chen, "A lexicon-enhanced method for sentiment classification: an experiment on online product reviews," IEEE Intelligent Systems, vol. 25, no. 4, pp. 46–53, 2010.

[33] Y. Yang and J. O. Pedersen, "A comparative study on feature selection in text categorization," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), vol. 97, pp. 412–420, Morgan Kaufmann, San Mateo, Calif, USA, 1997.

[34] D. Klein and C. D. Manning, "Accurate unlexicalized parsing," in Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, vol. 1, pp. 423–430, Association for Computational Linguistics, Sapporo, Japan, July 2003.

[35] M. Zhang, J. Zhang, J. Su et al., "A composite kernel to extract relations between entities with both flat and structured features," in Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pp. 825–832, Association for Computational Linguistics, 2006.

[36] National Library of Medicine, "Semantic MEDLINE Database," http://skr3.nlm.nih.gov/SemMedDB.

[37] T. C. Rindflesch and M. Fiszman, "The interaction of domain knowledge and linguistic structure in natural language processing: interpreting hypernymic propositions in biomedical text," Journal of Biomedical Informatics, vol. 36, no. 6, pp. 462–477, 2003.

[38] J. Carletta, "Assessing agreement on classification tasks: the kappa statistic," Computational Linguistics, vol. 22, no. 2, pp. 248–254, 1996.

[39] J. A. Hanley and B. J. McNeil, "The meaning and use of the area under a receiver operating characteristic (ROC) curve," Radiology, vol. 143, no. 1, pp. 29–36, 1982.

[40] A. P. Bradley, "The use of the area under the ROC curve in the evaluation of machine learning algorithms," Pattern Recognition, vol. 30, no. 7, pp. 1145–1159, 1997.

[41] D. R. Swanson, "Fish oil, Raynaud's syndrome, and undiscovered public knowledge," Perspectives in Biology and Medicine, vol. 30, no. 1, pp. 7–18, 1986.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 3: Research Article Semisupervised Learning Based Disease …downloads.hindawi.com/journals/bmri/2016/3594937.pdf · 2019. 7. 30. · models for extracting the relations between the

BioMed Research International 3

V-WalksNMOD NMOD NMOD OBJ

ENTRY1 Interleukin and Interleukin ENTRY2 risk risk increases

E-WalksNMOD NMOD OBJ

Interleukin

NMOD

risk

NMOD

NMOD

OBJ

NMOD

ENTRY1 and Interleukin increases ENTRY2 risk

Figure 1 An example of a dependency graph The candidate interaction pair is marked as ldquoENTRY1rdquo and ldquoENTRY2rdquo

existence of relations Therefore the existence of these words(we called interaction words) is chosen as a binary feature Inaddition we found that the shorter the distance between twoconcept entities is the more likely the two concept entitieshave an interactive relationship Therefore the distance ischosen as a feature For example ldquoDISLessThanTreerdquo is a fea-ture value showing that the distance between the two conceptentities is less than three

The initial eigenvector extracted with our feature-basedkernel has a high dimension and includes many sparsefeatures In order to reduce the dimension we employed thedocument frequency method [33] to select features Initiallythe feature-based kernel method extracts 248000 featuresfrom the disease-symptom training set and we preserved thefeatures with document frequencies exceeding five (a totalof 12000 features) Similarly 345000 features were extractedfrom the symptom-therapeutic substance training set and13700 features were retained

22 Convolution Tree Kernel In our method convolutiontree kernel 119870119888(1198791 1198792) a special convolution kernel is usedto obtain useful structural information from substructureIt calculates the syntactic structure similarity between twoparse trees by counting the number of common subtrees ofthe two parse trees rooted by 1198791 and 1198792

119870119888 (1198791 1198792) = sum1198991isin11987311198992isin1198732

Δ (1198991 1198992) (1)

where119873119895 denotes the set of nodes in the tree 119879119895 and Δ(1198991 1198992)denotes the number of common subtrees of the two parsetrees rooted by 1198991 and 1198992

221 Tree Pruning in Convolution Kernel In our methodStanford parser [34] is used to parse the sentences Before asentence is parsed the concept entity pairs in the sentence arereplaced with ldquoENTRY1rdquo and ldquoENTRY2rdquo and other entitiesare replaced with ldquoENTRYrdquo Take gene-gene interactionbetween C0021764 and interleukin increases C0002395 risk

(the sentence is processed with MetaMap and the two con-cept entities are represented with their CUIs) for example Itis replacedwith ldquogene-gene interaction between ENTRY1 andinterleukin increases ENTRY2 riskrdquo Then we use Stanfordparser to parse the sentence to get a Complete Tree (CT)Since a CT includes too much contextual information whichmay introduce many noisy features we used the methoddescribed in [35] to obtain the shortest path enclosed tree(SPT)and replace the CTwith it SPT is the smallest commonsubtree including the two concept entities which is a part ofCT

222 Predicate Argument Path The representation of apredicate argument is a graphic structure which expresses thedeep syntactic and semantic relations between words In thepredicate argument structure different substructures on theshortest path between the two concept entities have differentinformation An example of a dependency graph is shown inFigure 1 In our method v-walk and e-walk features (whichare both on the shortest dependency paths) are added intothe tree kernel V-walk contains the syntactic and semanticrelations between two words For example in Figure 1 therelation between ldquoENTRY1rdquo and ldquointerleukinrdquo is ldquoNMODrdquoand the relation between ldquoriskrdquo and ldquoincreasesrdquo is ldquoOBJrdquo andso forth E-walk contains the relations between a word andits two adjacent nodes Figure 1 shows the relation of ldquointer-leukinrdquo with its two adjacent nodes ldquoNMODrdquo and ldquoNMODrdquoand the relation of ldquoriskrdquo with its two adjacent nodesldquoNMODrdquo and ldquoOBJrdquo

23 Graph Kernel The graph kernel method uses the syntaxtree to express a graph structure of a sentence The similarityof two graphs is calculated by comparing the relation betweentwo public nodes (vertices) Our method uses the all-pathsgraph kernel proposed byAirola et al [16]Thekernel consistsof two directed subgraphs that is a parse graph and a graphrepresenting the linear order of words In Figure 2 the upperpart is the analysis of the structure subgraph and the lowerpart is the linear order subgraphThese two subgraphs denote

Figure 2: Graph kernel with two directed subgraphs. The candidate interaction pair is marked as "ENTRY1" and "ENTRY2". In the dependency-based subgraph, all nodes on the shortest path are specialized using a post-tag (IP). In the linear order subgraph, the possible tags are (B)efore, (M)iddle, and (A)fter.

In our method, a simple weight allocation strategy is chosen: the edges on the shortest path are assigned a weight of 0.9, other edges in the parse structure subgraph 0.3, and all edges in the linear order subgraph 0.9. The representation thus allows us to emphasize the shortest path without completely disregarding potentially relevant words outside of the path. A graph kernel calculates the similarity between two input graphs by comparing the relations between common vertices (nodes). A graph matrix $G$ is calculated as

$$G = L \left( \sum_{n=1}^{\infty} A^n \right) L^T, \quad (2)$$

where $A$ is the edge matrix whose rows and columns are indexed by the vertices; $A_{ij}$ is the edge weight if vertex $V_i$ is connected to vertex $V_j$. $L$ is the label matrix whose rows index the labels and whose columns index the vertices; $L_{ij} = 1$ indicates that vertex $V_j$ carries the $i$th label. The graph kernel $K(G, G')$ is defined over two input graph matrices $G$ and $G'$ [15]:

$$K(G, G') = \sum_{i=1}^{L} \sum_{j=1}^{L} G_{ij} G'_{ij}. \quad (3)$$
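Under these definitions, the kernel can be computed with dense matrices. The sketch below truncates the infinite sum at a fixed maximum path length, a simplifying assumption of ours (the closed-form Neumann series converges only when the spectral radius of $A$ is below one):

```python
import numpy as np

def graph_matrix(A, L, max_len=10):
    """G = L (sum_{n=1}^{max_len} A^n) L^T for a weighted edge matrix A
    and a binary label matrix L (labels x vertices), as in (2)."""
    paths = np.zeros_like(A)
    power = np.eye(A.shape[0])
    for _ in range(max_len):
        power = power @ A          # A^n accumulates weighted paths of length n
        paths += power
    return L @ paths @ L.T

def graph_kernel(G1, G2):
    # K(G, G') = sum_ij G_ij * G'_ij, the entrywise inner product in (3)
    return float(np.sum(G1 * G2))

# Toy example: 3 vertices, 2 labels, shortest-path weight 0.9, other edge 0.3
A = np.array([[0.0, 0.9, 0.0], [0.0, 0.0, 0.3], [0.0, 0.0, 0.0]])
L = np.array([[1, 0, 0], [0, 1, 1]], dtype=float)
print(graph_kernel(graph_matrix(A, L), graph_matrix(A, L)))
```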

2.4. Co-Training Algorithm. The initial Co-Training algorithm (or standard Co-Training algorithm) was proposed by Blum and Mitchell [22]. They assumed that the training set has two sufficient and redundant views; namely, the set of attributes meets two conditions. First, each attribute set is sufficient to describe the problem; that is, if the training set is sufficient, each attribute set is able to learn a strong classifier. Second, each attribute set is conditionally independent of the other given the class label. Our Co-Training algorithm is described in Algorithm 1.

Algorithm 1 (Co-Training algorithm).

(1) Input is as follows:
The labeled data L and the unlabeled data U.
Initialize the training sets L1 and L2 (L1 = L2 = L).
Sufficient and redundant views V1 and V2.
Iteration number N.

(2) Process is as follows:
(2.1) Create a pool u of examples by choosing n examples at random from U; U = U - u.
(2.2) Use L1 to train a classifier h1 on V1; use L2 to train a classifier h2 on V2.
(2.3) Use h1 and h2 to label the examples from u.
(2.4) Take out m positive and m negative examples which were consistently labeled by h1 and h2. Then take p positive examples from the m positive examples and add them to L1 and L2, respectively. Choose 2m examples from U to replenish u; U = U - 2m; N = N - 1.
(2.5) Repeat steps (2.2)-(2.4) until the unlabeled corpus U is empty, the number of unlabeled data in u is less than a certain number, or N = 0.

(3) Outputs are as follows:
The classifiers h1 and h2.
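The loop of Algorithm 1 can be sketched as follows; this is an illustration using scikit-learn-style SVM classifiers and our own helper names, not the authors' code (the symmetric handling of agreed negative examples is omitted for brevity):

```python
import random
from sklearn.svm import SVC

def co_train(L, U, view1, view2, n=4000, m=300, p=100, N=13):
    """Sketch of Algorithm 1. L is a list of (example, label) pairs, U a list
    of unlabeled examples; view1/view2 map an example to the feature vector
    of the corresponding view. Parameter names follow Algorithm 1."""
    L1, L2 = list(L), list(L)
    pool = random.sample(U, min(n, len(U)))
    U = [x for x in U if x not in pool]
    h1 = h2 = None
    while N > 0 and pool and U:
        h1 = SVC().fit([view1(x) for x, _ in L1], [y for _, y in L1])
        h2 = SVC().fit([view2(x) for x, _ in L2], [y for _, y in L2])
        # label the pool with both classifiers and keep agreed positives
        preds = [(x, h1.predict([view1(x)])[0], h2.predict([view2(x)])[0])
                 for x in pool]
        agreed_pos = [x for x, y1, y2 in preds if y1 == y2 == 1][:m]
        for x in agreed_pos[:p]:       # add p agreed positives to both sets
            L1.append((x, 1))
            L2.append((x, 1))
            pool.remove(x)
        pool.extend(U[:2 * m])         # replenish the pool from U
        U = U[2 * m:]
        N -= 1
    return h1, h2
```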

2.5. Tri-Training Algorithm. The Co-Training algorithm requires two sufficient and redundant views. However, this condition does not hold in most real-world scenarios. The Tri-Training algorithm neither requires the instance space to be described with sufficient and redundant views nor puts any constraints on the supervised learning algorithm [28]. In this algorithm, three classifiers are used, which can tackle the problem of determining how to label the unlabeled data and produce the final hypothesis. Our Tri-Training algorithm is described in Algorithm 2.


Table 1: The details of the two corpora.

Corpus                                Training set            Test set                Unlabeled data
                                      Positive   Negative     Positive   Negative     Total
Diseases and symptoms                 299        299          249        250          19298
Symptoms and therapeutic substances   300        300          249        249          19392


In addition, the different classifiers calculate the similarity between the two sentences from different aspects. Combining the similarities can reduce the danger of missing important features. Therefore, in each Tri-Training round, two different ensemble strategies are used to integrate the three classifiers for further performance improvement. The first strategy integrates the classifiers with a simple voting method. The second strategy assigns each classifier a different weight. The normalized output $K$ of the three classifier outputs $K_m$ ($m = 1, 2, 3$) is then defined as

$$K = \sum_{m=1}^{M} \sigma_m K_m, \qquad \sum_{m=1}^{M} \sigma_m = 1, \quad \sigma_m \ge 0 \ \forall m, \quad (4)$$

where $M$ represents the number of classifiers ($M = 3$ in our method).
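For illustration, the weighted ensemble strategy in (4) amounts to a normalized weighted sum of the classifier outputs; a minimal sketch with invented scores and the paper's 4:4:2 weight ratio:

```python
import numpy as np

def weighted_ensemble(outputs, weights):
    """Combine classifier outputs K_m with weights sigma_m as in (4).

    `outputs` has shape (n_classifiers, n_examples); the weights are
    normalized so that they sum to 1 and remain nonnegative."""
    sigma = np.asarray(weights, dtype=float)
    sigma = sigma / sigma.sum()
    return sigma @ np.asarray(outputs)

scores = [[0.8, 0.1], [0.6, 0.4], [0.9, 0.2]]    # h1, h2, h3 on two examples
combined = weighted_ensemble(scores, [4, 4, 2])  # the 4:4:2 setting
labels = (combined >= 0.5).astype(int)           # threshold the fused score
```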

Algorithm 2 (Tri-Training algorithm).

(1) Input is as follows:
The labeled data L and the unlabeled data U.
Initialize the training sets L1, L2, and L3 (L1 = L2 = L3 = L).
Selected views V1, V2, and V3.
Iteration number N.

(2) Process is as follows:
(2.1) Create a pool u of examples by choosing n examples at random from U; U = U - u.
(2.2) Use L1 to train a classifier h1 on V1; use L2 to train a classifier h2 on V2; use L3 to train a classifier h3 on V3.
(2.3) Use h1, h2, and h3 to label the examples from u.
(2.4) Take out m positive and m negative examples which were consistently labeled by h1, h2, and h3. Then take p1 positive examples from the m positive examples and add them to L1, L2, and L3, respectively; take p2 negative examples from the m negative examples and add them to L1, L2, and L3, respectively. Choose 2m examples from U to replenish u; U = U - 2m; N = N - 1.
(2.5) Repeat steps (2.2)-(2.4) until the unlabeled corpus U is empty, the number of unlabeled data in u is less than a certain number, or N = 0.

(3) Outputs are as follows:
The classifiers h1, h2, and h3.

3. Experiments and Results

3.1. Experimental Datasets. In our experiments, the disease and symptom corpus data was obtained by searching the Semantic MEDLINE Database [36] with 200 concepts chosen from MeSH (Medical Subject Headings) with the semantic type "Disease or Syndrome". Since these sentences (corpus data) have been processed by SemRep [37], a rule-based natural language processing tool that identifies relationships in MEDLINE documents, the possibility of a relation between the two concept entities in a sentence is high. To limit the semantic types of the two concept entities in a sentence, we only preserved the sentences containing concepts of the needed semantic types (i.e., biologic function, cell function, finding, molecular function, organism function, organ or tissue function, pathologic function, phenomenon or process, and physiologic function). Finally, we obtained a total of about 20,400 sentences, from which we manually constructed two labeled datasets as the initial training set $T_{initial}$ (598 labeled sentences, as shown in Table 1) and the test set (499 labeled sentences), respectively.

During the manual annotation, the following criteria are applied: the disease and symptom relationship indicates that the symptom is a physiological phenomenon of the disease. If an instance in a sentence semantically expresses the disease and symptom relationship, it is labeled as a positive example. As in the example provided in Section 1, the sentence "many blood- and blood vessel-related characteristics are typical for Raynaud patients: blood viscosity and platelet aggregability are high" contains two positive examples, that is, Raynaud and blood viscosity and Raynaud and platelet aggregability. In addition, some special relationships such as "B in A" and "A can change B" are also classified as positive examples, since they show that a physiological phenomenon (B) occurs when someone has the disease (A). However, if a relation in a sentence is only a cooccurrence one, it is labeled as a negative example. Patterns such as "A is a B" and "A and B" are labeled as negative examples, since "A is a B" is an "IS-A" relation and "A and B" is a coordination relation, which are not the relations we need.

The symptom-therapeutic substance corpus data was obtained as follows. First, some "Alzheimer's disease" related symptom terms were obtained from the Semantic MEDLINE Database.


Then these symptom terms were used to search the database for sentences which contain the query terms and terms belonging to the semantic types of therapeutic substances (e.g., pharmacologic substance and organic chemical). We obtained about 20,500 sentences and then manually annotated about 1,100 sentences as the symptom-therapeutic substance corpus: 600 labeled sentences are used as the initial training set and the remaining 498 labeled sentences as the test set. Similar to the disease and symptom relationship annotation, the following criteria are applied: the symptom-therapeutic substance relationship indicates that a therapeutic substance can relieve a physiological phenomenon. If an instance in a sentence semantically expresses the symptom-therapeutic substance relationship, it is labeled as a positive example. As in the example provided in Section 1, the sentence "fish oil and its active ingredient eicosapentaenoic acid (EPA) lowered blood viscosity" contains two positive examples, that is, fish oil and blood viscosity and EPA and blood viscosity.

When the manual annotation process was completed, the level of agreement was estimated. Cohen's kappa scores between the annotators of the two corpora are 0.866 and 0.903, respectively, and content analysis researchers generally regard a Cohen's kappa score of more than 0.8 as good reliability [38]. In addition, the two corpora are available for academic use (see Supplementary Material available online at http://dx.doi.org/10.1155/2016/3594937).
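Agreement of this kind can be checked, for example, with scikit-learn's cohen_kappa_score; a quick sketch with made-up annotations:

```python
from sklearn.metrics import cohen_kappa_score

annotator_a = [1, 1, 0, 1, 0, 0, 1, 0]
annotator_b = [1, 1, 0, 1, 0, 1, 1, 0]
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(round(kappa, 3))   # 0.75 for this toy data; above 0.8 is read as good reliability [38]
```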

3.2. Experimental Evaluation. The evaluation metrics used in our experiments are precision (P), recall (R), F-score (F), and the area under the ROC curve (AUC) [39]. They are defined as follows:

$$P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \quad (5)$$

$$R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \quad (6)$$

$$F = \frac{2 \times P \times R}{P + R}, \quad (7)$$

$$\mathrm{AUC} = \frac{\sum_{i=1}^{m^+} \sum_{j=1}^{m^-} H(x_i - y_j)}{m^+ m^-}, \quad (8)$$

where TP denotes a true interaction pair, TN a true noninteraction pair, FP a false interaction pair, and FN a false noninteraction pair. F-score is the balanced measure for quantifying the performance of the systems. In addition, the AUC is used to evaluate the performance of our method: it is not affected by the distribution of the data, and it has been advocated for performance evaluation in the machine learning community [40]. In formula (8), $m^+$ and $m^-$ are the numbers of positive and negative examples, respectively; $x_1, \ldots, x_{m^+}$ are the outputs of the system for the positive examples, and $y_1, \ldots, y_{m^-}$ are the ones for the negative examples. The function $H(r)$ is defined as follows:

$$H(r) = \begin{cases} 1, & r > 0, \\ 0.5, & r = 0, \\ 0, & r < 0. \end{cases} \quad (9)$$
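These definitions translate directly into code; the sketch below computes the F-score and the pairwise form of the AUC in (8) and (9) (function names and sample values are ours):

```python
def f_score(tp, fp, fn):
    # Precision, recall, and their harmonic mean, as in (5)-(7)
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return 2 * p * r / (p + r)

def auc_pairwise(pos_scores, neg_scores):
    """AUC as in (8): the fraction of positive-negative score pairs ranked
    correctly, counting ties as one half (the H function in (9))."""
    def h(r):
        return 1.0 if r > 0 else (0.5 if r == 0 else 0.0)
    total = sum(h(x - y) for x in pos_scores for y in neg_scores)
    return total / (len(pos_scores) * len(neg_scores))

print(f_score(tp=80, fp=20, fn=10))               # about 0.842
print(auc_pairwise([0.9, 0.8, 0.4], [0.5, 0.3]))  # about 0.833
```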

Table 2: The initial results on the disease-symptom test set. Method 1 integrates three classifiers (the feature kernel, graph kernel, and tree kernel) with the same weight, while Method 2 integrates them with a weight ratio of 4:4:2.

Method           P (%)    R (%)    F-score (%)    AUC (%)
Feature kernel   91.38    62.11    73.95          87.13
Graph kernel     93.87    59.77    73.04          87.21
Tree kernel      69.10    62.89    65.85          73.37
Method 1         92.05    63.28    75.00          89.47
Method 2         92.81    60.55    73.29          89.74

3.3. The Initial Performance of the Disease-Symptom Model. Table 2 shows the performance of the classifiers on the initial disease-symptom test set. The feature kernel and graph kernel achieve almost the same performance, which is better than that of the tree kernel. When the three classifiers are integrated with the same weight, a higher F-score (75.00%) is obtained, while when they are integrated with a weight ratio of 4:4:2, the F-score is a bit lower than that of the feature kernel. However, in both cases the AUC is improved, which shows that, since different classifiers calculate the similarity between two sentences from different aspects, combining these similarities can boost the performance.

3.3.1. The Performance of Co-Training on the Disease-Symptom Test Set. In our method, the feature set for the disease-symptom model is divided into three views: the feature kernel, graph kernel, and tree kernel. In the Co-Training experiments, to compare the results of each combination of two views, the experiments are divided into three groups, as shown in Table 3. Each group uses the same experimental parameters, that is, $u = 4000$, $m = 300$, and $p = 100$ ($u$, $m$, and $p$ in Algorithm 1). The performance curves of the different combinations are shown in Figures 3, 4, and 5, respectively, and their final results with different iteration times (13, 27, and 22, resp.) are shown in Table 3.

From Figures 3, 4, and 5, we can obtain the following observations. (1) As the iterations proceed and more unlabeled data are added to the training set, the F-score shows a rising trend. The reason is that, as the Co-Training process proceeds, more and more unlabeled data are labelled by one classifier for the other, which improves the performance of both classifiers. However, after a number of iterations, the performance of the classifiers can no longer be improved, since too much noise (false positives and false negatives) may be introduced from the unlabeled data. (2) The AUCs of the classifiers have different trends with different combinations of the views. The AUC of the feature kernel fluctuates around 88%, while that of the graph kernel fluctuates between 85% and 87%. In contrast, the tree kernel's AUC shows a rising trend in all cases, since the performance of the initial tree kernel classifier is relatively low and is then improved with the relatively accurate labelled data provided by the feature kernel or graph kernel.

In fact, the performance of semisupervised learning algorithms is usually not stable, because the unlabeled examples may often be wrongly labeled during the learning process [28].


Figure 3: Co-Training performance curves (F-score and AUC versus the number of iterations) of the feature kernel and graph kernel on the disease-symptom test set.

Figure 4: Co-Training performance curves (F-score and AUC versus the number of iterations) of the feature kernel and tree kernel on the disease-symptom test set.

Figure 5: Co-Training performance curves (F-score and AUC versus the number of iterations) of the graph kernel and tree kernel on the disease-symptom test set.


Table 3: The results obtained with Co-Training on the disease-symptom test set. The combination method integrates three classifiers (the feature kernel, graph kernel, and tree kernel) with the same weight.

Combination                View             P (%)    R (%)    F-score (%)    AUC (%)
Feature and graph kernel   Feature kernel   88.32    67.97    76.82          88.01
                           Graph kernel     83.26    71.88    77.15          87.54
                           Combination      74.91    85.16    79.71          88.66
Feature and tree kernel    Feature kernel   86.06    69.92    77.15          88.51
                           Tree kernel      57.80    92.58    71.17          74.99
                           Combination      75.08    87.11    80.65          87.18
Graph and tree kernel      Graph kernel     84.04    69.92    76.33          86.04
                           Tree kernel      58.10    95.31    72.19          78.10
                           Combination      82.43    76.95    79.60          86.84

Figure 6: The Tri-Training performance (F-score and AUC versus the number of iterations) on the disease-symptom test set.

At the beginning of Co-Training, the amount of noise is limited, and the unlabeled data added to the training set can help the classifiers improve their performance. However, after a number of learning rounds, more and more noise is introduced, causing the performance to decline.

3.3.2. The Performance of Tri-Training on the Disease-Symptom Test Set. In our method, we select three views to conduct the Tri-Training, that is, the feature kernel, graph kernel, and tree kernel. In each Tri-Training round, an SVM is used to train the classifier on each view. The parameters are set as follows: $u = 4000$, $m = 300$, $p_1 = 100$, $p_2 = 0$, and $N = 27$ ($u$, $m$, $p_1$, $p_2$, and $N$ in Algorithm 2). Here $p_2 = 0$ means that only the positive examples are added into the training set. In this way, the recall of the classifier can be improved (the recall is defined as the number of true positives divided by the total number of examples that actually belong to the positive class, and usually more positive examples in the training set will improve the recall), since it is lower compared with the precision (see Table 2). The results are shown in Table 4 and Figure 6.

Table 4: The results obtained with Tri-Training on the disease-symptom test set. Method 1 integrates three classifiers (the feature kernel, graph kernel, and tree kernel) with the same weight, while Method 2 integrates them with a weight ratio of 4:4:2.

Method           P (%)    R (%)    F-score (%)    AUC (%)
Feature kernel   83.00    80.08    81.51          88.80
Graph kernel     77.74    85.94    81.63          89.80
Tree kernel      57.38    94.14    71.30          76.00
Method 1         79.79    87.89    83.64          91.57
Method 2         79.93    85.55    82.64          90.75

Compared with the performances of the classifiers on the initial disease-symptom test set shown in Table 2, the ones achieved through Tri-Training are significantly improved. This shows that Tri-Training can exploit the unlabeled data and improve the performance more effectively. The reason is that, as mentioned in Section 1, the Tri-Training algorithm can achieve satisfactory results while neither requiring the instance space to be described with sufficient and redundant views nor putting any constraints on the supervised learning method.


Figure 7: Co-Training performance curves (F-score and AUC versus the number of iterations) of the feature kernel and graph kernel on the symptom-therapeutic substance test set.


In addition, when the three classifiers are integrated, either with the same weight or with a weight ratio of 4:4:2, higher F-scores and AUCs are obtained. Furthermore, comparing the performance of Co-Training and Tri-Training shown in Tables 3 and 4, we found that in most cases Tri-Training outperforms Co-Training. The reason is that, through employing three classifiers, Tri-Training is facilitated with good efficiency and generalization ability, because it can gracefully choose examples to label and use multiple classifiers to compose the final hypothesis [28].

3.4. The Performance of the Symptom and Therapeutic Substance Model. Table 5 shows the performances of the classifiers on the initial symptom-therapeutic substance test set. Similar to the results on the initial disease-symptom test set, the feature kernel achieves the best performance, while the tree kernel performs the worst. One difference is that, when the three classifiers are integrated with a weight ratio of 4:4:2, a higher F-score and AUC are obtained, while when they are integrated with the same weight, the F-score and AUC are a little lower than those of the feature kernel.

3.4.1. The Performance of Co-Training on the Symptom and Therapeutic Substance Test Set. As in the disease-symptom experiments, the feature set for the symptom-therapeutic substance model is also divided into three views: the feature, graph, and tree kernels. The experiments are divided into three groups. Each group uses the same experimental parameters, that is, $u = 4000$, $m = 300$, and $p = 100$. The performance curves of the different combinations are shown in Figures 7, 8, and 9, and their final results with different iteration times (27, 26, and 9, resp.) are shown in Table 6.

From the figures, we can draw similar conclusions as from the disease-symptom experiments: in most cases, the performance can be improved through the Co-Training process, although it is usually not stable, since noise is introduced during the learning process.

Table 5: The initial results on the symptom-therapeutic substance test set.

Method           P (%)    R (%)    F (%)    AUC (%)
Feature kernel   79.30    90.76    84.64    87.90
Graph kernel     76.27    90.36    82.72    87.30
Tree kernel      68.90    82.73    75.18    79.94
Method 1         75.99    92.77    83.54    87.59
Method 2         77.81    94.38    85.30    88.94


3.4.2. The Performance of Tri-Training on the Symptom and Therapeutic Substance Test Set. In the Tri-Training experiments on the symptom-therapeutic substance corpus, the parameters are set as follows: $u = 4000$, $m = 300$, $p_1 = 100$, $p_2 = 0$, and $N = 27$ ($u$, $m$, $p_1$, $p_2$, and $N$ in Algorithm 2). The results are shown in Table 7 and Figure 10.

Compared with the performance of the classifiers on the initial symptom-therapeutic substance test set shown in Table 5, the results achieved through Tri-Training are also improved, as in the disease-symptom experiments. This verifies once again that the Tri-Training algorithm is effective in utilizing the unlabeled data to boost the relation extraction performance. When the three classifiers are integrated with a weight ratio of 4:4:2, a better AUC is obtained.

Comparing the performance of Co-Training and Tri-Training on the symptom-therapeutic substance test set, as shown in Tables 6 and 7, we found that in most cases Tri-Training outperforms Co-Training, which is consistent with the results achieved in the disease-symptom experiments. This is due to the better efficiency and generalization ability of Tri-Training over Co-Training.

In addition, the performances of the classifiers on the disease-symptom corpus are improved more than those on the symptom-therapeutic substance corpus.


Figure 8: Co-Training performance curves (F-score and AUC versus the number of iterations) of the feature kernel and tree kernel on the symptom-therapeutic substance test set.

Figure 9: Co-Training performance curves (F-score and AUC versus the number of iterations) of the graph kernel and tree kernel on the symptom-therapeutic substance test set.

Figure 10: The results of Tri-Training (F-score and AUC versus the number of iterations) on the symptom-therapeutic substance test set.


Table 6: The results obtained with Co-Training on the symptom-therapeutic substance test set.

Combination                       View             P (%)    R (%)    F (%)    AUC (%)
Feature kernel and graph kernel   Feature kernel   78.00    93.98    85.25    88.41
                                  Graph kernel     71.51    98.80    82.97    86.44
                                  Combination      77.45    95.18    85.40    89.10
Feature kernel and tree kernel    Feature kernel   78.72    93.57    85.51    88.51
                                  Tree kernel      67.13    97.59    79.54    81.75
                                  Combination      77.51    96.79    85.66    88.61
Graph kernel and tree kernel      Graph kernel     74.14    95.58    83.51    87.71
                                  Tree kernel      67.82    94.78    79.06    80.14
                                  Combination      71.05    97.59    82.23    86.24

Table 7: The results of Tri-Training on the symptom-therapeutic substance test set.

Method           P (%)    R (%)    F (%)    AUC (%)
Feature kernel   78.98    93.57    85.66    88.94
Graph kernel     74.31    97.59    84.37    87.78
Tree kernel      68.01    94.78    79.19    81.10
Method 1         74.77    98.80    85.12    88.08
Method 2         75.62    98.39    85.51    89.13

There are two reasons for that. First, on the symptom-therapeutic substance corpus, the classifiers already have better initial performance; therefore, the Co-Training and Tri-Training algorithms have less room for performance improvement. Second, as the Co-Training and Tri-Training processes proceed, more unlabeled data are added into the training set, which can introduce new information for the classifiers; therefore, the recalls of the classifiers are improved. Meanwhile, more noise is also introduced, causing the precision to decline. For the initial classifiers, the higher the precision is, the less noise is introduced in the iterative process, and the more the performance of the classifier will be improved. In summary, if the initial classifiers differ substantially, the performance can be improved through the two algorithms. In the experiments, as more unlabeled data are added to the training set, the difference between the classifiers becomes smaller; thus, after a number of iterations, the performance can no longer be improved.

3.5. Some Examples of Disease-Symptom and Symptom-Therapeutic Substance Relations Extracted from Biomedical Literature. Some examples of disease-symptom and symptom-therapeutic substance relations extracted from biomedical literature are shown in Tables 8 and 9. Table 8 shows some symptoms of the disease C0020541 (portal hypertension); one sentence containing the relation between portal hypertension and its symptom C0028778 (block) is provided. Table 9 shows some relations between the symptom C0028778 (block) and some therapeutic substances, in which the sentences containing the relations are provided.

Table 8: Some disease-symptom relations extracted from biomedical literature.

Disease: C0020541 (portal hypertension).
Symptoms: C0028778 (block), C1565860, C0035357, C0005775, C0014867, and C0232338.
Example sentence (for C0028778): "C0020541 as C2825142 of intrahepatic C0028778 accounted for 83 of the patients (C0023891 65, meta-C0022346 12) and C0018920 11."

Table 9: Some symptom-therapeutic substance relations extracted from biomedical literature.

Symptom: C0028778 (block).

Therapeutic substance and sentence:
C0017302 (general anesthetic agents): "Use-dependent conduction C0028778 produced by volatile C0017302."
C0006400 (bupivacaine): "Epidural ropivacaine is known to produce less motor C0028778 compared to C0006400 at anaesthetic concentrations."
C0053241 (benzoquinone): "In contrast, C0053241 and hydroquinone led to g2-C0028778 rather than to a mitotic arrest."

4. Conclusions and Future Work

Models for extracting the relations between disease and symptom and between symptom and therapeutic substance are important for further extracting knowledge about diseases and their potential therapeutic substances. However, currently there is no corpus available to train such models. To solve the problem, we first manually annotated two training sets for extracting the relations. Then two semisupervised learning algorithms, that is, Co-Training and Tri-Training, are applied to exploit the unlabeled data to boost the performance. Experimental results show that exploiting the unlabeled data with both Co-Training and Tri-Training algorithms can enhance the performance. In particular, through employing three classifiers, Tri-Training is facilitated with good efficiency and generalization ability, since it can gracefully choose examples to label and use multiple classifiers to compose the final hypothesis [28]. In addition, its applicability is wide, because it neither requires sufficient and redundant views nor puts any constraint on the employed supervised learning algorithm.

In future work, we will study more effective semisupervised learning methods to exploit the numerous unlabeled data in the biomedical literature. On the other hand, we will apply the disease-symptom and symptom-therapeutic substance models to extract the relations between diseases and therapeutic substances from biomedical literature and to predict potential therapeutic substances for certain diseases [41].

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this article.

Acknowledgments

This work is supported by grants from the Natural Science Foundation of China (nos. 61272373, 61070098, 61340020, 61572102, and 61572098), the Trans-Century Training Program Foundation for the Talents by the Ministry of Education of China (NCET-13-0084), the Fundamental Research Funds for the Central Universities (nos. DUT13JB09 and DUT14YQ213), and the Major State Research Development Program of China (no. 2016YFC0901902).

References

[1] D. Hristovski, B. Peterlin, J. A. Mitchell, and S. M. Humphrey, "Using literature-based discovery to identify disease candidate genes," International Journal of Medical Informatics, vol. 74, no. 2–4, pp. 289–298, 2005.
[2] M. N. Prichard and C. Shipman Jr., "A three-dimensional model to analyze drug-drug interactions," Antiviral Research, vol. 14, no. 4-5, pp. 181–205, 1990.
[3] Q.-C. Bui, S. Katrenko, and P. M. A. Sloot, "A hybrid approach to extract protein-protein interactions," Bioinformatics, vol. 27, no. 2, pp. 259–265, 2011.
[4] M. Krallinger, F. Leitner, C. Rodriguez-Penagos, and A. Valencia, "Overview of the protein-protein interaction annotation extraction task of BioCreative II," Genome Biology, vol. 9, supplement 2, article S4, 2008.
[5] I. Segura-Bedmar, P. Martinez, and D. Sanchez-Cisneros, "The 1st DDIExtraction-2011 challenge task: extraction of drug-drug interactions from biomedical texts," in Proceedings of the 1st Challenge Task on Drug-Drug Interaction Extraction (DDIExtraction '11), pp. 1–9, Huelva, Spain, September 2011.
[6] I. Segura-Bedmar, P. Martinez, and M. Herrero-Zazo, SemEval-2013 Task 9: Extraction of Drug-Drug Interactions from Biomedical Texts (DDIExtraction 2013), Association for Computational Linguistics, 2013.
[7] M. Krallinger and A. Valencia, "Text-mining and information-retrieval services for molecular biology," Genome Biology, vol. 6, no. 7, article 224, 2005.
[8] T.-K. Jenssen, A. Lægreid, J. Komorowski, and E. Hovig, "A literature network of human genes for high-throughput analysis of gene expression," Nature Genetics, vol. 28, no. 1, pp. 21–28, 2001.
[9] C. Blaschke, M. A. Andrade, C. Ouzounis, and A. Valencia, "Automatic extraction of biological information from scientific text: protein-protein interactions," in Proceedings of the International Conference on Intelligent Systems for Molecular Biology (ISMB '99), pp. 60–67, 1999.
[10] P. Zweigenbaum, D. Demner-Fushman, H. Yu, and K. B. Cohen, "Frontiers of biomedical text mining: current progress," Briefings in Bioinformatics, vol. 8, no. 5, pp. 358–375, 2007.
[11] Y. T. Yen, B. Chen, H. W. Chiu, Y. C. Lee, Y. C. Li, and C. Y. Hsu, "Developing an NLP and IR-based algorithm for analyzing gene-disease relationships," Methods of Information in Medicine, vol. 45, no. 3, pp. 321–329, 2006.
[12] M. Huang, X. Zhu, Y. Hao, D. G. Payan, K. Qu, and M. Li, "Discovering patterns to extract protein-protein interactions from full texts," Bioinformatics, vol. 20, no. 18, pp. 3604–3612, 2004.
[13] I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun, "Support vector machine learning for interdependent and structured output spaces," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 823–830, Alberta, Canada, July 2004.
[14] J. Xiao, J. Su, G. Zhou, et al., "Protein-protein interaction extraction: a supervised learning approach," in Proceedings of the 1st International Symposium on Semantic Mining in Biomedicine, pp. 51–59, Hinxton, UK, April 2005.
[15] L. A. Nielsen, "Extracting protein-protein interactions using simple contextual features," in Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology, pp. 120–121, ACM, June 2006.
[16] A. Airola, S. Pyysalo, J. Bjorne, T. Pahikkala, F. Ginter, and T. Salakoski, "All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning," BMC Bioinformatics, vol. 9, no. 11, article S2, 2008.
[17] L. Qian and G. Zhou, "Tree kernel-based protein-protein interaction extraction from biomedical literature," Journal of Biomedical Informatics, vol. 45, no. 3, pp. 535–543, 2012.
[18] S. Kim, J. Yoon, J. Yang, and S. Park, "Walk-weighted subsequence kernels for protein-protein interaction extraction," BMC Bioinformatics, vol. 11, no. 1, article 107, 2010.
[19] D. J. Miller and H. S. Uyar, "A mixture of experts classifier with learning based on both labelled and unlabelled data," in Advances in Neural Information Processing Systems, pp. 571–577, 1997.
[20] T. Joachims, "Transductive inference for text classification using support vector machines," in Proceedings of the 16th International Conference on Machine Learning (ICML '99), pp. 200–209, 1999.
[21] X. Zhu, Z. Ghahramani, and J. Lafferty, "Semi-supervised learning using Gaussian fields and harmonic functions," in Proceedings of the 20th International Conference on Machine Learning (ICML '03), vol. 3, pp. 912–919, Washington, DC, USA, August 2003.
[22] A. Blum and T. Mitchell, "Combining labeled and unlabeled data with co-training," in Proceedings of the 11th Annual Conference on Computational Learning Theory (COLT '98), pp. 92–100, ACM, 1998.
[23] W. Wang and Z.-H. Zhou, "Co-training with insufficient views," in Proceedings of the Asian Conference on Machine Learning, pp. 467–482, 2013.
[24] W. Wang and Z.-H. Zhou, "A new analysis of co-training," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 1135–1142, June 2010.
[25] W. Wang and Z.-H. Zhou, "Analyzing co-training style algorithms," in Machine Learning: ECML 2007, vol. 4701 of Lecture Notes in Computer Science, pp. 454–465, Springer, Berlin, Germany, 2007.
[26] D. Pierce and C. Cardie, "Limitations of co-training for natural language learning from large datasets," in Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing, pp. 1–9, Pittsburgh, Pa, USA, 2001.
[27] S. Kiritchenko and S. Matwin, "Email classification with co-training," in Proceedings of the Conference of the Center for Advanced Studies on Collaborative Research (CASCON '01), pp. 301–312, IBM, Toronto, Canada, November 2001.
[28] Z.-H. Zhou and M. Li, "Tri-training: exploiting unlabeled data using three classifiers," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 11, pp. 1529–1541, 2005.
[29] D. Mavroeidis, K. Chaidos, S. Pirillos, et al., "Using tri-training and support vector machines for addressing the ECML/PKDD 2006 discovery challenge," in Proceedings of the ECML-PKDD Discovery Challenge Workshop, pp. 39–47, Berlin, Germany, 2006.
[30] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.
[31] R. E. Schapire, "The strength of weak learnability," Machine Learning, vol. 5, no. 2, pp. 197–227, 1990.
[32] Y. Dang, Y. Zhang, and H. Chen, "A lexicon-enhanced method for sentiment classification: an experiment on online product reviews," IEEE Intelligent Systems, vol. 25, no. 4, pp. 46–53, 2010.
[33] Y. Yang and J. O. Pedersen, "A comparative study on feature selection in text categorization," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), vol. 97, pp. 412–420, Morgan Kaufmann, San Mateo, Calif, USA, 1997.
[34] D. Klein and C. D. Manning, "Accurate unlexicalized parsing," in Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, vol. 1, pp. 423–430, Association for Computational Linguistics, Sapporo, Japan, July 2003.
[35] M. Zhang, J. Zhang, J. Su, et al., "A composite kernel to extract relations between entities with both flat and structured features," in Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pp. 825–832, Association for Computational Linguistics, 2006.
[36] National Library of Medicine, "Semantic MEDLINE Database," http://skr3.nlm.nih.gov/SemMedDB.
[37] T. C. Rindflesch and M. Fiszman, "The interaction of domain knowledge and linguistic structure in natural language processing: interpreting hypernymic propositions in biomedical text," Journal of Biomedical Informatics, vol. 36, no. 6, pp. 462–477, 2003.
[38] J. Carletta, "Assessing agreement on classification tasks: the kappa statistic," Computational Linguistics, vol. 22, no. 2, pp. 248–254, 1996.
[39] J. A. Hanley and B. J. McNeil, "The meaning and use of the area under a receiver operating characteristic (ROC) curve," Radiology, vol. 143, no. 1, pp. 29–36, 1982.
[40] A. P. Bradley, "The use of the area under the ROC curve in the evaluation of machine learning algorithms," Pattern Recognition, vol. 30, no. 7, pp. 1145–1159, 1997.
[41] D. R. Swanson, "Fish oil, Raynaud's syndrome, and undiscovered public knowledge," Perspectives in Biology and Medicine, vol. 30, no. 1, pp. 7–18, 1986.


Page 4: Research Article Semisupervised Learning Based Disease …downloads.hindawi.com/journals/bmri/2016/3594937.pdf · 2019. 7. 30. · models for extracting the relations between the

4 BioMed Research International

ENTRY1_IPNN_IP

interactsVBZ

withIN

ENTRYNN

Disassemble_IPVB_IP

toT

ENTRY2_IPNN_IP

filaments_IPNNS_IP

auxnsubj Nmod_IPPrep_with

xcomp

Xubj_IP Dobj_IP

03 03 03 03 03 03

03 03

09 09 09 09

09 09

Interacts_MVBZ_M

With_MIN_M

ENTRY_MNN_M

disassemble_MVB_IP_M

to_MTO_M

ENTRY2NN

filaments_ANNS_A

ENTRY1NN

09 09 09 09 09 09 09

Figure 2 Graph kernel with two directed subgraphs The candidate interaction pair is marked as ldquoENTRY1rdquo and ldquoENTRY2rdquo In thedependency based subgraph all nodes in a shortest path are specialized using a post-tag (IP) In the linear order subgraph possible tagsare (B)efore (M)iddle and (A)fter

the dependency structure and linear sequence of a sentencerespectively

In our method a simple weight allocation strategy ischosen that is the edges of the shortest path are assigned aweight of 09 other edges 03 all edges in the linear order sub-graph 09The representation thus allows us to emphasize theshortest pathwithout completely disregarding potentially rel-evant words outside of the path A graph kernel calculates thesimilarity between two input graphs by comparing the rela-tions between common vertices (nodes) A graph matrix119866 iscalculated as

119866 = 119871infin

sum119899=1

119860119899119871119879 (2)

where 119860 is an edge matrix whose rows and columns areindexed vertices119860 119894119895 is a weight if edge119881119894 is connected to edge119881119895 119871 is the label matrix whose row indicates the label and col-umn indicates the vertex 119871 119894119895 = 1 indicates that vertex119881119895 con-tains 119894th label The graph kernel 119870(119866 1198661015840) is defined by usingtwo input graph matrices 119866 and 1198661015840 [15]

119870(1198661198661015840) =119871

sum119894=1

119871

sum119895=1

1198661198941198951198661015840119894119895 (3)

24 Co-Training Algorithm The initial Co-Training algo-rithm (or standard Co-Training algorithm) was proposed byBlum and Mitchell [22] They assumed that the training sethas two sufficient and redundant views namely the set ofattributes meets two conditions First each attribute set issufficient to describe the problem that is if the training set issufficient each attribute set is able to learn a strong classifierSecond each attribute set is conditionally independent ofthe other given the class label Our Co-Training algorithm isdescribed in Algorithm 1

Algorithm 1 (Co-Training algorithm)

(1) Input is as followsThe labeled data 119871 and the unlabeled data 119880Initialize training set 1198711 1198712 (1198711 = 1198712 = 119871)Sufficient and redundant views 1198811 1198812Iteration number N

(2) Process is as follows

(21) Create a pool 119906 of examples by choosing 119899examples at random from 119880 119880 = 119880 minus 119906

(22) Use 1198711 to train a classifier ℎ1 in 1198811Use 1198712 to train a classifier ℎ2 in 1198812

(23) Use ℎ1 and ℎ2 to label the examples from u(24) Take119898 positive examples and119898 negative exam-

ples out which were consistently labeled by ℎ1and ℎ2 Then take 119901 positive examples out fromthe119898 positive examples and add them to 1198711 and1198712 respectively Choose 2119898 examples from119880 toreplenish u 119880 = 119880 minus 2119898119873 = 119873 minus 1

(25) Repeat the processes (22)ndash(24) until the unla-beled corpora 119880 are empty or the number ofunlabeled data in 119906 is less than a certain numberor119873 = 0

(3) Outputs are as followsThe classifiers ℎ1 and ℎ2

25 Tri-Training Algorithm The Co-Training algorithmrequires two sufficient and redundant views However thisconstraint does not exist in most real-world scenarios TheTri-Training algorithm neither requires the instance space tobe described with sufficient and redundant views and norputs any constraints on the supervised learning algorithm[28] In this algorithm three classifiers are used which can

BioMed Research International 5

Table 1 The details of two corpora

Corpus Training set Test set Unlabeled dataPositive Negative Positive Negative Total

Diseases and symptoms 299 299 249 250 19298Symptoms and therapeutic substances 300 300 249 249 19392

tackle the problem of determining how to label the unlabeleddata and produce the final hypothesis Our Tri-Trainingalgorithm is described in Algorithm 2

In addition the different classifiers calculate the similaritywith different aspects between the two sentences Combiningthe similarities can reduce the danger of missing importantfeatures Therefore in each Tri-Training round two differentensemble strategies are used to integrate the three classifiersfor further performance improvementThe first strategy inte-grates the classifiers with a simple votingmethodThe secondstrategy assigns each classifier with a different weight Thenthe normalized output 119870 of three classifier outputs 119870119898 (119898 =1 2 3) is defined as

119870 =119872

sum119898=1

120590119898119870119898

119872

sum119898=1

120590119898 = 1 120590119898 ge 0 forall119898(4)

where119872 represents the number of classifiers (119872 = 3 in ourmethod)

Algorithm 2 (Tri-Training algorithm)

(1) Input is as followsThe labeled data L and the unlabeled data UInitializing training set 1198711 1198712 1198713 (1198711 = 1198712 = 1198713 = 119871)Selecting views 1198811 1198812 and 1198813Iterations number N

(2) Process is as follows

(21) Create a pool 119906 of examples by choosing 119899examples at random from 119880 119880 = 119880 minus 119906

(22) Use 1198711 to train a classifier ℎ1 in 1198811Use 1198712 to train a classifier ℎ2 in 1198812Use 1198713 to train a classifier ℎ3 in 1198813

(23) Use ℎ1 ℎ2 and ℎ3 to label examples from 119906(24) Take119898 positive examples and119898 negative exam-

ples out which were consistently labeled byℎ1 ℎ2 and ℎ3 Then take 1199011 positive examplesfrom the 119898 positive examples and add them to1198711 1198712 and 1198713 respectively take 1199012 negativeexamples from the119898 negative examples and addthem to 1198711 1198712 and 1198713 respectively Choose 2119898examples from 119880 to replenish 119906 119880 = 119880 minus 2119898119873 = 119873 minus 1

(25) Repeat the processes (22)ndash(24) until the unla-beled corpora 119880 are empty or the number ofunlabeled data in 119906 is less than a certain numberor119873 = 0

(3) Outputs are as followsThe classifiers ℎ1 ℎ2 and ℎ3

3 Experiments and Results

31 Experimental Datasets In our experiments the diseaseand symptom corpus data was obtained through searchingSemantic MEDLINE Database [36] using 200 concepts cho-sen from MeSH (Medical Subject Headings) with semantictype ldquoDisease or Syndromerdquo Since these sentences (corpusdata) have been processed by SemRep [37] a natural languageprocessing tool based on the rule to identify relationshipin the MEDLINE documents the possibility of the relationbetween the two concept entities in the sentences is high Tolimit the semantic types of two concept entities in a sentencewe only preserved the sentences containing the concepts ofthe needed semantic types (ie biologic function cell functionfinding molecular function organism function organ or tissuefunction pathologic function phenomenon or process andphysiologic function) Finally we obtained a total of about20400 sentences from which we manually constructed twolabeled datasets as the initial training set 119879initial (598 labeledsentences as shown in Table 1) and test set (499 labeledsentences) respectively

During the manual annotation the following criteria areapplied the disease and symptom relationship indicates thatthe symptom is a physiological phenomenon of the diseaseIf an instance in a sentence semantically expresses the diseaseand symptom relationship it is labeled as a positive exampleAs in the example provided in Section 1 the sentence ldquomanyblood- and blood vessel-related characteristics are typical forRaynaud patients blood viscosity and platelet aggregabilityare highrdquo contains two positive examples that is Raynaudand blood viscosity and Raynaud and platelet aggregability Inaddition some special relationships such as ldquoB in Ardquo and ldquoAcan change Brdquo are also classified as the positive examples sincethey show a physiological phenomenon (B) occurs whensomeone has the disease (A) However if a relation in asentence is only a cooccurrence one it is labeled as a negativeexample For the patterns such as ldquoA is a Brdquo and ldquoA and Brdquothey are labeled as the negative examples since ldquoA is a Brdquo isa ldquoIS Ardquo relation and ldquoA and Brdquo is a coordination relationwhich are not the relations we need

The symptom-therapeutic substance corpus data wasobtained as follows First some ldquoAlzheimerrsquos diseaserdquo relatedsymptom terms were obtained from the Semantic MEDLINE

6 BioMed Research International

Database Then these symptom terms were used to searchthe database for the sentences which contain the query termsand terms belonging to the semantic types of therapeuticsubstance (eg pharmacologic substance and organic chemi-cal) We obtained about 20500 sentences and then manuallyannotated about 1100 sentences as the disease-symptomcorpora 600 labeled sentences are used as the initial trainingset and the remaining 498 labeled sentences as the test setSimilar to the disease and symptom relationship annotationthe following criteria are applied the symptom-therapeuticsubstance relationship indicates that a therapeutic substancecan relieve a physiological phenomenon If an instance ina sentence semantically expresses the symptom-therapeuticsubstance relationship it is labeled as a positive example Asin the example provided in Section 1 the sentence ldquofish oiland its active ingredient eicosapentaenoic acid (EPA) loweredblood viscosityrdquo contains two positive examples that is fish oiland blood viscosity and EPA and blood viscosity

When the manual annotation process was completedthe level of agreement was estimated Cohenrsquos kappa scoresbetween each annotator of two corpora are 0866 and 0903respectively and content analysis researchers generally thinkof aCohenrsquos kappa scoremore than 08 as good reliability [38]In addition the two corpora are available for academic use(see Supplementary Material available online at httpdxdoiorg10115520163594937)

32 Experimental Evaluation The evaluation metrics usedin our experiments are precision (P) recall (R) F-score (F)and Area under Roc Curve (AUC) [39] They are defined asfollows

119875 = TPTP + FP (5)

119877 = TPTP + FN (6)

119865 = 2 lowast 119875 lowast 119877119875 + 119877 (7)

AUC =sum119898+119894=1sum

119898minus

119895=1119867(119909119894 minus 119910119895)119898+119898minus

(8)

where TP denotes true interaction pair TN denotes truenoninteraction pair FP denotes false interaction pair andFN denotes false noninteraction pair F-score is the balancedmeasure for quantifying the performance of the systems Inaddition the AUC is also used to evaluate the performance ofour method It is not affected by the distribution of data andit has been advocated to be used for performance evaluationin the machine learning community [40] In formula (8)119898+and 119898minus are the numbers of positive and negative examplesrespectively and 1199091 119909119898

+are the outputs of the system for

the positive examples and 1199101 119910119898minusare the ones for the

negative examples The function119867(119903) is defined as follows

119867(119903) =

1 119903 gt 005 119903 = 00 119903 lt 0

(9)

Table 2The initial results on the disease-symptom test set Method1 integrates three classifiers (the feature kernel graph kernel and treekernel) with the same weight while Method 2 integrates them witha weight ratio of 4 4 2

Method 119875 119877 119865-score AUCFeature kernel 9138 6211 7395 8713Graph kernel 9387 5977 7304 8721Tree kernel 6910 6289 6585 7337Method 1 9205 6328 7500 8947Method 2 9281 6055 7329 8974

33 The Initial Performance of the Disease-Symptom ModelTable 2 shows the performance of the classifiers on the initialdisease-symptom test set Feature kernel and graph kernelachieve almost the same performance which is better thanthat of tree kernel When the three classifiers are integratedwith the sameweight the higher F-score (7500) is obtainedwhile when they are integrated with a weight ratio of 4 4 2the F-score is a bit lower than that of feature kernel Howeverin both cases the AUC performances are improved whichshows that since different classifiers calculate the similaritywith different aspects between two sentences combiningthese similarities can boost the performance

331The Performance of Co-Training on theDisease-SymptomTest Set In our method the feature set for the disease-symptom model is divided into three views the featurekernel graph kernel and tree kernel In Co-Training exper-iments to compare the results of each combination of twoviews the experiments are divided into three groups as shownin Table 3 Each group uses same experimental parametersthat is u = 4000 m = 300 and p = 100 (u m and 119901 inAlgorithm 1) The performance curves of different combina-tions are shown in Figures 3 4 and 5 respectively and theirfinal results with different iteration times (13 27 and 22 resp)are shown in Table 3

From Figures 3 4 and 5 we can obtain the followingobservations (1) With the increase of the iteration timeand more unlabeled data added to the training set the F-score shows a rising trend The reason is that as the Co-Training process proceeds more and more unlabeled dataare labelled by one classifier for the other which improvesthe performance of both classifiers However after a numberof iterations the performance of the classifiers could not beimproved any more since too much noise (false positives andfalse negatives) may be introduced from the unlabeled data(2)The AUC of classifiers have different trends with differentcombinations of the views The AUC of the feature kernelfluctuate around 88 while the ones of the graph kernel fluc-tuate between 85 and 87 In contrast all of the tree kernelrsquosAUC have a rising trend since the performance of the initialtree kernel classifier is relatively low and then improved withthe relatively accurate labelled data provided by feature kernelor graph kernel

In fact the performance of semisupervised learningalgorithms is usually not stable because the unlabeled exam-ples may often be wrongly labeled during the learning

BioMed Research International 7

Feature kernelGraph kernel

Feature kernelGraph kernel

64

66

68

70

72

74

76

78F

-sco

re (

)

1 2 3 4 5 6 7 8 9 10 11 12 130Iterators

72

74

76

78

80

82

84

86

88

90

AUC

()

1 2 3 4 5 6 7 8 9 10 11 12 130Iterators

Figure 3 Co-Training performance curve of feature kernel and graph kernel on the disease-symptom test set

Feature kernelTree kernel

Feature kernelTree kernel

2 4 6 8 10 12 14 16 18 20 22 24 260Iterators

72747678808284868890

AUC

()

2 4 6 8 10 12 14 16 18 20 22 24 260Iterators

64

66

68

70

72

74

76

78

F-s

core

()

Figure 4 Co-Training performance curve of feature kernel and tree kernel on the disease-symptom test set

Graph kernelTree kernel

Graph kernelTree kernel

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 220Iterators

64

66

68

70

72

74

76

78

F-s

core

()

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 220Iterators

72

74

76

78

80

82

84

86

88

90

AUC

()

Figure 5 Co-Training performance curve of graph kernel and tree kernel on the disease-symptom test set

8 BioMed Research International

Table 3The results obtained with Co-Training on the disease-symptom test set Combinationmethod integrates three classifiers (the featurekernel graph kernel and tree kernel) with the same weight

Combination View 119875 119877 119865-score AUC

Feature and graph kernelFeature kernel 8832 6797 7682 8801Graph kernel 8326 7188 7715 8754Combination 7491 8516 7971 8866

Feature and tree kernelFeature kernel 8606 6992 7715 8851Tree kernel 5780 9258 7117 7499Combination 7508 8711 8065 8718

Graph and tree kernelGraph kernel 8404 6992 7633 8604Tree kernel 5810 9531 7219 7810Combination 8243 7695 7960 8684

Feature kernelGraph kernelTree kernel

Feature kernelGraph kernelTree kernel

2 4 6 8 10 12 14 16 18 20 22 24 260Iterators

64

66

68

70

72

74

76

78

80

82

84

F-s

core

()

72

74

76

78

80

82

84

86

88

90

AUC

()

2 4 6 8 10 12 14 16 18 20 22 24 260Iterators

Figure 6 The Tri-Training performance on the disease-symptom test set

process [28] At the beginning of the Co-Training the num-ber of the noises is limited and unlabeled data added to thetraining set can help the classifiers improve the performanceHowever after a number of learning rounds more and morenoises introduced will cause the performance decline

332 The Performance of Tri-Training on the Disease-Symptom Test Set In our method we select three views toconduct the Tri-Training that is the feature kernel graphkernel and tree kernel In each Tri-Training round SVM isused to train the classifier on each view The parameters areset as follows u = 4000m = 300 1199011 = 100 1199012 = 0 and119873 =27 (u m 1199011 1199012 and119873 in Algorithm 2) Here 1199012 = 0 meansthat only the positive examples are added into the trainingset In this way the recall of the classifier can be improved (therecall is defined as the number of true positives divided by thetotal number of examples that actually belong to the positiveclass and usually more positive examples in the training setwill improve the recall) since it is lower compared with theprecision (see Table 2) The results are shown in Table 4 andFigure 6

Table 4: The results obtained with Tri-Training on the disease-symptom test set. Method 1 integrates the three classifiers (the feature kernel, graph kernel, and tree kernel) with the same weight, while Method 2 integrates them with a weight ratio of 4:4:2.

Method           P (%)   R (%)   F-score (%)   AUC (%)
Feature kernel   83.00   80.08   81.51         88.80
Graph kernel     77.74   85.94   81.63         89.80
Tree kernel      57.38   94.14   71.30         76.00
Method 1         79.79   87.89   83.64         91.57
Method 2         79.93   85.55   82.64         90.75
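As a quick arithmetic check, each F-score in Table 4 is the harmonic mean of the precision and recall beside it; for Method 1, for instance,

\[ F = \frac{2PR}{P+R} = \frac{2 \times 79.79 \times 87.89}{79.79 + 87.89} \approx 83.64. \]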

Compared with the performances of the classifiers on the initial disease-symptom test set shown in Table 2, the ones achieved through Tri-Training are significantly improved. This shows that Tri-Training can exploit the unlabeled data and improve the performance more effectively. The reason is that, as mentioned in Section 1, the Tri-Training algorithm can achieve satisfactory results while neither requiring the instance space to be described with sufficient and redundant views nor putting any constraints on the supervised learning method.


[Figure 7: Co-Training performance curves (F-score (%) and AUC (%) versus iteration, 0-26) of the feature kernel and graph kernel on the symptom-therapeutic substance test set.]


In addition, when the three classifiers are integrated, either with the same weight or with a weight ratio of 4:4:2, higher F-scores and AUCs are obtained. Furthermore, comparing the performance of Co-Training and Tri-Training shown in Tables 3 and 4, we found that in most cases Tri-Training outperforms Co-Training. The reason is that, by employing three classifiers, Tri-Training achieves good efficiency and generalization ability, because it can gracefully choose examples to label and use multiple classifiers to compose the final hypothesis [28].
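The weighted integration referred to here is the normalized combination K = Σm σm Km with Σ σm = 1 introduced earlier in the paper. A minimal sketch in Python, assuming each classifier outputs one decision score per example (the names are ours):

import numpy as np

def combine_scores(score_matrix, weights=(4, 4, 2)):
    # score_matrix: shape (3, n_examples), one row of output scores per
    # classifier (feature, graph, and tree kernel, in that order).
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so that sigma_1 + sigma_2 + sigma_3 = 1
    return w @ np.asarray(score_matrix, dtype=float)  # K = sum_m sigma_m K_m

A weight ratio of 4:4:2 thus becomes sigma = (0.4, 0.4, 0.2), while "the same weight" corresponds to sigma = (1/3, 1/3, 1/3).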

3.4. The Performance of the Symptom and Therapeutic Substance Model. Table 5 shows the performances of the classifiers on the initial symptom-therapeutic substance test set. Similar to the results on the initial disease-symptom test set, the feature kernel achieves the best performance while the tree kernel performs the worst. One difference is that, when the three classifiers are integrated with a weight ratio of 4:4:2, a higher F-score and AUC are obtained, while when they are integrated with the same weight, the F-score and AUC are a little lower than those of the feature kernel.

3.4.1. The Performance of Co-Training on the Symptom and Therapeutic Substance Test Set. As in the disease-symptom experiments, the feature set for the symptom-therapeutic substance model is divided into three views: the feature, graph, and tree kernels. The experiments are divided into three groups, and each group uses the same experimental parameters, that is, u = 4000, m = 300, and p = 100. The performance curves of the different combinations are shown in Figures 7, 8, and 9, and their final results with different iteration times (27, 26, and 9, resp.) are shown in Table 6.
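For comparison with the Tri-Training sketch above, here is the flavor of one Co-Training round for a pair of views. The paper's Algorithm 1 differs in bookkeeping details (pool replenishment, how negative examples are handled), so this sketch is only indicative, with illustrative names and scikit-learn SVMs standing in for the kernel classifiers:

import numpy as np
from sklearn.svm import SVC

def co_training_round(L1_X, L2_X, L_y, U1_X, U2_X, p=100):
    # Two views of the same labeled examples (L1_X, L2_X with labels L_y)
    # and of the same pooled unlabeled examples (U1_X, U2_X, rows aligned).
    clf1 = SVC().fit(L1_X, L_y)
    clf2 = SVC().fit(L2_X, L_y)
    # Each classifier picks the p pooled examples it is most confident
    # are positive; the selected examples augment both training sets.
    pick1 = np.argsort(clf1.decision_function(U1_X))[-p:]
    pick2 = np.argsort(clf2.decision_function(U2_X))[-p:]
    chosen = np.union1d(pick1, pick2)
    new_y = np.concatenate([L_y, np.ones(len(chosen), dtype=int)])
    new_L1_X = np.vstack([L1_X, U1_X[chosen]])
    new_L2_X = np.vstack([L2_X, U2_X[chosen]])
    return new_L1_X, new_L2_X, new_y, chosen  # drop `chosen` from the pool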

From the figures we can draw conclusions similar to those from the disease-symptom experiments: in most cases the performance can be improved through the Co-Training process, although it is usually not stable, since noise is introduced during the learning process.

Table 5: The initial results on the symptom-therapeutic substance test set.

Method           P (%)   R (%)   F (%)   AUC (%)
Feature kernel   79.30   90.76   84.64   87.90
Graph kernel     76.27   90.36   82.72   87.30
Tree kernel      68.90   82.73   75.18   79.94
Method 1         75.99   92.77   83.54   87.59
Method 2         77.81   94.38   85.30   88.94


3.4.2. The Performance of Tri-Training on the Symptom and Therapeutic Substance Test Set. In the Tri-Training experiments on the symptom-therapeutic substance corpus, the parameters are set as follows: u = 4000, m = 300, p1 = 100, p2 = 0, and N = 27 (u, m, p1, p2, and N in Algorithm 2). The results are shown in Table 7 and Figure 10.

Compared with the performance of the classifiers on the initial symptom-therapeutic substance test set shown in Table 5, the ones achieved through Tri-Training are also improved, as in the disease-symptom experiments. This verifies once again that the Tri-Training algorithm is effective in utilizing the unlabeled data to boost the relation extraction performance. When the three classifiers are integrated with a weight ratio of 4:4:2, a better AUC is obtained.

Comparing the performance of Co-Training and Tri-Training on the symptom-therapeutic substance test set, as shown in Tables 6 and 7, we found that in most cases Tri-Training outperforms Co-Training, which is consistent with the results achieved in the disease-symptom experiments. This is due to the better efficiency and generalization ability of Tri-Training over Co-Training.

In addition, the performances of the classifiers on the disease-symptom corpus are improved more than those on the symptom-therapeutic substance corpus. There are two reasons for this. First, the classifiers already perform better on the symptom-therapeutic substance corpus, so the Co-Training and Tri-Training algorithms have less room for performance improvement. Second, as the Co-Training and Tri-Training processes proceed, more unlabeled data are added into the training set, which can introduce new information for the classifiers; the recalls of the classifiers are therefore improved. Meanwhile, more noise is also introduced, causing the precision to decline. For the initial classifiers, the higher the precision, the less noise is introduced in the iterative process, and the more the performance of the classifier is improved. In summary, if the initial classifiers differ substantially, the performance can be improved through the two algorithms. In our experiments, as more unlabeled data are added to the training set, the difference between the classifiers becomes smaller; thus, after a number of iterations, the performance can no longer be improved.


[Figure 8: Co-Training performance curves (F-score (%) and AUC (%) versus iteration, 0-26) of the feature kernel and tree kernel on the symptom-therapeutic substance test set.]

[Figure 9: Co-Training performance curves (F-score (%) and AUC (%) versus iteration, 0-9) of the graph kernel and tree kernel on the symptom-therapeutic substance test set.]

[Figure 10: Tri-Training results (F-score (%) and AUC (%) versus iteration, 0-20) of the feature, graph, and tree kernels on the symptom-therapeutic substance test set.]


Table 6: The results obtained with Co-Training on the symptom-therapeutic substance test set.

Combination                      View             P (%)   R (%)   F (%)   AUC (%)
Feature kernel and graph kernel  Feature kernel   78.00   93.98   85.25   88.41
                                 Graph kernel     71.51   98.80   82.97   86.44
                                 Combination      77.45   95.18   85.40   89.10
Feature kernel and tree kernel   Feature kernel   78.72   93.57   85.51   88.51
                                 Tree kernel      67.13   97.59   79.54   81.75
                                 Combination      77.51   96.79   85.66   88.61
Graph kernel and tree kernel     Graph kernel     74.14   95.58   83.51   87.71
                                 Tree kernel      67.82   94.78   79.06   80.14
                                 Combination      71.05   97.59   82.23   86.24

Table 7: The results of Tri-Training on the symptom-therapeutic substance test set.

Method           P (%)   R (%)   F (%)   AUC (%)
Feature kernel   78.98   93.57   85.66   88.94
Graph kernel     74.31   97.59   84.37   87.78
Tree kernel      68.01   94.78   79.19   81.10
Method 1         74.77   98.80   85.12   88.08
Method 2         75.62   98.39   85.51   89.13


3.5. Some Examples of Disease-Symptom and Symptom-Therapeutic Substance Relations Extracted from the Biomedical Literature. Some examples of disease-symptom and symptom-therapeutic substance relations extracted from the biomedical literature are shown in Tables 8 and 9. Table 8 shows some symptoms of the disease C0020541 (portal hypertension); for the symptom C0028778 (block), a sentence containing the relation between portal hypertension and the symptom is provided. Table 9 shows some relations between the symptom C0028778 (block) and several therapeutic substances, together with the sentences containing the relations.

Table 8: Some disease-symptom relations extracted from the biomedical literature.

Disease                          Symptom            Sentence
C0020541 (portal hypertension)   C0028778 (block)   C0020541 as C2825142 of intrahepatic C0028778 accounted for 83% of the patients (C0023891 65%, meta-C0022346 12%) and C0018920 11%
                                 C1565860
                                 C0035357
                                 C0005775
                                 C0014867
                                 C0232338

Table 9: Some symptom-therapeutic substance relations extracted from the biomedical literature.

Symptom            Therapeutic substance                   Sentence
C0028778 (block)   C0017302 (general anesthetic agents)    Use-dependent conduction C0028778 produced by volatile C0017302
                   C0006400 (bupivacaine)                  Epidural ropivacaine is known to produce less motor C0028778 compared to C0006400 at anaesthetic concentrations
                   C0053241 (benzoquinone)                 In contrast, C0053241 and hydroquinone led to G2-C0028778 rather than to a mitotic arrest

4. Conclusions and Future Work

Models for extracting the relations between disease and symptom and between symptom and therapeutic substance are important for further extracting knowledge about diseases and their potential therapeutic substances. However, currently there is no corpus available to train such models. To solve the problem, we first manually annotated two training sets for extracting the relations. Then two semisupervised learning algorithms, that is, Co-Training and Tri-Training, were applied to exploit the unlabeled data to boost the performance. Experimental results show that exploiting the unlabeled data with both Co-Training and Tri-Training algorithms can enhance the performance.


In particular, through employing three classifiers, Tri-Training achieves good efficiency and generalization ability, since it can gracefully choose examples to label and use multiple classifiers to compose the final hypothesis [28]. In addition, its applicability is wide, because it neither requires sufficient and redundant views nor puts any constraint on the employed supervised learning algorithm.

In future work, we will study more effective semisupervised learning methods to exploit the large amount of unlabeled data in the biomedical literature. On the other hand, we will apply the disease-symptom and symptom-therapeutic substance models to extract the relations between diseases and therapeutic substances from the biomedical literature and to predict potential therapeutic substances for certain diseases [41].

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this article.

Acknowledgments

This work is supported by grants from the Natural Science Foundation of China (nos. 61272373, 61070098, 61340020, 61572102, and 61572098), the Trans-Century Training Program Foundation for the Talents by the Ministry of Education of China (NCET-13-0084), the Fundamental Research Funds for the Central Universities (nos. DUT13JB09 and DUT14YQ213), and the Major State Research Development Program of China (no. 2016YFC0901902).

References

[1] D. Hristovski, B. Peterlin, J. A. Mitchell, and S. M. Humphrey, "Using literature-based discovery to identify disease candidate genes," International Journal of Medical Informatics, vol. 74, no. 2–4, pp. 289–298, 2005.
[2] M. N. Prichard and C. Shipman Jr., "A three-dimensional model to analyze drug-drug interactions," Antiviral Research, vol. 14, no. 4-5, pp. 181–205, 1990.
[3] Q.-C. Bui, S. Katrenko, and P. M. A. Sloot, "A hybrid approach to extract protein-protein interactions," Bioinformatics, vol. 27, no. 2, pp. 259–265, 2011.
[4] M. Krallinger, F. Leitner, C. Rodriguez-Penagos, and A. Valencia, "Overview of the protein-protein interaction annotation extraction task of BioCreative II," Genome Biology, vol. 9, supplement 2, article S4, 2008.
[5] I. Segura Bedmar, P. Martinez, and D. Sanchez Cisneros, "The 1st DDIExtraction-2011 challenge task: extraction of Drug-Drug Interactions from biomedical texts," in Proceedings of the 1st Challenge Task on Drug-Drug Interaction Extraction (DDIExtraction '11), pp. 1–9, Huelva, Spain, September 2011.
[6] I. Segura-Bedmar, P. Martínez, and M. Herrero-Zazo, SemEval-2013 Task 9: Extraction of Drug-Drug Interactions from Biomedical Texts (DDIExtraction 2013), Association for Computational Linguistics, 2013.
[7] M. Krallinger and A. Valencia, "Text-mining and information-retrieval services for molecular biology," Genome Biology, vol. 6, no. 7, article 224, 2005.
[8] T.-K. Jenssen, A. Lægreid, J. Komorowski, and E. Hovig, "A literature network of human genes for high-throughput analysis of gene expression," Nature Genetics, vol. 28, no. 1, pp. 21–28, 2001.
[9] C. Blaschke, M. A. Andrade, C. Ouzounis, and A. Valencia, "Automatic extraction of biological information from scientific text: protein-protein interactions," in Proceedings of the International Conference on Intelligent Systems for Molecular Biology (ISMB '99), pp. 60–67, 1999.
[10] P. Zweigenbaum, D. Demner-Fushman, H. Yu, and K. B. Cohen, "Frontiers of biomedical text mining: current progress," Briefings in Bioinformatics, vol. 8, no. 5, pp. 358–375, 2007.
[11] Y. T. Yen, B. Chen, H. W. Chiu, Y. C. Lee, Y. C. Li, and C. Y. Hsu, "Developing an NLP and IR-based algorithm for analyzing gene-disease relationships," Methods of Information in Medicine, vol. 45, no. 3, pp. 321–329, 2006.
[12] M. Huang, X. Zhu, Y. Hao, D. G. Payan, K. Qu, and M. Li, "Discovering patterns to extract protein-protein interactions from full texts," Bioinformatics, vol. 20, no. 18, pp. 3604–3612, 2004.
[13] I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun, "Support vector machine learning for interdependent and structured output spaces," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 823–830, Alberta, Canada, July 2004.
[14] J. Xiao, J. Su, G. Zhou et al., "Protein-protein interaction extraction: a supervised learning approach," in Proceedings of the 1st International Symposium on Semantic Mining in Biomedicine, pp. 51–59, Hinxton, UK, April 2005.
[15] L. A. Nielsen, "Extracting protein-protein interactions using simple contextual features," in Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology, pp. 120–121, ACM, June 2006.
[16] A. Airola, S. Pyysalo, J. Björne, T. Pahikkala, F. Ginter, and T. Salakoski, "All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning," BMC Bioinformatics, vol. 9, supplement 11, article S2, 2008.
[17] L. Qian and G. Zhou, "Tree kernel-based protein-protein interaction extraction from biomedical literature," Journal of Biomedical Informatics, vol. 45, no. 3, pp. 535–543, 2012.
[18] S. Kim, J. Yoon, J. Yang, and S. Park, "Walk-weighted subsequence kernels for protein-protein interaction extraction," BMC Bioinformatics, vol. 11, no. 1, article 107, 2010.
[19] D. J. Miller and H. S. Uyar, "A mixture of experts classifier with learning based on both labelled and unlabelled data," in Advances in Neural Information Processing Systems, pp. 571–577, 1997.
[20] T. Joachims, "Transductive inference for text classification using support vector machines," in Proceedings of the 16th International Conference on Machine Learning (ICML '99), pp. 200–209, 1999.
[21] X. Zhu, Z. Ghahramani, and J. Lafferty, "Semi-supervised learning using Gaussian fields and harmonic functions," in Proceedings of the 20th International Conference on Machine Learning (ICML '03), vol. 3, pp. 912–919, Washington, DC, USA, August 2003.
[22] A. Blum and T. Mitchell, "Combining labeled and unlabeled data with co-training," in Proceedings of the 11th Annual Conference on Computational Learning Theory (COLT '98), pp. 92–100, ACM, 1998.
[23] W. Wang and Z.-H. Zhou, "Co-training with insufficient views," in Proceedings of the Asian Conference on Machine Learning, pp. 467–482, 2013.
[24] W. Wang and Z.-H. Zhou, "A new analysis of co-training," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 1135–1142, June 2010.
[25] W. Wang and Z.-H. Zhou, "Analyzing co-training style algorithms," in Machine Learning: ECML 2007, vol. 4701 of Lecture Notes in Computer Science, pp. 454–465, Springer, Berlin, Germany, 2007.
[26] D. Pierce and C. Cardie, "Limitations of co-training for natural language learning from large datasets," in Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing, pp. 1–9, Pittsburgh, Pa, USA, 2001.
[27] S. Kiritchenko and S. Matwin, "Email classification with co-training," in Proceedings of the Conference of the Center for Advanced Studies on Collaborative Research (CASCON '01), pp. 301–312, IBM, Toronto, Canada, November 2011.
[28] Z.-H. Zhou and M. Li, "Tri-training: exploiting unlabeled data using three classifiers," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 11, pp. 1529–1541, 2005.
[29] D. Mavroeidis, K. Chaidos, S. Pirillos et al., "Using tri-training and support vector machines for addressing the ECML/PKDD 2006 discovery challenge," in Proceedings of the ECML-PKDD Discovery Challenge Workshop, pp. 39–47, Berlin, Germany, 2006.
[30] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.
[31] R. E. Schapire, "The strength of weak learnability," Machine Learning, vol. 5, no. 2, pp. 197–227, 1990.
[32] Y. Dang, Y. Zhang, and H. Chen, "A lexicon-enhanced method for sentiment classification: an experiment on online product reviews," IEEE Intelligent Systems, vol. 25, no. 4, pp. 46–53, 2010.
[33] Y. Yang and J. O. Pedersen, "A comparative study on feature selection in text categorization," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), vol. 97, pp. 412–420, Morgan Kaufmann, San Mateo, Calif, USA, 1997.
[34] D. Klein and C. D. Manning, "Accurate unlexicalized parsing," in Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, vol. 1, pp. 423–430, Association for Computational Linguistics, Sapporo, Japan, July 2003.
[35] M. Zhang, J. Zhang, J. Su et al., "A composite kernel to extract relations between entities with both flat and structured features," in Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pp. 825–832, Association for Computational Linguistics, 2006.
[36] National Library of Medicine, "Semantic MEDLINE Database," https://skr3.nlm.nih.gov/SemMedDB.
[37] T. C. Rindflesch and M. Fiszman, "The interaction of domain knowledge and linguistic structure in natural language processing: interpreting hypernymic propositions in biomedical text," Journal of Biomedical Informatics, vol. 36, no. 6, pp. 462–477, 2003.
[38] J. Carletta, "Assessing agreement on classification tasks: the kappa statistic," Computational Linguistics, vol. 22, no. 2, pp. 248–254, 1996.
[39] J. A. Hanley and B. J. McNeil, "The meaning and use of the area under a receiver operating characteristic (ROC) curve," Radiology, vol. 143, no. 1, pp. 29–36, 1982.
[40] A. P. Bradley, "The use of the area under the ROC curve in the evaluation of machine learning algorithms," Pattern Recognition, vol. 30, no. 7, pp. 1145–1159, 1997.
[41] D. R. Swanson, "Fish oil, Raynaud's syndrome, and undiscovered public knowledge," Perspectives in Biology and Medicine, vol. 30, no. 1, pp. 7–18, 1986.



In fact the performance of semisupervised learningalgorithms is usually not stable because the unlabeled exam-ples may often be wrongly labeled during the learning

BioMed Research International 7

Feature kernelGraph kernel

Feature kernelGraph kernel

64

66

68

70

72

74

76

78F

-sco

re (

)

1 2 3 4 5 6 7 8 9 10 11 12 130Iterators

72

74

76

78

80

82

84

86

88

90

AUC

()

1 2 3 4 5 6 7 8 9 10 11 12 130Iterators

Figure 3 Co-Training performance curve of feature kernel and graph kernel on the disease-symptom test set

Feature kernelTree kernel

Feature kernelTree kernel

2 4 6 8 10 12 14 16 18 20 22 24 260Iterators

72747678808284868890

AUC

()

2 4 6 8 10 12 14 16 18 20 22 24 260Iterators

64

66

68

70

72

74

76

78

F-s

core

()

Figure 4 Co-Training performance curve of feature kernel and tree kernel on the disease-symptom test set

Graph kernelTree kernel

Graph kernelTree kernel

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 220Iterators

64

66

68

70

72

74

76

78

F-s

core

()

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 220Iterators

72

74

76

78

80

82

84

86

88

90

AUC

()

Figure 5 Co-Training performance curve of graph kernel and tree kernel on the disease-symptom test set

8 BioMed Research International

Table 3The results obtained with Co-Training on the disease-symptom test set Combinationmethod integrates three classifiers (the featurekernel graph kernel and tree kernel) with the same weight

Combination View 119875 119877 119865-score AUC

Feature and graph kernelFeature kernel 8832 6797 7682 8801Graph kernel 8326 7188 7715 8754Combination 7491 8516 7971 8866

Feature and tree kernelFeature kernel 8606 6992 7715 8851Tree kernel 5780 9258 7117 7499Combination 7508 8711 8065 8718

Graph and tree kernelGraph kernel 8404 6992 7633 8604Tree kernel 5810 9531 7219 7810Combination 8243 7695 7960 8684

Feature kernelGraph kernelTree kernel

Feature kernelGraph kernelTree kernel

2 4 6 8 10 12 14 16 18 20 22 24 260Iterators

64

66

68

70

72

74

76

78

80

82

84

F-s

core

()

72

74

76

78

80

82

84

86

88

90

AUC

()

2 4 6 8 10 12 14 16 18 20 22 24 260Iterators

Figure 6 The Tri-Training performance on the disease-symptom test set

process [28] At the beginning of the Co-Training the num-ber of the noises is limited and unlabeled data added to thetraining set can help the classifiers improve the performanceHowever after a number of learning rounds more and morenoises introduced will cause the performance decline

332 The Performance of Tri-Training on the Disease-Symptom Test Set In our method we select three views toconduct the Tri-Training that is the feature kernel graphkernel and tree kernel In each Tri-Training round SVM isused to train the classifier on each view The parameters areset as follows u = 4000m = 300 1199011 = 100 1199012 = 0 and119873 =27 (u m 1199011 1199012 and119873 in Algorithm 2) Here 1199012 = 0 meansthat only the positive examples are added into the trainingset In this way the recall of the classifier can be improved (therecall is defined as the number of true positives divided by thetotal number of examples that actually belong to the positiveclass and usually more positive examples in the training setwill improve the recall) since it is lower compared with theprecision (see Table 2) The results are shown in Table 4 andFigure 6

Table 4 The results obtained with Tri-Training on the disease-symptom test set Method 1 integrates three classifiers (the featurekernel graph kernel and tree kernel) with the same weight whileMethod 2 integrates them with a weight ratio of 4 4 2

Method 119875 119877 119865-score AUCFeature kernel 8300 8008 8151 8880Graph kernel 7774 8594 8163 8980Tree kernel 5738 9414 7130 7600Method 1 7979 8789 8364 9157Method 2 7993 8555 8264 9075

Compared with the performances of the classifiers on theinitial disease-symptom test set shown in Table 2 the onesachieved through Tri-Training are significantly improvedThis shows that Tri-Training can exploit the unlabeled dataand improve the performance more effectively The reason isthat as mentioned in Section 1 the Tri-Training algorithmcan achieve satisfactory results while neither requiring theinstance space to be described with sufficient and redundant

BioMed Research International 9

Feature kernelGraph kernel

Feature kernelGraph kernel

2 4 6 8 10 12 14 16 18 20 22 24 260Iterators

79

80

81

82

83

84

85

86

87

88

89

AUC

()

2 4 6 8 10 12 14 16 18 20 22 24 260Iterators

74

76

78

80

82

84

86F

-sco

re (

)

Figure 7 Co-Training performance curve of feature kernel and graph kernel on the symptom-therapeutic substance test set

views nor putting any constraints on the supervised learningmethod

In addition when three classifiers are integrated eitherwith the same weight or with a weight ratio of 4 4 2the higher F-scores and AUCs are obtained Furthermorecomparing the performance of Co-Training and Tri-Trainingshown in Tables 3 and 4 we found that in most casesTri-Training outperforms Co-Training The reason is thatthrough employing three classifiers Tri-Training is facilitatedwith good efficiency and generalization ability because itcould gracefully choose examples to label and use multipleclassifiers to compose the final hypothesis [28]

34 The Performance of the Symptom and Therapeutic Sub-stance Model Table 5 shows the performances of the clas-sifiers on the initial symptom-therapeutic substance test setSimilar to the results on the initial disease-symptom testset the feature kernel achieves the best performance whilethe tree kernel performs the worst One difference is thatwhen the three classifiers are integrated with a weight ratio of4 4 2 the higher F-score andAUC are obtainedwhile whenthey are integrated with the same weight the F-score andAUC are a little lower than those of feature kernel

341 The Performance of Co-Training on the Symptom andTherapeutic Substance Test Set Similar to that in the disease-symptom experiments the feature set for the symptom-therapeutic substance model is also divided into three viewsthe feature graph and tree kernels The experiments aredivided into three groups Each group uses the same experi-mental parameters that is u = 4000 m = 300 and p = 100The performance curves of different combinations are shownin Figures 7 8 and 9 and their final results with differentiteration times (27 26 and 9 resp) are shown in Table 6

From the figures we can draw similar conclusions as fromthe disease-symptom experiments In most cases the per-formance can be improved through the Co-Training process

Table 5 The initial results on the symptom-therapeutic substancetest set

Method 119875 119877 119865 AUCFeature kernel 7930 9076 8464 8790Graph kernel 7627 9036 8272 8730Tree kernel 6890 8273 7518 7994Method 1 7599 9277 8354 8759Method 2 7781 9438 8530 8894

while they are usually not stable since noisewill be introducedduring the learning process

342 The Performance of Tri-Training on the Symptom andTherapeutic Substance Test Set In the experiments of Tri-Training on the symptom-therapeutic substance the param-eters are set as follows u = 4000m = 300 1199011 = 100 1199012 = 0and119873 = 27 (um 1199011 1199012 and119873 in Algorithm 2) The resultsare shown in Table 7 and Figure 10

Compared with the performance of the classifiers onthe initial symptom-therapeutic substance test set shown inTable 6 the ones achieved through Tri-Training are alsoimproved as in the disease-symptom experiments This veri-fies that the Tri-Training algorithm is effective in utilizing theunlabeled data to boost the relation extraction performanceonce again When the three classifiers are integrated with aweight ratio of 4 4 2 a better AUC is obtained

Comparing the performance of Co-Training and Tri-Training on the symptom-therapeutic substance test set asshown in Tables 6 and 7 we found that in most cases Tri-Training outperforms Co-Training which is consistent withthe results achieved in the disease-symptom experimentsThis is due to the better efficiency and generalization ability ofTri-Training over Co-Training

In addition the performances of the classifiers on thedisease-symptom corpus are improved more than those on

10 BioMed Research International

Feature kernelTree kernel

Feature kernelTree kernel

2 4 6 8 10 12 14 16 18 20 22 24 260Iterators

74

76

78

80

82

84

86F

-sco

re (

)

2 4 6 8 10 12 14 16 18 20 22 24 260Iterators

7980818283848586878889

AUC

()

Figure 8 Co-Training performance curve of feature kernel and tree kernel on the symptom-therapeutic substance test set

Graph kernelTree kernel

Graph kernelTree kernel

1 2 3 4 5 6 7 8 90Iterators

1 2 3 4 5 6 7 8 90Iterators

74

76

78

80

82

84

86

F-s

core

()

79

80

81

82

83

84

85

86

87

88

89

AUC

()

Figure 9 Co-Training performance curve of graph kernel and tree kernel on the symptom-therapeutic substance test set

Feature kernelGraph kernelTree kernel

Feature kernelGraph kernelTree kernel

74

76

78

80

82

84

86

F-s

core

()

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 200Iterators

78

80

82

84

86

88

90

AUC

()

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 200Iterators

Figure 10 The results of Tri-Training on the symptom-therapeutic substance test set

BioMed Research International 11

Table 6 The results obtained with Co-Training on the symptom-therapeutic substance test set

Combination View 119875 119877 119865 AUC

Feature kernel and graph kernelFeature kernel 7800 9398 8525 8841Graph kernel 7151 9880 8297 8644Combination 7745 9518 8540 8910

Feature kernel and tree kernelFeature kernel 7872 9357 8551 8851Tree kernel 6713 9759 7954 8175Combination 7751 9679 8566 8861

Graph kernel and tree kernelGraph kernel 7414 9558 8351 8771Tree kernel 6782 9478 7906 8014Combination 7105 9759 8223 8624

Table 7 The results of Tri-Training on symptom-therapeutic sub-stance test set

119875 119877 119865 AUCFeature kernel 7898 9357 8566 8894Graph kernel 7431 9759 8437 8778Tree kernel 6801 9478 7919 8110Method 1 7477 9880 8512 8808Method 2 7562 9839 8551 8913

the symptom-therapeutic substance corpus There are tworeasons for that First on the symptom-therapeutic substancecorpus the classifiers have better performanceTherefore theCo-training and Tri-training algorithms have less room forthe performance improvement Second as the Co-trainingand Tri-training process proceeds more unlabeled data areadded into the training set which could introduce new infor-mation for the classifiers Therefore the recalls of the classi-fiers are improved Meanwhile more noise is also introducedcausing the precision decline For the initial classifiers thehigher the precision is the less the noise is introduced in theiterative process and the performance of the classifier wouldbe improved As a summary if the initial classifiers have bigdifference the performance can be improved through twoalgorithms In the experiment when more unlabeled dataare added to the training set the difference between theclassifiers becomes smallerThus after a number of iterationsperformance could not be improved any more

35 Some Examples for Disease-Symptom and Symptom-Ther-apeutic Substance Relations Extracted from Biomedical Liter-atures Some examples for disease-symptom or symptom-therapeutic substance relations extracted from biomedicalliteratures are shown in Tables 8 and 9 Table 8 shows somesymptoms of disease C0020541 (portal hypertension) Onesentence containing the relation between portal hypertensionand its symptomC0028778 (block) is provided Table 9 showssome relations between the symptom C0028778 (block) andsome therapeutic substances in which the sentences contain-ing the relations are provided

Table 8 Some disease-symptom relations extracted from biomedi-cal literature

Disease Symptom Sentence

C0020541(portalhypertension)

C0028778 (block) C0020541 as C2825142 ofintrahepatic C0028778accounted for 83 of thepatients (C0023891 65meta-C0022346 12) and

C0018920 11

C1565860C0035357C0005775C0014867C0232338

Table 9 Some symptom-therapeutic substance relations extractedfrom biomedical literature

Symptom Therapeuticsubstance Sentence

C0028778(block)

C0017302 (generalanesthetic agents)

Use-dependent conductionC0028778 produced by

volatile C0017302

C0006400(bupivacaine)

Epidural ropivacaine isknown to produce less

motor C0028778 comparedto C0006400 at anaesthetic

concentrations

C0053241(benzoquinone)

In contrast C0053241 andhydroquinone led to

g2-C0028778 rather than toa mitotic arrest

4 Conclusions and Future Work

Models for extracting the relations between the disease-symptom and symptom-therapeutic substance are importantfor further extracting knowledge about diseases and theirpotential therapeutic substances However currently there isno corpus available to train such models To solve the prob-lem we firstmanually annotated two training sets for extract-ing the relations Then two semisupervised learning algo-rithms that is Co-Training and Tri-Training are applied toexplore the unlabeled data to boost the performance Exper-imental results show that exploiting the unlabeled data withboth Co-Training and Tri-Training algorithms can enhance

12 BioMed Research International

the performance In particular through employing threeclassifiers Tri-training is facilitated with good efficiency andgeneralization ability since it could gracefully choose exam-ples to label and use multiple classifiers to compose the finalhypothesis [28] In addition its applicability is wide because itneither requires sufficient and redundant views nor puts anyconstraint on the employed supervised learning algorithm

In the future work we will study more effective semisu-pervised learningmethods to exploit the numerous unlabeleddata pieces in the biomedical literature On the other handwewill apply the disease-symptom and symptom-therapeuticsubstance models to extract the relations between diseasesand therapeutic substances from biomedical literature andpredict the potential therapeutic substances for certain dis-eases [41]

Competing Interests

The authors declare that there is no conflict of interestsregarding the publication of this article

Acknowledgments

This work is supported by the grants from the NaturalScience Foundation of China (nos 61272373 6107009861340020 61572102 and 61572098) Trans-Century TrainingProgram Foundation for the Talents by the Ministry of Edu-cation of China (NCET-13-0084) the Fundamental ResearchFunds for the Central Universities (nos DUT13JB09 andDUT14YQ213) and the Major State Research DevelopmentProgram of China (no 2016YFC0901902)

References

[1] D Hristovski B Peterlin J A Mitchell and S M HumphreyldquoUsing literature-based discovery to identify disease candidategenesrdquo International Journal of Medical Informatics vol 74 no2ndash4 pp 289ndash298 2005

[2] M N Prichard and C Shipman Jr ldquoA three-dimensional modelto analyze drug-drug interactionsrdquo Antiviral Research vol 14no 4-5 pp 181ndash205 1990

[3] Q-C Bui S Katrenko and PM A Sloot ldquoA hybrid approach toextract protein-protein interactionsrdquo Bioinformatics vol 27 no2 pp 259ndash265 2011

[4] M Krallinger F Leitner C Rodriguez-Penagos and A Valen-cia ldquoOverview of the protein-protein interaction annotationextraction task of BioCreative IIrdquo Genome Biology vol 9supplement 2 article S4 2008

[5] I Segura Bedmar P Martinez and D Sanchez Cisneros ldquoThe1st DDIExtraction-2011 challenge task extraction of Drug-Drug Interactions from biomedical textsrdquo in Proceedings ofthe 1st Challenge task on Drug-Drug Interaction Extraction(DDIExtraction rsquo11) pp 1ndash9 Huelva Spain September 2011

[6] I Segura-Bedmar P Martınez andM Herrero-Zazo SemEval-2013 Task 9 Extraction of Drug-Drug Interactions from Biomed-ical Texts (DDIExtraction 2013) Association for ComputationalLinguistics 2013

[7] M Krallinger and A Valencia ldquoText-mining and information-retrieval services formolecular biologyrdquoGenome Biology vol 6no 7 article 224 2005

[8] T-K Jenssen A Laeliggreid J Komorowski and E Hovig ldquoAliterature network of human genes for high-throughput analysisof gene expressionrdquo Nature Genetics vol 28 no 1 pp 21ndash282001

[9] C Blaschke M A Andrade C Ouzounis and A ValencialdquoAutomatic extraction of biological information from scientifictext protein-protein interactionsrdquo in Proceedings of the Inter-national Conference on Intelligent Systems for Molecular Biology(ISMB rsquo99) pp 60ndash67 1999

[10] P Zweigenbaum D Demner-Fushman H Yu and K BCohen ldquoFrontiers of biomedical text mining current progressrdquoBriefings in Bioinformatics vol 8 no 5 pp 358ndash375 2007

[11] Y T Yen B Chen H W Chiu Y C Lee Y C Li and C YHsu ldquoDeveloping anNLP and IR-based algorithm for analyzinggene-disease relationshipsrdquoMethods of Information inMedicinevol 45 no 3 pp 321ndash329 2006

[12] M Huang X Zhu Y Hao D G Payan K Qu and M LildquoDiscovering patterns to extract proteinndashprotein interactionsfrom full textsrdquo Bioinformatics vol 20 no 18 pp 3604ndash36122004

[13] I Tsochantaridis T Hofmann T Joachims and Y AltunldquoSupport vector machine learning for interdependent andstructured output spacesrdquo inProceedings of the 21st InternationalConference on Machine Learning (ICML rsquo04) pp 823ndash830Alberta Canada July 2004

[14] J Xiao J Su G Zhou et al ldquoProtein-protein interaction extrac-tion a supervised learning approachrdquo in Proceedings of the 1stInternational Symposium on Semantic Mining in Biomedicinepp 51ndash59 Hinxton UK April 2005

[15] L A Nielsen ldquoExtracting protein-protein interactions usingsimple contextual featuresrdquo in Proceedings of the HLT-NAACLBioNLPWorkshop on Linking Natural Language and Biology pp120ndash121 ACM June 2006

[16] A Airola S Pyysalo J Bjorne T Pahikkala F Ginter and TSalakoski ldquoAll-paths graph kernel for protein-protein interac-tion extraction with evaluation of cross-corpus learningrdquo BMCBioinformatics vol 9 no 11 article 52 2008

[17] L Qian and G Zhou ldquoTree kernel-based proteinndashproteininteraction extraction from biomedical literaturerdquo Journal ofBiomedical Informatics vol 45 no 3 pp 535ndash543 2012

[18] S Kim J Yoon J Yang and S Park ldquoWalk-weighted subse-quence kernels for protein-protein interaction extractionrdquoBMCBioinformatics vol 11 no 1 article 107 2010

[19] D J Miller and H S Uyar ldquoA mixture of experts classifierwith learning based on both labelled and unlabelled datardquo inAdvances in Neural Information Processing Systems pp 571ndash5771997

[20] T Joachims ldquoTransductive inference for text classificationusing support vector machinesrdquo in Proceedings of the 16thInternational Conference on Machine Learning (ICML rsquo99) pp200ndash209 1999

[21] X Zhu Z Ghahramani and J Lafferty ldquoSemi-supervised learn-ing using gaussian fields and harmonic functionsrdquo in Proceed-ings of the 20th International Conference on Machine Learning(ICML rsquo03) vol 3 pp 912ndash919 Washington DC USA August2003

[22] A Blum and T Mitchell ldquoCombining labeled and unlabeleddata with co-trainingrdquo in Proceedings of the 11th Annual Confer-ence on Computational LearningTheory (COLT rsquo98) pp 92ndash100ACM 1998

BioMed Research International 13

[23] WWang and Z H Zhou ldquoCo-training with insufficient viewsrdquoin Proceedings of the Asian Conference onMachine Learning pp467ndash482 2013

[24] W Wang and Z-H Zhou ldquoA new analysis of co-trainingrdquo inProceedings of the 27th International Conference on MachineLearning (ICML rsquo10) pp 1135ndash1142 June 2010

[25] W Wang and H Zhou Z ldquoAnalyzing co-training style algo-rithmsrdquo in Machine Learning ECML 2007 vol 4701 of LectureNotes in Computer Science pp 454ndash465 Springer BerlinGermany 2007

[26] D Pierce and C Cardie ldquoLimitations of co-training for naturallanguage learning from large datasetsrdquo in Proceedings of the2001 Conference on Empirical Methods in Natural LanguageProcessing pp 1ndash9 Pittsburgh Pa USA 2001

[27] S Kiritchenko and S Matwin ldquoEmail classification with co-trainingrdquo in Proceedings of the Conference of the Center forAdvanced Studies on Collaborative Research (CASCON rsquo01) pp301ndash312 IBM Toronto Canada November 2011

[28] Z-H Zhou and M Li ldquoTri-training exploiting unlabeled datausing three classifiersrdquo IEEE Transactions on Knowledge andData Engineering vol 17 no 11 pp 1529ndash1541 2005

[29] D Mavroeidis K Chaidos S Pirillos et al ldquoUsing tri-trainingand support vector machines for addressing the ECMLPKDD2006 discovery challengerdquo in Proceedings of the ECML-PKDDDiscovery Challenge Workshop pp 39ndash47 Berlin Germany2006

[30] L Breiman ldquoBagging predictorsrdquoMachine Learning vol 24 no2 pp 123ndash140 1996

[31] R E Schapire ldquoThe strength of weak learnabilityrdquo MachineLearning vol 5 no 2 pp 197ndash227 1990

[32] Y Dang Y Zhang and H Chen ldquoA lexicon-enhanced methodfor sentiment classification an experiment on online productreviewsrdquo IEEE Intelligent Systems vol 25 no 4 pp 46ndash53 2010

[33] Y Yang and J O Pedersen ldquoA comparative study on featureselection in text categorizationrdquo in Proceedings of the 14thInternational Conference on Machine Learning (ICML rsquo97) vol97 pp 412ndash420 Morgan Kaufmann San Mateo Calif USA1997

[34] D Klein and C D Manning ldquoAccurate unlexicalized parsingrdquoin Proceedings of the the 41st Annual Meeting on Association forComputational Linguistics vol 1 pp 423ndash430 Association forComputational Linguistics Sapporo Japan July 2003

[35] M Zhang J Zhang J Su et al ldquoA composite kernel toextract relations between entities with both flat and structuredfeaturesrdquo in Proceedings of the 21st International Conference onComputational Linguistics and the 44th Annual Meeting of theAssociation for Computational Linguistics pp 825ndash832 Associ-ation for Computational Linguistics 2006

[36] National Library of Medicine ldquoSemantic MEDLINE Databaserdquohttpskr3nlmnihgovSemMedDB

[37] T C Rindflesch and M Fiszman ldquoThe interaction of domainknowledge and linguistic structure in natural language process-ing interpreting hypernymic propositions in biomedical textrdquoJournal of Biomedical Informatics vol 36 no 6 pp 462ndash4772003

[38] J Carletta ldquoAssessing agreement on classification tasks thekappa statisticrdquo Computational Linguistics vol 22 no 2 pp248ndash254 1996

[39] J A Hanley and B J McNeil ldquoThe meaning and use of thearea under a receiver operating characteristic (ROC) curverdquoRadiology vol 143 no 1 pp 29ndash36 1982

[40] A P Bradley ldquoThe use of the area under the ROC curve inthe evaluation of machine learning algorithmsrdquo Pattern Recog-nition vol 30 no 7 pp 1145ndash1159 1997

[41] D R Swanson ldquoFish oil Raynaudrsquos syndrome and undiscov-ered public knowledgerdquo Perspectives in Biology and Medicinevol 30 no 1 pp 7ndash18 1986

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 6: Research Article Semisupervised Learning Based Disease …downloads.hindawi.com/journals/bmri/2016/3594937.pdf · 2019. 7. 30. · models for extracting the relations between the

6 BioMed Research International

Database Then these symptom terms were used to searchthe database for the sentences which contain the query termsand terms belonging to the semantic types of therapeuticsubstance (eg pharmacologic substance and organic chemi-cal) We obtained about 20500 sentences and then manuallyannotated about 1100 sentences as the disease-symptomcorpora 600 labeled sentences are used as the initial trainingset and the remaining 498 labeled sentences as the test setSimilar to the disease and symptom relationship annotationthe following criteria are applied the symptom-therapeuticsubstance relationship indicates that a therapeutic substancecan relieve a physiological phenomenon If an instance ina sentence semantically expresses the symptom-therapeuticsubstance relationship it is labeled as a positive example Asin the example provided in Section 1 the sentence ldquofish oiland its active ingredient eicosapentaenoic acid (EPA) loweredblood viscosityrdquo contains two positive examples that is fish oiland blood viscosity and EPA and blood viscosity

When the manual annotation process was completed, the level of agreement was estimated. The Cohen's kappa scores between the annotators on the two corpora are 0.866 and 0.903, respectively, and content analysis researchers generally regard a Cohen's kappa score above 0.8 as good reliability [38]. In addition, the two corpora are available for academic use (see Supplementary Material available online at http://dx.doi.org/10.1155/2016/3594937).
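As a worked illustration of the agreement measure, the following minimal sketch computes Cohen's kappa from two annotators' binary relation labels; the label lists are toy data, not the actual annotations.

```python
# A minimal sketch of Cohen's kappa [38] on two annotators' binary relation
# labels (1 = positive instance, 0 = negative); the label lists are toy data.
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Observed agreement corrected for agreement expected by chance."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Two annotators agreeing on 9 of 10 instances yield kappa = 0.8.
print(cohens_kappa([1, 1, 1, 0, 0, 1, 0, 1, 1, 0],
                   [1, 1, 1, 0, 0, 1, 0, 1, 0, 0]))
```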

3.2. Experimental Evaluation. The evaluation metrics used in our experiments are precision (P), recall (R), F-score (F), and area under the ROC curve (AUC) [39]. They are defined as follows:

$$P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \tag{5}$$

$$R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \tag{6}$$

$$F = \frac{2 \times P \times R}{P + R}, \tag{7}$$

$$\mathrm{AUC} = \frac{\sum_{i=1}^{m^{+}} \sum_{j=1}^{m^{-}} H(x_{i} - y_{j})}{m^{+} m^{-}}, \tag{8}$$

where TP denotes a true interaction pair, TN a true noninteraction pair, FP a false interaction pair, and FN a false noninteraction pair. F-score is the balanced measure for quantifying the performance of the systems. In addition, the AUC is used to evaluate the performance of our method: it is not affected by the distribution of the data, and its use for performance evaluation has been advocated in the machine learning community [40]. In formula (8), $m^{+}$ and $m^{-}$ are the numbers of positive and negative examples, respectively; $x_{1}, \ldots, x_{m^{+}}$ are the outputs of the system for the positive examples, and $y_{1}, \ldots, y_{m^{-}}$ are those for the negative examples. The function $H(r)$ is defined as follows:

$$H(r) = \begin{cases} 1, & r > 0, \\ 0.5, & r = 0, \\ 0, & r < 0. \end{cases} \tag{9}$$
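The following sketch evaluates these formulas directly; the counts and decision scores are illustrative placeholders, not outputs of the actual system.

```python
# A direct transcription of formulas (5)-(9); the counts and decision
# scores below are illustrative placeholders.
def precision_recall_f(tp, fp, fn):
    p = tp / (tp + fp)                 # formula (5)
    r = tp / (tp + fn)                 # formula (6)
    return p, r, 2 * p * r / (p + r)   # formula (7)

def auc(pos_scores, neg_scores):
    """Formula (8): the fraction of positive-negative score pairs that
    are ranked correctly, counting ties as one half."""
    def H(r):                          # formula (9)
        return 1.0 if r > 0 else (0.5 if r == 0 else 0.0)
    pairs = sum(H(x - y) for x in pos_scores for y in neg_scores)
    return pairs / (len(pos_scores) * len(neg_scores))

p, r, f = precision_recall_f(tp=120, fp=30, fn=40)
print(f"P={p:.4f} R={r:.4f} F={f:.4f}")
print("AUC =", auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.1]))
```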

Table 2: The initial results on the disease-symptom test set. Method 1 integrates the three classifiers (the feature kernel, graph kernel, and tree kernel) with the same weight, while Method 2 integrates them with a weight ratio of 4:4:2.

Method           P (%)    R (%)    F-score (%)    AUC (%)
Feature kernel   91.38    62.11    73.95          87.13
Graph kernel     93.87    59.77    73.04          87.21
Tree kernel      69.10    62.89    65.85          73.37
Method 1         92.05    63.28    75.00          89.47
Method 2         92.81    60.55    73.29          89.74

3.3. The Initial Performance of the Disease-Symptom Model. Table 2 shows the performance of the classifiers on the initial disease-symptom test set. The feature kernel and graph kernel achieve almost the same performance, which is better than that of the tree kernel. When the three classifiers are integrated with the same weight, a higher F-score (75.00%) is obtained, while when they are integrated with a weight ratio of 4:4:2, the F-score is a bit lower than that of the feature kernel. However, in both cases the AUC is improved, which shows that, since different classifiers measure the similarity between two sentences from different aspects, combining these similarities can boost the performance.
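A minimal sketch of this weighted integration is given below; the per-classifier scores are hypothetical and assumed to be comparable (e.g., probability estimates), which is an assumption of this example rather than a detail stated in the text.

```python
# A minimal sketch of integrating the three classifiers' decision scores;
# the scores are hypothetical and assumed comparable across classifiers.
def integrate(scores, weights):
    """Weighted average of the per-classifier decision scores."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

# Hypothetical scores from the feature, graph, and tree kernel classifiers.
scores = [0.72, 0.65, 0.41]
print(integrate(scores, [1, 1, 1]))  # equal weights (Method 1)
print(integrate(scores, [4, 4, 2]))  # 4:4:2 ratio (Method 2)
```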

3.3.1. The Performance of Co-Training on the Disease-Symptom Test Set. In our method, the feature set for the disease-symptom model is divided into three views: the feature kernel, graph kernel, and tree kernel. In the Co-Training experiments, to compare the results of each combination of two views, the experiments are divided into three groups, as shown in Table 3. Each group uses the same experimental parameters, that is, u = 4000, m = 300, and p = 100 (u, m, and p in Algorithm 1). The performance curves of the different combinations are shown in Figures 3, 4, and 5, respectively, and their final results with different iteration counts (13, 27, and 22, resp.) are shown in Table 3.
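Since Algorithm 1 is not reproduced in this section, the following condensed sketch shows the shape of a standard Co-Training loop under the stated parameters, not the authors' exact procedure; it assumes exactly two sklearn-style pipelines that accept raw sentence instances and expose predict_proba (e.g., a feature extractor followed by SVC(probability=True)).

```python
# A condensed sketch of a Co-Training loop for one pair of views, with the
# parameters from the text (m = 300 pool examples per round, p = 100 added
# per classifier per round); "clfs" is assumed to be exactly two
# sklearn-style pipelines over raw sentence instances.
import random

def co_training(clfs, X_labeled, y_labeled, unlabeled, rounds=13, m=300, p=100):
    assert len(clfs) == 2
    # Each view-specific classifier keeps its own growing training set.
    train = [(list(X_labeled), list(y_labeled)) for _ in clfs]
    for _ in range(rounds):
        for clf, (X, y) in zip(clfs, train):
            clf.fit(X, y)
        # Draw a working pool of m examples from the unlabeled set.
        pool = random.sample(unlabeled, min(m, len(unlabeled)))
        for i, src in enumerate(clfs):
            dst_X, dst_y = train[1 - i]  # the other classifier's set
            # src labels its p most confident pool examples for its peer.
            ranked = sorted(pool, key=lambda x: src.predict_proba([x])[0].max(),
                            reverse=True)
            for x in ranked[:p]:
                dst_X.append(x)
                dst_y.append(src.predict([x])[0])
        unlabeled = [x for x in unlabeled if x not in pool]
    return clfs
```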

From Figures 3, 4, and 5 we can make the following observations. (1) As the iteration count increases and more unlabeled data are added to the training set, the F-score shows a rising trend. The reason is that, as the Co-Training process proceeds, more and more unlabeled data are labeled by one classifier for the other, which improves the performance of both classifiers. However, after a number of iterations the performance of the classifiers can no longer be improved, since too much noise (false positives and false negatives) may be introduced from the unlabeled data. (2) The AUCs of the classifiers show different trends for different combinations of the views. The AUC of the feature kernel fluctuates around 88%, while that of the graph kernel fluctuates between 85% and 87%. In contrast, the tree kernel's AUC shows a rising trend in all combinations, since the performance of the initial tree kernel classifier is relatively low and is then improved by the relatively accurate labeled data provided by the feature kernel or graph kernel.

In fact, the performance of semisupervised learning algorithms is usually not stable, because the unlabeled examples may often be wrongly labeled during the learning process [28].


Figure 3: Co-Training performance curves (F-score and AUC) of the feature kernel and graph kernel on the disease-symptom test set.

Figure 4: Co-Training performance curves (F-score and AUC) of the feature kernel and tree kernel on the disease-symptom test set.

Figure 5: Co-Training performance curves (F-score and AUC) of the graph kernel and tree kernel on the disease-symptom test set.


Table 3: The results obtained with Co-Training on the disease-symptom test set. The Combination method integrates the three classifiers (the feature kernel, graph kernel, and tree kernel) with the same weight.

Combination                 View             P (%)    R (%)    F-score (%)    AUC (%)
Feature and graph kernel    Feature kernel   88.32    67.97    76.82          88.01
                            Graph kernel     83.26    71.88    77.15          87.54
                            Combination      74.91    85.16    79.71          88.66
Feature and tree kernel     Feature kernel   86.06    69.92    77.15          88.51
                            Tree kernel      57.80    92.58    71.17          74.99
                            Combination      75.08    87.11    80.65          87.18
Graph and tree kernel       Graph kernel     84.04    69.92    76.33          86.04
                            Tree kernel      58.10    95.31    72.19          78.10
                            Combination      82.43    76.95    79.60          86.84

Figure 6: The Tri-Training performance (F-score and AUC of the feature, graph, and tree kernels) on the disease-symptom test set.

At the beginning of Co-Training, the amount of noise is limited, and the unlabeled data added to the training set can help the classifiers improve their performance. However, after a number of learning rounds, the accumulating noise causes the performance to decline.

3.3.2. The Performance of Tri-Training on the Disease-Symptom Test Set. In our method, we select three views to conduct Tri-Training, that is, the feature kernel, graph kernel, and tree kernel. In each Tri-Training round, an SVM is used to train the classifier on each view. The parameters are set as follows: u = 4000, m = 300, p1 = 100, p2 = 0, and N = 27 (u, m, p1, p2, and N in Algorithm 2). Here p2 = 0 means that only positive examples are added into the training set. In this way the recall of the classifier can be improved (recall is defined as the number of true positives divided by the total number of examples that actually belong to the positive class, and usually more positive examples in the training set will improve the recall), since it is low compared with the precision (see Table 2). The results are shown in Table 4 and Figure 6.
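Since Algorithm 2 is not reproduced in this section, the labeling step of one such round can be sketched as follows under the reading given in the text: an unlabeled instance is added to a classifier's training set only when the other two classifiers agree on its label, and with p2 = 0 only agreed positives (up to p1) are kept. The sklearn-style classifier interface is an assumption for the example.

```python
# A simplified sketch of the labeling step in one Tri-Training round:
# classifier k receives pool instances on which its two peers agree;
# with p2 = 0 only agreed positives (up to p1) are added, to raise recall.
def tri_training_round(clfs, train_sets, pool, p1=100, p2=0):
    # train_sets[k] is the (X, y) training set of classifier k.
    for k, clf in enumerate(clfs):
        peers = [c for i, c in enumerate(clfs) if i != k]
        added_pos = added_neg = 0
        for x in pool:
            votes = [peer.predict([x])[0] for peer in peers]
            if votes[0] != votes[1]:
                continue  # the two peers disagree on x; skip it
            if votes[0] == 1 and added_pos < p1:    # agreed positive
                train_sets[k][0].append(x)
                train_sets[k][1].append(1)
                added_pos += 1
            elif votes[0] == 0 and added_neg < p2:  # agreed negative
                train_sets[k][0].append(x)
                train_sets[k][1].append(0)
                added_neg += 1
        clf.fit(train_sets[k][0], train_sets[k][1])
    return clfs
```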

Table 4: The results obtained with Tri-Training on the disease-symptom test set. Method 1 integrates the three classifiers (the feature kernel, graph kernel, and tree kernel) with the same weight, while Method 2 integrates them with a weight ratio of 4:4:2.

Method           P (%)    R (%)    F-score (%)    AUC (%)
Feature kernel   83.00    80.08    81.51          88.80
Graph kernel     77.74    85.94    81.63          89.80
Tree kernel      57.38    94.14    71.30          76.00
Method 1         79.79    87.89    83.64          91.57
Method 2         79.93    85.55    82.64          90.75

Compared with the performance of the classifiers on the initial disease-symptom test set shown in Table 2, the results achieved through Tri-Training are significantly improved. This shows that Tri-Training can exploit the unlabeled data and improve the performance more effectively. The reason is that, as mentioned in Section 1, the Tri-Training algorithm can achieve satisfactory results while neither requiring the instance space to be described with sufficient and redundant views nor putting any constraints on the supervised learning method.


Figure 7: Co-Training performance curves (F-score and AUC) of the feature kernel and graph kernel on the symptom-therapeutic substance test set.


In addition, when the three classifiers are integrated, either with the same weight or with a weight ratio of 4:4:2, higher F-scores and AUCs are obtained. Furthermore, comparing the performance of Co-Training and Tri-Training shown in Tables 3 and 4, we found that in most cases Tri-Training outperforms Co-Training. The reason is that, through employing three classifiers, Tri-Training achieves good efficiency and generalization ability because it can gracefully choose examples to label and use multiple classifiers to compose the final hypothesis [28].

3.4. The Performance of the Symptom and Therapeutic Substance Model. Table 5 shows the performance of the classifiers on the initial symptom-therapeutic substance test set. Similar to the results on the initial disease-symptom test set, the feature kernel achieves the best performance while the tree kernel performs the worst. One difference is that, when the three classifiers are integrated with a weight ratio of 4:4:2, a higher F-score and AUC are obtained, while when they are integrated with the same weight, the F-score and AUC are a little lower than those of the feature kernel.

3.4.1. The Performance of Co-Training on the Symptom and Therapeutic Substance Test Set. As in the disease-symptom experiments, the feature set for the symptom-therapeutic substance model is also divided into three views: the feature, graph, and tree kernels. The experiments are divided into three groups. Each group uses the same experimental parameters, that is, u = 4000, m = 300, and p = 100. The performance curves of the different combinations are shown in Figures 7, 8, and 9, and their final results with different iteration counts (27, 26, and 9, resp.) are shown in Table 6.

From the figures we can draw similar conclusions as from the disease-symptom experiments: in most cases the performance can be improved through the Co-Training process, although it is usually not stable, since noise is introduced during the learning process.

Table 5: The initial results on the symptom-therapeutic substance test set.

Method           P (%)    R (%)    F (%)    AUC (%)
Feature kernel   79.30    90.76    84.64    87.90
Graph kernel     76.27    90.36    82.72    87.30
Tree kernel      68.90    82.73    75.18    79.94
Method 1         75.99    92.77    83.54    87.59
Method 2         77.81    94.38    85.30    88.94


3.4.2. The Performance of Tri-Training on the Symptom and Therapeutic Substance Test Set. In the Tri-Training experiments on the symptom-therapeutic substance corpus, the parameters are set as follows: u = 4000, m = 300, p1 = 100, p2 = 0, and N = 27 (u, m, p1, p2, and N in Algorithm 2). The results are shown in Table 7 and Figure 10.

Compared with the performance of the classifiers on the initial symptom-therapeutic substance test set shown in Table 5, the results achieved through Tri-Training are also improved, as in the disease-symptom experiments. This verifies once again that the Tri-Training algorithm is effective in utilizing the unlabeled data to boost the relation extraction performance. When the three classifiers are integrated with a weight ratio of 4:4:2, a better AUC is obtained.

Comparing the performance of Co-Training and Tri-Training on the symptom-therapeutic substance test set, as shown in Tables 6 and 7, we found that in most cases Tri-Training outperforms Co-Training, which is consistent with the results achieved in the disease-symptom experiments. This is due to the better efficiency and generalization ability of Tri-Training over Co-Training.

In addition, the performance of the classifiers on the disease-symptom corpus is improved more than that on the symptom-therapeutic substance corpus.


Figure 8: Co-Training performance curves (F-score and AUC) of the feature kernel and tree kernel on the symptom-therapeutic substance test set.

Figure 9: Co-Training performance curves (F-score and AUC) of the graph kernel and tree kernel on the symptom-therapeutic substance test set.

Figure 10: The results of Tri-Training (F-score and AUC of the feature, graph, and tree kernels) on the symptom-therapeutic substance test set.


Table 6: The results obtained with Co-Training on the symptom-therapeutic substance test set.

Combination                       View             P (%)    R (%)    F (%)    AUC (%)
Feature kernel and graph kernel   Feature kernel   78.00    93.98    85.25    88.41
                                  Graph kernel     71.51    98.80    82.97    86.44
                                  Combination      77.45    95.18    85.40    89.10
Feature kernel and tree kernel    Feature kernel   78.72    93.57    85.51    88.51
                                  Tree kernel      67.13    97.59    79.54    81.75
                                  Combination      77.51    96.79    85.66    88.61
Graph kernel and tree kernel      Graph kernel     74.14    95.58    83.51    87.71
                                  Tree kernel      67.82    94.78    79.06    80.14
                                  Combination      71.05    97.59    82.23    86.24

Table 7: The results of Tri-Training on the symptom-therapeutic substance test set.

Method           P (%)    R (%)    F (%)    AUC (%)
Feature kernel   78.98    93.57    85.66    88.94
Graph kernel     74.31    97.59    84.37    87.78
Tree kernel      68.01    94.78    79.19    81.10
Method 1         74.77    98.80    85.12    88.08
Method 2         75.62    98.39    85.51    89.13

There are two reasons for this. First, the classifiers already perform better on the symptom-therapeutic substance corpus, so the Co-Training and Tri-Training algorithms have less room for performance improvement. Second, as the Co-Training and Tri-Training processes proceed, more unlabeled data are added into the training set, which can introduce new information for the classifiers; therefore the recall of the classifiers is improved. Meanwhile, more noise is also introduced, causing the precision to decline. For the initial classifiers, the higher the precision is, the less noise is introduced in the iterative process and the more the performance of the classifier can be improved. In summary, if the initial classifiers are sufficiently different, the performance can be improved through the two algorithms. In the experiments, as more unlabeled data are added to the training set, the difference between the classifiers becomes smaller; thus, after a number of iterations, the performance can no longer be improved.

3.5. Some Examples of Disease-Symptom and Symptom-Therapeutic Substance Relations Extracted from the Biomedical Literature. Some examples of disease-symptom and symptom-therapeutic substance relations extracted from the biomedical literature are shown in Tables 8 and 9. Table 8 shows some symptoms of the disease C0020541 (portal hypertension); one sentence containing the relation between portal hypertension and its symptom C0028778 (block) is provided. Table 9 shows some relations between the symptom C0028778 (block) and some therapeutic substances, in which the sentences containing the relations are provided.

Table 8: Some disease-symptom relations extracted from the biomedical literature.

Disease: C0020541 (portal hypertension)
  Symptom: C0028778 (block)
    Sentence: "C0020541 as C2825142 of intrahepatic C0028778 accounted for 83% of the patients (C0023891 65%, meta-C0022346 12%) and C0018920 11%."
  Other symptoms (sentences not shown): C1565860, C0035357, C0005775, C0014867, C0232338

Table 9: Some symptom-therapeutic substance relations extracted from the biomedical literature.

Symptom: C0028778 (block)
  Therapeutic substance: C0017302 (general anesthetic agents)
    Sentence: "Use-dependent conduction C0028778 produced by volatile C0017302."
  Therapeutic substance: C0006400 (bupivacaine)
    Sentence: "Epidural ropivacaine is known to produce less motor C0028778 compared to C0006400 at anaesthetic concentrations."
  Therapeutic substance: C0053241 (benzoquinone)
    Sentence: "In contrast, C0053241 and hydroquinone led to g2-C0028778 rather than to a mitotic arrest."

4. Conclusions and Future Work

Models for extracting the relations between disease and symptom and between symptom and therapeutic substance are important for further extracting knowledge about diseases and their potential therapeutic substances. However, currently there is no corpus available to train such models. To solve this problem, we first manually annotated two training sets for extracting the relations. Then two semisupervised learning algorithms, that is, Co-Training and Tri-Training, were applied to exploit the unlabeled data to boost the performance. Experimental results show that exploiting the unlabeled data with both the Co-Training and Tri-Training algorithms can enhance the performance.


In particular, through employing three classifiers, Tri-Training achieves good efficiency and generalization ability, since it can gracefully choose examples to label and use multiple classifiers to compose the final hypothesis [28]. In addition, its applicability is wide, because it neither requires sufficient and redundant views nor puts any constraint on the employed supervised learning algorithm.

In future work, we will study more effective semisupervised learning methods to exploit the large amount of unlabeled data in the biomedical literature. In addition, we will apply the disease-symptom and symptom-therapeutic substance models to extract the relations between diseases and therapeutic substances from the biomedical literature and to predict potential therapeutic substances for certain diseases [41].

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this article.

Acknowledgments

This work is supported by grants from the Natural Science Foundation of China (nos. 61272373, 61070098, 61340020, 61572102, and 61572098), the Trans-Century Training Program Foundation for the Talents by the Ministry of Education of China (NCET-13-0084), the Fundamental Research Funds for the Central Universities (nos. DUT13JB09 and DUT14YQ213), and the Major State Research Development Program of China (no. 2016YFC0901902).

References

[1] D. Hristovski, B. Peterlin, J. A. Mitchell, and S. M. Humphrey, "Using literature-based discovery to identify disease candidate genes," International Journal of Medical Informatics, vol. 74, no. 2–4, pp. 289–298, 2005.

[2] M. N. Prichard and C. Shipman Jr., "A three-dimensional model to analyze drug-drug interactions," Antiviral Research, vol. 14, no. 4-5, pp. 181–205, 1990.

[3] Q.-C. Bui, S. Katrenko, and P. M. A. Sloot, "A hybrid approach to extract protein-protein interactions," Bioinformatics, vol. 27, no. 2, pp. 259–265, 2011.

[4] M. Krallinger, F. Leitner, C. Rodriguez-Penagos, and A. Valencia, "Overview of the protein-protein interaction annotation extraction task of BioCreative II," Genome Biology, vol. 9, supplement 2, article S4, 2008.

[5] I. Segura Bedmar, P. Martinez, and D. Sanchez Cisneros, "The 1st DDIExtraction-2011 challenge task: extraction of Drug-Drug Interactions from biomedical texts," in Proceedings of the 1st Challenge Task on Drug-Drug Interaction Extraction (DDIExtraction '11), pp. 1–9, Huelva, Spain, September 2011.

[6] I. Segura-Bedmar, P. Martínez, and M. Herrero-Zazo, SemEval-2013 Task 9: Extraction of Drug-Drug Interactions from Biomedical Texts (DDIExtraction 2013), Association for Computational Linguistics, 2013.

[7] M. Krallinger and A. Valencia, "Text-mining and information-retrieval services for molecular biology," Genome Biology, vol. 6, no. 7, article 224, 2005.

[8] T.-K. Jenssen, A. Lægreid, J. Komorowski, and E. Hovig, "A literature network of human genes for high-throughput analysis of gene expression," Nature Genetics, vol. 28, no. 1, pp. 21–28, 2001.

[9] C. Blaschke, M. A. Andrade, C. Ouzounis, and A. Valencia, "Automatic extraction of biological information from scientific text: protein-protein interactions," in Proceedings of the International Conference on Intelligent Systems for Molecular Biology (ISMB '99), pp. 60–67, 1999.

[10] P. Zweigenbaum, D. Demner-Fushman, H. Yu, and K. B. Cohen, "Frontiers of biomedical text mining: current progress," Briefings in Bioinformatics, vol. 8, no. 5, pp. 358–375, 2007.

[11] Y. T. Yen, B. Chen, H. W. Chiu, Y. C. Lee, Y. C. Li, and C. Y. Hsu, "Developing an NLP and IR-based algorithm for analyzing gene-disease relationships," Methods of Information in Medicine, vol. 45, no. 3, pp. 321–329, 2006.

[12] M. Huang, X. Zhu, Y. Hao, D. G. Payan, K. Qu, and M. Li, "Discovering patterns to extract protein-protein interactions from full texts," Bioinformatics, vol. 20, no. 18, pp. 3604–3612, 2004.

[13] I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun, "Support vector machine learning for interdependent and structured output spaces," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 823–830, Alberta, Canada, July 2004.

[14] J. Xiao, J. Su, G. Zhou, et al., "Protein-protein interaction extraction: a supervised learning approach," in Proceedings of the 1st International Symposium on Semantic Mining in Biomedicine, pp. 51–59, Hinxton, UK, April 2005.

[15] L. A. Nielsen, "Extracting protein-protein interactions using simple contextual features," in Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology, pp. 120–121, ACM, June 2006.

[16] A. Airola, S. Pyysalo, J. Björne, T. Pahikkala, F. Ginter, and T. Salakoski, "All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning," BMC Bioinformatics, vol. 9, no. 11, article S2, 2008.

[17] L. Qian and G. Zhou, "Tree kernel-based protein-protein interaction extraction from biomedical literature," Journal of Biomedical Informatics, vol. 45, no. 3, pp. 535–543, 2012.

[18] S. Kim, J. Yoon, J. Yang, and S. Park, "Walk-weighted subsequence kernels for protein-protein interaction extraction," BMC Bioinformatics, vol. 11, no. 1, article 107, 2010.

[19] D. J. Miller and H. S. Uyar, "A mixture of experts classifier with learning based on both labelled and unlabelled data," in Advances in Neural Information Processing Systems, pp. 571–577, 1997.

[20] T. Joachims, "Transductive inference for text classification using support vector machines," in Proceedings of the 16th International Conference on Machine Learning (ICML '99), pp. 200–209, 1999.

[21] X. Zhu, Z. Ghahramani, and J. Lafferty, "Semi-supervised learning using Gaussian fields and harmonic functions," in Proceedings of the 20th International Conference on Machine Learning (ICML '03), vol. 3, pp. 912–919, Washington, DC, USA, August 2003.

[22] A. Blum and T. Mitchell, "Combining labeled and unlabeled data with co-training," in Proceedings of the 11th Annual Conference on Computational Learning Theory (COLT '98), pp. 92–100, ACM, 1998.

[23] W. Wang and Z.-H. Zhou, "Co-training with insufficient views," in Proceedings of the Asian Conference on Machine Learning, pp. 467–482, 2013.

[24] W. Wang and Z.-H. Zhou, "A new analysis of co-training," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 1135–1142, June 2010.

[25] W. Wang and Z.-H. Zhou, "Analyzing co-training style algorithms," in Machine Learning: ECML 2007, vol. 4701 of Lecture Notes in Computer Science, pp. 454–465, Springer, Berlin, Germany, 2007.

[26] D. Pierce and C. Cardie, "Limitations of co-training for natural language learning from large datasets," in Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing, pp. 1–9, Pittsburgh, Pa, USA, 2001.

[27] S. Kiritchenko and S. Matwin, "Email classification with co-training," in Proceedings of the Conference of the Center for Advanced Studies on Collaborative Research (CASCON '01), pp. 301–312, IBM, Toronto, Canada, November 2011.

[28] Z.-H. Zhou and M. Li, "Tri-training: exploiting unlabeled data using three classifiers," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 11, pp. 1529–1541, 2005.

[29] D. Mavroeidis, K. Chaidos, S. Pirillos, et al., "Using tri-training and support vector machines for addressing the ECML/PKDD 2006 discovery challenge," in Proceedings of the ECML-PKDD Discovery Challenge Workshop, pp. 39–47, Berlin, Germany, 2006.

[30] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123–140, 1996.

[31] R. E. Schapire, "The strength of weak learnability," Machine Learning, vol. 5, no. 2, pp. 197–227, 1990.

[32] Y. Dang, Y. Zhang, and H. Chen, "A lexicon-enhanced method for sentiment classification: an experiment on online product reviews," IEEE Intelligent Systems, vol. 25, no. 4, pp. 46–53, 2010.

[33] Y. Yang and J. O. Pedersen, "A comparative study on feature selection in text categorization," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), vol. 97, pp. 412–420, Morgan Kaufmann, San Mateo, Calif, USA, 1997.

[34] D. Klein and C. D. Manning, "Accurate unlexicalized parsing," in Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, vol. 1, pp. 423–430, Association for Computational Linguistics, Sapporo, Japan, July 2003.

[35] M. Zhang, J. Zhang, J. Su, et al., "A composite kernel to extract relations between entities with both flat and structured features," in Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pp. 825–832, Association for Computational Linguistics, 2006.

[36] National Library of Medicine, "Semantic MEDLINE Database," http://skr3.nlm.nih.gov/SemMedDB.

[37] T. C. Rindflesch and M. Fiszman, "The interaction of domain knowledge and linguistic structure in natural language processing: interpreting hypernymic propositions in biomedical text," Journal of Biomedical Informatics, vol. 36, no. 6, pp. 462–477, 2003.

[38] J. Carletta, "Assessing agreement on classification tasks: the kappa statistic," Computational Linguistics, vol. 22, no. 2, pp. 248–254, 1996.

[39] J. A. Hanley and B. J. McNeil, "The meaning and use of the area under a receiver operating characteristic (ROC) curve," Radiology, vol. 143, no. 1, pp. 29–36, 1982.

[40] A. P. Bradley, "The use of the area under the ROC curve in the evaluation of machine learning algorithms," Pattern Recognition, vol. 30, no. 7, pp. 1145–1159, 1997.

[41] D. R. Swanson, "Fish oil, Raynaud's syndrome, and undiscovered public knowledge," Perspectives in Biology and Medicine, vol. 30, no. 1, pp. 7–18, 1986.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 7: Research Article Semisupervised Learning Based Disease …downloads.hindawi.com/journals/bmri/2016/3594937.pdf · 2019. 7. 30. · models for extracting the relations between the

BioMed Research International 7

Feature kernelGraph kernel

Feature kernelGraph kernel

64

66

68

70

72

74

76

78F

-sco

re (

)

1 2 3 4 5 6 7 8 9 10 11 12 130Iterators

72

74

76

78

80

82

84

86

88

90

AUC

()

1 2 3 4 5 6 7 8 9 10 11 12 130Iterators

Figure 3 Co-Training performance curve of feature kernel and graph kernel on the disease-symptom test set

Feature kernelTree kernel

Feature kernelTree kernel

2 4 6 8 10 12 14 16 18 20 22 24 260Iterators

72747678808284868890

AUC

()

2 4 6 8 10 12 14 16 18 20 22 24 260Iterators

64

66

68

70

72

74

76

78

F-s

core

()

Figure 4 Co-Training performance curve of feature kernel and tree kernel on the disease-symptom test set

Graph kernelTree kernel

Graph kernelTree kernel

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 220Iterators

64

66

68

70

72

74

76

78

F-s

core

()

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 220Iterators

72

74

76

78

80

82

84

86

88

90

AUC

()

Figure 5 Co-Training performance curve of graph kernel and tree kernel on the disease-symptom test set

8 BioMed Research International

Table 3The results obtained with Co-Training on the disease-symptom test set Combinationmethod integrates three classifiers (the featurekernel graph kernel and tree kernel) with the same weight

Combination View 119875 119877 119865-score AUC

Feature and graph kernelFeature kernel 8832 6797 7682 8801Graph kernel 8326 7188 7715 8754Combination 7491 8516 7971 8866

Feature and tree kernelFeature kernel 8606 6992 7715 8851Tree kernel 5780 9258 7117 7499Combination 7508 8711 8065 8718

Graph and tree kernelGraph kernel 8404 6992 7633 8604Tree kernel 5810 9531 7219 7810Combination 8243 7695 7960 8684

Feature kernelGraph kernelTree kernel

Feature kernelGraph kernelTree kernel

2 4 6 8 10 12 14 16 18 20 22 24 260Iterators

64

66

68

70

72

74

76

78

80

82

84

F-s

core

()

72

74

76

78

80

82

84

86

88

90

AUC

()

2 4 6 8 10 12 14 16 18 20 22 24 260Iterators

Figure 6 The Tri-Training performance on the disease-symptom test set

process [28] At the beginning of the Co-Training the num-ber of the noises is limited and unlabeled data added to thetraining set can help the classifiers improve the performanceHowever after a number of learning rounds more and morenoises introduced will cause the performance decline

332 The Performance of Tri-Training on the Disease-Symptom Test Set In our method we select three views toconduct the Tri-Training that is the feature kernel graphkernel and tree kernel In each Tri-Training round SVM isused to train the classifier on each view The parameters areset as follows u = 4000m = 300 1199011 = 100 1199012 = 0 and119873 =27 (u m 1199011 1199012 and119873 in Algorithm 2) Here 1199012 = 0 meansthat only the positive examples are added into the trainingset In this way the recall of the classifier can be improved (therecall is defined as the number of true positives divided by thetotal number of examples that actually belong to the positiveclass and usually more positive examples in the training setwill improve the recall) since it is lower compared with theprecision (see Table 2) The results are shown in Table 4 andFigure 6

Table 4 The results obtained with Tri-Training on the disease-symptom test set Method 1 integrates three classifiers (the featurekernel graph kernel and tree kernel) with the same weight whileMethod 2 integrates them with a weight ratio of 4 4 2

Method 119875 119877 119865-score AUCFeature kernel 8300 8008 8151 8880Graph kernel 7774 8594 8163 8980Tree kernel 5738 9414 7130 7600Method 1 7979 8789 8364 9157Method 2 7993 8555 8264 9075

Compared with the performances of the classifiers on theinitial disease-symptom test set shown in Table 2 the onesachieved through Tri-Training are significantly improvedThis shows that Tri-Training can exploit the unlabeled dataand improve the performance more effectively The reason isthat as mentioned in Section 1 the Tri-Training algorithmcan achieve satisfactory results while neither requiring theinstance space to be described with sufficient and redundant

BioMed Research International 9

Feature kernelGraph kernel

Feature kernelGraph kernel

2 4 6 8 10 12 14 16 18 20 22 24 260Iterators

79

80

81

82

83

84

85

86

87

88

89

AUC

()

2 4 6 8 10 12 14 16 18 20 22 24 260Iterators

74

76

78

80

82

84

86F

-sco

re (

)

Figure 7 Co-Training performance curve of feature kernel and graph kernel on the symptom-therapeutic substance test set

views nor putting any constraints on the supervised learningmethod

In addition when three classifiers are integrated eitherwith the same weight or with a weight ratio of 4 4 2the higher F-scores and AUCs are obtained Furthermorecomparing the performance of Co-Training and Tri-Trainingshown in Tables 3 and 4 we found that in most casesTri-Training outperforms Co-Training The reason is thatthrough employing three classifiers Tri-Training is facilitatedwith good efficiency and generalization ability because itcould gracefully choose examples to label and use multipleclassifiers to compose the final hypothesis [28]

34 The Performance of the Symptom and Therapeutic Sub-stance Model Table 5 shows the performances of the clas-sifiers on the initial symptom-therapeutic substance test setSimilar to the results on the initial disease-symptom testset the feature kernel achieves the best performance whilethe tree kernel performs the worst One difference is thatwhen the three classifiers are integrated with a weight ratio of4 4 2 the higher F-score andAUC are obtainedwhile whenthey are integrated with the same weight the F-score andAUC are a little lower than those of feature kernel

341 The Performance of Co-Training on the Symptom andTherapeutic Substance Test Set Similar to that in the disease-symptom experiments the feature set for the symptom-therapeutic substance model is also divided into three viewsthe feature graph and tree kernels The experiments aredivided into three groups Each group uses the same experi-mental parameters that is u = 4000 m = 300 and p = 100The performance curves of different combinations are shownin Figures 7 8 and 9 and their final results with differentiteration times (27 26 and 9 resp) are shown in Table 6

From the figures we can draw similar conclusions as fromthe disease-symptom experiments In most cases the per-formance can be improved through the Co-Training process

Table 5 The initial results on the symptom-therapeutic substancetest set

Method 119875 119877 119865 AUCFeature kernel 7930 9076 8464 8790Graph kernel 7627 9036 8272 8730Tree kernel 6890 8273 7518 7994Method 1 7599 9277 8354 8759Method 2 7781 9438 8530 8894

while they are usually not stable since noisewill be introducedduring the learning process

342 The Performance of Tri-Training on the Symptom andTherapeutic Substance Test Set In the experiments of Tri-Training on the symptom-therapeutic substance the param-eters are set as follows u = 4000m = 300 1199011 = 100 1199012 = 0and119873 = 27 (um 1199011 1199012 and119873 in Algorithm 2) The resultsare shown in Table 7 and Figure 10

Compared with the performance of the classifiers onthe initial symptom-therapeutic substance test set shown inTable 6 the ones achieved through Tri-Training are alsoimproved as in the disease-symptom experiments This veri-fies that the Tri-Training algorithm is effective in utilizing theunlabeled data to boost the relation extraction performanceonce again When the three classifiers are integrated with aweight ratio of 4 4 2 a better AUC is obtained

Comparing the performance of Co-Training and Tri-Training on the symptom-therapeutic substance test set asshown in Tables 6 and 7 we found that in most cases Tri-Training outperforms Co-Training which is consistent withthe results achieved in the disease-symptom experimentsThis is due to the better efficiency and generalization ability ofTri-Training over Co-Training

In addition the performances of the classifiers on thedisease-symptom corpus are improved more than those on

10 BioMed Research International

Feature kernelTree kernel

Feature kernelTree kernel

2 4 6 8 10 12 14 16 18 20 22 24 260Iterators

74

76

78

80

82

84

86F

-sco

re (

)

2 4 6 8 10 12 14 16 18 20 22 24 260Iterators

7980818283848586878889

AUC

()

Figure 8 Co-Training performance curve of feature kernel and tree kernel on the symptom-therapeutic substance test set

Graph kernelTree kernel

Graph kernelTree kernel

1 2 3 4 5 6 7 8 90Iterators

1 2 3 4 5 6 7 8 90Iterators

74

76

78

80

82

84

86

F-s

core

()

79

80

81

82

83

84

85

86

87

88

89

AUC

()

Figure 9 Co-Training performance curve of graph kernel and tree kernel on the symptom-therapeutic substance test set

Feature kernelGraph kernelTree kernel

Feature kernelGraph kernelTree kernel

74

76

78

80

82

84

86

F-s

core

()

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 200Iterators

78

80

82

84

86

88

90

AUC

()

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 200Iterators

Figure 10 The results of Tri-Training on the symptom-therapeutic substance test set

BioMed Research International 11

Table 6 The results obtained with Co-Training on the symptom-therapeutic substance test set

Combination View 119875 119877 119865 AUC

Feature kernel and graph kernelFeature kernel 7800 9398 8525 8841Graph kernel 7151 9880 8297 8644Combination 7745 9518 8540 8910

Feature kernel and tree kernelFeature kernel 7872 9357 8551 8851Tree kernel 6713 9759 7954 8175Combination 7751 9679 8566 8861

Graph kernel and tree kernelGraph kernel 7414 9558 8351 8771Tree kernel 6782 9478 7906 8014Combination 7105 9759 8223 8624

Table 7 The results of Tri-Training on symptom-therapeutic sub-stance test set

119875 119877 119865 AUCFeature kernel 7898 9357 8566 8894Graph kernel 7431 9759 8437 8778Tree kernel 6801 9478 7919 8110Method 1 7477 9880 8512 8808Method 2 7562 9839 8551 8913

the symptom-therapeutic substance corpus There are tworeasons for that First on the symptom-therapeutic substancecorpus the classifiers have better performanceTherefore theCo-training and Tri-training algorithms have less room forthe performance improvement Second as the Co-trainingand Tri-training process proceeds more unlabeled data areadded into the training set which could introduce new infor-mation for the classifiers Therefore the recalls of the classi-fiers are improved Meanwhile more noise is also introducedcausing the precision decline For the initial classifiers thehigher the precision is the less the noise is introduced in theiterative process and the performance of the classifier wouldbe improved As a summary if the initial classifiers have bigdifference the performance can be improved through twoalgorithms In the experiment when more unlabeled dataare added to the training set the difference between theclassifiers becomes smallerThus after a number of iterationsperformance could not be improved any more

35 Some Examples for Disease-Symptom and Symptom-Ther-apeutic Substance Relations Extracted from Biomedical Liter-atures Some examples for disease-symptom or symptom-therapeutic substance relations extracted from biomedicalliteratures are shown in Tables 8 and 9 Table 8 shows somesymptoms of disease C0020541 (portal hypertension) Onesentence containing the relation between portal hypertensionand its symptomC0028778 (block) is provided Table 9 showssome relations between the symptom C0028778 (block) andsome therapeutic substances in which the sentences contain-ing the relations are provided

Table 8 Some disease-symptom relations extracted from biomedi-cal literature

Disease Symptom Sentence

C0020541(portalhypertension)

C0028778 (block) C0020541 as C2825142 ofintrahepatic C0028778accounted for 83 of thepatients (C0023891 65meta-C0022346 12) and

C0018920 11

C1565860C0035357C0005775C0014867C0232338

Table 9 Some symptom-therapeutic substance relations extractedfrom biomedical literature

Symptom Therapeuticsubstance Sentence

C0028778(block)

C0017302 (generalanesthetic agents)

Use-dependent conductionC0028778 produced by

volatile C0017302

C0006400(bupivacaine)

Epidural ropivacaine isknown to produce less

motor C0028778 comparedto C0006400 at anaesthetic

concentrations

C0053241(benzoquinone)

In contrast C0053241 andhydroquinone led to

g2-C0028778 rather than toa mitotic arrest

4 Conclusions and Future Work

Models for extracting the relations between the disease-symptom and symptom-therapeutic substance are importantfor further extracting knowledge about diseases and theirpotential therapeutic substances However currently there isno corpus available to train such models To solve the prob-lem we firstmanually annotated two training sets for extract-ing the relations Then two semisupervised learning algo-rithms that is Co-Training and Tri-Training are applied toexplore the unlabeled data to boost the performance Exper-imental results show that exploiting the unlabeled data withboth Co-Training and Tri-Training algorithms can enhance

12 BioMed Research International

the performance In particular through employing threeclassifiers Tri-training is facilitated with good efficiency andgeneralization ability since it could gracefully choose exam-ples to label and use multiple classifiers to compose the finalhypothesis [28] In addition its applicability is wide because itneither requires sufficient and redundant views nor puts anyconstraint on the employed supervised learning algorithm

In the future work we will study more effective semisu-pervised learningmethods to exploit the numerous unlabeleddata pieces in the biomedical literature On the other handwewill apply the disease-symptom and symptom-therapeuticsubstance models to extract the relations between diseasesand therapeutic substances from biomedical literature andpredict the potential therapeutic substances for certain dis-eases [41]

Competing Interests

The authors declare that there is no conflict of interestsregarding the publication of this article

Acknowledgments

This work is supported by the grants from the NaturalScience Foundation of China (nos 61272373 6107009861340020 61572102 and 61572098) Trans-Century TrainingProgram Foundation for the Talents by the Ministry of Edu-cation of China (NCET-13-0084) the Fundamental ResearchFunds for the Central Universities (nos DUT13JB09 andDUT14YQ213) and the Major State Research DevelopmentProgram of China (no 2016YFC0901902)

References

[1] D Hristovski B Peterlin J A Mitchell and S M HumphreyldquoUsing literature-based discovery to identify disease candidategenesrdquo International Journal of Medical Informatics vol 74 no2ndash4 pp 289ndash298 2005

[2] M N Prichard and C Shipman Jr ldquoA three-dimensional modelto analyze drug-drug interactionsrdquo Antiviral Research vol 14no 4-5 pp 181ndash205 1990

[3] Q-C Bui S Katrenko and PM A Sloot ldquoA hybrid approach toextract protein-protein interactionsrdquo Bioinformatics vol 27 no2 pp 259ndash265 2011

[4] M Krallinger F Leitner C Rodriguez-Penagos and A Valen-cia ldquoOverview of the protein-protein interaction annotationextraction task of BioCreative IIrdquo Genome Biology vol 9supplement 2 article S4 2008

[5] I Segura Bedmar P Martinez and D Sanchez Cisneros ldquoThe1st DDIExtraction-2011 challenge task extraction of Drug-Drug Interactions from biomedical textsrdquo in Proceedings ofthe 1st Challenge task on Drug-Drug Interaction Extraction(DDIExtraction rsquo11) pp 1ndash9 Huelva Spain September 2011

[6] I Segura-Bedmar P Martınez andM Herrero-Zazo SemEval-2013 Task 9 Extraction of Drug-Drug Interactions from Biomed-ical Texts (DDIExtraction 2013) Association for ComputationalLinguistics 2013

[7] M Krallinger and A Valencia ldquoText-mining and information-retrieval services formolecular biologyrdquoGenome Biology vol 6no 7 article 224 2005

[8] T-K Jenssen A Laeliggreid J Komorowski and E Hovig ldquoAliterature network of human genes for high-throughput analysisof gene expressionrdquo Nature Genetics vol 28 no 1 pp 21ndash282001

[9] C Blaschke M A Andrade C Ouzounis and A ValencialdquoAutomatic extraction of biological information from scientifictext protein-protein interactionsrdquo in Proceedings of the Inter-national Conference on Intelligent Systems for Molecular Biology(ISMB rsquo99) pp 60ndash67 1999

[10] P Zweigenbaum D Demner-Fushman H Yu and K BCohen ldquoFrontiers of biomedical text mining current progressrdquoBriefings in Bioinformatics vol 8 no 5 pp 358ndash375 2007

[11] Y T Yen B Chen H W Chiu Y C Lee Y C Li and C YHsu ldquoDeveloping anNLP and IR-based algorithm for analyzinggene-disease relationshipsrdquoMethods of Information inMedicinevol 45 no 3 pp 321ndash329 2006

[12] M Huang X Zhu Y Hao D G Payan K Qu and M LildquoDiscovering patterns to extract proteinndashprotein interactionsfrom full textsrdquo Bioinformatics vol 20 no 18 pp 3604ndash36122004

[13] I Tsochantaridis T Hofmann T Joachims and Y AltunldquoSupport vector machine learning for interdependent andstructured output spacesrdquo inProceedings of the 21st InternationalConference on Machine Learning (ICML rsquo04) pp 823ndash830Alberta Canada July 2004

[14] J Xiao J Su G Zhou et al ldquoProtein-protein interaction extrac-tion a supervised learning approachrdquo in Proceedings of the 1stInternational Symposium on Semantic Mining in Biomedicinepp 51ndash59 Hinxton UK April 2005

[15] L A Nielsen ldquoExtracting protein-protein interactions usingsimple contextual featuresrdquo in Proceedings of the HLT-NAACLBioNLPWorkshop on Linking Natural Language and Biology pp120ndash121 ACM June 2006

[16] A Airola S Pyysalo J Bjorne T Pahikkala F Ginter and TSalakoski ldquoAll-paths graph kernel for protein-protein interac-tion extraction with evaluation of cross-corpus learningrdquo BMCBioinformatics vol 9 no 11 article 52 2008

[17] L Qian and G Zhou ldquoTree kernel-based proteinndashproteininteraction extraction from biomedical literaturerdquo Journal ofBiomedical Informatics vol 45 no 3 pp 535ndash543 2012

[18] S Kim J Yoon J Yang and S Park ldquoWalk-weighted subse-quence kernels for protein-protein interaction extractionrdquoBMCBioinformatics vol 11 no 1 article 107 2010

[19] D J Miller and H S Uyar ldquoA mixture of experts classifierwith learning based on both labelled and unlabelled datardquo inAdvances in Neural Information Processing Systems pp 571ndash5771997

[20] T Joachims ldquoTransductive inference for text classificationusing support vector machinesrdquo in Proceedings of the 16thInternational Conference on Machine Learning (ICML rsquo99) pp200ndash209 1999

[21] X Zhu Z Ghahramani and J Lafferty ldquoSemi-supervised learn-ing using gaussian fields and harmonic functionsrdquo in Proceed-ings of the 20th International Conference on Machine Learning(ICML rsquo03) vol 3 pp 912ndash919 Washington DC USA August2003

[22] A Blum and T Mitchell ldquoCombining labeled and unlabeleddata with co-trainingrdquo in Proceedings of the 11th Annual Confer-ence on Computational LearningTheory (COLT rsquo98) pp 92ndash100ACM 1998

BioMed Research International 13

[23] WWang and Z H Zhou ldquoCo-training with insufficient viewsrdquoin Proceedings of the Asian Conference onMachine Learning pp467ndash482 2013

[24] W Wang and Z-H Zhou ldquoA new analysis of co-trainingrdquo inProceedings of the 27th International Conference on MachineLearning (ICML rsquo10) pp 1135ndash1142 June 2010

[25] W Wang and H Zhou Z ldquoAnalyzing co-training style algo-rithmsrdquo in Machine Learning ECML 2007 vol 4701 of LectureNotes in Computer Science pp 454ndash465 Springer BerlinGermany 2007

[26] D Pierce and C Cardie ldquoLimitations of co-training for naturallanguage learning from large datasetsrdquo in Proceedings of the2001 Conference on Empirical Methods in Natural LanguageProcessing pp 1ndash9 Pittsburgh Pa USA 2001

[27] S Kiritchenko and S Matwin ldquoEmail classification with co-trainingrdquo in Proceedings of the Conference of the Center forAdvanced Studies on Collaborative Research (CASCON rsquo01) pp301ndash312 IBM Toronto Canada November 2011

[28] Z-H Zhou and M Li ldquoTri-training exploiting unlabeled datausing three classifiersrdquo IEEE Transactions on Knowledge andData Engineering vol 17 no 11 pp 1529ndash1541 2005

[29] D Mavroeidis K Chaidos S Pirillos et al ldquoUsing tri-trainingand support vector machines for addressing the ECMLPKDD2006 discovery challengerdquo in Proceedings of the ECML-PKDDDiscovery Challenge Workshop pp 39ndash47 Berlin Germany2006

[30] L Breiman ldquoBagging predictorsrdquoMachine Learning vol 24 no2 pp 123ndash140 1996

[31] R E Schapire ldquoThe strength of weak learnabilityrdquo MachineLearning vol 5 no 2 pp 197ndash227 1990

[32] Y Dang Y Zhang and H Chen ldquoA lexicon-enhanced methodfor sentiment classification an experiment on online productreviewsrdquo IEEE Intelligent Systems vol 25 no 4 pp 46ndash53 2010

[33] Y Yang and J O Pedersen ldquoA comparative study on featureselection in text categorizationrdquo in Proceedings of the 14thInternational Conference on Machine Learning (ICML rsquo97) vol97 pp 412ndash420 Morgan Kaufmann San Mateo Calif USA1997

[34] D Klein and C D Manning ldquoAccurate unlexicalized parsingrdquoin Proceedings of the the 41st Annual Meeting on Association forComputational Linguistics vol 1 pp 423ndash430 Association forComputational Linguistics Sapporo Japan July 2003

[35] M Zhang J Zhang J Su et al ldquoA composite kernel toextract relations between entities with both flat and structuredfeaturesrdquo in Proceedings of the 21st International Conference onComputational Linguistics and the 44th Annual Meeting of theAssociation for Computational Linguistics pp 825ndash832 Associ-ation for Computational Linguistics 2006

[36] National Library of Medicine ldquoSemantic MEDLINE Databaserdquohttpskr3nlmnihgovSemMedDB

[37] T C Rindflesch and M Fiszman ldquoThe interaction of domainknowledge and linguistic structure in natural language process-ing interpreting hypernymic propositions in biomedical textrdquoJournal of Biomedical Informatics vol 36 no 6 pp 462ndash4772003

[38] J Carletta ldquoAssessing agreement on classification tasks thekappa statisticrdquo Computational Linguistics vol 22 no 2 pp248ndash254 1996

[39] J A Hanley and B J McNeil ldquoThe meaning and use of thearea under a receiver operating characteristic (ROC) curverdquoRadiology vol 143 no 1 pp 29ndash36 1982

[40] A P Bradley ldquoThe use of the area under the ROC curve inthe evaluation of machine learning algorithmsrdquo Pattern Recog-nition vol 30 no 7 pp 1145ndash1159 1997

[41] D R Swanson ldquoFish oil Raynaudrsquos syndrome and undiscov-ered public knowledgerdquo Perspectives in Biology and Medicinevol 30 no 1 pp 7ndash18 1986

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 8: Research Article Semisupervised Learning Based Disease …downloads.hindawi.com/journals/bmri/2016/3594937.pdf · 2019. 7. 30. · models for extracting the relations between the

8 BioMed Research International

Table 3The results obtained with Co-Training on the disease-symptom test set Combinationmethod integrates three classifiers (the featurekernel graph kernel and tree kernel) with the same weight

Combination View 119875 119877 119865-score AUC

Feature and graph kernelFeature kernel 8832 6797 7682 8801Graph kernel 8326 7188 7715 8754Combination 7491 8516 7971 8866

Feature and tree kernelFeature kernel 8606 6992 7715 8851Tree kernel 5780 9258 7117 7499Combination 7508 8711 8065 8718

Graph and tree kernelGraph kernel 8404 6992 7633 8604Tree kernel 5810 9531 7219 7810Combination 8243 7695 7960 8684

Feature kernelGraph kernelTree kernel

Feature kernelGraph kernelTree kernel

2 4 6 8 10 12 14 16 18 20 22 24 260Iterators

64

66

68

70

72

74

76

78

80

82

84

F-s

core

()

72

74

76

78

80

82

84

86

88

90

AUC

()

2 4 6 8 10 12 14 16 18 20 22 24 260Iterators

Figure 6 The Tri-Training performance on the disease-symptom test set

process [28] At the beginning of the Co-Training the num-ber of the noises is limited and unlabeled data added to thetraining set can help the classifiers improve the performanceHowever after a number of learning rounds more and morenoises introduced will cause the performance decline

332 The Performance of Tri-Training on the Disease-Symptom Test Set In our method we select three views toconduct the Tri-Training that is the feature kernel graphkernel and tree kernel In each Tri-Training round SVM isused to train the classifier on each view The parameters areset as follows u = 4000m = 300 1199011 = 100 1199012 = 0 and119873 =27 (u m 1199011 1199012 and119873 in Algorithm 2) Here 1199012 = 0 meansthat only the positive examples are added into the trainingset In this way the recall of the classifier can be improved (therecall is defined as the number of true positives divided by thetotal number of examples that actually belong to the positiveclass and usually more positive examples in the training setwill improve the recall) since it is lower compared with theprecision (see Table 2) The results are shown in Table 4 andFigure 6

Table 4 The results obtained with Tri-Training on the disease-symptom test set Method 1 integrates three classifiers (the featurekernel graph kernel and tree kernel) with the same weight whileMethod 2 integrates them with a weight ratio of 4 4 2

Method 119875 119877 119865-score AUCFeature kernel 8300 8008 8151 8880Graph kernel 7774 8594 8163 8980Tree kernel 5738 9414 7130 7600Method 1 7979 8789 8364 9157Method 2 7993 8555 8264 9075

Compared with the performances of the classifiers on theinitial disease-symptom test set shown in Table 2 the onesachieved through Tri-Training are significantly improvedThis shows that Tri-Training can exploit the unlabeled dataand improve the performance more effectively The reason isthat as mentioned in Section 1 the Tri-Training algorithmcan achieve satisfactory results while neither requiring theinstance space to be described with sufficient and redundant

Figure 7: Co-Training performance curves (F-score (%) and AUC (%) versus the number of iterations, 0-26) of the feature kernel and graph kernel on the symptom-therapeutic substance test set.


In addition, when the three classifiers are integrated, either with the same weight or with a weight ratio of 4:4:2, higher F-scores and AUCs are obtained. Furthermore, comparing the performance of Co-Training and Tri-Training shown in Tables 3 and 4, we found that in most cases Tri-Training outperforms Co-Training. The reason is that, through employing three classifiers, Tri-Training achieves good efficiency and generalization ability, because it can gracefully choose examples to label and use multiple classifiers to compose the final hypothesis [28].
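The integration step can be illustrated by the short sketch below, which fuses the three classifiers' positive-class scores with equal weights (Method 1) or with the 4:4:2 ratio (Method 2). The straight weighted averaging and the 0.5 decision threshold are our assumptions, and sf, sg, and st are placeholder score arrays; the paper does not spell out these details.

import numpy as np

def combine(s_feature, s_graph, s_tree, weights=(1, 1, 1)):
    """Each s_* is an array of per-example positive-class scores in [0, 1]."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                              # turn the ratio into convex weights
    fused = w @ np.vstack([s_feature, s_graph, s_tree])
    return fused, (fused >= 0.5).astype(int)  # scores for AUC, hard labels for P/R/F

# Method 1: combine(sf, sg, st, weights=(1, 1, 1))
# Method 2: combine(sf, sg, st, weights=(4, 4, 2))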

3.4. The Performance of the Symptom and Therapeutic Substance Model. Table 5 shows the performances of the classifiers on the initial symptom-therapeutic substance test set. Similar to the results on the initial disease-symptom test set, the feature kernel achieves the best performance while the tree kernel performs the worst. One difference is that, when the three classifiers are integrated with a weight ratio of 4:4:2, a higher F-score and AUC are obtained, while, when they are integrated with the same weight, the F-score and AUC are a little lower than those of the feature kernel.

3.4.1. The Performance of Co-Training on the Symptom and Therapeutic Substance Test Set. As in the disease-symptom experiments, the feature set for the symptom-therapeutic substance model is also divided into three views: the feature, graph, and tree kernels. The experiments are divided into three groups. Each group uses the same experimental parameters, that is, u = 4000, m = 300, and p = 100. The performance curves of the different combinations are shown in Figures 7, 8, and 9, and their final results with different iteration numbers (27, 26, and 9, respectively) are shown in Table 6.
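A minimal sketch of one such Co-Training group is given below, under the stated setting u = 4000, m = 300, p = 100. Measuring confidence by the SVM's distance to the separating hyperplane and discarding the remainder of each sampled batch are our assumptions; Xa/Xb and Ua/Ub denote the two views of the labeled set and of the unlabeled pool.

import numpy as np
from sklearn.svm import SVC

def co_train(Xa, Xb, y, Ua, Ub, n_rounds=27, m=300, p=100, seed=0):
    """Sketch of one Co-Training group with two views (e.g., feature and graph kernels)."""
    rng = np.random.default_rng(seed)
    ca, cb = SVC(), SVC()
    for _ in range(n_rounds):
        if len(Ua) == 0:
            break
        ca.fit(Xa, y)
        cb.fit(Xb, y)
        pick = rng.choice(len(Ua), size=min(m, len(Ua)), replace=False)
        for clf, view in ((ca, Ua), (cb, Ub)):
            conf = np.abs(clf.decision_function(view[pick]))  # distance to hyperplane
            top = pick[np.argsort(conf)[-p:]]                 # p most confident examples
            pred = clf.predict(view[top])                     # self-assigned labels
            Xa = np.vstack([Xa, Ua[top]])                     # both views grow in step
            Xb = np.vstack([Xb, Ub[top]])
            y = np.concatenate([y, pred])
        Ua = np.delete(Ua, pick, axis=0)                      # drop the sampled batch
        Ub = np.delete(Ub, pick, axis=0)
    return ca, cb

The two classifiers may occasionally select and label the same sampled example, so a real implementation would deduplicate such additions; this is omitted here for brevity.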

From the figures we can draw conclusions similar to those from the disease-symptom experiments: in most cases the performance can be improved through the Co-Training process, although it is usually not stable, since noise is introduced during the learning process.

Table 5: The initial results on the symptom-therapeutic substance test set (values in %).

Method          P      R      F      AUC
Feature kernel  79.30  90.76  84.64  87.90
Graph kernel    76.27  90.36  82.72  87.30
Tree kernel     68.90  82.73  75.18  79.94
Method 1        75.99  92.77  83.54  87.59
Method 2        77.81  94.38  85.30  88.94


3.4.2. The Performance of Tri-Training on the Symptom and Therapeutic Substance Test Set. In the Tri-Training experiments on the symptom-therapeutic substance corpus, the parameters are set as follows: u = 4000, m = 300, p1 = 100, p2 = 0, and N = 27 (u, m, p1, p2, and N in Algorithm 2). The results are shown in Table 7 and Figure 10.

Compared with the performance of the classifiers on the initial symptom-therapeutic substance test set shown in Table 5, the ones achieved through Tri-Training are also improved, as in the disease-symptom experiments. This verifies once again that the Tri-Training algorithm is effective in utilizing the unlabeled data to boost the relation extraction performance. When the three classifiers are integrated with a weight ratio of 4:4:2, a better AUC is obtained.

Comparing the performance of Co-Training and Tri-Training on the symptom-therapeutic substance test set, as shown in Tables 6 and 7, we found that in most cases Tri-Training outperforms Co-Training, which is consistent with the results achieved in the disease-symptom experiments. This is due to the better efficiency and generalization ability of Tri-Training over Co-Training.

In addition, the performances of the classifiers on the disease-symptom corpus are improved more than those on the symptom-therapeutic substance corpus. There are two reasons for this. First, on the symptom-therapeutic substance corpus the classifiers already perform better, so the Co-Training and Tri-Training algorithms have less room for performance improvement. Second, as the Co-Training and Tri-Training process proceeds, more unlabeled data are added into the training set, which introduces new information for the classifiers; the recalls of the classifiers are therefore improved. Meanwhile, more noise is also introduced, causing the precision to decline. For the initial classifiers, the higher the precision, the less noise is introduced in the iterative process, and the more the performance of the classifier improves. In summary, if the initial classifiers differ substantially from one another, both algorithms can improve the performance. In the experiments, as more unlabeled data are added to the training set, the difference between the classifiers becomes smaller; thus, after a number of iterations, the performance cannot be improved any more.

Figure 8: Co-Training performance curve (F-score (%) and AUC (%) versus the number of iterations, 0-26) of the feature kernel and tree kernel on the symptom-therapeutic substance test set.

Figure 9: Co-Training performance curve (F-score (%) and AUC (%) versus the number of iterations, 0-9) of the graph kernel and tree kernel on the symptom-therapeutic substance test set.

Figure 10: The results of Tri-Training on the symptom-therapeutic substance test set (F-score (%) and AUC (%) versus the number of iterations, 0-20; curves for the feature, graph, and tree kernels).


Table 6: The results obtained with Co-Training on the symptom-therapeutic substance test set (values in %).

Combination                      View            P      R      F      AUC
Feature kernel and graph kernel  Feature kernel  78.00  93.98  85.25  88.41
                                 Graph kernel    71.51  98.80  82.97  86.44
                                 Combination     77.45  95.18  85.40  89.10
Feature kernel and tree kernel   Feature kernel  78.72  93.57  85.51  88.51
                                 Tree kernel     67.13  97.59  79.54  81.75
                                 Combination     77.51  96.79  85.66  88.61
Graph kernel and tree kernel     Graph kernel    74.14  95.58  83.51  87.71
                                 Tree kernel     67.82  94.78  79.06  80.14
                                 Combination     71.05  97.59  82.23  86.24

Table 7: The results of Tri-Training on the symptom-therapeutic substance test set (values in %).

Method          P      R      F      AUC
Feature kernel  78.98  93.57  85.66  88.94
Graph kernel    74.31  97.59  84.37  87.78
Tree kernel     68.01  94.78  79.19  81.10
Method 1        74.77  98.80  85.12  88.08
Method 2        75.62  98.39  85.51  89.13


3.5. Some Examples of Disease-Symptom and Symptom-Therapeutic Substance Relations Extracted from Biomedical Literature. Some examples of disease-symptom and symptom-therapeutic substance relations extracted from biomedical literature are shown in Tables 8 and 9. Table 8 shows some symptoms of the disease C0020541 (portal hypertension); one sentence containing the relation between portal hypertension and its symptom C0028778 (block) is provided. Table 9 shows some relations between the symptom C0028778 (block) and some therapeutic substances, together with the sentences containing the relations.

Table 8: Some disease-symptom relations extracted from biomedical literature.

Disease                         Symptom           Sentence
C0020541 (portal hypertension)  C0028778 (block)  "C0020541 as C2825142 of intrahepatic C0028778 accounted for 83% of the patients (C0023891 65%, meta-C0022346 12%) and C0018920 11%"
                                C1565860
                                C0035357
                                C0005775
                                C0014867
                                C0232338

Table 9: Some symptom-therapeutic substance relations extracted from biomedical literature.

Symptom           Therapeutic substance                 Sentence
C0028778 (block)  C0017302 (general anesthetic agents)  "Use-dependent conduction C0028778 produced by volatile C0017302"
                  C0006400 (bupivacaine)                "Epidural ropivacaine is known to produce less motor C0028778 compared to C0006400 at anaesthetic concentrations"
                  C0053241 (benzoquinone)               "In contrast, C0053241 and hydroquinone led to g2-C0028778 rather than to a mitotic arrest"

4. Conclusions and Future Work

Models for extracting the relations between disease and symptom and between symptom and therapeutic substance are important for further extracting knowledge about diseases and their potential therapeutic substances. However, currently there is no corpus available to train such models. To solve the problem, we first manually annotated two training sets for extracting the relations. Then two semisupervised learning algorithms, that is, Co-Training and Tri-Training, were applied to exploit the unlabeled data to boost the performance. Experimental results show that exploiting the unlabeled data with both Co-Training and Tri-Training algorithms can enhance the performance.


In particular, through employing three classifiers, Tri-Training achieves good efficiency and generalization ability, since it can gracefully choose examples to label and use multiple classifiers to compose the final hypothesis [28]. In addition, its applicability is wide, because it neither requires sufficient and redundant views nor puts any constraint on the employed supervised learning algorithm.

In future work, we will study more effective semisupervised learning methods to exploit the numerous unlabeled data in the biomedical literature. We will also apply the disease-symptom and symptom-therapeutic substance models to extract the relations between diseases and therapeutic substances from biomedical literature and to predict potential therapeutic substances for certain diseases [41].

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this article.

Acknowledgments

This work is supported by grants from the Natural Science Foundation of China (nos. 61272373, 61070098, 61340020, 61572102, and 61572098), the Trans-Century Training Program Foundation for the Talents by the Ministry of Education of China (NCET-13-0084), the Fundamental Research Funds for the Central Universities (nos. DUT13JB09 and DUT14YQ213), and the Major State Research Development Program of China (no. 2016YFC0901902).

References

[1] D. Hristovski, B. Peterlin, J. A. Mitchell, and S. M. Humphrey, "Using literature-based discovery to identify disease candidate genes," International Journal of Medical Informatics, vol. 74, no. 2-4, pp. 289-298, 2005.

[2] M. N. Prichard and C. Shipman Jr., "A three-dimensional model to analyze drug-drug interactions," Antiviral Research, vol. 14, no. 4-5, pp. 181-205, 1990.

[3] Q.-C. Bui, S. Katrenko, and P. M. A. Sloot, "A hybrid approach to extract protein-protein interactions," Bioinformatics, vol. 27, no. 2, pp. 259-265, 2011.

[4] M. Krallinger, F. Leitner, C. Rodriguez-Penagos, and A. Valencia, "Overview of the protein-protein interaction annotation extraction task of BioCreative II," Genome Biology, vol. 9, supplement 2, article S4, 2008.

[5] I. Segura Bedmar, P. Martinez, and D. Sanchez Cisneros, "The 1st DDIExtraction-2011 challenge task: extraction of Drug-Drug Interactions from biomedical texts," in Proceedings of the 1st Challenge Task on Drug-Drug Interaction Extraction (DDIExtraction '11), pp. 1-9, Huelva, Spain, September 2011.

[6] I. Segura-Bedmar, P. Martínez, and M. Herrero-Zazo, SemEval-2013 Task 9: Extraction of Drug-Drug Interactions from Biomedical Texts (DDIExtraction 2013), Association for Computational Linguistics, 2013.

[7] M. Krallinger and A. Valencia, "Text-mining and information-retrieval services for molecular biology," Genome Biology, vol. 6, no. 7, article 224, 2005.

[8] T.-K. Jenssen, A. Lægreid, J. Komorowski, and E. Hovig, "A literature network of human genes for high-throughput analysis of gene expression," Nature Genetics, vol. 28, no. 1, pp. 21-28, 2001.

[9] C. Blaschke, M. A. Andrade, C. Ouzounis, and A. Valencia, "Automatic extraction of biological information from scientific text: protein-protein interactions," in Proceedings of the International Conference on Intelligent Systems for Molecular Biology (ISMB '99), pp. 60-67, 1999.

[10] P. Zweigenbaum, D. Demner-Fushman, H. Yu, and K. B. Cohen, "Frontiers of biomedical text mining: current progress," Briefings in Bioinformatics, vol. 8, no. 5, pp. 358-375, 2007.

[11] Y. T. Yen, B. Chen, H. W. Chiu, Y. C. Lee, Y. C. Li, and C. Y. Hsu, "Developing an NLP and IR-based algorithm for analyzing gene-disease relationships," Methods of Information in Medicine, vol. 45, no. 3, pp. 321-329, 2006.

[12] M. Huang, X. Zhu, Y. Hao, D. G. Payan, K. Qu, and M. Li, "Discovering patterns to extract protein-protein interactions from full texts," Bioinformatics, vol. 20, no. 18, pp. 3604-3612, 2004.

[13] I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun, "Support vector machine learning for interdependent and structured output spaces," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 823-830, Alberta, Canada, July 2004.

[14] J. Xiao, J. Su, G. Zhou et al., "Protein-protein interaction extraction: a supervised learning approach," in Proceedings of the 1st International Symposium on Semantic Mining in Biomedicine, pp. 51-59, Hinxton, UK, April 2005.

[15] L. A. Nielsen, "Extracting protein-protein interactions using simple contextual features," in Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology, pp. 120-121, ACM, June 2006.

[16] A. Airola, S. Pyysalo, J. Björne, T. Pahikkala, F. Ginter, and T. Salakoski, "All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning," BMC Bioinformatics, vol. 9, supplement 11, article S2, 2008.

[17] L. Qian and G. Zhou, "Tree kernel-based protein-protein interaction extraction from biomedical literature," Journal of Biomedical Informatics, vol. 45, no. 3, pp. 535-543, 2012.

[18] S. Kim, J. Yoon, J. Yang, and S. Park, "Walk-weighted subsequence kernels for protein-protein interaction extraction," BMC Bioinformatics, vol. 11, no. 1, article 107, 2010.

[19] D. J. Miller and H. S. Uyar, "A mixture of experts classifier with learning based on both labelled and unlabelled data," in Advances in Neural Information Processing Systems, pp. 571-577, 1997.

[20] T. Joachims, "Transductive inference for text classification using support vector machines," in Proceedings of the 16th International Conference on Machine Learning (ICML '99), pp. 200-209, 1999.

[21] X. Zhu, Z. Ghahramani, and J. Lafferty, "Semi-supervised learning using Gaussian fields and harmonic functions," in Proceedings of the 20th International Conference on Machine Learning (ICML '03), vol. 3, pp. 912-919, Washington, DC, USA, August 2003.

[22] A. Blum and T. Mitchell, "Combining labeled and unlabeled data with co-training," in Proceedings of the 11th Annual Conference on Computational Learning Theory (COLT '98), pp. 92-100, ACM, 1998.

[23] W. Wang and Z.-H. Zhou, "Co-training with insufficient views," in Proceedings of the Asian Conference on Machine Learning, pp. 467-482, 2013.

[24] W. Wang and Z.-H. Zhou, "A new analysis of co-training," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 1135-1142, June 2010.

[25] W. Wang and Z.-H. Zhou, "Analyzing co-training style algorithms," in Machine Learning: ECML 2007, vol. 4701 of Lecture Notes in Computer Science, pp. 454-465, Springer, Berlin, Germany, 2007.

[26] D. Pierce and C. Cardie, "Limitations of co-training for natural language learning from large datasets," in Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing, pp. 1-9, Pittsburgh, Pa, USA, 2001.

[27] S. Kiritchenko and S. Matwin, "Email classification with co-training," in Proceedings of the Conference of the Center for Advanced Studies on Collaborative Research (CASCON '01), pp. 301-312, IBM, Toronto, Canada, November 2001.

[28] Z.-H. Zhou and M. Li, "Tri-training: exploiting unlabeled data using three classifiers," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 11, pp. 1529-1541, 2005.

[29] D. Mavroeidis, K. Chaidos, S. Pirillos et al., "Using tri-training and support vector machines for addressing the ECML/PKDD 2006 discovery challenge," in Proceedings of the ECML-PKDD Discovery Challenge Workshop, pp. 39-47, Berlin, Germany, 2006.

[30] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123-140, 1996.

[31] R. E. Schapire, "The strength of weak learnability," Machine Learning, vol. 5, no. 2, pp. 197-227, 1990.

[32] Y. Dang, Y. Zhang, and H. Chen, "A lexicon-enhanced method for sentiment classification: an experiment on online product reviews," IEEE Intelligent Systems, vol. 25, no. 4, pp. 46-53, 2010.

[33] Y. Yang and J. O. Pedersen, "A comparative study on feature selection in text categorization," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), vol. 97, pp. 412-420, Morgan Kaufmann, San Mateo, Calif, USA, 1997.

[34] D. Klein and C. D. Manning, "Accurate unlexicalized parsing," in Proceedings of the 41st Annual Meeting on Association for Computational Linguistics, vol. 1, pp. 423-430, Association for Computational Linguistics, Sapporo, Japan, July 2003.

[35] M. Zhang, J. Zhang, J. Su et al., "A composite kernel to extract relations between entities with both flat and structured features," in Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pp. 825-832, Association for Computational Linguistics, 2006.

[36] National Library of Medicine, "Semantic MEDLINE Database," http://skr3.nlm.nih.gov/SemMedDB.

[37] T. C. Rindflesch and M. Fiszman, "The interaction of domain knowledge and linguistic structure in natural language processing: interpreting hypernymic propositions in biomedical text," Journal of Biomedical Informatics, vol. 36, no. 6, pp. 462-477, 2003.

[38] J. Carletta, "Assessing agreement on classification tasks: the kappa statistic," Computational Linguistics, vol. 22, no. 2, pp. 248-254, 1996.

[39] J. A. Hanley and B. J. McNeil, "The meaning and use of the area under a receiver operating characteristic (ROC) curve," Radiology, vol. 143, no. 1, pp. 29-36, 1982.

[40] A. P. Bradley, "The use of the area under the ROC curve in the evaluation of machine learning algorithms," Pattern Recognition, vol. 30, no. 7, pp. 1145-1159, 1997.

[41] D. R. Swanson, "Fish oil, Raynaud's syndrome, and undiscovered public knowledge," Perspectives in Biology and Medicine, vol. 30, no. 1, pp. 7-18, 1986.

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 9: Research Article Semisupervised Learning Based Disease …downloads.hindawi.com/journals/bmri/2016/3594937.pdf · 2019. 7. 30. · models for extracting the relations between the

BioMed Research International 9

Feature kernelGraph kernel

Feature kernelGraph kernel

2 4 6 8 10 12 14 16 18 20 22 24 260Iterators

79

80

81

82

83

84

85

86

87

88

89

AUC

()

2 4 6 8 10 12 14 16 18 20 22 24 260Iterators

74

76

78

80

82

84

86F

-sco

re (

)

Figure 7 Co-Training performance curve of feature kernel and graph kernel on the symptom-therapeutic substance test set

views nor putting any constraints on the supervised learningmethod

In addition when three classifiers are integrated eitherwith the same weight or with a weight ratio of 4 4 2the higher F-scores and AUCs are obtained Furthermorecomparing the performance of Co-Training and Tri-Trainingshown in Tables 3 and 4 we found that in most casesTri-Training outperforms Co-Training The reason is thatthrough employing three classifiers Tri-Training is facilitatedwith good efficiency and generalization ability because itcould gracefully choose examples to label and use multipleclassifiers to compose the final hypothesis [28]

34 The Performance of the Symptom and Therapeutic Sub-stance Model Table 5 shows the performances of the clas-sifiers on the initial symptom-therapeutic substance test setSimilar to the results on the initial disease-symptom testset the feature kernel achieves the best performance whilethe tree kernel performs the worst One difference is thatwhen the three classifiers are integrated with a weight ratio of4 4 2 the higher F-score andAUC are obtainedwhile whenthey are integrated with the same weight the F-score andAUC are a little lower than those of feature kernel

341 The Performance of Co-Training on the Symptom andTherapeutic Substance Test Set Similar to that in the disease-symptom experiments the feature set for the symptom-therapeutic substance model is also divided into three viewsthe feature graph and tree kernels The experiments aredivided into three groups Each group uses the same experi-mental parameters that is u = 4000 m = 300 and p = 100The performance curves of different combinations are shownin Figures 7 8 and 9 and their final results with differentiteration times (27 26 and 9 resp) are shown in Table 6

From the figures we can draw similar conclusions as fromthe disease-symptom experiments In most cases the per-formance can be improved through the Co-Training process

Table 5 The initial results on the symptom-therapeutic substancetest set

Method 119875 119877 119865 AUCFeature kernel 7930 9076 8464 8790Graph kernel 7627 9036 8272 8730Tree kernel 6890 8273 7518 7994Method 1 7599 9277 8354 8759Method 2 7781 9438 8530 8894

while they are usually not stable since noisewill be introducedduring the learning process

342 The Performance of Tri-Training on the Symptom andTherapeutic Substance Test Set In the experiments of Tri-Training on the symptom-therapeutic substance the param-eters are set as follows u = 4000m = 300 1199011 = 100 1199012 = 0and119873 = 27 (um 1199011 1199012 and119873 in Algorithm 2) The resultsare shown in Table 7 and Figure 10

Compared with the performance of the classifiers onthe initial symptom-therapeutic substance test set shown inTable 6 the ones achieved through Tri-Training are alsoimproved as in the disease-symptom experiments This veri-fies that the Tri-Training algorithm is effective in utilizing theunlabeled data to boost the relation extraction performanceonce again When the three classifiers are integrated with aweight ratio of 4 4 2 a better AUC is obtained

Comparing the performance of Co-Training and Tri-Training on the symptom-therapeutic substance test set asshown in Tables 6 and 7 we found that in most cases Tri-Training outperforms Co-Training which is consistent withthe results achieved in the disease-symptom experimentsThis is due to the better efficiency and generalization ability ofTri-Training over Co-Training

In addition the performances of the classifiers on thedisease-symptom corpus are improved more than those on

10 BioMed Research International

Feature kernelTree kernel

Feature kernelTree kernel

2 4 6 8 10 12 14 16 18 20 22 24 260Iterators

74

76

78

80

82

84

86F

-sco

re (

)

2 4 6 8 10 12 14 16 18 20 22 24 260Iterators

7980818283848586878889

AUC

()

Figure 8 Co-Training performance curve of feature kernel and tree kernel on the symptom-therapeutic substance test set

Graph kernelTree kernel

Graph kernelTree kernel

1 2 3 4 5 6 7 8 90Iterators

1 2 3 4 5 6 7 8 90Iterators

74

76

78

80

82

84

86

F-s

core

()

79

80

81

82

83

84

85

86

87

88

89

AUC

()

Figure 9 Co-Training performance curve of graph kernel and tree kernel on the symptom-therapeutic substance test set

Feature kernelGraph kernelTree kernel

Feature kernelGraph kernelTree kernel

74

76

78

80

82

84

86

F-s

core

()

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 200Iterators

78

80

82

84

86

88

90

AUC

()

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 200Iterators

Figure 10 The results of Tri-Training on the symptom-therapeutic substance test set

BioMed Research International 11

Table 6 The results obtained with Co-Training on the symptom-therapeutic substance test set

Combination View 119875 119877 119865 AUC

Feature kernel and graph kernelFeature kernel 7800 9398 8525 8841Graph kernel 7151 9880 8297 8644Combination 7745 9518 8540 8910

Feature kernel and tree kernelFeature kernel 7872 9357 8551 8851Tree kernel 6713 9759 7954 8175Combination 7751 9679 8566 8861

Graph kernel and tree kernelGraph kernel 7414 9558 8351 8771Tree kernel 6782 9478 7906 8014Combination 7105 9759 8223 8624

Table 7 The results of Tri-Training on symptom-therapeutic sub-stance test set

119875 119877 119865 AUCFeature kernel 7898 9357 8566 8894Graph kernel 7431 9759 8437 8778Tree kernel 6801 9478 7919 8110Method 1 7477 9880 8512 8808Method 2 7562 9839 8551 8913

the symptom-therapeutic substance corpus There are tworeasons for that First on the symptom-therapeutic substancecorpus the classifiers have better performanceTherefore theCo-training and Tri-training algorithms have less room forthe performance improvement Second as the Co-trainingand Tri-training process proceeds more unlabeled data areadded into the training set which could introduce new infor-mation for the classifiers Therefore the recalls of the classi-fiers are improved Meanwhile more noise is also introducedcausing the precision decline For the initial classifiers thehigher the precision is the less the noise is introduced in theiterative process and the performance of the classifier wouldbe improved As a summary if the initial classifiers have bigdifference the performance can be improved through twoalgorithms In the experiment when more unlabeled dataare added to the training set the difference between theclassifiers becomes smallerThus after a number of iterationsperformance could not be improved any more

35 Some Examples for Disease-Symptom and Symptom-Ther-apeutic Substance Relations Extracted from Biomedical Liter-atures Some examples for disease-symptom or symptom-therapeutic substance relations extracted from biomedicalliteratures are shown in Tables 8 and 9 Table 8 shows somesymptoms of disease C0020541 (portal hypertension) Onesentence containing the relation between portal hypertensionand its symptomC0028778 (block) is provided Table 9 showssome relations between the symptom C0028778 (block) andsome therapeutic substances in which the sentences contain-ing the relations are provided

Table 8 Some disease-symptom relations extracted from biomedi-cal literature

Disease Symptom Sentence

C0020541(portalhypertension)

C0028778 (block) C0020541 as C2825142 ofintrahepatic C0028778accounted for 83 of thepatients (C0023891 65meta-C0022346 12) and

C0018920 11

C1565860C0035357C0005775C0014867C0232338

Table 9 Some symptom-therapeutic substance relations extractedfrom biomedical literature

Symptom Therapeuticsubstance Sentence

C0028778(block)

C0017302 (generalanesthetic agents)

Use-dependent conductionC0028778 produced by

volatile C0017302

C0006400(bupivacaine)

Epidural ropivacaine isknown to produce less

motor C0028778 comparedto C0006400 at anaesthetic

concentrations

C0053241(benzoquinone)

In contrast C0053241 andhydroquinone led to

g2-C0028778 rather than toa mitotic arrest

4 Conclusions and Future Work

Models for extracting the relations between the disease-symptom and symptom-therapeutic substance are importantfor further extracting knowledge about diseases and theirpotential therapeutic substances However currently there isno corpus available to train such models To solve the prob-lem we firstmanually annotated two training sets for extract-ing the relations Then two semisupervised learning algo-rithms that is Co-Training and Tri-Training are applied toexplore the unlabeled data to boost the performance Exper-imental results show that exploiting the unlabeled data withboth Co-Training and Tri-Training algorithms can enhance

12 BioMed Research International

the performance In particular through employing threeclassifiers Tri-training is facilitated with good efficiency andgeneralization ability since it could gracefully choose exam-ples to label and use multiple classifiers to compose the finalhypothesis [28] In addition its applicability is wide because itneither requires sufficient and redundant views nor puts anyconstraint on the employed supervised learning algorithm

In the future work we will study more effective semisu-pervised learningmethods to exploit the numerous unlabeleddata pieces in the biomedical literature On the other handwewill apply the disease-symptom and symptom-therapeuticsubstance models to extract the relations between diseasesand therapeutic substances from biomedical literature andpredict the potential therapeutic substances for certain dis-eases [41]

Competing Interests

The authors declare that there is no conflict of interestsregarding the publication of this article

Acknowledgments

This work is supported by the grants from the NaturalScience Foundation of China (nos 61272373 6107009861340020 61572102 and 61572098) Trans-Century TrainingProgram Foundation for the Talents by the Ministry of Edu-cation of China (NCET-13-0084) the Fundamental ResearchFunds for the Central Universities (nos DUT13JB09 andDUT14YQ213) and the Major State Research DevelopmentProgram of China (no 2016YFC0901902)

References

[1] D Hristovski B Peterlin J A Mitchell and S M HumphreyldquoUsing literature-based discovery to identify disease candidategenesrdquo International Journal of Medical Informatics vol 74 no2ndash4 pp 289ndash298 2005

[2] M N Prichard and C Shipman Jr ldquoA three-dimensional modelto analyze drug-drug interactionsrdquo Antiviral Research vol 14no 4-5 pp 181ndash205 1990

[3] Q-C Bui S Katrenko and PM A Sloot ldquoA hybrid approach toextract protein-protein interactionsrdquo Bioinformatics vol 27 no2 pp 259ndash265 2011

[4] M Krallinger F Leitner C Rodriguez-Penagos and A Valen-cia ldquoOverview of the protein-protein interaction annotationextraction task of BioCreative IIrdquo Genome Biology vol 9supplement 2 article S4 2008

[5] I Segura Bedmar P Martinez and D Sanchez Cisneros ldquoThe1st DDIExtraction-2011 challenge task extraction of Drug-Drug Interactions from biomedical textsrdquo in Proceedings ofthe 1st Challenge task on Drug-Drug Interaction Extraction(DDIExtraction rsquo11) pp 1ndash9 Huelva Spain September 2011

[6] I Segura-Bedmar P Martınez andM Herrero-Zazo SemEval-2013 Task 9 Extraction of Drug-Drug Interactions from Biomed-ical Texts (DDIExtraction 2013) Association for ComputationalLinguistics 2013

[7] M Krallinger and A Valencia ldquoText-mining and information-retrieval services formolecular biologyrdquoGenome Biology vol 6no 7 article 224 2005

[8] T-K Jenssen A Laeliggreid J Komorowski and E Hovig ldquoAliterature network of human genes for high-throughput analysisof gene expressionrdquo Nature Genetics vol 28 no 1 pp 21ndash282001

[9] C Blaschke M A Andrade C Ouzounis and A ValencialdquoAutomatic extraction of biological information from scientifictext protein-protein interactionsrdquo in Proceedings of the Inter-national Conference on Intelligent Systems for Molecular Biology(ISMB rsquo99) pp 60ndash67 1999

[10] P Zweigenbaum D Demner-Fushman H Yu and K BCohen ldquoFrontiers of biomedical text mining current progressrdquoBriefings in Bioinformatics vol 8 no 5 pp 358ndash375 2007

[11] Y T Yen B Chen H W Chiu Y C Lee Y C Li and C YHsu ldquoDeveloping anNLP and IR-based algorithm for analyzinggene-disease relationshipsrdquoMethods of Information inMedicinevol 45 no 3 pp 321ndash329 2006

[12] M Huang X Zhu Y Hao D G Payan K Qu and M LildquoDiscovering patterns to extract proteinndashprotein interactionsfrom full textsrdquo Bioinformatics vol 20 no 18 pp 3604ndash36122004

[13] I Tsochantaridis T Hofmann T Joachims and Y AltunldquoSupport vector machine learning for interdependent andstructured output spacesrdquo inProceedings of the 21st InternationalConference on Machine Learning (ICML rsquo04) pp 823ndash830Alberta Canada July 2004

[14] J Xiao J Su G Zhou et al ldquoProtein-protein interaction extrac-tion a supervised learning approachrdquo in Proceedings of the 1stInternational Symposium on Semantic Mining in Biomedicinepp 51ndash59 Hinxton UK April 2005

[15] L A Nielsen ldquoExtracting protein-protein interactions usingsimple contextual featuresrdquo in Proceedings of the HLT-NAACLBioNLPWorkshop on Linking Natural Language and Biology pp120ndash121 ACM June 2006

[16] A Airola S Pyysalo J Bjorne T Pahikkala F Ginter and TSalakoski ldquoAll-paths graph kernel for protein-protein interac-tion extraction with evaluation of cross-corpus learningrdquo BMCBioinformatics vol 9 no 11 article 52 2008

[17] L Qian and G Zhou ldquoTree kernel-based proteinndashproteininteraction extraction from biomedical literaturerdquo Journal ofBiomedical Informatics vol 45 no 3 pp 535ndash543 2012

[18] S Kim J Yoon J Yang and S Park ldquoWalk-weighted subse-quence kernels for protein-protein interaction extractionrdquoBMCBioinformatics vol 11 no 1 article 107 2010

[19] D J Miller and H S Uyar ldquoA mixture of experts classifierwith learning based on both labelled and unlabelled datardquo inAdvances in Neural Information Processing Systems pp 571ndash5771997

[20] T Joachims ldquoTransductive inference for text classificationusing support vector machinesrdquo in Proceedings of the 16thInternational Conference on Machine Learning (ICML rsquo99) pp200ndash209 1999

[21] X Zhu Z Ghahramani and J Lafferty ldquoSemi-supervised learn-ing using gaussian fields and harmonic functionsrdquo in Proceed-ings of the 20th International Conference on Machine Learning(ICML rsquo03) vol 3 pp 912ndash919 Washington DC USA August2003

[22] A Blum and T Mitchell ldquoCombining labeled and unlabeleddata with co-trainingrdquo in Proceedings of the 11th Annual Confer-ence on Computational LearningTheory (COLT rsquo98) pp 92ndash100ACM 1998

BioMed Research International 13

[23] WWang and Z H Zhou ldquoCo-training with insufficient viewsrdquoin Proceedings of the Asian Conference onMachine Learning pp467ndash482 2013

[24] W Wang and Z-H Zhou ldquoA new analysis of co-trainingrdquo inProceedings of the 27th International Conference on MachineLearning (ICML rsquo10) pp 1135ndash1142 June 2010

[25] W Wang and H Zhou Z ldquoAnalyzing co-training style algo-rithmsrdquo in Machine Learning ECML 2007 vol 4701 of LectureNotes in Computer Science pp 454ndash465 Springer BerlinGermany 2007

[26] D Pierce and C Cardie ldquoLimitations of co-training for naturallanguage learning from large datasetsrdquo in Proceedings of the2001 Conference on Empirical Methods in Natural LanguageProcessing pp 1ndash9 Pittsburgh Pa USA 2001

[27] S Kiritchenko and S Matwin ldquoEmail classification with co-trainingrdquo in Proceedings of the Conference of the Center forAdvanced Studies on Collaborative Research (CASCON rsquo01) pp301ndash312 IBM Toronto Canada November 2011

[28] Z-H Zhou and M Li ldquoTri-training exploiting unlabeled datausing three classifiersrdquo IEEE Transactions on Knowledge andData Engineering vol 17 no 11 pp 1529ndash1541 2005

[29] D Mavroeidis K Chaidos S Pirillos et al ldquoUsing tri-trainingand support vector machines for addressing the ECMLPKDD2006 discovery challengerdquo in Proceedings of the ECML-PKDDDiscovery Challenge Workshop pp 39ndash47 Berlin Germany2006

[30] L Breiman ldquoBagging predictorsrdquoMachine Learning vol 24 no2 pp 123ndash140 1996

[31] R E Schapire ldquoThe strength of weak learnabilityrdquo MachineLearning vol 5 no 2 pp 197ndash227 1990

[32] Y Dang Y Zhang and H Chen ldquoA lexicon-enhanced methodfor sentiment classification an experiment on online productreviewsrdquo IEEE Intelligent Systems vol 25 no 4 pp 46ndash53 2010

[33] Y Yang and J O Pedersen ldquoA comparative study on featureselection in text categorizationrdquo in Proceedings of the 14thInternational Conference on Machine Learning (ICML rsquo97) vol97 pp 412ndash420 Morgan Kaufmann San Mateo Calif USA1997

[34] D Klein and C D Manning ldquoAccurate unlexicalized parsingrdquoin Proceedings of the the 41st Annual Meeting on Association forComputational Linguistics vol 1 pp 423ndash430 Association forComputational Linguistics Sapporo Japan July 2003

[35] M Zhang J Zhang J Su et al ldquoA composite kernel toextract relations between entities with both flat and structuredfeaturesrdquo in Proceedings of the 21st International Conference onComputational Linguistics and the 44th Annual Meeting of theAssociation for Computational Linguistics pp 825ndash832 Associ-ation for Computational Linguistics 2006

[36] National Library of Medicine ldquoSemantic MEDLINE Databaserdquohttpskr3nlmnihgovSemMedDB

[37] T C Rindflesch and M Fiszman ldquoThe interaction of domainknowledge and linguistic structure in natural language process-ing interpreting hypernymic propositions in biomedical textrdquoJournal of Biomedical Informatics vol 36 no 6 pp 462ndash4772003

[38] J Carletta ldquoAssessing agreement on classification tasks thekappa statisticrdquo Computational Linguistics vol 22 no 2 pp248ndash254 1996

[39] J A Hanley and B J McNeil ldquoThe meaning and use of thearea under a receiver operating characteristic (ROC) curverdquoRadiology vol 143 no 1 pp 29ndash36 1982

[40] A P Bradley ldquoThe use of the area under the ROC curve inthe evaluation of machine learning algorithmsrdquo Pattern Recog-nition vol 30 no 7 pp 1145ndash1159 1997

[41] D R Swanson ldquoFish oil Raynaudrsquos syndrome and undiscov-ered public knowledgerdquo Perspectives in Biology and Medicinevol 30 no 1 pp 7ndash18 1986

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Molecular Biology International

GenomicsInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioinformaticsAdvances in

Marine BiologyJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Signal TransductionJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

BioMed Research International

Evolutionary BiologyInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Genetics Research International

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Enzyme Research

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Microbiology

Page 10: Research Article Semisupervised Learning Based Disease …downloads.hindawi.com/journals/bmri/2016/3594937.pdf · 2019. 7. 30. · models for extracting the relations between the

10 BioMed Research International

Feature kernelTree kernel

Feature kernelTree kernel

2 4 6 8 10 12 14 16 18 20 22 24 260Iterators

74

76

78

80

82

84

86F

-sco

re (

)

2 4 6 8 10 12 14 16 18 20 22 24 260Iterators

7980818283848586878889

AUC

()

Figure 8 Co-Training performance curve of feature kernel and tree kernel on the symptom-therapeutic substance test set

Graph kernelTree kernel

Graph kernelTree kernel

1 2 3 4 5 6 7 8 90Iterators

1 2 3 4 5 6 7 8 90Iterators

74

76

78

80

82

84

86

F-s

core

()

79

80

81

82

83

84

85

86

87

88

89

AUC

()

Figure 9 Co-Training performance curve of graph kernel and tree kernel on the symptom-therapeutic substance test set

Feature kernelGraph kernelTree kernel

Feature kernelGraph kernelTree kernel

74

76

78

80

82

84

86

F-s

core

()

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 200Iterators

78

80

82

84

86

88

90

AUC

()

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 200Iterators

Figure 10 The results of Tri-Training on the symptom-therapeutic substance test set

BioMed Research International 11

Table 6 The results obtained with Co-Training on the symptom-therapeutic substance test set

Combination View 119875 119877 119865 AUC

Feature kernel and graph kernelFeature kernel 7800 9398 8525 8841Graph kernel 7151 9880 8297 8644Combination 7745 9518 8540 8910

Feature kernel and tree kernelFeature kernel 7872 9357 8551 8851Tree kernel 6713 9759 7954 8175Combination 7751 9679 8566 8861

Graph kernel and tree kernelGraph kernel 7414 9558 8351 8771Tree kernel 6782 9478 7906 8014Combination 7105 9759 8223 8624

Table 7 The results of Tri-Training on symptom-therapeutic sub-stance test set

119875 119877 119865 AUCFeature kernel 7898 9357 8566 8894Graph kernel 7431 9759 8437 8778Tree kernel 6801 9478 7919 8110Method 1 7477 9880 8512 8808Method 2 7562 9839 8551 8913

the symptom-therapeutic substance corpus There are tworeasons for that First on the symptom-therapeutic substancecorpus the classifiers have better performanceTherefore theCo-training and Tri-training algorithms have less room forthe performance improvement Second as the Co-trainingand Tri-training process proceeds more unlabeled data areadded into the training set which could introduce new infor-mation for the classifiers Therefore the recalls of the classi-fiers are improved Meanwhile more noise is also introducedcausing the precision decline For the initial classifiers thehigher the precision is the less the noise is introduced in theiterative process and the performance of the classifier wouldbe improved As a summary if the initial classifiers have bigdifference the performance can be improved through twoalgorithms In the experiment when more unlabeled dataare added to the training set the difference between theclassifiers becomes smallerThus after a number of iterationsperformance could not be improved any more

35 Some Examples for Disease-Symptom and Symptom-Ther-apeutic Substance Relations Extracted from Biomedical Liter-atures Some examples for disease-symptom or symptom-therapeutic substance relations extracted from biomedicalliteratures are shown in Tables 8 and 9 Table 8 shows somesymptoms of disease C0020541 (portal hypertension) Onesentence containing the relation between portal hypertensionand its symptomC0028778 (block) is provided Table 9 showssome relations between the symptom C0028778 (block) andsome therapeutic substances in which the sentences contain-ing the relations are provided

Table 8 Some disease-symptom relations extracted from biomedi-cal literature

Disease Symptom Sentence

C0020541(portalhypertension)

C0028778 (block) C0020541 as C2825142 ofintrahepatic C0028778accounted for 83 of thepatients (C0023891 65meta-C0022346 12) and

C0018920 11

C1565860C0035357C0005775C0014867C0232338

Table 9 Some symptom-therapeutic substance relations extractedfrom biomedical literature

Symptom Therapeuticsubstance Sentence

C0028778(block)

C0017302 (generalanesthetic agents)

Use-dependent conductionC0028778 produced by

volatile C0017302

C0006400(bupivacaine)

Epidural ropivacaine isknown to produce less

motor C0028778 comparedto C0006400 at anaesthetic

concentrations

C0053241(benzoquinone)

In contrast C0053241 andhydroquinone led to

g2-C0028778 rather than toa mitotic arrest

4 Conclusions and Future Work

Models for extracting the relations between the disease-symptom and symptom-therapeutic substance are importantfor further extracting knowledge about diseases and theirpotential therapeutic substances However currently there isno corpus available to train such models To solve the prob-lem we firstmanually annotated two training sets for extract-ing the relations Then two semisupervised learning algo-rithms that is Co-Training and Tri-Training are applied toexplore the unlabeled data to boost the performance Exper-imental results show that exploiting the unlabeled data withboth Co-Training and Tri-Training algorithms can enhance

12 BioMed Research International

the performance In particular through employing threeclassifiers Tri-training is facilitated with good efficiency andgeneralization ability since it could gracefully choose exam-ples to label and use multiple classifiers to compose the finalhypothesis [28] In addition its applicability is wide because itneither requires sufficient and redundant views nor puts anyconstraint on the employed supervised learning algorithm

In the future work we will study more effective semisu-pervised learningmethods to exploit the numerous unlabeleddata pieces in the biomedical literature On the other handwewill apply the disease-symptom and symptom-therapeuticsubstance models to extract the relations between diseasesand therapeutic substances from biomedical literature andpredict the potential therapeutic substances for certain dis-eases [41]

Competing Interests

The authors declare that there is no conflict of interestsregarding the publication of this article

Acknowledgments

This work is supported by the grants from the NaturalScience Foundation of China (nos 61272373 6107009861340020 61572102 and 61572098) Trans-Century TrainingProgram Foundation for the Talents by the Ministry of Edu-cation of China (NCET-13-0084) the Fundamental ResearchFunds for the Central Universities (nos DUT13JB09 andDUT14YQ213) and the Major State Research DevelopmentProgram of China (no 2016YFC0901902)

References

[1] D Hristovski B Peterlin J A Mitchell and S M HumphreyldquoUsing literature-based discovery to identify disease candidategenesrdquo International Journal of Medical Informatics vol 74 no2ndash4 pp 289ndash298 2005

[2] M N Prichard and C Shipman Jr ldquoA three-dimensional modelto analyze drug-drug interactionsrdquo Antiviral Research vol 14no 4-5 pp 181ndash205 1990

[3] Q-C Bui S Katrenko and PM A Sloot ldquoA hybrid approach toextract protein-protein interactionsrdquo Bioinformatics vol 27 no2 pp 259ndash265 2011

[4] M Krallinger F Leitner C Rodriguez-Penagos and A Valen-cia ldquoOverview of the protein-protein interaction annotationextraction task of BioCreative IIrdquo Genome Biology vol 9supplement 2 article S4 2008

[5] I Segura Bedmar P Martinez and D Sanchez Cisneros ldquoThe1st DDIExtraction-2011 challenge task extraction of Drug-Drug Interactions from biomedical textsrdquo in Proceedings ofthe 1st Challenge task on Drug-Drug Interaction Extraction(DDIExtraction rsquo11) pp 1ndash9 Huelva Spain September 2011

[6] I Segura-Bedmar P Martınez andM Herrero-Zazo SemEval-2013 Task 9 Extraction of Drug-Drug Interactions from Biomed-ical Texts (DDIExtraction 2013) Association for ComputationalLinguistics 2013

[7] M Krallinger and A Valencia ldquoText-mining and information-retrieval services formolecular biologyrdquoGenome Biology vol 6no 7 article 224 2005

[8] T-K Jenssen A Laeliggreid J Komorowski and E Hovig ldquoAliterature network of human genes for high-throughput analysisof gene expressionrdquo Nature Genetics vol 28 no 1 pp 21ndash282001

[9] C Blaschke M A Andrade C Ouzounis and A ValencialdquoAutomatic extraction of biological information from scientifictext protein-protein interactionsrdquo in Proceedings of the Inter-national Conference on Intelligent Systems for Molecular Biology(ISMB rsquo99) pp 60ndash67 1999

[10] P Zweigenbaum D Demner-Fushman H Yu and K BCohen ldquoFrontiers of biomedical text mining current progressrdquoBriefings in Bioinformatics vol 8 no 5 pp 358ndash375 2007

[11] Y T Yen B Chen H W Chiu Y C Lee Y C Li and C YHsu ldquoDeveloping anNLP and IR-based algorithm for analyzinggene-disease relationshipsrdquoMethods of Information inMedicinevol 45 no 3 pp 321ndash329 2006

[12] M Huang X Zhu Y Hao D G Payan K Qu and M LildquoDiscovering patterns to extract proteinndashprotein interactionsfrom full textsrdquo Bioinformatics vol 20 no 18 pp 3604ndash36122004

[13] I Tsochantaridis T Hofmann T Joachims and Y AltunldquoSupport vector machine learning for interdependent andstructured output spacesrdquo inProceedings of the 21st InternationalConference on Machine Learning (ICML rsquo04) pp 823ndash830Alberta Canada July 2004

[14] J Xiao J Su G Zhou et al ldquoProtein-protein interaction extrac-tion a supervised learning approachrdquo in Proceedings of the 1stInternational Symposium on Semantic Mining in Biomedicinepp 51ndash59 Hinxton UK April 2005

[15] L A Nielsen ldquoExtracting protein-protein interactions usingsimple contextual featuresrdquo in Proceedings of the HLT-NAACLBioNLPWorkshop on Linking Natural Language and Biology pp120ndash121 ACM June 2006

[16] A Airola S Pyysalo J Bjorne T Pahikkala F Ginter and TSalakoski ldquoAll-paths graph kernel for protein-protein interac-tion extraction with evaluation of cross-corpus learningrdquo BMCBioinformatics vol 9 no 11 article 52 2008

[17] L Qian and G Zhou ldquoTree kernel-based proteinndashproteininteraction extraction from biomedical literaturerdquo Journal ofBiomedical Informatics vol 45 no 3 pp 535ndash543 2012

[18] S Kim J Yoon J Yang and S Park ldquoWalk-weighted subse-quence kernels for protein-protein interaction extractionrdquoBMCBioinformatics vol 11 no 1 article 107 2010

[19] D J Miller and H S Uyar ldquoA mixture of experts classifierwith learning based on both labelled and unlabelled datardquo inAdvances in Neural Information Processing Systems pp 571ndash5771997

[20] T Joachims ldquoTransductive inference for text classificationusing support vector machinesrdquo in Proceedings of the 16thInternational Conference on Machine Learning (ICML rsquo99) pp200ndash209 1999

[21] X Zhu Z Ghahramani and J Lafferty ldquoSemi-supervised learn-ing using gaussian fields and harmonic functionsrdquo in Proceed-ings of the 20th International Conference on Machine Learning(ICML rsquo03) vol 3 pp 912ndash919 Washington DC USA August2003

[22] A Blum and T Mitchell ldquoCombining labeled and unlabeleddata with co-trainingrdquo in Proceedings of the 11th Annual Confer-ence on Computational LearningTheory (COLT rsquo98) pp 92ndash100ACM 1998

BioMed Research International 13

[23] WWang and Z H Zhou ldquoCo-training with insufficient viewsrdquoin Proceedings of the Asian Conference onMachine Learning pp467ndash482 2013

[24] W Wang and Z-H Zhou ldquoA new analysis of co-trainingrdquo inProceedings of the 27th International Conference on MachineLearning (ICML rsquo10) pp 1135ndash1142 June 2010

[25] W Wang and H Zhou Z ldquoAnalyzing co-training style algo-rithmsrdquo in Machine Learning ECML 2007 vol 4701 of LectureNotes in Computer Science pp 454ndash465 Springer BerlinGermany 2007

[26] D Pierce and C Cardie ldquoLimitations of co-training for naturallanguage learning from large datasetsrdquo in Proceedings of the2001 Conference on Empirical Methods in Natural LanguageProcessing pp 1ndash9 Pittsburgh Pa USA 2001

[27] S Kiritchenko and S Matwin ldquoEmail classification with co-trainingrdquo in Proceedings of the Conference of the Center forAdvanced Studies on Collaborative Research (CASCON rsquo01) pp301ndash312 IBM Toronto Canada November 2011

[28] Z-H Zhou and M Li ldquoTri-training exploiting unlabeled datausing three classifiersrdquo IEEE Transactions on Knowledge andData Engineering vol 17 no 11 pp 1529ndash1541 2005

[29] D Mavroeidis K Chaidos S Pirillos et al ldquoUsing tri-trainingand support vector machines for addressing the ECMLPKDD2006 discovery challengerdquo in Proceedings of the ECML-PKDDDiscovery Challenge Workshop pp 39ndash47 Berlin Germany2006

[30] L Breiman ldquoBagging predictorsrdquoMachine Learning vol 24 no2 pp 123ndash140 1996

[31] R E Schapire ldquoThe strength of weak learnabilityrdquo MachineLearning vol 5 no 2 pp 197ndash227 1990

[32] Y Dang Y Zhang and H Chen ldquoA lexicon-enhanced methodfor sentiment classification an experiment on online productreviewsrdquo IEEE Intelligent Systems vol 25 no 4 pp 46ndash53 2010

[33] Y Yang and J O Pedersen ldquoA comparative study on featureselection in text categorizationrdquo in Proceedings of the 14thInternational Conference on Machine Learning (ICML rsquo97) vol97 pp 412ndash420 Morgan Kaufmann San Mateo Calif USA1997

[34] D Klein and C D Manning ldquoAccurate unlexicalized parsingrdquoin Proceedings of the the 41st Annual Meeting on Association forComputational Linguistics vol 1 pp 423ndash430 Association forComputational Linguistics Sapporo Japan July 2003

[35] M Zhang J Zhang J Su et al ldquoA composite kernel toextract relations between entities with both flat and structuredfeaturesrdquo in Proceedings of the 21st International Conference onComputational Linguistics and the 44th Annual Meeting of theAssociation for Computational Linguistics pp 825ndash832 Associ-ation for Computational Linguistics 2006

[36] National Library of Medicine ldquoSemantic MEDLINE Databaserdquohttpskr3nlmnihgovSemMedDB

[37] T C Rindflesch and M Fiszman ldquoThe interaction of domainknowledge and linguistic structure in natural language process-ing interpreting hypernymic propositions in biomedical textrdquoJournal of Biomedical Informatics vol 36 no 6 pp 462ndash4772003

[38] J Carletta ldquoAssessing agreement on classification tasks thekappa statisticrdquo Computational Linguistics vol 22 no 2 pp248ndash254 1996

[39] J A Hanley and B J McNeil ldquoThe meaning and use of thearea under a receiver operating characteristic (ROC) curverdquoRadiology vol 143 no 1 pp 29ndash36 1982

[40] A P Bradley ldquoThe use of the area under the ROC curve inthe evaluation of machine learning algorithmsrdquo Pattern Recog-nition vol 30 no 7 pp 1145ndash1159 1997

[41] D R Swanson ldquoFish oil Raynaudrsquos syndrome and undiscov-ered public knowledgerdquo Perspectives in Biology and Medicinevol 30 no 1 pp 7ndash18 1986

Table 6: The results obtained with Co-Training on the symptom-therapeutic substance test set.

Combination                      View            P (%)   R (%)   F (%)   AUC (%)
-------------------------------  --------------  ------  ------  ------  -------
Feature kernel and graph kernel  Feature kernel  78.00   93.98   85.25   88.41
                                 Graph kernel    71.51   98.80   82.97   86.44
                                 Combination     77.45   95.18   85.40   89.10
Feature kernel and tree kernel   Feature kernel  78.72   93.57   85.51   88.51
                                 Tree kernel     67.13   97.59   79.54   81.75
                                 Combination     77.51   96.79   85.66   88.61
Graph kernel and tree kernel     Graph kernel    74.14   95.58   83.51   87.71
                                 Tree kernel     67.82   94.78   79.06   80.14
                                 Combination     71.05   97.59   82.23   86.24

Table 7: The results of Tri-Training on the symptom-therapeutic substance test set.

                P (%)   R (%)   F (%)   AUC (%)
--------------  ------  ------  ------  -------
Feature kernel  78.98   93.57   85.66   88.94
Graph kernel    74.31   97.59   84.37   87.78
Tree kernel     68.01   94.78   79.19   81.10
Method 1        74.77   98.80   85.12   88.08
Method 2        75.62   98.39   85.51   89.13
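The P, R, F, and AUC columns in Tables 6 and 7 are precision, recall, F-score, and the area under the ROC curve [39, 40]. For reference, the following is a minimal sketch of how such scores can be computed for a binary relation classifier; the use of scikit-learn is an illustrative assumption, since the paper does not name its tooling:

    # Computing P, R, F, and AUC for a binary relation classifier.
    # scikit-learn is assumed here only for illustration.
    from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

    y_true = [1, 0, 1, 1, 0, 1]               # gold labels: 1 = relation holds
    y_pred = [1, 0, 1, 0, 0, 1]               # hard predictions from the classifier
    y_score = [0.9, 0.2, 0.8, 0.4, 0.3, 0.7]  # decision scores, used for AUC

    p = precision_score(y_true, y_pred)
    r = recall_score(y_true, y_pred)
    f = f1_score(y_true, y_pred)              # F = 2PR / (P + R)
    auc = roc_auc_score(y_true, y_score)      # threshold-independent ranking quality
    print(f"P={p:.4f}  R={r:.4f}  F={f:.4f}  AUC={auc:.4f}")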

The performance improvement is smaller on the symptom-therapeutic substance corpus. There are two reasons for this. First, the classifiers already perform better on the symptom-therapeutic substance corpus, so the Co-Training and Tri-Training algorithms have less room for improvement. Second, as the Co-Training and Tri-Training processes proceed, more unlabeled data are added into the training set, which can introduce new information for the classifiers; accordingly, the recalls of the classifiers improve. Meanwhile, more noise is also introduced, causing the precision to decline. For the initial classifiers, the higher the precision, the less noise is introduced in the iterative process and the more the performance of the classifier can be improved. In summary, if the initial classifiers differ substantially, the performance can be improved by both algorithms. In our experiments, as more unlabeled data were added to the training set, the difference between the classifiers became smaller; thus, after a number of iterations, the performance could no longer be improved.
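To make the iterative process concrete, here is a minimal sketch of the Co-Training loop [22] with two feature views (e.g., a feature-kernel view and a graph-kernel view). The SVC classifiers, the pool handling, and the sizes rounds and per_round are illustrative assumptions, not the paper's actual configuration:

    # Co-Training sketch [22]: two classifiers trained on different views of the
    # same data pseudo-label confident unlabeled examples for a shared training set.
    import numpy as np
    from sklearn.svm import SVC

    def co_train(X1_l, X2_l, y_l, X1_u, X2_u, rounds=10, per_round=20):
        X1_l, X2_l, y_l = list(X1_l), list(X2_l), list(y_l)
        pool = list(range(len(X1_u)))          # indices still unlabeled
        c1, c2 = SVC(probability=True), SVC(probability=True)
        for _ in range(rounds):
            if not pool:
                break
            c1.fit(X1_l, y_l)
            c2.fit(X2_l, y_l)
            for clf, X_view in ((c1, X1_u), (c2, X2_u)):
                # Each classifier labels the pool examples it is most confident on.
                probs = clf.predict_proba([X_view[i] for i in pool])
                top = np.argsort(-probs.max(axis=1))[:per_round]
                chosen = [pool[j] for j in top]
                for i, j in zip(chosen, top):
                    X1_l.append(X1_u[i])
                    X2_l.append(X2_u[i])
                    y_l.append(clf.classes_[probs[j].argmax()])
                pool = [i for i in pool if i not in set(chosen)]
                if not pool:
                    break
        return c1, c2

The "Combination" rows in Table 6 can then be obtained by combining the two views' outputs, for example, by averaging the two classifiers' predicted probabilities (the exact combination rule here is an assumption).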

3.5. Some Examples of Disease-Symptom and Symptom-Therapeutic Substance Relations Extracted from Biomedical Literature. Some examples of disease-symptom and symptom-therapeutic substance relations extracted from biomedical literature are shown in Tables 8 and 9. Table 8 shows some symptoms of the disease C0020541 (portal hypertension); one sentence containing the relation between portal hypertension and its symptom C0028778 (block) is provided. Table 9 shows some relations between the symptom C0028778 (block) and some therapeutic substances, with the sentences containing the relations provided.

Table 8: Some disease-symptom relations extracted from biomedical literature.

Disease: C0020541 (portal hypertension)
Symptoms: C0028778 (block), C1565860, C0035357, C0005775, C0014867, C0232338
Example sentence (for C0028778): "C0020541 as C2825142 of intrahepatic C0028778 accounted for 83 of the patients (C0023891 65, meta-C0022346 12, and C0018920 11)"

Table 9: Some symptom-therapeutic substance relations extracted from biomedical literature.

Symptom: C0028778 (block)

Therapeutic substance                  Sentence
-------------------------------------  --------------------------------------------------------------
C0017302 (general anesthetic agents)   "Use-dependent conduction C0028778 produced by volatile C0017302"
C0006400 (bupivacaine)                 "Epidural ropivacaine is known to produce less motor C0028778 compared to C0006400 at anaesthetic concentrations"
C0053241 (benzoquinone)                "In contrast, C0053241 and hydroquinone led to G2-C0028778 rather than to a mitotic arrest"

4. Conclusions and Future Work

Models for extracting the relations between disease and symptom and between symptom and therapeutic substance are important for further extracting knowledge about diseases and their potential therapeutic substances. However, currently there is no corpus available to train such models. To solve this problem, we first manually annotated two training sets for extracting the relations. Then two semisupervised learning algorithms, that is, Co-Training and Tri-Training, were applied to exploit the unlabeled data to boost the performance. Experimental results show that exploiting the unlabeled data with both Co-Training and Tri-Training algorithms can enhance the performance. In particular, through employing three classifiers, Tri-Training achieves good efficiency and generalization ability, since it can gracefully choose examples to label and use multiple classifiers to compose the final hypothesis [28]. In addition, its applicability is wide, because it neither requires sufficient and redundant views nor puts any constraint on the employed supervised learning algorithm.
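For intuition, a much-simplified sketch of the Tri-Training labeling rule [28] follows: an unlabeled example is pseudo-labeled for one classifier whenever the other two agree on it, and the final hypothesis is a majority vote. The bootstrap initialization follows [28], but the error-rate conditions that control noise in the full algorithm are omitted, and the SVC classifiers and binary 0/1 labels are assumptions:

    # Simplified Tri-Training sketch [28]: two "teacher" classifiers that agree
    # pseudo-label an example for the third. Error-rate bookkeeping is omitted.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.utils import resample

    def tri_train(X_l, y_l, X_u, rounds=10):
        # Diversity among the three classifiers comes from bootstrap samples.
        clfs = []
        for _ in range(3):
            Xb, yb = resample(X_l, y_l)
            clfs.append(SVC().fit(Xb, yb))
        for _ in range(rounds):
            for i in range(3):
                j, k = [m for m in range(3) if m != i]
                pj = clfs[j].predict(X_u)
                pk = clfs[k].predict(X_u)
                agree = pj == pk               # teachers agree on these examples
                if agree.any():
                    Xi = list(X_l) + [x for x, a in zip(X_u, agree) if a]
                    yi = list(y_l) + list(pj[agree])
                    clfs[i] = SVC().fit(Xi, yi)
        return clfs

    def vote(clfs, X):
        # Final hypothesis: majority vote of the three classifiers (0/1 labels).
        votes = np.stack([c.predict(X) for c in clfs])
        return (votes.sum(axis=0) >= 2).astype(int)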

In future work, we will study more effective semisupervised learning methods to exploit the large amount of unlabeled data in the biomedical literature. In addition, we will apply the disease-symptom and symptom-therapeutic substance models to extract the relations between diseases and therapeutic substances from biomedical literature and to predict potential therapeutic substances for certain diseases [41].
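This planned discovery step follows Swanson's ABC pattern [41]: if a disease causes a symptom and a substance relieves that symptom, the substance becomes a candidate therapy for the disease. A minimal sketch of the join on the shared symptom, over CUI pairs like those in Tables 8 and 9 (the tuples below are illustrative examples, not a complete extraction result):

    # Chaining disease-symptom and symptom-substance relations into candidate
    # disease-therapeutic substance pairs (Swanson's ABC pattern [41]).
    disease_symptom = {
        ("C0020541", "C0028778"),   # portal hypertension -> block (Table 8)
    }
    symptom_substance = {
        ("C0028778", "C0017302"),   # block -> general anesthetic agents (Table 9)
        ("C0028778", "C0006400"),   # block -> bupivacaine (Table 9)
    }

    # Join on the shared symptom: disease -> symptom -> substance.
    candidates = {
        (disease, substance)
        for disease, symptom in disease_symptom
        for s, substance in symptom_substance
        if s == symptom
    }
    print(sorted(candidates))
    # [('C0020541', 'C0006400'), ('C0020541', 'C0017302')]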

Competing Interests

The authors declare that there is no conflict of interests regarding the publication of this article.

Acknowledgments

This work is supported by grants from the Natural Science Foundation of China (nos. 61272373, 61070098, 61340020, 61572102, and 61572098), the Trans-Century Training Program Foundation for the Talents by the Ministry of Education of China (NCET-13-0084), the Fundamental Research Funds for the Central Universities (nos. DUT13JB09 and DUT14YQ213), and the Major State Research Development Program of China (no. 2016YFC0901902).

References

[1] D. Hristovski, B. Peterlin, J. A. Mitchell, and S. M. Humphrey, "Using literature-based discovery to identify disease candidate genes," International Journal of Medical Informatics, vol. 74, no. 2-4, pp. 289-298, 2005.
[2] M. N. Prichard and C. Shipman Jr., "A three-dimensional model to analyze drug-drug interactions," Antiviral Research, vol. 14, no. 4-5, pp. 181-205, 1990.
[3] Q.-C. Bui, S. Katrenko, and P. M. A. Sloot, "A hybrid approach to extract protein-protein interactions," Bioinformatics, vol. 27, no. 2, pp. 259-265, 2011.
[4] M. Krallinger, F. Leitner, C. Rodriguez-Penagos, and A. Valencia, "Overview of the protein-protein interaction annotation extraction task of BioCreative II," Genome Biology, vol. 9, supplement 2, article S4, 2008.
[5] I. Segura Bedmar, P. Martinez, and D. Sanchez Cisneros, "The 1st DDIExtraction-2011 challenge task: extraction of drug-drug interactions from biomedical texts," in Proceedings of the 1st Challenge Task on Drug-Drug Interaction Extraction (DDIExtraction '11), pp. 1-9, Huelva, Spain, September 2011.
[6] I. Segura-Bedmar, P. Martínez, and M. Herrero-Zazo, SemEval-2013 Task 9: Extraction of Drug-Drug Interactions from Biomedical Texts (DDIExtraction 2013), Association for Computational Linguistics, 2013.
[7] M. Krallinger and A. Valencia, "Text-mining and information-retrieval services for molecular biology," Genome Biology, vol. 6, no. 7, article 224, 2005.
[8] T.-K. Jenssen, A. Lægreid, J. Komorowski, and E. Hovig, "A literature network of human genes for high-throughput analysis of gene expression," Nature Genetics, vol. 28, no. 1, pp. 21-28, 2001.
[9] C. Blaschke, M. A. Andrade, C. Ouzounis, and A. Valencia, "Automatic extraction of biological information from scientific text: protein-protein interactions," in Proceedings of the International Conference on Intelligent Systems for Molecular Biology (ISMB '99), pp. 60-67, 1999.
[10] P. Zweigenbaum, D. Demner-Fushman, H. Yu, and K. B. Cohen, "Frontiers of biomedical text mining: current progress," Briefings in Bioinformatics, vol. 8, no. 5, pp. 358-375, 2007.
[11] Y. T. Yen, B. Chen, H. W. Chiu, Y. C. Lee, Y. C. Li, and C. Y. Hsu, "Developing an NLP and IR-based algorithm for analyzing gene-disease relationships," Methods of Information in Medicine, vol. 45, no. 3, pp. 321-329, 2006.
[12] M. Huang, X. Zhu, Y. Hao, D. G. Payan, K. Qu, and M. Li, "Discovering patterns to extract protein-protein interactions from full texts," Bioinformatics, vol. 20, no. 18, pp. 3604-3612, 2004.
[13] I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun, "Support vector machine learning for interdependent and structured output spaces," in Proceedings of the 21st International Conference on Machine Learning (ICML '04), pp. 823-830, Alberta, Canada, July 2004.
[14] J. Xiao, J. Su, G. Zhou et al., "Protein-protein interaction extraction: a supervised learning approach," in Proceedings of the 1st International Symposium on Semantic Mining in Biomedicine, pp. 51-59, Hinxton, UK, April 2005.
[15] L. A. Nielsen, "Extracting protein-protein interactions using simple contextual features," in Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology, pp. 120-121, ACM, June 2006.
[16] A. Airola, S. Pyysalo, J. Björne, T. Pahikkala, F. Ginter, and T. Salakoski, "All-paths graph kernel for protein-protein interaction extraction with evaluation of cross-corpus learning," BMC Bioinformatics, vol. 9, no. 11, article S2, 2008.
[17] L. Qian and G. Zhou, "Tree kernel-based protein-protein interaction extraction from biomedical literature," Journal of Biomedical Informatics, vol. 45, no. 3, pp. 535-543, 2012.
[18] S. Kim, J. Yoon, J. Yang, and S. Park, "Walk-weighted subsequence kernels for protein-protein interaction extraction," BMC Bioinformatics, vol. 11, no. 1, article 107, 2010.
[19] D. J. Miller and H. S. Uyar, "A mixture of experts classifier with learning based on both labelled and unlabelled data," in Advances in Neural Information Processing Systems, pp. 571-577, 1997.
[20] T. Joachims, "Transductive inference for text classification using support vector machines," in Proceedings of the 16th International Conference on Machine Learning (ICML '99), pp. 200-209, 1999.
[21] X. Zhu, Z. Ghahramani, and J. Lafferty, "Semi-supervised learning using Gaussian fields and harmonic functions," in Proceedings of the 20th International Conference on Machine Learning (ICML '03), vol. 3, pp. 912-919, Washington, DC, USA, August 2003.
[22] A. Blum and T. Mitchell, "Combining labeled and unlabeled data with co-training," in Proceedings of the 11th Annual Conference on Computational Learning Theory (COLT '98), pp. 92-100, ACM, 1998.


[23] W. Wang and Z.-H. Zhou, "Co-training with insufficient views," in Proceedings of the Asian Conference on Machine Learning, pp. 467-482, 2013.
[24] W. Wang and Z.-H. Zhou, "A new analysis of co-training," in Proceedings of the 27th International Conference on Machine Learning (ICML '10), pp. 1135-1142, June 2010.
[25] W. Wang and Z.-H. Zhou, "Analyzing co-training style algorithms," in Machine Learning: ECML 2007, vol. 4701 of Lecture Notes in Computer Science, pp. 454-465, Springer, Berlin, Germany, 2007.
[26] D. Pierce and C. Cardie, "Limitations of co-training for natural language learning from large datasets," in Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing, pp. 1-9, Pittsburgh, Pa, USA, 2001.
[27] S. Kiritchenko and S. Matwin, "Email classification with co-training," in Proceedings of the Conference of the Center for Advanced Studies on Collaborative Research (CASCON '01), pp. 301-312, IBM, Toronto, Canada, November 2001.
[28] Z.-H. Zhou and M. Li, "Tri-training: exploiting unlabeled data using three classifiers," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 11, pp. 1529-1541, 2005.
[29] D. Mavroeidis, K. Chaidos, S. Pirillos et al., "Using tri-training and support vector machines for addressing the ECML/PKDD 2006 discovery challenge," in Proceedings of the ECML-PKDD Discovery Challenge Workshop, pp. 39-47, Berlin, Germany, 2006.
[30] L. Breiman, "Bagging predictors," Machine Learning, vol. 24, no. 2, pp. 123-140, 1996.
[31] R. E. Schapire, "The strength of weak learnability," Machine Learning, vol. 5, no. 2, pp. 197-227, 1990.
[32] Y. Dang, Y. Zhang, and H. Chen, "A lexicon-enhanced method for sentiment classification: an experiment on online product reviews," IEEE Intelligent Systems, vol. 25, no. 4, pp. 46-53, 2010.
[33] Y. Yang and J. O. Pedersen, "A comparative study on feature selection in text categorization," in Proceedings of the 14th International Conference on Machine Learning (ICML '97), vol. 97, pp. 412-420, Morgan Kaufmann, San Mateo, Calif, USA, 1997.
[34] D. Klein and C. D. Manning, "Accurate unlexicalized parsing," in Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, vol. 1, pp. 423-430, Association for Computational Linguistics, Sapporo, Japan, July 2003.
[35] M. Zhang, J. Zhang, J. Su et al., "A composite kernel to extract relations between entities with both flat and structured features," in Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pp. 825-832, Association for Computational Linguistics, 2006.
[36] National Library of Medicine, "Semantic MEDLINE Database," https://skr3.nlm.nih.gov/SemMedDB.
[37] T. C. Rindflesch and M. Fiszman, "The interaction of domain knowledge and linguistic structure in natural language processing: interpreting hypernymic propositions in biomedical text," Journal of Biomedical Informatics, vol. 36, no. 6, pp. 462-477, 2003.
[38] J. Carletta, "Assessing agreement on classification tasks: the kappa statistic," Computational Linguistics, vol. 22, no. 2, pp. 248-254, 1996.
[39] J. A. Hanley and B. J. McNeil, "The meaning and use of the area under a receiver operating characteristic (ROC) curve," Radiology, vol. 143, no. 1, pp. 29-36, 1982.
[40] A. P. Bradley, "The use of the area under the ROC curve in the evaluation of machine learning algorithms," Pattern Recognition, vol. 30, no. 7, pp. 1145-1159, 1997.
[41] D. R. Swanson, "Fish oil, Raynaud's syndrome, and undiscovered public knowledge," Perspectives in Biology and Medicine, vol. 30, no. 1, pp. 7-18, 1986.
