
POZNAN Contribution to TREC PM 2018 (Notebook Paper)

Jakub Dutkiewicz 2, Czesław Jędrzejek 2, Artur Cieślewicz 1

1 Department of Clinical Pharmacology, Poznan University of Medical Sciences, Poznan, Poland

2 IARiII, Poznan University of Technology, Poznan, Poland

artcies@ump.edu.pl; {jakub.dutkiewicz, czeslaw.jedrzejek}@put.poznan.pl

Abstract. This work describes the medical information retrieval systems designed by the Poznan Consortium for the Precision Medicine track (TREC PM) and submitted to TREC 2018. The baseline is Terrier BB2. For Clinical Trials this work uses the following options:

• Terrier BB2_simple_noprf: query without extension (sq_nprf)
• Terrier BB2_simple_w2v_prf: query extended with w2v and Terrier PRF (sqw2vprf)
• Terrier BB2_variant_noprf: query + gene variant, without extension (vq_noprf)
• Terrier BB2_variant_w2v_prf: query + gene variant, extended with w2v and Terrier PRF (vqw2vprf)
• SQ_results: experimental search of the SQL database using keywords from the query (sq)

Our best result is vq_noprf, which is significantly better (by approximately 0.08 for the evaluated measures). With a suitable word2vec method the results are better than those of query extension using MeSH and a disease taxonomy. For Medline abstracts the documents are placed in an SQL database. For each document, templates of diseases, genes, and applicability as Precision Medicine vs research objects are matched. Patterns are stored as regular expressions. The association of each pattern with a document is stored in the SQL database. For Medline abstracts our results are slightly worse than the TREC 2018 median.

1 Introduction

The TREC PM 2018 track [TREC PM, 2018] follows three previous Clinical Decision Support (CDS) track contests and, in particular, TREC PM 2017 [Roberts et al., 2017]. The aim of the CDS tracks was the retrieval of biomedical articles relevant for answering generic clinical questions about medical records. The 2018 track focuses on an important use case in clinical decision support: providing useful precision medicine-related information to clinicians treating cancer patients. The 2018 track largely repeats the structure and evaluation of the PM 2017 track. The TREC 2018 Precision Medicine track developed Relevance Judgment Guidelines [Relevance Judgment, 2017], which give more specific rules than TREC CDS for qualifying an answer as definitely relevant.

1.1 Source and target documents and rules

There are two target document collections for the Precision Medicine track: scientific abstracts (a January 2017 snapshot of PubMed abstracts plus abstracts from AACR and ASCO proceedings targeted toward cancer therapy) and clinical trials (an April 2017 snapshot of ClinicalTrials.gov). Although we submitted runs for both collections, our main focus was the clinical trials task. The topic descriptions for clinical trials are structured information. Each topic has three primary fields: Disease, Gene (Variant), and Demographic. For instance:

<topics task="2018 TREC Precision Medicine">

<topic number="1">

<disease>melanoma</disease>

<gene>BRAF (V600E)</gene>

<demographic>64-year-old male</demographic>

</topic>

This structured information allows for a precise definition of a “Definitely Relevant” answer: a result should have Disease in one of the categories {Exact, More General, More Specific}, at least one Gene that is Exact, and both Demographic and Other within the accepted categories. For some topics the gene field contains a specification of the gene action or an indirect gene function (Table 1).

Table 1 Additional/alternative gene functions.

Gene function                                        Topic number
amplification                                        7, 11, 14
loss of function                                     5, 17, 23, 24
truncation                                           15
(extensive) tumor infiltrating lymphocytes           21, 22
high tumor mutational burden                         20
rearrangement                                        16
tumor cells with >50% membranous PD-L1 expression    18
tumor cells negative for PD-L1 expression            19
high serum LDH levels                                25

The “(extensive) tumor infiltrating lymphocytes” description can only be attributed to genes indirectly. For example, the correlation between high TIL levels and improved clinical outcome was most prominent in triple-negative breast cancer (TNBC).

<topic number="18">

<disease>melanoma</disease>

<gene>tumor cells with >50% membranous PD-L1 expression</gene>

<demographic>48-year-old female</demographic>

</topic>

<topic number="16">

<disease>melanoma</disease>

<gene>NTRK1 rearrangement</gene>

<demographic>60-year-old male</demographic>

</topic>

<topic number="20">

<disease>melanoma</disease>

<gene>high tumor mutational burden</gene>

<demographic>86-year-old female</demographic>

</topic>

<topic number="24">

<disease>melanoma</disease>

<gene>APC loss of function</gene>

<demographic>47-year-old male</demographic>

</topic>

<topic number="31">

<disease>head and neck squamous cell carcinoma</disease>

<gene>CDKN2A</gene>

<demographic>64-year-old male</demographic>

</topic>

Topics 1-6 and 8-13 were characterized by a gene variant. For these topics, the average of the best TREC PM results provided by the organizers is 0.80, compared with 0.70 for the remaining topics.
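The topic format shown above can be read with a few lines of standard-library Python. The following is a minimal sketch, not the code used for the submitted runs: the function name `parse_topics`, the embedded two-topic sample, and the rule that a variant is whatever appears in parentheses after the gene name are all illustrative assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Two topics copied from the examples above, wrapped in a closed root element.
SAMPLE = """<topics task="2018 TREC Precision Medicine">
<topic number="1">
<disease>melanoma</disease>
<gene>BRAF (V600E)</gene>
<demographic>64-year-old male</demographic>
</topic>
<topic number="16">
<disease>melanoma</disease>
<gene>NTRK1 rearrangement</gene>
<demographic>60-year-old male</demographic>
</topic>
</topics>"""

# Assumed convention: an optional variant is given in parentheses after the gene.
GENE_VARIANT = re.compile(r"^\s*(?P<gene>[^()]+?)\s*(?:\((?P<variant>[^()]+)\))?\s*$")


def parse_topics(xml_text):
    """Return one dict per <topic> with number, disease, gene, variant, demographic."""
    topics = []
    for topic in ET.fromstring(xml_text).iter("topic"):
        gene_field = topic.findtext("gene", default="")
        match = GENE_VARIANT.match(gene_field)
        topics.append({
            "number": int(topic.get("number")),
            "disease": topic.findtext("disease", default="").strip(),
            # Fall back to the raw field if it does not fit the assumed shape.
            "gene": match.group("gene") if match else gene_field.strip(),
            "variant": match.group("variant") if match else None,
            "demographic": topic.findtext("demographic", default="").strip(),
        })
    return topics
```

With this split, topic 1 yields the gene BRAF with variant V600E, while a field such as "NTRK1 rearrangement" is kept whole with no variant, which mirrors the distinction between variant topics and the gene-function topics of Table 1.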

1.2 Experience

The Poznan Consortium team for TREC PM consists of contributors from two institutions: the Department of Clinical Pharmacology, Poznan University of Medical Sciences, and the Information Systems and Technologies Division, IARiII, Poznan University of Technology. We participated in the TREC CDS 2016 track [Dutkiewicz et al., 2017], the TREC PM 2017 track [Cieslewicz et al., 2018a], and bioCADDIE 2016 [Cieslewicz et al., 2018b]. In this work we use two types of embeddings: one designed for biomedical applications [Chiu et al., 2016] and a classical word embedding [Mikolov et al., 2013a] computed on the PubMed abstracts corpus. We did not preprocess the target Open Access Subset of PubMed Central (PMC).

2 Related work

2.1 Baseline system

Our experience shows that, for biomedical systems, Terrier 4.2 is among the best baseline systems. The best-performing models, depending on the task, were 1) BB2 (Bose-Einstein model with Bernoulli after-effect and normalization [Amati, van Rijsbergen, 2012], also denoted as “DPH + Bo1”) [Dutkiewicz, 2017] and 2) LGD [Clinchant, Gaussier, 2010] (a log-logistic model for information retrieval) [Cieslewicz, 2018b]. Another valuable feature implemented in Terrier is pseudo relevance feedback query expansion (PRF), a mechanism that extracts the n most informative terms from the m top-ranked documents (the ranking created in the first search run) and adds them to the original query for the second retrieval run. Terrier provides both parameter-free (Bose-Einstein 1, Bose-Einstein 2, Kullback-Leibler) and parameterized (Rocchio) models for query expansion [Rocchio, 1971]. The Rocchio feedback approach was developed using the Vector Space Model: the modified query vector is moved closer to the related documents and farther away from the non-related ones.
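The Rocchio update described above can be sketched in its standard textbook form: the new query is the weighted original query plus the weighted centroid of related documents minus the weighted centroid of non-related documents. The sketch below is an illustration only and is not Terrier's implementation; the function name and the default weights `alpha`, `beta`, and `gamma` are assumed values chosen for the example.

```python
def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return the modified query vector as a dict mapping term -> weight.

    query, and every document in relevant / non_relevant, is a sparse vector
    given as a dict term -> weight. The default weights are illustrative.
    """
    updated = {term: alpha * weight for term, weight in query.items()}
    for docs, signed_coeff in ((relevant, beta), (non_relevant, -gamma)):
        if not docs:
            continue  # an empty set contributes no centroid
        for doc in docs:
            for term, weight in doc.items():
                # Add this document's share of the (signed, weighted) centroid.
                updated[term] = updated.get(term, 0.0) + signed_coeff * weight / len(docs)
    # Terms whose weight is not positive are dropped from the new query.
    return {term: weight for term, weight in updated.items() if weight > 0}
```

In pseudo relevance feedback no judged documents exist, so the top-ranked documents of the first run play the role of `relevant` and `non_relevant` is typically left empty.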

2.2 Query expansion

Expanding queries by adding potentially relevant terms is a common practice for improving relevance in IR systems. There are many methods of query expansion. Relevance feedback takes the documents at the top of a ranking list and adds terms appearing in these documents to a new query. In this work we add synonyms and other similar terms to the query terms before the pseudo-relevance feedback. This type of expansion can be divided into two categories. The first category involves the use of ontologies or lexicons (relational knowledge). In the biomedical area UMLS, MeSH [MeSH, 2018], SNOMED-CT, ICD-10, WordNet, and Wikipedia are used [Jaiswal, 2016]. Generally, the effect of lexicon-based expansion is positive. The second category is word embedding (WE), which belongs to the class of distributional semantics and feature learning techniques in natural language processing. Experience with the effect of lexicons can also be drawn from other semantic tasks.

For natural language queries that require a multiple-choice answer, relational learning using dictionaries that cover the whole corpus always gives better results than pure word embedding (word2vec). However, synonym dictionaries (flat structure) can significantly improve the word2vec results.

3 Retrieval methodology setup


Figure 3. Flowchart of the system developed for TREC 2018 PM track.

3.1 Methodology developed for Medline abstracts

1. Data indexing.

For the set of medical abstract documents, we perform regular-expression-based feature extraction. By applying a dedicated pattern to each document, we look for documents that relate to various genes, diseases, and drugs which may relate to the diseases or gene variants within the query set. Each regex pattern corresponds to one feature and accounts for a variety of syntactic irregularities, such as:

- a delimiter between the main part of the gene/gene variant name and a number,
- various letter case options,
- various gene/gene variant/disease/drug synonyms.

The exact value of a feature is calculated as the number of regular expression matches divided by the document length (in words).

2. Query construction.

Queries are translated into SQL SELECT statements. The generated statements are:

a. Select documents which are classified as related to a given disease
b. Select documents which contain information about a given gene
c. Select documents which contain information about a given variant
d. Select a specific value of a feature for a given document

3. Information retrieval

We perform a single "boolean" run on the set. A document is added to the list of relevant documents only if it contains at least one gene name from the query and one of the disease name synonyms. We assume that a document which relates to a disease is a Precision Medicine document by default. Documents that contain the correct gene variant are ranked higher than documents that lack the gene variant and higher than documents with an incorrect gene variant. Further ranking is based on the values of the related features: the higher the combined sum of features, the higher the ranking score the document receives.
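The three steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the submitted system: the helper names, the generic variant pattern (a letter, digits, a letter, as in V600E), the relative order of "missing" and "incorrect" variants, and the in-memory dictionary standing in for the SQL database are all hypothetical.

```python
import re

# Assumed shape of a protein-level variant such as V600E.
GENERIC_VARIANT = re.compile(r"\b[A-Z]\d+[A-Z]\b")


def make_gene_pattern(symbol):
    """Case-insensitive pattern allowing an optional delimiter before a trailing number."""
    parts = re.fullmatch(r"([A-Za-z]+)(\d*)", symbol)
    if parts is None:  # symbols such as CDKN2A are matched literally
        return re.compile(r"\b" + re.escape(symbol) + r"\b", re.IGNORECASE)
    stem, number = parts.groups()
    body = re.escape(stem) + (r"[-\s]?" + number if number else "")
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def feature_value(pattern, text):
    """Number of regex matches divided by the document length in words."""
    words = text.split()
    return len(pattern.findall(text)) / len(words) if words else 0.0


def variant_tier(text, variant):
    """2 = correct variant present, 1 = no variant mentioned, 0 = only other variants."""
    if variant is None:
        return 1
    found = set(GENERIC_VARIANT.findall(text))
    if variant in found:
        return 2
    return 1 if not found else 0


def rank_documents(docs, gene, disease_synonyms, variant=None):
    """docs: dict doc_id -> abstract text. Returns doc ids, best first."""
    gene_pat = make_gene_pattern(gene)
    disease_pat = re.compile(
        "|".join(re.escape(name) for name in disease_synonyms), re.IGNORECASE
    )
    ranked = []
    for doc_id, text in docs.items():
        gene_score = feature_value(gene_pat, text)
        disease_score = feature_value(disease_pat, text)
        if gene_score == 0 or disease_score == 0:
            continue  # Boolean filter: gene and disease must both be present
        ranked.append((variant_tier(text, variant), gene_score + disease_score, doc_id))
    ranked.sort(reverse=True)  # variant tier first, then combined feature sum
    return [doc_id for _, _, doc_id in ranked]
```

Sorting on the tuple (variant tier, feature sum) keeps the variant rule strictly dominant, so the feature values only break ties within a tier.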

3.2 Methodology developed for Clinical Trials data

1. Data indexing.

The content of the clinical trials documents was indexed as follows:

- tag <DOCNO>: the value of the nct_id tag
- tag <TEXT>: the values of the tags brief_title, official_title, brief_summary, detailed_description, primary_outcome, secondary_outcome, condition, arm_group, condition_browse, intervention_browse, keyword; additionally, inclusion criteria were extracted from the criteria tag

2. Query construction.

Three types of queries were constructed:

1. simple query: find documents with any term from the <disease> tag or any gene name from the <gene> tag; gene variants were not taken into consideration
2. variant query: find documents with any term from the <disease> tag or any term from the <gene> tag; gene variants were taken into consideration
3. SQL query: browse data from the SQL database to find records that have any term from the <disease> tag and any term from the <gene> tag; gene variants were not taken into consideration

Queries 1 and 2 were also expanded as follows: terms from the simple or variant query were expanded using two vector space models (word2vec), the first computed by us on the bioCADDIE corpus and the second computed by [Chiu et al., 2016] on a text corpus based on PubMed citations and full-text articles.

As a result, five queries were constructed for each topic.

3. Information retrieval

Four Terrier runs were carried out, using BB2 as the ranking function:

1. BB2_simple_noprf: the simple query was used as input for Terrier
2. BB2_simple_w2v_prf: the simple query expanded with word2vec was used as input for Terrier; additional expansion was carried out using Terrier pseudo relevance feedback (PRF)
3. BB2_variant_noprf: the variant query was used as input for Terrier
4. BB2_variant_w2v_prf: the variant query expanded with word2vec was used as input for Terrier; additional expansion was carried out using Terrier PRF

The last run (SQ_results) was performed as follows:

5. data from the <nct_id>, <brief_title>, <official_title>, <brief_summary>, <detailed_description>, <keyword>, <condition_browse>, and <study_type> XML tags of each document were put into the SQL database; inclusion criteria were also extracted from the <criteria> tag and placed in the “inclusion” field of the database; for each topic, an SQL query (described in the “Query construction” section) was executed; the obtained results were then ranked according to the number of occurrences of disease, gene name, and gene variant terms in the fields of each record; documents describing international studies were granted additional points in the total score

All generated results were then checked against the value of the <demographic> tag of each topic. Resulting documents that described trials recruiting patients of an inappropriate age or gender were removed from the result set.
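The final demographic check can be illustrated with a short Python sketch. It assumes that each trial has already been reduced to a small record with optional age limits in years and a gender restriction; the field names `min_age`, `max_age`, and `gender`, the helper names, and the trial identifiers in the usage note are hypothetical, and the text above does not specify how the eligibility values were extracted.

```python
import re


def parse_demographic(text):
    """Parse a topic demographic such as '64-year-old male' into (64, 'male')."""
    match = re.match(r"\s*(\d+)-year-old\s+(male|female)\s*$", text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized demographic: {text!r}")
    return int(match.group(1)), match.group(2).lower()


def is_eligible(trial, age, gender):
    """trial: dict with optional 'min_age', 'max_age' (years) and 'gender'.

    'gender' is one of 'male', 'female', 'all'; a missing value means no restriction.
    """
    min_age = trial.get("min_age")
    max_age = trial.get("max_age")
    if min_age is not None and age < min_age:
        return False
    if max_age is not None and age > max_age:
        return False
    return trial.get("gender", "all") in ("all", gender)


def filter_results(ranked_ids, trials, demographic):
    """Drop trials whose age or gender limits exclude the topic's patient, keeping order."""
    age, gender = parse_demographic(demographic)
    return [doc_id for doc_id in ranked_ids if is_eligible(trials[doc_id], age, gender)]
```

For a "64-year-old male" topic, a ranked list containing an adults-only trial, a pediatric trial (maximum age 17), and a female-only trial would keep only the first, and the relative order of the surviving documents is unchanged.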

4 Results

In this section we go through the outcome of every retrieval setup implemented by our group and applied to the competition data sets. We compare our results to the median and the best of the PM submissions. Finally, we discuss the best application for each setup. For the evaluation we show the measures that were used by the TREC PM evaluators for abstracts and clinical trials.

4.1 Results for Medline abstracts

In this category we submitted one run using Boolean search. Our average result is slightly worse than the TREC PM median.

Figure 4.1 infNDCG results for Medline abstracts for individual topics and median averages (ours and the TREC PM 2018).

4.2 Results for Clinical Trials data

In this category we submitted 5 runs, all using BB2 as a baseline. In TREC PM 2017 we used the LGD option, which was the best for the bioCADDIE 2016 challenge. For the purpose of this work, using the qrels from last year's TREC PM, we verified that the BB2 method outperforms LGD.

The results generated for the Clinical Trials data have shown that using Pseudo Relevance Feedback (PRF) to expand the query had a negative impact on almost all measures (see Table 4.1). Terrier PRF was configured to expand queries with 10 terms from the top 2 documents. The observed worsening of the results could be explained by the fact that PRF was carried out before documents describing trials recruiting patients of an inappropriate age or gender were removed from the result set. Another aspect worth considering is that Clinical Trials information retrieval requires finding the documents describing clinical trials for which the patient described in the query is eligible for recruitment. PRF, by adding terms to the query, could improve the score of non-relevant documents.
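The effect described above can be made concrete with a small sketch of pseudo relevance feedback. Note that this is a deliberately simplified stand-in: it scores candidate terms by raw frequency in the top documents, whereas Terrier's expansion models use divergence-from-randomness term weighting. The function name and the tokenized-document input format are assumptions; the defaults of 10 terms and 2 documents follow the configuration stated above.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, n_terms=10, m_docs=2):
    """Append the n_terms most frequent new terms from the top m_docs documents.

    ranked_docs is a list of token lists, best-ranked document first.
    Raw frequency is a simplification used only for illustration.
    """
    original = [term.lower() for term in query_terms]
    counts = Counter()
    for tokens in ranked_docs[:m_docs]:
        counts.update(t.lower() for t in tokens if t.lower() not in original)
    # Sort by descending count, then alphabetically, for a deterministic result.
    candidates = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return original + [term for term, _ in candidates[:n_terms]]
```

Because the expansion terms are taken from the top-ranked documents of the first run, any trial in those positions that the patient is not eligible for still contributes its vocabulary to the second run, which is one way the scores of non-relevant documents can rise.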

The best results were obtained with word embedding using both the vectors obtained by [Chiu et al., 2016] and the vectors from our own classical [Mikolov et al., 2013] type of calculations, with a vector cosine measure of 0.875.
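One way to read the cosine value above is as a similarity threshold for accepting expansion terms; that reading is an assumption made for this sketch, as are the function names and the toy two-dimensional vectors in the usage note. Real word2vec vectors would be loaded from a trained model, which is omitted here to keep the example self-contained.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_with_embeddings(terms, vectors, threshold=0.875):
    """Add every vocabulary word whose cosine similarity to a query term reaches threshold.

    vectors maps word -> embedding. Only the original query terms are expanded,
    so expansion terms are not expanded again.
    """
    expanded = list(terms)
    for term in terms:
        if term not in vectors:
            continue  # out-of-vocabulary terms are kept but not expanded
        for candidate, vector in vectors.items():
            if candidate not in expanded and cosine(vectors[term], vector) >= threshold:
                expanded.append(candidate)
    return expanded
```

With toy vectors such as melanoma = [1.0, 0.0], melanomas = [0.9, 0.1], and asthma = [0.0, 1.0], the query term "melanoma" gains "melanomas" and nothing else; a high threshold of this kind keeps the expansion close to synonyms and spelling variants.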

Green color indicates the results better than 0.5, red color the results worse than 0.5 than median results.

Table 4.1 Results for various options for the provided Clinical Trials data runs (intensity of the colors is proportional to divergence from the medians).

topic  sq_nprf   vq_noprf  vqw2vprf  sqw2vprf  sq        trec_best  trec_median
1      0.613     0.806     0.626     0.6239    0.7373    0.8489     0.6516
2      0.7232    0.7913    0.7334    0.7295    0.7724    0.9054     0.7724
3      0.6989    0.7247    0.699     0.7119    0.6665    0.8424     0.6937
4      0.3416    0.3416    0.3416    0.3231    0.2157    0.4584     0.2583
5      0.859     0.9239    0.8797    0.8593    0.8885    0.9645     0.8084
6      0.7031    0.7531    0.774     0.6924    0.7488    0.8685     0.6651
7      0.8326    0.8565    0.8646    0.8323    0.7893    0.9652     0.7421
8      0.7013    0.7013    0.7013    0.5796    0.4353    1.0942     0.4944
9      0.1323    0.1323    0.1323    0.2658    0.1385    0.2658     0.1396
10     0.7195    0.7195    0.7195    0.6899    0.6886    0.9078     0.7135
11     0.8052    0.9154    0.8104    0.7709    0.8272    0.9291     0.7746
12     0.9055    0.9055    0.9055    0.8497    0.935     0.9627     0.8837
13     0.8697    0.8697    0.8697    0.848     0.8141    0.8941     0.7536
14     0.7874    0.7256    0.7266    0.7506    0.6935    0.8067     0.6667
15     0.0895    0.0893    0.0875    0.0411    0         0.2744     0.0518
16     0         0         0         0         0         0.5        0
17     0.2215    0.2617    0.259     0.1949    0.2962    0.608      0.2754
18     0.1249    0.0555    0         0         0.4544    0.6743     0.26
19     0.242     0.2635    0.021     0         0.4082    0.6679     0.2635
20     0         0         0         0         0         0.5027     0.0337
21     0.0193    0.6864    0         0.017     0.6035    0.8145     0.4292
22     0.0116    0.4929    0         0.0108    0.3634    0.5291     0.3016
23     0.4168    0.4363    0.4321    0.4326    0.3356    0.6817     0.417
24     0         0         0         0         0         0.6309     0
25     0.0781    0.0708    0.0708    0.0868    0         0.2934     0.0708
26     0.7522    0.7522    0.7517    0.743     0.7809    0.8929     0.7112
27     0.7732    0.7732    0.7601    0.8193    0.2841    0.9098     0.6276
28     0.4738    0.4738    0.4738    0.4774    0.4579    0.7743     0.5339
29     0.5055    0.5055    0.5021    0.5001    0.3899    0.7313     0.3151
30     0.8085    0.8085    0.8089    0.8373    0.6184    0.904      0.7763
31     0         0         0         0         0         0.469      0.0746
32     0.1631    0.1631    0.1504    0.4484    0.0826    0.9194     0.234
33     0.522     0.522     0.5204    0.5209    0.5029    0.7171     0.3435
34     0.4972    0.4972    0.5633    0.4693    0.4453    0.7263     0.3654
35     0.5798    0.5798    0.5798    0.3468    0         0.842      0.1505
36     0.0896    0.0896    0.0896    0.082     0.1232    0.7636     0.0887
37     0.0188    0.0188    0.0252    0         0.0503    0.6119     0.2384
38     0.5231    0.5231    0.5231    0.5558    0.553     0.6993     0.5231
39     0.6131    0.6131    0.2519    0.2034    0.6262    0.868      0.5486
40     0.7283    0.7283    0.7283    0.7201    0.7705    0.8546     0.6206
41     0.7082    0.7082    0.6903    0.8744    0         0.8744     0.389
42     0.363     0.363     0.363     0.363     0.363     0.7735     0.363
43     0.6545    0.6545    0.6546    0.6223    0.6793    0.8021     0.5773
44     0.0779    0.0779    0.0779    0.0779    0.0922    0.9987     0.4119
45     0.7108    0.7108    0.7095    0.6527    0.5548    1.1734     0.5996
46     0.6196    0.6196    0.6196    0.6139    0.5447    0.7885     0.4801
47     0.727     0.727     0.727     0.7496    0.5802    0.8101     0.55
48     0.4825    0.4825    0.4825    0.571     0.3536    0.6527     0.4727
49     0.1205    0.1205    0.15      0.147     0         0.6131     0
50     0.4342    0.4342    0.4381    0.456     0.5999    0.7341     0.3676
all    0.4568    0.4894    0.4459    0.4432    0.4253    0.755894   0.429668

Figures 4.2-4.4 present results for particular topics compared with the best and median TREC PM 2018 results.


Figure 4.2 infNDCG results for Clinical Trials for individual topics and median averages (ours and the TREC PM 2018).

Figure 4.3 R-prec_ct for Clinical Trials for individual topics and median averages (ours and the TREC PM 2018).

[Figure: per-topic bar chart titled "infNDCG_ct"; x-axis: topics 1-50 and "all"; series: sq, vqw2vprf, sqw2vprf, vq_noprf, sq_nprf, trec_best, trec_median. Values labeled "all" in the chart: sq 0.3482, vqw2vprf 0.3798, sqw2vprf 0.3736, vq_noprf 0.4101, sq_nprf 0.3848, trec_best 0.576696, trec_median 0.326752.]

0.7917

sq_nprf, 13, 0.8148

sq_nprf, 14, 0.8214

sq_nprf, 15, 0sq_nprf, 16, 0

sq_nprf, 17, 0.1538

sq_nprf, 18, 0.0606

sq_nprf, 19, 0.0625

sq_nprf, 20, 0

sq_nprf, 21, 0.0303

sq_nprf, 22, 0

sq_nprf, 23, 0.3846

sq_nprf, 24, 0sq_nprf, 25, 0

sq_nprf, 26, 0.6757

sq_nprf, 27, 0.6

sq_nprf, 28, 0.4444sq_nprf, 29,

0.3956

sq_nprf, 30, 0.6818

sq_nprf, 31, 0

sq_nprf, 32, 0.1636

sq_nprf, 33, 0.4595

sq_nprf, 34, 0.2

sq_nprf, 35, 0.5

sq_nprf, 36, 0.0541

sq_nprf, 37, 0.0308

sq_nprf, 38, 0.5455

sq_nprf, 39, 0.4118

sq_nprf, 40, 0.5255

sq_nprf, 41, 0.4286

sq_nprf, 42, 0.2

sq_nprf, 43, 0.6

sq_nprf, 44, 0.0189

sq_nprf, 45, 0.5217sq_nprf, 46, 0.5

sq_nprf, 47, 0.6596

sq_nprf, 48, 0.25

sq_nprf, 49, 0

sq_nprf, 50, 0.5

sq_nprf, all, 0.3848

trec_best, 1, 0.6727

trec_best, 2, 0.6905

trec_best, 3, 0.6992

trec_best, 4, 0.3404

trec_best, 5, 0.6833trec_best, 6,

0.6387trec_best, 7, 0.678

trec_best, 8, 0.6667

trec_best, 9, 0.375

trec_best, 10, 0.88trec_best, 11,

0.8636

trec_best, 12, 0.875trec_best, 13,

0.8148

trec_best, 14, 0.8571

trec_best, 15, 0.2667

trec_best, 16, 0

trec_best, 17, 0.5385trec_best, 18,

0.4848trec_best, 19, 0.4375

trec_best, 20, 0.25

trec_best, 21, 0.6667

trec_best, 22, 0.5581

trec_best, 23, 0.5769

trec_best, 24, 0

trec_best, 25, 0.25

trec_best, 26, 0.8108

trec_best, 27, 0.72trec_best, 28, 0.6667

trec_best, 29, 0.4835

trec_best, 30, 0.7727

trec_best, 31, 0.3636

trec_best, 32, 0.4727

trec_best, 33, 0.5405

trec_best, 34, 0.6

trec_best, 35, 0.5

trec_best, 36, 0.5135

trec_best, 37, 0.5846

trec_best, 38, 0.7273

trec_best, 39, 0.6471

trec_best, 40, 0.5547

trec_best, 41, 0.7857

trec_best, 42, 0.6

trec_best, 43, 0.7trec_best, 44,

0.6038trec_best, 45, 0.5652

trec_best, 46, 0.6111

trec_best, 47, 0.6596

trec_best, 48, 0.4375

trec_best, 49, 0.5

trec_best, 50, 0.65trec_best, all, 0.576696

trec_median, 1, 0.5455

trec_median, 2, 0.5873

trec_median, 3, 0.6098

trec_median, 4, 0.2128

trec_median, 5, 0.575trec_median, 6,

0.5042

trec_median, 7, 0.5424

trec_median, 8, 0.4444

trec_median, 9, 0.25

trec_median, 10, 0.68trec_median, 11, 0.6364

trec_median, 12, 0.7083trec_median, 13,

0.6667trec_median, 14, 0.6071

trec_median, 15, 0

trec_median, 16, 0

trec_median, 17, 0.2308trec_median, 18,

0.1515

trec_median, 19, 0.1562

trec_median, 20, 0

trec_median, 21, 0.4091

trec_median, 22, 0.2791

trec_median, 23, 0.3462

trec_median, 24, 0

trec_median, 25, 0

trec_median, 26, 0.5676

trec_median, 27, 0.48trec_median, 28, 0.4444

trec_median, 29, 0.2308

trec_median, 30, 0.6364

trec_median, 31, 0

trec_median, 32, 0.1636

trec_median, 33, 0.2703trec_median, 34,

0.2

trec_median, 35, 0

trec_median, 36, 0.0811

trec_median, 37, 0.1692

trec_median, 38, 0.4545

trec_median, 39, 0.2353

trec_median, 40, 0.3504

trec_median, 41, 0.1429

trec_median, 42, 0.2

trec_median, 43, 0.5

trec_median, 44, 0.2358

trec_median, 45, 0.3913

trec_median, 46, 0.4444

trec_median, 47, 0.4468

trec_median, 48, 0.25

trec_median, 49, 0

trec_median, 50, 0.3

trec_median, all, 0.326752

R-prec_ct

sq vqw2vprf sqw2vprf vq_noprf sq_nprf trec_best trec_median


Figure 4.4. P10_ct (precision at 10) for Clinical Trials for individual topics and averaged over all topics (our runs and the TREC PM 2018 best and median).
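The two measures plotted in the figures can be computed per topic as in the minimal sketch below. It assumes the standard definitions of precision at 10 and R-precision; the function names and the document identifiers are hypothetical and are not taken from our evaluation scripts (official scores are normally obtained with the trec_eval tool).

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k retrieved documents that are judged relevant."""
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


def r_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Precision at rank R, where R is the number of relevant documents for the topic."""
    r = len(relevant)
    if r == 0:
        # No relevant documents judged for this topic: score it as 0.
        return 0.0
    hits = sum(1 for doc_id in ranked[:r] if doc_id in relevant)
    return hits / r


def mean_over_topics(scores: Sequence[float]) -> float:
    """Average of per-topic scores (the 'all' value in the charts)."""
    return sum(scores) / len(scores)


# Hypothetical ranking and relevance judgments for a single topic.
ranked_docs = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "d10"]
relevant_docs = {"d1", "d3", "d5", "d11"}

p10 = precision_at_k(ranked_docs, relevant_docs, k=10)
rprec = r_precision(ranked_docs, relevant_docs)
```

In this toy example three of the ten retrieved documents are relevant, and two of the first four (R = 4) are relevant. A topic with no relevant documents in the top ranks scores 0 under both measures, which is how the zero values for several topics arise.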

4 Conclusions and future work

For Medline abstracts our results are slightly worse than the TREC 2018 median. The method differs from classical retrieval methods and is similar to Boolean search. The biggest deficiency of our approach for Medline abstracts is the lack of distinction between general scientific papers and personalized treatment papers. Presumably this feature explains the very good results of [Goodwin et al., 2017] and [Mahmood et al., 2017] in the TREC PM 2017 track. Another reason could be the extraction of relations used by these two teams [Peng et al., 2016]. For Clinical Trials our best result is vq_noprf, which is significantly better than the TREC 2018 median (by approximately 0.08 for the evaluated measures). With a suitable word2vec method the results are better than those obtained with query extension using MeSH and disease taxonomies.

References

[Amati, van Rijsbergen, 2012] Amati, G., van Rijsbergen, C. J. (2002) Probabilistic models of information retrieval based on measuring the divergence from randomness. ACM Trans. Inf. Syst. 20(4): 357-389.

[TREC PM. 2018] http://www.trec-cds.org/2018.html

[Relevance Judgment, 2017] TREC PM, Relevance guidelines, http://www.trec-cds.org/relevance_guidelines.pdf

[Roberts et al., 2017] Kirk Roberts, Dina Demner-Fushman, Ellen M. Voorhees, William R. Hersh, Steven Bedrick, Alexander J. Lazar, Shubham Pant, Overview of the TREC 2017 Precision Medicine Track, https://trec.nist.gov/pubs/trec26/papers/Overview-PM.pdf

[Chiu et al., 2016] Chiu, B., Crichton, G., Korhonen, A. et al. (2016) How to train good word embeddings for biomedical NLP. In: Proceedings of the 5th Workshop on Biomedical Natural Language Processing. Berlin, Germany. http://www.aclweb.org/anthology/W16-2922

[Cieslewicz et al., 2017a] Artur Cieslewicz, Jakub Dutkiewicz, Czeslaw Jedrzejek, Baseline and extensions approach to information retrieval of complex medical data: Poznan's approach to the bioCADDIE 2016. Database 2018: bax103 (2018).

[Cieslewicz et al., 2017b] Artur Cieslewicz, Jakub Dutkiewicz, Czeslaw Jedrzejek, POZNAN Contribution to TREC PM 2017, https://trec.nist.gov/pubs/trec26/papers/POZNAN-PM.pdf

[Clinchant, Gaussier, 2010] Clinchant, S., Gaussier, E. (2010) Information-based models for ad hoc IR. In SIGIR’10, 234-241.

[Dutkiewicz et al., 2017] Jakub Dutkiewicz, Czesław Jędrzejek, Michał Frąckowiak and Paweł Werda, PUT Contribution to TREC CDS 2016, https://trec.nist.gov/pubs/trec25/papers/IAII_PUT-CL.pdf

Terrier 5.0 IR Platform, www.terrier.org, http://terrier.org/docs/v5.0/, accessed July 2, 2018

[Goodwin et al., 2017] Travis R. Goodwin, Michael A. Skinner and Sanda M. Harabagiu, UTD HLTRI at TREC 2017: Precision Medicine Track, https://trec.nist.gov/pubs/trec26/papers/UTDHLTRI-PM.pdf

[Mahmood et al., 2017] A. S. M. Ashique Mahmood, Gang Li, Shruti Rao, Peter McGarvey, Cathy Wu, Subha Madhavan, K. Vijay-Shanker, UD_GU_BioTM at TREC 2017: Precision Medicine Track, https://trec.nist.gov/pubs/trec26/papers/UD_GU_BioTM-PM.pdf

[Jaiswal et al., 2016] Jaiswal, P., Hoehndorf, R., Cecilia, N. et al. (2016) Proceedings of the Joint International Conference on Biological Ontology and BioCreative, Corvallis, Oregon, United States. CEUR Workshop Proceedings 1747, CEUR-WS.org 201.

[Mikolov et al., 2013] Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., and Dean, J. (2013) Efficient Estimation of Word Representations in Vector Space. CoRR abs/1301.3781

[MeSH, 2018] MeSH database: https://www.nlm.nih.gov/mesh/download_mesh.html, accessed July 2, 2018

[Peng et al., 2016] Nanyun Peng, Hoifung Poon, Chris Quirk, Kristina Toutanova, Wen-tau Yih, Cross-Sentence N-ary Relation Extraction with Graph LSTMs. TACL 5: 101-115 (2017)

[Chart data for Figure 4.4: P10_ct. Series: Serie1 to Serie7 (unlabeled in the extracted chart). Horizontal axis: topics 1-50 and the average over all topics. Averages over all topics: Serie1 0.512, Serie2 0.558, Serie3 0.482, Serie4 0.508, Serie5 0.484, Serie6 0.762, Serie7 0.468.]


[Rocchio, 1971] Rocchio, J. (1971) Relevance feedback in information retrieval. In G. Salton (Ed.), The SMART retrieval system: Experiments in automatic document processing (pp. 313–323). New York City, USA: Prentice-Hall.