Post on 26-Dec-2015

Medline Text Searching Tools – a Comparison Experiment

McDermott Center for Human Growth and Development

Center for Biomedical Inventions

Your first biomedical experiment should be on a computer - a literature search and a data search!

Databases and Tools to Exploit Them

• National Center for Biotechnology Information - http://www.ncbi.nlm.nih.gov/

• UTSW Bioinformatics Tools – http://innovation.swmed.edu/

• GeneCards - http://bioinformatics.weizmann.ac.il/cards/

• Genome Data Base - http://www.gdb.org/

• US Patent Office - http://www.uspto.gov/

• Research Genetics, Stratagene…

Bioinformatics research requires significant hardware and software investment.

• Hardware – ~120 workstations, 3 Apache www servers, Sun servers, 3 HP Enterprise 4/8 CPU servers, 32 CPU Linux cluster, 3 TB RAID 5 storage system, 3D laser scanners, SGI visualization stations, HP visualization stations.

• Software – standard genomics search tools, unique new text data mining tools, 3D analysis tools

• Databases – all major genetics/genomics databases, all of Medline, many custom local databases.

• Text data mining

• Genomic annotation

• Gene collection identification and analysis

• Polymorphism prediction

A family of bioinformatics tools and their computed databases have been developed.

Applied Computational Biology Toolset.

• POMOUS and SNIDE – polymorphism prediction software.

• PANORAMA – A DNA/Protein sequence analysis and visualization tool.

• ARROGANT – A gene/clone collection analysis tool.

• eTBLAST, FRISC, TRITE – Text similarity tools.

• IRIDESCENT – Text data mining tool.

• ARGH – Acronym resolving software.

• ELXR – Exon locator and extractor for resequencing.

• Microarray BLAST – UTSW BLAST utility for comparison against EST/cDNA sequences from UTSW microarrays.

• MarC-V, Signal, SNPCEQer …

Text data mining can speed reduction of data to knowledge.

[Figure: publications and DNA sequences plotted against year, 1975 to 2000, with the labels "DNA Sequencing Invented", "Human Genome Project Begun" and "Gap". Source: Science (Genome Issue), 15 Oct. 1999.]

[Figure: MEDLINE article counts plotted against year, 1965 to 1995; vertical axis from 0 to 12,000,000.]

Now over 10 million articles in MEDLINE®

400,000 new articles added each year

Over 1 million are in Genetics and Molecular Biology

Online biomedical literature is growing rapidly…

Most biomedical results are reported in scientific papers, which are now searchable.

Can you keep up with the literature?

The differences between PubMed and eTBLAST

• PubMed is database driven

• PubMed performs a Boolean search of Medline.

• PubMed returns hits sorted by date (and other criteria).

• eTBLAST is currently written in C.

• eTBLAST automatically removes common words from the text; the remaining words are used as keywords.

• eTBLAST performs a similarity comparison using weighted keywords

• eTBLAST returns hits sorted by similarity (and soon other criteria).
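The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration, not the eTBLAST or PubMed implementation: the stop list `COMMON_WORDS`, the inverse-document-frequency weighting, the cosine score and every function name are assumptions chosen for the example, since the slides do not say how the keywords are weighted or compared.

```python
import math
import re
from collections import Counter

# Hypothetical stop list; the list eTBLAST actually uses is not given in the slides.
COMMON_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def keywords(text):
    """Lowercase the text, split it into words and drop the common words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in COMMON_WORDS]


def boolean_and_search(terms, docs):
    """Boolean search: indices of documents that contain every term."""
    wanted = {t.lower() for t in terms}
    return [i for i, d in enumerate(docs) if wanted <= set(keywords(d))]


def idf_weights(docs):
    """Assumed weighting: rarer keywords get a higher weight (smoothed IDF)."""
    n = len(docs)
    doc_freq = Counter()
    for d in docs:
        doc_freq.update(set(keywords(d)))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in doc_freq.items()}


def weighted_vector(text, weights):
    """Keyword counts scaled by weight; unseen keywords get weight 1.0."""
    counts = Counter(keywords(text))
    return {w: c * weights.get(w, 1.0) for w, c in counts.items()}


def cosine(u, v):
    """Cosine similarity between two sparse keyword vectors."""
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_search(query_text, docs):
    """Similarity search: (score, index) pairs, best match first."""
    weights = idf_weights(docs)
    query = weighted_vector(query_text, weights)
    scored = [(cosine(query, weighted_vector(d, weights)), i) for i, d in enumerate(docs)]
    return sorted(scored, key=lambda s: (-s[0], s[1]))


if __name__ == "__main__":
    docs = [
        "WT1 encodes a zinc finger transcription factor in Wilms tumor",
        "Zinc supplementation in children with diarrhea",
        "Kidney development and the nephron",
    ]
    print(boolean_and_search(["zinc", "finger"], docs))
    for score, index in similarity_search("zinc finger protein in kidney tumor", docs):
        print(round(score, 3), docs[index])
```

The Boolean search returns an unranked set of documents that match every term, while the similarity search scores every document against a whole passage of text and orders them by that score.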

Let's research a topic – Wilms’ Tumor

79) Wilms' tumour A childhood nephroblastoma (solid tumour of the kidney) affecting one in 10 000 children, usually appearing within the first five years of life. A susceptibility to Wilms' tumour is associated with inheritance of defects in several different genes, including the TUMOUR SUPPRESSOR GENE WT-1 which maps to chromosome 11p13. Tumours arise from mesenchymal STEM CELLS that would normally differentiate into parts of the nephron. Around 5-10% also contain ectopic tissues such as bone and cartilage. Wilms' tumour also appears as one of the manifestations of the WAGR syndrome - a CONTIGUOUS GENE SYNDROME. The WT-1 gene has been cloned and encodes a zinc finger protein which is presumed to be a TRANSCRIPTION FACTOR.

Entrez is a search and retrieval system that integrates information from databases at NCBI.

Follow That Link

Our text data mining tools – eTBLAST, FRISC, TRITE

• eTBLAST (2) – similarity comparison engine for electronic text using weighted keywords, concepts and grammar induction. Psi-eTBLAST is iterative. Example use of natural entry process.

• FRISC (2) – using eTBLAST, a UTSW faculty research interests page is checked regularly against new biomedical abstracts from Medline, and the results are ranked and clustered to find the information that best fits the researcher's interests.

• TRITE (2) (3) (4) – using eTBLAST, topical interests will be searched regularly against new Biomedical abstracts in Medline.
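The recurring check that FRISC and TRITE perform can be pictured as follows. This is a rough sketch under stated assumptions, not the actual FRISC or TRITE code: the Jaccard overlap stands in for the eTBLAST similarity score, and the stop list, the threshold value, the function names and the abstract identifiers are all hypothetical.

```python
import re

# Hypothetical stop list, standing in for eTBLAST's common-word removal.
COMMON_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def keyword_set(text):
    """Set of lowercase words left after dropping the common words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in COMMON_WORDS}


def jaccard(a, b):
    """Shared keywords divided by all keywords (a stand-in similarity score)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def check_new_abstracts(interest_text, new_abstracts, threshold=0.1):
    """Score each new abstract against the interest text.

    new_abstracts maps an identifier to the abstract text. Abstracts scoring
    at or above the (assumed) threshold are returned, best match first.
    """
    profile = keyword_set(interest_text)
    hits = []
    for identifier, abstract in new_abstracts.items():
        score = jaccard(profile, keyword_set(abstract))
        if score >= threshold:
            hits.append((score, identifier))
    return sorted(hits, key=lambda h: (-h[0], h[1]))


if __name__ == "__main__":
    interests = "Polymorphism prediction in human genes"
    batch = {
        "abstract-1": "Prediction of polymorphism in human kinase genes",
        "abstract-2": "Laser scanning of bone surfaces",
    }
    for score, identifier in check_new_abstracts(interests, batch):
        print(identifier, round(score, 2))
```

Running the same function on each new batch of abstracts gives the regular update; only the stored interest text differs between a faculty page (FRISC) and a topical interest (TRITE).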

You are asked to conduct a series of reference checks on a set of topics. Each student will be given 3 topics to research, out of a total of 120 different topics. Research each topic first in the standard way, using keyword-based searches in PubMed over the web, and then using a new program, eTBLAST, which identifies keywords and searches automatically. For each topic, read the titles, abstracts and any other information returned to determine the relative sensitivity and selectivity of each method for finding the documents. You will also be asked to compare and contrast the two approaches on other criteria, such as speed and user interface.
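The two measures can be computed from the list of documents a search returns and the list you judge relevant after reading them. The slides do not define the terms, so the sketch below assumes that "sensitivity" means the fraction of relevant documents that were found (recall) and "selectivity" means the fraction of returned documents that are relevant (precision); the function name and the document identifiers are made up for the example.

```python
def sensitivity_and_selectivity(retrieved, relevant):
    """Return (sensitivity, selectivity) for one search.

    Assumed definitions:
      sensitivity = relevant documents retrieved / all relevant documents
      selectivity = relevant documents retrieved / all documents retrieved
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    found = len(retrieved & relevant)
    sensitivity = found / len(relevant) if relevant else 0.0
    selectivity = found / len(retrieved) if retrieved else 0.0
    return sensitivity, selectivity


if __name__ == "__main__":
    # Hypothetical result: the search returned documents 1-4,
    # and documents 2, 4 and 5 were judged relevant.
    sens, sel = sensitivity_and_selectivity([1, 2, 3, 4], [2, 4, 5])
    print(f"sensitivity={sens:.2f} selectivity={sel:.2f}")
```

Computing both numbers for the PubMed search and for the eTBLAST search on the same topic gives a like-for-like comparison of the two methods.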

What you will be doing.

• Please obtain a floppy disk, instructions and 3 topic sheets.

• Find a computer; some are available in the North Campus Library, the South Campus Library, and the computer training room.

• Fill in the spreadsheet with the results of your search on each topic. Then write a brief paragraph in Word to compare and contrast the two methods.

• Come back to Lacynda’s office (NA2.504) to return the floppy disk. She will log the disk in, open it and verify that the files are there and complete. You are then free to go.

http://innovation.swmed.edu/