System for Health Diagnosis
MAJOR PROJECT in Information Retrieval and Extraction (CSE474)
International Institute of Information Technology, Hyderabad
By: Anand Kalburgi (201305648), Arnav Singh (201305626), Saumya Pathak (201101141), Vennela Miryala (201125034)
Presented to: Mr. Vasudeva Varma
Mentor: Sandeep Sharma
Introduction
Problem Statement:
➢ Given some initial symptoms, efficiently detect the possible diseases.
Resources:
➢ WebMD and Mayo Clinic datasets.
Related Works
Many similar online healthcare systems exist, such as:
➢ WebMD’s Symptom Checker online tool
➢ http://androctor.com
➢ www.mayoclinic.org
System Components
➢ A simple user interface.
➢ A retrieval system that takes symptoms as input and yields possible disease conditions as output.
➢ An extensive index (both forward and inverted) of diseases vs symptoms.
Architecture
fig. Case Diagram for the execution of the diagnostic system
Challenges
➢ The two websites use different data formats.
➢ Merging the indices built from each website.
➢ Removing stopwords, unwanted words and stray characters.
➢ Implementing the recursive AI feature.
Tools and technologies used
➢ Eclipse as an IDE
➢ PHP and Java for crawling
➢ Jsoup library for fetching HTML pages from links
➢ MetaMap
➢ Python
Approach in phases
Crawling phase:
➢ Using the Jsoup library, PHP and Python.
➢ Website --> text files (one per disease).
➢ Each disease file contains the crawled text related to that disease.
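The "website to text file" step can be sketched as follows. This is a minimal illustration using only the Python standard library, not the project's actual crawler (which used Jsoup, PHP and Python). The names `TextExtractor`, `html_to_text` and `save_disease_page`, and the file-naming scheme, are assumptions made for this example; the HTML is expected to have been downloaded already.

```python
from html.parser import HTMLParser
from pathlib import Path
import re


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML page, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def save_disease_page(disease, html, out_dir):
    """Write the page text to <out_dir>/<disease>.txt (one file per disease)."""
    # Hypothetical naming scheme: lower-case, non-alphanumerics become "_".
    name = re.sub(r"[^a-z0-9]+", "_", disease.lower()).strip("_")
    path = Path(out_dir) / f"{name}.txt"
    path.write_text(html_to_text(html), encoding="utf-8")
    return path
```

Keeping extraction separate from downloading means the same function can be applied to pages from either website, whatever tool fetched them.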
MetaMap phase
➢ Requires the Part-of-Speech Tagger server, the Word Sense Disambiguation (WSD) server and the MetaMap server.
➢ Accessed through the Java API.
➢ Produces phrase-wise formatted output.
➢ The output is parsed on the attribute “Semantic type = [sosy]” (sign or symptom) to obtain the symptoms.
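The filtering step on the semantic type can be illustrated as below. This is only a sketch: it assumes the MetaMap output has already been parsed into plain Python records, and the record layout (`concepts`, `name`, `semantic_types`) is a hypothetical one chosen for this example, not MetaMap's own output format. The abbreviation `sosy` is the UMLS semantic type "Sign or Symptom".

```python
def extract_symptoms(phrases, wanted_type="sosy"):
    """Return unique concept names tagged with the wanted semantic type.

    `phrases` is assumed to be a list of dicts shaped like
    {"text": ..., "concepts": [{"name": ..., "semantic_types": [...]}]}.
    This layout is an assumption made for illustration.
    """
    symptoms = []
    seen = set()
    for phrase in phrases:
        for concept in phrase["concepts"]:
            if wanted_type in concept["semantic_types"]:
                name = concept["name"].strip().lower()
                if name not in seen:  # keep first occurrence, preserve order
                    seen.add(name)
                    symptoms.append(name)
    return symptoms


# Made-up records for illustration only (not real MetaMap output).
sample_phrases = [
    {"text": "high fever", "concepts": [
        {"name": "Fever", "semantic_types": ["sosy"]}]},
    {"text": "caused by influenza", "concepts": [
        {"name": "Influenza", "semantic_types": ["dsyn"]}]},
    {"text": "fever and cough", "concepts": [
        {"name": "Fever", "semantic_types": ["sosy"]},
        {"name": "Coughing", "semantic_types": ["sosy"]}]},
]
print(extract_symptoms(sample_phrases))
```

Concepts with other semantic types, such as `dsyn` (disease or syndrome), are dropped, so only symptom terms reach the index.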
Index creation phase
➢ Disease vs symptom forward index.
➢ Symptom vs disease inverted index.
➢ Built using Python.
➢ The indices from both websites are merged, keeping OFFSET as the mapping attribute.
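A minimal in-memory sketch of the two indices and of the merge follows. The function names and the toy data are assumptions for illustration. The OFFSET mapping mentioned above is not modeled, since the slides do not specify how it is stored.

```python
from collections import defaultdict


def merge_forward_indices(*indices):
    """Merge disease -> symptoms mappings from several sources.

    Names are normalized (stripped, lower-cased) so that the same disease
    crawled from two websites ends up under a single key.
    """
    merged = defaultdict(set)
    for index in indices:
        for disease, symptoms in index.items():
            merged[disease.strip().lower()].update(
                s.strip().lower() for s in symptoms
            )
    return dict(merged)


def build_inverted_index(forward):
    """Turn a disease -> symptoms mapping into symptom -> diseases."""
    inverted = defaultdict(set)
    for disease, symptoms in forward.items():
        for symptom in symptoms:
            inverted[symptom].add(disease)
    return dict(inverted)


# Toy data standing in for the per-website forward indices.
site_a = {"Influenza": {"fever", "cough"}}
site_b = {"influenza": {"Body ache"}, "Common cold": {"cough", "sneezing"}}

forward = merge_forward_indices(site_a, site_b)
inverted = build_inverted_index(forward)
```

Building the inverted index from the merged forward index, rather than merging two inverted indices, keeps the two structures consistent by construction.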
The recursive “AI” system phase
➢ The first symptom entered by the user is mapped to its corresponding diseases in the indices.
➢ The symptoms of these diseases are then displayed to the user one by one, and the user is asked whether each one applies.
➢ The user answers yes/no for each symptom asked.
➢ Because this process is recursive, the target list of diseases gradually becomes specific to the user's condition, and the list of diseases is output when the target list crosses the minimum threshold.
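The narrowing process above can be sketched as a loop. Several details here are assumptions, not taken from the slides: the loop is iterative rather than recursive, a "yes" keeps only diseases that list the symptom while a "no" keeps only those that do not, questions are limited to symptoms that actually split the remaining candidates, and the stopping rule is "at most `threshold` diseases left". The `ask` callback stands in for the yes/no prompt in the UI.

```python
def narrow_diseases(first_symptom, forward, inverted, ask, threshold=2):
    """Narrow candidate diseases by asking yes/no questions about symptoms.

    forward:  disease -> set of symptoms
    inverted: symptom -> set of diseases
    ask:      callable(symptom) -> bool, the user's yes/no answer
    """
    candidates = set(inverted.get(first_symptom, ()))
    asked = {first_symptom}
    while len(candidates) > threshold:
        # Symptoms held by some, but not all, of the remaining candidates.
        pool = set().union(*(forward[d] for d in candidates)) - asked
        splitting = sorted(
            s for s in pool
            if 0 < sum(s in forward[d] for d in candidates) < len(candidates)
        )
        if not splitting:
            break  # no question left that can separate the candidates
        symptom = splitting[0]
        asked.add(symptom)
        has_it = ask(symptom)
        candidates = {d for d in candidates if (symptom in forward[d]) == has_it}
    return sorted(candidates)
```

Because every question is chosen so that both answers leave at least one candidate, the list shrinks on each round and never becomes empty. A more refined version might pick the symptom that splits the candidates most evenly, which would reduce the number of questions.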
UI integration phase
➢ This whole system is now integrated as a web application using a simple online GUI template.
Conclusion
➢ We have built a system consisting of a knowledge base and a knowledge-gathering system that extracts relationships between diseases and their determinants, symptoms and affected body parts.
➢ The end product is a web application with a user-friendly interface in which the user enters his/her symptoms and the system outputs the most significant disease(s). The project successfully implements the recursive AI feature, which makes the system more interactive for the user.
References
Crawling domains:
➢ http://www.webmd.com/a-to-z-guides/health-topics/default.htm
➢ http://www.mayoclinic.org/diseases-conditions/
➢ Min Ye. Text Mining for Building a Biomedical Knowledge Base on Diseases, Risk Factors, and Symptoms. Master's Thesis, Center for Bioinformatics, Saarland University, March 2011.
➢ Jsoup library for getting HTML pages from links.
➢ MetaMap software, UMLS: Unified Medical Language System. http://www.nlm.nih.gov/research/umls