nudell research proposal
paramedicfoundation.org twitter.com/paramedicfound
facebook.com/ParamedicFoundation
Developing a real-time predictive algorithm for
emergency medical dispatch
Nick Nudell, MS, NRP
Dakota State University
Introduction
• Approximately 7 billion emergency calls annually
• Police – Fire – EMS
Real World Problem
• What action to take?
• Where to do it?
• Who should do it?
• How quickly does it need to be done?
• Why was it done?
• Decisions!
Data Sources
• Caller phone number: call routing information, mobile/fixed, single/multiple user (like an IP address), GPS/tower, eCall/Automatic Crash Notification
• Resources/system status: what people, vehicles, equipment, etc.
• Environment: Weather, crowding & traffic (granular to the device), street corner/high rise/wilderness, ferry/train/plane schedules
• Call center, paramedics, hospital, police records, fire records, public health
• Social media: twitter, facebook, instagram, etc
Existing research
• 50 years of Operations Research / Management
• 25 years of decision tool/tree validation
• 10 years of clinical registry prediction tool validation
• 15 years of decision support in emergency calling “appropriateness”
• 6 months of deep data mining exploratory work
Why is it so complex?
• Chinese city with 9 million residents
  • 2.5 calls per resident over 5 years (0.5/person/year)
  • Repeat callers average 2.09 calls per year
• USA with 320 million residents
  • 240 million 911 calls per year (0.75/person/year)
  • 41,000 calls per Public Safety Answering Point
  • $4.51 per call, just to maintain the ICT & dispatching system
• 10,000+ ICD10 diagnosis codes
• 19,000 EMS services across 50 states & 6 territories
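The per-person rates quoted on this slide follow from simple division. A minimal sketch of that arithmetic, using only the figures above (the function name is illustrative, not from the original):

```python
def calls_per_person_per_year(total_calls, residents, years=1):
    """Average number of calls per resident per year."""
    return total_calls / residents / years

# Chinese city: 2.5 calls per resident over 5 years, 9 million residents
china_rate = calls_per_person_per_year(2.5 * 9_000_000, 9_000_000, years=5)

# USA: 240 million 911 calls per year across 320 million residents
usa_rate = calls_per_person_per_year(240_000_000, 320_000_000)

print(china_rate, usa_rate)
```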
Categorization
• Started in 1978…
• 36 Families of problem types
• Level of Urgency: Hot or Not
  • Omega, Alpha, Bravo, Charlie, Delta, Echo
• Nuanced descriptors help determine what kind of first-aid instructions are to be given
FDNY Example
1120 * 8 = 8,960 hours of coverage
Two-level capability
138,116 total calls
5,730 high priority (Cardiac Arrest & Choking)
53,481 life threatening
78,905 non-life threatening
Decision Tree – Manual Deductive Reasoning
• Dispatching priority relies on standardized keywords compared to a known list of static scenarios
• IF
  • Shooting THEN
    • Urgently send police, apply tourniquet, stop bleeding
  • Not breathing/pulseless THEN
    • Start CPR, urgently send paramedics
  • Cardiac history THEN
    • Urgently send paramedics, take aspirin, stay calm
• Known as clustering in computer science
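The keyword-to-scenario matching described on this slide can be sketched as a static rule table. This is a minimal illustration, not the logic of any real dispatch protocol: the keywords and actions come from the slide, while the substring matching and the names `RULES` and `dispatch_actions` are assumptions.

```python
# Static keyword rules, checked in order; the first match wins.
# Keywords and actions are from the slide; the matching logic is hypothetical.
RULES = [
    ("shooting", ["urgently send police", "apply tourniquet", "stop bleeding"]),
    ("not breathing", ["start CPR", "urgently send paramedics"]),
    ("pulseless", ["start CPR", "urgently send paramedics"]),
    ("cardiac history", ["urgently send paramedics", "take aspirin", "stay calm"]),
]

def dispatch_actions(call_text):
    """Return the actions of the first rule whose keyword appears in the call text."""
    text = call_text.lower()
    for keyword, actions in RULES:
        if keyword in text:
            return actions
    return []  # no static scenario matched

print(dispatch_actions("Caller reports a shooting on Main Street"))
```

Anything that does not match a listed keyword falls through to the empty result, which shows why a static list cannot use the caller's history, location, or word choice.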
Questions / Prioritization / Instructions
• Priorities are designed to purposefully over-triage, rather than to increase specificity, as a risk management tool
  • Lots of vehicles / fewer vehicles
  • Lights & Sirens / no L&S
• Queuing theory uses probabilistic expected delays for paramedics, police, or fire department responders
  • Targeting the longest delay possible because time = money
• Knowledge discovery opportunities are overlooked!
  • Crowdsource trained people for faster response
  • Electronic medical records describe historical risk
  • Caller behavior, word choice, history, location, etc. are untapped indicators
Queuing Theory – Planning to Disappoint
• Operations Research, Management Science, & Computer Science disciplines rely on probabilistic calculations
• A model is constructed so that queue lengths and waiting time can be predicted
  • Interarrival times & service times are independent random variables
• Designed to select next task to perform
• The most commonly used service disciplines are:
  • FIFO - First In First Out: who comes earlier leaves earlier
  • LIFO - Last In First Out: who comes later leaves earlier
  • RS - Random Service: the customer is selected randomly
  • Priority
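The four ways of selecting the next task can be sketched with Python's standard containers. This is an illustrative sketch only: the call descriptions and priority numbers are made up, and the convention that a lower number means more urgent is an assumption.

```python
import heapq
import random
from collections import deque

def next_fifo(queue):
    """First In First Out: serve the earliest arrival."""
    return queue.popleft()

def next_lifo(queue):
    """Last In First Out: serve the most recent arrival."""
    return queue.pop()

def next_random(queue, rng=random):
    """Random Service: serve a uniformly chosen waiting task."""
    index = rng.randrange(len(queue))
    task = queue[index]
    del queue[index]
    return task

def next_priority(heap):
    """Priority: serve the task with the smallest priority number."""
    priority, arrival_order, task = heapq.heappop(heap)
    return task

calls = ["fall", "chest pain", "cardiac arrest"]  # hypothetical, in arrival order

print(next_fifo(deque(calls)))
print(next_lifo(deque(calls)))
print(next_random(deque(calls)))

# Hypothetical priorities: lower number = more urgent.
# The arrival order breaks ties between equal priorities.
heap = []
for order, (priority, task) in enumerate(
        [(3, "fall"), (2, "chest pain"), (1, "cardiac arrest")]):
    heapq.heappush(heap, (priority, order, task))
print(next_priority(heap))
```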
Erlang Call Center Algorithm
Source: http://www.erlang.com/calculator/call/
Estimate how many agents you need in your call center for each hour during an eight hour day…
How many taxis for a particular time of day?
How many hospital beds? Fire trucks? Paramedics? Police?
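A staffing estimate of this kind is commonly computed with the Erlang C formula. The sketch below assumes that model (the slide does not say which formula the cited calculator uses), and the call volume, handling time, and service-level target in the usage line are hypothetical.

```python
import math

def erlang_c(agents, traffic):
    """Probability that an arriving call has to wait (Erlang C).

    traffic is the offered load in erlangs: arrival rate x mean handling time.
    """
    if agents <= traffic:
        return 1.0  # unstable queue: every call waits
    blocking = 1.0  # Erlang B with zero agents, built up recursively
    for n in range(1, agents + 1):
        blocking = traffic * blocking / (n + traffic * blocking)
    return agents * blocking / (agents - traffic * (1.0 - blocking))

def agents_needed(calls_per_hour, handle_seconds, target_level, answer_seconds):
    """Smallest agent count that answers target_level of calls within answer_seconds."""
    traffic = calls_per_hour * handle_seconds / 3600.0
    agents = max(1, math.floor(traffic) + 1)
    while True:
        p_wait = erlang_c(agents, traffic)
        level = 1.0 - p_wait * math.exp(
            -(agents - traffic) * answer_seconds / handle_seconds)
        if level >= target_level:
            return agents
        agents += 1

# Hypothetical hour: 100 calls, 3 minutes each, 80% answered within 20 seconds
print(agents_needed(100, 180, 0.80, 20))
```

The same calculation applies to the other questions on this slide (taxis, beds, vehicles) only to the extent that arrivals and service times behave like the model's independent random variables.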
Natural Language Processing
• Machine learning to determine semantic meaning
• Based on ontologies and probabilistic decisions
  • “Understanding” of words, meanings, intents
• Better suited for structured, grouped or otherwise trained text such as physician narratives or same language categorization
• Excels at spelling, grammar, and Named Entity Recognition that are relatively structured attributes
• Well suited for classifying/parsing simple or common statements
• Generally “trained” by humans (expensive)
• Handling unstructured data, stemming, bag of words, TF/IDF, topic modeling.
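Two of the techniques named in the last bullet, bag of words and TF/IDF, fit in a few lines. This is a minimal sketch: the whitespace tokenizer, the tf x log(N/df) weighting variant, and the sample call texts are all assumptions for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace: a deliberately crude tokenizer."""
    return text.lower().split()

def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    bags = [Counter(tokenize(doc)) for doc in documents]  # bag of words
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())  # number of documents containing each term
    n_docs = len(documents)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return weights

# Hypothetical call transcripts
calls = [
    "my husband is not breathing",
    "my car is on fire",
    "my neighbor is not answering the door",
]
for scores in tf_idf(calls):
    print(sorted(scores, key=scores.get, reverse=True)[:2])
```

Words that appear in every call ("my", "is") receive a weight of zero, while words specific to one call ("breathing", "fire") receive the highest weights.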
Machine Learning - Inductive
• Learns from the information itself
• Classifier accuracy is similar to human experts
• Common Algorithm Types
  • K-nearest neighbors (KNN)
  • Linear regression
  • Logistic regression
  • Naive Bayes
  • Decision trees, bagged trees, boosted trees, boosted stumps
  • Random Forests
  • AdaBoost
  • Neural networks
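As an illustration of learning "from the information itself", the first algorithm in the list, KNN, can be sketched in a few lines. The features (patient age, caller-reported pain score) and labels are hypothetical, and raw Euclidean distance is used without feature scaling, which a real application would likely need.

```python
import math
from collections import Counter

def knn_predict(training, query, k=3):
    """Classify query by majority vote among its k nearest training points.

    training is a list of (features, label) pairs; distance is Euclidean.
    """
    neighbors = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical features: (patient age, caller-reported pain score 0-10)
training = [
    ((72, 9), "urgent"), ((65, 8), "urgent"), ((80, 7), "urgent"),
    ((25, 2), "routine"), ((30, 3), "routine"), ((19, 1), "routine"),
]
print(knn_predict(training, (70, 8)))
```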
Comparing Supervised Learning Algorithms
Algorithm | Problem type | Results interpretable by you? | Easy to explain algorithm to others? | Average predictive accuracy | Training speed | Prediction speed | Amount of parameter tuning needed (excluding feature selection) | Performs well with small number of observations? | Handles lots of irrelevant features well (separates signal from noise)? | Automatically learns feature interactions? | Gives calibrated probabilities of class membership? | Parametric? | Features might need scaling?
KNN | Either | Yes | Yes | Lower | Fast | Depends on n | Minimal | No | No | No | Yes | No | Yes
Linear regression | Regression | Yes | Yes | Lower | Fast | Fast | None (excluding regularization) | Yes | No | No | N/A | Yes | No (unless regularized)
Logistic regression | Classification | Somewhat | Somewhat | Lower | Fast | Fast | None (excluding regularization) | Yes | No | No | Yes | Yes | No (unless regularized)
Naive Bayes | Classification | Somewhat | Somewhat | Lower | Fast (excluding feature extraction) | Fast | Some for feature extraction | Yes | Yes | No | No | Yes | No
Decision trees | Either | Somewhat | Somewhat | Lower | Fast | Fast | Some | No | No | Yes | Possibly | No | No
Random Forests | Either | A little | No | Higher | Slow | Moderate | Some | No | Yes (unless noise ratio is very high) | Yes | Possibly | No | No
AdaBoost | Either | A little | No | Higher | Slow | Fast | Some | No | Yes | Yes | Possibly | No | No
Neural networks | Either | No | No | Higher | Slow | Fast | Lots | No | Yes | Yes | Possibly | No | Yes
https://docs.google.com/spreadsheets/d/16i47Wmjpj8k-mFRk-NnXXU5tmSQz8h37YxluDV8Zy9U/edit#gid=0
Support Vector Machine (SVM)
Nadkarni, P. M., Ohno-Machado, L., & Chapman, W. W. (2011). Natural language processing: an introduction. Journal of the American Medical Informatics Association : JAMIA, 18(5), 544–551. http://doi.org/10.1136/amiajnl-2011-000464
Algorithm Quality
• Very similar level of accuracy between algorithms
  • Will use similar attributes for scoring
  • May vary when categorical vs continuous data
• Primary difference is in efficiency
  • Big-O Notation is a relative representation of the complexity of an algorithm
Random Forest
• Advantages
  • It has been widely shown that random forests are one of the most accurate existing classification methods
  • It can deal with a huge number of features
  • It runs efficiently on large datasets
  • It can help estimate which variables are important in classification
  • It can be extended to an unsupervised version to work with unlabeled data
  • It is relatively robust to noise
• Disadvantages
  • They tend to overfit noisy data
  • Not as intuitive as some other classification methods
  • Might take a while to build the forest (but once it's built, classification is very fast)
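The two ideas behind a random forest, bootstrap sampling of the rows and a random subset of features at each split, can be sketched in plain Python. This is a simplified teaching sketch rather than a production implementation: it assumes numeric features and string labels, and the data and function names are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, rng, n_features, depth=0, max_depth=3):
    """Grow one tree; each split considers only a random subset of features."""
    if depth >= max_depth or len(set(labels)) == 1:
        return majority(labels)  # leaf: a plain label (assumed to be a string)
    best = None
    for f in rng.sample(range(len(rows[0])), n_features):
        for threshold in sorted({row[f] for row in rows}):
            left = [i for i, row in enumerate(rows) if row[f] <= threshold]
            right = [i for i, row in enumerate(rows) if row[f] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    if best is None:
        return majority(labels)
    _, f, threshold, left, right = best
    branches = []
    for side in (left, right):
        branches.append(build_tree([rows[i] for i in side],
                                   [labels[i] for i in side],
                                   rng, n_features, depth + 1, max_depth))
    return (f, threshold, branches[0], branches[1])  # internal node: a tuple

def predict_tree(tree, row):
    while isinstance(tree, tuple):
        f, threshold, left, right = tree
        tree = left if row[f] <= threshold else right
    return tree

def build_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_features = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(build_tree([rows[i] for i in sample],
                                 [labels[i] for i in sample], rng, n_features))
    return forest

def predict_forest(forest, row):
    """Majority vote across all trees."""
    return majority([predict_tree(tree, row) for tree in forest])

# Hypothetical features: (patient age, caller-reported pain score 0-10)
rows = [(72, 9), (65, 8), (80, 7), (25, 2), (30, 3), (19, 1)]
labels = ["urgent", "urgent", "urgent", "routine", "routine", "routine"]
forest = build_forest(rows, labels)
print(predict_forest(forest, (70, 8)))
```

Building the forest is the slow part (every tree scans its sample for split points); prediction only walks each tree once, which matches the last disadvantage listed above.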
The Turing Test
• In 1950 Alan Turing wondered ‘Can computers think?’
• Proposed The Imitation Game
• Interrogator and two players, one human and one computer
• Based on typewritten responses the interrogator was to guess which player was the computer
• He believed having adequate storage was the primary limiting factor with speed being next
• Learning machine is like a child being taught
Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236), 433-460.
Research Questions
• Can an a priori algorithmic, inductive reasoning based approach be developed to:
  • improve the speed of the decision making process during emergency call taking and dispatching?
  • improve the accuracy of the resource assignment for emergency call dispatching?
Discussion – Present Considerations
• Flowchart/Tree: veracity of the reporting party, socio-economic and demographic factors of the patient/victim, the capability of the responding unit, the quality of services provided by the responding individual, and the specificity of the dispatching algorithm itself are not factored into the decision model.
Discussion – Future Considerations & Research
• Future research: develop an AI, ML based approach.
  • Obtain detailed 911 call and electronic Patient Care Records for approximately five million patients where an outcome is identified.
    • unfounded/no merit, patient treated but not transported, patient treated and transported, and patient transferred to another responder.
  • The clinical condition at the time of the outcome will be determined based on standard paramedic coding practices.
• Data split by randomization to a training dataset and test dataset.
  • A Random Forest model built from training dataset then applied to test dataset.
• Comparative statistics to evaluate the resource assignments, reduced demand, and potential savings of the new model
• New knowledge model is a dynamic and real-time application
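The randomized split and the evaluation on held-out data described in this plan can be sketched as follows. The 30% test fraction, the fixed seed, and the use of simple accuracy as the comparative statistic are assumptions for illustration; the proposal does not specify them.

```python
import random

def train_test_split(records, test_fraction=0.3, seed=42):
    """Randomly assign records to a training set and a held-out test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(predicted, actual):
    """Fraction of predictions that match the recorded outcome."""
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)

records = list(range(10))  # stand-ins for call / patient care records
train, test = train_test_split(records)
print(len(train), len(test))
```

A model would be fitted on `train` only, and its predictions on `test` compared with the recorded outcomes, so that the evaluation reflects calls the model has not seen.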
Contact
Nikiah Nudell, MS, NRP
(760) 405-6869
http://twitter.com/runmedic
https://www.linkedin.com/in/medicnick