hms nyc* talk
TRANSCRIPT
![Page 1: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/1.jpg)
1•800.593.4467 • info@healthmarketscience.com
The Big Data Quadfecta
Brian O’Neill, Lead Architect, Health Market Science (@boneill42, [email protected])
Taylor Goetz, Development Lead, Health Market Science (@ptgoetz, [email protected])
![Page 2: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/2.jpg)
Quadfecta?
1. Quadfecta: A legendary beirut/beer pong shot that lands on the tops of four cups simultaneously. Considered the rarest shot in the game, topping even the trifecta, 2-cup knockover-and-sink, and simultaneous 6-cup game-ending double bounce-in.
• Kafka
• Storm
• Elastic Search
• Cassandra
![Page 3: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/3.jpg)
3 V’s
Volume Variety Velocity
![Page 4: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/4.jpg)
The Use Case
![Page 5: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/5.jpg)
Our Mission
Prescriber eligibility and remediation
Eliminate fraud, waste and abuse
Insights into the healthcare space
![Page 6: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/6.jpg)
The Business
Business Solutions
• CompleteView, Expense Manager, CompleteSpend
• Prescriber Eligibility/Remediation
• Analytics (Influencer Networks)
Master Data Solutions
Health Care Provider & Facilities
Variety/Velocity
• >2000 sources
• 6 million unique HCPs
• 10+ years history
Data Challenges
• Constant change in real world data
• Conflicting & partial info
• Frequent changes to source structure
• Authoritative sources vs. crowdsource
• Predicting source quality
Medical Claims Data (Medical Procedures & Diagnosis)
Volume/Velocity
• ~1B claims annually
• +5B records annually
• 5+ years history
Data Challenges
• Sources have incomplete capture
• Overlapping source data
• Statistical projections & biases
• Social media type relationships
![Page 7: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/7.jpg)
Our Solutions
Business Needs: Finance & Legal, Business Systems, Compliance, Sales & Marketing
Solutions: Provider Data Compliance; Data Assessment, Integration & Enrichment Services; Market Intelligence
HMS Authoritative Sources: PDC, Federal, State, Medical Claims, Web Derived
Advanced Technology: Master Data Management
![Page 8: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/8.jpg)
Datacenter
¾ Petabytes of raw storage
Virtualized (VMware)
On a SAN
Should we go physical???
![Page 9: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/9.jpg)
Under the Hood
Visualization: Dashboard / Reports
Structured Storage: Relational, Indexing
Flexible Storage: NoSQL, Graph(s)
Interfacing: Web Services
Distributed Processing: Standardize, Validate, Match, Consolidate, Analytics
Data Sources: Government, Web, Customer
I’m happy
User Interface
![Page 10: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/10.jpg)
Master Data Management
Harvested
Government
Private
Schema Change!
![Page 11: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/11.jpg)
The Design
![Page 12: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/12.jpg)
System of Record
Flexibility (Variety)
Scalability (Velocity + Volume)
![Page 13: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/13.jpg)
Primitives of Distributed Processing
emit/process(tuple(…))
map<key<map<[], value>>
pop(push(v))
index(field, type)
Kafka
![Page 14: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/14.jpg)
Design Principles
Patterns
• Idempotent operations: elegantly handle replay
• Immutable data: assertions of facts over time
Anti-Patterns
• Transactions / Locking
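One way to picture the two patterns together is an append-only log of timestamped assertions in which re-asserting a fact is a no-op. The following is a minimal sketch under that assumption; `FactLog`, `assertFact` and `valueAsOf` are hypothetical names for illustration and are not taken from the HMS system.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: facts are only ever added, never updated in place.
class FactLog {
    // "entity|attribute" -> (timestamp -> asserted value)
    private final Map<String, TreeMap<Long, String>> facts = new HashMap<>();

    // Idempotent write: replaying the same assertion leaves the log unchanged.
    // Returns true only when the assertion was new. If a different value is
    // offered for an existing timestamp, the first assertion is kept.
    boolean assertFact(String entity, String attribute, long timestamp, String value) {
        TreeMap<Long, String> history =
            facts.computeIfAbsent(entity + "|" + attribute, k -> new TreeMap<>());
        return history.putIfAbsent(timestamp, value) == null;
    }

    // Returns the value in effect at the given time, or null if none was asserted yet.
    String valueAsOf(String entity, String attribute, long timestamp) {
        TreeMap<Long, String> history = facts.get(entity + "|" + attribute);
        if (history == null) {
            return null;
        }
        Map.Entry<Long, String> latest = history.floorEntry(timestamp);
        return latest == null ? null : latest.getValue();
    }
}
```

Because writes never overwrite earlier facts, a replayed message can be applied any number of times without locking or transactions.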
![Page 15: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/15.jpg)
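State / Counting
Exactly-once semantics for state
• Create small batches
• Order batches

| Batch | Size |
|---|---|
| Batch 1 | 4 |
| Batch 2 | 6 |
| Batch 3 | 13 |
| Batch 3’ | 13 |

| Batch (arrival order) | Total |
|---|---|
| 1 | 4 |
| 3 | 4 (wait) |
| 2 | 10 (+6) |
| 3 | 23 (+13) |
| 3’ | 23 (+0) |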
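The batch walk-through above can be sketched as a counter that only applies the next batch in sequence. This is an illustrative sketch, not the Storm implementation: `OrderedBatchCounter` and `offer` are hypothetical names, and batches are assumed to be numbered from 1.

```java
// Hypothetical sketch of exactly-once counting over small, ordered batches.
class OrderedBatchCounter {
    private long total = 0;
    private long lastAppliedBatch = 0; // assumption: batch ids start at 1

    // Offers a batch and returns the running total after the call.
    long offer(long batchId, long count) {
        if (batchId == lastAppliedBatch + 1) {
            // Next batch in sequence: apply it exactly once.
            total += count;
            lastAppliedBatch = batchId;
        }
        // batchId <= lastAppliedBatch: replay of an applied batch, ignored (+0).
        // batchId >  lastAppliedBatch + 1: arrived early, must be offered again later (wait).
        return total;
    }
}
```

Offering batches in the order 1, 3, 2, 3, 3’ with sizes 4, 13, 6, 13, 13 yields totals 4, 4, 10, 23, 23.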
![Page 16: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/16.jpg)
What we did wrong…
Could not react to transactional changes
Needed extra logic to track what changed
Took too long
![Page 17: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/17.jpg)
What we did wrong… (II)
AOP-based triggers
Worked well initially.
Business Processes captured as side-effects.
![Page 18: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/18.jpg)
What we did right.
REST APIs for Loose Coupling
See Virgil: https://github.com/hmsonline/virgil
But really… watch out for Intravert: https://github.com/zznate/intravert-ug
![Page 19: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/19.jpg)
Kafka
• Millions of Messages
• Replay Enabled
• No transactions / Lightning Fast
![Page 20: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/20.jpg)
Elastic Search
• Edit Distance / Soundex
• Native Scalability
• Fuzzy Search
• Geospatial
• Facets
![Page 21: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/21.jpg)
Storm
• Guaranteed once semantics
• Well-designed processing abstraction
• Beats BYODP
• Momentum
![Page 22: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/22.jpg)
The System
[Diagram: REST API; Kafka queue(s) with offset; components A, B, C; C*; ElasticSearch (ES1, ES2)]
NP. We can route around it.
NP. Replication Factor > 1.
NP. Rewind!
![Page 23: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/23.jpg)
What comes after Quadfecta?
?
![Page 24: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/24.jpg)
Real-Time Integration
Real-time CRUD via Web Services
DRPC
“Real-time” Queue
Not quite sure?
![Page 25: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/25.jpg)
The Storm/C* Bridge
![Page 26: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/26.jpg)
Anatomy of a Storm Cluster
Nimbus: Master Node
Zookeeper: Cluster Coordination
Supervisors: Worker Nodes
![Page 27: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/27.jpg)
Storm Primitives
Streams: Unbounded sequence of tuples
Spouts: Stream Sources
Bolts: Unit of Computation
Topologies: Combination of n Spouts and n Bolts; defines the overall “Computation”
![Page 28: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/28.jpg)
Storm Spouts
Represents a source (stream) of data
• Queues (JMS, Kafka, Kestrel, etc.)
• Twitter Firehose
• Sensor Data
Emits “Tuples” (Events) based on source
• Primary Storm data structure
• Set of Key-Value pairs
![Page 29: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/29.jpg)
Storm Bolts
Receive Tuples from Spouts or other Bolts
Operate on, or React to Data
• Functions/Filters/Joins/Aggregations
• Database writes/lookups
Optionally emit additional Tuples
![Page 30: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/30.jpg)
Storm Topologies
Data flow between spouts and bolts
Routing of Tuples between spouts/bolts
Stream “Groupings”
Parallelism of Components
Long-Lived
![Page 31: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/31.jpg)
Storm Topologies
![Page 32: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/32.jpg)
Storm and Cassandra
Use Cases:
• Write Storm Tuple data to C*: computation results, pre-compute indices
• Read data from C* and emit Storm Tuples: dynamic lookups
http://github.com/hmsonline/storm-cassandra
![Page 33: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/33.jpg)
Storm Cassandra Bolt Types
CassandraBolt
• Writes data to Cassandra
• Available in Batching and Non-Batching
CassandraLookupBolt
• Reads data from Cassandra
http://github.com/hmsonline/storm-cassandra
![Page 34: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/34.jpg)
Storm-Cassandra Project
Provides generic Bolts for writing/reading Storm Tuples to/from C*
Storm Tuple → TupleMapper → C* Rows
C* Columns → ColumnsMapper → Storm Tuples
http://github.com/hmsonline/storm-cassandra
![Page 35: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/35.jpg)
Storm-Cassandra Project
TupleMapper Interface
Tells the CassandraBolt how to write a tuple to an arbitrary data model
Given a Storm Tuple:
• Map to Column Family
• Map to Row Key
• Map to Columns
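The three mappings can be sketched in plain Java. This is not the project's actual interface: `SketchTupleMapper` and `FieldTupleMapper` are hypothetical, and a `Map<String, Object>` stands in for a Storm Tuple so the example runs without Storm or Cassandra on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a tuple-to-row mapper; signatures are illustrative only.
interface SketchTupleMapper {
    String mapToColumnFamily(Map<String, Object> tuple);
    String mapToRowKey(Map<String, Object> tuple);
    Map<String, String> mapToColumns(Map<String, Object> tuple);
}

// Uses one tuple field as the row key and writes every other field as a column.
class FieldTupleMapper implements SketchTupleMapper {
    private final String columnFamily;
    private final String rowKeyField;

    FieldTupleMapper(String columnFamily, String rowKeyField) {
        this.columnFamily = columnFamily;
        this.rowKeyField = rowKeyField;
    }

    public String mapToColumnFamily(Map<String, Object> tuple) {
        return columnFamily;
    }

    public String mapToRowKey(Map<String, Object> tuple) {
        return String.valueOf(tuple.get(rowKeyField));
    }

    public Map<String, String> mapToColumns(Map<String, Object> tuple) {
        Map<String, String> columns = new LinkedHashMap<>();
        for (Map.Entry<String, Object> field : tuple.entrySet()) {
            if (!field.getKey().equals(rowKeyField)) {
                columns.put(field.getKey(), String.valueOf(field.getValue()));
            }
        }
        return columns;
    }
}
```

With a tuple `{word: "storm", count: 3}` and `new FieldTupleMapper("word_counts", "word")`, the row key is `storm` in column family `word_counts`, with a single column `count` holding `3`.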
http://github.com/hmsonline/storm-cassandra
![Page 36: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/36.jpg)
Storm-Cassandra Project
ColumnsMapper Interface
Tells the CassandraLookupBolt how to transform a C* row into a Storm Tuple
Given a C* Row Key and list of Columns:
Return a list of Storm Tuples
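The reverse direction can be sketched the same way. `SketchColumnsMapper` and `OneTuplePerColumn` are hypothetical names, not the project's API, and each emitted tuple is represented as a plain `List<Object>` of values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a row-to-tuples mapper; signatures are illustrative only.
interface SketchColumnsMapper {
    List<List<Object>> mapToTuples(String rowKey, Map<String, String> columns);
}

// Emits one (rowKey, columnName, columnValue) tuple per column in the row.
class OneTuplePerColumn implements SketchColumnsMapper {
    public List<List<Object>> mapToTuples(String rowKey, Map<String, String> columns) {
        List<List<Object>> tuples = new ArrayList<>();
        for (Map.Entry<String, String> column : columns.entrySet()) {
            tuples.add(Arrays.<Object>asList(rowKey, column.getKey(), column.getValue()));
        }
        return tuples;
    }
}
```

Other mappings are equally plausible, for example a single tuple that carries all columns of the row; the choice depends on what downstream bolts expect.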
http://github.com/hmsonline/storm-cassandra
![Page 37: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/37.jpg)
Storm-Cassandra Project
Current State:
• Version 0.4.0
• Uses Astyanax Client
• Several out-of-the-box *Mapper Implementations:
  • Basic Key-Value Columns
  • Value-less Columns
  • Counter Columns
  • Lookup by row key
  • Lookup by range query
• Composite Key/Column Support
• Trident support
http://github.com/hmsonline/storm-cassandra
![Page 38: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/38.jpg)
Storm-Cassandra Project
Future Plans:
• Switch to CQL
• Enhanced Trident Support
http://github.com/hmsonline/storm-cassandra
![Page 39: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/39.jpg)
Persistent Word Count
http://github.com/hmsonline/storm-cassandra
![Page 40: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/40.jpg)
DRPC
![Page 41: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/41.jpg)
“Reach” Computation
![Page 42: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/42.jpg)
MDM Topology*
*Notional
![Page 43: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/43.jpg)
Load Topology
![Page 44: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/44.jpg)
Shameless Shoutouts
HMS (https://github.com/hmsonline/)
• storm-cassandra
• storm-elastic-search
• storm-jdbi (coming soon)
ptgoetz (https://github.com/ptgoetz)
• storm-jms
• storm-signals
![Page 45: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/45.jpg)
Next Level : Trident
![Page 46: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/46.jpg)
Trident
Provides a higher-level abstraction for stream processing
Constructs for state management and Batching
Adds additional primitives that abstract away common topological patterns
Deprecates transactional topologies
Distributed with Storm
![Page 47: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/47.jpg)
Sample Trident Operations
Partition Local
• Functions ( execute(x) → x + y )
• Filters ( isKeep(x) → 0, x )
PartitionAggregate
• Combiner ( pairwise combining )
• Reducer ( iterative accumulation )
• Aggregator ( byoa )
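The difference between pairwise combining and iterative accumulation can be sketched in plain Java. The interfaces below are hypothetical and only loosely modeled on the shape of Trident's combiner and reducer aggregators; plain `Object` values stand in for tuples so the example runs without Storm.

```java
import java.util.List;

// Hypothetical combiner: values are lifted with init() and merged pairwise,
// so partial results from different partitions can be combined later.
interface SketchCombiner<T> {
    T init(Object value);
    T combine(T left, T right);
    T zero();
}

// Hypothetical reducer: a single accumulator is updated one value at a time.
interface SketchReducer<T> {
    T init();
    T reduce(T accumulator, Object value);
}

class SumCombiner implements SketchCombiner<Long> {
    public Long init(Object value) { return ((Number) value).longValue(); }
    public Long combine(Long left, Long right) { return left + right; }
    public Long zero() { return 0L; }
}

class CountReducer implements SketchReducer<Long> {
    public Long init() { return 0L; }
    public Long reduce(Long accumulator, Object value) { return accumulator + 1; }
}

class AggregateSketch {
    // Combine within each partition first, then merge the partial results.
    static <T> T runCombiner(SketchCombiner<T> combiner, List<List<Object>> partitions) {
        T result = combiner.zero();
        for (List<Object> partition : partitions) {
            T partial = combiner.zero();
            for (Object value : partition) {
                partial = combiner.combine(partial, combiner.init(value));
            }
            result = combiner.combine(result, partial);
        }
        return result;
    }

    // Walk the values in order, threading one accumulator through.
    static <T> T runReducer(SketchReducer<T> reducer, List<Object> values) {
        T accumulator = reducer.init();
        for (Object value : values) {
            accumulator = reducer.reduce(accumulator, value);
        }
        return accumulator;
    }
}
```

A combiner suits aggregations that can be computed as partial results per partition and merged afterwards, while a reducer only needs an initial value and a step function.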
![Page 48: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/48.jpg)
A sample topology
TridentTopology topology = new TridentTopology();
TridentState wordCounts =
    topology.newStream("spout1", spout)
        // split each sentence into one tuple per word
        .each(new Fields("sentence"),
              new Split(),
              new Fields("word"))
        .groupBy(new Fields("word"))
        // keep a running count per word in external state
        .persistentAggregate(
            MemcachedState.opaque(serverLocations),
            new Count(),
            new Fields("count"))
        .parallelismHint(6);
https://github.com/nathanmarz/storm/wiki/Trident-state
![Page 49: Hms nyc* talk](https://reader035.vdocuments.mx/reader035/viewer/2022062312/554f6df3b4c905bb178b4fb7/html5/thumbnails/49.jpg)
Trident State
Sequenced writes by batch/transaction id.
Spouts
• Transactional: batch contents never change
• Opaque: batch contents can change
State
• Transactional: store tx_id with counts to maintain sequencing of writes.
• Opaque: store previous value in order to overwrite the current value when contents of a batch change.
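The two state strategies can be sketched as update rules for a single counter. This is a minimal illustration, not Trident's implementation: `TransactionalCount` and `OpaqueCount` are hypothetical names, the initial tx_id of -1 is an assumption, and batches are assumed to reach the state in tx_id order.

```java
// Hypothetical sketch: transactional state stores the tx_id next to the count.
class TransactionalCount {
    long txId = -1; // assumption: no real batch uses -1
    long count = 0;

    void apply(long batchTxId, long partialCount) {
        if (batchTxId == txId) {
            return; // same batch replayed with identical contents: skip
        }
        count += partialCount;
        txId = batchTxId;
    }
}

// Hypothetical sketch: opaque state also keeps the value from before the last batch.
class OpaqueCount {
    long txId = -1; // assumption: no real batch uses -1
    long previous = 0;
    long current = 0;

    void apply(long batchTxId, long partialCount) {
        if (batchTxId == txId) {
            // Same tx_id, but the batch contents may have changed:
            // recompute from the previous value and overwrite the current one.
            current = previous + partialCount;
        } else {
            previous = current;
            current = current + partialCount;
            txId = batchTxId;
        }
    }
}
```

Applying tx 1 with 4 and then tx 2 with 6 gives 10 in both cases. If tx 2 is replayed with 9 instead, the opaque counter overwrites its value to 13, whereas the transactional rule relies on the replayed batch being identical and keeps 10.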