Low Latency Web Scale Fraud Prevention with Apache Samza, Kafka and Friends
![Page 1: Low Latency Web Scale Fraud prevention with Apache Samza, Kafka and Friends](https://reader035.vdocuments.mx/reader035/viewer/2022062401/58f9a949760da3da068b6d1f/html5/thumbnails/1.jpg)
Low-Latency, Web-scale Fraud Prevention with Samza and Friends
![Page 2: Low Latency Web Scale Fraud prevention with Apache Samza, Kafka and Friends](https://reader035.vdocuments.mx/reader035/viewer/2022062401/58f9a949760da3da068b6d1f/html5/thumbnails/2.jpg)
Senior Data Scientist at eBay Enterprise leading R&D efforts in applying machine learning to fraud prevention and elsewhere.
![Page 3: Low Latency Web Scale Fraud prevention with Apache Samza, Kafka and Friends](https://reader035.vdocuments.mx/reader035/viewer/2022062401/58f9a949760da3da068b6d1f/html5/thumbnails/3.jpg)
Commerce is getting more convenient and more complex, and so is fraud. To keep up, fraud prevention solutions need to process a lot more data:
• Older Data
  • Looking back a lot further in time
  • "Older data is not effective" is no excuse – home for the holidays?
• Wider Data
  • Using all available data sources
  • How wide can customer name possibly be?
• Richer Data
  • Social/unstructured data – people, places, interests
  • Connected data – who shipped to whom, where; email, devices, IP addresses
• Faster Data
  • Clickstream data – website click patterns
![Page 4: Low Latency Web Scale Fraud prevention with Apache Samza, Kafka and Friends](https://reader035.vdocuments.mx/reader035/viewer/2022062401/58f9a949760da3da068b6d1f/html5/thumbnails/4.jpg)
Modern Fraud Prevention Architecture Requirements
• Web scale capable (horizontal scaling using commodity hardware)
  • handle more actions and data for each user
  • handle more users and more volume from each user
  • handle more customers of all sizes (lowest processing cost)
• Low latency (milliseconds, not hours)
  • card present, digital goods, gift cards, store pickup (in-store online shopping!?)
  • e-commerce physical goods? – no teleporting yet, so speed up what we can
  • process customer interactions in real time (personalization, loyalty, shopping experience)
  • dynamic order process (identification, authentication, tender presentation)
• Fault tolerance
  • Commodity hardware is not without faults
  • Expect and design for routine failures – more like shift changes, or relay races
![Page 5: Low Latency Web Scale Fraud prevention with Apache Samza, Kafka and Friends](https://reader035.vdocuments.mx/reader035/viewer/2022062401/58f9a949760da3da068b6d1f/html5/thumbnails/5.jpg)
Preventing fraud is all about detecting abnormal behavior. Normal behavior is not normal – we are all normal in our own abnormal special ways.
• Typical customer profiling calculations
  • Transaction velocity (#txns_day) and change (#txns_day_1days/#txns_day_10days)
  • Amount velocity ($txns_day) and change ($txns_day_1days/$txns_day_10days)
• Typical implementation and technologies
  1. Define sliding window interval (7 days, a month, 6 months?)
  2. For each live txn, pull matching txns (card, ...) from a single SQL DB within that sliding window
  3. Loop over pulled transactions, filtering based on timestamp to calculate change over sub-windows
• Issues, Problems, Solutions?!
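The typical batch-style calculation listed above can be sketched roughly as follows. This is a minimal illustration, not the system described in the talk: the in-memory list `history` stands in for the rows pulled from the SQL DB, the function name `velocity_change` is hypothetical, and reading "change" as the per-day rate over the last 1 day divided by the per-day rate over the last 10 days is an assumption about the slide's notation.

```python
from datetime import datetime, timedelta


def velocity_change(txn_times, now, short_days=1, long_days=10):
    """Ratio of txns/day in the short sub-window to txns/day in the long window.

    Assumption: "change" means short-window rate divided by long-window rate.
    Returns None when the long window holds no transactions.
    """
    def per_day(days):
        start = now - timedelta(days=days)
        # Step 3 on the slide: loop over pulled txns, filtering by timestamp.
        count = sum(1 for t in txn_times if start < t <= now)
        return count / days

    long_rate = per_day(long_days)
    if long_rate == 0:
        return None
    return per_day(short_days) / long_rate


# Hypothetical history for one card: ages of past transactions in days.
now = datetime(2015, 8, 5, 12, 0)
history = [now - timedelta(days=d) for d in (0.1, 0.5, 2, 4, 9)]
ratio = velocity_change(history, now)
```

Every live transaction repeats the full scan over its window, so the work per transaction grows with the window length and the customer's history.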
![Page 6: Low Latency Web Scale Fraud prevention with Apache Samza, Kafka and Friends](https://reader035.vdocuments.mx/reader035/viewer/2022062401/58f9a949760da3da068b6d1f/html5/thumbnails/6.jpg)
![Page 7: Low Latency Web Scale Fraud prevention with Apache Samza, Kafka and Friends](https://reader035.vdocuments.mx/reader035/viewer/2022062401/58f9a949760da3da068b6d1f/html5/thumbnails/7.jpg)
| CName | Date | $ | ShipAddr | … | CTxns | CustAvgAmt | TxAmt_AvgAmt_Ratio | Shipping Addr Txns | Shipping Addr Avg Amount |
|---|---|---|---|---|---|---|---|---|---|
| Edi Bice | 8/3/15 | 50 | 123 Main St | … | 1 | 50 = (50 + 0) / 1 | NA | 1 | 50 |
| Edi Bice | 8/3/15 | 100 | 123 Main St | … | 2 | 75 = (100 + 50*1) / 2 | 2.0 = 100 / 50 | 2 | 75 |
| Edi Bice | 8/4/15 | 150 | 123 Main St | … | 3 | 100 = (150 + 75*2) / 3 | 2.0 = 150 / 75 | 3 | 100 |
| Edi Bice | 8/5/15 | 1500 | 999 Wall St | … | 4 | 450 = (1500 + 100*3) / 4 | 15.0 = 1500 / 100 | 1 | 1500 |

Streaming Analytics

New Avg Amt = (New Txn Amt + Curr Avg Amt * Curr Num Txns) / (Curr Num Txns + 1)
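The incremental update rule above needs only the current average and count per key, not the transaction history. A minimal sketch that replays the four example transactions is shown here; the class name `RunningAvg` and the two dictionaries keyed by customer and by shipping address are illustrative assumptions, not part of the original system.

```python
from collections import defaultdict


class RunningAvg:
    """Keeps only a count and an average, updated one transaction at a time."""

    def __init__(self):
        self.count = 0
        self.avg = 0.0

    def update(self, amount):
        # Ratio of the new amount to the average *before* this transaction;
        # undefined (None) for the first transaction seen for this key.
        ratio = amount / self.avg if self.count else None
        # New Avg Amt = (New Txn Amt + Curr Avg Amt * Curr Num Txns) / (Curr Num Txns + 1)
        self.avg = (amount + self.avg * self.count) / (self.count + 1)
        self.count += 1
        return ratio


by_customer = defaultdict(RunningAvg)
by_address = defaultdict(RunningAvg)

txns = [
    ("Edi Bice", 50, "123 Main St"),
    ("Edi Bice", 100, "123 Main St"),
    ("Edi Bice", 150, "123 Main St"),
    ("Edi Bice", 1500, "999 Wall St"),
]

rows = []
for name, amount, addr in txns:
    ratio = by_customer[name].update(amount)
    by_address[addr].update(amount)
    rows.append((by_customer[name].count, by_customer[name].avg, ratio,
                 by_address[addr].count, by_address[addr].avg))
```

Each entry in `rows` corresponds to the CTxns, CustAvgAmt, TxAmt_AvgAmt_Ratio, Shipping Addr Txns and Shipping Addr Avg Amount columns of one table row.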
![Page 8: Low Latency Web Scale Fraud prevention with Apache Samza, Kafka and Friends](https://reader035.vdocuments.mx/reader035/viewer/2022062401/58f9a949760da3da068b6d1f/html5/thumbnails/8.jpg)
job pipelines
Kafka, Samza, and the Unix philosophy of distributed data by Martin Kleppmann
![Page 9: Low Latency Web Scale Fraud prevention with Apache Samza, Kafka and Friends](https://reader035.vdocuments.mx/reader035/viewer/2022062401/58f9a949760da3da068b6d1f/html5/thumbnails/9.jpg)
Apache Kafka
• Distributed, scalable, publish-subscribe messaging system
• Persistent, high-throughput messaging
• Designed for real-time activity stream data processing
![Page 10: Low Latency Web Scale Fraud prevention with Apache Samza, Kafka and Friends](https://reader035.vdocuments.mx/reader035/viewer/2022062401/58f9a949760da3da068b6d1f/html5/thumbnails/10.jpg)
PreCog Samza Job Pipeline
Manifold (1-in-N-out) jobs
Risk-by-Y calc jobs
X-by-Y calc jobs
Assembly jobs
![Page 11: Low Latency Web Scale Fraud prevention with Apache Samza, Kafka and Friends](https://reader035.vdocuments.mx/reader035/viewer/2022062401/58f9a949760da3da068b6d1f/html5/thumbnails/11.jpg)
![Page 12: Low Latency Web Scale Fraud prevention with Apache Samza, Kafka and Friends](https://reader035.vdocuments.mx/reader035/viewer/2022062401/58f9a949760da3da068b6d1f/html5/thumbnails/12.jpg)
FAULT-TOLERANT LOCAL STATE
[Diagram: Samza job partitions 0 and 1, each with a local RocksDB store, replicate writes to a durable changelog in Kafka]
• Embedded key-value: very fast
• Machine dies ⇒ local key-value store is lost
• Solution: replicate all writes to Kafka!
• Machine dies ⇒ restart on another machine
• Restore key-value store from changelog
• Changelog compaction in the background
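The recovery idea above can be illustrated with a small in-memory simulation. This is only a sketch under stated assumptions: a Python list stands in for the Kafka changelog topic, a dict stands in for the local RocksDB store, and the names `ChangelogStore` and `compact` are hypothetical. It does not use the Samza or Kafka APIs.

```python
class ChangelogStore:
    """Local key-value store that appends every write to a changelog."""

    def __init__(self, changelog):
        self.changelog = changelog  # list standing in for a Kafka changelog topic
        self.local = {}             # dict standing in for the local RocksDB store

    def put(self, key, value):
        self.local[key] = value
        self.changelog.append((key, value))  # replicate the write

    def get(self, key):
        return self.local.get(key)

    @classmethod
    def restore(cls, changelog):
        """Rebuild local state on a new machine by replaying the changelog."""
        store = cls(changelog)
        for key, value in changelog:
            store.local[key] = value
        return store


def compact(changelog):
    """Keep only the latest value per key, as background compaction would."""
    latest = {}
    for key, value in changelog:
        latest[key] = value
    return list(latest.items())


changelog = []
store = ChangelogStore(changelog)
store.put("card-1:count", 1)
store.put("card-1:count", 2)
store.put("card-2:count", 1)

# Simulate losing the machine: only the changelog survives.
recovered = ChangelogStore.restore(changelog)
compacted = compact(changelog)
recovered_from_compacted = ChangelogStore.restore(compacted)
```

Replaying the compacted changelog yields the same state as replaying the full one, which is why compaction can run in the background without affecting recovery.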
![Page 13: Low Latency Web Scale Fraud prevention with Apache Samza, Kafka and Friends](https://reader035.vdocuments.mx/reader035/viewer/2022062401/58f9a949760da3da068b6d1f/html5/thumbnails/13.jpg)
Samza Jobs on Hadoop 2.0 (YARN)
[Diagram: three machines (Machine 1, Machine 2, Machine 3), each running a Node Manager and a Kafka Broker. A Samza App Master and Samza TaskRunners for Partitions 1, 2 and 3, each invoking aStreamTask:process(), run across these machines.]
![Page 14: Low Latency Web Scale Fraud prevention with Apache Samza, Kafka and Friends](https://reader035.vdocuments.mx/reader035/viewer/2022062401/58f9a949760da3da068b6d1f/html5/thumbnails/14.jpg)
Monitoring Samza: Metrics and More
Samza JMX metrics → jmxtrans → OpenTSDB/HBase → Grafana
![Page 15: Low Latency Web Scale Fraud prevention with Apache Samza, Kafka and Friends](https://reader035.vdocuments.mx/reader035/viewer/2022062401/58f9a949760da3da068b6d1f/html5/thumbnails/15.jpg)
Questions?
http://www.ebayenterprise.com/
@edi_bice
https://www.linkedin.com/in/ebice