![Page 1: Separating Bots from Humans - paper.seebug.org Conf... · Separating Bots from Humans Ryan Mitchell @kludgist DEF CON 23 August 8th, 2015. Who am I? Software Engineer Author of two](https://reader034.vdocuments.mx/reader034/viewer/2022042214/5eb91c58e1883936eb751cd7/html5/thumbnails/1.jpg)
Separating Bots from Humans
Ryan Mitchell
@kludgist
DEF CON 23 August 8th, 2015
![Page 2: Separating Bots from Humans - paper.seebug.org Conf... · Separating Bots from Humans Ryan Mitchell @kludgist DEF CON 23 August 8th, 2015. Who am I? Software Engineer Author of two](https://reader034.vdocuments.mx/reader034/viewer/2022042214/5eb91c58e1883936eb751cd7/html5/thumbnails/2.jpg)
Who am I?
● Software Engineer
● Author of two books:
  ○ Web Scraping with Python (O’Reilly, 2015)
  ○ Instant Web Scraping with Java (Packt, 2013)
● Engineering grad from Olin College
● Master’s student at Harvard University School of Extension Studies, 2016
![Page 3: Separating Bots from Humans - paper.seebug.org Conf... · Separating Bots from Humans Ryan Mitchell @kludgist DEF CON 23 August 8th, 2015. Who am I? Software Engineer Author of two](https://reader034.vdocuments.mx/reader034/viewer/2022042214/5eb91c58e1883936eb751cd7/html5/thumbnails/3.jpg)
A history of this talk
![Page 4: Separating Bots from Humans - paper.seebug.org Conf... · Separating Bots from Humans Ryan Mitchell @kludgist DEF CON 23 August 8th, 2015. Who am I? Software Engineer Author of two](https://reader034.vdocuments.mx/reader034/viewer/2022042214/5eb91c58e1883936eb751cd7/html5/thumbnails/4.jpg)
The O’Reilly Hacking Book:
![Page 5: Separating Bots from Humans - paper.seebug.org Conf... · Separating Bots from Humans Ryan Mitchell @kludgist DEF CON 23 August 8th, 2015. Who am I? Software Engineer Author of two](https://reader034.vdocuments.mx/reader034/viewer/2022042214/5eb91c58e1883936eb751cd7/html5/thumbnails/5.jpg)
Separating Bots from Humans
![Page 6: Separating Bots from Humans - paper.seebug.org Conf... · Separating Bots from Humans Ryan Mitchell @kludgist DEF CON 23 August 8th, 2015. Who am I? Software Engineer Author of two](https://reader034.vdocuments.mx/reader034/viewer/2022042214/5eb91c58e1883936eb751cd7/html5/thumbnails/6.jpg)
Pro-tips to get what you want:
● Include some market research
● Write it in Python, because it’s really popular
![Page 7: Separating Bots from Humans - paper.seebug.org Conf... · Separating Bots from Humans Ryan Mitchell @kludgist DEF CON 23 August 8th, 2015. Who am I? Software Engineer Author of two](https://reader034.vdocuments.mx/reader034/viewer/2022042214/5eb91c58e1883936eb751cd7/html5/thumbnails/7.jpg)
What are Web Scrapers, Bots, etc?
● They can use browsers
● They can take their sweet time
● They can be surprisingly smart
● They can be stunningly idiotic
![Page 8: Separating Bots from Humans - paper.seebug.org Conf... · Separating Bots from Humans Ryan Mitchell @kludgist DEF CON 23 August 8th, 2015. Who am I? Software Engineer Author of two](https://reader034.vdocuments.mx/reader034/viewer/2022042214/5eb91c58e1883936eb751cd7/html5/thumbnails/8.jpg)
Why They’re Important
source: https://www.incapsula.com/blog/bot-traffic-report-2014.html
![Page 9: Separating Bots from Humans - paper.seebug.org Conf... · Separating Bots from Humans Ryan Mitchell @kludgist DEF CON 23 August 8th, 2015. Who am I? Software Engineer Author of two](https://reader034.vdocuments.mx/reader034/viewer/2022042214/5eb91c58e1883936eb751cd7/html5/thumbnails/9.jpg)
On the Defense Side of Things
(For better or worse)
![Page 10: Separating Bots from Humans - paper.seebug.org Conf... · Separating Bots from Humans Ryan Mitchell @kludgist DEF CON 23 August 8th, 2015. Who am I? Software Engineer Author of two](https://reader034.vdocuments.mx/reader034/viewer/2022042214/5eb91c58e1883936eb751cd7/html5/thumbnails/10.jpg)
robots.txt?
● “No Trespassing, please?”
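As a rough illustration of why robots.txt is only a polite request: a well-behaved crawler has to choose to read the file and obey it. The sketch below uses Python's standard `urllib.robotparser`; the rules, the `examplebot` user agent, and the `example.com` URLs are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print(is_allowed(ROBOTS_TXT, "examplebot", "http://example.com/public/page.html"))
print(is_allowed(ROBOTS_TXT, "examplebot", "http://example.com/private/data.html"))
```

Nothing enforces the answer: a scraper that never calls a check like this simply fetches the page anyway.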
![Page 11: Separating Bots from Humans - paper.seebug.org Conf... · Separating Bots from Humans Ryan Mitchell @kludgist DEF CON 23 August 8th, 2015. Who am I? Software Engineer Author of two](https://reader034.vdocuments.mx/reader034/viewer/2022042214/5eb91c58e1883936eb751cd7/html5/thumbnails/11.jpg)
Terms of Service
● “Hey! You said you wouldn’t trespass!”
![Page 12: Separating Bots from Humans - paper.seebug.org Conf... · Separating Bots from Humans Ryan Mitchell @kludgist DEF CON 23 August 8th, 2015. Who am I? Software Engineer Author of two](https://reader034.vdocuments.mx/reader034/viewer/2022042214/5eb91c58e1883936eb751cd7/html5/thumbnails/12.jpg)
Headers
● “I’m totally not a bot. Promise”
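To show how little a header check proves, here is a minimal sketch of a scraper dressing its request up as a browser using the standard `urllib.request.Request`. The header values and the `browser_like_request` helper are illustrative assumptions, not taken from the talk, and the request is only built, never sent.

```python
from urllib.request import Request

# Illustrative browser-style headers; any values could be substituted.
BROWSER_HEADERS = {
    "User-Agent": ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
                   "(KHTML, like Gecko) Chrome/44.0.2403.125 Safari/537.36"),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
}

def browser_like_request(url):
    """Build (but do not send) a request that claims to come from a browser."""
    return Request(url, headers=BROWSER_HEADERS)

request = browser_like_request("http://example.com/")
# urllib stores header names in capitalized form, e.g. "User-agent".
print(request.get_header("User-agent"))
```

The server only ever sees what the client chooses to send, so a header-based filter stops the bots that did not bother to set them.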
![Page 13: Separating Bots from Humans - paper.seebug.org Conf... · Separating Bots from Humans Ryan Mitchell @kludgist DEF CON 23 August 8th, 2015. Who am I? Software Engineer Author of two](https://reader034.vdocuments.mx/reader034/viewer/2022042214/5eb91c58e1883936eb751cd7/html5/thumbnails/13.jpg)
JavaScript
● Make your site un-indexable for anyone but the bad guys
![Page 14: Separating Bots from Humans - paper.seebug.org Conf... · Separating Bots from Humans Ryan Mitchell @kludgist DEF CON 23 August 8th, 2015. Who am I? Software Engineer Author of two](https://reader034.vdocuments.mx/reader034/viewer/2022042214/5eb91c58e1883936eb751cd7/html5/thumbnails/14.jpg)
Embedding Text in Images
● Oh come on.
● You’re the type of person who writes email addresses like “m e (at sign) domain . com”
  ○ And you have duct tape on your laptop’s web cam, mostly because you never use it.
![Page 15: Separating Bots from Humans - paper.seebug.org Conf... · Separating Bots from Humans Ryan Mitchell @kludgist DEF CON 23 August 8th, 2015. Who am I? Software Engineer Author of two](https://reader034.vdocuments.mx/reader034/viewer/2022042214/5eb91c58e1883936eb751cd7/html5/thumbnails/15.jpg)
CAPTCHAs
Annoying
Breakable
![Page 16: Separating Bots from Humans - paper.seebug.org Conf... · Separating Bots from Humans Ryan Mitchell @kludgist DEF CON 23 August 8th, 2015. Who am I? Software Engineer Author of two](https://reader034.vdocuments.mx/reader034/viewer/2022042214/5eb91c58e1883936eb751cd7/html5/thumbnails/16.jpg)
Honeypots
● Can be effective, if implemented correctly
● Please don’t block the Google bots
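One common way to implement a honeypot, sketched here under assumptions: the form includes a field that is hidden from human visitors, so any submission that fills it in probably came from a script. The field name `website` and the helper `is_probable_bot` are hypothetical.

```python
def is_probable_bot(form_data, honeypot_fields=("website",)):
    """Return True if any honeypot field arrived with a non-empty value.

    form_data is a dict of submitted field names to string values.
    Humans never see the honeypot fields, so they should arrive empty or absent.
    """
    return any(form_data.get(name, "").strip() for name in honeypot_fields)

print(is_probable_bot({"email": "a@example.com", "website": "http://spam.example"}))
print(is_probable_bot({"email": "a@example.com", "website": ""}))
```

A check like this only flags the submission; what to do with a flagged request (drop it, log it, challenge it) is a separate decision.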
![Page 17: Separating Bots from Humans - paper.seebug.org Conf... · Separating Bots from Humans Ryan Mitchell @kludgist DEF CON 23 August 8th, 2015. Who am I? Software Engineer Author of two](https://reader034.vdocuments.mx/reader034/viewer/2022042214/5eb91c58e1883936eb751cd7/html5/thumbnails/17.jpg)
Example time!
http://ryanemitchell.com/honeypots.html
![Page 18: Separating Bots from Humans - paper.seebug.org Conf... · Separating Bots from Humans Ryan Mitchell @kludgist DEF CON 23 August 8th, 2015. Who am I? Software Engineer Author of two](https://reader034.vdocuments.mx/reader034/viewer/2022042214/5eb91c58e1883936eb751cd7/html5/thumbnails/18.jpg)
Behavioral Patterns
● Now we’re getting somewhere!
● Again, please don’t block the Google bots
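One simple behavioral signal is request rate: few people load many pages per second. The sketch below keeps a sliding window of timestamps per client; the `RateTracker` class, its thresholds, and the client identifiers are assumptions for illustration, and a real detector would combine several signals.

```python
from collections import defaultdict, deque

class RateTracker:
    """Flags clients that exceed max_requests within window_seconds."""

    def __init__(self, max_requests=10, window_seconds=1.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> recent request timestamps

    def record(self, client_id, timestamp):
        """Record one request; return True if the client is over the limit."""
        hits = self._hits[client_id]
        hits.append(timestamp)
        # Drop requests that have fallen out of the sliding window.
        while hits and timestamp - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_requests

tracker = RateTracker(max_requests=3, window_seconds=1.0)
# Five requests in under half a second: the fourth and fifth are flagged.
print([tracker.record("fast-client", t * 0.1) for t in range(5)])
```

A bot that takes its sweet time stays under any threshold like this one, which is why rate alone is a weak signal.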
![Page 19: Separating Bots from Humans - paper.seebug.org Conf... · Separating Bots from Humans Ryan Mitchell @kludgist DEF CON 23 August 8th, 2015. Who am I? Software Engineer Author of two](https://reader034.vdocuments.mx/reader034/viewer/2022042214/5eb91c58e1883936eb751cd7/html5/thumbnails/19.jpg)
IP Address Blocking
● It’s sort of effective… If they didn’t really care in the first place
● Lists are a pain to maintain
● You can easily block the good guys
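A blocklist check itself is easy to write; the maintenance is the hard part. This sketch uses the standard `ipaddress` module with reserved documentation ranges standing in for real addresses. The list contents and the `is_blocked` helper are hypothetical.

```python
import ipaddress

# Hypothetical blocklist built from reserved documentation ranges.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),      # an entire range
    ipaddress.ip_network("198.51.100.17/32"),  # a single address
]

def is_blocked(ip_string, blocked=BLOCKED_NETWORKS):
    """Return True if the address falls inside any blocked network."""
    address = ipaddress.ip_address(ip_string)
    return any(address in network for network in blocked)

print(is_blocked("192.0.2.55"))
print(is_blocked("203.0.113.9"))
```

Blocking a whole range, as in the first entry, is exactly how legitimate visitors who share that range get caught.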
![Page 20: Separating Bots from Humans - paper.seebug.org Conf... · Separating Bots from Humans Ryan Mitchell @kludgist DEF CON 23 August 8th, 2015. Who am I? Software Engineer Author of two](https://reader034.vdocuments.mx/reader034/viewer/2022042214/5eb91c58e1883936eb751cd7/html5/thumbnails/20.jpg)
On the Attack Side of Things...
![Page 21: Separating Bots from Humans - paper.seebug.org Conf... · Separating Bots from Humans Ryan Mitchell @kludgist DEF CON 23 August 8th, 2015. Who am I? Software Engineer Author of two](https://reader034.vdocuments.mx/reader034/viewer/2022042214/5eb91c58e1883936eb751cd7/html5/thumbnails/21.jpg)
Targeted vs. Non-Targeted Attacks
● Non-targeted: also known as “look for /phpMyAdmin”
● Targeted: usually to get proprietary data
![Page 22: Separating Bots from Humans - paper.seebug.org Conf... · Separating Bots from Humans Ryan Mitchell @kludgist DEF CON 23 August 8th, 2015. Who am I? Software Engineer Author of two](https://reader034.vdocuments.mx/reader034/viewer/2022042214/5eb91c58e1883936eb751cd7/html5/thumbnails/22.jpg)
OCR
● Works best on relatively normal text
● Can be used to solve CAPTCHAs
  ○ Time consuming to create training data. Have a series or two of a TV show ready
![Page 23: Separating Bots from Humans - paper.seebug.org Conf... · Separating Bots from Humans Ryan Mitchell @kludgist DEF CON 23 August 8th, 2015. Who am I? Software Engineer Author of two](https://reader034.vdocuments.mx/reader034/viewer/2022042214/5eb91c58e1883936eb751cd7/html5/thumbnails/23.jpg)
OCR Training Tool
● Everything you need to solve a CAPTCHA!
https://github.com/REMitchell/tesseract-trainer
![Page 24: Separating Bots from Humans - paper.seebug.org Conf... · Separating Bots from Humans Ryan Mitchell @kludgist DEF CON 23 August 8th, 2015. Who am I? Software Engineer Author of two](https://reader034.vdocuments.mx/reader034/viewer/2022042214/5eb91c58e1883936eb751cd7/html5/thumbnails/24.jpg)
JavaScript Execution
● Selenium
● PhantomJS
![Page 25: Separating Bots from Humans - paper.seebug.org Conf... · Separating Bots from Humans Ryan Mitchell @kludgist DEF CON 23 August 8th, 2015. Who am I? Software Engineer Author of two](https://reader034.vdocuments.mx/reader034/viewer/2022042214/5eb91c58e1883936eb751cd7/html5/thumbnails/25.jpg)
Honeypot Avoidance
● Better than you might expect -- its biggest weakness is color
https://github.com/REMitchell/python-scraping/blob/master/chapter12/3-honeypotDetection.py
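The linked script is the reference here; what follows is not that script but a simplified, standard-library-only sketch of the same idea. It assumes honeypots are hidden with inline `display:none` or `visibility:hidden` styles and ignores external stylesheets and JavaScript, which a real browser-driven check would handle. The `HoneypotFinder` class and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser

class HoneypotFinder(HTMLParser):
    """Collects links and form inputs hidden via inline styles."""

    def __init__(self):
        super().__init__()
        self.suspicious = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "input"):
            return
        attributes = dict(attrs)
        # Normalize the inline style so "display: none" and "display:none" match.
        style = (attributes.get("style") or "").replace(" ", "").lower()
        if "display:none" in style or "visibility:hidden" in style:
            label = attributes.get("href") or attributes.get("name") or tag
            self.suspicious.append(label)

def find_honeypots(html):
    """Return the href or name of every element that looks like a honeypot."""
    finder = HoneypotFinder()
    finder.feed(html)
    return finder.suspicious

SAMPLE = (
    '<a href="/real">Real link</a>'
    '<a href="/trap" style="display: none">Trap</a>'
    '<input name="email">'
    '<input name="phone" style="visibility:hidden">'
    '<a href="/white" style="color:#fff">White on white</a>'
)
print(find_honeypots(SAMPLE))
```

The last link in the sample is not flagged: text colored to match the background is not hidden in any way the markup reveals, which matches the color weakness noted above.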
![Page 26: Separating Bots from Humans - paper.seebug.org Conf... · Separating Bots from Humans Ryan Mitchell @kludgist DEF CON 23 August 8th, 2015. Who am I? Software Engineer Author of two](https://reader034.vdocuments.mx/reader034/viewer/2022042214/5eb91c58e1883936eb751cd7/html5/thumbnails/26.jpg)
Stop Caring!
● Bot-proofing sites is way too much work, and often impedes accessibility
● Is your data really that valuable?
  ○ Consider API costs, ease of use -- make it more attractive to pay for data
● If your application is vulnerable to automated attacks, it’s vulnerable, period.
![Page 27: Separating Bots from Humans - paper.seebug.org Conf... · Separating Bots from Humans Ryan Mitchell @kludgist DEF CON 23 August 8th, 2015. Who am I? Software Engineer Author of two](https://reader034.vdocuments.mx/reader034/viewer/2022042214/5eb91c58e1883936eb751cd7/html5/thumbnails/27.jpg)
Question time!