Getting started with Scrapy in Python
Web Scraping with Scrapy
Virendra Rajput
Hacker @Markitty
Agenda
● What is web scraping and why it's fun
● My experiments with web scraping
● Getting started with Scrapy
● How Scrapy works and a quick demo
● Why Scrapy
● Questions
What is Web Scraping?
● Extracting information from websites
● Problem:
  ○ Static websites
  ○ No access to APIs to extract the data you need
  ○ Need to extract data periodically
● Manual solution: go to the website and copy the required data
● Smarter solution: web scraping
My Experiments with Scraping
Web Scraping in Python
● Download the web page with urllib2 or requests
● Parse the page with BeautifulSoup or lxml
● Select elements with XPath or CSS selectors
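The three steps above can be sketched roughly as follows. This is a minimal illustration that swaps in the Python 3 standard library (`urllib.request` and `xml.etree.ElementTree`) for requests and BeautifulSoup/lxml so it runs without extra installs. The function names, the sample markup, and the `post` class are assumptions made up for the example. ElementTree only accepts well-formed markup and a small XPath subset, so real pages usually need lxml or BeautifulSoup.

```python
from urllib.request import urlopen
import xml.etree.ElementTree as ET


def download(url, timeout=10):
    """Step 1: fetch a page and return its body as text."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def extract_titles(markup):
    """Steps 2 and 3: parse the markup, then select nodes with XPath."""
    root = ET.fromstring(markup)  # assumes well-formed markup
    links = root.findall(".//li[@class='post']/a")
    return [a.text.strip() for a in links if a.text]


# Hypothetical page content, used here instead of a live download.
SAMPLE_PAGE = """<html><body><ul>
<li class="post"><a href="/a">First post</a></li>
<li class="post"><a href="/b">Second post</a></li>
<li class="ad"><a href="/x">Sponsored</a></li>
</ul></body></html>"""

titles = extract_titles(SAMPLE_PAGE)
print(titles)
```

With a live site, `extract_titles(download(url))` would chain the steps, provided the page is well-formed enough for ElementTree to parse.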
Scrapy: a fast, high-level screen scraping and web crawling framework
● Pick a website
● Define the data you want to scrape
● Write the spider to extract the data
● Run the spider
● Store the data
Demo
Why Scrapy
● Simple
● Fast
● Productive and extensible
● Portable
● Good documentation and a healthy community
● Commercial support
Advanced Features (built in)
● Interactive shell for trying XPaths (useful for debugging)
● Selecting and extracting data from HTML sources
● Cleaning and sanitizing the scraped data
● Generating feed exports (JSON, CSV)
● Media pipeline for downloading files
● Middlewares for cookies, HTTP compression, caching, user-agent spoofing, etc.
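Of the features listed above, cleaning scraped data is typically done in an item pipeline, which is a plain class with a `process_item(self, item, spider)` method. The sketch below is one possible pipeline. The class name, the cleaning rules, and the sample item are assumptions made up for illustration.

```python
class CleanTextPipeline:
    """Hypothetical item pipeline: tidy string fields before export."""

    def process_item(self, item, spider):
        cleaned = dict(item)
        for key, value in cleaned.items():
            if isinstance(value, str):
                # Collapse runs of whitespace, then strip curly quotes
                # from both ends of the string.
                cleaned[key] = " ".join(value.split()).strip("\u201c\u201d")
        return cleaned


# In a Scrapy project the pipeline would be enabled in settings.py;
# the module path below is an assumption:
# ITEM_PIPELINES = {"myproject.pipelines.CleanTextPipeline": 300}

raw = {"text": "  \u201cHello   world\u201d ", "author": "A.  Writer"}
item = CleanTextPipeline().process_item(raw, spider=None)
print(item)
```

The interactive shell from the first bullet can be opened with `scrapy shell "http://quotes.toscrape.com"` (the URL is just an example), which is a convenient place to try selectors before adding them to a spider.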
Questions?