Posts Tagged - python

Scrapy (Python web crawler)

Scrapy is a web-scrapper & crawler.

Concepts

spider: class that you define and scrapy uses to scrape information from a website (our a group of websites). They must define the initial requests to make, optionally how to follow links in the pages and how to parse the content to extract data

item pipeline: after an item has been crawled by a spider, it’s sent to the item pipeline which processes it through several components that are executed sequentially. You can use them, for example, to save items to a database

How to use

# create a new project
scrapy startproject your_project_name  

# after writing a spider, it starts the crawl
scrapy crawl quotes

Read More