Requests and Responses

Scrapy uses Request and Response objects for crawling web sites.

Typically, Request objects are generated in the spiders and pass across the system until they reach the Downloader, which executes the request and returns a Response object which travels back to the spider that issued the request.

Both Request and Response classes have subclasses which add functionality not required in the base classes. These are described below in Request subclasses and Response subclasses.

Request objects

class scrapy.http.Request(*args, **kwargs)

A Request object represents an HTTP request, which is usually generated in the Spider and executed by the Downloader, thus generating a Response.

Parameters

url (str) – the URL of this request. If the URL is invalid, a ValueError exception is raised.

callback (collections.abc.Callable) – the function that will be called with the response of this request (once it’s downloaded) as its first parameter. For more information see Passing additional data to callback functions below. If a Request doesn’t specify a callback, the spider’s parse() method will be used. Note that if exceptions are raised during processing, errback is called instead.
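For example, a minimal sketch of wiring a request to a specific callback (the spider name, URLs, and parse_page method below are illustrative, not part of the API):

import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"
    start_urls = ["http://www.example.com"]

    def parse(self, response):
        # Schedule a follow-up request whose response will be handled by parse_page
        yield scrapy.Request(
            url="http://www.example.com/some-page",
            callback=self.parse_page,
        )

    def parse_page(self, response):
        # Called with the downloaded Response as its first argument
        self.logger.info("Visited %s", response.url)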

method (str) – the HTTP method of this request. Defaults to 'GET'.

meta (dict) – the initial values for the Request.meta attribute. If given, the dict passed in this parameter will be shallow copied.

body (bytes or str) – the request body. If a string is passed, then it’s encoded as bytes using the encoding passed (which defaults to utf-8). If body is not given, an empty bytes object is stored. Regardless of the type of this argument, the final value stored will be a bytes object (never a string or None).
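As an illustration, a hedged sketch of a POST request whose body is supplied as a string and stored as bytes (the endpoint and payload are made up; for JSON APIs the JsonRequest subclass described under Request subclasses is usually more convenient):

import json

from scrapy import Request

payload = {'currency': 'USD', 'country': 'UY'}  # illustrative data
post_request = Request(
    url="http://www.example.com/api/submit",
    method="POST",
    body=json.dumps(payload),  # str body, converted to bytes using the request encoding
    headers={'Content-Type': 'application/json'},
)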

headers (dict) – the headers of this request. The dict values can be strings (for single-valued headers) or lists (for multi-valued headers). If None is passed as value, the HTTP header will not be sent at all.

Caution

Cookies set via the Cookie header are not considered by the CookiesMiddleware. If you need to set cookies for a request, use the Request.cookies parameter. This is a known current limitation that is being worked on.
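A short sketch of the header forms mentioned above, with single-valued, multi-valued, and suppressed headers (the header values are illustrative):

from scrapy import Request

request_with_headers = Request(
    url="http://www.example.com",
    headers={
        'User-Agent': 'example-bot/1.0',      # single-valued header (string)
        'Accept-Language': ['en-US', 'en'],   # multi-valued header (list)
        'Referer': None,                      # None: this header will not be sent at all
    },
)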

cookies (dict or list) – the request cookies. These can be sent in two forms.

Using a dict:

request_with_cookies = Request(
    url="http://www.example.com",
    cookies={'currency': 'USD', 'country': 'UY'},
)

Using a list of dicts:

request_with_cookies = Request(
    url="http://www.example.com",
    cookies=[
        {'name': 'currency',
         'value': 'USD',
         'domain': 'example.com',
         'path': '/currency'},
    ],
)

The latter form allows for customizing the domain and path attributes of the cookie. This is only useful if the cookies are saved for later requests.

When some site returns cookies (in a response), those are stored in the cookies for that domain and will be sent again in future requests. That's the typical behaviour of any regular web browser.

To create a request that does not send stored cookies and does not store received cookies, set the dont_merge_cookies key to True in request.meta.

Example of a request that sends manually-defined cookies and ignores cookie storage:

Request(
    url="http://www.example.com",
    cookies={'currency': 'USD', 'country': 'UY'},
    meta={'dont_merge_cookies': True},
)

For more info see CookiesMiddleware.

Caution

Cookies set via the Cookie header are not considered by the CookiesMiddleware. If you need to set cookies for a request, use the Request.cookies parameter. This is a known current limitation that is being worked on.

encoding (str) – the encoding of this request (defaults to 'utf-8'). This encoding will be used to percent-encode the URL and to convert the body to bytes (if given as a string).

priority (int) – the priority of this request (defaults to 0). The priority is used by the scheduler to define the order used to process requests. Requests with a higher priority value will execute earlier. Negative values are allowed in order to indicate relatively low-priority.

dont_filter (bool) – indicates that this request should not be filtered by the scheduler. This is used when you want to perform an identical request multiple times, to ignore the duplicates filter. Use it with care, or you will get into crawling loops. Defaults to False.

errback (collections.abc.Callable) – a function that will be called if any exception was raised while processing the request. This includes pages that failed with 404 HTTP errors and such. It receives a Failure as first parameter. For more information, see Using errbacks to catch exceptions in request processing below.

Changed in version 2.0: The callback parameter is no longer required when the errback parameter is specified.
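By way of example, a hedged sketch of attaching an errback and inspecting the received Failure (the spider and handler names are illustrative; the Using errbacks to catch exceptions in request processing section gives a fuller version):

import scrapy
from scrapy.spidermiddlewares.httperror import HttpError
from twisted.internet.error import DNSLookupError, TimeoutError

class ErrbackSpider(scrapy.Spider):
    name = "errback_example"

    def start_requests(self):
        yield scrapy.Request(
            url="http://www.example.com/broken-link",
            callback=self.parse_ok,
            errback=self.handle_error,
        )

    def parse_ok(self, response):
        self.logger.info("Got a successful response from %s", response.url)

    def handle_error(self, failure):
        # failure is a twisted.python.failure.Failure wrapping the original exception
        if failure.check(HttpError):
            # non-2xx responses surface here through HttpError
            self.logger.error("HttpError on %s", failure.value.response.url)
        elif failure.check(DNSLookupError):
            self.logger.error("DNSLookupError on %s", failure.request.url)
        elif failure.check(TimeoutError):
            self.logger.error("TimeoutError on %s", failure.request.url)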

flags (list) – Flags sent to the request, can be used for logging or similar purposes.

cb_kwargs (dict) – A dict with arbitrary data that will be passed as keyword arguments to the Request’s callback.
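To illustrate, a brief sketch of forwarding extra keyword arguments to a callback with cb_kwargs (the item_category key and the spider details are made-up examples):

import scrapy

class CbKwargsSpider(scrapy.Spider):
    name = "cb_kwargs_example"
    start_urls = ["http://www.example.com"]

    def parse(self, response):
        yield scrapy.Request(
            url="http://www.example.com/items",
            callback=self.parse_items,
            cb_kwargs={'item_category': 'books'},
        )

    def parse_items(self, response, item_category):
        # item_category is delivered as a keyword argument alongside the response
        self.logger.info("Parsing %s items from %s", item_category, response.url)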
