Python tutorial for implementing Oxylabs' Residential Proxies with AIOHTTP

Overview

Integrating Oxylabs' Residential Proxies with AIOHTTP

Requirements for the Integration

For the integration to work, you'll need the aiohttp library, Python 3.6 or higher, and Residential Proxies.
If you don't have the aiohttp library installed, you can add it with pip:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials alongside the proxy URL using aiohttp.BasicAuth:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{END_POINT}",
                proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())

asyncio.run(fetch())

The second is to pass the authentication credentials in the proxy URL itself:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())

asyncio.run(fetch())

To use your own proxies, replace the user and pass values with your Oxylabs account credentials.
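
For instance, with a hypothetical account whose username is jane_doe and password is s3cr3t (placeholder values, not real credentials), the constants would become:

USER = "jane_doe"  # Placeholder; use the username from your Oxylabs dashboard.
PASSWORD = "s3cr3t"  # Placeholder; use your real password.
END_POINT = "pr.oxylabs.io:7777"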

Testing Proxies

To check that the proxy is working, try visiting https://ip.oxylabs.io. If everything is set up correctly, the page will return the IP address of the proxy you're currently using instead of your own.
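
To run the same check from Python, a minimal sketch like the one below (reusing the placeholder credentials from the snippets above) fetches the IP-echo page once directly and once through the proxy; if the proxy is working, the two addresses should differ:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def check_proxy():
    async with aiohttp.ClientSession() as session:
        # First request goes out directly, second one through the proxy.
        async with session.get("http://ip.oxylabs.io") as resp:
            direct_ip = (await resp.text()).strip()
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            proxy_ip = (await resp.text()).strip()
    print(f"Direct IP: {direct_ip}")
    print(f"Proxy IP:  {proxy_ip}")

asyncio.run(check_proxy())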

Sample Project: Extracting Data From Multiple Pages

To better understand how residential proxies can be used for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Because the proxies rotate, we can send multiple requests at once with little risk of running into CAPTCHAs or IP blocks. This makes the web scraping process fast and efficient: you can extract data from thousands of products in a matter of seconds.

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
    sem = asyncio.Semaphore(4)  # Allow at most 4 concurrent requests.
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # On Windows with Python 3.8+, a different event loop policy must be set.
    # This helps to avoid the "Event loop is closed" error.
    if sys.platform.startswith("win") and sys.version_info >= (3, 8):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install a few additional packages. To do that, download the requirements.txt file and use pip:

pip install -r requirements.txt
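
If you don't have the requirements.txt file at hand, the sample script only uses a handful of third-party packages. A minimal equivalent, inferred from the script's imports rather than copied from the actual file (lxml is needed because BeautifulSoup is created with the "lxml" parser), would be:

aiohttp
beautifulsoup4
lxml
pandas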

If you're having any trouble integrating proxies with aiohttp and this guide didn't help, feel free to contact Oxylabs customer support at [email protected].
