Python tutorial for implementing Oxylabs' Residential Proxies with AIOHTTP

Overview

This tutorial shows how to integrate Oxylabs' Residential Proxies with AIOHTTP, an asynchronous HTTP client/server library for Python.

Requirements for the Integration

For the integration to work, you'll need the aiohttp library, Python 3.6 or higher, and Residential Proxies.
If you don't have the aiohttp library yet, you can install it with pip:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool
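
To confirm your environment meets both requirements, you can run a quick check. This is a minimal sketch that simply verifies the interpreter version and that aiohttp imports:

import sys

import aiohttp

# The tutorial requires Python 3.6 or higher and the aiohttp library.
assert sys.version_info >= (3, 6), "Python 3.6 or higher is required"
print("aiohttp", aiohttp.__version__, "is installed")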

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials separately from the proxy URL using aiohttp.BasicAuth:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        # Credentials are passed separately from the proxy URL.
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{END_POINT}",
                proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())

asyncio.run(fetch())

The second is to embed the authentication credentials in the proxy URL itself:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        # Credentials are embedded directly in the proxy URL.
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())

asyncio.run(fetch())

To use your own proxies, replace the user and pass placeholders with your Oxylabs account credentials.

Testing Proxies

To check that the proxy is working, try visiting https://ip.oxylabs.io. If everything is set up correctly, the page will return the IP address of the proxy you're currently using.
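
You can also run this check from Python. Here's a minimal sketch, using the same placeholder credentials as above, that prints the exit IP first without and then with the proxy; the two addresses should differ:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def check_ip():
    async with aiohttp.ClientSession() as session:
        # Without the proxy: returns your own IP address.
        async with session.get("http://ip.oxylabs.io") as resp:
            print("Your IP: ", (await resp.text()).strip())
        # Through the proxy: returns the IP address of the proxy.
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print("Proxy IP:", (await resp.text()).strip())

asyncio.run(check_ip())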

Sample Project: Extracting Data From Multiple Pages

To better understand how residential proxies can be used for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Because the proxy pool rotates IP addresses, we can send many requests at once with little risk of hitting CAPTCHAs or getting blocked. This makes the scraping process fast and efficient: you can extract data from thousands of products in a matter of seconds.

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            # Drop the leading "../.." from the relative link.
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            # The rating <p>'s second CSS class is the star count, e.g. "Three".
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
    # Allow at most four requests to be in flight at any one time.
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # On Windows with Python 3.8+, the default Proactor event loop can raise
    # an "Event loop is closed" error on exit, so use the selector policy instead.
    if sys.platform.startswith("win") and sys.version_info >= (3, 8):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install a few additional packages. To do that, download the requirements.txt file and run pip:

pip install -r requirements.txt
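
If you don't have the requirements.txt file at hand, the script's imports suggest an equivalent file would contain roughly the following (package names inferred from the imports, versions left unpinned; lxml is included because BeautifulSoup is invoked with the "lxml" parser):

aiohttp
beautifulsoup4
lxml
pandas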

If you're having trouble integrating proxies with aiohttp and this guide didn't help, feel free to contact Oxylabs customer support at [email protected].
