Python tutorial for integrating Oxylabs' Residential Proxies with AIOHTTP

Overview

This tutorial shows how to integrate Oxylabs' Residential Proxies with AIOHTTP, Python's asynchronous HTTP client/server library.

Requirements for the Integration

For the integration to work, you'll need the aiohttp library, Python 3.6 or higher, and Residential Proxies.
If you don't have aiohttp installed, you can install it using pip:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials separately from the proxy URL using aiohttp.BasicAuth:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{END_POINT}",
            proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())


asyncio.run(fetch())

The second is to pass the authentication credentials directly in the proxy URL:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())


asyncio.run(fetch())

To use your own proxies, replace the user and pass values with your Oxylabs account credentials.

Testing Proxies

To check whether the proxy is working, make a request to https://ip.oxylabs.io. If everything is set up correctly, it will return the IP address of the proxy you're currently using.
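For a quick check outside of Python, you can send a request through the proxy with curl. The command below is a minimal sketch that assumes the same placeholder credentials and endpoint used in the examples above:

curl -x "http://user:pass@pr.oxylabs.io:7777" https://ip.oxylabs.io

If the proxy is configured correctly, the printed IP will differ from the one returned when you call the same URL without the -x flag.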

Sample Project: Extracting Data From Multiple Pages

To better understand how Residential Proxies can be used for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Proxy rotation lets us send many requests at once with little risk of CAPTCHAs or blocks, which makes the web scraping process fast and efficient: you can extract data from thousands of products in a matter of seconds.

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
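    # Wait for a free semaphore slot, then fetch the page through the residential proxy.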
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
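    # Allow at most four requests to be in flight at the same time.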
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # On Windows (Python 3.8+), switch the event loop policy to avoid
    # the "Event loop is closed" error when the program exits.
    if sys.platform.startswith("win") and sys.version_info >= (3, 8):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install a few additional packages. To do that, download the requirements.txt file and install it with pip:

pip install -r requirements.txt
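If you don't have the project's requirements.txt at hand, a minimal equivalent based on the imports used in the script above would look like this (this is an assumption; the file shipped with the project may pin versions or list extra packages):

aiohttp
beautifulsoup4
lxml
pandas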

If you're having trouble integrating proxies with aiohttp and this guide didn't help, feel free to contact Oxylabs customer support at [email protected].
