Python tutorial for implementing Oxylabs' Residential Proxies with AIOHTTP

Overview

Integrating Oxylabs' Residential Proxies with AIOHTTP

Requirements for the Integration

For the integration to work, you'll need the aiohttp library, Python 3.6 or higher, and Residential Proxies.
If you don't have the aiohttp library installed, you can install it with pip:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials separately from the proxy URL using aiohttp.BasicAuth:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{END_POINT}",
                proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())


asyncio.run(fetch())

The second is to pass the authentication credentials directly in the proxy URL:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())


asyncio.run(fetch())

To use your own proxies, replace the USER and PASSWORD values with your Oxylabs account credentials.
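
If you'd rather not hard-code the credentials, one option (not part of the original snippets; the variable names below are only examples) is to read them from environment variables:

import os

# Hypothetical environment variable names; fall back to placeholders if unset.
USER = os.environ.get("OXYLABS_USER", "user")
PASSWORD = os.environ.get("OXYLABS_PASSWORD", "pass")
END_POINT = os.environ.get("OXYLABS_ENDPOINT", "pr.oxylabs.io:7777")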

Testing Proxies

To see if the proxy is working, try visiting https://ip.oxylabs.io. If everything is working correctly, it will return the IP address of the proxy you're currently using.
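
If you'd rather check this from code, here is a minimal sketch in the spirit of the snippets above (it assumes the same USER, PASSWORD, and END_POINT placeholders) that requests the page with and without the proxy and compares the two addresses:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def check_proxy():
    async with aiohttp.ClientSession() as session:
        # Request without the proxy: returns your real IP address.
        async with session.get("http://ip.oxylabs.io") as resp:
            real_ip = (await resp.text()).strip()

        # Request through the proxy: should return a different IP address.
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            proxy_ip = (await resp.text()).strip()

    print(f"Your IP:  {real_ip}")
    print(f"Proxy IP: {proxy_ip}")
    if real_ip != proxy_ip:
        print("The proxy is working.")


asyncio.run(check_proxy())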

Sample Project: Extracting Data From Multiple Pages

To better understand how Residential Proxies can be used for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Proxy rotation lets us send multiple requests at once without worrying about CAPTCHAs or getting blocked, which makes web scraping fast and efficient: you can extract data from thousands of products in a matter of seconds.

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            # Strip the leading "../.." so the path can be joined with the base URL later.
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            # The second class of the rating element holds the star count, e.g. "Three".
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
    sem = asyncio.Semaphore(4)  # Limit the number of concurrent requests to 4.
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # On Windows with Python 3.8+, a different event loop policy must be set.
    # This helps to avoid the "Event loop is closed" error.
    if sys.platform.startswith("win") and sys.version_info >= (3, 8):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install a few additional packages. To do that, download the requirements.txt file and install it with pip:

pip install -r requirements.txt
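
The exact contents of requirements.txt aren't reproduced here, but based on the imports in the script above it would need to cover at least the following packages (listed unpinned for illustration):

aiohttp
beautifulsoup4
lxml
pandas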

If you're having any trouble integrating proxies with aiohttp and this guide didn't help, feel free to contact Oxylabs customer support.
