
Python aiohttp: How to Use & Rotate Proxies

To use proxies with Python aiohttp, pass a proxy string to the proxy parameter of the get() or post() method of your request.

import aiohttp
import asyncio

async def main():
    async with aiohttp.ClientSession() as session:
        async with session.get('http://httpbin.org/get',
                               proxy='http://proxy.example.com:8080') as resp:
            print(resp.status)
            print(await resp.text())

asyncio.run(main())

In this guide for The Python Web Scraping Playbook, we will look at how to integrate the 3 most common types of proxies into our Python aiohttp-based web scraper.

Although the process is similar to other Python HTTP libraries, how you integrate proxies into Python aiohttp-based scrapers is slightly different. Proxies allow you to spread your requests over multiple IP addresses, making it harder for websites to detect and block your web scrapers.

In this guide we will walk you through the 3 most common proxy integration methods and show you how to use them with Python aiohttp:

- Rotating through a list of proxy IPs
- Using a proxy gateway
- Using a proxy API endpoint

If you would like to know how to integrate proxies into your Python Requests or Python Scrapy scrapers then check out our Python Requests proxies guide here and Scrapy proxies guide here.

Using Proxy IPs With Python aiohttp

Python aiohttp has more limited proxy support than Python HTTPX, for example, but it still has enough functionality for most use cases, and it is pretty straightforward to integrate proxies into your aiohttp requests.

To use a proxy you just need to pass it to the request method of your aiohttp.ClientSession() instance (for example, session.get()) using the proxy parameter:

import aiohttp
import asyncio

async def main():
    async with aiohttp.ClientSession() as session:
        async with session.get('http://httpbin.org/get',
                               proxy='http://proxy.example.com:8080') as resp:
            print(resp.status)
            print(await resp.text())

asyncio.run(main())

This method will work for all request methods Python aiohttp supports: GET, POST, PUT, DELETE, PATCH, HEAD.
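For example, a POST request goes through the proxy in exactly the same way. The following is a minimal sketch; the proxy address and form payload are placeholder values:

import aiohttp
import asyncio

async def main():
    async with aiohttp.ClientSession() as session:
        # The proxy parameter works the same for POST as it does for GET
        async with session.post('http://httpbin.org/post',
                                data={'key': 'value'},
                                proxy='http://proxy.example.com:8080') as resp:
            print(resp.status)

asyncio.run(main())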

Proxy Authentication With Python aiohttp

Some proxy IPs require authentication in the form of a username and password to use the proxy.

To authenticate the proxy you have two options:

Add Username & Password to Proxy String

The first option is to simply add the username and password to the proxy string.

import aiohttp
import asyncio

async def main():
    async with aiohttp.ClientSession() as session:
        async with session.get('http://httpbin.org/get',
                               proxy='http://USERNAME:PASSWORD@proxy.example.com:8080') as resp:
            print(resp.status)
            print(await resp.text())

asyncio.run(main())

Pass Username & Password to Auth Parameter

Python aiohttp allows you to authenticate your proxy by passing the username and password to the proxy_auth parameter as an aiohttp.BasicAuth object.

import aiohttp
import asyncio

async def main():
    async with aiohttp.ClientSession() as session:
        # aiohttp expects proxy credentials as an aiohttp.BasicAuth object
        async with session.get('http://httpbin.org/get',
                               proxy='http://proxy.example.com:8080',
                               proxy_auth=aiohttp.BasicAuth('USERNAME', 'PASSWORD')) as resp:
            print(resp.status)
            print(await resp.text())

asyncio.run(main())

The 3 Most Common Proxy Formats

That covers the basics of integrating a proxy into Python aiohttp. In the next sections we will show you how to integrate the 3 most common proxy formats into your Python aiohttp scrapers:

- Rotating through a proxy IP list
- Using a proxy gateway
- Using a proxy API endpoint

A couple of years ago, proxy providers would sell you a list of proxy IP addresses and you would configure your scraper to rotate through these IP addresses and use a new one with each request.

However, today more and more proxy providers don’t sell raw lists of proxy IP addresses anymore. Instead, they provide access to their proxy pools via proxy gateways or proxy API endpoints.

We will look at how to integrate with all 3 proxy formats.

If you are looking to find a good proxy provider then check out our web scraping proxy comparison tool where you can compare the plans of all the major proxy providers.

Proxy Integration #1: Rotating Through Proxy IP List

Here a proxy provider will normally provide you with a list of proxy IP addresses that you will need to configure your scraper to rotate through and select a new IP address for every request.

The proxy list you receive will look something like this:

'http://Username:Password@85.237.57.198:20000',
'http://Username:Password@85.237.57.198:21000',
'http://Username:Password@85.237.57.198:22000',
'http://Username:Password@85.237.57.198:23000',

To integrate them into our scrapers we need to configure our code to pick a new proxy from this list every time we make a request.

In our Python aiohttp scraper we could do it like this:

import aiohttp
import asyncio
import random

proxy_list = [
    'http://Username:Password@85.237.57.198:20000',
    'http://Username:Password@85.237.57.198:21000',
    'http://Username:Password@85.237.57.198:22000',
    'http://Username:Password@85.237.57.198:23000',
]

async def main():
    async with aiohttp.ClientSession() as session:
        # Pick a random proxy from the list for this request
        proxy = random.choice(proxy_list)
        async with session.get('http://httpbin.org/get', proxy=proxy) as resp:
            print(resp.status)
            print(await resp.text())

asyncio.run(main())

This is a simplistic example, as when scraping at scale we would also need to build a mechanism to monitor the performance of each individual IP address and remove it from the proxy rotation if it got banned or blocked.
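As a rough illustration of what such a mechanism could look like, here is a minimal sketch (not from the original article): a small helper class that picks a random live proxy for each request and drops a proxy from the rotation after repeated failures. The proxy URLs, failure threshold, and class name are all assumptions for demonstration purposes.

import asyncio
import random

import aiohttp

# Hypothetical proxy list: replace with the proxies from your provider
PROXY_LIST = [
    'http://Username:Password@85.237.57.198:20000',
    'http://Username:Password@85.237.57.198:21000',
]

MAX_FAILURES = 3  # assumption: drop a proxy after 3 consecutive failures

class ProxyRotator:
    """Tracks failures per proxy and removes proxies that keep failing."""

    def __init__(self, proxies):
        self.failures = {proxy: 0 for proxy in proxies}

    def get(self):
        # Only consider proxies that haven't hit the failure threshold
        live = [p for p, fails in self.failures.items() if fails < MAX_FAILURES]
        if not live:
            raise RuntimeError('No working proxies left in the pool')
        return random.choice(live)

    def mark_failure(self, proxy):
        self.failures[proxy] += 1

    def mark_success(self, proxy):
        self.failures[proxy] = 0

async def fetch(session, rotator, url):
    proxy = rotator.get()
    try:
        async with session.get(url, proxy=proxy) as resp:
            rotator.mark_success(proxy)
            return resp.status
    except aiohttp.ClientError:
        # Count connection/response errors against this proxy
        rotator.mark_failure(proxy)
        return None

async def main():
    rotator = ProxyRotator(PROXY_LIST)
    async with aiohttp.ClientSession() as session:
        # Fire several requests concurrently, each through its own proxy pick
        results = await asyncio.gather(
            *(fetch(session, rotator, 'http://httpbin.org/get') for _ in range(5))
        )
        print(results)

asyncio.run(main())

A production version would typically also retry failed requests with a different proxy and periodically re-test dropped IPs to see if they have recovered.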

Proxy Integration #2: Using Proxy Gateway

Increasingly, a lot of proxy providers aren’t selling lists of proxy IP addresses anymore. Instead, they give you access to their proxy pools via a proxy gateway.

Here, you only have to integrate a single proxy into your Python aiohttp scraper and the proxy provider will manage the proxy rotation, selection, cleaning, etc. on their end for you.

This is the most common way to use residential and mobile proxies, and it is becoming increasingly common when using datacenter proxies too.

Here is an example of how to integrate BrightData’s residential proxy gateway into our Python aiohttp scraper:

import aiohttp
import asyncio

async def main():
    async with aiohttp.ClientSession() as session:
        async with session.get('http://httpbin.org/get',
                               proxy='http://USERNAME:PASSWORD@zproxy.lum-superproxy.io:22225') as resp:
            print(resp.status)
            print(await resp.text())

asyncio.run(main())

As you can see, it is much easier to integrate than using a proxy list as you don’t have to worry about implementing all the proxy rotation logic.

Proxy Integration #3: Using Proxy API Endpoint

Recently, a lot of proxy providers have started offering smart proxy APIs that manage your proxy infrastructure for you, rotating proxies and headers so you can focus on extracting the data you need.

Here you typically send the URL you want to scrape to their API endpoint, and they return the HTML response to you.

Although every proxy API provider has a slightly different API integration, they are all very similar and are very easy to integrate with.

Here is an example of how to integrate with the ScrapeOps Proxy Manager:

import aiohttp
import asyncio
from urllib.parse import urlencode

params = {'api_key': 'APIKEY', 'url': 'https://httpbin.org/ip'}

async def main():
    async with aiohttp.ClientSession() as session:
        # Send the encoded target URL and API key to the ScrapeOps proxy endpoint
        async with session.get('https://proxy.scrapeops.io/v1/?' + urlencode(params)) as resp:
            print(resp.status)
            print(await resp.text())

asyncio.run(main())

Here you simply send the URL you want to scrape to the ScrapeOps API endpoint in the url query parameter, along with your API key in the api_key query parameter, and ScrapeOps will deal with finding the best proxy for that domain and return the HTML response to you.

When using proxy API endpoints it is very important to encode the URL you want to scrape before sending it to the Proxy API endpoint. If the URL contains query parameters, the Proxy API might otherwise think those query parameters are for the Proxy API and not the target website.

To encode your URL you just need to use the urlencode() function from urllib.parse, as we’ve done in the example above.
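To see what the encoding does, here is a quick illustration with a made-up target URL that has its own query parameters. Note how the ?, = and & characters of the target URL are escaped, so the proxy API can tell your parameters apart from the target site's:

from urllib.parse import urlencode

params = {'api_key': 'APIKEY', 'url': 'https://example.com/search?q=shoes&page=2'}
print(urlencode(params))
# api_key=APIKEY&url=https%3A%2F%2Fexample.com%2Fsearch%3Fq%3Dshoes%26page%3D2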

More Web Scraping Tutorials

So that’s how you can integrate proxies into your Python aiohttp scrapers.

If you would like to learn more about Web Scraping, then be sure to check out The Web Scraping Playbook.

Or check out one of our more in-depth guides.
