How to Use a Proxy with Python Requests?

In this article, you will learn how to use the Python Requests library behind a proxy server. Developers use proxies for anonymity and security, and sometimes even use more than one to prevent websites from banning their IP addresses. Proxies also carry several other benefits, such as bypassing filters and censorship. Feel free to learn more about rotating proxies before continuing, but let’s get started!

Prerequisites & Installation

This article is intended for those who would like to scrape behind a proxy in Python. To get the most out of the material, it is beneficial to:

✅ Have experience with Python 3 🐍.

✅ Have Python 3 installed on your local machine.

Check if the python-requests package is installed by opening the terminal and typing:
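pip freeze 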

pip freeze will display all your current Python packages and their versions, so go ahead and check if requests is present. If not, install it by running:
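pip install requests 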

How to Use a Proxy with Python Requests

  1. To use a proxy in Python, first import the requests package.
  2. Next, create a proxies dictionary that defines the HTTP and HTTPS connections. This variable should be a dictionary that maps each protocol to the proxy URL. Additionally, make a url variable set to the webpage you’re scraping from.

Notice that in the example below, the dictionary defines the proxy URL for two separate protocols: HTTP and HTTPS. Each connection maps to an individual URL and port, but this does not mean that the two cannot be the same.

  3. Lastly, create a response variable that uses any of the requests methods. The method will take in two arguments: the url variable you created and the proxies dictionary you defined.

You may use the same syntax for different API calls, but regardless of the call you’re making, you need to specify the protocol.

import requests

proxies = {
    'http': 'http://proxy.example.com:8080',
    'https': 'http://secureproxy.example.com:8090',
}

url = 'http://mywebsite.com/example'

response = requests.post(url, proxies=proxies)

Requests Methods ✍️

response = requests.get(url)
response = requests.post(url, data={"a": 1, "b": 2})
response = requests.put(url, data=put_body)
response = requests.delete(url)
response = requests.patch(url, data=patch_update)
response = requests.head(url)
response = requests.options(url)

Proxy Authentication 👩‍💻

If your proxy requires authentication, you can rewrite your code to supply the credentials inside the proxy URLs themselves, using the user:pass@host syntax:

proxies = {
    'http': 'http://user:pass@proxy.example.com:8080',
    'https': 'http://user:pass@secureproxy.example.com:8090',
}

response = requests.get(url, proxies=proxies)

(Note that passing auth=('user', 'pass') to requests.get authenticates you with the target website, not with the proxy.)

Proxy Sessions 🕒

You may also find yourself wanting to scrape from websites that utilize sessions; in this case, you would have to create a session object. You can do this by first creating a session variable and setting it to requests.Session(). Then, similar to before, you would attach your proxies to the session and send your requests through it, this time passing in only the url as the argument.

import requests

session = requests.Session()
session.proxies = {
    'http': 'http://10.10.10.10:8000',
    'https': 'http://10.10.10.10:8000',
}

url = 'http://mywebsite.com/example'

response = session.get(url)

Environment Variables 🌱

You may find yourself reusing the same proxy for each request, so feel free to DRY up your code by setting some environment variables:

export HTTP_PROXY='http://10.10.10.10:8000'
export HTTPS_PROXY='http://10.10.10.10:1212'

If you decide to set environment variables, there’s no longer a need to set proxies in your code; requests will pick them up automatically as soon as you make a call.
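For example, with the variables above exported, the request below (reusing the earlier example URL) is routed through the proxy even though no proxies argument is passed:

import requests

# No proxies argument needed: requests reads HTTP_PROXY and HTTPS_PROXY
# from the environment automatically.
response = requests.get('http://mywebsite.com/example')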


Reading Responses 📖

If you would like to read your data as plain text:

response = requests.get(url)
text_resp = response.text

JSON: for JSON-formatted responses, the requests package provides a built-in method.

response = requests.get(url)
json_resp = response.json()

Rotating Proxies with Requests

Remember how we said some developers use more than one proxy? Well, now you can too!

Anytime you find yourself scraping from a webpage repeatedly, it’s good practice to use more than one proxy, because there’s a good chance your scraper will get blocked, meaning your IP address will get banned. The scraping cancel culture is real! So, to avoid being canceled, it’s best to utilize rotating proxies. A rotating proxy is a proxy server that assigns a new IP address from the proxy pool for each connection.

To rotate IP addresses, you first need to have a pool of IPs available. You can use free proxies found on the internet or commercial solutions. In most cases, if your service relies on scraped data, a free proxy will most likely not be enough.

How to Rotate IPs with Requests

In order to start rotating your IP addresses, you need a list of free proxies. In case free proxies do fit your scraping needs, here you can find a list of free proxies. Today you’ll be writing a script that chooses and rotates through proxies.

  1. First, import the requests library, BeautifulSoup (from bs4), and choice (from Python’s random module).
  2. Next, define a method get_proxy() that will be responsible for retrieving IP addresses for you to use. In this method, define your url as whichever proxy-list resource you chose. After sending a GET request, convert the response into a Beautiful Soup object to make extraction easier, using the html5lib parser to parse the website’s HTML as a browser would. Create a proxy variable that uses choice to randomly select an IP address from the list of proxies generated from soup. Within a map function, you can use a lambda to convert the HTML elements into text for both the retrieved IP addresses and port numbers (a sketch of this method appears after the snippet below).
  3. Create a proxy_request method that takes in 3 arguments: the request_type , the url , and **kwargs . Inside this method, build your proxies dictionary from a randomly chosen proxy (the snippet below uses a hard-coded ip_addresses pool, but you could equally use the proxy returned by get_proxy). Similar to before, you’ll use requests , passing in your arguments.
import random
import requests

ip_addresses = [
    "mysuperproxy.com:5000",
    "mysuperproxy.com:5001",
    "mysuperproxy.com:5100",
    "mysuperproxy.com:5010",
    "mysuperproxy.com:5050",
    "mysuperproxy.com:8080",
    "mysuperproxy.com:8001",
    "mysuperproxy.com:8000",
    "mysuperproxy.com:8050"
]

def proxy_request(request_type, url, **kwargs):
    while True:
        try:
            # Pick a random proxy from the pool
            proxy = random.randint(0, len(ip_addresses) - 1)
            proxies = {
                "http": ip_addresses[proxy],
                "https": ip_addresses[proxy],
            }
            response = requests.request(request_type, url, proxies=proxies, timeout=5, **kwargs)
            print(f"Proxy currently being used: {proxies['https']}")
            break
        except Exception:
            print("Error, looking for another proxy")
    return response
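The snippet above hard-codes its pool, so the get_proxy() method described in step 2 is not shown. Here is a minimal sketch of what it might look like, assuming the proxy list is published as the first two columns (IP and port) of an HTML table, as on free-proxy-list.net; the URL and CSS selectors are assumptions you should adapt to whichever list you use:

from random import choice
import requests
from bs4 import BeautifulSoup  # pip install beautifulsoup4 html5lib

def get_proxy():
    # Hypothetical proxy-list source; adapt to the list you chose
    url = "https://free-proxy-list.net/"
    response = requests.get(url)
    # Parse the page's HTML with the html5lib parser, as a browser would
    soup = BeautifulSoup(response.content, "html5lib")
    # Assumed markup: each table row holds an IP in column 1 and a port in column 2
    ips = soup.select("table tbody tr td:nth-of-type(1)")
    ports = soup.select("table tbody tr td:nth-of-type(2)")
    # Convert the HTML elements to text and pair each IP with its port
    proxies = list(map(lambda pair: f"{pair[0].text}:{pair[1].text}", zip(ips, ports)))
    # Randomly choose one proxy from the list
    return choice(proxies)

With both methods defined, a call like proxy_request('get', 'http://mywebsite.com/example') keeps trying random proxies until one responds; you could also swap the hard-coded ip_addresses pool for fresh results from get_proxy().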

You can now scrape and rotate all at once!🌀

Use ScrapingBee’s Proxy Mode

Believe it or not, there is another free* alternative that makes scraping behind a proxy even easier! That alternative is ScrapingBee’s Proxy Mode, a proxy front-end to the API. 🐝

  1. Make a free account on ScrapingBee. Once logged on, you can see your account information, including your API Key. *And not to mention 1000 free API credits! 🍯😍
  2. Run the following script, passing your api_key as the proxy username and the API parameters as the proxy password. You can skip the proxy password if the default API parameters suit your needs:
# Install the Python Requests library:
# pip install requests
import requests

def send_request():
    proxies = {
        "http": "http://YOUR_SCRAPINGBEE_API_KEY:render_js=False&premium_proxy=True@proxy.scrapingbee.com:8886",
        "https": "https://YOUR_SCRAPINGBEE_API_KEY:render_js=False&premium_proxy=True@proxy.scrapingbee.com:8887"
    }
    response = requests.get(
        url="http://httpbin.org/headers?json",
        proxies=proxies,
        verify=False
    )
    print('Response HTTP Status Code: ', response.status_code)
    print('Response HTTP Response Body: ', response.content)

send_request()

Remember that if you want to use Proxy Mode, your code must be configured not to verify SSL certificates; in this case, that means passing verify=False, since you are working with Python Requests.

That’s all there is to sending successful HTTP requests! When you use ScrapingBee’s Proxy Mode, you no longer need to deal with proxy rotation manually, we take care of everything for you. 😎

Conclusion

While it might be tempting to start scraping right away with your fancy new proxies, there are still a few key things you should know. For starters, not all proxies are the same. There are actually different types, the three main ones being transparent proxies, anonymous proxies, and elite proxies.

In most cases, you will use an elite proxy, whether paid or free, since they are the best solution to avoid being detected. If using a proxy for the sole purpose of privacy, anonymous proxies may be worth your while. It’s not advised to use a transparent proxy unless there is a particular reason for doing so, since transparent proxies reveal your real IP address and that you are using a proxy server.

Now that we have that all cleared up, it’s time to start web scraping with a proxy in Python. So, get on out there and make all the requests you can dream up!💭


Maxine is a software engineer and passionate technical writer, who enjoys spending her free time incorporating her knowledge of environmental technologies into web development.
