Using a Proxy Server in Python

How to Use a Proxy with Python Requests?

In this article, you will learn how to use the Python Requests library behind a proxy server. Developers use proxies for anonymity and security, and sometimes even use more than one to prevent websites from banning their IP addresses. Proxies also carry several other benefits, such as bypassing filters and censorship. Feel free to learn more about rotating proxies before continuing, but let’s get started!

Prerequisites & Installation

This article is intended for those who would like to scrape behind a proxy in Python. To get the most out of the material, it is beneficial to:

✅ Have experience with Python 3 🐍.

✅ Have Python 3 installed on your local machine.

Check if the python-requests package is installed by opening the terminal and typing:
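pip freeze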

pip freeze will display all of your current Python packages and their versions, so go ahead and check that it is present. If not, install it by running:
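pip install requests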

How to use a Proxy with Python Requests

  1. To use a proxy in Python, first import the requests package.
  2. Next, create a proxies dictionary that defines the HTTP and HTTPS connections. This variable should be a dictionary that maps each protocol to the proxy URL. Additionally, make a url variable set to the webpage you’re scraping from.

Notice that in the example below, the dictionary defines the proxy URL for two separate protocols: HTTP and HTTPS. Each connection maps to an individual URL and port, but this does not mean that the two cannot be the same.

  3. Lastly, create a response variable that uses any of the requests methods. The method will take in two arguments: the url variable you created and the proxies dictionary you defined.

You may use the same syntax for different API calls, but regardless of the call you’re making, you need to specify the protocol.

import requests

proxies = {
    'http': 'http://proxy.example.com:8080',
    'https': 'http://secureproxy.example.com:8090',
}

url = 'http://mywebsite.com/example'

response = requests.post(url, proxies=proxies)

Requests Methods ✍️

response = requests.get(url)
response = requests.post(url, data={"a": 1, "b": 2})
response = requests.put(url, data=put_body)
response = requests.delete(url)
response = requests.patch(url, data=patch_update)
response = requests.head(url)
response = requests.options(url)

Proxy Authentication 👩‍💻

If you need to add authentication, you can rewrite your code using the following syntax:

response = requests.get(url, proxies=proxies, auth=('user', 'pass'))
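Note that auth= authenticates you to the target website. If the proxy server itself requires credentials, requests also accepts them embedded directly in the proxy URL. A minimal sketch, where user, pass, and the proxy hosts are placeholders:

import requests

# user:pass in the proxy URL authenticates against the proxy itself,
# not against the site you are requesting
proxies = {
    'http': 'http://user:pass@proxy.example.com:8080',
    'https': 'http://user:pass@secureproxy.example.com:8090',
}

url = 'http://mywebsite.com/example'
response = requests.get(url, proxies=proxies)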

Proxy Sessions 🕒

You may also find yourself wanting to scrape from websites that utilize sessions; in this case, you have to create a session object. You can do this by creating a session variable and setting it to requests.Session(). Then, similar to before, you attach your proxies to the session, but this time you send the request through the session’s method, passing in only the url as the argument.

import requests

session = requests.Session()
session.proxies = {
    'http': 'http://10.10.10.10:8000',
    'https': 'http://10.10.10.10:8000',
}

url = 'http://mywebsite.com/example'

response = session.get(url)

Environment Variables 🌱

You may find yourself reusing the same proxy for each request, so feel free to DRY up your code by setting some environment variables:

export HTTP_PROXY='http://10.10.10.10:8000'
export HTTPS_PROXY='http://10.10.10.10:1212'

If you decide to set environment variables, there’s no longer a need to define proxies in your code: Requests picks them up from the environment automatically for every call you make.
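For example, with the variables above exported, the following request is routed through the proxy even though the code never mentions one (the URL is the same placeholder as before):

import requests

# No proxies= argument needed: Requests reads HTTP_PROXY / HTTPS_PROXY
# from the environment automatically
response = requests.get('http://mywebsite.com/example')
print(response.status_code)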


Reading Responses 📖

Text: if you would like to read your response data as plain text:

response = requests.get(url)
text_resp = response.text

JSON: for JSON-formatted responses, the requests package provides a built-in method.

response = requests.get(url)
json_resp = response.json()

Rotating Proxies with Requests

Remember how we said some developers use more than one proxy? Well, now you can too!

Anytime you find yourself scraping from a webpage repeatedly, it’s good practice to use more than one proxy, because there’s a good chance your scraper will get blocked, meaning your IP address will get banned. The scraping cancel culture is real! So, to avoid being canceled, it’s best to utilize rotating proxies. A rotating proxy is a proxy server that assigns a new IP address from the proxy pool for each connection.

To rotate IP addresses, you first need to have a pool of IPs available. You can use free proxies found on the internet or commercial solutions. In most cases, if your service relies on scraped data, a free proxy will most likely not be enough.

How to Rotate IPs with Requests

In order to start rotating your IP addresses, you need a list of free proxies. If free proxies do fit your scraping needs, here you can find a list of free proxies. Today you’ll be writing a script that chooses and rotates through proxies.

  1. First, import the requests, BeautifulSoup, and choice libraries.
  2. Next, define a method get_proxy() that will be responsible for retrieving IP addresses for you to use. In this method, define your url as whatever proxy-list resource you choose to use. After sending the request, convert the response into a Beautiful Soup object to make extraction easier. Use the html5lib parser library to parse the website’s HTML, as a browser would. Create a proxy variable that uses choice to randomly pick an IP address from the list of proxies generated from soup. Within the map function, you can use a lambda function to convert the HTML elements into text for both the retrieved IP addresses and port numbers. (A sketch of this helper follows the snippet below.)
  3. Create a proxy_request method that takes in 3 arguments: the request_type, the url, and **kwargs. Inside this method, define your proxy dictionary as the proxy returned from the get_proxy method. Similar to before, you’ll use requests, passing in your arguments.
import random
import requests

ip_addresses = [
    "mysuperproxy.com:5000",
    "mysuperproxy.com:5001",
    "mysuperproxy.com:5100",
    "mysuperproxy.com:5010",
    "mysuperproxy.com:5050",
    "mysuperproxy.com:8080",
    "mysuperproxy.com:8001",
    "mysuperproxy.com:8000",
    "mysuperproxy.com:8050",
]

def proxy_request(request_type, url, **kwargs):
    while True:
        try:
            # Pick a random proxy from the pool and use it for both protocols
            proxy = random.randint(0, len(ip_addresses) - 1)
            proxies = {
                "http": f"http://{ip_addresses[proxy]}",
                "https": f"http://{ip_addresses[proxy]}",
            }
            response = requests.request(request_type, url, proxies=proxies, timeout=5, **kwargs)
            print(f"Proxy currently being used: {proxies['https']}")
            break
        except Exception:
            print("Error, looking for another proxy")
    return response
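The get_proxy() helper described in step 2 is not shown in the snippet above. Here is a minimal sketch of it; the proxy-list URL and the table layout are assumptions, so adapt the parsing to whichever resource you actually use (html5lib must be installed for this parser):

import requests
from bs4 import BeautifulSoup
from random import choice

def get_proxy():
    # Hypothetical free-proxy list; swap in your own resource
    url = "https://free-proxy-list.net/"
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html5lib")
    # Assumes each table row holds an IP in the first cell and a port
    # in the second; map/lambda joins them into "ip:port" strings
    rows = soup.find("table").find_all("tr")[1:11]
    proxies = list(map(
        lambda row: ":".join(cell.text for cell in row.find_all("td")[:2]),
        rows,
    ))
    proxy = "http://" + choice(proxies)
    return {"http": proxy, "https": proxy}

With the rotating version above, a call such as proxy_request('get', 'http://mywebsite.com/example') will keep retrying until one of the proxies responds.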

You can now scrape and rotate all at once!🌀

Use ScrapingBee’s Proxy Mode

Believe it or not, there is another free* alternative that makes scraping behind a proxy even easier! That alternative is ScrapingBee’s Proxy Mode, a proxy front-end to the API. 🐝

  1. Make a free account on ScrapingBee. Once logged on, you can see your account information, including your API Key. *And not to mention 1000 free API credits! 🍯😍
  2. Run the following script, passing your api_key as the proxy username and the API parameters as the proxy password. You can skip the proxy password if the default API parameters suit your needs:
# Install the Python Requests library:
# pip install requests
import requests

def send_request():
    proxies = {
        "http": "http://YOUR_SCRAPINGBEE_API_KEY:render_js=False&premium_proxy=True@proxy.scrapingbee.com:8886",
        "https": "https://YOUR_SCRAPINGBEE_API_KEY:render_js=False&premium_proxy=True@proxy.scrapingbee.com:8887",
    }

    response = requests.get(
        url="http://httpbin.org/headers?json",
        proxies=proxies,
        verify=False,
    )
    print('Response HTTP Status Code: ', response.status_code)
    print('Response HTTP Response Body: ', response.content)

send_request()

Remember that if you want to use Proxy Mode, your code must be configured not to verify SSL certificates; with Python Requests, that means passing verify=False.

That’s all there is to sending successful HTTP requests! When you use ScrapingBee’s Proxy Mode, you no longer need to deal with proxy rotation manually; we take care of everything for you. 😎

Conclusion

While it might be tempting to start scraping right away with your fancy new proxies, there are still a few key things you should know. For starters, not all proxies are the same. There are different types, the three main ones being transparent proxies, anonymous proxies, and elite proxies.

In most cases, you will use an elite proxy, whether paid or free, since they are the best solution to avoid being detected. If using a proxy for the sole purpose of privacy, anonymous proxies may be worth your while. It’s not advised to use a transparent proxy unless there is a particular reason for doing so, since transparent proxies reveal your real IP address and that you are using a proxy server.

Now that we have that all cleared up, it’s time to start web scraping with a proxy in Python. So, get on out there and make all the requests you can dream up!💭


Maxine is a software engineer and passionate technical writer, who enjoys spending her free time incorporating her knowledge of environmental technologies into web development.


Using a Proxy Server with Python requests


In this tutorial, you’ll learn how to use the Python requests library to make HTTP requests behind a proxy server. This has numerous benefits, including staying anonymous and secure and preventing your IP address from being blocked. You’ll learn how to set HTTP, HTTPS, and FTP proxies.

By the end of this tutorial, you’ll have learned:

  • How to set proxy servers when using Python requests
  • How to use proxy servers with sessions when using Python requests
  • How to authenticate requests when working with proxy servers

How to Use Proxy Servers with Python requests

In order to use proxies in the requests Python library, you need to create a dictionary that defines the HTTP, HTTPS, and FTP connections. This allows each connection to map to an individual URL and port. This process is the same for any request being made, including GET requests and POST requests.

It’s important to note that while the connections map to individual URLs and ports, they can actually point to the same URL and port.

Let’s see how you can define a set of proxies for the Python requests library:

# Setting up Proxies with the requests Library
import requests

proxy_servers = {
    'http': 'http://proxy.sample.com:8080',
    'https': 'http://secureproxy.sample.com:8080',
}

response = requests.get('http://sample.abc', proxies=proxy_servers)

Let’s break down what we did above:

  1. We imported the requests library
  2. We defined a dictionary, proxy_servers, which maps the HTTP and HTTPS connections to their proxy URLs and ports
  3. We then made a GET request and passed our dictionary into the proxies= argument

How to Authenticate when Using a Proxy Server with Python requests

In order to add authentication to a request made with Python requests, you can follow normal request authentication methods. This allows you to use different types of authentication, including basic authentication.

Let’s take a look at how you can use basic HTTP authentication with proxy servers when making a request:

# Authenticating Requests with Proxy Servers
import requests

proxy_servers = {
    'http': 'http://proxy.sample.com:8080',
    'https': 'http://secureproxy.sample.com:8080',
}

auth = ('username', 'password')

response = requests.get('http://sample.abc', proxies=proxy_servers, auth=auth)

We can see in the GET request made above that we passed in authentication information with the auth= parameter.

How to Use Sessions with a Proxy Server with Python requests

In some cases, you’ll want to use sessions when accessing data via an HTTP request. In these cases, using proxies works a little differently. We first need to instantiate a Session object and then assign our proxies using the .proxies attribute.

Let’s see how this can be done:

# Using Proxy Servers with Python requests Sessions
import requests

proxy_servers = {
    'http': 'http://proxy.sample.com:8080',
    'https': 'http://secureproxy.sample.com:8080',
}

s = requests.Session()
s.proxies = proxy_servers

response = s.get('http://sample.abc')

Let’s break down what we did in the code above:

  1. We imported the requests library
  2. We defined our proxy servers dictionary as before
  3. We then created a Session object, s
  4. We assigned our dictionary to the session’s .proxies attribute
  5. We then performed a GET request, which automatically applied our proxies

Conclusion

In this tutorial, you learned how to use proxy servers when making HTTP requests using the Python requests library. Using proxies can help make your requests more secure or anonymous, as well as prevent your IP from being blocked when scraping websites.

You first learned how to use proxies when making requests using the requests library. Then, you learned how to use authentication with proxy servers. Finally, you learned how to use requests Sessions to handle proxy servers.


