Fake User Agents With Python Requests

ScrapeOps Headers API

ScrapeOps Headers API is a free, easy-to-use browser header generating tool that returns a list of optimized fake user-agents or browser headers via an API endpoint.

It can be configured to return either a list of fake user-agents or a list of fake browser headers.

It is recommended that you use the Fake Browser Headers API over the Fake User-Agents API, as using a complete set of fake browser headers will make your scrapers more reliable than using fake user-agents alone.

Fake User-Agents API​

The Fake User-Agents API returns a list of optimized fake user-agents that you can use in your web scrapers to bypass simple blocks/bans.

API Endpoint:

 http://headers.scrapeops.io/v1/user-agents?api_key=YOUR_API_KEY 
    "result": [ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.5 Safari/605.1.15", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.53 Safari/537.36", "Mozilla/5.0 (Windows NT 10.0; Windows; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Safari/537.36", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/603.3.8 (KHTML, like Gecko) Version/10.1.2 Safari/603.3.8", "Mozilla/5.0 (Windows NT 10.0; Windows; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Safari/537.36", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Safari/605.1.15", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.53 Safari/537.36", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Safari/605.1.15", "Mozilla/5.0 (Windows NT 10.0; Windows; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Safari/537.36", "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.53 Safari/537.36" ] > 

Documentation: Fake User-Agents

Fake Browser Headers API​

The Fake Browser Headers API returns a list of optimized fake browser headers that you can use in your web scrapers to avoid blocks/bans and improve the reliability of your scrapers.

API Endpoint:

 http://headers.scrapeops.io/v1/browser-headers?api_key=YOUR_API_KEY 
    "result": [   "upgrade-insecure-requests": "1", "user-agent": "Mozilla/5.0 (Windows NT 10.0; Windows; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Safari/537.36", "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9", "sec-ch-ua": "\".Not/A)Brand\";v=\"99\", \"Google Chrome\";v=\"103\", \"Chromium\";v=\"103\"", "sec-ch-ua-mobile": "?0", "sec-ch-ua-platform": "\"Windows\"", "sec-fetch-site": "none", "sec-fetch-mod": "", "sec-fetch-user": "?1", "accept-encoding": "gzip, deflate, br", "accept-language": "bg-BG,bg;q=0.9,en-US;q=0.8,en;q=0.7" >,   "upgrade-insecure-requests": "1", "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.53 Safari/537.36", "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9", "sec-ch-ua": "\".Not/A)Brand\";v=\"99\", \"Google Chrome\";v=\"103\", \"Chromium\";v=\"103\"", "sec-ch-ua-mobile": "?0", "sec-ch-ua-platform": "\"Linux\"", "sec-fetch-site": "none", "sec-fetch-mod": "", "sec-fetch-user": "?1", "accept-encoding": "gzip, deflate, br", "accept-language": "fr-CH,fr;q=0.9,en-US;q=0.8,en;q=0.7" > ] > 

Why Use Fake User-Agents?​

The first step every developer must take when developing a web scraper that doesn’t get banned or blocked by the website they are trying to scrape is to set a fake user-agent with every request.

This is because, by default, most HTTP clients clearly identify the library being used to send the request to the website's server. For example, Python Requests sends the following user-agent with every request:

  'User-Agent': 'python-requests/2.26.0', 

If your web scraper is sending requests using a default user-agent like this, then it will very quickly get blocked.
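You can see this for yourself by sending a request with no custom headers to httpbin.org/headers, which simply echoes back the headers it received. A minimal sketch (the exact version number will depend on the requests version you have installed):

import requests

# httpbin.org/headers echoes back the headers it received,
# so this prints the default user-agent Python Requests sends.
r = requests.get('http://httpbin.org/headers')
print(r.json()['headers']['User-Agent'])
# Typically something like 'python-requests/2.26.0'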

As a result, you need to configure your web scraper to use fake user-agents with every request.

You could set a user-agent manually like this:

import requests

headers = {"User-Agent": "Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148"}

r = requests.get('http://httpbin.org/headers', headers=headers)
print(r.json())

Or you could use ScrapeOps Fake User-Agent API to randomly select a new fake user-agent for every request.

import requests
from random import randint

SCRAPEOPS_API_KEY = 'YOUR_API_KEY'

def get_user_agent_list():
    response = requests.get('http://headers.scrapeops.io/v1/user-agents?api_key=' + SCRAPEOPS_API_KEY)
    json_response = response.json()
    return json_response.get('result', [])

def get_random_user_agent(user_agent_list):
    random_index = randint(0, len(user_agent_list) - 1)
    return user_agent_list[random_index]

## Retrieve User-Agent List From ScrapeOps
user_agent_list = get_user_agent_list()

url_list = [
    'https://example.com/1',
    'https://example.com/2',
    'https://example.com/3',
]

for url in url_list:
    ## Add Random User-Agent To Headers
    headers = {'User-Agent': get_random_user_agent(user_agent_list)}

    ## Make Requests
    r = requests.get(url=url, headers=headers)
    print(r.text)
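The get_random_user_agent() helper above simply picks a random list element by index; if you prefer, Python's built-in random.choice does the same job in one line:

from random import choice

# Equivalent to get_random_user_agent(user_agent_list)
headers = {'User-Agent': choice(user_agent_list)}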

Why Use Browser Headers vs User-Agents?​

The short answer is that using fake browser headers rather than just fake user-agents makes your scraper's requests harder to detect and, as a result, makes your scrapers more reliable.

For simple websites, setting an up-to-date user-agent should allow you to scrape pretty reliably.

However, many popular websites are increasingly using sophisticated anti-bot technologies to try to prevent developers from scraping data from their websites.

These anti-bot solutions not only look at your request's user-agent when analysing the request, but also at the other headers a real browser normally sends.

By using a full set of browser headers you make your requests look more like real user requests, and as a result harder to detect.

The Fake Browser Headers API returns the following headers in every header set (an example of attaching one of these header sets to a request follows the list):

  • Accept
  • Accept-Encoding
  • Accept-Language
  • User-Agent
  • Sec-Ch-Ua
  • Sec-Ch-Ua-Platform
  • Sec-Fetch-Site
  • Sec-Fetch-Mode
  • Sec-Fetch-User
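To make this concrete, here is a rough sketch of attaching one of these header sets to a Python Requests call. The values are copied verbatim from the example API response above and hard-coded purely for illustration; in a real scraper you would fetch fresh header sets from the API, as shown in the integration examples below.

import requests

# One full browser header set, copied from the example API response above.
# Hard-coded here for illustration only; normally you would pull these from the API.
fake_browser_headers = {
    "upgrade-insecure-requests": "1",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Windows; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Safari/537.36",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "sec-ch-ua": "\".Not/A)Brand\";v=\"99\", \"Google Chrome\";v=\"103\", \"Chromium\";v=\"103\"",
    "sec-ch-ua-mobile": "?0",
    "sec-ch-ua-platform": "\"Windows\"",
    "sec-fetch-site": "none",
    "sec-fetch-mod": "",
    "sec-fetch-user": "?1",
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "bg-BG,bg;q=0.9,en-US;q=0.8,en;q=0.7",
}

# httpbin.org/headers echoes back what it received, so you can verify the full set was sent.
r = requests.get('http://httpbin.org/headers', headers=fake_browser_headers)
print(r.json())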

For more information on how anti-bot technologies use header fingerprints to detect scrapers, check out our header optimization guide.

Python Requests Fake Headers Integration

The following are two examples of how to integrate the Fake Browser Headers API and the Fake User-Agent API into your Python Requests based web scrapers.

Python Requests Fake Browser Headers API Integration​

To integrate the Fake Browser Headers API, configure your scraper to retrieve a batch of the most up-to-date headers when the scraper starts, and then pick a random header from this list for each request.

Here is an example Python scraper integration:

import requests
from random import randint

SCRAPEOPS_API_KEY = 'YOUR_API_KEY'

def get_headers_list():
    response = requests.get('http://headers.scrapeops.io/v1/browser-headers?api_key=' + SCRAPEOPS_API_KEY)
    json_response = response.json()
    return json_response.get('result', [])

def get_random_header(header_list):
    random_index = randint(0, len(header_list) - 1)
    return header_list[random_index]

header_list = get_headers_list()

url_list = [
    'https://example.com/1',
    'https://example.com/2',
    'https://example.com/3',
]

for url in url_list:
    r = requests.get(url=url, headers=get_random_header(header_list))
    print(r.text)

Python Requests Fake User-Agent API Integration​

To integrate the Fake User-Agent API, configure your scraper to retrieve a batch of the most up-to-date user-agents when the scraper starts, and then pick a random user-agent from this list for each request.

Here is an example Python scraper integration:

import requests
from random import randint

SCRAPEOPS_API_KEY = 'YOUR_API_KEY'

def get_user_agent_list():
    response = requests.get('http://headers.scrapeops.io/v1/user-agents?api_key=' + SCRAPEOPS_API_KEY)
    json_response = response.json()
    return json_response.get('result', [])

def get_random_user_agent(user_agent_list):
    random_index = randint(0, len(user_agent_list) - 1)
    return user_agent_list[random_index]

## Retrieve User-Agent List From ScrapeOps
user_agent_list = get_user_agent_list()

url_list = [
    'https://example.com/1',
    'https://example.com/2',
    'https://example.com/3',
]

for url in url_list:
    ## Add Random User-Agent To Headers
    headers = {'User-Agent': get_random_user_agent(user_agent_list)}

    ## Make Requests
    r = requests.get(url=url, headers=headers)
    print(r.text)

Here the scraper will use a random user-agent for each request.

API Parameters​

The following is a list of API parameters that you can include with your requests to customise the header list response.

  • api_key — This is a required parameter. You can get your free API key here.
  • num_results — By default the API returns a list of 10 user-agents or headers, however you can increase that number by changing the num_results parameter. The maximum is 100.
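For example, to fetch 50 user-agents in a single call instead of the default 10, you can add num_results to the query string. A minimal sketch (substitute your own API key):

import requests

SCRAPEOPS_API_KEY = 'YOUR_API_KEY'

# Ask the Fake User-Agents API for 50 results instead of the default 10 (max is 100).
response = requests.get(
    'http://headers.scrapeops.io/v1/user-agents',
    params={'api_key': SCRAPEOPS_API_KEY, 'num_results': 50},
)
user_agent_list = response.json().get('result', [])
print(len(user_agent_list))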
