Google maps parser python

How to scrape Google Maps using Python

I am trying to scrape the number of reviews of a place from Google Maps using Python. For example, the restaurant Pike’s Landing (see the Google Maps URL below) has 162 reviews. I want to pull this number in Python. URL: https://www.google.com/maps?cid=15423079754231040967 I am not very well versed in HTML, but from some basic examples on the internet I wrote the following code. What I get after running it is a blank variable. If you could let me know what I am doing wrong, that would be much appreciated.

from urllib.request import urlopen
from bs4 import BeautifulSoup

quote_page = 'https://www.google.com/maps?cid=15423079754231040967'
page = urlopen(quote_page)
soup = BeautifulSoup(page, 'html.parser')
price_box = soup.find_all('button',attrs=)
print(price_box.text)

I am not trying to scrape a full map, just a specific number in the pane on the left side of the map. Also, the Google Maps API does not return the number of reviews as of now.

The number may be added by JavaScript, and urllib + BeautifulSoup can’t run JavaScript. You can use Selenium to control a web browser, which will load the page and run the JavaScript. Or you can try to find this information in the JavaScript code, either directly in the HTML or in external *.js files. JavaScript can also use AJAX/XHR to load data from a different URL, and you can use DevTools in Chrome/Firefox to find that URL. XHR responses are mostly JSON strings, which you can easily convert to a Python dictionary using the json module.
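As a small illustration of that last point, the sketch below converts a JSON string into a Python dictionary with the json module. The payload and its field names (name, review_count) are made up for the example; they are not what Google Maps actually returns.

```python
import json

# Hypothetical XHR response body; the field names are invented for illustration.
raw = '{"name": "Example Place", "review_count": 162}'

data = json.loads(raw)  # JSON text -> Python dict
print(data["review_count"])
```

For a real page, the string would come from whichever XHR URL shows up in the DevTools Network tab.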

BTW: Google uses JavaScript to add elements to the page, but if Google sees that the client doesn’t run JavaScript, it may send a page that doesn’t need JavaScript; the elements are then mostly in different tags with different classes. So you can turn off JavaScript in the browser and load the map again to see what BeautifulSoup gets from Google. Or you can save the data from urlopen() to a file and open that file in a web browser or text editor.
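A minimal sketch of the "save it to a file" suggestion, assuming a hypothetical helper named dump_page. The fetch parameter and the User-Agent value are illustrative choices, not anything Google requires; fetch exists only so the download step can be swapped out.

```python
from urllib.request import Request, urlopen


def dump_page(url, path, fetch=None):
    """Save the raw bytes the server returns for url to path; return the size."""
    if fetch is None:
        def fetch(u):
            # Assumption: a browser-like User-Agent header; adjust as needed.
            req = Request(u, headers={"User-Agent": "Mozilla/5.0"})
            with urlopen(req) as resp:
                return resp.read()
    data = fetch(url)
    with open(path, "wb") as fh:
        fh.write(data)
    return len(data)
```

Calling dump_page('https://www.google.com/maps?cid=15423079754231040967', 'maps.html') and opening maps.html in a text editor shows the markup that BeautifulSoup actually receives.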

I am not very familiar with Selenium or JavaScript, but I can definitely look into that. I also wanted to confirm whether you are suggesting that I can scrape Google Maps using the simple approach I used. I was hoping to make minor changes to the code snippet I posted above to accomplish my goal.

3 Answers

It’s hard to do this in pure Python without an API. Here’s what I ended up with (note that I added &hl=en at the end of the URL to get results in English rather than in my language):

import re
import requests
from ast import literal_eval

urls = [
    'https://www.google.com/maps?cid=15423079754231040967&hl=en',
    'https://www.google.com/maps?cid=16168151796978303235&hl=en']

for url in urls:
    for g in re.findall(r'\[\\"http.*?\d+ reviews?.*?]', requests.get(url).text):
        data = literal_eval(g.replace('null', 'None').replace('\\"', '"'))
        print(bytes(data[0], 'utf-8').decode('unicode_escape'))
        print(data[1])

http://www.google.com/search?q=Pike's+Landing,+4438+Airport+Way,+Fairbanks,+AK+99709,+USA&ludocid=15423079754231040967#lrd=0x51325b1733fa71bf:0xd609c9524d75cbc7,1
469 reviews
http://www.google.com/search?q=Sequoia+TreeScape,+Newmarket,+ON+L3Y+8R5,+Canada&ludocid=16168151796978303235#lrd=0x882ad2157062b6c3:0xe060d065957c4103,1
42 reviews
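To see what the replace, literal_eval and unicode_escape steps do without hitting the network, the same logic can be run on a short string. The sample below is a hypothetical fragment shaped like the escaped arrays the regular expression targets, not real Google output, and extract_review_entries is just an illustrative wrapper name.

```python
import re
from ast import literal_eval

# Hypothetical fragment: quotes are backslash-escaped and "=" appears as \\u003d.
sample = r'xx[\"http://example.com/search?q\\u003dPike\",\"469 reviews\",null,1]yy'


def extract_review_entries(text):
    """Return (link, label) pairs for every escaped array that mentions reviews."""
    entries = []
    for g in re.findall(r'\[\\"http.*?\d+ reviews?.*?]', text):
        # Turn the JavaScript-style literal into something literal_eval accepts.
        data = literal_eval(g.replace('null', 'None').replace('\\"', '"'))
        # The link still contains literal \uXXXX sequences; decode them.
        link = bytes(data[0], 'utf-8').decode('unicode_escape')
        entries.append((link, data[1]))
    return entries


print(extract_review_entries(sample))
```

The unicode_escape step is what turns the escaped equals sign back into a plain "=" in the printed link.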

You need to view the source code of the page and parse the window.APP_INITIALIZATION_STATE variable block using a regular expression; there you’ll find all the needed data.
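A rough sketch of that idea follows. It assumes the assignment is followed by ;window.APP_FLAGS in the page source, which may not hold for every version of the page; the helper names and the sample fragment are made up for illustration.

```python
import re


def extract_state(html):
    """Return the text assigned to window.APP_INITIALIZATION_STATE, or None."""
    match = re.search(
        # Assumption: the block ends where window.APP_FLAGS begins.
        r"window\.APP_INITIALIZATION_STATE\s*=\s*(.*?);\s*window\.APP_FLAGS",
        html,
        re.DOTALL,
    )
    return match.group(1) if match else None


def review_counts(text):
    """Find every '<number> review(s)' label and return the numbers as ints."""
    return [int(n.replace(",", "")) for n in re.findall(r"(\d[\d,]*) reviews?", text)]


# Hypothetical page fragment used only to exercise the two helpers.
sample = 'a=1;window.APP_INITIALIZATION_STATE=[["Example Place","469 reviews"]];window.APP_FLAGS=[];'
state = extract_state(sample)
print(review_counts(state))
```

On a real page, extract_state would receive the HTML text of the response, and a None result means the assumed markers were not found.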

Alternatively, you can use Google Maps Reviews API from SerpApi.

"place_results": {
  "title": "Pike's Landing",
  "data_id": "0x51325b1733fa71bf:0xd609c9524d75cbc7",
  "reviews_link": "https://serpapi.com/search.json?engine=google_maps_reviews&hl=en&place_id=0x51325b1733fa71bf%3A0xd609c9524d75cbc7",
  "gps_coordinates": {
    "latitude": 64.8299557,
    "longitude": -147.8488774
  },
  "place_id_search": "https://serpapi.com/search.json?data=%214m5%213m4%211s0x51325b1733fa71bf%3A0xd609c9524d75cbc7%218m2%213d64.8299557%214d-147.8488774&engine=google_maps&google_domain=google.com&hl=en&type=place",
  "thumbnail": "https://lh5.googleusercontent.com/p/AF1QipNtwheOCQ97QFrUNIwKYUoAPiV81rpiW5cIiQco=w152-h86-k-no",
  "rating": 3.9,
  "reviews": 839,
  "price": "$$",
  "type": [
    "American restaurant"
  ],
  "description": "Burgers, seafood, steak & river views. Pub fare alongside steak & seafood, served in a dining room with river views & a waterfront patio.",
  "service_options": {
    "dine_in": true,
    "curbside_pickup": true,
    "delivery": false
  }
}

import os
from serpapi import GoogleSearch

params = {
    "engine": "google_maps",
    "type": "search",
    "q": "pike's landing",
    "ll": "@40.7455096,-74.0083012,14z",
    "google_domain": "google.com",
    "api_key": os.getenv("API_KEY"),
}

search = GoogleSearch(params)
results = search.get_dict()
reviews = results["place_results"]["reviews"]
print(reviews)

What do you mean it’s not Python? from X import X should tell you right away that it’s a Python script. If you were talking about SerpApi, the code example above also uses a Python package.

Scraping Google Maps without a browser or proxies will lead to blocking after a few successful requests. Therefore, the main problem of scraping Google is dealing with cookies and ReCaptcha.
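One modest mitigation is to keep cookies between requests and to pause between them. The sketch below uses only the standard library; make_opener, fetch_all, the header values and the delay are illustrative assumptions, and none of this gets past a reCAPTCHA once one is served.

```python
import time
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener


def make_opener(user_agent="Mozilla/5.0"):
    """Build an opener that stores cookies across requests, like a browser session."""
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", user_agent), ("Accept-Language", "en")]
    return opener, jar


def fetch_all(urls, fetch, delay=5.0, sleep=time.sleep):
    """Call fetch(url) for each URL, sleeping delay seconds between requests."""
    pages = []
    for i, url in enumerate(urls):
        if i:
            sleep(delay)  # no pause before the first request
        pages.append(fetch(url))
    return pages
```

A typical call would pass something like lambda u: opener.open(u).read() as fetch, so that every request goes through the same cookie jar.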

This is a good post where you can see an example of using Selenium in Python for the same purpose. The general idea is that you start a browser and simulate what a user does on the website.
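A minimal sketch of that idea, assuming Selenium 4 and a local Chrome installation. The XPath is a guess that some element's text contains the word "reviews"; Google's markup changes often and a consent page may appear first, so treat the locator as a placeholder rather than a known selector.

```python
import re


def parse_count(label):
    """Turn a label such as '469 reviews' or '1,024 reviews' into an int."""
    match = re.search(r"(\d[\d,]*)\s+reviews?", label)
    return int(match.group(1).replace(",", "")) if match else None


def read_review_count(url, timeout=15):
    """Open url in Chrome and return the first review count found, or None."""
    # Imported here so parse_count works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Assumption: an element whose own text mentions "reviews" exists.
        locator = (By.XPATH, "//*[contains(text(), 'reviews')]")
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located(locator)
        )
        return parse_count(element.text)
    finally:
        driver.quit()  # always close the browser, even on errors
```

Adding &hl=en to the URL, as in the answer above, keeps the label in English so that the text match has a chance of working.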

Another way is to use a reliable third-party service that does all the work for you and returns the results. For example, you can try Outscraper’s Reviews service, which has a free tier.

from datetime import datetime, timedelta
from outscraper import ApiClient

api_client = ApiClient(api_key='SECRET_API_KEY')

# Get reviews of the specific place by id
result = api_client.google_maps_reviews('ChIJrc9T9fpYwokRdvjYRHT8nI4', reviewsLimit=20, language='en')

# Get reviews for places found by search query
result = api_client.google_maps_reviews('Memphis Seoul brooklyn usa', reviewsLimit=20, limit=500, language='en')

# Get only new reviews during last 24 hours
yesterday_timestamp = int((datetime.now() - timedelta(1)).timestamp())
result = api_client.google_maps_reviews(
    'ChIJrc9T9fpYwokRdvjYRHT8nI4', sort='newest', cutoff=yesterday_timestamp, reviewsLimit=100, language='en')

Disclaimer, I work for Outscraper.

walidsa3d/gmaps-url-parser

README.md

$ pip install gmaps-url-parser 
$ git clone
$ cd gmaps-url-parser
$ python setup.py install

>>> from gmaps_url_parser import parse
>>> url = "https://www.google.com/maps/place/Bou+Saada,+Algeria/@35.2131065,4.1479717,10106m/data=!3m1!1e3!4m2!3m1!1s0x128bb3c644644ec3:0x7f746dd09dad9a9f"
>>> parse(url)
{'latitude': 35.2131065, 'zoom_level': '10106', 'place': 'Bou Saada, Algeria', 'maptype': 'earth', 'longitude': 4.1479717}
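To give a feel for what such a parser has to do, here is a standalone sketch that pulls the place name, coordinates and zoom value out of a URL of the same shape using regular expressions. It is not the library's implementation; parse_maps_url is a hypothetical name, and the map type field is left out because the sketch does not decode the data= segment.

```python
import re
from urllib.parse import unquote_plus


def parse_maps_url(url):
    """Extract place, latitude, longitude and zoom from a /maps/place/ URL."""
    result = {}
    place = re.search(r"/maps/place/([^/@]+)", url)
    if place:
        result["place"] = unquote_plus(place.group(1))  # '+' -> space
    # The view segment looks like @<lat>,<lng>,<number>m or @<lat>,<lng>,<number>z.
    view = re.search(r"@(-?\d+(?:\.\d+)?),(-?\d+(?:\.\d+)?),(\d+(?:\.\d+)?)[mz]", url)
    if view:
        result["latitude"] = float(view.group(1))
        result["longitude"] = float(view.group(2))
        result["zoom_level"] = view.group(3)
    return result


url = ("https://www.google.com/maps/place/Bou+Saada,+Algeria/"
       "@35.2131065,4.1479717,10106m/data=!3m1!1e3!4m2!3m1!1s0x128bb3c644644ec3:0x7f746dd09dad9a9f")
print(parse_maps_url(url))
```

URLs without a place name or a view segment simply yield a dictionary with fewer keys.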

A Python script to scrape data from Google Maps.

dhanraj6/Google-Maps-Scraper

README.md

Step-by-step guidance for running this script is in my Medium post.

  1. Download ChromeDriver if you don’t have it.
  2. Add the path to chromedriver in the .py file.
  3. Add the link to the Google Maps place whose data you want to scrape in the .py file.

The Google Maps UI changes frequently. If you get errors when running the script, replace the older IDs of the clickable items with the new ones.

Google Maps reviews scraping

gaspa93/googlemaps-scraper

README.md

Scraper of Google Maps reviews. The code extracts the most recent reviews starting from the URL of a specific Point Of Interest (POI) in Google Maps. An additional extension helps to monitor and incrementally store the reviews in a MongoDB instance.

Follow these steps to use the scraper:

  • Download ChromeDriver.
  • Install the Python packages from the requirements file, using pip, conda or virtualenv:
 conda create --name scraping python=3.6 --file requirements.txt 

Note: Python >= 3.6 is required.

The scraper.py script needs two main parameters as input:

  • --i : input file name, containing a list of URLs that point to Google Maps place reviews (default: urls.txt)
  • --N : number of reviews to retrieve, starting from the most recent (default: 100)

For example, running scraper.py with --N 50 generates a CSV file containing the last 50 reviews of the places listed in urls.txt.

In the current implementation, the CSV file is handled by an external function, so if you want to change the path and/or name of the output file, you need to modify that function.

Additionally, other parameters can be provided:

  • --place : boolean value that switches to scraping POI metadata instead of reviews (default: false)
  • --debug : boolean value that runs the browser with its graphical interface (default: false)
  • --source : boolean value that stores the source URL as an additional field in the CSV (default: false)
  • --sort-by : string value among most_relevant, newest, highest_rating or lowest_rating (default: newest) that changes the sorting order of the reviews; developed by @quaesito
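The options described above map naturally onto argparse. The following is a sketch of how such a command line could be declared, not the repository's actual code; the help strings and the use of store_true for the boolean options are assumptions.

```python
import argparse


def build_parser():
    """Declare the command-line options described in the README (sketch)."""
    parser = argparse.ArgumentParser(description="Google Maps reviews scraper (sketch)")
    parser.add_argument("--i", default="urls.txt", help="file with place review URLs")
    parser.add_argument("--N", type=int, default=100, help="number of reviews to retrieve")
    parser.add_argument("--place", action="store_true", help="scrape POI metadata instead")
    parser.add_argument("--debug", action="store_true", help="show the browser window")
    parser.add_argument("--source", action="store_true", help="store the source URL in the CSV")
    parser.add_argument(
        "--sort-by",
        default="newest",
        choices=["most_relevant", "newest", "highest_rating", "lowest_rating"],
        help="sorting order of the reviews",
    )
    return parser


args = build_parser().parse_args(["--N", "50"])
print(args.i, args.N, args.sort_by)
```

argparse exposes the hyphenated option as args.sort_by, and passing an empty list to parse_args yields the defaults listed in the README.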

For a basic description of the logic and approach behind this software, have a look at the Medium post.

The monitor.py script can be used as an incremental scraper that works around the limit on the number of reviews that can be retrieved. The only additional requirement is to install MongoDB on your laptop; you can find a detailed guide on the official site.

The script takes two inputs:

  • --i : same as in the scraper.py script
  • --from-date : date string in the format YYYY-MM-DD, giving the earliest date that the scraper tries to reach

The main idea is to run the script periodically to obtain the latest reviews: the scraper stores them in MongoDB until it reaches either the latest review of the previous run or the date given in the input parameter.
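The stopping rule can be sketched without MongoDB by walking a newest-first list of reviews. The function name, the id and date fields, and the sample data are hypothetical and only illustrate the two stop conditions.

```python
from datetime import date


def select_new_reviews(reviews, last_seen_id=None, from_date=None):
    """Walk reviews (newest first) and keep them until a stop condition is met."""
    fresh = []
    for review in reviews:
        if last_seen_id is not None and review["id"] == last_seen_id:
            break  # reached the newest review stored by the previous run
        if from_date is not None and review["date"] < from_date:
            break  # older than the requested minimum date
        fresh.append(review)
    return fresh


# Hypothetical reviews, ordered from newest to oldest.
reviews = [
    {"id": "r4", "date": date(2023, 5, 4)},
    {"id": "r3", "date": date(2023, 5, 3)},
    {"id": "r2", "date": date(2023, 5, 2)},
    {"id": "r1", "date": date(2023, 5, 1)},
]
print([r["id"] for r in select_new_reviews(reviews, last_seen_id="r2")])
```

In a real run, the returned list would be what gets written to the database, and last_seen_id would come from the most recent stored document.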

Take a look at this Medium post for more details about the idea behind this feature.

The URL must be provided in the expected format; check the example file urls.txt to see what a correct URL looks like. To generate a correct URL:

  1. Go to Google Maps and look for a specific place;
  2. Click on the number of reviews in parentheses;
  3. Save the URL that is generated by the previous interaction.
