Python selenium stop loading page

Stop infinite page load in selenium webdriver — python

I am loading a page using the Selenium web driver, but the page loads infinitely. I tried to catch the exception and simulate the Esc key action, but that didn't help. Due to some constraints I can only use Firefox (I have seen the Chrome add-on solution). Once I hit the page, I am not getting control back. I set my Firefox profile as:

 firefoxProfile = FirefoxProfile()
 firefoxProfile.set_preference('permissions.default.stylesheet', 2)
 firefoxProfile.set_preference('permissions.default.image', 2)
 firefoxProfile.set_preference('dom.ipc.plugins.enabled.libflashplayer.so', 'false')
 firefoxProfile.set_preference("http.response.timeout", 10)
 firefoxProfile.set_preference("dom.max_script_run_time", 10)
 try:
     driver.set_page_load_timeout(10)
     driver.get('http://www.example.com'
 except Exception
     print 'time out'
     driver.send_keys(Keys.CONTROL + 'Escape')

Okay, I understand. What if you did not set the profile preferences at all? Is it still loading forever? Also, have you tried waiting 30+ seconds?

It loads correctly if I don't set any Firefox profile, but the page has more than 100 large images, which makes the loading process really slow. So I thought I would disable images.

Hi, thanks for your suggestion. However, it didn't help to stop the page load. But instead of trying to stop the page load, if I apply the XPath to extract the content in the exception block, I can get the content I want even while the page is still loading. So I am applying the XPaths to extract the content and then closing that window.

1 Answer 1

I see a couple of typos in your try/except block, so let’s correct those really quickly.

 try:
     driver.set_page_load_timeout(10)
     driver.get('http://www.example.com')
 except Exception:
     print 'time out'
     driver.send_keys(Keys.CONTROL + 'Escape')

I have been working with Selenium and Python for a while now (also using the Firefox webdriver). I'm assuming you're using Python, judging from the syntax of your code.

Anyway, your Firefox profile should help resolve the issue, but it doesn't look like you're actually applying it to the driver instance.


Try something along these lines:

 from selenium import webdriver  # import webdriver to create FirefoxProfile

 firefoxProfile = webdriver.FirefoxProfile()
 firefoxProfile.set_preference('permissions.default.stylesheet', 2)
 firefoxProfile.set_preference('permissions.default.image', 2)
 firefoxProfile.set_preference('dom.ipc.plugins.enabled.libflashplayer.so', 'false')
 firefoxProfile.set_preference("http.response.timeout", 10)
 firefoxProfile.set_preference("dom.max_script_run_time", 10)

 # now create the browser instance and APPLY the FirefoxProfile
 driver = webdriver.Firefox(firefox_profile=firefoxProfile)

This works for me, using Python 2.7 and Selenium 2.46.

Let me know how it goes, and good luck!


Don’t wait for a page to load using Selenium in Python

How do I make Selenium click on elements and scrape data before the page has fully loaded? My internet connection is quite terrible, so it sometimes takes forever to load the page entirely. Is there any way around this?

3 Answers 3

Update with Selenium 4 (July 7, 2024)

page_load_strategy

page_load_strategy is now an attribute, so the minimal code block to configure page_load_strategy with Selenium v4.6 and above is as follows:

 from selenium import webdriver
 from selenium.webdriver.chrome.options import Options

 options = Options()
 # options.page_load_strategy = 'none'
 options.page_load_strategy = 'eager'
 # options.page_load_strategy = 'normal'
 driver = webdriver.Chrome(options=options)
 driver.get("https://google.com")

ChromeDriver 77.0 (which supports Chrome version 77) now supports eager as a pageLoadStrategy.

Resolved issue 1902: Support eager page load strategy [Pri-2]

As your question mentions clicking on elements and scraping data before the page has fully loaded, we can take help of the pageLoadStrategy attribute. When Selenium loads a page/URL, by default it uses a pageLoadStrategy of normal. Selenium can start executing the next line of code from a different document readiness state. Currently, Selenium supports three document readiness states, which we can configure through pageLoadStrategy as follows:

1. normal — waits for the complete readiness state (the full page load; this is the default).
2. eager — waits for the interactive readiness state (the DOM is ready, but subresources such as images may still be loading).
3. none — does not wait for any readiness state; get() returns as soon as the initial content is received.

Here is the code block to configure the pageLoadStrategy :

 from selenium import webdriver
 from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

 binary = r'C:\Program Files\Mozilla Firefox\firefox.exe'
 caps = DesiredCapabilities().FIREFOX
 # caps["pageLoadStrategy"] = "normal"  # complete
 caps["pageLoadStrategy"] = "eager"     # interactive
 # caps["pageLoadStrategy"] = "none"    # undefined
 driver = webdriver.Firefox(capabilities=caps, firefox_binary=binary,
                            executable_path="C:\\Utility\\BrowserDrivers\\geckodriver.exe")
 driver.get("https://google.com")


Selenium python: How to stop page loading when the head/title gets loaded?

I want a script that scrapes the titles of a list of URLs, but it could be super slow if we need to wait until the whole page loads. The title is the only thing I am looking for. Can we stop the page loading once the title is loaded, maybe with something like EC.title_contains?

2 Answers 2

Have you tried setting up pageLoadStrategy in your Selenium capabilities?

You can read more about it in this thread — Page load strategy for Chrome driver

http://webdriver.io/guide/getstarted/configuration.html — a bit outdated; it says this is available only for Firefox, but it should be available for all browsers by now.

The problem is that webdriver.io, for example, waits until the page has fully loaded and the loading spinner in the tab is gone. This is for a good reason: a lot of APIs like .getText do not work until the complete page is loaded, because sometimes an element is only loaded at the very end. But you can reduce the loading time in the following ways:

1. Use an extension like ScriptSafe or another simple script blocker that blocks everything with JavaScript, inline or external.

2. Go to the Chrome settings and disable everything: cookies, JavaScript, Flash, etc.

3. Go to chrome://flags and disable everything from JavaScript APIs (the Gamepad API, etc.) to WebGL, Canvas, and so on. You can really disable everything; I also have a Chrome profile where I disabled everything.

Now, with normal internet speed and a good CPU, you can open every site in 1-3 seconds.

Alternatively, you can try a headless browser.


stop page loading with Selenium Webdriver

At this time, my script checks multiple URLs for whether 5 different types of keywords are present in the webpage. Depending on whether each keyword is found or not, it outputs "ok" or "no". I use set_page_load_timeout(30) to avoid an infinite load of a URL. Problem: some webpages don't load fully before the timeout (even if it's a "very" long timeout). But I can see visually (not headless) that the page is loaded. At the very least it could check the keywords in the webpage, but it doesn't; after the timeout it displays "Fail", and the scrape saying "no" doesn't make it into the final output. So I don't want to just hit an except after 30 seconds; I want to stop loading the page after 30 seconds and take whatever can be taken. My code:

 # coding=utf-8
 import re
 import sys
 import csv
 import urllib.parse
 from datetime import datetime
 from datetime import date
 from selenium import webdriver
 from selenium.webdriver.support.ui import WebDriverWait
 from selenium.webdriver.support import expected_conditions as EC
 from selenium.webdriver.chrome.options import Options

 sites = []
 keywords_1 = []
 keywords_2 = []
 keywords_3 = []
 keywords_4 = []
 keywords_5 = []

 def reader3(filename):
     with open(filename, 'r') as csvfile:
         # creating a csv reader object
         csvreader = csv.reader(csvfile)
         # extracting each data row one by one
         for row in csvreader:
             sites.append(str(row[0]).lower())

 try:
     reader3("data/script/filter_domain_OUTPUT.csv")
 except Exception as e:
     print(e)
     sys.exit()

 exc = []

 def reader3(filename):  # note: redefines the reader3 above
     with open(filename, 'r') as csvfile:
         csvreader = csv.reader(csvfile)
         for row in csvreader:
             exc.append(str(row[0]).lower())

 try:
     reader3("data/script/checking_EXCLUDE.csv")
 except Exception as e:
     print(e)
     sys.exit()

 def reader2(filename):
     with open(filename, 'r') as csvfile:
         csvreader = csv.reader(csvfile)
         for row in csvreader:
             keywords_1.append(str(row[0]).lower())
             keywords_2.append(str(row[1]).lower())
             keywords_3.append(str(row[2]).lower())
             keywords_4.append(str(row[3]).lower())
             keywords_5.append(str(row[4]).lower())

 try:
     reader2("data/script/checking_KEYWORD.csv")
 except Exception as e:
     print(e)
     sys.exit()

 chrome_options = Options()
 chrome_options.page_load_strategy = 'none'
 chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])
 chrome_options.add_argument('--no-sandbox')
 chrome_options.add_argument('--lang=en')
 chrome_options.add_argument('--disable-notifications')
 # chrome_options.headless = True
 chrome_options.add_argument('start-maximized')
 chrome_options.add_argument('enable-automation')
 chrome_options.add_argument('--disable-infobars')
 chrome_options.add_argument('--disable-dev-shm-usage')
 chrome_options.add_argument('--disable-browser-side-navigation')
 chrome_options.add_argument('--disable-gpu')

 driver = webdriver.Chrome(options=chrome_options)

 for site in sites:
     try:
         status_1 = "no"
         status_2 = "no"
         status_3 = "no"
         status_4 = "no"
         status_5 = "no"
         now = datetime.now()
         current_time = now.strftime("%H:%M:%S")
         today = date.today()
         print("[" + current_time + "] " + str(site))
         if 'http' in site:
             driver.get(site)
         else:
             driver.get("http://" + site)
         r = str(driver.page_source).lower()
         driver.set_page_load_timeout(30)
         for keyword_1 in keywords_1:
             if keyword_1 in r:
                 status_1 = "ok"
                 print("home -> " + str(keyword_1))
                 break
         for keyword_2 in keywords_2:
             if keyword_2 in r:
                 status_2 = "ok"
                 print("home -> " + str(keyword_2))
                 break
         for keyword_3 in keywords_3:
             if keyword_3 in r:
                 status_3 = "ok"
                 print("home -> " + str(keyword_3))
                 break
         for keyword_4 in keywords_4:
             if keyword_4 in r:
                 status_4 = "ok"
                 print("home -> " + str(keyword_4))
                 break
         for keyword_5 in keywords_5:
             if keyword_5 in r:
                 status_5 = "ok"
                 print("Home ->" + str(keyword_5))
                 break
         with open('data/script/checking_OUTPUT.csv', mode='a') as employee_file:
             employee_writer = csv.writer(employee_file, delimiter=';', quotechar='"',
                                          quoting=csv.QUOTE_MINIMAL, lineterminator='\n')
             write = [site, status_1, status_2, status_3, status_4, status_5]
             employee_writer.writerow(write)
     except Exception as e:
         # driver.delete_all_cookies()
         print("Fail")

 driver.quit()

