Headless Chrome with Python and Selenium

Starting with version 59, Google Chrome can run headlessly! Here is what works for me (Ubuntu 14.04, Chrome 60.0.3112.78, chromedriver 2.31):

from selenium import webdriver

def get_page_html(url, headless=False, screen_shot_path=None):
    if headless:
        options = webdriver.ChromeOptions()
        options.add_argument('--headless')
        options.add_argument('--window-size=1200,2900')
        options.add_argument('--disable-gpu')
        selenium = webdriver.Chrome("/usr/local/share/chromedriver", chrome_options=options)
    else:
        selenium = webdriver.Chrome("/usr/local/share/chromedriver")
        selenium.set_window_size(1500, 900)

    selenium.get(url)
    page_html = selenium.page_source
    if screen_shot_path:
        selenium.save_screenshot(screen_shot_path)

    selenium.quit()
    return page_html

Make sure your version of chromedriver is up to date. Also, note that options.add_argument() expects each argument exactly as you would pass it to google-chrome on the command line (e.g. prefixed with --).
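
To make the command-line form concrete, here is a minimal sketch of a helper that builds the same switches as a list of strings. The name build_chrome_args and its parameters are hypothetical and not part of Selenium; each returned string is meant to be passed to options.add_argument().

```python
def build_chrome_args(headless=True, width=1200, height=2900):
    """Return Chrome switches as they would be typed on the command line.

    Hypothetical helper for illustration; it is not part of Selenium.
    """
    args = []
    if headless:
        # Mirrors the switches used in get_page_html() above.
        args.append('--headless')
        args.append('--disable-gpu')
    # Chrome documents this switch as width,height (comma-separated).
    args.append('--window-size={},{}'.format(width, height))
    return args


if __name__ == '__main__':
    for arg in build_chrome_args():
        print(arg)
```

Keeping the switches in one list makes it easy to check that every entry starts with -- before handing it to Chrome.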

Daemonizing Django-RQ using Supervisor

I was trying to set up a task to be run from Django-RQ. The task involved scraping a webpage using Selenium and Google Chrome. It worked great in development, but not in production. The error message indicated that there were problems starting Chrome.

One big difference between dev and production was that in production I was daemonizing Django-RQ using Supervisor. Some queued tasks would run, just not the ones involving Selenium. The clue came when I stopped Django-RQ using supervisorctl and then started it from the command line. Now the Selenium tasks worked.

I tracked down the problem by adding this snippet to the top of the task that used Selenium:


import os
import json

# Use text mode: json.dump() writes str, which fails on a binary file in Python 3.
with open('debug_file.json', 'w') as f:
    json.dump(os.environ['PATH'].split(':'), f)

This revealed that the PATH seen when running from the command line was quite different from the one seen when running Django-RQ under Supervisor. Adding some of the missing paths to the Supervisor config solved the problem.
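
The comparison itself can be done by eye, but a small script helps when the lists are long. This is only a sketch: the function names are hypothetical, and it assumes you saved one dump per way of starting the worker under two different file names.

```python
import json


def load_path_entries(filename):
    """Load a list of PATH entries previously written with json.dump()."""
    with open(filename) as f:
        return json.load(f)


def missing_entries(shell_path, daemon_path):
    """Return entries present in shell_path but absent from daemon_path.

    The original order of shell_path is preserved.
    """
    daemon_set = set(daemon_path)
    return [entry for entry in shell_path if entry not in daemon_set]
```

Run the task once from the shell and once under Supervisor, load both dumps with load_path_entries(), and pass the two lists to missing_entries(). Whatever it returns is a candidate for the PATH in the Supervisor config.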

[program:django_rq]
command= {{ virtualenv_path }}/bin/python manage.py rqworker high default low
stdout_logfile = /var/log/redis/redis_6379.log

numprocs=1

directory={{ django_manage_path }}
environment = DJANGO_SETTINGS_MODULE="{{ django_settings_import }}",PATH="{{ virtualenv_path }}/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
user = vagrant
stopsignal=TERM

autostart=true
autorestart=true

Logging a Single Module in Python

Sometimes third-party modules have logging turned on, and their output drowns out your own. I run into this when using Selenium for testing.

If you do not want to see any logger output, you can put this at the top of your module:

import logging
logging.disable(logging.CRITICAL)

That works fine, until you want to turn on logging for the module you are working on. Here is code that turns off all other loggers. The last line sends a logger message for this module to the console.

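import logging

# The root logger defaults to WARNING; lower it or the debug call below prints nothing.
logging.basicConfig(level=logging.DEBUG)

# Silence every logger that has been registered so far.
for logger_name in list(logging.Logger.manager.loggerDict):
    logging.getLogger(logger_name).setLevel(logging.CRITICAL)

logging.debug('hello from my module')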
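
A variation on the same idea, as a minimal sketch: give your module its own named logger and silence everything else. The helper name silence_other_loggers and the logger names 'my_module' and 'noisy_library' are hypothetical, chosen only for illustration.

```python
import logging


def silence_other_loggers(keep_name, level=logging.CRITICAL):
    """Raise the threshold of every registered logger except keep_name.

    Children of keep_name (e.g. 'my_module.sub') are left alone too.
    Only loggers that already exist are affected.
    """
    for name in list(logging.Logger.manager.loggerDict):
        if name == keep_name or name.startswith(keep_name + '.'):
            continue
        logging.getLogger(name).setLevel(level)


# Stand-in for a logger created by a third-party package.
logging.getLogger('noisy_library')

my_logger = logging.getLogger('my_module')
my_logger.setLevel(logging.DEBUG)
my_logger.addHandler(logging.StreamHandler())  # writes to stderr

silence_other_loggers('my_module')

my_logger.debug('hello from my module')
logging.getLogger('noisy_library').debug('this message is suppressed')
```

Because the module logs through its own logger, the root logger's level does not need to change.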

Django Tests, Selenium, Ajax and PyCharm

If you need to test Django code that involves Ajax, Selenium is the way to go. If you use PyCharm, you probably use the debugger all the time. In fact, you might be inclined to put some breakpoints in the Ajax callbacks and inspect some variables when execution stops at those breakpoints. All pretty straightforward… except if your callback accesses the database.

When the debugger stops in the callback, it turns out it cannot access the test database. If your callback accesses the database, then what the debugger is showing you is not what is actually happening. I ended up solving the problem I was having by putting print statements in the callback.

Selenium and Firefox Problems

I just did a software update on my Ubuntu 14.04 development machine. That update included updating Firefox to 47.0. This caused my tests to fail with the message:

Traceback (most recent call last):
  File "/home/chuck/sqdb/django_fuller_calendar/django_fuller_calendar/tests/tests_selenium.py", line 23, in setUp
    self.browser = webdriver.Firefox()
  File "/home/chuck/.virtualenvs/dfc/local/lib/python2.7/site-packages/selenium/webdriver/firefox/webdriver.py", line 81, in __init__
    self.binary, timeout)
  File "/home/chuck/.virtualenvs/dfc/local/lib/python2.7/site-packages/selenium/webdriver/firefox/extension_connection.py", line 51, in __init__
    self.binary.launch_browser(self.profile, timeout=timeout)
  File "/home/chuck/.virtualenvs/dfc/local/lib/python2.7/site-packages/selenium/webdriver/firefox/firefox_binary.py", line 68, in launch_browser
    self._wait_until_connectable(timeout=timeout)
  File "/home/chuck/.virtualenvs/dfc/local/lib/python2.7/site-packages/selenium/webdriver/firefox/firefox_binary.py", line 98, in _wait_until_connectable
    raise WebDriverException("The browser appears to have exited "
WebDriverException: Message: The browser appears to have exited before we could connect. If you specified a log_file in the FirefoxBinary constructor, check it for details.

I am using selenium 2.53.5.

After extensive googling, the most useful page was: https://github.com/SeleniumHQ/selenium/issues/2110

On that page, this comment was most useful:

I strongly suggest people move over to Marionette

Following the instructions on that page, I was able to get the Python demo code at the bottom of that page to work. I needed to rename the file to “wires” as mentioned in the text. Putting the file in /usr/bin made it available everywhere I need it.

I should mention that my selenium tests are now failing because I am using the command:

action.move_to_element_with_offset

Earlier today, I tried switching to the Chromium browser. It failed on the same command.

Rolling Back Firefox

The solution above did not work for me because I could not find a workaround for the “move_to” commands. So I rolled back Firefox as a temporary solution, using the instructions found here: http://askubuntu.com/questions/661186/how-to-install-previous-firefox-version I was able to find the version I wanted here: https://sourceforge.net/projects/ubuntuzilla/files/mozilla/apt/pool/main/f/firefox-mozilla-build/

Django, Selenium Headless Tests Fail

Running headless Selenium tests is great, except when the tests succeed when headed and then fail when headless. When that happens it’s hard not to let your imagination run wild with crazy JavaScript thoughts.

There are many ways to run headless. The method I use is described in this post. In a sense this is not headless because a normal browser is running; it’s just running on a hidden display. This makes the headless failures all the more perplexing.

In this case, the problem was that the headless browser window was a different size from the headed one. I use the Chosen widget a lot. It’s a fancy combo of an HTML select widget and auto-complete. Clicking on this widget creates a pop-up drop-down. If the needed choice is outside of the browser window, Selenium auto-scrolls the page so that the choice is visible. When it decides to scroll depends on the browser window dimensions.

In many cases, this auto-scroll would not create problems. In my case, I also have a nav bar locked to the top of the screen. Sometimes the desired choice would end up under the nav bar after the auto-scroll. When Selenium moved to select the choice, it would get the nav bar instead. The form value would not be set and the form would fail. Although it did not happen to me, I suppose it’s possible Selenium could have ended up on a different page. That could cause some strange errors.

The first part of the fix is to explicitly set the browser window so the headed and headless versions are the same. Something like this:

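# Inside setUp(). Assumes these module-level imports:
#   from django.conf import settings
#   from pyvirtualdisplay import Display
#   from selenium import webdriver
if getattr(settings, 'HEADLESS_TESTS', False):
    self.vdisplay = Display(visible=False, size=(1600, 1000))
    self.vdisplay.start()
self.selenium = webdriver.Firefox()
self.selenium.set_window_size(1500, 900)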

If all the problems go away, you are done. If not, then hopefully the headed tests and headless tests will now fail in the same way, making debugging easier.

In my case, I added code to scroll the page if the item was under the nav bar. Here is the code.
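
The linked code is not reproduced in this text. As a rough sketch of the idea (not the original implementation), the following assumes you know the nav bar height in pixels and have a Selenium driver and element in hand. The function names, the navbar_height parameter and the margin default are all hypothetical.

```python
def scroll_needed(element_top, navbar_height, margin=10):
    """Return how many pixels to scroll up so the element clears the nav bar.

    element_top is the element's top edge relative to the viewport, as
    reported by getBoundingClientRect(). A result of 0 means the element
    is already below the nav bar.
    """
    clearance = navbar_height + margin
    if element_top >= clearance:
        return 0
    return clearance - element_top


def scroll_clear_of_navbar(driver, element, navbar_height, margin=10):
    """Scroll the page, if needed, so element is not hidden under the nav bar."""
    top = driver.execute_script(
        'return arguments[0].getBoundingClientRect().top;', element)
    offset = scroll_needed(top, navbar_height, margin)
    if offset:
        # A negative y scrolls the page up, moving the element down on screen.
        driver.execute_script('window.scrollBy(0, arguments[0]);', -offset)
    return offset
```

Calling scroll_clear_of_navbar() just before clicking the choice keeps the click from landing on the nav bar. Keeping the arithmetic in scroll_needed() means it can be checked without a browser.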

Django Headless Testing: Address already in use Error

Here is a great post for how to run Django tests headless. It uses xvfb.

Sometimes after a test crashes, the next time you run a test, you will get the error:

Traceback (most recent call last):
  File "/home/chuck/.virtualenvs/qdb7/local/lib/python2.7/site-packages/django/test/testcases.py", line 1189, in setUpClass
    raise cls.server_thread.error
error: [Errno 98] Address already in use

To get rid of this error, find the Xvfb process that was left running after the crash:

ps aux | grep Xv

Then run kill on the pid.
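
If this happens often, the same two steps can be scripted. The sketch below assumes the usual eleven-column `ps aux` layout (USER, PID, ..., COMMAND). The function names are hypothetical, and kill_stale_xvfb() sends SIGTERM to every Xvfb process it finds, so only use it if no other virtual displays need to stay up.

```python
import os
import signal
import subprocess


def find_xvfb_pids(ps_output):
    """Return the PIDs of Xvfb processes listed in `ps aux` output."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)        # 11th field is the full command line
        if len(fields) < 11:
            continue
        executable = fields[10].split()[0]
        # Compare the program name only, so '/usr/bin/Xvfb' matches
        # and 'grep Xv' does not.
        if executable.rsplit('/', 1)[-1] == 'Xvfb':
            pids.append(int(fields[1]))
    return pids


def kill_stale_xvfb():
    """Send SIGTERM to every running Xvfb process and return their PIDs."""
    output = subprocess.check_output(['ps', 'aux']).decode()
    pids = find_xvfb_pids(output)
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
    return pids
```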