Headless Chrome with Python and Selenium

In version 59, Google Chrome acquired the option to run headlessly! Here is what works for me (Ubuntu: 14.04, Chrome version: 60.0.3112.78, chromedriver: 2.31):

from selenium import webdriver

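def get_page_html(url, headless=False, screen_shot_path=None):
    """Load url in Chrome and return the page source, optionally saving a screenshot."""
    # Note: newer Selenium releases take options= and a Service object
    # instead of chrome_options= and a positional driver path.
    if headless:
        options = webdriver.ChromeOptions()
        options.add_argument('--headless')
        # Chrome expects the size as width,height
        options.add_argument('--window-size=1200,2900')
        options.add_argument('--disable-gpu')
        selenium = webdriver.Chrome("/usr/local/share/chromedriver", chrome_options=options)
    else:
        selenium = webdriver.Chrome("/usr/local/share/chromedriver")
        selenium.set_window_size(1500, 900)

    try:
        selenium.get(url)
        page_html = selenium.page_source
        if screen_shot_path:
            selenium.save_screenshot(screen_shot_path)
    finally:
        # Always shut the browser down, even if the page load fails
        selenium.quit()
    return page_html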

Make sure your version of chromedriver is up to date. Also, notice that options.add_argument() expects each argument exactly as you would pass it to google-chrome on the command line (i.e. prefixed with --).
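Since the flags are plain command-line strings, it can help to build them in one place before handing them to options.add_argument(). Here is a minimal sketch of that idea. The helper name build_chrome_args and its parameters are my own invention, not part of Selenium, and the size is written as width,height, the form Chrome's switch list documents.

```python
def build_chrome_args(headless=True, width=1200, height=2900):
    """Return Chrome command-line flags as a list of strings.

    Hypothetical helper for illustration; not part of Selenium.
    """
    args = []
    if headless:
        args.append('--headless')
        args.append('--disable-gpu')
    # Chrome's --window-size switch takes "width,height".
    args.append('--window-size={},{}'.format(width, height))
    return args


if __name__ == '__main__':
    # Each flag would be passed to options.add_argument() unchanged.
    for flag in build_chrome_args():
        print(flag)
```

Each string in the returned list can be passed to options.add_argument() as-is.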

Daemonizing Django-RQ using Supervisor

I was trying to set up a task to be run from Django-RQ. The task involved scraping a webpage using Selenium and Google Chrome. It worked great in development, but not in production. The error message indicated that there were problems starting Chrome.

One big difference between dev and production was that in production I was daemonizing Django-RQ using Supervisor. Some queued tasks would run, just not the ones involving Selenium. The clue came when I stopped Django-RQ using supervisorctl and then started it from the command line. Now the Selenium tasks worked.

I tracked the problem down by adding this snippet to the top of the task that used Selenium:


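import os
import json

# Dump the worker's PATH so it can be compared with the shell's PATH
with open('debug_file.json', 'w') as debug_file:
    json.dump(os.environ['PATH'].split(os.pathsep), debug_file)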

This revealed that the environment PATH when running from the command line was much different than the PATH when running Django-RQ from Supervisor. Adding some of those paths to the Supervisor config solved the problem.

[program:django_rq]
command= {{ virtualenv_path }}/bin/python manage.py rqworker high default low
stdout_logfile = /var/log/redis/redis_6379.log

numprocs=1

directory={{ django_manage_path }}
environment = DJANGO_SETTINGS_MODULE="{{ django_settings_import }}",PATH="{{ virtualenv_path }}/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
user = vagrant
stopsignal=TERM

autostart=true
autorestart=true
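To see exactly which directories Supervisor was missing, the two PATH values can be compared directly. This is a rough sketch rather than what I actually ran. The function names and the sample PATH strings are made up for illustration.

```python
import os


def split_path(path_value, sep=os.pathsep):
    """Split a PATH string into its directories, dropping empty entries."""
    return [entry for entry in path_value.split(sep) if entry]


def missing_entries(shell_path, daemon_path, sep=os.pathsep):
    """Return directories in shell_path that are absent from daemon_path, in order."""
    daemon_dirs = set(split_path(daemon_path, sep))
    return [d for d in split_path(shell_path, sep) if d not in daemon_dirs]


if __name__ == '__main__':
    # Hypothetical values; substitute the PATH dumped from each environment.
    shell = '/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin'
    daemon = '/usr/bin:/bin'
    for directory in missing_entries(shell, daemon, sep=':'):
        print(directory)
```

The directories it prints are the candidates to add to the PATH entry in the Supervisor environment line.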

Fixing Hg Mistakes

Removing Pushes from Other Branches

I often create a branch by cloning. I do more involved or experimental things in the clone, while the parent branch stays available for small updates. As code in the branch progresses, I do commits. And every once in a while, by accident, I do a commit and push. This puts the experimental code in the main branch before it's ready. To remove the pushed changeset, cd into main and do:

hg strip -r REV

Django HTTP 404 Not Found Error

So you’ve got everything working on the Django dev server. Almost everything is working on the staging server… except one URL that sometimes fails with an HTTP 404 Not Found error. What is going on here?

More details: it happens to be a Django form page. The page loads without a problem. The problem occurs after you submit the form. The URL that causes the 404 error is the URL of the form. What is going on here?

In my case, form_valid() was doing a lot of processing of the form data and the server was timing out. Maybe part of the reason I did not catch this earlier is that my dev server is much faster than the DigitalOcean droplet I was using.
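One way to confirm a suspicion like this is to log how long the processing takes on each server. Below is a minimal sketch of a timing decorator. The names log_duration and process_form_data are hypothetical and not from my project, and the sleep only stands in for the real work.

```python
import functools
import logging
import time

logger = logging.getLogger(__name__)


def log_duration(func):
    """Log how long each call to func takes, in seconds."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return func(*args, **kwargs)
        finally:
            # Runs whether func returned or raised.
            elapsed = time.monotonic() - start
            logger.warning('%s took %.2f s', func.__name__, elapsed)
    return wrapper


@log_duration
def process_form_data(cleaned_data):
    # Hypothetical stand-in for the slow work done inside form_valid().
    time.sleep(0.1)
    return len(cleaned_data)
```

In a Django view the same decorator could wrap form_valid() itself. Comparing the logged times between dev and staging should show whether the slower machine is getting close to the server's timeout.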

Executing Multiple SSH Commands Using Python

There are many ways to do this. If you only want to run a few commands, then the Python subprocess module might be best. If you are only working in Python 2.X, then Fabric might be best. Since I wanted to support both 2.X and 3.X and to run lots of commands, I went with Paramiko. Here is the solution:

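import paramiko

IDENTITY = 'path to private key'
REMOTE = 'url of remote'
# Placeholder commands; replace with the ones you need to run
commands = ['uname -a', 'uptime']

k = paramiko.RSAKey.from_private_key_file(IDENTITY)
with paramiko.SSHClient() as ssh:
    # AutoAddPolicy trusts unknown hosts; fine for a Vagrant box, risky elsewhere
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(REMOTE, username='vagrant', pkey=k)
    for command in commands:
        stdin, stdout, stderr = ssh.exec_command(command)
        # Reading to EOF waits for the command to finish before the next one starts
        output = stdout.read().decode()
        exit_status = stdout.channel.recv_exit_status()
        print('{} exited with {}:\n{}'.format(command, exit_status, output))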