Disclaimer: These are the notes of a Celery/RabbitMQ/Linux noob. Not sure how generalizable they are.
A lot of the Django projects I develop have a small number of users and run pretty lightweight tasks, so server costs SHOULD be pretty low. However, they also frequently involve asynchronous, “long-running” tasks, such as creating reports. One common way to handle these tasks is with Celery and RabbitMQ.
For my needs, Celery and RabbitMQ are massive overkill. Despite that, I use them because they are well documented and very stable. There are a couple of promising lighter-weight alternatives, but neither one is beyond v0.5 yet. In other projects, I have used a combination of cron and custom Django management commands. That has worked well, but it is not as flexible as something like Celery.
The biggest problem with Celery and RabbitMQ is memory use. For my basic Django project, running all by itself in Vagrant, “ps aux” shows these memory values (in percent):
- django – 6%
- gunicorn – 3%
- supervisor – 3%
- postgres – 6 processes – 6%
- rabbitmq beam.smp – 8%
- other rabbitmq – 4 processes – negligible
- celery beat – 8%
- celery workers – 3 processes – 24%
This is with no tasks running or web requests coming in. It seems crazy to me to devote 40% of memory to Celery and friends. Maybe I can do some tuning.
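To see where the memory is going, you can sort the `ps` output by memory and sum the `%MEM` column per service. This is a sketch assuming GNU `ps` (standard on Linux); the `celery` filter pattern is just an example and would be adjusted per process name.

```shell
# Show the top memory consumers, highest %MEM first
ps aux --sort=-%mem | head -n 10

# Sum the %MEM column (field 4) across all Celery processes,
# excluding the awk process itself from the match
ps aux | awk '/celery/ && !/awk/ {total += $4} END {printf "celery total: %.1f%%\n", total}'
```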
I am not sure why I have 3 Celery workers. I am guessing one is enough. Here is my command to start Celery:
celery worker -A my_app -E -l info --concurrency=2
That command is what gave 3 processes. When I set concurrency to 1, I got 2 processes. I also got two processes with:
celery worker -A my_app -E -l info --autoscale=3,1
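The extra process is explained by Celery's default prefork pool: the worker spawns one parent (master) process plus one child per concurrency slot, so `--concurrency=2` yields 3 processes and `--concurrency=1` yields 2. You can verify the count yourself:

```shell
# Count Celery worker processes; the [c] bracket trick keeps the
# grep process itself out of the results
ps aux | grep '[c]elery worker' | wc -l
```

With `--concurrency=N`, expect this to print N + 1.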
OK, so it looks like I will need to accept 16% of memory for the Celery workers. Maybe I can improve things by switching the broker to Redis?
I installed Redis using the quick start guide and the defaults, and its memory usage is negligible. Should you switch to Redis? It depends: there are other important differences (Google “RabbitMQ vs Redis”). For many of my projects, though, memory use trumps the other differences.
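Swapping the broker is a one-line change on the worker side. A minimal sketch, assuming Redis is running on its default port (6379) and the `redis` Python client is installed (`pip install redis`); `my_app` is the project's Celery app from earlier:

```shell
# Start the worker against a local Redis broker instead of RabbitMQ;
# -b overrides the broker URL configured in the app
celery worker -A my_app -E -l info --concurrency=1 \
    -b redis://localhost:6379/0
```

Setting `broker_url = "redis://localhost:6379/0"` in the Celery config accomplishes the same thing without the command-line flag.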