
foo.bar.tasks.my_task fails with "No module named bar.tasks" when running in AWS / Elastic Beanstalk #105

Closed
kdmukai opened this issue Oct 29, 2015 · 5 comments


kdmukai commented Oct 29, 2015

So now I've got qcluster happily running under supervisord both in local dev and on Elastic Beanstalk (EB). Local testing works great.

I have a simple test task:

def task_test(user):
    logger.debug("Hello, from the task!!")

And I can make an async call on it in local dev:

async('myapp.member.tasks.task_test', request.user)

And it runs fine:

22:15:51 [Q] INFO Process-1:1 processing [colorado-eleven-lima-skylark]
2015-10-28 22:15:51,112 DEBUG    myapp.member.tasks:task_test(9): Hello, from the task!!
22:15:51 [Q] INFO Processed [colorado-eleven-lima-skylark]

But up on Elastic Beanstalk something strange is going on with traversing the app structure:

21:23:04 [Q] INFO Process-1:2 processing [potato-mars-connecticut-ink]
21:23:04 [Q] ERROR Failed [potato-mars-connecticut-ink] - No module named member.tasks

I also tried passing the function directly:

from myapp.member.tasks import task_test
async(task_test, request.user)

But I end up with a similar error:

22:09:07 [Q] INFO Process-1:10 pushing tasks at 3695
Process Process-1:10:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/python/run/venv/local/lib/python2.7/site-packages/django_q/cluster.py", line 300, in pusher
    task = signing.SignedPackage.loads(task[1])
  File "/opt/python/run/venv/local/lib/python2.7/site-packages/django_q/signing.py", line 31, in loads
    serializer=PickleSerializer)
  File "/opt/python/run/venv/local/lib/python2.7/site-packages/django/core/signing.py", line 145, in loads
    return serializer().loads(data)
  File "/opt/python/run/venv/local/lib/python2.7/site-packages/django_q/signing.py", line 44, in loads
    return pickle.loads(data)
ImportError: No module named member.tasks

Same problem if I do a relative import:

from .tasks import task_test
async(task_test, request.user)

The EB supervisord.conf is straightforward:

[program:qcluster]
command=/opt/python/run/venv/bin/python manage.py qcluster
numprocs=1
directory=/opt/python/current/app/myapp
environment=$djangoenv

($djangoenv injects the environment variables elsewhere in the deploy script; it didn't make a difference with or without them):

# Convert the exported vars in /opt/python/current/env into supervisord's comma-separated environment= format.
djangoenv=`cat /opt/python/current/env | tr '\n' ',' | sed 's/export //g' | sed 's/$PATH/%(ENV_PATH)s/g' | sed 's/$PYTHONPATH//g' | sed 's/$LD_LIBRARY_PATH//g'`
# Strip the trailing comma.
djangoenv=${djangoenv%?}

I also tried SSHing into EB and doing a manual async call through the manage.py shell, but saw the same errors in the qcluster logs.
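
For reference, the manual test looked roughly like this (the venv path and the user lookup are specific to my setup):

$ /opt/python/run/venv/bin/python manage.py shell
>>> from django_q.tasks import async
>>> from django.contrib.auth.models import User
>>> async('myapp.member.tasks.task_test', User.objects.first())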

Nothing crazy about the project structure:

approot
|
+---myapp
|    +---member
|    |    +---__init__.py
|    |    +---tasks.py
|    |    +---urls.py
|    |    +---views.py
|    +---__init__.py
|    +---admin.py
|    +---forms.py
|    +---models.py
|    +---settings.py
|    +---urls.py
|    +---views.py
|    +---wsgi.py
+---manage.py

Running on Python 2.7.9.

This seems most likely to be something with the EB environment, but let me know if you have any ideas. I'm all out at this point!


kdmukai commented Oct 29, 2015

Did another test: I tried moving the tasks.py up to the app root so the call is now:

async('myapp.tasks.test_task', request.user)

And the error message:

00:04:42 [Q] INFO Process-1:2 processing [snake-nineteen-artist-berlin]
00:04:42 [Q] ERROR Failed [snake-nineteen-artist-berlin] - No module named tasks

Also tried moving the task function into views.py:

async('myapp.views.test_task', request.user)

Same problem:

00:15:22 [Q] INFO Process-1:2 processing [nevada-princess-friend-ack]
00:15:22 [Q] ERROR Failed [nevada-princess-friend-ack] - No module named views

I'm stumped.


Koed00 commented Oct 29, 2015

You will need exactly the same environment variables as you have on your web server, most importantly the DJANGO_SETTINGS_MODULE=MyProject.settings part.

You could try running python manage.py check to see if Django gives you any errors.
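
For example, something along these lines in the supervisord program block would pin it (the settings module and PYTHONPATH values below are placeholders; substitute your own project's):

[program:qcluster]
command=/opt/python/run/venv/bin/python manage.py qcluster
directory=/opt/python/current/app/myapp
; Pin the same settings module and import root the web server uses
; (placeholder values; use your project's own).
environment=DJANGO_SETTINGS_MODULE="myapp.settings",PYTHONPATH="/opt/python/current/app"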

kdmukai pushed a commit to kdmukai/django-q that referenced this issue Oct 29, 2015
Explicitly copies the environment from the Cluster into the Sentinel
and all worker processes.

Fixes Koed00#105

Koed00 commented Oct 30, 2015

I'm not convinced yet about this pull request. It feels like quite a big change for something that might not even be a real issue. I administer several Heroku deployments and Digital Ocean, Docker, Docker Compose, and Amazon ECS setups with multiple web instances and redundant worker clusters, and I've never seen this import problem before. If we spend a little time on it, we can probably fix your problem without a pull request. The reasoning is that the environment does not need to be copied to the individual worker processes: each is a complete fork of the spawning process and even uses the same memory space, so they run in the exact same environment.
If the imports are failing, it is because the environment at the root of the cluster has not been set up properly, not because it is not propagated to the child processes.
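
As a minimal illustration (not django-q code, just the standard library): a worker process started with multiprocessing sees whatever environment the parent had at that point, without anything being copied explicitly:

import os
from multiprocessing import Process

def worker():
    # The child process inherits whatever the parent set before it was started.
    print("worker sees:", os.environ.get("DJANGO_SETTINGS_MODULE"))

if __name__ == "__main__":
    os.environ["DJANGO_SETTINGS_MODULE"] = "myapp.settings"
    p = Process(target=worker)
    p.start()
    p.join()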


kdmukai commented Oct 30, 2015

Yes, you were right. My fix was not actually the solution and the pull request should be deleted. But the good news is that I now have a workaround!

Here's a better understanding of what seems to be happening:

  • I deploy an update to Elastic Beanstalk.
  • It seems to load everything in an isolated environment (this makes sense; the server is still actively serving the previous code).
  • As part of the loading process, supervisorctl re-creates the qcluster. However, the qcluster initializes itself against the LIVE app environment, not the new isolated environment.
  • When the new environment is ready, the existing environment is destroyed and the new one takes over.
  • At this point the qcluster seems to lose its ability to reference my app and I get the "No module named" errors.

So, weird as it sounds, when the original environment is destroyed, qcluster acts as if it can no longer find my app, even though the new app is now serving exactly where the old one used to be! Perhaps it's some internal EB symlinking that's leaving the qcluster pointing to the old code that has since been deleted.

I've confirmed that if I manually kill the qcluster after the new code is deployed, everything works as expected when it respawns. This is how I inadvertently tricked myself into thinking that my os.environ changes had helped; it wasn't my changes, it was all the manual killing/respawning I was doing during testing.

My workaround:

  • In EB's post-deploy hook (the new code is now LIVE), I kill the qcluster and let supervisord regenerate it. Because the new code is now the live version, the qcluster will properly point to it and it's able to complete tasks as expected.
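
Roughly what the hook looks like (the file name and paths are from my setup and may differ across EB platform versions):

#!/usr/bin/env bash
# e.g. /opt/elasticbeanstalk/hooks/appdeploy/post/99_restart_qcluster.sh
# At this point the new code is live, so the respawned qcluster picks it up.
supervisorctl -c /opt/python/etc/supervisord.conf restart qcluster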


Koed00 commented Oct 31, 2015

I've been reading up on supervisor, because I've mostly been using Mozilla's Circus. I think your thoughts on the situation are correct. According to supervisor's docs, it doesn't like daemonizing or forking processes. It actually modifies the environment with some of its own settings, and I'm hypothesizing this might be happening in a way that leads to the problems you're seeing.

Btw, you always need to restart the clusters after deploying new code, because you might be trying to queue tasks that don't exist yet in the cluster's copy of your Django project. I've just never seen the environment completely disappear. Good stuff to learn about, though.

kdmukai closed this as completed Dec 7, 2015