Upgrade from 9.0.1 to 9.1.0 breaks the system #5530

Closed
ilijamt opened this issue Dec 17, 2019 · 14 comments

@ilijamt
Contributor

ilijamt commented Dec 17, 2019

ISSUE TYPE
  • Bug Report
SUMMARY

Upgrading from 9.0.1 to 9.1.0 breaks the system.

ENVIRONMENT
  • AWX version: 9.0.1 to 9.1.0
  • AWX install method: k8s
STEPS TO REPRODUCE

Upgrade from 9.0.1 to 9.1.0 using the ansible-playbook installer.

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
psycopg2.errors.UndefinedColumn: column main_projectupdate.job_tags does not exist
LINE 1: ...e"."project_id", "main_projectupdate"."job_type", "main_proj...
                                                             ^
HINT:  Perhaps you meant to reference the column "main_projectupdate.job_type".


The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/dispatch/worker/task.py", line 86, in perform_work
    result = self.run_callable(body)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/dispatch/worker/task.py", line 62, in run_callable
    return _call(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/scheduler/tasks.py", line 19, in run_task_manager
    TaskManager().schedule()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/scheduler/task_manager.py", line 643, in schedule
    self._schedule()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/scheduler/task_manager.py", line 605, in _schedule
    all_sorted_tasks = self.get_tasks()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/scheduler/task_manager.py", line 69, in get_tasks
    project_updates = [p for p in ProjectUpdate.objects.filter(status__in=status_list, job_type='check').prefetch_related('instance_group')]
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/query.py", line 274, in __iter__
    self._fetch_all()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/query.py", line 1242, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/polymorphic/query.py", line 56, in _polymorphic_iterator
    o = next(base_iter)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/query.py", line 55, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/sql/compiler.py", line 1100, in execute_sql
    cursor.execute(sql, params)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/utils.py", line 76, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.ProgrammingError: column main_projectupdate.job_tags does not exist
LINE 1: ...e"."project_id", "main_projectupdate"."job_type", "main_proj...
EXPECTED RESULTS

Upgrade to finish successfully and without an issue.

ACTUAL RESULTS

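The system is broken after the upgrade: the task manager fails with the ProgrammingError shown above (column main_projectupdate.job_tags does not exist).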

ADDITIONAL INFORMATION
@shanemcd
Member

Can you confirm that you followed the steps here: https://github.com/ansible/awx/blob/devel/INSTALL.md#upgrading-from-previous-versions

I am unable to reproduce.

In the installer logs, under Migrate database, you should see something like:

Operations to perform:
  Apply all migrations: auth, conf, contenttypes, main, oauth2_provider, sessions, sites, social_django, sso, taggit
Running migrations:
  Applying main.0099_v361_license_cleanup... OK
  Applying main.0100_v370_projectupdate_job_tags... OK
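To check a saved installer log for those lines, a minimal Python sketch could scan for the `Applying ... OK` format shown above. The function name `applied_migrations` and the sample log text are illustrative assumptions, not part of AWX or its installer.

```python
import re

# Matches lines such as "  Applying main.0100_v370_projectupdate_job_tags... OK"
APPLIED = re.compile(r"^\s*Applying\s+(\w+)\.(\w+)\.\.\.\s+OK\s*$")


def applied_migrations(log_text):
    """Return (app_label, migration_name) pairs reported as applied in a migrate log."""
    found = []
    for line in log_text.splitlines():
        match = APPLIED.match(line)
        if match:
            found.append((match.group(1), match.group(2)))
    return found


# Sample text copied from the expected installer output above.
sample_log = """\
Running migrations:
  Applying main.0099_v361_license_cleanup... OK
  Applying main.0100_v370_projectupdate_job_tags... OK
"""

migrations = applied_migrations(sample_log)
# True when the migration that adds job_tags was reported as applied.
job_tags_applied = ("main", "0100_v370_projectupdate_job_tags") in migrations
print(job_tags_applied)
```

If the `0100_v370_projectupdate_job_tags` entry is missing from the log, the `job_tags` column was likely never created, which would match the error in the report.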

@ilijamt
Contributor Author

ilijamt commented Dec 17, 2019

Yes I did.

I found a way to fix it, though. After spinning the whole cluster up, I ran:

$ docker exec -it <task> bash
$ awx-manage migrate

After this everything was good.

I don't deploy on OpenShift but on self-hosted K8s, and I noticed that the awx database is owned by the postgres user rather than by awx. All the tables are owned by awx; only the database itself is not.

A new deployment also failed: the Get Kubernetes API version task calls the version endpoint and gets this response:

{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {
    
  },
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}

I temporarily modified the playbook to use the snippet below instead of the original:

- name: Get Kubernetes API version
  command: |
    {{ kubectl_or_oc }} version -o json
  register: kube_version

- name: Extract server version from command output
  set_fact:
    kube_api_version: "{{ (kube_version.stdout | from_json).serverVersion.gitVersion[1:] }}"

After that, it just hangs on TASK [kubernetes : Migrate database], and I cannot continue.

I don't have this problem when running the installation for 9.0.1.
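For anyone checking what the modified task extracts, the Jinja expression in the snippet above can be mirrored in a few lines of Python. This is only a sketch: the function name `server_version` and the trimmed sample JSON are assumptions for illustration, and real `kubectl version -o json` output carries more fields.

```python
import json


def server_version(kubectl_version_json):
    """Return the server gitVersion without its leading "v"."""
    data = json.loads(kubectl_version_json)
    # Equivalent of: (stdout | from_json).serverVersion.gitVersion[1:]
    return data["serverVersion"]["gitVersion"][1:]


# Hypothetical, trimmed sample of `kubectl version -o json` output.
sample = (
    '{"clientVersion": {"gitVersion": "v1.17.0"},'
    ' "serverVersion": {"gitVersion": "v1.16.3"}}'
)
print(server_version(sample))
```

If the cluster is unreachable, the `serverVersion` key may be absent, in which case this sketch raises a `KeyError` instead of returning a version.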

@ikkaro

ikkaro commented Dec 20, 2019

I have exactly the same problem performing the upgrade. The playbook just hangs on:
TASK [kubernetes : Migrate database]

@tamirshaul

tamirshaul commented Dec 24, 2019

I'm running on OpenShift and having the same problem.
The playbook hangs on Migrate database.

I've tried to perform the migration manually through the management pod, but it just hangs.
Is there any solution or hotfix for this problem?

@wbieniek

I'm running on OpenShift as well and having the same problem. Same issue as above: the playbook stops on Migrate database.

@wbieniek

Traceback (most recent call last):
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/dispatch/worker/task.py", line 86, in perform_work
    result = self.run_callable(body)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/dispatch/worker/task.py", line 62, in run_callable
    return _call(*args, **kwargs)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/scheduler/tasks.py", line 19, in run_task_manager
    TaskManager().schedule()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/scheduler/task_manager.py", line 643, in schedule
    self._schedule()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/scheduler/task_manager.py", line 605, in _schedule
    all_sorted_tasks = self.get_tasks()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/awx/main/scheduler/task_manager.py", line 69, in get_tasks
    project_updates = [p for p in ProjectUpdate.objects.filter(status__in=status_list, job_type='check').prefetch_related('instance_group')]
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/query.py", line 274, in __iter__
    self._fetch_all()
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/query.py", line 1242, in _fetch_all
    self._result_cache = list(self._iterable_class(self))
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/polymorphic/query.py", line 56, in _polymorphic_iterator
    o = next(base_iter)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/query.py", line 55, in __iter__
    results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/models/sql/compiler.py", line 1100, in execute_sql
    cursor.execute(sql, params)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/utils.py", line 67, in execute
    return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/utils.py", line 76, in _execute_with_wrappers
    return executor(sql, params, many, context)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/utils.py", line 89, in __exit__
    raise dj_exc_value.with_traceback(traceback) from exc_value
  File "/var/lib/awx/venv/awx/lib64/python3.6/site-packages/django/db/backends/utils.py", line 84, in _execute
    return self.cursor.execute(sql, params)
django.db.utils.ProgrammingError: column main_projectupdate.job_tags does not exist
LINE 1: ...e"."project_id", "main_projectupdate"."job_type", "main_proj...
                                                             ^
HINT:  Perhaps you meant to reference the column "main_projectupdate.job_type".

@smuth4

smuth4 commented Dec 26, 2019

I ran into this problem on vanilla k8s as well, and I suspect the changes from #5239 introduced this issue.
Manually running kubectl -n awx exec -it ansible-tower-management -- bash -c "awx-manage migrate -v 3" produces this output and then hangs:

Operations to perform:
  Apply all migrations: auth, conf, contenttypes, main, oauth2_provider, sessions, sites, social_django, sso, taggit
~~snip~~
Running pre-migrate handlers for application main
2019-12-26 21:26:36,531 DEBUG    awx.main.dispatch publish awx.main.tasks.set_migration_flag(c689cb1b-c7da-46cb-bdc0-f6c90c627ae0, queue=tower_broadcast_all)

IIUC, it's trying to send out a message to RabbitMQ before the migration starts (as per here and here), but the pod is configured to connect to localhost, which doesn't play nice on Kubernetes since the pods have separate network namespaces.

There are two workarounds that I tested:

  • Once awx-0 is running (even if it's not fully functional), you can run the same command through the awx-web container, which does have access to rabbit on localhost:
    kubectl -n awx exec -it awx-0 -c awx-web -- bash -c "awx-manage migrate --noinput". Then you can comment out the Migrate database task in installer/roles/kubernetes/tasks/main.yml to make the playbook functional.
  • Edit the aforementioned credentials.py.j2 to change the rabbit hostname from localhost to rabbitmq.{{ kubernetes_namespace }}.svc, so that k8s's DNS can magically route it to the right IP, and re-run the installation playbook.

The latter method is probably the cleanest and should work long-term in most setups. If people are comfortable with it, I can submit a PR.
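To compare the two hostnames from inside a pod, a rough Python sketch could build the service name from the second workaround and try a plain TCP connection. The helper names `rabbit_service_host` and `can_connect` are made up for illustration, and port 5672 is assumed because it is the default AMQP port; the actual deployment may use a different one.

```python
import socket


def rabbit_service_host(namespace, service="rabbitmq"):
    """Build the in-cluster DNS name from the workaround: <service>.<namespace>.svc."""
    return f"{service}.{namespace}.svc"


def can_connect(host, port=5672, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


host = rabbit_service_host("awx")
print(host)
# Inside the management pod, compare for example:
#   can_connect("localhost") and can_connect(host)
```

A failed connection to localhost alongside a successful one to the service name would be consistent with the separate-network-namespace explanation above, though it only checks TCP reachability and not the broker itself.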

@wbieniek

I was able to upgrade successfully from 9.0.1 to 9.1.0 using the second method. Thank you for posting the workaround.

@ryanpetrello
Contributor

@ilijamt @smuth4 @wbieniek (and others):

We think a recent change in AWX caused this issue. We're about to roll back the change here: #5579

Any of you interested in giving this a try?

@kdelee
Member

kdelee commented Jan 2, 2020

Testing this w/ downstream tower openshift upgrades -- will update on progress soon

@kdelee
Member

kdelee commented Jan 3, 2020

This is now working downstream with OpenShift upgrades, which were experiencing the same issue as the AWX upgrade. Going to close, but any comments from @ilijamt, @smuth4, or @wbieniek are welcome for AWX upgrades to devel.

@Quantas

Quantas commented Jan 6, 2020

@ryanpetrello @kdelee I too just ran into this issue upgrading 9.0.1 to 9.1.0 on k8s. Will the fix you mention in your comment relating to #5579 be in a forthcoming AWX release?

Thanks!

@shanemcd
Member

shanemcd commented Jan 6, 2020

9.1.1 will be released sometime within the next day or so.

@tongtie

tongtie commented Jul 16, 2020

awx: 13.0.0
docker: 18.06.03-ce
On a first install, I got this error:
psycopg2.errors.UndefinedTable: relation "main_instance" does not exist
Running these commands resolved it:

docker exec -it awx_task bash
awx-manage migrate
