
migrate event table primary keys from integer to bigint #6032

Merged (2 commits) on Mar 27, 2020

Conversation

@ryanpetrello (Contributor) commented Feb 21, 2020

see: #6010

tl;dr

  1. Rename the old event table (to _old_main_jobevent) and copy its schema to a new table (main_jobevent), but with id as a bigint; at that point the Django migration itself is done (a rough sketch of this step follows below).
  2. AWX starts; you can't see any events/stdout yet because main_jobevent is empty.
  3. The dispatcher startup code notices that the old table (which has all the data) still exists and shovels its rows into the new table in chunks until the old table is empty.
  4. The old (now empty) table is dropped, so the dispatcher startup code no longer attempts data shoveling.
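For illustration only, here is a minimal sketch of what step 1 could look like as a raw-SQL Django migration. The table names come from the description above; the dependency, the use of CREATE TABLE ... LIKE, and the single-table scope are assumptions and may differ from the actual migration in this PR.

```python
# Hypothetical sketch of the rename-and-recreate step for one event table;
# the real migration in this PR covers every event table, not just main_jobevent.
from django.db import migrations


class Migration(migrations.Migration):

    dependencies = [('main', '0xxx_previous_migration')]  # placeholder dependency

    operations = [
        migrations.RunSQL([
            # Keep the existing data around under a new name...
            'ALTER TABLE main_jobevent RENAME TO _old_main_jobevent;',
            # ...then recreate an empty table with the same shape, but with a
            # bigint primary key, so new events can be written immediately.
            'CREATE TABLE main_jobevent (LIKE _old_main_jobevent INCLUDING ALL);',
            'ALTER TABLE main_jobevent ALTER COLUMN id TYPE bigint;',
        ]),
    ]
```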

@ryanpetrello (Contributor, Author) commented Feb 21, 2020

I have not actually tested how long this takes on large datasets (yet) - that's what I'm experimenting with next.

@ryanpetrello changed the title from "migrate event table primary keys from integer to bigint" to "WIP: migrate event table primary keys from integer to bigint" on Feb 21, 2020
@ryanpetrello changed the title from "WIP: migrate event table primary keys from integer to bigint" to "RFC: migrate event table primary keys from integer to bigint" on Feb 21, 2020
chunk = 10000
with connection.cursor() as cursor:
    while offset < total_rows:
        sql = f'INSERT INTO {tblname} SELECT * FROM _old_{tblname} ORDER BY id DESC OFFSET {offset} LIMIT {chunk};'
@ryanpetrello (Contributor, Author) commented Feb 21, 2020

I'm doing this in chunked commits so that:

  1. Users with large amounts of data can see progress (i.e., they don't have to wait for one giant INSERT to finish).
  2. Each chunk inserts into the NEW table and deletes from the OLD one in its own discrete commit, so if the process is interrupted at some point (e.g., the process restarts), you can just re-run the task and it picks up where it left off (see the sketch after this list).
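A minimal sketch of that per-chunk commit pattern (not the PR's exact code; the helper name and the id = ANY(...) delete are assumptions made for illustration):

```python
# Sketch: copy newest-first in discrete, restartable chunks.
from django.db import connection, transaction


def copy_in_chunks(tblname, chunk=10000):
    with connection.cursor() as cursor:
        while True:
            with transaction.atomic():
                # Copy the newest rows that still live in the old table.
                cursor.execute(
                    f'INSERT INTO {tblname} '
                    f'SELECT * FROM _old_{tblname} ORDER BY id DESC LIMIT {chunk} RETURNING id;'
                )
                copied_ids = [row[0] for row in cursor.fetchall()]
                if not copied_ids:
                    break  # old table is empty; migration is complete
                # Delete the copied rows in the *same* commit, so an interrupted
                # run can simply be started again without duplicating data.
                cursor.execute(f'DELETE FROM _old_{tblname} WHERE id = ANY(%s);', [copied_ids])
```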

@Ladas (Contributor) commented Mar 27, 2020

It would be interesting to see benchmarks without chunking. I'd guess the chunking slows this down versus just letting Postgres do it in one statement (though I'm not sure by how much). It would also be interesting to see the speed of bigger chunks, e.g. 100k.

@ryanpetrello (Contributor, Author) commented Mar 27, 2020

I'd show you benchmarks, but I gave up on getting final numbers, because the answer is "mind-numbingly slower" :D. The cost of a transaction per insert is very high.

@ryanpetrello (Contributor, Author)

Based on some feedback from @chrismeyersfsu, I did make the "chunk size" configurable so users can lower it if they like.
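Presumably something along these lines; the setting name below is made up for illustration and is probably not what the PR actually uses:

```python
# Hypothetical setting name; see the PR's settings changes for the real one and its default.
from django.conf import settings

chunk = getattr(settings, 'JOB_EVENT_MIGRATION_CHUNK_SIZE', 1000000)
```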

@@ -660,6 +660,25 @@ def update_host_smart_inventory_memberships():
        smart_inventory.update_computed_fields()


@task()
def migrate_legacy_event_data(tblname, total_rows):
@ryanpetrello (Contributor, Author)

I should probably remove this total_rows parameter and just calculate it by hand here, so that you don't have to know it beforehand to .apply_async() this task manually.
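Computing it inside the task would only take a couple of lines, roughly (a sketch, assuming it lives in the same module where connection is already imported):

```python
# Sketch: derive the row count inside the task instead of requiring callers to pass it.
with connection.cursor() as cursor:
    cursor.execute(f'SELECT COUNT(*) FROM _old_{tblname};')
    total_rows = cursor.fetchone()[0]
```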

@ghjm (Contributor) commented Feb 21, 2020

The process looks sound to me. I was going to suggest importing newest-first because that's what the customer is most likely to want to look at, but I see you've already done that. Perhaps it would be nice to emit a log event that says "all data converted successfully", which customers can look for to know there weren't any errors (or, if out of an abundance of caution they don't want to put the system back into production until the conversion is done, to know when it has finished).

@ryanpetrello (Contributor, Author) commented Feb 21, 2020

@ghjm good idea about the "finished" message; I added a log line for it.

https://github.com/ansible/awx/pull/6032/files#diff-9d4ea1dd908b35fb92eaede4bd10bb46R680

@ryanpetrello (Contributor, Author)

One of my main goals with this is to implement the copying as an idempotent task you can launch. That way if the task is interrupted halfway through migration, you can just kick it off again and it'll pick up where it left off.

@ghjm (Contributor) commented Feb 21, 2020

How do you re-launch it?

@ghjm (Contributor) commented Feb 21, 2020

Maybe there should be some flag somewhere that notes the migration is unfinished, and a scheduled task that re-launches it if you rebooted or whatever while it was incomplete.

@ryanpetrello (Contributor, Author) commented Feb 21, 2020

> Maybe there should be some flag somewhere that notes the migration is unfinished, and a scheduled task that re-launches it if you rebooted or whatever while it was incomplete.

We could probably handle this with some sort of periodic task that used an advisory lock. Basically, wake up every minute or two, and if the "data is migrated!" flag isn't set, then try to obtain the lock and kick off the task again.

We'd need to be absolutely certain it never runs more than once at the same time, though.
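As a sketch of the idea being floated here (task and table names are illustrative, and this is not what the PR ended up doing; see the dispatcher-startup approach further down):

```python
# Hypothetical periodic task; safe to enqueue repeatedly because the migration
# task itself takes a per-table advisory lock (wait=False) and returns if it's held.
@task()
def relaunch_unfinished_bigint_migrations():
    with connection.cursor() as cursor:
        for tblname in ('main_jobevent', 'main_inventoryupdateevent'):  # illustrative table list
            cursor.execute('SELECT to_regclass(%s);', [f'_old_{tblname}'])
            if cursor.fetchone()[0] is not None:  # old table still exists, so not migrated yet
                migrate_legacy_event_data.apply_async([tblname])
```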

@softwarefactory-project-zuul: Build failed.

@softwarefactory-project-zuul: Build failed.

@ryanpetrello (Contributor, Author) commented Feb 21, 2020

@ghjm I've uploaded some changes that I think should make this work nicely. I'll spin up a 100M+ event table next week and restart services in the middle, and make sure event counts match before and after.

  1. When the dispatcher starts (or restarts) on any node (which happens post-upgrade), we look to see if any of the _old_<table> tables still exist, and if they do, we kick off the background migration task (a rough sketch of this check follows below). If none of the tables exist, it's basically just a no-op that we pay on dispatcher restart.

  2. The background migration task is wrapped in a postgres advisory lock, so if several nodes in a cluster all restart the dispatcher at the same time and all enqueue the same task, only one will win.
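A rough sketch of what the startup check in (1) might look like; the function name and the discovery query are assumptions, not necessarily the PR's code:

```python
# Sketch: on dispatcher startup, enqueue the background migration for any event
# table whose renamed _old_ copy is still present; a no-op once they're all gone.
from django.db import connection


def enqueue_unfinished_bigint_migrations():
    with connection.cursor() as cursor:
        cursor.execute(
            r"SELECT table_name FROM information_schema.tables "
            r"WHERE table_schema = 'public' AND table_name LIKE '\_old\_%';"
        )
        for (old_table,) in cursor.fetchall():
            # e.g. '_old_main_jobevent' -> 'main_jobevent'
            tblname = old_table[len('_old_'):]
            # Assumes total_rows is computed inside the task, per the earlier comment.
            migrate_legacy_event_data.apply_async([tblname])
```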

@softwarefactory-project-zuul: Build succeeded.

while total_rows:
    with transaction.atomic():
        cursor.execute(
            f'INSERT INTO {tblname} SELECT * FROM _old_{tblname} ORDER BY id DESC LIMIT {chunk} RETURNING id;'
@ryanpetrello (Contributor, Author)

This RETURNING statement yields the primary key of the last insert.

cursor.execute(
    f'INSERT INTO {tblname} SELECT * FROM _old_{tblname} ORDER BY id DESC LIMIT {chunk} RETURNING id;'
)
last_insert_pk = cursor.fetchone()
@ryanpetrello (Contributor, Author)

If there is no last insert pk, that means we didn't insert anything, which means the old table is empty (and we're done migrating).

with advisory_lock(f'bigint_migration_{tblname}', wait=False) as acquired:
    if acquired is False:
        return
    chunk = 1000000
@chrismeyersfsu (Member) commented Mar 27, 2020

Make this a setting so support can edit it if a customer migration fails. I'm thinking of the case where Postgres gets overloaded in some way by the massive data-move size because the user's job events are large.

@softwarefactory-project-zuul: Build succeeded.

@softwarefactory-project-zuul: Build succeeded (gate pipeline).

@softwarefactory-project-zuul (bot) merged commit 155a1d9 into ansible:devel on Mar 27, 2020