migrate event table primary keys from integer to bigint #6032
Conversation
I have not actually tested how long this takes on large datasets (yet) - that's what I'm experimenting with next.
awx/main/tasks.py
Outdated
chunk = 10000
with connection.cursor() as cursor:
    while offset < total_rows:
        sql = f'INSERT INTO {tblname} SELECT * FROM _old_{tblname} ORDER BY id DESC OFFSET {offset} LIMIT {chunk};'
I'm doing this in chunked commits so that:
- Users with large amounts of data can see progress (i.e., don't have to wait for one giant INSERT to finish)
- If we handle chunks that insert into NEW and delete from OLD in discrete commits, and the process is interrupted at some point (i.e., the process restarts), then you can just re-run the task again.
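For illustration, here is a minimal sketch of that shape; the helper name and the delete predicate are mine, not necessarily what the final code does:

from django.db import connection, transaction

def copy_one_chunk(tblname, chunk=10000):
    # Hypothetical helper: each call is one discrete commit, so an interrupted
    # migration can simply be re-run and pick up where it left off.
    with connection.cursor() as cursor, transaction.atomic():
        cursor.execute(
            f'INSERT INTO {tblname} '
            f'SELECT * FROM _old_{tblname} ORDER BY id DESC LIMIT {chunk} RETURNING id;'
        )
        last_insert_pk = cursor.fetchone()
        if last_insert_pk is None:
            return False  # nothing inserted: the old table is empty, we're done
        # Assumed cleanup step: rows are copied newest-first, so everything at
        # or above the returned id is treated as already copied and removed.
        cursor.execute(f'DELETE FROM _old_{tblname} WHERE id >= %s;', [last_insert_pk[0]])
        return True

A re-run of the task would then just loop this until it returns False.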
It would be interesting to see the benchmarks without chunking. I'd guess the chunking slows this down vs. just letting Postgres handle it (not sure by how much, though). Or to see the speed of bigger chunks, e.g., 100k.
I'd show you benchmarks, but I gave up on getting final numbers, because the answer is "mind-numbingly slower" :D. The cost of a transaction per insert is very high.
Based on some feedback from @chrismeyersfsu, I did make the "chunk size" configurable so users can lower it if they like.
awx/main/tasks.py
Outdated
@@ -660,6 +660,25 @@ def update_host_smart_inventory_memberships():
    smart_inventory.update_computed_fields()


@task()
def migrate_legacy_event_data(tblname, total_rows):
I should probably remove this total_rows parameter and just calculate it by hand here, so that you don't have to know it beforehand to .apply_async() this task manually.
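For example, something along these lines could replace the parameter (a sketch; the helper is hypothetical and assumes the _old_ table already exists):

from django.db import connection

def count_legacy_rows(tblname):
    # Hypothetical helper: derive the remaining row count inside the task
    # instead of requiring callers to pass total_rows to apply_async().
    with connection.cursor() as cursor:
        cursor.execute(f'SELECT COUNT(*) FROM _old_{tblname};')
        return cursor.fetchone()[0]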
The process looks sound to me. I was going to suggest importing newest-first because that's what the customer is most likely to want to look at, but I see you've already done that. Perhaps it would be nice to emit a log event that says "all data converted successfully" that customers can look for to know there weren't any errors (or if in an abundance of caution, they don't want to put the system back into production till the conversion is done).
@ghjm good idea about the "finished" message; I added a log line for it. https://github.com/ansible/awx/pull/6032/files#diff-9d4ea1dd908b35fb92eaede4bd10bb46R680
One of my main goals with this is to implement the copying as an idempotent task you can launch. That way if the task is interrupted halfway through migration, you can just kick it off again and it'll pick up where it left off.
How do you re-launch it?
Maybe there should be some flag somewhere that notes the migration is unfinished, and a scheduled task that re-launches it if you rebooted or whatever while it was incomplete.
We could probably handle this with some sort of periodic task that used an advisory lock. Basically, wake up every minute or two, and if the "data is migrated!" flag isn't set, then try to obtain the lock and kick off the task again. We'd need to be absolutely certain it never runs more than once at the same time, though.
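A rough sketch of what that periodic task could look like; the task name, flag check, and import path are assumptions on my part, and advisory_lock is the same helper used further down:

from awx.main.utils.pglock import advisory_lock  # import path assumed

@task()
def resume_bigint_migration(tblname):
    # Hypothetical periodic task: if the "data is migrated!" flag isn't set,
    # try to grab the lock and kick the copy task off again.
    if bigint_migration_is_finished(tblname):  # hypothetical flag check
        return
    with advisory_lock(f'bigint_migration_{tblname}', wait=False) as acquired:
        if not acquired:
            return  # another node already holds the lock
        migrate_legacy_event_data.apply_async([tblname])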
Build failed.
Build failed.
@ghjm I've uploaded some changes that I think should make this work nicely. I'll spin up a 100M+ event table next week and restart services in the middle, and make sure event counts match before and after.
Build succeeded.
while total_rows:
    with transaction.atomic():
        cursor.execute(
            f'INSERT INTO {tblname} SELECT * FROM _old_{tblname} ORDER BY id DESC LIMIT {chunk} RETURNING id;'
This RETURNING statement yields the primary key of the last insert.
cursor.execute(
    f'INSERT INTO {tblname} SELECT * FROM _old_{tblname} ORDER BY id DESC LIMIT {chunk} RETURNING id;'
)
last_insert_pk = cursor.fetchone()
If there is no last insert pk, that means we didn't insert anything, which means the old table is empty (and we're done migrating).
awx/main/tasks.py
Outdated
with advisory_lock(f'bigint_migration_{tblname}', wait=False) as acquired:
    if acquired is False:
        return
    chunk = 1000000
Make this a setting so support can edit it if a customer migration fails. I'm thinking of the case where Postgres gets overloaded in some way by the massive data-move size because the user's job events are large.
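For example, the chunk size could be read from a setting instead of being hard-coded (the setting name here is hypothetical):

from django.conf import settings

# Hypothetical setting name and default; lets support tune the chunk size if a
# customer's migration overloads Postgres.
chunk = getattr(settings, 'BIGINT_MIGRATION_CHUNK_SIZE', 1000000)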
Build succeeded.
Build succeeded (gate pipeline).
see: #6010
tl;dr
- rename the existing event table (to _old_main_jobevent)
- copy its schema to a new table (main_jobevent), but with id --> bigint
- at this point the Django migration is done and main_jobevent is empty
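Roughly, the schema half of that tl;dr could be pictured like this (a simplified sketch of my own; the real Django migration will differ in details such as sequences, indexes, and foreign keys):

from django.db import connection

def swap_event_table(tblname='main_jobevent'):
    # Simplified sketch of the rename-and-recreate step described above.
    with connection.cursor() as cursor:
        # keep the existing data under the _old_ name
        cursor.execute(f'ALTER TABLE {tblname} RENAME TO _old_{tblname};')
        # recreate an empty table with the same schema...
        cursor.execute(f'CREATE TABLE {tblname} (LIKE _old_{tblname} INCLUDING ALL);')
        # ...but with a bigint primary key
        cursor.execute(f'ALTER TABLE {tblname} ALTER COLUMN id TYPE bigint;')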