Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refine migration script #1617

Open
wants to merge 15 commits into
base: develop
Choose a base branch
from
Open

Conversation

Mythicaeda
Copy link
Contributor

@Mythicaeda Mythicaeda commented Dec 9, 2024

Description

Adds a tracked view to Hasura showing the integer contents of the migrations.schema_migrations table. This allows Aerie Admins on a venue to view which migrations have been applied (was formerly locked behind Hasura Admins/Postgres Admins).

Heavily refactors the Aerie DB Migration Script:

  • Adds the 'status' subcommand to view the current status of migrations without applying them
  • Moves the migration logic into the 'migrate' subcommand
  • Adds a Hasura class for communicating via the CLI or API. This ensures that all necessary flags are always passed to the CLI. Necessary flags include: the Hasura project root, the Hasura endpoint and admin secret, the location of the .env to load values from, the database to apply migrations to
  • Loads in endpoint and admin secret information the same way that the Hasura CLI does (flags, then envvars, then .env, then config.yaml)
  • Gets the current database migration status by running a run_sql query against the Hasura venue, removing the need to provide database connection information
  • Fixes the bug where Hasura metadata would be reloaded but not applied, forcing the user to restart Hasura. The new metadata is now applied before it's reloaded

Verification

Manually tested

Documentation

Added doc comments and type hints throughout the script.

The website docs were edited to reflect the changes to the command line arguments and introduction of subcommands.

Future work

  1. Refining the output when a migration fails to run. Currently the blob JSON is printed and it's hard to parse even if you know what you're looking for.
  2. Perhaps adding an "until" flag to the bulk migration (as in, "apply[revert] migrations until you reach this one"). Not sure if there's a pressing use case for this currently.
  3. Better support for downgrading Aerie & running down-migrations #1627

@Mythicaeda Mythicaeda added refactor A code change that neither fixes a bug nor adds a feature fix A bug fix labels Dec 9, 2024
@Mythicaeda Mythicaeda self-assigned this Dec 9, 2024
@Mythicaeda Mythicaeda requested a review from a team as a code owner December 9, 2024 23:48
@Mythicaeda Mythicaeda force-pushed the fix/refine-migration-script branch from 2c68885 to 174f6b3 Compare December 10, 2024 15:17
@Mythicaeda Mythicaeda added the breaking change A change that will require updating downstream code label Dec 11, 2024
Copy link
Collaborator

@dandelany dandelany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mattdailis and I successfully tested and reviewed this yesterday - thanks @Mythicaeda ! There was only one issue that rose to the level of requested change:

As discussed, we accept that this is a non-backwards-compatible change to the migration script command (since subcommands migrate and status are now used). However, if users accidentally call it without a subcommand/arguments (ie. python aerie_db_migration.py) due to using an old CI script or referencing old docs, it now fails with an inscrutable error message:

Traceback (most recent call last):
  File "/Users/dailis/projects/aerie/migration-script-testing-2/aerie_db_migration.py", line 512, in <module>
    arguments.func(arguments)
    ^^^^^^^^^^^^^^
AttributeError: 'Namespace' object has no attribute 'func'

Can we improve the message in this case so users know what the issue is?

Other notes/non-blockers from testing:

  • I will need to include this change to the command in the upgrade guide when released
  • nonblocker: If you make a mistake while running the script, you can't Ctrl+C to exit during the "safe" part while Hasura is checking applied migrations, it only recognizes Ctrl+C after it starts running them.
  • nonblocker: I'd like to eventually improve docs/support for downgrading/running reverse migrations, since it's not super intuitive and requires knowing which down migrations need to be run in advance.
  • Unrelated issue: we noticed that in our deployment.zip files for each release, the default docker tag is latest, we should consider (separately) making releases use the release tag.

Our testing procedure:

  • We spun up an old version (v2.20?) of Aerie locally and created a plan, some activities, and simulated. Then we used the script on this branch to upgrade to the latest Aerie version, confirmed the necessary forward migrations ran, and tested with the new version with no issues.
  • We created a new dummy migration, and migrated both forward and backward using this script with no issues, with verification steps in between to make sure the correct database changes were applied.

This view allows Aerie admin users to read migration information about their venue (information was formerly locked to Hasura/Postgres Admins)
Additionally, record where the migration steps are located and update step-by-step mode to reference this location
Hasura CLI loads first from arguments passed to the command, then from environment variables, then from a passed .env, then finally from a config file. This change updates how the script determines its connection credentials to match how HasuraCLI works.
Changes the script to use a `run_sql` request to get the current migration, rather than a direct connection to a database.

This removes the ability for the user to specify one database in the 'netloc' argument and another in the Hasura configuration
Hasura replaces the os.system and subprocess.getoutput calls. This was done as there are three flags that need to be provided to every call to the hasura cli:

- skip-update-check: cleans up the output by skipping the automatic update check
- project: directory where the cli command is executed
- envfile: env file to load envvars from. Set to '.env' in the directory the CLI is run from to avoid the CLI and the Migration script loading separate envfiles
- Extracts initializing a Hasura object from a Namespace into a function
- moves migration logic into 'migrate' subfunction
The migration script now runs "hasura metadata apply" before "hasura metadata reload".
  - As metadata reloading takes noticeably longer, in step-by-step mode it now only runs when exiting, rather than between each step.
@Mythicaeda Mythicaeda force-pushed the fix/refine-migration-script branch from 0fa71a3 to df43408 Compare January 23, 2025 21:19
@Mythicaeda
Copy link
Contributor Author

Mythicaeda commented Jan 23, 2025

However, if users accidentally call it without a subcommand/arguments (ie. python aerie_db_migration.py) due to using an old CI script or referencing old docs, it now fails with an inscrutable error message:

Thanks for catching this! I've updated the script to print the help message in this case.

$ aerie_db_migration.py 
usage: aerie_db_migration.py [-h] <command> ...

Migrate the database of an Aerie venue.

options:
  -h, --help  show this help message and exit

commands:
  <command>
    status    Get the current migration status of the database
    migrate   Migrate the database

Are there any similar issues with the behavior when a subcommand is omitted entirely?

$ aerie_db_migration.py -e ../.env
usage: aerie_db_migration.py [-h] <command> ...
aerie_db_migration.py: error: argument <command>: invalid choice: '../.env' (choose from 'status', 'migrate')

nonblocker: If you make a mistake while running the script, you can't Ctrl+C to exit during the "safe" part while Hasura is checking applied migrations, it only recognizes Ctrl+C after it starts running them.

Can you clarify if you mean during mark_current_version at the start of execution or reload_metadata between steps (slash at the end in bulk-mode)?

nonblocker: I'd like to eventually improve docs/support for downgrading/running reverse migrations, since it's not super intuitive and requires knowing which down migrations need to be run in advance.

This feels beyond the scope of this PR. Could you make a ticket for this so that it's not lost? I'll link it in the "Future Work" section.

Copy link
Collaborator

@dandelany dandelany left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated the script to print the help message in this case.

Thanks! Approved.

Are there any similar issues with the behavior when a subcommand is omitted entirely?

Not that I've seen, but I'll keep an eye out.

(re: Ctrl-C) Can you clarify if you mean during mark_current_version at the start of execution or reload_metadata between steps (slash at the end in bulk-mode)?

I'm not sure - when you run the migrate command there is a long (10-30 second) wait time while it does "something" before actually running migrations - it's during this time that Ctrl+C seemed to be ignored.

This feels beyond the scope of this PR. Could you make a ticket for this so that it's not lost? I'll link it in the "Future Work" section.

👍 done - filed #1627 (not prioritized yet)

@Mythicaeda
Copy link
Contributor Author

Are there any similar issues with the behavior when a subcommand is omitted entirely?

Not that I've seen, but I'll keep an eye out.

To be explicit, the code block below that comment (and I'll include it here again) was the output when there is no subcommand:

$ aerie_db_migration.py -e ../.env
usage: aerie_db_migration.py [-h] <command> ...
aerie_db_migration.py: error: argument <command>: invalid choice: '../.env' (choose from 'status', 'migrate')

I'm not sure - when you run the migrate command there is a long (10-30 second) wait time while it does "something" before actually running migrations - it's during this time that Ctrl+C seemed to be ignored.

That's mark_current_version. That means the slowdown's either on the request getting the migration information (where I don't think there's any improvements that can be made) or, more likely, it's in the loop afterwards where all the migrations that are in the DB are marked as applied in Hasura. I can refine this loop so that it only applies unmarked migrations, meaning that you have the same slowdown the very first time a DB is ever migrated but shouldn't experience it again afterwards.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change A change that will require updating downstream code fix A bug fix refactor A code change that neither fixes a bug nor adds a feature
Projects
None yet
2 participants