Failing Database migration does not prevent routing-api from startup #57

Closed
schmidtsv opened this issue Aug 2, 2024 · 3 comments

@schmidtsv

Current behavior

We had a bogus value in the routing-api database, which led to this error in the log:

(/var/vcap/data/compile/routing-api/src/code.cloudfoundry.org/routing-api/db/client.go:97) 
[2024-08-01 12:14:51]  pq: column "instance_id" contains null values 
{"timestamp":"2024-08-01T12:14:51.480102358Z","level":"error","source":"routing-api","message":"routing-api.migration.migrations-failed","data":{"error":"pq: column \"instance_id\" contains null values","session":"3"}}

Because of this, the migrations were not executed, yet for some reason the job still started, so all routing-api instances were updated. We only noticed because the Log-api tried to register routes and failed with:

(/var/vcap/data/compile/routing-api/src/code.cloudfoundry.org/routing-api/db/client.go:79) 
[2024-08-01 12:17:09]  pq: column "instance_id" of relation "tcp_routes" does not exist 

So it seems that a database that fails migrations does not block the job from starting.

Desired behavior

If a migration fails, the job should refuse to start. This would prevent the component from running in a half-updated state. It would also prompt whoever operates it to investigate right away, instead of the failure only surfacing when another component later uses something that was part of the migration.

Affected Version

routing/0.301.0

@geofffranks
Contributor

Looks like this is a problem with the migration that only affects Postgres databases. MySQL is able to run the migration properly and assigns a default value to the instance_id column for existing rows, despite there not being a default set. Postgres fails here.
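
For context, one common way to add a NOT NULL column to a table that already has rows on Postgres is to add it as nullable, backfill the existing rows, and only then add the constraint. The Go sketch below just builds those three statements. It is an assumption for illustration and not necessarily how the migration was actually fixed: the helper name `backfillThenConstrain`, the `VARCHAR(255)` type, and the empty-string backfill value are all made up here.

```go
package main

import "fmt"

// backfillThenConstrain returns three PostgreSQL statements, in order:
// add the column as nullable, backfill rows that are still NULL, then add NOT NULL.
// The arguments are interpolated directly, so they must be trusted constants,
// never user input.
func backfillThenConstrain(table, column, sqlType, backfill string) []string {
	return []string{
		fmt.Sprintf("ALTER TABLE %s ADD COLUMN %s %s", table, column, sqlType),
		fmt.Sprintf("UPDATE %s SET %s = '%s' WHERE %s IS NULL", table, column, backfill, column),
		fmt.Sprintf("ALTER TABLE %s ALTER COLUMN %s SET NOT NULL", table, column),
	}
}

func main() {
	// Column type and backfill value are assumptions, not taken from routing-api.
	for _, stmt := range backfillThenConstrain("tcp_routes", "instance_id", "VARCHAR(255)", "") {
		fmt.Println(stmt + ";")
	}
}
```

Because the constraint is applied last, existing rows no longer contain NULLs by the time Postgres checks them.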

We're working on a fix to be released soon, and also investigating why the migration failure didn't cause routing-api to outright fail.

@geofffranks
Contributor

geofffranks commented Aug 12, 2024

Migration failures not causing BOSH deploy failures has been fixed via #60.

The migration itself was fixed in v0.302.0

@geofffranks
Contributor

Ok, migration failure issue is fixed as of v0.304.0. Thanks for finding all of this! So many changes for the better!
