Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(compute_ctl): Allow usage of DB names with whitespaces #9919

Merged
merged 5 commits into from
Nov 28, 2024

Conversation

ololobus
Copy link
Member

@ololobus ololobus commented Nov 27, 2024

Problem

We used set_path() to replace the database name in the connection string. It automatically does url-safe encoding if the path is not already encoded, but it does it as per the URL standard, which assumes that tabs can be safely removed from the path without changing the meaning of the URL. See, e.g., https://url.spec.whatwg.org/#concept-basic-url-parser. It also breaks for DBs with properly %-encoded names, like with %20, as they are kept intact, but actually should be escaped.

Yet, this is not true for Postgres, where it's completely valid to have trailing tabs in the database name.

I think this is the PR that caused this regression #9717, as it switched from postgres::config::Config back to set_path().

This was fixed a while ago already 1, btw, I just haven't added a test to catch this regression back then :(

Summary of changes

This commit changes the code back to use postgres/tokio_postgres::Config everywhere.

While on it, also do some changes around, as I had to touch this code:

  1. Bump some logging from debug to info in the spec apply path. We do not use debug in prod, and it was tricky to understand what was going on with this bug in prod.
  2. Refactor configuration concurrency calculation code so it was reusable. Yet, still keep 1 in the case of reconfiguration. The database can be actively used at this moment, so we cannot guarantee that there will be enough spare connection slots, and the underlying code won't handle connection errors properly.
  3. Simplify the installed extensions code. It was spawning a blocking task inside async function, which doesn't make much sense. Instead, just have a main sync function and call it with spawn_blocking in the API code -- the only place we need it to be async.
  4. Add regression python test to cover this and related problems in the future. Also, add more extensive testing of schema dump and DBs and roles listing API.

Resolves neondatabase/cloud#20869

@ololobus ololobus requested review from a team as code owners November 27, 2024 23:18
@ololobus ololobus force-pushed the alexk/compute-ctl-apply-fixes branch from 76574a7 to ced563e Compare November 27, 2024 23:47
@ololobus ololobus requested review from MMeent and lubennikovaav and removed request for NanoBjorn and mtyazici November 27, 2024 23:47
@ololobus
Copy link
Member Author

I wonder, is the way I did it this time is also OK, because I'm not sure how to switch to postgres::config::Config in all places now. Schema dump is the most problematic, as it just passes connstr to the cli

Copy link

github-actions bot commented Nov 28, 2024

6952 tests run: 6644 passed, 0 failed, 308 skipped (full report)


Code coverage* (full report)

  • functions: 30.6% (7985 of 26064 functions)
  • lines: 48.6% (63482 of 130635 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
5b68765 at 2024-11-28T20:17:03.326Z :recycle:

compute_tools/src/compute.rs Show resolved Hide resolved
compute_tools/src/compute.rs Show resolved Hide resolved
@ololobus ololobus requested a review from hlinnaka November 28, 2024 01:19
@tristan957
Copy link
Member

is it possible to add a test?

Copy link
Contributor

@hlinnaka hlinnaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good at high level. Please add test and address all the minor comments though

compute_tools/src/pg_helpers.rs Outdated Show resolved Hide resolved
compute_tools/src/spec.rs Outdated Show resolved Hide resolved
@hlinnaka
Copy link
Contributor

is it possible to add a test?

+1. The connstr_for_db function would be an ideal candidate for a unit test. Maybe max_service_connections too.

@hlinnaka
Copy link
Contributor

Let's put out the fire first, but let's also consider if this is actually an upstream bug in tokio-postgres, and submit a fix there too.

We use `set_path()` to replace the database name in the connection string.
It automatically does url-safe encoding if path is not already encoded, but
it does it as per the URL standard, which assumes that tabs can be safely
removed from the path without changing the meaning of the URL. See, e.g.,
https://url.spec.whatwg.org/#concept-basic-url-parser

Yet, this is not true for Postgres, where it's completely valid to have
trailing tabs in the database name. Here, we first encode the database name

Resolves neondatabase/cloud#20869
@ololobus ololobus force-pushed the alexk/compute-ctl-apply-fixes branch from 2509e3a to ff914f3 Compare November 28, 2024 14:46
@ololobus
Copy link
Member Author

ololobus commented Nov 28, 2024

+1. The connstr_for_db function would be an ideal candidate for a unit test. Maybe max_service_connections too.

Added a python test instead for now. It fails in main and passes with my changes. There are many places, and I'm afraid that with some future refactoring someone will use a set_path() again in some of the places. +1 for writing more unit tests and doing some further refactoring, but will keep it for further work

Also, this python test checks that we can actually create and configure databases via spec, which I think is good to catch regressions right in the neon test suite

@ololobus
Copy link
Member Author

Let's put out the fire first, but let's also consider if this is actually an upstream bug in tokio-postgres, and submit a fix there too.

@hlinnaka What do you mean by upstream bug in tokio-postgres? We used set_path(), which is a part of Url crate. I think it actually obeys the standard. After switching back to postgres/tokio-postgres::Config usage, it works just fine

@ololobus
Copy link
Member Author

Going to merge when CI is ready

test_runner/regress/test_compute_catalog.py Outdated Show resolved Hide resolved
test_runner/regress/test_compute_catalog.py Outdated Show resolved Hide resolved
compute_tools/src/compute.rs Show resolved Hide resolved
compute_tools/src/installed_extensions.rs Show resolved Hide resolved
compute_tools/src/pg_helpers.rs Outdated Show resolved Hide resolved
@hlinnaka
Copy link
Contributor

Let's put out the fire first, but let's also consider if this is actually an upstream bug in tokio-postgres, and submit a fix there too.

@hlinnaka What do you mean by upstream bug in tokio-postgres? We used set_path(), which is a part of Url crate. I think it actually obeys the standard. After switching back to postgres/tokio-postgres::Config usage, it works just fine

Oh right, I somehow thought this was in tokio-postgres. Never mind

@hlinnaka
Copy link
Contributor

hlinnaka commented Nov 28, 2024

        let base_url = url::Url::parse("postgres://user@example.com/basedb").unwrap();

        connstr_for_db(&base_url, "foo db").as_str()

returns postgres://user@example.com/foo+db. But at least psql interprets that as a + sign, not as space

@ololobus ololobus force-pushed the alexk/compute-ctl-apply-fixes branch from 3ee8e47 to 5b68765 Compare November 28, 2024 19:10
@ololobus
Copy link
Member Author

returns postgres://user@example.com/foo+db. But at least psql interprets that as a + sign, not as space

It's still bad. I think I've finally got rid of messing with urls everywhere, so just dropped this helper

@ololobus ololobus changed the title fix(compute_ctl): Allow usage of DB names with trailing whitespaces fix(compute_ctl): Allow usage of DB names with whitespaces Nov 28, 2024
compute_tools/src/compute.rs Show resolved Hide resolved
compute_tools/src/compute.rs Show resolved Hide resolved
compute_tools/src/compute.rs Show resolved Hide resolved
compute_tools/src/installed_extensions.rs Show resolved Hide resolved
@ololobus ololobus added this pull request to the merge queue Nov 28, 2024
Merged via the queue into main with commit 42fb3c4 Nov 28, 2024
80 checks passed
@ololobus ololobus deleted the alexk/compute-ctl-apply-fixes branch November 28, 2024 21:40
ololobus added a commit that referenced this pull request Nov 28, 2024
Unify the way we connect to Postgres:
1. Switch to building configs everywhere
2. Always set `application_name` and make naming consistent

Follow-up for #9919
github-merge-queue bot pushed a commit that referenced this pull request Nov 29, 2024
## Problem

It was not always possible to judge what exactly some `cloud_admin`
connections were doing because we didn't consistently set
`application_name` everywhere.

## Summary of changes

Unify the way we connect to Postgres:
1. Switch to building configs everywhere
2. Always set `application_name` and make naming consistent

Follow-up for #9919
Part of neondatabase/cloud#20948
awarus pushed a commit that referenced this pull request Dec 5, 2024
## Problem

We used `set_path()` to replace the database name in the connection
string. It automatically does url-safe encoding if the path is not
already encoded, but it does it as per the URL standard, which assumes
that tabs can be safely removed from the path without changing the
meaning of the URL. See, e.g.,
https://url.spec.whatwg.org/#concept-basic-url-parser. It also breaks
for DBs with properly %-encoded names, like with `%20`, as they are kept
intact, but actually should be escaped.

Yet, this is not true for Postgres, where it's completely valid to have
trailing tabs in the database name.

I think this is the PR that caused this regression
#9717, as it switched from
`postgres::config::Config` back to `set_path()`.

This was fixed a while ago already [1], btw, I just haven't added a test
to catch this regression back then :(

## Summary of changes

This commit changes the code back to use
`postgres/tokio_postgres::Config` everywhere.

While on it, also do some changes around, as I had to touch this code:
1. Bump some logging from `debug` to `info` in the spec apply path. We
do not use `debug` in prod, and it was tricky to understand what was
going on with this bug in prod.
2. Refactor configuration concurrency calculation code so it was
reusable. Yet, still keep `1` in the case of reconfiguration. The
database can be actively used at this moment, so we cannot guarantee
that there will be enough spare connection slots, and the underlying
code won't handle connection errors properly.
3. Simplify the installed extensions code. It was spawning a blocking
task inside async function, which doesn't make much sense. Instead, just
have a main sync function and call it with `spawn_blocking` in the API
code -- the only place we need it to be async.
4. Add regression python test to cover this and related problems in the
future. Also, add more extensive testing of schema dump and DBs and
roles listing API.

[1]:
4d1e48f
[2]:
https://www.postgresql.org/message-id/flat/20151023003445.931.91267%40wrigleys.postgresql.org

Resolves neondatabase/cloud#20869
awarus pushed a commit that referenced this pull request Dec 5, 2024
## Problem

It was not always possible to judge what exactly some `cloud_admin`
connections were doing because we didn't consistently set
`application_name` everywhere.

## Summary of changes

Unify the way we connect to Postgres:
1. Switch to building configs everywhere
2. Always set `application_name` and make naming consistent

Follow-up for #9919
Part of neondatabase/cloud#20948
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants