Re-enable 2FA metrics, speed up queries by denormalizing counts #11701
After chatting w/ @dstufft we came up with this plan. After looking at the query that was poorly performing, it appears that the cause is not the poor performance of the count itself; it is due to the expression for the `has_two_factor` hybrid property (warehouse/warehouse/accounts/models.py, lines 143 to 154 at a3e865e):
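The permalinked lines aren't rendered in this view; based on the refactor proposed later in this thread, the expression presumably looked roughly like this (a reconstruction, with the `webauthn` relationship name assumed):

```python
from sqlalchemy.ext.hybrid import hybrid_property

class User(db.Model):  # excerpt from the User model; other columns omitted
    @hybrid_property
    def has_two_factor(self):
        return self.totp_secret is not None or len(self.webauthn) > 0

    @has_two_factor.expression
    def has_two_factor(self):
        # webauthn.any() compiles to a correlated EXISTS subquery per row,
        # which is expensive when counting across the whole users table.
        return self.totp_secret.is_not(None) | self.webauthn.any()
```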
So instead of attempting to add triggers and events for counting each of these attributes in the database, we'll simplify this expression by denormalizing a `has_webauthn` boolean column onto the `users` table. The steps towards this are roughly: add the new column as nullable, backfill it in chunks, enforce a default, and refactor `has_two_factor` to use it (details in the comments below).
Optionally, if we find that the task is still running too long, we can add a readonly setting to tasks similar to what we have for views: https://github.com/pypi/warehouse/blob/main/warehouse/db.py#L190-L194
Double check that we're on PG 11+; prior to PG 11, adding a column with a default wasn't safe, but in PG 11 it became safe (the default is now stored in the catalog rather than rewriting the whole table).
Just make sure that you chunk the data backfill, and that your update query ignores rows that have already been migrated. That may be easier to do as three separate migrations, as sketched below.
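A rough sketch of how those three migrations could be split with Alembic (this exact split is an assumption; each `upgrade()` lives in its own migration file):

```python
import sqlalchemy as sa
from alembic import op

# Migration 1: add the column as nullable, with no default,
# so existing rows are untouched.
def upgrade():
    op.add_column("users", sa.Column("has_webauthn", sa.Boolean(), nullable=True))

# Migration 2: backfill in chunks, skipping rows that have already been
# migrated (see the autocommit_block example later in this thread).

# Migration 3: once backfilled, add the default and forbid NULLs.
# On PG 11+ adding the default is a catalog-only change, no table rewrite.
def upgrade():
    op.alter_column(
        "users",
        "has_webauthn",
        nullable=False,
        server_default=sa.text("false"),
    )
```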
I also just looked; our Alembic `env.py` currently does:

```python
context.configure(
    connection=connection,
    target_metadata=db.metadata,
    compare_server_default=True,
)

with context.begin_transaction():
    context.run_migrations()
```

In the `context.configure()` call we can add `transaction_per_migration=True`:

```python
context.configure(
    connection=connection,
    target_metadata=db.metadata,
    compare_server_default=True,
    transaction_per_migration=True,
)

with context.begin_transaction():
    context.run_migrations()
```

This will make the transaction opened by `context.begin_transaction()` wrap each migration individually, instead of one transaction for the entire upgrade run. I also see that there's actually a mode to let you turn on PG's autocommit inside of a migration:

```python
from alembic import op
from sqlalchemy import text

def upgrade():
    # Every statement inside of the autocommit block will be automatically committed.
    with op.get_context().autocommit_block():
        # We need to use get_bind() here because op.execute doesn't return a result.
        conn = op.get_bind()
        # Update users in chunks, until no more users have a NULL has_webauthn.
        while True:
            r = conn.execute(
                text(
                    "UPDATE ... WHERE id IN "
                    "(SELECT id FROM users WHERE has_webauthn IS NULL LIMIT 1000)"
                )
            )
            if r.rowcount == 0:
                break
```
You can refactor the `has_two_factor` hybrid property to use the new column:

```python
from sqlalchemy import or_
from sqlalchemy.ext.hybrid import hybrid_property

@hybrid_property
def has_two_factor(self):
    return self.totp_secret is not None or self.has_webauthn

@has_two_factor.expression
def has_two_factor(self):
    return or_(self.totp_secret.is_not(None), self.has_webauthn)
```

Then just continue to use `has_two_factor` as before.
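For illustration, here's roughly how the metrics task could then count 2FA users; `session` and `User` are assumed names, and this query shape is a sketch rather than warehouse's actual task code:

```python
from sqlalchemy import func

# With the denormalized column, User.has_two_factor compiles to a flat
# WHERE clause (totp_secret IS NOT NULL OR has_webauthn), with no
# correlated subquery against the WebAuthn table.
two_factor_users = session.query(func.count(User.id)).filter(User.has_two_factor).scalar()
```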
Also to be clear, I didn't actually test any of this.
Looking at the local dev database: in development, the query performs two sequential scans, one on `users` and one on the WebAuthn keys table.
In this case, that one query was taking almost 6 minutes to run, there were other queries, and the task was running every 5 minutes.
There are currently 604,282 users in production (these stats are on the front page of pypi.org) and 4,210 rows in the WebAuthn keys table.
Just because, I nerd-sniped myself into seeing if I could optimize that query without denormalization: 346089ms down to 1862ms with the same results, no denormalization, no schema changes at all.
Nice! From this `ANALYZE`, I would hazard that adding an index would speed it up even more?
I don't think it would, because of how the filter works. I managed to get it down to 326ms though.
With that last query, you can possibly speed it up further with an index on the filtered column. However, maintaining indexes isn't free, and either 1862ms or 326ms in a background task that runs every 5 minutes is more than fast enough.
We ended up not doing most of this, and just optimizing the queries with joins instead.
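The final join-based queries aren't shown in this thread, but the shape presumably resembles this sketch (the `WebAuthn` model and its `user_id` column are assumed names, not taken from the resolution PR):

```python
from sqlalchemy import func, select

# Count 2FA users with one outer join instead of a correlated EXISTS
# subquery per user row.
stmt = (
    select(func.count(func.distinct(User.id)))
    .select_from(User)
    .outerjoin(WebAuthn, WebAuthn.user_id == User.id)
    .where(User.totp_secret.is_not(None) | WebAuthn.id.is_not(None))
)
two_factor_count = session.execute(stmt).scalar()
```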
In #11699 we disabled the 2FA metrics temporarily due to DB performance issues.
We should improve the count performance with something similar to #745 and re-enable the metrics.