From a2f24415874d21ef8bfe4148d2b0dd96008e7046 Mon Sep 17 00:00:00 2001
From: David Bernheisel
Date: Thu, 17 Aug 2023 13:59:10 -0400
Subject: [PATCH] Add notes on creating referenced columns.

In #9 I benchmarked how long it would take to create a referenced column
depending on the amount of data in the tables. It became clear that if
the table is empty, it does not lock both tables for long at all, even
when validating, presumably because there isn't anything to validate.

However, once there is data in the table, the time starts to matter much
more. At a scale of 1 million records to validate during column
creation, it could take ~50 milliseconds, which may not be noticeable.
However, at the scale of 100 million records, it can take seconds, which
will likely cause concurrent writes to time out.

Therefore, err on the side of safety and separate constraint validation
from referenced column creation when there is any data in the table.
---
 README.md | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 6f86159..6360f22 100644
--- a/README.md
+++ b/README.md
@@ -93,6 +93,7 @@ Adding a foreign key blocks writes on both tables.
 def change do
   alter table("posts") do
     add :group_id, references("groups")
+    # Obtains a ShareRowExclusiveLock which blocks writes on both tables
   end
 end
 ```
@@ -106,6 +107,7 @@ In the first migration
 def change do
   alter table("posts") do
     add :group_id, references("groups", validate: false)
+    # Obtains a ShareRowExclusiveLock which blocks writes on both tables
   end
 end
 ```
@@ -115,10 +117,22 @@ In the second migration
 ```elixir
 def change do
   execute "ALTER TABLE posts VALIDATE CONSTRAINT group_id_fkey", ""
+  # Obtains a ShareUpdateExclusiveLock which doesn't block reads or writes
 end
 ```
 
- These migrations can be in the same deployment, but make sure they are separate migrations.
+These migrations can be in the same deployment, but make sure they are separate migrations.
+
+**Note on empty tables**: when the table receiving the referenced column is empty, you may be able to
+create the column and validate the constraint at the same time. The lock is held for only milliseconds,
+which may not be noticeable, whether the referenced table has 1 million or 100 million records.
+
+**Note on populated tables**: the impact depends on your scale. With 1 million records in both tables,
+creating and validating the column in one step may block writes to both tables for ~50 milliseconds
+(benchmark this yourself), which could be acceptable. However, once your table has 100+ million records,
+the lock can last seconds, which is more likely to be felt and to cause timeouts.
+The metric that matters is how long both tables are blocked from writes. Therefore, err on the side
+of safety and separate constraint validation from referenced column creation when there is any data in the table.
---
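Here is a rough back-of-the-envelope check of why the populated-table advice matters. Assume, for illustration only, that validation time grows roughly linearly with the number of rows to validate; the benchmarks in #9 remain the real source. On that assumption, the ~50 ms reported at 1 million records extrapolates to about 5 seconds at 100 million records:

$$
t_{100\,\mathrm{M}} \approx t_{1\,\mathrm{M}} \cdot \frac{100 \times 10^{6}}{1 \times 10^{6}} = 50\,\mathrm{ms} \times 100 = 5000\,\mathrm{ms} = 5\,\mathrm{s}
$$

This is in line with the "seconds" described above for that scale. It is time during which writes to both tables are blocked, so concurrent writers are likely to hit their timeouts.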