Add notes on created referenced columns.
In #9 I benchmarked how long it takes to create a referenced column depending on the amount of data in the tables. It became clear that if the table is empty, the operation does not lock both tables for long at all, even when validating, presumably because there is nothing to validate.

However, once there is data in the table, the time starts to matter much more. With 1 million records to validate during column creation, it could take ~50 milliseconds, which may not be noticeable. At 100 million records, however, it can take seconds, which will likely cause concurrent writes to time out. Therefore, err on the side of safety and separate constraint validation from referenced column creation when there is any data in the table.
dbernheisel authored Aug 17, 2023
1 parent 869c425 commit a2f2441
Showing 1 changed file with 15 additions and 1 deletion.
16 changes: 15 additions & 1 deletion README.md
@@ -93,6 +93,7 @@ Adding a foreign key blocks writes on both tables.
def change do
  alter table("posts") do
    add :group_id, references("groups")
    # Obtains a ShareRowExclusiveLock which blocks writes on both tables
  end
end
```
@@ -106,6 +107,7 @@ In the first migration
def change do
  alter table("posts") do
    add :group_id, references("groups", validate: false)
    # Obtains a ShareRowExclusiveLock which blocks writes on both tables.
  end
end
```
@@ -115,10 +117,22 @@ In the second migration
```elixir
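def change do
  # Ecto names a foreign key "<table>_<column>_fkey" by default, so the
  # constraint created by the first migration is posts_group_id_fkey.
  execute "ALTER TABLE posts VALIDATE CONSTRAINT posts_group_id_fkey", ""
  # Obtains a ShareUpdateExclusiveLock which doesn't block reads or writes
end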
```

These migrations can be in the same deployment, but make sure they are separate migrations.
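
As a rough sketch of what "separate migrations" could look like as two complete files, assuming a hypothetical `MyApp` application (the module names are placeholders, not from this repository). The constraint name is passed explicitly through the `:name` option of `references/2` so that the second migration validates exactly the constraint the first one created:

```elixir
# First migration file: adds the column and the foreign key,
# but skips validating existing rows.
defmodule MyApp.Repo.Migrations.AddGroupIdToPosts do
  use Ecto.Migration

  def change do
    alter table("posts") do
      # The explicit name is an illustrative choice; it matches Ecto's
      # default "<table>_<column>_fkey" naming.
      add :group_id,
          references("groups", validate: false, name: "posts_group_id_fkey")
    end
  end
end

# Second migration file: validates the existing rows afterwards.
defmodule MyApp.Repo.Migrations.ValidatePostsGroupIdFkey do
  use Ecto.Migration

  def change do
    # The empty string is the "down" command, so rolling back is a no-op.
    execute "ALTER TABLE posts VALIDATE CONSTRAINT posts_group_id_fkey", ""
  end
end
```

Each module would live in its own file under `priv/repo/migrations`, so Ecto runs them as two distinct migrations even within one deployment.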

**Note on empty tables**: when the table receiving the new referencing column is empty, you may be able to
create the column and validate the constraint at the same time. The time difference would be milliseconds,
which may not be noticeable, whether the referenced table has 1 million or 100 million records.

**Note on populated tables**: how much this matters depends on your scale. The metric to watch is how long
both tables are locked from writes. With 1 million records in both tables, creating the column with
validation may lock writes to both tables for only milliseconds (you should benchmark for yourself),
which could be acceptable for you. However, once your table has 100+ million records, the lock can last
seconds, which is more likely to be felt and to cause timeouts. Therefore, err on the side
of safety and separate constraint validation from referenced column creation when there is any data in the table.
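
One possible way to benchmark this yourself is sketched below. It is only an illustration under several assumptions: a PostgreSQL database, a repo module you pass in (for example a hypothetical `MyApp.Repo`), a `bigint` primary key `id` on `groups`, and a staging copy of your data rather than production. The module name `FkBenchmark` is made up for this example. The statement runs inside a transaction that is always rolled back, so the schema is left unchanged and the measurement can be repeated.

```elixir
defmodule FkBenchmark do
  @moduledoc """
  Hypothetical helper: times adding a validated foreign key column,
  then rolls the change back.
  """

  # Assumed schema: posts and groups tables, groups.id is a bigint.
  @sql "ALTER TABLE posts ADD COLUMN group_id bigint REFERENCES groups (id)"

  def run(repo) do
    repo.transaction(
      fn ->
        # :timer.tc/1 returns {elapsed_microseconds, result}.
        {micros, _result} =
          :timer.tc(fn ->
            Ecto.Adapters.SQL.query!(repo, @sql, [], timeout: :infinity)
          end)

        # Roll back so the column is not kept; the elapsed time is
        # carried out of the transaction as the rollback value.
        repo.rollback({:elapsed_ms, micros / 1000})
      end,
      timeout: :infinity
    )
  end
end
```

Because of the rollback, `FkBenchmark.run(MyApp.Repo)` returns `{:error, {:elapsed_ms, ms}}`, where `ms` approximates how long the statement held its locks. Both tables stay write-locked while it runs, so avoid pointing it at a database that serves live traffic.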

---
