-
Notifications
You must be signed in to change notification settings - Fork 6
Index Prefix
Design
For high cardinality text types, any index should strongly consider using a prefix limiter to avoid overly large indexes.
Example:
CREATE TABLE a (
# ... other columns
name VARCHAR(255),
KEY idx_name (name)
);
would be better written as this:
CREATE TABLE a (
# ... other columns
name VARCHAR(255),
KEY idx_name (name(12))
);
This avoids overly large indexes since the first 12 characters will likely provide enough cardinality to provide a fast lookup without wasting disk space.
This is particularly relevant if you have a compound index and the string column is a later column in the index:
KEY idx_compound (id, int_value, string_value)
would be better like this:
KEY idx_compound (id, int_value, string_value(12))
As most of the filtering should have been done by the first two field columns (as long as they have a decent cardinality) so cardinality is slightly less important than the last field.
ALTER TABLE a DROP INDEX idx_name ADD INDEX idx_name (name(12));
Sometimes, if query that a text field is part of returns only the columns in the index, then there might be a good use-case for keeping the full column value in the index.
CREATE TABLE a (
# ... other columns
name VARCHAR(255),
city VARCHAR(30),
postal_code VARCHAR(10),
KEY idx_compound (city, postal_code, name)
);
# This will only need to use the index to collect the results and never have to lookup the actual table entries
SELECT name, city, postal_code
FROM a
WHERE postal_code = "AAAAA" AND city = "Paris" AND name = "Bob";
# This will still need to lookup the table records as `some_other_column` is not part of the index
SELECT some_other_column, name, city, postal_code
FROM a
WHERE postal_code = "AAAAA" AND city = "Paris" AND name = "Bob";
If we actually do implement an index prefix here like this:
CREATE TABLE a (
# ... other columns
name VARCHAR(255),
city VARCHAR(30),
postal_code VARCHAR(10),
KEY idx_compound (city, postal_code, name(32))
);
Then this query won't be able to use just the index to return the result as name
will not longer be completely stored in the index (only a subset of it) which means it will still need to lookup the table records.