-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set table collation to match core tables #329
Conversation
@totten it seems that this humble patch has sparked a much bigger discussion full of utopian dreams and aspirations... So while I really do like the direction of the utopian discussions, how about we merge this in the meantime. |
@colemanw OK, I want this to makes sense. On general principle, the thing to consider is whether this will be bouncy (fix one environment, break another). My best rationalization in favor it is:
But looking at the change from 5.33... and looking at current upgrader/status-checks... it doesn't look like we ever fixed the character-sets/collations of databases on historical data. If you originally installed CiviCRM 4.6 or 5.0 or 5.10 (or anything up to 5.33), then all your tables were initialized with Doesn't this just shift the bad-schema-situation to different users? (cc @seamuslee001 b/c I think you all followed the history of utf8(mb4) more) Aside: I think I have a better way to maintain/hook-into the |
Well, I think that ship might have already sailed. Because if we wanted to avoid forcing the collation when we add new tables, then why did we do this? And this? |
This uses the new syntax generated by totten/civix#329
@colemanw @totten I remember a time when we had specified UTF8 in lots of extensions and it caused major problems when switching systems to utf8mb4 (because the SQL forced the extension to UTF8 instead of using the database default which was set to utf8mb4). So the decision at the time was to specifically exclude the database collation from the SQL and rely on you setting the default collation on the database once only. That allows for an easy future change of collation and also makes it easier if someone decides they need to run with another collation. I think there may have been some issues with forcing ROW_FORMAT as well. @artfulrobot I seem to remember you having some thoughts around this. |
@mattwire you're correct about the history, but that has left us in a place of sometimes relying on db default and sometimes not. After a long discussion about it on ~dev this is what we've come to:
|
I'm all in favour of 2 and 3. Just not sure that 1. is a good idea. I have over 100 extensions on my local dev environment and none of them specify database charset/collation in their SQL except some really old ones that specify utf8. I'd rather not have to go through a round of updating sql files only to be replaced with 2 and 3 and potentially require another round of updates. |
An aside to this, and probably more important as it's certainly tripped me up a few times is when extensions have tables that don't start with |
Cool. I guess I didn't see this PR so much as "require a bunch of extra work from ext maintainers" as "prevent extra work or regressions" because at this point we've patched all core extensions to be consistent with core but if we don't merge this update and someone reruns I think it's otherwise fine for you to sit on your 100 extensions and not update them until the full fix is ready. |
I have this note to self:
As for |
Gawd why do we bother maintaining a complete list of all dao tables when stuff like that just ignores it in favor of string-fu? |
Wow that's a lot! By my calculations that's But yea we should take into account that some super-smart admins might want to change row format and not be too restrictive about that. Or if we were super-smart maybe we could come up with some better defaults for that setting per-table. |
Yeah.. then we have things like |
And yet it's still a tiny number compared to the possible moves in a game of Go - although to be fair, that number is calculated to exceed the number of atoms in the universe, so... 😆 |
This uses the new syntax generated by totten/civix#329
Previously replaced by #331 |
Overview
Fixes inconsistent collation between extension tables (including core extensions like SearchKit) and core tables.
See discussion at https://chat.civicrm.org/civicrm/pl/g3jt55akb3rk8dzgtqcr3bjjqy
Before
ENGINE=InnoDB DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC;
ENGINE=InnoDB
After
ENGINE=InnoDB DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC;