Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(WIP) CRM-16860 - Schema-upgrader should be merge-friendly #7440

Closed
wants to merge 15 commits into from

Conversation

totten
Copy link
Member

@totten totten commented Dec 12, 2015

This refactors the CRM_Upgrade system that performs schema updates. The major goals:

  • All upgrade steps should be separate files. (Contrast: Old upgrader requires all point-releases to have PHP functions in the same file.)
  • Adjoining upgrade steps should be displayed in-order, side-by-side. (Contrast: Old upgrader splits files for the same point-releases into separate php/ and sql/ folders.)
  • If a new upgrade file is merged after some delay, it should still be recognized and merged, regardless of whether/how the version numbers have changed.

This creates a directory structure like:

## Formula (New Code; v4.7+)
CRM/Upgrade/Steps/{MAJOR}/{COUNTER}_{DescriptiveName}.up.php

## Formula  (Legacy Code)
CRM/Upgrade/Steps/{MAJOR}_All.up.php

## Examples
CRM/Upgrade/Steps/45_All.up.php (CRM_Upgrade_Steps_45_All; legacy)
CRM/Upgrade/Steps/46_All.up.php (CRM_Upgrade_Steps_46_All; legacy)
CRM/Upgrade/Steps/47/100_FixFoo.up.php (CRM_Upgrade_Steps_47_100_FixFoo)
CRM/Upgrade/Steps/47/100_UpdateBar.up.php (CRM_Upgrade_Steps_47_100_UpdateBar)
CRM/Upgrade/Steps/47/205_TwiddleBaz.up.php (CRM_Upgrade_Steps_47_205_TwiddleBaz)

WIP. TODO:

  • Figure out CRM_Upgrade_Form::setSchemaStructureTables($rev)
  • Some kind of test coverage

Move helper functions that are specific to revision-based approach:

 * buildQueue
 * doIncrementalUpgradeStart
 * doIncrementalUpgradeStep
 * doIncrementalUpgradeFinish
This is a cleaner high-level contract (less side-effecty, more accurate
name, not tied to "$rev" concept).
The upgrade-handler class should no longer be determined by version.  These
classes are standalone entities (independent of version).
This function just calls `CRM_Core_DAO::executeQuery`, and it's only
used once.
We won't care about sequential list of revision numbers.  Instead, we'll
want a list of named upgrade steps.
@totten
Copy link
Member Author

totten commented Dec 12, 2015

@colemanw @eileenmcnaughton - This is a replacement for #7421.

…00_Legacy.

Since 4.7 will be the new stable, any maintenance upgrades will go to 4.7.
We want those to use the new merge-friendly upgrade scheme, so we should put
all 4.7-related changes in a subdirectory to set the proper atmosphere.
@totten
Copy link
Member Author

totten commented Dec 14, 2015

Recent updates to PR:

  • Record which steps have run (and avoid rerunning them).
  • Identity and (almost) run CRM/Upgrade/Steps/*.mysql.tpl files.

Still haven't sorted out how to handle SchemaStructure stuff -- it seems to be based on the assumption of linear upgrade revisions. There are two spots with FIXME notes where I couldn't imitate the old calls to SchemaStructure. (FIXMEs in RevisionBase and SqlStep)

@totten
Copy link
Member Author

totten commented Dec 14, 2015

Started analyzing how SchemaStructure works. It seems like any code that relies on version-based SchemaStructure would necessarily call getLatestSchema($version) (or, equivalently, use the string .../I18n/SchemaStructure_...php).

List of use-cases per grep: https://gist.github.com/totten/1102df4870b15fa5fc1e

@eileenmcnaughton
Copy link
Contributor

@totten does the tracking go as far as cross version? The example I'm thinking of is a missing index. I want to add it to my 4.6 sites & not have the upgrade break when I get to 4.7. Let's pretend I was dealing with 4.7 & 4.8 since this patch is not in 4.7.

Could I backport a mysql from 4.8 & run it against my 4.7 Db & have the db track it having already run?

@totten
Copy link
Member Author

totten commented Dec 15, 2015

Yes. Supposing this upgrade-manager is used in both 4.7 and 4.8:

  1. You create a new file CRM/Upgrade/Steps/48/100_AddIndex.up.php in the 4.8 branch.
  2. Then you back-port the same file to the 4.7 branch. (Or, equivalently, forward-port to the 4.9 branch.)
  3. Now, under any upgrade path (staying on 4.7.x, or going straight to 4.8.x, or going straight to 4.9.x, or gradually shifting among various point-releases of 4.7.x=>4.8.x=>4.9.x), this step (48/100_AddIndex.up.php) would execute only one time.

When the step executes, it stores a note (INSERT INTO civicrm_upgrade VALUES ('civicrm', '48/100_AddIndex.up.php)). As long as this note is present, it won't try to re-create the index in the future.

@litespeedmarc
Copy link
Contributor

@totten , I happen to be going through the backlog to pull requests, hitting this one. Thought maybe we could give it a heartbeat & comments on next steps, and potential contributions & contributors (dev & QA). Considering other changes (e.g., staged testing), maybe this even becomes less a priority, and can be moved to a "indefinite parking lot" (if there is such a thing)?

From a personal standpoint, ideally, if something isn't going to happen (which maybe the case here), I prefer to close them.

@litespeedmarc litespeedmarc added the needs-work-not-review-ready Submitter required to take action, will be closed after 1-2 weeks in this state label Sep 27, 2016
@litespeedmarc litespeedmarc changed the title WIP - CRM-16860 - Schema-upgrader should be merge-friendly (WIP) CRM-16860 - Schema-upgrader should be merge-friendly Sep 27, 2016
@litespeedmarc litespeedmarc added needs-work-not-review-ready Submitter required to take action, will be closed after 1-2 weeks in this state and removed needs-work-not-review-ready Submitter required to take action, will be closed after 1-2 weeks in this state labels Sep 27, 2016
@eileenmcnaughton
Copy link
Contributor

@totten close & re-open later?

@eileenmcnaughton
Copy link
Contributor

Closing per above comments until someone is ready to get this review-ready

@colemanw
Copy link
Member

@totten can you rebase this please? I will be reviewing this PR to see if it can improve our upgrade system for the 5.x release cycle.

@eileenmcnaughton
Copy link
Contributor

@colemanw @totten - I'm closing this & @totten can re-open when he DOES rebase it....

@totten
Copy link
Member Author

totten commented Mar 21, 2018

A few quick reactions on reopening/updating this:

  1. The main issue from before still remains -- understanding how SchemaStructure should work in this reorganized upgrader.

  2. We now have civicrm-dev-docs. Some of the material (esp CRM/Upgrade/Steps/README.md) should be migrated there and expanded.

  3. The file-naming convention for new upgrades looks like CRM/Upgrade/Steps/47/100_FixFoo.up.php. Using a folder name like 47 worked when we iterated the second digit infrequently, but it should be reconsidered post-5.x.0. We could aim for names like CRM/Upgrade/Steps/4.7/100_FixFoo.up.php, CRM/Upgrade/Steps/4_7/100_FixFoo.up.php, CRM/Upgrade/Steps/4/7/100_FixFoo.up.php, or CRM/Upgrade/Steps/04007/100_FixFoo.up.php; the first two would require some changes to parsing/sorting logic; the last looks a little funny. But I don't care too much between them - as long as they work.

  4. I don't know if git is clever enough to the renamed files will work nicely after a rebase/merge. Only way to find out is to try.

  5. The PR discusison is short on QA. In particular: given the #SLOC, it's hard to read and be confident in the changes. How do we know that the upgrade still works correctly? I'd like to have a procedure/script for something like this:

    • Take a bunch of DB snapshots from past releases. There are synthetic ones in civicrm-upgrade-test (4.4.0-setupsh.sql.bz2, 4.6.0-setupsh.sql.bz2, etal), and it might be nice to grab a couple real ones. (Call these original snapshots.)
    • Setup current master code (without the patchset). For each original snapshot, run the upgrade and save a new snapshot. (Call these upgraded-old snapshots.)
    • Apply the patchset. For each original snapshot, run the upgrade and save a new snapshot. (Call these upgraded-new snapshots.)
    • Compare the upgraded-old and upgraded-new snapshots. They should be the same, notwithstanding some timestamps and the table civicrm_upgrade.

@totten
Copy link
Member Author

totten commented Mar 23, 2018

Trying to digest/process a bit more about the multilingual schema stuff ties in here. This is a bit scratchpad with thoughts going in different directions.


A. On general order

Trying to rephrase what's going on with this patchset... an upgrade is basically a bunch of steps, executed in order. CRM_Upgrade was originally written with the assumption that upgrade steps are clearly, linearly, universally ordered in line with the increments of a version number. It would simulate a sequence of incremental version upgrades.

This PR is based on the critique that:

  1. Upgrade steps aren't really so linear because merge workflows, backporting, etc mean that versions/builds don't form a true sequence.
  2. It usually doesn't matter because there's a lot of tolerance for small variations in order among roughly contemporaneous steps.
  3. We should not try to simulate all the intermediate version#s because doing makes the merge workflows more complicated.

Of course, we still need some way to sort upgrade steps. The PR (basically) changes the sort-key from "version number" to "filename" (and allows each version to have different/imperfectly-overlapping sets of files).

I'm still ambivalent about the second claim that there's a lot of tolerance for small variations in order. It feels mostly true, and we can help make it true at a policy level (e.g. by avoiding backports that define upgrade steps or change schema), but it's hard to prove or be sure. If we really needed something more technically exacting, we could allow each step to express dependencies (and use some kind of topological sort). But I feel a little YAGNI on that -- tempted to wait for more evidence of the need.


B. On styles of upgrade logic

When upgrading a dataset, you can describe the upgrade imperatively (list of steps) or declaratively (describe the desired end-state).

Declarative style is nice because it's almost automatic -- you don't have to explicitly write a list of steps; it just compares the start-state with the end-state and automatically picks a set of changes to reconcile any differences. CRM_Core_I18n_Schema is an example of declarative style.

But it's hard to generalize and automatically reconcile all schema changes. For example, suppose a new release converts a numeric column location_type_id to an alphanumeric location_type. A naive reconciliation algorithm would drop the old column, make a new column, and destroy all your data in the process. You have to provide a mapping that converts old location_type_id values to location_type values.

The imperative style is less automatic (you have to explicitly write each change), but it's fairly easy to improvise and define exactly the migration that you need.

The existing logic is therefore a hybrid. It's mostly imperative, but (in between steps) it does a bit of declarative reconciliation for multilingual schema.


C. Multilingual schema changes as concrete steps

One way to understand the current operation of SchemaStructure is to describe it as part of the sequence of upgrade steps. Consider an upgrade from 5.1.1 to 5.3.4:

  • upgrade_5_1_2()
  • upgrade_5_2_alpha1()
  • upgrade_5_2_0()
  • upgrade_5_2_1()
  • upgrade_5_2_2()
  • upgrade_5_3_alpha1()
  • upgrade_5_3_0()
  • upgrade_5_4_alpha1()
  • upgrade_5_4_1()
  • upgrade_5_4_2()
  • upgrade_5_4_3()

The multilingual bit adds some extra calls after each step (to CRM_Core_I18n_Schema::rebuildMultilingualSchema(...$rev...) and/or
CRM_Core_I18n_Schema::schemaStructureTables($rev,TRUE)). This is like adding more steps to the sequence.

  • upgrade_5_1_2()
  • rebuildMultilingualSchema(5.1.2)
  • upgrade_5_2_alpha1()
  • rebuildMultilingualSchema(5.2.alpha1)
  • upgrade_5_2_0()
  • rebuildMultilingualSchema(5.2.0)
  • upgrade_5_2_1()
  • rebuildMultilingualSchema(5.2.1)
  • upgrade_5_2_2()
  • rebuildMultilingualSchema(5.2.2)
  • upgrade_5_3_alpha1()
  • rebuildMultilingualSchema(5.3.alpha1)
  • upgrade_5_3_0()
  • rebuildMultilingualSchema(5.3.0)
  • upgrade_5_4_alpha1()
  • rebuildMultilingualSchema(5.4.alpha1)
  • upgrade_5_4_1()
  • rebuildMultilingualSchema(5.4.1)
  • upgrade_5_4_2()
  • rebuildMultilingualSchema(5.4.2)
  • upgrade_5_4_3()
  • rebuildMultilingualSchema(5.4.3)

But in practice, we don't have schema snapshots for every incremental step. (There are only a few files in CRM/Core/I18n/SchemaStructure_*.php.) So most of those
steps don't actually do anything. The sequence looks more like this:

  • upgrade_5_1_2()
  • upgrade_5_2_alpha1()
  • rebuildMultilingualSchema(5.2.alpha1)
  • upgrade_5_2_0()
  • upgrade_5_2_1()
  • upgrade_5_2_2()
  • upgrade_5_3_alpha1()
  • rebuildMultilingualSchema(5.3.alpha1)
  • upgrade_5_3_0()
  • upgrade_5_4_alpha1()
  • upgrade_5_4_1()
  • upgrade_5_4_2()
  • upgrade_5_4_3()
  • rebuildMultilingualSchema(5.4.3)

This analysis is appealing... because it leads to one fairly simple solution: whenever one feels inclined to make a new snapshot of the schema, just do it a little differently:

  • Don't create CRM/Core/I18n/SchemaStructure_5_2_0.php
  • Do create CRM/Upgrade/Steps/2018/300_SchemaStructure.php

The file would be substantively similar, but it could use the "filename" as the key instead of "version number". Overall, the upgrade steps (when keyed by filename) would look more like:

  • CRM_Upgrade_Step_2018_100_Foo
  • CRM_Upgrade_Step_2018_110_FooBar
  • CRM_Upgrade_Step_2018_300_SchemaStructure
  • CRM_Upgrade_Step_2018_310_Whiz
  • CRM_Upgrade_Step_2018_320_Bang
  • CRM_Upgrade_Step_2018_400_SchemaStructure
  • CRM_Upgrade_Step_2018_500_Foo
  • CRM_Upgrade_Step_2018_510_Foo
  • CRM_Upgrade_Step_2018_510_Foo
  • CRM_Upgrade_Step_2018_600_FooBar
  • CRM_Upgrade_Step_2018_700_SchemaStructure

However, I'm not certain this addresses every edge-case of multilingual. In particular, I think it would address the calls to CRM_Core_I18n_Schema::rebuildMultilingualSchema(...$rev...) but I'm not sure it could be a substitute for CRM_Core_I18n_Schema::schemaStructureTables($rev,TRUE).


D. On new upgrade steps vs old upgrade steps

After writing the original PR, I sorta regretted all the work that went into refactoring the old upgrade logic. That felt like the most brittle part of the PR, and (after a couple years) the content would fade into irrelevance. A narrower patch could be focused on adding the CRM/Upgrade/Step folder and putting those in the upgrade queue.

@totten
Copy link
Member Author

totten commented Mar 23, 2018

E. schemaStructureTables($rev,TRUE): Experiment

Continuing the open question from section C above...

I'm not certain this addresses every edge-case of multilingual. In particular, I think it would address the calls to CRM_Core_I18n_Schema::rebuildMultilingualSchema(...$rev...) but I'm not sure it could be a substitute for CRM_Core_I18n_Schema::schemaStructureTables($rev,TRUE).

I did an experiment to get a more concrete understanding of how schemaStructureTables($rev,TRUE) matters. Just run an upgrade from the snapshot 4.2.9-multilingual_af_bg_en.mysql.bz2 with and without the call. Roughly:

civibuild ut dmaster 4.2.9-multi* ; amp sql:dump -r ~/bknix/build/dmaster -N civi  > /tmp/orig-429ml-master.sql

# Apply patch to disable schemaStructureTables($rev,TRUE) during upgrade: https://gist.github.com/totten/19617abc84124d7cfb328c613f7d05a2

civibuild ut dmaster 4.2.9-multi* ; amp sql:dump -r ~/bknix/build/dmaster -N civi  > /tmp/new-429ml-master.sql

# To make the files more readable/diffable, reformat the INSERTS to use multiple lines. Search for "),(" and replace with "),\n(".

diff -u /tmp/orig-429ml-master.sql /tmp/new-429ml-master.sql

Notwithstanding some timestamps, the upgrade results were the same (with or without schemaStructureTables($rev,TRUE)). Which seems to suggest it doesn't matter; we don't need to use it for future upgrade logic. But how can it be that it doesn't matter?

F. schemaStructureTables($rev,TRUE): Analysis

Why was it ever added? Why doesn't it matter now? My theory: This was just a special case of a more general problem that we've mitigated by other means.

What does schemaStructureTables(...) do? It works in two ways:

  • If run as CRM_Core_I18n_Schema::schemaStructureTables(), it returns the list of multilingual tables/columns.
  • If run as CRM_Core_I18n_Schema::schemaStructureTables('4.2.10', TRUE) (as in the upgrade), it sets the currently active list of multilingual tables/columns to match a given version (s.t. future calls to schemaStructureTables() return a slightly different mix of tables/columns).

The upgrader calls schemaStructureTables($rev,TRUE) during each incremental/simulated upgrade step. This makes the simulation of incremental upgrading more complete -- at each step, you get the metadata for the simulated version.

Upgrading+metadata is a more general problem. Several parts of the system (not just multilingual; also settings and various APIs/BAOs/DAOs) rely on metadata, and (for a period of time, several years ago) we had a few bugs because upgrade logic relied on unstable metadata. To avoid future bugs, we made a general KISS recommendation for upgrade logic: Prefer simple SQL semantics over API/BAO/DAO. If you keep the upgrade-logic simple+concrete, then it doesn't matter whether the active metadata reflects the old/pre-upgrade system, the new/post-upgrade system, or some intermediate form.

Returning the question: How can it be that schemaStructureTables($rev,TRUE) doesn't matter? We changed general practice to prevent similar issues in new upgrade-steps, and (after a few years) the old upgrade-steps which needed it have faded into irrelevance.

@seamuslee001
Copy link
Contributor

@totten One of the problems i have noticed with the multilingual upgrade and i have tried to mitigate it by having an isUpgradeMode on the views re-create. The problem i was seeing was this:

Last known state - 4.7.alpha1
Add column to civicrm_option_value in 4.7.12 for example (not sure if actually real) - call rebuildMultilingaulSchema - works when upgrading to 4.7.12
Add column to civicrm_option_value in 4.7.15
the rebuildMultilingualScheam call in the 4.7.12 upgrade function now fails. This is because the committed DAOs have been updated to believe both columns exist. So because no known schema exists the schemaStructureTables() goes searching for tables and columns that need special handling through the DAOs.

The build Views function then fails because it constructs a SELECT statement when going through the 4.7.12 stage which includes the column that is added in 4.7.15 because the DAOs have said this column must be in the database.

My solution is probably a hack and probably caused issues with the rebuilding of the schema properly but its not been clear what is the "correct" solution. I think one solution could be is that if we are able to make use off the <add> key in the xml and add it to the DAO information (we would need to expand to 3 digits possibly that add column. That would then allow us to at least filter the columns when building the schemaStructureTables to the correct version level.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
master needs-work-not-review-ready Submitter required to take action, will be closed after 1-2 weeks in this state
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants