(WIP) CRM-16860 - Schema-upgrader should be merge-friendly #7440

totten · 2015-12-12T09:28:19Z

This refactors the CRM_Upgrade system that performs schema updates. The major goals:

All upgrade steps should be separate files. (Contrast: Old upgrader requires all point-releases to have PHP functions in the same file.)
Adjoining upgrade steps should be displayed in-order, side-by-side. (Contrast: Old upgrader splits files for the same point-releases into separate php/ and sql/ folders.)
If a new upgrade file is merged after some delay, it should still be recognized and merged, regardless of whether/how the version numbers have changed.

This creates a directory structure like:

## Formula (New Code; v4.7+)
CRM/Upgrade/Steps/{MAJOR}/{COUNTER}_{DescriptiveName}.up.php

## Formula  (Legacy Code)
CRM/Upgrade/Steps/{MAJOR}_All.up.php

## Examples
CRM/Upgrade/Steps/45_All.up.php (CRM_Upgrade_Steps_45_All; legacy)
CRM/Upgrade/Steps/46_All.up.php (CRM_Upgrade_Steps_46_All; legacy)
CRM/Upgrade/Steps/47/100_FixFoo.up.php (CRM_Upgrade_Steps_47_100_FixFoo)
CRM/Upgrade/Steps/47/100_UpdateBar.up.php (CRM_Upgrade_Steps_47_100_UpdateBar)
CRM/Upgrade/Steps/47/205_TwiddleBaz.up.php (CRM_Upgrade_Steps_47_205_TwiddleBaz)

WIP. TODO:

Figure out CRM_Upgrade_Form::setSchemaStructureTables($rev)
Some kind of test coverage

CRM-16860: Make upgrade code more developer-friendly

Move helper functions that are specific to revision-based approach: * buildQueue * doIncrementalUpgradeStart * doIncrementalUpgradeStep * doIncrementalUpgradeFinish

This is a cleaner high-level contract (less side-effecty, more accurate name, not tied to "$rev" concept).

The upgrade-handler class should no longer be determined by version. These classes are standalone entities (independent of version).

This function just calls `CRM_Core_DAO::executeQuery`, and it's only used once.

We won't care about sequential list of revision numbers. Instead, we'll want a list of named upgrade steps.

totten · 2015-12-12T09:29:46Z

@colemanw @eileenmcnaughton - This is a replacement for #7421.

…00_Legacy. Since 4.7 will be the new stable, any maintenance upgrades will go to 4.7. We want those to use the new merge-friendly upgrade scheme, so we should put all 4.7-related changes in a subdirectory to set the proper atmosphere.

totten · 2015-12-14T04:49:18Z

Recent updates to PR:

Record which steps have run (and avoid rerunning them).
Identity and (almost) run CRM/Upgrade/Steps/*.mysql.tpl files.

Still haven't sorted out how to handle SchemaStructure stuff -- it seems to be based on the assumption of linear upgrade revisions. There are two spots with FIXME notes where I couldn't imitate the old calls to SchemaStructure. (FIXMEs in RevisionBase and SqlStep)

totten · 2015-12-14T19:58:22Z

Started analyzing how SchemaStructure works. It seems like any code that relies on version-based SchemaStructure would necessarily call getLatestSchema($version) (or, equivalently, use the string .../I18n/SchemaStructure_...php).

List of use-cases per grep: https://gist.github.com/totten/1102df4870b15fa5fc1e

eileenmcnaughton · 2015-12-15T02:01:31Z

@totten does the tracking go as far as cross version? The example I'm thinking of is a missing index. I want to add it to my 4.6 sites & not have the upgrade break when I get to 4.7. Let's pretend I was dealing with 4.7 & 4.8 since this patch is not in 4.7.

Could I backport a mysql from 4.8 & run it against my 4.7 Db & have the db track it having already run?

totten · 2015-12-15T03:56:59Z

Yes. Supposing this upgrade-manager is used in both 4.7 and 4.8:

You create a new file CRM/Upgrade/Steps/48/100_AddIndex.up.php in the 4.8 branch.
Then you back-port the same file to the 4.7 branch. (Or, equivalently, forward-port to the 4.9 branch.)
Now, under any upgrade path (staying on 4.7.x, or going straight to 4.8.x, or going straight to 4.9.x, or gradually shifting among various point-releases of 4.7.x=>4.8.x=>4.9.x), this step (48/100_AddIndex.up.php) would execute only one time.

When the step executes, it stores a note (INSERT INTO civicrm_upgrade VALUES ('civicrm', '48/100_AddIndex.up.php)). As long as this note is present, it won't try to re-create the index in the future.

litespeedmarc · 2016-09-26T19:45:06Z

@totten , I happen to be going through the backlog to pull requests, hitting this one. Thought maybe we could give it a heartbeat & comments on next steps, and potential contributions & contributors (dev & QA). Considering other changes (e.g., staged testing), maybe this even becomes less a priority, and can be moved to a "indefinite parking lot" (if there is such a thing)?

From a personal standpoint, ideally, if something isn't going to happen (which maybe the case here), I prefer to close them.

eileenmcnaughton · 2017-04-25T10:28:22Z

@totten close & re-open later?

eileenmcnaughton · 2017-05-03T18:39:39Z

Closing per above comments until someone is ready to get this review-ready

colemanw · 2018-03-15T02:07:21Z

@totten can you rebase this please? I will be reviewing this PR to see if it can improve our upgrade system for the 5.x release cycle.

eileenmcnaughton · 2018-03-19T08:32:54Z

@colemanw @totten - I'm closing this & @totten can re-open when he DOES rebase it....

totten · 2018-03-21T20:19:21Z

A few quick reactions on reopening/updating this:

The main issue from before still remains -- understanding how SchemaStructure should work in this reorganized upgrader.
We now have civicrm-dev-docs. Some of the material (esp CRM/Upgrade/Steps/README.md) should be migrated there and expanded.
The file-naming convention for new upgrades looks like CRM/Upgrade/Steps/47/100_FixFoo.up.php. Using a folder name like 47 worked when we iterated the second digit infrequently, but it should be reconsidered post-5.x.0. We could aim for names like CRM/Upgrade/Steps/4.7/100_FixFoo.up.php, CRM/Upgrade/Steps/4_7/100_FixFoo.up.php, CRM/Upgrade/Steps/4/7/100_FixFoo.up.php, or CRM/Upgrade/Steps/04007/100_FixFoo.up.php; the first two would require some changes to parsing/sorting logic; the last looks a little funny. But I don't care too much between them - as long as they work.
I don't know if git is clever enough to the renamed files will work nicely after a rebase/merge. Only way to find out is to try.
The PR discusison is short on QA. In particular: given the #SLOC, it's hard to read and be confident in the changes. How do we know that the upgrade still works correctly? I'd like to have a procedure/script for something like this:
- Take a bunch of DB snapshots from past releases. There are synthetic ones in civicrm-upgrade-test (4.4.0-setupsh.sql.bz2, 4.6.0-setupsh.sql.bz2, etal), and it might be nice to grab a couple real ones. (Call these original snapshots.)
- Setup current master code (without the patchset). For each original snapshot, run the upgrade and save a new snapshot. (Call these upgraded-old snapshots.)
- Apply the patchset. For each original snapshot, run the upgrade and save a new snapshot. (Call these upgraded-new snapshots.)
- Compare the upgraded-old and upgraded-new snapshots. They should be the same, notwithstanding some timestamps and the table civicrm_upgrade.

totten · 2018-03-23T01:29:10Z

Trying to digest/process a bit more about the multilingual schema stuff ties in here. This is a bit scratchpad with thoughts going in different directions.

A. On general order

Trying to rephrase what's going on with this patchset... an upgrade is basically a bunch of steps, executed in order. CRM_Upgrade was originally written with the assumption that upgrade steps are clearly, linearly, universally ordered in line with the increments of a version number. It would simulate a sequence of incremental version upgrades.

This PR is based on the critique that:

Upgrade steps aren't really so linear because merge workflows, backporting, etc mean that versions/builds don't form a true sequence.
It usually doesn't matter because there's a lot of tolerance for small variations in order among roughly contemporaneous steps.
We should not try to simulate all the intermediate version#s because doing makes the merge workflows more complicated.

Of course, we still need some way to sort upgrade steps. The PR (basically) changes the sort-key from "version number" to "filename" (and allows each version to have different/imperfectly-overlapping sets of files).

I'm still ambivalent about the second claim that there's a lot of tolerance for small variations in order. It feels mostly true, and we can help make it true at a policy level (e.g. by avoiding backports that define upgrade steps or change schema), but it's hard to prove or be sure. If we really needed something more technically exacting, we could allow each step to express dependencies (and use some kind of topological sort). But I feel a little YAGNI on that -- tempted to wait for more evidence of the need.

B. On styles of upgrade logic

When upgrading a dataset, you can describe the upgrade imperatively (list of steps) or declaratively (describe the desired end-state).

Declarative style is nice because it's almost automatic -- you don't have to explicitly write a list of steps; it just compares the start-state with the end-state and automatically picks a set of changes to reconcile any differences. CRM_Core_I18n_Schema is an example of declarative style.

But it's hard to generalize and automatically reconcile all schema changes. For example, suppose a new release converts a numeric column location_type_id to an alphanumeric location_type. A naive reconciliation algorithm would drop the old column, make a new column, and destroy all your data in the process. You have to provide a mapping that converts old location_type_id values to location_type values.

The imperative style is less automatic (you have to explicitly write each change), but it's fairly easy to improvise and define exactly the migration that you need.

The existing logic is therefore a hybrid. It's mostly imperative, but (in between steps) it does a bit of declarative reconciliation for multilingual schema.

C. Multilingual schema changes as concrete steps

One way to understand the current operation of SchemaStructure is to describe it as part of the sequence of upgrade steps. Consider an upgrade from 5.1.1 to 5.3.4:

upgrade_5_1_2()
upgrade_5_2_alpha1()
upgrade_5_2_0()
upgrade_5_2_1()
upgrade_5_2_2()
upgrade_5_3_alpha1()
upgrade_5_3_0()
upgrade_5_4_alpha1()
upgrade_5_4_1()
upgrade_5_4_2()
upgrade_5_4_3()

The multilingual bit adds some extra calls after each step (to CRM_Core_I18n_Schema::rebuildMultilingualSchema(...$rev...) and/or
CRM_Core_I18n_Schema::schemaStructureTables($rev,TRUE)). This is like adding more steps to the sequence.

upgrade_5_1_2()
rebuildMultilingualSchema(5.1.2)
upgrade_5_2_alpha1()
rebuildMultilingualSchema(5.2.alpha1)
upgrade_5_2_0()
rebuildMultilingualSchema(5.2.0)
upgrade_5_2_1()
rebuildMultilingualSchema(5.2.1)
upgrade_5_2_2()
rebuildMultilingualSchema(5.2.2)
upgrade_5_3_alpha1()
rebuildMultilingualSchema(5.3.alpha1)
upgrade_5_3_0()
rebuildMultilingualSchema(5.3.0)
upgrade_5_4_alpha1()
rebuildMultilingualSchema(5.4.alpha1)
upgrade_5_4_1()
rebuildMultilingualSchema(5.4.1)
upgrade_5_4_2()
rebuildMultilingualSchema(5.4.2)
upgrade_5_4_3()
rebuildMultilingualSchema(5.4.3)

But in practice, we don't have schema snapshots for every incremental step. (There are only a few files in CRM/Core/I18n/SchemaStructure_*.php.) So most of those
steps don't actually do anything. The sequence looks more like this:

upgrade_5_1_2()
upgrade_5_2_alpha1()
rebuildMultilingualSchema(5.2.alpha1)
upgrade_5_2_0()
upgrade_5_2_1()
upgrade_5_2_2()
upgrade_5_3_alpha1()
rebuildMultilingualSchema(5.3.alpha1)
upgrade_5_3_0()
upgrade_5_4_alpha1()
upgrade_5_4_1()
upgrade_5_4_2()
upgrade_5_4_3()
rebuildMultilingualSchema(5.4.3)

This analysis is appealing... because it leads to one fairly simple solution: whenever one feels inclined to make a new snapshot of the schema, just do it a little differently:

Don't create CRM/Core/I18n/SchemaStructure_5_2_0.php
Do create CRM/Upgrade/Steps/2018/300_SchemaStructure.php

The file would be substantively similar, but it could use the "filename" as the key instead of "version number". Overall, the upgrade steps (when keyed by filename) would look more like:

CRM_Upgrade_Step_2018_100_Foo
CRM_Upgrade_Step_2018_110_FooBar
CRM_Upgrade_Step_2018_300_SchemaStructure
CRM_Upgrade_Step_2018_310_Whiz
CRM_Upgrade_Step_2018_320_Bang
CRM_Upgrade_Step_2018_400_SchemaStructure
CRM_Upgrade_Step_2018_500_Foo
CRM_Upgrade_Step_2018_510_Foo
CRM_Upgrade_Step_2018_510_Foo
CRM_Upgrade_Step_2018_600_FooBar
CRM_Upgrade_Step_2018_700_SchemaStructure

However, I'm not certain this addresses every edge-case of multilingual. In particular, I think it would address the calls to CRM_Core_I18n_Schema::rebuildMultilingualSchema(...$rev...) but I'm not sure it could be a substitute for CRM_Core_I18n_Schema::schemaStructureTables($rev,TRUE).

D. On new upgrade steps vs old upgrade steps

After writing the original PR, I sorta regretted all the work that went into refactoring the old upgrade logic. That felt like the most brittle part of the PR, and (after a couple years) the content would fade into irrelevance. A narrower patch could be focused on adding the CRM/Upgrade/Step folder and putting those in the upgrade queue.

totten · 2018-03-23T21:31:50Z

E. schemaStructureTables($rev,TRUE): Experiment

Continuing the open question from section C above...

I'm not certain this addresses every edge-case of multilingual. In particular, I think it would address the calls to CRM_Core_I18n_Schema::rebuildMultilingualSchema(...$rev...) but I'm not sure it could be a substitute for CRM_Core_I18n_Schema::schemaStructureTables($rev,TRUE).

I did an experiment to get a more concrete understanding of how schemaStructureTables($rev,TRUE) matters. Just run an upgrade from the snapshot 4.2.9-multilingual_af_bg_en.mysql.bz2 with and without the call. Roughly:

civibuild ut dmaster 4.2.9-multi* ; amp sql:dump -r ~/bknix/build/dmaster -N civi  > /tmp/orig-429ml-master.sql

# Apply patch to disable schemaStructureTables($rev,TRUE) during upgrade: https://gist.github.com/totten/19617abc84124d7cfb328c613f7d05a2

civibuild ut dmaster 4.2.9-multi* ; amp sql:dump -r ~/bknix/build/dmaster -N civi  > /tmp/new-429ml-master.sql

# To make the files more readable/diffable, reformat the INSERTS to use multiple lines. Search for "),(" and replace with "),\n(".

diff -u /tmp/orig-429ml-master.sql /tmp/new-429ml-master.sql

Notwithstanding some timestamps, the upgrade results were the same (with or without schemaStructureTables($rev,TRUE)). Which seems to suggest it doesn't matter; we don't need to use it for future upgrade logic. But how can it be that it doesn't matter?

F. schemaStructureTables($rev,TRUE): Analysis

Why was it ever added? Why doesn't it matter now? My theory: This was just a special case of a more general problem that we've mitigated by other means.

What does schemaStructureTables(...) do? It works in two ways:

If run as CRM_Core_I18n_Schema::schemaStructureTables(), it returns the list of multilingual tables/columns.
If run as CRM_Core_I18n_Schema::schemaStructureTables('4.2.10', TRUE) (as in the upgrade), it sets the currently active list of multilingual tables/columns to match a given version (s.t. future calls to schemaStructureTables() return a slightly different mix of tables/columns).

The upgrader calls schemaStructureTables($rev,TRUE) during each incremental/simulated upgrade step. This makes the simulation of incremental upgrading more complete -- at each step, you get the metadata for the simulated version.

Upgrading+metadata is a more general problem. Several parts of the system (not just multilingual; also settings and various APIs/BAOs/DAOs) rely on metadata, and (for a period of time, several years ago) we had a few bugs because upgrade logic relied on unstable metadata. To avoid future bugs, we made a general KISS recommendation for upgrade logic: Prefer simple SQL semantics over API/BAO/DAO. If you keep the upgrade-logic simple+concrete, then it doesn't matter whether the active metadata reflects the old/pre-upgrade system, the new/post-upgrade system, or some intermediate form.

Returning the question: How can it be that schemaStructureTables($rev,TRUE) doesn't matter? We changed general practice to prevent similar issues in new upgrade-steps, and (after a few years) the old upgrade-steps which needed it have faded into irrelevance.

seamuslee001 · 2018-03-30T23:31:09Z

@totten One of the problems i have noticed with the multilingual upgrade and i have tried to mitigate it by having an isUpgradeMode on the views re-create. The problem i was seeing was this:

Last known state - 4.7.alpha1
Add column to civicrm_option_value in 4.7.12 for example (not sure if actually real) - call rebuildMultilingaulSchema - works when upgrading to 4.7.12
Add column to civicrm_option_value in 4.7.15
the rebuildMultilingualScheam call in the 4.7.12 upgrade function now fails. This is because the committed DAOs have been updated to believe both columns exist. So because no known schema exists the schemaStructureTables() goes searching for tables and columns that need special handling through the DAOs.

The build Views function then fails because it constructs a SELECT statement when going through the 4.7.12 stage which includes the column that is added in 4.7.15 because the DAOs have said this column must be in the database.

My solution is probably a hack and probably caused issues with the rebuilding of the schema properly but its not been clear what is the "correct" solution. I think one solution could be is that if we are able to make use off the <add> key in the xml and add it to the DAO information (we would need to expand to 3 digits possibly that add column. That would then allow us to at least filter the columns when building the schemaStructureTables to the correct version level.

totten added 7 commits December 11, 2015 15:37

CRM_Upgrade_Form::preProcess - Remove unused code.

cf8be4f

CRM-16860 - CRM_Upgrade_Incremental - Extract RevisionBase class

6d98879

Move helper functions that are specific to revision-based approach: * buildQueue * doIncrementalUpgradeStart * doIncrementalUpgradeStep * doIncrementalUpgradeFinish

CRM-16860 - Change setPreUpgradeMessage to createPreUpgradeMessage

8d78269

This is a cleaner high-level contract (less side-effecty, more accurate name, not tied to "$rev" concept).

CRM-16860 - Remove incrementalPhpObject($version)

5fcf4d4

The upgrade-handler class should no longer be determined by version. These classes are standalone entities (independent of version).

CRM_Upgrade_Form - Remove checkSQLConstraint (unused, unimplemented)

5c7e58f

CRM_Upgrade_Form - Remove setQuery (WTF)

5305de9

This function just calls `CRM_Core_DAO::executeQuery`, and it's only used once.

CRM_Upgrade_Form - Remove getRevisionSequence (unused)

cbe3bc8

We won't care about sequential list of revision numbers. Instead, we'll want a list of named upgrade steps.

civicrm-builder added the master label Dec 12, 2015

totten added 3 commits December 13, 2015 15:48

CRM-16860 - Add CRM_Upgrade_Incremental_Interface

b732084

CRM-16860 - CRM_Upgrade - Scan Steps/*.php files

a938516

totten force-pushed the master-upgrade-files-2 branch from 30aa66f to 9741625 Compare December 13, 2015 23:49

totten added 4 commits December 13, 2015 20:16

CRM-16860 - Add SimpleBase for and an example upgrade task

779cff4

CRM_Upgrade_Incremental_Interface - Add getName()

86c92de

CRM-16860 - Declare civicrm_upgrade table

f1b2530

CRM-16860 - Keep track of previously-executed upgrade steps.

6158210

totten force-pushed the master-upgrade-files-2 branch from 9741625 to 6158210 Compare December 14, 2015 04:16

CRM-16860 - CRM_Upgrade_Steps - Identify *.mysql.tpl files

bda7dd3

totten force-pushed the master-upgrade-files-2 branch from 5767b11 to bda7dd3 Compare December 14, 2015 04:50

totten mentioned this pull request Dec 31, 2015

CRM-17669 and CRM-17686, flexible scheduled jobs #7505

Merged

totten mentioned this pull request Jan 7, 2016

CRM_Upgrade_Form - Scan CRM/Upgrade/Incremental/any/{VERSION}/* #7421

Closed

JoeMurray mentioned this pull request Jul 7, 2016

Invitation: CiviCRM 4.7.10 release August 2016 civicrm/release-management#4

Closed

litespeedmarc added the needs-work-not-review-ready Submitter required to take action, will be closed after 1-2 weeks in this state label Sep 27, 2016

litespeedmarc changed the title ~~WIP - CRM-16860 - Schema-upgrader should be merge-friendly~~ (WIP) CRM-16860 - Schema-upgrader should be merge-friendly Sep 27, 2016

litespeedmarc added needs-work-not-review-ready Submitter required to take action, will be closed after 1-2 weeks in this state and removed needs-work-not-review-ready Submitter required to take action, will be closed after 1-2 weeks in this state labels Sep 27, 2016

eileenmcnaughton closed this May 3, 2017

colemanw reopened this Mar 15, 2018

eileenmcnaughton closed this Mar 19, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

(WIP) CRM-16860 - Schema-upgrader should be merge-friendly #7440

(WIP) CRM-16860 - Schema-upgrader should be merge-friendly #7440

totten commented Dec 12, 2015

totten commented Dec 12, 2015

totten commented Dec 14, 2015

totten commented Dec 14, 2015

eileenmcnaughton commented Dec 15, 2015

totten commented Dec 15, 2015

litespeedmarc commented Sep 26, 2016

eileenmcnaughton commented Apr 25, 2017

eileenmcnaughton commented May 3, 2017

colemanw commented Mar 15, 2018

eileenmcnaughton commented Mar 19, 2018

totten commented Mar 21, 2018

totten commented Mar 23, 2018 •

edited

Loading

totten commented Mar 23, 2018

seamuslee001 commented Mar 30, 2018

(WIP) CRM-16860 - Schema-upgrader should be merge-friendly #7440

(WIP) CRM-16860 - Schema-upgrader should be merge-friendly #7440

Conversation

totten commented Dec 12, 2015

totten commented Dec 12, 2015

totten commented Dec 14, 2015

totten commented Dec 14, 2015

eileenmcnaughton commented Dec 15, 2015

totten commented Dec 15, 2015

litespeedmarc commented Sep 26, 2016

eileenmcnaughton commented Apr 25, 2017

eileenmcnaughton commented May 3, 2017

colemanw commented Mar 15, 2018

eileenmcnaughton commented Mar 19, 2018

totten commented Mar 21, 2018

totten commented Mar 23, 2018 • edited Loading

A. On general order

B. On styles of upgrade logic

C. Multilingual schema changes as concrete steps

D. On new upgrade steps vs old upgrade steps

totten commented Mar 23, 2018

E. schemaStructureTables($rev,TRUE): Experiment

F. schemaStructureTables($rev,TRUE): Analysis

seamuslee001 commented Mar 30, 2018

totten commented Mar 23, 2018 •

edited

Loading