Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adopting a (narrow) backward-compatibility standard; implications for 4.0.0 #2967

Open
gojomo opened this issue Sep 30, 2020 · 15 comments
Open
Assignees

Comments

@gojomo
Copy link
Collaborator

gojomo commented Sep 30, 2020

From #2939:

Decisions

  1. Support loading of pre-3.8.3 models?
    • Misha: better: a stand-alone script to upgrade models to the latest version… rather than class methods doing the work on each load
    • Misha: also in favour of limited backward compatibility loading (just 3.8.3)
    • Radim: in favour of keeping existing compatibility code; why remove it? pros/cons unclear
      => decision: talk to Gordon, make the costs (pros/cons) more explicit.

Originally posted by @piskvorky in #2939 (comment)

OK, compiled long thoughts here on both general issues & specifics:

As far as I can tell, Gensim hasn't had any formal backward-compatibility policy other than "we'll try to maintain the ability to load older models".

This is generous, and easy when things change very little.

But always considering older models makes larger improvements via refactoring & new ideas harder. In addition to making a system that's simply better for commmon/tested tasks, such work then also has to write translation code that brings old data into the newer format (perhaps with compromises/limits). That code may have to retain/alias/work-around older class/property names, and include long-blocks of special-case handling.

But our features are often a grab-bag of things of wildly-different importance: some flags/options are seldom used (or probably shouldn't be used), while others are central. And our test_data directory is filled with droppings of unclear origin & purpose. (What versions/features are tested by 'doc2vec_old' or 'ldamodel_python_2_7'?)

What attempts there have been to check backward-compatibility are far shallower than they may look. There's a bunch of version-labeled Word2Vec and Doc2Vec models, but only one per version, likely created with default parameters & tiny toy data. (I don't think the source code that created them is committed/referenceable anywhere.) The load-check they get is just a few rudimentary steps, so whether the real range of functionality has been maintained is unclear.

So this compatibility logic is error-prone code, hard-to-test, and under-tested – even in the best of conditions. And such code just keeps growing, barely tested & maybe unused, as long as it's not directly implicated in some reported bug/test-failure. And sometimes the impulse to "just make the test pass" or "just get the PR merged" has led to unthinkingly adding & duplicating code until it superficially works - as when #1777 added a giant deprecated subdirectory (https://github.com/RaRe-Technologies/gensim/tree/3.8.3/gensim/models/deprecated) of entirely-copied obsolete code. This was just to make passing a few back-compatibility tests a little easier, without actually thinking through the object-shape changes.

And against this, there may not even be that many people using older models in a way that they'd need to load them in fresh code! If the model is trained & deployed and only using older features, it can and perhaps should keep running in older libraries. Many newer optimizations/features are only relevant when repeating creation/training from scratch, not when simply consulting or incrementally-changing an older model. Many users already have a workflow that refreshes models regularly, or would be better served by truly repeating model-creation in the latest code.

Or if they absolutely need to bring an old model forward, we should often accommodate that via multiple steps – load each model forward a short hop, then re-save, then another short-hop. Theoretically, this should work. In real cases, this might already be broken. But it's a more realistic incremental goal, to at each new version ensure that one hop works, than to maintain a many-versions-back growing hairball for (maybe but probably not really) handling anything. Users could also ask for, or hire, specific help for thorny but important situations - if they even arise. But many-version-hop situations that aren't best served by "train a new model" might be entirely theoretical.

In the meantime, we can make beneficial changes faster, & ease maintenance by:

First, adopting general guidelines about project goals that (a) let users know that only smallish-version-increment later-loads are within project aims, so they have reasonable expectations; and (b) give developers freedom to focus on a well-defined, minimal back-compatibility target - and when that's met, aggressively remove older code debt/complications. I'll say more about the guideline I think would be best further below in the 1-2-3.

(As far back as January I requested: "in particular, it would help to say either: 'gensim 4.0 is only compatible with models saved from 3.x versions; if you have older models, load them in 3.x, re-save them, to bring them to 4.0' - or perhaps even more aggressively: 'gensim-4.0 only tries to support models that were saved from gensim-3.8.1; older models may be loaded & re-saved in 3.8.1 to be brought to 4.0'".

It's been frustrating that when I've been 'warm on the code' & ready to clear out low-value old cruft, I've repeatedly asked about making such a clarifying compatibility decision - but no opinions either way have been shared until just recently. Having a reasoned-out standing policy would avoid this frustration & stunted progress.)

Second, clearing out the old, poorly-labeled, poor-coverage test-files and test-methods, and replacing them with a tighter set of current tests. Some steps in this process could be:

  • deleting older-version/unclear-version test methods
  • running the full test suite against a test_data directory where we're logging file-touches - so that obsolete, untouched files can be finally discarded
  • creating a new script with code in project source control for creating the full suite of well-labeled test files from a specific version. Those would then made part of test_data, and new with load/functionality test methods against them added. But, these files would only be retained as long as that specific source version is within the stated back-compatibility goals, and promptly pruned when it moves out of the window.
  • (maybe coincident with this cleanup: run the whole test suite with a code-coverage tool, to find any methods/functionality that are totally missed by testing - then either eliminate that functionality or add tests)

So, what could the default cross-version compatibility policy be?

I suggest the following as a baseline, subject to only rare exceptions:

  1. Loading of models saved from newer versions, backward into older versions, is never an explicit goal, and will only happen by luck. That is, a save from version MAJOR.MINOR.MICRO might load in MAJOR.MINOR.(MICRO-N), or MAJOR.(MINOR-N).M or (MAJOR-N).M.P. But it's never a requirement, and always OK for development-in-progress to assume saved models need only be loaded into the same or later versions. (I believe this is pretty much assumed already.)

  2. Fully-functional loading of models saved from older versions, into newer versions, is a strong goal when the MICRO version has incremented by any amount, or the MINOR by 1. That is, if the save is from version MAJOR.MINOR.MICRO, users should expect, and developers should try maintain, that it can be loaded into any MAJOR.MINOR.(MICRO+N) or MAJOR.(MINOR+1).N version. (Any exceptions, as perhaps if a feature is necessarily discontinued or onerous to maintain, would be the kind of thing to be highlighted prominently in release-notes.) Many times – indeed most of the time given that lots of code doesn't change much – models will successfully load in much-later versions. But this is the minimum we're aiming for, and it's generally OK when devs making other substantive improvements break any loads into a (MINOR+2)-or-more future version.

  3. When the MAJOR version increments, it is only a strong goal for saves from the final release of the prior MAJOR version to load with full-functionality. That is, in versions MAJOR.0.N, only saves from (MAJOR-1).LAST_MINOR.LAST_MICRO are by default considered must-loads. Earlier saves might load, but if they don't, the recommended strategy will be to load them in the latest version that supports them, then re-save, & repeat until reaching MAJOR.0.N.

In the context of the current situation with a pending 4.0.0 release, adopting this policy as-is would mean:

  • we discard all tests/files of pre-gensim.3.8.3 loading
  • we add new better-labeled/more-coverage test files & methods for just testing things saved from gensim-3.8.3, & consider 4.0.0 meeting its backward-compatibility goals when all those focused tests pass
  • as soon as 4.0.0 is finalized, its set of future-backward-compatibility test files should be generated and added to the 4.0.1-dev project tree

In the future, having this as a standing policy – rather than something to be re-discussed each release or PR – will make it easier to improve things with confidence about how much backward-compatibility work is expected, and to know when older tests/test-files can be retired. It means any contributor, whether an old-hand or newbie, has a much more narrow & comprehensible set of things to consider. They then have a chance of improving things rather than "typing & hoping" around scary blocks of "who knows what might break if I change this" code.

@piskvorky
Copy link
Owner

piskvorky commented Sep 30, 2020

I'll read all that when I find some time. Until then, I'm strongly in favour of supporting old models, as far back as possible. Stability is important.

Changes I'd propose would go toward making compatibility easier going forward: a dedicated "model migration script", pickling additional metadata inside models, better testing etc.

@gojomo
Copy link
Collaborator Author

gojomo commented Sep 30, 2020

I made a detailed writeup & top-level issue because I'd prefer not to make a single-case decision, with some mumbled live-call compromises of the moment, where the full reasoning isn't captured and thus soon forgotten.

I'd like to fix this as a source of bad code, technical debt, and drag on future contributions, not just rush out 4.0.0. I'd rather not have either me, nor any other contributor, waste their time on backward-compatibility code that is very likely to never even run, on the unproven theory that someone, somewhere might still be using years-old version A, and iffy feature B, and need to bring that into gensim-4.0.0 for unknown reasons.

@gojomo
Copy link
Collaborator Author

gojomo commented Sep 30, 2020

TLDR pros/cons of 3-point policy (now numbered to stick out more) I proposed above:

Pros:

  • we can clean up the sloppy mess that's the gensim/test/test_data directory of poorly-labeled files, and delete many of the old test-methods that (while most are passing) only provide superficial, not actual, assurance we can load 1.x, 2.x, early 3.x models
  • we can do a dev, alpha, or beta prerelease ASAP
  • we spend more time on actually-known-to-be-pain-points confusing/inefficient APIs, or requested features, rather than hypothetically-required backward-compatibility - right now and in every future release cycle
  • users who hit rare loading issues & see the policy clearly understand what efforts they should exert - be that keeping up with releases or trying other processes - to update or replace their older models

Cons:

  • I'm not sure of any in enough detail to describe them

@piskvorky
Copy link
Owner

piskvorky commented Sep 30, 2020

Great, thanks! I'll finish the tutorial migrations today and look into this (serialization, support) tomorrow, so we move forward.

@piskvorky
Copy link
Owner

piskvorky commented Oct 2, 2020

Sorry Radu got sick yesterday, I'll try to finish over the weekend.

@piskvorky
Copy link
Owner

piskvorky commented Oct 4, 2020

The proposed policy seems reasonable: a) no guarantees of forward compatibility (=loading of new models in older versions) + b) limited guarantee on backward compatibility (=loading of old models in newer versions). +1 on that.

In terms of practical execution, I conceptualize it in two parts below. @gojomo @mpenkov WDYT?

1. Migrating 4.n objects to 4.n+1

Each minor 4.n+1 release must contain a migration script that accepts a model stored by 4.n, and spits out an equivalent upgraded model in 4.n+1 where possible, or fails with an error where impossible. So a one-hop upgrade.

Upgrading across multiple hops is the users' responsibility. Ideally as simple as running multiple one-hop scripts back-to-back.

The idea of an "upgrade script" was Misha's and I like it because it allows us to move the all the "old model compatibility" stuff out of the model classes (LdaModel.load etc), and into a dedicated script. Less code pollution, a clearer division of responsibilities. This way, the "load a model" code can always assume the model's in the latest = current version.

2. Migrating 3.x objects to 4.0.0

Again, a migration script, but this time migrating any old model to 4.0.0 where possible + error out where impossible. "Any old model" = any model and version we already have the migration code for, I'd put no extra effort into it.

I'm not in favour of forcing users to install 3.8.3, load their models, store them in 3.8.3, then install gensim 4.0.0, and load those 3.8.3 models. What advantage would that give us, or them?

Proposed roadmap

  1. Make mmap work, fix Ensure 2Vec classes (KeyedVectors, Word2Vec, Doc2Vec, FastText) support desired level of mmap support #2955@gojomo what exactly is the problem there? Why did mmap stop working?
  2. Extract any 3.x => 4.0 serialization migration code into a standalone script.
  3. Document this script in the Migration guide + finish Update documentation for 4.0.0 #2960.
  4. Release 4.0.0rc1.
  5. Implement @gojomo 's "creating a new script with code in project source control for creating the full suite of well-labeled test files from a specific version".
  6. Implement "keep a log of salient object events and pickle it on save()" Keep "lifecycle log" in models #2863.
  7. (@gojomo is Further focus/slim keyedvectors.py module #2873 realistic?)
  8. Collect RC feedback, fix bugs.
  9. Release 4.0.0.

Let me know if that makes sense.

@piskvorky piskvorky added this to the 4.0.0 milestone Oct 6, 2020
@piskvorky
Copy link
Owner

piskvorky commented Oct 6, 2020

@gojomo ping – let me know which tickets you're working on, so I can work on the rest, in parallel.

I'll branch off of your #2944, to avoid conflicts.

@gojomo
Copy link
Collaborator Author

gojomo commented Oct 6, 2020

Regarding general 4.0 roadmap:

This is seeming to grow into a "steps to 4.0 checklist" issue, beyond one particular (back-compatibility) policy choice & supporting work. I'll again suggest there should be a top-level "release 4.0" issue so that discussion/checklists/decisions spanning many concerns, that carry through until final release, don't get hidden deep in other issues/PRs that may be closed/deferred/etc.

I've put a temporary workaround for the real-head-scratcher test-failure into #2944 & marked it ready for merge. I think if #2944 is good enough to build off, it should just be merged in develop. That'd make other followup PRs cleaner than each branching-off-pending PRs.

I've just made #2975 to describe the loose-threads from many-places that I've been trying to tie up, while also considering #2955. I hope the latest comment in #2955 clarifies what the mmap issue to-be-resolved entails.

Regarding backward-compatibility questions

A convenience "migrate forward" script is a good idea, but it doesn't really simplify the issue much, because there are many cases where it's trivial, but others where it just expands other complexity. In the pleasant where the design/intent of the SaveLoad system has been respected, and existing code already handles the older versions, a migration function would just be:

def migrate_old_model(old_path, new_path):
    model = SaveLoad.load(old_path)
    model.save(new_path)

Voila! It will succeed in all the cases already covered by our tests.

But that's deceptive, because in all the tricky cases:

  • there's no guarantee that the error-free load really maintained functionality - a 'migrate' script that seems to succeed would often give false sense of security. Then, users delete old model, & only discover corruption/functionality-loss later. It could be better not to create expectation: "run this migration once then you're done/safe" - at least not until we've improved migration test-coverage a lot.
  • through 3.8.3, some of the older-version fixup was erroneously done in the load methods (only dispatched by the named class) rather than the _load_specials method (dispatched by the actual loaded class). (I've cleaned this up wherever I noticed it, but might still exist somewhere in classes I haven't touched.)

And, if actually moving all prior-version fixup knowledge outside the classes themselves, into more dedicated purpose migration module(s), that's helpful in some ways – but also a bunch of extra work up-front, and that spreads the historical class-format info across more disparate places. But there will still often be unavoidable old-version cruft in the original packages, so that old types still deserialize before fixup code can run. And I'd fear that letting arbitrary amounts of old-format-special-casing code collect, far from the current classes, doesn't cap the growing tech-debt of dead-code/untested-code/unrealistic-expectations problems. (But if it meant I could just focus on the current functionality, and loading one-release back for a static set of probe files, while others struggle with arbitrarily-in-the-past issues that only show up later, I'd love it!)

I'd prefer pruning the test directory, and methods, to only purport-to-test 3.8.3 models because it creates a well-defined target for what must be kept, and what can start to be discarded. We have a supposed ability to load lots of older models, and models in less-common configurations, simply because (1) it hasn't errored/been-reported-as-failure yet; (2) we have some super-shallow tests of a bunch of older versions through to 1.0. (I disabled pre-1.0 loading tests at one point when they were failing & the fix wasn't clear.)

But as noted above, those aren't strong guarantees. If we want to start upgrading our guarantees to what is really tested, it will help to be focused on a smaller, sure-to-be-valuable reference set (some things we save out from 3.8.3 using TBD "version save-out test-suite" routine). If it's instead, "anything that seems to work now and someone might be using", we'll have to leave a lot of suspect, low-value code around.

(For example, imagine we had a set of N save-files from gensim-3.8.3, and pruned the test methods to only test loading/functionality of those files, and then used a coverage tool to notate un-exercised blocks of code in the load/back-compatibility code. If so, I think it'd be a positive step to review those unexercised blocks, and if it wasn't immediately evident that they were only being missed because the set of 3.8.3 files was missing some mode/case, delete them. The chance they might be needed by some unknown, untested X.Y vintage isn't enough reason to keep them. They're still in source-control history, if anyone with a really high-value, impossible-to-use-in-any-other way needs to do an archeological/forensic one-off migration effort.)

@piskvorky
Copy link
Owner

piskvorky commented Oct 6, 2020

Tasks are already collected in the 4.0 Milestone; what's needed here is the decision re. serialization & migrations.

#2944 wasn't ready for merge that's why I suggest branching off there. But that's a minor technicality, makes no difference.

I'll take up #2863 + migration script + #2960 then, branching off of gojomo:ft_save_after_update_vocab. Leaving #2955, #2873, #2975 to you.

@gojomo
Copy link
Collaborator Author

gojomo commented Oct 7, 2020

Unfortunately, the Milestone can't hold discussion/decisions/reasoning like an issue can. The fact that analysis/rationales/counterproposals are springing up deep in unrelated transient issues is a hint that a necessary forum doesn't exist.

I think #2944 is ready for merge & building off it after-merge is a less thorny approach. I'll get on those issues! (Though, any progress on #2873 unlikely until perhaps a final reordering.)

@piskvorky piskvorky self-assigned this Oct 16, 2020
@piskvorky
Copy link
Owner

piskvorky commented Oct 16, 2020

@gojomo says in #2960 (comment): For the reasons I gave on the other issue, I fear too much emphasis on the 'migration script' may cause problems & extra work, implying there's more assurance than there is. It could successfully run, but silently leave models with problems that are only discovered later. On the other hand, even as an advocate of faster sunsetting of old-version compatibility, I think it is unlikely to be strictly true that people will regularly "have to upgrade in multiple hops" (as written now). Much/most of the time, a load from many points back will still work. My goal is just to set the expectation that it's only strongly-targetted to work one-back, and older may require multiple hops, so that some version-increment-improvements can choose to trim their testing-effort & crufty-compatibility code. Not that, for example, 4.3 refuses to direct-load a 4.1-model even in a model class that's not been touched recently.

Yes. I was also thinking of leaving it as "upgrade FROM any old (supported) version TO the current version". Kinda like it is now, but in a single explicit script, rather than .load()s strewn across multiple classes.

Or, slightly less messy: keep the model-upgrade code version-specific, but maintain a look-up table that automatically knows which-version-needs-which-upgrade.

Example:

  1. User calls the upgrade script from their Gensim 4.4 on stored model X.
  2. Code checks that X is version 4.0 (e.g. using the lifecycle info from Keep "lifecycle log" in models #2863).
  3. Code consults the look-up table, knows that it needs to run two migration hops for X: from 4.0 to 4.1, and then from 4.1 to 4.4 (no changes in between).
  4. Script executes the two hops and stores the output model in 4.4 format.

Easy for users, but extra work for maintainers – that look-up table will be easy to forget to update. Not hard or time-consuming, just easy to miss.

There must be other options too, this seems like a problem that's as old as time. Probably solved already somewhere.

@gojomo
Copy link
Collaborator Author

gojomo commented Oct 27, 2020

Per my other comment, my concern is projecting (in docs or code tools) more confidence in such migrations than is warranted. And similarly, implying maintainers are likely to do new extra work that's (1) not historically been that important; (2) not yet covered by unit tests or clear actual-elucidated-user-requests; (3) kinda boring work that risks being "done" to check-off-an-issue but in a rote-but-not-really-correct way.

If the 'script' is more than...

def migrate_old_model(old_path, new_path):
    model = SaveLoad.load(old_path)
    model.save(new_path)

...it's probably overkill. And while such a simple script, as a command-line executable tool, could still be valuable, its docs/output should feature a warning: "Test your model after migration" – not just report a shallow "Success!".

@piskvorky
Copy link
Owner

piskvorky commented Mar 5, 2021

I re-read the whole discussion with fresh eyes. Thanks for your detailed inputs @gojomo .

Here's my conclusion:

  1. I create a script to automatically generate a data suite for compatibility tests (e.g. "create model files with 3.8.3"). Both the script and the generated data will be git-versioned (updated with each major release) and run automatically in CI. E.g. "do the models generated with 3.8.3 still work in 4.0.0?"

  2. We offer the guarantee of "models stored in Gensim X work in X+1". We keep all older loading code too, just without the guarantee & automated testing. In other words, no change in code, only documentation.

    It's definitely not acceptable that if a user skips a version or two of Gensim, they won't be able to load up their existing models. Or do so only by installing several intermediate Gensim versions and "hopping" their model manually from release to release.

    So this is basically @gojomo 's proposal. I don't think I'll have the time for my proposal = a script to do the "model migration hops" automatically, between any version. Although that's clean and desirable from the user POV, it's too much work for me now. Maybe in the future.

  3. I clean up "legacy" test data & obsolete tests. @gojomo what did you see as the "low-value old cruft"?

  4. I edit the Upgrading trained models Wiki page to reflect these.

Not sure whether I manage 1) and 3) in time for 4.0.0. Ideally yes, but don't want to delay the release too much. Help welcome!

@piskvorky
Copy link
Owner

piskvorky commented Mar 7, 2021

OK, I updated 2) and 4).

@mpenkov IMO 1) and 3) can wait for after the release, not blocking. WDYT? I've been sick since Friday, so I don't think I'll get to it now realistically.

@mpenkov
Copy link
Collaborator

mpenkov commented Mar 9, 2021

I agree that 1) and 3) are not blockers, but leaving 1) till later does introduce some risk. More specifically, if 4.0.0 introduces some kind of backwards incompatibility, people will complain, and we will have to address that in a bugfix release.

If we do 1) before the release, then we find the bug first, at the cost of delaying the release.

Not a huge risk, in my opinion, so I'd probably err on the side of "get this thing out the door".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants