Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review change log for 4.1.0 #3199

Closed
mpenkov opened this issue Jul 22, 2021 · 9 comments · Fixed by #3203 or #3214
Closed

Review change log for 4.1.0 #3199

mpenkov opened this issue Jul 22, 2021 · 9 comments · Fixed by #3203 or #3214
Assignees
Labels
housekeeping internal tasks and processes
Milestone

Comments

@mpenkov
Copy link
Collaborator

mpenkov commented Jul 22, 2021

Gentlemen, I'm planning to do the next release in the immediate future.

One thing I'd like to do differently for this release is to review the change log before we do the release. Previously, we updated the change log as part of the release project, which made review of the change log itself impossible, because by the time the changes were visible, the new version was already on PyPI and it was too late to make changes.

This time around, I've kept the change log up to date as we merged each PR. This has two benefits:

  1. Anyone can spot errors in the change log and correct them.
  2. Anyone can see how many of each issue type (bug fix, documentation, improvement, etc) we've made since the last release. Previously, this information was only visible via the issue tracker, and only a select few bother to dig around there.

Can you please have a look at the change log and make any required changes? Things like wording, omission of less relevant PRs, wrong PR category, missed PR, etc.

For the last part, here is the list of merged PRs since the last release: https://github.com/RaRe-Technologies/gensim/pulls?q=is%3Apr+is%3Aclosed+closed%3A%3E2021-04-01

@piskvorky
Copy link
Owner

piskvorky commented Jul 22, 2021

Release \o/

IIRC we had a Python script for listing all tickets + PRs since the last release (which also formats them nicely and consistently). I'll start from there and then edit the items where the original ticket had a bad / inconsistent / too vague title.

@mpenkov mpenkov added the housekeeping internal tasks and processes label Jul 22, 2021
@piskvorky piskvorky added this to the 4.1.0 milestone Jul 22, 2021
@mpenkov
Copy link
Collaborator Author

mpenkov commented Jul 22, 2021

Yes, this is the script here: https://github.com/RaRe-Technologies/gensim/blob/develop/release/generate_changelog.py

@gojomo
Copy link
Collaborator

gojomo commented Jul 22, 2021

Looking at existing decriptions, my main thought is that the summaries should be more detailed to be useful in release notes - especially to the extent that they name specific errors/failures a user would observe if lacking these fixes.

For example, #3136 "Fix indexing error in word2vec_inner.pyx" hides the important aspect for users: training was incomplete/corrupted in models with >4GB of word-vectors. (And come to think of it, having been bitten by this in word2vec_inner, we may not want to consider the matter resolved until we're sure the same error wasn't made in doc2vec_inner & fasttext_inner code that's often been lightly-edited variants of the word2vec_inner code.)

Or, #3178 "Fix Unicode string incompatibility in gensim.similarities.fastss.editdist" would be improved if the actual exception it now prevents, ValueError: incompatible types of unicode strings, was stated.

The idea of auto-generating from PRs is a nice step, but it raises the importance of the PR titles - which then should be not just descriptive to internal-reviewers, but end-users. (To some extent, the associated issues have slightly better descriptions of the pre-fix problem that users are msot likely to be searching about - so if the automated generation also appended bug number/title of associated closed-bugs, that might help.)

@piskvorky
Copy link
Owner

piskvorky commented Jul 22, 2021

The idea of auto-generating from PRs is a nice step, but it raises the importance of the PR titles

It does, and has for many months now. That's why I always try to edit titles of tickets & PRs.

I'd like each title to capture the "gist", impact of the change, so a user can quickly decide whether it's relevant to them (=> click and investigate the full description) or not (something was fixed in a module / functionality I don't care about, OK whatever).

Whether I manage or not is another question :) But the issue + PR titles are absolutely important.

@gojomo
Copy link
Collaborator

gojomo commented Jul 22, 2021

If there's some way to deduce (even if it requires extra manual markup) that a certain PR definitively closed a certain Issue, then pulling both into the changelog would be nice, and we could more firmly pursue a policy of: "make sure the issue describes symptom in ways a user would understand/search for, and the final PR title describes the fix/change well for devs & users".

EG:

  • (#)3136 Corrects word2vec_inner long-pointer indexing arithmetic
    • fixes (#)3100 word-vectors larger than 4GB missing or receiving wrong training updates

@mpenkov
Copy link
Collaborator Author

mpenkov commented Jul 23, 2021

If there's some way to deduce

It's possible to do automatically. The tickets and PRs are linked together in github's data model, with the exception of the rare PR that does not address an existing issue.

@mpenkov
Copy link
Collaborator Author

mpenkov commented Jul 23, 2021

How do we want to handle the minor backward incompatibilities? Do we want to mention them somewhere?

The ones I can see so far are #3000 and #3176.

@piskvorky
Copy link
Owner

piskvorky commented Jul 23, 2021

Yes. All backward incompatibilities warrant an example of "before => after", but if it's minor, let's just mention the example in the release docs. No need for a fancy migration guide or a major-version-bump (annoying, desensitizes users to actual big changes).

@mpenkov
Copy link
Collaborator Author

mpenkov commented Aug 14, 2021

Reopening for #3214

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
housekeeping internal tasks and processes
Projects
None yet
3 participants