-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Doc fixes and separate workflow for building docs via CI #3462
Conversation
827e41b
to
0c480a3
Compare
I believe this pull request is now ready for merging. Without these commits I will not be able to complete #3456 (updating links). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you as always Paul!
.github/workflows/tests.yml
Outdated
@@ -12,7 +12,7 @@ jobs: | |||
|
|||
docs: | |||
name: build documentation | |||
timeout-minutes: 10 | |||
timeout-minutes: 120 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a significant increase. Why is this necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the runs for the update-links branch took over an hour, so I thought I would bump up the limit a bit.
https://github.com/RaRe-Technologies/gensim/actions/runs/4454218032/jobs/7823263904
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see now that the build from today only took a minute, so I'll drop the timeout bump commit.
https://github.com/RaRe-Technologies/gensim/actions/runs/4459623150/jobs/7832189964
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the problem here is that the build can take hours in theory. Some of the documentation examples take a really long time to build: that's why we cache their build output. So if somebody updates those examples, a rebuild will require a long time, and the build documentation action will time out.
We're kind of stuck between a rock and a hard place here.
- We don't want to run these build steps for hours all the time (e.g. for every commit)
- We do want to make it easier for people to contribute to documentation, which occasionally requires a long time to run
I think the best way to compromise between the above would be to move this build-and-stash-artifacts step to a separate action that people would trigger explicitly via a comment. We could set a long timeout (e.g. 2 hours, longer if necessary). We could restrict the range of people who can issue that trigger to e.g. project maintainers (if possible).
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like a reasonable plan.
Do we know how long the build takes when starting from scratch
after you delete all of the cache from the git repository?
I don't know GitHub workflows well enough to implement the plan myself.
So perhaps you can merge the existing docs fixes commits,
then modify the workflows to implement the plan?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mpenkov mergeable here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite. If we merge this, then it will change our existing build procedure, which isn't what we want.
I need to extract this contribution out into a separate github action (as we discussed above) before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So perhaps you can merge the existing docs fixes commits,
then modify the workflows to implement the plan?
I'd rather modify the workflow first, then merge. That way, we won't break the existing workflow.
This comment was marked as duplicate.
This comment was marked as duplicate.
…ossible Keep compatibility with old versions of smart_open just in case. 1.8.1 is required by the deps, 5.1.0 introduced the compression parameter, 6.0.0 dropped the ignore_ext parameter.
Fixes the docs build for the run_lda.py and run_ensemblelda.py tutorials.
The run_compare_lda.py howto and run_word2vec.py tutorial import it. It was removed from docs_testenv when the scikit-learn wrapper was removed. Fixes: commit a21d9cc
Add the ensemblelda and scm tutorials added in 2021. Remove the summarization tutorial as it was removed in 2020. Use the order from the existing prebuilt docs files. Without a defined order they will be placed non-deterministically, which means commits not changing docs will change prebuilt docs. Fixes: commit 76579b3 Fixes: commit ddeeb12 Fixes: commit 2dcaaf8
Print the .md5 file when it is the stale file. Print the source path for each stale file. Print paths relative to the source tree. Print only one stale file pair per line.
Building the docs often takes too long locally, so this allows pull request submitters to build on GitHub, download the changes and incorporate the changes to the generated docs in a commit, then update their pull request with the generated docs commit. Also check that the changes to the prebuilt docs are committed, except for docs that change for every single rebuild.
@pabs3 Can you please enable commits from maintainers to this branch? I'd like to push some changes before merging. |
Enabled maintainer commits. |
Thanks for merging.
Looks like this was squashed when merging, I strongly suggest next time
using rebase merging or branch merging instead, because that preserves
the commits as they were, in this case they were deliberately separated
into logical commits, which helps later with git bisects and reviews.
…--
bye,
pabs
https://bonedaddy.net/pabs3/
|
My personal preference is to squash and merge. There's a wide range of contributors, and everyone does things at their own granularity: some people make lots of small commits, others a few large ones. The squash brings everything to the same granularity: one PR, one commit. So, all PRs from the past several years have been squashed into a single commit before merging. You make a good point about reviews and bisections, but we rarely do those after code has been merged in. I know this preference is not universal (e.g. @piskvorky has mentioned he has a slight preference for not squashing) but there hasn't been an acute reason to change so far. |
Exactly right – I prefer full branch merging (with |
I can see using squash+rebase merges is useful when contributors are
submitting messy PRs with lots of fixup commits. Personally I would ask
such folks to instead do logical commits and clean up their PRs with
rebase, have them force-push their fixed PR branches and then either do
rebase merges or fast-forward merges.
The other thing I don't like about squash merges is the commit message
ends up being very long with a ton of different concerns listed. When
there are lots of fixup commits, that means a lot of clutter too,
unless you manually clean that up locally before pushing.
I've been bisecting an infinite loop due to some kind of threading race
and or memory corruption in a test in another project recently and the
giant squash merges there really make it a lot harder to figure out
where the problem is coming from and which parts of the commit are
relevant. I'm thinking about bisecting the different hunks of the
commit to figure it out, but there are no tools to do that.
Anyways, thanks again for merging the patches, I'll move on now :)
…--
bye,
pabs
https://bonedaddy.net/pabs3/
|
This set of changes fixes the current docs build failures in GitHub workflows
and makes it easier for contributors to submit docs changes, by doing the docs
build for them and exporting the changes to the prebuilt docs to a patch,
which they can then incorporate into their pull request.
smart_open
compression parameter instead ofignore_ext
when possible