This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

detect number of procs during sphinx build #16512

Merged (2 commits) on Oct 22, 2019

aaronmarkham

Description

I ran into a build issue when making the Python docs on an instance with 16 processors. The Sphinx Makefile hardcoded the number of parallel processes to 32, which I guess was fine for CI.

The error was like this:

/work/mxnet/docs/python_docs/python/build/tutorials/packages/gluon/training/fit_api_tutorial.ipynb:395: WARNING: File not found: 'tutorials/packages/gluon/blocks/save_load_params.html#saving-model-parameters-to-file'

Sphinx parallel build error:
IndexError: list index out of range

I turned off parallel builds in Sphinx and the build worked. I then switched to this new code, which detects the number of processors in the Makefile, and that worked too.
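
As a rough illustration of the idea (not the code from this PR), processor detection could look something like the shell sketch below. The function name `detect_nprocs` and the source and output directories in the echoed command are hypothetical placeholders; `nproc`, `sysctl -n hw.ncpu`, and the `sphinx-build -j` option are existing tools and flags.

```shell
#!/bin/sh
# Hypothetical helper (not from the PR): print the number of available
# processors, falling back to 1 when it cannot be determined.
detect_nprocs() {
    if command -v nproc >/dev/null 2>&1; then
        # GNU coreutils, typically available on Linux
        nproc
    elif n=$(sysctl -n hw.ncpu 2>/dev/null); then
        # macOS and BSD expose the CPU count through sysctl
        echo "$n"
    else
        # Unknown platform: a single job is the safe default
        echo 1
    fi
}

NPROCS=$(detect_nprocs)

# Only print the command here; "source_dir" and "build_dir" are placeholders.
echo "sphinx-build -j ${NPROCS} -b html source_dir build_dir"
```

Sphinx 1.7 and later also accept `-j auto`, which may be an alternative to detecting the count by hand.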

Testing

  1. Clone the repo.
  2. Install Docker and Docker for Python, and configure Docker so that it runs without sudo.
  3. Run the following two commands to build the mxnet binary, then build the python docs.
    ci/build.py --docker-registry mxnetci --platform ubuntu_cpu_lite /work/runtime_functions.sh build_ubuntu_cpu_docs
    ci/build.py --docker-registry mxnetci --platform ubuntu_cpu_python /work/runtime_functions.sh build_python_docs

@larroy larroy left a comment

LGTM

@larroy larroy left a comment

Also, if we have trouble with make, we can use a Python tool to build the Sphinx docs; there are Python alternatives. Make is just a dumb dependency graph.

@aaronmarkham

Also, if we have trouble with make, we can use a Python tool to build the Sphinx docs; there are Python alternatives. Make is just a dumb dependency graph.

This Makefile is what ships with Sphinx, so people expect it. But sure, we could use something different to invoke Sphinx.

@aaronmarkham added the pr-awaiting-testing label (PR is reviewed and waiting CI build and test) on Oct 19, 2019
@aaronmarkham

Kind of ridiculous how many times I've had to restart the tests on this PR.

Restarting centos-gpu now because it failed here: http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fcentos-gpu/detail/PR-16512/4/pipeline

@aaronmarkham merged commit 06b86da into apache:master on Oct 22, 2019
@larroy

larroy commented Oct 22, 2019

Can we collect the failures to understand what the root causes were?

@larroy

larroy commented Oct 22, 2019

In the one you linked, S3 failed, so we could add a retry.
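
A retry along these lines could be sketched as a small shell wrapper. This is only an illustration under assumptions: the function name `retry`, its argument order, and the doubling delay are hypothetical choices, not anything taken from the MXNet CI scripts.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to MAX_ATTEMPTS times, sleeping
# between attempts and doubling the delay after each failure.
# Usage: retry MAX_ATTEMPTS INITIAL_DELAY_SECONDS command [args...]
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Success: stop retrying immediately
        if "$@"; then
            return 0
        fi
        # Give up once the attempt budget is used
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "retry: giving up after ${attempt} attempts: $*" >&2
            return 1
        fi
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}

# Demo with a command that always succeeds; a real call would wrap the
# flaky step, for example the S3 transfer that failed in the linked run.
retry 3 0 true
```

Doubling the delay gives a transient outage some time to clear without making a quickly recovering service wait long.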

Labels
Doc, pr-awaiting-testing (PR is reviewed and waiting CI build and test)