the native toolchain binaries are taking up a lot of bandwidth and require a new approach #5779

Closed
cosmicexplorer opened this issue May 4, 2018 · 3 comments

@cosmicexplorer
Contributor

from @benjyw on #general just now:

So, our S3 bandwidth costs went through the roof last month, and it’s mostly because of clang and gcc:

936.9GB 3452 bin/clang/linux/x86_64/6.0.0/clang.tar.gz
785.9GB 3401 bin/gcc/linux/x86_64/7.3.0/gcc.tar.gz
537.2GB 6987 bin/go/linux/x86_64/1.7.3/go.tar.gz
292.0GB 6891 bin/protobuf/linux/x86_64/3.4.1/protoc
195.9GB 6981 bin/cmake/linux/x86_64/3.9.5/cmake.tar.gz
187.8GB 5029 bin/thrift/linux/x86_64/0.9.2/thrift
183.1GB 5553 bin/watchman/linux/x86_64/4.9.0-pants1/watchman

These are the top few downloads, by cumulative # of bytes (the middle number is the # of downloads)
It seems not great for us to be hosting clang and gcc. Are these custom tarballs, or could we have users download them directly from… wherever one does that from?
Ditto go, that one seems ~straightforward
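
To compare the entries above on a per-download basis, the cumulative size can be divided by the download count. The following is a minimal sketch that only re-parses the rows quoted above; the helper name `parse_rows` is made up for illustration, and the listing does not say whether "GB" means decimal or binary gigabytes, so the printed figures are approximate either way.

```python
# Rows copied from the listing above: cumulative size, download count, path.
ROWS = """\
936.9GB 3452 bin/clang/linux/x86_64/6.0.0/clang.tar.gz
785.9GB 3401 bin/gcc/linux/x86_64/7.3.0/gcc.tar.gz
537.2GB 6987 bin/go/linux/x86_64/1.7.3/go.tar.gz
292.0GB 6891 bin/protobuf/linux/x86_64/3.4.1/protoc
195.9GB 6981 bin/cmake/linux/x86_64/3.9.5/cmake.tar.gz
187.8GB 5029 bin/thrift/linux/x86_64/0.9.2/thrift
183.1GB 5553 bin/watchman/linux/x86_64/4.9.0-pants1/watchman
"""


def parse_rows(text):
    """Return (path, total_gb, downloads, gb_per_download) tuples (hypothetical helper)."""
    parsed = []
    for line in text.splitlines():
        size, count, path = line.split()
        total_gb = float(size[:-2])  # drop the trailing "GB"
        downloads = int(count)
        parsed.append((path, total_gb, downloads, total_gb / downloads))
    return parsed


if __name__ == "__main__":
    # Largest per-download size first.
    for path, total_gb, downloads, per_download in sorted(
        parse_rows(ROWS), key=lambda row: row[3], reverse=True
    ):
        print(f"{per_download * 1000:8.1f} MB/download  {downloads:5d} downloads  {path}")
```

Sorting this way separates tools that are expensive because each archive is large (clang, gcc) from tools that are expensive mainly because they are downloaded often (go, protoc).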

Because this is driving up bandwidth costs as we speak, it's relatively high priority. Notes:

  • We should just download the go distribution from the go download page.
  • clang for osx is an unmodified binary download, so our hosted copy can be replaced with the upstream download as-is.
  • gcc for osx may be available as a binary download through homebrew. The idea here would be to pull a binary download from the homebrew repos, not to invoke the homebrew command. If homebrew actually builds gcc locally when installed (I will check this), we may need to do the same unless there is a trustworthy alternative.
  • There are no official binary releases of clang or gcc for linux. We already have scripts to build them locally, which could be applied here -- alternatively, we can try to download a binary package from one of the linux package manager repositories (again, without invoking any package managers).
  • Worst case, for all of the largest binary packages, we do what we already do in xcode_cli_tools.py: find and use the installed tools, or error out with installation instructions if they cannot be found.
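
The worst-case fallback in the last bullet could look roughly like the sketch below. This is not the code in xcode_cli_tools.py; the function name, exception class, and hint text are all hypothetical, and the lookup simply uses the standard library's `shutil.which`.

```python
import shutil


class ToolNotFoundError(Exception):
    """Raised when a required native tool is not installed on the host (hypothetical)."""


def find_installed_tool(name, install_hint, search_path=None):
    """Return the absolute path to `name`, or fail with installation instructions.

    `search_path` is an os.pathsep-separated string like PATH; None means
    "use the PATH of the current process".
    """
    path = shutil.which(name, path=search_path)
    if path is None:
        raise ToolNotFoundError(
            f"Could not find '{name}' on the search path. {install_hint}"
        )
    return path


if __name__ == "__main__":
    try:
        print(find_installed_tool("gcc", "Install gcc with your system package manager."))
    except ToolNotFoundError as e:
        print(e)
```

The trade-off is the one noted below: this avoids hosting anything, but gives up control over which version and configuration of the tool is used.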

All the reasons to provide binaries in general are still valid (e.g. reproducibility), especially for the native toolchain, because the versions and configurations of gcc and clang may vary wildly across linux distributions. However, we can address that in followup work after we fix this issue.

@cosmicexplorer
Contributor Author

See #5777 for further context.

stuhood pushed a commit that referenced this issue May 14, 2018
…ibution and LLVM subsystems to use it (#5780)

### Problem

`BinaryTool` is a great recent development which makes using binaries downloaded lazily from a specified place much more declarative and much more extensible. However, it's still only able to download from either our S3 hosting, or a mirror.

The previous structure requires the urls provided to the global option `--binaries-baseurls` to point to an exact mirror of the hierarchy we provide in our S3 hosting, but that can change at any time. It's not incredibly difficult to write a script to mirror our hosting into an internal network, but in general there's no reason the layout of binaries in `~/.cache/pants/bin/` needs to determine where those binaries are downloaded from.

Our bandwidth costs in S3 have recently increased due to the introduction of clang and gcc in #5490. *See #5777 and #5779 for further context on S3 hosting.*  There are reliable binary downloads for some of these tools, which we would be remiss not to use if we can do it in a structured way.


### Solution

- Introduce a `urls=` argument to multiple methods of `BinaryUtil`, for `BinaryTool`s that don't download from our S3 hosting.
- Add support for extracting (not creating) `.tar.xz` archives by adding the `xz` BinaryTool (see pantsbuild/binaries#66) and integrating it into BinaryTool's `archive_type` selection mechanism.
- Use the above to download the `go` and `llvm` binaries from their official download urls.
  - Also, rename the `Clang` subsystem to `LLVM` as the binary download we use now (for ubuntu 16.04, currently) also contains many other LLVM tools, including e.g. `lld`.
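
As a rough illustration of the idea (not the actual `BinaryUtil` API), URL selection can be separated from the cache layout along these lines. Every name here (`ToolRequest`, `hosted_urls`, `resolve_urls`, the `force_baseurls` parameter, and the example.org hosts) is an assumption made for the sketch; only the `bin/<name>/<os>/<arch>/<version>/<file>` layout is taken from the hosted paths listed in #5779.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class ToolRequest:
    """Describes one binary to fetch (hypothetical type)."""
    name: str
    version: str
    platform: str  # e.g. "linux/x86_64"
    filename: str  # e.g. "go.tar.gz"


def hosted_urls(baseurls: List[str], request: ToolRequest) -> List[str]:
    """URLs that mirror the hosted layout: bin/<name>/<platform>/<version>/<file>."""
    relative = f"bin/{request.name}/{request.platform}/{request.version}/{request.filename}"
    return [f"{base.rstrip('/')}/{relative}" for base in baseurls]


def resolve_urls(
    request: ToolRequest,
    baseurls: List[str],
    external_urls: Optional[Callable[[ToolRequest], List[str]]] = None,
    force_baseurls: bool = False,
) -> List[str]:
    """Prefer the tool's own URL generator unless the escape hatch is set."""
    if external_urls is not None and not force_baseurls:
        return external_urls(request)
    return hosted_urls(baseurls, request)


def go_urls(request: ToolRequest) -> List[str]:
    # Placeholder host and file name pattern, not the real Go download URL.
    return [f"https://downloads.example.org/go{request.version}.linux-amd64.tar.gz"]


if __name__ == "__main__":
    go = ToolRequest("go", "1.7.3", "linux/x86_64", "go.tar.gz")
    mirrors = ["https://binaries.example.org/"]
    print(resolve_urls(go, mirrors, external_urls=go_urls))
    print(resolve_urls(go, mirrors, external_urls=go_urls, force_baseurls=True))
```

With this shape, a tool that has a reliable upstream download supplies its own generator, while everything else, and anyone forcing the base URLs, keeps using the mirrored layout.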

### Result

URLs for external binary downloads can now be generated in a structured way, with the `--force-baseurls` option as an escape hatch. Some binaries now default to the public download URLs provided by the maintainers of the software; the new `xz` binary tool helps make this possible by letting us extract `.tar.xz` archives. Two of the three largest bandwidth users among our provided binaries (LLVM and Go) now use the download URLs provided by the maintainers of each project. gcc still needs to be fixed, which will happen in a separate PR.
@cosmicexplorer
Contributor Author

After #5780, this is partially addressed, but we still don't have a replacement for gcc. Will get to that.

@Eric-Arellano
Contributor

I think this is stale. We generally don't download from hosted binaries anymore. Please feel free to reopen if still relevant.
