performantly redeploying PEX files by sharding their requirements #789

Closed
cosmicexplorer opened this issue Nov 11, 2019 · 9 comments

@cosmicexplorer
Contributor

cosmicexplorer commented Nov 11, 2019

This issue should be conclusively resolved before #787 can be reviewed.
From that PR:

Problem

Redeploying pex files that contain many extremely large 3rdparty requirements (tensorflow, etc.) into our datacenter at Twitter currently takes a very long time: we upload each pex, requirements and all, into an internal artifact resolution utility, and then pull down the entire pex file before executing it. This slow redeploy also affects several of our internal Python development workflows and machine learning tools (including a Jupyter wrapper developed by @kwlzn) that depend on executing a pex file within the datacenter. In that case, modifying any Python source file in our monorepo currently means waiting several minutes before the change is usable within the Jupyter notebook.

As far as we are aware, other users of pex who package large machine learning applications also suffer from this issue and do not have an easy workaround.

(Initial) Proposed Solution

This is implemented in #787.

We would like to be able to ship around "dehydrated" pex files without 3rdparty requirements embedded in the pex, and resolve ("hydrate") them before executing the pex. This removes one half of the current process of synchronously waiting to upload and download 3rdparty requirements, and moves the remaining download part off the critical path of the entire redeploy process. Because the requirements to hydrate were already resolved when building the pex, we know all the exact versions of all the transitive dependencies to resolve at bootstrap time.
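The dehydrate/hydrate split can be sketched as follows. This is a minimal illustration, not the code in #787: it assumes a PEX-INFO-like JSON document with a hypothetical `dehydrated_requirements` key (the name echoes the PEX-INFO convention floated later in this thread) and only shows the bootstrap side checking that every entry is an exact build-time pin before anything is fetched.

```python
import json
from typing import Any, Dict, List

def hydration_plan(pex_info_json: str) -> List[str]:
    """Return the exact pins a dehydrated pex would fetch at boot time.

    Assumes a PEX-INFO-like JSON document whose (hypothetical)
    "dehydrated_requirements" key lists "name==version" strings.
    """
    info: Dict[str, Any] = json.loads(pex_info_json)
    pins: List[str] = info.get("dehydrated_requirements", [])
    for pin in pins:
        # Each entry must be an exact pin produced by the build-time resolve;
        # a looser specifier would make the boot-time resolve non-deterministic.
        name, sep, version = pin.partition("==")
        if not (sep and name and version):
            raise ValueError(f"not an exact pin: {pin!r}")
    return sorted(pins)

example = json.dumps({"dehydrated_requirements": ["tensorflow==1.14.0", "numpy==1.16.4"]})
print(hydration_plan(example))  # ['numpy==1.16.4', 'tensorflow==1.14.0']
```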

There are many ways we could potentially make the bootstrap resolve process faster -- #787 just uses a CachingResolver, with the idea that the machines that execute our pex files will eventually have most of the large distributions cached and won't need to redownload them on every redeploy (or that we can provision machines to have these requirements already contained within their local pex cache).

An alternative implementation of "hydration" that we considered was to use a virtualenv to hydrate requirements and then run the pex with PEX_INHERIT_PATH=fallback. However, it would be extremely helpful for us to avoid maintaining separate virtualenv tooling, and it would be much nicer if pex could do the hydration itself at bootstrap.

Feedback

From #787 (comment):

This changes a fundamental bit of the informal pex contract. You no-longer ship an app, you resolve it at runtime and allow in all manner of non-reproduceabilities that need to be carefully excluded or contended with.

Alternative Solutions

From #787 (comment):

Doesn't use of PEX_PATH already solve this? Build 2 pexes, one with the highly variable bits - maybe just sources and resources, and one with the more stable bits - large dependencies. Then on the deploy side, of the 2 pexes that now compose the app, only deploy the ones that have changed. To make this work well, we may need to enhance the build-side of pex; i.e.: have the pex tool / API understand how to resolve from other pexes as 1st-class indexes / find-links repos; but, crucially, this leaves the runtime unchanged. The same guarantees apply.
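The "only deploy the ones that have changed" step can be illustrated with content digests. A minimal sketch, assuming the pex build is reproducible (so an unchanged dependency pex yields byte-identical output) and a hypothetical manifest recording the SHA-256 of each pex from the previous deploy; the file names are placeholders.

```python
import hashlib
from typing import Dict, List

def file_digest(data: bytes) -> str:
    """Hex SHA-256 of a pex file's bytes, used as its content fingerprint."""
    return hashlib.sha256(data).hexdigest()

def changed_pexes(current: Dict[str, bytes], deployed: Dict[str, str]) -> List[str]:
    """Names of pex files whose content differs from the last deploy.

    `current` maps pex file name -> freshly built bytes; `deployed` maps
    name -> digest recorded at the previous deploy (hypothetical manifest).
    """
    return sorted(
        name for name, data in current.items()
        if deployed.get(name) != file_digest(data)
    )

built = {"app.pex": b"sources v2", "deps.pex": b"tensorflow wheels"}
manifest = {"app.pex": file_digest(b"sources v1"), "deps.pex": file_digest(b"tensorflow wheels")}
print(changed_pexes(built, manifest))  # ['app.pex']
```

Only the small sources pex would be shipped here; the large dependency pex is skipped because its digest is unchanged.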

Responses to the above feedback to follow as comments in this issue.

@cosmicexplorer
Contributor Author

cosmicexplorer commented Nov 11, 2019

@jsirois:

> Doesn't use of PEX_PATH already solve this? Build 2 pexes, one with the highly variable bits - maybe just sources and resources, and one with the more stable bits - large dependencies. Then on the deploy side, of the 2 pexes that now compose the app, only deploy the ones that have changed.

This is precisely how several of our machine learning teams are working around the problem internally. A few issues with that workaround spurred this new approach:

  • "The dependencies between split pex files are not formally defined anywhere in the build graph or the pex files themselves."
    • This was feedback from the team who developed the current workaround.
  • This means we have to maintain separate tooling which:
    • is aware of which pex files to deploy at build time
    • makes sure that all are deployed (and avoid rebuilding/redeploying the 3rdparty requirements pex if it's up to date)
    • at runtime, pulls down the correct pex files, organizes them via PEX_PATH, then executes it.
  • Currently, this also means that our developers still need to wait for both:
    • resolving massive requirements at build time whenever the 3rdparty requirements pex is invalidated, and then pushing those requirements to our distributed file store. This can take several minutes.
    • pulling the massive 3rdparty requirements pex down from our file store onto a datacenter machine (somewhat faster).
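For reference, the runtime half of that tooling might look roughly like this. `PEX_PATH` is pex's environment variable for merging other pex files into the running one, written like `$PATH`; the helper name, file names and arguments below are illustrative assumptions, and fetching the files is left out.

```python
import os
from typing import Dict, List, Sequence, Tuple

def pex_path_launch(app_pex: str, dep_pexes: Sequence[str], args: Sequence[str],
                    base_env: Dict[str, str]) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) to run app_pex with dep_pexes merged in via PEX_PATH."""
    env = dict(base_env)  # copy: never mutate the caller's environment
    # PEX_PATH entries are separated the same way as $PATH entries.
    env["PEX_PATH"] = os.pathsep.join(dep_pexes)
    return [app_pex, *args], env

argv, env = pex_path_launch("./app.pex", ["./deps.pex"], ["--epochs", "3"], dict(os.environ))
# Launching would then be: subprocess.run(argv, env=env, check=True)
```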

As mentioned above, we were considering implementing this with a virtualenv created via pip that contains the 3rdparty requirements, and then running the pex file with PEX_INHERIT_PATH=fallback, in order to make use of pip's local cache. Depending on whether or not we're running this in Kubernetes, we were thinking that we could either:

  • provision machines with most large requirements beforehand,
  • or, after downloading requirements to a machine once, expect the fleet to become eventually consistent, with large requirements present in each machine's local pip cache.

Either way, downloading maybe-cached individual requirements in parallel from within the datacenter seemed easier to implement, and likely more performant, than stuffing all of that into a monolithic pex that always has to be copied in full to each machine before the user's job can run.
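A sketch of the virtualenv variant, assuming a POSIX virtualenv layout (`bin/python`) that has already been populated with the 3rdparty requirements. With `PEX_INHERIT_PATH=fallback`, pex keeps the interpreter's own non-stdlib packages but places them after the pex's contents on `sys.path` instead of scrubbing them, so anything missing from the pex is found in the virtualenv. The helper name is hypothetical.

```python
import os
from typing import Dict, List, Sequence, Tuple

def venv_fallback_launch(venv_dir: str, pex_file: str, args: Sequence[str],
                         base_env: Dict[str, str]) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) to run pex_file with a pre-populated virtualenv's python."""
    python = os.path.join(venv_dir, "bin", "python")  # POSIX layout assumed
    env = dict(base_env)
    # "fallback": inherited site-packages go *after* the pex's own entries.
    env["PEX_INHERIT_PATH"] = "fallback"
    return [python, pex_file, *args], env

argv, env = venv_fallback_launch("/opt/venv", "app.pex", ["--help"], dict(os.environ))
# Launching would then be: subprocess.run(argv, env=env, check=True)
```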

We considered the current approach (resolving "dehydrated" requirements in pex at runtime) useful enough to make a PR out of because:

  • It avoids having to maintain any separate tooling in our CI at runtime, since the pex resolve logic is always stored in the .bootstrap/ dir of the generated pex. Executing the pex no longer requires the extra scripting or tooling that the multi-pex or virtualenv setups need.
  • It avoids having to manually separate pants targets into different pex files. Separately, it also avoids having to introduce any logic into pants.
  • It avoids pushing the burden of checking whether each pex file in the multi-pex setup is invalidated (and, if so, downloading and re-uploading massive requirements) onto wrapper scripts that are maintained by multiple separate teams and execute on the developer's laptop. Making this functionality part of the pex runtime means that our users can write the pants targets they'd expect to, instead of worrying about the implementation details of how those targets get uploaded.

Two conventions we've been thinking of adopting are:

  • suffixing pex files that have dehydrated_requirements in their PEX-INFO with .ipex instead of .pex (and making sure that pants respects this convention when creating "dehydrated" pex files).
  • introducing a PEX_ALLOW_LATE_HYDRATION env var, defaulting to False: unless PEX_ALLOW_LATE_HYDRATION=True is explicitly set in the environment, running such a pex file would immediately and loudly fail with a clear error message.

Both of these are intended to make it much more evident that this different type of "dehydrated" pex file isn't reproducible.
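Both proposals are small enough to sketch. Neither the `.ipex` naming rule nor `PEX_ALLOW_LATE_HYDRATION` is an existing pex feature; the helpers below are hypothetical illustrations of the behavior described above, and `dehydrated_requirements` is assumed to be a list in a PEX-INFO-like dict.

```python
from typing import Any, Dict, Mapping

def archive_suffix(pex_info: Dict[str, Any]) -> str:
    """Proposed naming rule: dehydrated pex files are written as .ipex."""
    return ".ipex" if pex_info.get("dehydrated_requirements") else ".pex"

def require_late_hydration(environ: Mapping[str, str]) -> None:
    """Fail loudly unless the user explicitly opted into late hydration.

    PEX_ALLOW_LATE_HYDRATION is only *proposed* in this thread; it is not an
    existing pex option. Anything other than "true" (any case) counts as False.
    """
    if environ.get("PEX_ALLOW_LATE_HYDRATION", "false").strip().lower() != "true":
        raise SystemExit(
            "refusing to run: this dehydrated pex resolves its requirements at boot "
            "time, so the result is not reproducible; set "
            "PEX_ALLOW_LATE_HYDRATION=True to run it anyway"
        )
```

Raising `SystemExit` with a message prints it to stderr and exits non-zero, which matches the "immediately and loudly fail" behavior proposed above.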

Finally, it's important to note that the current implementation in #787 specifically makes sure to resolve all transitive requirements fully and then store them in the pex file. This ensures that the intransitive "resolve" at runtime (with transitive=False) should produce a pex file with exactly the same requirements as it would have had at build time. While this still introduces non-reproducibility (in case the contents of a requirement with the same version have changed), we actually consider that a positive, because it also means that we can potentially hot-patch requirements with security vulnerabilities (for example) without having to do a redeploy. In any case, freezing the versions of all transitive requirements felt like it provided exactly the kind of reproducibility we were looking for. I'd love to know whether there are issues with introducing that kind of version-pinned reproducibility in pex at all, or whether we can build in appropriate safeguards that let us overcome those issues.
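Why an intransitive boot-time resolve is safe here can be shown on a toy dependency graph (the pins and edges below are illustrative, not tensorflow's real metadata). The build-time closure is, by construction, closed under dependencies, so fetching exactly those pins with transitive=False cannot miss anything; a hand-picked set that is not closed would.

```python
from typing import Dict, Iterable, List, Set

def transitive_closure(roots: Iterable[str], deps: Dict[str, List[str]]) -> Set[str]:
    """Build time: follow dependency edges from the roots to freeze every pin."""
    frozen: Set[str] = set()
    stack = list(roots)
    while stack:
        pin = stack.pop()
        if pin not in frozen:
            frozen.add(pin)
            stack.extend(deps.get(pin, []))
    return frozen

def is_closed(frozen: Set[str], deps: Dict[str, List[str]]) -> bool:
    """True if every dependency of every frozen pin is itself frozen, i.e. an
    intransitive (transitive=False) fetch of `frozen` leaves nothing missing."""
    return all(dep in frozen for pin in frozen for dep in deps.get(pin, []))

# Toy graph with illustrative pins (not real package metadata).
deps = {
    "tensorflow==1.14.0": ["numpy==1.16.4", "protobuf==3.9.1"],
    "protobuf==3.9.1": ["six==1.12.0"],
}
frozen = transitive_closure(["tensorflow==1.14.0"], deps)
print(sorted(frozen))  # ['numpy==1.16.4', 'protobuf==3.9.1', 'six==1.12.0', 'tensorflow==1.14.0']
print(is_closed(frozen, deps), is_closed({"tensorflow==1.14.0"}, deps))  # True False
```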

@cosmicexplorer
Contributor Author

Thinking about it now, one alternative approach that might let us iterate on this idea internally, without requiring explicit pex support or separate wrapper scripts at runtime, is to make an internal pants task that modifies the bootstrap script in the generated pex file instead of doing the resolve within pex itself.

However, support within the pex tool would still be required for the first commit of #787 (6390b49), to produce "dehydrated" pex files with the --dehydrated flag. Being able to create "dehydrated" pex files from within pex seems to be key to a lot of the benefits of the approach proposed in #787 -- it means users don't have to manually shard their pex files into 3rdparty and source pexes, for example.

We have many alternatives for the actual runtime resolve, but separating out the diff to cover only the new --dehydrated option seems like it would avoid tying the pex project to supporting any specific runtime behavior. If that's desirable, we could then open a separate discussion about the best implementation of runtime hydration (and whether that part could or should be farmed out to a separate tool).

@jsirois
Member

jsirois commented Nov 12, 2019

I'm still stuck back on why PEX is the right tool for the job at all here. It sounds like your requirements / acceptable actions include:

  1. It's ok to resolve pinned artifacts on the target host at PEX boot time.
  2. You want to resolve the minimal set of things.
  3. You want an isolated virtual environment.

A virtualenv + a pinned requirements file with hashes and a `pip install --no-deps --only-binary :all: --find-links=<internal flat repo> --index-url=<internal pypi> --requirement requirements.locked.txt` does exactly what you want. Is it perhaps the case that it's only because Pants does not support this that you want to jam this functionality into pex?
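To make the suggestion concrete, here is a sketch that renders one line of a hash-pinned requirements file in pip's hash-checking format (`name==version --hash=sha256:<hex>`) and assembles the install command quoted above as an argv list. The artifact bytes and repository URLs are placeholders; in practice the digest would come from the actual wheel.

```python
import hashlib
from typing import List

def lock_line(name: str, version: str, artifact: bytes) -> str:
    """One entry of a hash-pinned requirements file (pip hash-checking mode)."""
    return f"{name}=={version} --hash=sha256:{hashlib.sha256(artifact).hexdigest()}"

def pip_install_argv(find_links: str, index_url: str, lockfile: str) -> List[str]:
    """The install command suggested above, as an argv list."""
    return [
        "pip", "install",
        "--no-deps",               # the lockfile already lists every transitive pin
        "--only-binary", ":all:",  # never build from sdists on the target host
        f"--find-links={find_links}",
        f"--index-url={index_url}",
        "--requirement", lockfile,
    ]

# Placeholder bytes and URLs, for illustration only.
print(lock_line("six", "1.12.0", b"stand-in for the downloaded wheel"))
print(pip_install_argv("https://repo.example/flat", "https://pypi.example/simple",
                       "requirements.locked.txt"))
```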

@jsirois
Member

jsirois commented Nov 12, 2019

To put a finer point on the last - with #781 in flight, the pex resolver will == the pip resolver. When Pants upgrades to pex 2.0.0 with the pip resolver, would pants generating a lockfile with hashes be enough here? You deploy the lockfile and run pip install .. which will do the minimal update.
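The "minimal update" comes from pip generally skipping requirements that are already satisfied. A rough illustration of which entries would change between two lockfiles, assuming one `name==version --hash=...` entry per line; names are normalized as in PEP 503 (runs of `-`, `_`, `.` folded to `-`, lowercased). pip itself makes the real decision, and `pip install` does not uninstall projects that were dropped from the lockfile.

```python
import re
from typing import Dict, List

def _normalize(name: str) -> str:
    # PEP 503 name normalization: fold runs of -, _ and . to "-" and lowercase.
    return re.sub(r"[-_.]+", "-", name).lower()

def parse_lock(text: str) -> Dict[str, str]:
    """Map normalized project name -> full lock line, skipping comments/blanks."""
    pins: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            pins[_normalize(line.split("==", 1)[0].strip())] = line
    return pins

def minimal_update(old_text: str, new_text: str) -> List[str]:
    """Lock lines that are new or changed relative to the previous lockfile."""
    old = parse_lock(old_text)
    return sorted(line for name, line in parse_lock(new_text).items()
                  if old.get(name) != line)
```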

@cosmicexplorer
Contributor Author

cosmicexplorer commented Nov 12, 2019

Yes, pex using the pip resolver as per #781 would specifically make virtualenv a much more feasible solution! You correctly divined (though we didn't state it explicitly) that a major concern with that approach was that it would lead to dissonance between the pex and pip resolves.

However, the reason for focusing on the --dehydrated option separately from the implementation of whatever does the "late hydration" (pex, virtualenv) is so that we can avoid pulling large requirements into pex files on the user's laptop. That part seems difficult to avoid unless we make pants implement the same logic as 6390b49 using pex-as-a-library. However, it's not possible to use pex-as-a-library in pants's v2 process execution API, and we'd like to eventually be able to make this process remotely executable.

It's possible, however, that pants v2 + remote execution may be able to replace this entirely, as it:
(1) Occurs inside the datacenter.
(2) Makes use of a distributed caching scheme.

That's something @stuhood has been discussing, but we haven't delved into it very thoroughly yet. A previous implementation of this idea (at https://github.com/cosmicexplorer/pex/tree/incremental-pex-production-with-fingerprinting) was intended to be more v2-friendly, and attempted to fingerprint individual source modules and requirements. There would need to be some design work to understand how to hook up pex to the pants v2 distributed file store, since that would only be useful if each requirement could be downloaded separately, in parallel, at runtime.

Currently, pex resolves and downloads all requirements monolithically, which is not something we are able to break down post hoc, so we continue to pay the cost of uploading and downloading a massive pex each time. The --dehydrated approach allows any number of ways to pull down individual requirements separately and in parallel, and can make use of a cache of individual requirements, which is where we (currently) see the opportunity for a maintainable speedup.
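The per-requirement model described here can be sketched as a content-addressed cache: each requirement artifact is identified by its SHA-256, only the digests missing from the local cache are fetched, and those fetches run in parallel. `fetch` is a hypothetical stand-in for whatever store would serve the artifacts (a pypi mirror, a CAS, etc.).

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

def hydrate(digests: List[str], cache: Dict[str, bytes],
            fetch: Callable[[str], bytes], max_workers: int = 8) -> Dict[str, bytes]:
    """Fetch only the artifacts missing from `cache`, in parallel, verifying each.

    `digests` are SHA-256 hex digests of individual requirement artifacts.
    """
    missing = [d for d in digests if d not in cache]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with `missing`.
        for digest, blob in zip(missing, pool.map(fetch, missing)):
            if hashlib.sha256(blob).hexdigest() != digest:
                raise ValueError(f"digest mismatch for {digest}")
            cache[digest] = blob
    return {d: cache[d] for d in digests}
```

On a warm machine where every digest is already cached, `hydrate` performs no fetches at all, which is the eventual-consistency effect described earlier in the thread.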

@stuhood also mentioned that rsync is supposed to be able to handle diffs of files like this smartly. The teams using the current multi-pex workaround are using rsync and still seeing these issues. If there's reason to think that rsync or something else could automatically shard large pex files into their constituent changed sources or requirements, and that we could configure it to do so, we would be very interested in pursuing that, as it likely means we can avoid any pex changes to support this workflow.

@illicitonion
Contributor

> To put a finer point on the last - with #781 in flight, the pex resolver will == the pip resolver. When Pants upgrades to pex 2.0.0 with the pip resolver, would pants generating a lockfile with hashes be enough here? You deploy the lockfile and run pip install .. which will do the minimal update.

Yes. Fundamentally, we're looking for a way to separate layers, and what we need is a requirements.txt or Pipfile.lock either inside or alongside the pex. It sounds like we all agree that this is a reasonable thing to want to do (as long as it's clear that these files operate differently from ordinary pex files because they are not self-contained, perhaps via a different file extension).

Where we're less clear is our preference between (at least) three ways of using that information:

  1. Have an external script make a virtualenv and run the pip install from your previous comment to bootstrap. This adds complexity for the user (an external script needs to make sure things like virtualenv are present), but keeps pex itself beautifully simple.
  2. Have the pex bootstrap script, rather than an external script, make and populate the virtualenv. This loads a bunch more logic (and potentially dependencies) into pex (and the pex file itself), but provides a simple and clear user experience.
  3. Have the pex bootstrap script hydrate the files to wherever it would put them if they were in the zipfile, rather than into a separate virtualenv. This may be the best of both worlds: we only add "how to download deps" logic to pex, and instead of introducing virtualenvs we use pex's existing directory conventions.

I don't have a strong preference, I think we can make any of the three work. Does anyone else have a strong preference (looking specifically at @jsirois and @kwlzn for this one :))

A separate axis of decision-making is whether the file downloads should be done using pip, or by additionally adorning the lockfile with content digests per requirement and using something like fs_util to fetch the files from a remote execution CAS server, rather than fetching them from a flat repo / pypi index mirror. To me this feels like it adds complexity rather than simplifying, but I'm willing to be convinced otherwise.
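One way to picture a lockfile "adorned with content digests" is below. The Remote Execution API identifies blobs by a digest made of a hash and a size in bytes, so each entry carries both. The ` # digest=<hash>/<size>` annotation format is purely an assumption for illustration; because pip treats whitespace followed by `#` in a requirements file as the start of a comment, such a file would still be readable by pip.

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class Digest:
    """Hash plus size in bytes, mirroring the shape of a Remote Execution API Digest."""
    hash_hex: str
    size_bytes: int

    @classmethod
    def of(cls, blob: bytes) -> "Digest":
        return cls(hashlib.sha256(blob).hexdigest(), len(blob))

@dataclass(frozen=True)
class LockEntry:
    """A lockfile pin adorned with its artifact's digest (hypothetical format)."""
    pin: str
    digest: Digest

    def render(self) -> str:
        return f"{self.pin} # digest={self.digest.hash_hex}/{self.digest.size_bytes}"

    @classmethod
    def parse(cls, line: str) -> "LockEntry":
        pin, _, annotation = line.partition(" # digest=")
        hash_hex, _, size = annotation.rpartition("/")
        return cls(pin.strip(), Digest(hash_hex, int(size)))

    def matches(self, blob: bytes) -> bool:
        """Check a blob fetched from the CAS against the recorded digest."""
        return Digest.of(blob) == self.digest

entry = LockEntry("six==1.12.0", Digest.of(b"stand-in wheel bytes"))
print(entry.render())
```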

@cosmicexplorer
Contributor Author

> additionally adorning the lockfile additionally with content digests per requirement, and using something like fs_util to fetch the files from a remote execution CAS server

My first attempt at addressing this issue did something like this, but adorned the PEX-INFO with content digests for source files. While that might be interesting later, it doesn’t address the issue we’re actually concerned about, which is downloading the 3rdparty requirements piece by piece (it still resolves requirements monolithically).

So I don’t think this work should be used at all, but it did introduce a model that is potentially similar to what you’re describing here, just for source files instead of requirements. While this branch is incredibly complex, I believe that scoping the changes to just cover digesting 3rdparty requirements might be a way to support the fs_util approach. (https://github.com/pantsbuild/pex/compare/master...cosmicexplorer:incremental-pex-production-with-fingerprinting?expand=1)

It’s definitely possible to do that entirely outside of pex, but in investigating this type of implementation, I might at first experiment with having pex itself shard + digest 3rdparty requirements. I’ll post to this issue if we follow that route and find that it works.

@cosmicexplorer
Contributor Author

I've created a google doc to discuss the alternate implementations of this idea in greater depth at https://docs.google.com/document/d/1B_g0Ofs8aQsJtrePPR1PCtSAKgBG1o59AhS_NwfFnbI/edit?usp=sharing!

@jsirois
Member

jsirois commented Dec 21, 2019

Since work is well underway to implement all this on top of pex in pantsbuild/pants#8793, I'm going to close. Thanks for working through this @cosmicexplorer.

@jsirois jsirois closed this as completed Dec 21, 2019
cosmicexplorer added a commit to pantsbuild/pants that referenced this issue Mar 12, 2020
### Problem

See pex-tool/pex#789 for a description of the issue, and https://docs.google.com/document/d/1B_g0Ofs8aQsJtrePPR1PCtSAKgBG1o59AhS_NwfFnbI/edit for a google doc with pros and cons of different approaches.

@jsirois was extremely helpful throughout the development of this feature, and pex-tool/pex#819 and pex-tool/pex#821 in pex `2.0.3` will help to optimize several other aspects of this process when we can unrevert #8787.

**Note:** `src/python/pants/backend/python/subsystems/pex_build_util.py` was removed in this PR, along with all floating references to it.

### Solution

With `--binary-py-generate-ipex`, a `.ipex` file will be created when `./pants binary` is run against a `python_binary()` target. This `.ipex` archive will create a `.pex` file and run it when first executed. The `.ipex` archive contains:
- in `IPEX-INFO`: the source files to inject into the resulting `.pex`, and pypi indices to resolve requirements from.
- in `BOOTSTRAP-PEX-INFO`: the `PEX-INFO` of the pex file that *would* have been generated if `--generate-ipex` was False.
- in `ipex.py`: A bootstrap script which will generate a `.pex` file when the `.ipex` file is first executed.
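As a rough illustration of the archive layout described above, the sketch below writes and reads back an in-memory zip with the three entries `IPEX-INFO`, `BOOTSTRAP-PEX-INFO` and `ipex.py`. The JSON field names (`code`, `indices`) and the launcher contents are assumptions, not the schema pants actually writes, and the sketch does not make the archive executable.

```python
import io
import json
import zipfile
from typing import Any, Dict, List

def build_ipex(sources: Dict[str, str], indices: List[str],
               bootstrap_pex_info: Dict[str, Any], launcher: str) -> bytes:
    """Assemble an in-memory archive with the three entries listed above."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # Source files to inject plus the indices to resolve from (assumed schema).
        zf.writestr("IPEX-INFO", json.dumps({"code": sources, "indices": indices}))
        # PEX-INFO of the pex that would have been built without --generate-ipex.
        zf.writestr("BOOTSTRAP-PEX-INFO", json.dumps(bootstrap_pex_info))
        zf.writestr("ipex.py", launcher)
    return buf.getvalue()

def read_ipex(blob: bytes) -> Dict[str, Any]:
    """What a bootstrap script would read back before hydrating the real pex."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        return {
            "ipex_info": json.loads(zf.read("IPEX-INFO")),
            "bootstrap_pex_info": json.loads(zf.read("BOOTSTRAP-PEX-INFO")),
            "names": sorted(zf.namelist()),
        }

blob = build_ipex({"app/main.py": "print('hi')"}, ["https://pypi.org/simple"],
                  {"requirements": ["six==1.12.0"]}, "# launcher placeholder\n")
print(read_ipex(blob)["names"])  # ['BOOTSTRAP-PEX-INFO', 'IPEX-INFO', 'ipex.py']
```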

### Result

For a `.ipex` file which hydrates the `tensorflow==1.14.0` dependency when it is first run, this translates to a >100x decrease in file size:
```bash
X> ls dist
total 145M
-rwxr-xr-x 1 dmcclanahan staff 267k Dec 10 21:11 dehydrated.ipex*
-rwxr-xr-x 1 dmcclanahan staff 134M Dec 10 21:11 dehydrated.pex*
```
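As a quick check of the ">100x" figure: whether `ls` printed decimal (`k` = 1000) or binary (`k` = 1024) units, the ratio comes out around 500x.

```python
def size_ratio(pex_bytes: float, ipex_bytes: float) -> float:
    """How many times larger the hydrated .pex is than the dehydrated .ipex."""
    return pex_bytes / ipex_bytes

# Decimal units (k = 1000, M = 1000**2), as printed by e.g. `ls --si`.
print(f"{size_ratio(134e6, 267e3):.0f}x")  # 502x
# Binary units (k = 1024, M = 1024**2), as printed by e.g. `ls -h`.
print(f"{size_ratio(134 * 1024**2, 267 * 1024):.0f}x")  # 514x
```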
cosmicexplorer added a commit to cosmicexplorer/pants that referenced this issue Mar 12, 2020
cosmicexplorer added a commit to cosmicexplorer/pants that referenced this issue Mar 31, 2020
stuhood pushed a commit to pantsbuild/pants that referenced this issue Mar 31, 2020
stuhood pushed a commit to pantsbuild/pants that referenced this issue Mar 31, 2020
stuhood pushed a commit to pantsbuild/pants that referenced this issue Apr 1, 2020
cosmicexplorer added a commit to pantsbuild/pants that referenced this issue May 4, 2020