Feat: Divided docker layer to make it easier to cache #4313
Conversation
c25453e to 0716db7
Mostly LGTM! Mainly want to discuss the tag readability and whether the hash length is long enough to be cryptographically secure.
@@ -59,9 +59,6 @@ def build(
     target_image_repo, target_image_hash_tag = target_image_hash_name.split(':')
     target_image_tag = tags[1].split(':')[1] if len(tags) > 1 else None
-
-    # Check if the image exists and pull if necessary
-    self.image_exists(target_image_hash_name)
why the removal?
I removed this because it doesn't do anything: we are in the middle of a function that builds an image, and we call a function that checks whether an image exists (and pulls it if it doesn't), but we ignore the result.
Furthermore, we have already checked that the image exists in runtime_build.py.
    build_from_scratch = True

    # If the exact image already exists, we do not need to build it
    if runtime_builder.image_exists(hash_image_name, False):
nit: should we turn False to True? Though sometimes it may be faster to re-build it than to pull it.
The hashes guarantee that we get the same version. Since this check only hits the local file system, it will always be faster to build here than to pull.
    logger.info(f'Building source distribution using project root: {project_root}')

    # Copy the 'openhands' directory (Source code)
    shutil.copytree(
I still kinda prefer sdist over shutil.copytree, as long as it is not TOO slow.
The thing is: a user may unexpectedly put other files in the source directory (e.g., a JSON file that's extremely large). Although this is not a development best practice, it could still happen (it sometimes happens to me when I'm debugging stuff).
We could copy over only *.py files, but I'm not sure whether that would cause problems; using sdist is probably the safest here.
I don't see much difference in performance, but the change to use shutil was made last week by the open-source community (it looks like sdist does not work when OpenHands is being used as a Python library): 568c8ce
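For reference, the "only copy over *.py" idea floated above could be done with `shutil.copytree`'s `ignore` callback. This is just a sketch, not code from the PR; the paths in the usage comment are illustrative:

```python
import shutil
from pathlib import Path

def ignore_non_python(directory, names):
    """shutil.copytree `ignore` callback: skip everything that is not a
    directory or a .py file, so large stray files (e.g. a huge JSON left
    over from debugging) are never copied into the build context."""
    keep = set()
    for name in names:
        path = Path(directory) / name
        # Directories must be kept so copytree can recurse into them
        if path.is_dir() or name.endswith('.py'):
            keep.add(name)
    # Return the names to IGNORE, i.e. everything not kept
    return [n for n in names if n not in keep]

# Hypothetical usage:
# shutil.copytree('openhands', '/tmp/build/openhands', ignore=ignore_non_python)
```

The trade-off the thread describes still holds: this silently drops non-Python assets the package may need at runtime, which is why sdist (which respects the package manifest) is arguably safer.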
    )
    # We get away with truncation because we want something that is unique
    # rather than something that is cryptographically secure
    result = dir_hash[:16]
ditto - maybe longer with 32?
Ok - I updated the hash to use base36 truncated to 16 characters. This gives it a collision space of 36^16 = 7,958,661,109,946,400,884,391,936 (roughly 8 × 10^24) possible values.
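A base36 digest truncated to 16 characters, as described above, could be computed like this (a hypothetical helper for illustration, not the code from the PR):

```python
import hashlib

def truncated_base36_hash(data: bytes, length: int = 16) -> str:
    """SHA-256 the input, encode the digest as a base36 string
    (digits + lowercase letters), and truncate for readable tags.

    Truncation sacrifices cryptographic strength, but for cache-key
    uniqueness a 36^16 space is ample.
    """
    digits = '0123456789abcdefghijklmnopqrstuvwxyz'
    n = int.from_bytes(hashlib.sha256(data).digest(), 'big')
    out = []
    while n:
        n, rem = divmod(n, 36)
        out.append(digits[rem])
    return ''.join(reversed(out))[:length]
```

Base36 keeps the tag lowercase-alphanumeric, which fits Docker's tag character rules while packing more entropy per character than hex.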
Co-authored-by: Xingyao Wang <xingyao@all-hands.dev>
**Divided and reordered Docker layers for better caching**
We do a lot in that single layer at the end of the Docker install. This PR moves some of the installs around so that they occur before we copy the code. This effectively means that the playwright and chromium installs and the
apt-get
packages can be cached by Docker in development mode, so rebuilds triggered by code changes are a lot faster. Unfortunately, in order to do this, I had to change the spec for naming our runtime Docker images; I updated the documentation with these changes.
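The reordering can be sketched as a simplified Dockerfile (hypothetical; the base image and package lists here are illustrative, only the layer ordering reflects the PR description):

```dockerfile
FROM python:3.12-slim

# Rarely-changing system packages first: this layer stays cached
# across source-code edits.
RUN apt-get update && apt-get install -y --no-install-recommends \
        curl git \
    && rm -rf /var/lib/apt/lists/*

# Expensive browser install before copying source, so code changes
# do not invalidate it.
RUN pip install playwright && playwright install --with-deps chromium

# Source code changes most often, so it is copied last.
COPY ./openhands /app/openhands
WORKDIR /app
```

Because Docker invalidates a layer (and every layer after it) whenever its inputs change, putting the `COPY` of frequently-edited source last means only the final, cheap layers rebuild during development.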
Testing
Aside from the tests passing, I tested the Web browsing capability specifically...