Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: Divided docker layer to make it easier to cache #4313

Merged
merged 28 commits into from
Oct 17, 2024

Conversation

tofarr
Copy link
Collaborator

@tofarr tofarr commented Oct 10, 2024

**Divided and Reordered and Docker layer for better caching. **

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

We do a lot in that single layer at the end of the Docker Install. This PR moves some of the installs around so that they occur before we copy the code. This effectively means that the install of playwright and chromium and the apt-get can be cached by docker when in development mode, meaning that updates due to changing code are a lot faster.

Unfortunately, in order to do this, I had to change the spec for naming our runtime docker images - I updated the documentation with these changes.

Testing

Aside from the tests passing, I tested the Web browsing capability specifically...
image
image

@tofarr tofarr marked this pull request as ready for review October 15, 2024 13:24
@tofarr tofarr requested a review from xingyaoww October 17, 2024 00:54
Copy link
Contributor

@xingyaoww xingyaoww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM! Mainly want to discuss about the tag readability & length of hash to be cryptographic secure

@@ -59,9 +59,6 @@ def build(
target_image_repo, target_image_hash_tag = target_image_hash_name.split(':')
target_image_tag = tags[1].split(':')[1] if len(tags) > 1 else None

# Check if the image exists and pull if necessary
self.image_exists(target_image_hash_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the removal?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this because it doesn't do anything: We are in the middle of a function to build an image, and we call this function that checks if an image exists (And pulls it if it doesn't) - but we ignore the result of this function.

Further we have already checked that the image exists in runtime_build.py

build_from_scratch = True

# If the exact image already exists, we do not need to build it
if runtime_builder.image_exists(hash_image_name, False):
Copy link
Contributor

@xingyaoww xingyaoww Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should we turn False to True? though sometime it maybe faster to re-build it than pull it

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The hashes guarantee that we get the same version - since this only hits the local file system it will always be faster to build here than to pull

openhands/runtime/utils/runtime_build.py Show resolved Hide resolved
logger.info(f'Building source distribution using project root: {project_root}')

# Copy the 'openhands' directory (Source code)
shutil.copytree(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still kinda prefer sdist over shutil.copytree as long as it is not TOO slow.

The thing is: user may unexpectly put other files in the source directly (e.g., a JSON that's extremely large) -- although this is not developing best practices, but it could still happen (sometime may happen with me if I'm debugging stuff).

We could only copy over *.py but not sure if that will cause problems - using sdist is probably the safest here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see much difference in performance - but the change to use shutil was done last week by the OS community: (It looks like sdist does not work when OpenHands is being used as a python library) 568c8ce

openhands/runtime/utils/runtime_build.py Show resolved Hide resolved
)
# We get away with truncation because we want something that is unique
# rather than something that is cryptographically secure
result = dir_hash[:16]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto - maybe longer with 32?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok - I updated the hash to use base36 truncated to 16 characters. This gives it a collision space of 1:22,300,745,198,530,623,141,535,718,272,648,361,505,980,416

tofarr and others added 3 commits October 17, 2024 06:22
@tofarr tofarr requested a review from xingyaoww October 17, 2024 12:54
@xingyaoww xingyaoww enabled auto-merge (squash) October 17, 2024 14:09
@xingyaoww xingyaoww merged commit 5fb3dec into main Oct 17, 2024
15 checks passed
@xingyaoww xingyaoww deleted the feat-divide-docker-layers branch October 17, 2024 15:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants