This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

[Edu Version] Add offline apt package cache in kube runtime #4222

Closed
wants to merge 7 commits into from

Conversation

Contributor

@hzy46 hzy46 commented Feb 19, 2020

To resolve: #4211

How to use:

1. List the packages you want to cache in the file package_cache_info:

# group_name, os, packages(space-gapped), precommands
# "#" can be used for comments
ssh,ubuntu16.04,openssh-client openssh-server,
ssh,ubuntu18.04,openssh-client openssh-server,
nfs,ubuntu16.04,nfs-common,
nfs,ubuntu18.04,nfs-common,

Each line has four comma-separated columns. The first is group_name; one group can contain multiple packages. The second is the OS type; currently only ubuntu16.04 and ubuntu18.04 are supported. The third lists the packages to cache for the group, separated by spaces. The last holds the precommands, which are executed before the packages are gathered; it can be left empty.
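To make the column layout concrete, here is a minimal sketch of a parser for this file format. It is not the code from this PR: the names `CacheEntry` and `parse_package_cache_info` are hypothetical, and it assumes that "#" only starts whole-line comments and that any commas after the third one belong to the precommands column.

```python
from typing import List, NamedTuple


class CacheEntry(NamedTuple):
    """One row of package_cache_info (field names are illustrative)."""
    group_name: str
    os_type: str
    packages: List[str]
    precommands: str


def parse_package_cache_info(text: str) -> List[CacheEntry]:
    """Parse the four-column, comma-separated format described above."""
    entries = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Assumption: blank lines and lines starting with "#" are skipped.
        if not line or line.startswith('#'):
            continue
        # Split into at most 4 fields so commas inside precommands survive.
        fields = [field.strip() for field in line.split(',', 3)]
        if len(fields) < 3:
            raise ValueError('expected at least 3 columns: %r' % raw_line)
        group_name, os_type, packages = fields[:3]
        precommands = fields[3] if len(fields) > 3 else ''
        entries.append(
            CacheEntry(group_name, os_type, packages.split(), precommands))
    return entries


SAMPLE = """\
# group_name, os, packages(space-gapped), precommands
ssh,ubuntu16.04,openssh-client openssh-server,
nfs,ubuntu18.04,nfs-common,
"""

entries = parse_package_cache_info(SAMPLE)
```

With the sample above, `entries` holds one record for the ssh group on ubuntu16.04 (two packages, empty precommands) and one for the nfs group on ubuntu18.04.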

2. In init.py of each plugin:

from plugins.plugin_utils import try_to_install_by_cache
command = [
  try_to_install_by_cache('<group_name>') + ' || { <install packages if cache is not found>; }',
  "<other commands>",
]

try_to_install_by_cache('<group_name>') returns a shell command that installs all packages of the given group from the cache. The command guarantees:

  • If it exits with 0, all the packages were installed successfully.
  • If it exits with a non-zero code, the package installation failed, for example because the required cache was not found or because of another internal problem. In that case, the plugin should fall back to apt-get, yum, or another command to install the packages.

Here is an example for the ssh plugin:

command = [
    try_to_install_by_cache('ssh') + ' || { apt-get update; apt-get install -y openssh-client openssh-server; }',
    '.....' 
]
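The fallback relies on the shell's `||` operator: the brace group runs only when the command on the left exits with a non-zero code. The sketch below illustrates that behavior with stand-ins (`true` and `false` in place of the cache install script, `echo` in place of apt-get); `run_with_fallback` is a hypothetical helper, not part of the plugin utilities, and it assumes `sh` is available on the machine.

```python
import subprocess


def run_with_fallback(cached_cmd: str, fallback_cmd: str) -> str:
    """Run `cached_cmd || { fallback_cmd; }` in sh and return its stdout."""
    full_cmd = '%s || { %s; }' % (cached_cmd, fallback_cmd)
    result = subprocess.run(
        ['sh', '-c', full_cmd],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


# Stand-ins: `true` mimics a successful cache install, `false` a failed one.
cache_hit = run_with_fallback('true', 'echo fallback ran')
cache_miss = run_with_fallback('false', 'echo fallback ran')
```

Here `cache_hit` is empty because the fallback never runs, while `cache_miss` contains the fallback's output.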

3. Set the environment variable ENABLE_PACKAGE_CACHE=true when you build the kube-runtime image:

ENABLE_PACKAGE_CACHE=true ./build/pai_build.py build -s kube-runtime -c <pai-config>
./build/pai_build.py push -i kube-runtime -c <pai-config>

By default, the package cache is disabled to save build time.
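As an illustration of the opt-in behavior, a build step could read the flag roughly as follows. This is a sketch under assumptions: the description does not show how the build actually reads the variable, and `package_cache_enabled` is a hypothetical helper that treats only the value "true" (case-insensitive) as enabled.

```python
import os
from typing import Mapping, Optional


def package_cache_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when ENABLE_PACKAGE_CACHE is set to 'true'."""
    env = os.environ if environ is None else environ
    # An unset variable means disabled, matching the documented default.
    return env.get('ENABLE_PACKAGE_CACHE', 'false').strip().lower() == 'true'


enabled = package_cache_enabled({'ENABLE_PACKAGE_CACHE': 'true'})
disabled_by_default = package_cache_enabled({})
```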

Optimization (TBD)

  1. Use a hash of the package URL to save storage space: packages with the same URL can be stored together.
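One way this optimization could work is sketched below: each package file is stored under a key derived from its download URL, so two groups that need the same file share one stored copy. Everything here is an assumption for illustration, including the choice of SHA-256, the helper names, and the example.com URLs; the description leaves the design open.

```python
import hashlib
from typing import Dict, List, Tuple


def cache_key_for_url(url: str) -> str:
    """Derive a stable storage key from a package URL."""
    return hashlib.sha256(url.encode('utf-8')).hexdigest()


def plan_storage(
    group_urls: Dict[str, List[str]],
) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Return (stored files by key, keys needed by each group)."""
    stored = {}  # cache key -> URL, one stored file per distinct URL
    index = {}   # group name -> cache keys the group needs
    for group, urls in group_urls.items():
        for url in urls:
            key = cache_key_for_url(url)
            stored.setdefault(key, url)
            index.setdefault(group, []).append(key)
    return stored, index


# Hypothetical URLs: both groups depend on the same libfoo package file.
stored, index = plan_storage({
    'ssh': ['http://example.com/pool/libfoo.deb',
            'http://example.com/pool/openssh-client.deb'],
    'nfs': ['http://example.com/pool/libfoo.deb'],
})
```

Three package references resolve to two stored files, since the shared URL hashes to the same key.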

@hzy46 hzy46 requested review from Binyang2014, abuccts and mydmdm and removed request for Binyang2014 February 20, 2020 08:03
if not(os.path.exists(name_target_folder)): # avoid duplicate copy
    shutil.copytree(name_source_folder, name_target_folder)
return '/bin/bash {}/runtime.d/install_group.sh '.format(PAI_WORK_DIR) + group_name

Contributor

I think the behavior below would be better:

def try_to_install_by_cache(group_name: str, failed_cmds: list):
    # no change, copy caches
    cached_cmd = '/bin/bash {}/runtime.d/install_group.sh '.format(PAI_WORK_DIR) + group_name
    # bash needs a space after "{" and a ";" before "}" in a brace group
    return '%s || { %s; }' % (cached_cmd, '; '.join(failed_cmds))

and the example usage is

cmds = [
    ...
    try_to_install_by_cache('ssh', ['apt-get update', 'apt-get install -y openssh-client openssh-server']),
]
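For reference, here is a self-contained sketch of the suggested signature that shows the command string it would produce. It leaves out the cache-copying step, uses a made-up value for PAI_WORK_DIR, and renames the second parameter to `fallback_cmds` for readability, so treat it as an illustration of the idea and not as the final implementation. The brace group is written with a space after `{` and a `;` before `}`, which the shell requires.

```python
from typing import List

# Hypothetical value; the real PAI_WORK_DIR is defined in kube-runtime.
PAI_WORK_DIR = '/usr/local/pai'


def try_to_install_by_cache(group_name: str, fallback_cmds: List[str]) -> str:
    """Build `<cache install> || { <fallback commands>; }` as one string."""
    cached_cmd = '/bin/bash {}/runtime.d/install_group.sh {}'.format(
        PAI_WORK_DIR, group_name)
    return '%s || { %s; }' % (cached_cmd, '; '.join(fallback_cmds))


ssh_cmd = try_to_install_by_cache(
    'ssh',
    ['apt-get update', 'apt-get install -y openssh-client openssh-server'],
)
print(ssh_cmd)
```

Passing the fallback commands as a list keeps the shell quoting in one place, so each plugin no longer has to assemble the `|| { ...; }` suffix by hand.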

Contributor Author

will fix this in a different PR

@mydmdm
Contributor

mydmdm commented Feb 20, 2020

Would you please add the usage instructions from the PR description to a README somewhere?


@hzy46
Contributor Author

hzy46 commented Feb 20, 2020

Closing this in favor of #4226.

@hzy46 hzy46 closed this Feb 20, 2020
@hzy46 hzy46 deleted the zhiyuhe/edu/package_cache branch February 24, 2020 06:09