add accelerator value to job cfg and extend PR comment if accelerator arg is used #280

Merged: 9 commits, Sep 17, 2024
17 changes: 12 additions & 5 deletions README.md
@@ -645,17 +645,24 @@ scontrol_command = /usr/bin/scontrol
#### `[submitted_job_comments]` section

The `[submitted_job_comments]` section specifies templates for messages about newly submitted jobs.
```
initial_comment = New job on instance `{app_name}` for architecture `{arch_name}` for repository `{repo_id}` in job dir `{symlink}`
```
`initial_comment` is used to create a comment to a PR when a new job has been created.

```
awaits_release = job id `{job_id}` awaits release by job manager
```
`awaits_release` is used to provide a status update of a job (shown as a row in the job's status
table).
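For illustration, with a hypothetical job id `12345`, the rendered text would be:
```
job id `12345` awaits release by job manager
```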

```
initial_comment = New job on instance `{app_name}` for architecture `{arch_name}`{accelerator_spec} for repository `{repo_id}` in job dir `{symlink}`
```
`initial_comment` is used to create a comment on a PR when a new job has been
created. Note that the `{accelerator_spec}` part is only filled in by the bot if the
`accelerator` argument to the `bot: build` command has been used.
```
with_accelerator =  and accelerator `{accelerator}`
```
`with_accelerator` is used to provide information about the accelerator the job
should build for if and only if the argument `accelerator:X/Y` has been provided.
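For illustration, assuming hypothetical placeholder values (instance `eessi-bot-example`, architecture `x86_64-generic`, repository `eessi.io-2023.06`, job dir `pr_280/run_000`) and a `bot: build` command that includes `accelerator:nvidia/cc80`, the two templates combine into a comment like:
```
New job on instance `eessi-bot-example` for architecture `x86_64-generic` and accelerator `nvidia/cc80` for repository `eessi.io-2023.06` in job dir `pr_280/run_000`
```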

#### `[new_job_comments]` section

The `[new_job_comments]` section sets templates for messages about jobs whose `hold` flag was released.
3 changes: 2 additions & 1 deletion app.cfg.example
@@ -243,8 +243,9 @@ scontrol_command = /usr/bin/scontrol
# are removed, the output (in PR comments) will lack important
# information.
[submitted_job_comments]
initial_comment = New job on instance `{app_name}` for architecture `{arch_name}` for repository `{repo_id}` in job dir `{symlink}`
awaits_release = job id `{job_id}` awaits release by job manager
initial_comment = New job on instance `{app_name}` for CPU micro-architecture `{arch_name}`{accelerator_spec} for repository `{repo_id}` in job dir `{symlink}`
with_accelerator =  and accelerator `{accelerator}`


[new_job_comments]
49 changes: 42 additions & 7 deletions tasks/build.py
@@ -33,6 +33,7 @@
# Local application imports (anything from EESSI/eessi-bot-software-layer)
from connections import github
from tools import config, cvmfs_repository, job_metadata, pr_comments, run_cmd
import tools.filter as tools_filter


# defaults (used if not specified via, eg, 'app.cfg')
@@ -46,7 +47,7 @@
_ERROR_NONE = "none"


Job = namedtuple('Job', ('working_dir', 'arch_target', 'repo_id', 'slurm_opts', 'year_month', 'pr_id'))
Job = namedtuple('Job', ('working_dir', 'arch_target', 'repo_id', 'slurm_opts', 'year_month', 'pr_id', 'accelerator'))

# global repo_cfg
repo_cfg = {}
@@ -474,6 +475,17 @@ def prepare_jobs(pr, cfg, event_info, action_filter):
# call to just before download_pr
year_month, pr_id, run_dir = create_pr_dir(pr, cfg, event_info)

# determine accelerator from action_filter argument
accelerators = action_filter.get_filter_by_component(tools_filter.FILTER_COMPONENT_ACCEL)
if len(accelerators) == 1:
accelerator = accelerators[0]
elif len(accelerators) > 1:
log(f"{fn}(): found more than one ({len(accelerators)}) accelerator requirement")
accelerator = None
else:
log(f"{fn}(): found no accelerator requirement")
accelerator = None

Contributor: Should we warn or something when the list has more than 1 element?

Contributor Author: Yep. Logging is maybe sufficient.

Contributor Author: Done in 8f44ba9. Also logs in case there is no element.

jobs = []
for arch, slurm_opt in arch_map.items():
arch_dir = arch.replace('/', '_')
@@ -501,6 +513,15 @@ def prepare_jobs(pr, cfg, event_info, action_filter):
continue
else:
log(f"{fn}(): context DOES satisfy filter(s), going on with job")
# we reached this point when the filter matched (otherwise we
# 'continue' with the next repository)
# for each match of the filter we create a specific job directory
# however, matching CPU architectures works differently to handling
# accelerators; multiple CPU architectures defined in arch_target_map
# can match the (CPU) architecture component of a filter; in
# contrast, the value of the accelerator filter is just passed down
# to scripts in bot/ directory of the pull request (see function
# prepare_job_cfg and creation of Job tuple below)

Contributor: Isn't this rather limiting? How could we then send GPU builds to a GPU node/partition, when desired?

Contributor Author (@trz42, Sep 10, 2024): We need a separate argument for that (e.g., node: or nodetype:) or rework the handling of arguments, e.g., have arguments that are passed down to the build script and arguments that are used by the bot to allocate the right node type.

Contributor Author: I'd suggest to not add this capability in this PR.

Contributor: OK, agreed
job_dir = os.path.join(run_dir, arch_dir, repo_id)
os.makedirs(job_dir, exist_ok=True)
log(f"{fn}(): job_dir '{job_dir}'")
@@ -513,11 +534,14 @@ def prepare_jobs(pr, cfg, event_info, action_filter):
# prepare job configuration file 'job.cfg' in directory <job_dir>/cfg
cpu_target = '/'.join(arch.split('/')[1:])
os_type = arch.split('/')[0]
log(f"{fn}(): arch = '{arch}' => cpu_target = '{cpu_target}' , os_type = '{os_type}'")
prepare_job_cfg(job_dir, build_env_cfg, repocfg, repo_id, cpu_target, os_type)

log(f"{fn}(): arch = '{arch}' => cpu_target = '{cpu_target}' , os_type = '{os_type}'"
f", accelerator = '{accelerator}'")

prepare_job_cfg(job_dir, build_env_cfg, repocfg, repo_id, cpu_target, os_type, accelerator)

# enlist jobs to proceed
job = Job(job_dir, arch, repo_id, slurm_opt, year_month, pr_id)
job = Job(job_dir, arch, repo_id, slurm_opt, year_month, pr_id, accelerator)
jobs.append(job)

log(f"{fn}(): {len(jobs)} jobs to proceed after applying white list")
@@ -527,7 +551,7 @@ def prepare_jobs(pr, cfg, event_info, action_filter):
return jobs


def prepare_job_cfg(job_dir, build_env_cfg, repos_cfg, repo_id, software_subdir, os_type):
def prepare_job_cfg(job_dir, build_env_cfg, repos_cfg, repo_id, software_subdir, os_type, accelerator):
"""
Set up job configuration file 'job.cfg' in directory <job_dir>/cfg

@@ -538,6 +562,7 @@ def prepare_job_cfg(job_dir, build_env_cfg, repos_cfg, repo_id, software_subdir,
repo_id (string): identifier of the repository to build for
software_subdir (string): software subdirectory to build for (e.g., 'x86_64/generic')
os_type (string): type of the os (e.g., 'linux')
accelerator (string): defines accelerator to build for (e.g., 'nvidia/cc80')

Returns:
None (implicitly)
@@ -563,6 +588,7 @@ def prepare_job_cfg(job_dir, build_env_cfg, repos_cfg, repo_id, software_subdir,
# [architecture]
# software_subdir = software_subdir
# os_type = os_type
# accelerator = accelerator
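# for example, using the illustrative values from the docstring above
# (hypothetical, for clarity only):
# [architecture]
# software_subdir = x86_64/generic
# os_type = linux
# accelerator = nvidia/cc80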
job_cfg = configparser.ConfigParser()
job_cfg[job_metadata.JOB_CFG_SITE_CONFIG_SECTION] = {}
build_env_to_job_cfg_keys = {
@@ -607,6 +633,7 @@ def prepare_job_cfg(job_dir, build_env_cfg, repos_cfg, repo_id, software_subdir,
job_cfg[job_cfg_arch_section] = {}
job_cfg[job_cfg_arch_section][job_metadata.JOB_CFG_ARCHITECTURE_SOFTWARE_SUBDIR] = software_subdir
job_cfg[job_cfg_arch_section][job_metadata.JOB_CFG_ARCHITECTURE_OS_TYPE] = os_type
job_cfg[job_cfg_arch_section][job_metadata.JOB_CFG_ARCHITECTURE_ACCELERATOR] = accelerator if accelerator else ''

# copy contents of directory containing repository configuration to directory
# containing job configuration/metadata
@@ -726,11 +753,18 @@ def create_pr_comment(job, job_id, app_name, pr, gh, symlink):
# obtain arch from job.arch_target which has the format OS/ARCH
arch_name = '-'.join(job.arch_target.split('/')[1:])

submitted_job_comments_cfg = config.read_config()[config.SECTION_SUBMITTED_JOB_COMMENTS]

# set string for accelerator if job.accelerator is defined/set (i.e., not None)
accelerator_spec_str = ''
if job.accelerator:
accelerator_spec = f"{submitted_job_comments_cfg[config.SUBMITTED_JOB_COMMENTS_SETTING_WITH_ACCELERATOR]}"
accelerator_spec_str = accelerator_spec.format(accelerator=job.accelerator)

# get current date and time
dt = datetime.now(timezone.utc)

# construct initial job comment
submitted_job_comments_cfg = config.read_config()[config.SECTION_SUBMITTED_JOB_COMMENTS]
job_comment = (f"{submitted_job_comments_cfg[config.SUBMITTED_JOB_COMMENTS_SETTING_INITIAL_COMMENT]}"
f"\n|date|job status|comment|\n"
f"|----------|----------|------------------------|\n"
@@ -741,7 +775,8 @@ def create_pr_comment(job, job_id, app_name, pr, gh, symlink):
arch_name=arch_name,
symlink=symlink,
repo_id=job.repo_id,
job_id=job_id)
job_id=job_id,
accelerator_spec=accelerator_spec_str)

# create comment to pull request
repo_name = pr.base.repo.full_name
3 changes: 2 additions & 1 deletion tests/test_app.cfg
@@ -17,8 +17,9 @@

# variable 'comment' under 'submitted_job_comments' should not be changed as there are regular expression patterns matching it
[submitted_job_comments]
initial_comment = New job on instance `{app_name}` for architecture `{arch_name}` for repository `{repo_id}` in job dir `{symlink}`
awaits_release = job id `{job_id}` awaits release by job manager
initial_comment = New job on instance `{app_name}` for CPU micro-architecture `{arch_name}`{accelerator_spec} for repository `{repo_id}` in job dir `{symlink}`
with_accelerator = &nbsp;and accelerator `{accelerator}`

[new_job_comments]
awaits_lauch = job awaits launch by Slurm scheduler
20 changes: 10 additions & 10 deletions tests/test_task_build.py
@@ -282,7 +282,7 @@ def test_create_pr_comment_succeeds(mocked_github, tmpdir):
print("CREATING PR COMMENT")
ym = datetime.today().strftime('%Y.%m')
pr_number = 1
job = Job(tmpdir, "test/architecture", "EESSI", "--speed-up", ym, pr_number)
job = Job(tmpdir, "test/architecture", "EESSI", "--speed-up", ym, pr_number, "fpga/magic")

job_id = "123"
app_name = "pytest"
@@ -311,7 +311,7 @@ def test_create_pr_comment_succeeds_none(mocked_github, tmpdir):
print("CREATING PR COMMENT")
ym = datetime.today().strftime('%Y.%m')
pr_number = 1
job = Job(tmpdir, "test/architecture", "EESSI", "--speed-up", ym, pr_number)
job = Job(tmpdir, "test/architecture", "EESSI", "--speed-up", ym, pr_number, "fpga/magic")

job_id = "123"
app_name = "pytest"
@@ -336,7 +336,7 @@ def test_create_pr_comment_raises_once_then_succeeds(mocked_github, tmpdir):
print("CREATING PR COMMENT")
ym = datetime.today().strftime('%Y.%m')
pr_number = 1
job = Job(tmpdir, "test/architecture", "EESSI", "--speed-up", ym, pr_number)
job = Job(tmpdir, "test/architecture", "EESSI", "--speed-up", ym, pr_number, "fpga/magic")

job_id = "123"
app_name = "pytest"
@@ -361,7 +361,7 @@ def test_create_pr_comment_always_raises(mocked_github, tmpdir):
print("CREATING PR COMMENT")
ym = datetime.today().strftime('%Y.%m')
pr_number = 1
job = Job(tmpdir, "test/architecture", "EESSI", "--speed-up", ym, pr_number)
job = Job(tmpdir, "test/architecture", "EESSI", "--speed-up", ym, pr_number, "fpga/magic")

job_id = "123"
app_name = "pytest"
@@ -387,7 +387,7 @@ def test_create_pr_comment_three_raises(mocked_github, tmpdir):
print("CREATING PR COMMENT")
ym = datetime.today().strftime('%Y.%m')
pr_number = 1
job = Job(tmpdir, "test/architecture", "EESSI", "--speed-up", ym, pr_number)
job = Job(tmpdir, "test/architecture", "EESSI", "--speed-up", ym, pr_number, "fpga/magic")

job_id = "123"
app_name = "pytest"
@@ -409,7 +409,7 @@ def test_create_read_metadata_file(mocked_github, tmpdir):
# create some test data
ym = datetime.today().strftime('%Y.%m')
pr_number = 999
job = Job(tmpdir, "test/architecture", "EESSI", "--speed_up_job", ym, pr_number)
job = Job(tmpdir, "test/architecture", "EESSI", "--speed_up_job", ym, pr_number, "fpga/magic")

job_id = "123"

@@ -440,14 +440,14 @@ def test_create_read_metadata_file(mocked_github, tmpdir):

# use directory that does not exist
dir_does_not_exist = os.path.join(tmpdir, "dir_does_not_exist")
job2 = Job(dir_does_not_exist, "test/architecture", "EESSI", "--speed_up_job", ym, pr_number)
job2 = Job(dir_does_not_exist, "test/architecture", "EESSI", "--speed_up_job", ym, pr_number, "fpga/magic")
job_id2 = "222"
with pytest.raises(FileNotFoundError):
create_metadata_file(job2, job_id2, pr_comment)

# use directory without write permission
dir_without_write_perm = os.path.join("/")
job3 = Job(dir_without_write_perm, "test/architecture", "EESSI", "--speed_up_job", ym, pr_number)
job3 = Job(dir_without_write_perm, "test/architecture", "EESSI", "--speed_up_job", ym, pr_number, "fpga/magic")
job_id3 = "333"
with pytest.raises(OSError):
create_metadata_file(job3, job_id3, pr_comment)
@@ -457,7 +457,7 @@ def test_create_read_metadata_file(mocked_github, tmpdir):

# use undefined values for parameters
# job_id = None
job4 = Job(tmpdir, "test/architecture", "EESSI", "--speed_up_job", ym, pr_number)
job4 = Job(tmpdir, "test/architecture", "EESSI", "--speed_up_job", ym, pr_number, "fpga/magic")
job_id4 = None
create_metadata_file(job4, job_id4, pr_comment)

Expand All @@ -472,7 +472,7 @@ def test_create_read_metadata_file(mocked_github, tmpdir):

# use undefined values for parameters
# job.working_dir = None
job5 = Job(None, "test/architecture", "EESSI", "--speed_up_job", ym, pr_number)
job5 = Job(None, "test/architecture", "EESSI", "--speed_up_job", ym, pr_number, "fpga/magic")
job_id5 = "555"
with pytest.raises(TypeError):
create_metadata_file(job5, job_id5, pr_comment)
3 changes: 2 additions & 1 deletion tools/config.py
@@ -105,8 +105,9 @@
RUNNING_JOB_COMMENTS_SETTING_RUNNING_JOB = 'running_job'

SECTION_SUBMITTED_JOB_COMMENTS = 'submitted_job_comments'
SUBMITTED_JOB_COMMENTS_SETTING_INITIAL_COMMENT = 'initial_comment'
SUBMITTED_JOB_COMMENTS_SETTING_AWAITS_RELEASE = 'awaits_release'
SUBMITTED_JOB_COMMENTS_SETTING_INITIAL_COMMENT = 'initial_comment'
SUBMITTED_JOB_COMMENTS_SETTING_WITH_ACCELERATOR = 'with_accelerator'

SECTION_CLEAN_UP = 'clean_up'
CLEAN_UP_SETTING_TRASH_BIN_ROOT_DIR = 'trash_bin_dir'
21 changes: 21 additions & 0 deletions tools/filter.py
@@ -39,6 +39,7 @@
COMPONENT_UNKNOWN = "unknown component={component} in {component}:{pattern}"
FILTER_EMPTY_PATTERN = "pattern in filter string '{filter_string}' is empty"
FILTER_FORMAT_ERROR = "filter string '{filter_string}' does not conform to format 'component:pattern'"
UNKNOWN_COMPONENT_CONST = "unknown component constant {component}"

Filter = namedtuple('Filter', ('component', 'pattern'))

@@ -175,6 +176,26 @@ def add_filter_from_string(self, filter_string):
raise EESSIBotActionFilterError(msg)
self.add_filter(_filter_split[0], _filter_split[1])

def get_filter_by_component(self, component):
"""
Returns filter patterns for the given component.

Args:
component (string): one of FILTER_COMPONENTS

Returns:
(list): list of patterns for filters whose component matches the argument
"""
if component not in FILTER_COMPONENTS:
msg = UNKNOWN_COMPONENT_CONST.format(component=component)
raise EESSIBotActionFilterError(msg)

pattern = []
for _filter in self.action_filters:
if component == _filter.component:
pattern.append(_filter.pattern)
return pattern
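# Usage sketch (hypothetical): for an action filter created from a
# `bot: build` command that includes `accelerator:nvidia/cc80`, calling
# get_filter_by_component(FILTER_COMPONENT_ACCEL) returns ['nvidia/cc80'];
# for a command without an accelerator argument it returns an empty list.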

def remove_filter(self, component, pattern):
"""
Removes all elements matching the filter given by (component, pattern)
1 change: 1 addition & 0 deletions tools/job_metadata.py
@@ -33,6 +33,7 @@
JOB_CFG_ARCHITECTURE_SECTION = "architecture"
JOB_CFG_ARCHITECTURE_OS_TYPE = "os_type"
JOB_CFG_ARCHITECTURE_SOFTWARE_SUBDIR = "software_subdir"
JOB_CFG_ARCHITECTURE_ACCELERATOR = "accelerator"

JOB_CFG_REPOSITORY_SECTION = "repository"
JOB_CFG_REPOSITORY_CONTAINER = "container"