Skip to content

Commit

Permalink
CalcJob: make sure local_copy_list files do not end up in node repo
Browse files Browse the repository at this point in the history
The concept of the `local_copy_list` is to provide a possibility to
`CalcJob` plugins to write files to the remote working directory but
that are not also copied to the calculation job's repository folder.
However, due to commit 9dfad2e this
guarantee is broken.

The relevant commit refactored the handling of the `local_copy_list` in
the `upload_calculation` method to allow the target filepaths in the
list to contain nested paths with subdirectories that might not yet
necessarily exist. The approach was to first write all files to the
sandbox folder, where it is easier to deal with non-existing
directories. To make sure that these files weren't then also copied to
the node's repository folder, the copied files were also added to the
`provenance_exclude_list`. However, the logic in that part of the code
did not normalize filepaths, which caused files to be copied that
shouldm't have. The reason is that the `provenance_exclude_list` could
contain `./path/file_a.txt`, which would be compared to the relative
path `path/file_a.txt` which references the same file, but the strings
are not equal.

The solution is to ensure that all paths are fully normalized before
they are compared. This will turn the relative path `./path/file_a.txt`
into `path/file_a.txt`.
  • Loading branch information
sphuber committed Sep 30, 2020
1 parent 9c4a8b4 commit 739eb14
Show file tree
Hide file tree
Showing 2 changed files with 42 additions and 2 deletions.
8 changes: 6 additions & 2 deletions aiida/engine/daemon/execmanager.py
Original file line number Diff line number Diff line change
Expand Up @@ -273,11 +273,15 @@ def find_data_node(inputs, uuid):
# folders that would be empty when considering the `provenance_exclude_list` will *not* be copied to the repo. The
# advantage of this explicit copying instead of deleting the files from `provenance_exclude_list` from the sandbox
# first before moving the entire remaining content to the node's repository, is that in this way we are guaranteed
# not to accidentally move files to the repository that should not go there at all cost.
# not to accidentally move files to the repository that should not go there at all cost. Note that all entries in
# the provenance exclude list are normalized first, just as the paths that are in the sandbox folder, otherwise the
# direct equality test may fail, e.g.: './path/file.txt' != 'path/file.txt' even though they reference the same file
provenance_exclude_list = [os.path.normpath(entry) for entry in provenance_exclude_list]

for root, _, filenames in os.walk(folder.abspath):
for filename in filenames:
filepath = os.path.join(root, filename)
relpath = os.path.relpath(filepath, folder.abspath)
relpath = os.path.normpath(os.path.relpath(filepath, folder.abspath))
if relpath not in provenance_exclude_list:
with open(filepath, 'rb') as handle:
node._repository.put_object_from_filelike(handle, relpath, 'wb', force=True) # pylint: disable=protected-access
Expand Down
36 changes: 36 additions & 0 deletions tests/engine/daemon/test_execmanager.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
# For further information please visit http://www.aiida.net #
###########################################################################
"""Tests for the :mod:`aiida.engine.daemon.execmanager` module."""
import io
import os
import pytest

Expand Down Expand Up @@ -54,3 +55,38 @@ def test_retrieve_files_from_list(tmp_path_factory, generate_calculation_node):

with open(str(target / 'file_a.txt'), 'rb') as handle:
assert handle.read() == content_a


@pytest.mark.usefixtures('clear_database_before_test')
def test_upload_local_copy_list(fixture_sandbox, aiida_localhost, aiida_local_code_factory):
"""Test the ``local_copy_list`` functionality in ``upload_calculation``.
Specifically, verify that files in the ``local_copy_list`` do not end up in the repository of the node.
"""
from aiida.common.datastructures import CalcInfo, CodeInfo
from aiida.orm import CalcJobNode, SinglefileData

inputs = {
'file_a': SinglefileData(io.BytesIO(b'content_a')).store(),
'file_b': SinglefileData(io.BytesIO(b'content_b')).store(),
}

node = CalcJobNode(computer=aiida_localhost)
node.store()

code = aiida_local_code_factory('arithmetic.add', '/bin/bash').store()
code_info = CodeInfo()
code_info.code_uuid = code.uuid

calc_info = CalcInfo()
calc_info.uuid = node.uuid
calc_info.codes_info = [code_info]
calc_info.local_copy_list = [
(inputs['file_a'].uuid, inputs['file_a'].filename, './files/file_a'),
(inputs['file_a'].uuid, inputs['file_a'].filename, './files/file_b'),
]

with LocalTransport() as transport:
execmanager.upload_calculation(node, transport, calc_info, fixture_sandbox)

assert node.list_object_names() == []

0 comments on commit 739eb14

Please sign in to comment.