Workflow test failing due to path inconsistencies #172

Closed
jsheunis opened this issue Jul 22, 2022 · 3 comments
@jsheunis

jsheunis commented Jul 22, 2022

(Environment: datalad 0.17.1 on a Mac)

I have a test that creates a superdataset with a nested subdataset and then passes the superdataset path to a method that internally calls foreach_dataset. The test code, repeated here for clarity:

super_ds_tree = {
    'superdataset': {
        '.studyminimeta.yaml': studyminimeta_content,
        'random_file.txt': 'some content',
        'some_dir': {
            'file_in_dir.txt': 'some content in file in dir',
            'subdataset': {
                'datacite.yml': datacitegin_content,
                'random_file.txt': 'some content',
            }
        }
    }
}
@with_tree(tree=super_ds_tree)
@with_tempfile(mkdir=True)
def test_workflow_new(super_path=None, cat_path=None):
    ckwa=dict(result_renderer='disabled')
    # Create super and subdataset, save all
    sub_ds = create(super_path + "/some_dir/subdataset",  force=True, **ckwa)
    sub_ds.save(to_git=True, **ckwa)
    super_ds = create(super_path, force=True, **ckwa)
    super_ds.save(to_git=True, **ckwa)
    assert_repo_status(super_ds.path)
    # Create catalog
    cat = WebCatalog(location=cat_path)
    cat.create(force=True)
    # Create catalog
    cat_path = Path(cat_path)
    cat = WebCatalog(location=cat_path)
    cat.create(force=True)
...
    # Run workflow
    tuple(super_workflow(super_ds.path, cat))

When I run this test with pytest -rP from the package root, I get this full log:

>>pytest datalad_catalog/tests/test_workflow.py -rP

==================================================================== test session starts =====================================================================
platform darwin -- Python 3.9.12, pytest-7.1.2, pluggy-1.0.0
rootdir: /Users/jsheunis/Documents/psyinf/datalad-catalog
collected 1 item

datalad_catalog/tests/test_workflow.py F                                                                                                               [100%]

========================================================================== FAILURES ==========================================================================
_____________________________________________________________________ test_workflow_new ______________________________________________________________________

super_path = '/var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_tree_test_workflow_new5v_bz5b8'
cat_path = PosixPath('/private/var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_test_workflow_new2bz1ra0l')

    @with_tree(tree=super_ds_tree)
    @with_tempfile(mkdir=True)
    def test_workflow_new(super_path=None, cat_path=None):
        ckwa=dict(result_renderer='disabled')
        # Create super and subdataset, save all
        sub_ds = create(super_path + "/some_dir/subdataset",  force=True, **ckwa)
        sub_ds.save(to_git=True, **ckwa)
        super_ds = create(super_path, force=True, **ckwa)
        super_ds.save(to_git=True, **ckwa)
        assert_repo_status(super_ds.path)
        # Create catalog
        cat = WebCatalog(location=cat_path)
        cat.create(force=True)
        # Create catalog
        cat_path = Path(cat_path)
        cat = WebCatalog(location=cat_path)
        cat.create(force=True)
        assert cat_path.exists()
        assert cat_path.is_dir()
        for p in catalog_paths:
            pth = cat_path / p
            assert pth.exists()
        # Run workflow
        tuple(super_workflow(super_ds.path, cat))
        # Test workflow outputs
        meta_path = cat_path / "metadata"
        assert meta_path.exists()
        dataset_details = {
            "super_ds": get_id_and_version(super_ds),
            "sub_ds": get_id_and_version(sub_ds),
        }
        for ds in dataset_details.values():
            pth = meta_path / str(ds[0]) / str(ds[1])
>           assert pth.exists()
E           AssertionError: assert False
E            +  where False = <bound method Path.exists of PosixPath('/private/var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_test_workflow_new2bz1ra0l/metadata/a8aaea4b-b7e9-4542-a812-59ccbedc938f/99a4803ce24e0ddbb1276bc42d613dc6c9c49b9a')>()
E            +    where <bound method Path.exists of PosixPath('/private/var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_test_workflow_new2bz1ra0l/metadata/a8aaea4b-b7e9-4542-a812-59ccbedc938f/99a4803ce24e0ddbb1276bc42d613dc6c9c49b9a')> = PosixPath('/private/var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_test_workflow_new2bz1ra0l/metadata/a8aaea4b-b7e9-4542-a812-59ccbedc938f/99a4803ce24e0ddbb1276bc42d613dc6c9c49b9a').exists

datalad_catalog/tests/test_workflow.py:343: AssertionError
-------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------
foreach-dataset(error): /var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_tree_test_workflow_new5v_bz5b8 (dataset) ['/private/var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_tree_test_workflow_new5v_bz5b8/some_dir/subdataset' is not in the subpath of '/var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_tree_test_workflow_new5v_bz5b8' OR one path is relative and the other is absolute.]
foreach-dataset(ok): /var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_tree_test_workflow_new5v_bz5b8/some_dir/subdataset (dataset)
action summary:
  foreach-dataset (error: 1, ok: 1)
Could not run workflow for all datasets. Inspect errors:

Command did not complete successfully. 1 failed:
[{'action': 'foreach-dataset',
  'command': <function super_workflow.<locals>._dataset_workflow_inner at 0x7f9912e4f820>,
  'exception': ValueError("'/private/var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_tree_test_workflow_new5v_bz5b8/some_dir/subdataset' is not in the subpath of '/var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_tree_test_workflow_new5v_bz5b8' OR one path is relative and the other is absolute."),
  'exception_traceback': '[foreach_dataset.py:run_cmd:317,workflows.py:_dataset_workflow_inner:79,workflows.py:dataset_workflow:149,workflows.py:extract_dataset_level:174,utils.py:eval_func:447,utils.py:return_func:439,utils.py:generator_func:357,utils.py:_process_results:544,extract.py:__call__:277,extract.py:do_dataset_extraction:321,extract.py:legacy_extract_dataset:626,core.py:__call__:81,core.py:_yield_dsmeta:143,pathlib.py:relative_to:939]',
  'message': "'/private/var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_tree_test_workflow_new5v_bz5b8/some_dir/subdataset' "
             'is not in the subpath of '
             "'/var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_tree_test_workflow_new5v_bz5b8' "
             'OR one path is relative and the other is absolute.',
  'path': '/var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_tree_test_workflow_new5v_bz5b8',
  'status': 'error',
  'type': 'dataset'}]
-------------------------------------------------------------------- Captured stderr call --------------------------------------------------------------------
[INFO] Start core metadata extraction from Dataset(/var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_tree_test_workflow_new5v_bz5b8)
[INFO] Extracted core metadata from /var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_tree_test_workflow_new5v_bz5b8
[INFO] Start core metadata extraction from Dataset(/var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_tree_test_workflow_new5v_bz5b8/some_dir/subdataset)
[INFO] Extracted core metadata from /var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_tree_test_workflow_new5v_bz5b8/some_dir/subdataset
[INFO] Finished core metadata extraction from Dataset(/var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_tree_test_workflow_new5v_bz5b8/some_dir/subdataset)
--------------------------------------------------------------------- Captured log call ----------------------------------------------------------------------
INFO     datalad.core.local.save:log.py:431 Total: starting
INFO     datalad.core.local.save:log.py:431
INFO     datalad.core.local.save:log.py:431 Total: processed result for /var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_tree_test_workflow_new5v_bz5b8/some_dir/subdataset
INFO     datalad.core.local.save:log.py:431 Total: done
INFO     datalad.core.local.save:log.py:431 Total: starting
INFO     datalad.core.local.save:log.py:431
INFO     datalad.core.local.save:log.py:431 Total: processed result for /var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_tree_test_workflow_new5v_bz5b8
INFO     datalad.core.local.save:log.py:431 Total: done
INFO     datalad.metadata.extractors.metalad_core:log.py:431 Start core metadata extraction from Dataset(/var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_tree_test_workflow_new5v_bz5b8)
INFO     datalad.metadata.extractors.metalad_core:log.py:431 Extracted core metadata from /var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_tree_test_workflow_new5v_bz5b8
INFO     datalad.ui.dialog:log.py:431 Clear progress bars
INFO     datalad.ui.dialog:log.py:431 Refresh progress bars
INFO     datalad.metadata.extractors.metalad_core:log.py:431 Start core metadata extraction from Dataset(/var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_tree_test_workflow_new5v_bz5b8/some_dir/subdataset)
INFO     datalad.metadata.extractors.metalad_core:log.py:431 Extracted core metadata from /var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_tree_test_workflow_new5v_bz5b8/some_dir/subdataset
INFO     datalad.metadata.extractors.metalad_core:log.py:431 Finished core metadata extraction from Dataset(/var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_tree_test_workflow_new5v_bz5b8/some_dir/subdataset)
INFO     datalad.ui.dialog:log.py:431 Clear progress bars
INFO     datalad.ui.dialog:log.py:431 Refresh progress bars
INFO     datalad.ui.dialog:log.py:431 Clear progress bars
INFO     datalad.ui.dialog:log.py:431 Refresh progress bars
====================================================================== warnings summary ======================================================================
../../../opt/miniconda3/envs/meow/lib/python3.9/site-packages/datalad/support/external_versions.py:242
datalad_catalog/tests/test_workflow.py::test_workflow_new
  /Users/jsheunis/opt/miniconda3/envs/meow/lib/python3.9/site-packages/datalad/support/external_versions.py:242: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    return LooseVersion(version)

../../../opt/miniconda3/envs/meow/lib/python3.9/site-packages/setuptools/_distutils/version.py:351
../../../opt/miniconda3/envs/meow/lib/python3.9/site-packages/setuptools/_distutils/version.py:351
../../../opt/miniconda3/envs/meow/lib/python3.9/site-packages/setuptools/_distutils/version.py:351
../../../opt/miniconda3/envs/meow/lib/python3.9/site-packages/setuptools/_distutils/version.py:351
datalad_catalog/tests/test_workflow.py::test_workflow_new
datalad_catalog/tests/test_workflow.py::test_workflow_new
  /Users/jsheunis/opt/miniconda3/envs/meow/lib/python3.9/site-packages/setuptools/_distutils/version.py:351: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    other = LooseVersion(other)

../../../opt/miniconda3/envs/meow/lib/python3.9/site-packages/boto/plugin.py:40
  /Users/jsheunis/opt/miniconda3/envs/meow/lib/python3.9/site-packages/boto/plugin.py:40: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

datalad_catalog/tests/test_workflow.py::test_workflow_new
datalad_catalog/tests/test_workflow.py::test_workflow_new
  /Users/jsheunis/opt/miniconda3/envs/meow/lib/python3.9/site-packages/pkg_resources/_vendor/packaging/specifiers.py:255: DeprecationWarning: Creating a LegacyVersion has been deprecated and will be removed in the next major release
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html

That is, the test fails on an assertion only after the foreach_dataset part has concluded, because the foreach_dataset call sits inside a try...except statement that catches the error.

I think the important part of the failure is

'/private/var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_tree_test_workflow_newajrfkjq_/some_dir/subdataset' is not in the subpath of '/var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_tree_test_workflow_newajrfkjq_' OR one path is relative and the other is absolute.

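since one path has /private prepended to it. On macOS, /var is a symlink to /private/var, so both strings point to the same directory but do not match as text.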
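The error message comes from pathlib's relative_to, which compares path components as text and does not consult the filesystem. A minimal sketch of that behaviour follows, using shortened, made-up paths in place of the real temp directories; the helper name is_lexically_inside is hypothetical and not part of datalad or metalad.

```python
from pathlib import PurePosixPath

# Illustrative stand-ins for the two paths in the error message (shortened, not the real ones).
super_path = PurePosixPath("/var/folders/T/datalad_temp_tree")
sub_path = PurePosixPath("/private/var/folders/T/datalad_temp_tree/some_dir/subdataset")


def is_lexically_inside(child, parent):
    """Hypothetical helper: True if `child` starts with the components of `parent`."""
    try:
        child.relative_to(parent)
        return True
    except ValueError:
        # relative_to raises ValueError when the prefix does not match textually.
        return False


# The subdataset path carries the /private prefix, the superdataset path does not.
mismatched = is_lexically_inside(sub_path, super_path)
# With the same prefix on both sides the comparison succeeds.
matched = is_lexically_inside(sub_path, PurePosixPath("/private") / super_path.relative_to("/"))
print(mismatched, matched)
```

Because pure paths never touch the disk, this runs on any platform and isolates the textual mismatch from anything datalad does.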

Inserting and running the following directly after the assert_repo_status(super_ds.path) line in the first code block above:

>> print(f"superpath: {super_ds.path}")
>> print(f"subpath: {sub_ds.path}")

superpath: /var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_tree_test_workflow_newajrfkjq_
subpath: /var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_tree_test_workflow_newajrfkjq_/some_dir/subdataset

And getting the superdataset submodules after creating and saving gives the following:

{'gitshasum': '5a8f2ba2da9a6370fe25ce06368bbce99eca0041', 'type': 'dataset', 'path': PosixPath('/private/var/folders/g7/720mm8ns7hg3d28_3yyzkx_r0000gp/T/datalad_temp_tree_test_workflow_newajrfkjq_/some_dir/subdataset'), 'gitmodule_url': './some_dir/subdataset', 'gitmodule_datalad-id': '7651cf0d-9dcd-4e0a-9a15-0c76b774e88d', 'gitmodule_name': 'some_dir/subdataset'}
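
So the dataset objects report the unresolved /var path, while the submodule record carries the resolved /private/var path. One possible workaround on the test side, offered only as a sketch and not as the fix adopted upstream, is to resolve symlinks on both paths before comparing them (or to resolve super_path before creating the datasets). The example below rebuilds the situation with a temporary symlink; the directory names and the helper relative_to_resolved are assumptions made for illustration.

```python
import os
import tempfile
from pathlib import Path


def relative_to_resolved(child, parent):
    """Hypothetical helper: resolve symlinks on both sides, then compute the relative path."""
    return Path(child).resolve().relative_to(Path(parent).resolve())


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp).resolve()
    # "private_var" plays the role of /private/var, "var" the role of the /var symlink.
    real_root = base / "private_var"
    (real_root / "some_dir" / "subdataset").mkdir(parents=True)
    link_root = base / "var"
    os.symlink(real_root, link_root, target_is_directory=True)

    sub_via_real = real_root / "some_dir" / "subdataset"

    # Textual comparison fails, as in the reported error.
    try:
        sub_via_real.relative_to(link_root)
        lexical_ok = True
    except ValueError:
        lexical_ok = False

    # After resolving both paths the subdataset is found inside the superdataset.
    resolved_rel = relative_to_resolved(sub_via_real, link_root)

print(lexical_ok, resolved_rel)
```

Resolving both sides keeps the comparison independent of which spelling of the path each caller happened to receive.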
@bpoldrack

FTR: Seems this call in extract_dataset_level triggers the issue:

    res = meta_extract(
        extractorname=extractor_name,
        dataset=dataset,
        result_renderer="disabled")

Which implies the problem is with metalad (and/or an extractor).

@jsheunis

One more piece of information: I ran the same procedure using the command line interface:

# Create nested filetree
(meow) ➜  mkdir superdataset
(meow) ➜  cd superdataset
(meow) ➜  echo "some content" > random_file.txt
(meow) ➜  cp ../studyforrest-data/.studyminimeta.yaml .studyminimeta.yaml
(meow) ➜  mkdir some_dir
(meow) ➜  cd some_dir
(meow) ➜  echo "some content in file in dir" > some_file_in_dir.txt
(meow) ➜  mkdir subdataset
(meow) ➜  cd subdataset
(meow) ➜  echo "some deeper content" > random_file_2.txt
(meow) ➜  cp ../../../sfdata15Jul2022/original/3T_structural_mri/datacite.yml datacite.yml
(meow) ➜  cd ../..
(meow) ➜  tree

superdataset
.
├── random_file.txt
└── some_dir
    ├── some_file_in_dir.txt
    └── subdataset
        ├── datacite.yml
        └── random_file_2.txt

# Create subdataset
(meow) ➜  datalad create --force -c text2git some_dir/subdataset
[INFO   ] Running procedure cfg_text2git
[INFO   ] == Command start (output follows) =====
[INFO   ] == Command exit (modification check follows) =====
run(ok): /Users/jsheunis/Documents/psyinf/Data/superdataset/some_dir/subdataset (dataset) [/Users/jsheunis/opt/miniconda3/envs/meow...]
create(ok): /Users/jsheunis/Documents/psyinf/Data/superdataset/some_dir/subdataset (dataset)
action summary:
  create (ok: 1)
  run (ok: 1)

# Create superdataset
(meow) ➜  datalad create --force -c text2git .
[INFO   ] Running procedure cfg_text2git
[INFO   ] == Command start (output follows) =====
[INFO   ] == Command exit (modification check follows) =====
run(ok): /Users/jsheunis/Documents/psyinf/Data/superdataset (dataset) [/Users/jsheunis/opt/miniconda3/envs/meow...]
create(ok): /Users/jsheunis/Documents/psyinf/Data/superdataset (dataset)
action summary:
  create (ok: 1)
  run (ok: 1)

# Save superdataset
(meow) ➜  git:(main) datalad save
add(ok): some_dir/subdataset (file)
add(ok): .gitmodules (file)
add(ok): .studyminimeta.yaml (file)
add(ok): random_file.txt (file)
add(ok): some_dir/some_file_in_dir.txt (file)
save(ok): . (dataset)
action summary:
  add (ok: 5)
  save (ok: 1)

# Create catalog
(meow) ➜  git:(main) datalad catalog create -c ../something_testy
catalog_create(ok): ../something_testy [Catalog successfully created at: ../something_testy]

# Run workflow
(meow) ➜  git:(main) datalad catalog workflow-new -d . -c ../something_testy
foreach-dataset(ok): /Users/jsheunis/Documents/psyinf/Data/superdataset (dataset)
foreach-dataset(ok): /Users/jsheunis/Documents/psyinf/Data/superdataset/some_dir/subdataset (dataset)
action summary:
  foreach-dataset (ok: 2)
datacite_gin metadata extraction:   0%|                                                           | 0.00/3.00 [00:01<?, ? Files/s]

In this case there are two ok results coming from the foreach_dataset.
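
In the CLI run both results report paths under /Users/jsheunis/..., whereas the failing test used a temp directory under /var. A plausible explanation, which is an assumption and not confirmed in this thread, is that the CLI location involves no symlink, so the resolved and unresolved spellings coincide. A quick way to check a given path is sketched below; passes_through_symlink is a hypothetical helper name.

```python
import os
import tempfile


def passes_through_symlink(path):
    """Hypothetical helper: True if resolving symlinks changes the absolute path."""
    return os.path.realpath(path) != os.path.abspath(path)


with tempfile.TemporaryDirectory() as tmp:
    # Resolve the temp dir first so that only our own symlink is under test.
    base = os.path.realpath(tmp)
    real_dir = os.path.join(base, "real")
    link_dir = os.path.join(base, "link")
    os.mkdir(real_dir)
    os.symlink(real_dir, link_dir, target_is_directory=True)

    via_real = passes_through_symlink(real_dir)
    via_link = passes_through_symlink(link_dir)

print(via_real, via_link)
```

Running the helper on the path passed to the test and on the CLI working directory would show whether only the former goes through a symlink.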

@christian-monch

christian-monch commented Jul 26, 2022

The relative-path error should be fixed by PR datalad/datalad-metalad#272.
