Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve message when dangling nodes or links are found in verdi archive import #5378

Open
2 tasks
giovannipizzi opened this issue Feb 22, 2022 · 9 comments
Open
2 tasks

Comments

@giovannipizzi
Copy link
Member

In old AiiDA 0.x, export files could have dangling links - with the idea that one could just perform delta incremental exports, for just a few more nodes w.r.t. to those already available at the destination.

We realized that this is confusing for users that typically expect an archive to be fully self-contained.
However, while these archive cannot be migrated automatically in AiiDA 2 (see also #5377), migrations of old files will work in 1.x and create an archive with dangling links (e.g. this happens in v1.6.5).

When importing in 1.6.5, the error lists all missing UUIDs that is useful to detect the problem, and explains what the problem is:

DanglingLinkError: The import file refers to 155 nodes with unknown UUID, therefore it cannot be imported. Either first import the unknown nodes, or export also the parents when exporting. The unknown UUIDs are: [FULL LIST]

In develop, one just gets a more obscure

Critical: an exception occurred while migrating the archive MIGRATED_bands_pwcalcs_soc_2Dmat_750_800.aiida: KeyError: '236534fa-9fa0-4cf6-9ddd-f68770241f62' 

Suggested actions:

  • Since these archives exist on computers of people, I suggest to improve the error message not to show just "KeyError" but some more descriptive error message.
  • If possible, I'd show all UUIDs. If not possible easily, I would at least say that there is "at least one missing node" and that the UUID we print is the UUID of the "first node that is missing".

@chrisjsewell what do you think? is this easy to do?

Note that at least one of the missing nodes is just referenced, actually, not as a link endpoint (maybe this was already caught in 1.6.5? I'm not sure though) but as a node that should have belonged to a group, in case this helps to debug the issue. I can also send privately the problematic file.

@giovannipizzi giovannipizzi changed the title Improve message when dangling nodes or links are found Improve message when dangling nodes or links are found in verdi archive import Feb 22, 2022
@chrisjsewell
Copy link
Member

Heya, it would be great to get a "minimal working example" of such an archive file, that can then be added to the test suite, and I can improve the warning message from there

@sphuber
Copy link
Contributor

sphuber commented Feb 23, 2022

You will have to create it manually because I don't think the interface allows it, and I don't know how to manually hack the new archive format. But essentially just create an archive containing 1 node and 1 link that points to it. The link will be dangling because it should have a second node to point to. That should be the MWE for reproduction

@chrisjsewell
Copy link
Member

create it manually

Yep that's what I mean, not in the new archive format though since that will literally not allow this (as expected by foreign key constraints), but in one of the old formats, so I can test the migration.

But this is for you guys to do

@sphuber
Copy link
Contributor

sphuber commented Feb 23, 2022

aiida.zip

There you go, that should do it I think

@chrisjsewell
Copy link
Member

cheers 👍

@chrisjsewell
Copy link
Member

The full error is below (run using #5375).

It's easy to change the behaviour, but what should it be @giovannipizzi @sphuber:

  • Fail with a better error message, or
  • Simply skip adding these links to the new database (with a warning)?

The relevant code:

        if data['links_uuid']:

            def _transform_link(link_row):
                return {
                    'input_id': node_uuid_map[link_row['input']],  # <- key error is here
                    'output_id': node_uuid_map[link_row['output']],
                    'label': link_row['label'],
                    'type': link_row['type']
                }

            with get_progress_reporter()(desc='Adding Links', total=len(data['links_uuid'])) as progress:
                for nrows, rows in batch_iter(data['links_uuid'], batch_size, transform=_transform_link):
                    connection.execute(insert(v1_schema.DbLink.__table__), rows)
                    progress.update(nrows)

The error:

$ verdi archive migrate --verbosity DEBUG 'tests/static/export/migrate/0.10_dangling_link.aiida' test.aiida
Report: Legacy migrations required
Report: Extracting data.json ...
Report: Legacy migration pathway: 0.10 -> 0.11 -> 0.12
Performing migrations: 0.11 -> 0.12      100.0%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2
Report: legacy '0.12' -> 'main_0000' conversion required
Report: Initialising new archive...
Converting repo                          100.0%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10
Report: Unique files written: 0
Report: Converting DB to SQLite
Adding Users                             100.0%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1
Adding Nodes                             100.0%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1
Adding Links                               0.0%|                                                                                                                                             | 0/1
Traceback (most recent call last):
  File "/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/.tox/py38-pre-commit/bin/verdi", line 8, in <module>
    sys.exit(verdi())
  File "/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/.tox/py38-pre-commit/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/.tox/py38-pre-commit/lib/python3.8/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/.tox/py38-pre-commit/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/.tox/py38-pre-commit/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/.tox/py38-pre-commit/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/.tox/py38-pre-commit/lib/python3.8/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/aiida/cmdline/commands/cmd_archive.py", line 245, in migrate
    archive_format.migrate(input_file, output_file, version, force=force, compression=6)
  File "/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/aiida/tools/archive/implementations/sqlite_zip/main.py", line 110, in migrate
    return migrate(inpath, outpath, version, force=force, compression=compression)
  File "/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/aiida/storage/sqlite_zip/migrator.py", line 221, in migrate
    db_path = perform_v1_migration(inpath, Path(tmpdirname), new_zip, central_dir, is_tar, metadata, data)
  File "/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/aiida/storage/sqlite_zip/migrations/legacy_to_main.py", line 131, in perform_v1_migration
    _json_to_sqlite(working / DB_FILENAME, data, node_repos)
  File "/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/aiida/storage/sqlite_zip/migrations/legacy_to_main.py", line 199, in _json_to_sqlite
    for nrows, rows in batch_iter(data['links_uuid'], batch_size, transform=_transform_link):
  File "/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/aiida/tools/archive/common.py", line 45, in batch_iter
    current.append(transform(item))
  File "/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/aiida/storage/sqlite_zip/migrations/legacy_to_main.py", line 192, in _transform_link
    'input_id': node_uuid_map[link_row['input']],
KeyError: '26522c2f-13e0-4b80-af13-cbc8c09df211'

@chrisjsewell
Copy link
Member

chrisjsewell commented Feb 28, 2022

Note that at least one of the missing nodes is just referenced, actually, not as a link endpoint (maybe this was already caught in 1.6.5? I'm not sure though) but as a node that should have belonged to a group, in case this helps to debug the issue. I can also send privately the problematic file.

Similarly for groups, should missing uuids cause errors or warnings (and be omitted):

        # groups to nodes
        if data['groups_uuid']:
            # get mapping of node IDs to node UUIDs
            group_uuid_map = {
                uuid: pk for uuid, pk in connection.execute(select(v1_schema.DbGroup.uuid, v1_schema.DbGroup.id))  # pylint: disable=unnecessary-comprehension
            }
            length = sum(len(uuids) for uuids in data['groups_uuid'].values())
            with get_progress_reporter()(desc='Adding Group-Nodes', total=length) as progress:
                for group_uuid, node_uuids in data['groups_uuid'].items():
                    group_id = group_uuid_map[group_uuid]
                    connection.execute(
                        insert(v1_schema.DbGroupNodes.__table__), [{
                            'dbnode_id': node_uuid_map[uuid],  # <- key error would be here
                            'dbgroup_id': group_id
                        } for uuid in node_uuids]
                    )
                    progress.update(len(node_uuids))

@giovannipizzi
Copy link
Member Author

Mmm. For missing nodes in a group, probably a warning is enough.

For missing nodes in an endpoint of a link, the safest bet is to stop with an error and ask to export differently - and this should not even happen in AiiDA 1.x I think?

We could add more complex logic - i.e. if the dangling links are those that our current logic accepts, we can just issue a warning, but this is trickier to implement (not technically, but to do in a way that does not hardcode in another point of the code the design decision of which nodes are valid). So I'm OK with just raising with a clear error message, unless it's easy to reuse the same logic that is used e.g. to establish which links at a minimum should be followed (practically, for instance: if one of the inputs or outputs of a ProcessNode are missing, it's an error. If the creator of a Data node is missing it's acceptable, if the outgoing links of a Data node are missing, it's fine.

@chrisjsewell
Copy link
Member

chrisjsewell commented Mar 4, 2022

See f0991b3

$ verdi archive migrate -f 'tests/static/export/migrate/0.10_dangling_link.aiida' test.aiida      
Report: Legacy migrations required
Report: Extracting data.json ...
Report: Legacy migration pathway: 0.10 -> 0.11 -> 0.12
Report: legacy '0.12' -> 'main_0000' conversion required                                                                                                                         
Report: Initialising new archive...
Report: Unique repository files written: 0                                                                                                                                                  
Report: Converting DB to SQLite
Critical: failed to migrate the archive file (use `--verbosity DEBUG` to see traceback): StorageMigrationError:Database contains link with unknown input node: {'input': '26522c2f-13e0-4b80-af13-cbc8c09df211', 'output': 'a36001a4-fdad-4f0d-86c3-4373a5b88c6b', 'label': 'data', 'type': 'input_calc'}
$ verdi archive migrate -f 'tests/static/export/migrate/0.10_unknown_nodes_in_group.aiida' test.aiida
Report: Legacy migrations required
Report: Extracting data.json ...
Report: Legacy migration pathway: 0.10 -> 0.11 -> 0.12
Report: legacy '0.12' -> 'main_0000' conversion required                                                                                                                         
Report: Initialising new archive...
Report: Unique repository files written: 0                                                                                                                                                  
Report: Converting DB to SQLite
Warning: Dropped unknown nodes in groups: {'bde385c0-c1d5-4c6e-95db-b1b26026c7fc': {None, 'unknown'}}                                                                            
Report: Performing SQLite migrations:
Report: - main_0000 -> main_0001
Report: Finalising the migration ...
Success: migrated the archive to version 'main_0001'

I'll leave you to decide if you want to add more to the exception message, or more complex logic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants