-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve message when dangling nodes or links are found in verdi archive import
#5378
Comments
verdi archive import
Heya, it would be great to get a "minimal working example" of such an archive file, that can then be added to the test suite, and I can improve the warning message from there |
You will have to create it manually because I don't think the interface allows it, and I don't know how to manually hack the new archive format. But essentially just create an archive containing 1 node and 1 link that points to it. The link will be dangling because it should have a second node to point to. That should be the MWE for reproduction |
Yep that's what I mean, not in the new archive format though since that will literally not allow this (as expected by foreign key constraints), but in one of the old formats, so I can test the migration. But this is for you guys to do |
There you go, that should do it I think |
cheers 👍 |
The full error is below (run using #5375). It's easy to change the behaviour, but what should it be @giovannipizzi @sphuber:
The relevant code: if data['links_uuid']:
def _transform_link(link_row):
return {
'input_id': node_uuid_map[link_row['input']], # <- key error is here
'output_id': node_uuid_map[link_row['output']],
'label': link_row['label'],
'type': link_row['type']
}
with get_progress_reporter()(desc='Adding Links', total=len(data['links_uuid'])) as progress:
for nrows, rows in batch_iter(data['links_uuid'], batch_size, transform=_transform_link):
connection.execute(insert(v1_schema.DbLink.__table__), rows)
progress.update(nrows) The error: $ verdi archive migrate --verbosity DEBUG 'tests/static/export/migrate/0.10_dangling_link.aiida' test.aiida
Report: Legacy migrations required
Report: Extracting data.json ...
Report: Legacy migration pathway: 0.10 -> 0.11 -> 0.12
Performing migrations: 0.11 -> 0.12 100.0%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2
Report: legacy '0.12' -> 'main_0000' conversion required
Report: Initialising new archive...
Converting repo 100.0%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10
Report: Unique files written: 0
Report: Converting DB to SQLite
Adding Users 100.0%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1
Adding Nodes 100.0%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1
Adding Links 0.0%| | 0/1
Traceback (most recent call last):
File "/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/.tox/py38-pre-commit/bin/verdi", line 8, in <module>
sys.exit(verdi())
File "/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/.tox/py38-pre-commit/lib/python3.8/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/.tox/py38-pre-commit/lib/python3.8/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/.tox/py38-pre-commit/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/.tox/py38-pre-commit/lib/python3.8/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/.tox/py38-pre-commit/lib/python3.8/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/.tox/py38-pre-commit/lib/python3.8/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/aiida/cmdline/commands/cmd_archive.py", line 245, in migrate
archive_format.migrate(input_file, output_file, version, force=force, compression=6)
File "/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/aiida/tools/archive/implementations/sqlite_zip/main.py", line 110, in migrate
return migrate(inpath, outpath, version, force=force, compression=compression)
File "/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/aiida/storage/sqlite_zip/migrator.py", line 221, in migrate
db_path = perform_v1_migration(inpath, Path(tmpdirname), new_zip, central_dir, is_tar, metadata, data)
File "/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/aiida/storage/sqlite_zip/migrations/legacy_to_main.py", line 131, in perform_v1_migration
_json_to_sqlite(working / DB_FILENAME, data, node_repos)
File "/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/aiida/storage/sqlite_zip/migrations/legacy_to_main.py", line 199, in _json_to_sqlite
for nrows, rows in batch_iter(data['links_uuid'], batch_size, transform=_transform_link):
File "/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/aiida/tools/archive/common.py", line 45, in batch_iter
current.append(transform(item))
File "/Users/chrisjsewell/Documents/GitHub/aiida_core_develop/aiida/storage/sqlite_zip/migrations/legacy_to_main.py", line 192, in _transform_link
'input_id': node_uuid_map[link_row['input']],
KeyError: '26522c2f-13e0-4b80-af13-cbc8c09df211' |
Similarly for groups, should missing uuids cause errors or warnings (and be omitted): # groups to nodes
if data['groups_uuid']:
# get mapping of node IDs to node UUIDs
group_uuid_map = {
uuid: pk for uuid, pk in connection.execute(select(v1_schema.DbGroup.uuid, v1_schema.DbGroup.id)) # pylint: disable=unnecessary-comprehension
}
length = sum(len(uuids) for uuids in data['groups_uuid'].values())
with get_progress_reporter()(desc='Adding Group-Nodes', total=length) as progress:
for group_uuid, node_uuids in data['groups_uuid'].items():
group_id = group_uuid_map[group_uuid]
connection.execute(
insert(v1_schema.DbGroupNodes.__table__), [{
'dbnode_id': node_uuid_map[uuid], # <- key error would be here
'dbgroup_id': group_id
} for uuid in node_uuids]
)
progress.update(len(node_uuids)) |
Mmm. For missing nodes in a group, probably a warning is enough. For missing nodes in an endpoint of a link, the safest bet is to stop with an error and ask to export differently - and this should not even happen in AiiDA 1.x I think? We could add more complex logic - i.e. if the dangling links are those that our current logic accepts, we can just issue a warning, but this is trickier to implement (not technically, but to do in a way that does not hardcode in another point of the code the design decision of which nodes are valid). So I'm OK with just raising with a clear error message, unless it's easy to reuse the same logic that is used e.g. to establish which links at a minimum should be followed (practically, for instance: if one of the inputs or outputs of a ProcessNode are missing, it's an error. If the creator of a Data node is missing it's acceptable, if the outgoing links of a Data node are missing, it's fine. |
See f0991b3 $ verdi archive migrate -f 'tests/static/export/migrate/0.10_dangling_link.aiida' test.aiida
Report: Legacy migrations required
Report: Extracting data.json ...
Report: Legacy migration pathway: 0.10 -> 0.11 -> 0.12
Report: legacy '0.12' -> 'main_0000' conversion required
Report: Initialising new archive...
Report: Unique repository files written: 0
Report: Converting DB to SQLite
Critical: failed to migrate the archive file (use `--verbosity DEBUG` to see traceback): StorageMigrationError:Database contains link with unknown input node: {'input': '26522c2f-13e0-4b80-af13-cbc8c09df211', 'output': 'a36001a4-fdad-4f0d-86c3-4373a5b88c6b', 'label': 'data', 'type': 'input_calc'} $ verdi archive migrate -f 'tests/static/export/migrate/0.10_unknown_nodes_in_group.aiida' test.aiida
Report: Legacy migrations required
Report: Extracting data.json ...
Report: Legacy migration pathway: 0.10 -> 0.11 -> 0.12
Report: legacy '0.12' -> 'main_0000' conversion required
Report: Initialising new archive...
Report: Unique repository files written: 0
Report: Converting DB to SQLite
Warning: Dropped unknown nodes in groups: {'bde385c0-c1d5-4c6e-95db-b1b26026c7fc': {None, 'unknown'}}
Report: Performing SQLite migrations:
Report: - main_0000 -> main_0001
Report: Finalising the migration ...
Success: migrated the archive to version 'main_0001' I'll leave you to decide if you want to add more to the exception message, or more complex logic |
In old AiiDA 0.x, export files could have dangling links - with the idea that one could just perform delta incremental exports, for just a few more nodes w.r.t. to those already available at the destination.
We realized that this is confusing for users that typically expect an archive to be fully self-contained.
However, while these archive cannot be migrated automatically in AiiDA 2 (see also #5377), migrations of old files will work in 1.x and create an archive with dangling links (e.g. this happens in v1.6.5).
When importing in 1.6.5, the error lists all missing UUIDs that is useful to detect the problem, and explains what the problem is:
In develop, one just gets a more obscure
Suggested actions:
@chrisjsewell what do you think? is this easy to do?
Note that at least one of the missing nodes is just referenced, actually, not as a link endpoint (maybe this was already caught in 1.6.5? I'm not sure though) but as a node that should have belonged to a group, in case this helps to debug the issue. I can also send privately the problematic file.
The text was updated successfully, but these errors were encountered: