-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add black box test for AIP reingest #139
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice simple test again. I've left two comments that you can choose to address for the two different queries. I wasn't sure about the fsentry
method of retrieving a list of deleted files...
No reason not to merge once you're happy.
features/steps/black_box_steps.py
Outdated
deleted_file.attrib["GROUPID"][6:] # remove "Group-" | ||
for deleted_file in reingest_mets.tree.findall( | ||
'mets:fileSec/mets:fileGrp[@USE="deleted"]/mets:file', | ||
namespaces=context.mets_nsmap, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this Xpath? Very cool. It seems like this query has reuse potential, I wonder if it can be extracted to a helper module somewhere?
def retrieve_filesec_by_use(mets_path, use):
filesec_query = "'mets:fileSec/mets:fileGrp[@USE="{}"]/mets:file'".format(use)
return reingest_mets.tree.findall(
filesec_query,
namespaces=context.mets_nsmap,,
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea! I used this more generic approach so we can retrieve all the files as well (something we were already doing in the Create AIP tests and I updated those calls to use this):
def get_filesec_files(tree, use=None, nsmap={}):
use_query = ""
# an empty use parameter will retrieve all the files in the fileSec
if use:
use_query = '="{}"'.format(use)
return tree.findall(
"mets:fileSec/mets:fileGrp[@USE{}]/mets:file".format(use_query),
namespaces=nsmap,
)
Let me know what you think.
# and verify that its file_uuid is included in the deleted files | ||
initial_mets = metsrw.METSDocument.fromfile(context.initial_aip_mets_location) | ||
original_files = [ | ||
fsentry for fsentry in initial_mets.all_files() if fsentry.use == "original" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could the deleted_files
list be constructed the same way? fsentry for fsentry in initial_mets.all_files() if fsentry.use == "deleted"
? If so, we should consider the xpath query above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried this and it doesn't seem possible. Maybe the metsrw bases its file logic in the structMap and the deleted files are not listed there after the reingest.
fsentry for fsentry in initial_mets.all_files() if fsentry.use == "original" | ||
] | ||
for fsentry in original_files: | ||
if utils.get_premis_events_by_type(fsentry, "normalization"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the tests start to grow, it could be useful to have values like deleted
, original
, normalization
in a controlled dictionary. But perhaps it isn't needed here? What do you think?
In my head, I'm also considering if that's something that could be found from metsrw
? Or is that too circular a dependency when it comes to testing? Probably...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. I started doing something like that in the transfer micro-services PR. Although the purpose of that is different I agree that the metsrw could help us abstract these constants.
3798d9a
to
2eaee14
Compare
This implements the AIP reingest scenarios from https://docs.google.com/document/d/1KGTsuc9NV5yaq-lUY7CVK3XY33C21zS3Ykr-N-6gpEg/edit#.
Connects to archivematica/Issues#561