Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add black box test for AIP reingest #139

Merged
merged 1 commit into from
Mar 21, 2019
Merged

Conversation

replaceafill
Copy link
Contributor

@replaceafill replaceafill commented Mar 21, 2019

Copy link
Contributor

@ross-spencer ross-spencer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice simple test again. I've left two comments that you can choose to address for the two different queries. I wasn't sure about the fsentry method of retrieving a list of deleted files...

No reason not to merge once you're happy.

deleted_file.attrib["GROUPID"][6:] # remove "Group-"
for deleted_file in reingest_mets.tree.findall(
'mets:fileSec/mets:fileGrp[@USE="deleted"]/mets:file',
namespaces=context.mets_nsmap,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this Xpath? Very cool. It seems like this query has reuse potential, I wonder if it can be extracted to a helper module somewhere?

def retrieve_filesec_by_use(mets_path, use):
   filesec_query = "'mets:fileSec/mets:fileGrp[@USE="{}"]/mets:file'".format(use)
   return reingest_mets.tree.findall(
               filesec_query,
               namespaces=context.mets_nsmap,,
      )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea! I used this more generic approach so we can retrieve all the files as well (something we were already doing in the Create AIP tests and I updated those calls to use this):

def get_filesec_files(tree, use=None, nsmap={}):
    use_query = ""
    # an empty use parameter will retrieve all the files in the fileSec
    if use:
        use_query = '="{}"'.format(use)
    return tree.findall(
        "mets:fileSec/mets:fileGrp[@USE{}]/mets:file".format(use_query),
        namespaces=nsmap,
)

Let me know what you think.

# and verify that its file_uuid is included in the deleted files
initial_mets = metsrw.METSDocument.fromfile(context.initial_aip_mets_location)
original_files = [
fsentry for fsentry in initial_mets.all_files() if fsentry.use == "original"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the deleted_files list be constructed the same way? fsentry for fsentry in initial_mets.all_files() if fsentry.use == "deleted"? If so, we should consider the xpath query above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this and it doesn't seem possible. Maybe the metsrw bases its file logic in the structMap and the deleted files are not listed there after the reingest.

fsentry for fsentry in initial_mets.all_files() if fsentry.use == "original"
]
for fsentry in original_files:
if utils.get_premis_events_by_type(fsentry, "normalization"):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the tests start to grow, it could be useful to have values like deleted, original, normalization in a controlled dictionary. But perhaps it isn't needed here? What do you think?

In my head, I'm also considering if that's something that could be found from metsrw? Or is that too circular a dependency when it comes to testing? Probably...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I started doing something like that in the transfer micro-services PR. Although the purpose of that is different I agree that the metsrw could help us abstract these constants.

@replaceafill replaceafill force-pushed the dev/issue-561-aip-reingest branch from 3798d9a to 2eaee14 Compare March 21, 2019 15:54
@replaceafill replaceafill merged commit 8b9d508 into master Mar 21, 2019
@replaceafill replaceafill deleted the dev/issue-561-aip-reingest branch March 21, 2019 15:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants