Problem: PREMIS files are not validated #951
Comments
Note: I've only listed validating against the schema as a first iteration. Other checks might include:
Note additionally that this will be used repeatedly by any Enduro user performing custom ingest activities that might generate PREMIS, and by anyone submitting their own PREMIS files with a SIP. For this reason, ideally this will be implemented as a reusable Temporal activity, rather than a client-specific child workflow.
@mcantelon also, as discussed in the meeting today: let's make this a general "Validate XML" task for its first pass, one that accepts both a file to validate and a schema file to use for the validation as inputs.
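A generic "Validate XML" task with those two inputs could be sketched roughly as below. This is a minimal sketch under stated assumptions, not the code from the linked PR: the `Params` and `Result` types and the `ValidateXML` and `xmllintArgs` functions are hypothetical names, and the sketch assumes the `xmllint` command-line tool (part of libxml2) is installed and on the PATH.

```go
package main

import (
	"bytes"
	"context"
	"errors"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// Params holds the two inputs discussed above (hypothetical names).
type Params struct {
	XMLPath string // file to validate
	XSDPath string // schema file to validate against
}

// Result lists validation failures; it is empty when the file is valid.
type Result struct {
	Failures []string
}

// xmllintArgs builds the xmllint arguments: --noout suppresses the
// serialized document, --schema selects the XSD to validate against.
func xmllintArgs(p Params) []string {
	return []string{"--noout", "--schema", p.XSDPath, p.XMLPath}
}

// ValidateXML shells out to xmllint (assumed to be on the PATH).
// A non-zero exit is reported as failures taken from stderr; an error
// is returned only when xmllint itself could not be run.
func ValidateXML(ctx context.Context, p Params) (Result, error) {
	cmd := exec.CommandContext(ctx, "xmllint", xmllintArgs(p)...)
	var stderr bytes.Buffer
	cmd.Stderr = &stderr

	err := cmd.Run()
	if err == nil {
		return Result{}, nil
	}
	var exitErr *exec.ExitError
	if !errors.As(err, &exitErr) {
		return Result{}, fmt.Errorf("run xmllint: %w", err)
	}
	return Result{Failures: splitLines(stderr.String())}, nil
}

// splitLines returns the non-empty, trimmed lines of s.
func splitLines(s string) []string {
	var out []string
	for _, line := range strings.Split(s, "\n") {
		if line = strings.TrimSpace(line); line != "" {
			out = append(out, line)
		}
	}
	return out
}

func main() {
	if len(os.Args) != 3 {
		fmt.Println("usage: validate <file.xml> <schema.xsd>")
		return
	}
	p := Params{XMLPath: os.Args[1], XSDPath: os.Args[2]}
	res, err := ValidateXML(context.Background(), p)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("failures:", res.Failures)
}
```

Returning failures as data rather than as an error lets a calling workflow distinguish "the file is invalid" from "the validator could not run", which may matter when deciding whether to retry.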
PR for CR: artefactual-sdps/temporal-activities#21
There are some comments about this issue in artefactual-sdps/preprocessing-sfa#22 (comment).
PREMIS file looks good and it's being properly parsed into the METS. I think we can finally put this issue to bed!
Is your feature request related to a problem? Please describe.
Whether generated by Enduro (through a child workflow) or included in a SIP, PREMIS XML files should be validated before the package is sent to preservation. Archivematica/a3m can parse a PREMIS file to add the file's events to the AIP METS, which happens quite late in the AM/a3m workflow - ensuring that the PREMIS file is valid will hopefully avoid errors at this late point.
Describe the solution you'd like
Add a new activity to validate the premis.xml file against the PREMIS v3 schema before sending to AM/a3m, ensuring that it's well-formed and valid.
PREMIS files generated by Enduro child workflows should always be validated. A PREMIS file included in a transfer may have been validated in advance, so it might not be necessary to validate these. A reasonable approach might be to validate any PREMIS file in the SIP's metadata directory, regardless of origin, as this is the file that will be picked up by Archivematica/a3m.
Describe alternatives you've considered
None
Additional context