Problem: There is an upstream issue with PIM validation of PREMIS 3 #655

Open
ross-spencer opened this issue Apr 16, 2019 · 1 comment
Labels
Ⓜ️ mets/premis METS/PREMIS issues Severity: medium An inconvenient situation where the software is usable but inconvenient or slow.

Comments

ross-spencer commented Apr 16, 2019

Expected behaviour

PIM validation works for PREMIS 3.

Current behaviour

I have logged this issue with the people at FCLA.

Steps to reproduce

Given an Archivematica qa/1.x/1.10 METS file:

You can download this schematron.

And run it with this script:

# -*- coding: utf-8 -*-

"""Test module to understand PREMIS schematron from FCLA."""

from __future__ import print_function
from lxml import etree, isoschematron


SCHEMATRON = "pim.stron"
METS = "mets.xml"


def _get_schematron():
    """Return a schematron object."""
    # Pass the path so lxml reads the file as bytes and honours the
    # encoding declared in the XML prolog.
    sct_doc = etree.parse(SCHEMATRON)
    return isoschematron.Schematron(sct_doc, store_report=True)


def report_failures(doc):
    """Validate the XML and return the result."""
    schematron = _get_schematron()
    result = schematron.validate(doc)
    report = schematron.validation_report
    return result, report


def get_failures(report):
    """Look for failures in a schematron output.

    Looking for examples like the following:

        <svrl:failed-assert
            test="count(//mets:xmlData/pre:object) +
            count(//mets:xmlData/pre:agent) +
            count(//mets:xmlData/pre:rights) +
            count(//mets:xmlData/pre:event) &gt; 0 or
            count(//mets:xmlData/pre:premis) = 1"
            location="/*[local-name()='mets' and
            namespace-uri()='http://www.loc.gov/METS/']">
          <svrl:text>
              There must be PREMIS elements inside the METS container.
          </svrl:text>
        </svrl:failed-assert>

    """
    ns = {"svrl": "http://purl.oclc.org/dsdl/svrl"}
    failures = report.xpath("//svrl:failed-assert", namespaces=ns)
    # Collect the message of each failed assertion; findtext() returns the
    # default instead of raising when a failure has no svrl:text child.
    messages = [
        res.findtext("svrl:text", default="", namespaces=ns).strip()
        for res in failures
    ]
    return "\n".join(messages)


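# Parse straight from the path so lxml handles the file encoding itself.
doc = etree.parse(METS)

result, report = report_failures(doc)
# The report is an lxml tree, so test for None explicitly.
if report is not None:
    print("METS, valid:", result, get_failures(report).strip())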

And you will see the error: METS, valid: False There must be PREMIS elements inside the METS container.
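The message alone does not say which assertion fired or where. As a minimal sketch (not part of the original script), the report can also be read for the `test` and `location` attributes that appear in the `svrl:failed-assert` example in the docstring above. The `describe_failures` helper and the abridged `SAMPLE_REPORT` are hypothetical and only for illustration; the sketch uses the standard library so it runs without lxml, and the same calls should work on the root element of the lxml report.

```python
import xml.etree.ElementTree as ET

SVRL_URI = "http://purl.oclc.org/dsdl/svrl"
SVRL_NS = {"svrl": SVRL_URI}

# Hypothetical, abridged report modelled on the docstring example above.
SAMPLE_REPORT = """<svrl:schematron-output xmlns:svrl="http://purl.oclc.org/dsdl/svrl">
  <svrl:failed-assert test="count(//mets:xmlData/pre:premis) = 1"
      location="/*[local-name()='mets']">
    <svrl:text>
      There must be PREMIS elements inside the METS container.
    </svrl:text>
  </svrl:failed-assert>
</svrl:schematron-output>"""


def describe_failures(report_root):
    """Return message, test and location for every failed assertion."""
    details = []
    for failure in report_root.iter("{%s}failed-assert" % SVRL_URI):
        text = failure.findtext("svrl:text", default="", namespaces=SVRL_NS)
        details.append(
            {
                # Collapse the indentation whitespace inside svrl:text.
                "message": " ".join(text.split()),
                "test": failure.get("test"),
                "location": failure.get("location"),
            }
        )
    return details


for item in describe_failures(ET.fromstring(SAMPLE_REPORT)):
    print(item["location"], "|", item["test"], "|", item["message"])
```

Printing the XPath test next to the message makes it easier to see which namespace prefixes the failing rule depends on.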

Initially I thought a specific type of input was causing this, and I couldn't quite understand the pattern. With more time spent on the issue, it stands to reason that any METS file output by Archivematica right now triggers this error.
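One way to narrow this down is to count the PREMIS entities under `mets:xmlData` once per PREMIS namespace. The sketch below is an assumption-laden diagnostic, not a confirmed explanation: it supposes that the schematron's `pre` prefix might be bound to the PREMIS 2 namespace (`info:lc/xmlns/premis-v2`) while the METS uses the PREMIS 3 namespace (`http://www.loc.gov/premis/v3`). The `premis_counts` helper and the `SAMPLE` document are hypothetical, and the count only approximates the assertion shown in the script's docstring.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
PREMIS_NAMESPACES = {
    "PREMIS 2": "info:lc/xmlns/premis-v2",
    "PREMIS 3": "http://www.loc.gov/premis/v3",
}
# Element names taken from the failed assertion in the script's docstring.
ENTITIES = ("object", "agent", "rights", "event", "premis")


def count_premis_entities(root, premis_ns):
    """Count PREMIS entities that are direct children of mets:xmlData."""
    wanted = {"{%s}%s" % (premis_ns, name) for name in ENTITIES}
    total = 0
    for xml_data in root.iter("{%s}xmlData" % METS_NS):
        total += sum(1 for child in xml_data if child.tag in wanted)
    return total


def premis_counts(mets_xml):
    """Return the entity count for each known PREMIS namespace."""
    root = ET.fromstring(mets_xml)
    return {
        label: count_premis_entities(root, ns)
        for label, ns in PREMIS_NAMESPACES.items()
    }


# Hypothetical minimal METS that wraps a single PREMIS 3 object.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/"
    xmlns:premis="http://www.loc.gov/premis/v3">
  <mets:amdSec><mets:techMD ID="techMD_1">
    <mets:mdWrap MDTYPE="PREMIS:OBJECT"><mets:xmlData>
      <premis:object/>
    </mets:xmlData></mets:mdWrap>
  </mets:techMD></mets:amdSec>
</mets:mets>"""

print(premis_counts(SAMPLE))
```

If a real Archivematica METS reports zero PREMIS 2 entities and a non-zero PREMIS 3 count, a rule that only looks in the PREMIS 2 namespace would produce exactly the message above. That would still need to be checked against the schematron itself.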

Users can also recreate this issue via the Archivematica or PIM user interface, and they will see:

[Screenshot of the validation error in the user interface]

Your environment (version of Archivematica, OS version, etc)

qa/1.x


For Artefactual use:
Please make sure these steps are taken before moving this issue from Review to Verified in Waffle:

  • All PRs related to this issue are properly linked 👍
  • All PRs related to this issue have been merged 👍
  • Test plan for this issue has been implemented and passed 👍
  • Documentation regarding this issue has been written and it has been added to the release notes, if needed 👍
@evelynPM

I confirmed that the issue is still happening in PIM with METS files produced by AM 1.10.1-qa, but the METS files validate in other online validators such as http://xmlvalidator.new-studio.org/ and https://www.freeformatter.com/xml-validator-xsd.html. PIM verifies that the way the PREMIS entities are embedded in METS follows guidelines, which the other validators don't do, but we haven't made any changes to the way we wrap PREMIS in METS so that shouldn't matter.
