preservation copy of vq518cf9512 (service disk 8) has a number of issues #1395

Closed
jmartin-sul opened this issue Feb 22, 2020 · 6 comments
Labels
moab_remediation online moab that may need remediation (e.g. missing files, extraneous files, corrupted content)

Comments

@jmartin-sul
Member

jmartin-sul commented Feb 22, 2020

spawned from #1324.

when looking for moabs with non-ok status, one that turned up was:

CompleteMoab.joins(:preserved_object, :moab_storage_root).where.not(status: :ok).pluck(:status, :storage_location, :druid, :status_details)
[
...snip other moabs with other types of errors...
["invalid_checksum",
    "/services-disk08/sdr2objects",
    "vq518cf9512",
    "validate_checksums (actual location: services-disk08; ) /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0001/data/content/ds201_R1.0.0_Polysomnography_sub-9001-9049.tar) not found in Moab && /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0001/data/content/ds201_R1.0.0_Polysomnography_sub-9050-9100.tar) not found in Moab && /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0001/data/content/ds201_R1.0.0_T1-weighted.tar) not found in Moab && /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0001/data/content/ds201_R1.0.0_T2-weighted.tar) not found in Moab && /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0001/data/content/ds201_R1.0.0_dwi.tar) not found in Moab && /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0001/data/content/ds201_R1.0.0_fieldmap.tar) not found in Moab && /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0001/data/content/ds201_R1.0.0_non-imaging-data_notasksfmri.tgz) not found in Moab && checksums for /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0001/data/metadata/contentMetadata.xml version 1 do not match. 
&& /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0001/data/metadata/descMetadata.xml) not found in Moab && checksums for /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0001/data/metadata/events.xml version 1 do not match. && /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0001/data/metadata/identityMetadata.xml) not found in Moab && checksums for /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0001/data/metadata/provenanceMetadata.xml version 1 do not match. && /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0001/data/metadata/relationshipMetadata.xml) not found in Moab && /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0001/data/metadata/rightsMetadata.xml) not found in Moab && checksums for /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0001/data/metadata/technicalMetadata.xml version 1 do not match. && checksums for /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0001/data/metadata/versionMetadata.xml version 1 do not match. && checksums for /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0001/data/metadata/workflows.xml version 1 do not match. 
&& /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/data/metadata/contentMetadata.xml) not found in Moab && /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/data/metadata/descMetadata.xml) not found in Moab && checksums for /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/data/metadata/events.xml version 2 do not match. && checksums for /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/data/metadata/provenanceMetadata.xml version 2 do not match. && /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/data/metadata/relationshipMetadata.xml) not found in Moab && /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/data/metadata/rightsMetadata.xml) not found in Moab && checksums for /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/data/metadata/versionMetadata.xml version 2 do not match. && checksums for /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/data/metadata/workflows.xml version 2 do not match. 
&& /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0003/data/metadata/descMetadata.xml) not found in Moab && /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0003/data/metadata/events.xml) not found in Moab && /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0003/data/metadata/identityMetadata.xml) not found in Moab && /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0003/data/metadata/provenanceMetadata.xml) not found in Moab && /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0003/data/metadata/versionMetadata.xml) not found in Moab && /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0003/data/metadata/workflows.xml) not found in Moab && /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0004/data/metadata/descMetadata.xml) not found in Moab && /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0004/data/metadata/events.xml) not found in Moab && /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file 
(/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0004/data/metadata/provenanceMetadata.xml) not found in Moab && /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0004/data/metadata/versionMetadata.xml) not found in Moab && /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0004/data/metadata/workflows.xml) not found in Moab && /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0005/data/metadata/contentMetadata.xml) not found in Moab && /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0005/data/metadata/events.xml) not found in Moab && /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0005/data/metadata/provenanceMetadata.xml) not found in Moab && /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0005/data/metadata/technicalMetadata.xml) not found in Moab && /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0005/data/metadata/versionMetadata.xml) not found in Moab && /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0005/data/metadata/workflows.xml) not found in Moab && checksums for 
/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/data/metadata/versionMetadata.xml version 2 do not match. && CompleteMoab status changed from validity_unknown to invalid_checksum"],
...snip other moabs with other types of errors...
]
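
A `status_details` string this long is hard to read by eye. As a minimal sketch, assuming the messages are always joined with `&&` and follow the two wordings seen above, the string could be grouped by kind of problem. `summarize_status_details` is a hypothetical helper, not something that exists in preservation_catalog.

```ruby
# Hypothetical helper (not part of preservation_catalog): groups the
# "&&"-joined messages in a status_details string by kind of problem.
# Assumes the two message wordings seen in the output above.
def summarize_status_details(details)
  summary = Hash.new { |hash, key| hash[key] = [] }
  details.split('&&').map(&:strip).reject(&:empty?).each do |message|
    if (m = message.match(/refers to file\s+\((.+?)\) not found in Moab/))
      # A manifest lists a file that is absent on disk.
      summary[:missing] << m[1]
    elsif (m = message.match(/checksums for\s+(\S+) version \d+ do not match/))
      # The file exists, but its checksum differs from the manifest.
      summary[:checksum_mismatch] << m[1]
    else
      # Anything else, e.g. the status change note at the end.
      summary[:other] << message
    end
  end
  summary
end
```

In a console, something like `summarize_status_details(details).transform_values(&:size)` would then give a quick count per category.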

when looking at that object on disk, it appears that there are only two version directories, neither with data/content subdirectories, though the signatureCatalog.xml files expect a number of content files.

[pres@preservation-catalog-prod-02 ~]$ ls -R /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/
/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/:
v0001  v0002

/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0001:
data  manifests

/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0001/data:
metadata

/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0001/data/metadata:
contentMetadata.xml  events.xml  provenanceMetadata.xml  technicalMetadata.xml  versionMetadata.xml  workflows.xml

/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0001/manifests:
fileInventoryDifference.xml  manifestInventory.xml  signatureCatalog.xml  versionAdditions.xml  versionInventory.xml

/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002:
data  manifests

/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/data:
metadata

/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/data/metadata:
events.xml  provenanceMetadata.xml  versionMetadata.xml  workflows.xml

/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests:
fileInventoryDifference.xml  manifestInventory.xml  signatureCatalog.xml  versionAdditions.xml  versionInventory.xml
[pres@preservation-catalog-prod-02 ~]$ 
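
The same check can be done without reading the listing by eye. This is a rough sketch only: `versions_without_content` is a hypothetical helper, and it assumes version directories are named `v` plus four digits, as in the listing above. A version that changed only metadata may legitimately lack `data/content`, so a result here is a prompt to compare against the manifests, not proof of a problem.

```ruby
# Hypothetical helper: lists version directories (v0001, v0002, ...) under a
# moab's object directory that have no data/content subdirectory.
def versions_without_content(moab_dir)
  Dir.glob(File.join(moab_dir, 'v[0-9][0-9][0-9][0-9]'))
     .select { |dir| File.directory?(dir) }
     .reject { |dir| File.directory?(File.join(dir, 'data', 'content')) }
     .map { |dir| File.basename(dir) }
     .sort
end
```

For the object above, calling it on `/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512` would be expected to name both version directories, since neither has a `data/content` subdirectory.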

other oddities:

  • signatureCatalog.xml files also refer to a number of missing datastream files.
  • there are some checksum mismatches for files that are present (e.g. checksums for /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/data/metadata/events.xml version 2 do not match.)
  • v2 sigCat refers to files from later versions (which don't even have version directories in this moab!). this definitely doesn't conform with my recollection of the moab spec (signatureCatalog.xml for a given version should have signatures for content from all prior versions, but no knowledge of subsequent versions, as moabs are structured using a forward-delta versioning approach).
    • e.g. /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0002/manifests/signatureCatalog.xml refers to file (/services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/v0003/data/metadata/events.xml) not found in Moab
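
The forward-delta expectation in the last bullet can be expressed as a small check. A sketch only, assuming the version can be read from the `vNNNN` directory in each path; `references_to_later_versions` and `VERSION_DIR` are hypothetical names, and the referenced paths would have to come from the audit messages or from the catalog itself.

```ruby
# Matches the version directory in a moab path, e.g. ".../v0002/manifests/...".
VERSION_DIR = %r{/v(\d{4})/}

# Hypothetical check: given the path of a signatureCatalog.xml and the file
# paths it refers to, return the paths that sit in a later version directory
# than the catalog's own.
def references_to_later_versions(catalog_path, referenced_paths)
  catalog_version = catalog_path[VERSION_DIR, 1].to_i
  referenced_paths.select do |path|
    version = path[VERSION_DIR, 1]
    version && version.to_i > catalog_version
  end
end
```

For the v0002 catalog above, any path under v0003, v0004 or v0005 would be returned, while paths under v0001 and v0002 would not.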
jmartin-sul added the moab_remediation label Feb 22, 2020
jmartin-sul changed the title from "preservation copy of vq518cf9512 has a number of issues" to "preservation copy of vq518cf9512 (service disk 8) has a number of issues" Feb 22, 2020
@jmartin-sul
Member Author

questions for all of these remediations: have the moabs in question been replicated? if so, do the archives need to be re-pushed?

related useful query: https://github.com/sul-dlss/preservation_catalog/tree/master/db#view-the-zip-parts-for-a-given-druid

input> druid = 'ab123cd4567'
input> ZipPart.joins(zipped_moab_version: [{ complete_moab: [:preserved_object] }, :zip_endpoint]).where(preserved_objects: { druid: druid }).pluck(:druid, 'current_version AS highest_version', 'zipped_moab_versions.version AS zip_version', :endpoint_name, :status)

@jermnelson
Contributor

jermnelson commented Feb 26, 2020

With the help of @jmartin-sul, I did some exploration for the missing files in druid vq518cf9512 and found JIRA ticket LEGACY-2715 from 2016, which mentions that the two files to be removed are:

  • ds201_R1.0.0_Polysomnography_sub-9001-9049.tar
  • ds201_R1.0.0_Polysomnography_sub-9050-9100.tar

The druid is accessible at its PURL, where the five remaining files are available for download through Stacks, but those files are missing from both versions on the preservation storage root at /services-disk08/sdr2objects/vq/518/cf/9512/vq518cf9512/.

@andrewjbtw any suggestions on the best process for remediation of druid vq518cf9512?

@andrewjbtw

I've looked at the expunge documentation, and the procedures there don't go into detail about how to make something a valid Moab. The last step is literally:

5. Fix all the errors until it's right.

It also says that part of expunging is to set the version back to 1. Looking at preservation and the workflows, it appears that the versioning information is inconsistent across systems. It's not clear to me that all the steps were followed.

I think for remediation, I should try to bring things in line with what expunging is supposed to do. Since expunge is on the list of repo manager tasks, you can go ahead and assign this issue to me.

jermnelson assigned andrewjbtw and unassigned jermnelson Feb 26, 2020
peetucket added this to the 2020 Workcycle Sprint 4 milestone Feb 26, 2020
@andrewjbtw

There are many things wrong with this object beyond the Moab state. It looks like what should have happened was:

  1. Removal and/or replacement of files that were expunged.
  2. Resetting of current version to version 1. This involves:
    • Setting versionMetadata to show one version
    • Removing all workflow history from the workflow service except version 1
    • Restructuring the Moab to be a valid v0001
    • Updating prescat to know about only v0001
  3. Re-running the audit to verify the changes were correct

Looking at the JIRA ticket, it appears that the files went through multiple rounds of review, which prevented this from being a one-time process where all steps were followed in sequence. I believe this is why:

  • the Moab refers to files from later versions (there were originally more than two versions)
  • the Moab has files where checksums don't match (some files were modified without being renamed)

I think that if we accept that there is currently a v0002 (avoiding the need to update prescat to show only one version), I can:

  1. Restore the content so that the current version contains the files and metadata it's supposed to have, and then run the audit to make sure the Moab is valid.
  2. Obliterate the workflows for earlier (yet higher-numbered) versions, leaving only a version 2. It won't be "accurate" to what really happened in version 2, but I think that's acceptable. The expunge process happens outside of any known workflows. This step is literally just to make sure things sync up across systems.
  3. Accession one more version with no substantive changes simply for the purposes of verifying that the object can be used again

@andrewjbtw

I've now made this into a valid Moab. To do so I needed to:

  1. Make sure the correct files were in v0001. This involved copying from Stacks, and copying missing datastreams from Fedora that had been deleted at some point in the Moab.
  2. Edit all manifests for version 1 to match the new data.
  3. Edit all manifests for version 2 to match the new data -- in the future, it might be better to just delete version 2 and update prescat to know about only a version 1.
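
Editing the manifests by hand means recomputing the values recorded for each restored file. A minimal sketch of that step, assuming the manifests record a size plus md5, sha1 and sha256 digests per file (that is my understanding of the Moab manifests, not something shown in this thread); `file_signature` is a hypothetical helper.

```ruby
require 'digest'

# Hypothetical helper: computes the size and digests for one file, i.e. the
# values assumed to be recorded per file in the Moab manifests.
def file_signature(path)
  {
    size: File.size(path),
    md5: Digest::MD5.file(path).hexdigest,
    sha1: Digest::SHA1.file(path).hexdigest,
    sha256: Digest::SHA256.file(path).hexdigest
  }
end
```

Running this over each restored file in v0001 gives the values to compare against (or copy into) the edited manifests before re-running the audit.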

Remediation won't be complete until the workflows are also fixed and I've confirmed that a new object version can be created. I think what's left to do now is to delete the duplicate version 1 and version 2 workflows, which were likely created because the object was reaccessioned after the version numbering had been reset to 1.

@andrewjbtw

Ok, I've cleaned up the workflows on this item, created a new version successfully, and re-run the audit successfully. So I think it's closeable.
