checksum validation failure for vf742yx0561 #1520

Closed
honeybadger bot opened this issue May 1, 2020 · 9 comments

@honeybadger

honeybadger bot commented May 1, 2020

https://argo.stanford.edu/view/vf742yx0561

one of the parker manuscripts. cc @andrewjbtw and @blalbrit

Backtrace

[preservation_catalog/prod] Notice: validate_checksums(vf742yx0561, services-disk16) checksums for /services-disk16/sdr2objects/vf/742/yx/0561/vf742yx0561/v0005/data/metadata/contentMetadata.xml version 5 do not match.

line 211 of [PROJECT_ROOT]/app/services/audit_results.rb: send_honeybadger_notification
line 169 of [PROJECT_ROOT]/app/services/audit_results.rb: block in report_results
line 154 of [PROJECT_ROOT]/app/services/audit_results.rb: each

View full backtrace and more info at honeybadger.io

checksum validation results:

[{:moab_file_checksum_mismatch=>
      "checksums for /services-disk16/sdr2objects/vf/742/yx/0561/vf742yx0561/v0005/data/content/CCC062.tar.gz version 5 do not match."},
    {:moab_file_checksum_mismatch=>
      "checksums for /services-disk16/sdr2objects/vf/742/yx/0561/vf742yx0561/v0005/data/metadata/contentMetadata.xml version 5 do not match."},
    {:cm_status_changed=>"CompleteMoab status changed from ok to invalid_checksum"}]
@jmartin-sul changed the title from "[preservation_catalog/prod] Notice: validate_checksums(vf742yx0561, services-disk16) checksums for /services-disk16/sdr2objects/vf/742/yx/0561/vf742yx0561/v0005/data/metadata/contentMetadata.xml version 5 do not match." to "checksum validation failure for vf742yx0561" on May 1, 2020
@jmartin-sul added the moab_remediation label (online moab that may need remediation, e.g. missing files, extraneous files, corrupted content) on May 1, 2020
@aaron-collier self-assigned this on May 19, 2020
@aaron-collier
Contributor

@andrewjbtw / @blalbrit the file being reported appears to be the correct size:

I, [2020-05-19T10:44:29.643991 #30894]  INFO -- : 2020-05-19T17:44:29Z CV validate_druid ended for vf742yx0561
=> [#<AuditResults:0x0000000008c86150
  @actual_version=6,
  @check_name="validate_checksums",
  @druid="vf742yx0561",
  @log_msg_prefix="validate_checksums(vf742yx0561, services-disk16)",
  @moab_storage_root=
   #<MoabStorageRoot:0x0000000008cfff00
    id: 17,
    name: "services-disk16",
    created_at: Tue, 16 Oct 2018 19:22:02 UTC +00:00,
    updated_at: Tue, 16 Oct 2018 19:22:02 UTC +00:00,
    storage_location: "/services-disk16/sdr2objects">,
  @result_array=
   [{:moab_file_checksum_mismatch=>"checksums for /services-disk16/sdr2objects/vf/742/yx/0561/vf742yx0561/v0005/data/content/CCC062.tar.gz version 5 do not match."},
    {:moab_file_checksum_mismatch=>"checksums for /services-disk16/sdr2objects/vf/742/yx/0561/vf742yx0561/v0005/data/metadata/contentMetadata.xml version 5 do not match."}],
  @string_prefix="validate_checksums (actual location: services-disk16; actual version: 6)">]
[3] pry(main)> ** [Honeybadger] Success ⚡ https://app.honeybadger.io/notice/dbf6d749-bff7-4a20-ac50-004a96b52ee9 id=dbf6d749-bff7-4a20-ac50-004a96b52ee9 code=201 level=1 pid=30894
** [Honeybadger] Success ⚡ https://app.honeybadger.io/notice/6a457d95-dc16-450e-a12e-fe9b0ebbe0e9 id=6a457d95-dc16-450e-a12e-fe9b0ebbe0e9 code=201 level=1 pid=30894
[3] pry(main)>
[4] pry(main)> quit
[pres@preservation-catalog-prod-01 current]$ ls -l /services-disk16/sdr2objects/vf/742/yx/0561/vf742yx0561/v0005/data/content/
total 240893976
-rw-r--r-- 1 pres pres 245708067077 Feb 24  2019 CCC062.tar.gz
[pres@preservation-catalog-prod-01 current]$ ls -lh /services-disk16/sdr2objects/vf/742/yx/0561/vf742yx0561/v0005/data/content/
total 230G
-rw-r--r-- 1 pres pres 229G Feb 24  2019 CCC062.tar.gz
[pres@preservation-catalog-prod-01 current]$

And reported by argo:

[Screenshot from Argo, 2020-05-19 11:14 AM]

So we'll have to figure out if the content is correct.

@andrewjbtw

It's a bit of a mystery how this object made it to preservation with this checksum error as it seems like the bag validation should have caught it.

There are six versions and versions 5 and 6 both have a tar archive. If version 6 is correct, then it seems like the best solution will be to manually edit the manifests so that the v5 checksum matches.
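
As a rough sketch of how one might locate every manifest that records a given checksum before editing by hand: the helper below is hypothetical (`manifests_mentioning` is not part of preservation_catalog or moab-versioning) and assumes the manifests sit under `v*/manifests/*.xml` inside the object directory, in line with the `v0005`/`v0006` paths above.

```ruby
# Hypothetical helper (not from preservation_catalog or moab-versioning):
# list the manifest files under a Moab object directory whose text
# contains the given checksum string.
# Assumes the layout <moab_root>/v*/manifests/*.xml.
def manifests_mentioning(moab_root, checksum)
  pattern = File.join(moab_root, 'v*', 'manifests', '*.xml')
  Dir.glob(pattern).sort.select do |path|
    File.read(path).include?(checksum)
  end
end

# Example call (the checksum value here is a placeholder):
# manifests_mentioning('/services-disk16/sdr2objects/vf/742/yx/0561/vf742yx0561', 'stale-checksum')
```

A plain substring search is enough for a first pass, since each checksum appears as an attribute value in the manifest XML; it says nothing about which entry the match belongs to, so each hit still needs a manual look.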

@andrewjbtw

Confirmed that v6 has the correct content. Editing manifests to make everything match is time-consuming, so I'm setting aside time to finish up next week.

@andrewjbtw

This morning I updated the following manifests so they're consistent with the correct v5 checksums for the tar.gz and the contentMetadata.xml:

in v0005:
fileInventoryDifference.xml
manifestInventory.xml
signatureCatalog.xml
versionAdditions.xml
versionInventory.xml

in v0006 (the versionInventory.xml and versionAdditions.xml files do not contain v5 checksums, as far as I can tell):
fileInventoryDifference.xml
manifestInventory.xml
signatureCatalog.xml

I then re-ran the audit. This time the checksums on the tar.gz were not a problem but I still got an error for the v5 contentMetadata.xml:

[1] pry(main)> Audit::Checksum.validate_druid('vf742yx0561')
I, [2020-05-26T11:21:47.980439 #686]  INFO -- : 2020-05-26T18:21:47Z CV validate_druid starting for vf742yx0561
D, [2020-05-26T11:21:48.021645 #686] DEBUG -- : Found 1 complete moabs.
E, [2020-05-26T12:06:59.968284 #686] ERROR -- : validate_checksums(vf742yx0561, services-disk16) checksums for /services-disk16/sdr2objects/vf/742/yx/0561/vf742yx0561/v0005/data/metadata/contentMetadata.xml version 5 do not match.
** [Honeybadger] Reporting error id=3636e43b-663e-4230-8ba0-ed062a7566d9 level=1 pid=686
I, [2020-05-26T12:07:00.263492 #686]  INFO -- : [{:moab_file_checksum_mismatch=>"checksums for /services-disk16/sdr2objects/vf/742/yx/0561/vf742yx0561/v0005/data/metadata/contentMetadata.xml version 5 do not match."}] for vf742yx0561
I, [2020-05-26T12:07:00.263620 #686]  INFO -- : 2020-05-26T19:07:00Z CV validate_druid ended for vf742yx0561
=> [#<AuditResults:0x00000000069da408
  @actual_version=6,
  @check_name="validate_checksums",
  @druid="vf742yx0561",
  @log_msg_prefix="validate_checksums(vf742yx0561, services-disk16)",
  @moab_storage_root=
   #<MoabStorageRoot:0x0000000006ab3410
    id: 17,
    name: "services-disk16",
    created_at: Tue, 16 Oct 2018 19:22:02 UTC +00:00,
    updated_at: Tue, 16 Oct 2018 19:22:02 UTC +00:00,
    storage_location: "/services-disk16/sdr2objects">,
  @result_array=[{:moab_file_checksum_mismatch=>"checksums for /services-disk16/sdr2objects/vf/742/yx/0561/vf742yx0561/v0005/data/metadata/contentMetadata.xml version 5 do not match."}],
  @string_prefix="validate_checksums (actual location: services-disk16; actual version: 6)">]

@ndushay
Contributor

ndushay commented May 26, 2020

The manifest being used is always the latest signatureCatalog.xml: /services-disk16/sdr2objects/vf/742/yx/0561/vf742yx0561/v0006/manifests/signatureCatalog.xml because the signatureCatalog.xml always includes the information from all previous versions.

Its entry for the v5 contentMetadata.xml appears to have incorrect data, at least for the size (the file on disk is 479650 bytes, the catalog records 479649):

-rw-r--r-- 1 pres pres 496387 Jun 12  2015 v0001/data/metadata/contentMetadata.xml
-rw-r--r-- 1 pres pres 496387 Jul  6  2015 v0002/data/metadata/contentMetadata.xml
-rw-r--r-- 1 pres pres 499099 Jan 21  2017 v0003/data/metadata/contentMetadata.xml
-rw-rw-r-- 1 pres pres 479247 Apr 10  2017 v0004/data/metadata/contentMetadata.xml
-rw-r--r-- 1 pres pres 479650 Mar 18 13:03 v0005/data/metadata/contentMetadata.xml
-rw-rw-r-- 1 pres pres 479649 Apr 23 09:56 v0006/data/metadata/contentMetadata.xml
  <entry originalVersion="5" groupId="metadata" storagePath="contentMetadata.xml">
    <fileSignature size="479649" md5="4aa570c169ce71c878ba583ea7762d0e" sha1="f3f8717632ee3e2ef954377a3cfb476d9afa235e" sha256="a061505086fd3fd057d17bd5ee17b54d09985b0c2a9e41127e8ce1d127a8c5c8"/>
  </entry>
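
For illustration, here is a minimal sketch of pulling one entry's recorded signature out of a signatureCatalog.xml using only REXML. The helper name `catalog_signature`, the constant `SAMPLE_CATALOG`, and the bare `<signatureCatalog>` wrapper element are assumptions made for this example; the `entry` and `fileSignature` elements are copied from the catalog excerpt above. This is not how preservation_catalog reads the catalog.

```ruby
require 'rexml/document'

# Sample document: the <entry> is copied from the excerpt above; the bare
# <signatureCatalog> wrapper is an assumption made so the XML is well-formed.
SAMPLE_CATALOG = <<~XML
  <signatureCatalog>
    <entry originalVersion="5" groupId="metadata" storagePath="contentMetadata.xml">
      <fileSignature size="479649" md5="4aa570c169ce71c878ba583ea7762d0e" sha1="f3f8717632ee3e2ef954377a3cfb476d9afa235e" sha256="a061505086fd3fd057d17bd5ee17b54d09985b0c2a9e41127e8ce1d127a8c5c8"/>
    </entry>
  </signatureCatalog>
XML

# Hypothetical helper: return the recorded signature for one catalog entry
# as a hash, or nil when no entry matches version, group and path.
def catalog_signature(xml, version:, group:, path:)
  doc = REXML::Document.new(xml)
  doc.elements.each('//entry') do |entry|
    attrs = entry.attributes
    next unless attrs['originalVersion'] == version.to_s &&
                attrs['groupId'] == group &&
                attrs['storagePath'] == path

    sig = entry.elements['fileSignature']
    return nil unless sig

    return {
      size: sig.attributes['size'].to_i,
      md5: sig.attributes['md5'],
      sha1: sig.attributes['sha1'],
      sha256: sig.attributes['sha256']
    }
  end
  nil
end

recorded = catalog_signature(SAMPLE_CATALOG, version: 5, group: 'metadata', path: 'contentMetadata.xml')
puts recorded[:size]
```

The recorded size can then be compared by eye with the `ls -l` listing for the same version.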

@ndushay
Contributor

ndushay commented May 26, 2020

the size appears to be incorrect in v5 signatureCatalog.xml as well.

What the code gets for a newly computed Moab::FileSignature for the v5 contentMetadata.xml:

mfs_cm5 = Moab::FileSignature.new.signature_from_file(Pathname('/services-disk16/sdr2objects/vf/742/yx/0561/vf742yx0561/v0005/data/metadata/contentMetadata.xml'))
=> #<Moab::FileSignature:0x0000000008576248 @md5="4aa570c169ce71c878ba583ea7762d0e", @sha1=nil, @sha256=nil, @size=479650>

What the signatureCatalogs (v5 and v6) hold as the Moab::FileSignature for the v5 contentMetadata.xml:

> mv5_sigcatentry_contentMetadata_v5
=> [#<Moab::SignatureCatalogEntry:0x0000000004007090
  @group_id="metadata",
  @path="contentMetadata.xml",
  @signature=
   #<Moab::FileSignature:0x0000000004001cd0
    @md5="4aa570c169ce71c878ba583ea7762d0e",
    @sha1="f3f8717632ee3e2ef954377a3cfb476d9afa235e",
    @sha256="a061505086fd3fd057d17bd5ee17b54d09985b0c2a9e41127e8ce1d127a8c5c8",
    @size=479649>,
  @version_id=5>]
> mv6_sigcatentry_contentMetadata_v5
=> [#<Moab::SignatureCatalogEntry:0x00000000055573a0
  @group_id="metadata",
  @path="contentMetadata.xml",
  @signature=
   #<Moab::FileSignature:0x0000000005554a38
    @md5="4aa570c169ce71c878ba583ea7762d0e",
    @sha1="f3f8717632ee3e2ef954377a3cfb476d9afa235e",
    @sha256="a061505086fd3fd057d17bd5ee17b54d09985b0c2a9e41127e8ce1d127a8c5c8",
    @size=479649>,
  @version_id=5>]
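
The comparison above (same md5, different size) can be reproduced outside the app with a small stdlib-only sketch. Both helper names, `file_signature` and `signature_mismatches`, are hypothetical and only illustrate the idea of comparing each recorded field against a freshly computed one; they are not the moab-versioning implementation.

```ruby
require 'digest/md5'
require 'digest/sha1'
require 'digest/sha2'

# Hypothetical helper: compute size and digests for a file on disk.
def file_signature(path)
  {
    size: File.size(path),
    md5: Digest::MD5.file(path).hexdigest,
    sha1: Digest::SHA1.file(path).hexdigest,
    sha256: Digest::SHA256.file(path).hexdigest
  }
end

# Hypothetical helper: return the keys of `recorded` whose values differ
# from the freshly computed `actual` signature.
def signature_mismatches(recorded, actual)
  recorded.keys.select { |key| recorded[key] != actual[key] }
end

# Example call, with `recorded` taken from a manifest entry:
# signature_mismatches(recorded, file_signature('/path/to/contentMetadata.xml'))
```

Reporting the mismatching fields by name makes a case like this one easier to spot, where the digests agree and only the recorded size is off by one byte.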

@andrewjbtw

Fixing the file size fixed the problem. This druid just passed the audit:

[1] pry(main)> Audit::Checksum.validate_druid('vf742yx0561')
I, [2020-05-26T16:28:11.414732 #20505]  INFO -- : 2020-05-26T23:28:11Z CV validate_druid starting for vf742yx0561
D, [2020-05-26T16:28:11.464036 #20505] DEBUG -- : Found 1 complete moabs.
I, [2020-05-26T17:11:03.029445 #20505]  INFO -- : validate_checksums(vf742yx0561, services-disk16) checksum(s) match
I, [2020-05-26T17:11:03.029567 #20505]  INFO -- : validate_checksums(vf742yx0561, services-disk16) CompleteMoab status changed from invalid_checksum to ok
I, [2020-05-26T17:11:03.324241 #20505]  INFO -- : [{:moab_checksum_valid=>"checksum(s) match"}, {:cm_status_changed=>"CompleteMoab status changed from invalid_checksum to ok"}] for vf742yx0561
I, [2020-05-26T17:11:03.324364 #20505]  INFO -- : 2020-05-27T00:11:03Z CV validate_druid ended for vf742yx0561
=> [#<AuditResults:0x000000000778eb58
  @actual_version=6,
  @check_name="validate_checksums",
  @druid="vf742yx0561",
  @log_msg_prefix="validate_checksums(vf742yx0561, services-disk16)",
  @moab_storage_root=
   #<MoabStorageRoot:0x0000000007871e08
    id: 17,
    name: "services-disk16",
    created_at: Tue, 16 Oct 2018 19:22:02 UTC +00:00,
    updated_at: Tue, 16 Oct 2018 19:22:02 UTC +00:00,
    storage_location: "/services-disk16/sdr2objects">,
  @result_array=[{:moab_checksum_valid=>"checksum(s) match"}, {:cm_status_changed=>"CompleteMoab status changed from invalid_checksum to ok"}],
  @string_prefix="validate_checksums (actual location: services-disk16; actual version: 6)">]

@ndushay
Contributor

ndushay commented May 27, 2020

@andrewjbtw so can this ticket be closed?

@andrewjbtw

yes
