fr360kt0172 seems to have a duplicate content file, maybe as a result of an scp mistake? (services-disk16) #1399

Closed
jmartin-sul opened this issue Feb 22, 2020 · 4 comments
Labels
moab_remediation online moab that may need remediation (e.g. missing files, extraneous files, corrupted content)


@jmartin-sul
Member

spawned from #1324

CompleteMoab.joins(:preserved_object, :moab_storage_root).where.not(status: :ok).pluck(:status, :storage_location, :druid, :status_details)
[
...snip other moabs with other types of errors...
  ["invalid_checksum",
    "/services-disk16/sdr2objects",
    "fr360kt0172",
    "validate_checksums (actual location: services-disk16; ) Moab file /services-disk16/sdr2objects/fr/360/kt/0172/fr360kt0172/v0001/data/content/lyberadmin@lyberservices-prod was not found in Moab signature catalog /services-disk16/sdr2objects/fr/360/kt/0172/fr360kt0172/v0004/manifests/signatureCatalog.xml && CompleteMoab status changed from validity_unknown to invalid_checksum"]
...snip other moabs with other types of errors...
]
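The validation error above flags a file that exists on disk under v0001/data/content but has no entry in the latest signatureCatalog.xml. Below is a minimal Ruby sketch of that kind of check. It is not the actual preservation_catalog or moab-versioning code. The helper names are hypothetical. The XML layout is also an assumption about signatureCatalog.xml: `entry` elements carrying `originalVersion`, `groupId` and `storagePath` attributes. Check it against a real catalog before relying on it.

```ruby
require 'rexml/document'

# Hypothetical helper: storagePath of every "content" entry recorded in a
# signatureCatalog.xml for the given original version.
# ASSUMPTION: entries look like
#   <entry originalVersion="1" groupId="content" storagePath="MVI_1372.AVI">
def cataloged_content_paths(catalog_xml, version)
  doc = REXML::Document.new(catalog_xml)
  paths = []
  REXML::XPath.each(doc, '//entry') do |entry|
    next unless entry.attributes['groupId'] == 'content'
    next unless entry.attributes['originalVersion'].to_i == version
    paths << entry.attributes['storagePath']
  end
  paths
end

# Hypothetical helper: files under a version's data/content directory whose
# relative path is not in the catalog, i.e. extraneous files.
# FNM_DOTMATCH makes the glob include hidden files too.
def extraneous_content_files(content_dir, catalog_xml, version)
  expected = cataloged_content_paths(catalog_xml, version)
  on_disk = Dir.glob('**/*', File::FNM_DOTMATCH, base: content_dir)
               .select { |rel| File.file?(File.join(content_dir, rel)) }
  (on_disk - expected).sort
end
```

For this druid, `content_dir` would be /services-disk16/sdr2objects/fr/360/kt/0172/fr360kt0172/v0001/data/content. `catalog_xml` would be the contents of /services-disk16/sdr2objects/fr/360/kt/0172/fr360kt0172/v0004/manifests/signatureCatalog.xml, with `version` set to 1.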

The file flagged as unexpected is large (808 MB), but it is also the same size (808M, as reported by `ls -lah`) as a known content file, MVI_1372.AVI:

[pres@preservation-catalog-prod-02 ~]$ ls -lah /services-disk16/sdr2objects/fr/360/kt/0172/fr360kt0172/v0001/data/content/
total 1.6G
drwxrwxr-x 2 pres pres 4.0K Oct 23  2018 .
drwxrwxr-x 4 pres pres 4.0K Oct 19  2018 ..
-rw-r--r-- 1 pres pres 808M Oct 23  2018 lyberadmin@lyberservices-prod
-rw-r--r-- 1 pres pres 808M Oct 18  2018 MVI_1372.AVI

Running sha512sum on each file three times gives the same digest on every run and the same digest for both files, so the content appears to be identical:

[pres@preservation-catalog-prod-02 ~]$ sha512sum /services-disk16/sdr2objects/fr/360/kt/0172/fr360kt0172/v0001/data/content/lyberadmin\@lyberservices-prod 
2b5c79afbbb763788b5226aba5b84d9a236d6b9fbc6b9ba5c23c0ab6702878302e9d4ffafcf4039cc7676cf1fc740b4046579156cfd7dae1ca25dabaa6497ad7  /services-disk16/sdr2objects/fr/360/kt/0172/fr360kt0172/v0001/data/content/lyberadmin@lyberservices-prod
[pres@preservation-catalog-prod-02 ~]$ 
[pres@preservation-catalog-prod-02 ~]$ sha512sum /services-disk16/sdr2objects/fr/360/kt/0172/fr360kt0172/v0001/data/content/lyberadmin\@lyberservices-prod 
2b5c79afbbb763788b5226aba5b84d9a236d6b9fbc6b9ba5c23c0ab6702878302e9d4ffafcf4039cc7676cf1fc740b4046579156cfd7dae1ca25dabaa6497ad7  /services-disk16/sdr2objects/fr/360/kt/0172/fr360kt0172/v0001/data/content/lyberadmin@lyberservices-prod
[pres@preservation-catalog-prod-02 ~]$ 
[pres@preservation-catalog-prod-02 ~]$ sha512sum /services-disk16/sdr2objects/fr/360/kt/0172/fr360kt0172/v0001/data/content/lyberadmin\@lyberservices-prod 
2b5c79afbbb763788b5226aba5b84d9a236d6b9fbc6b9ba5c23c0ab6702878302e9d4ffafcf4039cc7676cf1fc740b4046579156cfd7dae1ca25dabaa6497ad7  /services-disk16/sdr2objects/fr/360/kt/0172/fr360kt0172/v0001/data/content/lyberadmin@lyberservices-prod
[pres@preservation-catalog-prod-02 ~]$ 
[pres@preservation-catalog-prod-02 ~]$ 
[pres@preservation-catalog-prod-02 ~]$ 
[pres@preservation-catalog-prod-02 ~]$ sha512sum /services-disk16/sdr2objects/fr/360/kt/0172/fr360kt0172/v0001/data/content/MVI_1372.AVI 
2b5c79afbbb763788b5226aba5b84d9a236d6b9fbc6b9ba5c23c0ab6702878302e9d4ffafcf4039cc7676cf1fc740b4046579156cfd7dae1ca25dabaa6497ad7  /services-disk16/sdr2objects/fr/360/kt/0172/fr360kt0172/v0001/data/content/MVI_1372.AVI
[pres@preservation-catalog-prod-02 ~]$ 
[pres@preservation-catalog-prod-02 ~]$ sha512sum /services-disk16/sdr2objects/fr/360/kt/0172/fr360kt0172/v0001/data/content/MVI_1372.AVI 
2b5c79afbbb763788b5226aba5b84d9a236d6b9fbc6b9ba5c23c0ab6702878302e9d4ffafcf4039cc7676cf1fc740b4046579156cfd7dae1ca25dabaa6497ad7  /services-disk16/sdr2objects/fr/360/kt/0172/fr360kt0172/v0001/data/content/MVI_1372.AVI
[pres@preservation-catalog-prod-02 ~]$ 
[pres@preservation-catalog-prod-02 ~]$ sha512sum /services-disk16/sdr2objects/fr/360/kt/0172/fr360kt0172/v0001/data/content/MVI_1372.AVI 
2b5c79afbbb763788b5226aba5b84d9a236d6b9fbc6b9ba5c23c0ab6702878302e9d4ffafcf4039cc7676cf1fc740b4046579156cfd7dae1ca25dabaa6497ad7  /services-disk16/sdr2objects/fr/360/kt/0172/fr360kt0172/v0001/data/content/MVI_1372.AVI
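The same digest comparison can be sketched from a Ruby console with the standard library's `Digest::SHA512`. This is a minimal illustration, and the method names are hypothetical.

```ruby
require 'digest'

# SHA-512 hex digest of a file. Digest's .file reads the file in chunks,
# so a large (e.g. 808 MB) file is not loaded into memory all at once.
def sha512_hex(path)
  Digest::SHA512.file(path).hexdigest
end

# True when both files yield the same SHA-512 digest, mirroring the manual
# sha512sum runs above.
def same_sha512?(path_a, path_b)
  sha512_hex(path_a) == sha512_hex(path_b)
end
```

Matching SHA-512 digests make it overwhelmingly likely that the contents are identical. A byte-for-byte comparison settles the question definitively.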

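I'm inclined to delete the file that's not in signatureCatalog.xml, on the assumption that it landed there as the result of a bad manual copy attempt. The filename suggests scp: lyberadmin@lyberservices-prod looks like a user@host destination missing its trailing colon, which scp would treat as a local filename.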

I would, of course, want approval from @andrewjbtw before removing it.

@jmartin-sul jmartin-sul self-assigned this Feb 22, 2020
@jmartin-sul jmartin-sul added the moab_remediation online moab that may need remediation (e.g. missing files, extraneous files, corrupted content) label Feb 22, 2020
@jmartin-sul
Member Author

Looks like this druid is also the subject of #1194.

@jmartin-sul
Member Author

diff reports no differences, so the two files have identical contents:

[pres@preservation-catalog-prod-02 ~]$ diff /services-disk16/sdr2objects/fr/360/kt/0172/fr360kt0172/v0001/data/content/lyberadmin\@lyberservices-prod /services-disk16/sdr2objects/fr/360/kt/0172/fr360kt0172/v0001/data/content/MVI_1372.AVI 
[pres@preservation-catalog-prod-02 ~]$ 
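An empty `diff` (exit status 0) means the two files are byte-for-byte identical. For differing binary files, GNU diff would instead print a "Binary files ... differ" line. A hypothetical Ruby equivalent uses `FileUtils.compare_file` from the standard library:

```ruby
require 'fileutils'

# Byte-for-byte comparison, the Ruby analogue of an empty diff/cmp result.
# The explicit size check is a cheap early exit before reading both files.
def identical_contents?(path_a, path_b)
  return false unless File.size(path_a) == File.size(path_b)
  FileUtils.compare_file(path_a, path_b)
end
```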

Will delete the extraneous lyberadmin@lyberservices-prod file and re-run validation.

@jmartin-sul
Member Author

moab is now valid!

@jmartin-sul
Member Author

jmartin-sul commented Feb 24, 2020

FWIW, this was already replicated with the extraneous file: all the expected zipped copies (versions 1 through 4 on each of aws_s3_east_1, aws_s3_west_2 and ibm_us_south) are already there with status ok:

[19] pry(main)> ZipPart.joins(zipped_moab_version: [{ complete_moab: [:preserved_object] }, :zip_endpoint]).where(preserved_objects: { druid: druid }).pluck(:druid, 'current_version AS highest_version', 'zipped_moab_versions.version AS zip_version', :endpoint_name, :status)
=> [["fr360kt0172", 4, 4, "aws_s3_east_1", "ok"],
 ["fr360kt0172", 4, 3, "aws_s3_east_1", "ok"],
 ["fr360kt0172", 4, 2, "aws_s3_east_1", "ok"],
 ["fr360kt0172", 4, 1, "aws_s3_east_1", "ok"],
 ["fr360kt0172", 4, 1, "aws_s3_west_2", "ok"],
 ["fr360kt0172", 4, 4, "ibm_us_south", "ok"],
 ["fr360kt0172", 4, 3, "ibm_us_south", "ok"],
 ["fr360kt0172", 4, 2, "ibm_us_south", "ok"],
 ["fr360kt0172", 4, 1, "ibm_us_south", "ok"],
 ["fr360kt0172", 4, 4, "aws_s3_west_2", "ok"],
 ["fr360kt0172", 4, 3, "aws_s3_west_2", "ok"],
 ["fr360kt0172", 4, 2, "aws_s3_west_2", "ok"]]
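The "all expected copies are there and ok" reading of the output above can be expressed as a small check over the plucked rows. This is a sketch with a hypothetical method name. The endpoint list is taken from this output and passed in explicitly, because an endpoint with no rows at all could not be detected from the rows alone.

```ruby
# Each row mirrors the pluck above:
#   [druid, highest_version, zip_version, endpoint_name, status]
# Hypothetical check: every endpoint has zip parts for every version
# 1..highest_version, and all of those parts have status "ok".
def fully_replicated?(rows, endpoints)
  return false if rows.empty?
  highest = rows.map { |row| row[1] }.max
  by_key = rows.group_by { |row| [row[3], row[2]] } # [endpoint_name, zip_version]
  endpoints.all? do |endpoint|
    (1..highest).all? do |version|
      parts = by_key[[endpoint, version]]
      # A missing version yields nil, which all? treats as a failure.
      parts && parts.all? { |row| row[4] == 'ok' }
    end
  end
end
```

Grouping by endpoint and version, rather than just counting rows, also tolerates a version that was split into several zip parts.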

It sounds like @andrewjbtw would prefer to re-push the archives for such moabs, so that the cloud copies are conformant moabs if we ever have to retrieve them (and then we won't have to puzzle over why they fail validation when we pull them back).

Filed #1402 to track that decision for all moabs facing this problem, and linked it to the moab remediation tickets that are already open.
