
Archives can sometimes be "duplicated" in the entries after modifying their contents #671

Closed
soarpasser opened this issue Aug 2, 2022 · 8 comments

Comments

@soarpasser

Environment: Windows 10, LANraragi 0.8.6 (MSI installer)

Description:
When locally editing an archive's contents (e.g. adding or removing images), the archive can sometimes get "duplicated" in the app's database: it shows up as two different entries with different IDs despite being the same source file.
This means that deleting either one of these two entries also deletes the other one, which is extremely annoying since you'll need to add the metadata all over again.

See:
[screenshot]
The source file is exactly the same, while the IDs are somehow different:
[screenshot]
[screenshot]

@Difegue
Owner

Difegue commented Aug 3, 2022

👋 Running a database cleanup should fix this. Changed IDs should normally clean up their older variant, but depending on the state of the background worker (especially on Windows), it might've missed it.

Back up your DB just in case, and let me know how it goes! Support for updated archives is fairly recent.

@soarpasser
Author

soarpasser commented Aug 3, 2022

Support for updated archives is fairly recent.

Well, that's exactly why I'm opening this issue. I know that support for updated archives was added in 0.8.6, but the archives in question were also modified after that update (the original archives were added to the database before the update, though; does that matter?), which is why I find it a bit weird.

I've already fixed the issue manually, so unfortunately I can't tell you what a database cleanup would do.

but depending on the state of the background worker (especially on Windows), it might've missed it.

I think this might be the issue here. If this can be optimized or fixed, that'd be great. If not, oh well, haha.

@soarpasser
Author


So this issue popped up again today and I decided to try the database cleanup option like you suggested.

The result is that LR decided to keep the new gallery with no metadata instead of the one with metadata, so I suppose the new gallery here is the updated one.
[screenshot]
[screenshot]
So it seems that support for updated archives still has some issues: it's not copying the metadata of the old archive over to the new, updated one when it should. Maybe you can look into this and see if it's fixable.

@soarpasser
Author

soarpasser commented Aug 15, 2022

P.S.
Interestingly, in my experience this issue only seems to happen when the first file in the archive is changed, something like this:
[screenshot]
[screenshot]

It seems that as long as the first file is unchanged, everything usually works fine, but when it is changed, this issue ALWAYS happens. Maybe this is the cause of the problem?

P.P.S.: Cleaning the database does get rid of the duplicate, but it's the old file ID that gets removed, and the old archive's metadata doesn't carry over to the new one, so all the metadata of that archive is lost. This surely isn't intended behavior, and hopefully it will be fixed in the future.

Edit: looks like we've found the cause. In #685 we learned that modifying the first 512 KB of a file changes the archive ID, which results in another, duplicate gallery being created. What shouldn't happen, though, is:

A. The old archive not being deleted and remaining "stuck" to the new archive (deleting either archive in LR removes both of them).

B. The old archive's metadata not being copied to the new archive. This means that if a user wants to clear the duplicate by cleaning the database, they also lose all the metadata they've attached to this archive.
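To illustrate why only edits near the start of the file change the ID, here is a minimal Python sketch. It assumes the ID is a SHA-1 hex digest of the file's first 512 KB: the thread only establishes the 512 KB part, SHA-1 is an assumption that fits the 40-character hex IDs in the logs, and the exact cutoff used here (512 × 1024 bytes) and the helper names are hypothetical. In a ZIP-based archive such as a .cbz, the first stored entry typically sits near the start of the file, which would fit the observation that changing the first image always triggers the problem.

```python
import hashlib
from pathlib import Path

# Assumption: the ID covers only the first 512 KB. The exact byte count
# (512 * 1024 here) and the choice of SHA-1 are illustrative, not confirmed.
ID_PREFIX_BYTES = 512 * 1024


def archive_id_from_bytes(data: bytes) -> str:
    """Hash only the leading prefix; bytes past that offset cannot affect the ID."""
    return hashlib.sha1(data[:ID_PREFIX_BYTES]).hexdigest()


def archive_id_from_file(path: Path) -> str:
    """Read just the prefix from disk instead of loading the whole archive."""
    with open(path, "rb") as handle:
        return archive_id_from_bytes(handle.read(ID_PREFIX_BYTES))


if __name__ == "__main__":
    original = bytes(range(256)) * 4096   # 1 MiB of sample data
    tail_edit = original[:-1] + b"\x00"   # change the last byte (outside the prefix)
    head_edit = b"\xff" + original[1:]    # change the first byte (inside the prefix)
    print(archive_id_from_bytes(original) == archive_id_from_bytes(tail_edit))  # True
    print(archive_id_from_bytes(original) == archive_id_from_bytes(head_edit))  # False
```

Hashing a fixed-size prefix keeps ID computation cheap for large archives, at the cost that any edit inside that prefix produces a brand-new ID for what is logically the same file.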

Let's see if there'll be a good fix for this issue.

@polak14
Contributor

polak14 commented Aug 28, 2022

polak14@70772fe

record_19_34_57.mp4

It seems to work as expected, but more testing would be welcome. Anything you see wrong here? @Difegue

@Difegue
Owner

Difegue commented Aug 29, 2022

That does look like it'd do the trick, although we shouldn't have to do this, since the Redis rename operation shouldn't create a second key to begin with 😞

Maybe I missed something in the docs about it, needs some more digging.
Thanks for confirming this works for you though! It helps a lot. 🙏

@polak14
Contributor

polak14 commented Aug 29, 2022

Although we shouldn't have to do this since the redis rename operation shouldn't create a second new key to begin with 😞

And you're right, we don't have to, and I found the problem.

For clarity:
old_id = d5ec6e08c5ce11f843a5d98710e2f533f308daed
new_id = adb04a2bfb45fd87d9384a38205a357cb539c29d

I replaced test2.cbz, and it tells me it changed the ID, but the archive itself disappears.

--Shinobu--
[2022-08-29 17:19:22] [Shinobu] [debug] Received inotify event create on /opt/LANraragi/content/test2.cbz
[2022-08-29 17:19:22] [Shinobu] [debug] New file detected: /opt/LANraragi/content/test2.cbz
[2022-08-29 17:19:22] [Shinobu] [debug] Adding /opt/LANraragi/content/test2.cbz to Shinobu filemap.
[2022-08-29 17:19:22] [Shinobu] [debug] Computed ID is adb04a2bfb45fd87d9384a38205a357cb539c29d.
[2022-08-29 17:19:22] [Shinobu] [debug] /opt/LANraragi/content/test2.cbz was logged but is already in the filemap!
[2022-08-29 17:19:22] [Shinobu] [debug] /opt/LANraragi/content/test2.cbz has a different ID than the one in the filemap! (d5ec6e08c5ce11f843a5d98710e2f533f308daed)
[2022-08-29 17:19:22] [Shinobu] [info] /opt/LANraragi/content/test2.cbz has been modified, updating its ID from d5ec6e08c5ce11f843a5d98710e2f533f308daed to adb04a2bfb45fd87d9384a38205a357cb539c29d.

--LANraragi--
[2022-08-29 17:19:22] [Archive] [debug] Changing ID d5ec6e08c5ce11f843a5d98710e2f533f308daed to adb04a2bfb45fd87d9384a38205a357cb539c29d

Now if you press "clean database", the archive appears again, BUT its ID is turned back to the old ID even though it's still the new file.

--LANraragi--
[2022-08-29 17:21:14] [Archive] [info] Saving automatic backup to /opt/LANraragi/autobackup.json
[2022-08-29 17:21:14] [Archive] [warn] File exists but its ID is no longer adb04a2bfb45fd87d9384a38205a357cb539c29d!
[2022-08-29 17:21:14] [Archive] [warn] Trying to find its new ID in the Shinobu filemap...
[2022-08-29 17:21:14] [Archive] [warn] Found d5ec6e08c5ce11f843a5d98710e2f533f308daed in the filemap! Changing ID from adb04a2bfb45fd87d9384a38205a357cb539c29d to it.
[2022-08-29 17:21:14] [Archive] [debug] Changing ID adb04a2bfb45fd87d9384a38205a357cb539c29d to d5ec6e08c5ce11f843a5d98710e2f533f308daed
[2022-08-29 17:21:14] [Archive] [debug] Updating categories that contained adb04a2bfb45fd87d9384a38205a357cb539c29d to d5ec6e08c5ce11f843a5d98710e2f533f308daed.
[2022-08-29 17:21:14] [Categories] [debug] Finding categories containing adb04a2bfb45fd87d9384a38205a357cb539c29d

So now it has renamed it back to the old ID, and if you rescan the archive folder, it will create duplicates:

--Shinobu--
[2022-08-29 17:24:45] [Shinobu] [debug] Adding /opt/LANraragi/content/test2.cbz to Shinobu filemap.
[2022-08-29 17:24:45] [Shinobu] [debug] Computed ID is adb04a2bfb45fd87d9384a38205a357cb539c29d.
[2022-08-29 17:24:45] [Shinobu] [info] Adding new file /opt/LANraragi/content/test2.cbz with ID adb04a2bfb45fd87d9384a38205a357cb539c29d

Now clean the database again: it gets rid of the old (and duplicate) archive, but you're left with the new archive and none of the metadata carried over.

--LANraragi--
[2022-08-29 17:26:33] [Archive] [warn] File exists but its ID is no longer d5ec6e08c5ce11f843a5d98710e2f533f308daed!
[2022-08-29 17:26:33] [Archive] [warn] Trying to find its new ID in the Shinobu filemap...
[2022-08-29 17:26:33] [Archive] [warn] Found adb04a2bfb45fd87d9384a38205a357cb539c29d in the filemap! Changing ID from d5ec6e08c5ce11f843a5d98710e2f533f308daed to it.
[2022-08-29 17:26:33] [Archive] [warn] ID adb04a2bfb45fd87d9384a38205a357cb539c29d already exists in the database! Unlinking old ID.

This is avoided if you run "rescan archive directory" before "clean database". Doing so also carries the metadata over, since no new entry is created in the database and the old one is simply replaced, as it should be.
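The two sequences described above can be reproduced with a small in-memory model. This is a hypothetical Python sketch, not LANraragi's code: `ToyLibrary`, its methods, and the plain dicts standing in for the Shinobu filemap and the database entries are all illustrative. It assumes, as the cleanup log suggests, that after the watcher renames the entry, the filemap still reports the old ID for the file.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLibrary:
    # Hypothetical stand-ins: path -> ID the watcher last recorded, and ID -> archive entry.
    filemap: dict[str, str] = field(default_factory=dict)
    archives: dict[str, dict] = field(default_factory=dict)

    def change_id(self, old_id: str, new_id: str) -> None:
        """Rename an entry, or drop the old one if the new ID is already taken."""
        if new_id in self.archives:
            # Mirrors the "already exists in the database! Unlinking old ID." log line.
            del self.archives[old_id]
        else:
            self.archives[new_id] = self.archives.pop(old_id)

    def on_modified(self, path: str, new_id: str) -> None:
        """Watcher sees a changed file: the entry is renamed, but the filemap keeps the old ID."""
        self.change_id(self.filemap[path], new_id)

    def rescan(self, path: str, computed_id: str) -> None:
        """Record the freshly computed ID; unknown IDs become new, metadata-less entries."""
        self.filemap[path] = computed_id
        if computed_id not in self.archives:
            self.archives[computed_id] = {"file": path, "tags": ""}

    def clean_database(self) -> None:
        """Move every entry to whatever ID the filemap currently lists for its file."""
        for archive_id, entry in list(self.archives.items()):
            if archive_id not in self.archives:
                continue  # already removed earlier in this pass
            mapped_id = self.filemap.get(entry["file"])
            if mapped_id is not None and mapped_id != archive_id:
                self.change_id(archive_id, mapped_id)
```

In this model, calling `on_modified` and then `clean_database` flips the entry back to the old ID; a following `rescan` adds an empty duplicate under the new ID, and a second `clean_database` keeps only that empty entry. Calling `rescan` before `clean_database` instead leaves a single entry under the new ID with its tags intact, matching the workaround described above.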

I added 2 videos that will hopefully make it a bit clearer what's happening.

rescan.archive.first.-.clean.database.mp4
clean.database.mp4

@Difegue
Owner

Difegue commented Oct 14, 2022

Thanks for the detailed breakdown (and sorry for taking so long to look at it!). It turns out the ID update code wasn't re-updating the master ID list after changing the ID, which led to issues like this when cleaning the DB later.

I'll have a fix for this in 0.8.7.
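As a rough illustration of that kind of fix, here is a hypothetical Python sketch in which one helper renames the entry and updates the ID map in the same step, so a later cleanup pass finds nothing to "correct". The names `change_archive_id` and `find_mismatches`, and the assumption that the master ID list behaves like a path-to-ID map, are illustrative and not taken from LANraragi's code.

```python
def change_archive_id(archives: dict, id_map: dict, path: str, new_id: str) -> None:
    """Rename the entry stored for `path` and update the (hypothetical) ID map in one step."""
    old_id = id_map[path]
    if old_id == new_id:
        return
    if new_id in archives:
        # The new ID is already known: drop the stale entry instead of overwriting it.
        del archives[old_id]
    else:
        archives[new_id] = archives.pop(old_id)
    # The step the thread says was missing: keep the master list in sync with the rename.
    id_map[path] = new_id


def find_mismatches(archives: dict, id_map: dict) -> list:
    """List (stored_id, mapped_id) pairs that a cleanup pass would try to rename."""
    return [
        (archive_id, id_map[entry["file"]])
        for archive_id, entry in archives.items()
        if id_map.get(entry["file"], archive_id) != archive_id
    ]
```

With the map updated inside the same helper, `find_mismatches` returns an empty list right after a rename; with a stale map that still lists the old ID, it reports the entry, which is the situation that made the cleanup flip IDs back and forth above.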

@Difegue Difegue closed this as completed Oct 14, 2022
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants