Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Scan optimizations #419

Merged
merged 12 commits into from
Aug 5, 2019
Merged

Scan optimizations #419

merged 12 commits into from
Aug 5, 2019

Conversation

mmatous
Copy link
Contributor

@mmatous mmatous commented Aug 2, 2019

Hi, I did some optimizations for library scanning/reset, it should be way faster. My ancient potato with non-hw-accelerated luks-aes-encrypted 2TB hdd went from ~30min to ~30sec while scanning 98 song sample w/ 20 albums & 16 artists and reset from ~30sec to <10sec.
Most of the magic is in the first 2 commits. The last one should help in theory, but my benchmarks varied so much I can't really tell. I will run it again later on real data with way more files.
Could you run the scan prepended with time command for the whole thing and then again without the last commit, so we get more results?
I'll revert it if we don't get anything conclusive.

@mmatous
Copy link
Contributor Author

mmatous commented Aug 3, 2019

All commits:

Audios found: 6814
Duplicates found: 97
Written to library: 6717
Albums found: 579
Errors: 0
532.78user 126.71system 2:16:03elapsed 8%CPU (0avgtext+0avgdata 219836maxresident)k
26334208inputs+24outputs (67major+12695181minor)pagefaults 0swaps

Sans last:

Audios found: 6814
Duplicates found: 97
Written to library: 6717
Albums found: 579
Errors: 0
533.69user 125.59system 2:11:02elapsed 8%CPU (0avgtext+0avgdata 220192maxresident)k
26300280inputs+0outputs (21major+12667821minor)pagefaults 0swaps

659.49 vs 659.28 CPU time in total. Damn. That's one useless commit. I won't scan without the patches, it would takes ages.
Could you post your results, just to be sure? Preferably before patches, after, and without the last one.

@r4sas
Copy link
Contributor

r4sas commented Aug 3, 2019

273 tracks, 36 albums, 20+ artists. Used NC encryption module.
w/o:
Scan - 2:40
Reset - ~1 sec

w/ 2 commits:
Scan - scanning status notification broken, writing 0/0 but when pressed cancel, it returned scanned tracks amount. Scanned in ~1:05-1:10
Reset - ~1 sec

w/ 3 commits:
Scan - same issue with progress, scanned in ~1:10-1:20
Reset - ~1 sec

@Rello
Copy link
Owner

Rello commented Aug 4, 2019

@mmatous
Copy link
Contributor Author

mmatous commented Aug 4, 2019

a) Even after searching I have no idea what that is or why would I want that. It's not well documented and not really part of public API. If no bugs appear, I won't mess with that.
b) Definitely! I plan to split the scanForAudios() into smaller functions and wrap appropriate pieces using transactional().
I am currently trying to figure out EventSources using references from gallery and files app. Fixing the progress bar is a bit complicated since the app shared the progress through DB and method responding to request was not in the scan transaction. If it works it will likely provide additional (albeit small) performance benefit.

@Rello
Copy link
Owner

Rello commented Aug 4, 2019

yes. the change towards the status in the DB was necessary as session/cache values could not be stored. this was done on advice by the NC backend team.

The scanForAudios is a bit complex - yes. but it is working.
I would recommend to focus on the JS part because there your said you have the most experience. I think there are the biggest gains.
scan is complex but working - and it is only used once in a while.
It becomes tricky when working with external storage

I want to do a test also of your current implementation. but @r4sas test looked also interesting already

@mmatous
Copy link
Contributor Author

mmatous commented Aug 4, 2019

I reverted the commit (thanks @r4sas), rebased on current master and split the scan into smaller methods without really touching the insides. Moving ~200 LoC into try-catch with yet another level of indendation would make it unreadable.
Recommendation was OK while it was, but I don't like abusing DB and IO in general for every little thing and it should be even nicer now. Push instead of pull and all in one controller instance.
I could think of a few more things to improve (e.g. scan for all audio MIMEs and array_filter unsupported to reduce initial IO to 1/5, might especially help external storage) but I think this is big enough as is.

@Rello
Copy link
Owner

Rello commented Aug 5, 2019

thank you.
i will start testing. I like the EventSource part pretty much

I am thinking about the commit.
what do you think about making a commit every x loops? e.g. 100?

from the history there are regular tickets for users with either wrong php config in terms of tmp, runtime or external storage settings.
hitting one of these spots would mean that they will never be able to scan the whole library. Now there is the workaround, that they simply restart the scanner several times and they will eventually have scanned everything

what do you think?

Rello added a commit that referenced this pull request Aug 5, 2019
@Rello Rello merged commit 742760e into Rello:master Aug 5, 2019
Rello added a commit that referenced this pull request Aug 5, 2019
Rello added a commit that referenced this pull request Aug 5, 2019
@Rello
Copy link
Owner

Rello commented Aug 5, 2019

nice.
scan for 60 files from SMB storage down from 1:40 to 1:00

but: I am getting Error 405 when scanning from the Webfrontend. is it working for you?
no error in nextcloud.log or somewhere?!

@mmatous
Copy link
Contributor Author

mmatous commented Aug 5, 2019

but: I am getting Error 405 when scanning from the Webfrontend. is it working for you?

All OK for me. Are you sure correct routes.php and settings.js are being used? Could you post the saved request & response from devtools (save all as HAR, maybe censor access tokens there first)? I tested with FF and Brave with nginx 1.17.

what do you think about making a commit every x loops? e.g. 100?

I thought about that but when I saw a transaction limit somewhere (maybe the files app, I'm not sure, I went through a lot of code) it was 100k or maybe even 1M, so I opted against it.
Any commits between will mean slowdown of course, but feel free to experiment. Maybe try to reproduce one of those erroneous configs and see if it's a problem? If you point me towards what exactly you mean, I could test for it too.

@r4sas
Copy link
Contributor

r4sas commented Aug 5, 2019

I tested with current trunk, havent that problem. But at end of scanning and closing notification window right side of page not changes from "Add new tracks to library" page.

@Rello
Copy link
Owner

Rello commented Aug 5, 2019

One note for the mime types: the search goes for the filecach (table) and not for the files itself as far as i know.
So the idea with the wildcard search might not be of a benefit

@Rello
Copy link
Owner

Rello commented Aug 5, 2019

All OK for me. Are you sure correct routes.php and settings.js are being used? Could you post the saved request & response from devtools (save all as HAR, maybe censor access tokens there first)? I tested with FF and Brave with nginx 1.17.

ok. seems to be a cache issue. working now...

@mmatous
Copy link
Contributor Author

mmatous commented Aug 5, 2019

But at end of scanning and closing notification window right side of page not changes from "Add new tracks to library" page.

I know it does not, but what is the appropriate screen here? If the library gets reset, last session values get deleted. I think AP previously defaulted to the graphical albums view but this is the new default since the latest graphical view commits. What should be displayed after scan?

So the idea with the wildcard search might not be of a benefit

OK. One DB read + filter would probably still beat 5 separate DB reads, but not by a large amount. If measurable.

@r4sas
Copy link
Contributor

r4sas commented Aug 5, 2019

What should be displayed after scan?

I think we must switch page to clean track list (where just table without tracks). Or better to Titles - All Titles page.

Rello referenced this pull request Aug 24, 2019
@Rello Rello mentioned this pull request Aug 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants