
Improve loading pages (library, titles) #222

Merged
merged 26 commits into from
Sep 6, 2021

Conversation

Leeingnyo
Member

Implementations

To improve page loading, I cached lots of data. The principles are:

  • Avoid accessing info.json files
  • Reuse the results of methods that are called many times with the same arguments

Misc.

cache entry_cover_url

Like entry_display_name, caching entry_cover_url improves the rendering of src/views/components/card.html.ecr. This is very effective.

cache display_name of title

This avoids accessing info.json.

change where sort opt are saved

Sort options only need to be saved when the library or a book is accessed with explicit parameters, so I removed the unnecessary writes. This also avoids accessing info.json.

Caching Library

This utility, implemented in cache.cr, allows cached data to be preserved across scans.
It caches:

  • cover_url of each entry
  • results of deep_read_page_count of each title
  • sort options from info.json files

I think the second one is quite effective.

Sorted Entries Cache

There are lots of calls to sorted_entries, which takes long (222 entries took 80ms on my machine). Besides, it is called three times with the same arguments when accessing a title page. Caching the results is effective since they do not change until the entries are modified. Currently, this option is not enabled by default; you should set the flag to true to enable it. Note that the cache size option does not guarantee that the machine consumes memory precisely.
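The memoization described above can be sketched as follows. This is a minimal Ruby approximation of the idea (the PR itself is in Crystal); the class name echoes the PR's SortedEntriesCacheEntry, but the key scheme and LRU bookkeeping here are illustrative, not the actual implementation:

```ruby
# Minimal LRU memoization sketch for sorted_entries.
# Keys combine a title id and a sort option; values are sorted
# arrays of entry ids. The least-recently-used key is evicted
# once the cache grows past `capacity`.
class SortedEntriesCache
  def initialize(capacity)
    @capacity = capacity
    @store = {} # Ruby hashes preserve insertion order, which we use as LRU order
  end

  def fetch(title_id, sort_opt)
    key = "#{title_id}:#{sort_opt}"
    if @store.key?(key)
      # Cache hit: move the key to the most-recently-used position.
      value = @store.delete(key)
      @store[key] = value
      return value
    end
    value = yield # run the expensive sort only on a miss
    @store[key] = value
    @store.delete(@store.keys.first) while @store.size > @capacity
    value
  end
end

cache = SortedEntriesCache.new(2)
calls = 0
3.times { cache.fetch("t1", "name_asc") { calls += 1; ["a", "b"] } }
# calls == 1: the sort ran once; the later accesses hit the cache
```

This matches the observation in the PR that the same sorted result is requested three times per title-page load, so even a tiny cache removes two of the three sorts.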

Effectiveness

Test environment

I loaded a title page with 222 entries (no nested titles, about 11GB) from a SATA3 SSD. Sorry, I only tested each configuration once.

As is

first access: took 1140ms

second access: took 1119ms (almost unchanged)

Both accesses took a long time, nearly the same.

With only caching entries' cover URLs

first access: took 700ms (x1.6 faster)

second access: took 735ms

With full implementation except the sorted entries cache

first access: took 233ms (x4.9 faster)

second access: took 238ms

With full implementation

first access: took 115ms (x10 faster)

second access: took 50ms (even faster)


This partly resolves #186.

@hkalexling
Member

Thanks for the PR! I was trying to review this but got confused about two points. I am probably missing something so it would be great if you could explain it a bit more.

  • Why did you calculate the cache item instance size as @value.size * (instance_sizeof(String) + sizeof(String)) + @value.sum(&.size) + instance_sizeof(SortedEntriesCacheEntry)? I don't understand why you are adding instance_sizeof(String) and sizeof(String) together, and why you do @value.sum(&.size).
  • Why do we need cached_xx and cached_xx_previous? And when clearing the cache why do we do it in two steps (clear and clean)?

@Leeingnyo
Member Author

Leeingnyo commented Sep 1, 2021

instance_sizeof(SortedEntriesCacheEntry): sizeof(Time) * 2 + sizeof(String) + sizeof(Array(String)) + sizeof(object id)

Since String and Array are Reference types, they are essentially pointers, and their sizeof returns the pointer size (4 or 8 bytes). I tried to calculate their actual data size. @value, whose type is Array(String), holds @value.size strings, so I added @value.size * sizeof(String). Since String is itself a reference type, I also added instance_sizeof(String) per element. Each String holds a pointer to the memory allocated for its character data, so I added @value.sum(&.size) for that allocated memory.

It's not perfectly accurate (I forgot to add the instance_sizeof and data of @key, and there may be other overhead), but I think it is close to the actual memory consumption. Let me know if Crystal already has a fancy method for this.
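The arithmetic behind the estimate can be reproduced numerically. Below is a Ruby sketch with assumed 64-bit constants; the values standing in for Crystal's sizeof(String), instance_sizeof(String), and instance_sizeof(SortedEntriesCacheEntry) are illustrative placeholders, not measured numbers:

```ruby
# Approximate the memory held by an Array(String) cache value,
# mirroring the expression from the PR. All struct sizes below
# are assumed 64-bit placeholder values, not measured ones.
POINTER_SIZE       = 8  # stands in for sizeof(String): a Reference is a pointer
STRING_HEADER_SIZE = 16 # stands in for instance_sizeof(String)
ENTRY_OVERHEAD     = 48 # stands in for instance_sizeof(SortedEntriesCacheEntry)

def estimated_size(ids)
  ids.size * (STRING_HEADER_SIZE + POINTER_SIZE) + # per-element header + pointer
    ids.sum(&:bytesize) +                          # the character data itself
    ENTRY_OVERHEAD                                 # the cache entry object
end

estimated_size(["abc", "de"]) # => 2 * 24 + 5 + 48 = 101
```

Note the use of bytesize rather than size here, anticipating the String#size vs. String#bytesize point raised later in the thread.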


After scanning, entries and titles may have been added or removed. We should invalidate cached_xx for removed entries and titles while preserving the cache for the remaining ones (that is my intent).
At the clear stage, cached_xx is backed up to cached_xx_previous and cached_xx is reinitialized.
During the scanning process, titles and entries move (i.e. preserve) their cached data from cached_xx_previous to cached_xx.
After scanning, the clean stage reinitializes cached_xx_previous so it can be GCed.
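The clear → carry over → clean lifecycle described above can be sketched like this (a Ruby approximation of the Crystal design; the method name carry_over is a hypothetical stand-in for the "move" step):

```ruby
# Two-stage invalidation sketch: `clear` parks the old cache,
# items seen during the scan carry their data over, and `clean`
# drops whatever was not carried over so it can be GCed.
class ScanCache
  attr_reader :cached

  def initialize
    @cached = {}
    @previous = {}
  end

  def clear # before scanning: back up and reset
    @previous = @cached
    @cached = {}
  end

  def carry_over(key) # during scanning: preserve data for items still present
    @cached[key] = @previous[key] if @previous.key?(key)
  end

  def clean # after scanning: drop stale data
    @previous = {}
  end
end

cache = ScanCache.new
cache.cached["kept"] = 1
cache.cached["removed"] = 2
cache.clear
cache.carry_over("kept") # only "kept" is seen during the scan
cache.clean
cache.cached # => {"kept" => 1}
```

The payoff of the two stages is that items deleted from disk are never explicitly invalidated; they simply fail to be carried over and are discarded in one sweep at clean time.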

I think my naming sense is not good 😢

@hkalexling
Member

hkalexling commented Sep 2, 2021

Ah now it makes more sense. Thanks for the clarification!

String#size returns the number of Unicode characters in the string, so the value might not equal the actual byte size, but I guess we can assume that we will only be storing ASCII characters.
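The distinction is easy to demonstrate (Ruby here, but Crystal's String#size and String#bytesize behave the same way for this example):

```ruby
# A string containing a non-ASCII character: "é" counts as one
# character but occupies two bytes in UTF-8.
s = "héllo"
s.size     # => 5 (characters)
s.bytesize # => 6 (bytes)
```

For pure-ASCII strings the two numbers coincide, which is why the approximation is harmless for typical file-name-based ids.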

I think the SortedEntriesCache class can easily be generalized into an in-memory key-value DB to cache other metadata as well. In fact, we can build the same two-stage clearing method you mentioned into it and store the title info data there too. That way a single size limit applies to all types of caches. What do you think?

I will test the current implementation and try to improve the naming a bit. I can then start exploring the idea above.

@Leeingnyo
Member Author

I just noticed there is String#bytesize to get the real byte size!

I think the data cached by InfoCache (i.e. display_name, the progress sum of deep titles, cover_url) is small enough to stay in memory. It is just three or four additional variables on each title and entry, and I would actually prefer it not be invalidated. But yes, in other words, it is small enough to live in an LRU cache system.

I agree that a generalized cache entry would be good. Let me try it. It takes time to learn Crystal 😄 I'll ask you for help when I run into problems.
By the way, I checked the total size of the info.json files in my huge library (12000 entries, 500GB, 1 user): only about 1MB as text files (the info.json of a title with 1000 entries is about 80KB). It seems fine to cache all the info.json files in memory. How about using the generalized LRU cache in TitleInfo#new?

@hkalexling
Member

hkalexling commented Sep 2, 2021

Haha, I didn't know about String#bytesize either, but it's definitely the better choice.

The only problem I have with the current implementation is that we have two cache systems, InfoCache and the LRU cache, while the LRU cache can easily be extended to replace InfoCache. I think a single unified caching class will make future maintenance and development easier.

Yes, I agree that we can cache the metadata in the generalized LRU cache at TitleInfo#new. Since the generalized LRU cache will be a key-value store, we can use keys like [id]-coverurl and [id]-progress to store and query the data. For SortOption we can serialize it to JSON and cache that.
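The namespaced-key scheme proposed here can be sketched with a plain hash standing in for the LRU store (a Ruby illustration; the id value and URL are made up, and a real implementation would sit behind the LRU class rather than a bare hash):

```ruby
# Sketch of the proposed unified key-value cache: metadata of
# different kinds shares one store via namespaced keys such as
# "<id>-coverurl", and structured values (SortOption) are
# serialized to JSON before being stored.
require "json"

store = {} # stand-in for the generalized LRU cache
id = "a1b2c3" # hypothetical title/entry id

store["#{id}-coverurl"] = "/api/cover/a1b2c3"
store["#{id}-progress"] = "0.42"
store["#{id}-sortopt"]  = JSON.generate({"method" => "name", "ascend" => true})

# Reading a SortOption back means deserializing the cached JSON.
opt = JSON.parse(store["#{id}-sortopt"])
opt["method"] # => "name"
```

One store with one size limit then covers every metadata kind, which is exactly the maintenance win discussed above.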

Let me try it. It takes time to learn Crystal 😄 I'll ask you for help when I run into problems.

Sure looking forward to it! As always, thank you for the effort you put into this :D

@Leeingnyo
Member Author

Leeingnyo commented Sep 5, 2021

I fixed my implementation.

Implementation

LRUCache

This is a general LRU cache, i.e. a key-value store. Each value is wrapped in a CacheEntry(SaveT, ReturnT) to track its access time.

In CacheEntry(SaveT, ReturnT), SaveT is the type stored in the cache and ReturnT is the type returned by get. E.g. the SaveT of the cache entry for sorted_entries is Array(String), while its ReturnT is Array(Entry). If a cache entry's SaveT and ReturnT differ from each other, extend CacheEntry(SaveT, ReturnT) and override self.to_save_t and self.to_return_t. There are aliases CacheableType and CacheEntryType; when you want more cacheable types, add them to these aliases.

The generate_cache_entry method helps to create a cache entry easily.

The return value of get should be checked with is_a?.
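The SaveT/ReturnT split can be sketched as follows. This is a Ruby approximation of the Crystal design without generics; Entry, ALL_ENTRIES, and the lookup-by-id are hypothetical stand-ins for whatever the real code uses to resolve ids back to entries:

```ruby
# Sketch of CacheEntry's SaveT/ReturnT idea: values are stored
# compactly (an array of ids, the "SaveT") and converted back
# into rich objects (the "ReturnT") when `get` is called.
Entry = Struct.new(:id, :name)

ALL_ENTRIES = { # hypothetical global id -> Entry lookup
  "e1" => Entry.new("e1", "ch. 1"),
  "e2" => Entry.new("e2", "ch. 2"),
}

class SortedEntriesCacheEntry
  def initialize(entries)
    @value = self.class.to_save_t(entries) # store Array of ids
    @atime = Time.now                      # track access time for LRU
  end

  def self.to_save_t(entries) # ReturnT -> SaveT
    entries.map(&:id)
  end

  def self.to_return_t(ids)   # SaveT -> ReturnT
    ids.map { |id| ALL_ENTRIES[id] }
  end

  def get
    @atime = Time.now
    self.class.to_return_t(@value) # hand back full Entry objects
  end
end

entry = SortedEntriesCacheEntry.new(ALL_ENTRIES.values)
entry.get.map(&:name) # => ["ch. 1", "ch. 2"]
```

Storing ids instead of full objects keeps the cached footprint small and lets the size-estimation logic reason only about strings.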

I've cached

  • sorted_entries
  • sort_options
  • progress sum of titles
  • cover_url of entries and titles
  • whole TitleInfo

Fixmes

  • Tuple#instance_size is not implemented well.
  • auto-generatable CacheableType and CacheEntryType (?)

@Leeingnyo Leeingnyo closed this Sep 5, 2021
@Leeingnyo Leeingnyo reopened this Sep 5, 2021
@hkalexling
Member

Thanks! The code looks great overall, and I just have one question: why do we need to cache the sort options and cover URLs separately when we already have the TitleInfo in the cache? You can easily get the JSON from the cache and then retrieve the information you need. This would be a little slower, but I think the difference would be marginal. Am I missing something?

auto-generatable CacheableType and CacheEntryType (?)

Perhaps we can somehow achieve this with the Crystal inherited macro. I will look into this.

@Leeingnyo
Member Author

I agree that caching the sort options and cover URLs separately in the LRU cache is verbose, so I removed them and tested it. Restoring the cached info.json string into a JSON object took about 2ms for a title with about 220 entries, and loading the title page took about 10ms more. I think this is fair enough.

I'm going to cache the cover_url in the title instance like cached_display_name, because getting the cover URLs of nested titles retrieves the info.json of those titles. A member variable cached_cover_url is much cheaper.

@Leeingnyo Leeingnyo force-pushed the feature/enhance-loading-library branch from db5bc33 to 9807db6 Compare September 5, 2021 17:32
@hkalexling hkalexling merged commit da8a485 into getmango:dev Sep 6, 2021
@hkalexling
Member

Again thanks a lot for the PR! I took the liberty and made some small changes.

@Leeingnyo
Member Author

Great changes! Thanks for merging.

@Leeingnyo Leeingnyo deleted the feature/enhance-loading-library branch September 6, 2021 08:57