
Improve loading pages (library, titles) #222

Merged
merged 26 commits into from
Sep 6, 2021

Conversation

Leeingnyo
Member

Implementations

To improve page loading, I cached lots of data. The principles are:

  • Avoid accessing info.json files
  • Reuse the results of methods that are called many times with the same arguments

Misc.

cache entry_cover_url

Like entry_display_name, caching entry_cover_url improves the rendering of src/views/components/card.html.ecr. This is very effective.

cache display_name of title

This avoids accessing info.json.

change where sort opt are saved

Sort options only need to be saved when the library or a book is accessed with explicit parameters, so I removed the unnecessary writes. This also avoids accessing info.json.

Caching Library

This utility, implemented in cache.cr, allows cached data to be preserved across scans.
It caches:

  • cover_url of each entry
  • results of deep_read_page_count of each title
  • sort options from info.json files

I think the second one is quite effective.

Sorted Entries Cache

There are lots of calls to sorted_entries, which takes long (222 entries took 80ms on my machine). Besides, it is called three times with the same arguments when accessing a title page. Caching the results is effective since they do not change until the entries are modified. Currently, this option is not enabled by default; you should set the flag to true to enable it. Note that the cache size option does not guarantee that the machine consumes memory precisely.
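The memoization described above can be sketched as follows. This is a minimal Ruby approximation of the idea (the PR itself is in Crystal); the class name echoes the PR's SortedEntriesCacheEntry, but the key scheme and LRU bookkeeping here are illustrative, not the actual implementation:

```ruby
# Minimal LRU memoization sketch for sorted_entries.
# Keys combine a title id and a sort option; values are sorted
# arrays of entry ids. The least-recently-used key is evicted
# once the cache grows past `capacity`.
class SortedEntriesCache
  def initialize(capacity)
    @capacity = capacity
    @store = {} # Ruby hashes preserve insertion order, which we use as LRU order
  end

  def fetch(title_id, sort_opt)
    key = "#{title_id}:#{sort_opt}"
    if @store.key?(key)
      # Cache hit: move the key to the most-recently-used position.
      value = @store.delete(key)
      @store[key] = value
      return value
    end
    value = yield # run the expensive sort only on a miss
    @store[key] = value
    @store.delete(@store.keys.first) while @store.size > @capacity
    value
  end
end

cache = SortedEntriesCache.new(2)
calls = 0
3.times { cache.fetch("t1", "name_asc") { calls += 1; ["a", "b"] } }
# calls == 1: the sort ran once; the later accesses hit the cache
```

This matches the observation in the PR that the same sorted result is requested three times per title-page load, so even a tiny cache removes two of the three sorts.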

Effectiveness

Test environment

I loaded a title page with 222 entries (no nested titles, about 11GB) from a SATA3 SSD. Sorry, I only tested each configuration once.

As is

first access: took 1140ms

second access: took 1119ms (almost unchanged)

Both accesses took a long time, nearly the same.

With only caching entries' cover URLs

first access: took 700ms (x1.6 faster)

second access: took 735ms

With full implementation except the sorted entries cache

first access: took 233ms (x4.9 faster)

second access: took 238ms

With full implementation

first access: took 115ms (x10 faster)

second access: took 50ms (even faster)


This partly resolves #186.

@hkalexling
Member

Thanks for the PR! I was trying to review this but got confused about two points. I am probably missing something so it would be great if you could explain it a bit more.

  • Why did you calculate the cache item instance size as @value.size * (instance_sizeof(String) + sizeof(String)) + @value.sum(&.size) + instance_sizeof(SortedEntriesCacheEntry)? I don't understand why you are adding instance_sizeof(String) and sizeof(String) together, and why you do @value.sum(&.size).
  • Why do we need cached_xx and cached_xx_previous? And when clearing the cache why do we do it in two steps (clear and clean)?

@Leeingnyo
Member Author

Leeingnyo commented Sep 1, 2021

instance_sizeof(SortedEntriesCacheEntry): sizeof(Time) * 2 + sizeof(String) + sizeof(Array(String)) + sizeof(object id)

Since String and Array are Reference types, they are essentially pointers, and their sizeof returns the pointer size (4 or 8 bytes). I tried to calculate their actual data size. @value, whose type is Array(String), holds @value.size strings, so I added @value.size * sizeof(String). Since String is itself a reference type, I also added instance_sizeof(String) per element. Each String holds a pointer to the memory allocated for its character data, so I added @value.sum(&.size) for that allocated memory.

It's not perfectly accurate (I forgot to add the instance_sizeof and data of @key, and there may be other overhead), but I think it is close to the actual memory consumption. Let me know if Crystal already has a fancy method for this.
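The arithmetic behind the estimate can be reproduced numerically. Below is a Ruby sketch with assumed 64-bit constants; the values standing in for Crystal's sizeof(String), instance_sizeof(String), and instance_sizeof(SortedEntriesCacheEntry) are illustrative placeholders, not measured numbers:

```ruby
# Approximate the memory held by an Array(String) cache value,
# mirroring the expression from the PR. All struct sizes below
# are assumed 64-bit placeholder values, not measured ones.
POINTER_SIZE       = 8  # stands in for sizeof(String): a Reference is a pointer
STRING_HEADER_SIZE = 16 # stands in for instance_sizeof(String)
ENTRY_OVERHEAD     = 48 # stands in for instance_sizeof(SortedEntriesCacheEntry)

def estimated_size(ids)
  ids.size * (STRING_HEADER_SIZE + POINTER_SIZE) + # per-element header + pointer
    ids.sum(&:bytesize) +                          # the character data itself
    ENTRY_OVERHEAD                                 # the cache entry object
end

estimated_size(["abc", "de"]) # => 2 * 24 + 5 + 48 = 101
```

Note the use of bytesize rather than size here, anticipating the String#size vs. String#bytesize point raised later in the thread.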


After scanning, entries and titles may have been added or removed. We should invalidate cached_xx for removed entries and titles while preserving the cache for the remaining ones (that is my intent).
At the clear stage, cached_xx is backed up to cached_xx_previous and cached_xx is reinitialized.
During the scanning process, titles and entries move (i.e. preserve) their cached data from cached_xx_previous to cached_xx.
After scanning, the clean stage reinitializes cached_xx_previous so it can be GCed.
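The clear → carry over → clean lifecycle described above can be sketched like this (a Ruby approximation of the Crystal design; the method name carry_over is a hypothetical stand-in for the "move" step):

```ruby
# Two-stage invalidation sketch: `clear` parks the old cache,
# items seen during the scan carry their data over, and `clean`
# drops whatever was not carried over so it can be GCed.
class ScanCache
  attr_reader :cached

  def initialize
    @cached = {}
    @previous = {}
  end

  def clear # before scanning: back up and reset
    @previous = @cached
    @cached = {}
  end

  def carry_over(key) # during scanning: preserve data for items still present
    @cached[key] = @previous[key] if @previous.key?(key)
  end

  def clean # after scanning: drop stale data
    @previous = {}
  end
end

cache = ScanCache.new
cache.cached["kept"] = 1
cache.cached["removed"] = 2
cache.clear
cache.carry_over("kept") # only "kept" is seen during the scan
cache.clean
cache.cached # => {"kept" => 1}
```

The payoff of the two stages is that items deleted from disk are never explicitly invalidated; they simply fail to be carried over and are discarded in one sweep at clean time.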

I think my naming sense is not good 😢

@hkalexling
Member

hkalexling commented Sep 2, 2021

Ah now it makes more sense. Thanks for the clarification!

String#size returns the number of Unicode characters in the string, so the value might not equal the actual byte size, but I guess we can assume that we will only be storing ASCII characters.
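The distinction is easy to demonstrate (Ruby here, but Crystal's String#size and String#bytesize behave the same way for this example):

```ruby
# A string containing a non-ASCII character: "é" counts as one
# character but occupies two bytes in UTF-8.
s = "héllo"
s.size     # => 5 (characters)
s.bytesize # => 6 (bytes)
```

For pure-ASCII strings the two numbers coincide, which is why the approximation is harmless for typical file-name-based ids.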

I think the SortedEntriesCache class can easily be generalized into an in-memory key-value DB to cache other metadata as well. In fact, we can build the same two-stage clearing method you mentioned into it and store the title info data there too. That way a single size limit applies to all types of caches. What do you think?

I will test the current implementation and try to improve the naming a bit. I can then start exploring the idea above.

@Leeingnyo
Member Author

I just noticed there is String#bytesize to get the real byte size!

I think the data cached by InfoCache (i.e. display_name, the progress sum of deep titles, cover_url) is small enough to stay in memory. It is just three or four additional variables on each title and entry, and I would actually prefer it not be invalidated. But yes, in other words, it is small enough to live in an LRU cache system.

I agree that a generalized cache entry would be good. Let me try it. It takes time to learn Crystal 😄 I'll ask you for help when I run into problems.
By the way, I checked the total size of the info.json files in my huge library (12000 entries, 500GB, 1 user): only about 1MB as text files (the info.json of a title with 1000 entries is about 80KB). It seems fine to cache all the info.json files in memory. How about using the generalized LRU cache in TitleInfo#new?

@hkalexling
Member

hkalexling commented Sep 2, 2021

Haha, I didn't know about String#bytesize either, but it's definitely the better choice.

The only problem I have with the current implementation is that we have two cache systems, InfoCache and the LRU cache, while the LRU cache can easily be extended to replace InfoCache. I think a single unified caching class will make future maintenance and development easier.

Yes, I agree that we can cache the metadata in the generalized LRU cache at TitleInfo#new. Since the generalized LRU cache will be a key-value store, we can use keys like [id]-coverurl and [id]-progress to store and query the data. For SortOption we can serialize it to JSON and cache that.
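The namespaced-key scheme proposed here can be sketched with a plain hash standing in for the LRU store (a Ruby illustration; the id value and URL are made up, and a real implementation would sit behind the LRU class rather than a bare hash):

```ruby
# Sketch of the proposed unified key-value cache: metadata of
# different kinds shares one store via namespaced keys such as
# "<id>-coverurl", and structured values (SortOption) are
# serialized to JSON before being stored.
require "json"

store = {} # stand-in for the generalized LRU cache
id = "a1b2c3" # hypothetical title/entry id

store["#{id}-coverurl"] = "/api/cover/a1b2c3"
store["#{id}-progress"] = "0.42"
store["#{id}-sortopt"]  = JSON.generate({"method" => "name", "ascend" => true})

# Reading a SortOption back means deserializing the cached JSON.
opt = JSON.parse(store["#{id}-sortopt"])
opt["method"] # => "name"
```

One store with one size limit then covers every metadata kind, which is exactly the maintenance win discussed above.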

Let me try it. It takes time to learn Crystal 😄 I'll ask you for help when I run into problems.

Sure looking forward to it! As always, thank you for the effort you put into this :D

@Leeingnyo
Member Author

Leeingnyo commented Sep 5, 2021

I fixed my implementation.

Implementation

LRUCache

This is a general LRU cache, i.e. a key-value store. Each value is wrapped in a CacheEntry(SaveT, ReturnT) to track its access time.

In CacheEntry(SaveT, ReturnT), SaveT is the type stored in the cache and ReturnT is the type returned by get. E.g. the SaveT of the cache entry for sorted_entries is Array(String), while its ReturnT is Array(Entry). If a cache entry's SaveT and ReturnT differ from each other, extend CacheEntry(SaveT, ReturnT) and override self.to_save_t and self.to_return_t. There are aliases CacheableType and CacheEntryType; when you want more cacheable types, add them to these aliases.

The generate_cache_entry method helps to create a cache entry easily.

The return value of get should be checked with is_a?.
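The SaveT/ReturnT split can be sketched as follows. This is a Ruby approximation of the Crystal design without generics; Entry, ALL_ENTRIES, and the lookup-by-id are hypothetical stand-ins for whatever the real code uses to resolve ids back to entries:

```ruby
# Sketch of CacheEntry's SaveT/ReturnT idea: values are stored
# compactly (an array of ids, the "SaveT") and converted back
# into rich objects (the "ReturnT") when `get` is called.
Entry = Struct.new(:id, :name)

ALL_ENTRIES = { # hypothetical global id -> Entry lookup
  "e1" => Entry.new("e1", "ch. 1"),
  "e2" => Entry.new("e2", "ch. 2"),
}

class SortedEntriesCacheEntry
  def initialize(entries)
    @value = self.class.to_save_t(entries) # store Array of ids
    @atime = Time.now                      # track access time for LRU
  end

  def self.to_save_t(entries) # ReturnT -> SaveT
    entries.map(&:id)
  end

  def self.to_return_t(ids)   # SaveT -> ReturnT
    ids.map { |id| ALL_ENTRIES[id] }
  end

  def get
    @atime = Time.now
    self.class.to_return_t(@value) # hand back full Entry objects
  end
end

entry = SortedEntriesCacheEntry.new(ALL_ENTRIES.values)
entry.get.map(&:name) # => ["ch. 1", "ch. 2"]
```

Storing ids instead of full objects keeps the cached footprint small and lets the size-estimation logic reason only about strings.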

I've cached

  • sorted_entries
  • sort_options
  • progress sum of titles
  • cover_url of entries and titles
  • whole TitleInfo

Fixmes

  • Tuple#instance_size is not implemented well.
  • auto-generatable CacheableType and CacheEntryType (?)

@Leeingnyo Leeingnyo closed this Sep 5, 2021
@Leeingnyo Leeingnyo reopened this Sep 5, 2021
@hkalexling
Member

Thanks! The code looks great overall, and I just have one question: why do we need to cache the sort options and cover URLs separately when we already have the TitleInfo in the cache? You can easily get the JSON from the cache and then retrieve the information you need. This would be a little slower, but I think the difference would be marginal. Am I missing something?

auto-generatable CacheableType and CacheEntryType (?)

Perhaps we can somehow achieve this with the Crystal inherited macro. I will look into this.

@Leeingnyo
Member Author

I agree that caching the sort options and cover URLs separately in the LRU cache is verbose, so I removed them and tested it. Restoring the cached info.json string into a JSON object took about 2ms for a title with about 220 entries, and loading the title page took about 10ms more. I think this is fair enough.

I'm going to cache the cover_url in the title instance like cached_display_name, because getting the cover URLs of nested titles retrieves the info.json of those titles. A member variable cached_cover_url is much cheaper.

@Leeingnyo Leeingnyo force-pushed the feature/enhance-loading-library branch from db5bc33 to 9807db6 Compare September 5, 2021 17:32
@hkalexling hkalexling merged commit da8a485 into getmango:dev Sep 6, 2021
@hkalexling
Member

Again thanks a lot for the PR! I took the liberty and made some small changes.

@Leeingnyo
Member Author

Great changes! Thanks for merging.

@Leeingnyo Leeingnyo deleted the feature/enhance-loading-library branch September 6, 2021 08:57