Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch to pre-fetching model for thread-safety #382

Merged
merged 9 commits into from
May 24, 2023
Merged

Conversation

omus
Copy link
Member

@omus omus commented Jun 20, 2022

While working on #359 I took a look over our caching logic for compiled tzdata. We are currently using a cache per-thread to avoid situations where read/writes for the same entry would occur concurrently. This approach ended up being the fastest as it avoided using any locks but does introduce some memory bloat as we have a cache per-thread.

As the compiled tzdata isn't all that large it seems more sensible to load the entirety of the contents into memory upon package initialization and have all threads just read from a single cache. The situation where we would want to write to this cache is rather rare so this seems like a simpler approach. The one issue I can see is that the only other time we mutate the cache is when TimeZones.build is called. As this function really only used to: solve build failures or experiment with other tzdata versions it seems reasonable to mark this function as thread-unsafe and call it a day.

@omus
Copy link
Member Author

omus commented Jun 20, 2022

@NHDaly let me know what you think of this alternate approach.

@codecov-commenter
Copy link

codecov-commenter commented Jun 30, 2022

Codecov Report

Merging #382 (463304d) into master (804864c) will decrease coverage by 0.02%.
The diff coverage is 100.00%.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

@@            Coverage Diff             @@
##           master     #382      +/-   ##
==========================================
- Coverage   95.58%   95.57%   -0.02%     
==========================================
  Files          38       38              
  Lines        1766     1762       -4     
==========================================
- Hits         1688     1684       -4     
  Misses         78       78              
Impacted Files Coverage Δ
src/TimeZones.jl 100.00% <100.00%> (ø)
src/build.jl 100.00% <100.00%> (ø)
src/types/timezone.jl 93.93% <100.00%> (-3.21%) ⬇️
src/tzdata/compile.jl 96.05% <100.00%> (+0.30%) ⬆️

@omus
Copy link
Member Author

omus commented Dec 17, 2022

Single nightly failure is:

 [3956] signal (11.1): Segmentation fault: 11
in expression starting at /Users/runner/work/TimeZones.jl/TimeZones.jl/test/thread-safety.jl:1

Time to remove these tests

@omus
Copy link
Member Author

omus commented Dec 21, 2022

@NHDaly let me know if you have any feedback

@NHDaly
Copy link
Contributor

NHDaly commented Dec 28, 2022

Yeah from the description that sounds excellent!

@nickrobinson251 told me that we can also construct new timezones if the user constructs a custom timezone or something? is that something we need to consider?

Otherwise, if not, i FULLY agree that preloading all of these makes sense!


One other possibility to add is that maybe you can make this an optional prefetch() / init() method that users can call, and they can optionally provide which timezones to load, if some users don't want to load all of them?

But that seems like overkill and i think i agree this sounds great to me! I haven't looked at the code yet, but the description sounds good

if isfile(tz_path)
open(TZJFile.read, tz_path, "r")(str)
elseif occursin(FIXED_TIME_ZONE_REGEX, str)
tz, class = get(_TZ_CACHE, str) do
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused: why is there still a get() do here?

Shouldn't all of the timezones be pre-fetched now? Or do we only cache the variable time zones, not the FixedTimeZones?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use get here as we previously supported TimeZone("UTC+4") which isn't a time zone which would be automatically added to the cache. This allows this behaviour to continue but with the caveat that the construction of this new FixedTimeZone would have previously have been added to the cache but no longer. I very much suspect this is a non-issue as the caching would only be beneficial when calling TimeZone("UTC+4") multiple times.

@omus
Copy link
Member Author

omus commented Jan 4, 2023

@nickrobinson251 told me that we can also construct new timezones if the user constructs a custom timezone or something? is that something we need to consider?

It is possible that users can create custom FixedTimeZone or VariableTimeZone. That is still supported by TimeZones.jl but these time zones will no longer be automatically cached. More accurately, in older versions of TimeZones.jl if you were to specify say TimeZone("UTC+4") a FixedTimeZone would be created which would be entered into the cache. However, any instances of FixedTimeZone and VariableTimeZone created directly through their constructors wouldn't be added to the cache. Most likely these instances would be referenced via a constant and caching here is only useful for avoiding the deserialization cost which wouldn't exist for these time zones anyway.

One other possibility to add is that maybe you can make this an optional prefetch() / init() method that users can call, and they can optionally provide which timezones to load, if some users don't want to load all of them?

That's a reasonable idea if people really want to minimize the memory footprint or initialization time of the package. Since we're talking about KB here I think we can always add this functionality in later.

@NHDaly
Copy link
Contributor

NHDaly commented Jan 5, 2023

thanks, that's a helpful clarification. This looks good then! :)

@omus
Copy link
Member Author

omus commented May 24, 2023

I've self-reviewed this code again and it seems like a better path forward for handling thread safety. I'm re-running the tests again here to verify things are working with the latest version of Julia and then will proceed with a new release.

@omus omus merged commit 03af16a into master May 24, 2023
@omus omus deleted the cv/prefetch-cache branch May 24, 2023 04:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants