Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize and clean up lock #15917

Merged
merged 2 commits into from
Apr 19, 2016
Merged

Optimize and clean up lock #15917

merged 2 commits into from
Apr 19, 2016

Conversation

yuyichao
Copy link
Contributor

This PR has two related goals.

  1. Minor clean up of the lock code and make it available on non-threading build.
    • Wrap the lock in a jl_mutex_t type.

      This also fix the last remaining non-C-like syntax introduced by the threading branch...

    • Make jl_mutex_* available on non-threading build too.

      This will be useful for synchronization with the signal thread. The JL_{UN,}LOCK* macros
      are still disabled for non-threading build although type checks are added to make sure they
      are used correctly. (Working towards a cheap solution of make ccall sigatomic (defer SIGINT handling) #2622)

      This add ~100 bytes of global variable to libjulia.so which is unlikely to cause any change in
      binary size.

      IIUC it is better if the locks doesn't share a cacheline so we can add padding or alignments to
      jl_mutex_t later.

  2. Optimize lock performance
    • Avoid calling uv_thread_self multiple times.

    • Use a different workaround for GCC bug in store_release and use it in unlock.

      This saves a lot of cpu cycles compare to sequential consistent store or compare and exchange.

Timing of LOCK_NOGC UNLOCK_NOGC pair before,

julia> @benchmark test2()
================ Benchmark Results ========================
     Time per evaluation: 24.80 ns [24.35 ns, 25.25 ns]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
        Memory allocated: 0.00 bytes
   Number of allocations: 0 allocations
       Number of samples: 5701
   Number of evaluations: 458401
         R² of OLS model: 0.951
 Time spent benchmarking: 3.54 s

After,

julia> @benchmark test2()
================ Benchmark Results ========================
     Time per evaluation: 12.07 ns [11.88 ns, 12.27 ns]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
        Memory allocated: 0.00 bytes
   Number of allocations: 0 allocations
       Number of samples: 6101
   Number of evaluations: 670901
         R² of OLS model: 0.959
 Time spent benchmarking: 3.67 s

* Avoid calling `uv_thread_self` multiple times.
* Use a different workaround for GCC bug in `store_release` and use it
  in unlock.
* Wrap the lock in a `jl_mutex_t` type.
* Make `jl_mutex_t` available on non-threading build too.

    This will be useful for synchronization witht the signal thread.
@yuyichao yuyichao added the multithreading Base.Threads and related functionality label Apr 18, 2016
@yuyichao
Copy link
Contributor Author

Threading CI passed on travis and appveyor. The normal appveyor has a weird DivideError in the numbers test. I've restarted it...

@yuyichao
Copy link
Contributor Author

yuyichao commented Apr 18, 2016

Hmm, I believe the numbers test failure is

julia> Rational(Int8(0), 3) / Complex(3, 2)
ERROR: DivideError: integer division error
 [inlined code] from ./rational.jl:19
 in +(::Rational{Int64}, ::Rational{Int64}) at ./rational.jl:179
 [inlined code] from ./complex.jl:4
 in inv(::Complex{Rational{Int64}}) at ./complex.jl:118
 in /(::Rational{Int64}, ::Complex{Int64}) at ./rational.jl:191
 in eval(::Module, ::Any) at ./boot.jl:237

Edit: reported as #15920

#else
return (unsigned long)pthread_self();
#endif
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is breaking the libuv abstraction a good idea? I have a PR open for upstream libuv for a patch I made to our fork to handle thread affinity in the library so that we don't use pthreads directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way I see it is that there's basically no way we can get rid of all the platform dependent code and using libuv for everything. For complex stuff like io and event loops libuv has a nice abstraction and we can use it but for simple or things that doesn't exist in libuv (especially threading and signal handling) I'm in favor of just implement it ourselves. Of course if we can get more features upstream in the way we want it's great but otherwise maintaining our own shouldn't be a big problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not all of pthreads is implemented in all UNIX-like systems, so unfortunately, it isn't just the Windows thread API and the pthreads API as in this case, but quite a mess of #ifdefs. I'm happy to have our own threading API wrappers if folks agree, but I do think they should be collected in one place rather than having a mess of conditionals in a number of different places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not all of pthreads is implemented in all UNIX-like systems, so unfortunately, it isn't just the Windows thread API and the pthreads API as in this case, but quite a mess of #ifdefs.

At least it is for this case. I'm not saying we should throw away pthread for everything. The lock API for example is still useful for a number of cases.

I do think they should be collected in one place rather than having a mess of conditionals in a number of different places.

They are not. Other than what's in threading.c before, most (if not all) of the newly added threading related code are in julia_thread.h as inline functions or macros and they are as "collected in one place" as I can possibly make without breaking cross reference etc.

@vtjnash vtjnash merged commit 4efe30d into master Apr 19, 2016
@vtjnash vtjnash deleted the yyc/threads/lock branch April 19, 2016 15:15
@@ -1432,7 +1432,7 @@ JL_DLLEXPORT void JL_NORETURN jl_rethrow(void);
JL_DLLEXPORT void JL_NORETURN jl_rethrow_other(jl_value_t *e);

#ifdef JULIA_ENABLE_THREADING
STATIC_INLINE void jl_lock_frame_push(void (*unlock_func)(void))
static inline void jl_lock_frame_push(jl_mutex_t *lock)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stick to STATIC_INLINE

@StefanKarpinski
Copy link
Member

I'm generally in favor of not putting ourselves in the position of waiting for upstream changes to get merged – since that has generally been a rather slow process. However, I'm also increasingly concerned about our divergence from standard libuv.

@tkelman
Copy link
Contributor

tkelman commented Apr 19, 2016

Upstream on our fork of libuv is us, really. Rebasing the fork to recent versions is broken on windows still, last time that was tried.

@StefanKarpinski
Copy link
Member

I'm aware – just making a general comment that we really need to make a concerted effort to upstream some of the stuff on our libuv branch at some point. Not the most urgent thing in the world.

@kpamnany
Copy link
Contributor

It's been about a year since I opened the PR for upstream libuv (libuv/libuv#280, now libuv/libuv#597), so it's clearly not going very quickly. The point of my comment was that since we are using libuv to abstract platform-dependent functionality, it'd be best to keep all the platform-dependence flags in one place, there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
multithreading Base.Threads and related functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants