Optimize and clean up lock #15917

yuyichao · 2016-04-18T04:13:00Z

This PR has two related goals.

Minor clean up of the lock code and make it available on non-threading build.
- Wrap the lock in a jl_mutex_t type.
  
  This also fix the last remaining non-C-like syntax introduced by the threading branch...
- Make jl_mutex_* available on non-threading build too.
  
  This will be useful for synchronization with the signal thread. The JL_{UN,}LOCK* macros
  are still disabled for non-threading build although type checks are added to make sure they
  are used correctly. (Working towards a cheap solution of make ccall sigatomic (defer SIGINT handling) #2622)
  
  This add ~100 bytes of global variable to libjulia.so which is unlikely to cause any change in
  binary size.
  
  IIUC it is better if the locks doesn't share a cacheline so we can add padding or alignments to
  jl_mutex_t later.
Optimize lock performance
- Avoid calling uv_thread_self multiple times.
- Use a different workaround for GCC bug in store_release and use it in unlock.
  
  This saves a lot of cpu cycles compare to sequential consistent store or compare and exchange.

Timing of LOCK_NOGC UNLOCK_NOGC pair before,

julia> @benchmark test2()
================ Benchmark Results ========================
     Time per evaluation: 24.80 ns [24.35 ns, 25.25 ns]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
        Memory allocated: 0.00 bytes
   Number of allocations: 0 allocations
       Number of samples: 5701
   Number of evaluations: 458401
         R² of OLS model: 0.951
 Time spent benchmarking: 3.54 s

After,

julia> @benchmark test2()
================ Benchmark Results ========================
     Time per evaluation: 12.07 ns [11.88 ns, 12.27 ns]
Proportion of time in GC: 0.00% [0.00%, 0.00%]
        Memory allocated: 0.00 bytes
   Number of allocations: 0 allocations
       Number of samples: 6101
   Number of evaluations: 670901
         R² of OLS model: 0.959
 Time spent benchmarking: 3.67 s

* Avoid calling `uv_thread_self` multiple times. * Use a different workaround for GCC bug in `store_release` and use it in unlock. * Wrap the lock in a `jl_mutex_t` type. * Make `jl_mutex_t` available on non-threading build too. This will be useful for synchronization witht the signal thread.

yuyichao · 2016-04-18T12:40:37Z

Threading CI passed on travis and appveyor. The normal appveyor has a weird DivideError in the numbers test. I've restarted it...

yuyichao · 2016-04-18T12:43:04Z

Hmm, I believe the numbers test failure is

julia> Rational(Int8(0), 3) / Complex(3, 2)
ERROR: DivideError: integer division error
 [inlined code] from ./rational.jl:19
 in +(::Rational{Int64}, ::Rational{Int64}) at ./rational.jl:179
 [inlined code] from ./complex.jl:4
 in inv(::Complex{Rational{Int64}}) at ./complex.jl:118
 in /(::Rational{Int64}, ::Complex{Int64}) at ./rational.jl:191
 in eval(::Module, ::Any) at ./boot.jl:237

Edit: reported as #15920

kpamnany · 2016-04-18T23:48:11Z

src/julia_threads.h

+#else
+    return (unsigned long)pthread_self();
+#endif
+}


Is breaking the libuv abstraction a good idea? I have a PR open for upstream libuv for a patch I made to our fork to handle thread affinity in the library so that we don't use pthreads directly.

The way I see it is that there's basically no way we can get rid of all the platform dependent code and using libuv for everything. For complex stuff like io and event loops libuv has a nice abstraction and we can use it but for simple or things that doesn't exist in libuv (especially threading and signal handling) I'm in favor of just implement it ourselves. Of course if we can get more features upstream in the way we want it's great but otherwise maintaining our own shouldn't be a big problem.

Not all of pthreads is implemented in all UNIX-like systems, so unfortunately, it isn't just the Windows thread API and the pthreads API as in this case, but quite a mess of #ifdefs. I'm happy to have our own threading API wrappers if folks agree, but I do think they should be collected in one place rather than having a mess of conditionals in a number of different places.

Not all of pthreads is implemented in all UNIX-like systems, so unfortunately, it isn't just the Windows thread API and the pthreads API as in this case, but quite a mess of #ifdefs.

At least it is for this case. I'm not saying we should throw away pthread for everything. The lock API for example is still useful for a number of cases.

I do think they should be collected in one place rather than having a mess of conditionals in a number of different places.

They are not. Other than what's in threading.c before, most (if not all) of the newly added threading related code are in julia_thread.h as inline functions or macros and they are as "collected in one place" as I can possibly make without breaking cross reference etc.

tkelman · 2016-04-19T20:16:29Z

src/julia.h

@@ -1432,7 +1432,7 @@ JL_DLLEXPORT void JL_NORETURN jl_rethrow(void);
 JL_DLLEXPORT void JL_NORETURN jl_rethrow_other(jl_value_t *e);

 #ifdef JULIA_ENABLE_THREADING
-STATIC_INLINE void jl_lock_frame_push(void (*unlock_func)(void))
+static inline void jl_lock_frame_push(jl_mutex_t *lock)


stick to STATIC_INLINE

StefanKarpinski · 2016-04-19T21:28:06Z

I'm generally in favor of not putting ourselves in the position of waiting for upstream changes to get merged – since that has generally been a rather slow process. However, I'm also increasingly concerned about our divergence from standard libuv.

tkelman · 2016-04-19T21:49:39Z

Upstream on our fork of libuv is us, really. Rebasing the fork to recent versions is broken on windows still, last time that was tried.

StefanKarpinski · 2016-04-19T21:53:48Z

I'm aware – just making a general comment that we really need to make a concerted effort to upstream some of the stuff on our libuv branch at some point. Not the most urgent thing in the world.

kpamnany · 2016-04-19T21:58:24Z

It's been about a year since I opened the PR for upstream libuv (libuv/libuv#280, now libuv/libuv#597), so it's clearly not going very quickly. The point of my comment was that since we are using libuv to abstract platform-dependent functionality, it'd be best to keep all the platform-dependence flags in one place, there.

yuyichao added 2 commits April 17, 2016 22:30

Replace uv_thread_self with a const function jl_thread_self.

57c7ee0

yuyichao added the multithreading Base.Threads and related functionality label Apr 18, 2016

yuyichao mentioned this pull request Apr 18, 2016

DivideError when dividing a zero rational by a complex number #15920

Closed

kpamnany reviewed Apr 18, 2016
View reviewed changes

vtjnash merged commit 4efe30d into master Apr 19, 2016

vtjnash deleted the yyc/threads/lock branch April 19, 2016 15:15

tkelman reviewed Apr 19, 2016
View reviewed changes

timholy mentioned this pull request Apr 20, 2016

thread: support for affinity and detach libuv/libuv#597

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize and clean up lock #15917

Optimize and clean up lock #15917

yuyichao commented Apr 18, 2016

yuyichao commented Apr 18, 2016

yuyichao commented Apr 18, 2016 •

edited

Loading

kpamnany Apr 18, 2016

yuyichao Apr 19, 2016

kpamnany Apr 19, 2016

yuyichao Apr 19, 2016

tkelman Apr 19, 2016

StefanKarpinski commented Apr 19, 2016

tkelman commented Apr 19, 2016

StefanKarpinski commented Apr 19, 2016

kpamnany commented Apr 19, 2016

Optimize and clean up lock #15917

Optimize and clean up lock #15917

Conversation

yuyichao commented Apr 18, 2016

yuyichao commented Apr 18, 2016

yuyichao commented Apr 18, 2016 • edited Loading

kpamnany Apr 18, 2016

Choose a reason for hiding this comment

yuyichao Apr 19, 2016

Choose a reason for hiding this comment

kpamnany Apr 19, 2016

Choose a reason for hiding this comment

yuyichao Apr 19, 2016

Choose a reason for hiding this comment

tkelman Apr 19, 2016

Choose a reason for hiding this comment

StefanKarpinski commented Apr 19, 2016

tkelman commented Apr 19, 2016

StefanKarpinski commented Apr 19, 2016

kpamnany commented Apr 19, 2016

yuyichao commented Apr 18, 2016 •

edited

Loading