Profile: allocate buffer for n instruction pointers per thread #41821

Merged

IanButterworth

Given that profiling samples all threads (except on Windows, where only the main thread is profiled), it seems to make sense to allocate `n` samples per thread (except on Windows).
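
As a rough illustration of the proposal above, the total buffer length would scale with the thread count everywhere except Windows. This is only a sketch: `total_buffer_len` and its keyword arguments are hypothetical names made up for this example, not part of the `Profile` stdlib.

```julia
# Hypothetical helper (not part of Profile): total number of instruction
# pointers the buffer would hold when `n` is interpreted per thread.
function total_buffer_len(n::Integer;
                          nthreads::Integer = Threads.nthreads(),
                          windows::Bool = Sys.iswindows())
    # Windows only profiles the main thread, so one thread's worth is enough.
    effective_threads = windows ? 1 : nthreads
    return n * effective_threads
end

len_posix   = total_buffer_len(1_000; nthreads = 4, windows = false)
len_windows = total_buffer_len(1_000; nthreads = 4, windows = true)
println((len_posix, len_windows))
```

With 4 threads the non-Windows buffer is four times the requested `n`, while the Windows buffer stays at `n`.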

@@ -40,7 +40,7 @@ end
init(; n::Integer, delay::Real))

Configure the `delay` between backtraces (measured in seconds), and the number `n` of
-instruction pointers that may be stored. Each instruction pointer corresponds to a single
+instruction pointers that may be stored per thread. Each instruction pointer corresponds to a single

Maybe add a !!! compat note?

+1 to a compat note

@vchuravy vchuravy added status:merge me PR is reviewed. Merge when all tests are passing profiler labels Aug 9, 2021

@IanButterworth IanButterworth merged commit c12e63f into JuliaLang:master Aug 11, 2021
@DilumAluthge DilumAluthge removed the status:merge me PR is reviewed. Merge when all tests are passing label Aug 12, 2021

vtjnash commented Aug 16, 2021

It looks like this broke the linux32 build, as seen in the pre-merge CI

@IanButterworth

Sorry about that.

The only way I can think of that this PR could cause the error below is if the `nthreads` variable inside `Profile.init()` is somehow interfering with `Threads.nthreads`, but that seems unlikely.

Error in testset cmdlineargs:
Error During Test at /buildworker/worker/tester_linux32/build/share/julia/test/cmdlineargs.jl:202
  Test threw exception
  Expression: string(cpu_threads) == read(`$exename --threads auto -e $code`, String) == read(`$exename --threads=auto -e $code`, String) == read(`$exename -tauto -e $code`, String) == read(`$exename -t auto -e $code`, String)
  failed process: Process(`/buildworker/worker/tester_linux32/build/bin/julia -Cnative -J/buildworker/worker/tester_linux32/build/lib/julia/sys.so --depwarn=error --check-bounds=yes -g1 --startup-file=no --startup-file=no --color=no --threads auto -e 'print(Threads.nthreads())'`, ProcessExited(1)) [1]
  
  Stacktrace:
   [1] pipeline_error
     @ ./process.jl:532 [inlined]
   [2] read(cmd::Cmd)
     @ Base ./process.jl:419
   [3] read(cmd::Cmd, #unused#::Type{String})
     @ Base ./process.jl:428
   [4] macro expansion
     @ /buildworker/worker/package_linux32/build/usr/share/julia/stdlib/v1.8/Test/src/Test.jl:445 [inlined]
   [5] top-level scope
     @ /buildworker/worker/tester_linux32/build/share/julia/test/cmdlineargs.jl:202

@IanButterworth IanButterworth deleted the ib/profile_samples_per_thread branch August 16, 2021 18:15

vtjnash commented Aug 16, 2021

It is attempting to initialize profile space for 128 threads, and is now failing with an out-of-memory (OOM) error.

@IanButterworth

128 real threads, or is that a bug also?

If that's real, then the options seem to be:

  1. Revert. It's not critical, more of a design tweak.
  2. Reduce the default buffer size (per thread). How much RAM does the linux32 128-thread machine have? Perhaps that can guide this.

I guess it can't be relied on that an n-thread machine has n times more RAM than a 1-thread machine, which this PR kind of assumes.
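
To make the memory concern concrete, here is a back-of-the-envelope sketch. The per-thread value `n_assumed` is an assumption chosen purely for illustration (the thread does not state the actual default), and `buffer_bytes` is a hypothetical helper. Arithmetic is done in `Int64` so that the example also behaves on a 32-bit build, where `Int` is 32 bits wide.

```julia
# Hypothetical estimate of the profile buffer size in bytes.
# `ptr_bytes` is the size of one instruction pointer: 4 on a 32-bit build.
buffer_bytes(n, nthreads; ptr_bytes = sizeof(UInt)) =
    Int64(n) * Int64(nthreads) * Int64(ptr_bytes)

n_assumed   = 10_000_000        # ASSUMED per-thread buffer length, illustration only
bytes_128   = buffer_bytes(n_assumed, 128; ptr_bytes = 4)
bytes_1     = buffer_bytes(n_assumed, 1;   ptr_bytes = 4)
addressable = Int64(2)^32       # upper bound on a 32-bit address space

println("1 thread:    ", bytes_1, " bytes")
println("128 threads: ", bytes_128, " bytes")
println("exceeds 32-bit address space: ", bytes_128 > addressable)
```

Under that assumed value, a single-thread buffer fits comfortably, while the 128-thread buffer could not be allocated in a 32-bit process at all, regardless of how much RAM the machine has.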

"""
 function init(; n::Union{Nothing,Integer} = nothing, delay::Union{Nothing,Real} = nothing)
     n_cur = ccall(:jl_profile_maxlen_data, Csize_t, ())
     delay_cur = ccall(:jl_profile_delay_nsec, UInt64, ())/10^9
     if n === nothing && delay === nothing
         return Int(n_cur), delay_cur
     end
-    nnew = (n === nothing) ? n_cur : n
+    nthreads = Sys.iswindows() ? 1 : Threads.nthreads() # windows only profiles the main thread
+    nnew = (n === nothing) ? n_cur : n * nthreads

@IanButterworth IanButterworth Aug 17, 2021

There's also a bug here. This method forwards to the method below, so `n` was being multiplied by `nthreads` twice. I'm about to open a PR.
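
The double scaling described above can be reproduced with a toy pair of functions. All names here (`inner_init`, `buggy_init`, `fixed_init`) are hypothetical stand-ins, not the real `Profile` internals; the sketch only assumes, as the comment says, that the outer method forwards to a lower method that also scales by the thread count.

```julia
# Toy model of the bug; these functions are hypothetical, not Profile's code.

# Lower-level method: scales the per-thread length by the thread count.
inner_init(n, nthreads) = n * nthreads

# Buggy outer method: scales first, then forwards, so the scaling happens twice.
buggy_init(n, nthreads) = inner_init(n * nthreads, nthreads)

# Fixed outer method: forwards `n` unchanged and lets the lower method scale once.
fixed_init(n, nthreads) = inner_init(n, nthreads)

buggy = buggy_init(1_000, 4)
fixed = fixed_init(1_000, 4)
println((buggy, fixed))
```

The buggy path grows with the square of the thread count, which would make the 128-thread case far worse than the per-thread design intended.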
