
Parallelise resampling #97

Closed
tkoskela opened this issue Jun 12, 2020 · 7 comments
@tkoskela (Member)
As we scale up, resampling is becoming the bottleneck. Here is an example of time spent scaling from 1 to 64 cores.

[image: time spent in each section of the code, scaling from 1 to 64 cores]

Calculating which particles get resampled (the indices into the particle array) is not the problem; the expensive part is copying the resampled particle states back into the particle array. Here is an example from #95:

```julia
julia> using TDAC, BenchmarkTools

julia> state = rand(200,200,3,1000);

julia> state_buffer = Array{Float64,4}(undef, 200,200,3,1000);

julia> weight = rand(Float64,1000);

julia> nweight = weight/sum(weight);

julia> indices = Vector{Int}(undef,1000);

julia> @btime TDAC.resample!(indices,nweight);
  6.170 μs (1 allocation: 7.94 KiB)

julia> @btime TDAC.copy_resampled_state!(state,state_buffer,indices);
  368.403 ms (0 allocations: 0 bytes)
```
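For scale, the state array above holds 200×200×3×1000 Float64 values, roughly 960 MB, so the copy necessarily streams on the order of gigabytes through memory even with zero allocations. A minimal sketch of a copy of this shape (hypothetical; the real `TDAC.copy_resampled_state!` may be implemented differently):

```julia
# Hypothetical sketch of a resampled-state copy with the same shape as the
# benchmark above; the real TDAC.copy_resampled_state! may differ.
function copy_resampled_state_sketch!(state, buffer, indices)
    # Gather the surviving particles into the buffer...
    for (i, idx) in enumerate(indices)
        @views buffer[:, :, :, i] .= state[:, :, :, idx]
    end
    # ...then write them all back: every element of the array is touched.
    state .= buffer
    return state
end
```

Both passes touch every element of the array, which is why this step dominates the serial timing regardless of how cheaply the indices were computed.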
@tkoskela (Member, Author) commented Jun 12, 2020

Thoughts and possible solutions

  • Because the velocity variables of the state are not needed at the beginning of the next time step (see New Initial State of Particles #67 for details), it is enough to copy the height variable after resampling. This speeds up the copy by 3x

  • Since we don't need the velocity variables in the resampling, we can omit them from the MPI gather and scatter operations.

  • To facilitate the above, it would probably make sense to store the height and velocities in separate variables, similar to what I suggested in Represent states with structs #58. This would not parallelise the process, just speed up the communication and the serial part, and postpone the problem.

  • To fully parallelise, we would have to get rid of the MPI gather and scatter calls completely. This would also solve the looming issue of running out of memory on the master rank when the particle array gets too large. A quick back-of-the-envelope calculation suggests that on CSD3 we will hit this limit at around 100 000 particles.

  • You can calculate the weights independently if you broadcast the true state observations (or calculate them redundantly on each rank -- this is better since we can do it for free)

  • For the resampling you need to gather the weights, but that’s only one number per particle, not 200x200

  • The copy is the tricky part: each rank needs to retrieve copies of its new particles from whichever ranks hold them, which creates a lot of communication in a random pattern. I don't see a way around this; we may have to just bite the bullet and do it.
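To illustrate the "one number per particle" point above: index selection only ever consumes the weight vector, never the full state. Here is a sketch using systematic resampling, one common scheme (the actual `TDAC.resample!` may use a different one):

```julia
# Systematic resampling: pick particle indices from the normalised weights
# alone. One common scheme, shown for illustration; TDAC.resample! may differ.
function systematic_resample!(indices, weights)
    n = length(weights)
    cdf = cumsum(weights)          # running sum of normalised weights
    u = rand() / n                 # one random offset shared by all particles
    j = 1
    for i in 1:n
        target = u + (i - 1) / n   # evenly spaced points in [0, 1)
        while cdf[j] < target
            j += 1
        end
        indices[i] = j             # particle whose cdf bin contains `target`
    end
    return indices
end
```

Whatever the scheme, the input is length-`n` weights and the output is length-`n` indices, so gathering the weights to one rank is cheap compared with gathering 200×200 state fields.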

@tkoskela (Member, Author)
PR #95 implemented some of the above ideas:

  • Only copy the height variable after resampling
  • Omit velocity variables from MPI comms
  • Store height and velocity in separate variables
  • Calculate weights independently

This has improved the scaling somewhat:

[image: updated scaling after PR #95]

@tkoskela (Member, Author)

However, the state copy is still a serial operation and will prevent scaling much further. It must be fully parallelised before we can progress further.

@tkoskela (Member, Author)

Unfortunately #95 introduced some bugs and had to be rolled back. Some of the improvements to resampling were kept in #102, but the resampling is still serial. It will have to be parallelised to improve scaling.

@tkoskela tkoskela self-assigned this Jul 2, 2020
@tkoskela (Member, Author) commented Jul 2, 2020

Status of master as of 2 July

[image: scaling status of master as of 2 July]

@tkoskela tkoskela mentioned this issue Jul 2, 2020
@tkoskela (Member, Author) commented Jul 8, 2020

#111 implements a two-pass parallel algorithm and is looking promising

[image: scaling with the two-pass parallel algorithm from #111]
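For readers without access to #111, here is a rough serial sketch of the kind of bookkeeping a two-pass exchange involves (hypothetical; the actual algorithm in the PR may be organised differently). In a first pass, each rank derives from the global index list how many particle copies every rank must send to every other rank; the second pass (not shown) would then perform the point-to-point copies:

```julia
# Hypothetical first pass of a two-pass particle exchange; the algorithm in
# #111 may differ. counts[src, dst] is the number of particle copies that
# rank `src` must send to rank `dst`, assuming a block distribution of
# `particles_per_rank` particles per rank.
function exchange_counts(indices, nranks, particles_per_rank)
    counts = zeros(Int, nranks, nranks)
    for (slot, particle) in enumerate(indices)
        src = (particle - 1) ÷ particles_per_rank + 1  # rank owning the sampled particle
        dst = (slot - 1) ÷ particles_per_rank + 1      # rank owning the destination slot
        counts[src, dst] += 1
    end
    return counts
end
```

Knowing the counts up front lets each rank pre-post exactly the right receives in the second pass, avoiding the irregular, unbounded communication pattern worried about earlier in the thread.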

@tkoskela (Member, Author)
Resampling is parallelised by #111
