Profiling #69

fiedl · 2018-05-26T08:34:26Z

Which part of the new code does cost how much computation time?

Some considerations

Profiling within the kernel code and outputting the results is very imperformant itself and thus can only be done for a small number of photons, whereas the real performance issues arise when there are many photons as in flasher simulations (Is flasher study performance bad? #68).
In the GPU kernel, one can't just ask for the current time. But one can use the GPU clock. The clock counter are not nanoseconds, but are just GPU cycle ticks. Nevertheless, this gives an understanding of the relative amount of computation spent at the several parts of the code.

What needs to be done

Profiling the new medium propagation code: Layers, hole ice, sorting, medium loop.
What portion of the total propagation is spent in the new code?
How does that compare to standard clsim?
Are there differences when using GPU vs. CPU?

Results

On the CPU, the hole-ice-2018 code is almost as fast as standard clsim: Profiling #69 (comment)
But on the GPU, the hole-ice-2018 code costs 7 times as much time as standard clsim! Profiling #69 (comment)
The most expensive operations are allocating large arrays: Profiling #69 (comment)

from https://stackoverflow.com/a/34252109/2066546. fiedl/hole-ice-study#69

fiedl/hole-ice-study#69

fiedl · 2018-05-26T11:33:48Z

Profiling the propagation kernel on the cpu

Preparation: fiedl/clsim@7d7414d

[2018-05-26 13:26:30] fiedl@fiedl-mbp ~/hole-ice-study/scripts/FiringRange master ⚡
▶ ./run.rb --number-of-photons=1e4 --number-of-runs=1 --number-of-parallel-runs=1 --cpu

[2018-05-26 13:55:23] fiedl@fiedl-mbp ~/hole-ice-study/scripts/FiringRange master ⚡
▶ grep PROFILING tmp/clsim.log > tmp/profiling.txt

fiedl · 2018-05-26T12:00:06Z

The individual steps of the medium propagation code make a small portion of the propagation simulation time.

[2018-05-26 13:56:45] fiedl@fiedl-mbp ~/hole-ice-study/scripts/FiringRange master ⚡
▶ ../lib/plot_profiling_results.py tmp/profiling.txt

#69 (comment)

fiedl · 2018-05-26T13:43:09Z

How does this compare to standard clsim?

Standard clsim handles the medium propagation differently. The original source can be found here: https://github.com/fiedl/clsim/blob/icesim-v05-00-07/resources/kernels/propagation_kernel.c.cl

I've copied the original source to clsim/resources/kernels/lib/propagation_through_media/standard_clsim.c.

The method can be switched from the propagation_kernel.c.cl by applying the desired method and commenting out the other one:

// propagation_kernel.c.cl
// ...

floating_t distancePropagated = 0;
floating_t distanceToAbsorption = 0;

// apply_propagation_through_different_media(
//   photonPosAndTime,
//   photonDirAndWlen,
//   #ifdef HOLE_ICE
//     numberOfCylinders,
//     cylinderPositionsAndRadii,
//     cylinderScatteringLengths,
//     cylinderAbsorptionLengths,
//   #endif
//   &sca_step_left,
//   &abs_lens_left,
//   &distancePropagated,
//   &distanceToAbsorption
// );

apply_propagation_through_different_media_with_standard_clsim(
  photonPosAndTime,
  photonDirAndWlen,
  &sca_step_left,
  &abs_lens_left,
  &distancePropagated,
  &distanceToAbsorption
);

Source: https://github.com/fiedl/clsim/blob/icesim-v05-00-07/resources/kernels/propagation_kernel.c.cl fiedl/hole-ice-study#69 (comment)

fiedl · 2018-05-26T13:44:44Z

Run the profiling on the cpu with standard clsim:

[2018-05-26 15:44:19] fiedl@fiedl-mbp ~/hole-ice-study/scripts/FiringRange master ⚡
▶ time ./run.rb --number-of-photons=1e4 --number-of-runs=1 --number-of-parallel-runs=1 --cpu

# Ctrl-C
9794,71s user 7443,22s system 99% cpu 4:48:27,72 total

I cancelled the job, because it ran forever. Also, it might not be comparable to the previous run, because this run has less profiling outputs, i.e. takes a shorter amount of time for profiling.

Nevertheless, try to get some histograms out of that run:

[2018-05-26 20:37:28] fiedl@fiedl-mbp ~/hole-ice-study/scripts/FiringRange master ⚡
▶ grep PROFILING tmp/clsim.log |grep -v INFO > tmp/profiling.txt
▶ ../lib/plot_profiling_results.py tmp/profiling.txt

fiedl · 2018-05-26T19:22:49Z

In order to make it comparable, profile hole-ice-2018 again, but without intermediate profiling steps. Profile just the whole scattering step and the portion of it that goes into the new hole-ice code.

[2018-05-26 21:23:59] fiedl@fiedl-mbp ~/hole-ice-study/scripts/FiringRange master ⚡
▶ time ./run.rb --number-of-photons=1e3 --number-of-runs=1 --number-of-parallel-runs=1 --cpu > tmp/output.log

236,82s user 248,03s system 193% cpu 4:10,93 total

Some statistics:

[2018-05-26 21:28:34] fiedl@fiedl-mbp ~/hole-ice-study/scripts/FiringRange master ⚡
▶ grep "kernel statistics" tmp/clsim.log
PROFILINFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 1.69622e+08 nanoseconds/photon, 1024 photons (util: 95%) (Apple Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz)  (I3CLSimStepToPhotonConverterOpenCL.cxx:1212 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event &, const boost::posix_time::ptime &, uint64_t, bool, const std::string &, const std::string &, uint64_t))

[2018-05-26 21:33:31] fiedl@fiedl-mbp ~/hole-ice-study/scripts/FiringRange master ⚡
▶ grep PROFILING tmp/clsim.log |grep -v INFO > tmp/profiling.txt
▶ ../lib/plot_profiling_results.py tmp/profiling.txt

fiedl · 2018-05-26T19:38:38Z

And, to compare it properly, the same statistics with standard clsim:

[2018-05-26 21:35:21] fiedl@fiedl-mbp ~/hole-ice-study/scripts/FiringRange master ⚡
▶ rm tmp/*
▶ time ./run.rb --number-of-photons=1e3 --number-of-runs=1 --number-of-parallel-runs=1 --cpu > tmp/output.log

194,35s user 212,68s system 184% cpu 3:40,22 total

[2018-05-26 21:41:04] fiedl@fiedl-mbp ~/hole-ice-study/scripts/FiringRange master ⚡
▶ grep PROFILING tmp/clsim.log |grep -v INFO > tmp/profiling.txt
▶ ../lib/plot_profiling_results.py tmp/profiling.txt

fiedl · 2018-05-26T19:39:03Z

add mean and standard deviation to these plots.
calculate how much the profiling itself costs and subtract this

fiedl · 2018-05-26T20:34:17Z

In order to get an idea how much the profiling itself costs, I'm adding a second profiling output at the same position in the code:

        clock_t t2 = clock();
        clock_t t1 = clock();
        apply_propagation_through_different_media(
          photonPosAndTime,
          photonDirAndWlen,
          #ifdef HOLE_ICE
            numberOfCylinders,
            cylinderPositionsAndRadii,
            cylinderScatteringLengths,
            cylinderAbsorptionLengths,
          #endif
          &sca_step_left,
          &abs_lens_left,
          &distancePropagated,
          &distanceToAbsorption
        );
        printf("PROFILING apply_propagation_through_different_media %lu\n", clock() - t1);
        printf("PROFILING apply_propagation_through_different_media_2 %lu\n", clock() - t2);

Without this extra output, the clock count has been 2.50 +- 2.61 (#69 (comment)).

With this extra output, the clock count is:

3.16 +- 3.65 for the inner clock counter
32.59 +- 32.70 for the outer clock counter
56.74 +- 44.80 for the whole scattering step clock counter

This is really strange. Thus, I can't really tell from this how much the performance measurement itself costs. Repeating the same measurement does confirm the strange result.

fiedl · 2018-05-26T20:44:28Z

Another attempt to measure the performance cost of the profiling itself: Drop the inner clock counter and measure just the overall scattering step.

        //clock_t t1 = clock();
        apply_propagation_through_different_media(
          photonPosAndTime,
          photonDirAndWlen,
          #ifdef HOLE_ICE
            numberOfCylinders,
            cylinderPositionsAndRadii,
            cylinderScatteringLengths,
            cylinderAbsorptionLengths,
          #endif
          &sca_step_left,
          &abs_lens_left,
          &distancePropagated,
          &distanceToAbsorption
        );
        //printf("PROFILING apply_propagation_through_different_media %lu\n", clock() - t1);

When both the t0 counter and the t1 counter are active, the outer measurement is 30.02 +- 29.37.

When only the t0 counter is active, the outer measurement is 3.00 +- 3.79.

These results suggest that the dominant (most expensive) operation is the printf that outputs the performance measurement result, which costs about 25 ticks whereas the scattering step only costs about 5 ticks.

fiedl · 2018-05-26T21:11:18Z

Repeat the same measurement of inner and outer clock counter, but print the result outside the counter.

        // start profiling the simulation step here
        clock_t t0 = clock();

        floating_t sca_step_left = -my_log(RNG_CALL_UNIFORM_OC);
        floating_t distancePropagated = 0;
        floating_t distanceToAbsorption = 0;
        
        clock_t t1 = clock();
        clock_t t2 = clock();
        apply_propagation_through_different_media(
          photonPosAndTime,
          photonDirAndWlen,
          #ifdef HOLE_ICE
            numberOfCylinders,
            cylinderPositionsAndRadii,
            cylinderScatteringLengths,
            cylinderAbsorptionLengths,
          #endif
          &sca_step_left,
          &abs_lens_left,
          &distancePropagated,
          &distanceToAbsorption
        );
        clock_t t3 = clock();
        clock_t t4 = clock();

        // ...

        clock_t t5 = clock();
        printf("PROFILING propagation_kernel_simulation_step %lu\n", t5 - t0);
        printf("PROFILING apply_propagation_through_different_media %lu\n", t3 - t2);
        printf("PROFILING apply_propagation_through_different_media_outer %lu\n", t4 - t1);

This suggests that the performance measurement itself costs about 4.2 ticks. Thus, the overall scattering step is 3.5 ticks, including the the hole-ice-2018 code, which is 2.8 ticks. If this is correct, the hole-ice-2018 code costs 80% of the total scattering step.

fiedl · 2018-05-26T21:26:45Z

Repeating the same for standard clsim:

        clock_t t1 = clock();
        clock_t t2 = clock();
        apply_propagation_through_different_media_with_standard_clsim(
          photonPosAndTime,
          photonDirAndWlen,
          &sca_step_left,
          &abs_lens_left,
          &distancePropagated,
          &distanceToAbsorption
        );
        clock_t t3 = clock();
        clock_t t4 = clock();

With standard clsim, the performance measurement again is about 4.1 ticks. The overall scattering step is 3.3 ticks, the standard-clsim media propagation code is 2.5 ticks, which is 76% of the total scattering step.

fiedl · 2018-05-26T21:38:52Z

Comparing the hole-ice-2018 algorithm to the standard-clsim algorithm:

When subtracting the performance measurement, the total scattering step costs:

hole-ice-2018: 3.5 ticks
standard clsim: 3.3 ticks

Thus, the hole-ice-2018 code costs about 6% more cpu time than standard clsim.

fiedl · 2018-05-26T21:41:52Z

GPU profiling

hole-ice-2018 code

[2018-05-26 23:45:26] fiedl@wgs16 /afs/ifh.de/group/amanda/scratch/fiedl/clsim
▶ git pull

[2018-05-26 23:45:57] fiedl@wgs16 /afs/ifh.de/group/amanda/scratch/fiedl/hole-ice-study
▶ git pull

[2018-05-26 23:45:47] fiedl@kepler18 /afs/ifh.de/group/amanda/scratch/fiedl/hole-ice-study/scripts/FiringRange
▶ rm tmp/*
▶ time ./run.rb --number-of-photons=1e3 --number-of-runs=1 --number-of-parallel-runs=1 > tmp/output.log
5.14s user 3.93s system 12% cpu 1:15.24 total

[2018-05-26 23:49:15] fiedl@kepler18 /afs/ifh.de/group/amanda/scratch/fiedl/hole-ice-study/scripts/FiringRange
▶ grep PROFILING tmp/clsim.log |grep -v INFO > tmp/profiling.txt
▶ grep "kernel statistics" tmp/clsim.log
INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 1.54137e+06 nanoseconds/photon, 1005 photons (util: 4%) (NVIDIA CUDA Tesla K80)  (I3CLSimStepToPhotonConverterOpenCL.cxx:1212 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event &, const boost::posix_time::ptime &, uint64_t, bool, const std::string &, const std::string &, uint64_t))

[2018-05-26 23:50:31] fiedl@kepler18 /afs/ifh.de/group/amanda/scratch/fiedl/hole-ice-study/scripts/FiringRange
▶ mv tmp $SCRATCH/hole-ice-study/results/profiling/hole-ice-2018_gpu

[2018-05-27 00:00:50] fiedl@fiedl-mbp ~/hole-ice-study/scripts/lib master ⚡
▶ python plot_profiling_results.py ~/hole-ice-study/results/profiling/hole-ice-2018_gpu/profiling.txt

The performance measurement costs 117 ticks. The medium propagation costs 38022 ticks, the total scattering step costs 41060 ticks. The medium propagation makes 92.6% of the total scattering step gpu time.

fiedl/hole-ice-study#69 (comment)

these results are dominated by the cost of the `printf` rather than the actual work. See: #69 (comment)

fiedl/hole-ice-study#69 (comment)

fiedl · 2018-05-26T21:58:03Z

GPU profiling for standard clsim

[2018-05-26 23:53:19] fiedl@wgs16 /afs/ifh.de/group/amanda/scratch/fiedl/clsim
▶ git pull

[2018-05-26 23:51:34] fiedl@kepler18 /afs/ifh.de/group/amanda/scratch/fiedl/hole-ice-study/scripts/FiringRange
▶ mkdir tmp
▶ time ./run.rb --number-of-photons=1e3 --number-of-runs=1 --number-of-parallel-runs=1 > tmp/output.log

5.58s user 3.90s system 8% cpu 1:49.88 total

[2018-05-26 23:57:04] fiedl@kepler18 /afs/ifh.de/group/amanda/scratch/fiedl/hole-ice-study/scripts/FiringRange
▶ grep PROFILING tmp/clsim.log |grep -v INFO > tmp/profiling.txt
▶ grep "kernel statistics" tmp/clsim.log
INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 1.04102e+06 nanoseconds/photon, 1005 photons (util: 2%) (NVIDIA CUDA Tesla K80)  (I3CLSimStepToPhotonConverterOpenCL.cxx:1212 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event &, const boost::posix_time::ptime &, uint64_t, bool, const std::string &, const std::string &, uint64_t))

[2018-05-26 23:57:19] fiedl@kepler18 /afs/ifh.de/group/amanda/scratch/fiedl/hole-ice-study/scripts/FiringRange
▶ mv tmp $SCRATCH/hole-ice-study/results/profiling/standard-clsim_gpu

[2018-05-27 00:03:38] fiedl@fiedl-mbp ~/hole-ice-study/scripts/lib master ⚡
▶ python plot_profiling_results.py ~/hole-ice-study/results/profiling/standard-clsim_gpu/profiling.txt

Performance measurement itself costs 114 ticks. The standard-clsim media propagation costs 1773 ticks, the total scattering step costs 5571 ticks. The media propagation makes 32% of the total scattering step when running standard clsim on the gpu.

On the gpu, the hole-ice-2018 code costs 7.37 as much time as standard clsim.

#69 (comment)

fiedl/hole-ice-study#69

fiedl · 2018-05-26T22:31:29Z

Run detailed gpu profiling for the hole-ice-2018 code.

[2018-05-27 00:26:25] fiedl@wgs16 /afs/ifh.de/group/amanda/scratch/fiedl/clsim
▶ git pull

[2018-05-27 00:27:41] fiedl@kepler18 /afs/ifh.de/group/amanda/scratch/fiedl/hole-ice-study/scripts/FiringRange
▶ mkdir tmp
▶ time ./run.rb --number-of-photons=1e3 --number-of-runs=1 --number-of-parallel-runs=1 > tmp/output.log

1.95s user 1.44s system 3% cpu 1:31.76 total

[2018-05-27 00:29:47] fiedl@kepler18 /afs/ifh.de/group/amanda/scratch/fiedl/hole-ice-study/scripts/FiringRange
▶ grep PROFILING tmp/clsim.log |grep -v INFO > tmp/profiling.txt
▶ grep "kernel statistics" tmp/clsim.log
INFO (I3CLSimStepToPhotonConverterOpenCL): kernel statistics: 1.96359e+06 nanoseconds/photon, 1005 photons (util: 10%) (NVIDIA CUDA Tesla K80)  (I3CLSimStepToPhotonConverterOpenCL.cxx:1212 in boost::posix_time::ptime I3CLSimStepToPhotonConverterOpenCL::DumpStatistics(const cl::Event &, const boost::posix_time::ptime &, uint64_t, bool, const std::string &, const std::string &, uint64_t))

[2018-05-27 00:30:17] fiedl@kepler18 /afs/ifh.de/group/amanda/scratch/fiedl/hole-ice-study/scripts/FiringRange
▶ mv tmp $SCRATCH/hole-ice-study/results/profiling/hole-ice-2018_detail_gpu

[2018-05-27 00:38:12] fiedl@fiedl-mbp ~/hole-ice-study/scripts/lib master ⚡
▶ python plot_profiling_results.py ~/hole-ice-study/results/profiling/hole-ice-2018_detail_gpu/profiling.txt

This is pretty unexpected. The start code that has nothing to do with the hole ice is most expensive. The relevant parts cost 2740 ticks. (Standard clsim medium propagation costs 1773 ticks.)

fiedl · 2018-05-26T22:45:10Z

Why is the start code so expensive? Profile it in more detail:

  clock_t t0 = clock();
  int number_of_medium_changes = 0;
  clock_t t01 = clock();
  floating_t distances_to_medium_changes[MEDIUM_LAYERS] = {0.0};
  clock_t t02 = clock();
  int currentPhotonLayer = min(max(findLayerForGivenZPos(photonPosAndTime.z), 0), MEDIUM_LAYERS-1);
  clock_t t03 = clock();
  floating_t local_scattering_lengths[MEDIUM_LAYERS] = {getScatteringLength(currentPhotonLayer, photonDirAndWlen.w)};
  clock_t t04 = clock();
  floating_t local_absorption_lengths[MEDIUM_LAYERS] = {getAbsorptionLength(currentPhotonLayer, photonDirAndWlen.w)};
  clock_t t05 = clock();
  const floating_t photonRange = *sca_step_left * local_scattering_lengths[0];
  clock_t t1 = clock();

  // ...

  printf("PROFILING start %lu\n", t1 - t0);
  printf("PROFILING start_01 %lu\n", t01 - t0);
  printf("PROFILING start_02 %lu\n", t02 - t01);
  printf("PROFILING start_03 %lu\n", t03 - t02);
  printf("PROFILING start_04 %lu\n", t04 - t03);
  printf("PROFILING start_05 %lu\n", t05 - t04);
  printf("PROFILING start_1 %lu\n", t1 - t05);
  printf("PROFILING add_ice_layers %lu\n", t2 - t1);
  printf("PROFILING add_hole_ice %lu\n", t3 - t2);
  printf("PROFILING sort %lu\n", t4 - t3);
  printf("PROFILING loop_over_media %lu\n", t5 - t4);
  printf("PROFILING total %lu\n", t5 - t0);

[2018-05-27 00:52:53] fiedl@fiedl-mbp ~/hole-ice-study/scripts/lib master ⚡
▶ python plot_profiling_results.py ~/hole-ice-study/results/profiling/hole-ice-2018_start_gpu/profiling.txt

fiedl/hole-ice-study#69 (comment)

#69

fiedl · 2018-05-26T22:59:14Z

From the detailed profiling of the start code, one can see that these three commands are the most expensive ones in the hole-ice-2018 code:

  floating_t distances_to_medium_changes[MEDIUM_LAYERS] = {0.0};
  floating_t local_scattering_lengths[MEDIUM_LAYERS] = {getScatteringLength(currentPhotonLayer, photonDirAndWlen.w)};
  floating_t local_absorption_lengths[MEDIUM_LAYERS] = {getAbsorptionLength(currentPhotonLayer, photonDirAndWlen.w)};

As the first and the second have almost the same performance cost, the costly part is not getScatteringLength but the allocation of the MEDIUM_LAYERS-sized array.

fiedl · 2018-05-26T23:18:50Z

fiedl added the performance label May 26, 2018

fiedl added a commit to fiedl/clsim that referenced this issue May 26, 2018

adding idea on how to profile kernel code

546b81e

from https://stackoverflow.com/a/34252109/2066546. fiedl/hole-ice-study#69

This was referenced May 26, 2018

Is flasher study performance bad? #68

Closed

Benchmark performance: how much does the hole ice simulation cost? #18

Open

fiedl added a commit to fiedl/clsim that referenced this issue May 26, 2018

preparing clsim for kernel profiling on the cpu

7d7414d

fiedl/hole-ice-study#69

fiedl added a commit that referenced this issue May 26, 2018

plot script: profiling results

03c4e72

#69 (comment)

fiedl added a commit that referenced this issue May 26, 2018

adding cpu profiling results

ee76140

#69 (comment)

fiedl added a commit to fiedl/clsim that referenced this issue May 26, 2018

bringing back the standard clsim code for profiling

9968ac7

Source: https://github.com/fiedl/clsim/blob/icesim-v05-00-07/resources/kernels/propagation_kernel.c.cl fiedl/hole-ice-study#69 (comment)

fiedl added a commit to fiedl/clsim that referenced this issue May 26, 2018

profiling standard clsim vs. hole-ice-2018 code

4785ac1

fiedl/hole-ice-study#69 (comment)

fiedl added a commit to fiedl/clsim that referenced this issue May 26, 2018

clsim: preparing gpu profiling of the hole-ice-2018 code

097b906

fiedl/hole-ice-study#69 (comment)

fiedl added a commit that referenced this issue May 26, 2018

removing profiling results

355db43

these results are dominated by the cost of the `printf` rather than the actual work. See: #69 (comment)

fiedl added a commit to fiedl/clsim that referenced this issue May 26, 2018

switching to standard clsim

4a89e23

fiedl/hole-ice-study#69 (comment)

fiedl added a commit that referenced this issue May 26, 2018

adding gpu profiling results

a9d1817

#69 (comment)

fiedl added a commit to fiedl/clsim that referenced this issue May 26, 2018

clsim: preparing detailed gpu profiling of the hole-ice-2018 code

c128959

fiedl/hole-ice-study#69

fiedl added a commit that referenced this issue May 26, 2018

adding gpu profiling result for #69 (comment)

45fb71b

fiedl added a commit to fiedl/clsim that referenced this issue May 26, 2018

prepare clsim for detailed profiling of the hole-ice-2018 start code

bc7af6b

fiedl/hole-ice-study#69 (comment)

fiedl added a commit that referenced this issue May 26, 2018

plot script for cpu profiling

3bacdf2

#69

fiedl added a commit that referenced this issue May 26, 2018

adding gpu profiling results for start code #69 (comment)

5d95e82

fiedl mentioned this issue May 27, 2018

Performance optimization: Declare media arrays in local kernel memory rather than for each scattering step separately #70

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Profiling #69

Profiling #69

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018

fiedl commented May 26, 2018

Profiling #69

Profiling #69

Comments

fiedl commented May 26, 2018 • edited Loading

Some considerations

What needs to be done

Results

fiedl commented May 26, 2018 • edited Loading

Profiling the propagation kernel on the cpu

fiedl commented May 26, 2018 • edited Loading

fiedl commented May 26, 2018

How does this compare to standard clsim?

fiedl commented May 26, 2018 • edited Loading

fiedl commented May 26, 2018 • edited Loading

fiedl commented May 26, 2018 • edited Loading

fiedl commented May 26, 2018 • edited Loading

fiedl commented May 26, 2018 • edited Loading

fiedl commented May 26, 2018 • edited Loading

fiedl commented May 26, 2018 • edited Loading

fiedl commented May 26, 2018 • edited Loading

fiedl commented May 26, 2018 • edited Loading

fiedl commented May 26, 2018 • edited Loading

GPU profiling

hole-ice-2018 code

fiedl commented May 26, 2018 • edited Loading

GPU profiling for standard clsim

fiedl commented May 26, 2018 • edited Loading

fiedl commented May 26, 2018 • edited Loading

fiedl commented May 26, 2018

fiedl commented May 26, 2018

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading

fiedl commented May 26, 2018 •

edited

Loading