
Channel flushing on kernel termination #69

Open
mahmoodn opened this issue Oct 3, 2021 · 3 comments

mahmoodn commented Oct 3, 2021

Hi,
In previous versions of nvbit, there was an else block for if (!is_exit) that was responsible for flushing the channel when the kernel reached its end. In 1.5.3, however, there is no such block; instead, flushing occurs in nvbit_at_ctx_term, which (based on my observations) is called at the end of the execution. I tried to do a flush on kernel termination like this:

void nvbit_at_cuda_event(CUcontext ctx, int is_exit, nvbit_api_cuda_t cbid,
                         const char* name, void* params, CUresult* pStatus) {
  pthread_mutex_lock(&mutex);

  /* we prevent re-entry on this callback when issuing CUDA functions inside
   * this function */
  if (skip_callback_flag) {
    pthread_mutex_unlock(&mutex);
    return;
  }
  skip_callback_flag = true;

  assert(ctx_state_map.find(ctx) != ctx_state_map.end());
  CTXstate* ctx_state = ctx_state_map[ctx];
  if (!is_exit) {
    ....
  } else {
    /* on kernel exit, flush the channel before moving on */
    flush_channel<<<1, 1>>>(ctx_state->channel_dev);
    cudaDeviceSynchronize();
    assert(cudaGetLastError() == cudaSuccess);
  }
  skip_callback_flag = false;
  pthread_mutex_unlock(&mutex);
}

But the execution seems to get stuck and it doesn't move to the next kernel.
Any thoughts on that?

ovilla (Collaborator) commented Oct 4, 2021

Hard to tell without debugging your tool (which we obviously can't). As a general suggestion, though, you need to be very careful with cudaDeviceSynchronize() and pthread_mutex_lock/unlock(&mutex) to avoid possible deadlocks.
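For instance, the general pattern to watch out for looks like this (purely illustrative, not necessarily what your tool does): one thread blocks on the GPU while holding a mutex that the thread responsible for draining the channel also needs, so the device-side flush can never complete and the synchronize never returns.

/* Illustrative interleaving only -- not actual tool code.
 * Thread A = the CUDA event callback, Thread B = the channel receiver. */

/* Thread A */
pthread_mutex_lock(&mutex);
flush_channel<<<1, 1>>>(ctx_state->channel_dev);
cudaDeviceSynchronize();        /* waits for flush_channel, which needs    */
                                /* Thread B to drain the channel data      */
pthread_mutex_unlock(&mutex);   /* never reached                           */

/* Thread B */
pthread_mutex_lock(&mutex);     /* blocks forever: Thread A holds the lock */
/* ... drain the channel, which would let flush_channel return ... */
pthread_mutex_unlock(&mutex);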

mahmoodn commented Oct 5, 2021

The tool is mem_trace. I just added the else block to force a flush of the channel on kernel exit (rather than forcing a flush on context termination). I was wondering why that else block is missing in 1.5.3 while it is present in previous versions.
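For reference, the flush_channel kernel in mem_trace looks roughly like this (paraphrased from the example tool; the exact record fields may differ between versions):

__global__ void flush_channel(ChannelDev* ch_dev) {
    /* push a record with a negative CTA id so the receiver thread can
     * tell that the kernel has completed */
    mem_access_t ma;
    ma.cta_id_x = -1;
    ch_dev->push(&ma, sizeof(mem_access_t));

    /* flush the channel so the host sees everything pushed so far */
    ch_dev->flush();
}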

Based on my further debugging (with printf), the flush function does see a positive number of bytes to flush (which is a good sign), but it seems the doorbell loop never exits, because the printf I placed after the while loop is never shown. That means the channel host is not notified and hence recv_thread_fun is not called. In other words, the problem occurs before cudaDeviceSynchronize(), in my opinion.

Since pthread_mutex_lock and pthread_mutex_unlock were introduced in 1.5.3 (mem_trace), I wonder what the critical section is that actually needs the lock/unlock. I haven't figured it out yet and I'm stuck there... it seems that something is locked, but I don't know what is locked or who locked it.
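To make the doorbell behaviour concrete, my mental model of the handshake is roughly the following (a conceptual sketch with made-up names, not the actual ChannelDev/ChannelHost code):

#include <cstdint>
#include <cstring>

/* device side: what ChannelDev::flush conceptually does */
__device__ void flush_sketch(volatile uint32_t* doorbell, uint32_t num_bytes) {
    *doorbell = num_bytes;      /* announce how many bytes are ready       */
    while (*doorbell != 0) {    /* spin until the host clears it --        */
    }                           /* spins forever if nobody is reading      */
}

/* host side: what ChannelHost::recv conceptually does, normally called
 * in a loop by recv_thread_fun */
uint32_t recv_sketch(volatile uint32_t* doorbell, char* dst, const char* src) {
    uint32_t num_bytes = *doorbell;
    if (num_bytes > 0) {
        memcpy(dst, src, num_bytes);  /* copy the records to the host side */
        *doorbell = 0;                /* release the device-side spin loop */
    }
    return num_bytes;
}

So if recv_thread_fun is not sitting in its recv loop (or is blocked on something), the device-side while never terminates, which matches what I see.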

mahmoodn commented Oct 5, 2021

@ovilla
I see the following declarations in the headers:

/* This function is called just before the program terminates, no GPU calls
 * should be made at this moment */
void nvbit_at_term();

/* This function is called as soon as the GPU context is terminated and it
 * should contain any code that we would like to execute at that moment. */
void nvbit_at_ctx_term(CUcontext ctx);

May I ask why there is no nvbit_at_kernel_term(CUcontext ctx);?
