
Channel flushing on kernel termination #69

Open
mahmoodn opened this issue Oct 3, 2021 · 3 comments

mahmoodn commented Oct 3, 2021

Hi,
In previous versions of nvbit, there was an else block for if (!is_exit) that was responsible for flushing the channel when the kernel reached its end. In 1.5.3, however, there is no such block; instead, flushing occurs in nvbit_at_ctx_term, which (based on my observations) is called at the end of the execution. I tried to do a flush on kernel termination like this:

void nvbit_at_cuda_event(CUcontext ctx, int is_exit, nvbit_api_cuda_t cbid,
                         const char* name, void* params, CUresult* pStatus) {
  pthread_mutex_lock(&mutex);

  /* we prevent re-entry on this callback when issuing CUDA functions inside
   * this function */
  if (skip_callback_flag) {
    pthread_mutex_unlock(&mutex);
    return;
  }
  skip_callback_flag = true;

  assert(ctx_state_map.find(ctx) != ctx_state_map.end());
  CTXstate* ctx_state = ctx_state_map[ctx];
  if (!is_exit) {
    ....
  } else {
    /* on kernel exit, flush the channel before moving on */
    flush_channel<<<1, 1>>>(ctx_state->channel_dev);
    cudaDeviceSynchronize();
    assert(cudaGetLastError() == cudaSuccess);
  }
  skip_callback_flag = false;
  pthread_mutex_unlock(&mutex);
}

But the execution seems to get stuck and it doesn't move to the next kernel.
Any thoughts on that?

ovilla (Collaborator) commented Oct 4, 2021

Hard to tell without debugging your tool (which we obviously can't). As a general suggestion, though, you need to be very careful with cudaDeviceSynchronize() and pthread_mutex_lock/unlock(&mutex) to avoid possible deadlocks.
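For instance, the general pattern to watch out for looks like this (purely illustrative, not necessarily what your tool does): one thread blocks on the GPU while holding a mutex that the thread responsible for draining the channel also needs, so the device-side flush can never complete and the synchronize never returns.

/* Illustrative interleaving only -- not actual tool code.
 * Thread A = the CUDA event callback, Thread B = the channel receiver. */

/* Thread A */
pthread_mutex_lock(&mutex);
flush_channel<<<1, 1>>>(ctx_state->channel_dev);
cudaDeviceSynchronize();        /* waits for flush_channel, which needs    */
                                /* Thread B to drain the channel data      */
pthread_mutex_unlock(&mutex);   /* never reached                           */

/* Thread B */
pthread_mutex_lock(&mutex);     /* blocks forever: Thread A holds the lock */
/* ... drain the channel, which would let flush_channel return ... */
pthread_mutex_unlock(&mutex);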

mahmoodn commented Oct 5, 2021

The tool is mem_trace. I just added the else block to force a flush of the channel on kernel exit (rather than forcing a flush on context termination). I was wondering why that else block is missing in 1.5.3 while it is present in previous versions.
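For reference, the flush_channel kernel in mem_trace looks roughly like this (paraphrased from the example tool; the exact record fields may differ between versions):

__global__ void flush_channel(ChannelDev* ch_dev) {
    /* push a record with a negative CTA id so the receiver thread can
     * tell that the kernel has completed */
    mem_access_t ma;
    ma.cta_id_x = -1;
    ch_dev->push(&ma, sizeof(mem_access_t));

    /* flush the channel so the host sees everything pushed so far */
    ch_dev->flush();
}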

Based on my further debugging (with printf), the flush function does see a positive number of bytes to flush (which is a good sign), but it seems the doorbell loop never exits, because the printf I placed after the while loop is never shown. That means the channel host is not notified and hence recv_thread_fun is not called. In other words, the problem occurs before cudaDeviceSynchronize(), in my opinion.

Since pthread_mutex_lock and pthread_mutex_unlock were introduced in 1.5.3 (mem_trace), I wonder what the critical section is that actually needs the lock/unlock. I haven't figured it out yet and I'm stuck there... it seems that something is locked, but I don't know what is locked or who locked it.
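To make the doorbell behaviour concrete, my mental model of the handshake is roughly the following (a conceptual sketch with made-up names, not the actual ChannelDev/ChannelHost code):

#include <cstdint>
#include <cstring>

/* device side: what ChannelDev::flush conceptually does */
__device__ void flush_sketch(volatile uint32_t* doorbell, uint32_t num_bytes) {
    *doorbell = num_bytes;      /* announce how many bytes are ready       */
    while (*doorbell != 0) {    /* spin until the host clears it --        */
    }                           /* spins forever if nobody is reading      */
}

/* host side: what ChannelHost::recv conceptually does, normally called
 * in a loop by recv_thread_fun */
uint32_t recv_sketch(volatile uint32_t* doorbell, char* dst, const char* src) {
    uint32_t num_bytes = *doorbell;
    if (num_bytes > 0) {
        memcpy(dst, src, num_bytes);  /* copy the records to the host side */
        *doorbell = 0;                /* release the device-side spin loop */
    }
    return num_bytes;
}

So if recv_thread_fun is not sitting in its recv loop (or is blocked on something), the device-side while never terminates, which matches what I see.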

mahmoodn commented Oct 5, 2021

@ovilla
I see the following declarations in the headers:

/* This function is called just before the program terminates, no GPU calls
 * should be made at this moment */
void nvbit_at_term();

/* This function is called as soon as the GPU context is terminated and it
 * should contain any code that we would like to execute at that moment. */
void nvbit_at_ctx_term(CUcontext ctx);

May I ask why there is no nvbit_at_kernel_term(CUcontext ctx);?
