I subclassed a T5 model and defined a new variable `self.global_variable = torch.Tensor(...).cuda()` (actually a constant tensor) inside the `__init__` method of the subclass.

During `forward`, I want to multiply this variable with the `hidden_states` input before proceeding with the original `forward` method of T5. When I run the code, I get an error saying `self.global_variable` and `hidden_states` are not on the same device.

I am using distributed data-parallel training, and I assumed that `self.global_variable.cuda()` would place the variable on whichever GPU the model replica is on. But it looks like `cuda()` always assigns the variable to `cuda:0`, while my `hidden_states` tensor is on `cuda:1` (I guess this comes from the model replica on GPU 1). How should I define the global variable in this scenario, where I want to use `self.global_variable` across multiple GPUs? @tjruwase, I would be glad to hear your thoughts.
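To make the issue concrete, here is a minimal sketch of the pattern described (class and attribute names are hypothetical). Calling `.cuda()` in `__init__` pins the tensor to the default device, so DDP replicas on other GPUs see a mismatch; registering the tensor as a buffer instead lets `module.to(device)` (and DDP's replica placement) move it together with the module:

```python
import torch
import torch.nn as nn

class ScaledBlock(nn.Module):
    """Hypothetical stand-in for the T5 subclass described above."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        # Problematic pattern: torch.ones(hidden_dim).cuda() would pin the
        # constant to cuda:0 regardless of which GPU the replica lives on.
        # Registering it as a buffer ties it to the module's device instead;
        # persistent=False keeps the constant out of state_dict checkpoints.
        self.register_buffer("global_variable", torch.ones(hidden_dim), persistent=False)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The buffer is on the same device as the module, so this multiply
        # works on every DDP replica.
        return hidden_states * self.global_variable

# Usage sketch: .to(device) moves parameters and buffers together.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = ScaledBlock(8).to(device)
x = torch.randn(2, 8, device=device)
out = model(x)
```

Note this is a sketch of the buffer approach, not the subclass from the question; the same `register_buffer` call would go in the T5 subclass's `__init__`.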