I subclassed a T5 model and defined a new variable `self.global_variable = torch.Tensor(...).cuda()` (actually a constant tensor) inside the `__init__` method of the subclass.

During `forward`, I want to multiply this variable with the `hidden_states` input before proceeding with the original `forward` method of T5. When I run the code, I get an error saying `self.global_variable` and `hidden_states` are not on the same device.

I am using distributed data-parallel training, and I assumed that `self.global_variable.cuda()` would place the variable on whichever GPU the model replica is on. But it looks like `cuda()` always assigns the variable to `cuda:0`, while my `hidden_states` tensor is on `cuda:1` (I guess this comes from the model replica on GPU 1). How should I define the global variable in this scenario, where I want to use `self.global_variable` across multiple GPUs? @tjruwase, I would be glad to hear your thoughts.
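To make the issue concrete, here is a minimal sketch of the pattern described (class and attribute names are hypothetical). Calling `.cuda()` in `__init__` pins the tensor to the default device, so DDP replicas on other GPUs see a mismatch; registering the tensor as a buffer instead lets `module.to(device)` (and DDP's replica placement) move it together with the module:

```python
import torch
import torch.nn as nn

class ScaledBlock(nn.Module):
    """Hypothetical stand-in for the T5 subclass described above."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        # Problematic pattern: torch.ones(hidden_dim).cuda() would pin the
        # constant to cuda:0 regardless of which GPU the replica lives on.
        # Registering it as a buffer ties it to the module's device instead;
        # persistent=False keeps the constant out of state_dict checkpoints.
        self.register_buffer("global_variable", torch.ones(hidden_dim), persistent=False)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # The buffer is on the same device as the module, so this multiply
        # works on every DDP replica.
        return hidden_states * self.global_variable

# Usage sketch: .to(device) moves parameters and buffers together.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = ScaledBlock(8).to(device)
x = torch.randn(2, 8, device=device)
out = model(x)
```

Note this is a sketch of the buffer approach, not the subclass from the question; the same `register_buffer` call would go in the T5 subclass's `__init__`.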