/etc/anaconda3/envs/ml/lib/python3.7/site-packages/snorkel/labeling/model/label_model.py in fit(self, L_train, Y_dev, class_balance, **kwargs)
781
782 # Forward pass to calculate the average loss per example
--> 783 loss = self._loss_mu(l2=self.train_config.l2)
784 if torch.isnan(loss):
785 msg = "Loss is NaN. Consider reducing learning rate."
/etc/anaconda3/envs/ml/lib/python3.7/site-packages/snorkel/labeling/model/label_model.py in _loss_mu(self, l2)
522 Overall mu loss between learned mu and initial mu
523 """
--> 524 loss_1 = torch.norm((self.O - self.mu @ self.P @ self.mu.t())[self.mask]) ** 2
525 loss_2 = torch.norm(torch.sum(self.mu @ self.P, 1) - torch.diag(self.O)) ** 2
526 return loss_1 + loss_2 + self._loss_l2(l2=l2)
RuntimeError: Expected object of backend CUDA but got backend CPU for argument #2 'mat2'
Hi @lambdaofgod, thanks for reporting this! I'll repro later today, but looking at the code, it looks like O and P are not correctly registered as nn.Parameters and so won't be moved to GPU when the parent module is. We'll submit a fix for v0.9.2. The label model (even with lots of data points and dozens of LFs) often trains in a matter of seconds on CPU, so we don't really use GPU support for it internally. Are you able to train it quickly enough on CPU?
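To illustrate the mechanism described above, here is a minimal sketch in plain PyTorch. It is not Snorkel's code: the classes PlainAttr and Registered are hypothetical, and a dtype conversion stands in for the move to GPU so that the snippet runs on a CPU-only machine. The assumption is that nn.Module.to() applies to registered parameters and buffers only, so a tensor stored as an ordinary attribute is left where it was.

```python
import torch
import torch.nn as nn


class PlainAttr(nn.Module):
    """Hypothetical module: O is an ordinary attribute, invisible to .to()."""

    def __init__(self):
        super().__init__()
        self.mu = nn.Parameter(torch.ones(2, 2))
        self.O = torch.ones(2, 2)  # plain tensor attribute, not registered


class Registered(nn.Module):
    """Hypothetical module: O is a registered buffer, so .to() converts it."""

    def __init__(self):
        super().__init__()
        self.mu = nn.Parameter(torch.ones(2, 2))
        self.register_buffer("O", torch.ones(2, 2))


# A dtype change is used here as a CPU-only stand-in for .to("cuda").
plain = PlainAttr().to(torch.float64)
registered = Registered().to(torch.float64)

print(plain.mu.dtype, plain.O.dtype)            # torch.float64 torch.float32
print(registered.mu.dtype, registered.O.dtype)  # torch.float64 torch.float64
```

In the first module mu and O end up out of sync after the conversion, which is the same kind of mismatch the traceback reports for devices; registering the tensor keeps the two together.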
Issue description
I tried running LabelModel with
device=torch.device('cuda')
but I get the RuntimeError shown in the traceback.
Detailed error message:
Code example/repro steps
I've added
device=torch.device('cuda')
to the line that defines LabelModel in recsys_tutorial from this example.
System info
Snorkel installed from requirements.txt