Error when using LabelModel with GPU #1430

Closed
lambdaofgod opened this issue Aug 27, 2019 · 1 comment · Fixed by #1466

@lambdaofgod

Issue description

I tried running LabelModel with device=torch.device('cuda'), but I get:

RuntimeError: Expected object of backend CUDA but got backend CPU for argument #2 'mat2'

Detailed error message:

I0827 13:07:07.767502 140001064118080 label_model.py:749] Computing O...
I0827 13:07:07.818106 140001064118080 label_model.py:755] Estimating \mu...
I0827 13:07:07.818606 140001064118080 label_model.py:762] Using GPU...

RuntimeError Traceback (most recent call last)
in
----> 1 label_model.fit(L_train, n_epochs=5000, seed=123, log_freq=20, lr=0.01)

/etc/anaconda3/envs/ml/lib/python3.7/site-packages/snorkel/labeling/model/label_model.py in fit(self, L_train, Y_dev, class_balance, **kwargs)
781
782 # Forward pass to calculate the average loss per example
--> 783 loss = self._loss_mu(l2=self.train_config.l2)
784 if torch.isnan(loss):
785 msg = "Loss is NaN. Consider reducing learning rate."

/etc/anaconda3/envs/ml/lib/python3.7/site-packages/snorkel/labeling/model/label_model.py in _loss_mu(self, l2)
522 Overall mu loss between learned mu and initial mu
523 """
--> 524 loss_1 = torch.norm((self.O - self.mu @ self.P @ self.mu.t())[self.mask]) ** 2
525 loss_2 = torch.norm(torch.sum(self.mu @ self.P, 1) - torch.diag(self.O)) ** 2
526 return loss_1 + loss_2 + self._loss_l2(l2=l2)

RuntimeError: Expected object of backend CUDA but got backend CPU for argument #2 'mat2'

Code example/repro steps

I added device=torch.device('cuda') to the line that defines LabelModel in recsys_tutorial from this example.

System info

Snorkel installed from requirements.txt

  • OS: Linux Kubuntu 18.04
  • Python version: 3.7.3
  • Snorkel version: 0.9.0
  • Versions of any other relevant libraries: torch 1.1.0
henryre self-assigned this Aug 27, 2019
henryre added the bug label Aug 27, 2019
@henryre
Member

henryre commented Aug 27, 2019

Hi @lambdaofgod, thanks for reporting this! I'll repro later today, but from the code it looks like O and P are not correctly registered as nn.Parameters, so they won't be moved to GPU when the parent module is. We'll submit a fix for v0.9.2. The label model (even with lots of data points and dozens of LFs) often trains in a matter of seconds on CPU, so we don't really use GPU support for it internally. Are you able to train it quickly enough on CPU?
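
For readers hitting the same error, the mechanism described above can be sketched with a toy module. This is not Snorkel's code and not necessarily how the fix was implemented: the class `Toy` and the helper `tracked_names` are hypothetical, and the attribute names only mirror the ones in the traceback. The sketch assumes that registering a fixed tensor as a buffer is an acceptable alternative to a parameter. Because `Module.double()` walks the same parameters and buffers as `Module.to(device)`, it is used here as a stand-in that runs without a GPU.

```python
import torch
from torch import nn


class Toy(nn.Module):
    """Hypothetical module for illustration; not Snorkel's LabelModel."""

    def __init__(self):
        super().__init__()
        # Learned matrix: registered as a parameter, so Module.to() handles it.
        self.mu = nn.Parameter(torch.randn(4, 2))
        # Plain tensor attribute: the module does not track it at all.
        self.O = torch.eye(4)
        # Fixed tensor registered as a buffer: tracked, but not trained.
        self.register_buffer("P", torch.eye(2))


def tracked_names(module):
    """Names of the tensors that Module.to() / .cuda() / .double() will touch."""
    names = [name for name, _ in module.named_parameters()]
    names += [name for name, _ in module.named_buffers()]
    return sorted(names)


toy = Toy()
tracked = tracked_names(toy)

# CPU-only stand-in for toy.to("cuda"): cast every tracked tensor to float64.
toy.double()
dtypes = {name: getattr(toy, name).dtype for name in ("mu", "O", "P")}
```

After the cast, `mu` and `P` have changed while `O` is left behind, which is the same asymmetry that produces a CUDA/CPU mismatch once a tracked tensor is multiplied with an untracked one on a GPU machine.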
