Error when using LabelModel with GPU #1430

lambdaofgod · 2019-08-27T11:22:13Z

Issue description

I tried running LabelModel with device=torch.device('cuda'), but I get the following error:

RuntimeError: Expected object of backend CUDA but got backend CPU for argument #2 'mat2'

Detailed error message:

I0827 13:07:07.767502 140001064118080 label_model.py:749] Computing O...
I0827 13:07:07.818106 140001064118080 label_model.py:755] Estimating \mu...
I0827 13:07:07.818606 140001064118080 label_model.py:762] Using GPU...

RuntimeError Traceback (most recent call last)
in
----> 1 label_model.fit(L_train, n_epochs=5000, seed=123, log_freq=20, lr=0.01)

/etc/anaconda3/envs/ml/lib/python3.7/site-packages/snorkel/labeling/model/label_model.py in fit(self, L_train, Y_dev, class_balance, **kwargs)
781
782 # Forward pass to calculate the average loss per example
--> 783 loss = self._loss_mu(l2=self.train_config.l2)
784 if torch.isnan(loss):
785 msg = "Loss is NaN. Consider reducing learning rate."

/etc/anaconda3/envs/ml/lib/python3.7/site-packages/snorkel/labeling/model/label_model.py in _loss_mu(self, l2)
522 Overall mu loss between learned mu and initial mu
523 """
--> 524 loss_1 = torch.norm((self.O - self.mu @ self.P @ self.mu.t())[self.mask]) ** 2
525 loss_2 = torch.norm(torch.sum(self.mu @ self.P, 1) - torch.diag(self.O)) ** 2
526 return loss_1 + loss_2 + self._loss_l2(l2=l2)

RuntimeError: Expected object of backend CUDA but got backend CPU for argument #2 'mat2'

Code example/repro steps

I added device=torch.device('cuda') to the line that defines the LabelModel in recsys_tutorial from this example.

System info

Snorkel installed from requirements.txt

OS: Linux Kubuntu 18.04
Python version: 3.7.3
Snorkel version: 0.9.0
Versions of any other relevant libraries: torch 1.1.0


henryre · 2019-08-27T16:13:45Z

Hi @lambdaofgod, thanks for reporting this! I'll repro later today, but looking at the code, it looks like O and P are not correctly registered as nn.Parameters and so won't be moved to GPU when the parent module is. We'll submit a fix for ~~v0.9.1~~ v0.9.2. The label model (even with lots of data points and dozens of LFs) often trains in a matter of seconds on CPU, so we don't really use GPU support for it internally. Are you able to train it quickly enough on CPU?
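To illustrate the diagnosis above, here is a minimal sketch of how PyTorch decides which tensors follow a module when it is moved. The `Toy` class and the `follows_module` helper are hypothetical and are not Snorkel's LabelModel code; they only reuse the names mu, P and O from the traceback. The sketch assumes that registering a tensor as a buffer (or as an nn.Parameter) is one way to make it move with the module, which may differ from the fix that was actually merged.

```python
import torch
from torch import nn


class Toy(nn.Module):
    """Hypothetical stand-in for illustration; not Snorkel's LabelModel."""

    def __init__(self) -> None:
        super().__init__()
        # Registered parameter: moved by .to(device) and trained by optimizers.
        self.mu = nn.Parameter(torch.ones(2, 2))
        # Registered buffer: moved by .to(device) but not trained.
        self.register_buffer("P", torch.eye(2))
        # Plain tensor attribute: the module does not track it at all.
        self.O = torch.eye(2)


def follows_module(module: nn.Module) -> dict:
    """Report, for mu, P and O, whether the module tracks the tensor."""
    tracked = set(dict(module.named_parameters())) | set(dict(module.named_buffers()))
    return {name: name in tracked for name in ("mu", "P", "O")}


toy = Toy()
print(follows_module(toy))

# Falls back to CPU when no GPU is present, so the sketch runs anywhere.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
toy.to(device)
print({name: getattr(toy, name).device.type for name in ("mu", "P", "O")})
```

On a machine with a GPU, the second print should show the untracked attribute O still on the CPU while mu and P have moved, which is the kind of device mismatch that makes a matrix product such as `self.mu @ self.P` fail.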

henryre self-assigned this Aug 27, 2019

henryre added the bug label Aug 27, 2019

paroma mentioned this issue Sep 20, 2019: Option to run LabelModel on GPU #1466 (merged)

paroma closed this as completed in #1466 Sep 20, 2019
