❓ Questions and Help
I encountered a discrepancy when training a model with PyTorch XLA on a TPU: the results differed significantly from those obtained on CPU or GPU. After investigating with a toy example, I noticed that predictions made with PyTorch XLA were not consistent with those made on the CPU. Interestingly, when training on TPU with PyTorch Lightning, the results were identical to the CPU output, even though PyTorch Lightning itself relies on PyTorch XLA for TPU training. This led me to suspect that there may be some device-specific differences or initialization issues when using PyTorch XLA directly.
Code
import numpy as np
import torch
from torchvision import models
import torch_xla.core.xla_model as xm

def generate_random_data(batch_size=1, num_channels=3, height=224, width=224):
    return torch.randn(batch_size, num_channels, height, width, dtype=torch.float32)

def load_model():
    return models.efficientnet_b0(weights='DEFAULT').eval()

def inference_on_device(model, device, data):
    # Helper elided in the original report; reconstructed here: run a
    # forward pass on `device` and return the output on the host as NumPy.
    model = model.to(device)
    with torch.no_grad():
        return model(data.to(device)).cpu().numpy()

random_data = generate_random_data()
# CPU inference
model = load_model()
cpu_result = inference_on_device(model, torch.device('cpu'), random_data)
# TPU inference on a single XLA core
tpu_device = xm.xla_device()
tpu_result = inference_on_device(model, tpu_device, random_data)
# Compare CPU and TPU results
print("Difference between CPU and XLA TPU results:", np.abs(cpu_result - tpu_result).max())
Output
Difference between CPU and XLA TPU results: 0.025713682
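For context, TPUs perform matrix multiplications in reduced precision (bfloat16 multiplies with float32 accumulation), so an absolute gap of this size on raw logits is often proportionally small. A relative comparison can help judge whether the difference is within mixed-precision expectations; a minimal sketch, assuming the cpu_result and tpu_result arrays from the snippet above (the tolerances are illustrative, not from the original report):

# Hypothetical follow-up check, not part of the original report:
# express the gap relative to the magnitude of the CPU output.
rel_err = np.abs(cpu_result - tpu_result) / (np.abs(cpu_result) + 1e-8)
print("Max relative error:", rel_err.max())
# A loose-tolerance allclose shows whether the outputs agree within
# typical reduced-precision expectations.
print("Close within rtol=1e-2, atol=1e-2:",
      np.allclose(cpu_result, tpu_result, rtol=1e-2, atol=1e-2))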
Best regards.