❓ Questions and Help
I encountered a discrepancy when training a model with PyTorch XLA on a TPU: the results differed significantly from those obtained on CPU or GPU. After investigating with a toy example, I noticed that predictions made with PyTorch XLA were not consistent with those made on the CPU. Interestingly, when training on TPU with PyTorch Lightning, the results were identical to the CPU output, even though PyTorch Lightning itself relies on PyTorch XLA for TPU training. This led me to suspect that there may be some device-specific differences or initialization issues when using PyTorch XLA directly.
Code
import numpy as np
import torch
from torchvision import models
import torch_xla.core.xla_model as xm

def generate_random_data(batch_size=1, num_channels=3, height=224, width=224):
    return torch.randn(batch_size, num_channels, height, width, dtype=torch.float32)

def load_model():
    return models.efficientnet_b0(weights='DEFAULT').eval()

def inference_on_device(model, device, data):
    # Helper elided in the original report; reconstructed here: run a
    # forward pass on `device` and return the output on the host as NumPy.
    model = model.to(device)
    with torch.no_grad():
        return model(data.to(device)).cpu().numpy()

random_data = generate_random_data()
# CPU inference
model = load_model()
cpu_result = inference_on_device(model, torch.device('cpu'), random_data)
# TPU inference on a single XLA core
tpu_device = xm.xla_device()
tpu_result = inference_on_device(model, tpu_device, random_data)
# Compare CPU and TPU results
print("Difference between CPU and XLA TPU results:", np.abs(cpu_result - tpu_result).max())
Output
Difference between CPU and XLA TPU results: 0.025713682
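For context, TPUs perform matrix multiplications in reduced precision (bfloat16 multiplies with float32 accumulation), so an absolute gap of this size on raw logits is often proportionally small. A relative comparison can help judge whether the difference is within mixed-precision expectations; a minimal sketch, assuming the cpu_result and tpu_result arrays from the snippet above (the tolerances are illustrative, not from the original report):

# Hypothetical follow-up check, not part of the original report:
# express the gap relative to the magnitude of the CPU output.
rel_err = np.abs(cpu_result - tpu_result) / (np.abs(cpu_result) + 1e-8)
print("Max relative error:", rel_err.max())
# A loose-tolerance allclose shows whether the outputs agree within
# typical reduced-precision expectations.
print("Close within rtol=1e-2, atol=1e-2:",
      np.allclose(cpu_result, tpu_result, rtol=1e-2, atol=1e-2))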
Best regards.