#25871 slows down non-CPU runs #25944

ydshieh · 2023-09-04T09:33:59Z

(copy paste)

Using repr here significantly slows down non-CPU runs (-33% on HPU, probably similar numbers on GPU), which makes sense because repr copies the data from the device to the host.
Could we rely on type(x) instead?

Here is a code snippet to measure it:

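import time
import torch

# Same shape on both devices; the second tensor requires a CUDA device.
cpu_tensor = torch.ones(512, 512, device="cpu")
gpu_tensor = torch.ones(512, 512, device="cuda")

n = 100  # number of repr() calls per device

t0 = time.perf_counter()
for i in range(n):
    _ = repr(cpu_tensor)
t1 = time.perf_counter()
for i in range(n):
    # repr() has to copy the values to the host before formatting them.
    _ = repr(gpu_tensor)
t2 = time.perf_counter()

print("CPU time:", t1 - t0)
print("GPU time:", t2 - t1)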
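
To illustrate the type(x) suggestion, here is a minimal sketch that does not depend on torch. FakeDeviceTensor and both looks_like_tensor_* helpers are hypothetical names made up for this example; they are not the actual transformers code, and the counter only simulates a device-to-host copy. The point is that a check based on the class never touches the tensor's data, while a check based on repr(x) does.

```python
class FakeDeviceTensor:
    """Hypothetical stand-in for a tensor living on an accelerator."""

    host_copies = 0  # counts simulated device-to-host transfers

    def __repr__(self):
        # Formatting the values requires bringing them to the host first.
        FakeDeviceTensor.host_copies += 1
        return "tensor([[1., 1.], [1., 1.]], device='cuda:0')"


def looks_like_tensor_by_repr(x):
    # Inspects the value's repr, which triggers the (simulated) copy.
    return repr(x).startswith("tensor(")


def looks_like_tensor_by_type(x):
    # Inspects only the class, so the object's data is never read.
    return type(x).__name__ == "FakeDeviceTensor"


x = FakeDeviceTensor()

by_repr = looks_like_tensor_by_repr(x)
copies_after_repr = FakeDeviceTensor.host_copies

by_type = looks_like_tensor_by_type(x)
copies_after_type = FakeDeviceTensor.host_copies

print("repr-based check:", by_repr, "| copies so far:", copies_after_repr)
print("type-based check:", by_type, "| copies so far:", copies_after_type)
```

Both checks agree on the answer, but only the repr-based one increments the copy counter, which is the cost the timing snippet above measures on a real device.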

ydshieh self-assigned this Sep 4, 2023

ydshieh mentioned this issue Sep 4, 2023

Fix smart check #25955

Merged

ydshieh closed this as completed in #25955 Sep 4, 2023
