
[question] libtpu.so already in used by another process. #3214

Closed
MukundVarmaT opened this issue Nov 15, 2021 · 6 comments
Comments

@MukundVarmaT
Hi,
I was recently trying to set up a training experiment with torch and TPUs (v3-8). While the code works as expected, I get a repeated warning message, "libtpu.so already in used by another process. Not attempting to load libtpu.so in this process.", which is really annoying as it appears multiple times after each epoch. I only notice this when training on multiple cores, not on a single core.
It would be really helpful if you could suggest a way to suppress or resolve these warnings. Thanks!

@JackCaoG
Collaborator

Hi @MukundVarmaT. This problem should be solved with the latest tpu-vm-pt-1.10 image (with pt/xla 1.10 preinstalled).

@MukundVarmaT
Author

MukundVarmaT commented Nov 15, 2021

Hi @JackCaoG, I am using the latest tpu-vm-pt-1.10 image but still get these warnings.

@JackCaoG
Collaborator

Sorry @MukundVarmaT, I thought we had fixed the issue with the new image, but it seems we didn't. We will work on a fix for the default image.

In the meantime, can you run

sudo pip3 install https://storage.googleapis.com/cloud-tpu-tpuvm-artifacts/wheels/libtpu-nightly/libtpu_nightly-0.1.dev20211015-py3-none-any.whl

to bypass this error?

@MukundVarmaT
Author

MukundVarmaT commented Nov 16, 2021

@JackCaoG Yep, that solves the problem.
Hi, just a follow-up (possibly unrelated) question: after the libtpu installation, when running on multiple cores, all xm.xla_device() calls return "xla:0" except for one, which returns "xla:1". Is this expected? Shouldn't they range from "xla:0" to "xla:7"? PS: before the libtpu installation, it used to print different device ids.

@JackCaoG
Collaborator

It is expected. For one of the processes, xla:0 is a CPU device, so that process uses xla:1.

process 1: xla:0 (CPU), xla:1 (TPU)
process 2: xla:0
process 3: xla:0
process 4: xla:0
process 5: xla:0
process 6: xla:0
process 7: xla:0
process 8: xla:0
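A minimal sketch of how to see this per process (assuming torch_xla 1.10-style APIs; xmp.spawn, xm.xla_device, and xm.get_ordinal are standard torch_xla calls, and the 8-core v3-8 setup from this thread):

# Print the per-process XLA device string and the global ordinal on a v3-8.
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp

def _mp_fn(index):
    # xm.xla_device() returns the device assigned to this process (e.g. "xla:0" or "xla:1");
    # xm.get_ordinal() is the global ordinal (0..7), which is what per-core logic should key on.
    device = xm.xla_device()
    print(f"ordinal {xm.get_ordinal()} -> {device}")

if __name__ == "__main__":
    xmp.spawn(_mp_fn, args=(), nprocs=8, start_method="fork")

The takeaway from the listing above is that the device string is local to each process, so using xm.get_ordinal() rather than the device name is the safer way to tell the cores apart.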

@ronghanghu
Collaborator

I once encountered a frequent gRPC error on tpu-vm-pt-1.10. After upgrading to libtpu_nightly-0.1.dev20211015-py3-none-any.whl, the error seems to be gone for me.
