This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Tensorboard logdir not right #4618

Closed
lbin opened this issue Jun 12, 2020 · 16 comments

Comments

@lbin

lbin commented Jun 12, 2020

Data location: path:/mnt/tensorboard

/opt/conda/bin/python /opt/conda/bin/tensorboard --logdir=path:/mnt/tensorboard --port=12148 --bind_all

The path: prefix might not be required.

I killed
/opt/conda/bin/python /opt/conda/bin/tensorboard --logdir=path:/mnt/tensorboard --port=12148 --bind_all

and reran it as
/opt/conda/bin/python /opt/conda/bin/tensorboard --logdir=/mnt/tensorboard --port=12148 --bind_all

After that, Tensorboard worked.

@scarlett2018
Member

@lbin, are you suggesting a bug fix? Would you like to submit a PR for this fix?

@hzy46
Contributor

hzy46 commented Jun 15, 2020

Hi lbin, where did you see the command /opt/conda/bin/python /opt/conda/bin/tensorboard --logdir=path:/mnt/tensorboard --port=12148 --bind_all? Is it part of OpenPAI's documentation? I think the usage of --logdir=path:/mnt/tensorboard is wrong.

@lbin
Author

lbin commented Jun 15, 2020

> Hi lbin, where did you see the command /opt/conda/bin/python /opt/conda/bin/tensorboard --logdir=path:/mnt/tensorboard --port=12148 --bind_all? Is it a part of OpenPAI's document? I think the usage of --logdir=path:/mnt/tensorboard is wrong.

I used ssh to log in to the container and 'htop' to find this command. @hzy46

@hzy46
Contributor

hzy46 commented Jun 15, 2020

What is the version of your OpenPAI?

@lbin
Author

lbin commented Jun 15, 2020

> What is the version of your OpenPAI?

v1.0.1

@hzy46
Contributor

hzy46 commented Jun 15, 2020

Can you share your job configuration? I'm going to reproduce this bug.

@lbin
Author

lbin commented Jun 15, 2020

> @lbin are you suggesting a bug fix? would you like to submit a PR for this fix?

https://github.com/microsoft/pai/blob/v1.0.1/contrib/submit-job-v2/src/App/TensorBoard.tsx#L408 Maybe the wrong path is generated here, but I am not familiar with TSX or the PAI framework. @scarlett2018 @hzy46

@lbin
Author

lbin commented Jun 15, 2020

https://github.com/microsoft/pai/blob/v1.0.1/contrib/submit-job-v2/src/App/TensorBoard.tsx#L385

Object.keys(logDirectories).forEach((key) => { logPathList.push(`${logDirectories[key]}`); });
@hzy46, is this right?

@hzy46
Contributor

hzy46 commented Jun 15, 2020

@lbin, after carefully checking the tensorboard command, I found that --logdir=path:/mnt/tensorboard is not wrong. It is a valid setting.

Here is the help for tensorboard:

[image: screenshot of the tensorboard help output]

You can see that tensorboard allows formats like --logdir=name1:/path/to/logs/1,name2:/path/to/logs/2. I tried --logdir=path:/mnt/tensorboard and it worked. I'm not sure whether this is a tensorboard version problem. My tensorboard version is 1.15.0.
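
To make the name:path format above concrete, here is a minimal shell sketch that splits such a value into its entries. The function name split_logdir_spec is hypothetical and is not part of tensorboard or OpenPAI; the sketch is simplified and does not try to reproduce tensorboard's own parsing (for example, entries without a name are not handled).

```shell
#!/bin/sh
# Hypothetical helper (illustration only): split a comma-delimited
# "name:path" value into one "name -> path" line per entry.
split_logdir_spec() {
  # Put each comma-separated entry on its own line, then split every
  # entry at the first colon. Anything after that colon stays in "path".
  echo "$1" | tr ',' '\n' | while IFS=: read -r name path; do
    echo "$name -> $path"
  done
}

split_logdir_spec "name1:/path/to/logs/1,name2:/path/to/logs/2"
```

Read this way, path:/mnt/tensorboard is a single entry named "path" that points at /mnt/tensorboard.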

@lbin
Author

lbin commented Jun 15, 2020

my tensorboard version is 2.2.2

@lbin
Author

lbin commented Jun 15, 2020

[image: Screen Shot 2020-06-15 at 16 20 39]

@hzy46
Contributor

hzy46 commented Jun 15, 2020

Thanks, I found that this is a breaking change in tensorboard v2.0.0. Refer to https://github.com/tensorflow/tensorboard/releases/tag/2.0.0:

The --logdir flag no longer supports passing multiple comma-delimited paths,
which means that it now supports paths containing literal comma and colon
characters, like ./logs/m=10,n=20,lr=0.001 or ./logs/run_12:30:15. To
mimic the old behavior, prefer using a tree of symlinks as it works with more
plugins, but as a fallback the flag --logdir_spec exposes the old behavior.
See PR 2664.

I think our tensorboard plugin should use --logdir_spec when tensorboard >= v2.0.0.
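
The version check suggested above could be sketched as follows. This is only an illustration: the function name logdir_flag_for_version is hypothetical, and the version string is passed in as an argument rather than queried from tensorboard, so how the plugin would obtain it is left open.

```shell
#!/bin/sh
# Hypothetical helper (not OpenPAI code): choose the flag name from a
# tensorboard version string such as "1.15.0" or "2.2.2".
logdir_flag_for_version() {
  major=${1%%.*}   # keep only the part before the first dot
  if [ "$major" -ge 2 ]; then
    echo "--logdir_spec"
  else
    echo "--logdir"
  fi
}

# The two versions mentioned in this thread:
logdir_flag_for_version 1.15.0
logdir_flag_for_version 2.2.2
```

With the two versions from this thread, the first call prints --logdir and the second prints --logdir_spec.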

@hzy46
Contributor

hzy46 commented Jun 15, 2020

@Binyang2014
Contributor

As described above, the --logdir_spec flag is discouraged and can usually be avoided. @lbin, can you use symlinks, as suggested above, to avoid using --logdir_spec?
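
A minimal sketch of the symlink approach, assuming two log directories that should appear under the names name1 and name2. Every path here is made up for illustration and lives under a temporary directory; in a real job the link targets would be the actual log locations.

```shell
#!/bin/sh
# Illustration only: a temporary directory stands in for real log locations.
work=$(mktemp -d)
mkdir -p "$work/logs/job_a" "$work/logs/job_b" "$work/tb_root"

# One symlink per entry. The link name takes over the role of the old
# "name:" prefix, and the link target is the real log directory.
ln -s "$work/logs/job_a" "$work/tb_root/name1"
ln -s "$work/logs/job_b" "$work/tb_root/name2"

ls "$work/tb_root"

# tensorboard would then be pointed at the single parent directory, e.g.:
#   tensorboard --logdir="$work/tb_root" --port=12148 --bind_all
```

Because --logdir then receives one plain path, no comma- or colon-delimited value is needed.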

@lbin
Author

lbin commented Jun 15, 2020

> As the describe above for --logdir_spec. This flag is discouraged and can usually be avoided. @lbin can you use symlinks as the command suggested to avoid using --logdir_spec?

I downgraded my tensorboard~

Binyang2014 added a commit to microsoft/openpai-runtime that referenced this issue Jun 16, 2020
Original issue: microsoft/pai#4618.

When using tensorboard v2:
If logdir has only one entry, use the `--logdir` option.
If logdir contains multiple entries, use the `--logdir_spec` option.
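
The rule in this commit message can be sketched in shell as below. This is not the code from the commit: the function name build_logdir_arg is hypothetical, and dropping the name: prefix in the single-entry case is an assumption based on the command that worked earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not the actual fix): takes "name:path" entries
# and prints the argument to pass to tensorboard v2.
build_logdir_arg() {
  if [ "$#" -eq 1 ]; then
    # Single entry: assumed behaviour is to drop the "name:" prefix
    # and pass the bare path to --logdir.
    echo "--logdir=${1#*:}"
  else
    # Several entries: join them with commas for --logdir_spec.
    spec=$(IFS=,; echo "$*")
    echo "--logdir_spec=$spec"
  fi
}

build_logdir_arg path:/mnt/tensorboard
build_logdir_arg name1:/path/to/logs/1 name2:/path/to/logs/2
```

The first call prints --logdir=/mnt/tensorboard, which matches the command that worked with tensorboard 2.2.2 above. The second prints --logdir_spec=name1:/path/to/logs/1,name2:/path/to/logs/2.
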
@Binyang2014
Contributor

Fixed by: #4658
