This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

Cannot find the stdout/stderr log of a long run job #4777

Closed
zheng-ningxin opened this issue Aug 3, 2020 · 7 comments · Fixed by #4792

Comments

@zheng-ningxin

zheng-ningxin commented Aug 3, 2020

Organization Name: MS internal usage

Short summary about the issue/question:
Hi, I ran into a problem when using OpenPAI to run training jobs. I think it may confuse other users, so I am raising this issue.
I submitted a task, and after it completed I wanted to check its stdout, but unexpectedly found that the output was empty.


Later, I found that the output of the task had been compressed into the log folder, and the web page no longer displayed it.

I think this is confusing for users, which is why I am raising this issue.

@fanyangCS
Contributor

fanyangCS commented Aug 3, 2020

@zheng-ningxin, please share the job URL with us.

@zheng-ningxin
Author

@fanyangCS
Contributor

@Binyang2014, please take a look.

@Binyang2014
Contributor

This is caused by logrotate. The current logrotate config is:

{
    nomail
    weekly
    rotate 4
    compress
    missingok
    notifempty
    copytruncate
    maxsize 300M
    maxage 30
}

With this config, logrotate rotates and compresses a log once it grows beyond 300M or once the weekly interval has elapsed. This job started on 7.28, about one week ago, so its log has been rotated.
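
To make the effect on the web page concrete, here is a rough Python simulation of what `copytruncate` plus `compress` do to a log file: the content is copied into a compressed archive and the live file is truncated in place, so anything that reads the live file afterwards sees it empty. This is only a sketch and not logrotate's implementation. The file name `stdout` and the `.1.gz` suffix are assumptions made for the demo.

```python
import gzip
import os
import shutil
import tempfile


def simulate_copytruncate(log_path):
    """Mimic `copytruncate` + `compress`: archive the content, then empty the live file."""
    rotated_path = log_path + ".1.gz"  # assumed archive name for this demo
    with open(log_path, "rb") as src, gzip.open(rotated_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # Truncate in place so a process that still holds the file open keeps writing to it.
    with open(log_path, "r+b") as live:
        live.truncate(0)
    return rotated_path


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "stdout")  # hypothetical log file
        with open(log_path, "w") as f:
            f.write("epoch 1 done\n")
        rotated = simulate_copytruncate(log_path)
        live_size = os.path.getsize(log_path)
        with gzip.open(rotated, "rt") as f:
            archived = f.read()
        return live_size, archived
```

After `demo()` runs, the live file has size 0 and the original line is only available from the compressed archive, which matches the empty stdout reported above.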

@fanyangCS
Contributor

We need a complete design for log management, including viewing, archiving, etc.

@zheng-ningxin
Author

Yes, I understand that log rotation is necessary. However, I think that after rotation the website should still show part of the output log, or inform users that their output has been compressed.
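
One way this suggestion could look, as a minimal sketch only: if the live log is empty, fall back to the most recently modified compressed archive next to it and return a short notice for the page to display. The function name `read_log_with_fallback`, the `.*.gz` archive pattern, and the notice text are all hypothetical and are not part of OpenPAI.

```python
import glob
import gzip
import os
import tempfile


def read_log_with_fallback(log_path, max_lines=100):
    """Return (lines, notice); notice is None unless a rotated archive was used."""
    if os.path.exists(log_path) and os.path.getsize(log_path) > 0:
        with open(log_path, "r", errors="replace") as f:
            return f.readlines()[-max_lines:], None
    # Assumed naming: rotated archives sit next to the log as <log>.<n>.gz
    archives = glob.glob(glob.escape(log_path) + ".*.gz")
    if not archives:
        return [], None
    newest = max(archives, key=os.path.getmtime)
    with gzip.open(newest, "rt", errors="replace") as f:
        lines = f.readlines()[-max_lines:]
    notice = "Log was rotated and compressed; showing the last {} line(s) from {}.".format(
        len(lines), os.path.basename(newest)
    )
    return lines, notice


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        log_path = os.path.join(tmp, "stdout")  # hypothetical log file
        open(log_path, "w").close()  # live log is empty after rotation
        with gzip.open(log_path + ".1.gz", "wt") as f:
            f.write("step 1\nstep 2\n")
        return read_log_with_fallback(log_path, max_lines=1)
```

Reading the whole archive to take its tail is fine for a demo, but for archives near the 300M limit a streaming tail would be a better choice.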

@Binyang2014
Contributor

According to https://manpages.debian.org/jessie/logrotate/logrotate.8.en.html, logs are rotated on the first day of the week, so the old log is rotated every Sunday.
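
As I read the `weekly` and `maxsize` entries in that man page, a log is due when the current weekday is earlier than the weekday of the last rotation, or when more than a week has passed, or when it exceeds `maxsize` regardless of the interval. The sketch below models that rule in Python under those assumptions. It is a simplification rather than logrotate's code: the function names are made up, and treating the job start as the "last rotation" time is an assumption.

```python
from datetime import datetime, timedelta

MAXSIZE_BYTES = 300 * 1024 * 1024  # maxsize 300M


def sunday_based_weekday(d):
    """Weekday with Sunday = 0, so the week starts on Sunday."""
    return (d.weekday() + 1) % 7


def weekly_due(last_rotation, now):
    """True if more than a week passed or a new week (Sunday) has started."""
    if now - last_rotation > timedelta(days=7):
        return True
    return sunday_based_weekday(now) < sunday_based_weekday(last_rotation)


def should_rotate(size_bytes, last_rotation, now):
    if size_bytes == 0:
        return False  # notifempty: never rotate an empty log
    return size_bytes > MAXSIZE_BYTES or weekly_due(last_rotation, now)
```

For example, with a last rotation time of Tuesday 2020-07-28, `weekly_due` is false on Thursday 2020-07-30 and true on Sunday 2020-08-02, which is consistent with the log already being rotated when this issue was opened.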
