
[RFC] Logging configuration and message formats #30005

Closed · peytondmurray opened this issue Nov 3, 2022 · 12 comments
Labels: enhancement (Request for new feature and/or capability) · observability (Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling) · stale (The issue is stale. It will be closed within 7 days unless there is further conversation.)

Comments

peytondmurray (Contributor) commented Nov 3, 2022

Description

In the interest of improving both the user and development experiences with Ray, and after having a discussion with @xwjiang2010, I'm opening this issue to solicit community input about Ray's logging capabilities.

Background

Logging configuration

Logging in Ray is currently set up in multiple places. Some modules set custom logging configurations with calls to basicConfig. This makes sense to do inside separate processes because each process has its own root logger, although there are instances where basicConfig is called inside the main process.

From the Python docs, basicConfig

does nothing if the root logger already has handlers configured, unless the keyword argument force is set to True.

In principle I think this means that only the first call to basicConfig modifies the root logger, and all subsequent calls have no effect. Additionally, modifying the root logger may not even be intended, because Ray's main logger is set up with the name ray here.
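
For example, the following minimal standalone sketch demonstrates this behavior (force= requires Python 3.8+):

import logging

# The first call attaches a handler to the root logger.
logging.basicConfig(format="%(levelname)s %(message)s", level=logging.INFO)
logging.info("formatted by the first call")

# This second call is silently ignored: the root logger already has a handler.
logging.basicConfig(format="%(asctime)s %(message)s")
logging.info("still formatted by the first call")

# Only force=True removes the existing handlers and reconfigures.
logging.basicConfig(format="%(asctime)s %(message)s", force=True)
logging.info("now formatted by the third call")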

Currently the code to set up the main logger, ray (setup_logger), also appears in a number of different places in order to capture all the different entry points for Ray execution, e.g. calling ray.init(), using the CLI to run ray start, etc.

Log output

Currently, a number of different Ray use cases generate a lot of output, which sometimes makes it hard to identify important information.

[Screenshot: example log output from a Ray run, with messages from several Ray libraries interleaved with training results]

In this example there are logging statements from multiple Ray libraries, but it's hard to tell at a glance which ones they are. This is partly because the training results are interspersed throughout, but also because each Ray library uses a different logging formatter, so messages from different parts of the code have different formats. As part of this effort, I'm hoping we can improve the clarity of these logging statements so that they are more visible when shown to the user.

Log Rotation

Another common requested feature is to allow for log rotation, particularly with respect to progress statements; see #28268.

Proposal

One option is to create a logger for each Ray library (tune, train, air, etc), and convert existing logging calls to use the logger from the appropriate library. Logging configuration would be done in the __init__.py for each library, helping to make configuration more uniform and transparent across different modules, and making it easier to address the issue of log rotation.
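
As a rough sketch of what this could look like (the handler choices, format string, and placement in a hypothetical ray/tune/__init__.py are illustrative assumptions, not a description of Ray's actual code):

import logging
from logging.handlers import RotatingFileHandler

# Illustrative per-library setup, e.g. in ray/tune/__init__.py.
logger = logging.getLogger("ray.tune")

console = logging.StreamHandler()
console.setFormatter(
    logging.Formatter("[%(asctime)s][Ray Tune] %(levelname)-8s %(message)s")
)
logger.addHandler(console)

# Centralizing configuration here also makes log rotation straightforward.
rotating = RotatingFileHandler("ray-tune.log", maxBytes=10_000_000, backupCount=5)
logger.addHandler(rotating)
logger.setLevel(logging.INFO)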

To improve log message clarity, we can append the name of the Ray library to the logging statement to make it clear what part of the code the log message is being emitted from, and use similar message formats across multiple modules. For example:

[2022-11-02 13:19:32][Ray Core] INFO     Started a local Ray instance. View the dashboard at 127.0.0.1:8265        worker.py:1527
[2022-11-03 10:24:25][Ray Tune] INFO     Total run time: 5.89 seconds (5.79 seconds for the tuning loop).          tune.py:787

The time, Ray library issuing the log statement, logging level, and location in the source where the message is emitted are clearly stated, and different Ray libraries have similar message formats.

Note: This RFC does not include any changes to remote actor logs, only driver-side logging structure. Remote actor logs are for the most part handled on the user side, and so are outside the scope of this proposal.

I'd also be interested in input from the Ray community, so leave a comment if you have any feedback about anything discussed here. @xwjiang2010 Let me know if I missed anything here!

peytondmurray added the enhancement label Nov 3, 2022
xwjiang2010 (Contributor) commented:

Tagging a few people for feedback:
@rkooo567 @ericl

xwjiang2010 added the observability label Nov 3, 2022
rkooo567 (Contributor) commented Nov 4, 2022

One option is to create a logger for each Ray library (tune, train, air, etc), and convert existing logging calls to use the logger from the appropriate library. Logging configuration would be done in the __init__.py for each library, helping to make configuration more uniform and transparent across different modules, and making it easier to address the issue of log rotation.

This sgtm!

The time, Ray library issuing the log statement, logging level, and location in the source where the message is emitted are clearly stated, and different Ray libraries have similar message formats.

Also I like this idea. Maybe Ray can provide a public API to standardize some logging output in the long term.

Btw, should we also enable propagate? I found that all of Ray's loggers have logger.propagate = False, which takes control of the logs away from the user. I think logs printed to the user program's stdout should all be controlled by the user's own logger configuration.
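
A small standalone demonstration of the issue (the logger name is just a stand-in):

import logging

# The user configures the root logger...
logging.basicConfig(level=logging.DEBUG, format="USER: %(message)s")

# ...but a library logger with propagate = False never forwards records to
# the root handlers, so the user's handler and format are bypassed.
lib_logger = logging.getLogger("ray")  # stand-in for Ray's logger setup
lib_logger.addHandler(logging.StreamHandler())
lib_logger.propagate = False

lib_logger.warning("emitted only through the library's own handler")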

peytondmurray (Contributor, Author) commented:

I think logs printed to user program's stdout should all be controlled by main user's logger configuration.

💯 Agreed; when a user can't configure logging, it can be very frustrating.

rkooo567 (Contributor) commented Nov 10, 2022

Actually, I did some additional research here.

  1. We should allow users to specify the handler. Right now, we only allow customizing the formatter / logger, and all logs are force-streamed to stderr (StreamHandler).
  2. Following from 1, the user should be able to stream Ray logs to their root logger. Maybe logging_handler = None could do this.

from the appropriate library. Logging configuration would be done in the __init__.py for each library,

After some research, this may not be the best idea. The reason is that the logging configuration happens at import time if we add it to __init__.py, meaning users lose control over logs from Ray (this basically means all users are forced to use the default configuration of Ray's loggers). It seems like this should happen after the user's root logging configuration, or after they specify the handler. @peytondmurray do you know how other libraries configure their loggers?

richardliaw (Contributor) commented:

Also cc @scottsun94

peytondmurray (Contributor, Author) commented:

Do you know how other libraries configure their loggers?

@rkooo567 This is a much tougher question than I originally thought, because there isn't much inspiration to be found in the libraries I looked at. I surveyed a handful of major scientific Python libraries for insight, and here is what I found:

  • numpy: no logging at all
  • scipy: no logging anywhere except in the scipy.optimize package, which logs to the root logger; there doesn't seem to be documentation on the logging approach taken.
  • matplotlib: logging configuration is done in matplotlib.__init__.py:
_log = logging.getLogger(__name__)

# The decorator ensures this always returns the same handler (and it is only
# attached once).
@functools.lru_cache()
def _ensure_handler():
    """
    The first time this function is called, attach a `StreamHandler` using the
    same format as `logging.basicConfig` to the Matplotlib root logger.

    Return this handler every time this function is called.
    """
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(logging.BASIC_FORMAT))
    _log.addHandler(handler)
    return handler

Any of the matplotlib submodules just call

import logging

_log = logging.getLogger(__name__)

when they need to log something. Since child loggers by default propagate messages up to the handlers attached to their ancestor loggers, matplotlib doesn't need to set configuration in every module that logs a message. This leaves the choice of how logs are handled entirely up to the user, in accordance with the Python docs here. Of all the libraries I looked at, I like matplotlib's logging configuration the best.
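
For example, this minimal sketch (the package names are made up) shows the same pattern:

import logging

# Handler attached once, at the package level, as matplotlib does...
pkg_log = logging.getLogger("mypkg")
pkg_log.addHandler(logging.StreamHandler())
pkg_log.setLevel(logging.INFO)

# ...while a submodule just asks for its own logger; records propagate up
# the dotted-name hierarchy to the package-level handler.
sub_log = logging.getLogger("mypkg.submodule")
sub_log.info("handled by the 'mypkg' handler via propagation")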

  • bokeh: A logger named 'bokeh' is defined in bokeh.util.logconfig. If there are no existing handlers attached to the root logger or the 'bokeh' logger, a basic logging handler is attached to 'bokeh', and log propagation is disabled for that logger. This has the effect of avoiding printing output twice if the root logger is configured later by the user, but I think this also means that the user loses control over the 'bokeh' logger. This logging configuration is disabled if bokeh static or bokeh serve are run, meaning that log messages are propagated up to the root. I'm not exactly sure why bokeh is configured this way, or what advantages this method has over a simpler approach.

  • pymc: A logger called 'pymc' is explicitly defined, and all log messages across the entire library are written to this logger. If the root logger has no handlers, a StreamHandler is added to the pymc logger at the INFO level inside pymc.__init__.py. I haven't tested this, but this configuration should emit duplicate log messages if the user does something like

import pymc
import logging

logging.basicConfig()

# then do something that creates a log entry

In this case, pymc creates a new logger ('pymc') with a StreamHandler that will write messages to the terminal - but I'm pretty sure the new logger will have propagate = True, so if basicConfig then generates a new handler and attaches it to the root logger, you should see each logging event recorded twice: once by the 'pymc' logger's handler and once by the root logger's. Since pymc only emits messages in pymc.stats.convergence, I'd need some example code with convergence problems to really test this, but that's how I understand the code here.
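
Here's a minimal standalone repro of that duplicate-output scenario, using a stand-in logger name rather than pymc itself:

import logging

# Library-style setup: attach a StreamHandler but leave propagate = True.
lib_log = logging.getLogger("pymclike")  # hypothetical stand-in for 'pymc'
lib_log.addHandler(logging.StreamHandler())
lib_log.setLevel(logging.INFO)

# The user later configures the root logger too...
logging.basicConfig(level=logging.INFO)

# ...so each record is emitted twice: once by the library's own handler and
# once by the root handler it propagates up to.
lib_log.info("this message appears twice")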

Moving forward

From the Logging HOWTO:

Note: It is strongly advised that you do not add any handlers other than NullHandler to your library’s loggers. This is because the configuration of handlers is the prerogative of the application developer who uses your library. The application developer knows their target audience and what handlers are most appropriate for their application: if you add handlers ‘under the hood’, you might well interfere with their ability to carry out unit tests and deliver logs which suit their requirements.

IMO there are really two different situations we need to think about when tackling the problem of logging in Ray:

  1. When a user integrates Ray as a component of a larger library or application, we want the user to have total control over logging. In this use case, I think we should not configure any logging output, in accordance with the Python docs I linked above. @rkooo567 mentioned this above, and I'm in agreement.

  2. If the user just wants to consume Ray output directly, for example training models in a notebook, we should offer a logging configuration with reasonable defaults. I think it would be good if users didn't have to manually set up the logging configuration in this case.

Here's one idea:

  1. Use logging.getLogger(__name__) in any module that needs logging. This means that we only need to configure the root logger with reasonable defaults for good-looking logs. With this approach we also avoid the problem of explicitly needing to define separate loggers (e.g. 'ray.data', 'ray.tune', etc), and developers wouldn't need to remember the appropriate logger for the ray package they are working with.
  2. Inside ray.init(), we set a reasonable default logging configuration, but only if the root logger has no logging configuration already set. This means if the user does something like
import ray
import numpy as np
import ...
import logging

...

logging.basicConfig(format="a custom logging format: %(asctime)s: %(message)s")

...

ray.init()

Then the logging configuration the user sets would take precedence. On the other hand, if the user doesn't do any configuration, they still get well-formatted custom logs provided by Ray automatically.

  3. @rkooo567 I think you might be right about not setting logging configuration in each Ray subpackage. I originally thought this would be necessary so that child loggers would know which subpackage they are a part of (so that we can add tags like [Ray Core], [Ray Tune], etc. to log messages as appropriate), but as it turns out you can do this by adding a custom filter to the main logging configuration anyway; see the sketch below.
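
Here's a minimal sketch of points 2 and 3 together; the helper and filter names are hypothetical, and the format string is only illustrative:

import logging

class RayLibraryTagFilter(logging.Filter):
    """Hypothetical filter: derive a '[Ray Tune]'-style tag from the logger's
    dotted name, so modules can just use logging.getLogger(__name__)."""

    _TAGS = {"ray.tune": "Ray Tune", "ray.train": "Ray Train", "ray.data": "Ray Data"}

    def filter(self, record):
        prefix = ".".join(record.name.split(".")[:2])
        record.ray_library = self._TAGS.get(prefix, "Ray Core")
        return True

def maybe_configure_default_logging():
    """Hypothetical hook for ray.init(): apply defaults only if the user
    hasn't already configured the root logger."""
    root = logging.getLogger()
    if root.handlers:
        return  # the user's configuration takes precedence
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(
        "[%(asctime)s][%(ray_library)s] %(levelname)-8s %(message)s"
    ))
    handler.addFilter(RayLibraryTagFilter())
    root.addHandler(handler)
    root.setLevel(logging.INFO)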

I made a simple example project which uses this "conditional configuration" approach here; if rich is installed, a good-looking log config is generated, but if not, a simple fallback is used. If the user specifies a configuration for the root logger, the user configuration is used.

xwjiang2010 (Contributor) commented:

Thanks @peytondmurray for the survey of various libraries.

I like the proposal in the sense that it offers good defaults as well as configurability. @rkooo567 would you be OK if we move forward like this?

rkooo567 (Contributor) commented:

I will take a look at it today.

rkooo567 (Contributor) commented Nov 29, 2022

The proposal sounds pretty good to me! Also thanks for the research on other library logging :).

Can you defer doing 2? I think we can configure the logger upon ray.init, but we don't need to check if the root logger is set up, for 2 reasons (sorry, I am going back and forth here lol).

  1. It is not a regression.
  2. These logs are "Ray internal logs", which are separate from "user logs" (logs written by users). I think there are 2 approaches here: (1) allow users to configure the handler & formatter, or (2) follow the root logger config. I originally thought (2) was the way to go, but I found it could actually cause problems depending on how users set up their root logger (e.g., if they have some other custom tagging, it could break). So I think we should go with (1): when ray.init is called, we allow users to configure the handler and formatter, with propagate=False.
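
A rough sketch of approach (1), assuming a hypothetical configuration helper rather than any existing Ray API:

import logging

def configure_ray_logging(handler, formatter):
    """Hypothetical helper invoked around ray.init(): the user supplies a
    handler and formatter, Ray attaches them to its own 'ray' logger, and
    propagate stays False so Ray logs never mix into the root logger."""
    ray_logger = logging.getLogger("ray")
    handler.setFormatter(formatter)
    ray_logger.handlers = [handler]
    ray_logger.propagate = False

# Example: send Ray's internal logs to a file with a custom format.
configure_ray_logging(
    logging.FileHandler("ray-internal.log"),
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"),
)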

rkooo567 (Contributor) commented:

what's the status for this btw?

stale bot commented May 12, 2023

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity within the next 14 days, the issue will be closed!

  • If you'd like to keep the issue open, just leave any comment, and the stale label will be removed!
  • If you'd like to get more attention to the issue, please tag one of Ray's contributors.

You can always ask for help on our discussion forum or Ray's public slack channel.

stale bot added the stale label May 12, 2023
stale bot commented Jun 11, 2023

Hi again! The issue will be closed because there has been no further activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!

stale bot closed this as completed Jun 11, 2023