
[RFC] Logging configuration and message formats #30005

Closed · peytondmurray opened this issue Nov 3, 2022 · 12 comments
Labels: enhancement (Request for new feature and/or capability) · observability (Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling) · stale (The issue is stale. It will be closed within 7 days unless there is further conversation.)

Comments

peytondmurray (Contributor) commented Nov 3, 2022

Description

In the interest of improving both the user and development experiences with Ray, and after having a discussion with @xwjiang2010, I'm opening this issue to solicit community input about Ray's logging capabilities.

Background

Logging configuration

Logging in Ray is currently set up in multiple places. Some modules set custom logging configurations with calls to basicConfig. This makes sense to do inside separate processes because each process has its own root logger, although there are instances where basicConfig is called inside the main process.

From the Python docs, basicConfig

does nothing if the root logger already has handlers configured, unless the keyword argument force is set to True.

In principle I think this means that only the first call to basicConfig modifies the root logger, and all subsequent calls have no effect. Additionally, modifying the root logger may not even be intended, because Ray's main logger is set up with the name ray here.
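
For example, the following minimal standalone sketch demonstrates this behavior (force= requires Python 3.8+):

import logging

# The first call attaches a handler to the root logger.
logging.basicConfig(format="%(levelname)s %(message)s", level=logging.INFO)
logging.info("formatted by the first call")

# This second call is silently ignored: the root logger already has a handler.
logging.basicConfig(format="%(asctime)s %(message)s")
logging.info("still formatted by the first call")

# Only force=True removes the existing handlers and reconfigures.
logging.basicConfig(format="%(asctime)s %(message)s", force=True)
logging.info("now formatted by the third call")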

Currently the code to set up the main logger, ray (setup_logger), also appears in a number of different places in order to capture all the different entry points for Ray execution, e.g. calling ray.init(), using the CLI to run ray start, etc.

Log output

Currently, a number of different Ray use cases generate a lot of output, which sometimes makes it hard to identify important information.

[Screenshot: example log output from a Ray run, with messages from several Ray libraries interleaved with training results]

In this example there are logging statements from multiple Ray libraries, but it's hard to tell at a glance which ones they are. This is partly because the training results are interspersed throughout, but also because each Ray library uses a different logging formatter, so messages from different parts of the code have different formats. As part of this effort, I'm hoping we can improve the clarity of these logging statements so that they are more visible when shown to the user.

Log Rotation

Another common requested feature is to allow for log rotation, particularly with respect to progress statements; see #28268.

Proposal

One option is to create a logger for each Ray library (tune, train, air, etc), and convert existing logging calls to use the logger from the appropriate library. Logging configuration would be done in the __init__.py for each library, helping to make configuration more uniform and transparent across different modules, and making it easier to address the issue of log rotation.
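
As a rough sketch of what this could look like (the handler choices, format string, and placement in a hypothetical ray/tune/__init__.py are illustrative assumptions, not a description of Ray's actual code):

import logging
from logging.handlers import RotatingFileHandler

# Illustrative per-library setup, e.g. in ray/tune/__init__.py.
logger = logging.getLogger("ray.tune")

console = logging.StreamHandler()
console.setFormatter(
    logging.Formatter("[%(asctime)s][Ray Tune] %(levelname)-8s %(message)s")
)
logger.addHandler(console)

# Centralizing configuration here also makes log rotation straightforward.
rotating = RotatingFileHandler("ray-tune.log", maxBytes=10_000_000, backupCount=5)
logger.addHandler(rotating)
logger.setLevel(logging.INFO)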

To improve log message clarity, we can append the name of the Ray library to the logging statement to make it clear what part of the code the log message is being emitted from, and use similar message formats across multiple modules. For example:

[2022-11-02 13:19:32][Ray Core] INFO     Started a local Ray instance. View the dashboard at 127.0.0.1:8265        worker.py:1527
[2022-11-03 10:24:25][Ray Tune] INFO     Total run time: 5.89 seconds (5.79 seconds for the tuning loop).          tune.py:787

The time, Ray library issuing the log statement, logging level, and location in the source where the message is emitted are clearly stated, and different Ray libraries have similar message formats.

Note: This RFC does not include any changes to remote actor logs, only driver-side logging structure. Remote actor logs are for the most part handled on the user side, and so are outside the scope of this proposal.

I'd also be interested in input from the Ray community, so leave a comment if you have any feedback about anything discussed here. @xwjiang2010 Let me know if I missed anything here!

peytondmurray added the enhancement label Nov 3, 2022
xwjiang2010 (Contributor) commented:

Tagging a few people for feedback:
@rkooo567 @ericl

xwjiang2010 added the observability label Nov 3, 2022
rkooo567 (Contributor) commented Nov 4, 2022

One option is to create a logger for each Ray library (tune, train, air, etc), and convert existing logging calls to use the logger from the appropriate library. Logging configuration would be done in the __init__.py for each library, helping to make configuration more uniform and transparent across different modules, and making it easier to address the issue of log rotation.

This sgtm!

The time, Ray library issuing the log statement, logging level, and location in the source where the message is emitted are clearly stated, and different Ray libraries have similar message formats.

Also I like this idea. Maybe Ray can provide a public API to standardize some logging output in the long term.

Btw, should we also enable propagate? I found that all of Ray's loggers have logger.propagate = False, which takes control of the logs away from the user. I think logs printed to the user program's stdout should all be controlled by the user's own logger configuration.
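
A small standalone demonstration of the issue (the logger name is just a stand-in):

import logging

# The user configures the root logger...
logging.basicConfig(level=logging.DEBUG, format="USER: %(message)s")

# ...but a library logger with propagate = False never forwards records to
# the root handlers, so the user's handler and format are bypassed.
lib_logger = logging.getLogger("ray")  # stand-in for Ray's logger setup
lib_logger.addHandler(logging.StreamHandler())
lib_logger.propagate = False

lib_logger.warning("emitted only through the library's own handler")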

peytondmurray (Contributor, Author) commented:

I think logs printed to user program's stdout should all be controlled by main user's logger configuration.

💯 Agreed; when a user can't configure logging, it can be very frustrating.

rkooo567 (Contributor) commented Nov 10, 2022

Actually, I did some additional research here.

  1. We should allow users to specify the handler. Right now, we only allow customizing the formatter / logger, and all logs are force-streamed to stderr (StreamHandler).
  2. Following from 1, the user should be able to stream Ray logs to their root logger. Maybe logging_handler = None could do this.

from the appropriate library. Logging configuration would be done in the __init__.py for each library,

After some research, this may not be the best idea. The reason is that the logging configuration happens at import time if we add it to __init__.py, meaning users lose control over logs from Ray (this basically means all users are forced to use the default configuration of Ray's loggers). It seems like this should happen after the user's root logging configuration, or after they specify the handler. @peytondmurray do you know how other libraries configure their loggers?

richardliaw (Contributor) commented:

Also cc @scottsun94

peytondmurray (Contributor, Author) commented:

Do you know how other libraries configure their loggers?

@rkooo567 This is a much tougher question than I originally thought, because there isn't much inspiration to be found in the libraries I looked at. I surveyed a handful of major scientific Python libraries for insight, and here is what I found:

  • numpy: no logging at all
  • scipy: no logging anywhere except in the scipy.optimize package, which logs to the root logger; there doesn't seem to be documentation on the logging approach taken.
  • matplotlib: logging configuration is done in matplotlib.__init__.py:
_log = logging.getLogger(__name__)

# The decorator ensures this always returns the same handler (and it is only
# attached once).
@functools.lru_cache()
def _ensure_handler():
    """
    The first time this function is called, attach a `StreamHandler` using the
    same format as `logging.basicConfig` to the Matplotlib root logger.

    Return this handler every time this function is called.
    """
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(logging.BASIC_FORMAT))
    _log.addHandler(handler)
    return handler

Any of the matplotlib submodules just call

import logging

_log = logging.getLogger(__name__)

when they need to log something. Since child loggers by default propagate messages up to the handlers attached to their ancestor loggers, matplotlib doesn't need to set configuration in every module that logs a message. This leaves the choice of how logs are handled entirely up to the user, in accordance with the Python docs here. Of all the libraries I looked at, I like matplotlib's logging configuration the best.
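
For example, this minimal sketch (the package names are made up) shows the same pattern:

import logging

# Handler attached once, at the package level, as matplotlib does...
pkg_log = logging.getLogger("mypkg")
pkg_log.addHandler(logging.StreamHandler())
pkg_log.setLevel(logging.INFO)

# ...while a submodule just asks for its own logger; records propagate up
# the dotted-name hierarchy to the package-level handler.
sub_log = logging.getLogger("mypkg.submodule")
sub_log.info("handled by the 'mypkg' handler via propagation")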

  • bokeh: A logger named 'bokeh' is defined in bokeh.util.logconfig. If there are no existing handlers attached to the root logger or the 'bokeh' logger, a basic logging handler is attached to 'bokeh', and log propagation is disabled for that logger. This has the effect of avoiding printing output twice if the root logger is configured later by the user, but I think this also means that the user loses control over the 'bokeh' logger. This logging configuration is disabled if bokeh static or bokeh serve are run, meaning that log messages are propagated up to the root. I'm not exactly sure why bokeh is configured this way, or what advantages this method has over a simpler approach.

  • pymc: A logger called 'pymc' is explicitly defined, and all log messages across the entire library are written to this logger. If the root logger has no handlers, a StreamHandler is added to the pymc logger at the INFO level inside pymc.__init__.py. I haven't tested this, but this configuration should emit duplicate log messages if the user does something like

import pymc
import logging

logging.basicConfig()

# then do something that creates a log entry

In this case, pymc creates a new logger ('pymc') with a StreamHandler that will write messages to the terminal - but I'm pretty sure the new logger will have propagate = True, so if basicConfig then generates a new handler and attaches it to the root logger, you should see each logging event recorded twice: once by the 'pymc' logger's handler and once by the root logger's. Since pymc only emits messages in pymc.stats.convergence, I'd need some example code with convergence problems to really test this, but that's how I understand the code here.
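
Here's a minimal standalone repro of that duplicate-output scenario, using a stand-in logger name rather than pymc itself:

import logging

# Library-style setup: attach a StreamHandler but leave propagate = True.
lib_log = logging.getLogger("pymclike")  # hypothetical stand-in for 'pymc'
lib_log.addHandler(logging.StreamHandler())
lib_log.setLevel(logging.INFO)

# The user later configures the root logger too...
logging.basicConfig(level=logging.INFO)

# ...so each record is emitted twice: once by the library's own handler and
# once by the root handler it propagates up to.
lib_log.info("this message appears twice")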

Moving forward

From the Logging HOWTO:

Note: It is strongly advised that you do not add any handlers other than NullHandler to your library’s loggers. This is because the configuration of handlers is the prerogative of the application developer who uses your library. The application developer knows their target audience and what handlers are most appropriate for their application: if you add handlers ‘under the hood’, you might well interfere with their ability to carry out unit tests and deliver logs which suit their requirements.

IMO there are really two different situations we need to think about when tackling the problem of logging in Ray:

  1. When a user integrates Ray as a component of a larger library or application, we want the user to have total control over logging. In this use case, I think we should not configure any logging output, in accordance with the Python docs I linked above. @rkooo567 mentioned this above, and I'm in agreement.

  2. If the user just wants to consume Ray output directly, for example training models in a notebook, we should offer a logging configuration with reasonable defaults. I think it would be good if users didn't have to manually set up the logging configuration in this case.

Here's one idea:

  1. Use logging.getLogger(__name__) in any module that needs logging. This means that we only need to configure the root logger with reasonable defaults for good-looking logs. With this approach we also avoid the problem of explicitly needing to define separate loggers (e.g. 'ray.data', 'ray.tune', etc), and developers wouldn't need to remember the appropriate logger for the ray package they are working with.
  2. Inside ray.init(), we set a reasonable default logging configuration, but only if the root logger has no logging configuration already set. This means if the user does something like
import ray
import numpy as np
import ...
import logging

...

logging.basicConfig(format="a custom logging format: %(asctime)s: %(message)s")

...

ray.init()

Then the logging configuration the user sets would take precedence. On the other hand, if the user doesn't do any configuration, they still get well-formatted custom logs provided by Ray automatically.

  3. @rkooo567 I think you might be right about not setting logging configuration in each Ray subpackage. I originally thought this would be necessary so that child loggers would know which subpackage they are a part of (so that we can add tags like [Ray Core], [Ray Tune], etc. to log messages as appropriate), but as it turns out you can do this by adding a custom filter to the main logging configuration anyway; see the sketch below.
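
Here's a minimal sketch of points 2 and 3 together; the helper and filter names are hypothetical, and the format string is only illustrative:

import logging

class RayLibraryTagFilter(logging.Filter):
    """Hypothetical filter: derive a '[Ray Tune]'-style tag from the logger's
    dotted name, so modules can just use logging.getLogger(__name__)."""

    _TAGS = {"ray.tune": "Ray Tune", "ray.train": "Ray Train", "ray.data": "Ray Data"}

    def filter(self, record):
        prefix = ".".join(record.name.split(".")[:2])
        record.ray_library = self._TAGS.get(prefix, "Ray Core")
        return True

def maybe_configure_default_logging():
    """Hypothetical hook for ray.init(): apply defaults only if the user
    hasn't already configured the root logger."""
    root = logging.getLogger()
    if root.handlers:
        return  # the user's configuration takes precedence
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter(
        "[%(asctime)s][%(ray_library)s] %(levelname)-8s %(message)s"
    ))
    handler.addFilter(RayLibraryTagFilter())
    root.addHandler(handler)
    root.setLevel(logging.INFO)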

I made a simple example project which uses this "conditional configuration" approach here; if rich is installed, a good-looking log config is generated, but if not, a simple fallback is used. If the user specifies a configuration for the root logger, the user configuration is used.

xwjiang2010 (Contributor) commented:

Thanks @peytondmurray for the survey of various libraries.

I like the proposal in the sense that it offers good defaults as well as configurability. @rkooo567 would you be OK if we move forward like this?

rkooo567 (Contributor) commented:

I will take a look at it today.

rkooo567 (Contributor) commented Nov 29, 2022

The proposal sounds pretty good to me! Also thanks for the research on other library logging :).

Can you defer doing 2? I think we can configure the logger upon ray.init, but we don't need to check if the root logger is set up, for 2 reasons (sorry, I am going back and forth here lol).

  1. It is not a regression.
  2. These logs are "Ray internal logs", which are separate from "user logs" (logs written by users). I think there are 2 approaches here: (1) allow users to configure the handler & formatter, or (2) follow the root logger config. I originally thought (2) was the way to go, but I found it could actually cause problems depending on how users set up their root logger (e.g., if they have some other custom tagging, it could break). So I think we should go with (1): when ray.init is called, we allow users to configure the handler and formatter, with propagate=False.
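
A rough sketch of approach (1), assuming a hypothetical configuration helper rather than any existing Ray API:

import logging

def configure_ray_logging(handler, formatter):
    """Hypothetical helper invoked around ray.init(): the user supplies a
    handler and formatter, Ray attaches them to its own 'ray' logger, and
    propagate stays False so Ray logs never mix into the root logger."""
    ray_logger = logging.getLogger("ray")
    handler.setFormatter(formatter)
    ray_logger.handlers = [handler]
    ray_logger.propagate = False

# Example: send Ray's internal logs to a file with a custom format.
configure_ray_logging(
    logging.FileHandler("ray-internal.log"),
    logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"),
)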

rkooo567 (Contributor) commented:

what's the status for this btw?

stale bot commented May 12, 2023

Hi, I'm a bot from the Ray team :)

To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.

If there is no further activity within the next 14 days, the issue will be closed!

  • If you'd like to keep the issue open, just leave any comment, and the stale label will be removed!
  • If you'd like to get more attention to the issue, please tag one of Ray's contributors.

You can always ask for help on our discussion forum or Ray's public slack channel.

stale bot added the stale label May 12, 2023
stale bot commented Jun 11, 2023

Hi again! The issue will be closed because there has been no further activity in the 14 days since the last message.

Please feel free to reopen or open a new issue if you'd still like it to be addressed.

Again, you can always ask for help on our discussion forum or Ray's public slack channel.

Thanks again for opening the issue!

stale bot closed this as completed Jun 11, 2023