Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: introduce tracing #4282

Merged
merged 125 commits into from
Nov 2, 2023
Merged

feat: introduce tracing #4282

merged 125 commits into from
Nov 2, 2023

Conversation

eserilev
Copy link
Contributor

@eserilev eserilev commented Aug 1, 2023

Description

We replace log with tracing across the codebase. Where possible, we make use of structured logging now instead of templating strings.

tracing offers the ability to also record "span"s. A span lasts until its dropped and describes the entire duration that it is active for. All logs (in tracing term "events") are hierarchically embedded in all parent-spans).

We introduce several spans:

  • On debug level: One for new_outgoing_connection, new_incoming_connection and new_established_connection
  • On debug level: Connection::poll, Swarm::poll and Pool::poll
  • On trace level: NetworkBehaviour::poll for each implementation of NetworkBehaviour
  • On trace level: ConnectionHandler::poll for each implementation of (protocol) ConnectionHandlers

The idea here is that logging on debug level gives you a decent overview of what the system is doing. You get spans for the duration of connections and how often each connection gets polled.

Dropping down to trace level gives you an extremely detailed view of how long each individual ConnectionHandler was executed as part of poll which could be used for detailed analysis on how busy certain handlers are.

Most importantly, simply logging on info does not give you any spans. We consider info to be a good default that should be reasonably quiet.

Resolves #1533.

Notes & open questions

Change checklist

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • A changelog entry has been made in the appropriate crates

Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for starting this!

I've left some notes :)

swarm/Cargo.toml Outdated Show resolved Hide resolved
swarm/src/connection/pool.rs Outdated Show resolved Hide resolved
swarm/src/connection/pool.rs Outdated Show resolved Hide resolved
swarm/src/connection/pool.rs Outdated Show resolved Hide resolved
swarm/src/connection/pool.rs Outdated Show resolved Hide resolved
examples/dcutr/src/main.rs Outdated Show resolved Hide resolved
protocols/ping/src/handler.rs Outdated Show resolved Hide resolved
protocols/ping/src/handler.rs Show resolved Hide resolved
protocols/ping/src/handler.rs Outdated Show resolved Hide resolved
@thomaseizinger thomaseizinger changed the title feat:tracing feat(swarm): introduce tracing Aug 2, 2023
@eserilev
Copy link
Contributor Author

eserilev commented Aug 2, 2023

thank you for the detailed feedback! I'll work to make the changes you suggested and will ping here when its ready for another review

@eserilev
Copy link
Contributor Author

eserilev commented Aug 4, 2023

i've pushed up some changes. To summarize what I've done so far:

  • pass error span into connection
  • poll should call span.enter on every invocation
  • changed span name to connection
  • remove unneeded tracing::field::Empty
  • removed tracing-subscriber dependency from swarm crate

will leave some comments on a few threads where I'd like to clarify some things

ALSO:
I needed to add a "dummy" span in some of the unit tests that create Connection instances

Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, good progress so far :)

The span is working, see the following screenshot from a local run of the ping-example on my machine:

image

As you can see, some of the information is duplicated. Now that we capture the PeerId within the connection span, it is printed redundantly in some of the messages. I think we should remove it from those messages.

Also, as I noted inline the peer_address prints the entire ConnectedPoint which is very noisy.

I'd encourage you to run the example yourself (I used RUST_LOG=libp2p_ping=trace,debug cargo run) and tweak the logs until they are concise and informative.

Once the ping protocol is done, we can move on to other protocols and tidy up the logs there!

examples/ping-example/Cargo.toml Outdated Show resolved Hide resolved
swarm/src/connection/pool.rs Outdated Show resolved Hide resolved
swarm/src/connection.rs Outdated Show resolved Hide resolved
swarm/src/connection.rs Outdated Show resolved Hide resolved
protocols/ping/src/handler.rs Outdated Show resolved Hide resolved
@eserilev
Copy link
Contributor Author

thanks again for the feedback. I've cleaned up the logs a bit, removed some of the redundancies, and made some changes based on your feedback. I think the logs look tidy!

image

Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excited for this to land. Thank you for working on it.

Feel free to ignore comments below at this point.

protocols/ping/Cargo.toml Outdated Show resolved Hide resolved
swarm/Cargo.toml Outdated Show resolved Hide resolved
@thomaseizinger
Copy link
Contributor

thanks again for the feedback. I've cleaned up the logs a bit, removed some of the redundancies, and made some changes based on your feedback. I think the logs look tidy!

image

This looks great!

@mxinden do you think we should transition more crates or do that in follow-up PRs?

@mxinden
Copy link
Member

mxinden commented Aug 11, 2023

Either is fine by me.

@eserilev
Copy link
Contributor Author

if it's ok with you all, I think ill just continue implementing tracing for the other protocols on this branch

@thomaseizinger
Copy link
Contributor

if it's ok with you all, I think ill just continue implementing tracing for the other protocols on this branch

Yes, thank you!

@eserilev
Copy link
Contributor Author

sorry for the long hiatus. getting back to working on this now

I'll do my best to separate each protocol changes to its own commit and will include some screenshots of tracing logs

first off is autonat logs:
image

@thomaseizinger thomaseizinger changed the title feat(swarm): introduce tracing feat: introduce tracing Oct 31, 2023
Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the outstanding comments, this is good to merge from my end. Thank you for the work! Looking forward to the insights we will gain from this.

examples/metrics/README.md Show resolved Hide resolved
examples/metrics/README.md Show resolved Hide resolved
Cargo.lock Show resolved Hide resolved
examples/chat/src/main.rs Outdated Show resolved Hide resolved
swarm/src/connection/pool.rs Show resolved Hide resolved

This comment was marked as resolved.

@eserilev
Copy link
Contributor Author

eserilev commented Nov 1, 2023

i believe i resolved the outstanding comments, hopefully I didnt miss anything. the metrics example is really neat!

This comment was marked as resolved.

@thomaseizinger
Copy link
Contributor

@mxinden I believe all your comments have been addressed, please have another look!

This comment was marked as resolved.

This comment was marked as resolved.

Copy link
Member

@mxinden mxinden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work everyone!

@mxinden mxinden added the send-it label Nov 2, 2023
@mergify mergify bot merged commit e2e9179 into libp2p:master Nov 2, 2023
71 checks passed
@thomaseizinger
Copy link
Contributor

@eserilev Thank you especially for the continuous work here! Migrations like this are very tedious but I am excited for the insights this will bring to our users.

I will mention this prominently in our release notes.

umgefahren pushed a commit to umgefahren/rust-libp2p that referenced this pull request Mar 8, 2024
We replace `log` with `tracing` across the codebase. Where possible, we make use of structured logging now instead of templating strings.

`tracing` offers the ability to also record "span"s. A span lasts until its dropped and describes the entire duration that it is active for. All logs (in `tracing` term "events") are hierarchically embedded in all parent-spans).

We introduce several spans:

- On debug level: One for `new_outgoing_connection`, `new_incoming_connection` and `new_established_connection`
- On debug level: `Connection::poll`, `Swarm::poll` and `Pool::poll`
- On trace level: `NetworkBehaviour::poll` for each implementation of `NetworkBehaviour`
- On trace level: `ConnectionHandler::poll` for each implementation of (protocol) `ConnectionHandler`s

The idea here is that logging on debug level gives you a decent overview of what the system is doing. You get spans for the duration of connections and how often each connection gets polled.

Dropping down to trace level gives you an extremely detailed view of how long each individual `ConnectionHandler` was executed as part of `poll` which could be used for detailed analysis on how busy certain handlers are.

Most importantly, simply logging on `info` does not give you any spans. We consider `info` to be a good default that should be reasonably quiet.

Resolves libp2p#1533.

Pull-Request: libp2p#4282.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal-change Pull requests that make internal changes to crates and thus don't need to include a changelog entry. send-it
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use tracing for improved diagnostics
5 participants