A80: gRPC Metrics for TCP connection #428

A80: gRPC Metrics for TCP connection
----
* Author(s): Yash Tibrewal (@yashykt), Nana Pang (@nanahpang), Yousuk Seung (@yousukseung)
* Approver: Craig Tiller (@ctiller), Mark Roth (@markdroth)
* Status: {Draft, In Review, Ready for Implementation, Implemented}
* language: {...}
* Last updated: 2024-04-18
* Discussion at: https://groups.google.com/g/grpc-io/c/AyT0LVgoqFs

## Abstract

This document proposes adding new TCP connection metrics to gRPC for improved network analysis and debugging.

## Background

To improve the network debugging capabilities for gRPC users, we propose adding per-connection TCP metrics in gRPC. The metrics will utilize the metrics framework outlined in [A79].

### Related Proposals:
* [A79]: gRPC Non-Per-Call Metrics Framework

[A79]: A79-non-per-call-metrics-architecture.md

## Proposal

This document proposes changes to the following gRPC components.

### Per-Connection TCP Metrics

We will provide the following metrics:
- `grpc.tcp.min_rtt`
- `grpc.tcp.delivery_rate`
- `grpc.tcp.packets_sent`
- `grpc.tcp.packets_retransmitted`
- `grpc.tcp.packets_spurious_retransmitted`

The metrics will be exported as:

| Name | Type | Unit | Labels | Description |
| ------------- | ----- | ----- | ------- | ----------- |
| grpc.tcp.min_rtt | Histogram (double) | s | None | Records TCP's current estimate of the minimum round-trip time (RTT), typically used as an indication of network health between two endpoints.<br /> RTT = packet acked timestamp - packet sent timestamp. |
| grpc.tcp.delivery_rate | Histogram (double) | bit/s | None | Records the most recently measured goodput of the TCP connection. <br /> Elapsed time = packet acked timestamp - last packet acked timestamp. <br /> Delivery rate = packet acked bytes / elapsed time. |
| grpc.tcp.packets_sent | Counter (uint64) | {packet} | None | Records the total number of packets TCP sent in the calculation period. |
| grpc.tcp.packets_retransmitted | Counter (uint64) | {packet} | None | Records the total number of packets retransmitted in the calculation period, including both lost and spuriously retransmitted packets. |
| grpc.tcp.packets_spurious_retransmitted | Counter (uint64) | {packet} | None | Records the total number of packets spuriously retransmitted in the calculation period. These are retransmissions that TCP later discovered to be unnecessary. |

**Contributor (on `grpc.tcp.min_rtt`):** Buckets for histograms are explicitly specified for request latency in https://github.com/grpc/proposal/blob/master/A66-otel-stats.md. Do you plan to reuse the same buckets? It might be worth specifying.

**Author:** Thanks for the suggestion, but using buckets for the min_rtt metric wouldn't offer significant insight. The value of min_rtt is bounded by the physical length of the path between sender and receiver or by the queueing time caused by throttling and load. A step change in min_rtt values usually means that traffic is being throttled, is experiencing congestion, or has been re-routed through a different path. I think it's better to opt out of buckets in this situation.

**Member:** @nanahpang If you do not specify buckets, we will get the default OpenTelemetry buckets, which are { 0, 5, 10, 25, 50, 75, 100, 250, 500, 750, 1000, 2500, 5000, 7500, 10000 }. Are there better defaults that we can suggest, or do we leave it to the user to figure out?

**Author:** Regarding OpenTelemetry buckets, if you're referring to a plugin used for exporting metrics, that would be a separate concern from the metric types we're defining here. However, I agree that specifying a smaller range for the min_rtt metric buckets is the right approach, especially since we're using seconds as the unit (to align with the open-source standard) while the underlying measurements are usually in microseconds.

**Member (on `grpc.tcp.packets_sent`):** What is a "calculation period"?
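
The RTT and delivery-rate formulas in the table can be written out directly. The following is a minimal sketch, not the C-core code: the helper names are hypothetical, and all timestamps are assumed to be in microseconds. It also shows the conversion into the exported units (seconds for `grpc.tcp.min_rtt`, bit/s for `grpc.tcp.delivery_rate`), which is why acked bytes are multiplied by 8.

```c
#include <stdint.h>

/* Hypothetical helpers; all timestamps are assumed to be in microseconds. */

/* RTT = packet acked timestamp - packet sent timestamp, returned in seconds
 * (the unit of grpc.tcp.min_rtt). Assumes acked_ts_us >= sent_ts_us. */
double rtt_seconds(uint64_t sent_ts_us, uint64_t acked_ts_us) {
  return (double)(acked_ts_us - sent_ts_us) / 1e6;
}

/* Elapsed time = packet acked timestamp - last packet acked timestamp.
 * Delivery rate = acked bytes / elapsed time, returned in bit/s (the unit of
 * grpc.tcp.delivery_rate), hence the factor of 8. Returns 0 if no time has
 * elapsed, which avoids a division by zero. */
double delivery_rate_bits_per_second(uint64_t acked_bytes,
                                     uint64_t last_acked_ts_us,
                                     uint64_t acked_ts_us) {
  uint64_t elapsed_us = acked_ts_us - last_acked_ts_us;
  if (elapsed_us == 0) return 0.0;
  /* bits * (microseconds per second) / elapsed microseconds */
  return (double)acked_bytes * 8.0 * 1e6 / (double)elapsed_us;
}
```

For example, 125000 bytes acknowledged 100 ms after the previous acknowledgement gives 1,000,000 bits / 0.1 s = 10,000,000 bit/s.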

#### Metric Collection Design

A high-level approach to collecting TCP metrics (on Linux) is as follows:
1) **Enable Network Timestamps for Metric Calculation:** Enable the `SO_TIMESTAMPING` socket option on the TCP socket through the `setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))` system call. This enables the kernel to capture packet timestamps during transmission.
2) **Calculate Metrics from Timestamps:** The Linux kernel calculates TCP connection metrics based on the captured packet timestamps. These metrics can be retrieved with the `getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len)` system call. For example, the delivery_rate metric estimates the goodput (the rate of useful data transmitted) for the most recent group of outbound data packets within a single flow ([code](https://elixir.bootlin.com/linux/v5.11.1/source/net/ipv4/tcp.c#L391)).

**Member (on step 2):** I'd still like to know what the mapping is for each metric. The first few are easy because the names here mirror tcp_info (I assume). But then it gets less obvious, and tcp_info isn't documented very well.

3) **Periodically Collect Statistics:** At a specified time interval (e.g., every 5 minutes), gRPC aggregates the calculated metrics and updates the corresponding statistics records.
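
Steps 1 and 2 can be sketched in C for Linux as follows. This is an illustration under stated assumptions, not the C-core implementation: the timestamping flag selection is an assumption, the function and struct names are hypothetical, and the mapping from `struct tcp_info` fields to the five metrics (`tcpi_min_rtt`, `tcpi_delivery_rate`, `tcpi_data_segs_out`, `tcpi_total_retrans`, and `tcpi_dsack_dups` as a proxy for spurious retransmissions) is assumed rather than taken from this proposal. It needs kernel headers recent enough to declare those fields.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>        /* IPPROTO_TCP */
#include <linux/tcp.h>         /* TCP_INFO, struct tcp_info */
#include <linux/net_tstamp.h>  /* SOF_TIMESTAMPING_* flags */

/* Step 1: ask the kernel to generate software transmit timestamps for sent
 * and acknowledged data. The flag selection is an assumption. */
int enable_tx_timestamping(int fd) {
  int val = SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_TX_SOFTWARE |
            SOF_TIMESTAMPING_TX_ACK | SOF_TIMESTAMPING_OPT_TSONLY;
  return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
}

/* Hypothetical snapshot, already converted to the exported units. */
struct tcp_metrics_snapshot {
  int has_min_rtt;          /* 0 until the kernel has an RTT sample */
  double min_rtt_s;         /* grpc.tcp.min_rtt, seconds */
  double delivery_rate_bps; /* grpc.tcp.delivery_rate, bit/s */
  uint64_t packets_sent;                   /* cumulative over the connection */
  uint64_t packets_retransmitted;          /* cumulative over the connection */
  uint64_t packets_spurious_retransmitted; /* cumulative, DSACK-based proxy */
};

/* Step 2: read TCP_INFO and map its fields to the metrics (assumed mapping). */
int read_tcp_metrics(int fd, struct tcp_metrics_snapshot *out) {
  struct tcp_info info;
  socklen_t len = sizeof(info);
  /* Zero first so fields that an older kernel does not fill in read as 0. */
  memset(&info, 0, sizeof(info));
  if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) != 0) return -1;

  /* Assumption: the kernel reports UINT32_MAX before the first RTT sample. */
  out->has_min_rtt = info.tcpi_min_rtt != UINT32_MAX;
  out->min_rtt_s = info.tcpi_min_rtt / 1e6;                        /* us -> s */
  out->delivery_rate_bps = (double)info.tcpi_delivery_rate * 8.0;  /* B/s -> bit/s */
  out->packets_sent = info.tcpi_data_segs_out;
  out->packets_retransmitted = info.tcpi_total_retrans;
  out->packets_spurious_retransmitted = info.tcpi_dsack_dups;
  return 0;
}
```

A collector would call `enable_tx_timestamping` once per connection and `read_tcp_metrics` at each collection tick. Because the three packet counts from `TCP_INFO` are cumulative over the connection's lifetime, per-interval values would be the difference between consecutive snapshots.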

**Contributor (@atollena, May 24, 2024, on step 3):** That could be a bit more precise. I have 3 questions:

1. Do you plan to have this interval be specified by users somehow? Or should it be fixed to 5 minutes? 5 minutes seems long to me for updating metrics; ideally it should be something close to the collection period (which IIUC is more typically in the 1 minute ballpark).
2. What should the aggregation logic look like? I imagine you have something like this in mind:
   - for counters, iterate over sockets and sum up packets_sent and packets_retransmitted of all open sockets.
   - for histograms, for each socket, record the value (delivery_rate and min_rtt) in the histogram for the corresponding metric.

   That's the only thing that really makes sense given the definition of the metrics above, so perhaps it's fine to leave it implicit.
3. What happens to sockets that have been closed in between the interval? Do we just not collect statistics for those and lose the data? I'm not sure that there is an alternative that works, such as calling getsockopt and updating statistics just before closing the socket.

**Author (@nanahpang, May 24, 2024):** I think in C++ the interval is not fixed, and the default value is 5 minutes in Fathom. For other language implementations, the interval can be adjusted as needed. @yousukseung for other questions about the Fathom implementation. Thanks.

For context, this high-level plan aims to provide a general understanding of the existing metric collection process in C++ (implemented through Fathom), while offering flexibility for adaptation in other languages. To maintain clarity and focus, implementation details have been omitted from this proposal and can be found in the Fathom documentation.

**Member:** "At a specified time interval" means the interval will be configurable. How is it configured?


A detailed explanation of the design can be found in the Fathom documentation.
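
The aggregation outlined in the review discussion for step 3 (sum counters across open sockets, record one histogram sample per socket) can be sketched as follows. All names are hypothetical and this is not the C-core or Fathom code. Counters are reported as per-interval increases of the cumulative per-socket values, which is one reading of "calculation period" that the proposal does not confirm; histogram bucket bounds are left to the caller because the proposal does not fix them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-socket snapshot of the cumulative counters. The collector
 * keeps the previous snapshot for each open socket so that each tick reports
 * only the increase since the last tick. */
struct socket_counters {
  uint64_t packets_sent;
  uint64_t packets_retransmitted;
  uint64_t packets_spurious_retransmitted;
};

/* Increase of a cumulative counter since the last tick; returns 0 if the
 * counter appears to have gone backwards. */
static uint64_t counter_delta(uint64_t now, uint64_t before) {
  return now >= before ? now - before : 0;
}

/* Counters: sum each open socket's per-interval increase, then store the
 * current values as the baseline for the next tick. The totals would be
 * added to the grpc.tcp.packets_* counters. */
struct socket_counters aggregate_counters(const struct socket_counters *now,
                                          struct socket_counters *previous,
                                          size_t n_sockets) {
  struct socket_counters total = {0, 0, 0};
  for (size_t i = 0; i < n_sockets; ++i) {
    total.packets_sent +=
        counter_delta(now[i].packets_sent, previous[i].packets_sent);
    total.packets_retransmitted += counter_delta(
        now[i].packets_retransmitted, previous[i].packets_retransmitted);
    total.packets_spurious_retransmitted +=
        counter_delta(now[i].packets_spurious_retransmitted,
                      previous[i].packets_spurious_retransmitted);
    previous[i] = now[i];
  }
  return total;
}

/* Histograms: each open socket contributes one sample per tick (its current
 * min_rtt or delivery_rate). Bucket i counts values in (bounds[i-1],
 * bounds[i]]; the last bucket counts values above every bound. */
struct histogram {
  const double *bounds; /* ascending upper bounds, caller-supplied */
  size_t n_bounds;
  uint64_t *counts;     /* n_bounds + 1 entries */
};

void histogram_record(struct histogram *h, double value) {
  size_t i = 0;
  while (i < h->n_bounds && value > h->bounds[i]) ++i;
  h->counts[i]++;
}
```

In this sketch, a socket closed between ticks loses its final partial interval, which is the gap raised in question 3 of the review discussion; reading the counters once more just before the socket is closed would be one way to avoid it.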

#### Reference:
* Fathom: https://dl.acm.org/doi/pdf/10.1145/3603269.3604815
* Kernel TCP Timestamping: https://www.kernel.org/doc/Documentation/networking/timestamping.rst
* Delivery Rate: https://datatracker.ietf.org/doc/html/draft-cheng-iccrg-delivery-rate-estimation#name-delivery-rate

### Metric Stability

All metrics added in this proposal will start as experimental. The long-term goal will be to
de-experimentalize them and have them be on by default, but the exact
criteria for that change are TBD.

### Temporary environment variable protection

This proposal does not include any features enabled via external I/O, so
it does not need environment variable protection.

## Implementation

This will be implemented in C-core; there are currently no plans to implement it in other languages.