From dedfb16b205911709313f076e82a7f0dfd07d8a3 Mon Sep 17 00:00:00 2001
From: nanahpang <31627465+nanahpang@users.noreply.github.com>
Date: Mon, 22 Apr 2024 11:14:23 -0700
Subject: [PATCH 01/21] Create A80-grpc-metrics-for-tcp-connection
---
A80-grpc-metrics-for-tcp-connection | 62 +++++++++++++++++++++++++++++
1 file changed, 62 insertions(+)
create mode 100644 A80-grpc-metrics-for-tcp-connection
diff --git a/A80-grpc-metrics-for-tcp-connection b/A80-grpc-metrics-for-tcp-connection
new file mode 100644
index 000000000..ca3b4a9d3
--- /dev/null
+++ b/A80-grpc-metrics-for-tcp-connection
@@ -0,0 +1,62 @@
+A80: gRPC Metrics for TCP connection
+----
+* Author(s): Yash Tibrewal (@yashykt), Nana Pang (@nanahpang), Yousuk Seung (@yousukseung)
+* Approver: Craig Tiller (@ctiller), Mark Roth (@markdroth)
+* Status: {Draft, In Review, Ready for Implementation, Implemented}
+* language: {...}
+* Last updated: 2024-04-18
+* Discussion at: {...}
+
+## Abstract
+
+This document proposes adding new TCP connection metrics to gRPC for improved network analysis and debugging.
+
+## Background
+
+To improve the network debugging capabilities for gRPC users, we propose adding per-connection TCP metrics in gRPC. The metrics will utilize the metrics framework outlined in [A79].
+
+### Related Proposals:
+* [A79]: gRPC Non-Per-Call Metrics Framework (pending)
+
+[A79]: https://github.com/grpc/proposal/pull/421
+
+## Proposal
+
+This document proposes changes to the following gRPC components.
+
+#### Per-Connection TCP Metrics
+
+We will provide the following metrics:
+- `grpc.tcp.min_rtt`
+- `grpc.tcp.delivery_rate`
+- `grpc.tcp.packets_sent`
+- `grpc.tcp.packets_retransmitted`
+- `grpc.tcp.packets_spurious_retransmitted`
+
+The metrics will have label:
+
+| Name | Disposition | Description |
+| ----------- | ----------- | ----------- |
+| grpc.tcp.remote_peer_address | optional | Store the peer address info in the format as `ip:port`. |
+
+The metrics will be exported as:
+
+| Name | Type | Unit | Labels | Description |
+| ------------- | ----- | ----- | ------- | ----------- |
+| grpc.tcp.min_rtt | Distribution | s | grpc.tcp.remote_peer_string | Reports TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. |
+| grpc.tcp.delivery_rate | Distribution | bit/s | grpc.tcp.remote_peer_string | Records the most recent non-app-limited throughput at the time that Fathom samples the connection statistics. |
+| grpc.tcp.packets_sent | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets TCP sends in the calculation period. |
+| grpc.tcp.packets_retransmitted | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. |
+| grpc.tcp.packets_spurious_retransmitted | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets spuriously retransmitted in the calculation period. |
+
+### Metric Stability
+
+All metrics added in this proposal will start as experimental. The long term goal will be to
+de-experimentalize them and have them be on by default, but the exact
+criteria for that change are TBD.
+
+### Temporary environment variable protection
+
+This proposal does not include any features enabled via external I/O, so
+it does not need environment variable protection.
+
From ffaeb22f7de15a8f2dab03e7e5d221e2a70b215a Mon Sep 17 00:00:00 2001
From: nanahpang <31627465+nanahpang@users.noreply.github.com>
Date: Tue, 23 Apr 2024 12:27:08 -0700
Subject: [PATCH 02/21] Update A80-grpc-metrics-for-tcp-connection
---
A80-grpc-metrics-for-tcp-connection | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/A80-grpc-metrics-for-tcp-connection b/A80-grpc-metrics-for-tcp-connection
index ca3b4a9d3..4992e19e3 100644
--- a/A80-grpc-metrics-for-tcp-connection
+++ b/A80-grpc-metrics-for-tcp-connection
@@ -47,7 +47,7 @@ The metrics will be exported as:
| grpc.tcp.delivery_rate | Distribution | bit/s | grpc.tcp.remote_peer_string | Records the most recent non-app-limited throughput at the time that Fathom samples the connection statistics. |
| grpc.tcp.packets_sent | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets TCP sends in the calculation period. |
| grpc.tcp.packets_retransmitted | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. |
-| grpc.tcp.packets_spurious_retransmitted | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets spuriously retransmitted in the calculation period. |
+| grpc.tcp.packets_spurious_retransmitted | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets spuriously retransmitted in the calculation period. These are retransmissions that TCP later discovered to be unnecessary. |
### Metric Stability
From d41329136a403d303db166db1daf723980be2a0d Mon Sep 17 00:00:00 2001
From: nanahpang <31627465+nanahpang@users.noreply.github.com>
Date: Wed, 24 Apr 2024 13:36:56 -0700
Subject: [PATCH 03/21] Update A80-grpc-metrics-for-tcp-connection
---
A80-grpc-metrics-for-tcp-connection | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/A80-grpc-metrics-for-tcp-connection b/A80-grpc-metrics-for-tcp-connection
index 4992e19e3..61d4b613d 100644
--- a/A80-grpc-metrics-for-tcp-connection
+++ b/A80-grpc-metrics-for-tcp-connection
@@ -43,8 +43,8 @@ The metrics will be exported as:
| Name | Type | Unit | Labels | Description |
| ------------- | ----- | ----- | ------- | ----------- |
-| grpc.tcp.min_rtt | Distribution | s | grpc.tcp.remote_peer_string | Reports TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. |
-| grpc.tcp.delivery_rate | Distribution | bit/s | grpc.tcp.remote_peer_string | Records the most recent non-app-limited throughput at the time that Fathom samples the connection statistics. |
+| grpc.tcp.min_rtt | Distribution | s | grpc.tcp.remote_peer_string | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. |
+| grpc.tcp.delivery_rate | Distribution | bit/s | grpc.tcp.remote_peer_string | Records latest throughput measured of the TCP connection. |
| grpc.tcp.packets_sent | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets TCP sends in the calculation period. |
| grpc.tcp.packets_retransmitted | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. |
| grpc.tcp.packets_spurious_retransmitted | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets spuriously retransmitted in the calculation period. These are retransmissions that TCP later discovered to be unnecessary. |
From 5b5ba3f3cba2d2fa6af72ab6110de211001f83a3 Mon Sep 17 00:00:00 2001
From: nanahpang <31627465+nanahpang@users.noreply.github.com>
Date: Thu, 25 Apr 2024 15:36:37 -0700
Subject: [PATCH 04/21] Update A80-grpc-metrics-for-tcp-connection
---
A80-grpc-metrics-for-tcp-connection | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/A80-grpc-metrics-for-tcp-connection b/A80-grpc-metrics-for-tcp-connection
index 61d4b613d..3c2f592b8 100644
--- a/A80-grpc-metrics-for-tcp-connection
+++ b/A80-grpc-metrics-for-tcp-connection
@@ -5,7 +5,7 @@ A80: gRPC Metrics for TCP connection
* Status: {Draft, In Review, Ready for Implementation, Implemented}
* language: {...}
* Last updated: 2024-04-18
-* Discussion at: {...}
+* Discussion at: https://groups.google.com/g/grpc-io/c/AyT0LVgoqFs
## Abstract
From 583e6b32ece172499301c2373ee4d5a04b28cb48 Mon Sep 17 00:00:00 2001
From: nanahpang <31627465+nanahpang@users.noreply.github.com>
Date: Mon, 29 Apr 2024 14:25:04 -0700
Subject: [PATCH 05/21] Update and rename A80-grpc-metrics-for-tcp-connection
to A80-grpc-metrics-for-tcp-connection.md
---
...n => A80-grpc-metrics-for-tcp-connection.md | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
rename A80-grpc-metrics-for-tcp-connection => A80-grpc-metrics-for-tcp-connection.md (61%)
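
Note on the address labels introduced in this patch: the sketch below shows one way a label value such as `ipv4:1.2.3.4:567` could be produced from a socket address. It is a minimal illustration that assumes IPv4 only; `format_ipv4_uri` is a hypothetical helper name and is not part of the proposal or of any gRPC API.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a gRPC API): formats an IPv4 socket address as
 * "ipv4:<ip>:<port>", the URI-style form used by the proposed labels.
 * Returns 0 on success, -1 if conversion fails or the buffer is too small. */
static int format_ipv4_uri(const struct sockaddr_in *addr, char *out,
                           size_t out_len) {
  assert(addr != NULL && out != NULL);
  char ip[INET_ADDRSTRLEN];
  if (inet_ntop(AF_INET, &addr->sin_addr, ip, sizeof(ip)) == NULL) {
    return -1;
  }
  /* sin_port is stored in network byte order, so convert before printing. */
  int n = snprintf(out, out_len, "ipv4:%s:%u", ip,
                   (unsigned)ntohs(addr->sin_port));
  return (n > 0 && (size_t)n < out_len) ? 0 : -1;
}
```

An IPv6 peer would need a separate branch using `AF_INET6`; that case is left out here to keep the sketch short.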
diff --git a/A80-grpc-metrics-for-tcp-connection b/A80-grpc-metrics-for-tcp-connection.md
similarity index 61%
rename from A80-grpc-metrics-for-tcp-connection
rename to A80-grpc-metrics-for-tcp-connection.md
index 3c2f592b8..45906a316 100644
--- a/A80-grpc-metrics-for-tcp-connection
+++ b/A80-grpc-metrics-for-tcp-connection.md
@@ -37,17 +37,18 @@ The metrics will have label:
| Name | Disposition | Description |
| ----------- | ----------- | ----------- |
-| grpc.tcp.remote_peer_address | optional | Store the peer address info in the format as `ip:port`. |
+| grpc.tcp.peer_address | optional | Store the peer address info in URI format such as `ipv4:1.2.3.4:567`. |
+| grpc.tcp.local_address | optional | Store the local address info in URI format such as `ipv4:1.2.3.4:567`. |
The metrics will be exported as:
| Name | Type | Unit | Labels | Description |
| ------------- | ----- | ----- | ------- | ----------- |
-| grpc.tcp.min_rtt | Distribution | s | grpc.tcp.remote_peer_string | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. |
-| grpc.tcp.delivery_rate | Distribution | bit/s | grpc.tcp.remote_peer_string | Records latest throughput measured of the TCP connection. |
-| grpc.tcp.packets_sent | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets TCP sends in the calculation period. |
-| grpc.tcp.packets_retransmitted | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. |
-| grpc.tcp.packets_spurious_retransmitted | Counter | {packet} | grpc.tcp.remote_peer_string | Records total packets spuriously retransmitted in the calculation period. These are retransmissions that TCP later discovered to be unnecessary. |
+| grpc.tcp.min_rtt | Histogram | s | grpc.tcp.peer_address, grpc.tcp.local_address | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. |
+| grpc.tcp.delivery_rate | Histogram | bit/s | grpc.tcp.peer_address, grpc.tcp.local_address | Records latest throughput measured of the TCP connection. |
+| grpc.tcp.packets_sent | Counter | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets TCP sends in the calculation period. |
+| grpc.tcp.packets_retransmitted | Counter | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. |
+| grpc.tcp.packets_spurious_retransmitted | Counter | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets spuriously retransmitted in the calculation period. These are retransmissions that TCP later discovered to be unnecessary. |
### Metric Stability
@@ -60,3 +61,8 @@ criteria for that change are TBD.
This proposal does not include any features enabled via external I/O, so
it does not need environment variable protection.
+## Implementation
+
+This will be implemented in C-core, but there are currently no plans to implement it in other languages.
+
+
From 8aa21c1b26afd9043fc6652f6a67ba2787e2c7b3 Mon Sep 17 00:00:00 2001
From: nanahpang <31627465+nanahpang@users.noreply.github.com>
Date: Mon, 29 Apr 2024 14:26:35 -0700
Subject: [PATCH 06/21] Update A80-grpc-metrics-for-tcp-connection.md
---
A80-grpc-metrics-for-tcp-connection.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md
index 45906a316..aa82cae02 100644
--- a/A80-grpc-metrics-for-tcp-connection.md
+++ b/A80-grpc-metrics-for-tcp-connection.md
@@ -63,6 +63,6 @@ it does not need environment variable protection.
## Implementation
-This will be implemented in C-core, but there are currently no plans to implement it in other languages.
+This will be implemented in C-core, and there are currently no plans to implement it in other languages.
From 9f8038c61fe5d6f991ae9d12f3449cebc603818c Mon Sep 17 00:00:00 2001
From: nanahpang <31627465+nanahpang@users.noreply.github.com>
Date: Wed, 1 May 2024 14:16:17 -0700
Subject: [PATCH 07/21] Update A80-grpc-metrics-for-tcp-connection.md
---
A80-grpc-metrics-for-tcp-connection.md | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md
index aa82cae02..52b56be09 100644
--- a/A80-grpc-metrics-for-tcp-connection.md
+++ b/A80-grpc-metrics-for-tcp-connection.md
@@ -44,11 +44,11 @@ The metrics will be exported as:
| Name | Type | Unit | Labels | Description |
| ------------- | ----- | ----- | ------- | ----------- |
-| grpc.tcp.min_rtt | Histogram | s | grpc.tcp.peer_address, grpc.tcp.local_address | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. |
-| grpc.tcp.delivery_rate | Histogram | bit/s | grpc.tcp.peer_address, grpc.tcp.local_address | Records latest throughput measured of the TCP connection. |
-| grpc.tcp.packets_sent | Counter | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets TCP sends in the calculation period. |
-| grpc.tcp.packets_retransmitted | Counter | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. |
-| grpc.tcp.packets_spurious_retransmitted | Counter | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets spuriously retransmitted in the calculation period. These are retransmissions that TCP later discovered to be unnecessary. |
+| grpc.tcp.min_rtt | Histogram (double) | s | grpc.tcp.peer_address, grpc.tcp.local_address | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. |
+| grpc.tcp.delivery_rate | Histogram (double) | bit/s | grpc.tcp.peer_address, grpc.tcp.local_address | Records latest throughput measured of the TCP connection. |
+| grpc.tcp.packets_sent | Counter (int64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets TCP sends in the calculation period. |
+| grpc.tcp.packets_retransmitted | Counter (int64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. |
+| grpc.tcp.packets_spurious_retransmitted | Counter (int64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets spuriously retransmitted in the calculation period. These are retransmissions that TCP later discovered to be unnecessary. |
### Metric Stability
From 59ab138dd1945083c3de18ce27faf6e2ed6f959b Mon Sep 17 00:00:00 2001
From: nanahpang <31627465+nanahpang@users.noreply.github.com>
Date: Wed, 1 May 2024 17:35:51 -0700
Subject: [PATCH 08/21] Update A80-grpc-metrics-for-tcp-connection.md
---
A80-grpc-metrics-for-tcp-connection.md | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
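
Note on the `SO_TIMESTAMPING` paragraph added in this patch: the sketch below illustrates the two system calls it mentions, on Linux. The flag combination and the helper names `enable_tx_timestamping` and `read_tcp_info` are assumptions made for illustration; the proposal does not specify which `SOF_TIMESTAMPING_*` flags C-core will pass.

```c
#include <assert.h>
#include <linux/net_tstamp.h> /* SOF_TIMESTAMPING_* flags */
#include <linux/tcp.h>        /* struct tcp_info, TCP_INFO */
#include <netinet/in.h>       /* IPPROTO_TCP */
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: asks the kernel to generate software timestamps when
 * data is handed to the device and when it is acknowledged by the peer.
 * The flag set is an illustrative assumption. Returns 0 on success. */
static int enable_tx_timestamping(int fd) {
  unsigned int val = SOF_TIMESTAMPING_TX_SOFTWARE | SOF_TIMESTAMPING_TX_ACK |
                     SOF_TIMESTAMPING_SOFTWARE;
  return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
}

/* Hypothetical helper: reads the kernel's per-connection TCP statistics.
 * Returns 0 on success, -1 on error (errno is set by getsockopt). */
static int read_tcp_info(int fd, struct tcp_info *info) {
  assert(info != NULL);
  socklen_t len = sizeof(*info);
  memset(info, 0, sizeof(*info));
  return getsockopt(fd, IPPROTO_TCP, TCP_INFO, info, &len);
}
```

On recent Linux kernels, `struct tcp_info` carries fields such as `tcpi_min_rtt`, `tcpi_delivery_rate`, `tcpi_data_segs_out` and `tcpi_total_retrans`. Which of these back the proposed metrics is not stated in the patch, so any mapping should be treated as an assumption.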
diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md
index 52b56be09..c82cd5f79 100644
--- a/A80-grpc-metrics-for-tcp-connection.md
+++ b/A80-grpc-metrics-for-tcp-connection.md
@@ -44,12 +44,18 @@ The metrics will be exported as:
| Name | Type | Unit | Labels | Description |
| ------------- | ----- | ----- | ------- | ----------- |
-| grpc.tcp.min_rtt | Histogram (double) | s | grpc.tcp.peer_address, grpc.tcp.local_address | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. |
+| grpc.tcp.min_rtt | Histogram (double) | s | grpc.tcp.peer_address, grpc.tcp.local_address | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. |
| grpc.tcp.delivery_rate | Histogram (double) | bit/s | grpc.tcp.peer_address, grpc.tcp.local_address | Records latest throughput measured of the TCP connection. |
| grpc.tcp.packets_sent | Counter (int64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets TCP sends in the calculation period. |
| grpc.tcp.packets_retransmitted | Counter (int64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. |
| grpc.tcp.packets_spurious_retransmitted | Counter (int64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets spuriously retransmitted in the calculation period. These are retransmissions that TCP later discovered to be unnecessary. |
+The TCP metrics are collected by enabling `SO_TIMESTAMPING` in kernel TCP through `setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))`. The kernel TCP then will capture packet timestamps on transmission.
+
+#### Reference:
+* Fathom: https://dl.acm.org/doi/pdf/10.1145/3603269.3604815
+* Kernel TCP Timestamping: https://www.kernel.org/doc/Documentation/networking/timestamping.rst
+
### Metric Stability
All metrics added in this proposal will start as experimental. The long term goal will be to
From ce27a6929c95e006dd2df814e9e7afe9316c5a34 Mon Sep 17 00:00:00 2001
From: nanahpang <31627465+nanahpang@users.noreply.github.com>
Date: Wed, 1 May 2024 17:49:33 -0700
Subject: [PATCH 09/21] Update A80-grpc-metrics-for-tcp-connection.md
---
A80-grpc-metrics-for-tcp-connection.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md
index c82cd5f79..6109e993d 100644
--- a/A80-grpc-metrics-for-tcp-connection.md
+++ b/A80-grpc-metrics-for-tcp-connection.md
@@ -50,7 +50,7 @@ The metrics will be exported as:
| grpc.tcp.packets_retransmitted | Counter (int64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. |
| grpc.tcp.packets_spurious_retransmitted | Counter (int64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets spuriously retransmitted in the calculation period. These are retransmissions that TCP later discovered to be unnecessary. |
-The TCP metrics are collected by enabling `SO_TIMESTAMPING` in kernel TCP through `setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))`. The kernel TCP then will capture packet timestamps on transmission.
+The metrics are acquired by enabling the `SO_TIMESTAMPING` option in the kernel's TCP stack via the `setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))` system call. This configuration allows the kernel to capture packet timestamps during transmission and subsequently provide relevant socket information when `getsockopt(TCP_INFO)` is invoked.
#### Reference:
* Fathom: https://dl.acm.org/doi/pdf/10.1145/3603269.3604815
From d239c39e8f8957dd2c4b0f393d888c2bc247aee4 Mon Sep 17 00:00:00 2001
From: nanahpang <31627465+nanahpang@users.noreply.github.com>
Date: Fri, 10 May 2024 13:28:57 -0700
Subject: [PATCH 10/21] Update A80-grpc-metrics-for-tcp-connection.md
---
A80-grpc-metrics-for-tcp-connection.md | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md
index 6109e993d..ca7c37bbc 100644
--- a/A80-grpc-metrics-for-tcp-connection.md
+++ b/A80-grpc-metrics-for-tcp-connection.md
@@ -16,7 +16,7 @@ This document proposes adding new TCP connection metrics to gRPC for improved ne
To improve the network debugging capabilities for gRPC users, we propose adding per-connection TCP metrics in gRPC. The metrics will utilize the metrics framework outlined in [A79].
### Related Proposals:
-* [A79]: gRPC Non-Per-Call Metrics Framework (pending)
+* [A79]: gRPC Non-Per-Call Metrics Framework
[A79]: https://github.com/grpc/proposal/pull/421
@@ -46,9 +46,9 @@ The metrics will be exported as:
| ------------- | ----- | ----- | ------- | ----------- |
| grpc.tcp.min_rtt | Histogram (double) | s | grpc.tcp.peer_address, grpc.tcp.local_address | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. |
| grpc.tcp.delivery_rate | Histogram (double) | bit/s | grpc.tcp.peer_address, grpc.tcp.local_address | Records latest throughput measured of the TCP connection. |
-| grpc.tcp.packets_sent | Counter (int64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets TCP sends in the calculation period. |
-| grpc.tcp.packets_retransmitted | Counter (int64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. |
-| grpc.tcp.packets_spurious_retransmitted | Counter (int64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets spuriously retransmitted in the calculation period. These are retransmissions that TCP later discovered to be unnecessary. |
+| grpc.tcp.packets_sent | Counter (uint64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets TCP sends in the calculation period. |
+| grpc.tcp.packets_retransmitted | Counter (uint64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. |
+| grpc.tcp.packets_spurious_retransmitted | Counter (uint64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets spuriously retransmitted in the calculation period. These are retransmissions that TCP later discovered to be unnecessary. |
The metrics are acquired by enabling the `SO_TIMESTAMPING` option in the kernel's TCP stack via the `setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))` system call. This configuration allows the kernel to capture packet timestamps during transmission and subsequently provide relevant socket information when `getsockopt(TCP_INFO)` is invoked.
From 0726f6e8af912ab7ca40fe3c7e3a21ba3f97881b Mon Sep 17 00:00:00 2001
From: nanahpang <31627465+nanahpang@users.noreply.github.com>
Date: Wed, 15 May 2024 14:26:33 -0700
Subject: [PATCH 11/21] Update A80-grpc-metrics-for-tcp-connection.md
---
A80-grpc-metrics-for-tcp-connection.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md
index ca7c37bbc..62f417f50 100644
--- a/A80-grpc-metrics-for-tcp-connection.md
+++ b/A80-grpc-metrics-for-tcp-connection.md
@@ -18,7 +18,7 @@ To improve the network debugging capabilities for gRPC users, we propose adding
### Related Proposals:
* [A79]: gRPC Non-Per-Call Metrics Framework
-[A79]: https://github.com/grpc/proposal/pull/421
+[A79]: https://github.com/grpc/proposal/blob/master/A79-non-per-call-metrics-architecture.md
## Proposal
From 3bfe76b79892fb4c77bb2905015798df4fee61a3 Mon Sep 17 00:00:00 2001
From: nanahpang <31627465+nanahpang@users.noreply.github.com>
Date: Wed, 15 May 2024 14:47:33 -0700
Subject: [PATCH 12/21] Update A80-grpc-metrics-for-tcp-connection.md
---
A80-grpc-metrics-for-tcp-connection.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md
index 62f417f50..a9c82d5f8 100644
--- a/A80-grpc-metrics-for-tcp-connection.md
+++ b/A80-grpc-metrics-for-tcp-connection.md
@@ -18,7 +18,7 @@ To improve the network debugging capabilities for gRPC users, we propose adding
### Related Proposals:
* [A79]: gRPC Non-Per-Call Metrics Framework
-[A79]: https://github.com/grpc/proposal/blob/master/A79-non-per-call-metrics-architecture.md
+[A79]: A79-non-per-call-metrics-architecture.md
## Proposal
From 2ccf768a6820f1b1d1b49579c082252f6035617e Mon Sep 17 00:00:00 2001
From: nanahpang <31627465+nanahpang@users.noreply.github.com>
Date: Tue, 21 May 2024 16:30:59 -0700
Subject: [PATCH 13/21] Update A80-grpc-metrics-for-tcp-connection.md
---
A80-grpc-metrics-for-tcp-connection.md | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md
index a9c82d5f8..22e48b4bd 100644
--- a/A80-grpc-metrics-for-tcp-connection.md
+++ b/A80-grpc-metrics-for-tcp-connection.md
@@ -50,11 +50,19 @@ The metrics will be exported as:
| grpc.tcp.packets_retransmitted | Counter (uint64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. |
| grpc.tcp.packets_spurious_retransmitted | Counter (uint64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets spuriously retransmitted in the calculation period. These are retransmissions that TCP later discovered to be unnecessary. |
-The metrics are acquired by enabling the `SO_TIMESTAMPING` option in the kernel's TCP stack via the `setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))` system call. This configuration allows the kernel to capture packet timestamps during transmission and subsequently provide relevant socket information when `getsockopt(TCP_INFO)` is invoked.
+
+#### Metric Collection Design
+
+A high-level approach to collecting TCP metrics is as follows:
+1) **Collect Network Timestamps for Metric Calculation:** On Linux, this is achieved by enabling the `SO_TIMESTAMPING` option in the kernel's TCP stack through the `setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))` system call. This enables the kernel to capture packet timestamps during transmission and provide this information through `getsockopt(TCP_INFO)`.
+2) **Calculate Time Deltas from Timestamps:** For example, the `delivery_rate` metric estimates the goodput—the rate of useful data transmitted—for the most recent group of outbound data packets within a single flow. This involves calculating the time difference between when a data packet was sent and when it was acknowledged.
+3) **Periodically Collect Statistics:** At a specified time interval (e.g., every 10 seconds), gRPC aggregates the calculated metrics and updates the corresponding statistics records.
+
#### Reference:
* Fathom: https://dl.acm.org/doi/pdf/10.1145/3603269.3604815
* Kernel TCP Timestamping: https://www.kernel.org/doc/Documentation/networking/timestamping.rst
+* Delivery Rate: https://datatracker.ietf.org/doc/html/draft-cheng-iccrg-delivery-rate-estimation#name-delivery-rate
### Metric Stability
@@ -70,5 +78,3 @@ it does not need environment variable protection.
## Implementation
This will be implemented in C-core, and there are currently no plans to implement it in other languages.
-
-
From 83ac90863229845da872a8c754b1e689774c6b9b Mon Sep 17 00:00:00 2001
From: nanahpang <31627465+nanahpang@users.noreply.github.com>
Date: Tue, 21 May 2024 18:06:44 -0700
Subject: [PATCH 14/21] Update A80-grpc-metrics-for-tcp-connection.md
---
A80-grpc-metrics-for-tcp-connection.md | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
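
Note on the formulas added to the table in this patch: the sketch below works through the two definitions, RTT as ACK timestamp minus send timestamp, and delivery rate as acknowledged bytes over the time between two ACKs. Nanosecond timestamps, the conversion from bytes to bit/s, and the function names are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* RTT in seconds: packet acked timestamp minus packet sent timestamp.
 * Timestamps are assumed to be nanoseconds on the same clock. */
static double rtt_seconds(uint64_t sent_ns, uint64_t acked_ns) {
  assert(acked_ns >= sent_ns);
  return (double)(acked_ns - sent_ns) / 1e9;
}

/* Delivery rate in bit/s: bytes acknowledged divided by the elapsed time
 * between the previous ACK and the latest ACK. Returns 0 when no time has
 * elapsed, so the caller never divides by zero. */
static double delivery_rate_bps(uint64_t acked_bytes, uint64_t prev_ack_ns,
                                uint64_t ack_ns) {
  assert(ack_ns >= prev_ack_ns);
  uint64_t elapsed_ns = ack_ns - prev_ack_ns;
  if (elapsed_ns == 0) {
    return 0.0;
  }
  /* 8 bits per byte, 1e9 nanoseconds per second. */
  return (double)acked_bytes * 8.0 * 1e9 / (double)elapsed_ns;
}
```

For example, 125000 bytes acknowledged over 1 ms corresponds to 1 Gbit/s. The `min_rtt` metric would then be the smallest `rtt_seconds` value observed, over whatever window the kernel or collector uses.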
diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md
index 22e48b4bd..b0dfac344 100644
--- a/A80-grpc-metrics-for-tcp-connection.md
+++ b/A80-grpc-metrics-for-tcp-connection.md
@@ -44,8 +44,8 @@ The metrics will be exported as:
| Name | Type | Unit | Labels | Description |
| ------------- | ----- | ----- | ------- | ----------- |
-| grpc.tcp.min_rtt | Histogram (double) | s | grpc.tcp.peer_address, grpc.tcp.local_address | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. |
-| grpc.tcp.delivery_rate | Histogram (double) | bit/s | grpc.tcp.peer_address, grpc.tcp.local_address | Records latest throughput measured of the TCP connection. |
+| grpc.tcp.min_rtt | Histogram (double) | s | grpc.tcp.peer_address, grpc.tcp.local_address | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. RTT: packet acked timestamp - packet sent timestamp. |
+| grpc.tcp.delivery_rate | Histogram (double) | bit/s | grpc.tcp.peer_address, grpc.tcp.local_address | Records the latest measured goodput of the TCP connection. Elapsed time = packet acked timestamp - last packet acked timestamp. Delivery rate = packet acked bytes / elapsed time. |
| grpc.tcp.packets_sent | Counter (uint64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets TCP sends in the calculation period. |
| grpc.tcp.packets_retransmitted | Counter (uint64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. |
| grpc.tcp.packets_spurious_retransmitted | Counter (uint64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets spuriously retransmitted in the calculation period. These are retransmissions that TCP later discovered to be unnecessary. |
@@ -55,7 +55,7 @@ The metrics will be exported as:
A high-level approach to collecting TCP metrics is as follows:
1) **Collect Network Timestamps for Metric Calculation:** On Linux, this is achieved by enabling the `SO_TIMESTAMPING` option in the kernel's TCP stack through the `setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))` system call. This enables the kernel to capture packet timestamps during transmission and provide this information through `getsockopt(TCP_INFO)`.
-2) **Calculate Time Deltas from Timestamps:** For example, the `delivery_rate` metric estimates the goodput—the rate of useful data transmitted—for the most recent group of outbound data packets within a single flow. This involves calculating the time difference between when a data packet was sent and when it was acknowledged.
+2) **Calculate Time Deltas from Timestamps:** For example, the `delivery_rate` metric estimates the goodput—the rate of useful data transmitted—for the most recent group of outbound data packets within a single flow. This involves calculating the (byte difference / time difference) between last acked data packet and the latest acked data packet.
3) **Periodically Collect Statistics:** At a specified time interval (e.g., every 10 seconds), gRPC aggregates the calculated metrics and updates the corresponding statistics records.
From 2a11aea34244e32dcafc348a72702ea1d007e947 Mon Sep 17 00:00:00 2001
From: nanahpang <31627465+nanahpang@users.noreply.github.com>
Date: Tue, 21 May 2024 18:08:45 -0700
Subject: [PATCH 15/21] Update A80-grpc-metrics-for-tcp-connection.md
---
A80-grpc-metrics-for-tcp-connection.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md
index b0dfac344..cc811cca0 100644
--- a/A80-grpc-metrics-for-tcp-connection.md
+++ b/A80-grpc-metrics-for-tcp-connection.md
@@ -55,7 +55,7 @@ The metrics will be exported as:
A high-level approach to collecting TCP metrics is as follows:
1) **Collect Network Timestamps for Metric Calculation:** On Linux, this is achieved by enabling the `SO_TIMESTAMPING` option in the kernel's TCP stack through the `setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))` system call. This enables the kernel to capture packet timestamps during transmission and provide this information through `getsockopt(TCP_INFO)`.
-2) **Calculate Time Deltas from Timestamps:** For example, the `delivery_rate` metric estimates the goodput—the rate of useful data transmitted—for the most recent group of outbound data packets within a single flow. This involves calculating the (byte difference / time difference) between last acked data packet and the latest acked data packet.
+2) **Calculate Time Deltas from Timestamps:** For example, the `delivery_rate` metric estimates the goodput—the rate of useful data transmitted—for the most recent group of outbound data packets within a single flow. This involves calculating the (packet bytes / elapse time between last acked data packet and the latest acked data packet).
3) **Periodically Collect Statistics:** At a specified time interval (e.g., every 10 seconds), gRPC aggregates the calculated metrics and updates the corresponding statistics records.
From b6dc6d94a0d4439699da802eb18e52b0d985ed2c Mon Sep 17 00:00:00 2001
From: nanahpang <31627465+nanahpang@users.noreply.github.com>
Date: Tue, 21 May 2024 18:35:55 -0700
Subject: [PATCH 16/21] Update A80-grpc-metrics-for-tcp-connection.md
---
A80-grpc-metrics-for-tcp-connection.md | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md
index cc811cca0..3b16a4bfb 100644
--- a/A80-grpc-metrics-for-tcp-connection.md
+++ b/A80-grpc-metrics-for-tcp-connection.md
@@ -44,8 +44,8 @@ The metrics will be exported as:
| Name | Type | Unit | Labels | Description |
| ------------- | ----- | ----- | ------- | ----------- |
-| grpc.tcp.min_rtt | Histogram (double) | s | grpc.tcp.peer_address, grpc.tcp.local_address | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints. RTT: packet acked timestamp - packet sent timestamp. |
-| grpc.tcp.delivery_rate | Histogram (double) | bit/s | grpc.tcp.peer_address, grpc.tcp.local_address | Records latest goodput measured of the TCP connection. Elapse time = packet acked timestamp - last packet acked timestamp. Delivery rate = packet acked bytes / elapse time. |
+| grpc.tcp.min_rtt | Histogram (double) | s | grpc.tcp.peer_address, grpc.tcp.local_address | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints.<br>RTT = packet acked timestamp - packet sent timestamp. |
+| grpc.tcp.delivery_rate | Histogram (double) | bit/s | grpc.tcp.peer_address, grpc.tcp.local_address | Records latest goodput measured of the TCP connection.<br>Elapsed time = packet acked timestamp - last packet acked timestamp.<br>Delivery rate = packet acked bytes / elapsed time. |
| grpc.tcp.packets_sent | Counter (uint64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets TCP sends in the calculation period. |
| grpc.tcp.packets_retransmitted | Counter (uint64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. |
| grpc.tcp.packets_spurious_retransmitted | Counter (uint64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets spuriously retransmitted packets in the calculation period. These are retransmissions that TCP later discovered unnecessary.|
@@ -53,10 +53,10 @@ The metrics will be exported as:
#### Metric Collection Design
-A high-level approach to collecting TCP metrics is as follows:
-1) **Collect Network Timestamps for Metric Calculation:** On Linux, this is achieved by enabling the `SO_TIMESTAMPING` option in the kernel's TCP stack through the `setsocketopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))` system call. This enables the kernel to capture packet timestamps during transmission and provide this information through `getsockopt(TCP_INFO)`.
-2) **Calculate Time Deltas from Timestamps:** For example, the `delivery_rate` metric estimates the goodput—the rate of useful data transmitted—for the most recent group of outbound data packets within a single flow. This involves calculating the (packet bytes / elapse time between last acked data packet and the latest acked data packet).
-3) **Periodically Collect Statistics:** At a specified time interval (e.g., every 10 seconds), gRPC aggregates the calculated metrics and updates the corresponding statistics records.
+A high-level approach to collecting TCP metrics (on Linux) is as follows:
+1) **Enable Network Timestamps for Metric Calculation:** Enable the `SO_TIMESTAMPING` option in the kernel's TCP stack through the `setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))` system call. This enables the kernel to capture packet timestamps during transmission.
+2) **Calculate Metrics from Timestamps:** Linux kernel calculates TCP connection metrics based on the captured packet timestamps. These metrics can be retrieved using the `getsockopt(TCP_INFO)` system call. For example, the delivery_rate metric estimates the goodput—the rate of useful data transmitted—for the most recent group of outbound data packets within a single flow ([code](https://elixir.bootlin.com/linux/v5.11.1/source/net/ipv4/tcp.c#L391)).
+3) **Periodically Collect Statistics:** At a specified time interval (e.g., every 5 minutes), gRPC aggregates the calculated metrics and updates the corresponding statistics records.
#### Reference:
From 0aceebef5d12b4981447beafee3802243bbcbf9b Mon Sep 17 00:00:00 2001
From: nanahpang <31627465+nanahpang@users.noreply.github.com>
Date: Wed, 22 May 2024 14:30:48 -0700
Subject: [PATCH 17/21] Update A80-grpc-metrics-for-tcp-connection.md
---
A80-grpc-metrics-for-tcp-connection.md | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md
index 3b16a4bfb..3dce9267f 100644
--- a/A80-grpc-metrics-for-tcp-connection.md
+++ b/A80-grpc-metrics-for-tcp-connection.md
@@ -37,18 +37,17 @@ The metrics will have label:
| Name | Disposition | Description |
| ----------- | ----------- | ----------- |
-| grpc.tcp.peer_address | optional | Store the peer address info in URI format such as `ipv4:1.2.3.4:567`. |
-| grpc.tcp.local_address | optional | Store the local address info in URI format such as `ipv4:1.2.3.4:567`. |
+| grpc.tcp.server_address | optional | Store the server address info in URI format such as `ipv4:1.2.3.4:567`. For clients, this address is the same as the peer address, while on the server side, it's the same as the local address. |
The metrics will be exported as:
| Name | Type | Unit | Labels | Description |
| ------------- | ----- | ----- | ------- | ----------- |
-| grpc.tcp.min_rtt | Histogram (double) | s | grpc.tcp.peer_address, grpc.tcp.local_address | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints.<br>RTT = packet acked timestamp - packet sent timestamp. |
-| grpc.tcp.delivery_rate | Histogram (double) | bit/s | grpc.tcp.peer_address, grpc.tcp.local_address | Records latest goodput measured of the TCP connection.<br>Elapsed time = packet acked timestamp - last packet acked timestamp.<br>Delivery rate = packet acked bytes / elapsed time. |
-| grpc.tcp.packets_sent | Counter (uint64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets TCP sends in the calculation period. |
-| grpc.tcp.packets_retransmitted | Counter (uint64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. |
-| grpc.tcp.packets_spurious_retransmitted | Counter (uint64) | {packet} | grpc.tcp.peer_address, grpc.tcp.local_address | Records total packets spuriously retransmitted packets in the calculation period. These are retransmissions that TCP later discovered unnecessary.|
+| grpc.tcp.min_rtt | Histogram (double) | s | grpc.tcp.server_address | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints.<br>RTT = packet acked timestamp - packet sent timestamp. |
+| grpc.tcp.delivery_rate | Histogram (double) | bit/s | grpc.tcp.server_address | Records latest goodput measured of the TCP connection.<br>Elapsed time = packet acked timestamp - last packet acked timestamp.<br>Delivery rate = packet acked bytes / elapsed time. |
+| grpc.tcp.packets_sent | Counter (uint64) | {packet} | grpc.tcp.server_address | Records total packets TCP sends in the calculation period. |
+| grpc.tcp.packets_retransmitted | Counter (uint64) | {packet} | grpc.tcp.server_address | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. |
+| grpc.tcp.packets_spurious_retransmitted | Counter (uint64) | {packet} | grpc.tcp.server_address | Records total packets spuriously retransmitted packets in the calculation period. These are retransmissions that TCP later discovered unnecessary.|
#### Metric Collection Design
From 052d5cf58272f128da0b53ada7cc496b3a8f0c40 Mon Sep 17 00:00:00 2001
From: nanahpang <31627465+nanahpang@users.noreply.github.com>
Date: Wed, 22 May 2024 16:55:23 -0700
Subject: [PATCH 18/21] Update A80-grpc-metrics-for-tcp-connection.md
---
A80-grpc-metrics-for-tcp-connection.md | 16 +++++-----------
1 file changed, 5 insertions(+), 11 deletions(-)
diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md
index 3dce9267f..efa8c720d 100644
--- a/A80-grpc-metrics-for-tcp-connection.md
+++ b/A80-grpc-metrics-for-tcp-connection.md
@@ -33,21 +33,15 @@ We will provide the following metrics:
- `grpc.tcp.packets_retransmitted`
- `grpc.tcp.packets_spurious_retransmitted`
-The metrics will have label:
-
-| Name | Disposition | Description |
-| ----------- | ----------- | ----------- |
-| grpc.tcp.server_address | optional | Store the server address info in URI format such as `ipv4:1.2.3.4:567`. For clients, this address is the same as the peer address, while on the server side, it's the same as the local address. |
-
The metrics will be exported as:
| Name | Type | Unit | Labels | Description |
| ------------- | ----- | ----- | ------- | ----------- |
-| grpc.tcp.min_rtt | Histogram (double) | s | grpc.tcp.server_address | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints.<br>RTT = packet acked timestamp - packet sent timestamp. |
-| grpc.tcp.delivery_rate | Histogram (double) | bit/s | grpc.tcp.server_address | Records latest goodput measured of the TCP connection.<br>Elapsed time = packet acked timestamp - last packet acked timestamp.<br>Delivery rate = packet acked bytes / elapsed time. |
-| grpc.tcp.packets_sent | Counter (uint64) | {packet} | grpc.tcp.server_address | Records total packets TCP sends in the calculation period. |
-| grpc.tcp.packets_retransmitted | Counter (uint64) | {packet} | grpc.tcp.server_address | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. |
-| grpc.tcp.packets_spurious_retransmitted | Counter (uint64) | {packet} | grpc.tcp.server_address | Records total packets spuriously retransmitted packets in the calculation period. These are retransmissions that TCP later discovered unnecessary.|
+| grpc.tcp.min_rtt | Histogram (double) | s | None | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints.<br>RTT = packet acked timestamp - packet sent timestamp. |
+| grpc.tcp.delivery_rate | Histogram (double) | bit/s | None | Records latest goodput measured of the TCP connection.<br>Elapsed time = packet acked timestamp - last packet acked timestamp.<br>Delivery rate = packet acked bytes / elapsed time. |
+| grpc.tcp.packets_sent | Counter (uint64) | {packet} | None | Records total packets TCP sends in the calculation period. |
+| grpc.tcp.packets_retransmitted | Counter (uint64) | {packet} | None | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. |
+| grpc.tcp.packets_spurious_retransmitted | Counter (uint64) | {packet} | None | Records total packets spuriously retransmitted packets in the calculation period. These are retransmissions that TCP later discovered unnecessary.|
#### Metric Collection Design
From 7e5bc869baa98e4a7132e4fcc7b98a5b16a4af99 Mon Sep 17 00:00:00 2001
From: nanahpang <31627465+nanahpang@users.noreply.github.com>
Date: Fri, 24 May 2024 11:15:21 -0700
Subject: [PATCH 19/21] Update A80-grpc-metrics-for-tcp-connection.md
---
A80-grpc-metrics-for-tcp-connection.md | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md
index efa8c720d..d881e2ec6 100644
--- a/A80-grpc-metrics-for-tcp-connection.md
+++ b/A80-grpc-metrics-for-tcp-connection.md
@@ -24,7 +24,7 @@ To improve the network debugging capabilities for gRPC users, we propose adding
This document proposes changes to the following gRPC components.
-#### Per-Connection TCP Metrics
+### Per-Connection TCP Metrics
We will provide the following metrics:
- `grpc.tcp.min_rtt`
@@ -43,14 +43,12 @@ The metrics will be exported as:
| grpc.tcp.packets_retransmitted | Counter (uint64) | {packet} | None | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. |
| grpc.tcp.packets_spurious_retransmitted | Counter (uint64) | {packet} | None | Records total packets spuriously retransmitted packets in the calculation period. These are retransmissions that TCP later discovered unnecessary.|
-
#### Metric Collection Design
A high-level approach to collecting TCP metrics (on Linux) is as follows:
1) **Enable Network Timestamps for Metric Calculation:** Enable the `SO_TIMESTAMPING` option in the kernel's TCP stack through the `setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))` system call. This enables the kernel to capture packet timestamps during transmission.
2) **Calculate Metrics from Timestamps:** Linux kernel calculates TCP connection metrics based on the captured packet timestamps. These metrics can be retrieved using the `getsockopt(TCP_INFO)` system call. For example, the delivery_rate metric estimates the goodput—the rate of useful data transmitted—for the most recent group of outbound data packets within a single flow ([code](https://elixir.bootlin.com/linux/v5.11.1/source/net/ipv4/tcp.c#L391)).
-3) **Periodically Collect Statistics:** At a specified time interval (e.g., every 5 minutes), gRPC aggregates the calculated metrics and updates the corresponding statistics records.
-
+3) **Periodically Collect Statistics:** At a specified time interval (e.g., every 5 minutes), gRPC aggregates the calculated metrics and updates the corresponding statistics records. A detailed explanation of the design can be found in the Fathom documentation.
#### Reference:
* Fathom: https://dl.acm.org/doi/pdf/10.1145/3603269.3604815
From 092fbc197da455de073419eeb94b60c1f4a68ff7 Mon Sep 17 00:00:00 2001
From: nanahpang <31627465+nanahpang@users.noreply.github.com>
Date: Fri, 24 May 2024 13:04:37 -0700
Subject: [PATCH 20/21] Update A80-grpc-metrics-for-tcp-connection.md
---
A80-grpc-metrics-for-tcp-connection.md | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md
index d881e2ec6..5b689718e 100644
--- a/A80-grpc-metrics-for-tcp-connection.md
+++ b/A80-grpc-metrics-for-tcp-connection.md
@@ -48,7 +48,9 @@ The metrics will be exported as:
A high-level approach to collecting TCP metrics (on Linux) is as follows:
1) **Enable Network Timestamps for Metric Calculation:** Enable the `SO_TIMESTAMPING` option in the kernel's TCP stack through the `setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val))` system call. This enables the kernel to capture packet timestamps during transmission.
2) **Calculate Metrics from Timestamps:** Linux kernel calculates TCP connection metrics based on the captured packet timestamps. These metrics can be retrieved using the `getsockopt(TCP_INFO)` system call. For example, the delivery_rate metric estimates the goodput—the rate of useful data transmitted—for the most recent group of outbound data packets within a single flow ([code](https://elixir.bootlin.com/linux/v5.11.1/source/net/ipv4/tcp.c#L391)).
-3) **Periodically Collect Statistics:** At a specified time interval (e.g., every 5 minutes), gRPC aggregates the calculated metrics and updates the corresponding statistics records. A detailed explanation of the design can be found in the Fathom documentation.
+3) **Periodically Collect Statistics:** At a specified time interval (e.g., every 5 minutes), gRPC aggregates the calculated metrics and updates the corresponding statistics records.
+
+A detailed explanation of the design can be found in the Fathom documentation.
#### Reference:
* Fathom: https://dl.acm.org/doi/pdf/10.1145/3603269.3604815
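
The collection steps described in this patch can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the gRPC implementation: the `tcp_sample` struct and the three function names are hypothetical, the `SOF_TIMESTAMPING_*` flag combination is only one plausible choice, and the mapping from `struct tcp_info` fields to the proposed metrics (`tcpi_min_rtt`, `tcpi_delivery_rate`, `tcpi_data_segs_out`, `tcpi_total_retrans`) is an assumption the proposal does not spell out. Spurious retransmissions are left out of the sketch.

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>       /* IPPROTO_TCP */
#include <linux/net_tstamp.h> /* SOF_TIMESTAMPING_* flags */
#include <linux/tcp.h>        /* struct tcp_info, TCP_INFO */

/* Hypothetical container for one per-connection sample; not a gRPC type. */
typedef struct {
  double min_rtt_s;               /* seconds */
  double delivery_rate_bps;       /* bits per second */
  uint64_t packets_sent;          /* cumulative, as reported by the kernel */
  uint64_t packets_retransmitted; /* cumulative, as reported by the kernel */
} tcp_sample;

/* Step 1: enable SO_TIMESTAMPING on the socket.
 * The flag combination below is an assumption made for illustration. */
int enable_tx_timestamps(int fd) {
  int val = SOF_TIMESTAMPING_TX_ACK | SOF_TIMESTAMPING_SOFTWARE |
            SOF_TIMESTAMPING_OPT_ID;
  return setsockopt(fd, SOL_SOCKET, SO_TIMESTAMPING, &val, sizeof(val));
}

/* Convert kernel units to the units in the metrics table.
 * The field-to-metric mapping is assumed, not taken from the proposal. */
tcp_sample sample_from_tcp_info(const struct tcp_info *info) {
  tcp_sample s;
  s.min_rtt_s = info->tcpi_min_rtt / 1e6;               /* usec -> s */
  s.delivery_rate_bps = info->tcpi_delivery_rate * 8.0; /* bytes/s -> bit/s */
  s.packets_sent = info->tcpi_data_segs_out;
  s.packets_retransmitted = info->tcpi_total_retrans;
  return s;
}

/* Step 2: read the kernel-computed statistics through getsockopt(TCP_INFO).
 * Returns 0 on success and -1 on failure (errno is set by getsockopt). */
int read_tcp_sample(int fd, tcp_sample *out) {
  struct tcp_info info;
  socklen_t len = sizeof(info);
  memset(&info, 0, sizeof(info));
  if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &info, &len) != 0) return -1;
  *out = sample_from_tcp_info(&info);
  return 0;
}
```

For step 3, a caller would invoke `read_tcp_sample` on a timer (for example every 5 minutes). Because the kernel packet counters are cumulative, the caller would likely record the difference between consecutive samples to obtain per-period counts.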
From bd18940ab84ad7ed5acd541d9377ae4867af3ba1 Mon Sep 17 00:00:00 2001
From: nanahpang <31627465+nanahpang@users.noreply.github.com>
Date: Fri, 24 May 2024 15:20:35 -0700
Subject: [PATCH 21/21] Update A80-grpc-metrics-for-tcp-connection.md
---
A80-grpc-metrics-for-tcp-connection.md | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/A80-grpc-metrics-for-tcp-connection.md b/A80-grpc-metrics-for-tcp-connection.md
index 5b689718e..f60574777 100644
--- a/A80-grpc-metrics-for-tcp-connection.md
+++ b/A80-grpc-metrics-for-tcp-connection.md
@@ -37,11 +37,11 @@ The metrics will be exported as:
| Name | Type | Unit | Labels | Description |
| ------------- | ----- | ----- | ------- | ----------- |
-| grpc.tcp.min_rtt | Histogram (double) | s | None | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints.<br>RTT = packet acked timestamp - packet sent timestamp. |
-| grpc.tcp.delivery_rate | Histogram (double) | bit/s | None | Records latest goodput measured of the TCP connection.<br>Elapsed time = packet acked timestamp - last packet acked timestamp.<br>Delivery rate = packet acked bytes / elapsed time. |
-| grpc.tcp.packets_sent | Counter (uint64) | {packet} | None | Records total packets TCP sends in the calculation period. |
-| grpc.tcp.packets_retransmitted | Counter (uint64) | {packet} | None | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. |
-| grpc.tcp.packets_spurious_retransmitted | Counter (uint64) | {packet} | None | Records total packets spuriously retransmitted packets in the calculation period. These are retransmissions that TCP later discovered unnecessary.|
+| grpc.tcp.min_rtt | Histogram (floating-point) | s | None | Records TCP's current estimate of minimum round trip time (RTT), typically used as an indication of the network health between two endpoints.<br>RTT = packet acked timestamp - packet sent timestamp. |
+| grpc.tcp.delivery_rate | Histogram (floating-point) | bit/s | None | Records latest goodput measured of the TCP connection.<br>Elapsed time = packet acked timestamp - last packet acked timestamp.<br>Delivery rate = packet acked bytes / elapsed time. |
+| grpc.tcp.packets_sent | Counter (integer) | {packet} | None | Records total packets TCP sends in the calculation period. |
+| grpc.tcp.packets_retransmitted | Counter (integer) | {packet} | None | Records total packets lost in the calculation period, including lost or spuriously retransmitted packets. |
+| grpc.tcp.packets_spurious_retransmitted | Counter (integer) | {packet} | None | Records total spuriously retransmitted packets in the calculation period. These are retransmissions that TCP later discovered to be unnecessary. |
#### Metric Collection Design