network: More efficient caching for Envoy socket addresses (envoyproxy#37832)

An LRU cache was introduced to cache `Envoy::Network::Address` instances
because they are expensive to create. These addresses are cached for
reading source and destination addresses from `recvmsg` and `recvmmsg`
calls on QUIC UDP sockets. The current size of the cache is 4 entries
for each IoHandle (i.e. each socket).

A locally run CPU profile of Envoy Mobile showed about 1.75% of CPU
cycles going towards querying and inserting into the
`quic::QuicLRUCache`.

Given the small number of elements in the cache, this commit uses a
`std::vector` data structure instead of `QuicLRUCache`. `QuicLRUCache`,
`std::vector`, and `std::deque` were compared using newly added
benchmark tests, and the following were the results:

QuicLRUCache:
```
-------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                               Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------------------
BM_GetOrCreateEnvoyAddressInstanceNoCache/iterations:1000                           31595 ns        31494 ns         1000
BM_GetOrCreateEnvoyAddressInstanceConnectedSocket/iterations:1000                    5538 ns         5538 ns         1000
BM_GetOrCreateEnvoyAddressInstanceUnconnectedSocket/iterations:1000                 38918 ns        38814 ns         1000
BM_GetOrCreateEnvoyAddressInstanceUnconnectedSocketLargerCache/iterations:1000      52969 ns        52846 ns         1000
```

std::deque:
```
-------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                               Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------------------
BM_GetOrCreateEnvoyAddressInstanceNoCache/iterations:1000                           31805 ns        31716 ns         1000
BM_GetOrCreateEnvoyAddressInstanceConnectedSocket/iterations:1000                    1553 ns         1550 ns         1000
BM_GetOrCreateEnvoyAddressInstanceUnconnectedSocket/iterations:1000                 27243 ns        27189 ns         1000
BM_GetOrCreateEnvoyAddressInstanceUnconnectedSocketLargerCache/iterations:1000      39335 ns        39235 ns         1000
```

std::vector:
```
-------------------------------------------------------------------------------------------------------------------------
Benchmark                                                                               Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------------------------
BM_GetOrCreateEnvoyAddressInstanceNoCache/iterations:1000                           31960 ns        31892 ns         1000
BM_GetOrCreateEnvoyAddressInstanceConnectedSocket/iterations:1000                    1514 ns         1514 ns         1000
BM_GetOrCreateEnvoyAddressInstanceUnconnectedSocket/iterations:1000                 26361 ns        26261 ns         1000
BM_GetOrCreateEnvoyAddressInstanceUnconnectedSocketLargerCache/iterations:1000      43987 ns        43738 ns         1000
```

`std::vector` uses more than 3.5x fewer CPU cycles than
`quic::QuicLRUCache` in the connected-socket benchmark (1514 ns vs.
5538 ns) and performs very slightly better than `std::deque` at small
cache sizes. For a larger cache (e.g. >= 50 entries), `std::deque` may
perform better (39235 ns vs. 43738 ns with 50 entries) and is worth
profiling, though in that situation no cache at all seems to perform
better than a cache.
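
The vector-based approach described above can be sketched in isolation. This is a minimal illustration, not the code from this commit: `SmallLruCache` and its method names are hypothetical, and a generic key/value pair stands in for `quic::QuicSocketAddress` and `Address::InstanceConstSharedPtr`. The least recently used entry sits at the front and the most recently used entry at the back.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the address cache: a tiny LRU kept in a vector.
template <typename Key, typename Value>
class SmallLruCache {
public:
  explicit SmallLruCache(std::size_t max_size) : max_size_(max_size) {}

  // Returns a pointer to the cached value, or nullptr on a miss. A hit moves
  // the entry to the back so it becomes the most recently used one. The
  // pointer is only valid until the next call that modifies the cache.
  const Value* lookup(const Key& key) {
    auto it = std::find_if(entries_.begin(), entries_.end(),
                           [&key](const Entry& e) { return e.first == key; });
    if (it == entries_.end()) {
      return nullptr;
    }
    std::rotate(it, it + 1, entries_.end());
    return &entries_.back().second;
  }

  // Appends a new entry and evicts the front (least recently used) entry if
  // the cache grew past its capacity. Assumes the key is not already cached.
  void insert(Key key, Value value) {
    entries_.emplace_back(std::move(key), std::move(value));
    if (entries_.size() > max_size_) {
      entries_.erase(entries_.begin());
    }
  }

  std::size_t size() const { return entries_.size(); }

private:
  using Entry = std::pair<Key, Value>;
  std::size_t max_size_;
  std::vector<Entry> entries_;
};
```

With only a handful of entries, both the linear scan and the element shifts touch a few contiguous slots, which is one plausible reason this layout can beat a general-purpose LRU structure at this size.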

Risk Level: low
Testing: unit and benchmark tests
Docs Changes: n/a
Release Notes: n/a
Platform Specific Features: n/a

---------

Signed-off-by: Ali Beyad <abeyad@google.com>
abeyad authored Jan 6, 2025
1 parent a07c129 commit 5ccb21d
Showing 9 changed files with 269 additions and 17 deletions.
1 change: 0 additions & 1 deletion source/common/network/BUILD
@@ -265,7 +265,6 @@ envoy_cc_library(
         "//source/common/api:os_sys_calls_lib",
         "//source/common/buffer:buffer_lib",
         "//source/common/event:dispatcher_includes",
-        "@com_github_google_quiche//:quic_core_lru_cache_lib",
         "@com_github_google_quiche//:quic_platform_socket_address",
         "@envoy_api//envoy/extensions/network/socket_interface/v3:pkg_cc_proto",
     ],
18 changes: 13 additions & 5 deletions source/common/network/io_socket_handle_impl.cc
@@ -231,18 +231,26 @@ Api::IoCallUint64Result IoSocketHandleImpl::sendmsg(const Buffer::RawSlice* slic

 Address::InstanceConstSharedPtr
 IoSocketHandleImpl::getOrCreateEnvoyAddressInstance(sockaddr_storage ss, socklen_t ss_len) {
-  if (recent_received_addresses_ == nullptr) {
+  if (!recent_received_addresses_) {
     return Address::addressFromSockAddrOrDie(ss, ss_len, fd_, socket_v6only_);
   }
   quic::QuicSocketAddress quic_address(ss);
-  auto it = recent_received_addresses_->Lookup(quic_address);
+  auto it = std::find_if(
+      recent_received_addresses_->begin(), recent_received_addresses_->end(),
+      [&quic_address](const QuicEnvoyAddressPair& pair) { return pair.first == quic_address; });
   if (it != recent_received_addresses_->end()) {
-    return *it->second;
+    Address::InstanceConstSharedPtr cached_addr = it->second;
+    // Move the entry to the back of the list since it's the most recently accessed entry.
+    std::rotate(it, it + 1, recent_received_addresses_->end());
+    return cached_addr;
   }
   Address::InstanceConstSharedPtr new_address =
       Address::addressFromSockAddrOrDie(ss, ss_len, fd_, socket_v6only_);
-  recent_received_addresses_->Insert(
-      quic_address, std::make_unique<Address::InstanceConstSharedPtr>(new_address));
+  recent_received_addresses_->push_back(QuicEnvoyAddressPair(quic_address, new_address));
+  if (recent_received_addresses_->size() > address_cache_max_capacity_) {
+    // Over capacity so remove the first element in the list, which is the least recently accessed.
+    recent_received_addresses_->erase(recent_received_addresses_->begin());
+  }
   return new_address;
 }

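
The `std::rotate(it, it + 1, end)` call in this hunk is what keeps the entries in least-to-most-recently-used order. A small standalone sketch of that idiom follows; `moveToBack` is a hypothetical helper name and integers stand in for the cached address pairs.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Moves the element at `index` to the back while preserving the relative
// order of every other element. Assumes index < values.size().
std::vector<int> moveToBack(std::vector<int> values, std::size_t index) {
  auto it = values.begin() + static_cast<std::ptrdiff_t>(index);
  // Rotating the range [it, end) left by one shifts every later element one
  // slot toward the front and places *it in the last slot.
  std::rotate(it, it + 1, values.end());
  return values;
}
```

For example, `moveToBack({1, 2, 3, 4}, 1)` yields `{1, 3, 4, 2}`. The shift is linear in the number of elements after `it`, which is negligible for a 4-entry cache.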
22 changes: 11 additions & 11 deletions source/common/network/io_socket_handle_impl.h
@@ -1,6 +1,7 @@
 #pragma once

 #include <memory>
+#include <vector>

 #include "envoy/api/io_error.h"
 #include "envoy/api/os_sys_calls.h"
@@ -13,15 +14,12 @@
 #include "source/common/network/io_socket_handle_base_impl.h"
 #include "source/common/runtime/runtime_features.h"

-#include "quiche/quic/core/quic_lru_cache.h"
 #include "quiche/quic/platform/api/quic_socket_address.h"

 namespace Envoy {
 namespace Network {

-using AddressInstanceLRUCache =
-    quic::QuicLRUCache<quic::QuicSocketAddress, Address::InstanceConstSharedPtr,
-                       quic::QuicSocketAddressHash>;
+using QuicEnvoyAddressPair = std::pair<quic::QuicSocketAddress, Address::InstanceConstSharedPtr>;

 /**
  * IoHandle derivative for sockets.
@@ -32,10 +30,10 @@ class IoSocketHandleImpl : public IoSocketHandleBaseImpl {
                               absl::optional<int> domain = absl::nullopt,
                               size_t address_cache_max_capacity = 0)
       : IoSocketHandleBaseImpl(fd, socket_v6only, domain),
-        receive_ecn_(Runtime::runtimeFeatureEnabled("envoy.reloadable_features.quic_receive_ecn")) {
+        receive_ecn_(Runtime::runtimeFeatureEnabled("envoy.reloadable_features.quic_receive_ecn")),
+        address_cache_max_capacity_(address_cache_max_capacity) {
     if (address_cache_max_capacity > 0) {
-      recent_received_addresses_ =
-          std::make_unique<AddressInstanceLRUCache>(address_cache_max_capacity);
+      recent_received_addresses_ = std::vector<QuicEnvoyAddressPair>();
     }
   }

@@ -111,9 +109,7 @@ class IoSocketHandleImpl : public IoSocketHandleBaseImpl {
   // Latches a copy of the runtime feature "envoy.reloadable_features.quic_receive_ecn".
   const bool receive_ecn_;

-  size_t addressCacheMaxSize() const {
-    return recent_received_addresses_ == nullptr ? 0 : recent_received_addresses_->MaxSize();
-  }
+  size_t addressCacheMaxSize() const { return address_cache_max_capacity_; }

 private:
   // Returns the destination address if the control message carries it.
@@ -128,7 +124,11 @@ class IoSocketHandleImpl : public IoSocketHandleBaseImpl {
   // Should only be used by UDP sockets to avoid creating multiple address instances for the same
   // address in each read operation. Only be instantiated if the non-zero address_cache_max_capacity
   // is passed in during the construction.
-  std::unique_ptr<AddressInstanceLRUCache> recent_received_addresses_;
+  size_t address_cache_max_capacity_;
+  absl::optional<std::vector<QuicEnvoyAddressPair>> recent_received_addresses_ = absl::nullopt;
+
+  // For testing and benchmarking non-public methods.
+  friend class IoSocketHandleImplTestWrapper;
 };
 } // namespace Network
 } // namespace Envoy
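
The header change swaps a `std::unique_ptr` to the cache for an optional vector that is only engaged when a non-zero capacity is requested, and stores the capacity separately. A rough sketch of that shape follows, assuming `std::optional` as a stand-in for `absl::optional`; the `AddressCacheHolder` class and its member names are hypothetical.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical holder mirroring the constructor logic in the diff: the cache
// container only exists when a non-zero capacity was requested.
class AddressCacheHolder {
public:
  explicit AddressCacheHolder(std::size_t max_capacity) : max_capacity_(max_capacity) {
    if (max_capacity > 0) {
      cache_ = std::vector<int>();
    }
  }

  // True when lookups should consult the cache at all.
  bool cachingEnabled() const { return cache_.has_value(); }

  // The capacity is stored directly, so no container query is needed.
  std::size_t maxSize() const { return max_capacity_; }

private:
  std::size_t max_capacity_;
  std::optional<std::vector<int>> cache_;
};
```

Keeping the capacity as its own member is what lets the size accessor return a value without checking whether the container exists.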
6 changes: 6 additions & 0 deletions source/common/quic/envoy_quic_utils.cc
@@ -189,6 +189,12 @@ createConnectionSocket(const Network::Address::InstanceConstSharedPtr& peer_addr
   ASSERT(peer_addr != nullptr);
   const bool should_connect =
       Runtime::runtimeFeatureEnabled("envoy.reloadable_features.quic_connect_client_udp_sockets");
+  // NOTE: If changing the default cache size from 4 entries, make sure to profile it using
+  // the benchmark test: //test/common/network:io_socket_handle_impl_benchmark
+  //
+  // If setting a higher cache size, try profiling std::deque instead of std::vector for the
+  // `recent_received_addresses_` cache in
+  // https://github.com/envoyproxy/envoy/blob/main/source/common/network/io_socket_handle_impl.h.
   size_t max_addresses_cache_size =
       Runtime::runtimeFeatureEnabled(
           "envoy.reloadable_features.quic_upstream_socket_use_address_cache_for_read")
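
The note above suggests profiling `std::deque` for larger caches. One way such a variant might look is sketched below; `DequeLruCache` is a hypothetical name, integer keys and values stand in for the address pair, and nothing here is taken from the commit. The main structural difference is that evicting the front entry becomes `pop_front()`, which is constant time, whereas erasing the first element of a vector shifts every remaining element.

```cpp
#include <algorithm>
#include <cstddef>
#include <deque>
#include <utility>

// Hypothetical deque-backed LRU: least recently used entry at the front.
class DequeLruCache {
public:
  explicit DequeLruCache(std::size_t max_size) : max_size_(max_size) {}

  // On a hit, writes the cached value to `value_out`, moves the entry to the
  // back, and returns true. Returns false on a miss.
  bool lookup(int key, int& value_out) {
    auto it = std::find_if(entries_.begin(), entries_.end(),
                           [key](const std::pair<int, int>& e) { return e.first == key; });
    if (it == entries_.end()) {
      return false;
    }
    value_out = it->second;
    std::rotate(it, it + 1, entries_.end());
    return true;
  }

  // Appends a new entry; evicts the front entry once over capacity.
  // Assumes the key is not already cached.
  void insert(int key, int value) {
    entries_.emplace_back(key, value);
    if (entries_.size() > max_size_) {
      entries_.pop_front();
    }
  }

  std::size_t size() const { return entries_.size(); }

private:
  std::size_t max_size_;
  std::deque<std::pair<int, int>> entries_;
};
```

Whether this wins in practice depends on the cache size and access pattern, so the benchmark target named in the comment remains the place to check.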
12 changes: 12 additions & 0 deletions test/common/network/BUILD
@@ -452,6 +452,18 @@ envoy_cc_test(
     ],
 )

+envoy_cc_benchmark_binary(
+    name = "io_socket_handle_impl_benchmark",
+    srcs = ["io_socket_handle_impl_benchmark_test.cc"],
+    rbe_pool = "6gig",
+    deps = [
+        "//source/common/common:utility_lib",
+        "//source/common/network:address_lib",
+        "//test/test_common:network_utility_lib",
+        "@com_github_google_benchmark//:benchmark",
+    ],
+)
+
 envoy_cc_test(
     name = "io_uring_socket_handle_impl_test",
     srcs = select({
93 changes: 93 additions & 0 deletions test/common/network/io_socket_handle_impl_benchmark_test.cc
@@ -0,0 +1,93 @@
+#include <memory>
+
+#include "source/common/network/io_socket_handle_impl.h"
+
+#include "test/test_common/network_utility.h"
+
+#include "absl/strings/str_cat.h"
+#include "benchmark/benchmark.h"
+
+namespace Envoy {
+namespace Network {
+namespace Test {
+
+std::vector<sockaddr_storage> getSockAddrSampleAddresses(const int count) {
+  std::vector<sockaddr_storage> addresses;
+  for (int i = 0; i < count; i += 4) {
+    int ip_suffix = 101 + i;
+    // A sample v6 source address.
+    addresses.push_back(getV6SockAddr(absl::StrCat("2001:DB8::", ip_suffix), 51234));
+    // A sample v6 destination address.
+    addresses.push_back(getV6SockAddr(absl::StrCat("2001:DB8::", ip_suffix), 443));
+    // A sample v4 source address.
+    addresses.push_back(getV4SockAddr(absl::StrCat("203.0.113.", ip_suffix), 52345));
+    // A sample v4 destination address.
+    addresses.push_back(getV4SockAddr(absl::StrCat("203.0.113.", ip_suffix), 443));
+  }
+  return addresses;
+}
+
+} // namespace Test
+
+class IoSocketHandleImplTestWrapper {
+public:
+  explicit IoSocketHandleImplTestWrapper(const int cache_size)
+      : io_handle_(-1, false, absl::nullopt, cache_size) {}
+
+  Address::InstanceConstSharedPtr getOrCreateEnvoyAddressInstances(const sockaddr_storage& ss) {
+    return io_handle_.getOrCreateEnvoyAddressInstance(ss, Test::getSockAddrLen(ss));
+  }
+
+private:
+  IoSocketHandleImpl io_handle_;
+};
+
+static void BM_GetOrCreateEnvoyAddressInstanceNoCache(benchmark::State& state) {
+  std::vector<sockaddr_storage> addresses = Test::getSockAddrSampleAddresses(/*count=*/4);
+  IoSocketHandleImplTestWrapper wrapper(/*cache_size=*/0);
+  for (auto _ : state) {
+    for (int i = 0; i < 50; ++i) {
+      benchmark::DoNotOptimize(wrapper.getOrCreateEnvoyAddressInstances(addresses[0]));
+      benchmark::DoNotOptimize(wrapper.getOrCreateEnvoyAddressInstances(addresses[1]));
+    }
+  }
+}
+BENCHMARK(BM_GetOrCreateEnvoyAddressInstanceNoCache)->Iterations(1000);
+
+static void BM_GetOrCreateEnvoyAddressInstanceConnectedSocket(benchmark::State& state) {
+  std::vector<sockaddr_storage> addresses = Test::getSockAddrSampleAddresses(/*count=*/4);
+  IoSocketHandleImplTestWrapper wrapper(/*cache_size=*/4);
+  for (auto _ : state) {
+    for (int i = 0; i < 50; ++i) {
+      benchmark::DoNotOptimize(wrapper.getOrCreateEnvoyAddressInstances(addresses[0]));
+      benchmark::DoNotOptimize(wrapper.getOrCreateEnvoyAddressInstances(addresses[1]));
+    }
+  }
+}
+BENCHMARK(BM_GetOrCreateEnvoyAddressInstanceConnectedSocket)->Iterations(1000);
+
+static void BM_GetOrCreateEnvoyAddressInstanceUnconnectedSocket(benchmark::State& state) {
+  std::vector<sockaddr_storage> addresses = Test::getSockAddrSampleAddresses(/*count=*/100);
+  IoSocketHandleImplTestWrapper wrapper(/*cache_size=*/4);
+  for (auto _ : state) {
+    for (const sockaddr_storage& ss : addresses) {
+      benchmark::DoNotOptimize(wrapper.getOrCreateEnvoyAddressInstances(ss));
+    }
+  }
+}
+BENCHMARK(BM_GetOrCreateEnvoyAddressInstanceUnconnectedSocket)->Iterations(1000);
+
+static void
+BM_GetOrCreateEnvoyAddressInstanceUnconnectedSocketLargerCache(benchmark::State& state) {
+  std::vector<sockaddr_storage> addresses = Test::getSockAddrSampleAddresses(/*count=*/100);
+  IoSocketHandleImplTestWrapper wrapper(/*cache_size=*/50);
+  for (auto _ : state) {
+    for (const sockaddr_storage& ss : addresses) {
+      benchmark::DoNotOptimize(wrapper.getOrCreateEnvoyAddressInstances(ss));
+    }
+  }
+}
+BENCHMARK(BM_GetOrCreateEnvoyAddressInstanceUnconnectedSocketLargerCache)->Iterations(1000);
+
+} // namespace Network
+} // namespace Envoy
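
To see what access patterns these benchmarks exercise, the hit and miss counts can be simulated without any Envoy code. The sketch below is a simplified model, not part of the commit: integer keys stand in for socket addresses, `replaySequentialPasses` and `HitStats` are hypothetical names, and it counts hits and misses only, not time.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct HitStats {
  std::size_t hits = 0;
  std::size_t misses = 0;
};

// Replays `passes` sequential passes over keys 0..num_keys-1 against an LRU
// cache of `capacity` entries (least recently used entry at the front).
HitStats replaySequentialPasses(std::size_t capacity, int num_keys, int passes) {
  std::vector<int> cache;
  HitStats stats;
  for (int pass = 0; pass < passes; ++pass) {
    for (int key = 0; key < num_keys; ++key) {
      auto it = std::find(cache.begin(), cache.end(), key);
      if (it != cache.end()) {
        // Hit: move the key to the most recently used position.
        ++stats.hits;
        std::rotate(it, it + 1, cache.end());
        continue;
      }
      // Miss: insert at the back and evict the front if over capacity.
      ++stats.misses;
      cache.push_back(key);
      if (cache.size() > capacity) {
        cache.erase(cache.begin());
      }
    }
  }
  return stats;
}
```

Under this model, alternating between 2 keys with a 4-entry cache (the connected-socket pattern) misses only on the first two lookups. Cycling through 100 distinct keys misses on every lookup with either a 4-entry or a 50-entry cache, because each key is evicted before it comes around again. That would be consistent with the larger cache paying for longer scans without gaining hits in the unconnected benchmarks, though the simulation itself does not measure time.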
84 changes: 84 additions & 0 deletions test/common/network/io_socket_handle_impl_test.cc
@@ -197,5 +197,89 @@ TEST_P(IoSocketHandleImplTest, InterfaceNameForLoopback) {
 }

 } // namespace
+
+// This test wrapper is a friend class of IoSocketHandleImpl, so it has access to its private and
+// protected methods.
+class IoSocketHandleImplTestWrapper {
+public:
+  void runGetAddressTests(const int cache_size) {
+    IoSocketHandleImpl io_handle(-1, false, absl::nullopt, cache_size);
+
+    // New address.
+    sockaddr_storage ss = Test::getV6SockAddr("2001:DB8::1234", 51234);
+    EXPECT_EQ(io_handle.getOrCreateEnvoyAddressInstance(ss, Test::getSockAddrLen(ss))->asString(),
+              "[2001:db8::1234]:51234");
+    // New address.
+    ss = Test::getV6SockAddr("2001:DB8::1235", 51235);
+    EXPECT_EQ(io_handle.getOrCreateEnvoyAddressInstance(ss, Test::getSockAddrLen(ss))->asString(),
+              "[2001:db8::1235]:51235");
+
+    // Access the first entry to test moving recently used entries in the cache.
+    ss = Test::getV6SockAddr("2001:DB8::1234", 51234);
+    EXPECT_EQ(io_handle.getOrCreateEnvoyAddressInstance(ss, Test::getSockAddrLen(ss))->asString(),
+              "[2001:db8::1234]:51234");
+    // Access the last entry to test moving recently used entries in the cache.
+    ss = Test::getV6SockAddr("2001:DB8::1234", 51234);
+    EXPECT_EQ(io_handle.getOrCreateEnvoyAddressInstance(ss, Test::getSockAddrLen(ss))->asString(),
+              "[2001:db8::1234]:51234");
+
+    // New address.
+    ss = Test::getV6SockAddr("2001:DB8::1236", 51236);
+    EXPECT_EQ(io_handle.getOrCreateEnvoyAddressInstance(ss, Test::getSockAddrLen(ss))->asString(),
+              "[2001:db8::1236]:51236");
+    // New address.
+    ss = Test::getV6SockAddr("2001:DB8::1237", 51237);
+    EXPECT_EQ(io_handle.getOrCreateEnvoyAddressInstance(ss, Test::getSockAddrLen(ss))->asString(),
+              "[2001:db8::1237]:51237");
+
+    // Access the second entry to test moving recently used entries in the cache.
+    ss = Test::getV6SockAddr("2001:DB8::1234", 51234);
+    EXPECT_EQ(io_handle.getOrCreateEnvoyAddressInstance(ss, Test::getSockAddrLen(ss))->asString(),
+              "[2001:db8::1234]:51234");
+
+    // New address.
+    ss = Test::getV6SockAddr("2001:DB8::1238", 51238);
+    EXPECT_EQ(io_handle.getOrCreateEnvoyAddressInstance(ss, Test::getSockAddrLen(ss))->asString(),
+              "[2001:db8::1238]:51238");
+    // New address.
+    ss = Test::getV4SockAddr("213.0.113.101", 50234);
+    EXPECT_EQ(io_handle.getOrCreateEnvoyAddressInstance(ss, Test::getSockAddrLen(ss))->asString(),
+              "213.0.113.101:50234");
+    ss = Test::getV4SockAddr("213.0.113.102", 50235);
+    EXPECT_EQ(io_handle.getOrCreateEnvoyAddressInstance(ss, Test::getSockAddrLen(ss))->asString(),
+              "213.0.113.102:50235");
+    ss = Test::getV4SockAddr("213.0.113.103", 50236);
+    EXPECT_EQ(io_handle.getOrCreateEnvoyAddressInstance(ss, Test::getSockAddrLen(ss))->asString(),
+              "213.0.113.103:50236");
+
+    // Access a middle entry.
+    ss = Test::getV4SockAddr("213.0.113.101", 50234);
+    EXPECT_EQ(io_handle.getOrCreateEnvoyAddressInstance(ss, Test::getSockAddrLen(ss))->asString(),
+              "213.0.113.101:50234");
+  }
+};
+
+TEST(IoSocketHandleImpl, GetOrCreateEnvoyAddressInstance) {
+  IoSocketHandleImplTestWrapper wrapper;
+
+  // No cache.
+  wrapper.runGetAddressTests(/*cache_size=*/0);
+
+  // Cache size 1.
+  wrapper.runGetAddressTests(/*cache_size=*/1);
+
+  // Cache size 3.
+  wrapper.runGetAddressTests(/*cache_size=*/3);
+
+  // Cache size 4.
+  wrapper.runGetAddressTests(/*cache_size=*/4);
+
+  // Cache size 6.
+  wrapper.runGetAddressTests(/*cache_size=*/6);
+
+  // Cache size 10.
+  wrapper.runGetAddressTests(/*cache_size=*/10);
+}
+
 } // namespace Network
 } // namespace Envoy
27 changes: 27 additions & 0 deletions test/test_common/network_utility.cc
@@ -267,6 +267,33 @@ void UdpSyncPeer::recv(Network::UdpRecvData& datagram) {
   received_datagrams_.pop_front();
 }

+sockaddr_storage getV6SockAddr(const std::string& ip, uint32_t port) {
+  sockaddr_storage ss;
+  auto ipv6_addr = reinterpret_cast<sockaddr_in6*>(&ss);
+  memset(ipv6_addr, 0, sizeof(sockaddr_in6));
+  ipv6_addr->sin6_family = AF_INET6;
+  inet_pton(AF_INET6, ip.c_str(), &ipv6_addr->sin6_addr);
+  ipv6_addr->sin6_port = htons(port);
+  return ss;
+}
+
+sockaddr_storage getV4SockAddr(const std::string& ip, uint32_t port) {
+  sockaddr_storage ss;
+  auto ipv4_addr = reinterpret_cast<sockaddr_in*>(&ss);
+  memset(ipv4_addr, 0, sizeof(sockaddr_in));
+  ipv4_addr->sin_family = AF_INET;
+  inet_pton(AF_INET, ip.c_str(), &ipv4_addr->sin_addr);
+  ipv4_addr->sin_port = htons(port);
+  return ss;
+}
+
+socklen_t getSockAddrLen(const sockaddr_storage& ss) {
+  if (ss.ss_family == AF_INET6) {
+    return sizeof(sockaddr_in6);
+  }
+  return sizeof(sockaddr_in);
+}
+
 } // namespace Test
 } // namespace Network
 } // namespace Envoy
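
The test helpers above ignore the return value of `inet_pton`, which is reasonable when every input is a hard-coded literal. For inputs that might be malformed, a variant that reports parse failures could look like the sketch below. `parseV4SockAddr` is a hypothetical name, `std::optional` is this sketch's choice for signalling failure, and the code assumes a POSIX platform.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <cstdint>
#include <cstring>
#include <optional>
#include <string>

// Hypothetical variant of the IPv4 helper: returns std::nullopt when `ip` is
// not a valid dotted-quad string instead of returning a zeroed address.
std::optional<sockaddr_storage> parseV4SockAddr(const std::string& ip, std::uint16_t port) {
  sockaddr_storage ss;
  std::memset(&ss, 0, sizeof(ss));
  auto* v4 = reinterpret_cast<sockaddr_in*>(&ss);
  v4->sin_family = AF_INET;
  // inet_pton returns 1 on success and 0 when the string cannot be parsed.
  if (inet_pton(AF_INET, ip.c_str(), &v4->sin_addr) != 1) {
    return std::nullopt;
  }
  // Ports are stored in network byte order.
  v4->sin_port = htons(port);
  return ss;
}
```

Taking the port as a 16-bit value also makes the range explicit, since `htons` only converts 16 bits.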
23 changes: 23 additions & 0 deletions test/test_common/network_utility.h
@@ -148,6 +148,29 @@ UpstreamTransportSocketFactoryPtr createRawBufferSocketFactory();
  */
 DownstreamTransportSocketFactoryPtr createRawBufferDownstreamSocketFactory();

+/**
+ * Creates a sockaddr_storage instance containing an IPv6 socket.
+ * @param ip The IP address as a string.
+ * @param port The port.
+ * @return sockaddr_storage
+ */
+sockaddr_storage getV6SockAddr(const std::string& ip, uint32_t port);
+
+/**
+ * Creates a sockaddr_storage instance containing an IPv4 socket.
+ * @param ip The IP address as a string.
+ * @param port The port.
+ * @return sockaddr_storage
+ */
+sockaddr_storage getV4SockAddr(const std::string& ip, uint32_t port);
+
+/**
+ * Gets the length of the sockaddr_storage instance.
+ * @param ss The sockaddr_storage instance (can be a v4 or v6 instance).
+ * @return socklen_t The size of the sockaddr_storage object.
+ */
+socklen_t getSockAddrLen(const sockaddr_storage& ss);
+
 /**
  * Implementation of Network::FilterChain with empty filter chain, but pluggable transport socket
  * factory.
