Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add resource monitor framework #3848

Merged
merged 6 commits into from
Jul 19, 2018
Merged

add resource monitor framework #3848

merged 6 commits into from
Jul 19, 2018

Conversation

eziskind
Copy link
Contributor

Add an extensible resource monitor framework for monitoring resource "pressures" (usage/limit). This will be used by the overload manager to implement downstream circuit breaking (issue #373 - see design doc linked from there).

Risk Level: low (not yet used in envoy main)

Signed-off-by: Elisha Ziskind eziskind@google.com

to be used by the overload manager (issue envoyproxy#373)

Signed-off-by: Elisha Ziskind <eziskind@google.com>
Signed-off-by: Elisha Ziskind <eziskind@google.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really solid, LGTM modulo comments.

option go_package = "v2alpha";

message FixedHeapConfig {
// Limit of the Envoy process heap size. This is used to calculate heap memory pressure which
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if there is a more push-button way to approach this, by having Envoy infer maximum memory available in the container? We have had some recent discussions with OpenCensus on how to do memory utilization determination and the main takeaway is its hard to do cross platform in a consistent way. OTOH, freeing users from having to make this determination seems a net win if there's some reasonable approaches (e.g. can tcmalloc tell us anything?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I couldn't find a way to dynamically get the heap limit from tcmalloc. I expect most users will want to define their own memory monitor which knows how to query the envoy container for this info and perhaps use it in combination with this or on its own.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I agree with @eziskind on this that it will be easier to have the operator provide the limit. How that limit is obtained/enforced is going to be system specific.


// Callback for handling updated usage information for this resource.
// Exactly one of 'usage' or 'error' will be non-null.
typedef std::function<void(const ResourceUsage* usage, const EnvoyException* error)> UpdateCb;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make sense to register both an onSuccess and onFailure callback, following the same patterns as the async clients, e.g. https://github.com/envoyproxy/envoy/blob/master/include/envoy/http/async_client.h.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

* @return Event::Dispatcher& the main thread's dispatcher. This dispatcher should be used
* for all singleton processing.
*/
virtual Event::Dispatcher& dispatcher() PURE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just inline this and skip the context unless you anticipate this growing significantly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

ResourceMonitorFactoryContext& context) PURE;

/**
* @return std::string the identifgitying name for a particular implementation of a resource
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: typo identifgitying

const Protobuf::Message& config,
Server::Configuration::ResourceMonitorFactoryContext& /*unused_context*/) {
return std::make_unique<FixedHeapMonitor>(
MessageUtil::downcastAndValidate<
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid the cast/validation boilerplate, some other extension mechanisms have a templated adapter to the extension interface. Search for "FromProtoTyped" to see examples of this. Not necessarily something needed in this PR, but could be good to add a TODO at least.

}

void FixedHeapMonitor::updateResourceUsage(const Server::ResourceMonitor::UpdateCb& completionCb) {
size_t physical = Memory::Stats::totalCurrentlyReserved();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: make these const size_t.

TEST(FixedHeapMonitorTest, SaneUsage) {
envoy::config::resource_monitor::fixed_heap::v2alpha::FixedHeapConfig config;
config.set_max_heap_size_bytes(std::numeric_limits<uint64_t>::max());
std::unique_ptr<FixedHeapMonitor> monitor(new FixedHeapMonitor(config));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could FixedHeapMonitor receive its memory query functions as a mockable object in its constructor? This would allow you to validate the actual math in the utilization function as well with mockable inputs.

Copy link
Member

@lizan lizan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You will need add a field in bootstrap.proto to actually include resource monitors in config.

void FixedHeapMonitor::updateResourceUsage(const Server::ResourceMonitor::UpdateCb& completionCb) {
size_t physical = Memory::Stats::totalCurrentlyReserved();
size_t unmapped = Memory::Stats::totalPageHeapUnmapped();
size_t used = std::max<size_t>(physical - unmapped, 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

size_t is unsigned, so I think this max does nothing?

Signed-off-by: Elisha Ziskind <eziskind@google.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo nits. @mattklein123 @lizan can you take another pass?

name = "resource_monitor_interface",
hdrs = ["resource_monitor.h"],
deps = [
"//source/common/protobuf",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this dependency isn't needed (based on header analysis).

void FixedHeapMonitor::updateResourceUsage(Server::ResourceMonitor::Callbacks& callbacks) {
const size_t physical = stats_->reservedHeapBytes();
const size_t unmapped = stats_->unmappedHeapBytes();
const size_t used = physical - unmapped;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe ASSERT(physical >= unmapped); to sanity check here.


load(
"//bazel:envoy_build_system.bzl",
"envoy_cc_mock",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused?

virtual ~MemoryStatsReader() {}

// Memory reserved for the process by the heap.
virtual uint64_t reservedHeapBytes();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The general style in Envoy for this is to declare pure interfaces for everything and then specialize as real vs. mock. I think what you have here is fine though, so no objections, just sharing an idiom that seems fairly pervasive.

Signed-off-by: Elisha Ziskind <eziskind@google.com>
Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great from a high level skim. Few small comments. Very excited to see this being added!

option go_package = "v2alpha";

message FixedHeapConfig {
// Limit of the Envoy process heap size. This is used to calculate heap memory pressure which
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I agree with @eziskind on this that it will be easier to have the operator provide the limit. How that limit is obtained/enforced is going to be system specific.

* Called when the request for updated resource usage succeeds.
* @param usage the updated resource usage
*/
virtual void onSuccess(const ResourceUsage& usage) PURE;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, given the size of ResourceUsage I would probably pass by value everywhere, unless you see it getting larger in the future?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may want to add the actual usage and limit values - useful at least for reporting as stats. So it may grow a little, but probably not much. In any event, I'd expect resource usages won't be polled so frequently that passing by reference or value will make any noticeable difference.

* Create a particular resource monitor implementation.
* @param config const ProtoBuf::Message& supplies the config for the resource monitor
* implementation.
* @param dispatcher Event::Dispatcher& the main thread's dispatcher. This is used by
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional comment: I would expect that in the future we might need to add other things here beyond the dispatcher, such as cluster manager, TLS, etc. It might be better to just create some type of Factory interface like we do for the other types of plugins? That way it's easier to add additional members to pass in the future without having to change the interface everywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, adding back the ResourceMonitorFactoryContext I had originally... :)

Signed-off-by: Elisha Ziskind <eziskind@google.com>
Signed-off-by: Elisha Ziskind <eziskind@google.com>
@htuch htuch merged commit 0e71582 into envoyproxy:master Jul 19, 2018
@eziskind eziskind deleted the overload branch July 20, 2018 13:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants