
feat(subscriber): resource instrumentation #77

Merged
merged 17 commits into from
Aug 26, 2021

Conversation

zaharidichev
Collaborator

This PR adds the pieces needed to get resource instrumentation data
into the console. This includes:

  • changes to the proto definitions
  • changes to the subscriber

The UI part will come as a follow up PR. This branch uses a patched tokio
that emits these tracing spans and events for the Sleep resource. You can look
at the raw data by running:

cargo run --example app
cargo run --example dump

The information piped through includes:

  • data describing the resource lifecycle, namely when resources are created and dropped
  • data describing the async operations that take place on these resources and their association with tasks
  • data reflecting the state updates that take place on resources (e.g. resetting a timer's duration, adding permits to a semaphore, etc.)
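
As a rough illustration of the shape of this data, the three categories might be modeled as below. These types and field names are hypothetical stand-ins for illustration only, not the actual proto definitions in this PR:

```rust
use std::time::{Duration, SystemTime};

/// Lifecycle data: when a resource was created and dropped.
/// (Hypothetical type, for illustration.)
struct ResourceLifecycle {
    kind: String, // e.g. "timer", "semaphore"
    created_at: SystemTime,
    dropped_at: Option<SystemTime>,
}

/// An async operation on a resource, associated with a task.
/// (Hypothetical type, for illustration.)
struct AsyncOp {
    resource_kind: String,
    task_id: u64,
    busy_time: Duration,
}

/// A state update on a resource (e.g. resetting a timer's duration).
/// (Hypothetical type, for illustration.)
struct StateUpdate {
    attribute: String, // e.g. "duration"
    value: u64,
    unit: Option<String>, // e.g. "ms"
}
```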

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Comment on lines 546 to 571
if let Some(mut task_stats) = self.task_stats.update(&id) {
task_stats.poll_stats.current_polls -= 1;
if task_stats.poll_stats.current_polls == 0 {
if let Some(last_poll_started) = task_stats.poll_stats.last_poll_started {
let elapsed = at.duration_since(last_poll_started).unwrap();
task_stats.poll_stats.last_poll_ended = Some(at);
task_stats.poll_stats.busy_time += elapsed;
task_stats
.poll_times_histogram
.record(elapsed.as_nanos().try_into().unwrap_or(u64::MAX))
.unwrap();
}
}
}

if let Some(mut async_op_stats) = self.async_op_stats.update(&id) {
async_op_stats.poll_stats.current_polls -= 1;
if async_op_stats.poll_stats.current_polls == 0 {
if let Some(last_poll_started) = async_op_stats.poll_stats.last_poll_started
{
let elapsed = at.duration_since(last_poll_started).unwrap();
async_op_stats.poll_stats.last_poll_ended = Some(at);
async_op_stats.poll_stats.busy_time += elapsed;
}
}
}
Collaborator Author

I will make this a bit more DRY

Comment on lines 522 to 539
if let Some(mut task_stats) = self.task_stats.update(&id) {
if task_stats.poll_stats.current_polls == 0 {
task_stats.poll_stats.last_poll_started = Some(at);
if task_stats.poll_stats.first_poll == None {
task_stats.poll_stats.first_poll = Some(at);
}
task_stats.poll_stats.polls += 1;
}
task_stats.poll_stats.current_polls += 1;
}

if let Some(mut async_op_stats) = self.async_op_stats.update(&id) {
if async_op_stats.poll_stats.current_polls == 0 {
async_op_stats.poll_stats.last_poll_started = Some(at);
if async_op_stats.poll_stats.first_poll == None {
async_op_stats.poll_stats.first_poll = Some(at);
}
async_op_stats.poll_stats.polls += 1;
Collaborator Author

and this as well...

Comment on lines 356 to 362
} else if meta.name() == "runtime.resource" {
self.resource_callsites.insert(meta);
} else if meta.name() == "runtime.async_op" {
self.async_op_callsites.insert(meta);
} else if meta.target() == "tokio::resource::op" {
self.resource_op_callsites.insert(meta);
}
Collaborator Author

If we want to do that, we probably need to change the way Callsites work and drop the 32 bound.

Member

Are there likely more than 32 ops in Tokio? That doesn't sound crazy... Perhaps Callsites could get a const generic for the size of the array?

Member

#87 makes Callsites have a const generic parameter for length, so we can make bigger callsite arrays for resource ops.

if there are a lot of resource ops, though, we may want to use a hashmap or something instead; there's probably a point at which linear search is no longer efficient...
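
The const-generic idea discussed above might be sketched like this. The real `Callsites` type stores `&'static Metadata` references behind atomics; this simplified, self-contained version stores plain `usize` callsite identifiers just to show the shape:

```rust
/// Sketch of a fixed-capacity callsite set with a const generic length.
/// (Simplified illustration; the real type is different.)
struct Callsites<const N: usize> {
    slots: [Option<usize>; N],
    len: usize,
}

impl<const N: usize> Callsites<N> {
    fn new() -> Self {
        Callsites { slots: [None; N], len: 0 }
    }

    fn insert(&mut self, id: usize) {
        assert!(self.len < N, "callsite array full");
        self.slots[self.len] = Some(id);
        self.len += 1;
    }

    /// Linear search: fine for small N; past some size a hash map
    /// would likely win, as the comment above notes.
    fn contains(&self, id: usize) -> bool {
        self.slots[..self.len].iter().any(|s| *s == Some(id))
    }
}
```

With `#87`'s change, callers can then pick the capacity per use site, e.g. `Callsites<64>` for resource ops.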

@seanmonstar (Member) left a comment

I've so far reviewed all the files except for aggregator.rs and lib.rs. Those big 'uns will be tomorrow. But so far, the way you've separated things in the proto files makes sense to me! Just a couple questions inline.


fn record_str(&mut self, field: &tracing_core::Field, value: &str) {
let extract_partial_attr = || -> Option<MatchPart> {
let parts: Vec<_> = field.name().split('_').collect();
Member

This temporary vec is an allocation we can skip, just calling next() 3 times.

}

fn record_u64(&mut self, field: &tracing_core::Field, value: u64) {
let parts: Vec<_> = field.name().split('_').collect();
Member

Same here.

// The resource's concrete Rust type.
string concrete_type = 3;
// The kind of resource (e.g. timer, mutex)
string kind = 4;
Member

We could save some allocations and wire size if we made `kind` a `oneof` with a bunch of "common" kinds, and another kind that is a string. Kinda like linkerd2's HttpMethod.

I don't mean this as a hard requirement now, but could be a good optimization maybe.

Collaborator Author

yes, that makes sense indeed

//
// This uniquely identifies this op across all *currently live*
// ones.
common.MetaId id = 1;
Member

I don't think I fully understand the difference of these 3 IDs.

Collaborator Author

I should have clarified this more.

  • common.MetaId metadata is the id of the metadata for the tracing event that describes this op
  • common.MetaId id should really be the unique identifier for this resource op. Since this resource op is represented by an event in tracing, not a span, it does not have a span id. This is why I have used the metadata id for this purpose as well. I am not sure whether this is correct, as I am not sure this will actually be unique. Namely, there could be two invocations of a resource op that have the exact same metadata and will produce the very same metadata id. Need to revisit that. Also, I am not too sure we actually need any unique identifier for these. Suggestions welcome.
  • common.SpanId resource_id is the id of the resource. Any resource operation should be attached to a resource. For example, if you call reset on a timer, this will produce a ResourceOp whose resource id will be the id of the main resource span.

Member

  • common.MetaId id should really be the unique identifier for this resource op. Since this resource op is represented by an event in tracing, not a span, it does not have a span id. This is why I have used the metadata id for this purpose as well. I am not sure whether this is correct, as I am not sure this will actually be unique. Namely, there could be two invocations of a resource op that have the exact same metadata and will produce the very same metadata id. Need to revisit that. Also, I am not too sure we actually need any unique identifier for these. Suggestions welcome.

This will not be unique; it will be shared across all ResourceOp events originating from the same tracing::event! macro. In fact, ResourceOp.id will always be equal to ResourceOp.metadata.

I think this field should probably just be removed entirely. Since these resource ops correspond to single events, and we won't emit more data about a specific resource op more than once, I don't think they need a unique identifier. If we do want them to have unique identifiers, we should probably just generate them with a global counter in the subscriber.
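
The "global counter in the subscriber" idea mentioned above could be sketched like this (names are hypothetical; this is just the standard atomic-counter pattern, not code from this PR):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical process-wide counter for handing out unique
/// resource-op identifiers, should they turn out to be needed.
static NEXT_OP_ID: AtomicU64 = AtomicU64::new(1);

fn next_op_id() -> u64 {
    // Relaxed is sufficient here: we only need each call to return a
    // distinct value, not any ordering with other memory operations.
    NEXT_OP_ID.fetch_add(1, Ordering::Relaxed)
}
```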

@seanmonstar (Member) left a comment

Got through aggregator and lib. On the whole, looks good to me.

task_stats: IdData::default(),
resources: IdData {
data: HashMap::<span::Id, (Resource, bool)>::new(),
},
Member

Can each of these just be IdData::default()? It would make these lines less busy.

@@ -543,6 +787,21 @@ impl<T> TaskData<T> {
fn get(&self, id: &span::Id) -> Option<&T> {
self.data.get(id).map(|(data, _)| data)
}

fn as_proto(&mut self, updated_only: bool) -> HashMap<u64, T::Result>
Member

I'd suggest not using a boolean argument, when I was reading the usage elsewhere in the file, it's hard to know what tasks.as_proto(false) means. Perhaps an enum would make it clearer, like enum Include { All, UpdatedOnly } or something.

Also, since this creates a new hashmap, I'd suggest the name fn to_proto instead of as_proto. The as prefix sounds like it's getting a reference and thus would be cheap.

Collaborator Author

seems clippy is not happy about the to_proto with &mut self:

error: methods with the following characteristics: (`to_*` and `self` type is not `Copy`) usually take `self` by reference
   --> console-subscriber/src/aggregator.rs:823:17
    |
823 |     fn to_proto(&mut self, include: Include) -> HashMap<u64, T::Result>
    |                 ^^^^^^^^^
    |
    = note: `-D clippy::wrong-self-convention` implied by `-D warnings`
    = help: consider choosing a less ambiguous name
    = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#wrong_self_convention
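
The suggestion above might look roughly like the following sketch. Since clippy's `wrong_self_convention` lint objects to `to_proto(&mut self)`, the sketch keeps the `as_proto` name while still replacing the boolean with an `Include` enum. Types here (`IdData` holding `String` values) are simplified stand-ins, not the real aggregator types:

```rust
use std::collections::HashMap;

/// Replaces the ambiguous boolean argument, per the review suggestion.
enum Include {
    All,
    UpdatedOnly,
}

/// Simplified stand-in for the aggregator's per-id storage:
/// id -> (value, updated-since-last-push flag).
struct IdData {
    data: HashMap<u64, (String, bool)>,
}

impl IdData {
    fn as_proto(&mut self, include: Include) -> HashMap<u64, String> {
        match include {
            Include::All => self
                .data
                .iter()
                .map(|(id, (v, _))| (*id, v.clone()))
                .collect(),
            Include::UpdatedOnly => self
                .data
                .iter_mut()
                // consume the dirty flag, emitting only updated entries
                .filter_map(|(id, (v, dirty))| {
                    std::mem::take(dirty).then(|| (*id, v.clone()))
                })
                .collect(),
        }
    }
}
```

Call sites then read `stats.as_proto(Include::UpdatedOnly)` instead of `stats.as_proto(false)`, which is self-documenting.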

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>

fn record_str(&mut self, field: &tracing_core::Field, value: &str) {
let extract_partial_attr = || -> Option<MatchPart> {
let mut parts = field.name().split('_');
Collaborator Author

ideally, we would want to use `.` instead of `_`, but the tracing macro had some problem and did not allow me to emit events that have fields that are both valid and invalid Rust identifiers (the invalid ones being quoted)

Member

sorry, can you provide details on the issues with the tracing macro? that seems like a bug we should fix...

Collaborator Author

Will put together an example shortly

@seanmonstar (Member)

@hawkw did you want to take a look? It's not required, I'm happy to merge this, I think the direction is largely right, and we can address things in future patches.

@hawkw (Member)

hawkw commented Aug 13, 2021

@hawkw did you want to take a look? It's not required, I'm happy to merge this, I think the direction is largely right, and we can address things in future patches.

I do really want to finish giving this a review before it's merged. Sorry, I've been busy; will try to look at it today. If I don't get through it by EOD we can just merge and roll forward.

@hawkw (Member) left a comment

okay, sorry this took me such a long time to review --- there's a lot of code in this change, so I wanted to read over it thoroughly.

i commented on a bunch of minor style nits. most of those are not hard blockers, but since this is such a large change, it would be nice to simplify the new code where possible. but, feel free to ignore those suggestions, or address them in follow-up branches when this lands.

the main things i want to address before this merges are:

  1. i had a number of questions about the new proto types, and i want to make sure i completely understand them before this merges --- either through talking about them in the PR discussion or (preferably) by adding new comments to the proto definitions.
  2. the way numeric values associated with resources are sent over the wire as diffs feels unnecessarily complex (provided that i'm understanding it correctly?). can we simplify this at all? i understand that the events being recorded don't have a way of tracking the previous data and summing them, but it would be better, IMO, not to include this in the wire proto. instead, can we make the aggregator handle adding/subtracting/replacing tracked attribute values and only send the current value on the wire? that would be more consistent with the way we record all other numeric values.
  3. in general, anything we can simplify or anywhere we can reuse existing code (such as for field names) would be great.

console-subscriber/examples/dump.rs (resolved)
first_poll: Option<SystemTime>,
last_poll_started: Option<SystemTime>,
last_poll_ended: Option<SystemTime>,
busy_time: Duration,
}

// Represent static data for resources
Member

Suggested change
// Represent static data for resources
/// Represents static data for resources

Comment on lines 94 to 95
}
struct PollStats {
Member

nit: prefer whitespace between definitions

Suggested change
}
struct PollStats {
}
struct PollStats {

Comment on lines 96 to 97
current_polls: u64,
created_at: Option<SystemTime>,
polls: u64,
Member

what is the difference between current_polls and polls? I would probably add a comment to this, since it's not immediately obvious

@@ -1,7 +1,10 @@
use crate::WatchRequest;
use crate::{AttributeUpdate, WatchRequest};
Member

this is an unimportant nitpick, but...wow, there's a lot of code in this file now. I might want to consider breaking some of this up into separate modules. but, we can do that in a follow-up branch.

let attributes = self
.attributes
.values()
.cloned()
Member

nit/tioli: i think we should be able to remove the cloned() here, since to_proto takes a reference:

Suggested change
.cloned()

fn serialize_histogram(histogram: &Histogram<u64>) -> Result<Vec<u8>, V2SerializeError> {
let mut serializer = V2Serializer::new();
let mut buf = Vec::new();
serializer.serialize(histogram, &mut buf)?;
Ok(buf)
}

fn total_time(created_at: Option<&SystemTime>, closed_at: Option<&SystemTime>) -> Option<Duration> {
Member

nit/tioli: SystemTime implements Copy, so it's probably better to pass these by value, rather than by reference:

Suggested change
fn total_time(created_at: Option<&SystemTime>, closed_at: Option<&SystemTime>) -> Option<Duration> {
fn total_time(created_at: Option<SystemTime>, closed_at: Option<SystemTime>) -> Option<Duration> {

fn serialize_histogram(histogram: &Histogram<u64>) -> Result<Vec<u8>, V2SerializeError> {
let mut serializer = V2Serializer::new();
let mut buf = Vec::new();
serializer.serialize(histogram, &mut buf)?;
Ok(buf)
}

fn total_time(created_at: Option<&SystemTime>, closed_at: Option<&SystemTime>) -> Option<Duration> {
closed_at.and_then(|end| created_at.and_then(|start| end.duration_since(*start).ok()))
Member

nit, take it or leave it: this might be clearer, what do you think?

Suggested change
closed_at.and_then(|end| created_at.and_then(|start| end.duration_since(*start).ok()))
let end = closed_at?;
let start = created_at?;
end.duration_since(start).ok()

tasks.len = entities_len_1,
stats.dropped = dropped_stats,
stats.tasks = stats_len_1,
"dropped closed entities"
Member

nit/tioli: should maybe also be

Suggested change
"dropped closed entities"
"dropped closed {}",
std::any::type_name::<R>(),

tracing::trace!(
entities.len = entities_len_1,
stats.len = stats_len_1,
"no closed entities were droppable"
Member

Suggested change
"no closed entities were droppable"
"no closed {} were droppable",
std::any::type_name::<T>(),

Member

alternatively, we could put something like this at the top of the method:

let _span = tracing::debug_span!(
    "drop_closed",
    entity = %std::any::type_name::<T>(),
    stats = %std::any::type_name::<R>(),
).entered();

and then not have to add the type names to the individual trace events.

up to you!

@seanmonstar added this to the 0.1 milestone Aug 13, 2021
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
@zaharidichev zaharidichev requested a review from hawkw August 24, 2021 13:47
@hawkw (Member) left a comment

This is looking very close to ready. I had some additional comments.

The main things I think are important to address before merging this are:

  • my comment about potentially simplifying the way StateUpdate attributes are recorded as tracing fields
  • using targets rather than event names to recognize events (since tracing doesn't currently allow overriding the name of an event)

The remaining suggestions are mostly style nits or otherwise not blockers.

@@ -188,3 +189,31 @@ impl From<&dyn std::fmt::Debug> for field::Value {
field::Value::DebugVal(format!("{:?}", val))
}
}

#[allow(clippy::derive_hash_xor_eq)]
Member

Ah, this is required because prost generates a struct with a PartialEq impl, but we can't add derive(Hash) to that struct. It might be worth adding a comment explaining why we allow this lint:

Suggested change
#[allow(clippy::derive_hash_xor_eq)]
// Clippy warns when a type derives `PartialEq` but has a manual `Hash` impl,
// or vice versa. However, this is unavoidable here, because `prost` generates
// a struct with `#[derive(PartialEq)]`, but we cannot add `#[derive(Hash)]` to the
// generated code.
#[allow(clippy::derive_hash_xor_eq)]

@@ -25,4 +24,4 @@ serde_json = "1"
[dev-dependencies]

tokio = { version = "^1.7", features = ["full", "rt-multi-thread"]}
futures = "0.3"
futures = "0.3"
Member

nit: unrelated trailing whitespace change?

Suggested change
futures = "0.3"
futures = "0.3"

Comment on lines 6 to 12
impl<T> Default for IdData<T> {
fn default() -> Self {
IdData {
data: ShrinkMap::<Id, (T, bool)>::new(),
}
}
}
Member

style nit: in general, we tend to put type definitions above method implementations...can we move the Default impl below the // === impl IdData === comment on line 25?

/// *New* PollOp events that have occurred since the last update
///
/// This is emptied on every state update.
new_poll_ops: ShrinkVec<proto::resources::PollOp>,
Member

because this is emptied every time we send an update, this really doesn't need to be a ShrinkVec. every time we send an update, we will replace the vector with a new empty Vec, so it doesn't grow constantly over the entire lifetime of the program.

This is why new_metadata is a regular Vec rather than a ShrinkVec.

/// This is emptied on every state update.
new_poll_ops: ShrinkVec<proto::resources::PollOp>,

task_ids: Ids,
Member

style nit: not a blocker, but I have a slight preference for grouping this with the other task related fields, so that the struct is ordered like

Aggregator {
   ...
   tasks,
   task_stats,
   task_ids,
   resources,
   resource_stats,
   async_ops,
   async_op_stats,
  ...
}

Collaborator Author

Yes, sorry I missed renaming this. I will rename it to just ids, because it is now used not just for task ids but also for resource and async op ids

Member

Personally, I would probably give those separate lists of IDs, but we can figure out whether we want to do that or not later...

Comment on lines 99 to 105
/// tracing::trace!(
/// target: "tokio::resource::state_update",
/// duration = "attribute_name",
/// value = 10,
/// unit = "ms",
/// op = "ovr",
/// );
Member

The way that state updates are recorded currently feels a little overly complex to me. We are looking for four separate fields named `attribute_name`, `value`, `op`, and `unit`, but tracing already has a mechanism for representing key-value pairs. Why not simply look for fields named `unit` and `op`, and then treat any remaining field as the value, and use its name as the name of the attribute?

We could potentially even do something a bit fancier to allow a single event to update multiple resource states, if we looked for clusters of fields named <NAME>, <NAME>.unit, and <NAME>.op? That way, we could record one event like

tracing::trace!(
    target: "runtime::resource::state_update", 
    read = 256, read.unit = "bytes", read.op = "add",
    to_write = 1024, to_write.unit = "bytes", to_write.op = "sub",
);

or similar?

This might be slightly overengineered, but it seems like it would be nice to reuse the existing mechanism in tracing, and in the wire proto, for representing field names. In particular, if we do it this way, we can just use the existing proto format for representing field names as metadata indices, instead of having to add a new concept of "attributes", which are like tracing fields but also not the same...
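
The clustering scheme proposed above could be prototyped as plain string processing over recorded field name/value pairs. This is only a sketch of the grouping logic (the `Attr` type and function names are hypothetical, and a real implementation would live in a `Visit` impl):

```rust
use std::collections::HashMap;

/// One tracked attribute, assembled from the field cluster
/// `<NAME>`, `<NAME>.op`, `<NAME>.unit`. (Illustrative type.)
#[derive(Default, Debug, PartialEq)]
struct Attr {
    value: Option<i64>,
    op: Option<String>,
    unit: Option<String>,
}

/// Group raw (field name, field value) pairs into attribute clusters.
fn cluster(fields: &[(&str, &str)]) -> HashMap<String, Attr> {
    let mut attrs: HashMap<String, Attr> = HashMap::new();
    for (name, raw) in fields {
        if let Some(base) = name.strip_suffix(".op") {
            attrs.entry(base.to_string()).or_default().op = Some(raw.to_string());
        } else if let Some(base) = name.strip_suffix(".unit") {
            attrs.entry(base.to_string()).or_default().unit = Some(raw.to_string());
        } else {
            // Any remaining field is the value; its name names the attribute.
            attrs.entry(name.to_string()).or_default().value = raw.parse().ok();
        }
    }
    attrs
}
```

One event like the `read = 256, read.unit = "bytes", read.op = "add"` example above would then yield a single `read` attribute carrying all three pieces.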

Comment on lines 338 to 342
fn record_bool(&mut self, field: &field::Field, value: bool) {
if field.name() == Self::OP_STATE_FIELD_TYPE_VALUE {
self.val = Some(PbFieldValue::BoolVal(value));
}
}
Member

if the value is a bool, we should assume the op is "overwrite", correct? since "add" and "sub" don't really make sense for booleans. perhaps we should only require an "op" field when the value is numeric? i notice we already allow recording state updates without unit fields...

}

impl Visit for StateUpdateVisitor {
fn record_debug(&mut self, _: &field::Field, _: &dyn std::fmt::Debug) {}
Member

should we also allow recording Debug fields for state updates? I can imagine a resource wanting to include some random bit of internal data as part of its tracked state. if we support Debug here we should probably assume that there's no unit and that the op is always "overwrite", the way we do for bool values...

console-subscriber/examples/dump.rs (resolved)
Comment on lines 754 to 758
match stats.attributes.get_mut(&upd_key) {
Some(attr) => update_attribute(attr, update),
None => {
stats.attributes.insert(upd_key, update.into());
}
Member

style nit: i think it might be slightly nicer if we did this using the HashMap::entry API:

Suggested change
match stats.attributes.get_mut(&upd_key) {
Some(attr) => update_attribute(attr, update),
None => {
stats.attributes.insert(upd_key, update.into());
}
stats.attributes
.entry(upd_key)
.and_modify(|attr| update_attribute(attr, update))
.or_insert_with(|| update.into());

This has the advantage of only doing a single hashmap lookup, instead of two (one for get_mut and a second one for insert if the key is not present).

Collaborator Author

Seems that requires cloning the update as it is moved into the and_modify closure.

Member

Okay, I believe we could alternatively still use a match with the entry API, like this:

                    match stats.attributes.entry(upd_key) {
                        Entry::Occupied(ref mut attr) => {
                            update_attribute(attr.get_mut(), update);
                        }
                        Entry::Vacant(attr) => {
                            attr.insert(update.into());
                        }
                    }

This still avoids performing a second lookup.

Obviously, not a blocker!

@hawkw changed the title from "resource instrumentation" to "feat(subscriber): resource instrumentation" Aug 24, 2021
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
@hawkw (Member) left a comment

okay, this looks good to me, I think we can address the remaining things in subsequent branches.

thanks a lot for all the time you've spent on this, I know it was a big change. :)

Comment on lines 319 to 348
fn record_debug(&mut self, field: &field::Field, value: &dyn std::fmt::Debug) {
self.field = Some(proto::Field {
name: Some(field.name().into()),
value: Some(value.into()),
metadata_id: Some(self.meta_id.clone()),
});
}

fn record_i64(&mut self, field: &field::Field, value: i64) {
if field.name() == Self::OP_STATE_FIELD_TYPE_VALUE {
self.val = Some(PbFieldValue::I64Val(value));
}
self.field = Some(proto::Field {
name: Some(field.name().into()),
value: Some(value.into()),
metadata_id: Some(self.meta_id.clone()),
});
}

fn record_u64(&mut self, field: &field::Field, value: u64) {
if field.name() == Self::OP_STATE_FIELD_TYPE_VALUE {
self.val = Some(PbFieldValue::U64Val(value));
}
self.field = Some(proto::Field {
name: Some(field.name().into()),
value: Some(value.into()),
metadata_id: Some(self.meta_id.clone()),
});
}

fn record_bool(&mut self, field: &field::Field, value: bool) {
if field.name() == Self::OP_STATE_FIELD_TYPE_VALUE {
self.val = Some(PbFieldValue::BoolVal(value));
}
self.field = Some(proto::Field {
name: Some(field.name().into()),
value: Some(value.into()),
metadata_id: Some(self.meta_id.clone()),
});
Member

these should probably all check that the field doesn't end with the unit or op suffixes?

Collaborator Author

Good point, just pushed an update

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
@hawkw merged commit f4a21ac into main Aug 26, 2021
@hawkw deleted the zd/res-schema branch August 26, 2021 15:59
hawkw pushed a commit that referenced this pull request Aug 31, 2021
Based on some [earlier feedback][1], this PR changes the proto
definition so readiness of a poll op is represented via a bool. The
relevant changes have been made to:
tokio-rs/tokio#4072

[1]: #77 (comment)

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
zaharidichev added a commit to tokio-rs/tokio that referenced this pull request Sep 22, 2021
This branch instruments the `Sleep` resource to allow the tokio-console
to consume data about resource usage. The corresponding console branch
is here: tokio-rs/console#77

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
suikammd pushed a commit to suikammd/tokio that referenced this pull request Oct 7, 2021
This branch instruments the `Sleep` resource to allow the tokio-console
to consume data about resource usage. The corresponding console branch
is here: tokio-rs/console#77

Signed-off-by: Zahari Dichev <zaharidichev@gmail.com>
hds added a commit that referenced this pull request Jul 5, 2023
The current logic present in `IdData::drop_closed` marks an item (task,
resource, and async op stats) to be dropped in the case that the item
**is** dirty and there **are** watchers: `(dirty && has_watchers)`.

This causes a case where if an item is first received and then completes
in between the aggregator push cycle, it will be discarded immediately
and never sent.

This logic has been in place since the concepts of watchers and dirty
items was introduced in #77. However since an item that is created and
then dropped within a single update cycle isn't likely to be missed in
the UI, it may never have been noticed.

Instead the logic should be to **retain** an item if **any** of the
following is true:
* there are watchers and the item is dirty: `(dirty && has_watchers)`
* item has been dropped less time than the retention period:
  `dropped_for <= retention`.
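
The corrected retention rule in the two bullets above can be sketched as a single predicate (a minimal sketch; the real `IdData::drop_closed` applies this while walking its internal maps):

```rust
use std::time::Duration;

/// Retain an item if there are watchers and the item is dirty,
/// OR it has been dropped for no longer than the retention period.
fn should_retain(
    dirty: bool,
    has_watchers: bool,
    dropped_for: Duration,
    retention: Duration,
) -> bool {
    (dirty && has_watchers) || dropped_for <= retention
}
```

Under this rule, an item that completes between aggregator push cycles is still kept until the retention period elapses, instead of being discarded before it is ever sent.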
hawkw pushed a commit that referenced this pull request Aug 1, 2023
hawkw pushed a commit that referenced this pull request Sep 29, 2023