feat: aggregate metrics #41
Conversation
I think aggregates should be cleaned up after a time period. Since the struct seems small, something like 90 or 180 days might be okay; maybe refer to Graphseer to see the timeframe they usually use.
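A minimal sketch of the retention cutoff this would need. The `retention_cutoff` helper and the 90-day window are hypothetical, not part of the PR; the idea is just that any row whose `timestamp` falls below the cutoff is eligible for deletion.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Seconds in one day.
const DAY_SECS: u64 = 86_400;

/// Unix-timestamp cutoff below which aggregate rows would be deleted,
/// given a retention window in days. Saturates at zero for safety.
fn retention_cutoff(now_secs: u64, retention_days: u64) -> u64 {
    now_secs.saturating_sub(retention_days * DAY_SECS)
}

fn main() {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before Unix epoch")
        .as_secs();
    // e.g. a 90-day window: rows with timestamp < cutoff are stale
    let cutoff = retention_cutoff(now, 90);
    println!("DELETE FROM aggregates WHERE timestamp < {cutoff}");
}
```

The same cutoff could be computed inside the existing daily tick, so cleanup piggybacks on the insert schedule.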
CREATE TABLE IF NOT EXISTS aggregates
(
    timestamp BIGINT PRIMARY KEY,
    data JSONB NOT NULL
Hmm, so the reason the message data gets stored as JSONB is that we don't necessarily know the message fields in advance, and a great deal of flexibility is required.
For aggregates, we are clear that the type is something we define within listener radio and is not dynamic. In general, separate columns are faster to query and search, and easier to handle as structured data.
Would you explain your reasoning behind using JSONB instead of something more specific?
It was faster in terms of development time, but I'll switch to specific types.
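A hedged sketch of what the structured replacement might look like. The table name and columns mirror the `indexer_aggregates` query that appears later in this review (`graph_account`, `message_count`, `subgraphs_count`); the composite primary key is an assumption, chosen so one row per indexer per tick is allowed.

```sql
-- Hypothetical structured replacement for the single JSONB column:
CREATE TABLE IF NOT EXISTS indexer_aggregates
(
    timestamp       BIGINT NOT NULL,
    graph_account   TEXT   NOT NULL,
    message_count   BIGINT NOT NULL,
    subgraphs_count BIGINT NOT NULL,
    PRIMARY KEY (timestamp, graph_account)
);
```

With typed columns, filters and aggregations (e.g. averages per indexer) become plain SQL instead of JSONB operators.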
@@ -35,6 +40,20 @@ impl RadioContext {
    }
}

#[derive(Serialize, SimpleObject)]
I'm also interested in additional summary fields like total topics covered and average message count (or some of the other things mentioned in the message propagation issues), but yeah, some are not so easy to get from the existing code.
It looks like the two HashMaps are indexed by indexer address, so having a Vec for active_indexers doesn't seem very useful when callers can grab the keys from one of the maps.
I also think it might be easier to simply return the Vec of IndexerStats (same thing as AggregatedIndexerStats?) so that users can do manipulation independently, with greater flexibility.
Re "total topics covered": you mean all distinct IPFS hashes that Listener Radio has received messages for?
Re "average message count": I don't quite get this one.
Re "total topics covered: you mean all distinct IPFS hashes that Listener Radio has received messages for?": yep.
Re "average message count: I don't quite get this one": you already have the total_message_count for each indexer; I'm interested in the average of that across all indexers. But, as I alluded to in the previous comment, it is something the client can do once they receive the Vec of IndexerStats.
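The client-side computation being described can be sketched in a few lines. The `IndexerStats` field names are assumed from the query shown later in this review; `average_message_count` is a hypothetical helper, not code from the PR.

```rust
/// Per-indexer stats; field names assumed from the resolver query
/// (graph_account, message_count, subgraphs_count).
#[allow(dead_code)]
struct IndexerStats {
    graph_account: String,
    message_count: i64,
    subgraphs_count: i64,
}

/// Average message count across all indexers, computed client-side
/// from the returned Vec<IndexerStats>.
fn average_message_count(stats: &[IndexerStats]) -> f64 {
    if stats.is_empty() {
        return 0.0;
    }
    let total: i64 = stats.iter().map(|s| s.message_count).sum();
    total as f64 / stats.len() as f64
}

fn main() {
    let stats = vec![
        IndexerStats { graph_account: "0xaaa".into(), message_count: 10, subgraphs_count: 2 },
        IndexerStats { graph_account: "0xbbb".into(), message_count: 30, subgraphs_count: 5 },
    ];
    println!("{}", average_message_count(&stats)); // prints 20
}
```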
I agree, let's leave it to the client.
I will add one for distinct IPFS hashes though.
Cool, thanks. What do you think about returning Vec<IndexerStats> directly as part of Summary, instead of having to calculate total_message_count and average_subgraph_account? I think that allows more flexibility on the client side, like calculating averages; but I can see an argument for providing that functionality to the client out of the box... a bit of a tricky balance.
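One possible middle ground for the balance being discussed: return the raw per-indexer vec as part of Summary, and derive the convenience totals from it on demand. This is a hypothetical sketch, not the PR's implementation; struct and method names are assumptions based on the fields mentioned in this thread.

```rust
/// Field names assumed from the review discussion.
#[allow(dead_code)]
struct IndexerStats {
    graph_account: String,
    message_count: i64,
    subgraphs_count: i64,
}

/// Summary carries the raw per-indexer vec; the convenience
/// figures are computed from it rather than stored.
struct Summary {
    indexer_stats: Vec<IndexerStats>,
}

impl Summary {
    fn total_message_count(&self) -> i64 {
        self.indexer_stats.iter().map(|s| s.message_count).sum()
    }

    fn average_subgraphs_count(&self) -> f64 {
        if self.indexer_stats.is_empty() {
            return 0.0;
        }
        let total: i64 = self.indexer_stats.iter().map(|s| s.subgraphs_count).sum();
        total as f64 / self.indexer_stats.len() as f64
    }
}

fn main() {
    let summary = Summary {
        indexer_stats: vec![
            IndexerStats { graph_account: "0xaaa".into(), message_count: 10, subgraphs_count: 4 },
            IndexerStats { graph_account: "0xbbb".into(), message_count: 20, subgraphs_count: 6 },
        ],
    };
    println!("total = {}", summary.total_message_count());             // total = 30
    println!("avg subgraphs = {}", summary.average_subgraphs_count()); // avg subgraphs = 5
}
```

Clients that want other statistics can still iterate `indexer_stats` themselves, while the common figures remain available out of the box.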
Before closing this PR, would you add an issue for periodically cleaning up aggregates, perhaps in the same daily tick? Also, I think it would be cool to have a metric/field storing the network latency (message received time minus message timestamp); perhaps that's a future feature.
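The latency metric being proposed is a simple difference of two Unix timestamps. A hedged sketch, with a hypothetical `network_latency_secs` helper:

```rust
/// Network latency as sketched in the review: the radio's receive
/// time minus the sender's message timestamp (both Unix seconds).
/// Clock skew between nodes can make this negative, hence i64.
fn network_latency_secs(received_at: i64, message_timestamp: i64) -> i64 {
    received_at - message_timestamp
}

fn main() {
    // a message stamped at t = 1_700_000_000, received 3 seconds later
    println!("{}", network_latency_secs(1_700_000_003, 1_700_000_000)); // prints 3
}
```

Because senders' clocks are not trusted, the value is best treated as an approximate propagation measure rather than an exact one.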
src/db/resolver.rs (Outdated)
) -> Result<Vec<IndexerStats>, anyhow::Error> {
    let aggregates = sqlx::query_as!(
        Aggregate,
        "SELECT timestamp, graph_account, message_count, subgraphs_count FROM indexer_aggregates WHERE timestamp > $1",
Is it possible to only select graph_account, message_count, subgraphs_count and query_as IndexerStats, but still use the timestamp for filtering? Asking to see if this will let you avoid doing the conversion from Aggregate to IndexerStats later.
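The suggestion relies on sqlx's `query_as!` mapping the selected columns onto the target struct's fields at compile time, so a column can appear in `WHERE` without being in `SELECT`. Assuming IndexerStats has exactly the three fields below, the query would become:

```sql
-- Filter on timestamp without selecting it, so the row maps
-- straight onto IndexerStats (graph_account, message_count, subgraphs_count):
SELECT graph_account, message_count, subgraphs_count
FROM indexer_aggregates
WHERE timestamp > $1
```

That would let the resolver return the query result directly instead of converting Aggregate rows afterward.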
I think it's better to leave it like this for now, so that it's already usable straight from the GraphQL endpoint playground, for instance for @PilarRod to get the aggregates when needed.
Uses tokio::select to insert aggregate data once a day.

Open Questions
Should we delete the aggregates every X period? Maybe 30 days, 1 year? Open to suggestions ¯\_(ツ)_/¯