feat: aggregate metrics #41
Conversation
I think aggregates should be cleaned up after a time period. Since the struct seems small, something like 90 or 180 days might be okay; maybe refer to Graphseer to see the timeframe they usually use.
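A minimal sketch of the retention cutoff this would need. The `retention_cutoff` helper and the 90-day window are hypothetical, not part of the PR; the idea is just that any row whose `timestamp` falls below the cutoff is eligible for deletion.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Seconds in one day.
const DAY_SECS: u64 = 86_400;

/// Unix-timestamp cutoff below which aggregate rows would be deleted,
/// given a retention window in days. Saturates at zero for safety.
fn retention_cutoff(now_secs: u64, retention_days: u64) -> u64 {
    now_secs.saturating_sub(retention_days * DAY_SECS)
}

fn main() {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before Unix epoch")
        .as_secs();
    // e.g. a 90-day window: rows with timestamp < cutoff are stale
    let cutoff = retention_cutoff(now, 90);
    println!("DELETE FROM aggregates WHERE timestamp < {cutoff}");
}
```

The same cutoff could be computed inside the existing daily tick, so cleanup piggybacks on the insert schedule.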
CREATE TABLE IF NOT EXISTS aggregates
(
    timestamp BIGINT PRIMARY KEY,
    data JSONB NOT NULL
Hmm, so the reason the message data gets stored as JSONB is that we don't necessarily know the message fields in advance, and a great deal of flexibility is required.
For aggregates, we are clear that the type is something we define within listener radio and is not dynamic. In general, separate columns are faster to query and search, and easier to handle as structured data.
Would you explain your reasoning behind using JSONB instead of something more specific?
It was faster in terms of development time, but I'll switch to specific types.
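A hedged sketch of what the structured replacement might look like. The table name and columns mirror the `indexer_aggregates` query that appears later in this review (`graph_account`, `message_count`, `subgraphs_count`); the composite primary key is an assumption, chosen so one row per indexer per tick is allowed.

```sql
-- Hypothetical structured replacement for the single JSONB column:
CREATE TABLE IF NOT EXISTS indexer_aggregates
(
    timestamp       BIGINT NOT NULL,
    graph_account   TEXT   NOT NULL,
    message_count   BIGINT NOT NULL,
    subgraphs_count BIGINT NOT NULL,
    PRIMARY KEY (timestamp, graph_account)
);
```

With typed columns, filters and aggregations (e.g. averages per indexer) become plain SQL instead of JSONB operators.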
@@ -35,6 +40,20 @@ impl RadioContext {
    }
}

#[derive(Serialize, SimpleObject)]
I'm also interested in additional summary fields like total topics covered and average message count (or some of the other things mentioned in the message propagation issues), but yeah, some are not so easy to get from the existing code.
It looks like the two HashMaps are indexed by indexer address, so having a Vec for active_indexers doesn't seem very useful when callers can grab the keys from one of the maps.
I also think it might be easier to simply return the Vec of IndexerStats (same thing as AggregatedIndexerStats?) so that users can do manipulation independently, with greater flexibility.
Re "total topics covered": you mean all distinct IPFS hashes that Listener Radio has received messages for?
Re "average message count": I don't quite get this one.
Re "total topics covered: you mean all distinct IPFS hashes that Listener Radio has received messages for?": yep.
Re "average message count: I don't quite get this one": you already have the total_message_count for each indexer; I'm interested in the average of that across all indexers. But, as I alluded to in the previous comment, it is something the client can do once they receive the Vec of IndexerStats.
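The client-side computation being described can be sketched in a few lines. The `IndexerStats` field names are assumed from the query shown later in this review; `average_message_count` is a hypothetical helper, not code from the PR.

```rust
/// Per-indexer stats; field names assumed from the resolver query
/// (graph_account, message_count, subgraphs_count).
#[allow(dead_code)]
struct IndexerStats {
    graph_account: String,
    message_count: i64,
    subgraphs_count: i64,
}

/// Average message count across all indexers, computed client-side
/// from the returned Vec<IndexerStats>.
fn average_message_count(stats: &[IndexerStats]) -> f64 {
    if stats.is_empty() {
        return 0.0;
    }
    let total: i64 = stats.iter().map(|s| s.message_count).sum();
    total as f64 / stats.len() as f64
}

fn main() {
    let stats = vec![
        IndexerStats { graph_account: "0xaaa".into(), message_count: 10, subgraphs_count: 2 },
        IndexerStats { graph_account: "0xbbb".into(), message_count: 30, subgraphs_count: 5 },
    ];
    println!("{}", average_message_count(&stats)); // prints 20
}
```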
I agree, let's leave it to the client.
I will add one for distinct IPFS hashes though.
Cool, thanks. What do you think about returning Vec<IndexerStats> directly as part of Summary, instead of having to calculate total_message_count and average_subgraph_account? I think that allows more flexibility on the client side, like calculating averages; but I can see an argument for providing that functionality to the client out of the box... a bit of a tricky balance.
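One possible middle ground for the balance being discussed: return the raw per-indexer vec as part of Summary, and derive the convenience totals from it on demand. This is a hypothetical sketch, not the PR's implementation; struct and method names are assumptions based on the fields mentioned in this thread.

```rust
/// Field names assumed from the review discussion.
#[allow(dead_code)]
struct IndexerStats {
    graph_account: String,
    message_count: i64,
    subgraphs_count: i64,
}

/// Summary carries the raw per-indexer vec; the convenience
/// figures are computed from it rather than stored.
struct Summary {
    indexer_stats: Vec<IndexerStats>,
}

impl Summary {
    fn total_message_count(&self) -> i64 {
        self.indexer_stats.iter().map(|s| s.message_count).sum()
    }

    fn average_subgraphs_count(&self) -> f64 {
        if self.indexer_stats.is_empty() {
            return 0.0;
        }
        let total: i64 = self.indexer_stats.iter().map(|s| s.subgraphs_count).sum();
        total as f64 / self.indexer_stats.len() as f64
    }
}

fn main() {
    let summary = Summary {
        indexer_stats: vec![
            IndexerStats { graph_account: "0xaaa".into(), message_count: 10, subgraphs_count: 4 },
            IndexerStats { graph_account: "0xbbb".into(), message_count: 20, subgraphs_count: 6 },
        ],
    };
    println!("total = {}", summary.total_message_count());             // total = 30
    println!("avg subgraphs = {}", summary.average_subgraphs_count()); // avg subgraphs = 5
}
```

Clients that want other statistics can still iterate `indexer_stats` themselves, while the common figures remain available out of the box.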
Before closing this PR, would you add an issue for periodically cleaning up aggregates, perhaps in the same daily tick? Also, I think it would be cool to have a metric/field storing the network latency (message received time minus message timestamp); perhaps that's a future feature.
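The latency metric being proposed is a simple difference of two Unix timestamps. A hedged sketch, with a hypothetical `network_latency_secs` helper:

```rust
/// Network latency as sketched in the review: the radio's receive
/// time minus the sender's message timestamp (both Unix seconds).
/// Clock skew between nodes can make this negative, hence i64.
fn network_latency_secs(received_at: i64, message_timestamp: i64) -> i64 {
    received_at - message_timestamp
}

fn main() {
    // a message stamped at t = 1_700_000_000, received 3 seconds later
    println!("{}", network_latency_secs(1_700_000_003, 1_700_000_000)); // prints 3
}
```

Because senders' clocks are not trusted, the value is best treated as an approximate propagation measure rather than an exact one.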
src/db/resolver.rs (Outdated)
) -> Result<Vec<IndexerStats>, anyhow::Error> {
    let aggregates = sqlx::query_as!(
        Aggregate,
        "SELECT timestamp, graph_account, message_count, subgraphs_count FROM indexer_aggregates WHERE timestamp > $1",
Is it possible to only select graph_account, message_count, subgraphs_count and query_as IndexerStats, but still use the timestamp for filtering? Asking to see if this will let you avoid doing the conversion from Aggregate to IndexerStats later.
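The suggestion relies on sqlx's `query_as!` mapping the selected columns onto the target struct's fields at compile time, so a column can appear in `WHERE` without being in `SELECT`. Assuming IndexerStats has exactly the three fields below, the query would become:

```sql
-- Filter on timestamp without selecting it, so the row maps
-- straight onto IndexerStats (graph_account, message_count, subgraphs_count):
SELECT graph_account, message_count, subgraphs_count
FROM indexer_aggregates
WHERE timestamp > $1
```

That would let the resolver return the query result directly instead of converting Aggregate rows afterward.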
I think it's better to leave it like this for now, so that it's already usable straight from the GraphQL endpoint playground, for instance for @PilarRod to get the aggregates when needed.
Uses tokio::select to insert aggregate data once a day.

Open Questions
Should we delete the aggregates every X period? Maybe 30 days, 1 year? Open to suggestions ¯\_(ツ)_/¯