
[processor/interval] Persistence backed interval processor with support for multiple aggregation intervals #33949

lahsivjar opened this issue Jul 8, 2024 · 13 comments

@lahsivjar
Member

lahsivjar commented Jul 8, 2024

Component(s)

processor/interval

Is your feature request related to a problem? Please describe.

The current version of the interval processor performs in-memory aggregation for a single configured interval (more than one interval can be supported, at extra overhead, by adding new processor instances). The in-memory aggregations can be lost on a crash and can also cause high memory usage for high-cardinality aggregations.

Describe the solution you'd like

I propose database-backed aggregation supporting more than one interval. I have a working PoC for this here; it is rough around the edges and will require some work before it can be contributed, but it demonstrates the proposed idea. The PoC uses pebble-db, a Log-Structured Merge (LSM) tree database, and provides the option to aggregate over multiple intervals.

The currently linked PoC uses only the interval as a key in the database, which requires loading the full aggregated interval in memory. However, we could use more granular keys, or even simple hash-based partitioning on resource IDs, to reduce memory requirements further.

Describe alternatives you've considered

No response

Additional context

I would love to hear what the maintainers of the interval processor think about the proposal. If the maintainers are on board with the idea, I can clean up the PoC, bring it to feature parity with the current interval processor, and create a PR.

@lahsivjar lahsivjar added enhancement New feature or request needs triage New item requiring triage labels Jul 8, 2024
@lahsivjar lahsivjar changed the title Persistence backed interval processor with support for multiple aggregation intervals [processor/interval] Persistence backed interval processor with support for multiple aggregation intervals Jul 8, 2024
Contributor

github-actions bot commented Jul 8, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@djaglowski
Member

We have an established notion of a storage extension which allows components to persist data via an interface. Users can then choose the implementation of the interface (i.e. the type of storage extension) that meets their needs.

The use case you're describing seems reasonable, but can we please first consider whether it could work with a storage extension? There are several examples of components using the extension; this might be a relatively straightforward one.

@lahsivjar
Member Author

lahsivjar commented Jul 8, 2024

@djaglowski Thanks for looking over the issue.

We have an established notion of a storage extension which allows components to persist data via an interface

The current storage extensions would not be efficient for aggregations. The best we could do with the current storage extension would be to store each resource as a key-value pair and perform update-on-write manually: when a write arrives for an existing key, we would read the stored data, decode it, merge in the new value, and write it back to the DB. Pebble, on the other hand, can defer merges (aggregations) to its periodic compactions.

I think if we implement pebble as a new storage extension, we can use it for aggregations efficiently. WDYT, does that make sense?

@djaglowski
Member

I'm not opposed to a new pebble extension (though I don't have capacity to help make it happen). What I'm most curious about is whether the storage extension really needs to be used so aggressively. In other words, can we maintain aggregations in memory and periodically write the overall state to the storage extension (e.g. once per second)?

@lahsivjar
Member Author

lahsivjar commented Jul 9, 2024

I'm not opposed to a new pebble extension (though I don't have capacity to help make it happen).

Got it. I can do the heavy lifting here once we have a general direction, if that would work for you.

What I'm most curious about is whether the storage extension really needs to be used so aggressively. In other words, can we maintain aggregations in memory and periodically write the overall state to the storage extension (e.g. once per second)?

No, we don't need to use it so aggressively as to hit the database for each value we receive. In my PoC for using pebble for aggregations, I used a size-based batch buffer of 16 MB (ref) to keep the data in memory before flushing it to the actual DB. In addition, before harvesting an interval once it matures, we would flush all in-memory data to the DB.

One of the advantages of using something like pebble over a generic key-value DB is that it makes writes fairly straightforward, without much computational overhead for merging on the write path.

@RichieSams
Contributor

What I'm most curious about is whether the storage extension really needs to be used so aggressively. In other words, can we maintain aggregations in memory and periodically write the overall state to the storage extension (e.g. once per second)?

I had the same thought. Would it be fine to only store to the DB every X seconds or so, so we don't have to suffer the read, deserialize, write round trip for every operation?

@djaglowski
Member

I'm not opposed to a new pebble extension (though I don't have capacity to help make it happen).

Got it. I can do the heavy lifting here once we have a general direction, if that would work for you.

To clarify, I don't have capacity to sponsor (review and ultimately be responsible for) another component. It's still worth proposing formally though since you may find another sponsor. Otherwise you can always host it outside of contrib and people can pull it into their builds.

@lahsivjar
Member Author

I had the same thought. Would it be fine to only store to the DB every X seconds or so, so we don't have to suffer the read, deserialize, write round trip for every operation?

Hmm, I understand the point now. In the PoC I posted, I am still serializing the data and writing it to an in-memory batch; however, we can avoid the serialization/deserialization cost by performing some merges in memory and serializing only the partial aggregate. This makes sense to me and we could definitely do it.

To clarify, I don't have capacity to sponsor (review and ultimately be responsible for) another component. It's still worth proposing formally though since you may find another sponsor

Got it. Thanks for the clarification.

@lahsivjar
Member Author

so we don't have to suffer the read, deserialize, write roundtrip for every operation

A clarification on this: with pebble, in the hot path (the ConsumeMetrics step) we would not read and deserialize, but rather just perform serialization followed by a write. The new metric would be written as a merge operation. The deserialize, actual merge, and write would be performed during compaction or when we harvest the aggregated data.

@lahsivjar
Member Author

An update: I am working on building a pebble-based storage extension as per @djaglowski's earlier suggestion. The existing storage extension API is basic, and using it directly would hurt performance because it has: a) no support for range keys, and b) keys that are always strings. I will know more once I have made some progress with the code.

Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Sep 10, 2024
@lahsivjar
Member Author

I have been away from this topic for a bit, but I am working on it again. Looking into the storage extension, I found that its current interface lacks support for the use case of aggregating with pebble. If we were to use the storage extension in its current state (with simple get/put/batch support), performance would be quite poor, as we require range operations on keys.
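The range-operation point above can be illustrated without pebble. With an ordered key space (as in an LSM store), harvesting one interval is a single scan over a key prefix, whereas a plain get/put interface with opaque string keys would have to track every key out of band. This stdlib-only sketch stands in for a prefix iterator; the key layout is an illustrative assumption:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// scanPrefix returns all keys sharing a prefix. An LSM store keeps keys
// sorted on disk, so this is a cheap range scan there; here we sort a
// slice to simulate that ordering.
func scanPrefix(keys []string, prefix string) []string {
	sort.Strings(keys)
	start := sort.SearchStrings(keys, prefix)
	var out []string
	for _, k := range keys[start:] {
		if !strings.HasPrefix(k, prefix) {
			break
		}
		out = append(out, k)
	}
	return out
}

func main() {
	keys := []string{"60/svc-a", "300/svc-a", "60/svc-b", "300/svc-b"}
	// Harvesting the 60s interval touches only keys under the "60/" prefix.
	fmt.Println(scanPrefix(keys, "60/"))
}
```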

I am currently working on benchmarks comparing the pebble approach with the current interval processor, to get some data.

@dashpole dashpole added needs triage New item requiring triage and removed needs triage New item requiring triage labels Oct 9, 2024
@atoulme atoulme removed the needs triage New item requiring triage label Oct 12, 2024
@github-actions github-actions bot added the Stale label Dec 11, 2024