Filter cache update reads to only read each entry once, when an entry has two events in the same refresh cycle. #5349
Yeah, my initial thought was: if I have
and it's time to look up missed events, then the first thing that happens (assuming something like #5342 is addressed):
which returns, if successful, a struct essentially equivalent to the
We de-dupe this, but I was thinking into something like an inverse of the original map:
Next we query the agent table for each key in the query map above.
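The inverse-map idea above can be sketched roughly as follows. The types here (`EventID`, `EntryID`) are hypothetical stand-ins, not SPIRE's real structs; the point is only that inverting the event map keys the lookup by entry, so each entry is queried once per refresh cycle.

```go
package main

import "fmt"

// Hypothetical identifiers for illustration; SPIRE's actual types differ.
type EventID uint
type EntryID string

// invertEvents collapses a map of EventID -> EntryID into the inverse map
// EntryID -> []EventID, so each entry is looked up only once even when
// several events in the same refresh cycle reference it.
func invertEvents(events map[EventID]EntryID) map[EntryID][]EventID {
	inverse := make(map[EntryID][]EventID)
	for eventID, entryID := range events {
		inverse[entryID] = append(inverse[entryID], eventID)
	}
	return inverse
}

func main() {
	events := map[EventID]EntryID{
		101: "entry-a",
		102: "entry-b",
		103: "entry-a", // same entry touched twice in one cycle
	}
	inverse := invertEvents(events)
	fmt.Println(len(inverse)) // two distinct entries to fetch
}
```

Querying the agent (or entry) table then iterates over the keys of the inverse map rather than over the raw events.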
I think this might already be handled. There's a
@sorindumitru I wish it was already deduplicated, but there will be a small code change to deduplicate the query. What you are seeing is the "change event" being deduplicated. The issue described above is when two different change events refer to the same EntryID; the EntryID should be fetched only once. This will involve a map and a second loop. We process the items once (as above) to see if we skip items, and to add them to a fetching map, instead of fetching them directly. Then we fetch from the fetching map, which will be keyed by the EntryID (not the EventID), which will de-duplicate the fetching. I'm working on code for this now, and will have a PR ready soon.
Isn't it done by entry id at the moment? So if the same entry id is seen twice in an update, we'll only fetch the entry once from the database; see for example
This limits the solution to full polling as before, but still preserves the work that would close spiffe#5349. Signed-off-by: Edwin Buck <edwbuck@gmail.com>
I agree, @sorindumitru. It seems that we're already deduplicating the handling of events for a given entry/agent. It is, however, scoped only to events retrieved from normal event polling (e.g. the List*Events functions). It is possible that a given entry/agent ends up getting polled twice in the same cycle if there is both a "skipped event" successfully polled and one or more regular events for that entry/agent, though I'd expect that not to happen frequently.
Seen events only handled the "before the first" detected event; they never handled other areas of event updates. They existed primarily so the selection range for "before the first" events could be filtered down to those that had already been processed. Besides, the events aren't the problem; it is the Entries and Nodes to be fetched that need de-duplication. The title has a typo: one doesn't get any event twice, one gets two different events that reference the same Entry or Node, where the Entry or Node was processed twice.
This is a small optimization of the db event framework for a specific scenario.
When an entry is modified twice within a polling cycle, we currently handle each modification (event) independently. This means that the cache-updating read of the entry will read the entry twice, once for the first update and once for the second update.
Since the first update's read would likely return the record as it exists after the second update was applied, an optimization of reading the entry once, instead of twice, should produce the same outcome, with one less record coming back in the database request.
For this to occur, we would have to take the list of entries to be updated and deduplicate them. A number of relatively fast methods exist to do this, so the cost of deduplicating is likely smaller than the cost of returning a duplicate record. However, if we ever want fidelity between "count of events processed" tracking and "count of cache updates", this optimization will break 1-to-1 event tracing.