-
Notifications
You must be signed in to change notification settings - Fork 202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Automatic suppression of flooding events #2095
Comments
My preferred approach would be first off fix apps that can flood. The more autonomous filtering we add in the flight logic the more complex managing events from the ground gets (not to mention the impacts of adding complexity to the software). I'm not opposed to putting in a very simple failsafe in EVS to avoid a flood (quarantine an app for X seconds or something) but any app that can flood is broken in my mind. It's basically eating up resources in a bad way which preferably is fixed at the source vs bandaiding downstream. SBN has historically had this issue (I'd consider it critical to fix!) |
Quarantining an app entirely for X seconds would alleviate most of our concerns w/o being too complicated. We can probably do w/o the per-event-type granularity to simplify things. |
@jphickey can you review? |
I do tend to agree that the root cause is badly designed apps that can flood in the first place. Note that EVS already has a feature to prevent circular/recursive events where processing one event causes an event to be generated again, but this is limited to the processing of that event. If the app is actually sending the event multiple times, then EVS doesn't squelch it. That being said, a simple rate limit shouldn't be too hard to do, but currently the only reference of time in EVS is the HK wakeup event. Which might be good enough - at each HK wakeup it could issue a "credit" for some number of events that each app is allowed to send. Each event actually sent decrements the credit by 1, and if this reaches 0, then the app is prevented from sending events until the next HK wakeup when it gets more credit. One would need to cap the max credit (for apps which might be silent for long periods) and also count any messages that got squelched due to this, so one knows there is a flooding problem. |
@chillfig How do I call into the CCB, and can I invite some of my colleagues?
|
Just sent you a message on MSTeams |
|
@tngo67 look out for a meeting invite about this |
@astrogeco Copy that. I will pull in additional developers from my team to weigh in. |
Filtering was the intended way to handle flooding. All applications should have registered a filter table so you can have robust behavior. I wanted this to be a requirement and was overruled. EVS could just install a default filter table for apps that don't register. That would be pretty straight forward. |
@jwilmot I've typically seen event flooding when modifying filtering, which isn't completely solved by registering more default filters. Rather than relying on specific filtering that if changed can render a system inoperable, it seems more robust to fix the app such that it won't flood even if the filters are modified. The EVS limit proposed here is that failsafe for flooding edge cases, but really an app that can nominally produce more events than the system can handle should be fixed. Otherwise modifying filters should be considered a critical (could render system inoperable) command. |
Also, maybe CFE_EVS_ENABLE_EVENT_TYPE_CC should really just be avoided for type DEBUG (until the apps that flood are fixed), in favor of using CFE_EVS_ENABLE_APP_EVENT_TYPE_CC for just the app of interest. That would at least narrow down the issue, unless the app of interest is the flooding app, but again... fix the app! |
I agree that EVS should probably have a failsafe such that no more than X (configurable) can be sent per second. That would handle misconfiguration, bad apps, and operator error as @skliper has seen. To keep it simple, X could be global across all events on a given cFE instantiation. I still like the idea of having a default filter for apps that don't register that limits to the first few and then one per 128(?) as a reminder. Apps that can flood should always have filters included and be checked at design and code walkthough reviews. |
@jwilmot - why not just fix the app to not flood with CFE_EVS_SendEvent calls vs adding autonomous filter complexity? Then the ground can manage filters however they want and it reduces the chance that a ground command to modify a filter could result in rendering the system inoperable. If an app sends an event in a tight loop it's bad whatever default filter is used. Events should be treated like tlm, make them useful/informative and don't send an unlimited amount from the app per cycle. Note all apps that send events register with EVS... but if they were to register a default filter for every event that will expand the memory requirements of EVS (some apps have 100+ events), and the calls load the system even if filtered. Eliminating the problem at the source avoids expanding the memory footprint and could reduce system loading while reducing the likelihood of inadvertently rendering the system inoperable. |
If we are going to do a "global" EVS limit, then I like @jphickey's suggestion of "credit". Could start up with enough credit to output all the startup events (like the evs log limit), but then only add X credits per cycle. It's still definitely a "fail-safe" that shouldn't be relied on since it's indiscriminate so important messages could be dropped, or we could still allow ERROR/CRITICAL events or have different limits. This would allow for a short term 0-loss... but still keep the system operable. |
Fix #2095, Automatic suppression of flooding events
Is your feature request related to a problem? Please describe.
We've observed that overly verbose event messages can take down a processor, especially if the generation of events can induce the production of more events due to increased SB/SBNg traffic. Some ability to rate-limit events regardless
of filter state seems to be desirable, as the default for many apps is often unfiltered, and during an event storm it may not be possible to command an event to be filtered.
In addition, there are concerns about critical event messages being lost amongst a flood of less critical events, especially when not commandable due to being in a temporary loss-of-signal state.
There's also the desire to automatically re-enable an event if it is no longer flooding, and the proposed solution would have that characteristic. This is important if the event notifying that the filter has been activated gets lost due to flooding.
Events could still be permanently suppressed w/ the existing binary filters.
Describe the solution you'd like
A filter that is applied per-app, per event type (priority). A token counter for (app, event type) is incremented every time an event message arrives for (app, event type). Meanwhile, the token counter is decremented at a fixed rate.
If the token counter exceeds a certain threshold, incrementing stops and all incoming events for that (app, event type) are discarded until the counter is below a threshold (could be the same as the suppress threshold, then it'll
effectively rate limit after allowing a small burst).
Describe alternatives you've considered
To simplify the implementation, this could be done globally for all events instead of on a per-app basis. There would be the problem of a misbehaved app flooding out another app sending more critical event messages, but this could
be mitigated somewhat by doing this per-event type or prioritizing by event type.
A more granular solution would filter only excessively verbose events by (app event ID) but this would probably require registration of all events and enforcement. It appears that not all apps register all events w/ EVS (i.e. SBNg).
Another solution that only addresses the auto-reenabling of formerly flooding events would be to add another filter type besides binary filter that uses the above mentioned token scheme. This doesn't help too much since the filter
type is set by the app during registration, if events are registered at all.
Additional context
Initial driver was SBNg doesn't come w/ default filters for floodable events, and one of our other apps also sends off a lot of events on startup. Obvious solution is to modify these apps to setup default filters, but this doesn't address reenabling the events automatically when the transients go away, nor does it address unexpectedly flooding messages due to unforeseen edge cases.
Requester Info
John N Pham, Northrop Grumman
The text was updated successfully, but these errors were encountered: