-
Notifications
You must be signed in to change notification settings - Fork 202
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Race Condition in CFE_AppCreate() function #279
Comments
Imported from trac issue 248. Created by jphickey on 2018-09-18T16:40:13, last modified: 2019-07-03T13:15:28 |
Trac comment by jphickey on 2018-09-18 16:49:51: Historical notes At the time this bug was described, there was an existing enhancement ticket (#173) to change the overall app loading architecture. The original plan was to simply address this race condition as part of the larger overall changeset. Further discussion on this topic revealed a few issues with that original plan:
It was therefore determined at the 2018-09-18 CCB that this bug needs its own solution, separate from any refactoring of the app loading logic. |
Trac comment by jphickey on 2018-09-18 17:06:09: Proposed Fixes for this issue ticket The current thinking is to examine all cases where CFE ES manages its internal tables using a single We might have a problem any time this general pattern is employed:
The immediate issue is that anything accessing the same table between 5-7 (such as the scan function in this case) it may misinterpret this to be a normal entry. The work item here is to examine all tables in CFE which employ a
|
Trac comment by sseeger on 2018-10-11 14:42:25: I have two proposals to fix this: The easy way is to set StateRecord.AppState = CFE_ES_AppStateRUNNING when RecordUsed is set to TRUE. This will prevent CFE_ES_ScanAppTable() from processing it. We would then set the state back to CFE_ES_AppStateERROR if an error occurs and record is freed. A cleaner approach is to get rid of RecordUsed. We could create another state CFE_ES_AppStateUNUSED. This fixes the problem and frees up memory. |
Trac comment by cdknight on 2018-10-23 14:23:19: Related: #278 |
Trac comment by sseeger on 2018-10-30 13:17:25: The AppTable fix is in [changeset:90c30887e1ce6d2c75762a2096aee7ed6d0efd01] |
Trac comment by jhageman on 2019-02-25 15:41:41: 2018/11/13 CCB: Mentioned this ticket was ready for review. If so please update status to "work complete". |
Trac comment by sseeger on 2019-02-27 08:59:04: The current candidate for a fix is changeset:4fc775a16f8c6c2e8e049e995d3e6627aec2795c |
Trac comment by jhageman on 2019-03-15 16:59:32: Needs rebase to development branch and squash of commits prior to CCB code review. |
Trac comment by jhageman on 2019-03-25 10:27:33: Rebased, see [changeset:ee218c7] for review. |
Trac comment by jphickey on 2019-03-25 11:19:08: I have reviewed the rebased commit and I think this still needs some additional cleanup.
I'll submit a commit that has my suggestions to illustrate what I'm talking about. |
Trac comment by jphickey on 2019-03-25 16:16:58: I have pushed commit [changeset:47014d5f] that has what I'm talking about. Note this is based on development, not the previous changeset. In reviewing this I did notice that |
Trac comment by jphickey on 2019-03-25 17:02:40: Note: Submitted issue #295 about CFE_ES_ScanAppTable. |
Trac comment by jhageman on 2019-03-27 12:40:17: Code reviewed in CCB, 3/27/19 |
Trac comment by jphickey on 2019-03-27 14:24:46: Summary of some follow-on discussion:
Additional follow-on tickets #299 and #300 were submitted to fix these cases, and a task to perform a more thorough check of the code base prior to the next release. |
Trac comment by jhageman on 2019-04-01 11:30:33: CCB 3/27/19 - approved for merge to IC branch |
Trac comment by jphickey on 2019-04-01 11:37:10: This is merged to the |
Trac comment by jhageman on 2019-04-18 17:03:23: Last I heard this is still pending a unit test update, is that the current status? |
Trac comment by jphickey on 2019-04-19 08:56:26: I updated the unit tests. Unit test updates in commit: [changeset:d17f439] Bamboo build results: https://babelfish.arc.nasa.gov/bamboo/browse/CFS-CFSCFE141-26 |
Trac comment by jhageman on 2019-04-24 12:22:19: Approved for merge per CCB 4/24/2019 |
Trac comment by jphickey on 2019-04-24 13:15:49: Per CCB approval. |
Trac comment by jphickey on 2019-04-24 13:16:39: Merged to "development" branch |
Trac comment by jhageman on 2019-07-03 13:15:28: Moved closed tickets targeting 6.6.1 to 6.7.0 end of summer release. |
This has been split to a separate ticket from #173.
Per email from Preston Faiks on 2018-06-04, there is an actual observed race condition issue with
CFE_ES_AppCreate()
out in the field:When ES is loading and starting apps, one app might fail initialization and call CFE_ES_ExitApp() If that occurs, its app state will be set to CFE_ES_APP_STATE_STOPPED.
When apps are scanned, it will be removed from the app table and that table entry set to not in use.
As ES continues to load apps, it will make use on the now unused app table entry. It will not change the app state in the entry until it has successfully loaded the app into memory.
The process of loading an app into memory can cause the task to pend on file system (or network file system) and allow other tasks to run.
As that app continues to be loaded, another app scan can occur and detect the app entry as both in use and stopped, and will unload it.
When ES finishes loading the app, it will spawn a task at an entry point which was just unloaded by the scanning task, causing it to execute from unloaded memory and crash.
I have reviewed this code again and the race condition risk described is definitely still present in the current development branch, but this isn't the only example. There are other similar race conditions that are possible regarding the use if the
RecordUsed
boolean field.Having an observed failure should escalate this in priority now.
The text was updated successfully, but these errors were encountered: