This repository has been archived by the owner on Jun 6, 2024. It is now read-only.
Sometimes etcd data can be corrupted or lost, e.g. a DRI deletes some frameworks manually, or all the data is lost. Most running jobs have requestSynced=true, so the database controller assumes they are synchronized with the API server and does not check them again. If their records have actually been deleted from the API server, these jobs will stay in Running status forever, and the API server and the database will remain out of sync for them.
Workaround & suggestion:
In most cases, the admin/DRI should not touch framework data in the API server directly. Any add/update/delete should go through rest-server. If the admin leaves the etcd data untouched, this issue can be avoided.
However, if this issue does happen, the workaround is:
Set requestSynced=false for the affected jobs:
UPDATE frameworks SET "requestSynced"=false WHERE <please select the jobs>
If all the data in etcd is lost, use the following SQL statement:
UPDATE frameworks SET "requestSynced"=false WHERE "requestSynced"=true and "apiServerDeleted"=false and "subState" != 'Completed'
Possible solutions for this problem:
1. Provide a recover-from-database mode. If the admin loses all data, they can turn this mode on manually. In this mode, we run UPDATE frameworks SET "requestSynced"=false WHERE "requestSynced"=true and "apiServerDeleted"=false and "subState" != 'Completed' on the admin's behalf.
2. When the framework watcher starts, it lists all framework objects from the API server. We can compare them with the frameworks in the database. If any framework satisfies all of (a) apiServerDeleted=false, (b) requestSynced=true, (c) state!=Completed, and (d) its records in the database and the API server differ, or the API server record is missing, we can set its requestSynced=false.
3. Run the check from solution 2 periodically in the database poller. Pro: the issue is handled during normal operation. Con: it adds overhead.
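The comparison described for the framework watcher can be sketched as follows. This is only an illustration in Python, not the database controller's actual code: the `DbFramework` class, its field names, and the `find_unsynced_frameworks` helper are hypothetical, and the API server listing is assumed to be available as a dict keyed by framework name.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DbFramework:
    """Hypothetical in-memory view of one row of the frameworks table."""
    name: str
    request_synced: bool      # mirrors "requestSynced"
    api_server_deleted: bool  # mirrors "apiServerDeleted"
    state: str                # e.g. "Running" or "Completed"
    snapshot: dict            # framework object as recorded in the database


def find_unsynced_frameworks(
    db_frameworks: List[DbFramework],
    api_frameworks: Dict[str, dict],
) -> List[str]:
    """Return names of frameworks whose requestSynced should be reset to false."""
    stale = []
    for fw in db_frameworks:
        # Only frameworks the controller currently believes are in sync
        # and still alive are candidates.
        if fw.api_server_deleted or not fw.request_synced:
            continue
        if fw.state == "Completed":
            continue
        api_record = api_frameworks.get(fw.name)
        # Missing in the API server, or different from the database record.
        if api_record is None or api_record != fw.snapshot:
            stale.append(fw.name)
    return stale
```

The returned names would be the rows to reset with the UPDATE statement from the workaround. Running this on every poll would require a full listing from the API server each time, which is the overhead mentioned above.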
If someone updates a job's spec after the job has completed, the job is re-created in the API server.
This is caused by the shortcut in the merge writer.
Currently this problem is minor, because rest-server can only update one field in the job spec: it sets spec.executionType = 'Stop'. This only causes the job to be stopped and deleted in the API server.
We can either: 1. Reject requests that modify the job spec after a job is completed, or 2. Accept the request but not sync it to the API server.
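The two options can be sketched as a small decision function. This is a hypothetical Python illustration, not the merge writer's real logic: `UpdateAction`, `decide_spec_update`, and the `reject_after_completion` switch are names made up for this example, and the 'Completed' value is taken from the SQL statements above.

```python
from enum import Enum


class UpdateAction(Enum):
    SYNC = "sync"              # write to the database and sync to the API server
    REJECT = "reject"          # option 1: refuse the modification
    STORE_ONLY = "store_only"  # option 2: write to the database, skip the sync


def decide_spec_update(sub_state: str, reject_after_completion: bool = True) -> UpdateAction:
    """Decide how to handle a job spec update based on the job's sub-state."""
    if sub_state != "Completed":
        # The job is still live, so the normal path applies.
        return UpdateAction.SYNC
    if reject_after_completion:
        return UpdateAction.REJECT
    return UpdateAction.STORE_ONLY
```

Either branch for a completed job avoids re-creating it in the API server; they differ only in whether the caller sees an error.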