When a vcjob's task name contains -, the task-topology plugin misparses it and stops working. #3469
Comments
It has been fixed by #2940 :) |
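That change seems to have a problem: after merging it into version 1.7.0 the issue is still there. Logging shows that jobNamePrefix is test-xytsyh2-d6d4a2e5-1ddb-4d1d-98ec-8eb97b054e63- , with a uuid suffix. This uuid is the Job's uid. volcano/pkg/scheduler/api/job_info.go Lines 366 to 372 in 016d215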
|
Is it a vcjob or a deployment? |
vcjob |
I wrote that incorrectly earlier: jobNamePrefix is test-xytsyh2-d6d4a2e5-1ddb-4d1d-98ec-8eb97b054e63- , and the trailing uuid is the vcjob's UID. |
can you paste the whole job name and |
# kubectl get pods
NAME READY STATUS RESTARTS AGE
test-binpack-xytsyh2-task-1-0 0/1 CreateContainerError 0 9m1s
test-binpack-xytsyh2-task-2-0 0/1 CreateContainerError 0 9m1s
# kubectl get podgroup
NAME STATUS MINMEMBER RUNNINGS AGE
test-binpack-xytsyh2-d7e821cc-2e6c-4b5a-9818-f3efca39e5da Running 2 59m
#
apiVersion: batch.volcano.sh/v1alpha1
kind: Job
metadata:
  creationTimestamp: "2024-05-16T03:21:00Z"
  generation: 1
  name: test-binpack-xytsyh2
  namespace: default
  resourceVersion: "1025579"
  uid: d7e821cc-2e6c-4b5a-9818-f3efca39e5da
spec:
  maxRetry: 3
  minAvailable: 2
  queue: default
  schedulerName: volcano
  tasks:
  - maxRetry: 3
    minAvailable: 1
    name: task-1
    replicas: 1
    template:
      metadata: {}
      spec:
        containers:
        - image: alpine
          imagePullPolicy: IfNotPresent
          name: test
        restartPolicy: OnFailure
  - maxRetry: 3
    minAvailable: 1
    name: task-2
    replicas: 1
    template:
      metadata: {}
      spec:
        containers:
        - image: alpine
          imagePullPolicy: IfNotPresent
          name: test
        restartPolicy: OnFailure
status:
  conditions:
  - lastTransitionTime: "2024-05-16T03:21:02Z"
    status: Pending
  minAvailable: 2
  pending: 2
  state:
    lastTransitionTime: "2024-05-16T03:21:02Z"
    phase: Pending
  taskStatusCount:
    task-1:
      phase:
        Pending: 1
    task-2:
      phase:
        Pending: 1 |
/good-first-issue |
@Monokaix: Please ensure the request meets the requirements listed here. If this request no longer meets these requirements, the label can be removed. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@loheagn can you take a look? |
@jichangyun Have you set affinity rules in job's annotation? |
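Yes, that's right: task-1 and task-2 have anti-affinity. From a quick look, the problem seems to be here: in SetPodGroup, ji.Name = pg.Name is set, and pg.Name is job.Name + "-" + job.UID . volcano/pkg/scheduler/api/job_info.go Lines 366 to 372 in 016d215
For our private project, the stopgap for now is to take, in the topology.go file, the |
ji.Name = pg.Name is fine; the two are supposed to be the same anyway. The question is why the podgroup auto-generated by the controller carries a UID at all, while some podgroups do not. I could not find where this UID is generated, and I don't think it is necessary. |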
So the current issue is that the full task name in |
Right, if the two names were the same there would be no problem. I looked through the code on the master branch, and it seems to be here. volcano/pkg/controllers/job/job_controller_actions.go Lines 647 to 659 in 3520a0f
|
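To make the mismatch discussed above concrete, here is a minimal Go sketch. It assumes, as reported in this thread, that the podgroup name is job.Name + "-" + job.UID while pods are named after the plain job name. The helper names podGroupName and stripPrefix are hypothetical and only for illustration; they are not functions from the Volcano code base.

```go
package main

import (
	"fmt"
	"strings"
)

// podGroupName follows the naming reported in this thread:
// job.Name + "-" + job.UID. (Hypothetical helper, illustration only.)
func podGroupName(jobName, jobUID string) string {
	return jobName + "-" + jobUID
}

// stripPrefix removes "<owner>-" from the pod name and reports whether
// the prefix actually matched. (Hypothetical helper, illustration only.)
func stripPrefix(podName, owner string) (string, bool) {
	prefix := owner + "-"
	if !strings.HasPrefix(podName, prefix) {
		return podName, false
	}
	return podName[len(prefix):], true
}

func main() {
	jobName := "test-binpack-xytsyh2"
	jobUID := "d7e821cc-2e6c-4b5a-9818-f3efca39e5da"
	pod := "test-binpack-xytsyh2-task-1-0"

	// Using the podgroup name (with UID) as the prefix never matches the pod name.
	_, ok := stripPrefix(pod, podGroupName(jobName, jobUID))
	fmt.Println("prefix from podgroup name matches:", ok)

	// Using the plain job name as the prefix does match.
	rest, ok := stripPrefix(pod, jobName)
	fmt.Println("prefix from job name matches:", ok, "remainder:", rest)
}
```

With the names from the pasted output, the first lookup fails and the second leaves task-1-0, which is why a prefix derived from the podgroup name cannot recover the task name.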
It's definitely a bug. As @jichangyun said, the name of the podgroup created by vc-controller is |
/assign |
Code location:
volcano/pkg/scheduler/plugins/task-topology/topology.go
Lines 243 to 275 in f6e0a52
Problem:
volcano/pkg/scheduler/plugins/task-topology/topology.go
Line 251 in f6e0a52
When a task name contains -, tmpStrings := strings.Split(task.Name, "-") produces an unexpected split, so the key stored in taskRef is wrong, and the check further down fails with "task %s do not exist in job <%s/%s>".
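A small Go sketch of the failure mode, using the pod name from this issue. naiveTaskName only imitates the idea of picking a fixed segment after splitting on "-"; it is not the code in topology.go. taskNameFromPod is a hypothetical alternative that assumes pod names have the form <jobName>-<taskName>-<index> and that the plain job name is known.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveTaskName splits on "-" and takes the second-to-last segment as the
// task name. Illustration of the failure mode only, not the plugin's code.
func naiveTaskName(podName string) string {
	parts := strings.Split(podName, "-")
	if len(parts) < 2 {
		return podName
	}
	return parts[len(parts)-2]
}

// taskNameFromPod is a hypothetical alternative: strip the known
// "<jobName>-" prefix and the trailing "-<index>" suffix, so that dashes
// inside the task name are preserved.
func taskNameFromPod(podName, jobName string) (string, bool) {
	prefix := jobName + "-"
	if !strings.HasPrefix(podName, prefix) {
		return "", false
	}
	rest := strings.TrimPrefix(podName, prefix)
	i := strings.LastIndex(rest, "-")
	if i <= 0 {
		return "", false // no "-<index>" suffix, or empty task name
	}
	return rest[:i], true
}

func main() {
	pod := "test-binpack-xytsyh2-task-1-0"

	fmt.Println(naiveTaskName(pod)) // prints "1", not "task-1"

	name, ok := taskNameFromPod(pod, "test-binpack-xytsyh2")
	fmt.Println(name, ok) // prints "task-1 true"
}
```

The second approach only works if the prefix really is the job name, so it would still break when the prefix carries the job UID.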