Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Link XDP again in deamon, in order to fix Kmesh has restarted but old XDP program remain linked #679

Merged
merged 1 commit into from
Aug 9, 2024

Conversation

tacslon
Copy link
Contributor

@tacslon tacslon commented Aug 3, 2024

What type of PR is this?

/kind bug
What this PR does / why we need it:
Link XDP again in deamon, in order to fix Kmesh has restarted but old XDP program remain linked
Which issue(s) this PR fixes:
Fixes #591 #665

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

@tacslon tacslon force-pushed the fix/xdp-link branch 3 times, most recently from 7e97917 to 702c498 Compare August 6, 2024 07:54
@tacslon
Copy link
Contributor Author

tacslon commented Aug 6, 2024

The following code can compare if XDP program has any difference with the previous one, but if we always use the previous XDP program, duplicate maps do exist. So for now, always update XDP programs linked at pod's NIC driver.

	// If kmesh restarts, prevProgInfo is not nil
	prevProg, prevProgInfo, _ := utils.GetProgAndInfoByName(constants.XDP_PROG_NAME)
	log.Infof("prevProgInfo: %+v", prevProgInfo)
	progTag, _ := spec.Programs[constants.XDP_PROG_NAME].Tag()
	log.Infof("progTag: %v", progTag)
	if prevProgInfo != nil && prevProgInfo.Tag != "" && prevProgInfo.Tag == progTag {
		// Same XDP prog, just assign, do not load new prog, but use new maps to prevent duplicate maps
		xa.XdpShutdown = prevProg
		if xa.MapOfAuth, err = utils.GetMapByName(constants.XDP_MAP_NAME); err != nil {
			return nil, err
		}
		log.Infof("MapOfAuth: %+v", xa.MapOfAuth)
	} else {
		// New prog, do load and assignment
		if err = spec.LoadAndAssign(&xa.KmeshXDPAuthObjects, &opts); err != nil {
			return nil, err
		}
	}

For example, map id 3595,3583,3584,3588,3602 are all previous XDP program's maps, and they are duplicate.

1778: xdp  name xdp_shutdown  tag a9eca76e9384b59d  gpl
        loaded_at 2024-08-06T21:36:43+0800  uid 0
        xlated 1728B  jited 984B  memlock 4096B  map_ids 3595,3583,3584,3588,3602
        btf_id 2248

@hzxuzhonghu
Copy link
Member

but if we always use the previous XDP program, duplicate maps do exist

Where do we create the duplicate maps again?

@tacslon
Copy link
Contributor Author

tacslon commented Aug 7, 2024

but if we always use the previous XDP program, duplicate maps do exist

Where do we create the duplicate maps again?

Other eBPF programs that have reference to same maps, e.g. sockops has a reference to bpf_log_level, and old XDP program has an old bpf_log_level map

@tacslon tacslon force-pushed the fix/xdp-link branch 3 times, most recently from f2b9ab8 to 75277fa Compare August 7, 2024 03:06
@tacslon tacslon changed the title [WIP]Migrate XDP link from CNI to deamon, in order to fix Kmesh has restarted but old XDP program remain linked [WIP] Link XDP again in deamon, in order to fix Kmesh has restarted but old XDP program remain linked Aug 7, 2024
@tacslon tacslon changed the title [WIP] Link XDP again in deamon, in order to fix Kmesh has restarted but old XDP program remain linked [WIP]Link XDP again in deamon, in order to fix Kmesh has restarted but old XDP program remain linked Aug 7, 2024
@hzxuzhonghu
Copy link
Member

but if we always use the previous XDP program, duplicate maps do exist

Where do we create the duplicate maps again?

Other eBPF programs that have reference to same maps, e.g. sockops has a reference to bpf_log_level, and old XDP program has an old bpf_log_level map

I am not sure i understand what you mean. Do you mean sockops refer a bpf_log_level map, but xdp refer another? How could that happen if we detach and unlink the old xdp prog correctly

@hzxuzhonghu
Copy link
Member

You should respect the restart feature, do re attach/link as other bpf prog

@hzxuzhonghu
Copy link
Member

lgtm otherwise

Copy link

codecov bot commented Aug 8, 2024

Codecov Report

Attention: Patch coverage is 67.18750% with 21 lines in your changes missing coverage. Please review.

Project coverage is 51.39%. Comparing base (433592b) to head (aee3fee).
Report is 1 commits behind head on main.

Files Patch % Lines
pkg/controller/manage/manage_controller.go 68.25% 13 Missing and 7 partials ⚠️
pkg/controller/controller.go 0.00% 1 Missing ⚠️
Files Coverage Δ
pkg/controller/controller.go 0.00% <0.00%> (ø)
pkg/controller/manage/manage_controller.go 55.26% <68.25%> (-0.18%) ⬇️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update dda7049...aee3fee. Read the comment docs.

@tacslon tacslon changed the title [WIP]Link XDP again in deamon, in order to fix Kmesh has restarted but old XDP program remain linked Link XDP again in deamon, in order to fix Kmesh has restarted but old XDP program remain linked Aug 8, 2024
@hzxuzhonghu
Copy link
Member

@YaoZengzeng TestRemoveAddNsOrServiceWaypoint failed

@LiZhenCheng9527
Copy link
Collaborator

@YaoZengzeng TestRemoveAddNsOrServiceWaypoint failed

The PR I merged in yesterday also failed, but rerunning the job passed it.

@YaoZengzeng
Copy link
Member

@YaoZengzeng TestRemoveAddNsOrServiceWaypoint failed

I'll take a closer look

@YaoZengzeng
Copy link
Member

It seems like that the Kmesh daemon crashed during the test, casuing the newly deployed waypoint fail to take effect.

@hzxuzhonghu
Copy link
Member

Ut coverage does not satisfy the checker

… XDP program remain linked

Signed-off-by: talon <tianmuyang@huawei.com>
Copy link
Member

@hzxuzhonghu hzxuzhonghu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@kmesh-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hzxuzhonghu

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@kmesh-bot kmesh-bot merged commit 77cadb3 into kmesh-net:main Aug 9, 2024
8 checks passed
@kmesh-bot
Copy link
Collaborator

In response to a cherrypick label: #679 failed to apply on top of branch "release-0.4":

Applying: Link XDP again in deamon, in order to fix Kmesh has restarted but old XDP program remain linked
Using index info to reconstruct a base tree...
M	pkg/controller/controller.go
A	pkg/controller/manage/manage_controller.go
A	pkg/controller/manage/manage_controller_test.go
Falling back to patching base and 3-way merge...
CONFLICT (modify/delete): pkg/controller/manage/manage_controller_test.go deleted in HEAD and modified in Link XDP again in deamon, in order to fix Kmesh has restarted but old XDP program remain linked. Version Link XDP again in deamon, in order to fix Kmesh has restarted but old XDP program remain linked of pkg/controller/manage/manage_controller_test.go left in tree.
CONFLICT (modify/delete): pkg/controller/manage/manage_controller.go deleted in HEAD and modified in Link XDP again in deamon, in order to fix Kmesh has restarted but old XDP program remain linked. Version Link XDP again in deamon, in order to fix Kmesh has restarted but old XDP program remain linked of pkg/controller/manage/manage_controller.go left in tree.
Auto-merging pkg/controller/controller.go
CONFLICT (content): Merge conflict in pkg/controller/controller.go
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Patch failed at 0001 Link XDP again in deamon, in order to fix Kmesh has restarted but old XDP program remain linked
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

@kmesh-bot
Copy link
Collaborator

In response to a cherrypick label: new issue created for failed cherrypick: #716

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

XDP prog fails to update
5 participants