Skip to content
This repository has been archived by the owner on May 12, 2021. It is now read-only.

grpc :Add the StatsContainer api for events cli #222

Merged
merged 1 commit into from
May 10, 2018

Conversation

jshachm
Copy link
Member

@jshachm jshachm commented Apr 24, 2018

Add StatsContainer API to display container events
Fixes: #221

Signed-off-by: c00416947 caihaomin@huawei.com

@sboeuf
Copy link

sboeuf commented Apr 24, 2018

@jshachm I'll try to review this when I get a chance. In the meantime, could you please fix the CI.

@jshachm jshachm force-pushed the add-stats-api branch 2 times, most recently from 29d945c to 46f8273 Compare April 25, 2018 01:16
@@ -108,7 +110,10 @@ func init() {
}
func (this *CheckRequest) Equal(that interface{}) bool {
if that == nil {
return this == nil
if this == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check is useless yet it is auto-generated. Are you using an old version of protoc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am try to using the latest protoc to re-generated it :-)

string container_id = 1;
}

message CpuUsage {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a reference about why these stats are collected? IOW, are they sufficient to support the CRI metrics as well as the kata cli events?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the fact that runc/libcontainer is the standard, I think relying on the libcontainer call to container.Stats() will return the right amount of data.

@jshachm
Copy link
Member Author

jshachm commented Apr 25, 2018

@sboeuf @bergwolf

Which version of protoc release we use to re-generate pb. I have tried with latest release and latest gogo-protoc still have problems with the different codes result.

Should we provide a stable version of protoc tools to avoid this problem?

@bergwolf
Copy link
Member

@jshachm Did you use the helper script hack/update-generated-agent-proto.sh to re-generate grpc code?

@jshachm
Copy link
Member Author

jshachm commented Apr 25, 2018

@bergwolf I use this hack/update-generated-agent-proto.sh and using latest protoc src make binary from master branch and latest gogo-protoc from master branch .Still have problems .

$ protoc --version
libprotoc 3.5.1

@bergwolf
Copy link
Member

@jshachm I upgraded my local setup to latest protoc (protoc-3.5.1-linux-x86_64.zip) and latest protoc-gen-gogofast(commit 1ef32a8b9fc3f). Then I regenerated the grpc code but I still do not see those.

@jshachm
Copy link
Member Author

jshachm commented Apr 25, 2018

@bergwolf That's weird. My local compile env is the same as yours. Maybe related to golang version, I myself using 1.10.1.

Let me find out where the problem and will upgrade the pr.

@jshachm jshachm force-pushed the add-stats-api branch 2 times, most recently from 7c9a451 to 9563b63 Compare April 25, 2018 06:48
@jshachm
Copy link
Member Author

jshachm commented Apr 26, 2018

@bergwolf looking into cri-o and find the some reference.

The Stats struct is build from runc events stats as discussed with @sboeuf .
In the code cri-o of ContainerStats command, it use libcontainer to get stats struct which is the same as us. So I think they are sufficient to support the CRI metrics as well as the kata cli events.

https://github.com/kubernetes-incubator/cri-o/blob/c8cb1da8fb455e30d5959ded9f04b8cf0c37870d/lib/container_server_linux.go#L62

@codecov
Copy link

codecov bot commented Apr 28, 2018

Codecov Report

Merging #222 into master will increase coverage by 0.3%.
The diff coverage is 66.66%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master     #222     +/-   ##
=========================================
+ Coverage   41.84%   42.14%   +0.3%     
=========================================
  Files          12       12             
  Lines        1943     1960     +17     
=========================================
+ Hits          813      826     +13     
- Misses       1026     1028      +2     
- Partials      104      106      +2
Impacted Files Coverage Δ
mockcontainer.go 100% <100%> (ø) ⬆️
grpc.go 26.82% <64.7%> (+1.24%) ⬆️
protocols/client/client.go 70.43% <0%> (+1.73%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8856b3a...077e6f9. Read the comment docs.

@jshachm jshachm force-pushed the add-stats-api branch 2 times, most recently from 4db1116 to 39ea8f9 Compare April 28, 2018 03:42
@jshachm
Copy link
Member Author

jshachm commented Apr 28, 2018

@sboeuf CI is happy now

sboeuf
sboeuf previously requested changes Apr 30, 2018
grpc.go Outdated
@@ -718,6 +718,34 @@ func (a *agentGRPC) ListProcesses(ctx context.Context, req *pb.ListProcessesRequ
return resp, nil
}

func (a *agentGRPC) StatsContainer(ctx context.Context, req *pb.StatsContainerRequest) (*pb.StatsContainerResponse, error) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The kata-agent is completely tied to libcontainer and I don't think we need to define a new type pb.StatsContainerResponse here. This could be simpler here by simply returning a structure of type libcontainer.Stats.
@bergwolf WDYT ?

Virtcontainers being something generic, we will need to introduce a duplicated structure there (runtime repo).

According to this, the following function would be greatly simplified:

func (a *agentGRPC) StatsContainer(ctx context.Context, req *pb.StatsContainerRequest) (*libcontainer.Stats, error) {
	c, err := a.sandbox.getContainer(req.ContainerId)
	if err != nil {
		return nil, err
	}

	return c.container.Stats()
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Back from holiday, WIP. Thx for reviewing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sboeuf If we use libcontainer.Stats directly as the return of a StatsContainerResponse or sth else like the grpc response. There will be error if we don't convert libcontainer.Stats into StatsContainerResponse.

Besides, if we want to use libcontainer.Stats directly, we need to convert GO struct into protoc IDL. And I think this is why here I introduce a struct reponse here and what's my work have done.

Glad to hear your views and suggestions. If anything I ignore, plz let me know.

string container_id = 1;
}

message CpuUsage {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the fact that runc/libcontainer is the standard, I think relying on the libcontainer call to container.Stats() will return the right amount of data.

@WeiZhang555
Copy link
Member

There're some conflicts now, sorry for you. Please rebase first.

@WeiZhang555
Copy link
Member

And you can try to run make proto to generate the pb files, it makes sure everyone is using same protoc version and there's no unnecessary modifications will be introduced due to different protoc version.

@jshachm
Copy link
Member Author

jshachm commented May 7, 2018

@WeiZhang555 done~

@jshachm jshachm closed this May 7, 2018
@jshachm jshachm reopened this May 7, 2018
Add `StatsContainer` API to display container events

Fixes: kata-containers#221

Signed-off-by: c00416947 <caihaomin@huawei.com>
@jshachm
Copy link
Member Author

jshachm commented May 8, 2018

CI fails randomly. But I think it's not this patch's problem.
@sboeuf @bergwolf @WeiZhang555 PTAL
Any other suggestions? So I can go forward to runtime cli implement.

@jshachm jshachm changed the title [RFC] API:Add the StatsContainer api for events cli grpc :Add the StatsContainer api for events cli May 8, 2018

message CpuUsage {
uint64 total_usage = 1;
repeated uint64 percpu_usage = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is CPU hotplug supported here? If it is in principle, I think we'll need some tests to ensure we dtrt.

/cc @devimc

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

libcontainer checks the cgroups and each time a new CPU is added, the cgroup is updated, in theory this should work

}

message CpuUsage {
uint64 total_usage = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Presumably this is in bytes? Same question for lots of the other fields.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkout from libcontainer/cgroups/stats.go
To make it easier to transport into kata event display value, it should be in the same type with what runc does.

MemoryStats memory_stats = 2;
PidsStats pids_stats = 3;
BlkioStats blkio_stats = 4;
map<string, HugetlbStats> hugetlb_stats = 5; // the map is in the format "size of hugepage: stats of the hugepage"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the key of type string if it's being used to store "size of hugepage"? Again, is this in bytes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

string is the type of the hugetlb size like 2MB or 1GB it should be in string as what runc has done.

}

message StatsContainerResponse {
CgroupStats cgroup_stats = 1;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something to consider: what if the guest kernel doesn't provide all the cgroup values the agent is expecting to be able to read for the workload? Do we fail, or return some sort of "default" value when we couldn't query it. In fact, maybe we could consider adding a special --test option to the agent where it will check to see if all the cgroup paths exists as expected and error if they don't exist (and maybe even display the appropriate CONFIG_* name(s) to help the user configure their guest kernel appropriately).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

libcontainer checks the cgroup subsystem path. If it path is not provided by guest kernel, it will return a nil struct.

for name, path := range m.Paths {
		sys, err := subsystems.Get(name)
		if err == errSubsystemDoesNotExist || !cgroups.PathExists(path) {
			continue
		}
		if err := sys.GetStats(path, stats); err != nil {
			return nil, err
		}
	}

Having a --test check for agent to find out cgroups paths is a good idea. However, maybe sometimes user just want to cropping unnecessary configurations. And besides , we can't decide which cgroup subsystem is necessary by now.

I will do this if necessary. Glad to hear what your guys think of this.

@WeiZhang555
Copy link
Member

WeiZhang555 commented May 10, 2018

LGTM

@sboeuf Do you need to take another look at this?

Approved with PullApprove Approved with PullApprove

@jodh-intel
Copy link
Contributor

jodh-intel commented May 10, 2018

Thanks for explaining @jshachm.

lgtm

/cc @egernst as @sboeuf is afk.

Approved with PullApprove Approved with PullApprove

@egernst egernst dismissed sboeuf’s stale review May 10, 2018 16:20

Looks like Seb's feedback was addressed and he's OOO

@egernst egernst merged commit ea0e6ae into kata-containers:master May 10, 2018
@jshachm jshachm deleted the add-stats-api branch May 11, 2018 01:43
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants