-
Notifications
You must be signed in to change notification settings - Fork 114
grpc :Add the StatsContainer api for events cli
#222
Conversation
@jshachm I'll try to review this when I get a chance. In the meantime, could you please fix the CI. |
29d945c
to
46f8273
Compare
protocols/grpc/health.pb.go
Outdated
@@ -108,7 +110,10 @@ func init() { | |||
} | |||
func (this *CheckRequest) Equal(that interface{}) bool { | |||
if that == nil { | |||
return this == nil | |||
if this == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check is useless yet it is auto-generated. Are you using an old version of protoc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am try to using the latest protoc
to re-generated it :-)
string container_id = 1; | ||
} | ||
|
||
message CpuUsage { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a reference about why these stats are collected? IOW, are they sufficient to support the CRI metrics as well as the kata cli events?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the fact that runc/libcontainer is the standard, I think relying on the libcontainer call to container.Stats()
will return the right amount of data.
@jshachm Did you use the helper script |
@bergwolf I use this
|
@jshachm I upgraded my local setup to latest protoc (protoc-3.5.1-linux-x86_64.zip) and latest protoc-gen-gogofast(commit 1ef32a8b9fc3f). Then I regenerated the grpc code but I still do not see those. |
@bergwolf That's weird. My local compile env is the same as yours. Maybe related to Let me find out where the problem and will upgrade the pr. |
7c9a451
to
9563b63
Compare
@bergwolf looking into The |
Codecov Report
@@ Coverage Diff @@
## master #222 +/- ##
=========================================
+ Coverage 41.84% 42.14% +0.3%
=========================================
Files 12 12
Lines 1943 1960 +17
=========================================
+ Hits 813 826 +13
- Misses 1026 1028 +2
- Partials 104 106 +2
Continue to review full report at Codecov.
|
4db1116
to
39ea8f9
Compare
@sboeuf CI is happy now |
grpc.go
Outdated
@@ -718,6 +718,34 @@ func (a *agentGRPC) ListProcesses(ctx context.Context, req *pb.ListProcessesRequ | |||
return resp, nil | |||
} | |||
|
|||
func (a *agentGRPC) StatsContainer(ctx context.Context, req *pb.StatsContainerRequest) (*pb.StatsContainerResponse, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The kata-agent
is completely tied to libcontainer and I don't think we need to define a new type pb.StatsContainerResponse
here. This could be simpler here by simply returning a structure of type libcontainer.Stats
.
@bergwolf WDYT ?
Virtcontainers being something generic, we will need to introduce a duplicated structure there (runtime repo).
According to this, the following function would be greatly simplified:
func (a *agentGRPC) StatsContainer(ctx context.Context, req *pb.StatsContainerRequest) (*libcontainer.Stats, error) {
c, err := a.sandbox.getContainer(req.ContainerId)
if err != nil {
return nil, err
}
return c.container.Stats()
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Back from holiday, WIP. Thx for reviewing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sboeuf If we use libcontainer.Stats
directly as the return of a StatsContainerResponse
or sth else like the grpc response. There will be error if we don't convert libcontainer.Stats
into StatsContainerResponse
.
Besides, if we want to use libcontainer.Stats
directly, we need to convert GO struct
into protoc IDL
. And I think this is why here I introduce a struct reponse
here and what's my work have done.
Glad to hear your views and suggestions. If anything I ignore, plz let me know.
string container_id = 1; | ||
} | ||
|
||
message CpuUsage { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the fact that runc/libcontainer is the standard, I think relying on the libcontainer call to container.Stats()
will return the right amount of data.
There're some conflicts now, sorry for you. Please rebase first. |
And you can try to run |
@WeiZhang555 done~ |
Add `StatsContainer` API to display container events Fixes: kata-containers#221 Signed-off-by: c00416947 <caihaomin@huawei.com>
CI fails randomly. But I think it's not this patch's problem. |
events cli
events cli
|
||
message CpuUsage { | ||
uint64 total_usage = 1; | ||
repeated uint64 percpu_usage = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is CPU hotplug supported here? If it is in principle, I think we'll need some tests to ensure we dtrt.
/cc @devimc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
libcontainer checks the cgroups and each time a new CPU is added, the cgroup is updated, in theory this should work
} | ||
|
||
message CpuUsage { | ||
uint64 total_usage = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably this is in bytes? Same question for lots of the other fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
checkout from libcontainer/cgroups/stats.go
To make it easier to transport into kata event
display value, it should be in the same type with what runc
does.
MemoryStats memory_stats = 2; | ||
PidsStats pids_stats = 3; | ||
BlkioStats blkio_stats = 4; | ||
map<string, HugetlbStats> hugetlb_stats = 5; // the map is in the format "size of hugepage: stats of the hugepage" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the key of type string
if it's being used to store "size of hugepage"? Again, is this in bytes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
string
is the type of the hugetlb size like 2MB
or 1GB
it should be in string
as what runc
has done.
} | ||
|
||
message StatsContainerResponse { | ||
CgroupStats cgroup_stats = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something to consider: what if the guest kernel doesn't provide all the cgroup values the agent is expecting to be able to read for the workload? Do we fail, or return some sort of "default" value when we couldn't query it. In fact, maybe we could consider adding a special --test
option to the agent where it will check to see if all the cgroup paths exists as expected and error if they don't exist (and maybe even display the appropriate CONFIG_*
name(s) to help the user configure their guest kernel appropriately).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
libcontainer
checks the cgroup subsystem path. If it path is not provided by guest kernel, it will return a nil struct.
for name, path := range m.Paths {
sys, err := subsystems.Get(name)
if err == errSubsystemDoesNotExist || !cgroups.PathExists(path) {
continue
}
if err := sys.GetStats(path, stats); err != nil {
return nil, err
}
}
Having a --test
check for agent
to find out cgroups paths is a good idea. However, maybe sometimes user just want to cropping unnecessary configurations. And besides , we can't decide which cgroup subsystem is necessary by now.
I will do this if necessary. Glad to hear what your guys think of this.
LGTM @sboeuf Do you need to take another look at this? |
Looks like Seb's feedback was addressed and he's OOO
Add
StatsContainer
API to display container eventsFixes: #221
Signed-off-by: c00416947 caihaomin@huawei.com