Application is not able to retrieve container metrics for memory usage on Ubuntu Noble stemcell #995

Open
Tracked by #1224
dimivel opened this issue Mar 4, 2025 · 5 comments

dimivel commented Mar 4, 2025

Current behavior

The CF acceptance tests in the noble-stemcell-validation pipeline fail on the test that checks whether an app pushed with multiple instances is able to retrieve container metrics for memory usage:

[2025-03-04 10:54:46.19 (UTC)]> cf app CATS-2-APP-e984cce5716df278 
  Showing health and status for app CATS-2-APP-e984cce5716df278 in org CATS-2-ORG-39eae2fe273ebf64 / space CATS-2-SPACE-645fdd9004c00857 as CATS-2-USER-9eb54322b25a4686...

  name:              CATS-2-APP-e984cce5716df278
  requested state:   started
  routes:            CATS-2-APP-e984cce5716df278.cf.bellatrix.env.wg-ard.ci.cloudfoundry.org
  last uploaded:     Tue 04 Mar 10:46:26 UTC 2025
  stack:             cflinuxfs4
  buildpacks:        
  	name               version   detect output   buildpack name
  	binary_buildpack   1.1.15    binary          binary

  type:           web
  sidecars:       
  instances:      2/2
  memory usage:   256M
       state     since                  cpu    memory       disk         logging             cpu entitlement   details
  #0   running   2025-03-04T10:46:44Z   0.0%   0B of 256M   8.8M of 1G   0B/s of unlimited   0.8%              
  #1   running   2025-03-04T10:46:49Z   0.0%   0B of 256M   8.8M of 1G   0B/s of unlimited   0.5%              
  [FAILED] in [It] - /tmp/build/33ac16d1/cf-acceptance-tests/apps/lifecycle.go:247 @ 03/04/25 10:54:47.78

Here are the recent application logs:

[2025-03-04 10:54:48.78 (UTC)]> cf logs --recent CATS-2-APP-e984cce5716df278 
  Retrieving logs for app CATS-2-APP-e984cce5716df278 in org CATS-2-ORG-39eae2fe273ebf64 / space CATS-2-SPACE-645fdd9004c00857 as CATS-2-USER-9eb54322b25a4686...

     2025-03-04T10:46:10.86+0000 [API/0] OUT Added process: "web"
     2025-03-04T10:46:10.87+0000 [API/0] OUT Created app with guid a054c1f6-d69b-44e6-8122-cb517e631501
     2025-03-04T10:46:10.90+0000 [API/0] OUT Applied manifest to app with guid a054c1f6-d69b-44e6-8122-cb517e631501 (---
     2025-03-04T10:46:10.90+0000 [API/0] OUT applications:
     2025-03-04T10:46:10.90+0000 [API/0] OUT - name: CATS-2-APP-e984cce5716df278
     2025-03-04T10:46:10.90+0000 [API/0] OUT instances: 2
     2025-03-04T10:46:10.90+0000 [API/0] OUT path: "/tmp/build/33ac16d1/cf-acceptance-tests/assets/catnip/bin"
     2025-03-04T10:46:10.91+0000 [API/0] OUT memory: 256M
     2025-03-04T10:46:10.91+0000 [API/0] OUT default-route: true
     2025-03-04T10:46:10.91+0000 [API/0] OUT buildpacks:
     2025-03-04T10:46:10.91+0000 [API/0] OUT - binary_buildpack
     2025-03-04T10:46:10.91+0000 [API/0] OUT command: "./catnip"
     2025-03-04T10:46:10.91+0000 [API/0] OUT )
     2025-03-04T10:46:17.42+0000 [API/1] OUT Uploading app package for app with guid a054c1f6-d69b-44e6-8122-cb517e631501
     2025-03-04T10:46:20.58+0000 [API/0] OUT Creating build for app with guid a054c1f6-d69b-44e6-8122-cb517e631501
     2025-03-04T10:46:21.08+0000 [STG/0] OUT Downloading binary_buildpack...
     2025-03-04T10:46:21.10+0000 [STG/0] OUT Downloaded binary_buildpack
     2025-03-04T10:46:21.10+0000 [STG/0] OUT Cell 8a63080b-b242-401f-9d29-740be95dc45b creating container for instance 7d0aa2bf-66b9-426a-9709-ba0566a1b98c
     2025-03-04T10:46:22.05+0000 [STG/0] OUT Security group rules were updated
     2025-03-04T10:46:22.10+0000 [STG/0] OUT Cell 8a63080b-b242-401f-9d29-740be95dc45b successfully created container for instance 7d0aa2bf-66b9-426a-9709-ba0566a1b98c
     2025-03-04T10:46:23.10+0000 [STG/0] OUT Downloading app package...
     2025-03-04T10:46:23.64+0000 [STG/0] OUT Downloaded app package (4.9M)
     2025-03-04T10:46:23.95+0000 [STG/0] OUT -----> Binary Buildpack version 1.1.15
     2025-03-04T10:46:25.55+0000 [STG/0] OUT Exit status 0
     2025-03-04T10:46:25.55+0000 [STG/0] OUT Uploading droplet, build artifacts cache...
     2025-03-04T10:46:25.55+0000 [STG/0] OUT Uploading droplet...
     2025-03-04T10:46:25.56+0000 [STG/0] OUT Uploading build artifacts cache...
     2025-03-04T10:46:26.18+0000 [STG/0] OUT Uploaded build artifacts cache (214B)
     2025-03-04T10:46:26.45+0000 [API/1] OUT Creating droplet for app with guid a054c1f6-d69b-44e6-8122-cb517e631501
     2025-03-04T10:46:30.22+0000 [STG/0] OUT Uploaded droplet (4.9M)
     2025-03-04T10:46:30.22+0000 [STG/0] OUT Uploading complete
     2025-03-04T10:46:31.35+0000 [STG/0] OUT Cell 8a63080b-b242-401f-9d29-740be95dc45b stopping instance 7d0aa2bf-66b9-426a-9709-ba0566a1b98c
     2025-03-04T10:46:31.35+0000 [STG/0] OUT Cell 8a63080b-b242-401f-9d29-740be95dc45b destroying container for instance 7d0aa2bf-66b9-426a-9709-ba0566a1b98c
     2025-03-04T10:46:32.34+0000 [STG/0] OUT Cell 8a63080b-b242-401f-9d29-740be95dc45b successfully destroyed container for instance 7d0aa2bf-66b9-426a-9709-ba0566a1b98c
     2025-03-04T10:46:33.45+0000 [API/0] OUT Updated app with guid a054c1f6-d69b-44e6-8122-cb517e631501 ({:droplet_guid=>"c0cba695-b4eb-4724-8976-4718b0b7f800"})
     2025-03-04T10:46:33.56+0000 [API/1] OUT Creating revision for app with guid a054c1f6-d69b-44e6-8122-cb517e631501
     2025-03-04T10:46:34.36+0000 [API/1] OUT Restarted app with guid a054c1f6-d69b-44e6-8122-cb517e631501
     2025-03-04T10:46:34.38+0000 [CELL/0] OUT Cell 8a63080b-b242-401f-9d29-740be95dc45b creating container for instance aa0c5fbc-b813-4bfd-759e-dce3
     2025-03-04T10:46:34.39+0000 [CELL/1] OUT Cell d0c8f6d8-0bc9-4135-bd9c-36124e44de51 creating container for instance 19e75467-5ad6-46c5-691e-4766
     2025-03-04T10:46:36.31+0000 [CELL/0] OUT Security group rules were updated
     2025-03-04T10:46:36.37+0000 [CELL/0] OUT Cell 8a63080b-b242-401f-9d29-740be95dc45b successfully created container for instance aa0c5fbc-b813-4bfd-759e-dce3
     2025-03-04T10:46:37.32+0000 [CELL/0] OUT Downloading droplet...
     2025-03-04T10:46:38.17+0000 [CELL/0] OUT Downloaded droplet (4.9M)
     2025-03-04T10:46:38.17+0000 [CELL/0] OUT Starting health monitoring of container
     2025-03-04T10:46:39.40+0000 [CELL/1] OUT Security group rules were updated
     2025-03-04T10:46:39.45+0000 [CELL/1] OUT Cell d0c8f6d8-0bc9-4135-bd9c-36124e44de51 successfully created container for instance 19e75467-5ad6-46c5-691e-4766
     2025-03-04T10:46:39.82+0000 [CELL/1] OUT Downloading droplet...
     2025-03-04T10:46:40.89+0000 [APP/PROC/WEB/0] OUT Invoking pre-start scripts.
     2025-03-04T10:46:41.88+0000 [APP/PROC/WEB/0] OUT Invoking start command.
     2025-03-04T10:46:41.88+0000 [CELL/1] OUT Downloaded droplet (4.9M)
     2025-03-04T10:46:41.88+0000 [CELL/1] OUT Starting health monitoring of container
     2025-03-04T10:46:42.31+0000 [APP/PROC/WEB/0] OUT listening on port 8080...
     2025-03-04T10:46:44.20+0000 [CELL/0] OUT Container became healthy
     2025-03-04T10:46:44.23+0000 [API/0] OUT Process became ready with guid a054c1f6-d69b-44e6-8122-cb517e631501 payload: {"instance"=>"aa0c5fbc-b813-4bfd-759e-dce3", "index"=>0, "cell_id"=>"8a63080b-b242-401f-9d29-740be95dc45b", "ready"=>true, "version"=>"33c1a78b-0a1a-4b43-a7be-aef84df36da2"}
     2025-03-04T10:46:46.26+0000 [APP/PROC/WEB/1] OUT Invoking pre-start scripts.
     2025-03-04T10:46:46.59+0000 [APP/PROC/WEB/1] OUT Invoking start command.
     2025-03-04T10:46:46.90+0000 [APP/PROC/WEB/1] OUT listening on port 8080...
     2025-03-04T10:46:49.08+0000 [CELL/1] OUT Container became healthy
     2025-03-04T10:46:49.13+0000 [API/1] OUT Process became ready with guid a054c1f6-d69b-44e6-8122-cb517e631501 payload: {"instance"=>"19e75467-5ad6-46c5-691e-4766", "index"=>1, "cell_id"=>"d0c8f6d8-0bc9-4135-bd9c-36124e44de51", "ready"=>true, "version"=>"33c1a78b-0a1a-4b43-a7be-aef84df36da2"}

Desired behavior

The application should successfully report container metrics for memory usage.

Affected Version

2.115.0

Member

jpalermo commented Mar 6, 2025

The FIWG owns the gosigar library because the bosh-agent uses it to get memory/cpu/disk metrics and report them to the bosh director.

Diego also uses the gosigar library, probably to gather container metrics. We thought that might be the root of the problem, but VM metrics appear to work fine under Noble, and a one-off run of the gosigar unit tests under Noble passed. So we don't think gosigar is the root of the problem here (though we could be missing something).

Member

beyhan commented Mar 7, 2025

Should we re-run the pipeline to see whether the failure is reproducible? Looking at the noble-stemcell-validation pipeline, it ran once and got that error, or am I missing other runs?

Author

dimivel commented Mar 7, 2025

I re-ran the validation pipeline and the failure is reproducible.

Summarizing 1 Failure:
  [FAIL] [apps] Application Lifecycle pushing multiple instances is able to retrieve container metrics [It] for memory usage
  /tmp/build/33ac16d1/cf-acceptance-tests/apps/lifecycle.go:247

I'm available to follow up together and can provide access to the BBL environment.

Author

dimivel commented Mar 11, 2025

Here is a comparison of an application push on an environment with the Jammy stemcell and on the bellatrix BBL environment, where the "Noble" validation is running.

On the "Jammy" stemcell environment, the container memory metric is reported successfully:

root@b493bcc3867f:/home/bbl/bbl-state# cf app catnip
Showing health and status for app catnip in org DVV / space dvv_space as admin...

name:              catnip
requested state:   started
routes:            catnip.cf.bellatrix.env.wg-ard.ci.cloudfoundry.org
last uploaded:     Fri 07 Mar 13:02:05 UTC 2025
stack:             cflinuxfs4
buildpacks:
	name               version   detect output   buildpack name
	binary_buildpack   1.1.15    binary          binary

type:           web
sidecars:
instances:      1/1
memory usage:   1024M
     state     since                  cpu    memory     disk         logging             cpu entitlement   details
#0   running   2025-03-07T13:02:17Z   0.0%   0B of 1G   8.8M of 1G   0B/s of unlimited   0.0% 

After that, I used the cf CLI with the corresponding API endpoint to get the container statistics, in particular memory usage:

cf-acceptance-tests/assets/catnip/bin > develop > cf curl /v3/apps/d1fcedb0-828d-4810-92d7-79ef20a79a27/processes/web/stats | jq '.resources[0].usage.mem'
19435795

For the same application on the bellatrix BBL environment (with the "Noble" stemcell), the container memory metric value is 0.

root@b493bcc3867f:/home/bbl/bbl-state# cf app catnip
Showing health and status for app catnip in org DVV / space dvv_space as admin...

name:              catnip
requested state:   started
routes:            catnip.cf.bellatrix.env.wg-ard.ci.cloudfoundry.org
last uploaded:     Fri 07 Mar 13:02:05 UTC 2025
stack:             cflinuxfs4
buildpacks:
	name               version   detect output   buildpack name
	binary_buildpack   1.1.15    binary          binary

type:           web
sidecars:
instances:      1/1
memory usage:   1024M
     state     since                  cpu    memory     disk         logging             cpu entitlement   details
#0   running   2025-03-07T13:02:17Z   0.0%   0B of 1G   8.8M of 1G   0B/s of unlimited   0.0% 

The same API call as above reports zero memory usage:

root@b493bcc3867f:/home/bbl/bbl-state#  cf curl /v3/apps/46f76fe1-05bd-44d6-b0bf-d4fa5b36af02/processes/web/stats | jq '.resources[0].usage.mem'
0
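
The check done by hand with cf curl and jq above can also be scripted. The following is a minimal sketch, assuming the stats response has been saved as JSON text. The function names are illustrative, and the sample payloads are hypothetical, trimmed-down responses in which only the usage.mem values (19435795 and 0) come from this thread.

```python
import json


def instance_memory_bytes(stats_json: str) -> list[int]:
    """Return usage.mem (in bytes) for every instance in a process stats response."""
    body = json.loads(stats_json)
    # Assumption of this sketch: a missing "usage" or "mem" field counts as 0.
    return [res.get("usage", {}).get("mem", 0) for res in body.get("resources", [])]


def instances_missing_memory(stats_json: str) -> list[int]:
    """Return the positions of instances whose reported memory usage is zero."""
    return [i for i, mem in enumerate(instance_memory_bytes(stats_json)) if mem == 0]


# Hypothetical, trimmed-down payloads; only the usage.mem values are from the issue.
jammy = '{"resources": [{"type": "web", "index": 0, "state": "RUNNING", "usage": {"mem": 19435795}}]}'
noble = '{"resources": [{"type": "web", "index": 0, "state": "RUNNING", "usage": {"mem": 0}}]}'

print(instances_missing_memory(jammy))  # []
print(instances_missing_memory(noble))  # [0]
```

A running instance that reports zero bytes of memory is the symptom described in this issue, so a non-empty result flags the affected instances.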

At the same time, free -m inside the application container reports 517 MB of used memory:

vcap@8bda1ddb-47c5-41d0-70c6-0410:~$ free -m
               total        used        free      shared  buff/cache   available
Mem:           15990         517       12162         148        3310       14994
Swap:          15990           0       15990
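
One way to read the free -m output above is to parse the Mem row and compare it with the 1G quota that cf app shows for this application. This is a rough sketch: the helper name parse_free_mem_row and the quota constant are illustrative and not part of any CF tooling. A total of 15990 MB, far above the quota, may suggest that free is reporting VM-wide numbers rather than container-scoped ones, but that is an interpretation and is not confirmed anywhere in this thread.

```python
def parse_free_mem_row(free_output: str) -> dict[str, int]:
    """Map the column headers of `free -m` output to the values of its 'Mem:' row."""
    lines = free_output.strip().splitlines()
    headers = lines[0].split()
    for line in lines[1:]:
        label, *values = line.split()
        if label == "Mem:":
            return dict(zip(headers, map(int, values)))
    raise ValueError("no 'Mem:' row found")


# Output copied from the container session in this issue.
FREE_OUTPUT = """
               total        used        free      shared  buff/cache   available
Mem:           15990         517       12162         148        3310       14994
Swap:          15990           0       15990
"""

APP_QUOTA_MB = 1024  # the "1G" memory quota shown by cf app

mem = parse_free_mem_row(FREE_OUTPUT)
print(mem["used"])                  # 517
print(mem["total"] > APP_QUOTA_MB)  # True
```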
