
Crash with Invalid JSON error while parsing container info #683

Closed
Srinivas11789 opened this issue Jun 20, 2019 · 43 comments

@Srinivas11789

Srinivas11789 commented Jun 20, 2019

What happened:
Falco crashes with "Runtime error: Invalid JSON encountered while parsing container info", resulting in a CrashLoopBackOff pod state.

What you expected to happen:

  • Falco parses the container info without error
  • Or: Falco logs the error and keeps running without crashing (possible fallback?)

How to reproduce it (as minimally and precisely as possible):

  • Create a k8s deployment with a large number of ports (> 1000); a generator sketch for such a manifest follows this list
  • Example nginx deployment [This is a dumb example configuration just to recreate the issue]
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deploy
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
          - { containerPort: 8080, name: server1}
          - { containerPort: 8081, name: server2}
          - { containerPort: 8082, name: server3}
          - { containerPort: 8083, name: server4}
          - { containerPort: 50000, hostPort: 50000, protocol: UDP, name: port1 }
          - { containerPort: 50001, hostPort: 50001, protocol: UDP, name: port2 }
          - { containerPort: 50002, hostPort: 50002, protocol: UDP, name: port3 }
          - { containerPort: 50003, hostPort: 50003, protocol: UDP, name: port4 }
          - { containerPort: 50004, hostPort: 50004, protocol: UDP, name: port5 }
          - { containerPort: 50005, hostPort: 50005, protocol: UDP, name: port6 }
          - { containerPort: 50006, hostPort: 50006, protocol: UDP, name: port7 }
          - { containerPort: 50007, hostPort: 50007, protocol: UDP, name: port8 }
          - { containerPort: 50008, hostPort: 50008, protocol: UDP, name: port9 }
          - { containerPort: 50009, hostPort: 50009, protocol: UDP, name: port10 }
          ...
          ...
          - { containerPort: 50998, hostPort: 50998, protocol: UDP, name: port999 }
  • Deploy Falco on the same node and check falco logs
  • FYI, references:
    • We need to explicitly list all the ports, as mentioned at https://github.com/kubernetes/kubernetes/issues/23864
    • Example: https://kubernetes.io/docs/tasks/run-application/run-stateless-application-deployment/
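
For convenience, here is a minimal generator sketch for the manifest above. This is not part of the original report; the script name and the exact port range are assumptions based on the example.

# generate_deployment.py -- hypothetical helper, not from the report:
# emits the nginx Deployment above with 999 UDP hostPort entries
# (ports 50000-50998).

HEADER = """\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deploy
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:"""

print(HEADER)
for i in range(999):  # port1 .. port999, as in the report
    port = 50000 + i
    print(f"          - {{ containerPort: {port}, hostPort: {port}, "
          f"protocol: UDP, name: port{i + 1} }}")

The output can be piped straight into kubectl, e.g. python3 generate_deployment.py | kubectl apply -f -.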

Anything else we need to know?:

  • values.yaml for parameters
ebpf:
  # Enable eBPF support for Falco - This allows Falco to run on Google COS.
  enabled: true

  settings:
    # Needed to enable eBPF JIT at runtime for performance reasons.
    # Can be skipped if eBPF JIT is enabled from outside the container
    hostNetwork: true
    # Needed to correctly detect the kernel version for the eBPF program
    # Set to false if not running on Google COS
    mountEtcVolume: true

falco:
  # Output format
  jsonOutput: true
  logLevel: notice
  # Slack alerts
  programOutput:
    enabled: true
    keepAlive: false
    program: "\" jq '{text: .output}' | curl -d @- -X POST https://hooks.slack.com/services/XXXX\""

Environment:

  • Falco version (use falco --version): falco version 0.15.3
  • System info
{
  "machine": "x86_64",
  "nodename": "gke-test-default-pool-3d67c0cd-n8b4",
  "release": "4.14.119+",
  "sysname": "Linux",
  "version": "#1 SMP Tue May 14 21:04:23 PDT 2019"
}
  • Cloud provider or hardware configuration: GCP
  • OS (e.g: cat /etc/os-release):
BUILD_ID=10895.242.0
NAME="Container-Optimized OS"
  • Kernel (e.g. uname -a):
Linux gke-test-default-pool-3d67c0cd-dlng 4.14.119+ #1 SMP Tue May 14 21:04:23 PDT 2019 x86_64 Intel(R) Xeon(R) CPU @ 2.20GHz GenuineIntel GNU/Linux
  • Install tools (e.g. in kubernetes, rpm, deb, from source): Kubernetes (helm)
  • Others:
@fntlnz
Contributor

fntlnz commented Jun 21, 2019

Hi @Srinivas11789, good catch! Thanks for opening the issue; we will try to reproduce it.

@fntlnz
Contributor

fntlnz commented Jun 21, 2019

/assign @fntlnz
/assign @leodido

@vsimon

vsimon commented Jun 21, 2019

Hi, I'm hitting this issue as well. Thanks for looking into it.

@vsimon

vsimon commented Jul 23, 2019

Any update on this?

@fntlnz
Contributor

fntlnz commented Jul 30, 2019

@vsimon @Srinivas11789 this is in the backlog; we will address it shortly. In the meantime, if anyone has more details, please post here! ❤️

@fntlnz
Contributor

fntlnz commented Jul 31, 2019

@Srinivas11789 I am not able to reproduce the parser error you are reporting; however, I acknowledge that not checking the error at the event-loop level can lead to Falco crashing.

I couldn't try with that many ports because k8s refused to create a container with them, failing with a network sandboxing error.

I also tried definitions with many env variables instead: the first contains around 3k env variables, the other around 10k, but k8s didn't allow me to load that one.

As you suggested, we need two fixes for this.

Since I was not able to reproduce the parsing error you are reporting, I can't address the second fix we have to do; please keep providing feedback to help fix this 👼

Having a complete, reproducible YAML definition that breaks Falco would help.

@vsimon you can probably help too.


My kubernetes version (compiled from master):

Client Version: version.Info{Major:"", Minor:"", GitVersion:"v0.0.0-master+$Format:%h$", GitCommit:"81a61ae0e37143299ee5947a6c2c5195ec5f72ae", GitTreeState:"clean", BuildDate:"2019-05-20T03:59:28Z", GoVersion:"go1.12.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.0", GitCommit:"641856db18352033a0d96dbc99153fa3b27298e5", GitTreeState:"clean", BuildDate:"2019-06-09T08:06:25Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}

@Srinivas11789
Author

@fntlnz Thanks for the update and a possible fix. 👍

I agree that the first fix would prevent the crash, but this deployment configuration would keep triggering JSON parse errors for that container, so I think we should also try to fix the root cause. I can help with more feedback.

I tried to reproduce this again today (same environment as mentioned before) and still see the issue occurring. I have added the ready-to-use reproduction files I used here. Let me know if that helps. 🤔

@vsimon Thanks for the follow-up.

@fntlnz
Contributor

fntlnz commented Aug 7, 2019

Thanks for the updated reproduction files, @Srinivas11789; I will try to see if I can trigger that case in my environment.

@leodido leodido added this to the 0.18.0 milestone Aug 22, 2019
@leodido leodido modified the milestones: 0.18.0, 0.19.0 Oct 3, 2019
@stale

stale bot commented Dec 2, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Dec 2, 2019
@vsimon

vsimon commented Dec 2, 2019

/no-stale

@stale stale bot removed the wontfix label Dec 2, 2019
@leodido
Member

leodido commented Dec 20, 2019

/milestone 1.0.0

Moving this to 1.0.0 because we are re-designing the input interface (via gRPC). Once we have that, we'll use the k8s Go client directly plus the Go json package, which in turn means we'll use the same code k8s itself uses, solving this bug.

@poiana poiana modified the milestones: 0.19.0, 1.0.0 Dec 20, 2019
@stale

stale bot commented Feb 18, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Feb 18, 2020
@vsimon

vsimon commented Feb 18, 2020

/no-stale

@stale stale bot removed the wontfix label Feb 18, 2020
@santi-asapp

santi-asapp commented Apr 7, 2020

Hi, I have a similar issue on my EKS cluster (v1.14): when Falco tries to parse the JSON from a Helm chart deployment, it crashes with "Runtime error: Invalid JSON encountered while parsing container info:".

I'm running:
Falco version: 0.21.0-23+35691b0
Driver version: be1ea2d9482d0e6e2cb14a0fd7e08cbecf517f94

@stale

stale bot commented Jun 6, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jun 6, 2020
@leogr
Member

leogr commented Sep 30, 2020

Hello
I have the same issue even though the container info JSON (from inspect) is correct, yet I get the same "Crash with Invalid JSON error while parsing container info" error. Is there any workaround for that?

What's the Falco version? Could you provide detailed steps to reproduce the problem?

@vpharabot

I'm using the latest release, falco:0.25.0.
I'm not sure yet how to reproduce my case.

@vpharabot

I'm able to reproduce it.
If you have a pod with 62K characters in annotations, Falco will crash when it tries to parse the container info.
The limit might be lower, but at 62K characters I can reproduce it reliably.
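
To make the case above concrete, here is a minimal sketch that emits a Pod manifest carrying a ~62K-character annotation. The Pod name, annotation key, and image are hypothetical; only the size matters.

# big_annotation_pod.py -- hypothetical sketch for the report above:
# a Pod whose single annotation value is ~62K characters.

SIZE = 62_000  # characters in the annotation value

print(f"""\
apiVersion: v1
kind: Pod
metadata:
  name: big-annotation-test
  annotations:
    example.com/blob: "{'x' * SIZE}"
spec:
  containers:
    - name: pause
      image: k8s.gcr.io/pause:3.2""")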

@adamzr

adamzr commented Nov 3, 2020

This is causing an issue for me with a Java Spring Boot image produced by the Spring Boot Maven Plugin's Docker image generation feature. That feature uses Paketo buildpacks, so this is likely a problem for every Java image produced by Paketo buildpacks.

@adamzr

adamzr commented Nov 3, 2020

@fntlnz @leodido Try the Docker image nebhale/spring-music; it's a typical Java Spring Docker image created by Paketo buildpacks. There is a lot of JSON in the labels. I think this will cause Falco to crash.

@PhilipSchmid

Hi guys,

I just ran into the same issue 😢.

Working image:

user@node1:~$ docker inspect image_with_io_buildpacks_build_metadata_label:v1 -f '{{json .Config.Labels}}' | wc -m
58378

NOT working image:

user@node1:~$ docker inspect image_with_io_buildpacks_build_metadata_label:v2 -f '{{json .Config.Labels}}' | wc -m 
596846

@leodido Is there any possibility this will be fixed prior to Falco release 1.0.0?

Thanks & regards,
Philip
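
As an aside, the same measurement can be scripted with the Docker SDK for Python; a sketch, assuming the docker package and a local Docker daemon are available:

# label_size.py -- sketch: report the serialized size of an image's labels,
# equivalent to: docker inspect <image> -f '{{json .Config.Labels}}' | wc -m
import json
import sys

import docker

client = docker.from_env()
for name in sys.argv[1:]:  # e.g. python3 label_size.py nginx:latest
    image = client.images.get(name)
    print(name, len(json.dumps(image.labels or {})))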

@Annegies

Annegies commented Feb 3, 2021

On our container platform we also have some containers that were built with some kind of buildpack, resulting in insanely huge labels on the Docker images, and Falco crashes when trying to parse them.
These labels are ridiculous, but in my opinion Falco should be able to handle them.

What is weird, though, is that this started happening when we upgraded from 0.26.2 to 0.27.0; it runs fine with 0.26.2.
I couldn't find a change in the changelog that could explain this.

@rbkaspr

rbkaspr commented Apr 2, 2021

I'm also encountering this error with Falco 0.27.0 running on EKS 1.18.9-eks-d1db3c. Has there been any progress towards solving this?

@rbkaspr

rbkaspr commented Apr 2, 2021

Additional context: the only container that seemed to trigger the issue was any instance of the micrometermetrics/prometheus-rsocket-proxy image. Removing all pods running that image allows Falco to run normally (a lookup sketch follows below).

Hopefully that helps
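
A minimal lookup sketch for the workaround above, assuming the official kubernetes Python client and a valid kubeconfig (the image name is taken from the comment):

# find_pods_by_image.py -- sketch: list pods running a given image so they
# can be removed or isolated while debugging Falco crashes.
from kubernetes import client, config

TARGET = "micrometermetrics/prometheus-rsocket-proxy"

config.load_kube_config()
v1 = client.CoreV1Api()
for pod in v1.list_pod_for_all_namespaces(watch=False).items:
    if any(TARGET in c.image for c in pod.spec.containers):
        print(f"{pod.metadata.namespace}/{pod.metadata.name}")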

@ryneal

ryneal commented May 10, 2021

Any update on this issue? I'm seeing it with containers built with Cloud Native Buildpacks.

@sbkg0002

sbkg0002 commented Jun 4, 2021

We also have this issue with versions newer than 0.26.2. Is there any workaround or fix?

@leogr
Member

leogr commented Jun 15, 2021

I have created a gist to simulate the >1000 ports deployment: https://gist.github.com/leogr/a184a09a3420eea4db73a07633aa04f3

Anyway, I was not able to reproduce this issue with 0.28.1.

Could someone who still has the problem with a newer version of Falco provide reproducible steps?

@dza89

dza89 commented Jun 15, 2021

@leogr
I've created a dummy image that makes Falco (0.28.1) crash:
dza123/kotlin:latest

I think the issue is the total size of the labels, because I had to test a few times before generating enough labels. This is the default behaviour of the buildpack, btw, so please don't blame me for the ridiculous number of labels.

@leogr
Member

leogr commented Jun 17, 2021

Thank you @dza89, I was able to reproduce the bug now. It seems the root cause resides in libsinsp.
I can confirm the problem occurs when parsing container metadata. It can happen even outside a K8s context.

I still need to investigate further. Meanwhile, I have opened a new issue falcosecurity/libs#51 to track the problem in libsinsp.

PS
In my opinion, falcosecurity/libs#51 is not a dup of this issue, since a temporary workaround for Falco alone might be to just report the error without exiting (not a definitive solution, of course).

@fntlnz fntlnz removed their assignment Aug 31, 2021
@FedeDP
Contributor

FedeDP commented Sep 17, 2021

Hi!
It seems like the specific issue outlined by @dza89 with their Docker image was fixed in falco libs with commit https://github.com/falcosecurity/libs/tree/748485ac2e912cdb67e3a19bf6ff402a54d4f08a, which avoids storing labels longer than 100 bytes.

There is still a bug that is not covered by the above commit: what if lots (I mean lots) of labels, each shorter than 100 bytes, are added to a Docker image?
I'll tell you: Falco still crashes.
I am currently testing a possible fix.

You can easily reproduce the crash with the attached Dockerfile (sorry for the stupid label keys/values :) ); a generator sketch follows below.
Dockerfile.txt
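
The attached Dockerfile.txt is not reproduced here, but a generator along these lines should build an equivalent image. The label count and key/value shapes are guesses; each label stays well under 100 bytes so the length guard from the commit above does not filter it.

# gen_labels_dockerfile.py -- hypothetical stand-in for the attached
# Dockerfile.txt: emits a Dockerfile with a huge number of LABEL lines,
# each key and value individually well under 100 bytes.

COUNT = 20_000  # a guess; raise it if the crash does not trigger

print("FROM alpine:3.14")
for i in range(COUNT):
    print(f'LABEL k{i}="v{i:06d}"')

Build with: python3 gen_labels_dockerfile.py > Dockerfile && docker build -t many-labels .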

@poiana
Contributor

poiana commented Dec 16, 2021

Issues go stale after 90d of inactivity.

Mark the issue as fresh with /remove-lifecycle stale.

Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle stale

@leogr
Member

leogr commented Dec 20, 2021

@FedeDP has this issue been definitively fixed? I recall yes, but I have not found any reference.

@FedeDP
Contributor

FedeDP commented Dec 20, 2021

Yup!
Well, you now need a JSON payload larger than 4 GB to trigger the issue :)

@FedeDP
Contributor

FedeDP commented Dec 20, 2021

I did not close this one because eventually the bug may appear again; it was meant to be fixed by falcosecurity/libs#85, but then @mstemm and I agreed that a malicious >4 GB container metadata JSON would kill us anyway: falcosecurity/libs#85 (comment)

@poiana
Contributor

poiana commented Jan 19, 2022

Stale issues rot after 30d of inactivity.

Mark the issue as fresh with /remove-lifecycle rotten.

Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Provide feedback via https://github.com/falcosecurity/community.

/lifecycle rotten

@leogr
Member

leogr commented Jan 19, 2022

This issue should be definitively fixed by falcosecurity/libs#102, which is included in the latest development version of Falco (i.e. the source code in the master branch).

The fix will also be part of the next release, so
/milestone 0.31.0

Since it has been fixed, I'm closing this issue. Feel free to discuss further or ask to re-open it if the problem persists.
Also, any feedback about the fix will be really appreciated. 🙏

/close

@poiana poiana modified the milestones: 1.0.0, 0.31.0 Jan 19, 2022
@poiana poiana closed this as completed Jan 19, 2022
@poiana
Contributor

poiana commented Jan 19, 2022

@leogr: Closing this issue.

In response to this:

This issue should be definitively fixed by falcosecurity/libs#102, which is included in the latest development version of Falco (i.e. the source code in the master branch).

The fix will also be part of the next release, so
/milestone 0.31.0

Since it has been fixed, I'm closing this issue. Feel free to discuss further or ask to re-open it if the problem persists.
Also, any feedback about the fix will be really appreciated. 🙏

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
