IPsec pluto process crashes when the remote endpoint is unstable #2516
Comments
sridhargaddam added the backport label (This change requires a backport to eligible release branches) on Jun 2, 2023
sridhargaddam added a commit to sridhargaddam/submariner that referenced this issue on Jun 2, 2023:
Currently, the submariner-gateway pod does not set any dpdaction flag when invoking the whack commands, so the default dpdaction of disabled is applied. With this action, when the remote endpoint does not respond within a certain duration, a problematic code path in Libreswan is executed and leads to a crash. The proper fix would be to use an updated Libreswan, but as a workaround we can explicitly set dpdaction=hold to avoid hitting the problematic code path.
Related PR in libreswan: libreswan/libreswan@c7a6113
Fixes: submariner-io#2516
Signed-off-by: Sridhar Gaddam <sgaddam@redhat.com>
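The workaround described in the commit message can be illustrated with a minimal sketch. This is not the Submariner implementation (which is written in Go); the helper name `with_dpd_action` and the example connection name are hypothetical, and the flag spelling `--dpdaction hold` is an assumption derived from the `dpdaction=hold` setting named above.

```python
from typing import List


def with_dpd_action(whack_args: List[str], action: str = "hold") -> List[str]:
    """Return a copy of whack_args with an explicit DPD action appended.

    Hypothetical helper: if the caller already set --dpdaction, the
    arguments are returned unchanged so the flag is never duplicated.
    """
    if "--dpdaction" in whack_args:
        return list(whack_args)
    # Assumed flag spelling; the commit message only names dpdaction=hold.
    return [*whack_args, "--dpdaction", action]


# Illustrative connection arguments; "conn-a" is a made-up name.
base_args = ["--name", "conn-a"]
final_args = with_dpd_action(base_args)
```

The point of the sketch is only that the action is set explicitly on every connection, rather than left to the default that triggers the crash.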
The commit message above, with an added Co-authored-by: Yossi Boaron <yboaron@redhat.com> trailer, was carried by the later commits that referenced this issue:
sridhargaddam added seven more commits to sridhargaddam/submariner and one commit to this repository (Fixes: #2516) on Jun 2, 2023.
tpantelis pushed three commits to this repository (Fixes: #2516) on Jun 2, 2023.
novad03 pushed a commit to novad03/k8s-submariner (Fixes: submariner-io/submariner#2516) on Nov 25, 2023.
What happened:
When a Kubernetes (K8s) cluster is under high resource utilization and the K8s infrastructure restarts the submariner-gateway pods on the remote cluster, the submariner-gateway pods on the local cluster have been observed to crash. The same crash can occur when the remote IPsec endpoint fails to respond to the IKE messages that are part of the IPsec negotiation process.
What you expected to happen:
Pluto (and, indirectly, submariner-gateway) should not crash even when the remote endpoint is unresponsive or unstable.
How to reproduce it (as minimally and precisely as possible):
Deploy two KIND/OCP clusters and connect them via Submariner. Once the connections are successfully established, run a script on one of the clusters that periodically restarts the submariner-gateway pod, and watch the submariner-gateway pod on the other cluster.
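The reporter's script is not reproduced in this text. As a stand-in, the periodic restart could be sketched roughly as follows; the namespace `submariner-operator`, the label selector `app=submariner-gateway`, and the function names are assumptions for illustration and may need adjusting for a given deployment.

```python
import subprocess
import time
from typing import Callable, List


def restart_command(namespace: str, selector: str) -> List[str]:
    """Build a kubectl command that deletes the matching pods.

    Deleting the pod makes its controller recreate it, which acts as a restart.
    """
    return ["kubectl", "delete", "pod", "-n", namespace, "-l", selector]


def restart_periodically(
    rounds: int,
    interval_s: float,
    namespace: str = "submariner-operator",    # assumed namespace
    selector: str = "app=submariner-gateway",  # assumed pod label
    run: Callable[..., object] = subprocess.run,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Delete the gateway pod `rounds` times, pausing between deletions."""
    cmd = restart_command(namespace, selector)
    for _ in range(rounds):
        # check=False: a failed deletion should not stop the loop.
        run(cmd, check=False)
        sleep(interval_s)
    return rounds
```

Calling, for example, `restart_periodically(rounds=20, interval_s=60)` against one cluster's kubeconfig context would restart its gateway pod once a minute; the round count and interval are arbitrary values, not ones taken from the report.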
Environment: