Version v0.22 introduced an experimental managed transport as a step towards fixing stability issues when executing Git network operations.
This issue catalogues all known issues with the new transport and their respective statuses. Note that some of these issues can also be experienced with the go-git and non-managed libgit2 implementations.
1) ssh.Dial hangs indefinitely ✔️
SSH connections hang indefinitely during an ssh.Dial call. Behind the scenes, the transport handshake seems to get stuck during key exchange (at kexLoop). More information can be found in the upstream issue.
Fixed from:
- source-controller -> ghcr.io/fluxcd/source-controller:v0.22.4
- image-automation-controller -> image-automation-controller:v0.21.2
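A common way to bound a handshake that might never return is to set a deadline on the underlying net.Conn before the handshake starts and clear it afterwards. The sketch below uses only the standard library: handshakeFunc is a hypothetical stand-in for the real SSH handshake, and dialWithDeadline and silentServerTimesOut are illustrative names. It shows the general technique and is not necessarily the fix shipped in the releases listed above.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"os"
	"time"
)

// handshakeFunc is a hypothetical stand-in for an SSH handshake
// performed over an already established TCP connection.
type handshakeFunc func(conn net.Conn) error

// dialWithDeadline bounds both the TCP connect and the handshake by timeout.
func dialWithDeadline(addr string, timeout time.Duration, handshake handshakeFunc) (net.Conn, error) {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return nil, err
	}
	// The deadline makes blocked reads and writes fail instead of hanging.
	if err := conn.SetDeadline(time.Now().Add(timeout)); err != nil {
		conn.Close()
		return nil, err
	}
	if err := handshake(conn); err != nil {
		conn.Close()
		return nil, fmt.Errorf("handshake: %w", err)
	}
	// Clear the deadline so later traffic is not cut off by it.
	if err := conn.SetDeadline(time.Time{}); err != nil {
		conn.Close()
		return nil, err
	}
	return conn, nil
}

// silentServerTimesOut dials a local server that accepts the connection but
// never sends anything, and reports whether the dial failed with a deadline error.
func silentServerTimesOut() bool {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return false
	}
	defer ln.Close()
	done := make(chan struct{})
	defer close(done)
	go func() {
		c, err := ln.Accept()
		if err != nil {
			return
		}
		defer c.Close()
		<-done // hold the connection open without writing
	}()
	_, err = dialWithDeadline(ln.Addr().String(), 200*time.Millisecond, func(c net.Conn) error {
		buf := make([]byte, 1)
		_, err := c.Read(buf) // simulates waiting for the peer during key exchange
		return err
	})
	return errors.Is(err, os.ErrDeadlineExceeded)
}

func main() {
	fmt.Println("handshake timed out:", silentServerTimesOut())
}
```

With a real SSH client, the handshake callback would wrap the library's handshake over conn; the deadline handling stays the same.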
2) HTTP leaked connections ✔️
The controllers showed an ever-increasing number of established HTTP connections (e.g. as reported by netstat).
Upon investigation, some requests were not completely processed and closed, which made it less likely that the underlying connections could be reused. In addition, transport instances were created per request and never shared.
Fixed from:
- source-controller -> ghcr.io/fluxcd/source-controller:v0.22.4
- image-automation-controller -> image-automation-controller:v0.21.2
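The two causes described above map onto two habits with Go's net/http: share one client and transport across requests, and read each response body to the end before closing it so the connection can return to the idle pool. The sketch below demonstrates this against a local test server. fetchStatus and connectionsOpened are illustrative names, and the code is a minimal demonstration rather than the controllers' actual transport code.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// fetchStatus performs a GET and fully drains and closes the body,
// which allows the transport to reuse the underlying connection.
func fetchStatus(c *http.Client, url string) (int, error) {
	resp, err := c.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return 0, err
	}
	return resp.StatusCode, nil
}

// connectionsOpened sends the given number of sequential requests through one
// shared client and returns how many TCP connections the server saw.
func connectionsOpened(requests int) int {
	var opened int32
	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	// Count every new connection accepted by the server.
	srv.Config.ConnState = func(c net.Conn, s http.ConnState) {
		if s == http.StateNew {
			atomic.AddInt32(&opened, 1)
		}
	}
	srv.Start()
	defer srv.Close()

	// One client and transport shared by all requests, not one per request.
	client := &http.Client{Transport: &http.Transport{}}
	defer client.CloseIdleConnections()
	for i := 0; i < requests; i++ {
		if _, err := fetchStatus(client, srv.URL); err != nil {
			return -1
		}
	}
	return int(atomic.LoadInt32(&opened))
}

func main() {
	fmt.Println("connections opened for 5 requests:", connectionsOpened(5))
}
```

Creating a new http.Transport inside the loop, or skipping the drain for a body that is only partially read, would typically show up as one connection per request in this counter.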
3) SSH leaked connections ✔️
The controllers showed an ever-increasing number of established SSH connections (e.g. as reported by netstat).
SSH connections are now cached based on the remote target, so all operations against a target share one connection instead of the previous one connection per command (clone/push).
Fixed from:
- source-controller -> ghcr.io/fluxcd/source-controller:v0.22.4
- image-automation-controller -> image-automation-controller:v0.21.2
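Caching by remote target can be pictured as a mutex-guarded map from target to connection that dials only on a miss. The sketch below is a simplified illustration: connCache, fakeConn and countDials are hypothetical names, and io.Closer stands in for an SSH client. Entries 7 and 9 of this list describe problems that later led to this caching being revisited.

```go
package main

import (
	"fmt"
	"io"
	"sync"
)

// connCache keeps at most one open connection per remote target.
type connCache struct {
	mu    sync.Mutex
	conns map[string]io.Closer
}

func newConnCache() *connCache {
	return &connCache{conns: make(map[string]io.Closer)}
}

// get returns the cached connection for target and dials only on a miss.
func (c *connCache) get(target string, dial func() (io.Closer, error)) (io.Closer, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if conn, ok := c.conns[target]; ok {
		return conn, nil
	}
	conn, err := dial()
	if err != nil {
		return nil, err
	}
	c.conns[target] = conn
	return conn, nil
}

// closeAll closes and forgets every cached connection.
func (c *connCache) closeAll() {
	c.mu.Lock()
	defer c.mu.Unlock()
	for target, conn := range c.conns {
		conn.Close()
		delete(c.conns, target)
	}
}

// fakeConn is a placeholder for a real SSH client connection.
type fakeConn struct{}

func (fakeConn) Close() error { return nil }

// countDials reports how many dials a sequence of operations triggers.
func countDials(targets ...string) int {
	cache := newConnCache()
	defer cache.closeAll()
	dials := 0
	for _, target := range targets {
		_, err := cache.get(target, func() (io.Closer, error) {
			dials++
			return fakeConn{}, nil
		})
		if err != nil {
			return -1
		}
	}
	return dials
}

func main() {
	// Two operations against one target and one against another.
	fmt.Println(countDials("git@example.com:22", "git@example.com:22", "git@example.org:22"))
}
```

Holding the lock while dialing keeps the sketch short but serialises dials to different targets; a production cache would likely lock per target.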
4) Intermittent SSH errors ✔️
The upstream git and crypto libraries do not support multiple concurrent SSH connections very well (e.g. golang/go#27140).
An initial attempt to cache SSH connections and reuse them across SSH commands completely eliminated the intermittent errors (e.g. #439) during long-running tests.
Fixed from:
- source-controller -> ghcr.io/fluxcd/source-controller:v0.22.4
- image-automation-controller -> image-automation-controller:v0.21.2
5) Panic when closing SSH connections ✔️
The upstream git2go implementation called .Wait() and .Close() on session or stdin objects that could be nil, leading to a panic.
Fixed from:
- source-controller -> ghcr.io/fluxcd/source-controller:v0.22.5
- image-automation-controller -> image-automation-controller:v0.21.3
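The class of bug described here is avoided by checking each field before calling a method on it during teardown. The following sketch uses hypothetical types (sshStream, waiter) rather than the actual git2go structures, and only illustrates the guard.

```go
package main

import (
	"fmt"
	"io"
)

// waiter is the subset of a session's behaviour needed during teardown.
type waiter interface {
	Wait() error
}

// sshStream is a hypothetical holder for per-command SSH resources.
// Either field may be nil if setup failed part-way through.
type sshStream struct {
	session waiter
	stdin   io.Closer
}

// Close releases whatever was initialised and returns the first error seen.
func (s *sshStream) Close() error {
	var firstErr error
	if s.stdin != nil {
		if err := s.stdin.Close(); err != nil {
			firstErr = err
		}
	}
	if s.session != nil {
		if err := s.session.Wait(); err != nil && firstErr == nil {
			firstErr = err
		}
	}
	return firstErr
}

func main() {
	// A stream whose setup never completed can still be closed safely.
	fmt.Println((&sshStream{}).Close())
}
```

Note that these checks only catch nil interface values; an interface holding a nil pointer would still pass the check, so constructors should avoid storing typed nil pointers in such fields.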
6) multi-ack protocol over SSH ✔️
Connecting to SSH servers that require Git's multi-ack feature (e.g. Azure DevOps) results in consistent errors:
EOF
transport closed
This seems to occur because the remote server closes the connection mid-flight.
Connections to Azure DevOps will fall back to the unmanaged transport, and users will also be able to opt in or out based on #662.
7) BitBucket ✔️
Multiple concurrent Git connections (one per key type, for example) lead to errors such as ssh.Dial: dial tcp xxx.xxx.xxx.xxx:22: i/o timeout or ssh: rejected: administratively prohibited (cannot open additional channels).
Removing the cached connections and servicing the PipeStdOut fast enough has fixed this.
8) git2go/libgit2 may panic and force the controller to crash ✔️
- git2go internal state may cause panics. This has been replaced with TransportOptions.
9) Stale connections leading to continuous errors ✔️
Cached connections may go stale over time. With some Git providers (e.g. GitLab) this may happen sooner than with others.
Once connections become stale, reconciliation errors become common.
Fixed from:
- source-controller -> ghcr.io/fluxcd/source-controller:v0.23.0
- image-automation-controller -> pending
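One general way to keep a connection cache from serving stale entries is to record when each entry was last used and redial once it has been idle for too long. The sketch below illustrates that idea with an injectable clock. expiringCache, maxIdle and dialsAfterIdle are hypothetical names, and this is not necessarily how the fix in the release listed above works (entry 7 notes that cached connections were removed altogether for one provider).

```go
package main

import (
	"fmt"
	"io"
	"sync"
	"time"
)

type cacheEntry struct {
	conn     io.Closer
	lastUsed time.Time
}

// expiringCache reuses a connection only if it was used within maxIdle.
type expiringCache struct {
	mu      sync.Mutex
	maxIdle time.Duration
	now     func() time.Time // injectable clock, time.Now in real use
	entries map[string]*cacheEntry
}

// get returns a fresh-enough cached connection or replaces a stale one.
func (c *expiringCache) get(target string, dial func() (io.Closer, error)) (io.Closer, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[target]; ok {
		if c.now().Sub(e.lastUsed) <= c.maxIdle {
			e.lastUsed = c.now()
			return e.conn, nil
		}
		// Idle for too long: close it and fall through to redial.
		e.conn.Close()
		delete(c.entries, target)
	}
	conn, err := dial()
	if err != nil {
		return nil, err
	}
	c.entries[target] = &cacheEntry{conn: conn, lastUsed: c.now()}
	return conn, nil
}

// nopConn is a placeholder for a real connection.
type nopConn struct{}

func (nopConn) Close() error { return nil }

// dialsAfterIdle performs two operations against one target, separated by
// idleSeconds on a fake clock, and returns how many dials were needed.
func dialsAfterIdle(idleSeconds, maxIdleSeconds int) int {
	clock := time.Unix(0, 0)
	cache := &expiringCache{
		maxIdle: time.Duration(maxIdleSeconds) * time.Second,
		now:     func() time.Time { return clock },
		entries: make(map[string]*cacheEntry),
	}
	dials := 0
	dial := func() (io.Closer, error) {
		dials++
		return nopConn{}, nil
	}
	cache.get("git@example.com:22", dial)
	clock = clock.Add(time.Duration(idleSeconds) * time.Second)
	cache.get("git@example.com:22", dial)
	return dials
}

func main() {
	fmt.Println("dials after short idle:", dialsAfterIdle(60, 300))
	fmt.Println("dials after long idle:", dialsAfterIdle(600, 300))
}
```

An idle limit is only a heuristic, since the server may drop a connection sooner; callers would still need to handle an error on a cached connection by evicting it and retrying.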