
fix: Only limit resources per peer #9443

Closed
ajnavarro wants to merge 1 commit from the fix/limit-resources-only-per-peer branch

Conversation

ajnavarro (Member)

We are removing system and transient limits by default when the ResourceManager is active. We will never be able to define sane defaults for all cases, and the RM is difficult to configure for most users.

Limiting resources per peer is enough to avoid a DoS attack.
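For illustration (this is not this PR's actual diff), a minimal sketch of what "per-peer limits only" could look like against the go-libp2p resource-manager API as it existed around this time; helper and field names such as `LimitConfig`, `BaseLimit`, and `NewFixedLimiter` are assumptions and may differ in other go-libp2p versions:

```go
package main

import (
	"log"
	"math"

	"github.com/libp2p/go-libp2p"
	rcmgr "github.com/libp2p/go-libp2p/p2p/host/resource-manager"
)

func main() {
	// Start from go-libp2p's scaling defaults, sized for this machine.
	limits := rcmgr.DefaultLimits.AutoScale()

	// Sketch of the idea in this PR: lift the connection/stream caps on the
	// system-wide and transient scopes so that only the per-peer scopes
	// provide back-pressure. Memory and FD caps are left in place so the
	// node still cannot OOM.
	for _, scope := range []*rcmgr.BaseLimit{&limits.System, &limits.Transient} {
		scope.Conns, scope.ConnsInbound, scope.ConnsOutbound = math.MaxInt, math.MaxInt, math.MaxInt
		scope.Streams, scope.StreamsInbound, scope.StreamsOutbound = math.MaxInt, math.MaxInt, math.MaxInt
	}
	// limits.PeerDefault keeps its scaled values; it is what actually
	// limits a single misbehaving peer.

	mgr, err := rcmgr.NewResourceManager(rcmgr.NewFixedLimiter(limits))
	if err != nil {
		log.Fatal(err)
	}

	h, err := libp2p.New(libp2p.ResourceManager(mgr))
	if err != nil {
		log.Fatal(err)
	}
	defer h.Close()
	log.Println("host up with per-peer-only conn/stream limits:", h.ID())
}
```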

Logs are changed to Warn level. Example:

2022-12-01T18:35:26.983+0100	WARN	resourcemanager	libp2p/rcmgr_logging.go:53	Resource limits were exceeded 228 times with error "peer:12D3KooWAwJn4B1TDmw2M6Fx7ueGM9NAjHDUmp5x99xCxnfYm8Rx: cannot reserve inbound stream: resource limit exceeded".
2022-12-01T18:35:26.983+0100	WARN	resourcemanager	libp2p/rcmgr_logging.go:53	Resource limits were exceeded 148 times with error "peer:12D3KooWJWuTqC64UMd1sGKPxk7G9R6wy6GtzYnxwVpBwG2KQP4E: cannot reserve inbound stream: resource limit exceeded".
2022-12-01T18:35:26.983+0100	WARN	resourcemanager	libp2p/rcmgr_logging.go:53	Resource limits were exceeded 173 times with error "peer:12D3KooWRqk15Ep9iMEhAESdMaRmrs5ZR1UjFKehDJctyVnu74NT: cannot reserve inbound stream: resource limit exceeded".
2022-12-01T18:35:26.983+0100	WARN	resourcemanager	libp2p/rcmgr_logging.go:53	Resource limits were exceeded 55 times with error "peer:12D3KooWM5sNTkwTG1a9ZMmpbhoY6cCJkH1ciYCGjKhDWq64QPt2: cannot reserve inbound stream: resource limit exceeded".
2022-12-01T18:35:26.983+0100	WARN	resourcemanager	libp2p/rcmgr_logging.go:53	Resource limits were exceeded 144 times with error "peer:12D3KooWQ9ydrk2cQ18tL3GjjsDBdJtHgJAUcVYCuh3qRhZurX4T: cannot reserve inbound stream: resource limit exceeded".
2022-12-01T18:35:26.983+0100	WARN	resourcemanager	libp2p/rcmgr_logging.go:53	Resource limits were exceeded 73 times with error "peer:12D3KooWRSwBCxSqPsnor88RUyMPU1KhEATcZKsarKYJXTvn7d6v: cannot reserve inbound stream: resource limit exceeded".
2022-12-01T18:35:26.983+0100	WARN	resourcemanager	libp2p/rcmgr_logging.go:53	Resource limits were exceeded 249 times with error "peer:12D3KooWK4xPjV19Nm2YWxhk95RjbSLkdCM4wJSxaEWfnU1nVxK1: cannot reserve inbound stream: resource limit exceeded".
2022-12-01T18:35:26.983+0100	WARN	resourcemanager	libp2p/rcmgr_logging.go:53	Resource limits were exceeded 36 times with error "peer:12D3KooWHYecdfScc8uASPUF57WDMXYKkrcvhEN1KgUJeNPnVmsH: cannot reserve inbound stream: resource limit exceeded".
2022-12-01T18:35:26.983+0100	WARN	resourcemanager	libp2p/rcmgr_logging.go:53	Resource limits were exceeded 175 times with error "peer:12D3KooWDForzWi8v63RuEdbV1BSYf497uwvedGr2tECx7i98koN: cannot reserve inbound stream: resource limit exceeded".
2022-12-01T18:35:26.983+0100	WARN	resourcemanager	libp2p/rcmgr_logging.go:53	Resource limits were exceeded 269 times with error "peer:12D3KooWAmTLXf6R94meS5QwY6E5PNxojr8wJ2eWaUSLN2MQgsAm: cannot reserve inbound stream: resource limit exceeded".
2022-12-01T18:35:26.983+0100	WARN	resourcemanager	libp2p/rcmgr_logging.go:53	Resource limits were exceeded 423 times with error "peer:12D3KooWB5ZghciXT1r8y4X1mBqrxSzgEKVpcXqoPgy91T9uJaCU: cannot reserve inbound stream: resource limit exceeded".
2022-12-01T18:35:26.983+0100	WARN	resourcemanager	libp2p/rcmgr_logging.go:53	Resource limits were exceeded 205 times with error "peer:12D3KooWRRMnA99zavmn7ZC3eVYwDfQHpRT9jEMhGLW8BQmGNddX: cannot reserve inbound stream: resource limit exceeded".
2022-12-01T18:35:26.983+0100	WARN	resourcemanager	libp2p/rcmgr_logging.go:53	Resource limits were exceeded 294 times with error "peer:12D3KooWQBeqmfAmB1yRvLBMRLmc8oPSGsiL9Roqo57y9cjbU7fM: cannot reserve inbound stream: resource limit exceeded".
2022-12-01T18:35:26.983+0100	WARN	resourcemanager	libp2p/rcmgr_logging.go:57	Consider inspecting logs and raising the resource manager limits. Documentation: https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr

Signed-off-by: Antonio Navarro Perez <antnavper@gmail.com>
vyzo (Contributor) commented Dec 1, 2022

Have you ever heard of Sybil attacks?

ajnavarro (Member, Author)

@vyzo yes! Do you have any other proposals?

People are having a hard time trying to configure ResourceManager at the System level. Any defaults that we set there will be wrong.

The best thing we can do is to have sane defaults for everyone, and in my opinion that means limiting connections per peer as much as we can. The ideal here would be to limit connections to one per IP, but that is a non-trivial thing to do.
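To illustrate why per-IP limiting is non-trivial, here is a hedged, hypothetical sketch (not part of this PR) of a go-libp2p connection gater that caps inbound connections per remote IP. The `ConnectionGater` method set is written from memory and should be checked against the go-libp2p version in use, and the sketch never releases a slot when a connection closes, which is a large part of the difficulty:

```go
package main

import (
	"sync"

	"github.com/libp2p/go-libp2p"
	"github.com/libp2p/go-libp2p/core/control"
	"github.com/libp2p/go-libp2p/core/network"
	"github.com/libp2p/go-libp2p/core/peer"
	ma "github.com/multiformats/go-multiaddr"
	manet "github.com/multiformats/go-multiaddr/net"
)

// ipGater vetoes inbound connections once an IP already holds maxPerIP slots.
type ipGater struct {
	mu       sync.Mutex
	inbound  map[string]int // remote IP -> accepted inbound connections
	maxPerIP int
}

func newIPGater(max int) *ipGater {
	return &ipGater{inbound: make(map[string]int), maxPerIP: max}
}

// InterceptAccept runs before the remote peer ID is known, so the only
// handle we have is the remote multiaddr.
func (g *ipGater) InterceptAccept(addrs network.ConnMultiaddrs) bool {
	ip, err := manet.ToIP(addrs.RemoteMultiaddr())
	if err != nil {
		return true // non-IP transport; let the resource manager handle it
	}
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.inbound[ip.String()] >= g.maxPerIP {
		return false
	}
	g.inbound[ip.String()]++ // NOTE: never decremented in this sketch
	return true
}

// The remaining gater hooks allow everything.
func (g *ipGater) InterceptPeerDial(peer.ID) bool               { return true }
func (g *ipGater) InterceptAddrDial(peer.ID, ma.Multiaddr) bool { return true }
func (g *ipGater) InterceptSecured(network.Direction, peer.ID, network.ConnMultiaddrs) bool {
	return true
}
func (g *ipGater) InterceptUpgraded(network.Conn) (bool, control.DisconnectReason) {
	return true, 0
}

func main() {
	h, err := libp2p.New(libp2p.ConnectionGater(newIPGater(1)))
	if err != nil {
		panic(err)
	}
	defer h.Close()
}
```

To be usable in practice it would also need a network.Notifiee to decrement the count on disconnect, and it would penalize users behind a shared NAT IP, which is exactly the complexity alluded to above.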

vyzo (Contributor) commented Dec 1, 2022

We should pick some highish defaults for system/transient at the very least, and let power users set those explicitly instead of always trying to autotune and sweep it under the carpet.

vyzo (Contributor) commented Dec 1, 2022

The point is that without a system/transient limit you are not protecting against DoS attacks; all the attacker has to do is cycle through a bunch of Sybil identities.

ajnavarro (Member, Author) commented Dec 1, 2022

Can you define "highish"? For whom? Power users can still set limits explicitly.

Right now, with system/transient limits, you are keeping the node alive but mostly unusable. Is that what we want?

vyzo (Contributor) commented Dec 1, 2022

It is not unusable; it just can't accept new incoming connections, while it can still use existing connections and open new outbound ones.
That's infinitely better than crashing with an OOM.

ajnavarro (Member, Author) commented Dec 1, 2022

I'm eager to set sane defaults here for system and transient limits, but:

  • If they are too high, small nodes will crash anyway.
  • If they are too low, more powerful nodes won't use all of their resources, and users either won't know why or will be bombarded with RM errors.

My point is that there are no sane defaults for system and transient limits, only for peer limits.

vyzo (Contributor) commented Dec 1, 2022

Just take half the available memory and the fd limit for the incoming limits.

Much better that they complain about logs than crash.
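As a rough sketch of what that suggestion could mean in code (hypothetical, assuming go-libp2p's `ScalingLimitConfig.Scale` helper and the `pbnjay/memory` package Kubo already uses for its defaults; `syscall.Getrlimit` makes this unix-only):

```go
package main

import (
	"fmt"
	"syscall"

	rcmgr "github.com/libp2p/go-libp2p/p2p/host/resource-manager"
	"github.com/pbnjay/memory"
)

func main() {
	// "Half the available memory" for the resource manager.
	maxMemory := int64(memory.TotalMemory() / 2)

	// The process file-descriptor limit as the fd budget.
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		panic(err)
	}
	numFD := int(rl.Cur)

	// Scale go-libp2p's default limits against those two inputs instead of
	// letting AutoScale() pick the memory fraction.
	limits := rcmgr.DefaultLimits.Scale(maxMemory, numFD)

	fmt.Println("System.ConnsInbound with memory/2:", limits.System.ConnsInbound)
	fmt.Println("Transient.ConnsInbound with memory/2:", limits.Transient.ConnsInbound)
}
```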

vyzo (Contributor) commented Dec 1, 2022

Re: powerful nodes -> power users: they can configure things themselves.

I think you are approaching this issue from the wrong direction.

BigLep (Contributor) commented Dec 2, 2022

Hi @ajnavarro ,

In general, bounding at the peer scope by itself is not enough to protect against DoS attacks, since it's trivial to create lots of peer identities (as discussed in the comments above).

That said, we are still bounding the amount of memory that go-libp2p will use, and even with your proposed change we won't OOM because we cap System.Memory. This is certainly better than nothing.

I agree it's hard to pick defaults, but I think we can likely do better here...

Why am I even trying to pick better defaults?

Yes, we need to protect against OOM. That said, if we don't bound incoming connections and streams, we are still vulnerable to CPU exhaustion from processing those connections and streams. I don't know how much CPU is spent managing a connection or stream, but I know it's not free.

Proposal

With Kubo 0.17 we set the default memory that is used to calculate go-libp2p resource limits to TOTAL_SYSTEM_MEMORY/8, per https://github.com/ipfs/kubo/blob/master/core/node/libp2p/rcmgr_defaults.go#L50 . On a 16GB machine, this means 2GB for the go-libp2p resource manager, which translates to 124 System.ConnsInbound per https://github.com/libp2p/go-libp2p/blob/master/p2p/host/resource-manager/limit_defaults.go#L344 (64 for the initial GB and 64 for the GB increase). This checks out with what is being reported in #9432 (comment) .

I think we just set the default memory for go-libp2p too low. What if, instead of TOTAL_SYSTEM_MEMORY/8, we did TOTAL_SYSTEM_MEMORY/2 (half)? I agree there's no magic number here that will solve all use cases, but in retrospect this seems more reasonable to me. go-libp2p will not use more than half the system memory (which should still provide plenty of headroom for the operating system to function even if we hit that value), and it effectively increases our default limit for `System.ConnsInbound` by 4x.
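One hedged way to sanity-check that intuition is to scale go-libp2p's default limits with both memory fractions and compare the resulting `System.ConnsInbound` (a sketch assuming the same `Scale` helper as above; the fd count here is an arbitrary placeholder, not a measured value):

```go
package main

import (
	"fmt"

	rcmgr "github.com/libp2p/go-libp2p/p2p/host/resource-manager"
	"github.com/pbnjay/memory"
)

func main() {
	total := int64(memory.TotalMemory())
	const numFD = 8192 // placeholder fd budget for the comparison

	eighth := rcmgr.DefaultLimits.Scale(total/8, numFD)
	half := rcmgr.DefaultLimits.Scale(total/2, numFD)

	fmt.Println("System.ConnsInbound at TOTAL/8:", eighth.System.ConnsInbound)
	fmt.Println("System.ConnsInbound at TOTAL/2:", half.System.ConnsInbound)
}
```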

The other option here is to not use rcmgr.DefaultLimits.SystemBaseLimit.ConnsInbound in https://github.com/ipfs/kubo/blob/master/core/node/libp2p/rcmgr_defaults.go#L66 and instead define our own higher value. (I don't know what a better value would be, but that's a knob we can turn.)

Summary

Before we fall back to disabling

  1. System.ConnsInbound
  2. System.StreamsInbound
  3. Transient.ConnsInbound
  4. Transient.StreamsInbound

by default, I'd like to see if we can thread the needle here. I think one way is to suggest that the folks in #9432 who are having issues set Swarm.ResourceMgr.MaxMemory to TOTAL_SYSTEM_MEMORY/2 (they'll have to figure this value out themselves) and see how that performs.
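For reference, that experiment is a single config change on the user's side. A hypothetical excerpt for a 16GB machine (key names per the Swarm.ResourceMgr section of the config doc linked below; the value is the user's own TOTAL_SYSTEM_MEMORY/2):

```json
{
  "Swarm": {
    "ResourceMgr": {
      "Enabled": true,
      "MaxMemory": "8GB"
    }
  }
}
```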

(Also, even if we were to go with this PR, it would need to include doc updates to https://github.com/ipfs/kubo/blob/master/docs/config.md#swarmresourcemgr )

@ajnavarro ajnavarro closed this Dec 2, 2022
@ajnavarro ajnavarro deleted the fix/limit-resources-only-per-peer branch December 2, 2022 10:14
ajnavarro (Member, Author)

Given that there is no agreement, I decided to close this PR.

I totally understand your reasoning about why we should have limits at the system level, but I still believe that any default value we set at that level will be wrong in most cases.

I'll open a new PR changing the default memory value to TOTAL_MEMORY/2, hoping users will have fewer problems with the RM.

BigLep (Contributor) commented Dec 3, 2022

Thanks @ajnavarro

> I still believe that any default value we set at that level will be wrong in most cases.

Can we please test this hypothesis by asking the vocal parties in #9432 to set Swarm.ResourceMgr.MaxMemory to TOTAL_SYSTEM_MEMORY/2 (they'll have to figure this value out themselves) and seeing how that performs?

ajnavarro (Member, Author) commented Dec 5, 2022

@BigLep I added a comment asking for that: #9432 (comment)
