Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restore memleak detect, fix leaks we missed. #5055

Merged

Conversation

rustyrussell
Copy link
Contributor

@rustyrussell rustyrussell commented Feb 25, 2022

Based on #5051 this fixes all the other (far less important!) leaks, and fixes the leak detector which had multiple problems.

Of course, I wrote it in the other order, to make sure it found #5051!

(Rebased on #5056 now #5051 is merged)

@rustyrussell rustyrussell added this to the v0.11 milestone Feb 25, 2022
@cdecker
Copy link
Member

cdecker commented Feb 25, 2022

ACK 6109132

cdecker and others added 15 commits February 26, 2022 11:13
These tests have proven to be:

 a) very expensive, as they spin up many nodes, and perform long setup
 b) are not testing anything specific, they just fuzz functionality
    that is already tested otherwise
 c) have not helped pinpoint any issues in living memory
 d) are very flaky, making for really bad signal-to-noise, so much
    that devs usually just restart without even looking at the logs
 e) even if we were to look at the logs, we'd be unable to reproduce
    due to the inherent randomness involved in these tests
 f) are really noisy neighbors, causing other tests to flake as well,
    further muddying the water

All in all, these tests are a waste of time, and source of
frustration.

[ Cleaned up python unused imports --RR ]
Changelog-None
client_read_next(…) calls io_read_wire(…), passing &c->msg_in as the
address of a pointer that will be set to the address of the buffer that
io_read_wire_(…) will allocate, and passing c (a pointer to the struct
client instance) as the parent for the new allocation. As long as the
struct client instance eventually gets freed, the allocated message
buffer will be freed too, so there is no "leak" in the strict sense of
the term, but the freeing of the buffer may not occur for an arbitrarily
long time after the buffer has become disused, and indeed many millions
of message buffers may be allocated within the lifetime of one struct
client instance.

handle_client(…) ultimately hands off the c->msg_in to one of several
message-type-specific handler functions, and those functions are not
TAKES or STEALS on their message buffer parameters and do not free their
message buffer arguments. Consequently, each successive call to
client_read_next(…) will cause io_read_wire_(…) to overwrite the
c->msg_in pointer with the address of a newly allocated message buffer,
and the old buffer will be left dangling off of the struct client
instance indefinitely.

Fix this by initializing c->msg_in to NULL in new_client(…) and then
having client_read_next(…) do `c->msg_in = tal_free(c->msg_in)` prior to
calling io_read_wire(…). That way, the previous message buffer will be
freed just before beginning to read the next message. The same strategy
is already employed in common/daemon_conn.c, albeit without nulling out
dc->msg_in after freeing it.

Fixes: ElementsProject#5035
Changelog-Fixed: hsmd: Fixed a significant memory leak
Seeing if this helps my build box which previously gives:

```
/home/rusty/lightning-ltest/lightningd/lightning_connectd: libbacktrace: unrecognized DWARF version in .debug_info at 6
/home/rusty/lightning-ltest/lightningd/lightning_connectd: libbacktrace: no debug info in ELF executable
```

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. p is a child of cmd, so it's freed by command_failed.
2. cltv_budget is set a few lines up to the same thing already.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
They can only follow pointers (without help) if they point to the
*start* of a tal object.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Not really a leak, but it would last as long as the peer.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The current client is in dbid_zero_clients (i.e. it's lightningd).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. We don't keep a pointer to payments (unlike pay, which keeps a
   linked list), so mark it notleak.
2. plugin->our_features is overloaded for "features we want to set" (used by keysend)
   and then "features we have".  Create a new field, which is cleaner.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This causes weirdness in pay, which tal_steal's b11->routes and expects
to get it all.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. The dijkstra can be temporary, doesn't need to last as long as pay cmd.
2. We fail multiple times in several places, so don't leak old failreason.
3. Make payments findable by our memleak detector.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
1. Tell memleak about our linked-list of current payments.
2. Don't remove them from list until we actually free them (in destructor, naturally).
3. Decode invoices into tmpctx (we steal / copy what we want anyway).
4. Free params after we've used them.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Not actually leaks, but they do live longer than they need.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We allocate the default, then callback allocates over the top.  Mark
params with a default, so we can free that when it's called.

(We can't do this generally, since not all param args are actually
pointers to pointers, though opt_param_def has to be).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
It's a small leak, but real.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Accidentally removed in 6c9b752.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We were ignoring the *parent* of the notleak pointers, not the notleak
pointer itself!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We often hand an exclude pointer (usually the current command) to
memleak.  But when we encountered this we would stop iterating, rather
than just ignore it: this means we would often ignore significant siblings.

In particular, fixing this (which has always been there) reveals many
previously-undetected leaks.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@rustyrussell
Copy link
Contributor Author

Ack b0cbf1a

@rustyrussell rustyrussell merged commit dd495b2 into ElementsProject:master Feb 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants