Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dart VM closures are not safe-for-space (share the same context if defined within the same scope - leading to overcapturing) #36983

Open
jensjoha opened this issue May 16, 2019 · 17 comments
Assignees
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. P2 A bug or feature request we're likely to work on triaged Issue has been triaged by sub team type-bug Incorrect behavior (everything from a crash to more subtle misbehavior)

Comments

@jensjoha
Copy link
Contributor

Running this program:

$ cat memoryleak.dart
main() async {
  var result;
  for(int i = 0; i < 100; i++) {
    result = await calc(result);
  }
  print("done");
}

calc(var oldResult) async {
  var newResult = calcInternal(oldResult);
  foo() {
    print("hello world");
  }
  newResult.foo = foo;
  return newResult;
}

calcInternal(var oldResult) {
  return new Foo(new List<int>(10000000));
}

class Foo {
  var foo;
  final List<int> data;
  Foo(this.data);
}

gives this result when run via /usr/bin/time:

$ /usr/bin/time -v out/ReleaseX64/dart memoryleak.dart
done
        Command being timed: "out/ReleaseX64/dart memoryleak.dart"
        User time (seconds): 10.01
        System time (seconds): 2.04
        Percent of CPU this job got: 167%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:07.19
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 7908452
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 0
        Minor (reclaiming a frame) page faults: 1573005
        Voluntary context switches: 2635
        Involuntary context switches: 42
        Swaps: 0
        File system inputs: 0
        File system outputs: 0
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

I'll mark the important part here: Maximum resident set size (kbytes): 7908452 --- i.e. it uses 7GB+ memory.

Via observatory I can see that all 100 Foo's are alive, and picking a semi-arbitrary one of them, and clicking the "Retaining path" thing gets me this:

Retaining path	{  ⊟  
Foo {  ⊞  }
retained by offset 3 of Context (7) {  ⊞  }
retained by var _context of Closure (foo) {  ⊞  }
retained by var foo of Foo {  ⊞  }
retained by offset 3 of Context (7) {  ⊞  }
retained by var _context of Closure (foo) {  ⊞  }
retained by var foo of Foo {  ⊞  }
retained by offset 3 of Context (7) {  ⊞  }
retained by var _context of Closure (foo) {  ⊞  }
retained by var foo of Foo {  ⊞  }
retained by offset 3 of Context (7) {  ⊞  }
retained by var _context of Closure (foo) {  ⊞  }
retained by var foo of Foo {  ⊞  }
retained by offset 3 of Context (7) {  ⊞  }
retained by var _context of Closure (foo) {  ⊞  }
retained by var foo of Foo {  ⊞  }
retained by offset 3 of Context (7) {  ⊞  }
retained by var _context of Closure (foo) {  ⊞  }
retained by var foo of Foo {  ⊞  }
retained by offset 3 of Context (7) {  ⊞  }
retained by var _context of Closure (foo) {  ⊞  }
retained by var foo of Foo {  ⊞  }
retained by offset 3 of Context (7) {  ⊞  }
retained by var _context of Closure (foo) {  ⊞  }
retained by var foo of Foo {  ⊞  }
retained by a GC root

so it seems to me that the current Foo (which should be alive --- it's in variable result) has the closure foo in it (so far so good), which then has a context which has the previous Foo in it and on and on.

Also notice that

  • if I don't have the newResult.foo = foo; line the problem goes away.
  • if I remove the async/await stuff the problem goes away.
@a-siva a-siva added type-bug Incorrect behavior (everything from a crash to more subtle misbehavior) area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. P2 A bug or feature request we're likely to work on labels May 24, 2019
@a-siva a-siva self-assigned this May 24, 2019
@mdempsky
Copy link
Contributor

mdempsky commented Jun 11, 2019

if I remove the async/await stuff the problem goes away.

I'm assuming that foo() async { ... } gets transformed by the frontend into foo() { return Future(() { ... }); }, and notably that still preserves the problem too. I simplified calc as far as:

calc(var oldResult) {
  var newResult = (() => calcInternal(oldResult))();
  foo() { print('foo'); }
  newResult.foo = foo;
  return newResult;
}

Currently, I haven't been able to simplify it past (1) calc has two nested functions, (2) one of those functions closes over oldResult, (3) the other function survives, and (4) oldResult is not explicitly cleared.

I'm guessing that all closures created within a function share a common context object, and since one of the functions needed oldResult, then foo ends up keeping it alive too even though foo doesn't actually need it.

Edit: Further evidence:

var never = false;

calc(var oldResult) {
  if (never) {
    (() => print(oldResult))();
  }
  var newResult = calcInternal(null);
  newResult.foo = () => print('foo');
  return newResult;
}

@mraleph
Copy link
Member

mraleph commented May 28, 2020

@mkustermann we should probably introduce a liveness analysis which compute which variables should be promoted into the context across yield points. This should alleviate this a bit - because we are over-promoting right now.

Additionally it is probably worth taking a look at alternative strategies for implementing closures - so that a closure does not retain any state it does not reference. My concern would be that solutions that adress the memory leak might have negative impact on code size - but we don't really know unless we try.

Maybe something that @cskau-g could take a look at?

@jensjoha
Copy link
Contributor Author

Condensed example of a leak in pkg/frontend_server/lib/frontend_server.dart:

import 'dart:async';
import 'dart:developer';
import 'dart:io';

main() async {
  Foo foo = new Foo();
  await foo.bigAllocation();
  await foo.smallComputation();
  print("done");
  debugger();
}

final Map<String, dynamic> leakFacilitator = <String, dynamic>{};

class Foo {
  Future<bool> bigAllocation() async {
    leakFacilitator['foo'] = (dynamic flags) => flags;
    LeakMe results;
    await new File("").exists();
    results = new LeakMe(new List<int>(10000000));
    // results = null;
    return true;
  }

  Future smallComputation() async {}
}

class LeakMe {
  final List<int> data;
  LeakMe(this.data);
}

run like out/ReleaseX64/dart --enable-vm-service --disable-service-auth-codes --disable_dart_dev leakMe.dart and go to the observatory Allocation Profile => class "LeakMe" is alive (including the 100MB list in there).

If one uncomments the line // results = null; (so we actually null it out) the leak goes away.

@leafpetersen
Copy link
Member

This is great! Happy to see this explored.

negative impact on code size - but we don't really know unless we try.

Given the code size concerns (and given the fact that closure identity is observable in Dart) I wonder whether that would suggest an environment passing approach despite the slightly larger memory usage? I don't know if that makes sense in our existing implementations since I don't know what our closure representations look like, but separating the environment should give you more freedom to share environments (and hence eliminate some of the code required to construct them).

@mraleph
Copy link
Member

mraleph commented May 29, 2020

Given the code size concerns (and given the fact that closure identity is observable in Dart) I wonder whether that would suggest an environment passing approach despite the slightly larger memory usage?

That's more or less what we use now. We have a variation of what in classical literature is called "linked closures": each closure object has a pointer to a context object and each context object can have a parent context - this is just a straightforward representation of syntactical scopes. Context objects store variables directly.

@rakudrama
Copy link
Member

rakudrama commented Dec 19, 2020

I have just discovered something similar in the dart2js source.

Q: How do I work around a leak like this? Is nulling-out a local guaranteed to be sufficient or would I need to manually extract the code into a method and force it to never be inlined?

What I see:

The local variable source appears to be retained after the enclosing async method has completed:
https://github.com/dart-lang/sdk/blob/master/pkg/compiler/lib/src/serialization/task.dart#L262

The state of the _FutureListerner is isAwait.
I'm not sure what that means, but I am certain the future completed since dart2js wrote out the JavaScript contained in source.

I think there are two leaks here.

  • The async_op should not hold a local from a loop after the loop exited. (I'm pretty sure dart2js has this problem too since locals are flattened into a single ES5 frame).
  • The async_op should not be retained by _FutureListener after it is finished
  • I can't find the retaining objects for the triplet of closures, so this might also be a GC 'bug' where GC has not yet collected all dead objects on old-space. I forced GC in Observatory three times to be 'sure'.

(the references from _source are tight cycles not contributing to the retaining path)
image

@goderbauer
Copy link
Contributor

This is causing hard to debug (and understand) memory leaks in flutter applications that use closures for gesture handlers and route builders, see flutter/flutter#79605 for an example.

@minikin
Copy link

minikin commented Jul 9, 2021

Totally agree with @goderbauer. I believe we can ReadingOrderTraversalPolicy to this group.
Screenshot 2021-07-09 at 10 39 44 AM

It'd be great to have something like Memory Graph in Xcode for Dart/Flutter dev tools.

dart-bot pushed a commit that referenced this issue Sep 3, 2021
…d over

As a side-effect of making closures without captured variables have an
empty context, we not only avoid a memory leak, but also allow such
closures to be sent across [SendPort]s.

Issue #36983
Issue #45603

TEST=vm/dart{_2,}/isolates/closures_without_captured_variables_test

Change-Id: I5e8845203059ff625f7cff3f072d845b5e48ba36
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/212297
Commit-Queue: Martin Kustermann <kustermann@google.com>
Reviewed-by: Slava Egorov <vegorov@google.com>
copybara-service bot pushed a commit that referenced this issue Jan 27, 2022
In https://dart-review.googlesource.com/c/sdk/+/214020
incremental_compiler_leak_test was renamed to
incremental_compiler_leak_tester temporarily because it started saying
we had possible leaks.

Looking into it now, a number of puzzeling this were found:
* Many SourceLibraryBuilders, SourceClassBuilders etc was alive despite
  having been converted to DillLibraryBuilders (etc). They shoudn't be.
* Several Librarys (from kernels AST) came and went between
  invalidations.

This all was caused by the VM sometimes keeping things around longer
than it should (probably a variation of
#36983).

This CL fixes it by explicitley nulling things out (and similar).
It re-enasbles the test and updates it to explicitley expecting a
certain number of instances for some classes (e.g. it now expect
0 SourceLibraryBuilders).

To be clear I don't think this has caused any "real" leaks, but:
* having things clear up too late is very confusing.
* we will probably use less memory now.

Change-Id: I3a439f23fc7ef26b156c6164a2c37f6352bc57b4
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/229964
Reviewed-by: Johnni Winther <johnniwinther@google.com>
Commit-Queue: Jens Johansen <jensj@google.com>
@goderbauer
Copy link
Contributor

Another example of this problem was found in combination with Isolate.run, see #48566.

@alexmarkov
Copy link
Contributor

New async/await implementation which is based on copying stack frames (developed to address #48378) might be able to fix this bug because it doesn't capture local variables in async methods.

@mkustermann
Copy link
Member

This bug is about over-capturing of state in closures in general - which we should fix.

@alexmarkov
Copy link
Contributor

The new implementation of async/await landed, so async methods no longer capture all local variables.
The original example, as well as other examples in this issue which involve async are now fixed.

The problem related to multiple closures sharing the same context still remains.

@mraleph mraleph changed the title Memory leak via "context" on async methods Dart VM closures are not safe-for-space (share the same context if defined within the same scope - leading to overcapturing) Dec 1, 2022
@gaaclarke
Copy link
Contributor

In my case, the problem with overcapturing is that it blocks communication of state from Fluter's root isolate to a background isolate via a closure since it was erroneously capturing dart:ui classes which are blocked from being sent over SendPorts:

  @override
  void initState() {
    super.initState();
    final int index = widget.index;
    _getAgent().then((agent) async {
      final String? decoded =
          await agent.query((state) => state.decodedMessages[index]);
      if (mounted) {
        setState(() {
          _text = decoded;
        });
      }
    });
  }
[VERBOSE-2:dart_vm_initializer.cc(41)] Unhandled Exception: Invalid argument(s): Illegal argument in isolate message: (object extends NativeWrapper - Library:'dart:ui' Class: EngineLayer)
#0      _SendPort._sendInternal (dart:isolate-patch/isolate_patch.dart:249:43)
#1      _SendPort.send (dart:isolate-patch/isolate_patch.dart:230:5)
#2      Agent.query (package:isolate_agents/isolate_agents.dart:104:15)
#3      _MessageState.initState.<anonymous closure> (package:romeo_juliet/main.dart:144:23)
<asynchronous suspension>

That looks like a novel observation about this bug. I don't believe there is a workaround (short of writing the data to pipe and reading it back =P). More details in #50593

@mkustermann
Copy link
Member

That looks like a novel observation about this bug. I don't believe there is a workaround (short of writing the data to pipe and reading it back =P). More details in #50593

One can always work around the issues by restructuring code, would e.g. this work for you:

  void initState() {
    super.initState();
    final int index = widget.index;
    _getAgent().then((agent) async {
      final String? decoded = await _sendQuery(agent, index);
      if (mounted) {
        setState(() {
          _text = decoded;
        });
      }
    });
  }
  Future _sendQuery(agent, index) => agent.query((state) => state.decodedMessages[index]);

?

@brianquinlan
Copy link
Contributor

@mkustermann I was planning on suggesting that restructuring in the documentation Isolate.run/Isolate.spawn for developers getting Illegal argument in isolate message messages (discussion in #51594) - would it make sense to point out that doing so can also reduce the memory footprint of the spawned Isolate?

@mkustermann
Copy link
Member

@mkustermann I was planning on suggesting that restructuring in the documentation Isolate.run/Isolate.spawn for developers getting Illegal argument in isolate message messages (discussion in #51594) - would it make sense to point out that doing so can also reduce the memory footprint of the spawned Isolate?

Maybe in a it very cautionary way / as a side-note, because we shouldn't tell our users to re-write code in a more ugly / hard to maintain way if they wouldn't actually benefit from it. (With infinite resources we could even highlight in the IDE directly what state is retaine & we could find a solution for the over-capturing issue in the first place).

@milesegan
Copy link

I just bumped into this today. This can lead to some very non-obvious bugs when using isolates and providers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. P2 A bug or feature request we're likely to work on triaged Issue has been triaged by sub team type-bug Incorrect behavior (everything from a crash to more subtle misbehavior)
Projects
None yet
Development

No branches or pull requests