Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[async-hooks] request for feedback #129

Closed
mike-kaufman opened this issue Dec 11, 2017 · 8 comments
Closed

[async-hooks] request for feedback #129

mike-kaufman opened this issue Dec 11, 2017 · 8 comments

Comments

@mike-kaufman
Copy link
Contributor

mike-kaufman commented Dec 11, 2017

In the 2017-11-29 Diagnostics WG meeting, one of the issues that came up was a lack of insight into how async-hooks was preceived by it's early adopters, in particular early adopters who were using async-hooks at scale.

After discussion we decided to open an issue here to solicit feedback from ppl. In particular, can you chime in here with comments, including any details on the following:

  1. Any cases where async-hooks is losing context or has incorrect context? If so, what are you doing about it? How did you detect the "lost context"?

  2. Perf impact of async-hooks?

  3. Any modules (native or non-native) that you're seeing that need updated to correctly propogate context?

/cc @watson (Elastic), @jkrems (groupon), @vdeturckheim (sqreen), @NatalieWolfe (new relic), @kjin (GCP/stack driver), @ofrobots (GCP/stack driver)

@AndreasMadsen
Copy link
Member

Any native modules that you're seeing that need updated to correctly propogate context?

Or non-native modules. But where they have a queue.

@chrisdickinson
Copy link

I have an issue here detailing my attempt to move from CLS-based-on-domains to CLS-based-on-async-hooks. The connection pooling problem prevents me from using async hooks at present.

@Qard
Copy link
Member

Qard commented Dec 12, 2017

Yeah, userland queuing is a major problem. The majority of userland queuing happens via the generic-pool, so most APM vendors just patch that, but there are definite exceptions.

@NatalieWolfe
Copy link

Our usage of async-hooks is limited to instrumenting native promises when used with async/await. We had a few customers complain of a memory leak when using our async-hooks instrumentation. Upon investigation this turned out to actually be a promise leaked through a closure in pg-pool. Async-hooks simply exacerbated what would otherwise have been a small leak since the destroy hook was never called.

We have had no other reports of issues regarding async-hooks, and Node 8 (where async-hooks is on by default) makes up more than half of our usage.

On the performance side, things are a little different. In our contrived benchmark of resolving a 300-promise chain simply enabling async-hooks causes the test to perform at ~15% of the base case.

no agent, no hooks x 15,710 ops/sec ±1.17% (81 runs sampled)
  GC stats for no agent, no hooks:

    Scavenge count: 430
    Scavenge duration: 0.3663271049999998 sec

    IncrementalMarking count: 1
    IncrementalMarking duration: 0.00014575700000000002 sec

    MarkSweepCompact count: 1
    MarkSweepCompact duration: 0.001349373 sec

  Overall GC stats:

    Total GC count: 432
    Total GC duration: 0.36782223499999983 sec

    Duration per test run: 0.09195555874999996
    Duration per GC: 0.0008514403587962959 sec
no agent, noop async hooks x 2,386 ops/sec ±1.33% (78 runs sampled)
  GC stats for no agent, noop async hooks:

    Scavenge count: 156
    Scavenge duration: 0.5808710489999999 sec

    IncrementalMarking count: 2
    IncrementalMarking duration: 0.000325359 sec

    MarkSweepCompact count: 2
    MarkSweepCompact duration: 0.002096918 sec

  Overall GC stats:

    Total GC count: 160
    Total GC duration: 0.5832933259999998 sec

    Duration per test run: 0.11665866519999997
    Duration per GC: 0.003645583287499999 sec
agent async hooks x 1,769 ops/sec ±0.91% (77 runs sampled)
  GC stats for agent async hooks:

    Scavenge count: 79
    Scavenge duration: 0.483310709 sec

    IncrementalMarking count: 20
    IncrementalMarking duration: 0.003254328 sec

    MarkSweepCompact count: 20
    MarkSweepCompact duration: 0.03899593899999999 sec

  Overall GC stats:

    Total GC count: 119
    Total GC duration: 0.525560976 sec

    Duration per test run: 0.08759349599999999
    Duration per GC: 0.004416478789915966 sec
instrumentation x 2,553 ops/sec ±0.99% (79 runs sampled)
  GC stats for instrumentation:

    IncrementalMarking count: 4
    IncrementalMarking duration: 0.0007899300000000001 sec

    MarkSweepCompact count: 4
    MarkSweepCompact duration: 0.010999376000000002 sec

    Scavenge count: 285
    Scavenge duration: 0.8076918919999998 sec

  Overall GC stats:

    Total GC count: 293
    Total GC duration: 0.8194811979999999 sec

    Duration per test run: 0.20487029949999996
    Duration per GC: 0.0027968641569965863 sec

Real-world applications do not see this extreme of overhead, but it is still noticeable. Unfortunately I don't have numbers for that at the moment. :/

Regarding user-land queueing, we've found that the majority of that is actually in the plethora of promise libraries out there. Every one of them manages its own micro task queue which must be independently instrumented.

@vdeturckheim
Copy link
Member

I have not started to implement Async Hooks in Sqreen agent yet. My plan was to evaluate overhead in a few open source apps before continuing but end of year being extremely busy, it is still in the backlog.

If I have time to do this, I'll share my results and tests.

@ghost
Copy link

ghost commented Dec 14, 2017

We are using async-hooks in Node 8 for logging capabilities. Uniq transaction ids are passed through promise chains. This gives us detailed, correlated logging output. Not more that 10 lines of code for setup, but of great use.

@AndreasMadsen
Copy link
Member

petkaantonov/bluebird#1472 (comment) shows a slow-down in bluebird when they use the async_hooks Embedder API.

@mike-kaufman
Copy link
Contributor Author

Thanks everyone for your input here. I'm going to close this. I've added an item to issue #124 to figure out what "acceptable performance" impact is.

In addition to performance, there's also a question around whether the API is "the right API". i.e., does the API solve the intended problems. If anyone has feedback on the utility of the API, please feel free to re-activate this, or open another issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants