-
Notifications
You must be signed in to change notification settings - Fork 20
Investigation of post-mortem debugging with Promises. #45
Comments
Hey, have been thinking on this for a while - can we discuss? |
@misterdjules has done quite a bit of work to identify problematic cases and challenges in addressing them. I don't have the links handy, but perhaps he (or somebody else) does. |
I tried to document the challenges of using post-mortem debugging with code that uses promises at https://gist.github.com/misterdjules/2969aa1b5e6440a7e401. I tried to do the same with regards to using post-mortem debugging with code that uses async/await at https://gist.github.com/misterdjules/30db8fc651746c6f917a. Both documents are works in progress because shortly after I started writing them, the pull request that had been submitted to node core to provide support for promises was closed, and so that work became low priority. Both documents focused on trying to find a solution so that we could use the exact same post-mortem debugging tools that we use at Joyent (mdb and mdb_v8) to inspect core files generated after an uncaught error is thrown within a promise handler. The main obstacle to making more progress was, in my opinion, finding out what changes to the behavior of programs using promises would be acceptable to accommodate the needs of post-mortem debugging users. That would have required discussing the design of promises with other people (e.g TC39 members, etc.) and exploring what could be changed about it, but it hasn't been done. If I understand correctly, this issue introduces an effort that will explore other avenues, so maybe the two gists I linked above and the context I just gave are not that relevant. However, if you have any question, feel free to ask and I'll be happy to try to help! |
@benjamingr I can facilitate setting up a meeting to discuss for those who are interested. Doodle here: http://doodle.com/poll/mpmrqsxi3tq4msiq. I only suggested a few times, if these don't work out we can find a different set. Anybody who is interested and @uzun please fill out the poll. |
Starting point for examples discussion with Max today: |
So on the 12th, anyone else wants to join the meeting? |
Just set the time to 12 EST on Monday the 12th of June based on the doodle, I'll post a link to a hangout in this issue just before 12 on Monday. If anybody else is interested pleas join in. |
Meeting recording here: https://www.youtube.com/watch?v=WeQAIUPbHOA although we started it a bit late so did not capture all of the discussion. We also agreed to have a regular meeting 12 EST on Mondays to get started. @benjamingr is going to list a few more test cases we should cover and he also provided a link to the v8 API's that let you register for rejections (max will add link to this issue) |
In my opinion, the most interesting case is: function test() {
var p = Promise.reject(new Error());
process.nextTick(() => modifyHeap());
// if you remove me, a core dump should return an unmodified heap
setTimeout(() => { p.catch(function(() => {});
} The promise is handled in a macrotick - but the heap is modified in a microtick. This is the "core" case @misterdjules and I have been discussing. Unless we can get a core dump "fast enough" to take in this case (which we can't if I understand correctly, since it's not viable to only capture the page changed in modifyHeap() and the rest later if it's unhandled). |
It might be possible to have an option/hook to fork the process at the reject point (i.e. before modifyHeap()). The parent and child then run as copy-on-write. Then if/when the decision is made that a core dump is needed, we request the child (which has been idle) to dump core. Alternatively, we capture a lightweight node-report or similar at the unhandled point - first failure. Then using the information it contains (eg stack trace?) we set a option/hook to trigger a core dump at the reject point on a subsequent run. |
I was under the impression that this solution is impossible because taking a copy-on-write core dump is too expensive. What if we fork and then just block - and then take the dump only when we're sure? Is forking 2 times a second without doing any memory touching too expensive? |
Yes, that's what I mean, the child just blocks, so there is no disk i/o until we are sure we want to dump. Just the cost of the memory copies in the parent process - we'd need to see how expensive that is. |
@rnchamberlain if it's possible - it would be the best path forward. I think forking should be fast enough. |
I originally said this elsewhere, but it applies here:
Besides that, the performance cost is not trivial. Even if the child process were to immediately exit, a lot of work has to happen as part of fork: all threads need to be stopped, the entire list of open file descriptors needs to be duplicated, and all of the parent process's page tables need to be marked copy-on-write. Besides that immediate cost, there's a subsequent performance impact on both the parent and the child after the fork() because of COW. fork() is not a lightweight operation. |
@davepacheco Agreed that taking core dumps is not a light-weight exercise and involves some technical challenges to overcome, but getting a core dump is a valuable feature. Can you suggest an alternate approach? |
@sam-github Core dumps are indeed somewhat expensive, but that cost is only incurred when the program has already failed fatally, so that's often not a big deal. My concerns were about using fork() in error cases that are not necessarily fatal. To answer your question: I don't know that it's possible to satisfy the competing goals here (namely, to record program state at the point of fatal failure and also allow programs to handle an otherwise fatal failure after it's already happened). For me, this makes the use of promises as specified a non-starter. I understand that it's a tradeoff, and I'm glad people are looking into options to make these work better, but I think using fork() here would only compound the challenges of using promises with postmortem debugging. |
The alternative strategy is to capture enough information at the 'unhandled' point - i.e. on first failure - to allow a selective trap to be set during subsequent runs of the application. The trap has to be sufficiently selective so that we can trigger core dumps at the 'reject' point, without triggering an excessive number (we'd need a count/limit anyway). |
@rnchamberlain experimenting with the selective approach is part of the plan. |
Hangout for this weeks call: https://hangouts.google.com/hangouts/_/ytl/Z-z4RMwrmwuiryRsNwGhdBBlzNA4Yt3LhBEg6_DAq_Y=?eid=100598160817214911030 |
@benjamingr you going to make it this week ? |
Link from last week that @benjamingr mentioned in chat: https://github.com/nodejs/node/blob/master/lib/internal/process/promises.js |
Sorry, I was pulled into a meeting. |
Of interest: https://www.npmjs.com/package/gencore once we get to the point were we want to generate cores on a second run. |
@benjamingr np, hope to see you next week. |
Will post hangout in just a minute |
If we don't want to continue running after producing the core dump, the SIGSEGV/SIGABRT mechanism is probably easiest. It would match what we do for --abort -on-uncaught-exception - see the |
@uzun can you please let me know if the above comments help with the questions you've had about how to generate core dumps from Node core? |
There is only a 50% change I can make the meeting today as I have some family related issues. If I can't I'll catch up with @uzun later in the week. |
@uzun do we have any reason to meet today? |
@benjamingr @mhdawson I think we can just skip this week then and catch up next Monday. |
Just a heads up, I won't be able to make it this week, so no meeting today. |
Are we on for today ? |
I'm attending a thesis defence that might overlap with it this afternoon. We can hold off until next Monday or reschedule for later in the week. Not as much progress to report as I'd like, so I could use the extra time. |
Ok lets do it next week as I'm just catching up after being away for a week myself. |
I'm out sick today and can't make it, I'll post some progress updates this week. |
ok, we might want to pick another time this week as we have not met for a while. Once you feel better can you propose a time. |
Got it, next week or earlier then. |
Hi, given there were no meetings for a few weeks now - is this actually happening? Are you still actively pursuing this @uzun ? |
I think given nodejs/node#15335 this issue can be closed. |
@benjamingr maybe wait until nodejs/node#15335 has landed |
I'm looking for open source projects that extensively use promises. If anyone has any suggestions, let me know. |
@uzun You might try GitHub or npm. Or try asking around on twitter. Here's a project that utilizes promises: https://github.com/bevacqua/promisees |
It was mentioned in the recent benchmarking meeting that the latest HAPI framework uses promises exclusively, so maybe something built on that? |
Has anyone considered using catch prediction, rather than HostPromiseRejectionTracker, for deciding when to capture a core dump? @bmeurer has pointed out that the old logic didn't do such a good Catch prediction is meant to solve this problem in Chrome DevTools: If post-mortem stacks and core dumps are done via catch prediction |
@littledan how would we experiment with Catch prediction (pointers to which files to look it, what to search for etc,)? @uzun was looking at benchmarking different options and that sounds like one that he should include. |
@mhdawson The API is driven by ChangeBreakOnException. |
@uzun sequelize, most modern DB drivers, koa and most of the newer ecosystem. |
I believe this work is now being done in https://github.com/nodejs/promises-debugging. If this needs to be tracked outside of that repository, please open an issue over in https://github.com/nodejs/diagnostics. |
After reading this comment, I walk away feeling like the things I have cared about for a long time aren't taken seriously by the author, though I suspect that was not the intent. Promises changing the state of a process before it exits has been a known problem for nearly as long as I've been in the Node.js community, and effort has been invested by quite a few people (i.e. the amazing @addaleax!) in this space. nodejs/post-mortem#45 (speaks directly to it as if it were a known long-standing issue) nodejs/promises#26 (doesn't reference core dumps but speaks to changing state on process exit) We are still resisting promises in the restify codebase due to this class of problem: restify/node-restify#1304 I do think it is fair to say: * Those with a vested interest in operating Node.js in production at scale haven't shown up with the resources to solve for operating promises. I'm starting to see this change (yay!). * Many of conversations around this happened outside of the Node.js issue tracker, often in echo chambers and hallway tracks. If you weren't actively looking for these concerns, they would have been easy to miss. I believe this is already changing (also yay!). I also think it is fair to say: * Many of those who had a vested interest in operating Node.js in production at scale _did_ show up and broadcast their concerns about the form of the promises specification when it was being proposed. The ones I saw were met with something between dismissal and hostility. * Those I know who _did_ show up left this feeling as if promises in their current form were both rushed through the specification process and forced upon this part of the community without much regard for their input. After this, many wholesale rejected promises as a viable primitive which left adoption to be driven by those who didn't have this class of concern. To put it another way, if you cared about post-mortem and your concerns weren't heard for the spec, you didn't use promises and since you didn't use promises you wouldn't be actively engaging in issue trackers about promises. I don't bring up these two points to add fuel to an old argument, but to try to cultivate empathy for why these concerns are only now starting to shake out of the woodwork from some people's perspectives. The part of the community that cares about this class of problem - and that we want investment from to solve this class of problem - was ostracized early on in the story of promises. We should be making efforts to draw those who abandoned promises back into the community, I don't believe the tone of this section advances that cause.
I am a graduate student at the IBM Centre for Advanced Studies—Atlantic at the University of New Brunswick working with @mhdawson. We are interested in working on post-mortem debugging for promises in Node.js as part of our research and are hoping to get the group’s input as we tackle this problem. I realize from reading the Issues that there isn’t consensus on a solution and although I am not yet a Node expert myself, I am willing to dedicate time to contribute towards a possible fix.
Our current plan would be to:
Some of the ideas we would plan to test would include:
We plan to keep this issue up to date with pointers to a fork that includes the code for the options we experiment with.
If you have any suggestions on what we should consider looking at or would like to collaborate on this effort just let me know.
The text was updated successfully, but these errors were encountered: