Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: revamp the refresh method for better rendering performance #150

Closed
jerch opened this issue Jun 28, 2016 · 13 comments
Closed

Proposal: revamp the refresh method for better rendering performance #150

jerch opened this issue Jun 28, 2016 · 13 comments

Comments

@jerch
Copy link
Member

jerch commented Jun 28, 2016

To start the discussion about the screen representation and rendering process here are some first ideas:

  • pre container instead of div
    pre context can be rendered faster by most browser engines due to the reduced capabilities. Downside - reduced capabilities ;)
  • document fragments
    Although this needs more JS code with all the explicit node manipulation it is super fast compared to innerHTML. IMHO a must-do ;)
  • clever caching
    The emulator is caching real DOM nodes at line level at the moment. Maybe another caching strategy in combination with fragments might lift the burden to some degree.
  • use canvas instead of DOM nodes
    This is a hard one, not sure if this is worth the trouble at all. Would give the full power of what is going to be shown. Downside - needs tons of JS code to get the layouting done and all those nifty events we get with DOM nodes for free. No clue if it will show better performance in the end since text rendering on canvas is not that good. Maybe someone has experience with that, therefore open for discussion...
  • spot and rearrange style requesting calls
    Those calls will almost always trigger an incremental synchronous re-layouting (full stop until reflow is done). Needs to be checked...

Somewhat offtopic but still performance relevant:

  • scrolling
    What about native scrolling? Other ideas?
  • output alignment
    Some unicode chars will break the terminal grid. Needs some CSS rules, maybe even some kind of font glyph measuring tricks.
@Tyriar
Copy link
Member

Tyriar commented Jun 28, 2016

While canvas may be more performant, getting ti to act like a text editor may be tricky (selection, context menus, IMEs, etc.), also it's a very radical change that makes it behave significantly different. It would be good if there was separation between the rendering and terminal state, a renderer could then be swapped in or out allowing for choice. Not really something that needs attention immediately though.

The first 2 points will have by far the biggest impact imo, that's where we should focus. Resolving point 2 should also fix the layout thrashing (point 5). I still plan on picking up the document fragment change when I get some time away from vscode stabilization (1-2 weeks).

Also on native scrolling, this is only possible if the entire buffer is created in the DOM. This would be an enormous memory footprint for little payoff.

@parisk
Copy link
Contributor

parisk commented Jun 29, 2016

My two cents here:

  1. Let's leave the canvas approach out of the discussion for now. Since this will need significant effort to provide the proper UX, we should have clear pros and cons between the two approaches and actually we should have reached a point where text rendering should be considered doomed.
  2. As far as the native scrolling goes, I am pretty sure we can do this without rendering the whole buffer. CodeMirror (https://github.com/codemirror/codemirror) does this also, but I have not dug into the internals yet.

My proposal on this is first try to replace divs with pres in a PR dedicated to this goal and see how this goes and then move forward with further optimizations step by step.

@jerch
Copy link
Member Author

jerch commented Jun 29, 2016

I second your remarks about the canvas idea. Too much of an effort for an uncertain result. Might be worth a look again once a bigger overhaul is needed, but not at the current state.

Imo missing in the list above - CSS optimization. Here is a nice overview of CSS concerns. Atm there are two selectors that might be performance critical - .terminal:not(:focus) .terminal-cursor and .terminal .xterm-rows > div.

Gonna go for some pre vs. div tests first.

@jerch
Copy link
Member Author

jerch commented Jun 29, 2016

Native scrolling might be possible in combination with partial buffer rendering upon scrolling. Clusterize.js uses such a trick to show tons of data in a list.

@Tyriar
Copy link
Member

Tyriar commented Jun 29, 2016

@jerch I don't think those selectors should be particularly badly performing. .terminal:not(:focus) shouldn't be bad as it's rooted on .terminal so it only needs to check a single element for focus (O(1)). Just :not(focus) would be bad since it would need to check every element for focus (O(n)).

Similar with .terminal .xterm-rows > div. If anything the direct descendant selector would improve performance by only applying to the children of .xterm-rows.

@jerch
Copy link
Member Author

jerch commented Jun 30, 2016

As far as I understand the concerns the evaluation from right to left of the selector makes them counterintuitive. For .terminal .xterm-rows > div this kinda selects all divs in the page to filter in the next step divs with parents of class 'xterm-rows'. The following descendant rule will then filter the divs with a 'terminal' class ancestor. If that's still the case, this will get really ugly within a more complex page.

But the referred document is pretty old, I am not sure, if it is still that bad (http://calendar.perfplanet.com/2011/css-selector-performance-has-changed-for-the-better/)

Since I dont know how to test the CSS stuff reliably, I would not spend to much time for it.

@jerch
Copy link
Member Author

jerch commented Jul 1, 2016

Now I feel like Don Quixote tilting at windwills - ever wondered what this greyish pie in dev tools timeline summary is? It is "Other". Hmm.

I ran several tests with different settings to get better output performance and was able to lower the total runtime of ls -lR /usr/lib from 6s to 5s here. Dev tools say the rendering takes now about 500ms and the JS scripting about 2000ms. Sweet. Mission accomplished.
Wait - 2.5s in dev tools but the whole command takes about 5s. Isn't there something wrong? Oh yeah there is this greyish pie. It was always there. Always at around 50%. Hmm. Must be god-given. Dont't ask, even google cant't tell you. It is "Other", leave it be.

Dramatic prologue, short answer - websockets are slow. I switched off all the frontend handling (no parsing, no output) to see the raw performance. ls -lR /usr/lib still took around 3s to get into chrome. Btw the command generates 5.3 MB of data on my system, which is a lousy throughput of 1.8 MB/s (local delivery - wth).

By playing with the server side buffer size I was able to raise the throughput to 3.5 MB/s. Seems the websocket stack itself is to heavy to cope with many short data and would benefit from buffering before sending.

current master:

timeline_master
master

websocket buffered (plus some other rendering changes):

timeline_buffered
buffered

Change:
You can see and test it here.

@jerch
Copy link
Member Author

jerch commented Jul 1, 2016

Did some tests with pre. It is on par to slightly worse. Imo not worth any further testing.

Another possible optimization - the terminal container .terminal is not explicit styled atm in the demo. This triggers a full reflow up to document in chrome and adds ~1ms to a single layout and paint. In a more complex embedding page this gonna hurt much more. It can be fixed with setting explicit values for width, height and overflow (info).

@parisk
Copy link
Contributor

parisk commented Jul 1, 2016

Thanks a lot for all these information @jerch. They are really useful. I will take a deeper look later, because I couldn't find the time today. Could you please provide us with some more information about the websocket server you used and the settings you used to set up buffering?

@jerch
Copy link
Member Author

jerch commented Jul 3, 2016

@parisk You can find the adjustments here. The websocket buffer is just a quick hack to your app.js to see the performance difference. The changes should be applicable to any websocket lib (in fact the buffer is before websocket invocation), though Im not sure if it will show such an impact on a remote system, comm latency might just swallow the effect idk.
Now the total runtime is almost on par with xterm on my computer, but with less output rendering. The rendering still needs some attention since a single rendering is still quite heavy (fragments with some caching tweaks will show some impact imo - I leave that to Tyriar to test).

Also the separation of the terminal inner state handling and the rendering might be a goal (as Tyriar mentioned above). This would give other optimization options like using web workers for not DOM related work. I did a quick&dirty test with one worker thread for websocket comm and state handling and sole rendering in the main thread with these results:

timeline_worker
worker
As you can see the main thread is now idle half of the time giving back precious CPU time for other stuff. The striped yellow part is the worker "Scripting" (almost at 100%). With this approach even FF is able to show frames again. (FF just blocks atm until the command is done).
Still much room for improvement...

@parisk
Copy link
Contributor

parisk commented Jul 11, 2016

@jerch thanks for providing all this data. After releasing 1.0.0 this week, I will get on top of this, in order to set up a "roadmap" or course of action in order to tune the rendering process.

@Tyriar
Copy link
Member

Tyriar commented Dec 31, 2016

The situation now is certainly better than before, ls -lR /usr/lib only takes around 2s for me

image

@Tyriar
Copy link
Member

Tyriar commented Jan 1, 2017

Looking at this now after the changes I've done as part of microsoft/vscode#17875 I don't think it's worth pursuing any of these points. While caching would be nice, it comes at the cost of potentially quite significant memory overhead and performance is fine as is imo. Caching is less likely to be needed now also due to the queue change (#438) which will likely see the entire viewport change a lot more often than individual rows in which case caching would only help scrollback at the cost of lots of memory.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants