[Performance] vtebench tracking issue #10563

Open
skyline75489 opened this issue Jul 6, 2021 · 4 comments
Labels
Area-Performance Performance-related issue Issue-Feature Complex enough to require an in depth planning process and actual budgeted, scheduled work. Product-Meta The product is the management of the products.
Milestone

Comments

@skyline75489
Collaborator

This is an attempt to use Alacritty's vtebench to measure and establish a performance baseline for Windows Terminal.

The program itself requires bash, so we can only test it through WSL and SSH. Still, it reflects the real-life performance of this product.

@skyline75489 skyline75489 added Issue-Feature Complex enough to require an in depth planning process and actual budgeted, scheduled work. Area-Performance Performance-related issue labels Jul 6, 2021
@ghost ghost added Needs-Triage It's a new issue that the core contributor team needs to triage at the next triage meeting Needs-Tag-Fix Doesn't match tag requirements labels Jul 6, 2021
@skyline75489
Collaborator Author

skyline75489 commented Jul 6, 2021

The results on my PC (WSL 2, AA: grayscale, font: Hack 13pt, Acrylic: off):

1.8.1521.0
Results:

  cursor_motion (47 samples @ 1.77 MiB):
    213.32ms avg (90% < 215ms) +-1.37ms

  dense_cells (24 samples @ 2.65 MiB):
    421.67ms avg (90% < 443ms) +-23.23ms

  light_cells (143 samples @ 1.08 MiB):
    69.63ms avg (90% < 71ms) +-1ms

  scrolling (2 samples @ 1 MiB):
    4981ms avg (90% < 5004ms) +-32.53ms

  scrolling_bottom_region (1 samples @ 1 MiB):
    102219ms avg (90% < 102219ms) +-0ms

  scrolling_bottom_small_region (1 samples @ 1 MiB):
    101344ms avg (90% < 101344ms) +-0ms

  scrolling_fullscreen (10 samples @ 1 MiB):
    475.6ms avg (90% < 484ms) +-8.55ms

  scrolling_top_region (3 samples @ 1 MiB):
    4595.67ms avg (90% < 4628ms) +-28.36ms

  scrolling_top_small_region (3 samples @ 1 MiB):
    3841ms avg (90% < 3861ms) +-31.24ms

  unicode (27 samples @ 1.06 MiB):
    386.22ms avg (90% < 492ms) +-85.02ms
1.9.1445.0
Results:

  cursor_motion (87 samples @ 1.3 MiB):
    115.41ms avg (90% < 121ms) +-2.75ms

  dense_cells (44 samples @ 2.02 MiB):
    228.25ms avg (90% < 260ms) +-24.69ms

  light_cells (205 samples @ 1.05 MiB):
    48.4ms avg (90% < 49ms) +-0.78ms

  scrolling (3 samples @ 1 MiB):
    3888.67ms avg (90% < 3942ms) +-48.26ms

  scrolling_bottom_region (1 samples @ 1 MiB):
    59689ms avg (90% < 59689ms) +-0ms

  scrolling_bottom_small_region (1 samples @ 1 MiB):
    60395ms avg (90% < 60395ms) +-0ms

  scrolling_fullscreen (11 samples @ 1 MiB):
    424.18ms avg (90% < 430ms) +-5.42ms

  scrolling_top_region (3 samples @ 1 MiB):
    4367.67ms avg (90% < 4383ms) +-13.87ms

  scrolling_top_small_region (3 samples @ 1 MiB):
    3653.67ms avg (90% < 3672ms) +-16.8ms

  unicode (35 samples @ 1.06 MiB):
    291.54ms avg (90% < 373ms) +-69.71ms
1.10.1933.0
Results:

  cursor_motion (84 samples @ 1.57 MiB):
    119.35ms avg (90% < 121ms) +-1.05ms

  dense_cells (54 samples @ 2.41 MiB):
    186.07ms avg (90% < 203ms) +-16.97ms

  light_cells (203 samples @ 1.07 MiB):
    48.82ms avg (90% < 50ms) +-0.72ms

  scrolling (2 samples @ 1 MiB):
    4505ms avg (90% < 4535ms) +-42.43ms

  scrolling_bottom_region (1 samples @ 1 MiB):
    85137ms avg (90% < 85137ms) +-0ms

  scrolling_bottom_small_region (1 samples @ 1 MiB):
    80985ms avg (90% < 80985ms) +-0ms

  scrolling_fullscreen (10 samples @ 1 MiB):
    452.2ms avg (90% < 457ms) +-8.53ms

  scrolling_top_region (3 samples @ 1 MiB):
    4474.67ms avg (90% < 4488ms) +-15.28ms

  scrolling_top_small_region (3 samples @ 1 MiB):
    3736.67ms avg (90% < 3831ms) +-84.48ms

  unicode (17 samples @ 1.06 MiB):
    605.24ms avg (90% < 761ms) +-105.16ms

Clearly something is wrong with scrolling, and we need to find out what.
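
To put a number on the scrolling and unicode changes, the averages posted above for 1.9.1445.0 and 1.10.1933.0 can simply be divided. A minimal sketch: the values are copied from the results in this comment, and the dictionary and function names are only illustrative.

```python
# Average times in ms, copied from the 1.9.1445.0 results above.
AVG_MS_1_9 = {
    "scrolling": 3888.67,
    "scrolling_bottom_region": 59689.0,
    "scrolling_bottom_small_region": 60395.0,
    "scrolling_fullscreen": 424.18,
    "unicode": 291.54,
}

# Average times in ms, copied from the 1.10.1933.0 results above.
AVG_MS_1_10 = {
    "scrolling": 4505.0,
    "scrolling_bottom_region": 85137.0,
    "scrolling_bottom_small_region": 80985.0,
    "scrolling_fullscreen": 452.2,
    "unicode": 605.24,
}


def slowdown(before, after):
    """Return after/before per benchmark; values above 1.0 mean slower."""
    return {name: after[name] / before[name] for name in before if name in after}


if __name__ == "__main__":
    for name, ratio in sorted(slowdown(AVG_MS_1_9, AVG_MS_1_10).items()):
        print(f"{name}: {ratio:.2f}x")
```

Several of these benchmarks ran only 1 to 3 samples, so the ratios are rough indicators rather than precise measurements.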


zadjii notes circa 1.19

terminal canary 1.20.2683.0
Results:

  cursor_motion (80 samples @ 1.57 MiB):
    125.46ms avg (90% < 134ms) +-7.91ms

  dense_cells (77 samples @ 2.41 MiB):
    130.75ms avg (90% < 141ms) +-18.29ms

  light_cells (170 samples @ 1.07 MiB):
    58.34ms avg (90% < 61ms) +-2.76ms

  scrolling (5 samples @ 1 MiB):
    2460.6ms avg (90% < 2464ms) +-3.85ms

  scrolling_bottom_region (2 samples @ 1 MiB):
    6726ms avg (90% < 6734ms) +-11.31ms

  scrolling_bottom_small_region (1 samples @ 1 MiB):
    55261ms avg (90% < 55261ms) +-0ms

  scrolling_fullscreen (150 samples @ 1 MiB):
    66.55ms avg (90% < 72ms) +-6.49ms

  scrolling_top_region (5 samples @ 1 MiB):
    2401.4ms avg (90% < 2452ms) +-28.95ms

  scrolling_top_small_region (7 samples @ 1 MiB):
    1508.29ms avg (90% < 1521ms) +-10.4ms

  unicode (128 samples @ 1.06 MiB):
    77.84ms avg (90% < 87ms) +-6.82ms

conhost, 1.19.2682
Results:

  cursor_motion (120 samples @ 1.01 MiB):
    83.44ms avg (90% < 88ms) +-5.85ms

  dense_cells (214 samples @ 1.08 MiB):
    46.46ms avg (90% < 50ms) +-4.44ms

  light_cells (353 samples @ 1.04 MiB):
    27.82ms avg (90% < 31ms) +-3.07ms

  scrolling (28 samples @ 1 MiB):
    367.54ms avg (90% < 384ms) +-9.45ms

  scrolling_bottom_region (20 samples @ 1 MiB):
    520.05ms avg (90% < 529ms) +-6.14ms

  scrolling_bottom_small_region (10 samples @ 1 MiB):
    1030.5ms avg (90% < 1028ms) +-115.09ms

  scrolling_fullscreen (138 samples @ 1 MiB):
    72.12ms avg (90% < 84ms) +-11.87ms

  scrolling_top_region (8 samples @ 1 MiB):
    1391.13ms avg (90% < 1414ms) +-10.93ms

  scrolling_top_small_region (11 samples @ 1 MiB):
    980.45ms avg (90% < 1000ms) +-17.87ms

  unicode (200 samples @ 1.06 MiB):
    49.59ms avg (90% < 60ms) +-7.46ms
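
The result blocks in this thread share one plain-text layout, so comparing two runs can be scripted. A rough sketch follows. The regular expressions are inferred from the output pasted here, not from vtebench's source, and `parse_results` and `ratios` are hypothetical helper names. The two sample strings are excerpts of the canary and conhost results above.

```python
import re

# Header line, e.g. "  dense_cells (77 samples @ 2.41 MiB):"
HEADER = re.compile(r"^\s*(\w+) \((\d+) samples @ ([\d.]+) MiB\):\s*$")
# Stats line, e.g. "    130.75ms avg (90% < 141ms) +-18.29ms"
STATS = re.compile(r"^\s*([\d.]+)ms avg \(90% < ([\d.]+)ms\) \+-([\d.]+)ms\s*$")


def parse_results(text):
    """Map benchmark name -> sample count, payload size and timing figures."""
    results = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            results[current] = {
                "samples": int(header.group(2)),
                "mib": float(header.group(3)),
            }
            continue
        stats = STATS.match(line)
        if stats and current is not None:
            results[current]["avg_ms"] = float(stats.group(1))
            results[current]["p90_ms"] = float(stats.group(2))
            # The value printed after "+-" in the output.
            results[current]["spread_ms"] = float(stats.group(3))
            current = None
    return results


def ratios(a, b):
    """Average-time ratio a/b for every benchmark present in both runs."""
    return {n: a[n]["avg_ms"] / b[n]["avg_ms"] for n in a if n in b}


CANARY = """
  scrolling (5 samples @ 1 MiB):
    2460.6ms avg (90% < 2464ms) +-3.85ms

  unicode (128 samples @ 1.06 MiB):
    77.84ms avg (90% < 87ms) +-6.82ms
"""

CONHOST = """
  scrolling (28 samples @ 1 MiB):
    367.54ms avg (90% < 384ms) +-9.45ms

  unicode (200 samples @ 1.06 MiB):
    49.59ms avg (90% < 60ms) +-7.46ms
"""

if __name__ == "__main__":
    for name, ratio in ratios(parse_results(CANARY), parse_results(CONHOST)).items():
        print(f"{name}: canary takes {ratio:.2f}x as long as conhost")
```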
    

@skyline75489
Collaborator Author

skyline75489 commented Jul 6, 2021

For reference, this is what Alacritty looks like (private release build, because it requires a fix that has not been released yet):

Alacritty on Windows (19043.1083)
  cursor_motion (13 samples @ 1.3 MiB):
    791.23ms avg (90% < 804ms) +-9.58ms

  dense_cells (7 samples @ 3.86 MiB):
    1534.14ms avg (90% < 1564ms) +-18.87ms

  light_cells (87 samples @ 1 MiB):
    115.08ms avg (90% < 119ms) +-3.1ms

  scrolling (1 samples @ 1 MiB):
    12011ms avg (90% < 12011ms) +-0ms

  scrolling_bottom_region (1 samples @ 1 MiB):
    148788ms avg (90% < 148788ms) +-0ms

  scrolling_bottom_small_region (1 samples @ 1 MiB):
    149490ms avg (90% < 149490ms) +-0ms

  scrolling_fullscreen (4 samples @ 1 MiB):
    1084.5ms avg (90% < 1129ms) +-30.16ms

  scrolling_top_region (1 samples @ 1 MiB):
    13210ms avg (90% < 13210ms) +-0ms

  scrolling_top_small_region (1 samples @ 1 MiB):
    13121ms avg (90% < 13121ms) +-0ms

  unicode (56 samples @ 1.06 MiB):
    180.93ms avg (90% < 186ms) +-6.5ms

Here are some results from several Linux terminals (Ryzen 3600X, X11):

GNOME Terminal 3.36.2
Results:

  cursor_motion (543 samples @ 1.42 MiB):
    18.1ms avg (90% < 87ms) +-31.44ms

  dense_cells (2 samples @ 4.23 MiB):
    9456ms avg (90% < 9480ms) +-33.94ms

  light_cells (271 samples @ 1.1 MiB):
    36.55ms avg (90% < 122ms) +-50.8ms

  scrolling (45 samples @ 1 MiB):
    223.89ms avg (90% < 290ms) +-44.72ms

  scrolling_bottom_region (57 samples @ 1 MiB):
    177.63ms avg (90% < 293ms) +-64.44ms

  scrolling_bottom_small_region (58 samples @ 1 MiB):
    173.74ms avg (90% < 214ms) +-41.99ms

  scrolling_fullscreen (199 samples @ 1 MiB):
    50.02ms avg (90% < 119ms) +-53.38ms

  scrolling_top_region (57 samples @ 1 MiB):
    176.63ms avg (90% < 204ms) +-43.55ms

  scrolling_top_small_region (58 samples @ 1 MiB):
    174.36ms avg (90% < 211ms) +-38.71ms

  unicode (2 samples @ 1.06 MiB):
    27600.5ms avg (90% < 52888ms) +-35761.93ms
Tilix 1.9.1
Results:

  cursor_motion (486 samples @ 1.58 MiB):
    20.12ms avg (90% < 88ms) +-32.6ms

  dense_cells (2 samples @ 4.68 MiB):
    7244ms avg (90% < 7511ms) +-377.6ms

  light_cells (280 samples @ 1.04 MiB):
    35.63ms avg (90% < 120ms) +-51.18ms

  scrolling (38 samples @ 1 MiB):
    262.58ms avg (90% < 388ms) +-71.13ms

  scrolling_bottom_region (60 samples @ 1 MiB):
    168.1ms avg (90% < 225ms) +-46.11ms

  scrolling_bottom_small_region (58 samples @ 1 MiB):
    174.28ms avg (90% < 209ms) +-53.11ms

  scrolling_fullscreen (194 samples @ 1 MiB):
    51.41ms avg (90% < 119ms) +-52.85ms

  scrolling_top_region (59 samples @ 1 MiB):
    172.14ms avg (90% < 214ms) +-30.17ms

  scrolling_top_small_region (59 samples @ 1 MiB):
    171.25ms avg (90% < 230ms) +-39.97ms

  unicode (2 samples @ 1.06 MiB):
    19608ms avg (90% < 38354ms) +-26510.85ms
Alacritty 0.8.0
Results:

  cursor_motion (674 samples @ 1.86 MiB):
    14.41ms avg (90% < 16ms) +-1.01ms

  dense_cells (221 samples @ 5.45 MiB):
    44.87ms avg (90% < 50ms) +-3.19ms

  light_cells (945 samples @ 1.01 MiB):
    10.04ms avg (90% < 11ms) +-0.74ms

  scrolling (95 samples @ 1 MiB):
    105.59ms avg (90% < 113ms) +-26.26ms

  scrolling_bottom_region (103 samples @ 1 MiB):
    96.7ms avg (90% < 110ms) +-12.33ms

  scrolling_bottom_small_region (97 samples @ 1 MiB):
    103.02ms avg (90% < 123ms) +-11.93ms

  scrolling_fullscreen (645 samples @ 1 MiB):
    14.97ms avg (90% < 16ms) +-1.24ms

  scrolling_top_region (100 samples @ 1 MiB):
    99.93ms avg (90% < 119ms) +-13.01ms

  scrolling_top_small_region (94 samples @ 1 MiB):
    106.26ms avg (90% < 124ms) +-29.34ms

  unicode (558 samples @ 1.06 MiB):
    17.3ms avg (90% < 18ms) +-0.84ms

More results on macOS (this is on a 2015 MBP 13, so I used a smaller window size to compensate for the less powerful CPU):

iTerm2 3.4.8
Results:

  cursor_motion (19 samples @ 1.3 MiB):
    533.32ms avg (90% < 639ms) +-69.46ms

  dense_cells (2 samples @ 1.34 MiB):
    8177.5ms avg (90% < 9465ms) +-1820.8ms

  light_cells (55 samples @ 1.04 MiB):
    181.69ms avg (90% < 234ms) +-56.86ms

  scrolling (2 samples @ 1 MiB):
    4470ms avg (90% < 4525ms) +-77.78ms

  scrolling_bottom_region (2 samples @ 1 MiB):
    6804ms avg (90% < 6971ms) +-236.17ms

  scrolling_bottom_small_region (2 samples @ 1 MiB):
    5031.5ms avg (90% < 5064ms) +-45.96ms

  scrolling_fullscreen (14 samples @ 1 MiB):
    132.86ms avg (90% < 144ms) +-13.61ms

  scrolling_top_region (2 samples @ 1 MiB):
    5739.5ms avg (90% < 5772ms) +-45.96ms

  scrolling_top_small_region (2 samples @ 1 MiB):
    5205.5ms avg (90% < 5712ms) +-716.3ms

  unicode (4 samples @ 1.06 MiB):
    2570.25ms avg (90% < 2938ms) +-374.22ms
Terminal.app 2.10
Results:

  cursor_motion (69 samples @ 1.22 MiB):
    144.86ms avg (90% < 168ms) +-16.69ms

  dense_cells (130 samples @ 1.29 MiB):
    76.64ms avg (90% < 220ms) +-88.1ms

  light_cells (306 samples @ 1 MiB):
    32.24ms avg (90% < 35ms) +-2.66ms

  scrolling (22 samples @ 1 MiB):
    395.05ms avg (90% < 400ms) +-6.5ms

  scrolling_bottom_region (32 samples @ 1 MiB):
    317.81ms avg (90% < 321ms) +-6.44ms

  scrolling_bottom_small_region (32 samples @ 1 MiB):
    319.28ms avg (90% < 324ms) +-15.68ms

  scrolling_fullscreen (88 samples @ 1 MiB):
    39.28ms avg (90% < 46ms) +-3.94ms

  scrolling_top_region (32 samples @ 1 MiB):
    315.91ms avg (90% < 320ms) +-3.25ms

  scrolling_top_small_region (32 samples @ 1 MiB):
    316.84ms avg (90% < 323ms) +-17.3ms

  unicode (238 samples @ 1.06 MiB):
    41.97ms avg (90% < 103ms) +-43.09ms
Alacritty 0.8.0
Results:

  cursor_motion (300 samples @ 1.25 MiB):
    32.9ms avg (90% < 36ms) +-42.75ms

  dense_cells (97 samples @ 3.79 MiB):
    102.86ms avg (90% < 149ms) +-30.48ms

  light_cells (373 samples @ 1.12 MiB):
    26.35ms avg (90% < 29ms) +-4.36ms

  scrolling (69 samples @ 1 MiB):
    100.38ms avg (90% < 117ms) +-11.05ms

  scrolling_bottom_region (209 samples @ 1 MiB):
    47.44ms avg (90% < 51ms) +-3.15ms

  scrolling_bottom_small_region (85 samples @ 1 MiB):
    117.28ms avg (90% < 123ms) +-4.16ms

  scrolling_fullscreen (145 samples @ 1 MiB):
    28.94ms avg (90% < 31ms) +-1.66ms

  scrolling_top_region (209 samples @ 1 MiB):
    47.47ms avg (90% < 51ms) +-2.65ms

  scrolling_top_small_region (87 samples @ 1 MiB):
    115.32ms avg (90% < 121ms) +-4.15ms

  unicode (276 samples @ 1.06 MiB):
    35.79ms avg (90% < 42ms) +-5.06ms

@skyline75489
Collaborator Author

skyline75489 commented Jul 20, 2021

To summarize the work:

ghost pushed a commit that referenced this issue Jul 20, 2021
## Summary of the Pull Request

## References

## PR Checklist
* [X] Supports #10563
* [ ] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA
* [ ] Tests added/passed
* [ ] Documentation updated. If checked, please file a pull request on [our docs repo](https://github.com/MicrosoftDocs/terminal) and link it here: #xxx
* [ ] Schema updated.
* [ ] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx

## Detailed Description of the Pull Request / Additional comments

## Validation Steps Performed
@skyline75489
Collaborator Author

I think I am gonna step down from this project temporarily. So feel free to edit this issue however that may be helpful.

The performance regression in 1.10 is definitely worth investigating. But sadly I don’t have the energy to do that anymore (also my VS just stopped working). What happened in the last month exhausted me more than I could imagine. I need a break from it.

Feel free to @ me if some shitty code I wrote causes trouble. I hope there isn’t too much of it :)

ghost pushed a commit that referenced this issue Jul 27, 2021
## Summary of the Pull Request

## References

The `+=` operator is an extremely hot path under heavy output load. This PR aims to optimize its speed.

## PR Checklist
* [ ] Supports #10563
* [ ] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA
* [ ] Tests added/passed
* [ ] Documentation updated. If checked, please file a pull request on [our docs repo](https://github.com/MicrosoftDocs/terminal) and link it here: #xxx
* [ ] Schema updated.
* [ ] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx

## Detailed Description of the Pull Request / Additional comments

## Validation Steps Performed
ghost pushed a commit that referenced this issue Aug 12, 2021
…10921)

Improve WriteCharsLegacy performance by increasing LocalBuffer size, allowing
longer runs of characters to be submitted to the remaining parts of conhost.

References #10563 -- vtebench tracking issue

## Validation Steps Performed

* Ran `cat big.txt`, vtebench and termbench and
  noted ~5% performance improvements
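
The idea behind the LocalBuffer change can be illustrated without conhost's real code: a larger staging buffer means fewer, longer hand-offs to whatever consumes the text. A toy model follows. `submit_in_runs` and the buffer sizes are made up for illustration and do not reflect the actual `WriteCharsLegacy` implementation or its real buffer sizes.

```python
def submit_in_runs(text, buffer_size, sink):
    """Pass text to sink in runs of at most buffer_size characters.

    Returns the number of downstream calls that were made.
    """
    calls = 0
    for start in range(0, len(text), buffer_size):
        sink(text[start:start + buffer_size])
        calls += 1
    return calls


if __name__ == "__main__":
    payload = "x" * 10_000
    for size in (64, 1024):  # hypothetical buffer sizes
        received = []
        calls = submit_in_runs(payload, size, received.append)
        # The consumer sees the same text either way, only in fewer pieces.
        assert "".join(received) == payload
        print(f"buffer_size={size}: {calls} downstream calls")
```

If each downstream call carries a fixed overhead, cutting the number of calls reduces total cost even though the amount of text is unchanged.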
ghost pushed a commit that referenced this issue Aug 24, 2021
This commit improves the renderer classes by:
* reducing binary size by 4kB
* improving performance by 5%
* reducing code complexity

## References

* #10563 -- vtebench tracking issue

## PR Checklist
* [x] I work here
* [x] Tests added/passed

## Validation Steps Performed

* Ran vtebench/termbench and noted ~5% perf. improvements
DHowett pushed a commit that referenced this issue Aug 25, 2021
## Summary of the Pull Request

## References

The `+=` operator is an extremely hot path under heavy output load. This PR aims to optimize its speed.

## PR Checklist
* [ ] Supports #10563
* [ ] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA
* [ ] Tests added/passed
* [ ] Documentation updated. If checked, please file a pull request on [our docs repo](https://github.com/MicrosoftDocs/terminal) and link it here: #xxx
* [ ] Schema updated.
* [ ] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx

## Detailed Description of the Pull Request / Additional comments

## Validation Steps Performed
DHowett pushed a commit that referenced this issue Aug 25, 2021
…10921)

Improve WriteCharsLegacy performance by increasing LocalBuffer size, allowing
longer runs of characters to be submitted to the remaining parts of conhost.

References #10563 -- vtebench tracking issue

## Validation Steps Performed

* Ran `cat big.txt`, vtebench and termbench and
  noted ~5% performance improvements
@zadjii-msft zadjii-msft removed this from the Terminal Backlog milestone Jan 4, 2022
@zadjii-msft zadjii-msft added this to the Backlog milestone Jan 4, 2022
ghost pushed a commit that referenced this issue May 5, 2022
`TextAttribute` and `TextColor` are commonly used structures in hot paths.
This commit replaces more complex comparisons where each field is compared
independently with a single call to `memcmp`. This compiles down to just
a few instructions. This reduces code and binary size and improves
performance for paths where `TextAttribute`s need to be compared.
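
As a rough analogue of the technique (not the actual C++ change), the difference between comparing fields one by one and comparing the raw bytes of a struct can be shown with `ctypes`. The `Attr` layout below is a hypothetical stand-in and not the real `TextAttribute`. Its fields are chosen so the struct has no padding bytes, which a `memcmp`-style comparison relies on.

```python
import ctypes


class Attr(ctypes.Structure):
    # Hypothetical stand-in, NOT the real TextAttribute layout.
    # 4 + 4 + 2 + 2 bytes: no padding between or after the fields.
    _fields_ = [
        ("foreground", ctypes.c_uint32),
        ("background", ctypes.c_uint32),
        ("flags", ctypes.c_uint16),
        ("link_id", ctypes.c_uint16),
    ]


def equal_fieldwise(a, b):
    """Compare each field independently."""
    return (
        a.foreground == b.foreground
        and a.background == b.background
        and a.flags == b.flags
        and a.link_id == b.link_id
    )


def equal_bytewise(a, b):
    """Compare the raw object bytes in one step, similar in spirit to
    memcmp(&a, &b, sizeof(a)) == 0 in C or C++."""
    return bytes(a) == bytes(b)


if __name__ == "__main__":
    x = Attr(0xFFFFFF, 0x000000, 0x0001, 7)
    y = Attr(0xFFFFFF, 0x000000, 0x0001, 7)
    z = Attr(0xFFFFFF, 0x000000, 0x0002, 7)
    print(equal_fieldwise(x, y), equal_bytewise(x, y))
    print(equal_fieldwise(x, z), equal_bytewise(x, z))
```

Both functions agree here because every byte of the struct belongs to a field. With padding bytes of unspecified value, a bytewise comparison in C or C++ could report a difference between logically equal objects.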

## PR Checklist

* [x] Supports #10563
* [x] CLA signed. If not, go over [here](https://cla.opensource.microsoft.com/microsoft/Terminal) and sign the CLA
* [x] Tests added/passed
* [ ] Documentation updated. If checked, please file a pull request on [our docs repo](https://github.com/MicrosoftDocs/terminal) and link it here: #xxx
* [ ] Schema updated.
* [ ] I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx

## Validation Steps Performed

* termbench still works ✔️

Co-authored-by: Leonard Hecker <lhecker@microsoft.com>
2 participants