Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eliminate more transient allocations: Titles and invalid rectangles and bitmap runs and utf8 conversions #8621

Merged
23 commits merged into from
Feb 16, 2021

Conversation

miniksa
Copy link
Member

@miniksa miniksa commented Dec 19, 2020

References

PR Checklist

Detailed Description of the Pull Request / Additional comments

Window Title Generation

Every time the renderer checks the title, it's doing two bad things that
I've fixed:

  1. It's assembling the prefix to the full title doing a concatenation.
    No one ever gets just the prefix ever after it is set besides the
    concat. So instead of storing prefix and the title, I store the
    assembled prefix + title and the bare title.
  2. A copy must be made because it was returning std::wstring instead
    of std::wstring&. Now it returns the ref.

Dirty Area Return

Every time the renderer checks the dirty area, which is sometimes
multiple times per pass (regular text printing, again for selection,
etc.), a vector is created off the heap to return the rectangles. The
consumers only ever iterate this data. Now we return a span over a
rectangle or rectangles that the engine must store itself.

  1. For some renderers, it's always a constant 1 element. They update
    that 1 element when dirty is queried and return it in the span with a
    span size of 1.
  2. For other renderers with more complex behavior, they're already
    holding a cached vector of rectangles. Now it's effectively giving
    out the ref to those in the span for iteration.

Bitmap Runs

The til::bitmap used a std::optional<std::vector<til::rectangle>>
inside itself to cache its runs and would clear the optional when the
runs became invalidated. Unfortunately doing .reset() to clear the
optional will destroy the underlying vector and have it release its
memory. We know it's about to get reallocated again, so we're just going
to make it a std::pmr::vector and give it a memory pool.

The alternative solution here was to use a bool and
std::vector<til::rectangle> and just flag when the vector was invalid,
but that was honestly more code changes and I love excuses to try out
PMR now.

Also, instead of returning the ref to the vector... I'm just returning a
span now. Everyone just iterates it anyway, may as well not share the
implementation detail.

UTF-8 conversions

When testing with Terminal and looking at the conhost.exe's PTY
renderer, it spends a TON of allocation time on converting all the
UTF-16 stuff inside to UTF-8 before it sends it out the PTY. This was
because ConvertToA was allocating a string inside itself and returning
it just to have it freed after printing and looping back around again...
as a PTY does.

The change here is to use til::u16u8 that accepts a buffer out
parameter so the caller can just hold onto it.

Validation Steps Performed

  • big.txt in conhost.exe (GDI renderer)
  • big.txt in Terminal (DX, PTY renderer)
  • Ensure WDDM and BGFX build under Razzle with this change.

…so it's not assembled every time the renderer asks.
…es and returned as iterable spans to avoid more allocations, especially for renderers with complex dirty areas that are already storing the rectangles internally.
@miniksa miniksa added Product-Conhost For issues in the Console codebase Area-Performance Performance-related issue Product-Conpty For console issues specifically related to conpty Issue-Bug It either shouldn't be doing this or needs an investigation. Product-Terminal The new Windows Terminal. labels Dec 19, 2020
@miniksa
Copy link
Member Author

miniksa commented Dec 19, 2020

With this PR, #8617, #8619, and #8618... the sum of transient allocations from outputting big.txt drops from approximately 250,000 to 232.

@miniksa
Copy link
Member Author

miniksa commented Dec 19, 2020

Before:
image
image
image

After:
image

Hey look at that. The RenderThread is completely gone from the transient allocation table. (Note these traces include #8617, #8618, #8619 that's why the total count is so low now. But the delta is only this PR's worth of commits.)

src/host/consoleInformation.cpp Outdated Show resolved Hide resolved
@ghost ghost added the Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something label Dec 24, 2020
@ghost ghost added the No-Recent-Activity This issue/PR is going stale and may be auto-closed without further activity. label Dec 31, 2020
@ghost
Copy link

ghost commented Dec 31, 2020

This pull request has been automatically marked as stale because it has been marked as requiring author feedback but has not had any activity for 7 days. It will be closed if no further activity occurs within 7 days of this comment.

@DHowett DHowett removed Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something No-Recent-Activity This issue/PR is going stale and may be auto-closed without further activity. labels Jan 1, 2021
@zadjii-msft zadjii-msft self-assigned this Jan 4, 2021
src/interactivity/onecore/BgfxEngine.cpp Outdated Show resolved Hide resolved
src/interactivity/onecore/BgfxEngine.cpp Show resolved Hide resolved
src/renderer/base/renderer.cpp Show resolved Hide resolved
src/renderer/base/renderer.cpp Show resolved Hide resolved
src/renderer/uia/UiaRenderer.cpp Show resolved Hide resolved
src/renderer/wddmcon/WddmConRenderer.cpp Outdated Show resolved Hide resolved
@ghost ghost added Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something and removed Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something labels Jan 7, 2021
…f thousands of transient allocations as we convert things to emit them as a UTF-8 string.
…over and over when we know we're about to just rebuild a vector of more runs. optional will destroy the internal vector.
@miniksa miniksa changed the title Eliminate more transient allocations: Titles and invalid rectangles Eliminate more transient allocations: Titles and invalid rectangles and bitmap runs and utf8 conversions Jan 8, 2021
@miniksa
Copy link
Member Author

miniksa commented Jan 8, 2021

OOPS MORE TRANSIENT ALLOCATIONS. (u16u8 change and bitmap change to use PMR pool)

Before:
image

After:
image

@ghost ghost added the Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something label Jan 8, 2021
@skyline75489
Copy link
Collaborator

image

@miniksa You are such a wizard on performance! I'm seeing great result comparing this branch with 1.5 Preview. With setting UI and all, 1.6 is going to be awesome 🎉

@miniksa
Copy link
Member Author

miniksa commented Jan 11, 2021

image

@miniksa You are such a wizard on performance! I'm seeing great result comparing this branch with 1.5 Preview. With setting UI and all, 1.6 is going to be awesome 🎉

I owe a lot of credit to @Austin-Lamb this time for setting me down the allocations path. Without that insight, we wouldn't be seeing these PRs!

@ghost ghost removed the Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something label Jan 11, 2021
Copy link
Member

@zadjii-msft zadjii-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool

src/inc/til/bitmap.h Outdated Show resolved Hide resolved
@zadjii-msft zadjii-msft removed their assignment Jan 15, 2021
// or continue to use the optional with a PMR vector inside.
// It's static because we only need one pool to manage memory
// for all vectors of rectangles no matter which bitmap instance is making them.
static std::pmr::unsynchronized_pool_resource& _getPool() noexcept
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will conflict with #8787; should we converge earlier rather than later?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We decided over Teams to converge in this PR, because there's a conflict here anyway.

@@ -32,7 +32,7 @@ VtEngine::VtEngine(_In_ wil::unique_hfile pipe,
_lastTextAttributes(INVALID_COLOR, INVALID_COLOR),
_lastViewport(initialViewport),
_pool(til::pmr::get_default_resource()),
_invalidMap(initialViewport.Dimensions(), &_pool),
_invalidMap(initialViewport.Dimensions(), false, &_pool),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DHowett, if I don't call it with three parameters, it is turning the &_pool into bool true. What.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, automatic coercion of pointer types to booleans (for nullness checks). Ew.

@zadjii-msft
Copy link
Member

@miniksa tools\runformat 😉

@miniksa
Copy link
Member Author

miniksa commented Feb 16, 2021

@miniksa tools\runformat 😉

Yeah yeah yeah yeah.

@miniksa miniksa added the AutoMerge Marked for automatic merge by the bot when requirements are met label Feb 16, 2021
@ghost
Copy link

ghost commented Feb 16, 2021

Hello @miniksa!

Because this pull request has the AutoMerge label, I will be glad to assist with helping to merge this pull request once all check-in policies pass.

p.s. you can customize the way I help with merging this pull request, such as holding this pull request until a specific person approves. Simply @mention me (@msftbot) and give me an instruction to get started! Learn more here.

@ghost ghost merged commit 525be22 into main Feb 16, 2021
@ghost ghost deleted the dev/miniksa/perf_title_and_dirty branch February 16, 2021 20:52
@ghost
Copy link

ghost commented Mar 1, 2021

🎉Windows Terminal Preview v1.7.572.0 has been released which incorporates this pull request.:tada:

Handy links:

DHowett added a commit that referenced this pull request Mar 10, 2021
Dustin L. Howett (3)
* Fix the Host Proxy DLL reference in ServerLib (GH-9129)
* ci: fix spelling
* Format the incoming inbox code

Michael Niksa (1)
* Eliminate more transient allocations: Titles and invalid rectangles and bitmap runs and utf8 conversions (GH-8621)

Related work items: MSFT-31755835
ghost pushed a commit that referenced this pull request Mar 26, 2021
We have been seeing some crashes (#9410) originating from a
use-after-free or a double-free in the renderer. The renderer is
iterating over the dirty rects from the render engine¹ and the rect list
is being freed out from under it.

Things like this are usually the result of somebody manipulating the
renderer's state outside of lock.

Therefore, this pull request introduces some targeted locking fixes
around manipulation of the pattern buffer (which, in turn, changes the
renderer state.)

¹ This was not a problem until #8621, which made the renderer return a
span instead of a copy for the list of dirty rects.

## Validation

I ran Terminal under App Verifier, and introduced a manul delay (under
lock) in the renderer such that the invalid map would definitely have
been invalidated between the renderer taking the lock and the renderer
handling the frame. AppVerif failed us without these locking changes,
and did not do so once they were introduced.

Closes #9410.
DHowett added a commit that referenced this pull request Apr 2, 2021
We have been seeing some crashes (#9410) originating from a
use-after-free or a double-free in the renderer. The renderer is
iterating over the dirty rects from the render engine¹ and the rect list
is being freed out from under it.

Things like this are usually the result of somebody manipulating the
renderer's state outside of lock.

Therefore, this pull request introduces some targeted locking fixes
around manipulation of the pattern buffer (which, in turn, changes the
renderer state.)

¹ This was not a problem until #8621, which made the renderer return a
span instead of a copy for the list of dirty rects.

## Validation

I ran Terminal under App Verifier, and introduced a manul delay (under
lock) in the renderer such that the invalid map would definitely have
been invalidated between the renderer taking the lock and the renderer
handling the frame. AppVerif failed us without these locking changes,
and did not do so once they were introduced.

Closes #9410.

(cherry picked from commit ea3e56d)
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Performance Performance-related issue AutoMerge Marked for automatic merge by the bot when requirements are met Issue-Bug It either shouldn't be doing this or needs an investigation. Product-Conhost For issues in the Console codebase Product-Conpty For console issues specifically related to conpty Product-Terminal The new Windows Terminal.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants