Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce RenderClusterIterator #3458

Closed

Conversation

skyline75489
Copy link
Collaborator

Summary of the Pull Request

This is the successor of #3438 . The idea is based on @miniksa 's comment originally posted here #3438 (comment)

We could perhaps wrap a TextBufferCellIterator into a RenderClusterIterator, tell the RenderClusterIterator that it needs to return operator bool() as false when the underlying TextBufferCellIterator::TextAttr changes on increment, and provides accessors only for the character string and column string into the underlying cell iterator (which is a view directly into the text buffer).

References

#3075 #3438

PR Checklist

  • Closes #xxx
  • CLA signed. If not, go over here and sign the CLA
  • Tests added/passed
  • Requires documentation to be updated
  • I've discussed this with core contributors already. If not checked, I'm ready to accept this work might be rejected in favor of a different grand plan. Issue number where discussion took place: #xxx

Detailed Description of the Pull Request / Additional comments

This is still WIP, though. Would love to hear everybody's feedback.

Validation Steps Performed

Just pretty much everything should be validated. Vim, cat, top, cmatrix, and so on.

short totalWidth = 0;
for (const auto& cluster : clusters)
unclusteredString.reserve(40);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of magic number. But without reserving space, this will run much slower. Should we pass size alongside clusterIter? I'm not sure about this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh... perhaps preallocate the number of cells width from coord to the right edge of the known dirty rect (see impl of GetDirtyRectInChars)

All of the engines should know a little bit about what area is actually invalid and they're always given a drawing coordinate.

Failing this, instead of allocating and reserving on each call to _Paint*BufferLine, they might be able to allocate the buffer when the StartPaint() comes in as a dirty-rect-wide string/vector and just re-use that one for every row and free it when the EndPaint() comes around. That'd make way fewer allocations per frame than one per call.

@@ -42,11 +42,14 @@ CustomTextLayout::CustomTextLayout(gsl::not_null<IDWriteFactory1*> const factory
_localeName.resize(gsl::narrow_cast<size_t>(format->GetLocaleNameLength()) + 1); // +1 for null
THROW_IF_FAILED(format->GetLocaleName(_localeName.data(), gsl::narrow<UINT32>(_localeName.size())));

for (const auto& cluster : clusters)
RenderClusterIterator it = clusterIter;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we are using RenderClusterIterator const, the copy is needed. It could be RenderClusterIterator& and then it can be modifed by the caller. But that's just kind of implicit to me. I prefer the copying.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, I like what you did. It's constant for the specification of the custom layout, but then the internals of the layout makes the copy to walk through it.

// And hold the point where we should start drawing.
auto screenPoint = target;

// This outer loop will continue until we reach the end of the text we are trying to draw.
while (it)
{
TextBufferCellIterator runStartIt(it);
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is inner loop is over clusterIter will be exhausted. So I copy it and keep the start of the run here.

// Advance the cluster and column counts.
const auto columnCount = clusters.back().GetColumns();
it += columnCount > 0 ? columnCount : 1; // prevent infinite loop for no visible columns
const auto columnCount = (*clusterIter).GetColumns();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calculating the cols is cumbersome. , I don't know how to do this in a more elegant way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. This might actually justify making the PaintBufferLine's first arg be a & instead.

Save a copy of it on the way in, let the engine increment it, then add a method that lets you difference the two of them.

Diff the resultant one you get back versus the one you initially created and it should tell you how many columns it walked. Then you don't have to cycle through the loop multiple times.

That's what I did for the cell/text iterators with .GetCellDistance and whatnot.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure. I don't think it's enough to justify this. Making it & sounds very error-prone and implicit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make two of them? Take a const input and then have a second & as the output one?

It's not uncommon to use an Inout parameter, but if you think its safer to have an explicit In and Out (given we can't use the return code as it's always HRESULT for these thing), it seems OK.

const COORD coord,
const bool trimLeft) noexcept
{
try
{
const auto cchLine = clusters.size();
size_t cchLine = 120;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, without size I can't make use of length to init arrays.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, decent estimate is coord to right edge of dirty rect.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An estimate is not enough, though. I need the exact size here.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterating it two times, could it be that bad? I'm stucked...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not strictly. You could replace things like auto pwsPoly = std::make_unique<wchar_t>(cchLine) instead with a std::wstring and .reserve() it up to the "estimate" of _rcInvalid.Right - coord.X. Same goes for the rgdxPoly with a vector of ints or something. (or use vectors for both).

The first phase with the for loop has to iterate the whole thing anyway to condense the clusters back into a string. Either you'll get lucky in the majority case and the estimate won't resize the vectors, or you'll get unlucky in the uncommon case and you'll resize up once or twice. (And if resizing up happens a lot, we can use an estimate function like the CustomTextLayout::_EstimateGlyphCount does to reduce the potential resizes.)

By the end of the for loop (lines 316-325), you should be able to know the exact widths required for all the functions further down PaintBufferLine and have only iterated once. E.g. you should have calculated cchLine exactly by iterating once and also filled a vector pwsPoly and vector rgdxPoly with the condensed version. That should be enough information to finish the draw operation.

However, now that I look at this further. The loop in this specific GDI renderer only allows a maximum of one wchar_t per unit because of GDI limitation. (comment on L322 where cluster.GetTextAsSingle() is called. So the "estimate" used for .reserve() above should actually be the exact column count as a Cluster is a console grid unit described by one-or-more-wchar_t and the vectors wouldn't auto-resize.

The last problem is that the vectors would free when going out of scope. There isn't really a detach to the vectors. We could just hold a reference to them that is freed inside _FlushBufferLines after PolyTextOut is called.

Member variables on the class like:
std::vector<std::vector<wchar_t>> _polyStrings and std::vector<std::vector<int>> _polyWidths, push a new vector into those on each PaintBufferLine, take the .data() of the line and stuff it into the pPolyTextLine fields, and then clear them out on _FlushBufferLines.

That might make _FlushBuferLines a little cleaner than raw delete[]s of bare pointers as well.

@skyline75489
Copy link
Collaborator Author

I don't know what I did wrong but I'm getting the same error as CI build:

src\buffer\out\TextColor.h(37,10): Error C1083: Cannot open include file: 'WexTestClass.h': No such file or directory (compiling source file ..\paint.cpp)

Copy link
Member

@miniksa miniksa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly the direction I was hoping you would go with this. Pretty much perfectly what I was envisioning with the comment I left yesterday. Thank you!

short totalWidth = 0;
for (const auto& cluster : clusters)
unclusteredString.reserve(40);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uh... perhaps preallocate the number of cells width from coord to the right edge of the known dirty rect (see impl of GetDirtyRectInChars)

All of the engines should know a little bit about what area is actually invalid and they're always given a drawing coordinate.

Failing this, instead of allocating and reserving on each call to _Paint*BufferLine, they might be able to allocate the buffer when the StartPaint() comes in as a dirty-rect-wide string/vector and just re-use that one for every row and free it when the EndPaint() comes around. That'd make way fewer allocations per frame than one per call.

@@ -23,17 +23,17 @@ namespace Microsoft::Console::Render
public:
Cluster(const std::wstring_view text, const size_t columns);

const wchar_t GetTextAsSingle() const noexcept;
wchar_t GetTextAsSingle() const noexcept;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did we lose consts here?

@@ -42,11 +42,14 @@ CustomTextLayout::CustomTextLayout(gsl::not_null<IDWriteFactory1*> const factory
_localeName.resize(gsl::narrow_cast<size_t>(format->GetLocaleNameLength()) + 1); // +1 for null
THROW_IF_FAILED(format->GetLocaleName(_localeName.data(), gsl::narrow<UINT32>(_localeName.size())));

for (const auto& cluster : clusters)
RenderClusterIterator it = clusterIter;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, I like what you did. It's constant for the specification of the custom layout, but then the internals of the layout makes the copy to walk through it.

@@ -1026,7 +1026,7 @@ void VtRendererTest::XtermTestCursor()
clusters.emplace_back(std::wstring_view{ &line[i], 1 }, static_cast<size_t>(rgWidths[i]));
}

VERIFY_SUCCEEDED(engine->PaintBufferLine({ clusters.data(), clusters.size() }, { 1, 1 }, false));
// VERIFY_SUCCEEDED(engine->PaintBufferLine({ clusters.data(), clusters.size() }, { 1, 1 }, false));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TBD I presume

_dwriteFontFace.Get(),
{ &cluster, 1 },
_glyphCell.cx);
//// Create the text layout
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also TBD, I presume

const COORD coord,
const bool trimLeft) noexcept
{
try
{
const auto cchLine = clusters.size();
size_t cchLine = 120;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, decent estimate is coord to right edge of dirty rect.

// Advance the cluster and column counts.
const auto columnCount = clusters.back().GetColumns();
it += columnCount > 0 ? columnCount : 1; // prevent infinite loop for no visible columns
const auto columnCount = (*clusterIter).GetColumns();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. This might actually justify making the PaintBufferLine's first arg be a & instead.

Save a copy of it on the way in, let the engine increment it, then add a method that lets you difference the two of them.

Diff the resultant one you get back versus the one you initially created and it should tell you how many columns it walked. Then you don't have to cycle through the loop multiple times.

That's what I did for the cell/text iterators with .GetCellDistance and whatnot.

@miniksa
Copy link
Member

miniksa commented Nov 6, 2019

Just overall, did you check a trace on this (can I see)? I presume it's better than the original but maybe not as perfect as the more memory-intense previous iteration.


VERIFY_SUCCEEDED(engine->PaintBufferLine({ clusters1.data(), clusters1.size() }, { 0, 0 }, false));
VERIFY_SUCCEEDED(engine->PaintBufferLine({ clusters2.data(), clusters2.size() }, { 0, 1 }, false));
const COORD screenBufferSize{ 119, 9029 };
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just way too much ... kinda hurtful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's suppose to be a unit test that focuses on this certain VT thing. Now I dragged all the other units like TextBuffer. I'm not a fan of this..

static const size_t s_cPolyTextCache = 80;
POLYTEXTW _pPolyText[s_cPolyTextCache];
static constexpr size_t s_cPolyTextCache = 80;
std::array<POLYTEXTW, s_cPolyTextCache> _polyText;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured that all those containers have exact same fixed size. Using std::array is more reasonable

// Exit early if there are no lines to draw.
RETURN_HR_IF(S_OK, 0 == cchLine);

const size_t preallocateSize = (_rcInvalid.right - _rcInvalid.left) / 8;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that 8 is the magic number needed here to get the correct size? Any GDI+ reference that explains this?

}
}

_polyStrings = {};
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This certainly is safer and better than raw delete[].

return !(*this == it);
}

RenderClusterIterator& RenderClusterIterator::operator+=(const ptrdiff_t& movement)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently RenderClusterIterator is not aware of the dbcs columns. Caller must use clusterIter += cols > 0 ? cols : 1 to skip dbcs trailing char. I've tried to let RenderClusterIterator handle this itself, but failed to do so because the code gets too complicated and implicit.

@skyline75489
Copy link
Collaborator Author

skyline75489 commented Nov 8, 2019

Plain catting is a lot faster with this PR. The trace is I think as good as #3438, if not better.

图片

I found something interesting. Plain cmatrix performance is actually worse with this PR. I think that render thread speeding up saves time so that the output thread can write more data, which backfires itself, causing render thread to be even more busy. That is probably where differential drawing will come in handy.

Update: With #3480 merged, cmatrix is smooth again.

@@ -18,5 +18,5 @@
<!-- DONT ADD NEW FILES HERE, ADD THEM TO vt-renderer-common.vcxitems -->
<!-- Careful reordering these. Some default props (contained in these files) are order sensitive. -->
<Import Project="$(SolutionDir)src\common.build.post.props" />
<!-- <Import Project="$(SolutionDir)src\common.build.tests.props" /> -->
<Import Project="$(SolutionDir)src\common.build.tests.props" />
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know why but I need this to make CI and my local VS happy. Wonder why master is still green without it

@skyline75489
Copy link
Collaborator Author

Mostly this is ready now. Tests are finally green.

@skyline75489
Copy link
Collaborator Author

This needs to hold, until #3546 is solved. I'm seeing even worse result with this PR.

@skyline75489
Copy link
Collaborator Author

OK this PR is not the cause of the regression. It's probabaly #3474. But still let's hold this until the regression is solved.

@zadjii-msft zadjii-msft added Area-Performance Performance-related issue Area-Rendering Text rendering, emoji, complex glyph & font-fallback issues labels Dec 16, 2019
@zadjii-msft zadjii-msft self-requested a review December 16, 2019 21:10
@skyline75489
Copy link
Collaborator Author

Now that #778 is closed, this does not seem necessary. Close this for now,

@miniksa
Copy link
Member

miniksa commented May 6, 2020

Thanks for this anyway, @skyline75489. I do want to get this done at some point in the future and will likely use this as a template whenever that time comes. But you're right, it's not as critical right now anymore.

@skyline75489 skyline75489 deleted the fix/renderClusterIter branch January 25, 2021 02:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Performance Performance-related issue Area-Rendering Text rendering, emoji, complex glyph & font-fallback issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants