Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DisplayList] Allow random access to ops through indexing #54484

Merged
merged 1 commit into from
Aug 13, 2024

Conversation

flar
Copy link
Contributor

@flar flar commented Aug 9, 2024

Being able to reorder rendering commands leads to optimization opportunities in the graphics package. A graphics package being fed from a DisplayList either has to take the commands in the order given or implement their own storage format for the rendering data.

With this new dispatching mechanism, the graphics package can both query basic information about the recorded ops and even dispatch them by the index into the list. Query information includes either the "category" of the op (clip/transform/render, etc.) or a specific op type enum. The package can dispatch some categories (or ops) immediately and remember other categories (or ops) along with their state for dispatching later.

@flar
Copy link
Contributor Author

flar commented Aug 9, 2024

Alternative implementation using a "Bookmark" concept: #54299

@flar
Copy link
Contributor Author

flar commented Aug 10, 2024

Some notes on simplification, because it looks like a huge number of changes at first.

The main source of diffs was reorganizing the way that culled dispatching is done. It used to be a complicated interaction between the dispatch method maintaining state in a Culler object with 2 different implementations for culled and unculled dispatches. These objects then maintained yet more state in a DispatchContext object that was handed to the ops. The ops would then apply their own logic to the DispatchContext object to figure out if they should dispatch or skip themselves. The dispatching loop in Dispatch(receiver, ptr, end, culler) method then had to coordinate all of these objects.

All of that is deleted now and we simply get the vector of rects from the rtree, then a single method (RTreeResultsToIndexVector) turns that into a vector of DlIndex to dispatch. All of the culling logic is now encapsulated in that one method. The logic in that method should match all of the logic that used to be distributed between Culler, DispatchContext, all of the *Ops, and the Dispatch(receiver, ptr, end, culler) method.

It is worth noting that the performance of culled dispatching is better with this new arrangement even though an extra vector is allocated.

@flar
Copy link
Contributor Author

flar commented Aug 10, 2024

Dispatch benchmarks before chnages:

Benchmark                                                Time             CPU   Iterations
------------------------------------------------------------------------------------------
BM_DisplayListDispatchDefault/kDefaultNoRtree         5.73 us         5.72 us       111735
BM_DisplayListDispatchDefault/kDefaultWithRtree       5.74 us         5.73 us       121210
BM_DisplayListDispatchCull/kCulledWithRtree           7.91 us         7.89 us        89708

After:

Benchmark                                                      Time             CPU   Iterations
------------------------------------------------------------------------------------------------
BM_DisplayListDispatchDefault/kDefaultNoRtree               6.15 us         6.12 us       104615
BM_DisplayListDispatchDefault/kDefaultWithRtree             6.10 us         6.07 us       114880
BM_DisplayListDispatchCull/kCulledWithRtree                 6.21 us         6.15 us       112998
BM_DisplayListDispatchByIndexDefault/kDefaultNoRtree        8.77 us         8.73 us        80377
BM_DisplayListDispatchByVectorDefault/kDefaultNoRtree       9.19 us         9.16 us        76627
BM_DisplayListDispatchByVectorCull/kCulledWithRtree         7.35 us         7.29 us        94698

Default dispatching took a minor 6% hit (keep in mind that these are microseconds and the DisplayList is huge), but culled dispatching got faster by 22%.

Dispatching manually by indices is slower, but the added flexibility of being able to control the dispatch order will hopefully make up for it - and these are measured in microseconds for a large display list.

---------------- Update ----------------

Made some good improvements by asserting the unsigned nature of the index type to prove that < 0 tests are unnecessary and also turned some FML_CHECK calls into FML_DCHECK because we should be able to trust our offsets_ vector as long as the index has been range checked...

Benchmark                                                        Time             CPU   Iterations
--------------------------------------------------------------------------------------------------
BM_DisplayListDispatchDefault/kDefaultNoRtree                 5.96 us         5.95 us       107729
BM_DisplayListDispatchDefault/kDefaultWithRtree               5.99 us         5.98 us       115678
BM_DisplayListDispatchCull/kCulledWithRtree                   6.06 us         6.05 us       115094
BM_DisplayListDispatchByIndexDefault/kDefaultNoRtree          7.28 us         7.25 us        96080
BM_DisplayListDispatchByIteratorDefault/kDefaultNoRtree       7.28 us         7.27 us        96347
BM_DisplayListDispatchByVectorDefault/kDefaultNoRtree         7.73 us         7.71 us        90363
BM_DisplayListDispatchByVectorCull/kCulledWithRtree           6.67 us         6.66 us       104709

The new data looks like default dispatch is better by a couple of %, but I think that is just run-to-run variance.
The overhead of Indexed iteration is down from 42% to just 21% and you can use an iterator paradigm for the same cost (for (auto i : *dl)).

@jonahwilliams
Copy link
Member

Overhead in us seems small enough to not worry about. Will take more look at the code next week, but I like the idea and seems fairly straightforward

Comment on lines 234 to 237
DlIndex end = display_list->record_count();
for (DlIndex i = 0u; i < end; i++) {
display_list->Dispatch(receiver, i);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is really the meat of what this change allows: dispatch takes an offset and so I can choose to dispatch at any particular offset.

The tricky part is the op kinds right? We'd still need to manage things like paint state, clip state, transform state and distinguish between them when re-dispatching, correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's where GetOpCategory comes into play. You can easily figure out if it is a Save/Restore or a Rendering op or an Attribute/Transform/Clip op.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the latest change you can also do for (DlIndex i : *display_list) - woot...

@flar flar force-pushed the dl-indexing branch 3 times, most recently from d2d0527 to b65b742 Compare August 12, 2024 18:14
@jonahwilliams
Copy link
Member

Some other potential usage of this: we may compute that a saveLayer could be edlided during dispatch: it would be nice if we could manipulate the dispatch/iterator structure to skip to the corresponding restore. Right now with experimental canvas we need to rely on the cull rect checks

@flar
Copy link
Contributor Author

flar commented Aug 12, 2024

Some other potential usage of this: we may compute that a saveLayer could be edlided during dispatch: it would be nice if we could manipulate the dispatch/iterator structure to skip to the corresponding restore. Right now with experimental canvas we need to rely on the cull rect checks

You just skip the Dispatch(receiver, index) calls until the category display_list->GetOpCategory(index) is Restore.

Check out the new DisplayListOpCategory enum in display_list.h - I figured that would include the necessary information you might need in order to control your dispatch. If not, it could be enhanced. I'd discourage using GetOpType as that is very specific and kind of relies on the current state of how we encode the various ops (for instance, there are 2 types associated with SaveLayer). It's useful for debugging mostly, but if we need to rely on it in any way during production-level iteration then we should look for other solutions.

@flar
Copy link
Contributor Author

flar commented Aug 12, 2024

Right now with experimental canvas we need to rely on the cull rect checks

Is there a reason why this wouldn't already be culled by the culling form of dispatch? (You can still do culling with indexed dispatches, btw - you just use the new GetCulledIndices method to drive your dispatch loop.)

@jonahwilliams
Copy link
Member

I'm not certain why it doesn't happen. I can say that there are scenarios with the aiks canvas today where aiks will cull save layers that display list does not.

@flar
Copy link
Contributor Author

flar commented Aug 12, 2024

I'm not certain why it doesn't happen. I can say that there are scenarios with the aiks canvas today where aiks will cull save layers that display list does not.

Oh, right, like when the DL had to make a pessimal assumption about the destination size because it didn't actually have a destination in mind when it was recorded. Thus, there may be additional culling that can happen in the final stages like Impeller.

Copy link
Member

@jonahwilliams jonahwilliams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with minor nit

// then gets reset to the value stored in the stack top
if (info.save_was_needed) {
FML_DCHECK(next_restore_index < offsets_.size());
indices.emplace_back(next_restore_index);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nit, but I would prefer push_back if we're not using the aggregate initialization features of emplace_back.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were other places that also used emplace_back with a simple integer type (all in this PR) so I changed them all. Only one emplace_back that was actually invoking an aggregate constructor.

@flar flar added the autosubmit Merge PR when tree becomes green via auto submit App label Aug 13, 2024
@auto-submit auto-submit bot merged commit db034a5 into flutter:main Aug 13, 2024
29 checks passed
auto-submit bot pushed a commit to flutter/flutter that referenced this pull request Aug 14, 2024
…sions) (#153413)

Manual roll requested by zra@google.com

flutter/engine@019f9e3...5909666

2024-08-14 98614782+auto-submit[bot]@users.noreply.github.com Reverts "Roll Dart SDK from 44635f897535 to 733062367c2e (1 revision) (#54541)" (flutter/engine#54552)
2024-08-14 98614782+auto-submit[bot]@users.noreply.github.com Reverts "macOS: Clean up create_ios_framework.py (#54543)" (flutter/engine#54550)
2024-08-14 98614782+auto-submit[bot]@users.noreply.github.com Reverts "macOS: refactor create_macos_framework.py (#54546)" (flutter/engine#54549)
2024-08-14 chris@bracken.jp macOS: refactor create_macos_framework.py (flutter/engine#54546)
2024-08-13 flar@google.com [DisplayList] Allow random access to ops through indexing (flutter/engine#54484)
2024-08-13 mit@google.com Update dartdoc for gpu.dart (flutter/engine#54529)
2024-08-13 chris@bracken.jp macOS: Clean up create_ios_framework.py (flutter/engine#54543)
2024-08-13 skia-flutter-autoroll@skia.org Roll Dart SDK from 44635f897535 to 733062367c2e (1 revision) (flutter/engine#54541)
2024-08-13 skia-flutter-autoroll@skia.org Roll Fuchsia Linux SDK from 2LTVy4Gv76DcMpz4V... to MeV5i7xXXFPHF5sBK... (flutter/engine#54542)
2024-08-13 1961493+harryterkelsen@users.noreply.github.com [canvaskit] Add animation detection for GIFs (flutter/engine#54483)
2024-08-13 matej.knopp@gmail.com Preserve background frame damage (flutter/engine#54540)
2024-08-13 68449066+zijiehe-google-com@users.noreply.github.com [fuchsia] Use BundledTestRunner from test-scripts (flutter/engine#54404)

Also rolling transitive DEPS:
  fuchsia/sdk/core/linux-amd64 from 2LTVy4Gv76Dc to MeV5i7xXXFPH

If this roll has caused a breakage, revert this CL and stop the roller
using the controls here:
https://autoroll.skia.org/r/flutter-engine-flutter-autoroll
Please CC rmistry@google.com,zra@google.com on the revert to ensure that a human
is aware of the problem.

To file a bug in Flutter: https://github.com/flutter/flutter/issues/new/choose

To report a problem with the AutoRoller itself, please file a bug:
https://issues.skia.org/issues/new?component=1389291&template=1850622

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+doc/main/autoroll/README.md
DBowen33 pushed a commit to DBowen33/flutter that referenced this pull request Aug 16, 2024
…sions) (flutter#153413)

Manual roll requested by zra@google.com

flutter/engine@019f9e3...5909666

2024-08-14 98614782+auto-submit[bot]@users.noreply.github.com Reverts "Roll Dart SDK from 44635f897535 to 733062367c2e (1 revision) (flutter#54541)" (flutter/engine#54552)
2024-08-14 98614782+auto-submit[bot]@users.noreply.github.com Reverts "macOS: Clean up create_ios_framework.py (flutter#54543)" (flutter/engine#54550)
2024-08-14 98614782+auto-submit[bot]@users.noreply.github.com Reverts "macOS: refactor create_macos_framework.py (flutter#54546)" (flutter/engine#54549)
2024-08-14 chris@bracken.jp macOS: refactor create_macos_framework.py (flutter/engine#54546)
2024-08-13 flar@google.com [DisplayList] Allow random access to ops through indexing (flutter/engine#54484)
2024-08-13 mit@google.com Update dartdoc for gpu.dart (flutter/engine#54529)
2024-08-13 chris@bracken.jp macOS: Clean up create_ios_framework.py (flutter/engine#54543)
2024-08-13 skia-flutter-autoroll@skia.org Roll Dart SDK from 44635f897535 to 733062367c2e (1 revision) (flutter/engine#54541)
2024-08-13 skia-flutter-autoroll@skia.org Roll Fuchsia Linux SDK from 2LTVy4Gv76DcMpz4V... to MeV5i7xXXFPHF5sBK... (flutter/engine#54542)
2024-08-13 1961493+harryterkelsen@users.noreply.github.com [canvaskit] Add animation detection for GIFs (flutter/engine#54483)
2024-08-13 matej.knopp@gmail.com Preserve background frame damage (flutter/engine#54540)
2024-08-13 68449066+zijiehe-google-com@users.noreply.github.com [fuchsia] Use BundledTestRunner from test-scripts (flutter/engine#54404)

Also rolling transitive DEPS:
  fuchsia/sdk/core/linux-amd64 from 2LTVy4Gv76Dc to MeV5i7xXXFPH

If this roll has caused a breakage, revert this CL and stop the roller
using the controls here:
https://autoroll.skia.org/r/flutter-engine-flutter-autoroll
Please CC rmistry@google.com,zra@google.com on the revert to ensure that a human
is aware of the problem.

To file a bug in Flutter: https://github.com/flutter/flutter/issues/new/choose

To report a problem with the AutoRoller itself, please file a bug:
https://issues.skia.org/issues/new?component=1389291&template=1850622

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+doc/main/autoroll/README.md
bdero added a commit that referenced this pull request Aug 20, 2024
flar added a commit to flar/engine that referenced this pull request Aug 21, 2024
Buchimi pushed a commit to Buchimi/flutter that referenced this pull request Sep 2, 2024
…sions) (flutter#153413)

Manual roll requested by zra@google.com

flutter/engine@019f9e3...5909666

2024-08-14 98614782+auto-submit[bot]@users.noreply.github.com Reverts "Roll Dart SDK from 44635f897535 to 733062367c2e (1 revision) (flutter#54541)" (flutter/engine#54552)
2024-08-14 98614782+auto-submit[bot]@users.noreply.github.com Reverts "macOS: Clean up create_ios_framework.py (flutter#54543)" (flutter/engine#54550)
2024-08-14 98614782+auto-submit[bot]@users.noreply.github.com Reverts "macOS: refactor create_macos_framework.py (flutter#54546)" (flutter/engine#54549)
2024-08-14 chris@bracken.jp macOS: refactor create_macos_framework.py (flutter/engine#54546)
2024-08-13 flar@google.com [DisplayList] Allow random access to ops through indexing (flutter/engine#54484)
2024-08-13 mit@google.com Update dartdoc for gpu.dart (flutter/engine#54529)
2024-08-13 chris@bracken.jp macOS: Clean up create_ios_framework.py (flutter/engine#54543)
2024-08-13 skia-flutter-autoroll@skia.org Roll Dart SDK from 44635f897535 to 733062367c2e (1 revision) (flutter/engine#54541)
2024-08-13 skia-flutter-autoroll@skia.org Roll Fuchsia Linux SDK from 2LTVy4Gv76DcMpz4V... to MeV5i7xXXFPHF5sBK... (flutter/engine#54542)
2024-08-13 1961493+harryterkelsen@users.noreply.github.com [canvaskit] Add animation detection for GIFs (flutter/engine#54483)
2024-08-13 matej.knopp@gmail.com Preserve background frame damage (flutter/engine#54540)
2024-08-13 68449066+zijiehe-google-com@users.noreply.github.com [fuchsia] Use BundledTestRunner from test-scripts (flutter/engine#54404)

Also rolling transitive DEPS:
  fuchsia/sdk/core/linux-amd64 from 2LTVy4Gv76Dc to MeV5i7xXXFPH

If this roll has caused a breakage, revert this CL and stop the roller
using the controls here:
https://autoroll.skia.org/r/flutter-engine-flutter-autoroll
Please CC rmistry@google.com,zra@google.com on the revert to ensure that a human
is aware of the problem.

To file a bug in Flutter: https://github.com/flutter/flutter/issues/new/choose

To report a problem with the AutoRoller itself, please file a bug:
https://issues.skia.org/issues/new?component=1389291&template=1850622

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+doc/main/autoroll/README.md
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
autosubmit Merge PR when tree becomes green via auto submit App
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants