Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issues when setting the "active" pseudoclass #293

Closed
MatthiasJFM opened this issue Mar 31, 2022 · 13 comments
Closed

Performance issues when setting the "active" pseudoclass #293

MatthiasJFM opened this issue Mar 31, 2022 · 13 comments
Labels
performance Performance suggestions or specific issues

Comments

@MatthiasJFM
Copy link

I've noticed that on my application there is a massive performance impact when setting the "active" pseudoclass in Element::ProcessDefaultAction(Event& event).
More specifically setting the class and removing it again triggers a cascade of update calls through the DOM tree, calling through:
Element::Update()
Element::UpdateProperties()
ElementStyle::UpdateDefinition()
StyleSheet::GetElementDefiniton()
StyleSheetNode::IsApplicable()
...where it checks the classnames and pseudoclassnames.
For a little more context I am using the lib on a mobile device, so this is happening in response to touch events - repeatedly tapping the screen drives CPU usage up to 50-60%. Not setting the pseudoclass brings CPU usage down to 8%. I think to repro this you would need to click pretty wildly, but worth having a look into!

Afaict this is being caused by a massive number of calls into IsClassSet() (not so much IsPseudoClassSet() but I guess that's because they appear less in my StyleSheetNodes) when it runs through all the StyleSheetNodes multiple times in an update call.

@mikke89 mikke89 added the performance Performance suggestions or specific issues label Mar 31, 2022
@mikke89
Copy link
Owner

mikke89 commented Mar 31, 2022

Hm, this one is a bit tricky without having RML and stylesheet to test with.

What I can say is that the call tree you list is pretty normal. When you change a class or pseudo class we need to figure which style rules apply to the newly changed element definition, including its children. So this line of updates I would say is expected. On the other hand, the high CPU usage is unusual. From my experience, more CPU time is usually spent on layouting rather than the definition updates, so I'm surprised that you see a lot of time being spent there.

Would you say you have a lot of elements? Or complicated style sheets? If you e.g. have a lot of hidden elements, we still need to update the definition on all of them, so that may be costly.

If you are able to make some RML+RCSS example I can reproduce this with on my end, I would be interested to take a look at that and see if there is anything funny going on.

@MatthiasJFM
Copy link
Author

MatthiasJFM commented Apr 1, 2022

It will be a little difficult to set up an RML+RCSS example, but maybe these values will help.
I'm seeing the issue when tapping quickly on DOM tree elements with a depth of ~9, and the highest number of nodes that IsApplicable is called on in StyleSheet::GetElementDefinition is 84.

My best guess at what is happening is that with every tap/click the whole hover chain is copied to the active chain with each element receiving the "active" pseudoclass. This dirties the definition, on the next Update() call the properties are updated and we run through the stylesheet nodes to see if anything applies. With every node the check consists of string comparison iterating over a vector of strings for classnames and pseudoclassnames.
Then we lift the finger/release the click and the "active" pseudoclass is removed again, repeating what happened above.
Simply multiplying the numbers out would give 1512 calls to IsApplicable per full tap, with more calls to IsClassSet and IsPseudoClassSet.

I think it's a case of loads of vector iteration and string comparison, and if that's the case I think you might be able to pick up a noticeable difference when profiling even for a reduced example.

My case is probably solvable with an RCSS refactor, I think I can bring that 84 nodes down quite a bit. I'm not convinced that will be enough though. I noticed that tag and id get hashed, avoiding the string comparison - perhaps that could be extended to classname and pseudoclassname? Though I can imagine that if it wasn't done there's a reason for it. What perplexes me a little is that I don't have any styles defined with :active in the applicable stylesheets...

As a repro, if you have something at the ready that you can profile I think you can see this happening by:

  • Having a stack of RML elements with classnames (maybe 10 nodes deep) that are all the same size but sit within each other
  • Give the document access to a nice hefty stylesheet with plenty of nodes
  • Click away on your stack of elements!

If my take on this is right that will set and unset :active on all of them with every click, and they will all check the nodes applicable, etc. Should show up on your profiler.

@mikke89
Copy link
Owner

mikke89 commented Apr 1, 2022

I agree that GetElementDefinition and related can be hot paths in terms of performance, so I'm all for devising any strategies to speed this up. Some low-hanging fruit has been done already, such as caching id/tag as you mention, but I'm sure there are plenty of opportunities to do more along these lines. Right now, I believe we have to iterate through all rules that do not use tags or IDs. Thus, classes would certainly be something to look into.

For the record, here is some general advice to speed things up if you experience performance issues with element updates:

  • Try to reduce the number of elements in a document, split them into multiple documents if you can.
  • Use tags and ids in your CSS rules, this helps a lot with the current caching strategy.
  • And of course, more CSS rules takes longer time, so keeping things small is better. Consider separating your style sheets into smaller units to be included as needed.

If we can eliminate some of this advice by speeding things up on the library side I'm all for doing that.

@MatthiasJFM
Copy link
Author

MatthiasJFM commented Apr 6, 2022

Thought about this some more, and I came up with different approaches.
The one I tested and have locally is to simply add a NodeList in parallel to styled_node_index but for hashed classnames. If no id is set for a StyleSheetNode but it has classnames, then the first classname is used for the hash and the node is added to this NodeList; otherwise the node goes to the original NodeList. This works in reducing the number of nodes checked, and brings the CPU usage down by a factor of 2-3. It does appear to have a side effect of image-color applying to some decorator images, not sure why that happens. (IsApplicable(element, false) instead of IsApplicable(element, true) because we only checked class name)

There are two more approaches that I haven't tested but I think may work:

the rules are added to one of several hash maps, according to the selector. There are maps by id, by class name, by tag name and a general map for anything that doesn't fit into those categories. If the selector is an id, the rule will be added to the id map, if it's a class it will be added to the class map etc.
This manipulation makes it much easier to match rules. There is no need to look in every declaration: we can extract the relevant rules for an element from the maps. This optimization eliminates 95+% of the rules, so that they need not even be considered during the matching process

  • Gather a separate map for nodes that have a pseudoclass, and by flag or otherwise only check this list if a pseudoclass is applied. This is probably not the cleanest solution, but is based on the knowledge that a) there is likely to be fewer styles with pseudoclasses defined and b) pseudoclass changes can occur quickly in many places.

@MatthiasJFM
Copy link
Author

Adding another update to this to clarify that the CPU usage seems to only be disproportional when debugging. Testing still shows a CPU usage reduction from ~20% to ~15% with the added hashed class list when tapping repeatedly.

Have you been able to repro any of this when profiling?

@mikke89
Copy link
Owner

mikke89 commented Apr 7, 2022

Good job on the improvements. I did some benchmarking, and I can see that there is indeed a performance issue for anyone using a lot of class-only style rules with no tags or IDs, combined with a lot of elements to update. I suspect that is exactly your case.

Here is a test where we toggle a pseudo class on an element and run Context::Update. Here there are 1700 descendent elements, and testing with adding around 250 unique "dummy" RCSS rules. Different "dummy" rule types for each benchmark:

relative us/op op/s err% total Style rules
100.0% 514.70 1,942.88 1.4% 0.01 Reference (load document)
998.4% 51.55 19,398.64 1.0% 0.00 Reference (update unmodified)
179.3% 287.10 3,483.11 0.5% 0.00 Reference (no style rules)
164.7% 312.50 3,200.00 0.1% 0.00 *
178.7% 288.10 3,471.02 0.3% 0.00 a
181.7% 283.30 3,529.83 0.6% 0.00 #a
176.8% 291.20 3,434.07 0.7% 0.00 a#a
16.6% 3,099.10 322.67 1.0% 0.03 .a
176.9% 290.90 3,437.61 0.6% 0.00 a.a
178.0% 289.10 3,459.01 0.4% 0.00 #a.a
178.6% 288.20 3,469.81 0.4% 0.00 a#a.a

There is one metric that particularly stand out, namely the .a test. That is, when style rules only define a class name and no tag or id. So yeah, this is in-line with the issue you are reporting. I'm guessing we haven't seen this during profiling earlier since the sample style sheet (and my own separate projects) use a lot of tags and ids, only seldom class names only.

So, yeah, we certainly need to fix this as I suspect this is not an all too uncommon use-case. There is a potential 10x performance gain here for the benchmarked use case with a better lookup strategy.

You are certainly on to something there, I'll tinker more with trying different strategies. We should also add some more unit tests to make sure we don't accidentally break anything, there are already some tests for style rules, but we need something more rigorous.

@MatthiasJFM
Copy link
Author

Nice!

To throw it out there: perhaps it's possible to replace String classnames entirely with hashes? It would avoid a load of string comparison as well as a bunch of hashing at runtime. It looks like the string classes are only used for logging and debugging, we could keep the list of strings if debugging is enabled. Alternatively if legible strings are essential we could have a map of hashes to strings?

@mikke89
Copy link
Owner

mikke89 commented Apr 7, 2022

I guess the main issue is hash collisions. Under the assumption that most rules do not match we might get an improvement if we check hash first and then the strings. However, with a better class lookup that assumption might break and possibly be a performance pessimization. Anyway, something to consider for sure :)

Actually, I don't think we handle hash collisions as it is during the cache lookup right now.

mikke89 added a commit that referenced this issue Apr 10, 2022
… style definition, see #293.

- Particularly, performance improvements when using style rules with class names.
- Also fixes hash collisions in tag/id or node lists, previously they could result in the wrong styles being applied.
@mikke89
Copy link
Owner

mikke89 commented Apr 10, 2022

Hey, so I've made some changes (for now in the fast_styles branch). Could you test this for your case?

I've reworked the index a bit. Now we have multiple style cache indices for requirements in prioritized order, including one for classes. I too used the strategy of indexing the first class name.

I also tried to add the pseudo classes to the index. I'm not entirely sure about this one though, since in practice they are not really that unique (usually just a handful ones like :hover and :active) and are typically combined with classes or at least tags. Could you try and see if you get any difference with and without the pseudo class index (added in a separate commit)? If you don't, then I'd feel more comfortable removing it.

Here is a detailed benchmark before and after implementation of the new style index, and separately with the pseudo class index as well.

before (us/op) after (us/op) after, w/pseudo (us/op) ElementStyle (rule name)
508.80 509.20 506.50 Reference (load document)
47.20 45.63 50.80 Reference (update unmodified)
297.00 285.30 282.60 Reference (no style rules)
317.30 312.40 328.00 *
321.30 323.50 322.00 a
323.30 322.30 327.10 #a
291.90 284.20 280.20 div#a
3,203.40 275.10 286.50 .a
1,119.10 286.70 285.50 div.a
310.00 286.30 284.60 #a.a
299.10 285.60 286.60 div#a.a
2,945.30 2,085.60 287.90 :a
825.80 645.80 282.50 div:a
310.10 298.80 284.90 #a:a
293.90 290.80 283.70 div#a:a
3,165.10 290.00 286.20 .a:a
1,114.70 287.10 285.10 div.a:a
306.10 288.10 287.10 #a.a:a
298.30 285.20 291.60 div#a.a:a
300.70 296.70 299.80 * div
2,733.40 2,904.70 2,972.20 a div
2,905.80 2,947.50 3,253.10 #a div
4,248.00 4,894.00 4,902.00 div#a div
4,549.40 3,700.00 4,522.60 .a div
5,739.90 5,257.10 5,514.50 div.a div
2,872.40 2,968.70 3,209.30 #a.a div
4,204.80 5,271.90 4,903.00 div#a.a div
6,363.70 5,851.50 6,291.40 :a div
7,111.50 6,591.80 6,887.40 div:a div
2,884.30 2,980.80 3,285.40 #a:a div
4,639.70 5,423.60 5,102.70 div#a:a div
4,348.90 3,729.50 4,249.50 .a:a div
5,361.40 5,207.90 5,414.20 div.a:a div
2,837.40 3,009.60 3,189.00 #a.a:a div
4,647.00 5,233.40 5,066.40 div#a.a:a div

So now lookup of classes are basically as fast as looking up ids or tags.

What's more is that I've also made sure that we handle hash collisions correctly now. This did cost a bit of performance, but with some other minor improvements we ended up pretty much the same as previously (or slightly better) for the non-class lookups.

@MatthiasJFM
Copy link
Author

Thank you for this! It seems to do great for my case.
I only saw your commit 30d7f60 but no other separate commit for pseudoclasses?
30d7f60 works fine though, my implementation brought CPU usage down from 20% to 15% when tapping quickly, but with this it's down to 8%.

Do let me know if you'd like me to test the pseudoclass case. I agree with your logic that we probably won't need it as the selectors are most likely tied to other identifiers.

@mikke89
Copy link
Owner

mikke89 commented Apr 11, 2022

Ah, those numbers are great to see! Thanks for testing.

The pseudoclass change is at the head of the fast_styles branch, here: e14e923. If you don't see any further measurable improvements with that one, I'll probably drop it.

@MatthiasJFM
Copy link
Author

I see a slight improvement of around one percent when taping quickly, probably not worth the extra work we do in all other cases.

This is very helpful though, thank you!

mikke89 added a commit that referenced this issue Apr 11, 2022
… style definition, see #293.

- Particularly, performance improvements when using style rules with class names.
- Also fixes hash collisions in tag/id or node lists, previously they could result in the wrong styles being applied.
@mikke89
Copy link
Owner

mikke89 commented Apr 11, 2022

Okay, I decided to drop the pseudo class index then.

Now it's all merged to master as well, so I'm closing this issue as it seems we have a substantial speed-up. :) Of course, please let me know if there are any related issues such as wrong styles being applied.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance suggestions or specific issues
Projects
None yet
Development

No branches or pull requests

2 participants