Update Mikado v0.7.46 #660
Here is the update.
Of course, I will. I would really like to see the discussion behind this, although maybe it's better if I don't ^^. But you are absolutely right: the difference in your benchmark is "non-existent", but that also tells me that something is not being measured correctly. I assume the timer isn't captured in the same event loop, which is why the differences are so small. That also makes the results synthetic and unrealistic. This benchmark is good at showing how similarly each implementation performs within a small time frame, let's say around 4 milliseconds. That works well for less performant libraries, but very fast implementations can't show their strength. That is absolutely fine, because it is what makes this benchmark so famous: it does not discriminate against slower libraries, and slower libraries are often more popular. As a consequence, this benchmark can't be used as a measurement tool during development, especially when trying to improve benchmark results. Without my own test suite I wasn't able to optimize anything. But please keep the non-discriminating concept in place. Keyed has nothing to do with data-driven. "This benchmark should really care about data-driven frameworks" isn't… Please consider that my poor English skills might make things read differently from what I meant to say.
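To make the "same event loop" suspicion concrete, here is a minimal sketch under stated assumptions: `renderSync` and `renderAsync` are hypothetical render functions (not Mikado's or any framework's API), and the async one defers its work with `setTimeout(…, 0)` the way a scheduler-based framework might. The sketch records an event log rather than timings so the ordering is visible: a timer started and stopped in one synchronous block stops before deferred work runs.

```typescript
// Hypothetical render functions; they only append to a log instead of touching a DOM.
type RenderFn = (log: string[]) => void;

const renderSync: RenderFn = (log) => {
  log.push("render"); // all work happens inside the caller's synchronous block
};

const renderAsync: RenderFn = (log) => {
  // Defer the work to a later task, as an async/scheduled framework might.
  setTimeout(() => log.push("render"), 0);
};

// Naive measurement: start and stop the timer in the same synchronous block.
function timeSynchronously(fn: RenderFn, log: string[]): number {
  const start = performance.now();
  log.push("timer:start");
  fn(log);
  log.push("timer:stop");
  return performance.now() - start;
}

// Waits one macrotask before stopping, so work deferred via setTimeout(…, 0)
// that was scheduled earlier has already run (timers with equal delay fire in order).
async function timeIncludingNextTask(fn: RenderFn, log: string[]): Promise<number> {
  const start = performance.now();
  log.push("timer:start");
  fn(log);
  await new Promise<void>((resolve) => setTimeout(resolve, 0));
  log.push("timer:stop");
  return performance.now() - start;
}
```

With `renderAsync`, `timeSynchronously` logs `timer:start, timer:stop` and the render only appears afterwards, so the measured time excludes it; `timeIncludingNextTask` captures it, but only because this toy knows how the work is deferred.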
I don't understand why you keep putting yourself out there with unnecessary exposure. Maybe it's a lost-in-translation thing. But now any idiot can claim they have a higher IQ than you. Anyway, with that out of the system, maybe there is some room to turn this conversation productive. I think we are just reaching a threshold on a few of the tests where the run-to-run variability at the top end makes more of a difference than the actual difference in performance. Since everyone is in the same ballpark on those tests, any variation looks considerably larger than it is. You can see the overhead of the proxies in a few key areas, but in general it doesn't spike the swap rows or select tests as much. The tests could be re-run tomorrow and any of a handful of libraries could be the fastest, or faster than Vanilla JS, depending on a good run on a couple of those tests. And you are correct that on those couple of tests some sort of framing is happening in the measurement. Once we are around 16ms or so, all bets are off. Even in the sub-40ms range I think the effect is noticeable. This is due to how the rendering is measured, including paint, as you noticed. The improvements in browser performance and in the computing power of the benchmark machine have even eclipsed the artificial slowdown here. So we are hitting the limits of this, but we are also stepping into an area where minuscule sub-ms differences in performance are never going to be noticeable at any realistic scale, compared with the comparatively infinitely larger cost of the DOM operations. Those DOM operations are the only place to make substantial gains at this point, and the definition of the benchmark (one could argue, reasonably so) restricts what can be done there. But I definitely think that if you can turn your mind to how we can improve around those tests within the spirit of this benchmark, everyone can benefit.
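The "framing" effect can be illustrated with a toy model. This is a sketch under loud assumptions, not a description of how Chrome actually schedules frames: a fixed 60 Hz display, a click that lands `clickOffsetMs` after the last vsync, and a paint that happens at the first vsync after the work finishes. The function and constant names are hypothetical.

```typescript
// Assumed 60 Hz display: about 16.67 ms per frame.
const FRAME_MS_60HZ = 1000 / 60;

// Toy model: the click arrives `clickOffsetMs` after the last vsync, the
// update work takes `workMs`, and the paint lands on the first vsync at or
// after the work finishes. Returns the click-to-paint time an observer would measure.
function observedDurationMs(
  workMs: number,
  clickOffsetMs: number,
  frameMs: number = FRAME_MS_60HZ,
): number {
  const workEnd = clickOffsetMs + workMs;
  const paintAt = Math.ceil(workEnd / frameMs) * frameMs; // snap up to the next frame boundary
  return paintAt - clickOffsetMs;
}
```

Under this model, 3 ms and 5 ms of work both read as one full frame when the click lands right on a vsync, while the same 3 ms of work can read anywhere from 3 ms to almost 20 ms depending on where in the frame the click lands. Differences smaller than a frame are swamped by that phase noise.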
If you have really learned some new optimization techniques from your own benchmarking, suggest some improvements for VanillaJS. One of the second-tier objectives here is to keep Vanilla out in front of all the other libraries. If it isn't, we are doing something wrong. I've chalked this up to variability, but I could be wrong. If you want to show your cleverness, that would definitely be a way too. I wrote Vanilla JS 1 when we were having trouble staying out in front of domc. The same could be true now with Mikado.
First of all, sorry about that, Ryan; that was not aimed at you. The most curious thing of all is that this benchmark implementation of Mikado has almost all optimizations disabled, simply because this benchmark cannot cover them (due to the reloading). You should seriously ask whether something in the benchmark environment might not be working properly. Anyway, I like the diversity of benchmarks, and this one also shows a possible use case (not a common one, but a possible one). If we started dealing with freshly created data payloads at runtime, Mikado would blow everything away. Fortunately I'm not resentful. Please give me some time to think about it and also to recover my mood. I need some motivation to get into this "environment" again; possibly it will come back. BTW, the IQ scale may differ from country to country, but yes, that absolutely doesn't matter.
@ryansolid I would like to suggest some improvements to this benchmark suite. Where should I post them? Here?
@ts-thomas Unless they are implementation-specific (say, VanillaJS or Mikado), I think you should just make a dedicated issue. I had started one specifically about the select row test: #613. But if you have specific suggestions that seem manageable, I'd make an issue for those, possibly even separate issues, so that if they are significant enough they can be handled independently instead of being grouped together when they are unrelated. That way each can be discussed and implemented on its own merits. I think this might be really important, because if any of the ideas are controversial you don't want the whole ship to sink. I know @krausest is always looking to make things better as long as it fits with what is already being done here. If the idea is sound, the biggest friction would be having to go through and update all the implementations. That presents a huge challenge. There have been discussions like this before that were beneficial (like increasing the gap between rows in swap rows).
Let me do some brainstorming, then we can pick the ones that have a realistic benefit. I will post it here; a closed PR is good enough as a backlog :)
I focused on the technical aspects first. The biggest issues, from the top down:
Suggestions for improving these issues:
I'm not at home, so just a short comment regarding one incorrect statement. The measurement does not use the duration of webdriver calls (I don't think that would work). It parses Chrome's timeline events (and thus is not async); please see https://www.stefankrause.net/wp/?p=218 for more information. An important principle is that it measures JS script duration plus rendering time.
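A rough sketch of what "parsing timeline events" can look like, to show why such a measurement includes rendering and is independent of when JS timers fire. Assumptions, stated loudly: events follow a simplified subset of Chrome's Trace Event Format (`name`, `ts` and `dur` in microseconds), the click is an `EventDispatch` event whose `args.data.type` is `"click"`, and the duration runs to the end of the last `Paint` after it. This is a simplification for illustration, not the benchmark's actual parser; the linked article describes the real approach.

```typescript
// Simplified subset of a Chrome trace event; timestamps are in microseconds.
interface TraceEvent {
  name: string;
  ts: number; // start time (µs)
  dur?: number; // duration (µs), present on complete events
  args?: { data?: { type?: string } };
}

// Returns milliseconds from the start of the click dispatch to the end of the
// last Paint event that starts after it, or undefined if either is missing.
function clickToLastPaintMs(events: TraceEvent[]): number | undefined {
  const click = events.find(
    (e) => e.name === "EventDispatch" && e.args?.data?.type === "click",
  );
  if (!click) return undefined;

  let lastPaintEnd = -Infinity;
  for (const e of events) {
    if (e.name === "Paint" && e.ts >= click.ts) {
      lastPaintEnd = Math.max(lastPaintEnd, e.ts + (e.dur ?? 0));
    }
  }
  return Number.isFinite(lastPaintEnd) ? (lastPaintEnd - click.ts) / 1000 : undefined;
}
```

Because both endpoints come from the browser's own trace, work a framework defers to a later task is still inside the window as long as it finishes before the last paint.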
It is the async nature which is really hard to measure. Let me take a look at this tomorrow.
Let me check that we mean the same thing by "async". A synchronous test has everything within the same event loop:
When it is not in the same event loop, it is async. There are at least some async frameworks in your table, so for them it is impossible to stay in the same event loop. The benchmark covers render time. But some points came to mind:
Let me show you a simple comparison. I just added a console.time() to test case 1 and benchmarked a series of 5000 loops within the same event loop:
I think it is very obvious that these timings differ far more than the results from the framework. I really would like to help. If you are interested in improving this, please point me to the important lines of the code.
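The JS-only approach described above (one test case repeated thousands of times inside a single event loop) can be sketched roughly as follows. Assumptions: `createRows` is a hypothetical stand-in for "create 1,000 rows" that builds plain objects instead of DOM nodes, and `performance.now()` replaces `console.time()` so the result can be returned. The original results are not reproduced here.

```typescript
interface Row {
  id: number;
  label: string;
}

// Hypothetical workload: builds row objects without touching a DOM.
function createRows(count: number, startId = 1): Row[] {
  const rows: Row[] = new Array(count);
  for (let i = 0; i < count; i++) {
    rows[i] = { id: startId + i, label: `row ${startId + i}` };
  }
  return rows;
}

interface LoopResult {
  iterations: number;
  totalMs: number;
  meanMs: number;
  checksum: number;
}

// Runs `work` back to back in one synchronous block and reports total and
// mean wall-clock time. JS only: no layout or paint can happen inside the loop.
function timeLoop(work: () => Row[], iterations: number): LoopResult {
  let checksum = 0; // consume each result so the work cannot be skipped
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    checksum += work().length;
  }
  const totalMs = performance.now() - start;
  return { iterations, totalMs, meanMs: totalMs / iterations, checksum };
}
```

For example, `timeLoop(() => createRows(1000), 5000)` mirrors the 5000-loop setup. Averaging many runs smooths out noise, but by construction it measures only script time, which is exactly the trade-off discussed in the next reply.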
Should it? I honestly am not sure. I definitely think that we want to measure more than just the JS time. The JS time alone is almost worthless if it doesn't lead to a faster full render time. It can be 1000x faster and it doesn't really matter. In reality, it is the full render experience we see. I've seen libraries do really well on the JS side and poorly on the rendering side. I don't know the exact details of how he measures, but in @localvoid's UIBench he lets you choose between JS time only and full render time. In that benchmark I've seen libraries invert their standings on tests because of it: not just close the gap, but actually be slower in the JS recording and then comparatively faster with render. That being said, I get that if the browser's timers aren't depicting real render time either, they aren't particularly useful. I'd have to defer to someone else's expertise here. But conceptually, as a consumer, I'd rather have a slightly flawed measurement that includes rendering of some form than one that gives false impressions of performance from JS time alone. As a library writer I like having the flexibility of knowing both scenarios, but ultimately I'd take the thing closest to what the end user sees (that is just a personal priority, not speaking for anyone else; it would be good to get a few opinions here). Even if I was hitting a browser frame rule, I feel that makes things more authentic. But it can be argued this test has long moved past the point of authenticity. Still, you can definitely see how it started its life as the "realest" of synthetic benchmarks.
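The ranking inversion described above follows directly from which endpoint a benchmark picks. A minimal sketch, assuming each run is summarized by two hypothetical timestamps relative to the click: when the update's JavaScript finished (`scriptEnd`) and when the resulting frame finished painting (`paintEnd`). The field names and the `Metric` choice are illustrative, not UIBench's or this benchmark's API.

```typescript
// Hypothetical per-run summary; times in ms relative to the click.
interface RunTimings {
  library: string;
  scriptEnd: number; // last JavaScript for the update finished
  paintEnd: number; // resulting frame finished painting
}

// "js" = JS time only; "fullRender" = JS plus style, layout and paint.
type Metric = "js" | "fullRender";

function durationMs(run: RunTimings, metric: Metric): number {
  return metric === "js" ? run.scriptEnd : run.paintEnd;
}

// Orders libraries fastest-first under the chosen metric.
function rank(runs: RunTimings[], metric: Metric): string[] {
  return [...runs]
    .sort((a, b) => durationMs(a, metric) - durationMs(b, metric))
    .map((r) => r.library);
}
```

A library that finishes its script earlier but leaves the browser more layout and paint work can rank first under `"js"` and last under `"fullRender"`, which is why reporting both is useful to library authors while the full-render number is closer to what users see.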
Please keep in mind that there is a huge gap between "what you expect to get" and "what you get". I write "should be equal", but of course it isn't. For me at least, this is no reason to start adding synthetic cosmetics. As a rule, every manual adjustment made with good intentions has a high potential to lead to wrong results and false interpretations. That's why I exclude this part in my benchmark. I do not believe that these results are wrong just because the paint roundtrip isn't included. Adding this roundtrip just adds noise. Of course I did not cover async libs, which is actually not possible this way, but I also get a lot of advantages. But of course we need to differentiate here. There are some special libs which follow their own render strategy (I also built my own render stack a couple of years ago). In the end, "accuracy" and "compatibility" are in tension with each other. Of course I absolutely understand that this kind of framework could not use the same technical base. That also isn't my intention. Yes, it is important to seriously ask what the desired result of a benchmark is. I doubt that you have knowledge all the way down to each processing unit of Chrome, but maybe you do.
I also updated the other Mikado test to the same version as the new proxy version; otherwise it might be confusing. Between 0.6.5 and 0.7.4 most of the updates were related to reactive design and stores. The latest version also brings some minor improvements; one of them is that I finally found a way to get rid of the synced DOM indices. Since the proxy version exists as an independent test in this benchmark, I decided to slightly change this test implementation to get the maximum difference between the two implementations. This test no longer uses a store (previously it used the "loose" internal store). With these minor changes the comparison of "mikado" and "mikado-proxy" becomes "100% functional, 0% data-driven VS. 0% functional, 100% data-driven".
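To illustrate the "functional vs. data-driven" distinction in general terms (this is not Mikado's API; every name below is hypothetical), a minimal sketch: in the functional style the caller mutates data and explicitly calls the view update, while in the data-driven style each item is wrapped in a standard JavaScript `Proxy` so a plain assignment triggers the update. The view receives the item itself (and could look its node up by `id`) rather than a captured index, since captured indices go stale when rows are reordered.

```typescript
interface Item {
  id: number;
  label: string;
}

// Hypothetical view hook; a real library would patch the DOM node for `item.id` here.
type UpdateView = (item: Item) => void;

// Functional style: the caller changes the data and explicitly asks the view to update.
function updateLabelFunctional(
  items: Item[],
  index: number,
  label: string,
  update: UpdateView,
): void {
  items[index] = { ...items[index], label };
  update(items[index]);
}

// Data-driven style: wrap each item in a Proxy so plain property assignments
// notify the view automatically.
function makeReactive(items: Item[], update: UpdateView): Item[] {
  return items.map(
    (item) =>
      new Proxy(item, {
        set(target, key, value) {
          const ok = Reflect.set(target, key, value);
          update(target); // notify after the write succeeds
          return ok;
        },
      }),
  );
}
```

The data-driven variant pays a small trap cost on every write, which is the kind of proxy overhead mentioned earlier in the thread; the functional variant avoids it at the cost of the caller having to remember to call the update.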