Update hardware requirements for benchmark machine
#13308
Comments
What is the configuration now?
https://wiki.polkadot.network/docs/maintain-guides-how-to-validate-polkadot#reference-hardware
@ggwpez I know this, but the benchmark doesn't pass successfully after my machine is configured.
You can find it here #13317
So we're going to run the benches on cloud VMs right now? Hm, I understand that's what most people are running, so in a way it makes sense, although it might make dealing with the benchmarks more painful due to the inherent variability of cloud machines. Two random possibilities I see which could make this better:

1. Just use bare metal anyway. Run the benchmarks on bare metal, and manually calculate a static multiplier(s) which would allow us to translate the results from bare hardware into the cloud machine, and have those as official weights.
2. Use multiple cloud machines in parallel. Basically you'd essentially have the benchmarks which average the results from one machine (so it reduces the variability within that single VM), and a "meta" benchmark like this which sits a level higher and averages results from multiple machines (which reduces the variability across the VMs). We'd also have to make sure that our VMs are going to be put on separate machines. Not sure if/which cloud providers provide such a service, but an easy way to guarantee this would be to just pick the same type of VMs in multiple regions.
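To make the second idea concrete, here is a minimal Rust sketch of both mechanisms; the function names and numbers are hypothetical illustrations, not an existing Substrate API.

```rust
/// Average one extrinsic's per-VM results (each VM is assumed to have
/// already averaged its own repetitions), reducing cross-VM variability.
fn meta_average(per_vm_averages_ps: &[u64]) -> u64 {
    per_vm_averages_ps.iter().sum::<u64>() / per_vm_averages_ps.len() as u64
}

/// Translate a bare-metal timing into a cloud-equivalent weight using a
/// statically calibrated multiplier (measured once on both hardware kinds).
fn translate_bare_metal(time_ps: u64, static_multiplier: f64) -> u64 {
    (time_ps as f64 * static_multiplier) as u64
}

fn main() {
    // The same extrinsic benchmarked on three VMs in different regions.
    let per_vm = [125_000u64, 131_000, 127_500];
    println!("meta-averaged weight: {} ps", meta_average(&per_vm));

    // A bare-metal result scaled to a cloud-equivalent weight with an
    // assumed multiplier of 1.3.
    println!("translated weight: {} ps", translate_bare_metal(98_000, 1.3));
}
```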
About "Use multiple cloud machines in parallel":
About "Just use bare metal anyway":
Overall, my opinion is that using a fully cloud-based system here would be more beneficial than continuing to use the bare metal setup, and that the resources allocated towards maintaining the current setup would be better used in developing a cloud-based solution.
If we can come to reasonable results using the cloud VMs, they should be fine. However, for this we need to test @koute's approach.
TLDR: FRAME-benchmarking could be the reason, and not the hardware. Comparing them cross-runtime is normally a good indication of whether or not they are consistent, even on the same machine. The hardware stats of the VMs are very consistent from my past measurements. That the weights don't follow suit could come from too few repetitions.
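As a toy illustration of the repetition point (not FRAME's actual estimator): with few repetitions, one sample hit by a noisy neighbour can skew a mean-based weight, while a median over more repetitions stays robust.

```rust
/// Median of a set of timing samples; robust against single outliers.
fn median(samples: &mut [u64]) -> u64 {
    samples.sort_unstable();
    samples[samples.len() / 2]
}

fn main() {
    // Nine "quiet" runs plus one run disturbed by a noisy neighbour VM.
    let mut times_ps = vec![100u64, 101, 99, 100, 102, 98, 100, 101, 99, 450];
    let mean = times_ps.iter().sum::<u64>() / times_ps.len() as u64;
    println!("mean:   {} ps (skewed by the outlier)", mean);
    println!("median: {} ps (robust to the outlier)", median(&mut times_ps));
}
```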
We should aim for the most reliable solution with the least noise here. So why not both? Improve FRAME and still run on machines with predictable load (bare metal).
@rcny I talked with @athei and we came to the agreement that we should continue running the bare metal machines until either FRAME-benchmarking is working properly or we come up with some different solution. We cannot sacrifice our work here because maintaining the bare metal machines is more workload. I trust you with this and we will make sure to find a solution we are all happy with, but until this is done please bring back the bare metal machines.
What this means concretely:
It is no problem to have the VMs in parallel to try them out. But we should not switch over until everything works for some time. |
Tracked in https://github.com/paritytech/ci_cd/issues/740. |
The baseline in
To me it feels like we have two somewhat conflicting use-cases here:
For (1):
For (2):
Proposal
So what would make sense to me would be:
Another nice thing about (b) would be that we could also hook other non-FRAME benchmarks to that, e.g. some of the tests I usually manually run when we upgrade
Even though it's relatively easy to implement, I'd first propose an alternative:
Sounds reasonable & for sure we can make both modes work through the command bot.
No, it isn't about the crate
Even if we had those two sets of benchmarking machines: it will still be a problem if the weight machines have a lot of noise. I think we will still have a problem merging those diffs even if the "performance machines" show no diff. Also, it will make the setup more complicated. I think we should first try to improve the benchmarking machinery and see if we can get rid of the noise, as suggested by @ggwpez. Only if this fails should we look into further steps. But until then we need to be unblocked and continue using the old machines.
Yes. I am using a "special" bench bot command, but it should be the default, because anyone who doesn't know about this will be caught off guard, scratching their heads about the diff.
@athei @ggwpez @bkchr @koute What would be the long-term plan re VM vs BM? If it's not yet clear, then I'd suggest using this "special" :) bench-bm command, which runs benchmarks on BM, in the meanwhile, until we have an understanding: VM, VM + BM, or anything else :)
As I understood from the thread, it's possible to adjust the benchmarks in Substrate to run them on VM runners, but it requires additional coding as described in #13308 (comment). So until this is addressed, Substrate is going to use BM runners.
Quoting what @athei said above:
I would propose to do exactly what @athei proposed above your post. The default is to use BM and then have some special command for VM.
The benchmarking works, but produces completely different results all the time, because the VM has a lot of noise. We also already have seen this in Cumulus benchmarks.
Oh, that sucks and should be investigated :( Can you give some examples please?
I've found, for reference:
Yes and that is also what we already have discussed here in the issue. We will investigate and hope that paritytech/polkadot-sdk#379 fixes it. However, until we know the proper fix we want to continue with the current machines.
BM machines and weights were reverted in paritytech/cumulus#2225 and paritytech/polkadot#6762. The bench bot is now using the BM machine by default (reverted in paritytech/command-bot-scripts#14).
Thank you @alvicsam
What bare-metal instance is Substrate using for the benchmarks? I would like to run the benchmarks independently to see if I can get the same results.
Substrate switched to VM servers as well. In theory the Wiki entry should be enough info to reproduce it. If you want to exactly reproduce our CI setup, @oleg-plakida will have to confirm it.
Ok, I'm confused. @alvicsam said here that Polkadot/Cumulus reverted to bare-metal instances, so I was wondering what BM instances you guys are using; I know before it was an OVH Rise 2. So now it is using Google's VMs again?
Hi, Lohann. It has been a rough road to VMs, with some back and forth migrations. But since about a month ago all repos have been switched to VMs. Also, the 'bot bench' command uses VM machines by default. You can refer to Oliver's comment if you need to create a testing env of your own. Please use the how-to: https://github.com/paritytech/ci_cd/wiki/Gitlab:-Setup-new--runner-(bare-metal-and-VM)#create-runner-manually
Yep. Mostly the same, except for some CI parameters which could be omitted. One small note: I would recommend using the freshest machine image:
@oleg-plakida thank you so much for your help, btw this repo is private for me: One thing that is not clear is how you guys solved the issues regarding the VMs' results. Is Substrate implementing @koute's multiple-cloud-machines-in-parallel approach to get an average weight, or does it simply run on one VM?
The benchmark results were consistent enough to not warrant any further action. We just run it once on a VM and then use the results. As far as I am aware, no obvious inconsistencies were reported.
Oliver is right. Regarding access, it's only for the Parity team. You can contact me on Matrix and I will find a way to pass these docs and the custom docker-machine binary to you.
Run the benchmark machine on new reference hardware and update the requirements JSON file. Showed up on SE.
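For context, a rough sketch of what such a requirements check conceptually does: compare freshly measured scores against the minimums recorded in the requirements JSON. The metric names, units, and thresholds below are purely illustrative assumptions, not the actual file format.

```rust
/// One minimum score the machine must reach (illustrative units).
struct Requirement {
    metric: &'static str,
    minimum: f64,
}

/// Pass only if every required metric was measured and meets its minimum.
fn check(measured: &[(&str, f64)], requirements: &[Requirement]) -> bool {
    requirements.iter().all(|req| {
        measured
            .iter()
            .find(|(name, _)| *name == req.metric)
            .map_or(false, |(_, score)| *score >= req.minimum)
    })
}

fn main() {
    // Hypothetical minimums, standing in for the requirements JSON contents.
    let requirements = [
        Requirement { metric: "cpu_hashrate", minimum: 1000.0 },
        Requirement { metric: "memory_bandwidth", minimum: 14_000.0 },
        Requirement { metric: "disk_random_write", minimum: 350.0 },
    ];
    // Hypothetical scores from a fresh run on the new reference hardware.
    let measured = [
        ("cpu_hashrate", 1150.0),
        ("memory_bandwidth", 14_500.0),
        ("disk_random_write", 420.0),
    ];
    println!("machine passes: {}", check(&measured, &requirements));
}
```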