[Bug] Increased CPU usage and Higher Partial Times #15836
I can confirm I have seen this same issue using my Windows GUI harvester.
It is worth noting, this is not the config.yaml spelling issue. These are freshly generated config.yaml files for RC1, using the same values as before.
I verified, in the settings tab of the GUI, that the correct decompressor settings were being read.
My setup had fully functional decompressor (previously decompresser) settings. I do not believe this issue is resolved.
I concur, my settings are correct too, and the issue is far from resolved.
Apologies, I did not see this sooner; I have my GitHub notifications messed up due to a mail filter. Thanks for the report. Any chance you can run some tests directly with the bladebit exe?
Unfortunately, the hardware that is seeing the worst of the issues (with the late partials) is a 2014 Synology, and it does not support the newer versions of simulate: Fatal Error:
I plotted on a much more powerful Ubuntu machine than my power-efficient Windows box with its mobile CPU. I will have to attach a suitable SSD to house a plot for a meaningful simulate, so I will get back to you as soon as I can.
Sorry for the delay. Here are the stats. Let me know if you need anything else. Using 2 threads and 2 parallel decompressors, as that was found to give the least stale partials on my system, so it's what I normally use for farming. Choosing size 200TB to roughly match my farm.

bladebit_cuda alpha 4.3:
bladebit_cuda-a4-3 -t 2 simulate --no-cuda -n 100 -p 2 --power 360 --size 200TB --seed e11c75d4ef981ed097bf40452de79eec8c4ce836236f0474573aae143f2fd7fd plot-k32-c05-Test.plot
Context count : 2
compression | plot count | size TB | size PB
C5 | 12561 | 1096 | 1.10

bladebit_cuda alpha 4.4:
bladebit_cuda -t 2 simulate --no-cuda -n 100 -p 2 --power 360 --size 200TB --seed e11c75d4ef981ed097bf40452de79eec8c4ce836236f0474573aae143f2fd7fd plot-k32-c05-Test.plot
Context count : 2
compression | plot count | size TB | size PB
C5 | 12208 | 1065 | 1.07
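For context, the two simulate runs above can be compared directly. A quick back-of-the-envelope check (the variable names here are illustrative, not from any tool) puts the effective plot-count drop between alpha 4.3 and 4.4 at roughly 2.8%:

```python
# Plot-count and size figures copied from the two simulate runs above.
a43 = {"plot_count": 12561, "size_tb": 1096}
a44 = {"plot_count": 12208, "size_tb": 1065}

# Relative drop in effective plot count between the two builds.
drop = (a43["plot_count"] - a44["plot_count"]) / a43["plot_count"]
print(f"effective plot-count drop: {drop:.1%}")
```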
Have tried working with empower to do some troubleshooting. The issue persists when changing the version of chiapos used, on a cloned repository of the 2.0 release branch. Versions tried: chiapos 2.0.0rc1, 2.0.0b5, 2.0.0b4, 2.0.0b3. 2.0.0b1 and 2.0.0b2 will not run, as they display this error: 2023-08-04T08:04:07.159 harvester chia.plotting.manager and the same happens with 1.0.12b16, which was the last known working version as far as we know, used in the alpha 4.3 build (alpha 4.3: chiapos=1.0.12b16). It could be worth comparing what changed between chiapos 1.0.12b16 and 2.0.0b1, assuming the bug was introduced in the jump to the major version, if chiapos is even at fault. We also ran some tests changing to use "blspy==1.0.16", but the results were the same.
After updating my rig to 2.0.0rc3 these lines have appeared in my log file (as they have done for several versions):

2023-08-05T16:05:49.900 harvester chia.harvester.harvester: WARNING Looking up qualities on N:\C5\plot-k32-c05-2023-06-03-05-06-3fb805aaed23fcbd6629df024ba4268426f3aab4053142dfd8ace9bda4827702.plot took: 60.26634669303894. This should be below 5 seconds to minimize risk of losing rewards.
2023-08-05T16:05:57.574 harvester chia.harvester.harvester: WARNING Looking up qualities on S:\C5\plot-k32-c05-2023-06-21-04-12-648f1cf9f7deb68b697ce5e7616b357a3ae6720a27b1755d412ed631658caef0.plot took: 59.11319851875305. This should be below 5 seconds to minimize risk of losing rewards.
2023-08-05T16:21:46.164 harvester chia.harvester.harvester: WARNING Looking up qualities on M:\C5\plot-k32-c05-2023-05-31-11-20-aaf6dd611b6ed4c9b603d1169e27e736301a7bc0c548b8427750a694b1b2c483.plot took: 55.353450536727905. This should be below 5 seconds to minimize risk of losing rewards.
2023-08-05T16:23:32.977 harvester chia.harvester.harvester: WARNING Looking up qualities on U:\C5\plot-k32-c05-2023-06-23-16-14-880df202463b5b3fb4f95e5825912ab7d98d206c3a1e6e015dc8500a3b5a1412.plot took: 105.28022837638855. This should be below 5 seconds to minimize risk of losing rewards.
2023-08-05T16:23:54.091 harvester chia.harvester.harvester: WARNING Looking up qualities on R:\C5\plot-k32-c05-2023-06-15-11-03-b383db6587781130d747fea28beeae30818fad89c8b2520840b5adaa69337458.plot took: 79.3622350692749. This should be below 5 seconds to minimize risk of losing rewards.
2023-08-05T16:25:08.232 harvester chia.harvester.harvester: WARNING Looking up qualities on J:\C5\plot-k32-c05-2023-07-28-14-14-09edcffacc140e67e7410808d33299100a2388041c69903d07b2657d67ae193d.plot took: 43.46801948547363. This should be below 5 seconds to minimize risk of losing rewards.
2023-08-05T16:27:15.948 harvester chia.harvester.harvester: WARNING Looking up qualities on K:\C5\plot-k32-c05-2023-05-26-06-18-2103493ca705752950c6f83b2f40d5897ecd1338b49f8c88a152774290f4c20c.plot took: 41.67232918739319. This should be below 5 seconds to minimize risk of losing rewards.
2023-08-05T16:32:36.251 harvester chia.harvester.harvester: WARNING Looking up qualities on O:\C5\plot-k32-c05-2023-06-06-07-06-63316b75708e944ced5e578ba028ec763c338c6a8378cef7139004f16fe7df48.plot took: 44.853880643844604. This should be below 5 seconds to minimize risk of losing rewards.

I was watching the Windows Task Manager for most of this time, and at no point did the CPU reach 100% utilisation for more than 5 seconds. More pertinent to the overall performance impact: I have not managed to submit a single partial in <5s since the introduction of alpha 4.5. I was hitting about 50% <5s for compressed plots before that release.
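For anyone triaging logs like the ones above, a small script can summarise the slow lookups. This is a hypothetical helper, not part of the chia CLI; the line format is assumed from the WARNING samples quoted here:

```python
import re

# Matches harvester warning lines of the form quoted above:
#   "... Looking up qualities on <path> took: <seconds>. ..."
PATTERN = re.compile(r"Looking up qualities on (\S+) took: (\d+\.\d+)")

def lookup_times(log_text: str) -> list[tuple[str, float]]:
    """Return (plot_path, seconds) for each slow-lookup warning in the log."""
    return [(m.group(1), float(m.group(2))) for m in PATTERN.finditer(log_text)]

# One real line from the log excerpt above, used as a sample.
sample = (
    "2023-08-05T16:05:49.900 harvester chia.harvester.harvester: WARNING "
    "Looking up qualities on N:\\C5\\plot-k32-c05-2023-06-03-05-06-"
    "3fb805aaed23fcbd6629df024ba4268426f3aab4053142dfd8ace9bda4827702.plot "
    "took: 60.26634669303894. This should be below 5 seconds."
)
print(lookup_times(sample))
```

Feeding a whole debug.log through this would make it easy to see whether the slow lookups cluster in time or on particular drives.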
Issue persists in RC3: still seeing the late full proofs, and lots of random spikes in quality-lookup warnings for non-full-proof lookups.
I tried rolling back to version 4.3 (from 4.5) and then upgraded to RC3; I have seen no noticeable timing differences between those three builds. Average is consistently around 1.5 seconds, with peaks in the 5-10 second range. Farming 4900 C4 + 50 C5 plots with CPU. System: OS: Windows 10, 22H2, 19045.3208. Thought I'd add this for comparison; maybe it can help to root-cause.
The main difference between Voodoo-Chia's environment and both mine and PhillWalker's is that Voodoo-Chia has only 50 C5 plots, so a statistically small chance of multiple plots passing the filter for each challenge, whilst the other two farms have > 2,000 C5 plots, so a statistically high chance. This MAY suggest the problem lies in parallel/sequential lookups for C5+ compression.
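That statistical point can be made concrete. Assuming the plot filter constant of 512 in use at the time (an assumption; check the consensus constants for the relevant release), a Poisson approximation gives the chance that two or more plots pass the filter on the same signage point:

```python
import math

FILTER = 512  # assumed plot-filter constant of the era; verify against consensus config

def p_multiple_pass(n_plots: int) -> float:
    """Probability that >= 2 plots pass the filter for one signage point,
    using a Poisson approximation with mean n_plots / FILTER."""
    lam = n_plots / FILTER
    return 1 - math.exp(-lam) * (1 + lam)

print(f"50 C5 plots:   {p_multiple_pass(50):.1%}")    # concurrent lookups are rare
print(f"2472 C5 plots: {p_multiple_pass(2472):.1%}")  # concurrent lookups are the norm
```

So a 50-plot C5 farm almost never does two compressed lookups at once, while a 2,000+ plot C5 farm almost always does, which is consistent with the parallel-lookup suspicion.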
4950 actually ;) |
Sorry, corrected original post. Only 50 c5 plots |
My Intel i3-2100 and i3-4130 seem OK and don't deliver lates. It's the Intel Atom D2700s which have been affected worst by these new releases. I can confirm these lates aren't caused by duplicate proofs at the same time or back-to-back ones.
Issue persists in RC4 |
There appear to be two issues at play here: increased average lookup times, and sporadic excessive lookup times. Both impact harvester performance, but I am not sure if one is a consequence of the other. The occasional unworkable lookup times are the clearer problem: I have seen (in RC4) a lookup of over 500 seconds reported in the log. One instance of an excessively long lookup appears to have a lasting effect on the harvester, with output reduced for a period of time (estimated at up to an hour) after it occurs. Any lookup is worthless if it cannot produce a partial in 30s or less, since most pools enforce 32s limits. Why not just drop any proof if it doesn't complete in, say, 20s and report the failure to the log? This would at least minimise the impact of that element.
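The "drop it after a deadline" idea could be sketched like this. This is a hypothetical wrapper, not the harvester's actual code (`lookup_with_deadline` and its arguments are invented for illustration), and as discussed elsewhere the hard part is that abandoning the result does not cancel the underlying work:

```python
import concurrent.futures
import logging
import time

DEADLINE_S = 20  # proposed cutoff; pools typically reject partials after ~30s

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def lookup_with_deadline(lookup_fn, *args, deadline_s=DEADLINE_S):
    """Run a quality lookup, abandoning the result if it misses the deadline.
    Note: the worker thread keeps running to completion; truly cancelling the
    in-progress lookup is the genuinely hard part of this change."""
    future = _pool.submit(lookup_fn, *args)
    try:
        return future.result(timeout=deadline_s)
    except concurrent.futures.TimeoutError:
        logging.warning("lookup exceeded %.0fs; dropping this proof", deadline_s)
        return None

# Demo: a fast lookup returns its proof; a slow one is abandoned.
fast = lookup_with_deadline(lambda: "proof", deadline_s=5)
slow = lookup_with_deadline(lambda: time.sleep(1.5) or "proof", deadline_s=0.2)
print(fast, slow)
```

This only stops the farmer waiting on doomed lookups; it does not free the CPU they consume, which is why the linked issue calls it more complicated than it appears.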
Unfortunately, that is a rather long standing issue (see #7788) and despite seeming relatively straightforward is more complicated than it appears. But yes, this could be a part, maybe a large part of the problem |
I've been following my hunch that it is linked to bad plots and am making progress. I have aggressively removed plots that meet the following criterion from a chia plots check with the default 30 iterations (remember ALL of these plots pass the basic plot check): plots with fewer than 16 proofs found. Replacing these plots (approximately 8% of my total farm after several iterations) has massively reduced the issue. I am having to use the bladebit_cuda plotter provided as part of alpha 4.3, as plots from any later one cannot be checked using the chia plots check command in alpha 4.3, and I want to be able to switch back to alpha 4.3 to validate that the performance difference still exists. What has changed in the plot format since the alpha 4.3 plotter that would cause the check to fail? Does this suggest that I may need to completely replot to this newer plot format? Is it possible that you are unable to reproduce the error in the farming software because the problem is initiated in imperfect plots and the farmer application is not handling the problem? Plotting in Ubuntu 23.04 with 256GB RAM, Intel CPU, Nvidia Quadro M5000 GPU (CUDA 5.2, 8GB vRAM).
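The pruning rule described above is simple to automate once you have the per-plot proof counts out of `chia plots check`. A minimal sketch, assuming the (path, proofs-found) pairs have already been parsed from the check output (the data and function name here are illustrative):

```python
ITERATIONS = 30  # default challenge count for `chia plots check`
MIN_PROOFS = 16  # the cutoff used above; roughly a 0.53 proofs/challenge ratio,
                 # well below the ~1.0 expected of a healthy plot

def plots_to_replace(results: dict[str, int]) -> list[str]:
    """Return paths of plots that found fewer than MIN_PROOFS proofs."""
    return [path for path, proofs in results.items() if proofs < MIN_PROOFS]

# Illustrative data, not real check output.
results = {"plot-a.plot": 29, "plot-b.plot": 11, "plot-c.plot": 33}
print(plots_to_replace(results))
```

Note that with only 30 challenges the proof count is noisy, so a single low result is weak evidence; re-checking flagged plots before deleting them would be prudent.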
Issue persists with 2.0 RC5, which uses ChiaPOS 2.0.1. Approx 20% late on the lower-end hardware. No lates on the fc.compression build for comparison.
Issue persists with 2.0 RC6, which uses ChiaPOS 2.0.2. Approx 20% late on the lower-end hardware. No lates on the fc.compression build for comparison.
Issue persists with the final 2.0 public release. This is using the official chia docker image built by Chia, rather than a self-built one, now that it is available.
Issue persists in 2.0.1 |
Issue persists in 2.1.0 |
What CPU are you running on the Synology? Have you actually farmed a block successfully with the alpha 1 version? We are wondering if it is fully correct, or flawed and fast.
Intel Atom D2700 2 cores, 4 threads |
Intel® SSE2, Intel® SSE3, Intel® SSSE3; no AVX. Maybe it's an instruction-set issue at compile time. Did you have luck with the alpha version making blocks? I will look at the green reaper compiler settings.
I can confirm I won multiple blocks on this version across a 1.5PiB farm, on multiple harvesters. |
@Coyote-UK's CPU is an i7-4500U, for reference, with the same issue.
Confirmed: i7-4500U 16GB RAM SSD for DB, Win 10, 2472 C5 plots (created with alpha4), extensive pruning to remove bad plots, No blocks found between 3,798,055 & 4,155,045 (Equivalent of 2.91 x ETW) which covers ALL compressed plot farming up to the release of 2.0.0. I accept that statistically this is possible, but is a big coincidence. |
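The "big coincidence" above can be quantified. Block wins are well modelled as a Poisson process, so across a window equivalent to 2.91x the estimated time to win, the chance of zero blocks by bad luck alone is e^-2.91, about 5.4%:

```python
import math

ETW_MULTIPLES = 2.91  # expected block wins over the quoted height range, from above

# For a Poisson process, P(zero wins) = exp(-expected wins).
p_no_blocks = math.exp(-ETW_MULTIPLES)
print(f"chance of zero blocks by luck alone: {p_no_blocks:.1%}")
```

Possible, as the poster says, but unlikely enough to support the suspicion that the compressed-plot lookups were failing to win blocks rather than merely being slow.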
It certainly sounds like there is a performance problem if you were able to win blocks with the old version. It will be difficult for us to regression test this since the merge was so long ago. If there is any other information you can give us, please let us know. I do appreciate the testing you have done so far.
What happened?
Since the Chia compression 4.3 alpha, through to the current 2.0 RC1, partial times have increased noticeably, causing lots of late partials on lower-end equipment.
LebanonJon has been able to verify that CPU usage has increased since this time, and WallentX has also acknowledged this increased resource usage for non-compressed plots, and wonders if it has also impacted compression.
Version
2.0 RC1
What platform are you using?
Linux
What ui mode are you using?
CLI
Relevant log output
No response