Mitigate the impact of outliers on the fee rate statistics #394
How was this anomalous data generated? If there is no reference value, you could use data denoising to find the anomalous data and clean it up.
They are collected from real-world data, and I think they are reasonable: there aren't many transactions in the pool, so the ordering is clearly not driven by fee rate.
Perhaps we can avoid this by sorting by fee and calculating elapsed times. For example, take the top 50% with the highest fees and calculate the fastest time, take the bottom 50% and calculate the slowest time, and take the average of all fees and calculate the average time.
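A minimal sketch of one reading of this split, assuming each sample carries a fee rate in shannons/kB and a confirmation time in seconds and that there are at least two samples; the types and names below are illustrative, not taken from the ckb-explorer codebase:

```ts
// Illustrative types; not from the actual ckb-explorer codebase.
interface FeeRateSample {
  feeRate: number          // shannons/kB
  confirmationTime: number // seconds until the transaction was committed
}

const average = (xs: number[]): number =>
  xs.reduce((sum, x) => sum + x, 0) / xs.length

// One reading of the proposal: split at the median fee rate, then report the
// fastest time among the high-fee half, the slowest time among the low-fee
// half, and the overall average fee and time.
function summarizeByFeeHalves(samples: FeeRateSample[]) {
  const byFeeDesc = [...samples].sort((a, b) => b.feeRate - a.feeRate)
  const mid = Math.ceil(byFeeDesc.length / 2)
  const highFeeHalf = byFeeDesc.slice(0, mid)
  const lowFeeHalf = byFeeDesc.slice(mid)

  return {
    fastTime: Math.min(...highFeeHalf.map(s => s.confirmationTime)),
    slowTime: Math.max(...lowFeeHalf.map(s => s.confirmationTime)),
    averageFee: average(byFeeDesc.map(s => s.feeRate)),
    averageTime: average(byFeeDesc.map(s => s.confirmationTime)),
  }
}
```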
Typically, a higher fee means a shorter confirmation time, although data from different periods cannot be compared directly because of the peaks and valleys in trading activity. Perhaps we could add a time limit on top of the roughly 10,000-transaction window, such as only the last hour, so the data stays closer to the current situation.
The data is real but inaccurate because of the time span, so simply filtering out the data with large deviations would not reflect the actual situation correctly. Instead, we can limit the reference value to transaction data within the last 1000 blocks, and to avoid gaps when there is no data, set a default value of 2000 shannons/KB within 10 seconds. Note that this restriction only applies to the fee-rate-tracker reference chart; the other data charts are not adjusted, because they are statistics, whereas this one is a summary chart used for reference.
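A sketch of this proposal, assuming each sample records the block it was committed in; the constant names and the fallback shape are illustrative assumptions, not the actual implementation:

```ts
// Illustrative sketch of the 1000-block window with a fallback sample;
// names and data shapes are assumptions, not the actual ckb-explorer code.
interface FeeRateSample {
  feeRate: number          // shannons/kB
  confirmationTime: number // seconds
  blockNumber: number      // block the transaction was committed in
}

const BLOCK_WINDOW = 1000
// Fallback when the window holds no data: 2000 shannons/KB within 10 seconds.
const DEFAULT_SAMPLE: FeeRateSample = { feeRate: 2000, confirmationTime: 10, blockNumber: 0 }

// Only the fee-rate-tracker reference chart would read from this; other charts stay as-is.
function referenceSamples(samples: FeeRateSample[], tipBlockNumber: number): FeeRateSample[] {
  const recent = samples.filter(s => tipBlockNumber - s.blockNumber < BLOCK_WINDOW)
  return recent.length > 0 ? recent : [DEFAULT_SAMPLE]
}
```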
LGTM, but IMO the count of samples should be tied to the count of transactions instead of blocks. What if we limit the count of samples to
This is also workable; the blocks are mainly meant to limit the timeframe and prevent long time spans when there are few active transactions.
When will we handle this? It's a bit bothersome to users.
BTW, the solution mentioned above only shrinks the time-frame of the exceptional status; the algorithm still needs to be optimized to remove outliers from the samples.
According to the latest data, the anomalies are obviously large (more than 10 times the normal value), so we can add a suitable cutoff to filter them, for example filtering out data above 100,000 shannons/kB. So we can start with the following plan: 1. Filter out data above 100,000 shannons/kB.
Regarding rule 1 (filtering out data above 100,000 shannons/kB): it is a hardcoded value that will be imprecise in most cases. For example, when there is a jam on chain, the fee rates may all be greater than 100,000 shannons/kB and every sample would be filtered out. I would suggest filtering outliers by the following rule instead.
By doing so, the filtered samples will be monotonically increasing.
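The rule itself is not preserved in this copy of the thread, so the sketch below is only one plausible reconstruction based on the miner logic described later in the discussion (a transaction confirmed later should not carry a higher fee rate); the names and the traversal direction are assumptions:

```ts
// Illustrative reconstruction; the exact rule proposed in the original comment
// is not preserved here, so names and traversal direction are assumptions.
interface FeeRateSample {
  feeRate: number          // shannons/kB
  confirmationTime: number // seconds
}

// Keep only samples consistent with miner prioritization: a transaction
// confirmed later should not carry a higher fee rate than one confirmed earlier.
function filterByMinerOrder(samples: FeeRateSample[]): FeeRateSample[] {
  const byTime = [...samples].sort((a, b) => a.confirmationTime - b.confirmationTime)
  const kept: FeeRateSample[] = []
  for (const sample of byTime) {
    const last = kept[kept.length - 1]
    if (last === undefined || sample.feeRate <= last.feeRate) {
      kept.push(sample)
    }
  }
  // Read from slowest to fastest, the kept fee rates are monotonically
  // non-decreasing, matching the "monotonic" property mentioned above.
  return kept
}
```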
For rule 2, in case there aren't enough samples: by doing so, the fee rate will be close to 1000 shannons/kB when there aren't many transactions. I would suggest using 100 as the threshold based on the current activity.
A temporary solution (nervosnetwork/ckb-explorer-frontend@672233c) was submitted as a hotfix to avoid exceptional samples in the production environment.
This is a transitional measure and the values can be adjusted as appropriate.
Removing long-confirmation, high-fee samples based on confirmation-time sorting may discard normal data if the busyness of on-chain transactions changes while sampling at a uniform interval. Returning to the question at hand, I think removing data noise is the appropriate solution.
Regarding the sampling of data, 100 is set as the threshold, but a time limit such as within 1 hour needs to be added, to avoid a data time span so large that it no longer reflects the current situation in a timely manner.
This filter adopts the logic that miners use: in general, a tx with a higher fee rate will be mined first, which means a tx confirmed later should not, theoretically, carry a higher fee rate.
Removing data above or below specific values won't fix the outliers, because the outliers may all sit within the valid range. Say the original samples are as follows
Same as above: if the algorithm only smooths the original curve, the trend won't be fixed.
The suggestion at #394 (comment) covers 2 aspects
If the TPS is very low, say 2 transactions/minute, which is similar to having no transactions within 1 hour, many dummy samples at 1000 shannons/kB will be inserted, pulling the trend close to the minimal fee rate.
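A sketch of that padding behavior, assuming the threshold of 100 samples suggested earlier in the thread; the constant and function names are illustrative, not the actual code:

```ts
// Illustrative sketch of the dummy-sample padding; the constant names and the
// threshold of 100 follow the discussion above but are not the actual code.
const SAMPLE_THRESHOLD = 100   // suggested sample-count threshold
const MINIMAL_FEE_RATE = 1000  // shannons/kB

// When too few real samples exist, pad with dummy samples at the minimal fee
// rate so the reference trend falls back toward 1000 shannons/kB.
function padWithDummies(feeRates: number[]): number[] {
  const missing = Math.max(0, SAMPLE_THRESHOLD - feeRates.length)
  return feeRates.concat(Array(missing).fill(MINIMAL_FEE_RATE))
}
```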
An interquartile range filter was added by nervosnetwork/ckb-explorer-frontend#1411
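The PR is not reproduced here; for reference, a generic interquartile-range filter looks roughly like the following, though the quartile interpolation in the actual change may differ:

```ts
// Generic interquartile-range (IQR) outlier filter; the actual change in
// ckb-explorer-frontend#1411 may compute quartiles differently.
function quantile(sorted: number[], q: number): number {
  const pos = (sorted.length - 1) * q
  const base = Math.floor(pos)
  const rest = pos - base
  const next = sorted[base + 1]
  return next !== undefined ? sorted[base] + rest * (next - sorted[base]) : sorted[base]
}

function removeOutliersByIqr(values: number[]): number[] {
  const sorted = [...values].sort((a, b) => a - b)
  const q1 = quantile(sorted, 0.25)
  const q3 = quantile(sorted, 0.75)
  const iqr = q3 - q1
  // Tukey's fences: keep values within [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
  return values.filter(v => v >= q1 - 1.5 * iqr && v <= q3 + 1.5 * iqr)
}
```

Unlike a hard cutoff, this only drops the extreme tail relative to the rest of the samples, so a uniformly high (congested) distribution is left untouched, which addresses the concern about the hardcoded threshold above.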
The lowest fee rate should be recommended when the on-chain bandwidth is not fully occupied. So I would suggest adding a new strategy as follows
Threshold will be set
Will be updated by nervosnetwork/ckb-explorer-frontend@a92790d. The strategy is slightly tweaked so that the threshold is dynamically updated by the average block time; it's set
That means low fee rates are considered ideal if they get transactions committed within 2 blocks.
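A sketch of that reading of the tweak, where the acceptable confirmation time is derived from the average block time instead of being hard-coded; the function and parameter names here are illustrative, not taken from the commit:

```ts
// Illustrative reading of the tweak; function and parameter names are assumptions.
// A fee rate counts as sufficient if transactions paying it are committed within
// two blocks, so the confirmation-time threshold tracks the average block time.
function confirmationThreshold(averageBlockTimeSeconds: number): number {
  return 2 * averageBlockTimeSeconds
}

function isLowFeeSufficient(
  confirmationTimeSeconds: number,
  averageBlockTimeSeconds: number,
): boolean {
  return confirmationTimeSeconds <= confirmationThreshold(averageBlockTimeSeconds)
}
```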
Already on mainnet and testnet. It's hard to test unless we send numerous transactions on testnet |
A large amount of data can be built on Testnet in a short time, with 1501 pieces of data per block, but no change in the fee rate has been observed yet.
Impacted by issue #665: there are always hundreds of transactions with a fee rate of 1000 in the history, even though all recent transactions come with a high fee rate.
Some high-fee-rate samples appear at 30~50s, which makes the average fee rate of the "slow" stage higher than that of the "high" stage. A filter may be adopted on the data as follows:
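The filter actually proposed did not survive in this copy of the thread; purely as a hypothetical illustration, one option would be to drop per-stage fee-rate outliers (for example with a Tukey upper fence) before averaging each confirmation-time stage:

```ts
// Hypothetical illustration only; not the filter from the original comment.
interface FeeRateSample {
  feeRate: number          // shannons/kB
  confirmationTime: number // seconds
}

// Bucket samples into confirmation-time stages, then drop per-stage fee-rate
// outliers with a simple Tukey upper fence before averaging each stage.
function stageAverages(samples: FeeRateSample[], stageBoundaries: number[]): number[] {
  const stages: FeeRateSample[][] = stageBoundaries.map(() => [])
  for (const sample of samples) {
    const index = stageBoundaries.findIndex(bound => sample.confirmationTime <= bound)
    if (index >= 0) stages[index].push(sample)
  }
  return stages.map(stage => {
    const rates = stage.map(s => s.feeRate).sort((a, b) => a - b)
    if (rates.length === 0) return 0 // empty stage: nothing to average
    const q1 = rates[Math.floor((rates.length - 1) * 0.25)]
    const q3 = rates[Math.floor((rates.length - 1) * 0.75)]
    const upperFence = q3 + 1.5 * (q3 - q1)
    const kept = rates.filter(r => r <= upperFence)
    return kept.reduce((sum, r) => sum + r, 0) / kept.length
  })
}
```

With stage boundaries like [10, 30, 60, 600] seconds, a handful of high-fee samples landing in the 30~50s bucket would be fenced off before that stage's average is computed.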
Any thoughts from @Danie0918 @Sven-TBD?