Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ASERT / NEFDA is the perfect DA? #61

Open
zawy12 opened this issue Jun 10, 2020 · 22 comments
Open

ASERT / NEFDA is the perfect DA? #61

zawy12 opened this issue Jun 10, 2020 · 22 comments

Comments

@zawy12
Copy link
Owner

zawy12 commented Jun 10, 2020

This article is about the ASERT difficulty algorithm that was independently discovered by Jacob Eliosoff, Mark Lundeberg, and a group at Imperial College. Jonathan Toomin did a lot of work (here and here) to show it's the best we have, to decide how long the crucial averaging window should be, and to implement it for BCH.

There is a relative ASERT and an absolute ASERT. They are mathematically the same, but relative is expressed more like other difficulty algorithms and has a potentially large error if the "averaging window" is large due to nBit's having up to 2^(-15) imprecision and in approximating e^x in the necessary integer math. Absolute ASERT reduces those errors to effectively zero. EMA aka WTEMA is a very close approximation to relative ASERT.

Relative ASERT:

target[i+1] = target[i] * e^(st/T/N - 1/N)
st = solvetime
target[i] = target at block height i, 2^256/difficulty
T = desired block time
N = mean_lifetime = half_life/ln(2)
The amount of error in difficulty and therefore solvetime is up to N * 2^(-15) as a fraction of the correct value.

Absolute ASERT:

target[i+1] = target[M] * e^((t[i] - t[M])/T/N - (i-M)/N)
t = timestamp
M = a fixed block height in the past.
The above is how Mark Lundeberg's ASERT paper expresses it, but to be perfectly correct, it needs to change target[M] to target[M+1]. The BCH ASERT code expresses this correction as:

target[i+1] = target[M] * e^((t[i] - t[M-1])/T/N - (i-M+1)/N)

It's really interesting that the absolute form does not even need to depend on past targets or timestamps if you use the genesis time and just set difficulty to 1. This makes the absolute form:

target = e^((t/T-h)/N)
where
h = current block height
t = current time since genesis (miners can just use their own local and with large N and reasonable tight validator limits, lying about current time will not lower difficulty unless miner wants other miners to orphan his block for cheating)

To make the weirdness more clear, if we use an "averaging window" of N=286 (BCH's is 288) and invert the absolute form to get it in terms of difficulty, ASERT is:
difficulty = 1.0035^E
where
E = number of blocks the chain has emitted ahead of time. So if genesis starts at D=1, it increases 0.35% for each block it gets behind.

WTEMA algorithm:
Relative ASERT is virtually identical to WTEMA by using the approximation that e^x = 1+x for small x. By expanding the parenthesis in the following integer math can be used, taking care to prevent target overflow or underflow (if negative solvetimes are allowed. ASERT can handle the negatives well).

target[i+1] = target[i] * ( 1+ st/T/N - 1/N)

Problems

The first problem is that it gets behind or ahead in the number of blocks it emits over the long term as the hashrate increases or decreases by
excess_blocks_emitted = mean_lifetime_in_blocks * ln(current_difficulty/start_difficulty)
All algos have this problem. The easiest and safest solution to reduce the error is to have a lower mean_lifetime less than say 144 blocks. This may come at a cost allowing switch mining to gain up up to 0.5% more blocks and makes it easier to conduct the "38% attack" described below. If it's 400, there can be over 1% in excess block rate emission in normal conditions (and up to 38% in attack or attack-like conditions). To make sure too many or two few blocks are emitted, the average solvetime can be checked over a longer averaging window and then adjust the target solve time from the official value by a percent that is in the opposite direction of the long-term error. This turns out to be a PI controller as described in a comment below. The comments describes how to do it safely but the results will be worse if there is a long term oscillation that matches the secondary averaging window. If you choose a window that is too slow it has no effect.

Another problem is that if you use N=1, the median solvetime is ~ln(2)/2 instead of the expected ln(2) of the target time. This indicates it makes the difficulty too easy too often, despite being the only DA that has the perfectly correct average target solve time for N=1. If it's possible to have both, it might be evidence of having the perfect algorithm.

Another problem is that it can emit up to 38% more blocks in a long-term private mining attack than simply getting 100% of the blocks. This is done by setting the first timestamp in the private mine to 2x the mean lifetime window and then all the subsequent timestamps to 0 or 1 second. In SMA the attack gets 50% more blocks. In LWMA I could not find any benefit to timestamp manipulation. If the average of say the past 6 solvetimes is used as the input for the "past solvetime" the attack can get 50% instead of 38%. If the max solvetime allowed is 6x block time, EMA can be reduced from 33% to 28%. It's not 38% because the exploit pushes the boundaries of its e^x approximation.

Another problem is complexity of implementation. EMA is a lot simpler and almost the exact same so I prefer it it LWMA is not used. N can't be small because large N allows the e^x=1+x approximation by making x small, but it can't be large because it has the average solvetime error like relative ASERT. I would use N=100 to 200. Also, it needs code I've mentioned elsewhere to handle out of sequence timestamps correctly.

Another issue (an easily solved "problem") is that the paper below recommends the absolute form of ASERT. This exacerbates the 1st problem if a coin forks to ASERT and uses an initial target that is in the distant past: it immediately releases too many coins at the fork. It will also be a problem if a coin changes the POW and needs to make a large sudden change to the difficulty. Both of these are easily solved by changing the baseline difficulty to the difficulty at the fork, but it's something to be aware of.

Another issue is that relative ASERT and WTEMA accumulate error due to nBits which has 2^-23 to 2^-15 truncation error. From bitcoin wiki on difficulty: "0x008000 is the smallest legal value for the lower 24 bits since targets are always stored with the lowest possible exponent. " This is only 2^(-15) = 0.003% error but it decreases average solvetime in proportion to N. The max decrease (error) in solvetime is approximately 2^(-15) * N = 1.8% for N=600.

The error in the e^x approximation in the code accumulates in the same way in relative ASERT (and WTEMA), requiring higher precision in the integer-based e^x estimator if N is large. The error might be made to offset the average nBits error.

A trivial fix for the last two problems is to change the target solvetime by the same amount to offset them, but an attacker doing a private mine (or miners as a group) could select timestamps that minimize difficulty by about 1/2 the error for which you correct. These last two problems are why absolute ASERT is usually recommended. But N=100 is a good setting for relative ASERT and has only 0.3% error in the nBits portion.

My "Derivations" of ASERT

For Relative ASERT, there seems to be direct & solid argument in claiming it's simply using the exponential distribution: (I've inverted the equation to use difficulty instead of target).
D = prev_D * exp_dist_observed / exp_dist_expected
D = prev_D * [ 1/T * e^(-st/T) ] / [ 1/T * e^(-T/T) ]
D = prev_D * e^(-st/T) / e^(-1)

This can vary a lot with st values jumping around, so a smoothing factor is needed. Since a sequence of adjustments to D are multiplying the ratio over and over, an Nth root of this equation can be used. In physics equations the N in e^(-x/N) is called an attenuation depth or mean free path. The best name in this context is "mean lifetime" where time is measured in blocks. An SMA has the same Std Dev of difficulties if its averaging window is 2N.

Including a mean lifetime low-pass filter to what I have above gives relative ASERT:

D = prev_D * e^( (1-st/T) / N).

Absolute ASERT is mathematically the same as relative, but its form allows a different explanation. Apparently, we can claim it's a solution to the differential equation that says we should increase the difficulty per error E (the number of blocks behind schedule) in proportion to the current difficulty.
dD(E)/dE = k*D(E)
The solution to this is
D(E) = D0 * e^(k*E)
which is absolute ASERT with k=1/N.

More complicated derivation

Only previous block data is needed
To estimate current hashrate H we use past difficulties (D's) and solvetimes (st's). If our previous estimate was the best possible given all past D's and st's, then just the previous D and st should be sufficient to modify our previous H estimate. Even if we did not use all past D's and st's and our estimate is far from perfect, a perfect correction should be in the right direction and trend towards being perfect, so over the long term it will actually be perfect, so we only need the "single-block" adjustment to be perfect. Given an H we expected in the prior block, we want to adjust D like this:

D = prev_D * H_actual / H_expected

We can't use solvetime to directly estimate H_actual because a solvetime that is 10% high or low does not indicate H is 10% low or high. This is because solvetimes have a probability distribution around T that will persistently overestimate H because solvetimes are much more frequently < T than > T. We need to map the relationship between st and H to some linear function that properly estimates H with only a single st. The relationship between H and st is given by how we model solvetimes:

CDF = 1 - e^(-st * H / D)
1 - CDF = e^(-st * H / D)
CDF & 1-CDF are random variables from 0 to 1. All equal-sized ranges within 0 to 1 have equal probability. I use the 2nd equation to model solvetimes with 1-CDF = x = rand(0,1).
x = e^(-st * H / D)

If avg(st) over many blocks = T, the D was selected correctly for the given H, i.e. H / D = 1/T so that on average we will see:
x = e^(- avg(st) /T) = e^(-1).
If H is 10% higher than expected, i.e. H_actual = 1.1 * H_expected, then avg(st) will be T/1.1 too low for the D. But we have to use the x equation to estimate avg(st) from a single st.

I'll claim the random variable "x" is the desired linear mapping of the solvetimes that takes the probabilities into account. I can't explain/prove it, but it seemed apparent when I first saw ASERT / NEFDA that this is what it's doing:

D = prev_D * x_actual / x_expected
D = prev_D * e^(-st/T) / e^(-1)

Comments on the Imperial College Paper

Below is a review of a new difficulty algorithm paper, "Unstable Throughput: When the Difficulty Algorithm Breaks" . ( Their tn in this will be the timestamp of prev_D, not current time. )

IMO the paper is good, except for these notes / complaints. I appreciate the author's citing my DA articles and speaking well of this repository of work I've put into it. As with Tom Harding's and Mark Lundeburg's papers, it's good to see better mathematicians carry things through.

For the sake of history in trying to understand why it's taking so long to find the best DA, Jacob Eliosoff is the first one I know of who considered this algorithm, knowing it may have been the best, but opted for the approximation (EMA) due to wanting to avoid the exponential.

Section 3.2
I believe BCH very closely followed the Poisson distribution for a year post-EDA when on-off mining was not a problem due to a higher price & subsidized mining. The depictions of how far away from the ideal Poisson seem a little unfair in the sense that problem happens only when on-off mining is a big problem. It's infinitely better than BTC's. Further, a coin has a much bigger security issue than the DA if on-off mining is a problem because it shows > 50% of miners do not depend exclusively on the coin for income which is the sole basis of BTC's security.

SMA's do not directly cause oscillations as a result of high or low solvetimes falling out the back of the window because new solvetimes coming in the front and the entire averaging should balance out. The SMA is a fair estimate of the current hashrate as it was in the past at 1/2 the size of the averaging window. Accidentally going low with accidental slow solvetimes out the back or long solvetimes in the front can motivate extra miners to come online. So in that sense it can be a "cause", but it doesn't oscillate if H is constant. Once miner motivation causes a significant change for several blocks, the SMA does not dampen the oscillation. It invites a continuance. A close analogy: a steel bar does not ring with oscillations unless you strike it while it's in the air (no dampening from holding it). The strike causes the ringing, not the bar.

Oscillations made worse by delays
The median of 3 blocks BCH uses may directly cause a very slight oscillation due to the delay. Delays like using the MTP as the most recent st as the authors suggest (and as is done in Digishield) may directly cause some oscillation. Zcash has always had a mild oscillation from using MTP. Monero's algorithm caused huge problems in alt coins that copied it because of the various ways it causes a delay (does not use the most recent solvetimes). All alts I'm aware of that used it "blew up" and got stuck until they switched algorithms.

Section 4.2.1
D/st is not an estimate of H for small samples due to increased frequency of small st values making avg(1/st) > 1/avg(st) = 1/T. The best H estimator I have is sum(D)/sum(st) * (N-1)/N. I do not know how to get hashrate or the expected number of hashes from 1 solvetime unless we also have an a priori expected value of the solvetime (i.e. H), in which case it appears the "observed" H for 1 block is D/T * e^(-st/T) where D/T is the expected H.
I wonder if there is an e^x function that can do away with the (N-1)/N and always correctly estimate HR. BTW If solvetimes are prevented from being < 1, the estimate of H for 1 block that gives the right average is: avg(1/st) = D/st/7. The avg H estimate for 1 block is not an integral that converges if st can be 0.

Section 4.6
I've never seen any argument against having a future time limit of say 10 seconds. Based on distributive consensus theory, it only needs to be > 2x propagation delays and should be << block solve time. Block solve time is purposefully chosen to be >> propagation delay. There is no BFT without a pre-consensus on time and its limits. A 2-hour or 10 second rule are not honored because >50% of miners decide to. It's honored because each honest node will immediately reject a chain that is ahead of time due to world time being an objective and obvious fact that the protocol and network do not, should not, and can't provide. Peer time is not the basis because it's rejected if it is 70 minutes off from local time. It should not have been used.

Section 5.5
NEFDA-MTP-72 (which is has the same smoothness as BCH under constant H) is as bad as BCH's DA in terms of delays & "stolen" blocks when modeling on-off mining problems. It invites bad oscillations. The oscillations are faster but more frequent. My method of testing it is a step function that goes on or off if D goes too low or too high, respectively. Since a 10% change in D can cause a 10,000% change in HR, it's pretty accurate. I've used this a lot to simulate live attacks very accurately.

image

@crypto-perry
Copy link

crypto-perry commented Jun 10, 2020

Hello,

Happy to see you reviewed the paper so promptly.

Your comments are all valid points. Let me respond to them in order:

Section 3.2
We were not trying to say that BCH is really bad just that the situation can get much worse, maybe we were not very explicit. You are absolutely right saying that the situation used to be better and this can be seen from the plot measuring number of blocks per hour, but as you rightly point out this only used to happen when coin-hopping was not happening and mining was subsidised. I think in the future coin-hopping will happen more and more as mining pools introduce auto-switching functionalities, so deserts and spikes will indeed get out of hand.

I think the problem with a sliding window is actually an important factor for sustained cyclicality. It's just not right that a solve time influences the hash rate estimation twice: when it enters the window with negative feedback (i.e. if the solve time is long, it increases the window length and causes the difficulty to go down) and also when it falls off the window with positive feedback (i.e. when a long solve time falls off the window, its length decreases and the difficulty goes up). The reason I don't believe the entire average is balancing out is that this is not just a random process where long block times occur randomly because miners will respond to these oscillations by mining according to profitability. Positive feedback should not exist in any DA because it creates incentives that should not be there. This positive feedback is essentially sustained by coin-hoppers and that's how the cycle of long block times followed by short solve times forms. The fact that the SMA also has a delay, is a major factor as well as you describe very well in some of your posts, because it allows coin-hoppers to operate without the DA noticing their activity immediately. I think we basically have the same views, just that we believe that

miners causing the oscillation

is actually the DA's fault for creating the incentive in the first place. I mean why expect miners to not be economically rational and profit seeking?

Section 4.2.1

D/st is not an estimate of H for small samples due to increased frequency of small st values making avg(1/st) > 1/avg(st) = 1/T.

This is indeed absolutely true. We explain this issue when we argue for the introduction of the correction from 1 + T/S to e^(T/S). The reason this is not just an approximation (like it usually is for small values of T/S) but actually a correction is exactly what you describe. We are not quite sure how to include the effects of more probable shorter st without getting some very convoluted probabilistic formulas so instead we gave that indirect argument relying on a scenario under which the algorithm must work correctly.

Section 4.6

A 2-hour or 10 second rule are not honored because >50% of miners decide to. It's honored because each honest node will immediately reject a chain that is ahead of time due to world time being an objective and obvious fact that the protocol and network do not, should not, and can't provide.

We were trying to make 2 points here and I think we are mostly in agreement over them. First of all this rule is not a consensus rule as miners catching up with the network cannot impose it at all. Secondly, it is not at all obvious what constitutes a block that lies about its timestamp by placing it in the future. If miners would not globally agree on a certain cutoff time that is approximately the same for all of them (be it 2hrs or 10 mins in the future) then they would not know which blocks to consider as adopted by others as well and start mining on them or ignore them as the block is too much in the future (and maybe other miners think so as well). If there would not be majority consensus over the exact time a block is allowed to be in the future (I don't understand why 2 hours... it seems way too much to me) then miners would essentially not know what blocks their peers consider as valid at this time and might keep mining on the previous chain hoping others do so similarly. Essentially this would lead to many different competing forks that miners switch between as they grow and as blocks with various future timestamps are appended.
The reason we make the devil's advocate regarding breaking this 2-hour rule is to argue that using RTT is actually a viable alternative because the rule is not just "some rule" that miners happen to respect, but in fact everything would descend into chaos without it as miners would not be able to agree what amounts to lying and what not. So a majority of nodes do indeed need to agree on how much can a new block's timestamp differ from the current time. Also it is not clear to me immediately that miners have no incentive to break this rule so that's why we make that argument. If you already believe that they will follow real time no matter what then great! :)))

Section 5.5
Indeed, NEFDA-MTP-72 is really quite bad. We did not want to create the impression it's good at all. We were just trying to make the point that if the smoothing factor is not large in comparison to the MTP window size then this leads to similar oscillations problems. We were giving this configuration as a bad example, and for showing how to mitigate its problems by increasing the smoothing factor. We really like your simulations with stolen blocks but the space limitations of an academic paper kind of limited the depth of analysis we could go into. In my opinion simulating miner's behaviour is a whole paper on its own. In fact the model section of our paper could probably be more rigorously extended by proving the properties it has in a short paper as well.

Overall it seems like we are on the same page, which is very much expected as we studied your work a lot and found it really detailed, rigorous and insightful. Finally, I guess... Well Done! to both of us and especially to Jacob Eliosoff and Mark Lundeberg who saw the value in this algorithm long ago. The only question that remains is why is this not already used in many blockchains?

PS. The only "regret" I have about the paper is not placing enough emphasis on the importance of simulating various scenarios when configuring this EMA / ASERT / NEFDA. I think this should be THE algorithm of choice for most blockchains, provided they configure it accordingly after simulating scenarios that are relevant to their use case.

@zawy12
Copy link
Owner Author

zawy12 commented Jun 11, 2020

We agree SMA's plus miner motivation is a positive feedback situation. But then every DA that is not an RTT causes positive feedback. A major reason for having a long-window SMA is to minimize the variation that would affect miner motivation, so short solvetimes rolling out the back have little effect and are (usually) soon counteracted by long solvetimes. In short, a delay in estimating current hashrate plus miner motivation causes oscillations.

No one has ever stated a reason for 2 hours except for irrational fear over daylight savings time. I've discussed it with a lot with small coin devs and no one could think of a reason that 10 seconds would not work. Lamport's 1978 paper on clocks and his byzantine paper give the requirements for clocks in order to reach distributed consensus. If these had been followed, there would have never been a timestamp attack of any type. One rule is that future time limit needs to be significantly longer than propagation delays and a lot less than block time because block time is what's measuring the number of voters who have purchased their lottery ticket (each hash). Another rule is that timestamps must be sequential (no MTP needed).

My TSA article is all about RTTs and discusses effect of timestamps in detail. There are a lot of complex things. My conclusion is that the future time limit (FTL) should be very tight and that difficulty should not be affected hardly any by timestamps set at that limit. So my simple RTT at the top of the TSA article is what I believe is the best DA: a fast mild RTT riding on top of a slower DA. I decided chain work should be based on the slow DA w/o the RTT adjustment. If not, then lying to get an earlier timestamp to win chain work might occur, complicating things, possibly allowing an imaginative attack. If an RTT add-on does not affect chain work, then everyone may use a forwarded time if you don't have a tight timestamp. They would all agree to "push time to the FTL" constantly having timestamps that fall on top of the FTL, to the extent they don't think their blocks would be rejected for being too far forward. This could cause forks, which they don't want, so they would find a happy medium but always still pushing to closer to the FTL, so you might has well have nodes enforce it closer just so that time is more accurate.

Something to keep in mind for competing tips is that chain work is not sum(D), but
sum(D) * (N-1)/N
Emin et al have 2 articles that discuss successful attacks with around 35% HR and I wonder if it's because chain work is not being correctly measured. I have not read their newer paper yet. I keep looking for an e^x calculation that can be more accurate for estimating HR and chain work for a small number of blocks.

@crypto-perry
Copy link

crypto-perry commented Jun 11, 2020

Well the point is to have less positive feedback than negative feedback. That's why lwma and ewma/asert/nefda work well. Because the blocks that are in the past are weighted less so the positive feedback they amount to is diminished in time. The problem with SMA is both the fixed window and the equal weights.

I completely agree on the 2-hour rule. No idea why it is so long and I agree it could be diminished. I think the daylight saving time is a really weak argument nowadays. I think it's a good proposal to keep it shorter than ideal inter block time. We just have to make sure miners use good clocks but I think this is expected with current computers.

I'm not sure I understand why you would lie with an earlier timestamp if the algorithm is RTT. First of all you have to set the timestamp before starting to hash. And if you succeed to find a solution before that lower timestamp then that just means you really did do more work. Am I misunderstanding you? About tight FTL, absolutely, that's a very good reason to have short FTL. Even with short FTL I don't think we can ever get any of these blockchains to be a correct time-stamping server like the BTC whitepaper suggested so even if the blockchain time is not correct at all that's not such a serious problem in my opinion.

Good point about chain work! and thanks for the link, I'll look into it.

@zawy12
Copy link
Owner Author

zawy12 commented Jun 11, 2020

My conjecture that this is the perfect DA seems wrong (although it could still be in some sense the best possible). To fill in the gap where I hand-waved above, recall H and st are related by
x = rand(0,1) = e^(-st * H/D)
which means
st = -ln(x) * D / H

Let k = fraction of err in H_actual/H_expected
k = H_actual/H_expected - 1
i.e. k=0.1 means H is 10% higher than expected and thereby needs a 10% increase in D. We want a DA that does this:
D = prior_D * (1+k) = prev_D * H_actual/H_expected

Our DA ASERT aka NEFDA (without the smoothing factor) is
D = prior_D * e^(-st/T) / e^(-1)
I've speculated is doing this:
D = prior_D * x_observed / x_expected
I want D in terms of x instead of st, so I use the equation for st.
st = -ln(x) * prior_D / H_observed
T = prior_D / H_expected
This gives
st/T = -ln(x) * H_expected / H_observed = -ln(x)/(1+k)
where H_observed = H_actual.
Plugging this into ASERT:
D = prior_D * e^(ln(x)/(1+k)) / e^(-1)
x = rand(0,1) is equally probable in any range, so the expected (average) adjustment from our DA is
image
where Wolfram Alpha uses log() = ln().

In other words, our average correction is (after rearranging the integral result):

D = prior_D * e * (1+k)/(2+k)
D = prior_D * e / (1+H_expected/H_observed)

So the adjustment to prior_D is not the (1+k)=H_observed/H_expected I hoped for, but for small k it's e/2 =~ 1.36. If no adjustment is "needed", it adjusts by
D = prior_D * e/2
This means the avg block is targeting D = H_expected * T * e/2 but when taking the size of the adjustments into account it is targeting
D = H_expected * T

It does not adjust at all ( D = prior_D) for the average block if
H_observed / H_expected = 1/(1+k) = e-1
This can be seen by
image

So the average adjustment is none if k = 1/(e-1) - 1 = -0.42 = 42% low, i.e. if
H_observed / H_expected = 0.58
Again, the reason is that adjusting for the frequent fast (small-valued) solvetimes causes an overly large adjustment because they are in the denominator [at least as important is that fast blocks are adjusting more often]. We would like to adjust by H ~ 1/avg(st) but but we only have access to 1/st values for each block, so we're adjusting by avg(1/st) which is larger.

With the smoothing factor of M blocks is included, the adjustment to difficulty is
image

D = next_D * L/(L +1) * e^(1/M)
where L = M*(1+k)

@zawy12
Copy link
Owner Author

zawy12 commented Jun 14, 2020

I changed my chain work article to include ASERT / NEFDA as the estimator. It gives much more reasonable results that using sum(D)/sum(st) * (N-1)/N. In short, I use this DA to estimate the number of hashes by assuming a 2-tip chain split has 50% of the hashrate for each of the tips. This gives the tips an equal opportunity to win.

@jacob-eliosoff
Copy link

jacob-eliosoff commented Jun 16, 2020

@Dii14, and any others reading along - just FYI here are a couple of old threads on our old EMA DAAs that I reread when I'm trying to remind myself how they work. The specific algos (@dgenr8's wtema, my ema/ema2) aren't quite identical to ASERT/NEFDA (the one that is is my simpexp), but they have the same exponential-filter spirit and wtema is very elegantly implemented.

@zawy12
Copy link
Owner Author

zawy12 commented Jun 16, 2020

The best way to view wtema is to simplify the math:
next_target = prev_target*(1 + st/T/N -1/N)
next_diff = prev_diff / (1+st/T/N - 1/N)

For largish N, it's same as ASERT/NEFDA
next_diff = prev_diff * e^(-st/T/N + 1/N)
N = mean lifetime in blocks.
Std Dev in diff =~ 1/SQRT(2 * N)

ETH intuited an approximate inverse wtema via 1/(1-x) =~ 1+x.
next_diff = prev_diff * (1 - st/T/N + 1/N)

They saw the potential problem of a negative difficulty for large st, so they used a max() function to stop it. We rejected this version because of that potential. Their max() function has a drawback: it enables a 50% selfish miner to get I believe 174% of blocks in the time they would normally get 100%.

Digishield and Grin are an SMA inside a wtema:
next_target = avg_17_targets* (1 + avg_17_st/T/N - 1/N)
N = 4 Digi, N=2 N=3 Grin.

There is still the possibility of negative difficulty with wtema in BCH if using wtema is the only change made. It can happen if an attacker gets 2 blocks in a row to get around Amaury's median of 3 and sets his timestamps to the MTP+1 and if the previous 6 blocks were so slow such that N is not sufficiently large. For example N=72 and last 6 blocks took an average >12xT it could happen.

@tromp
Copy link

tromp commented Jul 23, 2020

Digishield and Grin are an SMA inside a wtema:
next_target = avg_17_targets* (1 + avg_17_st/T/N - 1/N)
N = 4 Digi, N=2 Grin.

Grin uses a DIFFICULTY_ADJUST_WINDOW of HOUR_HEIGHT = 60
and DIFFICULTY_DAMP_FACTOR = 3
The DAA function consensus::next_difficulty is at
https://github.com/mimblewimble/grin/blob/master/core/src/consensus.rs#L309-L356

Note that it adjusts average difficulty, not average target.
Grin enforces strictly increasing timestamps.
The difficulty chart at https://grinscan.net/charts shows there's still a fair bit of oscillation.

The main reason for preferring damped SMA over something like wtema is worry about timestamp manipulation; a sum of the past 60 solvetimes seems much harder to manipulate than just the last one.

I do like the simplicity of
next_diff = prev_diff / (1 + (st/T - 1) / N)
though. What choice of N would give an overall behaviour closet to Grin's DAA?
Would it be 60 * 3 ?

@zawy12
Copy link
Owner Author

zawy12 commented Jul 23, 2020

Since it uses difficulty, replace my avg target should be the harmonic mean. I was remembering wrong when I wrote N=2 Plug in N=3 and you should then see the equation I have is the same as yours.

@tromp
Copy link

tromp commented Jul 23, 2020

Are you saying N corresponds more to a dampening factor than to a block window?

@zawy12
Copy link
Owner Author

zawy12 commented Jul 23, 2020

N is the mean lifetime = ML in e^(-t/ML). The best alternate description might be calling it a filtering constant as jonathan toomim found this paper that shows EMA is a first order IIR filter. If ASERT is perfect and Grin is an approximation, then perfect Grin is

difficulty = avg_difficulty * e^(-avg_st/T/3 + 1/3)

Using avg difficulty seems to be not as proper as using average target, so it should be harmonic mean of the difficulties to be "perfect". But in actual modelling, being proper may not result in a better algorithm when hashrate may change a lot. The perfect SMA describe on my issue on all algorithms gets perfect solvetimes, but generally may not be as good as the more common SMA.

@zawy12
Copy link
Owner Author

zawy12 commented Jul 23, 2020

Using e^x = 1/e^-x = 1/(1-x) for small x gives one of the alternate EMA formulas. Then multiplying by N/N and T/T gives me:
difficulty = avg_difficulty * N * T / ((N-1) * T + avg_st)
My N is your damp factor N=3. I was remembering wrong when I wrote N=2.

@tromp
Copy link

tromp commented Jul 23, 2020

In wtema, you don't compute any average target or difficulty; you just multiply the previous one by some factor depending on last solvetime to get the next.
Multiplying the target by X is exactly equivalent to dividing the difficulty by X.

I prefer to use N to denote the half-live in blocks,
and then set alpha = 1 - 0.5^{1/N}, leading to a wtema update of

next_diff = prev_diff / (1 + alpha * (st/T - 1))

That seems to match the formulation of

next_target = prior_target / (IDEAL_BLOCK_TIME * alpha_recip)
next_target *= block_time + IDEAL_BLOCK_TIME * (alpha_recip - 1)

from https://read.cash/@jtoomim/bch-upgrade-proposal-use-asert-as-the-new-daa-1d875696

@zawy12
Copy link
Owner Author

zawy12 commented Jul 23, 2020

In my "perfect" grin above, the x in the above is often not small, so I'm not at all sure the "perfected Grin" is good.

Half-life is easier for people to understand, but I want to keep things in mean_lifetime to use the e^x approximation.
half_life = ln(2) * mean_lifetime. Alpha keeps it connected to EMA notation. Another benefit is that SMA and LWMA's mean lifetimes in terms of equivalent std dev of difficulties with ASERT/EMA is 1/2 their N. LWMA N=200 has its closest connection to ASERT/EMA mean_lifetime 100. Grin's mean lifetime should be 3x the averaging window, also divided by two. I know this from experiment with Digishield.

@tromp
Copy link

tromp commented Jul 23, 2020

Grin's mean lifetime should be 3x the averaging window, also divided by two.

Aha, so when I said

What choice of N would give an overall behaviour closet to Grin's DAA? Would it be 60 * 3 ?

I was only off by the division by 2, and N should be 60 * 3 / 2 = 90 ?

@zawy12
Copy link
Owner Author

zawy12 commented Jul 23, 2020

Yes, if you're averaging 60 blocks and "lifetime" units are in blocks as opposed to seconds. The equivalent std devs are with constant hashrate. I have a theory that the expected std dev in exchange price should be about 2x the std dev of difficulty under constant hashrate, as determined by mean lifetime.

@tromp
Copy link

tromp commented Jul 23, 2020

Thanks for elaborating. I should do some simulations for Grin's current DAA vs this simpler twema alternative to evaluate the pros and cons.

@zawy12
Copy link
Owner Author

zawy12 commented Jul 23, 2020

WTEMA is going to be better in all respects because grin is partially SMA. To make DA's work better and prevent exploits, make FTL < block_time (at least no more than 3x block time). Especially important to prevent all kinds of hassles is to make the consensus rule on timestamps > 0 from previous timestamp (override local time if necessary to make it +1 if previous block as long as it is < FTL). It's also possible to make the rule "do not let timestamp be more than the mean lifetime in the past, if your DA handles negatives well (not LWMA) to that level. If you use an MTP(11) =~ 6 blocks in past for the solvetime, make sure to use prior difficulty, not the one 6 blocks in the past (you should be doing that now with your difficulty average, a ~6-block shift ahead of the solvetimes) and also make the mean lifetime at least 20x the 6 block delay in solvetimes or it is asking for oscillations. Another option to deal with negatives and allow WTEMA to use most recent solvetime is to do the "solvetime conditioning" loop I have in LWMA for the MTP=11 blocks (going back 12 blocks to get initial timestamp). Allowing negative solvetimes or dealing with them in a different way than these 3 options is fraught with problems in all DAs.

@zawy12
Copy link
Owner Author

zawy12 commented Jul 27, 2020

@Dii14 ASERT/NEFDA is going to be used for BCH. It emits too many or too few blocks if hashrate consistently increases or decreases. The excess blocks it emits is M * ln(current_difficulty/start_difficulty). To correct for this, we make the block time that is input to ASERT a function of the long term solvetime error. It can actually be an absolute ASERT to adjust block time inside an ASERT that adjusts target, but we use the e^x = 1+x approximation. BCH will use a relative ASERT for the target.

Relative ASERT:
h = height
T=desired block time
t = timestamps
M = mean lifetime in blocks if each block is T seconds. M*T = mean lifetime in seconds.
k = additional dampening factor
target[h+1] = target[h] * e^( (t[h]-t[h-1] - T[h])/M/T)
T[h] = T * max(1/1.13, min(1.13, 1 - ((t[h]-t[0]) + T*h)/M/T/k ) )
The 1.13 clamp prevents a private mine attack from getting a much higher number of blocks than the usual 100% per time period. I am recommending M=144 and k=2016/M which is two weeks. BCH will probably use M=416 (a halflife of 416 * ln(2) = 288) and maybe a different k. But with M = 144 ASERT getting behind schedule is not a problem, soT[h] should not be needed.
Using e^x = 1+x approximation give the DA algorithm I'll recommend to people to avoid the e^x complexity. The truncation error is very small as I described in an update above.
target[h+1] = target[h] * (1 + (A - B) + B * (C-D)/k)
A = (t[h] - t[h-1] ) / 144 / T = prior observed block time in units of blocks, filtered
B = 1/144 = previous block expected time, filtered
C = (t[h] - t[0]) / 144 / T = past blocks observed, filtered
D = h/144 = past blocks expected, filtered

This is a typical PI controller of the form
next = prior * ( J + K * x + L * sum(xi))
with constants
J = offset
K = slope
L = undershoot
and x = error signal = observed solvetime minus desired solvetime.

ASERT just gave us a good handle on choosing the constants.

@zawy12
Copy link
Owner Author

zawy12 commented Jul 28, 2020

This is the derivation of how the nBits truncation error E (as a fraction of nBits) will cause relative ASERT to have an avg solvetime error that is E * H * 100% where H = mean_lifetime. For example, the max nBits fractional error in a certain range of nBits is E = 2^(-15). For a half life 288 this is 2^(-15) * 288/ln(2) = 1.3%

Relative ASERT with small nBits truncation error E:
T = target block time
H = mean lifetime ni blocks
T * H = mean lifetime in seconds
h = height
t[i] = timestamp
target[h+1] = target[h] * (1 - E) * e(^(t[h] - t[h-1] - T)/T/H)
For small E: (1-E) = e^(-E) so the error has this effect:
target[h+1] = target[h] * e(^(t[h] - t[h-1] - T)/T/H - E)
target[h+1] = target[h] * e(^(t[h] - t[h-1] -E*H*T- T)/T/H)

So the solvetime input to ASERT is thrown off E*H*T instead of just E*T which makes it have that much error.

@zawy12
Copy link
Owner Author

zawy12 commented Jul 28, 2020

People forget that ASERT gets behind in the number of blocks if hashrate continually increases because it has perfect solvetime if hashrate is oscillating around some average. Getting behind is how it increases difficulty.

This is the derivation of how many blocks ASERT will get ahead or behind as difficulty increases or decreases.
T = desired block time
t[h] = timestamp at height h
H = half life in time units of T (i.e. the number of blocks if they were all emitted in T seconds)
T*H = half life in seconds
Absolute ASERT:
target[N+1] = target[M] * 2^( ( t[N] - t[M-1] - T * (N-M+1) )/T/H)

Average solvetime is T * (1+F) where F is the fraction of delays.
target[N+1] = target[M] * 2^( ( T*(1+F) * (N-M) - T * (N-M+1) )/T/H)
Solve for F * (N-M) = number of blocks behind
F*(N-M) = 288 * log( target[M] / target[N+1] ) / log(2) - 1
F*(N-M) = 288 * log( difficulty[N+1] / difficulty[M] ) / log(2) -1

The -1 on the end is only because we are considering the drift at block N+1 relative to the timestamp at N. The drift at block N (which is what we really want) does not have the -1.

Example:
BTC has had about 6% faster solvetime than desired due to hashrate increasing faster than the DA. What would BCH's avg solvetime have been if it had used ASERT with half life 288 from the beginning? Difficulty is 350E9 and it started at D=1. The emission schedule wanted about 605,000 blocks emitted by now instead of 640,000 (5.7% too fast).

288 * log(350E9 / 1) / ln(2) / 605,000 = 1.8% too fast (too many blocks)

@tromp
Copy link

tromp commented Jul 28, 2020

I forked jtoomim's comparison tool in order to compare Grin's current DAA against some other damping factors and against wtema with various halving times.
Since Grin does't use nBits, I stripped it out, and just store precise targets in the states instead. See

https://github.com/tromp/difficulty

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants