Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Percentile function to handle out of range percentiles #983

Merged
merged 5 commits into from
Aug 8, 2018

Conversation

dylanjw
Copy link
Contributor

@dylanjw dylanjw commented Jul 31, 2018

What was wrong?

Fixes #979

How was it fixed?

The percentile function requires at least 3 data points to make a calculation. The percentile now throws a custom exception that is caught and min() is used as an alternative in these cases.

Cute Animal Picture

image

@dylanjw dylanjw changed the title Update time based strategy to use min() with < 3 data points Update time based strategy to use min() when < 3 txs in block Jul 31, 2018
@dylanjw dylanjw force-pushed the percentile_low_data_points branch from 379eb7a to 605d136 Compare July 31, 2018 17:08
@dylanjw dylanjw force-pushed the percentile_low_data_points branch from 605d136 to ae718ba Compare July 31, 2018 18:09
Copy link
Collaborator

@carver carver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd feel a lot better about this if there were a test showing what happens on an empty block, one with one transaction, and one with two.

try:
price_percentile = percentile(gas_prices, percentile=15)
except InsufficientData:
price_percentile = min(gas_prices)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My instinct would be max() here: usually would prefer overpaying than getting the transaction stuck. Any reason to prefer min()?

Also, is it definitely impossible for gas_prices to be empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just because min() is what was being calculated before the percentile function. In a private network with just a few transactions per block, wouldn't using the last block max(prices) potentially send the price on a continual upward trend?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose, but it could also trend down with min. Also maybe it doesn't matter on a private network? I don't feel strongly about it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't you also include transactions from the last few blocks when there is not enough data available?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already looking across 120 blocks by default, but this is grouped by miner. So a few scenarios could cause a small data set:

  • we are near the beginning of the chain, with only a few blocks to sample
  • even with many blocks and transactions, maybe this particular miner has only mined a few blocks/transactions

@dylanjw dylanjw changed the title Update time based strategy to use min() when < 3 txs in block Percentile function to handle out of range percentiles Aug 1, 2018
@dylanjw dylanjw force-pushed the percentile_low_data_points branch from a2dac87 to cab6a5b Compare August 1, 2018 22:50
@dylanjw
Copy link
Contributor Author

dylanjw commented Aug 7, 2018

This is ready for a review. I found and resolved an error when a percentile falls outside the range of numbers. For example, the 15th percentile would break for <7 data points because 1/6 ~ 17%. Now when the percentile choice falls below the lowest value, the lowest value is returned. With this change the percentile function will work with a single data point.

Copy link
Collaborator

@carver carver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

if values in [None, tuple(), []] or len(values) < 3:
raise ValueError("Expected a sequence of at least 3 integers, got {0}".format(values))
if values in [None, tuple(), []] or len(values) < 1:
raise InsufficientData("Expected a sequence of at least 1 integers, got {0}".format(values))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe got {0!r} would help show exactly what weird value was sent.

(dict(max_wait_seconds=20, sample_size=5, probability=50), 50),
(dict(max_wait_seconds=20, sample_size=5, probability=98), 100),
(dict(max_wait_seconds=20, sample_size=5, probability=90), 100),
(dict(max_wait_seconds=20, sample_size=5, probability=50), 34),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not reviewing these numbers, just trusting that you double-checked them. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be a good idea to at least once check how accurate the target times are when compared to actual times on ropsten or even mainnet.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Also, as a quicker test, we could compare against https://ethgasstation.info/.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh yeah. Then I could compare to main net without spending any gas.

Copy link
Contributor Author

@dylanjw dylanjw Aug 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The estimates are comparable to ethgasstation.info, but I noticed that with longer acceptable wait times, our estimates are a bit lower. I ran some tests by sending a transaction back and forth on on mainnet using our estimate. I tried raising the percentile. Given the results, I am going to bump up the percentile to 20%, since it yielded the best results.

15% percentile

Expected Actual
<2m 2m
<5m 10m, 4m
<30m 13m

20% percentile

Expected Actual
<2m 1.5m
<5m 2m, 4m, 3m
<30m 15m

25%

Expected Actual
<2m 2m
<5m 2m
<30m 3m

40%

Expected Actual
<30m 7m

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid 👍

@dylanjw dylanjw force-pushed the percentile_low_data_points branch from fb9cddd to 9ccc9d1 Compare August 8, 2018 21:24
@dylanjw dylanjw force-pushed the percentile_low_data_points branch from 9ccc9d1 to ce5f8c8 Compare August 8, 2018 22:48
@dylanjw dylanjw merged commit 5cc331f into ethereum:master Aug 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash in time based gas price strategies
3 participants