Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Percentile function to handle out of range percentiles #983

Merged
merged 5 commits into from
Aug 8, 2018
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
43 changes: 26 additions & 17 deletions tests/core/gas-strategies/test_time_based_gas_price_strategy.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,11 @@ def _get_block_by_something(method, params):
'transactions': [
{'gasPrice': 70},
{'gasPrice': 60},
{'gasPrice': 60},
{'gasPrice': 60},
{'gasPrice': 15},
{'gasPrice': 5},
{'gasPrice': 50},
],
'miner': '0xA',
'timestamp': 100,
Expand All @@ -51,8 +56,6 @@ def _get_block_by_something(method, params):
'number': 3,
'parentHash': '0x0000000000000000000000000000000000000000000000000000000000000002',
'transactions': [
{'gasPrice': 10},
{'gasPrice': 50},
{'gasPrice': 100},
],
'miner': '0xC',
Expand All @@ -64,9 +67,6 @@ def _get_block_by_something(method, params):
'number': 2,
'parentHash': '0x0000000000000000000000000000000000000000000000000000000000000001',
'transactions': [
{'gasPrice': 50},
{'gasPrice': 70},
{'gasPrice': 100},
],
'miner': '0xB',
'timestamp': 40,
Expand Down Expand Up @@ -96,6 +96,15 @@ def _get_block_by_something(method, params):
{'gasPrice': 30},
{'gasPrice': 50},
{'gasPrice': 60},
{'gasPrice': 30},
{'gasPrice': 50},
{'gasPrice': 60},
{'gasPrice': 30},
{'gasPrice': 50},
{'gasPrice': 60},
{'gasPrice': 30},
{'gasPrice': 54},
{'gasPrice': 10000000000000000000000},
],
'miner': '0xA',
'timestamp': 0,
Expand All @@ -108,21 +117,21 @@ def _get_block_by_something(method, params):
'strategy_params,expected',
(
# 120 second wait times
(dict(max_wait_seconds=80, sample_size=5, probability=98), 61),
(dict(max_wait_seconds=80, sample_size=5, probability=90), 46),
(dict(max_wait_seconds=80, sample_size=5, probability=50), 40),
(dict(max_wait_seconds=80, sample_size=5, probability=98), 70),
(dict(max_wait_seconds=80, sample_size=5, probability=90), 23),
(dict(max_wait_seconds=80, sample_size=5, probability=50), 7),
# 60 second wait times
(dict(max_wait_seconds=60, sample_size=5, probability=98), 62),
(dict(max_wait_seconds=60, sample_size=5, probability=90), 55),
(dict(max_wait_seconds=60, sample_size=5, probability=50), 40),
(dict(max_wait_seconds=60, sample_size=5, probability=98), 92),
(dict(max_wait_seconds=60, sample_size=5, probability=90), 48),
(dict(max_wait_seconds=60, sample_size=5, probability=50), 7),
# 40 second wait times
(dict(max_wait_seconds=40, sample_size=5, probability=98), 63),
(dict(max_wait_seconds=40, sample_size=5, probability=90), 61),
(dict(max_wait_seconds=40, sample_size=5, probability=50), 40),
(dict(max_wait_seconds=40, sample_size=5, probability=98), 100),
(dict(max_wait_seconds=40, sample_size=5, probability=90), 81),
(dict(max_wait_seconds=40, sample_size=5, probability=50), 7),
# 20 second wait times
(dict(max_wait_seconds=20, sample_size=5, probability=98), 63),
(dict(max_wait_seconds=20, sample_size=5, probability=90), 62),
(dict(max_wait_seconds=20, sample_size=5, probability=50), 50),
(dict(max_wait_seconds=20, sample_size=5, probability=98), 100),
(dict(max_wait_seconds=20, sample_size=5, probability=90), 100),
(dict(max_wait_seconds=20, sample_size=5, probability=50), 34),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not reviewing these numbers, just trusting that you double-checked them. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be a good idea to at least once check how accurate the target times are when compared to actual times on ropsten or even mainnet.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Also, as a quicker test, we could compare against https://ethgasstation.info/.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh yeah. Then I could compare to main net without spending any gas.

Copy link
Contributor Author

@dylanjw dylanjw Aug 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The estimates are comparable to ethgasstation.info, but I noticed that with longer acceptable wait times, our estimates are a bit lower. I ran some tests by sending a transaction back and forth on on mainnet using our estimate. I tried raising the percentile. Given the results, I am going to bump up the percentile to 20%, since it yielded the best results.

15% percentile

Expected Actual
<2m 2m
<5m 10m, 4m
<30m 13m

20% percentile

Expected Actual
<2m 1.5m
<5m 2m, 4m, 3m
<30m 15m

25%

Expected Actual
<2m 2m
<5m 2m
<30m 3m

40%

Expected Actual
<30m 7m

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid 👍

),
)
def test_time_based_gas_price_strategy(strategy_params, expected):
Expand Down
19 changes: 15 additions & 4 deletions tests/core/utilities/test_math.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,9 @@
strategies as st,
)

from web3.exceptions import (
InsufficientData,
)
from web3.utils.math import (
percentile,
)
Expand All @@ -29,13 +32,21 @@ def test_percentiles_out_of_one_hundred(p, expected):
assert percentile(values, p) == expected


def test_percentiles_with_single_values():
with pytest.raises(ValueError):
percentile([0], 1)
def test_percentiles_with_no_values():
with pytest.raises(InsufficientData):
percentile([], 1)


def test_percentiles_with_out_of_bounds_fractions():
assert 1 == percentile([1, 2, 3, 4], percentile=10)
assert 1 == percentile([1, 2, 3, 4], percentile=15)
assert 1 == percentile([1, 2, 3, 4], percentile=20)
assert 1 == percentile([1, 2, 3, 4], percentile=25)
assert 1 < percentile([1, 2, 3, 4], percentile=30)


@given(
values=st.lists(elements=st.integers(), min_size=3, max_size=200),
values=st.lists(elements=st.integers(), min_size=1, max_size=200),
p=st.integers(max_value=100, min_value=0))
def test_fuzz_test_percentiles(values, p):
if not values:
Expand Down
8 changes: 8 additions & 0 deletions web3/exceptions.py
Original file line number Diff line number Diff line change
Expand Up @@ -99,3 +99,11 @@ class NoABIEventsFound(AttributeError):
Raised when an ABI doesn't contain any events.
"""
pass


class InsufficientData(Exception):
"""
Raised when there are insufficient data points to
complete a calculation
"""
pass
7 changes: 6 additions & 1 deletion web3/gas_strategies/time_based.py
Original file line number Diff line number Diff line change
Expand Up @@ -7,6 +7,7 @@
)

from web3.exceptions import (
InsufficientData,
ValidationError,
)
from web3.utils.math import (
Expand Down Expand Up @@ -60,11 +61,15 @@ def _aggregate_miner_data(raw_data):

for miner, miner_data in data_by_miner.items():
_, block_hashes, gas_prices = map(set, zip(*miner_data))
try:
price_percentile = percentile(gas_prices, percentile=15)
except InsufficientData:
price_percentile = min(gas_prices)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My instinct would be max() here: usually would prefer overpaying than getting the transaction stuck. Any reason to prefer min()?

Also, is it definitely impossible for gas_prices to be empty?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just because min() is what was being calculated before the percentile function. In a private network with just a few transactions per block, wouldn't using the last block max(prices) potentially send the price on a continual upward trend?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose, but it could also trend down with min. Also maybe it doesn't matter on a private network? I don't feel strongly about it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't you also include transactions from the last few blocks when there is not enough data available?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's already looking across 120 blocks by default, but this is grouped by miner. So a few scenarios could cause a small data set:

  • we are near the beginning of the chain, with only a few blocks to sample
  • even with many blocks and transactions, maybe this particular miner has only mined a few blocks/transactions

yield MinerData(
miner,
len(set(block_hashes)),
min(gas_prices),
percentile(gas_prices, percentile=15))
price_percentile)


@to_tuple
Expand Down
11 changes: 9 additions & 2 deletions web3/utils/math.py
Original file line number Diff line number Diff line change
@@ -1,8 +1,13 @@
from web3.exceptions import (
InsufficientData,
)


def percentile(values=None, percentile=None):
"""Calculates a simplified weighted average percentile
"""
if values in [None, tuple(), []] or len(values) < 3:
raise ValueError("Expected a sequence of at least 3 integers, got {0}".format(values))
if values in [None, tuple(), []] or len(values) < 1:
raise InsufficientData("Expected a sequence of at least 1 integers, got {0}".format(values))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe got {0!r} would help show exactly what weird value was sent.

if percentile is None:
raise ValueError("Expected a percentile choice, got {0}".format(percentile))

Expand All @@ -11,6 +16,8 @@ def percentile(values=None, percentile=None):
rank = len(values) * percentile / 100
if rank > 0:
index = rank - 1
if index < 0:
return sorted_values[0]
else:
index = rank

Expand Down