
Reduce MAPPING_BASE_COST from 10_000 to 1_500 #2564

Merged: 18 commits, Nov 13, 2024

Conversation

evanmarshall
Contributor

Lower GET, GET_OR_USE, & CONTAINS finalize fees

The goal of this PR is to lower fees safely, such that the change encourages more chain activity without impacting the safety of the network.
Specifically, the PR only changes a single value: MAPPING_BASE_COST, from 10_000 microcredits to 500 microcredits.

Previous work

The most comprehensive work IMO is this analysis: https://github.com/iamalwaysuncomfortable/analysis/blob/main/aleo_finalize_opcode_benchmark_and_pricing_analysis.ipynb as part of this PR: #2281

Some key observations taken from this analysis:

  • SET & GET are the two most commonly used opcodes
  • Most simple operations run in 1 to 5 µs and are priced at 500 microcredits
  • SET & GET have the same base cost of 10_000 microcredits, with different per-byte additional costs of 100 and 10 microcredits respectively
  • The base assumption for the fee model is that 1 second of single-threaded runtime should cost 100 credits

New Benchmarks Approach

I could not find the actual benchmarks for SET/GET in the linked previous work, but all of the other opcodes are included.
So I created these benchmarks for SET/GET. I decided not to include the benchmarks as part of this PR, but they can be reused when revisiting the fee model and as other DB optimizations land.

These new benchmarks use RocksDB and specifically benchmark finalize_store.update_key_value and finalize_store.get_value_speculative.
Although these calls are the core of the SET and GET operations, there is other overhead that is not included in this benchmark; that overhead should be roughly equivalent between SET and GET.
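
For context, a minimal Criterion-style sketch of this kind of benchmark, using the rocksdb crate directly as a stand-in for the finalize store (the real benchmarks call finalize_store.update_key_value and finalize_store.get_value_speculative; the crates, paths, and setup below are assumptions, not the code used to produce the numbers in this PR):

// Sketch only: benchmarks raw RocksDB put/get as a proxy for the finalize-store calls.
use criterion::{criterion_group, criterion_main, Criterion};
use rocksdb::DB;

fn bench_set_get(c: &mut Criterion) {
    let db = DB::open_default("/tmp/mapping-bench").expect("open rocksdb");
    // Pre-populate with unique keys so every get hits a previously written value.
    for i in 0u64..10_000 {
        db.put(i.to_le_bytes(), i.to_le_bytes()).expect("put");
    }
    let mut i = 10_000u64;
    c.bench_function("set_u64", |b| {
        b.iter(|| {
            i += 1; // unique key per iteration, as in the PR's benchmarks
            db.put(i.to_le_bytes(), i.to_le_bytes()).expect("put");
        })
    });
    let mut j = 0u64;
    c.bench_function("get_u64", |b| {
        b.iter(|| {
            j = (j + 1) % 10_000; // always hits an existing key
            db.get(j.to_le_bytes()).expect("get")
        })
    });
}

criterion_group!(benches, bench_set_get);
criterion_main!(benches);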

There are probably significant differences between running benchmarks and running on a live node under heavy load, but some care has been taken to keep the benchmarks realistic, including using unique keys for the gets and sets and using RocksDB instead of the test finalize store.

Results

Although priced similarly, GET & SET are extremely different in actual runtime.

For primitive types like a U64, a SET costs 11_200 and a GET costs 10_120 (a base cost of 10_000 plus the per-byte cost; serialized, a U64 is 12 bytes).
The SET runtime for a U64 is 170 µs and the GET runtime is 4.3 µs.
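
For reference, here is a minimal sketch (illustrative Rust, not the actual snarkVM cost code) of how those numbers fall out of the current model:

// Illustrative only: the constants mirror the values quoted above.
const MAPPING_BASE_COST: u64 = 10_000; // microcredits, shared by SET and GET today
const SET_PER_BYTE_COST: u64 = 100;    // microcredits per byte
const GET_PER_BYTE_COST: u64 = 10;     // microcredits per byte

fn finalize_fee(base: u64, per_byte: u64, bytes: u64) -> u64 {
    base + per_byte * bytes
}

fn main() {
    let u64_bytes = 12; // a serialized U64 value is 12 bytes
    assert_eq!(finalize_fee(MAPPING_BASE_COST, SET_PER_BYTE_COST, u64_bytes), 11_200);
    assert_eq!(finalize_fee(MAPPING_BASE_COST, GET_PER_BYTE_COST, u64_bytes), 10_120);
}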

Benchmark Summary

  • Type: small_struct (422 bytes)

    • Set Time: ~440.30 µs
    • Get Time: ~79.75 µs
  • Type: large_struct (6456 bytes)

    • Set Time: ~4690 µs
    • Get Time: ~135.55 µs
  • Type: array_1x1 (43 bytes)

    • Set Time: ~185.21 µs
    • Get Time: ~4.21 µs
  • Type: array_32x32 (1407 bytes)

    • Set Time: ~1.23 ms
    • Get Time: ~17.41 µs
  • Type: U8

    • Set Time: ~165.74 µs
    • Get Time: ~4.27 µs
  • Type: U16

    • Set Time: ~169.60 µs
    • Get Time: ~4.48 µs
  • Type: U32

    • Set Time: ~154.07 µs
    • Get Time: ~4.45 µs
  • Type: U64

    • Set Time: ~168.84 µs
    • Get Time: ~4.34 µs
  • Type: U128

    • Set Time: ~170.26 µs
    • Get Time: ~4.49 µs

Conclusions

GET is currently priced similarly to SET when, in many cases, there is a 40x difference in runtime.
We should lower the base price of GET to 500 to bring it more in line with other primitive operations.

The big outlier is structs, which perhaps deserve their own pricing model, but I wanted to keep this change as simple as possible.

[Screenshots: GET/SET benchmark results]

Safety

This change leads to a more consistent, though still imperfect, pricing model, but we should still consider the potential impact.

As a validator, we run our node on a 32-core, 64 GB memory machine. Our CPU utilization for the node over the last 7 days has stayed between 5% and 10% despite relatively heavy load. It seems that at current network activity levels, we are unlikely to risk overloading machines.

[Screenshot: validator CPU utilization over the last 7 days]

If there are other concerns I'm unaware of, please share.

Impact

For many functions that use public state, we can expect to reduce the transaction fee by 35% to 60%. As the team behind the Leo Wallet, the most frequent complaint we hear is about the high fees.

To pick a couple of programs I'm familiar with:

  • token_registry.aleo/transfer_public: 140_173 -> 92_673
  • pondo_protocol.aleo/deposit_public_as_signer: 477_484 -> 211_484

Since higher fees are still accepted, migration for downstream programs and applications isn't required, only highly recommended.

It will be difficult to raise fees in the future, so if we expect the runtime of GET to become more expensive over time, we should be more cautious about how much we lower the GET base fee.

@vicsn
Collaborator

vicsn commented Oct 25, 2024

As this would be the first PR which changes which transactions (and their fees) are deemed valid, we'll have to add some "migration" logic. Would defining it as follows suffice? The new pricing can apply from this height onwards. CC @raychu86 @d0cd

diff --git a/console/network/src/canary_v0.rs b/console/network/src/canary_v0.rs
index f0a973acd..5c3770789 100644
--- a/console/network/src/canary_v0.rs
+++ b/console/network/src/canary_v0.rs
@@ -147,6 +147,8 @@ impl Network for CanaryV0 {
     const INCLUSION_FUNCTION_NAME: &'static str = MainnetV0::INCLUSION_FUNCTION_NAME;
     /// The maximum number of certificates in a batch.
     const MAX_CERTIFICATES: u16 = 100;
+    /// The block height from which new consensus rules apply.
+    const CONSENSUS_V2_HEIGHT: u32 = 1_000; // TODO: adjust based on canary height.
     /// The network name.
     const NAME: &'static str = "Aleo Canary (v0)";
 
diff --git a/console/network/src/lib.rs b/console/network/src/lib.rs
index b0694d43a..483157e15 100644
--- a/console/network/src/lib.rs
+++ b/console/network/src/lib.rs
@@ -203,6 +203,8 @@ pub trait Network:
 
     /// The maximum number of certificates in a batch.
     const MAX_CERTIFICATES: u16;
+    /// The block height from which new consensus rules apply.
+    const CONSENSUS_V2_HEIGHT: u32;
 
     /// The maximum number of bytes in a transaction.
     // Note: This value must **not** be decreased as it would invalidate existing transactions.
diff --git a/console/network/src/mainnet_v0.rs b/console/network/src/mainnet_v0.rs
index 6aadaa3f3..1aaaecc9e 100644
--- a/console/network/src/mainnet_v0.rs
+++ b/console/network/src/mainnet_v0.rs
@@ -158,6 +158,8 @@ impl Network for MainnetV0 {
     const INCLUSION_FUNCTION_NAME: &'static str = snarkvm_parameters::mainnet::NETWORK_INCLUSION_FUNCTION_NAME;
     /// The maximum number of certificates in a batch.
     const MAX_CERTIFICATES: u16 = 16;
+    /// The block height from which new consensus rules apply.
+    const CONSENSUS_V2_HEIGHT: u32 = 3_000_000; // TODO: adjust based on mainnet height.
     /// The network name.
     const NAME: &'static str = "Aleo Mainnet (v0)";
 
diff --git a/console/network/src/testnet_v0.rs b/console/network/src/testnet_v0.rs
index fd793055a..0f42c7c90 100644
--- a/console/network/src/testnet_v0.rs
+++ b/console/network/src/testnet_v0.rs
@@ -147,6 +147,8 @@ impl Network for TestnetV0 {
     const INCLUSION_FUNCTION_NAME: &'static str = MainnetV0::INCLUSION_FUNCTION_NAME;
     /// The maximum number of certificates in a batch.
     const MAX_CERTIFICATES: u16 = 100;
+    /// The block height from which new consensus rules apply.
+    const CONSENSUS_V2_HEIGHT: u32 = 1_000; // TODO:  adjust based on testnet height.
     /// The network name.
     const NAME: &'static str = "Aleo Testnet (v0)";

@evanmarshall
Contributor Author

As this would be the first PR which changes which transactions (and their fees) are deemed valid

Do we have to add a migration? All previously valid transactions will still be valid.

@vicsn
Collaborator

vicsn commented Oct 25, 2024

As this would be the first PR which changes which transactions (and their fees) are deemed valid

Do we have to add a migration? All previously valid transactions will still be valid.

Good point. I still think it would be good for:

  • Defense in depth. I'm wondering whether this could make rejected transactions be considered valid (I don't know the potential reasons for rejection by heart), and I want to make reviewers' lives easy by not having to think about this.
  • Avoiding a synchronous restart. If only some validators have the upgrade, it could lead to a fork.

@raychu86
Contributor

Good point. I still think it would be good for:

  • Defense in depth. I'm wondering whether this could make rejected transactions be considered valid (I don't know the potential reasons for rejection by heart), and I want to make reviewers' lives easy by not having to think about this.
  • Avoiding a synchronous restart. If only some validators have the upgrade, it could lead to a fork.
  1. Technically there is no requirement to add additional migration logic for pricing reductions, because, previously, txes that have too few fees either never make it to DAG serialization or are aborted by honest validators. (Note that this is not the case for pricing increases, because those could invalidate accepted txes.) However, because mainnet is live and we cannot expect a coordinated restart of all nodes, we need to consider adding logic s.t. nodes automatically roll into the new change once it passes a future block X that we specify.

  2. Any consensus change (which this is) should increment the snarkOS message versions, so nodes on older versions do not connect to nodes on the new version. This is our defense against forking; however, it's a bit more nuanced.

TLDR: If we do a coordinated restart, then we can do without extra migration logic. Otherwise, migration logic may be necessary to keep the network rolling

@raychu86
Contributor

In general, since I assume there will be a larger effort to reduce fees across other parts of finalize/storage, those changes should be implemented together to avoid multiple migration efforts. Additionally, I would advise that a fee reduction discussion take place as an ARC so there is more weigh-in on pain points (and to discuss the migration considerations as well).

@evanmarshall
Contributor Author

TLDR: If we do a coordinated restart, then we can do without extra migration logic. Otherwise, migration logic may be necessary to keep the network rolling

In any case, we should plan on a synchronous upgrade as well to minimize any chance of a chain halt because during an async upgrade, it's likely neither version will have quorum. If we plan on a synchronous upgrade, do you think the migration is necessary? I'm happy to add it but I'm not sure if it's preferred to keep that branching code around forever.

In general, since I assume there will be a larger effort to reduce fees overall other parts of finalize/storage they should be implemented together to avoid multiple migration efforts. Additionally, I would advise a fee reduction discussion take place as an ARC so there is more weigh in regarding pain points (and to discuss the migration considerations as well).

I absolutely think we should examine every part of fees in general but I strongly suggest we implement this change immediately while we consider other larger changes. I don't think it requires an ARC as none of the mechanisms, logic, or assumptions are changing. The only part that's changing is fixing the inconsistency of pricing GET/GET_OR_USE/CONTAINS.

@vicsn
Collaborator

vicsn commented Oct 28, 2024

I agree with Evan that, given this change does not change assumptions and is urgent, it is unlikely to warrant an ARC. Future, bigger fee adjustment proposals likely will require broader discussion. But it's up to the Foundation to draw the line. CC @zkxuerb

I agree with Ray that there should be a gateway EVENT version bump. Want to add this in @evanmarshall? This also removes the need for migration logic.

A router MESSAGE version bump would force a coordinated restart for all the clients in the ecosystem if they want to continue functioning, which sounds very difficult to coordinate. Other approaches would be:

  1. Use migration logic, which gives clients a larger window to update.
  2. Don't use migration logic and don't do a MESSAGE version bump. This increases the risk that old clients will temporarily relay transactions with incorrect fees (which is unfortunate but acceptable), or that they fork because of syncing with incorrect blocks (I have not analysed this in depth yet).

@d0cd
Collaborator

d0cd commented Oct 31, 2024

Do we have data (either in this PR or prior) that helps us understand:

  1. How does the performance of get vary as the size of the data type increases?
  2. For gets of data types of varying sizes, how does the performance vary as the number of entries in the mapping increases?

I believe the current benchmarks are on small-sized gets, which runs the risk of missing a DoS vector at larger sizes.

@iamalwaysuncomfortable
Contributor

@evanmarshall @d0cd @vicsn @raychu86 @zkxuerb

Here is a set of benchmarks of multiple types run roughly February of this year using this branch of SnarkVM.

According to these results, existing entries + data size had a slight effect on GET time, but by at most 10k nanoseconds. This should be re-run independently to verify that this is the case with the latest branch of SnarkVM.

opcode type bytes rt_nanosecs rt_secs #_entries
contains field 32 16684.0833 0.000016684083   100000
contains u64 8 19199.46656 0.000019199467   500000
contains array,field,length,2,depth,0 64 21344.0327 0.000021344033   1000000
contains array,field,length,8,depth,1 2048 20746.66533 0.000020746665   100000
contains array,field,length,8,depth,0 256 18655.01273 0.000018655013   1
contains group 64 19559.88021 0.000019559880   1000000
contains array,field,length,2,depth,1 128 19795.68546 0.000019795685   100000
contains array,field,length,2,depth,2 256 18790.78975 0.000018790790   1
contains array,field,length,4,depth,1 512 19318.90495 0.000019318905   50000
contains array,field,length,16,depth,1 8192 18548.67723 0.000018548677   100000
contains array,field,length,16,depth,0 512 21658.67046 0.000021658670   1000000
contains array,field,length,4,depth,2 2048 19744.94803 0.000019744948   100000
contains array,field,length,4,depth,0 128 19628.12796 0.000019628128   500000
contains array,field,length,8,depth,2 16384 19507.82929 0.000019507829   100000
contains array,field,length,16,depth,2 131072 17602.99657 0.000017602997   1
get array,field,length,2,depth,0 64 17632.54062 0.000017632541   1
get array,field,length,16,depth,1 8192 19408.99476 0.000019408995   10000
get field 32 17205.72254 0.000017205723   10000
get array,field,length,8,depth,2 16384 18760.94211 0.000018760942   1
get array,field,length,4,depth,0 128 18106.38733 0.000018106387   100000
get array,field,length,8,depth,1 2048 19963.12907 0.000019963129   50000
get array,field,length,2,depth,2 256 20535.13759 0.000020535138   500000
get array,field,length,16,depth,0 512 22155.52666 0.000022155527   1000000
get array,field,length,4,depth,1 512 23084.59778 0.000023084598   1000000
get array,field,length,4,depth,2 2048 21478.06992 0.000021478070   500000
get array,field,length,8,depth,0 256 18784.7789 0.000018784779   100000
get array,field,length,2,depth,1 128 18501.47539 0.000018501475   1
get group 64 18667.25578 0.000018667256   1
get u64 8 19378.57399 0.000019378574   500000
get array,field,length,16,depth,2 131072 18154.70924 0.000018154709   1
get.or.use array,field,length,4,depth,1 512 24943.32693 0.000024943327   50000
get.or.use array,field,length,2,depth,1 128 21589.8717 0.000021589872   1000000
get.or.use array,field,length,2,depth,0 64 18829.94789 0.000018829948   200000
get.or.use u64 8 18567.80415 0.000018567804   10000
get.or.use array,field,length,4,depth,2 2048 25969.41762 0.000025969418   1000000
get.or.use field 32 19384.96555 0.000019384966   500000
get.or.use array,field,length,4,depth,0 128 18831.95341 0.000018831953   200000
get.or.use group 64 20138.47141 0.000020138471   100000
get.or.use array,field,length,8,depth,0 256 19632.41775 0.000019632418   200000
get.or.use array,field,length,16,depth,0 512 19811.54192 0.000019811542   200000
get.or.use array,field,length,8,depth,1 2048 23602.38492 0.000023602385   1000
get.or.use array,field,length,2,depth,2 256 22148.29278 0.000022148293   200000
get.or.use array,field,length,8,depth,2 16384 55572.94982 0.000055572950   50000
get.or.use array,field,length,16,depth,1 8192 34934.46167 0.000034934462   200000
get.or.use array,field,length,16,depth,2 131072 179140.3925 0.000179140392   1

@evanmarshall
Contributor Author

I ran a subset of the benchmarks provided by @iamalwaysuncomfortable on an updated snarkVM branch: https://github.com/demox-labs/snarkVM/tree/benches/updated-mappings

These benchmarks were run on an Apple M2 Max. I noticed a pretty big difference (±50%) in time depending on what I had open/running on my computer. Do we know whether the original measurements were taken on a specific AWS EC2 instance, or otherwise what machine and conditions they were run under?

When I ran the benchmarks and nothing else, I got:

get_field_1_entries_rocksdb
                        time:   [3.2964 µs 3.3324 µs 3.3700 µs]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

Benching set_field_1_entries_rocksdb
set_field_1_entries_rocksdb
                        time:   [142.65 µs 145.04 µs 147.84 µs]
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe

Benching get_field_1000000_entries_rocksdb
get_field_1000000_entries_rocksdb
                        time:   [4.5343 µs 4.5907 µs 4.6217 µs]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) low mild

Benching set_field_1000000_entries_rocksdb
set_field_1000000_entries_rocksdb
                        time:   [146.09 µs 148.43 µs 150.67 µs]

Benching get_u64_1_entries_rocksdb
get_u64_1_entries_rocksdb
                        time:   [4.7081 µs 4.7722 µs 4.8537 µs]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

Benching set_u64_1_entries_rocksdb
set_u64_1_entries_rocksdb
                        time:   [128.64 µs 130.35 µs 134.57 µs]
Found 2 outliers among 10 measurements (20.00%)
  2 (20.00%) high severe

Benching get_u64_1000000_entries_rocksdb
get_u64_1000000_entries_rocksdb
                        time:   [5.6874 µs 5.7532 µs 5.8719 µs]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

Benching set_u64_1000000_entries_rocksdb
set_u64_1000000_entries_rocksdb
                        time:   [132.95 µs 136.37 µs 140.74 µs]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

Benching get_field_array_depth_2_length_8_1_entries_rocksdb
get_field_array_depth_2_length_8_1_entries_rocksdb
                        time:   [5.0197 µs 5.8679 µs 6.7498 µs]
Found 3 outliers among 10 measurements (30.00%)
  1 (10.00%) low mild
  2 (20.00%) high severe

Benching set_field_array_depth_2_length_8_1_entries_rocksdb
set_field_array_depth_2_length_8_1_entries_rocksdb
                        time:   [12.715 ms 12.787 ms 12.855 ms]

Some Observations:

  • How full the mapping is does matter, but not by a lot.
  • SET grows more expensive per byte faster than the 100 vs. 10 microcredit per-byte costs currently in the code imply.
  • If we use lto=true in the snarkVM release profile, we can get significant performance improvements.

@d0cd
Collaborator

d0cd commented Nov 6, 2024

Looking at the benchmarks for get, it seems like a random index is used.
Wouldn't this result in a DB access that is likely to miss, meaning that we're not measuring the full cost of an access?
Furthermore, it looks like we are only benching large value types; we should also bench large key types.

@evanmarshall
Copy link
Contributor Author

The benchmarks I created in the original comment ensured that each get hit a value that was previously set.

I'll add some benchmarks with different key sizes though. I think that's a great idea to test.

@iamalwaysuncomfortable
Contributor

iamalwaysuncomfortable commented Nov 8, 2024

@evanmarshall @raychu86 @d0cd @vicsn

Here are benchmarks of values with large keys. Analysis to follow.

Mappings Benched

mapping m_field_array_depth_2_length_4:
    key as [[[field; 4u32]; 4u32]; 4u32].public;
    value as [[[field; 4u32]; 4u32]; 4u32].public;

mapping m_field_array_depth_2_length_8:
    key as [[[field; 8u32]; 8u32]; 8u32].public;
    value as [[[field; 8u32]; 8u32]; 8u32].public;

Setup Notes

  1. RocksDB mappings were benched
  2. All lookups used keys randomly selected from the set of keys that were inserted into the RocksDB mappings.
  3. Inserting 1_000_000 entries into m_field_array_depth_2_length_8 serially took a full 13 hours

Benchmark Code

benches/enhanced-mappings

Results

Mappings with Large Keys & Values

Mapping Key Mapping Value Opcode Number of RocksDB Entries Avg µs OS Chip RAM
[[[field; 4u32]; 4u32]; 4u32] [[[field; 4u32]; 4u32]; 4u32] contains 1 39.714 OSX Sonoma 14.5 M1 MAX 64 GB
[[[field; 4u32]; 4u32]; 4u32] [[[field; 4u32]; 4u32]; 4u32] get 1 50.166 OSX Sonoma 14.5 M1 MAX 64 GB
[[[field; 4u32]; 4u32]; 4u32] [[[field; 4u32]; 4u32]; 4u32] get.or.use 1 59.001 OSX Sonoma 14.5 M1 MAX 64 GB
[[[field; 4u32]; 4u32]; 4u32] [[[field; 4u32]; 4u32]; 4u32] contains 1_000_000 95.158 OSX Sonoma 14.5 M1 MAX 64 GB
[[[field; 4u32]; 4u32]; 4u32] [[[field; 4u32]; 4u32]; 4u32] get 1_000_000 112.49 OSX Sonoma 14.5 M1 MAX 64 GB
[[[field; 4u32]; 4u32]; 4u32] [[[field; 4u32]; 4u32]; 4u32] get.or.use 1_000_000 129.89 OSX Sonoma 14.5 M1 MAX 64 GB
[[[field; 4u32]; 8u32]; 8u32] [[[field; 8u32]; 8u32]; 4u32] contains 1 220.94 OSX Sonoma 14.5 M1 MAX 64 GB
[[[field; 8u32]; 8u32]; 8u32] [[[field; 8u32]; 8u32]; 4u32] get 1 269.33 OSX Sonoma 14.5 M1 MAX 64 GB
[[[field; 4u32]; 8u32]; 8u32] [[[field; 8u32]; 8u32]; 4u32] get.or.use 1 318.92 OSX Sonoma 14.5 M1 MAX 64 GB
[[[field; 4u32]; 8u32]; 8u32] [[[field; 8u32]; 8u32]; 4u32] contains 1_000_000 474.99 OSX Sonoma 14.5 M1 MAX 64 GB
[[[field; 8u32]; 8u32]; 8u32] [[[field; 8u32]; 8u32]; 4u32] get 1_000_000 645.68 OSX Sonoma 14.5 M1 MAX 64 GB
[[[field; 4u32]; 8u32]; 8u32] [[[field; 8u32]; 8u32]; 4u32] get.or.use 1_000_000 673.76 OSX Sonoma 14.5 M1 MAX 64 GB

Atomic Aleo Instructions (field, u64)

Mapping Key Mapping Value Opcode Number of RocksDB Entries Avg µs OS Chip RAM
field field contains 1 6.6172 µs OSX Sonoma 14.5 M1 MAX 64 GB
field field contains 1_000_000 18.352 µs OSX Sonoma 14.5 M1 MAX 64 GB
field field get 1 6.7656 µs OSX Sonoma 14.5 M1 MAX 64 GB
field field get 1_000_000 20.799 µs OSX Sonoma 14.5 M1 MAX 64 GB
field field get.or_use 1 7.6780 µs OSX Sonoma 14.5 M1 MAX 64 GB
field field get.or_use 1_000_000 19.735 µs OSX Sonoma 14.5 M1 MAX 64 GB
u64 u64 contains 1 8.8390 µs OSX Sonoma 14.5 M1 MAX 64 GB
u64 u64 contains 1_000_000 16.045 µs OSX Sonoma 14.5 M1 MAX 64 GB
u64 u64 get 1 9.4391 µs OSX Sonoma 14.5 M1 MAX 64 GB
u64 u64 get 1_000_000 14.766 µs OSX Sonoma 14.5 M1 MAX 64 GB
u64 u64 get.or_use 1 13.608 µs OSX Sonoma 14.5 M1 MAX 64 GB
u64 u64 get.or_use 1_000_000 16.849 µs OSX Sonoma 14.5 M1 MAX 64 GB

Raw Results

Large Key & Value Types

[[[field; 4u32]; 4u32]; 4u32] - 1 entry

contains_field_array_depth_2_length_4_1_entries_rocksdb
                        time:   [33.617 µs 39.714 µs 48.348 µs]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe

Benching get_field_array_depth_2_length_4_1_entries_rocksdb
get_field_array_depth_2_length_4_1_entries_rocksdb
                        time:   [49.137 µs 50.166 µs 51.037 µs]

Benching get.or_use_field_array_depth_2_length_4_1_entries_rocksdb
get.or_use_field_array_depth_2_length_4_1_entries_rocksdb
                        time:   [55.262 µs 59.001 µs 63.156 µs]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

[[[field; 4u32]; 4u32]; 4u32] - 1_000_000 entries

Benching contains_field_array_depth_2_length_4_1000000_entries_rocksdb
contains_field_array_depth_2_length_4_1000000_entries_rocksdb
                        time:   [89.188 µs 95.158 µs 101.90 µs]

Benching get_field_array_depth_2_length_4_1000000_entries_rocksdb
get_field_array_depth_2_length_4_1000000_entries_rocksdb
                        time:   [105.67 µs 112.49 µs 124.89 µs]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe

Benching get.or_use_field_array_depth_2_length_4_1000000_entries_rocksdb
get.or_use_field_array_depth_2_length_4_1000000_entries_rocksdb
                        time:   [117.60 µs 129.89 µs 141.77 µs]

[[[field; 8u32]; 8u32]; 8u32] - 1 entry

contains_field_array_depth_2_length_8_1_entries_rocksdb
                        time:   [205.45 µs 220.94 µs 236.34 µs]
                        change: [-13.076% -5.8931% +1.6257%] (p = 0.17 > 0.05)
                        No change in performance detected.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) low severe
  1 (10.00%) high severe

Benching get_field_array_depth_2_length_8_1_entries_rocksdb
get_field_array_depth_2_length_8_1_entries_rocksdb
                        time:   [258.91 µs 269.33 µs 286.08 µs]
                        change: [-26.703% -23.411% -19.214%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

Benching get.or_use_field_array_depth_2_length_8_1_entries_rocksdb
get.or_use_field_array_depth_2_length_8_1_entries_rocksdb
                        time:   [292.21 µs 318.92 µs 355.19 µs]
                        change: [+31.223% +44.052% +58.086%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe

[[[field; 8u32]; 8u32]; 8u32] - 1_000_000 entries

contains_field_array_depth_2_length_8_1000000_entries_rocksdb
                        time:   [411.35 µs 474.99 µs 570.07 µs]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

Benching get_field_array_depth_2_length_8_1000000_entries_rocksdb
get_field_array_depth_2_length_8_1000000_entries_rocksdb
                        time:   [548.96 µs 645.68 µs 712.91 µs]

Benching get.or_use_field_array_depth_2_length_8_1000000_entries_rocksdb
get.or_use_field_array_depth_2_length_8_1000000_entries_rocksdb
                        time:   [622.53 µs 673.76 µs 721.37 µs]

Atomic Key & Value Types

field 1 - entry

contains_field_1_entries_rocksdb
                        time:   [5.9539 µs 6.6172 µs 7.3990 µs]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

Benching get_field_1_entries_rocksdb
get_field_1_entries_rocksdb
                        time:   [6.4490 µs 6.7656 µs 7.0782 µs]

Benching get.or_use_field_1_entries_rocksdb
get.or_use_field_1_entries_rocksdb
                        time:   [6.8831 µs 7.6780 µs 9.3108 µs]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe

field - 1_000_000 entries

contains_field_1000000_entries_rocksdb
                        time:   [15.130 µs 18.352 µs 21.536 µs]

Benching get_field_1000000_entries_rocksdb
get_field_1000000_entries_rocksdb
                        time:   [16.891 µs 20.799 µs 25.687 µs]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

Benching get.or_use_field_1000000_entries_rocksdb
get.or_use_field_1000000_entries_rocksdb
                        time:   [15.632 µs 19.735 µs 25.047 µs]

u64 - 1 entry

Mapping Name: m_u64, Mapping Type: u64, Mapping Length: 0
Inserting key #0 into m_u64...
Benching contains_u64_1_entries_rocksdb
contains_u64_1_entries_rocksdb
                        time:   [7.9271 µs 8.8390 µs 9.7301 µs]

Benching get_u64_1_entries_rocksdb
get_u64_1_entries_rocksdb
                        time:   [7.8072 µs 9.4391 µs 11.672 µs]

Benching get.or_use_u64_1_entries_rocksdb
get.or_use_u64_1_entries_rocksdb
                        time:   [10.476 µs 13.608 µs 16.395 µs]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

u64 - 1_000_000 entries

contains_u64_1000000_entries_rocksdb
                        time:   [14.524 µs 16.045 µs 17.262 µs]

Benching get_u64_1000000_entries_rocksdb
get_u64_1000000_entries_rocksdb
                        time:   [12.859 µs 14.766 µs 17.214 µs]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

Benching get.or_use_u64_1000000_entries_rocksdb
get.or_use_u64_1000000_entries_rocksdb
                        time:   [13.454 µs 16.849 µs 23.615 µs]
Found 2 outliers among 10 measurements (20.00%)
  1 (10.00%) high mild
  1 (10.00%) high severe

@iamalwaysuncomfortable
Contributor

As requested by @d0cd, benchmarks with large keys were performed for get, get.or_use, and contains.

Analysis of the data

The worst case seen was for this mapping (keys and values of 2048 bytes), where access times were in the range of 645.68 µs, or 0.64568 ms.

mapping m_field_array_depth_2_length_8:
    key as [[[field; 8u32]; 8u32]; 8u32].public;
    value as [[[field; 8u32]; 8u32]; 8u32].public;

Pricing at the standard of 100 credits per 1 second of runtime, this would lead to a price of 64_568 microcredits for this call, if the above benchmarks are to be believed.

Initial Reasoning about the Results

Doing a napkin correlation with the results posted above, it seems that large key sizes ARE a significant contributor to GET speeds, and contribute more to access speed than large values do. Reasoning about this, knowing that LSM-trees are the data structure behind RocksDB storage, it's likely that large keys create serious write and read amplification compared to much smaller key sizes.

Impacts

So what does this mean for pricing? We already price per byte of the key, so key size is accounted for in the pricing: contains, get, and get.or_use are priced with the formula MAPPING_BASE_COST + KEY_BYTES * MAPPING_PER_BYTE_COST.

Thus, @evanmarshall @fulltimemike, several explicit expectation tests with large key sizes should be added to ENSURE this is being accounted for in the pricing.
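
A sketch of what such an expectation test could look like (the helper and constants below are hypothetical stand-ins; it assumes GET's per-byte cost of 10 microcredits and a post-migration base of 1_500):

// Hypothetical expectation test: the per-byte term should dominate for large keys.
const MAPPING_BASE_COST: u64 = 1_500;  // microcredits (assumed post-migration value)
const MAPPING_PER_BYTE_COST: u64 = 10; // microcredits per byte of key

fn get_cost(key_bytes: u64) -> u64 {
    MAPPING_BASE_COST + key_bytes * MAPPING_PER_BYTE_COST
}

#[test]
fn large_keys_are_charged_per_byte() {
    // A 2_048-byte key (the worst case discussed above) should pay for its size.
    assert_eq!(get_cost(2_048), 21_980);
    // A small 8-byte key (u64) pays close to the base cost.
    assert_eq!(get_cost(8), 1_580);
}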

However, the question here is the BASE cost, and it seems that for smaller keys the baseline runtime is ~0.000017 seconds. Multiplying by 100 credits per second of runtime and converting to microcredits (i.e. 0.000017 s * 100 credits/s * 1_000_000 microcredits/credit) gives 1_700. Thus it seems to me that the base price might reasonably be anywhere from 1_000 to 2_000 microcredits.

The other major question is what will happen once we lower the price. It's likely that GET usage will go up, and so our worry won't be how long each individual call takes, it'll be the fact that there are many, many more GETs on the network. To account for this, perhaps the compromise is indeed a price in the range of 1_000 to 2_000.

Ultimately this is a tradeoff analysis between inviting people to use the chain for interesting applications and safety, and so I'd invite everyone else involved in this discussion to comment on this tradeoff.

@d0cd
Collaborator

d0cd commented Nov 8, 2024

@iamalwaysuncomfortable thanks for adding the additional data points!
I would agree that the base cost should be around 1_000 to 2_000.
The reason is that gets and contains are roughly 2-3x as slow when the mapping is "full" (roughly 1M entries). This assumes we believe that 1M entries is a reasonable representation of a production-sized mapping.

@raychu86
Copy link
Contributor

raychu86 commented Nov 8, 2024

Great analysis @iamalwaysuncomfortable!

In general, I would have liked a larger effort for repricing overall costs via an ARC rather than one-off PRs; especially since it's clear that accurate pricing is more of a moving target based on current network assumptions.

If there is a strong push to land this PR in the next release, I would advocate for a more conservative change as suggested by @iamalwaysuncomfortable and @d0cd. Pricing down in the future is much easier to do than pricing up.

@evanmarshall
Contributor Author

We can do a more conservative approach to reducing the fee.

I do think we should agree on a standard machine for running benchmarks. I'd suggest a c7a.8xlarge from AWS (32 core, 64GB memory) as it slightly undershoots the snarkOS client recommendations.

Running the benchmarks on this machine returns much lower numbers:

Mapping Name: m_field, Mapping Type: field, Mapping Length: 0
Inserting key #0 into m_field...
Benching contains_field_1_entries_rocksdb
contains_field_1_entries_rocksdb
                        time:   [3.4270 µs 3.4316 µs 3.4360 µs]

Benching get_field_1_entries_rocksdb
get_field_1_entries_rocksdb
                        time:   [3.6822 µs 3.6882 µs 3.6956 µs]

Benching get.or_use_field_1_entries_rocksdb
get.or_use_field_1_entries_rocksdb
                        time:   [3.8189 µs 3.8258 µs 3.8350 µs]
Benching contains_field_1000000_entries_rocksdb
contains_field_1000000_entries_rocksdb
                        time:   [13.748 µs 13.763 µs 13.787 µs]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high severe

Benching get_field_1000000_entries_rocksdb
get_field_1000000_entries_rocksdb
                        time:   [14.028 µs 14.058 µs 14.087 µs]

Benching get.or_use_field_1000000_entries_rocksdb
get.or_use_field_1000000_entries_rocksdb
                        time:   [14.312 µs 14.329 µs 14.343 µs]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

Mapping Name: m_u64, Mapping Type: u64, Mapping Length: 0
Benching contains_u64_1_entries_rocksdb
contains_u64_1_entries_rocksdb
                        time:   [4.5086 µs 4.5123 µs 4.5142 µs]

Benching get_u64_1_entries_rocksdb
get_u64_1_entries_rocksdb
                        time:   [4.7126 µs 4.7198 µs 4.7296 µs]

Benching get.or_use_u64_1_entries_rocksdb
get.or_use_u64_1_entries_rocksdb
                        time:   [4.8637 µs 4.8681 µs 4.8720 µs]
contains_u64_1000000_entries_rocksdb
                        time:   [11.210 µs 11.219 µs 11.234 µs]
Found 1 outliers among 10 measurements (10.00%)
  1 (10.00%) high mild

Benching get_u64_1000000_entries_rocksdb
get_u64_1000000_entries_rocksdb
                        time:   [11.369 µs 11.387 µs 11.413 µs]

Benching get.or_use_u64_1000000_entries_rocksdb
get.or_use_u64_1000000_entries_rocksdb
                        time:   [11.831 µs 11.842 µs 11.859 µs]

@vicsn
Collaborator

Awesome, thank you for adding the tests!

Some CI is failing, e.g. due to clippy.
Others are failing as expected due to CI resource issues; if you think some of those may be impacted, you can run the tests locally.

(Inline review threads on synthesizer/src/vm/execute.rs and synthesizer/process/src/cost.rs, now resolved.)
@raychu86
Contributor

Have any devnets been run with the --test feature + tx generation?

We need to simulate the migration for fee requirements such that (see the sketch after this list):

  1. The transactions with the original fee amounts are valid before and after the V2 consensus block height.
  2. The transactions with the reduced fee amounts are valid only after the V2 consensus block height.
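
A rough sketch of that property in test form, using a hypothetical height-gated cost function like the one sketched earlier in the thread (not actual devnet tooling):

// Hypothetical: the same height-gated base cost sketched earlier in the thread.
fn mapping_base_cost(block_height: u32, consensus_v2_height: u32) -> u64 {
    if block_height >= consensus_v2_height { 1_500 } else { 10_000 }
}

#[test]
fn fee_requirements_across_consensus_v2_height() {
    let v2_height = 10u32;
    let original_fee = 10_000u64; // fee paid under the old pricing
    let reduced_fee = 1_500u64;   // fee paid under the new pricing

    // 1. Transactions paying the original fee are valid before and after the V2 height.
    assert!(original_fee >= mapping_base_cost(v2_height - 1, v2_height));
    assert!(original_fee >= mapping_base_cost(v2_height + 1, v2_height));

    // 2. Transactions paying only the reduced fee are valid only after the V2 height.
    assert!(reduced_fee < mapping_base_cost(v2_height - 1, v2_height));
    assert!(reduced_fee >= mapping_base_cost(v2_height + 1, v2_height));
}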

@iamalwaysuncomfortable
Contributor

In general, I would have liked a larger effort for repricing overall costs via an ARC rather than one-off PRs; especially since it's clear that accurate pricing is more of a moving target based on current network assumptions.

I echo this sentiment. I think another ARC needs to be introduced for a standard migration path as well. My fear about versioned constants is that they're ultimately a bit of a hardcoded migration path and therefore technical debt. However, we don't have a better answer right now, so this is what we proceed with.

I believe the current one-off price migration is worth including in testnet and canary testing since it's a core opcode.

We should start preparing some work on a longer-term migration path (and potentially work on true Ratifications?) that standardizes where nodes and other consumers of SnarkVM check for migration information and what constant sets should be provided.

Something like a check_migration_paths function that checks an enum of potential change areas and change timing (via block height), and handles the response to those.

Secondly, I do think the next immediate priority AFTER this migration is to start examining price structures generally. We do want to make the chain accessible for builders to build applications with, but significant analysis of what it means for pricing to be accessible, and what pricing model would encourage this while discouraging chain abuse, is key.

@zosorock added the enhancement (New feature or request) and v1.1.4 (Canary release v1.1.4) labels and removed the v1.1.4 (Canary release v1.1.4) label Nov 12, 2024
@raychu86
Contributor

raychu86 commented Nov 12, 2024

Have any devnets been run with the --test feature + tx generation?

I ran a quick Devnet with the following instructions:

  1. Use this PR's imports in snarkOS git = "https://github.com/demox-labs/snarkVM.git" rev = "a5847df"
  2. Add "test" to snarkVM features.
  3. Run cargo install --path .
  4. Run devnet with ./devnet.sh on testnet
  5. Generate and send transactions.
  6. Observe sent transactions and the validator generated transactions to monitor fees.

Transactions generated after block 10 did seem to transition to the lower fees correctly. However, this should be done again (and confirmed in the PR comments) after the requested changes have been made.

@evanmarshall
Contributor Author

evanmarshall commented Nov 13, 2024

I've addressed all PR feedback. I'm running a devnet now and I'll update this comment when everything looks good.

Edit:

I've tested on a devnet. I ran the tests as suggested by @raychu86. Additionally, I deployed a program with nested calls (the nested_call.aleo program from the execute.rs tests) and saw the fees lower as expected after the consensus V2 height.

@zkxuerb
Contributor

LGTM

@vicsn
Collaborator

Sorry for the late substantial new review; I was sick the past few days.

The new changes look good to me. Sharing some of my learnings for other reviewers:

  • Just for context, this recently merged PR stores deployments to disk. Upon loading a deployment (and after a restart), it constructs fresh new Stacks with the updated finalize fee. credits.aleo is kept in memory at all times.
  • Validators will have consistent views on fees:
    • validator on V1 height without this PR: believes the old fees are correct.
    • validator on V1 height with this PR: believes the old fees are correct by using execution_cost_v1 and skipping the cached value in Stack.
    • honest validators on V2 height are assumed to have this PR and believe the new fees are correct by using execution_cost_v2 and the updated value in Stack.
  • For future migrations we should also consider whether to evict the partially_verified_transaction cache. For this PR it's not relevant, because we don't cache fee verification and, moreover, this PR is a strict loosening of the validation criteria.
  • I will prioritize improving the unit testing framework by Q1 2025 at the latest. It is currently very slow because it regenerates so much duplicate data, and it's also hard to see at which "level" of abstraction adjustments have to be tested exactly.

@zosorock merged commit 79402b5 into ProvableHQ:staging Nov 13, 2024
84 checks passed
@zosorock changed the title from "Reduce MAPPING_BASE_COST from 10_000 to 500" to "Reduce MAPPING_BASE_COST from 10_000 to 1_500" on Nov 13, 2024