Easy and understandable fees estimation #4633

Closed · matklad opened this issue Aug 4, 2021 · 11 comments
matklad (Contributor) commented Aug 4, 2021

Zulip topic: https://near.zulipchat.com/#narrow/stream/295306-dev-contract-runtime/topic/Design.20for.20parameter.20estimator

Problem: any task that requires adjustment, addition, or investigation of fees today requires a lot of engineering effort. One does not simply "run params estimator".

For the past couple of weeks, I've looked at how the current estimator works, and I identified the following set of issues:

  1. code technical debt: our approach to estimation drastically changed over time (from running many integrated real-time measurements to using isolated qemu-based tests), and the resulting code feels more convoluted than it needs to be. It takes time to understand it.
  2. configuration technical debt: the estimator has a lot of tweakable parameters, and it's unclear what the "standard parameters" are. The most significant issue is Bring current protocol costs in-line with current parameter estimator implementation #4474: we count IO costs by default, but the current configuration doesn't include IO costs
  3. slow iteration speed -- running the parameter estimator takes a lot of time
  4. bulky usage -- running the parameter estimator means entering a sequence of commands using docker and qemu. As a result, you get a ton of output on stdout, most of which is noise
  5. it's hard to understand how each specific fee is computed -- all fees are computed via a single linear sequence of operations. If you are interested in a particular fee, it's hard to understand what does and does not contribute to it.

That said, the overall approach to estimation feels broadly correct -- to estimate a fee, we construct a transaction which exercises the worst case for this fee, and observe the time it takes the whole runtime to process the operation.

Proposed plan of action (somewhat incomplete at this point and to be adjusted as we go):

  • incrementally refactor the existing estimator
  • make it possible to pass only a subset of fees for estimation (Let param-estimator estimate subset of RuntimeConfig fees #4501)
  • turn implicit ordering dependencies between the fees into explicit setup dependencies
  • encode all tribal knowledge about running the parameter estimator into the tool itself, such that cargo run -p params-estimator does the docker and qemu stuff automatically.
  • make it possible to debug fee issues without docker and qemu (Consider adding a way to account for IO-costs without qemu #4566, adding CPU time as a metric, maaaybe going all the way to https://hackmd.io/sH315lO2RuicY-SEt7ynGA?view).
  • make the code nice to read. Clearly separate it into:
    • a small driver module to run the measurements
    • a medium-sized stateless setup module which establishes the context for measurements
    • a large but very shallow set of measurements, such that each measurement can be understood in isolation. Ideally, we'd have a function per measurement.
matklad added the T-contract-runtime (Team: issues relevant to the contract runtime team) and A-params-estimator (Area: runtime params estimator) labels Aug 4, 2021
matklad self-assigned this Aug 4, 2021
MaksymZavershynskyi (Contributor) commented Aug 4, 2021

Thank you for taking a high-level look at it!

I would add another motivation on top of it:

  • It is easy to make a non-obvious mistake. Here are some examples:
    • We need to run the param estimator with 10M accounts for estimating fees in transaction_costs (so that we use the worst-case scenario of the deepest practically possible trie, since these fees are fixed and do not depend on the size of the trie), but with 20K accounts for wasm_config (so that this math holds);
    • If we disable certain things like RocksDB compaction in the param estimator, it will make fees look smaller, but compaction might actually be a computationally heavy process that we want to charge for;
    • Adding an in-memory cache here and there in our codebase might make the param estimator compute some fees much lower than what they actually need to be.

matklad (Contributor, Author) commented Aug 20, 2021

Ok, I think I have a proposed implementation strategy for the estimator.

We have a CostTable which lists every possible cost. For each cost, we have a Rust function that computes the cost. Each function is stateless, and we don't try to use a single unified framework for all costs. If a particular cost needs to be computed in an odd way, so be it.

That said, there are groups of costs that are computed in a similar way. For them, we'll just have helper functions to be called from the cost-computing functions.

Similarly, if several cost functions need similar setups, the setup is extracted into a helper and is called from each cost function:

// Do
fn cost1() {
   common_setup()
   compute_cost1()
}

fn cost2() {
   common_setup()
   compute_cost2()
}

// Don't
fn costs() {
   common_setup()
   compute_cost1()
   compute_cost2()
}

This is a bit more verbose, but it makes dependencies explicit and allows us to reason about costs in isolation.

If we find that re-running some common setup is too slow, we'll add an explicit cache for this bit of state.

As a special case, if cost2 and cost3 both depend on cost1, then we'll add explicit caching code to cost1 computation.

As to the way we actually compute costs, the current approach of manufacturing the block, computing "average" time, and maybe subtracting some base cost seems broadly reasonable to me for most costs. I can also see that for some costs we might want to use a more fine-grained approach and measure something specific, rather than the whole block. The suggested setup should allow us to transparently support both types of costs.

I don't see a bulletproof solution to the "performance depends on the exact setup" problem. I think the proposal makes it easier by making setup explicit and per-cost, such that we at least see what we think is important for each particular cost. What would also help is reducing the configuration space of the param estimator to just two flags, --dev and --release, which run the estimator in fast mode (for ad-hoc investigations) and production mode (for actual estimation).
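As a purely illustrative sketch of that two-mode idea (the type and field names are made up, not the estimator's actual config; the account counts echo the 10M/20K numbers from the comment above):

#[derive(Clone, Copy)]
enum EstimatorMode {
    /// Fast, approximate runs for ad-hoc investigation (no docker/qemu).
    Dev,
    /// Full instruction-counting runs used to produce real parameters.
    Release,
}

#[derive(Clone)]
struct Config {
    mode: EstimatorMode,
}

impl Config {
    /// Everything else is derived from the mode instead of being a separate knob.
    fn accounts_to_create(&self) -> usize {
        match self.mode {
            EstimatorMode::Dev => 20_000,
            EstimatorMode::Release => 10_000_000,
        }
    }

    fn use_qemu_counters(&self) -> bool {
        matches!(self.mode, EstimatorMode::Release)
    }
}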

Here's roughly how I expect the interesting parts to look:

struct GasCost(u32);

static ALL_COSTS: &[(Cost, fn(&mut Ctx) -> GasCost)] = &[
    (Cost::ActionReceiptCreation, action_receipt_creation),
    (Cost::ActionCreateAccount, action_create_account),
];

pub fn run(config: Config) -> CostTable {
    let mut ctx = Ctx::new(config);
    let mut res = CostTable::default();

    for (cost, f) in ALL_COSTS {
        let value = f(&mut ctx);
        res.add(cost, value);
    }

    res
}

fn action_receipt_creation(ctx: &mut Ctx) -> GasCost {
    // This cost is used as a base cost by other action costs, so it is
    // cached explicitly; the cache is just an extra first-order `if`.
    if let Some(gas_cost) = ctx.cached.action_receipt_creation {
        return gas_cost;
    }

    // A transaction with an empty action list isolates the receipt-creation
    // overhead itself (transaction construction is elided in this sketch).
    let res = ctx.measure_transactions(|ctx| from_actions(vec![]));

    ctx.cached.action_receipt_creation = Some(res);

    res
}

fn action_create_account(ctx: &mut Ctx) -> GasCost {
    let total_cost = ctx.measure_transactions(|ctx| {
        let account_id = ctx.random_existing_account();
        let other_account_id = ctx.random_non_existing_account();
        SignedTransaction::from_actions(
            ctx.nonce(&account_id),
            account_id,
            other_account_id,
            &InMemorySigner::from_seed(account_id.clone(), KeyType::ED25519, account_id.as_ref()),
            vec![
                Action::CreateAccount(CreateAccountAction {}),
                Action::Transfer(TransferAction { deposit: 10u128.pow(26) }),
            ],
            CryptoHash::default(),
        )
    });

    // Subtract the receipt-creation overhead so that only the work specific
    // to creating an account is attributed to this cost.
    let base_cost = action_receipt_creation(ctx);

    total_cost - base_cost
}

olonho (Contributor) commented Aug 24, 2021

This approach sounds very desirable, thanks Alexey!

matklad added a commit to matklad/nearcore that referenced this issue Aug 25, 2021
You can now pass something like

  --metrics-to-measure Receipt,data_receipt_10b_1000,data_receipt_base_10b_1000

to calculate only those metrics. Note that metrics != costs. The end
goal is to do this for actual costs, as outlined in

  near#4633 (comment)

The current approach is an intermediate step in that direction. In
particular, it allows debugging estimator refactors much faster.

Test Plan
---------

Manually run estimator to smoke test that this works, use a tool
([SSR](https://rust-analyzer.github.io/manual.html#structural-search-and-replace))
to change extraction logic without introducing typos.
matklad (Contributor, Author) commented Aug 25, 2021

#4733 is a first cut at it, very incomplete atm

MaksymZavershynskyi (Contributor) commented

> As a special case, if cost2 and cost3 both depend on cost1, then we'll add explicit caching code to cost1 computation.

We might want to avoid any kind of caching because it creates complex logic flows and makes diagnostics harder. If someone runs a fee estimation for one operation, then caching will not save time; if someone runs fee estimation for multiple operations, then they can run them in parallel, and the execution time will be defined by whichever operation creates the cache, which again does not shorten the total time much (unless we argue that operations take very different times to estimate and the shortest one creates the cache). It only saves CPU cycles, which we can be wasteful about, since we run the param estimator infrequently.

> for (cost, f) in ALL_COSTS {

This part can be behind rayon.
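For illustration, a minimal sketch of what that might look like, reusing the names from the sketch above and assuming each cost function can be given its own Ctx, that Config can be cloned and shared across threads, and that Cost is Copy:

// Hypothetical sketch only; today the process-wide qemu counters make
// per-thread measurement non-trivial (see the reply below).
use rayon::prelude::*;

pub fn run(config: Config) -> CostTable {
    // Measure every cost in parallel, giving each worker its own Ctx.
    let measured: Vec<(Cost, GasCost)> = ALL_COSTS
        .par_iter()
        .map(|&(cost, f)| {
            let mut ctx = Ctx::new(config.clone());
            (cost, f(&mut ctx))
        })
        .collect();

    // Assemble the results into a single table on the main thread.
    let mut res = CostTable::default();
    for (cost, value) in measured {
        res.add(cost, value);
    }
    res
}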

matklad (Contributor, Author) commented Aug 26, 2021

Oh wow, the idea of adding parallelism didn't enter my mind at all, thanks @nearmax for mentioning it! Yeah, I do think that, long term, we'll be able to leverage parallelism here; cost computation is embarrassingly parallel. I don't think it would be as easy as rayon though -- we do process-wide counting in qemu, and scoping that to a single thread might be fiddly. So it seems we'd rather go for a multiprocess setup, where we spawn several qemu-in-docker processes, each computing a subset of costs. Both per-process qemu and multiprocessing seem like some amount of work, so for the time being I'm concentrating on the single-threaded approach.
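To make the multiprocess idea concrete, here is a rough, hypothetical sketch: the --costs flag and the whole partitioning scheme are made up for illustration (only a --metrics-to-measure flag is mentioned in the commits above), and each child process would still need its own qemu/docker pipeline:

use std::process::{Child, Command};

// Split the cost names into chunks and spawn one estimator process per chunk.
fn run_in_parallel(all_costs: &[&str], num_workers: usize) -> std::io::Result<()> {
    assert!(num_workers > 0 && !all_costs.is_empty());
    let chunk_size = (all_costs.len() + num_workers - 1) / num_workers;

    let mut children: Vec<Child> = Vec::new();
    for chunk in all_costs.chunks(chunk_size) {
        let costs_arg = chunk.join(",");
        let child = Command::new("cargo")
            .args(["run", "-p", "params-estimator", "--", "--costs"])
            .arg(&costs_arg)
            .spawn()?;
        children.push(child);
    }

    // Wait for every worker; each one would write its subset of the cost table.
    for mut child in children {
        child.wait()?;
    }
    Ok(())
}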

> We might want to avoid any kind of caching because it creates complex logic flows and makes diagnostics harder

I think I see where you are coming from, but I do think the caching makes sense here. The problem is that caching can be hard if you implement it in a generic way, with higher-order functions and macros. In my WIP PR I instead implemented the cache in the dumbest way possible, as an extra first-order `if`. I think this shouldn't create any problems with understanding and debugging.

I think caching helps in two ways:

  • certain costs are used as base costs a lot, and it would be simpler if the base cost stayed fixed for the set of costs we are computing.
  • computing less stuff helps with understanding as well. E.g., we currently print a diagnostic message every time we create a testbed, and caching reduces the amount of messages printed. Similarly, you'll have less stuff to skip over during debugging, etc.

Either way, the caching is implemented in a contained way (there's no caching framework or higher-order function; each cacheable cost has 2 extra statements for caching), so it should be trivial to add it or back it out.

near-bulldozer bot pushed a commit that referenced this issue Sep 2, 2021
matklad (Contributor, Author) commented Sep 22, 2021

Some in-progress findings:

  • ReadMemoryBase seems to be safely multiplied twice
  • Some of our costs are suspiciously equal (Utf16DecodingBase, LogBase)
  • alt_bn128_pairing_check_10_1k and the like are misnamed: it's not 10, it's 1924.

stale bot commented Dec 21, 2021

This issue has been automatically marked as stale because it has not had recent activity in the last 2 months.
It will be closed in 7 days if no further activity occurs.
Thank you for your contributions.

stale bot added the S-stale label Dec 21, 2021
bowenwang1996 (Collaborator) commented

@matklad I believe @jakmeier is working on this. Is that correct?

stale bot removed the S-stale label Dec 27, 2021
matklad (Contributor, Author) commented Jan 4, 2022

Yeah, @jakmeier is working on this general area, though I'd say this particular issue is fixed, and we should open more specific sub-issues for what's left to be done ...

  • residual refactors
  • look at each and every cost one-by-one and make sure it makes sense
  • update QEMU image
  • fix touching-trie-node costs

... with the terminal goal of ensuring that our fees are sound (non-exploitable) and allow for a reliable implementation of the runtime.

@jakmeier I think the two of us should just go and create a bunch of follow-up issues here. I might not get to this today myself, so feel free to go ahead (see the A-params-estimator label).

matklad closed this as completed Jan 4, 2022
jakmeier (Contributor) commented Jan 4, 2022

I've written down all that I could think of as follow-ups around the topic of estimator confidence:

This covers at least some of what you listed, @matklad. But feel free to create more where you see fit!

I think the most pressing are probably the last two: understanding storage better and getting a rough overview of current undercharging. So I guess that's what I will work on for now.
