feat(dynamic config): Refactor and make several config fields reloadable at runtime #8240

nikurt · 2022-12-19T09:35:54Z

Introduce mutable fields in ClientConfig.
Introduce the infrastructure to reload config.json and notify Client that certain fields were updated.
Refactoring to inline dyn_config.json into config.json.
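
As a rough illustration of what a "mutable field" can look like, the sketch below keeps a value behind a shared lock so that a reloader can overwrite it while other holders of the config keep reading it. This is not the PR's code: `MutableField`, `DemoClientConfig`, and the field names are hypothetical and chosen only for the example.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical wrapper: a config value that can be replaced at runtime.
#[derive(Clone, Debug)]
pub struct MutableField<T> {
    value: Arc<Mutex<T>>,
}

impl<T: Clone + PartialEq> MutableField<T> {
    pub fn new(initial: T) -> Self {
        Self { value: Arc::new(Mutex::new(initial)) }
    }

    /// Returns a copy of the current value.
    pub fn get(&self) -> T {
        self.value.lock().unwrap().clone()
    }

    /// Stores `new_value`; returns true only if the value actually changed.
    pub fn update(&self, new_value: T) -> bool {
        let mut guard = self.value.lock().unwrap();
        if *guard == new_value {
            false
        } else {
            *guard = new_value;
            true
        }
    }
}

/// Hypothetical config with one fixed and one reloadable field.
#[derive(Clone, Debug)]
pub struct DemoClientConfig {
    /// Fixed at startup in this sketch.
    pub chain_id: String,
    /// Reloadable: clones of the config share the same underlying value.
    pub expected_shutdown: MutableField<Option<u64>>,
}
```

Because clones share the same `Arc`, a component that received a copy of the config at startup observes the update without being restarted; the boolean returned by `update` is one simple way to decide whether anyone needs to be notified.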

This is the next step in the refactoring steps for changing gas profiles to track gas by parameter (near#8033).

Here we make `ExtCostsConfig` opaque and look up parameters by

```rust
pub fn cost(&self, param: ExtCosts) -> Gas
```

instead of using a specific field inside. There are side-effects for this in

1. parameter definition
2. JSON RPC
3. parameter estimator

1) We no longer load the parameters through "parameter table -> JSON -> serde_deser" steps because `ExtCostsConfig` no longer has serde derives. Instead each `ExtCosts` maps to a `Parameter` that allows looking up the value directly from `ParameterTable`. This explicit mapping also replaces the `Parameter::ext_costs()` iterator previously used to find all parameters that are ext costs. We used to define `wasm_read_cached_trie_node` in `53.txt` and fill old values with serde default. Serde was removed here, so I changed it to define the parameter in the base files. This is equivalent to the old behavior, only it is less clear when we added the parameter.

2) JSON RPC must keep the old format. Thus, I added `ExtCostsConfigView` and `VMConfigView` there. It is a direct copy-paste of the old structs in the old format but without serde magic to fill in missing values.

3) The estimator generates an `ExtCostsConfig` from estimations. This is now done through a mapping from estimated costs to `ExtCosts`.

# Testing

The exact JSON output is checked in existing tests `test_json_unchanged`.
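
To make the "opaque config with a `cost(param)` lookup" idea concrete, here is a minimal sketch under stated assumptions. Only the `cost` signature comes from the description above; the three enum variants, the private array storage, and the `from_fn` constructor (a stand-in for looking values up in a parameter table) are invented for illustration.

```rust
pub type Gas = u64;

const EXT_COSTS_COUNT: usize = 3;

/// Illustrative subset of cost kinds; variant names are assumptions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ExtCosts {
    Base = 0,
    ReadMemoryBase = 1,
    WriteMemoryBase = 2,
}

impl ExtCosts {
    /// Every variant, in discriminant order.
    pub const ALL: [ExtCosts; EXT_COSTS_COUNT] =
        [ExtCosts::Base, ExtCosts::ReadMemoryBase, ExtCosts::WriteMemoryBase];
}

/// Opaque: callers cannot reach individual fields, only `cost()`.
pub struct ExtCostsConfig {
    costs: [Gas; EXT_COSTS_COUNT],
}

impl ExtCostsConfig {
    /// Hypothetical constructor: `lookup` stands in for a parameter-table query.
    pub fn from_fn(mut lookup: impl FnMut(ExtCosts) -> Gas) -> Self {
        let mut costs = [0; EXT_COSTS_COUNT];
        for param in ExtCosts::ALL {
            costs[param as usize] = lookup(param);
        }
        Self { costs }
    }

    /// Looks up the gas cost of one parameter.
    pub fn cost(&self, param: ExtCosts) -> Gas {
        self.costs[param as usize]
    }
}
```

With storage hidden behind `cost`, the internal layout can change (for example to per-parameter tracking) without touching call sites.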

Follow up to near#8073, near#8109, and near#8110 for the two remaining tests in `integration-tests/src/tests/network/routing.rs`.

Added create_test_signer that creates the signer for the given account with the seed that matches the account name. Can be used in tests only.

* fix a bug in skip processing
* add test
* remove comments

removed edge field from Connection to avoid redundancy with GraphSnapshot.local_edges
removed sending a duplicate RoutingTableUpdate message in response to RequestUpdateNonce

They were already rejected by wasmer2 before this change, just after preparation. So this should not be a protocol change. Second attempt at near#8029, without a protocol change this time

read_memory and write_memory have to perform the same checks as fits_memory already does. So rather than panicking, change the methods to return an error. This makes the interface panic-free and in some situations allows fits_memory call to be skipped. However, separate fits_memory check may still be necessary so keep the method but document in detail why it’s needed and, to keep interfaces consistent, change it to return a Result. Finally, read_memory_u8 is never used so get rid of it. While at it, add tests for WasmerMemory. Essentially copies of tests from Wasmer2Memory.
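
The panic-free interface described above can be sketched as follows. This is an illustration only: `DemoMemory` is a plain byte vector rather than a Wasmer memory, and `OutOfBounds` is a hypothetical error type. The point is that `read_memory`, `write_memory`, and `fits_memory` share one bounds check and all report failure through `Result`.

```rust
use std::convert::TryFrom;
use std::ops::Range;

/// Hypothetical error returned when an access falls outside the memory.
#[derive(Debug, PartialEq, Eq)]
pub struct OutOfBounds;

/// Stand-in for guest memory, backed by a plain byte vector.
pub struct DemoMemory {
    bytes: Vec<u8>,
}

impl DemoMemory {
    pub fn new(size: usize) -> Self {
        Self { bytes: vec![0; size] }
    }

    /// Single bounds check shared by all accessors; never panics.
    fn checked_range(&self, offset: u64, len: usize) -> Result<Range<usize>, OutOfBounds> {
        let start = usize::try_from(offset).map_err(|_| OutOfBounds)?;
        let end = start.checked_add(len).ok_or(OutOfBounds)?;
        if end <= self.bytes.len() {
            Ok(start..end)
        } else {
            Err(OutOfBounds)
        }
    }

    /// Checks that `len` bytes at `offset` are addressable, without copying.
    pub fn fits_memory(&self, offset: u64, len: u64) -> Result<(), OutOfBounds> {
        let len = usize::try_from(len).map_err(|_| OutOfBounds)?;
        self.checked_range(offset, len).map(|_| ())
    }

    /// Copies memory into `buffer`, or returns an error instead of panicking.
    pub fn read_memory(&self, offset: u64, buffer: &mut [u8]) -> Result<(), OutOfBounds> {
        let range = self.checked_range(offset, buffer.len())?;
        buffer.copy_from_slice(&self.bytes[range]);
        Ok(())
    }

    /// Copies `buffer` into memory, or returns an error instead of panicking.
    pub fn write_memory(&mut self, offset: u64, buffer: &[u8]) -> Result<(), OutOfBounds> {
        let range = self.checked_range(offset, buffer.len())?;
        self.bytes[range].copy_from_slice(buffer);
        Ok(())
    }
}
```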

In the mirror code, where we start an indexer for the target chain, the genesis records file is usually very big because we fork mainnet or testnet state to run a mocknet test. Starting the indexer with full genesis validation therefore takes quite a long time, and we don't really need it.

With the wrapping computation of `new_used_gas`, any contract is in full control of this value, including values that are actually less than either operand (due to overflows). This is not a big deal by itself, if not for the fact that it gives an attacker an opportunity to take the entire network out of service.

The in-line comments largely explain the reasoning behind the fix and why it preserves the protocol-facing behaviour, thus not necessitating a protocol version change. To reiterate what the comments say, the only case that we want to resolve is when the attacker picks a value of `new_used_gas` such that it is less than `burnt_gas`. Instead of asserting and aborting we simply set `self.promises_gas = 0`.

---

The fix in this PR has been included in 1.29.3 (72d4a4d) and 1.30.0-rc.6 (4290758).
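
The arithmetic behind this description can be illustrated with a small sketch. It is not the actual fix: `DemoGasCounter` and `set_used_gas` are hypothetical, and only the names `new_used_gas`, `burnt_gas`, and `promises_gas` come from the text above. It shows how a wrapped sum can end up below `burnt_gas`, and how clamping to zero avoids an abort in that case.

```rust
/// Hypothetical, heavily simplified gas counter.
pub struct DemoGasCounter {
    pub burnt_gas: u64,
    pub promises_gas: u64,
}

impl DemoGasCounter {
    /// Derives `promises_gas` from a (possibly attacker-influenced) total.
    /// An assertion `new_used_gas >= burnt_gas` would abort here; this
    /// sketch sets `promises_gas` to 0 in that case instead.
    pub fn set_used_gas(&mut self, new_used_gas: u64) {
        self.promises_gas = new_used_gas.checked_sub(self.burnt_gas).unwrap_or(0);
    }
}

/// Wrapping addition can yield a result smaller than either operand.
pub fn wrapped_total(used_gas: u64, extra: u64) -> u64 {
    used_gas.wrapping_add(extra)
}
```

For example, `wrapped_total(u64::MAX, 2)` is `1`, which is smaller than both inputs; feeding such a value into `set_used_gas` with `burnt_gas = 100` leaves `promises_gas` at `0` rather than panicking.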

The promise_and method processes one promise index at a time. There’s no benefit in having all them in memory. Rather than copying the indices from Vec<u8> into a new Vec<u64> simply do the little endian conversion while going through the former vector. This saves an allocation.
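
A minimal sketch of the conversion described above, assuming the indices arrive as a byte slice of consecutive little-endian `u64` values. The function name `promise_indices` is hypothetical; the idea is simply to decode while iterating instead of collecting into a second vector.

```rust
/// Lazily decodes little-endian u64 indices from `bytes`.
/// No intermediate Vec<u64> is allocated; trailing bytes that do not
/// form a full 8-byte chunk are ignored by `chunks_exact`.
pub fn promise_indices(bytes: &[u8]) -> impl Iterator<Item = u64> + '_ {
    bytes.chunks_exact(8).map(|chunk| {
        let mut raw = [0u8; 8];
        raw.copy_from_slice(chunk);
        u64::from_le_bytes(raw)
    })
}
```

A caller can then process one index at a time, for example `for idx in promise_indices(&buf) { ... }`.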

…ts (near#8195)

Bumps [certifi](https://github.com/certifi/python-certifi) from 2021.10.8 to 2022.12.7.
- [Release notes](https://github.com/certifi/python-certifi/releases)
- [Commits](certifi/python-certifi@2021.10.08...2022.12.07)

---
updated-dependencies:
- dependency-name: certifi
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Update the security policy to include hackenproof for security issue submissions.

Adds a separate page `/debug/pages/tier1_network_info` displaying information about TIER1 connections. For TIER1 connections we mostly display the same information as TIER2 connections, except that:
- TIER1 messages don't include height, last block hash, or nonce info, so these columns are omitted
- TIER1 nodes have configured public proxies; a column is added to display their addresses

Includes a couple of minor tweaks to the TIER2 `network_info` page (show PeerId of the current node at the top of the page, rename the connection table's `AccountId` column to `PeerId`).

Deprecating AnnounceAccount in favor of AccountData when finding a routing target. AnnounceAccount will still be used as a fallback. In the next release, once we confirm that AnnounceAccount is no longer used, we will remove the fallback. We will stop broadcasting AnnounceAccount only once MIN_SUPPORTED_VERSION > 57.

nearcore/src/config.rs

nearcore/src/dyn_config.rs

chain/client/src/client_actor.rs

core/chain-configs/src/updateable_config.rs

core/dyn-configs/README.md

core/o11y/src/lib.rs

core/chain-configs/src/updateable_config.rs

core/dyn-configs/src/lib.rs

mzhangmzz · 2023-01-12T00:23:28Z

nearcore/src/dyn_config.rs

+            None
+        }
+    };
+    let updateable_client_config =


I think it is better UX if the updateable configs are read through a separate file instead of the original config file. It has a few benefits:

Users won't be able to modify a config field that is not actually updateable. Right now, if a user does that, the code just silently drops the update, which could be a source of confusion to the user. On top of that, the node can be in trouble when they restart.

Users can't accidentally change the config file to a bad state, so they won't be in trouble when they restart the node.

IIUC, the suggestion is to have config.json which is assumed to be in a good state; and to duplicate the updateable fields into a separate file, e.g. dyn_config.json.

If in the future we decide to make max_num_peers updateable at runtime, will that mean that the same field will be configurable in config.json as network.max_num_peers and in dyn_config.json as max_num_peers?

I would find that UX confusing, because reading config.json is no longer enough to understand the node's configuration. There is always a chance that some field is overridden in dyn_config.json.
On the other hand, if I want to change max_num_peers, why shouldn't I change it in both config.json and dyn_config.json?
Which means that to read the configuration I need to read two config files; and to update the configuration I need to update two config files.

I think my main concern with letting the user update config.json directly is that, in that case, the values in config.json may not actually reflect the config the node is using. But I also see your point about two sources of truth.

What if we let the user update dyn_config.json, but the code will automatically update config.json to reflect the updated value after dyn_config.json is applied? This way, the user only needs to rely on config.json as the source of truth, and they don't have to worry about messing up config.json while the node is running.

To make it even safer, the program can also save a copy of the old config value, so the user has something to fall back to in case they make a mistake and want to revert to the previous config.

Just an idea about this concern. Could the following rules be an easier solution?

Introduce a new config field allow_config_change: true (default is false) in the original config.json; this value may not be changed at runtime. This covers people who do not want to change the config at runtime and want to keep the current behavior.

If the dyn_config is valid and applied, could we save a config snapshot as config.{timestamp}.json?

If the snapshot cannot be stored, neard will panic, so that no config change happens without a clear snapshot.

(Optional) When neard starts and allow_config_change is true, it also creates config.{timestamp}.json so that the behavior is more consistent.

The reasons to use config.{timestamp}.json rather than config_log.txt are:

it is easier for the user to reuse config.{timestamp}.json by renaming it to config.json.

it is easier to use diff tools for two config files

it keeps a single place to store the log

once the dyn_config is applied, the snapshot is a kind of receipt, and we can just write a one-line log, "config.{timestamp}.json applied", without being verbose.

If these look good from your end, I am happy to implement this. 😄

@mzhangmzz

> the code will automatically update config.json

I find it very confusing when a program modifies its own configs.

> the user has something to fall back to in case they make a mistake

Unclear how useful this is. I rely on the (persistent) undo history. Some users make and manage backups of the configs.
In general, I don't want to prescribe how the users should manage their config.json file. There are too many exceptions.

@yanganto

> allow_config_change

Seems redundant when a SIGHUP is required to trigger a config reload. If the user doesn't send a signal, nothing will happen.

> save a config snapshot

I believe many users would like to manage config versions themselves.

I also like the idea of a single 'config.json' file.

As for config{timestamp}.json, the part I don't like is that it pollutes the directory a little. What we could do instead is have a /config HTTP endpoint (or /debug/config) that always reports the current config. This would let people do diffs easily if needed.

/config is already work in progress: https://pagodaplatform.atlassian.net/browse/ND-273

mzhangmzz · 2023-01-12T00:24:47Z

Please also update the description of the PR.

mzhangmzz · 2023-01-12T23:02:40Z

nearcore/src/config.rs

+            Ok(())
+        }
+        // TODO: Add more config validation.
+        // TODO: Validate `ClientConfig` instead.


+1 for validating ClientConfig instead. Ideally, the validation should happen as soon as ClientConfig is loaded.

Do you mind if I leave this to a follow-up code change?
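
As a sketch of what "validate as soon as the config is loaded" could look like, the example below runs validation inside the load step so that an invalid config is never handed to the rest of the node. Everything here is hypothetical: `DemoNodeConfig`, its fields, and the specific rules are assumptions for illustration, not the checks nearcore performs.

```rust
/// Hypothetical, trimmed-down config; field names are illustrative only.
#[derive(Clone, Debug, PartialEq)]
pub struct DemoNodeConfig {
    pub min_num_peers: u32,
    pub max_num_peers: u32,
    pub log_summary_period_secs: u64,
}

/// Collects every problem instead of stopping at the first one,
/// so a single reload attempt reports all mistakes at once.
pub fn validate(config: &DemoNodeConfig) -> Result<(), Vec<String>> {
    let mut errors = Vec::new();
    if config.max_num_peers == 0 {
        errors.push("max_num_peers must be greater than 0".to_string());
    }
    if config.min_num_peers > config.max_num_peers {
        errors.push(format!(
            "min_num_peers ({}) must not exceed max_num_peers ({})",
            config.min_num_peers, config.max_num_peers
        ));
    }
    if config.log_summary_period_secs == 0 {
        errors.push("log_summary_period_secs must be greater than 0".to_string());
    }
    if errors.is_empty() {
        Ok(())
    } else {
        Err(errors)
    }
}

/// Stand-in for the load step: validation runs before the config is returned.
pub fn load_validated(candidate: DemoNodeConfig) -> Result<DemoNodeConfig, Vec<String>> {
    validate(&candidate)?;
    Ok(candidate)
}
```

Returning the full list of errors, rather than the first one, fits the reload use case: the operator edits the file once more and fixes everything in one pass.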

mzhangmzz · 2023-01-12T23:14:58Z

chain/client/src/info.rs

        }
        if let Some(statistics) = statistics {
            rocksdb_metrics::export_stats_as_metrics(statistics);
        }
-
+        if let Some(config_updater) = &config_updater {
+            config_updater.log_error();


I am not sure about passing the error all the way from neard to client and to here just for logging. Without this, the logic for ConfigUpdater is much simpler as the struct does not contain any state. I see two other options:

Simply let the node panic in neard after loading the new config if the config is updated to an invalid setting. This may be a little harsh, but I think it is also reasonable. When restarting the node, it panics if the config is invalid. So I think it is ok to terminate the node if the updated config is invalid.

Log the error into a separate file such as config_log.txt. This way, the error message will not be lost among the other log messages, so you don't have to store the message and log it every time with info().

We have a tradeoff between code complexity and making it more user-friendly. Or rather, more difficult to shoot yourself in the foot and not notice it.

I lean strongly on the side of user-friendliness.
There is also a metric exported about the config being invalid.

> let the node panic

That is user-anti-friendly. I can imagine updating the config during an incident, and accidentally killing a very valuable node. Don't want that.
And if a validator accidentally kills a node and misses blocks, that is not good either.

> config_log.txt

That is non-trivial to use properly 🤔
I would like to use it like

killall -SIGHUP neard ; cat ~/.near/config_log.txt || echo 'Config valid'

But it may take longer than a moment to apply that config change, parse the config and validate it.
To be sure it's applied a delay is needed.

killall -SIGHUP neard ; sleep 10; cat ~/.near/config_log.txt || echo 'Config valid'

But I also cannot guarantee that 10 seconds is a sufficient delay.
And anyway, if I can remember to check config_log.txt, I can remember to check the Prometheus metric.
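
The trade-off discussed in this thread (keep the node running, remember the last reload error, surface it in the periodic log and in a metric) can be sketched as below. This is an assumption-heavy illustration: `DemoConfigUpdater`, its methods, and the single `u64` payload are hypothetical, and the gauge is a plain integer rather than a real Prometheus metric.

```rust
/// Hypothetical updater that remembers the outcome of the last reload.
#[derive(Debug, Default)]
pub struct DemoConfigUpdater {
    applied_value: Option<u64>,
    last_error: Option<String>,
}

impl DemoConfigUpdater {
    /// Applies a freshly parsed value, or records why the reload was rejected.
    /// Returns true only when the running value changed.
    pub fn try_update(&mut self, reloaded: Result<u64, String>) -> bool {
        match reloaded {
            Ok(value) => {
                self.last_error = None;
                let changed = self.applied_value != Some(value);
                self.applied_value = Some(value);
                changed
            }
            Err(reason) => {
                // Keep running with the old value; a bad edit never panics.
                self.last_error = Some(reason);
                false
            }
        }
    }

    /// Line for a periodic info log; None while the config is valid.
    pub fn error_line(&self) -> Option<String> {
        self.last_error
            .as_ref()
            .map(|e| format!("Dynamic config is invalid and was not applied: {}", e))
    }

    /// Gauge-style value: 1 if the last reload failed, 0 otherwise.
    pub fn config_invalid_gauge(&self) -> i64 {
        i64::from(self.last_error.is_some())
    }

    /// The value currently in effect.
    pub fn applied_value(&self) -> Option<u64> {
        self.applied_value
    }
}
```

Storing the error is what makes the updater stateful, which is the complexity cost raised in the review; in exchange, a rejected reload stays visible until the config is fixed instead of scrolling away in the log.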

addressed

chain/client/src/config_updater.rs

core/dyn-configs/README.md

Co-authored-by: Jakob Meier <mail@jakobmeier.ch>

…ble at runtime (near#8240) Introduce mutable fields in ClientConfig. Introduce the infrastructure to reload `config.json` and notify `Client` that certain fields were updated. Refactoring to inline `dyn_config.json` into `config.json`.

Co-authored-by: Jakob Meier <mail@jakobmeier.ch>

yanganto and others added 30 commits November 25, 2022 13:34

neard: reload key for clients when recv SIGHUP

40d0710

update rpc servers

efb319c

Revert "update rpc servers"

94de8b7

This reverts commit efb319c.

Revert "neard: reload key for clients when recv SIGHUP"

dd99f29

This reverts commit 40d0710.

restarting to reload key base on config change

9aed32b

try to reset logger

33e8cf2

Merge branch 'master' into reload

50e55ba

Add watch_config feature to reload signer key

76ea5a0

Update core/o11y/src/lib.rs

2cfd460

Co-authored-by: nikurt <86772482+nikurt@users.noreply.github.com>

Update neard/src/cli.rs

839221d

Co-authored-by: nikurt <86772482+nikurt@users.noreply.github.com>

fix log on restarting

0c4ab4c

Remove watch config

a73e1e8

Merge branch 'master' into reload

7444ced

Merge branch 'master' into reload

08ed4fb

Migrate remaining routing tests to network crate (near#8168)

caad73d

Follow up to near#8073, near#8109, and near#8110 for the two remaining tests in `integration-tests/src/tests/network/routing.rs`.

Added create_test_signer function that simplifies tests (near#8176)

72c57da

Added create_test_signer that creates the signer for the given account with the seed that matches the account name. Can be used in tests only.

Fix a bug in processing skips approvals when there are forks (near#8165)

7a57975

* fix a bug in skip processing
* add test
* remove comments

[fix] Fix localnet config - local ports should be set to true (near#8180)

f8ac73d

[doc] Fix a link to the testing page in the CONTRIBUTING.md (near#8172)

136ae4d

removed edge field from Connection (near#8160)

1d0d5c7

removed edge field from Connection to avoid redundancy with GraphSnapshot.local_edges
removed sending a duplicate RoutingTableUpdate message in response to RequestUpdateNonce

reject imported memories early (near#8146)

acb2dd1

They were already rejected by wasmer2 before this change, just after preparation. So this should not be a protocol change. Second attempt at near#8029, without a protocol change this time

Update SECURITY.md (near#8183)

de7caf7

Update the security policy to include hackenproof for security issue submissions.

nikurt added 2 commits January 9, 2023 10:42

Merge

3a19e65

Merge

e7060d6

mm-near previously requested changes Jan 9, 2023

Addressed more comments

e6167a5

nikurt requested a review from mm-near January 10, 2023 12:43

nikurt added 2 commits January 10, 2023 16:04

Addressed more comments

592fd7d

Addressed more comments

fe4f393

mzhangmzz reviewed Jan 12, 2023

Comments

b9c9abc

mzhangmzz reviewed Jan 12, 2023

nikurt requested a review from mzhangmzz January 13, 2023 17:55

Merge branch 'master' into nikurt-dyn-config

3ba626f

mm-near approved these changes Jan 17, 2023

chain/client/src/config_updater.rs (four review threads, outdated and resolved)

nikurt and others added 3 commits January 17, 2023 14:52

config_updater fixes

5d000c7

Merge branch 'master' into nikurt-dyn-config

446c339

config_updater fixes

7ed8aa1

mzhangmzz approved these changes Jan 17, 2023

chain/client/src/config_updater.rs (review thread, outdated and resolved)

core/dyn-configs/README.md (review thread, resolved)

nikurt and others added 2 commits January 19, 2023 11:06

Merge branch 'master' into nikurt-dyn-config

8931bd0

try_update tuning

6f09a9c

nikurt added the S-automerge label Jan 19, 2023

near-bulldozer bot merged commit 748ac49 into near:master Jan 19, 2023

nikurt mentioned this pull request Jan 19, 2023

feat: add /client_config endpoint to view runtime client config #8379

Closed

nikurt added a commit to nikurt/nearcore that referenced this pull request Jan 19, 2023

Changelog for near#8240

94c9a4c

near-bulldozer bot pushed a commit that referenced this pull request Jan 30, 2023

feat(dynamic config): Changelog for #8240 (#8399)

28372ad

Co-authored-by: Jakob Meier <mail@jakobmeier.ch>

nikurt added a commit to nikurt/nearcore that referenced this pull request Jan 30, 2023

feat(dynamic config): Changelog for near#8240 (near#8399)

948063d

Co-authored-by: Jakob Meier <mail@jakobmeier.ch>

ppca pushed a commit to ppca/nearcore that referenced this pull request Jan 30, 2023

feat(dynamic config): Changelog for near#8240 (near#8399)

969db0d

Co-authored-by: Jakob Meier <mail@jakobmeier.ch>

mzhangmzz Jan 12, 2023

nikurt Jan 12, 2023

mzhangmzz Jan 12, 2023

mzhangmzz Jan 12, 2023

yanganto Jan 13, 2023

nikurt Jan 13, 2023

nikurt Jan 13, 2023

mm-near Jan 16, 2023

nikurt Jan 16, 2023

mzhangmzz commented Jan 12, 2023

mzhangmzz Jan 12, 2023

nikurt Jan 13, 2023

mzhangmzz Jan 12, 2023

nikurt Jan 13, 2023
