feat: customizable fd_limit as env var #10962
Commits on Jan 22, 2024
- 843a44c
- 6c2d981
- d7edd8e config: adjust shard cache sizes (#10409)
  * Do not apply default size increase introduced in #10373 to `view_trie_cache` since view calls are not latency-sensitive
  * Opt out shard 1 from the increase since it only contains aurora account and based on the current metrics has very low cache miss rate even with a size of 50MB
  * Configure shard 3 override of 3GB to also apply after resharding
- 65b89a2 fix(metrics): fix false positives in the near_num_invalid_blocks metr…
- 174ce77
- 06168f7
- e8021d5
- 40aa586
Commits on Jan 23, 2024
- 390b452
Commits on Jan 24, 2024
- e1fc6bc fix(configs): fill in missing values in ExperimentalConfig with defaults (#10472)
  The command `neard --home {home_dir} init --chain-id mainnet --download-config` downloads the config at https://s3-us-west-1.amazonaws.com/build.nearprotocol.com/nearcore-deploy/mainnet/config.json and then saves a config file built from that one to {home_dir}/config.json. This is expected to work even when particular fields are missing by filling them in with defaults, which is why many config fields are marked with some sort of `#[serde(default)]` or `#[serde(default = default_fn)]`.

  But the `ExperimentalConfig` doesn't fill in these defaults. The existing `#[serde(default)]` over that field in `near_network::config_json::Config` will have us fill it in if it's totally missing, but we get an error if it's present with some fields missing, which is the case today:
  ```
  Error: Failed to initialize configs
  Caused by:
      config.json file issue: Failed to deserialize config from /tmp/n/config.json: Error("missing field `network_config_overrides`", line: 98, column: 5)
  ```
  Fix it by adding a `#[serde(default)]` to each of the fields in `ExperimentalConfig`.
- c869dd6
Commits on Jan 28, 2024
- 71d0dc7
Commits on Jan 29, 2024
- 1bdb64f Fix: get branch on RELEASE event triggered workflow (#10495)
  On release triggered workflow runs, the HEAD is detached and no local branch is present. Due to this, the BRANCH var ends up as an empty string, which causes failures to publish artifacts: https://github.com/near/nearcore/actions/runs/7640475082/job/20815674136

      ++ git branch --show-current
      + BRANCH=
      ++ git rev-parse HEAD
      + COMMIT=c869dd6c1e942f21e27a74c7e47a698de
      ++ uname

  This leads to incorrect S3 paths: `s3://build.nearprotocol.com/nearcore/$(uname)/${BRANCH}/latest` -> `s3://build.nearprotocol.com/nearcore/Linux//latest`
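The failure mode in 1bdb64f can be reproduced in a few lines. This is only a sketch: the real pipeline is a shell script, and both function names below are hypothetical. `os` stands in for the output of `uname` and `branch` for the output of `git branch --show-current`.

```rust
/// Hypothetical helper: builds the "latest" artifact prefix without any
/// validation, which is how an empty branch name slips into the path.
fn latest_artifact_path_unchecked(os: &str, branch: &str) -> String {
    format!("s3://build.nearprotocol.com/nearcore/{}/{}/latest", os, branch)
}

/// Hypothetical guarded variant: refuses to build a path when the branch
/// name is empty, as happens when HEAD is detached.
fn latest_artifact_path(os: &str, branch: &str) -> Result<String, String> {
    if branch.trim().is_empty() {
        return Err("branch name is empty (detached HEAD?)".to_string());
    }
    Ok(latest_artifact_path_unchecked(os, branch))
}
```

Calling the unchecked variant with an empty branch yields the `Linux//latest` path quoted in the commit message, while the guarded variant turns the same input into an error that can fail the job early.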
- 563091a
- f4a9a68
- ea69991
Commits on Jan 30, 2024
- 08a2812 Fix: neard-release pipeline (#10521)
  #10495 was meant to fix the builds triggered by release events. With the actions/checkout GHA action, only a single commit is fetched by default, so the branch match is missing. To fetch all history for all branches and tags, set fetch-depth to 0 for both the binary and docker image release jobs. This was [tested](https://github.com/near/andrei-playground/actions/runs/7696882172/job/20972653224) on a private repo.
- b097942 remove spammy info logs (#10529)
  Removing spam from info logs.
  1. GC shouldn't announce itself every block in `info` mode
  2. `BlockResponse` handle shouldn't spam 10 times per block
  3. `EpochOutOfBounds` is a normal error in `is_last_block_in_finished_epoch`, and we shouldn't flag it.
Commits on Jan 31, 2024
- 430a22f
- 276b7e7
Commits on Feb 1, 2024
- 4e06d0f fix(resharding): delay deleting the snapshot until after catchup (#10545)
  I haven't verified it, but it seems that restarting the node during catchup does not resume the catchup but rather restarts the whole resharding from scratch. While that is suboptimal, I'm not aiming to fix it now. This PR merely moves the deletion of the state snapshot to after catchup is finished. This way the restarted resharding can succeed.
- 61fef83
Commits on Mar 5, 2024
- 7b4ee1e
- 1035db5 [resharding] Fix opening snapshot for split storage nodes (#10588)
  In testnet we hit an issue during resharding with split storage nodes, where they failed to create the snapshot required for resharding. This PR is a fix for this issue. Link to zulip thread: https://near.zulipchat.com/#narrow/stream/308695-pagoda.2Fprivate/topic/cutting.201.2E37.2E0/near/420092380

  Creation of the snapshot happens via the function call `checkpoint_hot_storage_and_cleanup_columns`, which takes a hot_store and creates a snapshot out of it. Later we open the snapshot with `opener.open_in_mode(Mode::ReadWriteExisting)`. The call chain `open_in_mode` -> `ensure_kind` -> `is_valid_kind_archive` was the point of failure. This is evident from the log line
  ```
  Feb 06 14:19:29 testnet-rpc-archive-public-02-asia-east1-b-fc341f59 neard[1753]: 2024-02-06T14:19:29.215599Z ERROR state_snapshot: State snapshot creation failed err=Hot database kind should be RPC but got Some(Hot). Did you forget to set archive on your store opener?
  ```
  There are three types of nodes: RPC, legacy archival and split storage nodes. The function call to `get_default_kind` gets us the DbKind for the new snapshot hot storage while opening. To get the snapshot storage to open properly, we need to handle the three types of storage. We need to set the value of `archive`, which is passed to the storage opener, appropriately, and ensure we set the correct storage type for the snapshot. The table below lists the expected values of each of these fields.

  | Node type | Hot storage kind | Required snapshot storage type | archive |
  | -- | -- | -- | -- |
  | RPC | DbKind::RPC | RPC | false |
  | Legacy Archival | DbKind::Archival | Legacy Archival | true |
  | Split Storage | DbKind::Hot | RPC | false |

  The place where our code went wrong with the split storage node was the function call to `is_valid_kind_archive`, where the DbKind is Hot for split storage and archive is set to false. The new relaxation in the check basically states that if we are creating a snapshot of the hot storage from a split storage node, we convert it into an RPC storage.

  Testing: Ad hoc testing where I manually set the storage DbKind to Hot and checked that the fix works. Additionally added an integration test that manually sets the DbKind to Hot, Archive and RPC and checks whether resharding happens.
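The table in 1035db5 amounts to a small mapping from the node's hot storage kind to the parameters used when opening the snapshot. The sketch below restates it with hypothetical types (`HotDbKind`, `SnapshotOpenParams`, `snapshot_open_params`); these are illustration-only names and not the actual nearcore definitions.

```rust
/// Hypothetical stand-in for the hot storage kinds listed in the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HotDbKind {
    Rpc,
    Archival,
    Hot,
}

/// Hypothetical bundle of the two values the table prescribes per node type.
#[derive(Debug, PartialEq, Eq)]
struct SnapshotOpenParams {
    /// Storage kind the snapshot should be opened as.
    kind: HotDbKind,
    /// Value of the `archive` flag handed to the store opener.
    archive: bool,
}

/// Maps the node's hot storage kind to the snapshot parameters from the table.
fn snapshot_open_params(node_hot_kind: HotDbKind) -> SnapshotOpenParams {
    match node_hot_kind {
        HotDbKind::Rpc => SnapshotOpenParams { kind: HotDbKind::Rpc, archive: false },
        HotDbKind::Archival => SnapshotOpenParams { kind: HotDbKind::Archival, archive: true },
        // Split storage: the snapshot of the hot store is treated as an RPC store.
        HotDbKind::Hot => SnapshotOpenParams { kind: HotDbKind::Rpc, archive: false },
    }
}
```

The only non-identity row is the split storage case, which is the relaxation the commit describes.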
- 87bf8fe [resharding] Handle case of restarting node during resharding catchup (#10611)
  This PR is a fix for the issue that occurs when we restart a node that is doing resharding and is in the catchup phase.

  High level issue: When we restart a node in the epoch when resharding is happening, we go through the whole state_sync process from the beginning, which includes resharding. Once building of the child trie is completed, we then do a catchup, apply the split state to the child trie and delete the split state changes for resharding [here](https://github.com/near/nearcore/blob/e00dcaa72cfed35831b1e72760d21bb8152f1049/chain/chain/src/chain_update.rs#L399). Zulip thread link: https://near.zulipchat.com/#narrow/stream/308695-pagoda.2Fprivate/topic/Problems.20after.20resharding.20restart/near/421312417

  This is the implementation of Option 1 in the thread. The key idea here is to not delete the `DBCol::StateChangesForSplitStates` during catchup of individual blocks but rather at the end of the catchup phase. This implies that, in case we restart the node in the middle of the catchup, we would still have all the split state information in `DBCol::StateChangesForSplitStates`, since it wouldn't have been deleted.

  TODO: Testing on mocknet.
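The idea in 87bf8fe, keeping the split state changes until the whole catchup phase is done, can be sketched with an in-memory map standing in for `DBCol::StateChangesForSplitStates`. The `SplitStateChanges` type and its methods are hypothetical and only illustrate the ordering of "apply" and "delete".

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory stand-in for the split state changes column:
/// block height -> changes that must be applied to the child tries.
struct SplitStateChanges {
    pending: BTreeMap<u64, Vec<String>>,
}

impl SplitStateChanges {
    fn new() -> Self {
        SplitStateChanges { pending: BTreeMap::new() }
    }

    /// Records the changes produced by one block.
    fn record(&mut self, height: u64, change: &str) {
        self.pending.entry(height).or_default().push(change.to_string());
    }

    /// Applies the changes for the given blocks and returns them in order.
    /// Nothing is deleted here, so an interrupted catchup can be repeated.
    fn catch_up(&self, heights: &[u64]) -> Result<Vec<String>, String> {
        let mut applied = Vec::new();
        for height in heights {
            let changes = self
                .pending
                .get(height)
                .ok_or_else(|| format!("missing split state changes for block {}", height))?;
            applied.extend(changes.iter().cloned());
        }
        Ok(applied)
    }

    /// Deletes all recorded changes once the whole catchup phase has finished.
    fn finish_catchup(&mut self) {
        self.pending.clear();
    }
}
```

If `catch_up` deleted each block's entry as it went, a restart after the first block would leave the second run without the data it needs; deferring the deletion to `finish_catchup` avoids that.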
- 9e98b54 fix(resharding): create flat storage for children shards after node restart (#10684)
  The root cause of the resharding issue in mocknet testing seems to be that when the node is restarted, the flat storage is not recreated. Without flat storage, when applying the first block of the new epoch we get different state roots than the nodes that do have flat storage.

  This PR introduces creating the flat storage for the children shards, aka next epoch shards. This depends on the flat storage status that is set at the end of resharding [here](https://github.com/near/nearcore-private/blob/1a4756e1acbebc073f312fde1457be47e1da3fc0/chain/chain/src/resharding.rs#L180).

  I failed to reproduce the issue, so unfortunately we'll need to test it in mocknet. We should test all the cases (no restart, restart during resharding, restart during catchup, restart post catchup). cc @marcelo-gonzalez

  Ideally we should also error out when trying to apply a chunk without flat storage, but I'm not brave enough to do it in 1.37 :)

  I also changed shard_id to shard_uid in a few places, but this needs to happen on a wider scale. I'll do that separately too.
- f3597b8 fix(resharding): Allow creating the flat storage multiple times for a shard. (#10696)
  Removing the assertion and allowing flat storage to be created multiple times for a shard. This is needed to fix an issue when the node is restarted in the middle of resharding. The flat storage may already be created for a subset of shards, but unless all are finished, resharding will get restarted. Because the flat storage was created for those shards, it will be created on node startup as well as after the second resharding is finished.

  This is not a perfect solution and not particularly clean. The best alternative seems to be to implement resuming of resharding, where we don't restart resharding for shards that were finished. This is a more complex change and we want to get this PR into the release, so for now I'm sticking to the simplest approach.

  This seems to be safe because even though the flat storage for children shards is created, it's not used anywhere. Sanity check: do we ever check the existence of flat storage for a shard for anything?
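The change in f3597b8 replaces "assert that the flat storage does not exist yet" with "creating it again is a no-op". A minimal sketch of that idempotent shape follows; `FlatStorageRegistry` and `ensure_created` are hypothetical names, and the real flat storage holds far more than a set of identifiers.

```rust
use std::collections::HashSet;

/// Hypothetical shard identifier: shard layout version plus shard id.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ShardUId {
    version: u32,
    shard_id: u32,
}

/// Hypothetical registry that only tracks which shards have flat storage.
#[derive(Default)]
struct FlatStorageRegistry {
    created: HashSet<ShardUId>,
}

impl FlatStorageRegistry {
    /// Creates flat storage for the shard if it does not exist yet.
    /// Returns `true` when it was newly created and `false` when it already
    /// existed; a repeated call is tolerated instead of tripping an assertion.
    fn ensure_created(&mut self, shard_uid: ShardUId) -> bool {
        self.created.insert(shard_uid)
    }

    /// Number of shards that currently have flat storage.
    fn count(&self) -> usize {
        self.created.len()
    }
}
```

With this shape, the startup path and the end of a restarted resharding can both request creation for the same child shard without coordinating with each other.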
- 69ed989
- 0f8e073
Commits on Mar 7, 2024
- 3ecedff
Commits on Mar 8, 2024
- 2567e70 [1.37] runtime: temporarily workaround the LimitedMemoryPool limiting…
- 915aea7
- 9f9de3b
- 77f40fa
Commits on Mar 11, 2024
- 1f98f92 [resharding] Nightshade V3 shard layout for protocol version 65 (#10725)
  This PR is to be cherry-picked into release 1.38.0 for splitting shard 2 into two parts. Hot and kai-ching both fall on shard 2, which has been causing a lot of congestion. Zulip thread: https://near.zulipchat.com/#narrow/stream/308695-nearone.2Fprivate/topic/constant.20congestion.20on.20shard.202/near/425367222

  Co-authored-by: wacban <wac.banasik@gmail.com>
- 9577d06 bump protocol version for 1.38.0 (#10714)
  Bump protocol version to 65
- 91018d9
- 63eeb87
Commits on Mar 12, 2024
- 054ab4e
- 5798dd6 fix(metrics): changed shard id to shard uid in flat storage metrics (#10685)
  During resharding we maintain flat storage for both the parent and children shards. In order to disambiguate and not mix the metrics for shards with the same ids, we should use shard_uid instead of shard_id.
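Why a shard_id label is ambiguous during resharding can be shown with a toy gauge. Everything here is a hypothetical sketch: `LabeledGauge` is not a Prometheus API, and the `s{shard_id}.v{version}` label format is an assumption made for this example.

```rust
use std::collections::HashMap;
use std::fmt;

/// Hypothetical shard identifier: shard layout version plus shard id.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct ShardUId {
    version: u32,
    shard_id: u32,
}

impl fmt::Display for ShardUId {
    /// Assumed label format for this sketch: `s<shard_id>.v<version>`.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "s{}.v{}", self.shard_id, self.version)
    }
}

/// Toy gauge that keeps one value per label, like one time series per label.
#[derive(Default)]
struct LabeledGauge {
    values: HashMap<String, i64>,
}

impl LabeledGauge {
    /// Sets the value of the series identified by `label`.
    fn set(&mut self, label: &str, value: i64) {
        self.values.insert(label.to_string(), value);
    }

    /// Reads the value of the series identified by `label`, if any.
    fn get(&self, label: &str) -> Option<i64> {
        self.values.get(label).copied()
    }

    /// Number of distinct series recorded so far.
    fn series_count(&self) -> usize {
        self.values.len()
    }
}
```

A parent shard and a child shard that share shard id 0 but belong to different shard layout versions collapse into one series when labelled by shard_id, and stay separate when labelled by the full identifier.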
- b9bbb62 backport: crates 0.20.1 had two important fixes that has to be in 1.3…
- fa18c12
- ed7a80f fix: near_ prefix to flat storage metrics (#10763)
  As these metrics were recently changed and the continuity of data is broken anyway, we can add the "near_" prefix to the prometheus metrics, which has been missing all along. @wacban I wonder how painful it is to add this to 1.38 for convenience?

  Co-authored-by: Longarithm <the.aleksandr.logunov@gmail.com>
- 78122a6
Commits on Mar 14, 2024
- bb02713
Commits on Mar 15, 2024
- c5517fb rpc: return timeout error from tx_status_fetch (#10789)
  All the `tx_status_fetch` function does is poll the tx_status method and wait for the desired level of tx finality. `tx_status_fetch` is used in several places, including the `broadcast_tx_commit` RPC method. With the chunk congestion we have right now, we return the `UNKNOWN_TRANSACTION` error to `broadcast_tx_commit` after 20 seconds of waiting, which is both sad and weird. The error we store in `tx_status_result` is not good enough to show to the user; otherwise we would break from the loop with it immediately. If we reach the timeout boundary, I suggest always returning the timeout error.
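The behaviour proposed in c5517fb, polling until the deadline and then reporting a timeout rather than the last intermediate error, can be sketched as a generic helper. `fetch_with_timeout` and `FetchError` are hypothetical names, and the sketch is synchronous for simplicity, so it does not mirror the actual RPC code.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical error type: the only failure this helper reports is a timeout.
#[derive(Debug, PartialEq, Eq)]
enum FetchError {
    Timeout,
}

/// Calls `poll` until it succeeds or `timeout` elapses.
/// Intermediate errors (for example "unknown transaction") are treated as
/// "not ready yet" and are discarded; on expiry the caller gets `Timeout`.
fn fetch_with_timeout<T, E>(
    mut poll: impl FnMut() -> Result<T, E>,
    timeout: Duration,
    interval: Duration,
) -> Result<T, FetchError> {
    let deadline = Instant::now() + timeout;
    loop {
        if let Ok(value) = poll() {
            return Ok(value);
        }
        if Instant::now() >= deadline {
            return Err(FetchError::Timeout);
        }
        // Wait a little before asking again.
        thread::sleep(interval);
    }
}
```

A caller that keeps receiving an intermediate error for the whole window sees `FetchError::Timeout`, which describes what actually happened better than the last intermediate error would.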
- 6f905b5 support Included transaction status (#10796)
  Ideally, we need to rewrite the `chain.get_final_transaction_result` method completely. But for now, let's start at least with supporting the `TxExecutionStatus::Included` status.
- a56cd4a fix(snapshots): drop unwanted column families instead of deleting all keys (#10803)
  If the `columns_to_keep` arg of `checkpoint_hot_storage_and_cleanup_columns()` is `Some`, then we delete all the data in every other column. Then if snapshot compaction is enabled in the configs, we rely on that to clean up the files on disk.

  Instead of doing that, we can just call `drop_cf()` on every unwanted column family, and the associated sst files will be removed without the need for any compactions. So this moves the `columns_to_keep` arg to `near_store::db::Database::create_checkpoint()`, and has the rocksdb implementation of that trait call `drop_cf()` on unwanted column families. These column families are then immediately recreated in `checkpoint_hot_storage_and_cleanup_columns()` by the call to `StoreOpener::open_in_mode()`, but the data on disk is gone.

  This also means we can get rid of the state snapshot options in the config, since they were only ever intended to clean up the unwanted files, which aren't there anymore.
Commits on Mar 18, 2024
- 7fec892
- afc8f93
- b6b6f4f
- 301c01a
- d639d51
- aac5e42
Commits on Mar 26, 2024
- c64b022
- 0f28ce9