
Use RocksDB's snapshots instead of RwLock on database #832

Merged: 12 commits merged into develop on Feb 16, 2024

Conversation

@jakrawcz-rdx (Contributor) commented Jan 31, 2024

The first PR for https://radixdlt.atlassian.net/browse/NODE-571.
It addresses point 1 of the DoD - just not via a single hot-swapped snapshot, but via on-demand snapshots.

Main points:

  • StateLock<> is now replaced by DbLock<Snapshottable> (see its extensive rustdocs)
  • This lock, apart from "locked" vs "historical" access, can give you a snapshot() at any moment (see the sketch after this list):
    • taking a snapshot is apparently so cheap (~microseconds) that it is not worth caching
    • if that turns out otherwise, "add a cache for the snapshot-after-last-commit" will be a nice future performance-improvement PR
  • The "DB enum middleman" (enum_dispatch-ing to Rocks only) is gone
  • For technical reasons, we need to define our own traits over Rocks (see ReadableRocks and WriteableRocks)
  • Every place that used StateManagerDatabase now uses either StateManagerDatabase<impl ReadableRocks> or StateManagerDatabase<impl WriteableRocks> (depending on its use-case)
    • This gives some extra compile-time safety, so that you cannot write to Rocks when you access it e.g. via a snapshot.
  • This refactoring PR does not attempt to fix all the "sloppy DB locking" problems that we have; there are cases which (I believe) only work fine because a higher layer synchronizes them (e.g. a Java lock) or because their nature excludes concurrent processes (e.g. during boot-up). Most notably:
    • execution of genesis and scenarios
    • applying protocol updates
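For illustration, a rough and hypothetical sketch of the DbLock idea described above; the trait and method names here are simplified placeholders (the real DbLock in this PR also carries access listeners for metrics and richer access modes):

use std::sync::{Mutex, MutexGuard};

/// Minimal read interface shared by the live DB and its snapshots.
trait ReadStore {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
}

/// A database that can produce cheap, point-in-time read-only views.
trait Snapshottable {
    fn snapshot(&self) -> Box<dyn ReadStore + '_>;
}

/// Sketch of the "DbLock" idea: writers cooperate through a mutex, while
/// readers can take an independent snapshot at any moment without blocking.
struct DbLock<D: Snapshottable> {
    write_marker: Mutex<()>, // cooperative exclusion for writers only
    database: D,
}

impl<D: Snapshottable> DbLock<D> {
    fn new(database: D) -> Self {
        Self { write_marker: Mutex::new(()), database }
    }

    /// Readers: a consistent point-in-time view; no lock is held afterwards.
    fn snapshot(&self) -> Box<dyn ReadStore + '_> {
        self.database.snapshot()
    }

    /// Writers: exclusive access to the live database for the guard's lifetime.
    fn locked(&self) -> (MutexGuard<'_, ()>, &D) {
        (self.write_marker.lock().expect("not poisoned"), &self.database)
    }
}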


github-actions bot commented Jan 31, 2024

Docker tags
docker.io/radixdlt/private-babylon-node:pr-832
docker.io/radixdlt/private-babylon-node:8eb4f12a77
docker.io/radixdlt/private-babylon-node:sha-8eb4f12

@LukasGasior1 (Contributor) left a comment

Just a few minor comments.

Looks good! Love the new readable/writeable traits 👍

@@ -30,9 +30,7 @@ pub(crate) async fn handle_lts_transaction_status(
pending_transaction_result_cache.peek_all_known_payloads_for_intent(&intent_hash);
drop(pending_transaction_result_cache);

// TODO(locks): consider this (and other) usages of DB "read current" lock. *Carefully* migrate
Contributor

I take it this has been considered and deemed not worth it given how cheap snapshots are?

Contributor Author

That TODO was mostly about avoiding the read lock, and that is now easiest to achieve via snapshots - so yes, most of these places were migrated to snapshots.
(Although not all - e.g. for the vertex store, after looking at how it is used, I went for direct access. The same for StateManager's boot-up. In this review, I would like to ask you for a careful re-analysis of all "direct access" cases, whether they are really safe, because who knows what knowledge I am missing.)
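To make that migration pattern concrete, here is a minimal, self-contained sketch using the rocksdb crate directly rather than this PR's actual wrapper types; the crate version, path and keys are assumptions for illustration only:

// Assumes rocksdb = "0.21" in Cargo.toml; the path and keys are made up.
use rocksdb::{Options, DB};

fn main() -> Result<(), rocksdb::Error> {
    let mut opts = Options::default();
    opts.create_if_missing(true);
    let db = DB::open(&opts, "/tmp/example-db")?;
    db.put(b"status/tx-1", b"committed")?;

    // Instead of holding a read lock on the live DB for the whole request,
    // take a cheap point-in-time snapshot and answer from it.
    let snapshot = db.snapshot();
    println!("{:?}", snapshot.get(b"status/tx-1")?);

    // Writes that happen after the snapshot was taken are not visible through it.
    db.put(b"status/tx-1", b"superseded")?;
    assert_eq!(snapshot.get(b"status/tx-1")?, Some(b"committed".to_vec()));
    Ok(())
}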

Contributor

Hmm, so we're now allowing (locking-wise) concurrent writes to VertexStoreCf (in save_vertex_store and commit).
While I don't think it can cause issues at the moment, given the context in which those methods are used (e.g. we only include the vertex store in commit if it originates from consensus, and that's the same thread that calls save_vertex_store), I'd still suggest changing that to use a lock, just in case (e.g. if some assumptions change in the future). In normal operating conditions this lock will always be non-contentious.

Contributor Author

Yeah, let me add the lock, since it indeed writes to the same "db region" as commit 👍
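For the record, a hypothetical sketch of what "adding the lock" could look like; the struct and method names below are made up, and only the idea of funnelling both writers through one cooperative mutex reflects the discussion:

use std::sync::Mutex;

/// Hypothetical sketch: both writers to the vertex-store region go through the
/// same cooperative lock, so the two code paths can never interleave, even if
/// the "same consensus thread" assumption ever stops holding.
struct VertexStoreRegion {
    lock: Mutex<()>,
    // ... a handle to the underlying column family would live here
}

impl VertexStoreRegion {
    fn save_vertex_store(&self, serialized: &[u8]) {
        let _guard = self.lock.lock().expect("not poisoned");
        // write `serialized` to VertexStoreCf while holding the guard
        let _ = serialized;
    }

    fn commit_with_vertex_store(&self, serialized: Option<&[u8]>) {
        let _guard = self.lock.lock().expect("not poisoned");
        // the commit path writes to the same column family under the same guard
        let _ = serialized;
    }
}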

pub struct DbLock<D> {
    exclusive_live_marker_lock: Mutex<()>,
    database: D,
    shared_live_access_listener: ActualLockListener, // only for metrics of "raw access"
Contributor

I'm hereby starting the usual naming discussion #1 😄
My preference: "access direct" (fn accessDirect() + direct_access_listener here)

Contributor Author

The "direct access" name itself is perfect 👍

This also calls for a rename of "exclusive_live" (because it also contains "live"). I went for "cooperative locking", because that is really what it is: all honest callers must use it (even though they could "access directly") in order to be safe.

///
/// This method should be used by clients who need to coordinate an exclusive read+write access
/// to a known mutable region of the database.
// TODO(future enhancement): we really should have a set of `RwLock`s for independent regions?
Contributor

why don't we also create a backlog task for that?

Contributor Author

@jakrawcz-rdx commented Feb 7, 2024

self.new_state_computer_config.execution_configurator(true), /* No fees for protocol updates */
database.deref(),
// The costing and logging parameters (of the Engine) are not really used for flash
// transactions; let's still pass sane values.
Contributor

(and also, at some point, these might be non-flash transactions)

Contributor Author

good point; it called for a conditional TODO here 👍

/// snapshots.
///
/// The library we use (a thin C wrapper, really) does not introduce this trivial and natural trait
/// itself, while we desperately need it to abstract the DB-reading code from the actual source of
Contributor

perhaps something we could contribute upstream (and possibly open a PR)? just a suggestion :)

Contributor Author

The "partial" trait that you see in this PR does not capture all the methods common to the DB and its snapshot - just the ones we need. So the work we could contribute is far from complete. But if one day I have too much time (right...), I'll try to remember this one.

Contributor

Ah gotcha 👍 I thought it was close to complete

Contributor Author

It's not rocket science, but yeah, it would need exploring which other methods should belong to such a trait (and I know there are a few).
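To give a flavour of the kind of trait discussed here, below is a loose sketch of a read-only abstraction implemented for both the live DB and a snapshot; it assumes the rocksdb crate (around version 0.21), the trait name follows the PR, but the tiny method set and its signature are illustrative rather than the PR's actual API:

use rocksdb::{ColumnFamily, Snapshot, DB};

/// The read-only subset shared by a live `DB` and a `Snapshot` taken from it.
trait ReadableRocks {
    fn read(&self, cf: &ColumnFamily, key: &[u8]) -> Option<Vec<u8>>;
}

impl ReadableRocks for DB {
    fn read(&self, cf: &ColumnFamily, key: &[u8]) -> Option<Vec<u8>> {
        self.get_cf(cf, key).expect("DB read")
    }
}

impl<'db> ReadableRocks for Snapshot<'db> {
    fn read(&self, cf: &ColumnFamily, key: &[u8]) -> Option<Vec<u8>> {
        self.get_cf(cf, key).expect("snapshot read")
    }
}

/// DB-reading code can then be written once, against the trait, and work
/// identically whether it is given the live database or a snapshot of it.
fn read_once<R: ReadableRocks>(rocks: &R, cf: &ColumnFamily, key: &[u8]) -> Option<Vec<u8>> {
    rocks.read(cf, key)
}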

            db,
            rocks: ReadonlyRocks {
                wrapped: DirectRocks { db },
            },
        }
    }

    pub fn try_catchup_with_primary(&self) {
Contributor

Since we're going the traits / type-safety route, would it make sense to introduce SecondaryRocks and move it there? Not that it matters much, but just for consistency.

Contributor Author

Good suggestion; I also got rid of the nonsensical ReadonlyRocks struct 👍
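A hypothetical sketch of such a SecondaryRocks wrapper, for illustration only; the struct, constructor and paths are made up, while open_as_secondary and try_catch_up_with_primary are the actual rocksdb crate calls:

use rocksdb::{Options, DB};

/// A read-only "secondary" RocksDB instance that trails the primary one and can
/// be refreshed on demand; keeping the catch-up call here (rather than on a
/// generic read trait) makes the "secondary-only" capability explicit in types.
struct SecondaryRocks {
    db: DB,
}

impl SecondaryRocks {
    fn open(primary_path: &str, secondary_path: &str) -> Result<Self, rocksdb::Error> {
        let opts = Options::default();
        let db = DB::open_as_secondary(&opts, primary_path, secondary_path)?;
        Ok(Self { db })
    }

    /// Replays the primary's latest WAL/MANIFEST changes into this instance.
    fn try_catchup_with_primary(&self) {
        self.db
            .try_catch_up_with_primary()
            .expect("secondary catch-up");
    }
}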

    pub fn execution_configurator(
        &self,
        no_fees: bool,
        engine_trace: bool,
Contributor Author

(note from author)

This is a drive-by: previously, the entire LoggingConfig coming from Java was effectively ignored.

*write_mempool = PriorityMempool::new(MempoolConfig::default(), metrics_registry);
drop(write_mempool);
}

Contributor Author

(note from author)

This is a drive-by delete of a (presumably) debug leftover.

Contributor

(note from author of this snippet)

Yes, that was a debug leftover. Thanks 👍


@jakrawcz-rdx (Contributor Author)

Personally, I was waiting here for the performance tests - they are now complete (at https://radixdlt.atlassian.net/browse/NODE-593) and they confirm some improvement in Core API response times (with no degradation to consensus TPS).

@LukasGasior1 are you happy with the code after the comments have been addressed? plz ✅

Quality Gate passed

Issues: 0 new issues
Measures: 0 Security Hotspots; no data about Coverage; no data about Duplication

See analysis details on SonarCloud

@LukasGasior1 (Contributor) left a comment

LGTM

@jakrawcz-rdx merged commit 978f9ec into develop on Feb 16, 2024 - 20 checks passed.
@jakrawcz-rdx deleted the refactor/use_rocksdb_snapshots branch on February 16, 2024 at 11:28.