BP-44: Ledger storage metrics enhancement #2862

Merged: 2 commits, Nov 30, 2021


Vanlightly
Contributor

Motivation

See BP-44 for full motivation

Changes

Various enhancements related to DbLedgerStorage:

  • All DbLedgerStorage stats now report their ledgerDir.
  • Read cache hits/misses are now differentiated from write cache
    hits/misses.
  • Read/write cache hits/misses are now counters, not OpStatsLoggers.
    OpStatsLoggers are relatively expensive and the counter
    values are what matter most here.
  • Stats where thread info is useful are now thread-scoped. In
    general the focus is on time-based metrics for operations
    carried out by the read/write thread pools.
  • Time-spent counters on sub-operations for reads (entry log,
    locations index, readahead).
  • Flush start time is now recorded after the lock is acquired, to
    avoid 200% time utilization calculations. Utilization could reach
    200% because one thread is busy flushing while another is busy
    waiting on the lock.
  • Time-spent counters on sub-operations for flushes (entry log,
    locations index, ledgers index).
  • DbStorage thread reports time utilization.
  • SyncThread reports time utilization.
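
To illustrate the flush timing change in the list above, here is a minimal sketch using only `java.util.concurrent`. It is not the DbLedgerStorage code: the class `FlushTimer`, its fields, and its methods are hypothetical names chosen for this example. The point it shows is that the start timestamp is taken only after the lock is held, so a thread waiting on the lock does not add its wait time to the flush time counter.

```java
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical illustration; names are assumptions, not BookKeeper APIs.
class FlushTimer {
    private final ReentrantLock flushLock = new ReentrantLock();
    // Plain counter of nanoseconds spent flushing.
    private final LongAdder flushTimeNanos = new LongAdder();

    void flush(Runnable doFlush) {
        flushLock.lock();
        // Start the clock only once the lock is held, so time spent
        // waiting for another flusher is not counted as flush time.
        long start = System.nanoTime();
        try {
            doFlush.run();
        } finally {
            flushTimeNanos.add(System.nanoTime() - start);
            flushLock.unlock();
        }
    }

    long totalFlushNanos() {
        return flushTimeNanos.sum();
    }
}
```

Because every timed interval lies inside a lock hold, the intervals cannot overlap, and the summed flush time over any window cannot exceed the length of that window.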

New gauges report configuration values that are useful
for utilization calculations in dashboards/alerts:

  • Write cache max size.
  • Number of ledger dirs.
  • Readahead batch size.
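
As a rough sketch of the kind of utilization calculation these gauges support, the helper below divides the increase of a time-spent counter by the time available in the sampling window. The class and method names are hypothetical, and using the number of ledger dirs as the thread count assumes one flushing thread per ledger directory, which this PR description does not state.

```java
// Hypothetical helper; not part of BookKeeper.
class Utilization {
    /**
     * @param busyNanosDelta increase of a time-spent counter over the window
     * @param windowNanos    length of the sampling window in nanoseconds
     * @param threads        threads contributing to the counter (assumed
     *                       here to equal the number of ledger dirs)
     * @return fraction of available thread time spent busy (1.0 = 100%)
     */
    static double timeUtilization(long busyNanosDelta, long windowNanos, int threads) {
        if (windowNanos <= 0 || threads <= 0) {
            throw new IllegalArgumentException("window and threads must be positive");
        }
        return (double) busyNanosDelta / ((double) windowNanos * threads);
    }
}
```

For example, 30 seconds of busy time over a 60 second window with 2 threads gives 0.25. If lock wait time were counted as busy time, two threads contending for one lock could each report a full window, and the same calculation with a thread count of 1 would give 2.0, which is the 200% case described in the changes.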

Master Issue: #2834

@Vanlightly
Contributor Author

rerun failure checks

Contributor

@eolivelli eolivelli left a comment

LGTM

@Vanlightly Vanlightly force-pushed the bp-44-ledger-storage branch 2 times, most recently from 56b169d to d20790a Compare November 19, 2021 10:34
Contributor

@nicoloboschi nicoloboschi left a comment

Great work!

@eolivelli eolivelli merged commit 4576ec9 into apache:master Nov 30, 2021
Ghatage pushed a commit to sijie/bookkeeper that referenced this pull request Jul 12, 2024
* Ledger storage metrics enhancement

* Fix checkstyle