WAL Recovery Modes

Introduction

Every application is unique and requires a certain consistency guarantee from RocksDB. Every committed record in RocksDB is persisted. The uncommitted records are recorded in the write-ahead log (WAL). When RocksDB is shut down cleanly, all uncommitted data is committed before shutdown, so consistency is always guaranteed. When RocksDB is killed or the machine is restarted, RocksDB must restore itself to a consistent state on restart.

One of the important recovery operations is replaying the uncommitted records in the WAL. The WAL recovery modes define how that replay behaves.
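As a minimal sketch of how a mode is selected, the recovery mode is a field on the options passed when opening the database. The helper name `OpenWithRecoveryMode` is hypothetical, and the sketch assumes a RocksDB version whose `DB::Open` accepts a `DB**` output parameter.

```cpp
#include <string>

#include "rocksdb/db.h"
#include "rocksdb/options.h"

// Hypothetical helper: opens (or creates) a database at `path` with an
// explicit WAL recovery mode. The caller owns the returned DB pointer.
rocksdb::Status OpenWithRecoveryMode(const std::string& path,
                                     rocksdb::WALRecoveryMode mode,
                                     rocksdb::DB** db) {
  rocksdb::Options options;
  options.create_if_missing = true;
  // Controls how WAL replay reacts to errors while the DB is being opened.
  options.wal_recovery_mode = mode;
  return rocksdb::DB::Open(options, path, db);
}
```

A caller would pass one of the enum values described below, for example `rocksdb::WALRecoveryMode::kAbsoluteConsistency`, and check the returned `Status` before using the database.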

WAL Recovery Modes

kTolerateCorruptedTailRecords

In this mode, WAL replay ignores any error discovered at the tail of the log. The rationale is that an unclean shutdown can leave incomplete writes at the tail of the log. This mode is a heuristic: the system cannot differentiate between corruption at the tail of the log and an incomplete write. Any other IO error is considered data corruption.

This mode is acceptable for most applications, since it provides a reasonable tradeoff between being able to start RocksDB after an unclean shutdown and consistency.

kAbsoluteConsistency

In this mode, any IO error during WAL replay is considered data corruption. This mode is ideal for applications that cannot afford to lose even a single record and/or have other means of recovering uncommitted data.

kPointInTimeRecovery

In this mode, WAL replay stops after encountering an IO error. The system is recovered to a point in time at which it is consistent. This is ideal for systems with replicas: data from another replica can be used to replay past the point in time to which the system was recovered. (This is the default as of version 6.6.)
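The replica catch-up idea can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyLog` and `CatchUpFromReplica` are hypothetical names, records are assumed to carry increasing sequence numbers, and nothing here is a RocksDB API. How a real system fetches records from a replica is application-specific.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Toy model: sequence number -> record payload. Not a RocksDB type.
using ToyLog = std::map<std::uint64_t, std::string>;

// Copies into `local` every replica record whose sequence number is greater
// than `recovered_up_to`, the last sequence number local recovery reached.
// Returns the number of records copied.
std::size_t CatchUpFromReplica(const ToyLog& replica,
                               std::uint64_t recovered_up_to,
                               ToyLog* local) {
  std::size_t copied = 0;
  for (auto it = replica.upper_bound(recovered_up_to); it != replica.end();
       ++it) {
    (*local)[it->first] = it->second;
    ++copied;
  }
  return copied;
}
```

In this model, a local log that recovered through sequence number 2 would receive records 3 and later from the replica, which mirrors replaying past the recovered point in time.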

kSkipAnyCorruptedRecords

In this mode, any IO error while reading the log is ignored. The system tries to recover as much data as possible. This is ideal for disaster recovery.
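To compare the four modes side by side, the following toy model replays a list of records, some of which are marked unreadable. It is a simplified sketch, not RocksDB's implementation: `ToyMode`, `ToyRecord`, `ToyResult`, and `Replay` are hypothetical names, an unreadable record stands in for any IO error, and "tail" is assumed to mean only the last record in the log.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy model only: none of these types exist in RocksDB.
enum class ToyMode {
  kTolerateCorruptedTailRecords,
  kAbsoluteConsistency,
  kPointInTimeRecovery,
  kSkipAnyCorruptedRecords
};

struct ToyRecord {
  std::string payload;
  bool readable;  // false stands in for an IO error on this record
};

struct ToyResult {
  bool ok;                             // false: replay reported corruption
  std::vector<std::string> recovered;  // payloads applied, in log order
};

ToyResult Replay(const std::vector<ToyRecord>& wal, ToyMode mode) {
  ToyResult result{true, {}};
  for (std::size_t i = 0; i < wal.size(); ++i) {
    if (wal[i].readable) {
      result.recovered.push_back(wal[i].payload);
      continue;
    }
    // Simplifying assumption: only the final record counts as the tail.
    const bool at_tail = (i + 1 == wal.size());
    switch (mode) {
      case ToyMode::kAbsoluteConsistency:
        return ToyResult{false, {}};  // any error is corruption
      case ToyMode::kTolerateCorruptedTailRecords:
        if (!at_tail) return ToyResult{false, {}};
        return result;  // error at the tail is ignored
      case ToyMode::kPointInTimeRecovery:
        return result;  // stop at the first error, keep what came before
      case ToyMode::kSkipAnyCorruptedRecords:
        break;  // ignore the error and keep reading
    }
  }
  return result;
}
```

With a log whose middle record is unreadable, this model fails under `kAbsoluteConsistency` and `kTolerateCorruptedTailRecords`, keeps only the records before the error under `kPointInTimeRecovery`, and keeps every readable record under `kSkipAnyCorruptedRecords`.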
