Implement atomic append operation #843

Open
ijsong opened this issue Jul 21, 2024 · 0 comments

ijsong commented Jul 21, 2024

Current Situation

Currently, Varlog's Append API writes payloads to disk by dividing them into batchlets and processing each independently. This can result in partial success/failure scenarios where some batchlets are successfully written while others fail. This leads to two main issues:

  1. Users neither expect nor desire partial success/failure when appending their payloads.
  2. It's challenging for Varlog to manage and communicate these partial success/failure states.
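
For illustration, a minimal Go sketch of the behavior described above; the function names (`appendPayload`, `splitIntoBatchlets`, `appendBatchlet`) and the injected failure are hypothetical and only mimic the partial-failure scenario, they are not Varlog's actual internals:

```go
package main

import (
	"errors"
	"fmt"
)

// appendBatchlet stands in for the real write path; it fails on the second
// batchlet to make the partial success visible.
var batchletCalls int

func appendBatchlet(batchlet [][]byte) error {
	batchletCalls++
	if batchletCalls == 2 {
		return errors.New("simulated write failure")
	}
	return nil
}

// appendPayload mirrors the current, non-atomic path: the payload is split
// into batchlets and each batchlet is written independently, so a failure
// midway leaves the earlier batchlets durably written.
func appendPayload(payload [][]byte, batchletSize int) error {
	for _, batchlet := range splitIntoBatchlets(payload, batchletSize) {
		if err := appendBatchlet(batchlet); err != nil {
			// Batchlets written before this point have already succeeded,
			// so the caller observes a partial success/failure.
			return err
		}
	}
	return nil
}

func splitIntoBatchlets(payload [][]byte, size int) [][][]byte {
	var batchlets [][][]byte
	for len(payload) > size {
		batchlets = append(batchlets, payload[:size])
		payload = payload[size:]
	}
	return append(batchlets, payload)
}

func main() {
	payload := [][]byte{[]byte("a"), []byte("b"), []byte("c"), []byte("d")}
	fmt.Println("append error:", appendPayload(payload, 2))
}
```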

Proposed Solution

Implement an atomic append operation for the entire payload. Key changes include:

  1. Remove the concept of batchlets: Write all batches in the user's payload to disk at once.
  2. Utilize the existing atomic batch write functionality in Varlog's Storage layer.
  3. Introduce batch length limit settings:
    • Global setting (applied to all Varlog topics)
    • Topic-specific setting (if necessary)
  4. Remove the Error field from the AppendResult message type.
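
A rough sketch of the proposed flow under these assumptions; the `Storage` interface and `AtomicWriteBatch` method are placeholders for whatever atomic batch write the Storage layer actually exposes:

```go
package sketch

import "fmt"

// Storage abstracts the atomic batch write functionality already present in
// Varlog's storage layer; the method name here is illustrative.
type Storage interface {
	AtomicWriteBatch(entries [][]byte) error
}

// atomicAppend validates the payload against a batch length limit and then
// hands the whole payload to the storage layer as one atomic batch write, so
// the append either fully succeeds or fully fails.
func atomicAppend(st Storage, payload [][]byte, maxBatchLen int) error {
	if len(payload) > maxBatchLen {
		return fmt.Errorf("append: batch length %d exceeds limit %d", len(payload), maxBatchLen)
	}
	// No batchlets: a single write covers the entire payload.
	return st.AtomicWriteBatch(payload)
}
```

The key point is that the length check happens up front and the storage layer sees exactly one batch, so there is no intermediate state to report back to the user.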

Expected Benefits

  1. Simplified user experience: Users can rely on a clear success/failure status for their entire payload.
  2. Simplified system architecture: Removing the batchlet concept reduces system complexity.
  3. Simplified error handling and recovery: Atomic operations make error scenarios more straightforward to handle and recover from.
  4. Potential performance improvement: Eliminating the step of dividing into batchlets may reduce overall processing time.

Challenges and Next Steps

  1. Handling large payloads

    • Challenge: Potential increase in memory usage
    • Action: Analyze memory usage and research optimization strategies
  2. Batch length limit settings

    • Challenge: Changes in user experience and determining optimal values
    • Action: Research and decide on optimal values for batch length limits
    • Action: Implement global and topic-specific settings (see the configuration sketch after this list)
  3. Maintaining backward compatibility

    • Challenge: Compatibility issues with existing systems
    • Action: Develop migration strategy and plan for phased implementation
  4. Performance impact assessment

    • Challenge: Impact of atomic writes on huge batches
    • Action: Implement prototype and conduct performance tests under various conditions
  5. API and client library updates

    • Action: Modify API response structure (remove Error field)
    • Action: Update client libraries and plan new version release
  6. Documentation and communication

    • Action: Update API documentation
    • Action: Create and distribute user guide for the changes
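
Regarding challenge 2, one possible configuration shape, assuming a global default with optional per-topic overrides; all names here are illustrative, not the actual Varlog configuration surface:

```go
package sketch

// TopicID is a simplified stand-in for Varlog's topic identifier type.
type TopicID int32

// BatchLimitConfig is an assumed configuration shape for the batch length
// limit: a global default applied to every topic, plus optional per-topic
// overrides.
type BatchLimitConfig struct {
	GlobalMaxBatchLen int             // applied to all Varlog topics
	PerTopic          map[TopicID]int // topic-specific overrides, if needed
}

// MaxBatchLen returns the effective limit for a topic, preferring a
// topic-specific setting over the global one.
func (c BatchLimitConfig) MaxBatchLen(tpid TopicID) int {
	if limit, ok := c.PerTopic[tpid]; ok {
		return limit
	}
	return c.GlobalMaxBatchLen
}
```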

Discussion Points

  1. What should be the appropriate default value for the batch length limit?
  2. Are there specific use cases that require topic-specific settings?
  3. How can we minimize the impact of this change on systems currently using Varlog?
  4. Are there additional methods to optimize the performance of atomic batch writes?

Testing Plan

  • Develop unit tests for the new atomic append operation (a test sketch follows this list)
  • Conduct integration tests to ensure compatibility with existing Varlog components
  • Perform stress tests with various payload sizes to assess performance and stability
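
A minimal sketch of such a unit test, using an in-memory fake rather than a real log stream; the `fakeLogStream` type is an assumption for illustration only:

```go
package atomicappend

import (
	"errors"
	"testing"
)

// fakeLogStream is a minimal in-memory stand-in used only for this sketch;
// it either appends the whole batch or rejects it, never part of it.
type fakeLogStream struct {
	entries   [][]byte
	failWrite bool
}

func (ls *fakeLogStream) Append(batch [][]byte) error {
	if ls.failWrite {
		return errors.New("injected write failure")
	}
	ls.entries = append(ls.entries, batch...)
	return nil
}

// TestAppendIsAtomic checks the all-or-nothing property: a failed append
// must leave no entries behind.
func TestAppendIsAtomic(t *testing.T) {
	ls := &fakeLogStream{failWrite: true}
	if err := ls.Append([][]byte{[]byte("a"), []byte("b")}); err == nil {
		t.Fatal("expected the append to fail")
	}
	if len(ls.entries) != 0 {
		t.Fatalf("expected no entries after a failed append, got %d", len(ls.entries))
	}
}
```
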
@ijsong ijsong self-assigned this Jul 21, 2024
@ijsong ijsong changed the title Implement Atomic Append Operation for Varlog Implement Atomic Append Operation Aug 20, 2024
@ijsong ijsong changed the title Implement Atomic Append Operation Implement atomic append operation Oct 18, 2024
ijsong added a commit that referenced this issue Jan 21, 2025
This PR modifies replicate task pool implementation for future refactoring that
will resolve #843.

- Deprecated `newReplicateTask` and `release` functions in favor of new
  implementations.
- Added `replicateTaskPool` struct for simplified pool management.
- Updated tests to use the new functions and ensure backward compatibility.
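
A minimal sketch of what a `sync.Pool`-backed `replicateTaskPool` could look like; the `replicateTask` fields and the method names below are assumptions for illustration, not the code merged in this commit:

```go
package logstream

import "sync"

// replicateTask is a placeholder for the task handed to replicas; the real
// struct carries replication payloads and metadata.
type replicateTask struct {
	data [][]byte
}

// replicateTaskPool sketches simplified pool management on top of sync.Pool.
type replicateTaskPool struct {
	pool sync.Pool
}

func newReplicateTaskPool() *replicateTaskPool {
	return &replicateTaskPool{
		pool: sync.Pool{
			New: func() any { return &replicateTask{} },
		},
	}
}

func (p *replicateTaskPool) get() *replicateTask {
	return p.pool.Get().(*replicateTask)
}

func (p *replicateTaskPool) put(t *replicateTask) {
	t.data = t.data[:0] // reset before reuse
	p.pool.Put(t)
}
```
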
ijsong added a commit that referenced this issue Feb 7, 2025
This commit introduces a commit wait task that represents an entire append
batch, rather than individual log entries. This is a crucial step towards
implementing atomic append operations.

Previously, a separate commit wait task was created for each log entry in an
append batch. This approach made it difficult to handle the batch atomically, as
commit wait tasks were processed individually.

With this change, a single commit wait task is created for the entire batch.
This allows the committer to process the batch atomically, ensuring that either
all log entries in the batch are committed or none are.

This change also brings a slight performance improvement, as the committer now
needs to process fewer tasks. However, no specific benchmarks have been
performed to measure the exact gain.

The client API does not yet support atomic append operations, and partial
success/failure is still allowed. This will be addressed in a future update.

This change is a major step towards resolving #843, which aims to implement
atomic append operations.
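
A hedged sketch of a per-batch commit wait task along the lines this commit describes; the struct and its fields are illustrative, not the merged implementation:

```go
package logstream

// commitWaitTask, as sketched here, represents one entire append batch
// awaiting commit rather than one task per log entry.
type commitWaitTask struct {
	size int        // number of log entries in the batch
	done chan error // signalled once for the whole batch
}

// newCommitWaitTask creates a single wait task covering all size entries of a
// batch, replacing the previous one-task-per-entry scheme.
func newCommitWaitTask(size int) *commitWaitTask {
	return &commitWaitTask{size: size, done: make(chan error, 1)}
}

// complete reports the outcome for the whole batch at once: either every
// entry in the batch is committed or none is.
func (t *commitWaitTask) complete(err error) {
	t.done <- err
}
```
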
ijsong added a commit that referenced this issue Feb 7, 2025
This commit deprecates the error field in AppendResult as a step towards
implementing atomic append operations while maintaining backward compatibility.

Previously, the Append RPC could return partial success/failure results, with
some log entries in a batch being appended successfully and others failing. This
was indicated by the error field in AppendResult.

To support atomic append operations without breaking existing clients, the error
field is deprecated instead of being removed completely. This change allows
clients to continue using the error field for now, but they should be aware that
it will be removed in a future release. Clients should start migrating to the
new atomic append API as soon as possible.

The following changes were made to deprecate the error field:

- The error field in AppendResult is marked as deprecated in the protobuf
  definition.
- The Append RPC implementation no longer populates the error field.

The next step is to implement atomic append operations in the client API. This
will enable clients to append multiple log entries atomically, which will help
to resolve #843.
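
A sketch of how a client might handle results once the per-entry error field is gone; the types and the `Append` signature below are simplified stand-ins for illustration, not the exact Varlog client API:

```go
package example

import "context"

// Simplified stand-ins for the client-side types touched by this change.
type (
	TopicID int32

	AppendResult struct {
		Err error // top-level result for the whole batch
		// The deprecated per-entry Error field is intentionally omitted here.
	}

	Log interface {
		Append(ctx context.Context, tpid TopicID, payload [][]byte) AppendResult
	}
)

// appendAll treats the append as atomic: the caller checks only the
// result-level error instead of inspecting per-entry errors.
func appendAll(ctx context.Context, log Log, tpid TopicID, payload [][]byte) error {
	res := log.Append(ctx, tpid, payload)
	if res.Err != nil {
		return res.Err // the whole batch failed; nothing was appended
	}
	return nil // the whole batch succeeded
}
```
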