Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Access SST full file checksum via RocksDB#getLiveFilesMetadata #11770

Closed
wants to merge 3 commits into from

Conversation

npepinpe
Copy link
Contributor

@npepinpe npepinpe commented Aug 28, 2023

Description

This PR passes along the native LiveFileMetaData#file_checksum field from the C++ class to the Java API as a copied byte array. If there is no file checksum generator factory set beforehand, then the array will empty. Please advise if you'd rather it be null - an empty array means one extra allocation, but it avoids possible null pointer exceptions.

Note
This functionality complements but does not supersede #11736

It's outside the scope here to add support for Java based FileChecksumGenFactory implementations. As a workaround, users can already use the built-in one by creating their initial DBOptions via properties:

final Properties props = new Properties();
props.put("file_checksum_gen_factory", "FileChecksumGenCrc32cFactory");

try (final DBOptions dbOptions = DBOptions.getDBOptionsFromProps(props);
     final ColumnFamilyOptions cfOptions = new ColumnFamilyOptions();
     final Options options = new Options(dbOptions, cfOptions).setCreateIfMissing(true)) {
// do stuff
}

I wanted to add a better test, but unfortunately there's no available CRC32C implementation available in Java 8 without adding a dependency or adding a JNI helper for RocksDB's own implementation (or bumping the minimum version for tests to Java 9). That said, I understand the test is rather poor, so happy to change it to whatever you'd like.

Context

To give some context, we replicate RocksDB checkpoints to other nodes. Part of this is verifying the integrity of each file during replication. With a large enough RocksDB, computing the checksum ourselves is prohibitively expensive. Since SST files comprise the bulk of the data, we'd much rather delegate this to RocksDB on file write, and read it back after to compare.

It's likely we will provide a follow up to read the file checksum list directly from the manifest without having to open the DB, but this was the easiest first step to get it working for us.

@npepinpe npepinpe force-pushed the np-java-full-file-checksum branch 2 times, most recently from 72df950 to cdba2e3 Compare August 28, 2023 18:57
@npepinpe
Copy link
Contributor Author

A ldb test failed; it's unclear to me how this is related to my changes. Let me know if there's anything I can do!

@ChrisKujawa
Copy link

@adamretter or @ajkr due you have time to review or give feedback on this and #11772?

Updates the `SstFileMetaData` class and its native transform method to
pass in the full file checksum if present. If absent, a null byte array
is passed.

This gives minimum support to the Full File Checksum feature for the
Java API; users can enable the feature via
`DBOptions#getOptionsFromProps`, and read the checksums via
`RocksDB#getLiveFilesMetadata`.
@npepinpe npepinpe force-pushed the np-java-full-file-checksum branch from cdba2e3 to af937da Compare December 22, 2023 12:57
@facebook-github-bot
Copy link
Contributor

@ajkr has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@facebook-github-bot
Copy link
Contributor

@ajkr merged this pull request in 5b073a7.

facebook-github-bot pushed a commit that referenced this pull request Apr 5, 2024
Summary:
#12466 reported a bug when `RocksDB.getColumnFamilyMetaData()` is called on an existing database(With files stored on disk). As neilramaswamy mentioned, this was caused by #11770 where the signature of `SstFileMetaData` constructor was changed, but JNI code wasn't updated.

This PR fix JNI code, and also properly populate `fileChecksum` on `SstFileMetaData`.

Pull Request resolved: #12474

Reviewed By: jowlyzhang

Differential Revision: D55811808

Pulled By: ajkr

fbshipit-source-id: 2ab156f41eaf4a4f30c49e6df421b61e8451230e
ajkr pushed a commit that referenced this pull request Apr 9, 2024
Summary:
#12466 reported a bug when `RocksDB.getColumnFamilyMetaData()` is called on an existing database(With files stored on disk). As neilramaswamy mentioned, this was caused by #11770 where the signature of `SstFileMetaData` constructor was changed, but JNI code wasn't updated.

This PR fix JNI code, and also properly populate `fileChecksum` on `SstFileMetaData`.

Pull Request resolved: #12474

Reviewed By: jowlyzhang

Differential Revision: D55811808

Pulled By: ajkr

fbshipit-source-id: 2ab156f41eaf4a4f30c49e6df421b61e8451230e
ajkr pushed a commit that referenced this pull request Apr 9, 2024
Summary:
#12466 reported a bug when `RocksDB.getColumnFamilyMetaData()` is called on an existing database(With files stored on disk). As neilramaswamy mentioned, this was caused by #11770 where the signature of `SstFileMetaData` constructor was changed, but JNI code wasn't updated.

This PR fix JNI code, and also properly populate `fileChecksum` on `SstFileMetaData`.

Pull Request resolved: #12474

Reviewed By: jowlyzhang

Differential Revision: D55811808

Pulled By: ajkr

fbshipit-source-id: 2ab156f41eaf4a4f30c49e6df421b61e8451230e
ajkr pushed a commit that referenced this pull request Apr 22, 2024
Summary:
#12466 reported a bug when `RocksDB.getColumnFamilyMetaData()` is called on an existing database(With files stored on disk). As neilramaswamy mentioned, this was caused by #11770 where the signature of `SstFileMetaData` constructor was changed, but JNI code wasn't updated.

This PR fix JNI code, and also properly populate `fileChecksum` on `SstFileMetaData`.

Pull Request resolved: #12474

Reviewed By: jowlyzhang

Differential Revision: D55811808

Pulled By: ajkr

fbshipit-source-id: 2ab156f41eaf4a4f30c49e6df421b61e8451230e
mayuehappy pushed a commit to mayuehappy/frocksdb that referenced this pull request Jul 22, 2024
Summary:
facebook/rocksdb#12466 reported a bug when `RocksDB.getColumnFamilyMetaData()` is called on an existing database(With files stored on disk). As neilramaswamy mentioned, this was caused by facebook/rocksdb#11770 where the signature of `SstFileMetaData` constructor was changed, but JNI code wasn't updated.

This PR fix JNI code, and also properly populate `fileChecksum` on `SstFileMetaData`.

Pull Request resolved: facebook/rocksdb#12474

Reviewed By: jowlyzhang

Differential Revision: D55811808

Pulled By: ajkr

fbshipit-source-id: 2ab156f41eaf4a4f30c49e6df421b61e8451230e

(cherry picked from commit a8035ebc0b22f079a447bdc6b0aeeb2f1cf09d47)
mayuehappy pushed a commit to mayuehappy/frocksdb that referenced this pull request Jul 22, 2024
Summary:
facebook/rocksdb#12466 reported a bug when `RocksDB.getColumnFamilyMetaData()` is called on an existing database(With files stored on disk). As neilramaswamy mentioned, this was caused by facebook/rocksdb#11770 where the signature of `SstFileMetaData` constructor was changed, but JNI code wasn't updated.

This PR fix JNI code, and also properly populate `fileChecksum` on `SstFileMetaData`.

Pull Request resolved: facebook/rocksdb#12474

Reviewed By: jowlyzhang

Differential Revision: D55811808

Pulled By: ajkr

fbshipit-source-id: 2ab156f41eaf4a4f30c49e6df421b61e8451230e

(cherry picked from commit a8035ebc0b22f079a447bdc6b0aeeb2f1cf09d47)
mayuehappy pushed a commit to mayuehappy/frocksdb that referenced this pull request Jul 22, 2024
Summary:
**Description**

This PR passes along the native `LiveFileMetaData#file_checksum` field from the C++ class to the Java API as a copied byte array. If there is no file checksum generator factory set beforehand, then the array will empty. Please advise if you'd rather it be null - an empty array means one extra allocation, but it avoids possible null pointer exceptions.

> **Note**
> This functionality complements but does not supersede facebook/rocksdb#11736

It's outside the scope here to add support for Java based `FileChecksumGenFactory` implementations. As a workaround, users can already use the built-in one by creating their initial `DBOptions` via properties:

```java
final Properties props = new Properties();
props.put("file_checksum_gen_factory", "FileChecksumGenCrc32cFactory");

try (final DBOptions dbOptions = DBOptions.getDBOptionsFromProps(props);
     final ColumnFamilyOptions cfOptions = new ColumnFamilyOptions();
     final Options options = new Options(dbOptions, cfOptions).setCreateIfMissing(true)) {
// do stuff
}
```

I wanted to add a better test, but unfortunately there's no available CRC32C implementation available in Java 8 without adding a dependency or adding a JNI helper for RocksDB's own implementation (or bumping the minimum version for tests to Java 9). That said, I understand the test is rather poor, so happy to change it to whatever you'd like.

**Context**

To give some context, we replicate RocksDB checkpoints to other nodes. Part of this is verifying the integrity of each file during replication. With a large enough RocksDB, computing the checksum ourselves is prohibitively expensive. Since SST files comprise the bulk of the data, we'd much rather delegate this to RocksDB on file write, and read it back after to compare.

It's likely we will provide a follow up to read the file checksum list directly from the manifest without having to open the DB, but this was the easiest first step to get it working for us.

Pull Request resolved: facebook/rocksdb#11770

Reviewed By: hx235

Differential Revision: D52420729

Pulled By: ajkr

fbshipit-source-id: a873de35a48aaf315e125733091cd221a97b9073
(cherry picked from commit 5b073a7)
mayuehappy pushed a commit to mayuehappy/frocksdb that referenced this pull request Jul 22, 2024
Summary:
facebook/rocksdb#12466 reported a bug when `RocksDB.getColumnFamilyMetaData()` is called on an existing database(With files stored on disk). As neilramaswamy mentioned, this was caused by facebook/rocksdb#11770 where the signature of `SstFileMetaData` constructor was changed, but JNI code wasn't updated.

This PR fix JNI code, and also properly populate `fileChecksum` on `SstFileMetaData`.

Pull Request resolved: facebook/rocksdb#12474

Reviewed By: jowlyzhang

Differential Revision: D55811808

Pulled By: ajkr

fbshipit-source-id: 2ab156f41eaf4a4f30c49e6df421b61e8451230e
(cherry picked from commit a8035ebc0b22f079a447bdc6b0aeeb2f1cf09d47)
rkhachatryan pushed a commit to ververica/frocksdb that referenced this pull request Jul 23, 2024
Summary:
**Description**

This PR passes along the native `LiveFileMetaData#file_checksum` field from the C++ class to the Java API as a copied byte array. If there is no file checksum generator factory set beforehand, then the array will empty. Please advise if you'd rather it be null - an empty array means one extra allocation, but it avoids possible null pointer exceptions.

> **Note**
> This functionality complements but does not supersede facebook/rocksdb#11736

It's outside the scope here to add support for Java based `FileChecksumGenFactory` implementations. As a workaround, users can already use the built-in one by creating their initial `DBOptions` via properties:

```java
final Properties props = new Properties();
props.put("file_checksum_gen_factory", "FileChecksumGenCrc32cFactory");

try (final DBOptions dbOptions = DBOptions.getDBOptionsFromProps(props);
     final ColumnFamilyOptions cfOptions = new ColumnFamilyOptions();
     final Options options = new Options(dbOptions, cfOptions).setCreateIfMissing(true)) {
// do stuff
}
```

I wanted to add a better test, but unfortunately there's no available CRC32C implementation available in Java 8 without adding a dependency or adding a JNI helper for RocksDB's own implementation (or bumping the minimum version for tests to Java 9). That said, I understand the test is rather poor, so happy to change it to whatever you'd like.

**Context**

To give some context, we replicate RocksDB checkpoints to other nodes. Part of this is verifying the integrity of each file during replication. With a large enough RocksDB, computing the checksum ourselves is prohibitively expensive. Since SST files comprise the bulk of the data, we'd much rather delegate this to RocksDB on file write, and read it back after to compare.

It's likely we will provide a follow up to read the file checksum list directly from the manifest without having to open the DB, but this was the easiest first step to get it working for us.

Pull Request resolved: facebook/rocksdb#11770

Reviewed By: hx235

Differential Revision: D52420729

Pulled By: ajkr

fbshipit-source-id: a873de35a48aaf315e125733091cd221a97b9073
(cherry picked from commit 5b073a7)
rkhachatryan pushed a commit to ververica/frocksdb that referenced this pull request Jul 23, 2024
Summary:
facebook/rocksdb#12466 reported a bug when `RocksDB.getColumnFamilyMetaData()` is called on an existing database(With files stored on disk). As neilramaswamy mentioned, this was caused by facebook/rocksdb#11770 where the signature of `SstFileMetaData` constructor was changed, but JNI code wasn't updated.

This PR fix JNI code, and also properly populate `fileChecksum` on `SstFileMetaData`.

Pull Request resolved: facebook/rocksdb#12474

Reviewed By: jowlyzhang

Differential Revision: D55811808

Pulled By: ajkr

fbshipit-source-id: 2ab156f41eaf4a4f30c49e6df421b61e8451230e
(cherry picked from commit a8035ebc0b22f079a447bdc6b0aeeb2f1cf09d47)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants