feat: added sharding_config field in DocumentOutputConfig.GcsOutputConfig in document_io.proto #3735

Merged
Changes from 2 commits
@@ -74,13 +74,25 @@ message BatchDocumentsInputConfig {
message DocumentOutputConfig {
// The configuration used when outputting documents.
message GcsOutputConfig {
// The sharding config for the output document.
message ShardingConfig {
// The number of pages per shard.
int32 pages_per_shard = 1;

// The number of overlapping pages between consecutive shards.
int32 pages_overlap = 2;
}

// The Cloud Storage uri (a directory) of the output.
string gcs_uri = 1;

// Specifies which fields to include in the output documents.
// Only supports top level document and pages field so it must be in the
// form of `{document_field_name}` or `pages.{page_field_name}`.
google.protobuf.FieldMask field_mask = 2;

// Specifies the sharding config for the output document.
ShardingConfig sharding_config = 3;
}

// The destination of the results.
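
To illustrate how `pages_per_shard` and `pages_overlap` might interact, here is a minimal Python sketch. The diff does not spell out the exact splitting behavior, so this assumes that each shard holds at most `pages_per_shard` pages and that each shard starts `pages_per_shard - pages_overlap` pages after the previous one. The helper `shard_page_ranges` is hypothetical and not part of any Google client library.

```python
from typing import List, Tuple


def shard_page_ranges(
    total_pages: int, pages_per_shard: int, pages_overlap: int
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) page index ranges, one per shard.

    Assumption (not confirmed by the proto comments): consecutive shards
    share exactly `pages_overlap` pages, so each shard starts
    `pages_per_shard - pages_overlap` pages after the previous one.
    """
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    if pages_per_shard <= 0:
        raise ValueError("pages_per_shard must be positive")
    if not 0 <= pages_overlap < pages_per_shard:
        raise ValueError("pages_overlap must be in [0, pages_per_shard)")

    stride = pages_per_shard - pages_overlap
    ranges: List[Tuple[int, int]] = []
    start = 0
    while start < total_pages:
        end = min(start + pages_per_shard, total_pages)
        ranges.append((start, end))
        if end == total_pages:
            # The last page is covered; stop so no redundant tail shard is emitted.
            break
        start += stride
    return ranges


if __name__ == "__main__":
    # A 10-page document, 4 pages per shard, 1 page of overlap.
    print(shard_page_ranges(10, 4, 1))
```

Under these assumptions, a 10-page document with `pages_per_shard = 4` and `pages_overlap = 1` yields three shards covering pages 0-3, 3-6, and 6-9 (zero-based), with one shared page between each consecutive pair.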