-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-35775: [Go][Parquet] Allow key value file metadata to be written after writing row groups #37786
Conversation
|
go/parquet/file/file_writer.go
Outdated
props *parquet.WriterProperties | ||
rowGroups int | ||
nrows int | ||
metadata metadata.FileMetaDataBuilder | ||
fileEncryptor encryption.FileEncryptor | ||
rowGroupWriter *rowGroupWriter | ||
initialKeyValueMetadata metadata.KeyValueMetadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we just initialize the FileMetaDataBuilder
with the initialKeyValueMetadata, we don't actually need to store it after we initialize things.
We could probably modify the WriteOption
and how the config works by introducing a config struct like type config struct { props *parquet.WriterProperties; initialKVMetadata metadata.KeyValueMetadata }
and have WriteOption
modify the config struct to initialize the writer. This would let us remove the initialKeyValueMetadata
member entirely from the Writer
.
It's not necessary (things work just fine currently and I think this is good) but it's an idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@zeroshade - I'm happy to make the change to the WriteOption
interface if you think that makes sense. I wasn't sure how much of a breaking change would be acceptable.
I could add tschaub@9e5f7c1 to this branch if that sounds good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it, let's add that to this branch
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit added. It looks like the failing Dev workflow is unrelated.
f3214b4
to
19d3cd7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks, @zeroshade |
After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit 6eb08dd. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them. |
…tten after writing row groups (apache#37786) ### Rationale for this change The key value file metadata may include information generated while writing row groups. Currently, it is not possible to set the key value file metadata after creating a writer. With the changes in this branch, key value pairs may be added any time before closing the writer. ### What changes are included in this PR? This branch adds a `writer.AppendKeyValueMetadata(key, value)` method to the parquet `file.Writer` and to the `pqarrow.FileWriter`. ### Are these changes tested? Tests are added for the new functionality. ### Are there any user-facing changes? The `KeyValueMetadata` field on the parquet `file.Writer` has been renamed to `initialKeyValueMetadata`. This is a breaking change. Although the field was exported, setting it did not result in new key value metadata being written. Instead, it represented the initial key value metadata if the writer was passed the `WithWriteMetadata` write option. The `WithWriteMetadata` option can still be used to provide the initial key value metadata values. In addition, the `AppendKeyValueMetadata` method can be called to add key value pairs after creating a writer. The `FileMetadata` field on the parquet `file.Writer` has been removed. Previously, setting this field value had no effect. **This PR includes breaking changes to public APIs.** The `KeyValueMetadata` field is no longer exported from the parquet `file.Writer` struct. Use the `WithWriteMetadata` writer option to set key value metadata when creating a writer or use the `AppendKeyValueMetadata` method to add key value metadata after creating a writer. The `FileMetadata` field on the parquet `file.Writer` has been removed. * Closes: apache#35775 Authored-by: Tim Schaub <tim@planet.com> Signed-off-by: Matt Topol <zotthewizard@gmail.com>
…tten after writing row groups (apache#37786) ### Rationale for this change The key value file metadata may include information generated while writing row groups. Currently, it is not possible to set the key value file metadata after creating a writer. With the changes in this branch, key value pairs may be added any time before closing the writer. ### What changes are included in this PR? This branch adds a `writer.AppendKeyValueMetadata(key, value)` method to the parquet `file.Writer` and to the `pqarrow.FileWriter`. ### Are these changes tested? Tests are added for the new functionality. ### Are there any user-facing changes? The `KeyValueMetadata` field on the parquet `file.Writer` has been renamed to `initialKeyValueMetadata`. This is a breaking change. Although the field was exported, setting it did not result in new key value metadata being written. Instead, it represented the initial key value metadata if the writer was passed the `WithWriteMetadata` write option. The `WithWriteMetadata` option can still be used to provide the initial key value metadata values. In addition, the `AppendKeyValueMetadata` method can be called to add key value pairs after creating a writer. The `FileMetadata` field on the parquet `file.Writer` has been removed. Previously, setting this field value had no effect. **This PR includes breaking changes to public APIs.** The `KeyValueMetadata` field is no longer exported from the parquet `file.Writer` struct. Use the `WithWriteMetadata` writer option to set key value metadata when creating a writer or use the `AppendKeyValueMetadata` method to add key value metadata after creating a writer. The `FileMetadata` field on the parquet `file.Writer` has been removed. * Closes: apache#35775 Authored-by: Tim Schaub <tim@planet.com> Signed-off-by: Matt Topol <zotthewizard@gmail.com>
…tten after writing row groups (apache#37786) ### Rationale for this change The key value file metadata may include information generated while writing row groups. Currently, it is not possible to set the key value file metadata after creating a writer. With the changes in this branch, key value pairs may be added any time before closing the writer. ### What changes are included in this PR? This branch adds a `writer.AppendKeyValueMetadata(key, value)` method to the parquet `file.Writer` and to the `pqarrow.FileWriter`. ### Are these changes tested? Tests are added for the new functionality. ### Are there any user-facing changes? The `KeyValueMetadata` field on the parquet `file.Writer` has been renamed to `initialKeyValueMetadata`. This is a breaking change. Although the field was exported, setting it did not result in new key value metadata being written. Instead, it represented the initial key value metadata if the writer was passed the `WithWriteMetadata` write option. The `WithWriteMetadata` option can still be used to provide the initial key value metadata values. In addition, the `AppendKeyValueMetadata` method can be called to add key value pairs after creating a writer. The `FileMetadata` field on the parquet `file.Writer` has been removed. Previously, setting this field value had no effect. **This PR includes breaking changes to public APIs.** The `KeyValueMetadata` field is no longer exported from the parquet `file.Writer` struct. Use the `WithWriteMetadata` writer option to set key value metadata when creating a writer or use the `AppendKeyValueMetadata` method to add key value metadata after creating a writer. The `FileMetadata` field on the parquet `file.Writer` has been removed. * Closes: apache#35775 Authored-by: Tim Schaub <tim@planet.com> Signed-off-by: Matt Topol <zotthewizard@gmail.com>
…tten after writing row groups (apache#37786) ### Rationale for this change The key value file metadata may include information generated while writing row groups. Currently, it is not possible to set the key value file metadata after creating a writer. With the changes in this branch, key value pairs may be added any time before closing the writer. ### What changes are included in this PR? This branch adds a `writer.AppendKeyValueMetadata(key, value)` method to the parquet `file.Writer` and to the `pqarrow.FileWriter`. ### Are these changes tested? Tests are added for the new functionality. ### Are there any user-facing changes? The `KeyValueMetadata` field on the parquet `file.Writer` has been renamed to `initialKeyValueMetadata`. This is a breaking change. Although the field was exported, setting it did not result in new key value metadata being written. Instead, it represented the initial key value metadata if the writer was passed the `WithWriteMetadata` write option. The `WithWriteMetadata` option can still be used to provide the initial key value metadata values. In addition, the `AppendKeyValueMetadata` method can be called to add key value pairs after creating a writer. The `FileMetadata` field on the parquet `file.Writer` has been removed. Previously, setting this field value had no effect. **This PR includes breaking changes to public APIs.** The `KeyValueMetadata` field is no longer exported from the parquet `file.Writer` struct. Use the `WithWriteMetadata` writer option to set key value metadata when creating a writer or use the `AppendKeyValueMetadata` method to add key value metadata after creating a writer. The `FileMetadata` field on the parquet `file.Writer` has been removed. * Closes: apache#35775 Authored-by: Tim Schaub <tim@planet.com> Signed-off-by: Matt Topol <zotthewizard@gmail.com>
Rationale for this change
The key value file metadata may include information generated while writing row groups. Currently, it is not possible to set the key value file metadata after creating a writer. With the changes in this branch, key value pairs may be added any time before closing the writer.
What changes are included in this PR?
This branch adds a
writer.AppendKeyValueMetadata(key, value)
method to the parquetfile.Writer
and to thepqarrow.FileWriter
.Are these changes tested?
Tests are added for the new functionality.
Are there any user-facing changes?
The
KeyValueMetadata
field on the parquetfile.Writer
has been renamed toinitialKeyValueMetadata
. This is a breaking change. Although the field was exported, setting it did not result in new key value metadata being written. Instead, it represented the initial key value metadata if the writer was passed theWithWriteMetadata
write option.The
WithWriteMetadata
option can still be used to provide the initial key value metadata values. In addition, theAppendKeyValueMetadata
method can be called to add key value pairs after creating a writer.The
FileMetadata
field on the parquetfile.Writer
has been removed. Previously, setting this field value had no effect.This PR includes breaking changes to public APIs.
The
KeyValueMetadata
field is no longer exported from the parquetfile.Writer
struct. Use theWithWriteMetadata
writer option to set key value metadata when creating a writer or use theAppendKeyValueMetadata
method to add key value metadata after creating a writer.The
FileMetadata
field on the parquetfile.Writer
has been removed.