Implement serde serializers and deserializers for the builders #3429

winding-lines · 2023-01-02T14:55:07Z

Which issue does this PR close?

Closes #3419 .

Rationale for this change

When integrating the object-store crate in other libraries it is desirable to propagate configuration settings from the top level to different layers/threads in the app. At least polars relies on serde in some use cases.

What changes are included in this PR?

Derive serde Serializers and Deserializers when appropriate. Manually implement them for types re-exported from the http crate.

Are there any user-facing changes?

No API changes.

winding-lines · 2023-01-02T15:12:21Z

@alamb need some guidance licensing issues, please see the file reqwest_serde.rs. I copied some code from another crate that doesn't have a LICENSE file. I am not sure what is the right way forward here, I am happy to re-implement from scratch if needed.

tustvold · 2023-01-02T16:00:22Z

I have to confess to being a bit apprehensive about this, I worry it leaks a lot of internal implementation details out into the public, and leaves us supporting a serialization format with all the forwards/backwards compatibility excitement that entails. A configuration system appears a bit beyond the scope of this crate...

I'm not all that familiar with polars, but I am surprised to see it relying on serde over more normal FFI-style bindings?

Is there perhaps some subset of the options we could expose, perhaps the same subset as exposed by from_env, as opposed to exposing effectively the entire configuration tree?

I don't know if any other people have thoughts on this, perhaps @roeap @alamb @crepererum

winding-lines · 2023-01-02T16:15:05Z

@tustvold your concerns seem valid, and we do not have to enable serde.

As you mentioned, I am interested in the following use case:

specify configuration in main
be able to propagate it through code that doesn't care about download (e.g. all the Planner layers inside polars)
at some point I need to download/upload and then I can instantiate a builder, likely in another thread.

Delta-rs seems to be solving a similar issue here
https://github.com/delta-io/delta-rs/tree/main/rust/src/builder

It seems to me that it would be better if object-storage provided this additional API instead of all the clients coming up with something that they have to maintain.

I have a limited understanding of this project, but the builder interface is not a bad starting point for this public API. I am cleaning up this PR in case you decide to take some/all of it.

Let me know what you think :)

roeap · 2023-01-03T01:06:36Z

As @winding-lines mentioned, in delta-rs we are faced with similar questions and also want to support configration via python that resemble pyarrow / fsspec confgurations, as well as configration within datafusion queries, similar to what you can do in spark.

The way we went there is so work with a property bag potentially augmented by configuration from the environment. As it turns out, I very recently experimented with an iteration on the higher level builder that handles creating differnt kinds of object stores, which I am planning to move to delta.rs.

That said, I definately see the usecase, but also share @tustvold's concerns. My personal preference (and I may be biased :)) would be to handle these use cases, by just sending property bags around, that allow for configuration based on "well-known-keys" that may also be in the environment. Not entirely sure though if the implementation linked above is the way to go, or if it is maybe going a bit too far in what alternatives are possible.

Mainly complexities arise when supporting multiple aliases for the same configuration value and handling connections to various stores in the same environment.

If there is general interest already I'd be happy to open a PR for a more concrete discussion - if that would also meet your needs @winding-lines.

winding-lines · 2023-01-03T01:13:05Z

I support any approach that we can use across multiple clients, specially if it has the support of the object-storage devs.

…

On Mon, Jan 2, 2023 at 5:06 PM Robert Pack ***@***.***> wrote: As @winding-lines <https://github.com/winding-lines> mentioned, in delta-rs we are faced with similar questions and also want to support configration via python that resemble pyarrow / fsspec confgurations, as well as configration within datafusion queries <delta-io/delta-rs#1043>, similar to what you can do in spark. The way we went there is so work with a property bag potentially augmented by configuration from the environment. As it turns out, I very recently experimented with an iteration on the higher level builder <https://github.com/roeap/object-store-python/blob/main/object-store/src/builder.rs> that handles creating differnt kinds of object stores, which I am planning to move to delta.rs. That said, I definately see the usecase, but also share @tustvold <https://github.com/tustvold>'s concerns. My personal preference (and I may be biased :)) would be to handle these use cases, by just sending property bags around, that allow for configuration based on "well-known-keys" that may also be in the environment. Not entirely sure though if the implementation linked above is the way to go, or if it is maybe going a bit too far in what alternatives are possible. Mainly complexities arise when supporting multiple aliases for the same configuration value and handling connections to various stores in the same environment. If there is general interest already I'd be happy to open a PR for a more concrete discussion - if that would also meet your needs @winding-lines <https://github.com/winding-lines>. — Reply to this email directly, view it on GitHub <#3429 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAAJYPQK7K7SMKMOMXJPARLWQN3SNANCNFSM6AAAAAATO4ISEQ> . You are receiving this because you were mentioned.Message ID: ***@***.***>

crepererum · 2023-01-03T11:19:24Z

I share @tustvold's concerns here. From an API perspective, I think "builders" should never be cloned or serialized. They are a convenience interface to avoid having large unchecked structs or constructor functions with too many arguments. However I think there is a good case for creating object store instances from key-value-like configs. This makes deployments and user interactions (e.g. in polars where you may come from Python) way easier and I've seen this being implemented in other, similar libraries before. My recommendation would be do add something like fn from_kv(cfg: &HashMap<String, String>) -> Self for each builder and a general top-level fn from_type_and_kv(store_type: &str, cfg: HashMap<String, String>) -> Arc<dyn ObjectStore> that can even dyn-construct instances just based on config. The API user may get the key-value pairs from wherever they want. I would not use the Unix env variables here since different top-level apps may want to prefix/name these in different ways.

tustvold · 2023-01-03T11:32:56Z

Is there perhaps some subset of the options we could expose, perhaps the same subset as exposed by from_env
that allow for configuration based on "well-known-keys" that may also be in the environment.
key-value-like configs

There appears to be consensus on this being the path forward, having an explicit config interface makes sense to me, over serializing the builders directly. I also agree that being able to "hydrate" from an arbitrary list of key value pairs, is a compelling interface.

I think the major challenge with such an interface is documenting it well, as you don't have the benefit of the compiler / rustdoc to do the heavy lifting for you, but that is certainly not insurmountable.

fn from_kv(cfg: &HashMap<String, String>) -> Self

Or even a fn with_option(self, key: &str, value: &str) -> Result<Self> or something, no need to constrain to being a HashMap

If there is general interest already I'd be happy to open a PR for a more concrete discussion

That would be amazing

winding-lines · 2023-01-03T13:26:57Z

The comments above have hit the problem on the head:- we want a set of key-values- the keys should not be strings, they should be compiler checked I agree that having builders themselves as the API is unusual. However the fields inside the builders are close to what we need. Creating a separate layer that has to be kept up to date is also wasteful. A simple refactoring may help here where the actual fields inside the builder structs can be separated and the builder can have a with_config() method. It would help me a lot of those configs are Clone and serde *. Then we can come up with some macros to read the values from the env.At the endof the day as an user I prefer to not have to type strings.

tustvold · 2023-01-05T07:35:20Z

Closed by #3436

github-actions bot added the object-store Object Store Interface label Jan 2, 2023

Implement serde serializers and deserializers for the builders

0607361

Fix clippy and docs.

8f87256

roeap mentioned this pull request Jan 3, 2023

object_store: builder configuration api #3436

Merged

tustvold closed this Jan 5, 2023

winding-lines deleted the serde-for-object-store-builders branch January 12, 2023 13:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement serde serializers and deserializers for the builders #3429

Implement serde serializers and deserializers for the builders #3429

winding-lines commented Jan 2, 2023

winding-lines commented Jan 2, 2023

tustvold commented Jan 2, 2023

winding-lines commented Jan 2, 2023 •

edited

Loading

roeap commented Jan 3, 2023

winding-lines commented Jan 3, 2023 via email

crepererum commented Jan 3, 2023

tustvold commented Jan 3, 2023

winding-lines commented Jan 3, 2023 via email •

edited

Loading

tustvold commented Jan 5, 2023

Implement serde serializers and deserializers for the builders #3429

Implement serde serializers and deserializers for the builders #3429

Conversation

winding-lines commented Jan 2, 2023

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

winding-lines commented Jan 2, 2023

tustvold commented Jan 2, 2023

winding-lines commented Jan 2, 2023 • edited Loading

roeap commented Jan 3, 2023

winding-lines commented Jan 3, 2023 via email

crepererum commented Jan 3, 2023

tustvold commented Jan 3, 2023

winding-lines commented Jan 3, 2023 via email • edited Loading

tustvold commented Jan 5, 2023

winding-lines commented Jan 2, 2023 •

edited

Loading

winding-lines commented Jan 3, 2023 via email •

edited

Loading