Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement serde serializers and deserializers for the builders #3429

Closed
wants to merge 2 commits into from
Closed

Implement serde serializers and deserializers for the builders #3429

wants to merge 2 commits into from

Conversation

winding-lines
Copy link
Contributor

Which issue does this PR close?

Closes #3419 .

Rationale for this change

When integrating the object-store crate in other libraries it is desirable to propagate configuration settings from the top level to different layers/threads in the app. At least polars relies on serde in some use cases.

What changes are included in this PR?

Derive serde Serializers and Deserializers when appropriate. Manually implement them for types re-exported from the http crate.

Are there any user-facing changes?

No API changes.

@github-actions github-actions bot added the object-store Object Store Interface label Jan 2, 2023
@winding-lines
Copy link
Contributor Author

@alamb need some guidance licensing issues, please see the file reqwest_serde.rs. I copied some code from another crate that doesn't have a LICENSE file. I am not sure what is the right way forward here, I am happy to re-implement from scratch if needed.

@tustvold
Copy link
Contributor

tustvold commented Jan 2, 2023

I have to confess to being a bit apprehensive about this, I worry it leaks a lot of internal implementation details out into the public, and leaves us supporting a serialization format with all the forwards/backwards compatibility excitement that entails. A configuration system appears a bit beyond the scope of this crate...

I'm not all that familiar with polars, but I am surprised to see it relying on serde over more normal FFI-style bindings?

Is there perhaps some subset of the options we could expose, perhaps the same subset as exposed by from_env, as opposed to exposing effectively the entire configuration tree?

I don't know if any other people have thoughts on this, perhaps @roeap @alamb @crepererum

@winding-lines
Copy link
Contributor Author

winding-lines commented Jan 2, 2023

@tustvold your concerns seem valid, and we do not have to enable serde.

As you mentioned, I am interested in the following use case:

  1. specify configuration in main
  2. be able to propagate it through code that doesn't care about download (e.g. all the Planner layers inside polars)
  3. at some point I need to download/upload and then I can instantiate a builder, likely in another thread.

Delta-rs seems to be solving a similar issue here
https://github.com/delta-io/delta-rs/tree/main/rust/src/builder

It seems to me that it would be better if object-storage provided this additional API instead of all the clients coming up with something that they have to maintain.

I have a limited understanding of this project, but the builder interface is not a bad starting point for this public API. I am cleaning up this PR in case you decide to take some/all of it.

Let me know what you think :)

@roeap
Copy link
Contributor

roeap commented Jan 3, 2023

As @winding-lines mentioned, in delta-rs we are faced with similar questions and also want to support configration via python that resemble pyarrow / fsspec confgurations, as well as configration within datafusion queries, similar to what you can do in spark.

The way we went there is so work with a property bag potentially augmented by configuration from the environment. As it turns out, I very recently experimented with an iteration on the higher level builder that handles creating differnt kinds of object stores, which I am planning to move to delta.rs.

That said, I definately see the usecase, but also share @tustvold's concerns. My personal preference (and I may be biased :)) would be to handle these use cases, by just sending property bags around, that allow for configuration based on "well-known-keys" that may also be in the environment. Not entirely sure though if the implementation linked above is the way to go, or if it is maybe going a bit too far in what alternatives are possible.

Mainly complexities arise when supporting multiple aliases for the same configuration value and handling connections to various stores in the same environment.

If there is general interest already I'd be happy to open a PR for a more concrete discussion - if that would also meet your needs @winding-lines.

@winding-lines
Copy link
Contributor Author

winding-lines commented Jan 3, 2023 via email

@crepererum
Copy link
Contributor

I share @tustvold's concerns here. From an API perspective, I think "builders" should never be cloned or serialized. They are a convenience interface to avoid having large unchecked structs or constructor functions with too many arguments. However I think there is a good case for creating object store instances from key-value-like configs. This makes deployments and user interactions (e.g. in polars where you may come from Python) way easier and I've seen this being implemented in other, similar libraries before. My recommendation would be do add something like fn from_kv(cfg: &HashMap<String, String>) -> Self for each builder and a general top-level fn from_type_and_kv(store_type: &str, cfg: HashMap<String, String>) -> Arc<dyn ObjectStore> that can even dyn-construct instances just based on config. The API user may get the key-value pairs from wherever they want. I would not use the Unix env variables here since different top-level apps may want to prefix/name these in different ways.

@tustvold
Copy link
Contributor

tustvold commented Jan 3, 2023

Is there perhaps some subset of the options we could expose, perhaps the same subset as exposed by from_env
that allow for configuration based on "well-known-keys" that may also be in the environment.
key-value-like configs

There appears to be consensus on this being the path forward, having an explicit config interface makes sense to me, over serializing the builders directly. I also agree that being able to "hydrate" from an arbitrary list of key value pairs, is a compelling interface.

I think the major challenge with such an interface is documenting it well, as you don't have the benefit of the compiler / rustdoc to do the heavy lifting for you, but that is certainly not insurmountable.

fn from_kv(cfg: &HashMap<String, String>) -> Self

Or even a fn with_option(self, key: &str, value: &str) -> Result<Self> or something, no need to constrain to being a HashMap

If there is general interest already I'd be happy to open a PR for a more concrete discussion

That would be amazing

@winding-lines
Copy link
Contributor Author

winding-lines commented Jan 3, 2023 via email

@tustvold
Copy link
Contributor

tustvold commented Jan 5, 2023

Closed by #3436

@tustvold tustvold closed this Jan 5, 2023
@winding-lines winding-lines deleted the serde-for-object-store-builders branch January 12, 2023 13:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
object-store Object Store Interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Derive Clone for the builders in object-store.
4 participants