Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Improved source management #10

Closed

Conversation

Geosynopsis
Copy link

@Geosynopsis Geosynopsis commented Nov 17, 2022

Hey @samuelcolvin,

Could you have a look at the changes? It's still a work in progress. But it would be great to have your input on the general ideas.

Here's what the changes include:

  • Separation between the source provider, source mapper and Base Settings

    • The source providers are callable that will provide sources that implement the Mapping interface. It doesn't concern case sensitivity, nesting etc.
    • The source mapper provides a mapping logic between the sources and the settings fields. It handles the case_sensitivity, nesting logics etc. It can be used with any source as long as the source implements a Mapping interface.
    • The BaseSettings takes responsibility for instantiating the source and the mapper and calling them to instantiate the BaseSettings.
  • [Breaking-Change] as of now, but can add a legacy layer. The SourceMapper call takes Iterator(ModelFields) instead of BaseSettings.

  • The Config exposes the source_providers attribute as an interface to manipulate the source_providers to be used. The BaseSettings calls the providers in the same order except, init_kwargs (at this point in time) which gets the first priority. But customize_sources can still be used to manipulate order.

Further Improvements

  • Make the source_provider & source_mapper more configurable via Config to provide an option for specific use cases when the user wants case-sensitivity, nestings etc. for one source but not for another.
  • Take the env_names preparation logic to the mapper
  • Add legacy implementation with a deprecation warning

Reference: #6 (comment)

@samuelcolvin
Copy link
Member

sorry I've been slow to look at this, will review tomorrow. 🙏

@Geosynopsis
Copy link
Author

sorry I've been slow to look at this, will review tomorrow. 🙏

Thank you! I'll also take in a few improvement suggestions from the issues after you've reviewed the general approach.

Copy link
Member

@samuelcolvin samuelcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps we can revert the renaming of main.py -> settings.py, that would make this PR much easier to review.

Then I'll review properly, the changed you've explained in the PR body look broadly sensible, but I'm a bit lost on code changes at this point.

Makefile Outdated
@@ -25,7 +25,7 @@ mypy:

.PHONY: test
test:
coverage run -m pytest --durations=10
coverage run -m pytest --durations=10 -W ignore::DeprecationWarning
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we shouldn't ignore all DeprecationWarnings, also please move any warning ignores here

@@ -1,6 +1,6 @@
import warnings

from .main import BaseSettings
from .settings import BaseSettings
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the rationale behind renaming main to settings? I'd prefer that we kept the old name, also if we do rename, let's do so in a separate PR so the other changes in the file are easier to read.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing serious TBH, just semantics. Will change!

@Geosynopsis
Copy link
Author

Preliminary work to add support for multiple secrets dirs as discussed in issue. The field name is taken in the same sequence as the directories provided.

@Geosynopsis
Copy link
Author

Any chance you could look at it soon? @samuelcolvin

Copy link
Member

@samuelcolvin samuelcolvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the slow response.

Having thought about this, I think we should make a change to how sources work, here's my proposal:

We have a PydanticBaseSource abc which sources should inherit from.

Here's a very rough sketch of what the abc might look like:

from abc import ABC, abstractmethod


class PydanticBaseSource(ABC):
    def __init__(self, settings_cls: type[BaseSettings]):
        self.settings_cls = settings_cls
        self.config = settings_cls.__config__
        self.case_sensitive = self.config.case_sensitive
        ...
        # anything else that is generally useful for subclasses

    @abstractmethod
    def get_field_value(self, names: tuple[str, ...], field: ModelField) -> Any:
        """
        Get the value for a field, should return `Empty` if the field is not available.
        """
        pass

    def prepare_field(self, value: Any, field: ModelField) -> Any:
        # default implementation does `json.loads` for "complex" fields
        ...

    def __call__(self) -> dict[str, Any]:
        # default implementation iterates over fields and calls `get_field_value`
        # and `prepare_field`
        ...

I'm flexible on how PydanticBaseSource is implemented, if you think there's a better way, let me know.

We would then define subclasses of PydanticBaseSource like EnvSource, SecretSource, DotEnvSource.

Advantages of this:

  • we can split out as much common logic into methods which mean the logic can be adjusted, or the default implementation can be used easily
  • it should make it easier to customise the existing sources while duplicating as little code as possible

This should allow SourceMapper and the source providers to be combined into one cleaner abstraction.

I hope this makes sense, I'm sorry for the change of direction, but having read your PR and thought about this a bit more, I think this is the best way forward.

Let me know what you think?

)
if sources:
return deep_update(*reversed([source(self) for source in sources]))
def get_attribute(name: str) -> Any:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think better that this becomes a method on the class. I know that will mean passing lots of arguments, but it'll be much easier to read and maintain.

if sources:
return deep_update(*reversed([source(self) for source in sources]))
def get_attribute(name: str) -> Any:
attr_with_fallback: Dict[str, Tuple[Any, str]] = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be defined in __init__, you might need to use PrivateAttr.

Comment on lines +77 to +80
if not value and name in attr_with_fallback:
value, warning = attr_with_fallback.get(name, (None, None))
if warning:
warnings.warn(warning, DeprecationWarning)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to check attr_with_fallback once, e.g. something like

if not value:
    try:
        value, warning = attr_with_fallback[name]
    except KeyError:
        ...
    else:
        ...

parameter_value = parameter.default
if parameter.kind in {parameter.POSITIONAL_OR_KEYWORD, parameter.KEYWORD_ONLY}:
kwargs[parameter.name] = parameter_value
elif parameter.kind in {parameter.VAR_KEYWORD, parameter.VAR_POSITIONAL}:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See pydantic/pydantic#4673, Config will become a dict in V2, I think we should accept **kwargs and just pass the entire Config dict.

But happy to defer this until after that change is made and V2 is in beta.

# init_kwargs are treated separately from all other sources. The
# Config.sources need not include the init_kwargs. If the user wishes to
# override the init_kwargs, they can do so with customize_sources
_mappers: List[SettingsSourceCallable] = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a particular reason we're creating this on every call, rather than when creating the class?

@Geosynopsis
Copy link
Author

Geosynopsis commented Dec 30, 2022

Hey @samuelcolvin,
I understand your proposal to reduce the number of abstractions. However, IMO Settings side would be the perfect place to carry the mapping logic over if we believe that the mapping logic will remain consistent across all the sources.

If the mapping logic isn't consistent, the current implementation would be better for addressing that issue. That way, we can mix and match the mappers with the sources. eg. [{source: secret, mapper: mapper1}, {source:env, mapper: mapper2}, {source: vault, mapper: mapper1}]. It makes it easier for the contributors working on the mapping logic strictly focus on the mapping logic, and once one mapping logic is changed, it's reflected over all sources that use it.

Although creating ABCs for the sources with the common mapping methods and implementing each source as its implementation sounds excellent, there are some caveats:

  • We will impose on the users that their implementation must be the subtype of this ABC.
  • The users must ensure they don't overwrite the init parameters and the methods of the ABC.
  • We will mix up concerns about providing data and mapping it to settings.

Having said that, if you still think having ABC for the sources is better. Let me know. I'll create a new PR.

Cheers & happy new year,
Abhi

@samuelcolvin
Copy link
Member

samuelcolvin commented Dec 30, 2022

@adriangb @PrettyWood @hramezani @tiangolo please let me us know what you all think?

If there's a consensus between the 6 of us, I'm happy to accept it.

@tiangolo
Copy link
Collaborator

I'm slightly inclined towards subclasses rather than list of dicts, here's my rationale.

Some of my points are probably invalid just because I'm not understanding the problem correctly or because I'm missing something.

First:

  • I like the idea of re-using code and avoiding duplication (applies to both options).
  • I like the possibility to customize a source. E.g. being able to make the source be a YAML or TOML file that could be passed via Kubernetes secrets or something like that (applies to both).

DX

About developer experience: When asking users to write code, I tend to prefer to ask them to write functions rather than classes because it's the simplest callable primitive they could have to write, this would be a +1 for a list of dicts with source providers and mappers. But, given that the parameters of the functions have to be complex (e.g. a ModelField), users would not get tooling (autocompletion and inline errors) when writing those functions, only typing errors when passing them to the list of providers/mappers. There's a higher chance they would get inline errors and autocompletion for the non-trivial signatures in the methods of a sublcass (e.g. get the field: ModelField parameter in the signature autocompleted), as the editor would know what is the method that the user is trying to override.

So, not sure about this for both options, I think in this case I would be inclined to the subclasses.

Flexibility

About flexibility, I guess a list of dicts with source providers and mappers would allow better composition, allowing to mix and match functions, that's also why I tend to prefer functions rather than classes in most cases. Nevertheless, for each of the dicts, it would require defining the provider and the mapper in the Config. And in the case of the subclasses the user could inherit from an existing class and only customize the provider without having to specify the mapper again. So, again not sure.

If the alternative ended up being some type of mixin classes, I would be strongly against classes with mixins. But I don't think that would be necessary or even feasible.

I think probably more important would be to focus on use cases and the probability that something would be used or not.

Although I agree that a list of dicts with explicit provider and mapper would be more flexible, how much would it be expected that it's needed to mix and match different functions for providers and mappers?

If the expectation is more towards all the combinations of each provider with each mapping (many to many), then it would make more sense to have a list of dicts, as this would allow combining those if the alternative with classes would need a provider from one class and a mapper from another, because in that case, using classes, it would be needed to inherit one thing from one class (e.g. the provider) but override the other (e.g. the mapper) duplicating that code (if that code already lives in another class).

Nevertheless... I would actually expect that the use cases would be more a one to many or one to one between providers and mappers, not many to many. And if that's true, having subclasses would cover most/all use cases and wouldn't require defining explicitly the combinations of provider and mapper in the list of dicts.

And here's where I think it would make sense to consider use cases to guide the decisions a bit.

Use cases

I'm listing some use cases I can imagine. There might be some key use case that discards my thinking and moves the needle towards a list of dicts instead of subclasses, but up to now, it seems to me that subclasses cover everything I can imagine.

Env vars

The first and original use case. And I think this is the one with the most custom logic.

The provider would be quite simple, just the environ dict.

The mapper would be the most complex, parsing strings into JSON data, handling upper / lower cases. Env var prefixes, possibly lists of values as comma separated in a single env var (CSV).

I don't really see another use case that would need this complex mapper apart from this one for env vars and that would also need a complex provider from another use case. So, this provider would probably only be used with this mapper, so, one to one or one to many, not many to many.

Dot env file

This would be the same mapper as the one from Env vars, and the provider would be overridden to read from a file. But here the provider would be used only with the same mapper, so, again, one to many.

Docker / Kubernetes secrets

This would read from a file or set of files, so, a custom provider. But as the file would most probably be JSON or YAML, no custom mapper would be needed, just the standard Pydantic serialization.

So, not many to many.

Remote storage

This would be Redis, Hashicorp Vault, even a SQL DB, or S3 storage, or remote secret/password management system. But in all the cases, the custom logic would be just the provider, because whatever is the remote thing, it would have native data types. The mapper would be the default Pydantic serialization.

So, not many to many.


I haven't imagined a use case that really needs a complex provider AND a complex mapper, and that could benefit from being able to re-use the two methods from different sub-classes, which is the main reason why I would think that a list of dicts would be better than subclasses. All the use cases I can think of can be well served by just subclasses (or default classes).

Subclasses concerns

Some arguments about the current concerns about subclasses:

  • We will impose on the users that their implementation must be the subtype of this ABC.

But as this is already the case for using Pydantic models or BaseSettings, I wouldn't think this is particularly problematic or strange for users, they already have to create subclasses to use other parts of Pydantic.

I would imagine that not requiring users to create subclasses would be an advantage mainly if that would allow them to reuse some code or function they might already have, apart from the extra import and extra line/complexity. But the main restriction is already in that users have to create functions with a very specific signature, to receive a value, and a very Pydatnic-specific ModelField, so, they will already have to create a custom function to handle this, inheriting from a class won't be the main added complexity.

  • The users must ensure they don't overwrite the init parameters and the methods of the ABC.

This is already the case for Pydantic's BaseModel and BaseSettings, so I don't see a problem in this, it's already a design decision they had to accept when they started using Pydantic, so it wouldn't be unfamiliar.

  • We will mix up concerns about providing data and mapping it to settings.

Yes, but if my thought process is right, it will provide a much better developer experience, safer code (with better static analysis support), etc. which is one of the main reasons to not want to mix concerns, to have better code, but in this case, I think the benefits of subclasses up to now seem better than the drawbacks.


The funny thing is, I tend to not like classes, tend to avoid them, and prefer plain functions when possible, even more when requiring users to write them. But in this case, I think the benefits of subclasses, up to now, seem to be better than their drawbacks.

In particular, I think the strongest benefit for subclasses is having the chance of the editor autocompleting the signature of the method I have to override.

@samuelcolvin
Copy link
Member

amazing thoughtful response @tiangolo, thank you. I agree with you, but I'll let others reply before making an decision.

@Geosynopsis
Copy link
Author

Geosynopsis commented Dec 31, 2022

Hey @tiangolo ,
Thanks for the response. Totally concur with the use cases you've outlined, which is why there's only one mapper as of now. Here, I would like to add more context to the discussion and please let me know if my understanding needs to be corrected.

Before the start of the discussion, we consider the following as given:

  • In all the options we are discussing, we have addressed the issues with code duplication.

First, here's the architecture of the pydantic-settings library included in this PR (all square brackets are what exists now in the PR); let's call it Option 1.

SettingsArch(1)

Here's a summary of what an individual Component does, how it is implemented and what it means for the user:

Option 1: Components

a. BaseSettings

  • There isn't much change except internal logic on how we handle the source mapper and providers

  • For most users, the interface is the same as now.

  • For the users adding a new source provider, they must provide it and the runtime parameters of the provider in the Config.

b. SourceMapper

  • As you've realized, all use cases are one-to-many. Additionally, they all share a very similar, if not the same, mapping logic. The logic of the mapping is, hence, abstracted out to this component.

  • Why it exists as a separate component? It's for that WhatIf scenario where more than one mapping may be required, which was the example of mix & match source and mapping list of dictionary in my previous comment.

  • The assumption is that the mapper is something that the user never has to touch unless the person is a contributor.

  • The contributor working on the mapper can work with the assurance that they are getting a dictionary from any source provider and work on logic without having to bother about in what format or where the source stays and how to open them. We'll let the user worry about that, especially in exotic scenarios.

c. SourceProvider

  • The source providers are functions that provide implementations of Mapping (python) as an output. The output could be a plain dictionary, as in loading the .env file or taking the environment from the system variables, or another implementation of collections.abc.Mapping interface as done with the secret source.

  • We don't impose any implementation constraints from pydantic-settings on how the user should implement it. We are happy as long as it's callable that gives a dictionary back.

  • The assumption is that the user would want to add more sources in different flavours.


Perhaps we want to reduce the abstractions to not overengineer from the get-go. If that's the case, we've now two options.

Option 2: Merging BaseSetting and SourceMapping

  • As we've seen with the use cases, we've one-to-many use cases primarily as of now. If that's the case, we could merge BaseSetting & SourceMapping. That way, contributors have to only work on the BaseSetting implementation while working on mapping logic, and the source providers remain as flexible and straightforward as in the Option 1.

  • If the user wants to change some mapping constraints, it will be a bit of work unless they address it in their source provider.

Option 3: Creating an ABC with the logic of SourceMapping and implementing each source as a subclass of this ABC.

  • In this implementation, the user would inherit from the ABC and add their provider-specific logic to the subclass. It's more flexible in introducing additional provider-specific mapping logic within the provider.
  • On the other hand, it lacks the flexibility of Options 1 & 2 in defining the source provider for the user. And it is the place where my concern about subclasses comes.
  • The issue about the ModelField that you've mentioned kicks in as well, and the user needs knowledge of internals to some extent to get things working correctly.

Perhaps those contexts would help us to make a proper decision. Let me know if I've missed something and if my understanding needs to be readjusted.

Cheers & Happy New Year to you all.

@samuelcolvin
Copy link
Member

Sorry for the slow response, on balance I still think we should go with PydanticBaseSource abc.

@Geosynopsis I totally understand if you don't agree with this and would prefer to step back from this piece as it's no going the way you thought best, but if you'd be willing to continue to work on it, that would be absolutely great.

Let me know.

@Geosynopsis
Copy link
Author

Sorry for the slow response, on balance I still think we should go with PydanticBaseSource abc.

@Geosynopsis I totally understand if you don't agree with this and would prefer to step back from this piece as it's no going the way you thought best, but if you'd be willing to continue to work on it, that would be absolutely great.

Let me know.

Hey @samuelcolvin, no worries, if you consider going with ABC is best for it; I will modify and send you PR as soon as I find some time to spare.

@samuelcolvin
Copy link
Member

👍 Awesome, thank you.

@Geosynopsis
Copy link
Author

Hey @samuelcolvin,
I have a few questions regarding the requirements for the ABC-based implementation.

  1. Currently, we have two places to pass configuration-related arguments, e.g. _env_file in init or env_file in the Config class (in future config dictionary, perhaps). The same goes for env_file_encoding, env_nested_delimiter, secrets_dir etc. Do you think it's best to leave those arguments to strictly exist in Config and remove it from init?
  2. What's your approach to treating the customise_sources hook when we move to config as a dictionary? Should it still be present as a callable object inside the dictionary, or should we move it to a protected function within BaseSetting class?

@Geosynopsis
Copy link
Author

Hey @samuelcolvin, I have a few questions regarding the requirements for the ABC-based implementation.

1. Currently, we have two places to pass configuration-related arguments, e.g. `_env_file` in init or `env_file` in the `Config` class (in future config dictionary, perhaps). The same goes for `env_file_encoding`, `env_nested_delimiter`, `secrets_dir` etc. Do you think it's best to leave those arguments to strictly exist in `Config` and remove it from `init`?

2. What's your approach to treating the `customise_sources` hook when we move to `config` as a dictionary? Should it still be present as a callable object inside the dictionary, or should we move it to a protected function within `BaseSetting` class?

Additionally, looking at the changes coming with V2, fields attribute and get_fields_info attributes are also getting kicked out of Config. Should we look at an alternative to providing field-related info apart from field definition with Field class, or shall we entirely depend on what's provided in Field class?

@hramezani
Copy link
Member

@Geosynopsis we used your excellent idea and impelemented it in 0ecac22
Thanks for the contribution and sorry it took so long to get to it!

@hramezani hramezani closed this Apr 20, 2023
@Geosynopsis
Copy link
Author

Great work, @hramezani! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants