Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trait ergonomics str implementation #4233

Merged
merged 47 commits into from
Jul 17, 2024
Merged

Conversation

Zyell
Copy link
Contributor

@Zyell Zyell commented Jun 4, 2024

Implementation of str as identified in #4207

This introduces the str argument to pyclass which optionally takes a formatting string using the shorthand notation inspired by thiserror. It expands similarly to this library with the one addition that empty unnamed or indexed format brackets will place self in the call to format. (this is particularly useful for enum formatting. Any formatting arguments after : within the brackets will be respected.

If no formatting string is provided an implementation of the Display trait is expected.

Michael Gilbert added 7 commits June 4, 2024 08:26
This update brings custom string formatting for PyClass with a #[pyclass(str = "format string")] attribute. It allows users to specify how their PyClass objects are converted to string in Python. The implementation includes additional tests and parsing logic.
@Zyell
Copy link
Contributor Author

Zyell commented Jun 4, 2024

This is a draft. The following still needs to be implemented:

  • newsfragment
  • doc updates
  • expand tests
  • capture renaming of variants for proper output
  • ensure manually implemented __str__ is not allowed if automatic __str__ is requested.

The format expansion and optional value passed to the str are both completed. I wanted to get this created before going too far so I could get feedback. 😃

@Zyell
Copy link
Contributor Author

Zyell commented Jun 5, 2024

@davidhewitt I have an implementation question for you. Currently we have the following implementation for str:

  • The user passes just str: We use the Display Trait implementation they have implemented for str formatting.
  • The user passes str("<formatting shorthand>"): We use the shorthand format to map named members value and allow for "{}", "{:?}", ... which allow for self to be passed into the formatting string and leverages the implemented traits accordingly.

Given that the user can rename variants on an enum, it seems to me that it makes sense for us to respect this only when the user provides a shorthand formatting string to str. This will remain consistent to the by default we use whatever Display implementation is found.

Are there any other renaming gotchas I should consider? Does this implementation seem reasonable?

Michael Gilbert added 3 commits June 5, 2024 16:28
Implemented the capacity to handle renamed variants in enum string representation. Now, custom Python names for enum variants will be correctly reflected when calling the __str__() method on an enum instance. Additionally, the related test has been updated to reflect this change.
Copy link
Member

@mejrs mejrs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I really like this. Nicely done :)

pyo3-macros-backend/src/pyclass.rs Outdated Show resolved Hide resolved
newsfragments/4233.added.md Outdated Show resolved Hide resolved
tests/ui/invalid_pyclass_args.stderr Outdated Show resolved Hide resolved
Zyell and others added 7 commits June 6, 2024 13:22
Co-authored-by: Bruno Kolenbrander <59372212+mejrs@users.noreply.github.com>
Co-authored-by: Bruno Kolenbrander <59372212+mejrs@users.noreply.github.com>
@davidhewitt
Copy link
Member

Thanks, this looks looks really great! I have to confess when I proposed the feature I hadn't realised the number of interactions around renaming and really thought too hard about the differences between Rust and Python formatting. I'd prefer move slowly and err on the side of not making this too powerful too quickly, even if that means we only allow simpler cases we're sure about.

The cases I think risk causing users pain:

  • #[pyclass(repr)] without a format string - I think I see now that almost always it would be a bad default to use the Rust Debug implementation here, because users would expect a Python syntax.
  • Interactions with struct / field renaming, which have been discussed at length above.
  • Enum scoping - I think users probably want MyEnum. prepended on their string most of the time? It would be nice to not need to have to specify this.

Moving to the design, this makes me wonder a few things:

  • Is allowing :? debug format specifier ever useful in this shorthand? Probably yes, in repr particularly. It makes me a bit easy that it makes it easy for Rust syntax to leak into the formatting though 🤔
  • For complex enums, should we require all variants to have a str = "format" specifier, and then we'll be responsible for prepending the correct variant name?

Maybe a lot of these design problems are more related to repr - e.g. would str on a complex enum even want to refer to the variant name? I imagine the repr definitely would. I think field names are also more usually a concern for repr.

I suppose, what are the forms we're sure about? I think it's the following:

  • #[pyclass(str)] using Display seems intuitive on structs. On enums we run into the renaming problem.
  • #[pyclass(str = "<format>")] also seems clear on structs. On enums we have challenges with renaming, and we have a bit of a question whether to put the format string on the top-level or the variants.

Does this imply that for this first version it would be wiser to support this just for structs, and allow ourselves a bit more time to work out the enum design?

@Icxolu
Copy link
Contributor

Icxolu commented Jun 8, 2024

I suppose, what are the forms we're sure about? I think it's the following:

  • #[pyclass(str)] using Display seems intuitive on structs. On enums we run into the renaming problem.
  • #[pyclass(str = "<format>")] also seems clear on structs. On enums we have challenges with renaming, and we have a bit of a question whether to put the format string on the top-level or the variants.

Does this imply that for this first version it would be wiser to support this just for structs, and allow ourselves a bit more time to work out the enum design?

I agree here. On structs it seems pretty clear and intuitive what this does.

One thing we should probably test here: Does the formatting handle raw identifiers (correctly)?

#[pyclass(str = "{r#type}")]
struct Foo {
    r#type: Bar
}
  • Is allowing :? debug format specifier ever useful in this shorthand? Probably yes, in repr particularly. It makes me a bit easy that it makes it easy for Rust syntax to leak into the formatting though

It's true that it easier to leak syntax here, but we also don't have any guarantee about the syntax in any Display impl and I think it's pretty clear what to expect from a Debug format in Rust usually. So allowing this should not cause too much confusion and would be pretty useful for something like arrays or Vecs

@Zyell
Copy link
Contributor Author

Zyell commented Jun 10, 2024

@Icxolu Thank you for pointing out the raw identifiers case! It now properly handles that in the latest commit.

I agree that it is reasonable to expect that the users know what will come from a debug format in Rust for a given type. For me, the biggest concern in this feature is ensuring the user understands the cause and effect of this feature without surprises. I mentioned above that if we have strange cases, we would need to document them thoroughly, but such a thing is brittle (it relies on the user finding and reading those caveats). As for the leaking of rust syntax, I don't see that as a problem. As mentioned, the user should generally know what to expect with debug output in Rust. Also, I actually like some of the debug output in rust and would personally retain it in my own implementations of repr or str. In my Python code I mostly use repr implementations (since they are picked up by str unless you implement str separately) and use them for logging and debugging. As my primary use case is debugging quickly, I prefer output structures that "map the issue" quickly and efficiently for me. While there is guidance for repr and str implementation in Python, there is a "practicality beats purity" implementation within dataclasses themselves (for example enabling/disabling per field - such that repr could not be used to fully recreate the instance).

I think that the renaming of variants adds surprises to how the user implements the shorthand format strings and I think requiring every variant to have a str format might be onerous. I am leaning toward the following caveats:

  • If the user has implemented naming or renaming for the variants or the class, disallow the string format shorthand and direct implementation of str or the Display trait. We would raise a compile time error explaining this.
  • Allow any format specifiers in the shorthand string at the user's discretion such that it is clear what type is being output (hence forbidding the above case)
  • Allow specifying variant string formats for ease of use with complex enum cases in particular.

One that I'm not sure about:

* Enum scoping - I think users probably want `MyEnum.` prepended on their string most of the time? It would be nice to not need to have to specify this.

If we do this automatically (I really wish Rust would do this by default for Debug derivations :-) ), it would lead to surprising behavior if they have implemented Display directly, as it wouldn't be used. I haven't played around with this, but could we detect the Display implementation in the macro and use that knowledge to switch accordingly? Then I think this would be great.

Michael Gilbert added 3 commits June 25, 2024 10:16
# Conflicts:
#	pyo3-macros-backend/src/pyclass.rs
…ss, field, or variant args when mixed with a str shorthand formatter.
@Zyell
Copy link
Contributor Author

Zyell commented Jun 25, 2024

@davidhewitt @Icxolu Sorry for the delay in returning to this PR. I have made the string shorthand format incompatible with any renaming, forcing the user to either implement Display or __str__ explicitly to remove any ambiguity in the shorthand format token choices. It looks like the remaining issues surround handling Enums:

  • Enum scoping affecting both Simple and Complex (prepending the enum class name)
  • Complex Enum variant formatting

For the first case, I don't want to make the assumption that they need the name prepended if Display has been implemented explicitly. I don't believe proc-macros have access to the what traits have been implemented. I looked into how I could bound the problem, but I'm not entirely sure how to do so just yet. There is also the other case where they explicitly prepended the class name using a shorthand string formatter and we don't want to double prepend it. If we make the assumption for the user, it would be surprising.

For the second case, it was mentioned

* For complex enums, should we require all variants to have a `str = "format"` specifier, and then we'll be responsible for prepending the correct variant name?  

This again runs into the potential double pre-pending mentioned above, or maybe the user simply doesn't want it prepended at all.

Which brings us to the following:

Does this imply that for this first version it would be wiser to support this just for structs, and allow ourselves a bit more time to work out the enum design?

It isn't clear to me how the shorthand logic should function for the enum cases without making potentially bad assumptions for the user. For this initial release, should we release just the following?

  • Support for str which uses a Display trait implementation for structs, simple enums, and complex enums.
  • Support a string formatter str="<string formatter>" only for structs and disallowed for enums.
  • Disallow the the string formatter for all cases of renaming across all pyclass implementations. (Optionally I can re-enable this for structs if I remove the ability to reference self in the shorthand notation with {}. Then there won't be a chance of accidentally introducing incorrect field names when using the shorthand for structs. This only works under the assumption that the shorthand only applies for structs.)

Thoughts?

@Icxolu
Copy link
Contributor

Icxolu commented Jul 5, 2024

Sorry for the lack of response here, I was away from my keyboard for a few days.

  • Support for str which uses a Display trait implementation for structs, simple enums, and complex enums.

Doing anything other than simply delegating to Display would feel wrong to me. So I guess the choice is whether we want to allow this for all types, or just structs. Personally I think it is fine to allow this one for all, since to me this has only one resonanble implementation.

  • Support a string formatter str="<string formatter>" only for structs and disallowed for enums.

I agree, structs seem pretty clear while enums definitely need more design first.

  • Disallow the the string formatter for all cases of renaming across all pyclass implementations. (Optionally I can re-enable this for structs if I remove the ability to reference self in the shorthand notation with {}. Then there won't be a chance of accidentally introducing incorrect field names when using the shorthand for structs. This only works under the assumption that the shorthand only applies for structs.)

This is probably a good idea. We might want to disallow the self shorthand as well for the time beeing. For structs I don't see a real use case, so I would defer that to future work on enums if it then turns out to be needed/useful. Even though me could lift the restriction about renaming in this case, having it in place for now seems not too problematic and gives us more room in the future to adapt and improve.

@Zyell
Copy link
Contributor Author

Zyell commented Jul 17, 2024

@Icxolu I have implemented everything as discussed above. I am having some trouble though with the codecov/patch test. It is claiming that this piece of code is not covered, but I do have 2 tests fully covering it here, throwing the compile time error from this code block as seen here. Do the tests in tests/ui not count as part of the patch coverage check?

MG added 2 commits July 17, 2024 08:04
…d, fixing Ok issue in eq implementation."

This reverts commit a37c24b.
Copy link
Contributor

@Icxolu Icxolu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, implementation wise this looks good to me now. I left some thoughts about the error messages, but these are just what came to my mind while reading them and I'm also completely fine if we decide that they are good as is.

Regarding codecov, I'm not sure, maybe @davidhewitt knows more, but I don't think we need to worry too much here.

tests/ui/invalid_pyclass_args.stderr Outdated Show resolved Hide resolved
tests/ui/invalid_pyclass_args.stderr Outdated Show resolved Hide resolved
guide/pyclass-parameters.md Outdated Show resolved Hide resolved
pyo3-macros-backend/src/pyclass.rs Outdated Show resolved Hide resolved
…R for additional details regarding the implementation of `str`
Copy link
Contributor

@Icxolu Icxolu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for all the effort! Looking forward to see this landing.

@Icxolu Icxolu added this pull request to the merge queue Jul 17, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Jul 17, 2024
@Zyell
Copy link
Contributor Author

Zyell commented Jul 17, 2024

@Icxolu This merge conflicted with the fix in my other PR. I just fixed the conflict.

@Icxolu Icxolu enabled auto-merge July 17, 2024 22:03
@Icxolu Icxolu added this pull request to the merge queue Jul 17, 2024
Merged via the queue into PyO3:main with commit 5ac5cef Jul 17, 2024
41 of 42 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants