Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Derive Macro for IntoPyDict #3350

Closed
wants to merge 57 commits into from
Closed

Derive Macro for IntoPyDict #3350

wants to merge 57 commits into from

Conversation

vidur2
Copy link

@vidur2 vidur2 commented Jul 28, 2023

Create derive macro for IntoPyDict trait. Derives all primitives and std::collections except for BTreeMap and HashMap. Currently does not work on enums or tuple structs, but support will be added later.

Copy link
Member

@mejrs mejrs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution :)

Before doing an in depth review, I'd like to see the following addressed:

  • Can you give a motivating example for this? Is this useful to you personally, and how would you use it? Think in terms of another user seeing this and asking themselves "should i implement this for my structs? Is this useful?"
  • Don't use &str and/or String. Instead use syn's types and compose code with the quote! macro, like our other macro code does.
  • Don't panic in macro code, the user experience with this is really bad. Instead use syn::Error and syn::Result.

tests/test_intopydict.rs Outdated Show resolved Hide resolved
@vidur2 vidur2 requested a review from mejrs July 29, 2023 14:28
Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this functionality.

Overall I'm ok with having this, I've seen lots of manual implementations of to_dict so having a way to make to make this easier is welcome. An alternative is #1703 which could add a py_dict! macro, that's still more boilerplate than a derive.

The biggest design question in my head is what happens when a #[pyclass] implements IntoPyDict and is included in a sub-structure. e.g.

#[pyclass]
#[derive(IntoPyDict)]
struct A { }

#[pyclass]
#[derive(IntoPyDict)]
struct B {
    pub a: A,
}

Should IntoPyDict do a deep conversion into Python dictionaries, or should it be shallow? In the example above, should IntoPyDict for B result in {'a': {} } or {'a': <object of type A>}?

Is there a right answer, or should this be configurable?

How does this macro interact with other PyO3 options? I can imagine at least #[pyo3(crate = ...)] and #[pyo3(name = "...")] would affect this.

@vidur2
Copy link
Author

vidur2 commented Jul 30, 2023

Python 3.7 added the dataclasses module which enabled users of the language to write classes in a similar syntax to rust structs or typescript types. In addition to being syntactic sugar over the init dunder, it also came with an asdict function which could turn any dataclass into a dictionary. If a field of a dataclass was a dataclass, then it would turn that field into a dataclass. If it was just a normal class then it would do a shallow copy. I suggest we follow this format (though I do not have it implemented this way yet). Since in every case, the composed struct must implement IntoPyDict, it would be a deepcopy. If someone wanted a shallow copy, they could implement IntoPyDict on the struct thesemselves. The official documentation for dataclass's asdict can be found here. I haven't tested the macro interaction for this.

@vidur2
Copy link
Author

vidur2 commented Jul 30, 2023

I don't think it would be affected by pyo3 options because it directly takes the field names of the class as the dictionary keys. Is this a wrong designwise?

@vidur2
Copy link
Author

vidur2 commented Jul 30, 2023

Im having a little trouble understanding what #[pyo3(crate=...)] does. Do you mind explaining it to me?

@vidur2 vidur2 requested a review from davidhewitt July 31, 2023 13:37
Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose of #[pyo3(crate = "foo")] is to redirect the macro to use foo instead of pyo3 as the top-level symbol. There are several reasons to do this; within test_hygiene we use this to run the macros within pyo3's lib test suite. Users might have pyo3 as a transient dependency e.g. foo::pyo3.

The way I suggest the macro should handle it is emit an outer const _: () = { use pyo3 as _pyo3; <impl> }; structure, where <impl> is the derive macro and then can use _pyo3 everywhere internally.

At the moment this implementation is significantly different from the rest of our macro machinery. That's not the end of the world, but I do think there are a few edge cases which will need to be addressed which would be helped by using the existing machinery.

Also, the implementation definitely needs to move from parsing type names as strings to emitting code which gets the trait solver to work out the final details. Users can freely rename types or add new types etc, so parsing type names is generally something we use as a last resort rather than something core to the implementation.

Comment on lines +10 to +19
const COL_NAMES: [&str; 8] = [
"BTreeSet",
"BinaryHeap",
"Vec",
"HashSet",
"LinkedList",
"VecDeque",
"BTreeMap",
"HashMap",
];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel quite strongly that instead of special-casing this macro on names of types the implementation should emit code based on traits. In this case, if we go with the proposal of deep IntoPyDict, we should use autoref-specialization which prefers IntoPyDict if implemented and otherwise falls back on IntoPy<PyObject>.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this apply to collections as well (ie Vecs, Sets etc)? These would get stored as a PyObject instead of a PyList?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And how do you recommend I use autoref specialization? I don't have much experience with it. Is there a crate which makes it easier?

Copy link
Member

@adamreichold adamreichold Aug 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A separate crate usually doesn't help because abstracting the pattern away is often more complex than just applying it.

In this case, the general technique is described at https://github.com/dtolnay/case-studies/blob/master/autoref-specialization/README.md and you can find an example of its application in PyO3's existing macros by looking for usages of the type PyClassImplCollector defined at

pub struct PyClassImplCollector<T>(PhantomData<T>);

and for example its relation to the trait PyClassNewTextSignature defined at

pub trait PyClassNewTextSignature<T> {

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I add the specialization feature as a crate attribute?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No; the specialization feature is incomplete and unsound. Autoref specialization is a technique which works on stable Rust to get a specialization-like result for some limited cases.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There seems to be some issues with reference moving when I try to do autoref specialization. Sorry it took me a while for this, I was finishing up an internship

tests/test_intopydict.rs Show resolved Hide resolved

for (i, char_val) in binding.chars().enumerate() {
if char_val == ',' && stack.is_empty() {
if binding[start..i].starts_with('#') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you might want to take a look at take_pyo3_options to see how the rest of the crate parses this.

tests/test_intopydict.rs Show resolved Hide resolved
Copy link
Member

@mejrs mejrs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment regarding strings still stands. You shouldn't be parsing a single String, &str or char in this code. Use the appropriate syn types instead. In general: when writing macros if you feel you have to use strings you are either using quote/syn wrong or you just haven't found the right function/method yet.

Comment on lines +168 to +173
check_type(&ast).unwrap();
let ident = ast.ident.into_token_stream();
let clause_wrapped = ast.generics.where_clause.clone();
let mut where_clause: TokenStream2 = TokenStream2::new();
let generic_params: TokenStream2 = parse_generics(&ast.generics).parse().unwrap();
let generics = ast.generics.into_token_stream();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These and the other unwraps should use into_compile_error to create errors instead.

}
}

pub fn check_type(input: &DeriveInput) -> syn::Result<()> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should not exist. Instead match on the DeriveInput and forward to a function that handles structs specifically. Something like...

pub fn handle_deriveinput(input: &DeriveInput) -> syn::Result<???> {
    match input.data {
        syn::Data::Struct(strukt @ syn::Datastruct { fields: syn::Fields::Named(_), ..}) => {
                handle_struct(strukt)
        }
        _ => Err(syn::Error::new(
                    // span,
                    "IntoPyDict derive is only possible for structs with named fields",
                ))
    }
}
 
pub fn handle_struct(input: &syn::DataStruct) -> syn::Result<???>{
    // ...
}

Instead of having a bunch of "check" functions, parse, don't validate. Did you spot the bug/oversight in your code that this code does not have? What about structs without fields? 🙃


#[derive(Debug, Clone)]
pub struct Pyo3DictField {
name: String,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be an Ident or &Ident, not String.

Comment on lines +169 to +182
let ident = ast.ident.into_token_stream();
let clause_wrapped = ast.generics.where_clause.clone();
let mut where_clause: TokenStream2 = TokenStream2::new();
let generic_params: TokenStream2 = parse_generics(&ast.generics).parse().unwrap();
let generics = ast.generics.into_token_stream();

if let Some(clause) = clause_wrapped {
where_clause = clause.into_token_stream();
}
let mut dict_fields: Pyo3Collection = Pyo3Collection(Vec::new());
for token in item {
let token_stream: syn::__private::TokenStream = token.into();
dict_fields += parse_macro_input!(token_stream as Pyo3Collection);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't go back and forth between syn types, tokenstreams and strings here and elsewhere. Parse it into DeriveInput, once, then just write functions that take that or the types contained in it.

Comment on lines +316 to +343
pub fn parse_generics(generics: &Generics) -> String {
if !generics.params.is_empty() {
let mut generics_parsed = "<".to_string();

for param in &generics.params {
match param {
syn::GenericParam::Lifetime(lt) => {
generics_parsed += ("'".to_string() + &lt.lifetime.ident.to_string()).as_str()
}
syn::GenericParam::Type(generic_type) => {
generics_parsed += generic_type.ident.to_string().as_str()
}
syn::GenericParam::Const(const_type) => {
generics_parsed +=
("const".to_string() + const_type.ident.to_string().as_str()).as_str()
}
}

generics_parsed += ",";
}

generics_parsed = generics_parsed[0..generics_parsed.len() - 1].to_string();
generics_parsed += ">";
generics_parsed
} else {
String::new()
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! Ill keep that in mind

@vidur2
Copy link
Author

vidur2 commented Aug 14, 2023

Im finding that I cannot parse the pyo3 options when the struct is annotated with a #[pyclass] attribute macro. Is there an alterantive way to do this within pyo3's macro system instead of that provided by syn

@davidhewitt
Copy link
Member

Not totally sure what the problem you're encountering is, probably it's related to the fact that one of the macros will be running on the result of the other.

The solution might be to detect this case and run both expansions at the same time.

@davidhewitt
Copy link
Member

Thanks for the contributions here. It seems like this work has stalled, and I'm doing a bit of housekeeping this morning so I'm going to close this PR for now.

I think there are still open design questions and the implementation needed further work. If anyone is interested in continuing this please comment and we can discuss design.

@davidhewitt davidhewitt closed this Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants