-
Notifications
You must be signed in to change notification settings - Fork 783
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Derive Macro for IntoPyDict #3350
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution :)
Before doing an in depth review, I'd like to see the following addressed:
- Can you give a motivating example for this? Is this useful to you personally, and how would you use it? Think in terms of another user seeing this and asking themselves "should i implement this for my structs? Is this useful?"
- Don't use
&str
and/orString
. Instead usesyn
's types and compose code with thequote!
macro, like our other macro code does. - Don't panic in macro code, the user experience with this is really bad. Instead use
syn::Error
andsyn::Result
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this functionality.
Overall I'm ok with having this, I've seen lots of manual implementations of to_dict
so having a way to make to make this easier is welcome. An alternative is #1703 which could add a py_dict!
macro, that's still more boilerplate than a derive.
The biggest design question in my head is what happens when a #[pyclass]
implements IntoPyDict
and is included in a sub-structure. e.g.
#[pyclass]
#[derive(IntoPyDict)]
struct A { }
#[pyclass]
#[derive(IntoPyDict)]
struct B {
pub a: A,
}
Should IntoPyDict
do a deep conversion into Python dictionaries, or should it be shallow? In the example above, should IntoPyDict
for B
result in {'a': {} }
or {'a': <object of type A>}
?
Is there a right answer, or should this be configurable?
How does this macro interact with other PyO3 options? I can imagine at least #[pyo3(crate = ...)]
and #[pyo3(name = "...")]
would affect this.
Python 3.7 added the dataclasses module which enabled users of the language to write classes in a similar syntax to rust structs or typescript types. In addition to being syntactic sugar over the init dunder, it also came with an asdict function which could turn any dataclass into a dictionary. If a field of a dataclass was a dataclass, then it would turn that field into a dataclass. If it was just a normal class then it would do a shallow copy. I suggest we follow this format (though I do not have it implemented this way yet). Since in every case, the composed struct must implement IntoPyDict, it would be a deepcopy. If someone wanted a shallow copy, they could implement IntoPyDict on the struct thesemselves. The official documentation for dataclass's asdict can be found here. I haven't tested the macro interaction for this. |
I don't think it would be affected by pyo3 options because it directly takes the field names of the class as the dictionary keys. Is this a wrong designwise? |
Im having a little trouble understanding what |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of #[pyo3(crate = "foo")]
is to redirect the macro to use foo
instead of pyo3
as the top-level symbol. There are several reasons to do this; within test_hygiene
we use this to run the macros within pyo3's lib test suite. Users might have pyo3 as a transient dependency e.g. foo::pyo3
.
The way I suggest the macro should handle it is emit an outer const _: () = { use pyo3 as _pyo3; <impl> };
structure, where <impl>
is the derive macro and then can use _pyo3
everywhere internally.
At the moment this implementation is significantly different from the rest of our macro machinery. That's not the end of the world, but I do think there are a few edge cases which will need to be addressed which would be helped by using the existing machinery.
Also, the implementation definitely needs to move from parsing type names as strings to emitting code which gets the trait solver to work out the final details. Users can freely rename types or add new types etc, so parsing type names is generally something we use as a last resort rather than something core to the implementation.
const COL_NAMES: [&str; 8] = [ | ||
"BTreeSet", | ||
"BinaryHeap", | ||
"Vec", | ||
"HashSet", | ||
"LinkedList", | ||
"VecDeque", | ||
"BTreeMap", | ||
"HashMap", | ||
]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel quite strongly that instead of special-casing this macro on names of types the implementation should emit code based on traits. In this case, if we go with the proposal of deep IntoPyDict
, we should use autoref-specialization which prefers IntoPyDict
if implemented and otherwise falls back on IntoPy<PyObject>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this apply to collections as well (ie Vecs, Sets etc)? These would get stored as a PyObject instead of a PyList?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And how do you recommend I use autoref specialization? I don't have much experience with it. Is there a crate which makes it easier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A separate crate usually doesn't help because abstracting the pattern away is often more complex than just applying it.
In this case, the general technique is described at https://github.com/dtolnay/case-studies/blob/master/autoref-specialization/README.md and you can find an example of its application in PyO3's existing macros by looking for usages of the type PyClassImplCollector
defined at
Line 109 in ae982b8
pub struct PyClassImplCollector<T>(PhantomData<T>); |
and for example its relation to the trait PyClassNewTextSignature
defined at
Line 995 in ae982b8
pub trait PyClassNewTextSignature<T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I add the specialization feature as a crate attribute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No; the specialization feature is incomplete and unsound. Autoref specialization is a technique which works on stable Rust to get a specialization-like result for some limited cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be some issues with reference moving when I try to do autoref specialization. Sorry it took me a while for this, I was finishing up an internship
|
||
for (i, char_val) in binding.chars().enumerate() { | ||
if char_val == ',' && stack.is_empty() { | ||
if binding[start..i].starts_with('#') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you might want to take a look at take_pyo3_options
to see how the rest of the crate parses this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment regarding strings still stands. You shouldn't be parsing a single String
, &str
or char
in this code. Use the appropriate syn
types instead. In general: when writing macros if you feel you have to use strings you are either using quote/syn wrong or you just haven't found the right function/method yet.
check_type(&ast).unwrap(); | ||
let ident = ast.ident.into_token_stream(); | ||
let clause_wrapped = ast.generics.where_clause.clone(); | ||
let mut where_clause: TokenStream2 = TokenStream2::new(); | ||
let generic_params: TokenStream2 = parse_generics(&ast.generics).parse().unwrap(); | ||
let generics = ast.generics.into_token_stream(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These and the other unwraps should use into_compile_error
to create errors instead.
} | ||
} | ||
|
||
pub fn check_type(input: &DeriveInput) -> syn::Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should not exist. Instead match on the DeriveInput
and forward to a function that handles structs specifically. Something like...
pub fn handle_deriveinput(input: &DeriveInput) -> syn::Result<???> {
match input.data {
syn::Data::Struct(strukt @ syn::Datastruct { fields: syn::Fields::Named(_), ..}) => {
handle_struct(strukt)
}
_ => Err(syn::Error::new(
// span,
"IntoPyDict derive is only possible for structs with named fields",
))
}
}
pub fn handle_struct(input: &syn::DataStruct) -> syn::Result<???>{
// ...
}
Instead of having a bunch of "check" functions, parse, don't validate. Did you spot the bug/oversight in your code that this code does not have? What about structs without fields? 🙃
|
||
#[derive(Debug, Clone)] | ||
pub struct Pyo3DictField { | ||
name: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be an Ident
or &Ident
, not String
.
let ident = ast.ident.into_token_stream(); | ||
let clause_wrapped = ast.generics.where_clause.clone(); | ||
let mut where_clause: TokenStream2 = TokenStream2::new(); | ||
let generic_params: TokenStream2 = parse_generics(&ast.generics).parse().unwrap(); | ||
let generics = ast.generics.into_token_stream(); | ||
|
||
if let Some(clause) = clause_wrapped { | ||
where_clause = clause.into_token_stream(); | ||
} | ||
let mut dict_fields: Pyo3Collection = Pyo3Collection(Vec::new()); | ||
for token in item { | ||
let token_stream: syn::__private::TokenStream = token.into(); | ||
dict_fields += parse_macro_input!(token_stream as Pyo3Collection); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't go back and forth between syn
types, tokenstreams and strings here and elsewhere. Parse it into DeriveInput
, once, then just write functions that take that or the types contained in it.
pub fn parse_generics(generics: &Generics) -> String { | ||
if !generics.params.is_empty() { | ||
let mut generics_parsed = "<".to_string(); | ||
|
||
for param in &generics.params { | ||
match param { | ||
syn::GenericParam::Lifetime(lt) => { | ||
generics_parsed += ("'".to_string() + <.lifetime.ident.to_string()).as_str() | ||
} | ||
syn::GenericParam::Type(generic_type) => { | ||
generics_parsed += generic_type.ident.to_string().as_str() | ||
} | ||
syn::GenericParam::Const(const_type) => { | ||
generics_parsed += | ||
("const".to_string() + const_type.ident.to_string().as_str()).as_str() | ||
} | ||
} | ||
|
||
generics_parsed += ","; | ||
} | ||
|
||
generics_parsed = generics_parsed[0..generics_parsed.len() - 1].to_string(); | ||
generics_parsed += ">"; | ||
generics_parsed | ||
} else { | ||
String::new() | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Ill keep that in mind
Im finding that I cannot parse the pyo3 options when the struct is annotated with a |
Not totally sure what the problem you're encountering is, probably it's related to the fact that one of the macros will be running on the result of the other. The solution might be to detect this case and run both expansions at the same time. |
Thanks for the contributions here. It seems like this work has stalled, and I'm doing a bit of housekeeping this morning so I'm going to close this PR for now. I think there are still open design questions and the implementation needed further work. If anyone is interested in continuing this please comment and we can discuss design. |
Create derive macro for IntoPyDict trait. Derives all primitives and std::collections except for BTreeMap and HashMap. Currently does not work on enums or tuple structs, but support will be added later.