Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial yarp-sys (Rust bindings) #1093

Merged
merged 1 commit into from
Aug 15, 2023
Merged

Conversation

turboladen
Copy link
Contributor

@turboladen turboladen commented Jun 29, 2023

Per discussion 1068, here's my first attempt at a yarp-sys crate: Rust bindings to the YARP C API. It's my first time using the bindgen crate (I've only ever hand-rolled my own Rust FFI stuff), so it's possible I've missed a thing or two there. I've added tests to rust/yarp-sys/tests that just attempt to validate that the bindings are getting generated ok. Outside of those tests, there's not really much to the crate.

I've started using these in a separate, higher level crate, and so far they seem to be working alright. Let me know if you'd like more docs or tests or anything else.


Questions:

  • Add a CI job for running tests

rust/yarp-sys/build.rs Outdated Show resolved Hide resolved
rust/yarp-sys/build.rs Outdated Show resolved Hide resolved
rust/yarp-sys/build.rs Outdated Show resolved Hide resolved
rust/yarp-sys/build.rs Outdated Show resolved Hide resolved
@kddnewton
Copy link
Collaborator

Thanks @turboladen! I've asked a couple of folks from our team to take a look at this PR. It's going to be in limbo for a bit because I'm going on paternity leave for 4 weeks, but it's going to stay open because once we discuss everything I think it'd be good to merge.

@turboladen
Copy link
Contributor Author

Thanks @turboladen! I've asked a couple of folks from our team to take a look at this PR. It's going to be in limbo for a bit because I'm going on paternity leave for 4 weeks, but it's going to stay open because once we discuss everything I think it'd be good to merge.

Sounds good! Enjoy your time off!

@ianks
Copy link
Contributor

ianks commented Jun 30, 2023

@turboladen I totally missed this PR when I implemented #1104

Would prefer if yours got merged though since you put in in some great effort here.

Would you be OK with making the output types, etc match the format from #1104?

@turboladen
Copy link
Contributor Author

Would you be OK with making the output types, etc match the format from #1104?

Hm... could you be a bit more specific? Are you talking about all the structs, enums, functions, typedefs, etc? If that's what you mean, I'd think it'd be less effort to just take your PR instead of this one (since you put a lot off effort into that!).


It looks like #1104 does work to remove things like yp_ and _t and to change the cases of enums and such. ...which I get, but in my experience with doing FFI things, if I'm a consumer of the FFI library (i.e. a consumer of yarp-sys), I find it a lot easier to use if the bindings exactly/closely match the C API. If/when you have to look at the C source or docs to figure out some behavior or something, there's mental overhead to translate, say parser, to yp_parser_t, as opposed to if the FFI type is exactly the same as the C type; which it's maybe no big deal if you're just looking for 1 thing, but if you get down a rabbit hole, there just becomes more you have to keep on that stack in your brain.

Sure, the bindgen generated stuff here isn't Rust-y, but I think the whole point in having the bindings is to take them and put a nice Rust API on top of those, and anyone who's doing that is going to need to refer to the C code to make sure they're doing the right thing. ...and IMHO, having those closer to the C code is a plus, not a minus.

I suppose another upside in the unaltered bindings is that there's almost nothing to maintain? IDK how much of an argument that is though since the main point is to get them right.

Thoughts?

@ianks
Copy link
Contributor

ianks commented Jul 5, 2023

It looks like #1104 does work to remove things like yp_ and _t and to change the cases of enums and such. ...which I get, but in my experience with doing FFI things, if I'm a consumer of the FFI library (i.e. a consumer of yarp-sys), I find it a lot easier to use if the bindings exactly/closely match the C API. If/when you have to look at the C source or docs to figure out some behavior or something, there's mental overhead to translate, say parser, to yp_parser_t, as opposed to if the FFI type is exactly the same as the C type; which it's maybe no big deal if you're just looking for 1 thing, but if you get down a rabbit hole, there just becomes more you have to keep on that stack in your brain.

Sure, the bindgen generated stuff here isn't Rust-y, but I think the whole point in having the bindings is to take them and put a nice Rust API on top of those, and anyone who's doing that is going to need to refer to the C code to make sure they're doing the right thing. ...and IMHO, having those closer to the C code is a plus, not a minus.

I suppose another upside in the unaltered bindings is that there's almost nothing to maintain? IDK how much of an argument that is though since the main point is to get them right.

Thoughts?

I think your logic makes perfect sense when there is a higher level wrapper library which users consume. However, I wasn't confident that someone would end up maintaining that library in the long-term so it seemed like generating something close would be a good middle-ground. The middle ground being:

  • Keep all function names the exact same
  • Make the struct / enum names Rusty
  • Keep original C docs

All of this is to say, if there are plans for a higher level library, then I agree it'd be better to go for the exact C naming like your approach.

Regardless, I trust your call on this one and happy go with whatever way you decide in the end!

@turboladen
Copy link
Contributor Author

However, I wasn't confident that someone would end up maintaining that library in the long-term so it seemed like generating something close would be a good middle-ground.

Ah, yeah, good point on maintainability. It sounds like (per #1068) @kddnewton still wants to assess things before determining if a higher-level lib would also make sense in this repo, but if not here, then I'd think the main point of the bindings would be to enable others to create their own higher-level lib using the bindings that they'd find here. ...in which case I think I'd still argue my point, but I think yours is also valid and would be curious to hear if anyone else had thoughts on the matter.

FWIW, I started plugging away over at turboladen/yarp-rs, where I'd initially planned to just do a straight, general purpose wrapper, but I've kinda paused to think a bit more about my use case (an LSP implementation) in that I might want to wrap yarp-sys in such a way that mixes in some async stuff. I guess I only mention this to say that it's been helpful to have the bindings match the C stuff in that work.

In any case, thanks for your feedback!

@ianks
Copy link
Contributor

ianks commented Jul 6, 2023

Alright, my opinion is now: let's go with the raw naming (yp_) and get this thing shipped! ❤️

I think some of the bindgen config I had would be good here (namely the derives). Want to do a quick pass over the options to see what you think? Other than that it all LGTM:

https://github.com/ianks/yarp/blob/be76362aa4803bb02f420be5e8e3130a72402496/rust/yarp-bindgen/src/main.rs#L126-L146

@turboladen
Copy link
Contributor Author

turboladen commented Jul 6, 2023

Alright, my opinion is now: let's go with the raw naming (yp_) and get this thing shipped! ❤️

I think some of the bindgen config I had would be good here (namely the derives). Want to do a quick pass over the options to see what you think? Other than that it all LGTM:

https://github.com/ianks/yarp/blob/be76362aa4803bb02f420be5e8e3130a72402496/rust/yarp-bindgen/src/main.rs#L126-L146

Most of those look good to me; a couple questions though:

  1. Why .disable_header_comment()? Having the bindgen version in the bindings seems benign, and maybe even helpful to someone down the road (idk). I don't think I care personally, just curious.
  2. These are on by default--any objections to leaving them out?
    • size_t_is_usize(true)
    • layout_tests(true)
    • generate_comments(true)
  3. .generate_cstr(true): this is cool--I hadn't seen it before! It requires rust 1.59.0, so I set rust-version = "1.59.0" in Cargo.toml.
  4. Why .derive_copy(false)? I tend to like having this turned on...
  5. .clang_arg("-fparse-all-comments"): This generates warnings that point to the C docs; I'll leave those C comments alone, but just thought I'd point that out.

image

rust/yarp-sys/Cargo.toml Outdated Show resolved Hide resolved
@ianks
Copy link
Contributor

ianks commented Jul 6, 2023

  1. Why .disable_header_comment()? Having the bindgen version in the bindings seems benign, and maybe even helpful to someone down the road (idk). I don't think I care personally, just curious.

I have the same opinion. Don't care either way.

  1. These are on by default--any objections to leaving them out?

    • size_t_is_usize(true)
    • layout_tests(true)
    • generate_comments(true)

I tend to prefer keeping them in, since it makes understanding-at-a-glance a bit easier. Up to you though.

  1. .generate_cstr(true): this is cool--I hadn't seen it before! It requires rust 1.59.0, so I set rust-version = "1.59.0" in Cargo.toml.

Yeah, we dont need this...

  1. Why .derive_copy(false)? I tend to like having this turned on...

I think for certain types we should derive Copy, but I was more concerned about "freeable" types, such as Parser and Buffer, etc. Copy doesn't make much sense on these since they naturally need drop semantics.

  1. .clang_arg("-fparse-all-comments"): This generates warnings that point to the C docs; I'll leave those C comments alone, but just thought I'd point that out.

image

Yeah it's unfortunate... In rb-sys I did some janky stuff to make them more markdown friendly. Not necessary here unless you want to do a quick hack job. Maybe one day yarp will ship with an RDoc parser and we can do a proper transformation! 😆

On another note, ideally we wouldn't need this flag, but the comments use // comments instead of doxygen style (/**)... @kddnewton would be could to convert these at some point.

@turboladen
Copy link
Contributor Author

turboladen commented Jul 8, 2023

On the bindings builder comment...

  1. I left out .disable_header_comment().
  2. I added the 3 methods in that are on by default (your point was well taken).
  3. I wasn't sure exactly what you mean about the generate_cstr() comment--were you saying to remove the generate_cstr() call or remove the ruby-version setting in Cargo.toml? I'm fine either way with using generate_cstr(), but if we do, I think having the min version set will provide a better experience for someone that's using an old version of rust.
  4. I added the derive_copy(false) call', makes sense.
  5. Good to know about the docs generating.

@ianks
Copy link
Contributor

ianks commented Jul 13, 2023

I think this is in a good enough state to ship and start messing around with, thanks so much @turboladen!

@kddnewton Whenever you are ready, free free to hit that :shipit: button!

@eregon
Copy link
Member

eregon commented Jul 17, 2023

This most likely needs a CI check to avoid breaking unknowingly.

@turboladen
Copy link
Contributor Author

turboladen commented Jul 28, 2023

This most likely needs a CI check to avoid breaking unknowingly.

Sorry I missed this comment! I just added a new rust-bindings workflow.

I didn't make any changes to the existing workflows, but it might make sense to skip running some of those jobs if only rust/* stuff changes. Shall I update those too?


Edit: Looks like thebuild* jobs in main.yaml are failing now (doesn't like finding the rust files so I'll be making some changes there.

@kddnewton
Copy link
Collaborator

@turboladen looks like it might just be needing to ignore the files in the manifest check.

@turboladen
Copy link
Contributor Author

@turboladen looks like it might just be needing to ignore the files in the manifest check.

@kddnewton I'm having a bit of a time trying to track this down. I see that the log has:

(see /home/runner/work/yarp/yarp/tasks/check_manifest.rake)

...but I a) don't see this file in the repo, and b) am not able to grep for anything that looks relevant. Could you point me in the right direction (a bit more :D)? Sorry, I'm not super familiar with Ruby<->C build things. 👀

@kddnewton
Copy link
Collaborator

Hey @turboladen, no problem, it's not exactly intuitive.

For this, you should add rust to the ignore_directories array here: https://github.com/ruby/yarp/blob/main/rakelib/check_manifest.rake#L5.

@kddnewton
Copy link
Collaborator

Also if you wouldn't mind squashing the commits, then I think this will basically be good to go!

@turboladen
Copy link
Contributor Author

For this, you should add rust to the ignore_directories array here: https://github.com/ruby/yarp/blob/main/rakelib/check_manifest.rake#L5.

Interesting... I don't have rakelib/ in my branch. I suppose that means I need to pull in some changes. :) Thanks--I'll get that in.

@turboladen
Copy link
Contributor Author

Also if you wouldn't mind squashing the commits, then I think this will basically be good to go!

You betcha--once the CI stuff is sorted, I'll squash away and post back.

README.md Outdated Show resolved Hide resolved
Comment on lines 5 to 10
yp_string_t_type, yp_unescape_calculate_difference, yp_unescape_manipulate_string,
yp_unescape_type_t,
};

#[test]
fn unescape_manipulate_string_test() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yp_unescape_ is internal stuff, I don't think it makes sense to expose that to Rust.
(it's exposed under YARP::Debug for YARP internal testing but that's it, it's not meant to be used by anything but YARP itself)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intent was to follow the YP_EXPORTED_FUNCTIONs, but it sounds like that was a mistake. Is there a better way to deduce what's in the public API?

@kddnewton
Copy link
Collaborator

This thread is getting difficult to parse through because of the number of comments. I think at this point I'm pretty satisfied with the work that has been done, and I'm happy to merge this as is and then iterate on the main branch going forward before we release to cargo. @turboladen if you don't mind just checking out the CI failures, once this is passing I'm happy to merge.

@turboladen
Copy link
Contributor Author

I think this latest push should fix all the tests. Sorry for the churn. There are still 3 failures total in the sanitizer jobs, pointing to memory leaks in the tests. Normally I'd want to fix those before merging, but given a) the length this PR has been open, and b) the fact that I'm on a Mac and can't easily run tests using the sanitizers (normally I set up a small docker container and run inside those; IMO it'd be nice to commit that setup for other devs on Mac that end up here), personally I'd like to merge this, then do a subsequent PR for addressing the leaks in the tests. If, of course, you'd like me to address those before merging, that's totally understandable, just let me know.

@kddnewton
Copy link
Collaborator

I don't want to merge if the CI is red and it's leaking. I went through the leaks and I believe I've found them all. I've added a patch file here you can apply that should fix them all:

test.patch

Once that's applied, I'm good to merge.

@turboladen
Copy link
Contributor Author

I don't want to merge if the CI is red and it's leaking. I went through the leaks and I believe I've found them all. I've added a patch file here you can apply that should fix them all

Thanks @kddnewton! Patch applied, rebased, and pushed.

@kddnewton
Copy link
Collaborator

@turboladen I think it may have lost some of your more recent changes, like where you caught it up to 0.7.0.

@turboladen
Copy link
Contributor Author

@turboladen I think it may have lost some of your more recent changes, like where you caught it up to 0.7.0.

Yeah, you're right. And it'd been a few days since I'd pulled & rebased main, so I figured I'd do that. ...but now I'm getting:

error[E0432]: unresolved import `yarp_sys::yp_list_init`

Seems like that something related changed upstream? Looking into it.

@kddnewton
Copy link
Collaborator

Ahh man, yeah that function doesn't exist anymore. It's a macro that's YP_LIST_EMPTY that just sets everything to 0. If you can calloc it should be fine.

Copy link
Member

@eregon eregon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through rust/yarp-sys/build.rs and trying to note what's internal or what's not necessary to expose for parsing and pack format parsing.

.allowlist_type("yp_diagnostic_t")
.allowlist_type("yp_encoding_changed_callback_t")
.allowlist_type("yp_encoding_decode_callback_t")
.allowlist_type("yp_memsize_t")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this needs to be exposed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment meant to only quote one line, I think the concurrent git push messed it up.
Basically assume I'm only talking about the last line of every comment unless mentioned otherwise.

.allowlist_type("yp_comment_t")
.allowlist_type("yp_diagnostic_t")
.allowlist_type("yp_encoding_changed_callback_t")
.allowlist_type("yp_encoding_decode_callback_t")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only needed for encodings YARP does not understand. I would expect this is extremely rare, so probably unnecessary to expose.

.allowlist_type("yp_buffer_t")
.allowlist_type("yp_comment_t")
.allowlist_type("yp_diagnostic_t")
.allowlist_type("yp_encoding_changed_callback_t")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this needs to be exposed.
The encoding can be read after yp_parse(), I don't there is a need to know it before.

rust/yarp-sys/build.rs Show resolved Hide resolved
.allowlist_type("yp_list_t")
.allowlist_type("yp_node_t")
.allowlist_type("yp_node_type")
.allowlist_type("yp_parser_t")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This struct has way too much internal impl details, but unfortunately there are not enough functions yet to access it like an opaque pointer or so, e.g. for comments/errors/warnings, encoding and maybe start/end.
So until then it has to be exposed to Rust.

.allowlist_function("yp_list_empty_p")
.allowlist_function("yp_list_free")
.allowlist_function("yp_list_init")
.allowlist_function("yp_node_memsize")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this needs to be exposed.

Comment on lines +117 to +118
.allowlist_function("yp_parser_register_encoding_changed_callback")
.allowlist_function("yp_parser_register_encoding_decode_callback")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this needs to be exposed (see above).

.allowlist_function("yp_parser_register_encoding_changed_callback")
.allowlist_function("yp_parser_register_encoding_decode_callback")
.allowlist_function("yp_prettyprint")
.allowlist_function("yp_regexp_named_capture_group_names")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this needs to be exposed, i.e. I think the name are already exposed in the resulting AST nodes

.allowlist_function("yp_prettyprint")
.allowlist_function("yp_regexp_named_capture_group_names")
.allowlist_function("yp_serialize")
.allowlist_function("yp_size_to_native")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this needs to be exposed

Comment on lines +129 to +130
.allowlist_function("yp_unescape_calculate_difference")
.allowlist_function("yp_unescape_manipulate_string")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this needs to be exposed

@kddnewton
Copy link
Collaborator

@turboladen sorry, another function signature changed out from under you. I'll get you a patch ready for that.

@eregon I agree this exposes a lot of stuff, but in the interest of getting this PR merged, I'm overlooking that in the hopes to change it on main. It's already very difficult to read through this whole thread.

@eregon
Copy link
Member

eregon commented Aug 15, 2023

Yeah I'm OK if this is cleaned up after merge. It should be done somewhat soon, because I don't think we want to spend much effort updating Rust files if some of it goes away like the unescape test.

@eregon
Copy link
Member

eregon commented Aug 15, 2023

I think the one thing I'd like done in this PR is removing rust/yarp-sys/tests/unescape_tests.rs, there is no point to try to fix it if the corresponding API is internal and there is no real value for it to be public or exposed to Rust.

@eregon
Copy link
Member

eregon commented Aug 15, 2023

And the fact the CI specifically fails on that is good coincidence, better remove that test file now than fix it for nothing.

@kddnewton
Copy link
Collaborator

Actually yeah @turboladen if you want to just remove unescape_tests.rs it should be passing and then let's merge.

@turboladen
Copy link
Contributor Author

Ahh man, yeah that function doesn't exist anymore. It's a macro that's YP_LIST_EMPTY that just sets everything to 0. If you can calloc it should be fine.

Ok, fixed that, but am finding some unexpected behavior (tests that used to pass aren't anymore):

  1. parser_tests::diagnostics_test has the parser parse some invalid Ruby (class Foo;), then checks parser.error_list and asserts the message there (maybe not the most sustainable test, but it was something...). After rebasing, parser.error_list is null.
  2. parser_tests::encode_change_test tests that parser.encoding_changed got set when parsing an # encoding: attrib. That used to get set, but is now null.
  3. parser_tests::encoding_decode_test. parser.encoding.name used to get populated during that test and now it's null. I was just trying to show an example of using a callback for yp_parser_register_encoding_decode_callback.

I think 2 and 3 could probably just get deleted (looking at the sidebar convo about what should be exposed to rust; these functions probably could be excluded (?) and thus the tests removed). ...but I'm not sure what to do about test 1. That seems like it should be working.

@turboladen
Copy link
Contributor Author

Actually yeah @turboladen if you want to just remove unescape_tests.rs it should be passing and then let's merge.

Will do.

Still need to add more tests

Update Cargo.toml; add README

Switch yp_string_t_type variants to SNAKE_CASE

Add unescape tests

Add encoding callback tests

Add pack_parse test

Add diagnostic test

Add comment test

Add node tests

Add string_list tests

Add other string tests

Add shared string test

Add list tests

Fixes for updated branch

Run bundle install before running Rust tests

Fix version test

ci: Add proper config for rust-toolchain step for sanitizers

ci: Fix tests, clippy

Remove extra `bundle install`; run `bundle exec rake`

Didn't realize `setup-ruby`'s `bundle-cache: true` runs `bundle install`.

Remove `rake compile` from build.rs

This is complicating CI for me; maybe we add it back later.

Undo README formatting changes

Fix UB in C callbacks

Use slice+str instead of String for raw things

Move bindings to bindings module

Handle non-UTF-8 strings in paths

rust ci: test with sanitizers; add -D warnings

Update rust-bindings.yml

Update Cargo.toml

Don't need to compile extra crate_types

PR changes

Apply patch from @kddnewton

Delete unescape_tests.rs

Fix things after rebasing
@turboladen
Copy link
Contributor Author

@eregon for my part, I've got some PR fatigue here and consequently would like to get this whole baseline merged in; I think your feedback on the API surface area is great and should be addressed sooner than later. Just my two cents on the matter though.

@kddnewton
Copy link
Collaborator

@turboladen thank you so much for you patience. I think I can take it from here. I really appreciate all the effort you put in! If you aren't too burned out and feel like helping out on the main yarp crate (not just yarp-sys) I think we can start on that soon.

@kddnewton kddnewton merged commit 11295d2 into ruby:main Aug 15, 2023
38 of 41 checks passed
@turboladen
Copy link
Contributor Author

@turboladen thank you so much for you patience. I think I can take it from here. I really appreciate all the effort you put in! If you aren't too burned out and feel like helping out on the main yarp crate (not just yarp-sys) I think we can start on that soon.

My pleasure! I do wish I'd have done more work up front to minimize the churn, but oh well. Goes to show how nice it is to have a community to help make things better. But yeah, after a breather, I'd love to stick around and help out with yarp-to-be and yarp-sys.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants