Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initialize cstrings with the empty string. #4359

Closed
wants to merge 1 commit into from

Conversation

fruffy
Copy link
Collaborator

@fruffy fruffy commented Jan 24, 2024

This PR is meant to raise a discussion. We stumbled across this in #4357 where an incorrectly initialized cstring caused a bug. This PR changes the default initializer to use the empty string instead of nullpointer.

I am not sure about the motivation about making the default cstring invalid, could be performance related?

Alternatively, we could forbid this empty constructor. This could expose a lot of incorrectly constructed cstrings.

@fruffy fruffy added the core Topics concerning the core segments of the compiler (frontend, midend, parser) label Jan 24, 2024
@fruffy fruffy force-pushed the fruffy/cstring_initialization branch from 11dfc9e to efa5b21 Compare January 24, 2024 22:40
@fruffy fruffy force-pushed the fruffy/cstring_initialization branch from efa5b21 to 34cca86 Compare January 24, 2024 22:50
@jafingerhut
Copy link
Contributor

I am not here to guide a technical discussion on the contents of the cstring class, only to point out what looks like the main relevant history that I could find in the commit logs.

The discussion on this PR from Sep 2018 includes at least @mihaibudiu and @ChrisDodd who weighed in on that round of changes to the cstring implementation in p4c: #1463

The original version seems to be from Mihai in 2016.

@fruffy
Copy link
Collaborator Author

fruffy commented Jan 24, 2024

Well, the CI failures expose that there is a LOT of code in the compiler that assumes that cstring can be a nullpointer. Which is a little unpleasant. So this change might be non-trivial. Eventually we may phase out most of cstring in favor of std::string_view.

@vlstill
Copy link
Contributor

vlstill commented Jan 25, 2024

Eventually we may phase out most of cstring in favor of std::string_view.

We use cstring a lot as an owning object, so it would not be possible to replace it with std::string_view (unless we put heap-allocated pointers that noone but GC is going to collect in them, which seems little weird to me).

Side note: overall, if we wanted to take such big refactoring steps, I think there should be greater discussion on what is a good place to invest into. Just on top of my head, I can think about at least 1 really big issue that I would rank higher then cstring (but is also higher cost to change) -- I think it would make sense to use GC just as opt-in for certain classes/places, for example the IR::Node-derived once, but not for everything. The current state with replaced malloc tests to be quite fragile and it most cases there is no real reason to use GC, the C++ way of managing memory is just fine (for IR nodes the argument could be that GC is likely better than ref-counting which is the only reasonable alternative there).

@vlstill
Copy link
Contributor

vlstill commented Jan 25, 2024

Well, the CI failures expose that there is a LOT of code in the compiler that assumes that cstring can be a nullpointer.

Looks like it is all (that we can see now) related to generated IR code, so it might not be that hard to change it. I wonder why would anyone use the null cstring as something different then empty one, this looks like a really confusing design to me :-/.

@ChrisDodd
Copy link
Contributor

I think it would make sense to use GC just as opt-in for certain classes/places, for example the IR::Node-derived once, but not for everything. The current state with replaced malloc tests to be quite fragile and it most cases there is no real reason to use GC, the C++ way of managing memory is just fine (for IR nodes the argument could be that GC is likely better than ref-counting which is the only reasonable alternative there).

Related to this, I've been working on code (in https://github.com/ChrisDodd/p4c/tree/cdodd-disablegc) to allow removing the garbage collector and use std::shared_ptr instead. It still requires work though, and its been a low-priority background project for me.

@ChrisDodd
Copy link
Contributor

ChrisDodd commented Feb 12, 2024

There's a fair amount of code that uses cstring as kind of an optional<const string> -- that is, it might be a string (which can't be changed) or might be a nullptr. So an empty string and nullptr are logically two different things.

I think if you add cstring::operator bool() const { return str && *str; }, most of the failures seen here will go away. Not sure if anything else will fail, however.

@fruffy
Copy link
Collaborator Author

fruffy commented Feb 12, 2024

We use cstring a lot as an owning object, so it would not be possible to replace it with std::string_view (unless we put heap-allocated pointers that noone but GC is going to collect in them, which seems little weird to me).

Well, in that case we could likely replace these with std::string. Although I am not sure about the object-copy-cost in some cases.

There's a fair amount of code that uses cstring as kind of an optional -- that is, it might be a string (which can't be changed) or might be a nullptr. So an empty string and nullptr are logically two different things.
I think if you add cstring::operator bool() const { return str && *str; }, most of the failures seen here will go away. Not sure if anything else will fail, however.

I also tried to simply forbid the empty constructor. It requires a bit more work but in case where we need the optional we can now simply use the actual std::optional.

@fruffy
Copy link
Collaborator Author

fruffy commented Feb 12, 2024

Related to this, I've been working on code (in https://github.com/ChrisDodd/p4c/tree/cdodd-disablegc) to allow removing the garbage collector and use std::shared_ptr instead. It still requires work though, and its been a low-priority background project for me.

This is great, the garbage collector causes quite a few problems nowadays. I will create an issue to track this.

@fruffy fruffy added the breaking-change This change may break assumptions of compiler back ends. label May 29, 2024
@fruffy
Copy link
Collaborator Author

fruffy commented May 31, 2024

Closing in favour of #4694.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking-change This change may break assumptions of compiler back ends. core Topics concerning the core segments of the compiler (frontend, midend, parser)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants