-
Notifications
You must be signed in to change notification settings - Fork 597
embed.fnc: Add ability to assert string ptr is in-bounds #23774
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: blead
Are you sure you want to change the base?
Conversation
d2f1327
to
4da32a7
Compare
die_at_end "$plain_func: u flag only usable with m" if $flags =~ /u/; | ||
die_at_end "$plain_func: u flag only usable with m" | ||
if $flags =~ /u/; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit regen/embed.pl: Comments/white-space
Just a side-note: personally I don't like this style very much but I have no idea what the style-guide says about it (if there even is one).
If I would write I would probably switch to:
if ($flags =~ /u/) {
die_at_end "$plain_func: u flag only usable with m"
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is less desirable than if it could be on the same line, but this paradigm is fairly common in our source. Perhaps it is because I'm a native English speaker, but this feature of Perl where the 'if' is trailing makes things more understandable at a glance to me even when on a separate line, than the alternative.
regen/embed.pl
Outdated
warn "$func: $arg needs NN or NULLOK\n"; | ||
warn "$func: $arg needs NN, [EMS]PTR, or NULLOK\n"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit regen/embed.pl: Add ability to assert(s < e)
A bit of nit-picking: EPTRQ is also an option. Might be implied by EPTR tho.
Slightly wondering if it would be clearer if they are spelled out, i.e.: $arg needs one of NN, SPTR, MPTR, EPTR, EPTRQ, NULLOK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reworded the message
if ( @bounded_strings | ||
&& ! defined $bounded_strings[-1]{$ptr_type}) | ||
{ | ||
$bounded_strings[-1]{$ptr_type} = \%entry; | ||
} | ||
else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit regen/embed.pl: Add ability to assert(s < e)
(Still going through the changes - so comment my be premature but still adding it for now)
At this point it's not clear to me why we can't just add the assert()
here and why the @bounded_strings
is needed.
Based on the comments I have some vague idea on what it's trying to handle but without deeper looking (and/or testing) that is not yet clear to me.
Can you provide an example of the cases that it handles?
Update:
After some more thinking I guess the main problem is that in for example:
ATdp |char * |delimcpy |SPTR char *to \
|EPTRQ const char *to_end \
|SPTR const char *from \
|EPTRQ const char *from_end \
it needs a way to "link" to
and to_end
together.
And at the same time also wants to support:
ATdp |char * |delimcpy |EPTRQ char *to_end \
|SPTR const char *to \
|SPTR const char *from \
|EPTRQ const char *from_end \
(i.e. to
and to_end
switching places)
The real problem: when we see SPTR/EPTR/... the code needs to figure to what string it belong.
I think it might be good to explicitly state that difficulty somewhere in the comments.
Also wondering if it should be made explicit if multiple pointers are present.
That is: using something like:
ATdp |char * |delimcpy |SPTR1 char *to \
|EPTRQ1 const char *to_end \
|SPTR2 const char *from \
|EPTRQ2 const char *from_end \
so that there is no need to make an assumption that they come in pair/in triplets.
It would also make it more flexible since then something like:
ATdp |char * |delimcpy |SPTR char *to \
|SPTR const char *from \
|EPTRQ const char *to_end \
|EPTRQ const char *from_end \
could also be supported. (But I'm not sure if that is really worth the trouble/effort)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added extensive comments to this point in the code that hopefully explain things adequately at this point.
if (1 == ( (defined $string->{S}) | ||
+ (defined $string->{M}) | ||
+ (defined $string->{E}))) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit regen/embed.pl: Add ability to assert(s < e)
I'm wondering.. for MPTR: should it enforce that it has both a SPTR and an EPTR?
The docs contains:
: MPTR means that not only must this pointer parameter be non-NULL, it
: points to a position in a character string such that
: SPTR <= MPTR < EPTR. The called function is free to look at any
: position in the range starting at SPTR up to but not including
: EPTR.
but if there is only a SPTR or only a EPTR then it can not guarantee that..
Maybe it should raise an error or a warning in such a case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've also added more extensive comments to embed.fnc about how you only need two constraints, and MPTR or SPTR are chosen based on the semantics of the parameter, even though the generated constraint could be the same in either case.
embed.fnc
Outdated
: MPTR means that not only must this pointer parameter be non-NULL, it | ||
: points to a position in a character string such that | ||
: SPTR <= MPTR < EPTR. The called function is free to look at any | ||
: position in the range starting at SPTR up to but not including | ||
: EPTR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit regen/embed.pl: Add ability to assert(s < e)
How does this work in combination with EPTRQ?
Just a dummy example:
|EPTRQ char *to_end \
|SPTR const char *to \
|MPTR const char *to_middle
Will the assert be: to <= to_middle && to_middle <= to_end
?
Or will it still be: to <= to_middle && to_middle < to_end
?
(I'm guessing the former, also not sure if it should be explicitly stated or not)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the comments I've added as a result of yours here will make this clear.
regen/embed.pl
Outdated
push @asserts, "assert($lower->{deref}$lower->{argname}" | ||
. " <$upper->{equal} " | ||
. "$upper->{deref}$upper->{argname})"; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit regen/embed.pl: Add ability to assert(s < e)
The code works, it just required some thought to figure how it works.
which makes me wonder if it wouldn't be clearer to just write it more explicit, i.e. something like (untested):
if ($string->{S} and $string{E}) {
push @asserts, "assert($string->{S}->{deref}$string->{S}->{argname} <assert($string->{E}->{equal} assert($string->{E}->{deref}$string->{E}->{argname})";
}
if ($string->{S} and $string{M}) {
push @asserts, "assert($string->{S}->{deref}$string->{S}->{argname} <= assert($string->{M}->{deref}$string->{M}->{argname})";
}
if ($string->{M} and $string{E}) {
push @asserts, "assert($string->{M}->{deref}$string->{M}->{argname} <assert($string->{E}->{equal} assert($string->{E}->{deref}$string->{E}->{argname})";
}
Or I guess something like: (assuming v5.40.0, untested)
for my ($lower_i, $upper_i) (("S", "E"), ("S", "M"), ("M", "E")) {
my $lower = $string->{$lower_i} or next;
my $upper = $string->{$upper_i} or next;
push @asserts, "assert($lower->{deref}$lower->{argname}"
. " <$upper->{equal} "
. "$upper->{deref}$upper->{argname})";
}
(But I'm guessing this might generate an extra assert? With the current code when SPTR, MPTR and EPTR are all set I guess it creates the asserts: assert SPTR <= MPTR
and assert MPTR < EPTR
; while with the untested code above it would result in: assert SPTR < EPTR
, assert SPTR <= MPTR
, assert MPTR < EPTR
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is commented to avoid using modern-only constructs; I tried then using your first suggestion, and found that there was too much repetition of complicated expressions for my taste. I think the loop I have is easier to maintain. But I've added a bunch of comments to explain it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a fyi, if you want to avoid modern-only construct you could write it as (untested):
for my $i (["S", "E"], ["S", "M"], ["M", "E"]) {
my ($lower_i, $upper_i) = @$i;
my $lower = $string->{$lower_i} or next;
my $upper = $string->{$upper_i} or next;
push @asserts, "assert($lower->{deref}$lower->{argname}"
. " <$upper->{equal} "
. "$upper->{deref}$upper->{argname})";
}
or:
for my $i (["S", "E"], ["S", "M"], ["M", "E"]) {
my $lower = $string->{$i->[0]} or next;
my $upper = $string->{$i->[1]} or next;
push @asserts, "assert($lower->{deref}$lower->{argname}"
. " <$upper->{equal} "
. "$upper->{deref}$upper->{argname})";
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file is commented to avoid using modern-only constructs
It indeed says:
require 5.004; # keep this compatible, an old perl is all we may have before
# we build the new one
But it doesn't make any sense to me. This file isn't executed while perl is built. It's only executed by people who develop the interpreter, who all should have access to modern Perl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think your solution here is a lot clearer than mine; so I've adopted it, but added a statement to skip s < e
if there is s <= m < e
embed.fnc
Outdated
|NN const U8 * const s \ | ||
|NN const U8 * const e \ | ||
|SPTR const U8 * const s \ | ||
|EPTR const U8 * const e \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit embed.fnc: Add ptr assertions for apparently non-problematic
I then ran the test suite, and reverted any that had problems.
Out of curiosity: do you still have the list of the failing ones somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure; I plan to go make every function with these kinds of parameters to have constraints added to embed.fnc.. Maybe I should do that now for each of them, using EPTRQ. Then I or someone else could go through them to see why there is an apparent discrepancy.
PERL_ARGS_ASSERT_IS_UTF8_CHAR_HELPER_; | ||
assert(0 == (flags & ~UTF8_DISALLOW_ILLEGAL_INTERCHANGE)); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commit embed.fnc: Add ptr assertions for apparently non-problematic
I think it would be better to move some of these changes in a separate commit.
In this commit two things happen:
- the assert calls are moved to the beginning of the function (since we can now do that)
- one of the asserts (
assert(e > s)
) is moved the PERL_ARGS_ASSERT macro.
Thinking some more: how I would likely do it:
- commit 1: add SPTR/EPTR/MPTR in embed.fnc except for the functions in the utf8.c file
- commit 2: move the asserts to the beginning of the function (and keep the
assert e > s
) - commit 3: add the SPTR/EPTR/MPTR in embed.fnc for these function and remove the
assert(e > s)
.
In commit 3 it should be obvious that these are now moved to the PERL_ARGS_ASSERT macro. In the current commit it's not obvious because there are many changes in proto.h)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, for all affected .c files
Inspection showed these were wrongly categorized
It's best to have these at the beginning to guarantee that the asserts come before any invalid use of the parameters. The functions done here will have their asserts manipulated in a few commits hence. Doing this now will highlight just the changes that one causes.
Wrap to fit in 80 column terminal window
This: * reorders some comments to make more sense * makes minor clarifications * removes comments that no longer make sense * adds comments about moving ARGS_ASSERT macros to the top of their functions when the code around them gets changed anyway.
Where s is a pointer into a string, and e is the end of it.
This uses the previous commit's new abilities on a couple of sample functions, so you can see the changes it makes in two small bites.
I went through the declarations in embed.fnc and added PTR constraints for all the ones that looked to have pointers to the beginning and end of a string. I then ran the test suite, and reverted any that had problems. Then I looked at the code for each one remaining to see if it was equipped to handle the case where the end == the beginning, and removed those. This is the result. Testing in the field may reveal others that the test suite missed; we can fix those as they occur. I removed now redundant asserts that were in the functions, and now are included in the ARGS_ASSERT macros
This is the first use of the new MPTR constraint that is used to generate an assertion that a pointer is somewhere in the middle of a string. I removed now redundant asserts that were in the functions, and now are included in the ARGS_ASSERT macros
Generally, a pointer to a string upper bound actually is to one beyond the actual final byte in the string. This is sanctioned by the C Standard, and allows you to just subtract the lower bound from it to get its length, without having to add 1. But some functions are written to tolerate the upper bound pointer being set to the actual final byte. The EPTRQ constraint in embed.fnc is used for those; the assertion becomes 'l <= u' instead of strictly less-than. This commit is the first to use this type of constraint, and it applies it only to those functions whose documentation or behavior clearly indicate this is expected. I removed now redundant asserts that were in the functions, and now are included in the ARGS_ASSERT macros There's a dozen-ish ones where that isn't true. And they need to be investigated further before deciding their disposition. l
There are a bunch of functions that are passed both a string pointer and a pointer to the string end. This commit allows you to declare the latter in embed.fnc, and have an
assert(s < e)
generated that goes in thePERL_ARGS_ASSERT
for the function.This is extended for functions where it should instead be
assert(s <= e)
andassert(s <= cur)