Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] gh-89621: re.subn?(): raise error when RegexFlag is wrongly used as count argument #91477

Closed
wants to merge 3 commits into from

Conversation

ghost
Copy link

@ghost ghost commented Apr 12, 2022

@serhiy-storchaka

In re.subn?() functions, some users wrongly use RegexFlag as the count parameter.

Many invalid bug reports can be found (#89621, #86639, #85930, #85830, #79724, #75110, #73091, #72038, #70542, #66949, #61863, #59972, #56287, #56166, #56156, #55471). Maybe there are some code still generating incorrect data silently.

This PR is just a draft, not completive. I didn't expect this PR to be merged.
See if you agree with this method. Compared to changing count to a keyword only parameter, it keeps compatibility, and the overhead is very small.

If you want to make a better PR, please make it.
IMO making such PR is faster & easier than reviewing.
Or let me know if you want me to perfect this PR.

@ghost ghost changed the title [WIP] gh-89621: raise error when RegexFlag is wrongly used as count argument [WIP] gh-89621: re.subn?(): raise error when RegexFlag is wrongly used as count argument Apr 12, 2022
@ghost
Copy link
Author

ghost commented Apr 12, 2022

Another method is implementing re.subn?() functions in C, and check type(count) != RegexFlag there. In this way, Pattern.subn?() have no overhead at all.

def sub(pattern, repl, string, count=0, flags=0):
    return _compile(pattern, flags).sub(repl, string, count)

def subn(pattern, repl, string, count=0, flags=0):
    return _compile(pattern, flags).subn(repl, string, count)

Comment on lines +1276 to +1277
PyErr_SetString(PyExc_ValueError,
"count arguemnt wrong type.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just keep the existing error. ValueError is wrong exception for wrong type, and the raised exception can be for example OverflowError.

"count argument should not be RegexFlag.");
return NULL;
} else {
count_value = PyLong_AsSsize_t(count);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look at the code for the n format unit in getargs.c. You should reproduce it.

@@ -1249,19 +1250,36 @@ _sre.SRE_Pattern.sub
/
repl: object
string: object
count: Py_ssize_t = 0
count: object(c_default="NULL") = 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could be easier to implement a custom converter (see docs for PyArg_Parse, format unit O&).

    count: object(converter="my_converter_for_ssize_t_except_RegexFlag") = 0

@luxedo
Copy link

luxedo commented Sep 12, 2024

I have just fallen into this trap. It's very easy to get cocky and forget to pay attention to the count argument.

Any chance of at least adding type hints to the re library or would this break too many things?

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants