-
Notifications
You must be signed in to change notification settings - Fork 808
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix invalid and accidental escape sequences #2112
Conversation
3821e54
to
7408a02
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, but while fixing invalid escapes are fine I don't see why we should avoid raw strings.
setup.py
Outdated
@@ -213,7 +213,7 @@ def finalize_options(self, build_ext): | |||
found_mfc = False | |||
for incl in os.environ.get("INCLUDE", "").split(os.pathsep): | |||
# first is a "standard" MSVC install, second is the Vista SDK. | |||
for candidate in (r"..\src\occimpl.h", r"..\..\src\mfc\occimpl.h"): | |||
for candidate in ("..\\src\\occimpl.h", "..\\..\\src\\mfc\\occimpl.h"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using raw strings before was fine, right? I see no reason to avoid raw strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just out of styling/consistency between using r-strings or \\
for paths strings. If you'd prefer, I can revert those specific manual changes. (where I made non-regex r-strings regular strings for consistency purpose)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a need to enforce consistency here - IMO r-strings exist purely because consistency is over-rated :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh, I see that in vscode too. That's gotta be a bug though, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be the intent:
To let you know that those characters will be escaped when used as part of a regex string.
https://docs.python.org/3/howto/regex.html#the-backslash-plague
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't make sense to me. The link you pointed to basically says "if you don't want to double up your backslashes when matching them in a regex, use a raw string" - it doesn't seem to come close to saying "raw strings should only be used when you want to avoid using 4 backslashes in a regex". Eg, https://docs.python.org/3/reference/lexical_analysis.html says just "such strings are called raw strings and treat backslashes as literal characters." and makes no reference to regular expressions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not saying it can't be used for anything else than regex strings.
Just to answer your comment about That's gotta be a bug though, right?
. It's intended semantic highlighting on microsoft's part to help with "the backslash plague" when using regex. Similarly to how it's done for JS/TS /regex/
strings
The highlight can be ignored if you use the r-string for other purposes (ie: writing raw text without escape characters)
GitHub isn't showing my new commits. But it's definitely in the branch https://github.com/Avasam/pywin32/commits/invalid-escape-sequences Maybe a temporary GitHub issue? |
com/win32com/client/__init__.py
Outdated
@@ -61,7 +61,7 @@ def GetObject(Pathname=None, Class=None, clsctx=None): | |||
ob = GetObject(Class = "ProgID") or GetObject(Class = clsid) will | |||
connect to an already running instance of the COM object. | |||
|
|||
ob = GetObject(r"c:\blah\blah\foo.xls") (aka the COM moniker syntax) | |||
ob = GetObject(r"c:\\blah\\blah\\foo.xls") (aka the COM moniker syntax) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring itself is invalid, this changes fixes it.
Before:
>>> from win32com.client import GetObject
>>> print(GetObject.__doc__)
Mimic VB's GetObject() function.
ob = GetObject(Class = "ProgID") or GetObject(Class = clsid) will
connect to an already running instance of the COM object.
ob = GetObject(r"clalah♀oo.xls") (aka the COM moniker syntax)
will return a ready to use Python wrapping of the required COM object.
Note: You must specifiy one or the other of these arguments. I know
this isn't pretty, but it is what VB does. Blech. If you don't
I'll throw ValueError at you. :)
This will most likely throw pythoncom.com_error if anything fails.
After:
>>> from win32com.client import GetObject
>>> print(GetObject.__doc__)
Mimic VB's GetObject() function.
ob = GetObject(Class = "ProgID") or GetObject(Class = clsid) will
connect to an already running instance of the COM object.
ob = GetObject(r"c:\blah\blah\foo.xls") (aka the COM moniker syntax)
will return a ready to use Python wrapping of the required COM object.
Note: You must specifiy one or the other of these arguments. I know
this isn't pretty, but it is what VB does. Blech. If you don't
I'll throw ValueError at you. :)
This will most likely throw pythoncom.com_error if anything fails.
another option is to make the docstring a raw-multiline-string:
>>> from win32com.client import GetObject
>>> print(GetObject.__doc__)
Mimic VB's GetObject() function.
ob = GetObject(Class = "ProgID") or GetObject(Class = clsid) will
connect to an already running instance of the COM object.
ob = GetObject(r"c:\blah\blah\foo.xls") (aka the COM moniker syntax)
will return a ready to use Python wrapping of the required COM object.
Note: You must specifiy one or the other of these arguments. I know
this isn't pretty, but it is what VB does. Blech. If you don't
I'll throw ValueError at you. :)
This will most likely throw pythoncom.com_error if anything fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right - let's to the raw multiline doc string then as otherwise the code just looks wrong when read inline.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated for all changes. This PR is ready for a new review
@@ -1,7 +1,7 @@ | |||
""" | |||
Implements a permissions editor for services. | |||
Service can be specified as plain name for local machine, | |||
or as a remote service of the form \\machinename\service | |||
or as a remote service of the form \\\\machinename\\service |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is wrong too - it doesn't even pretend to be a string (ie, it's not quoted), it's a textual description of how Windows represents UNC paths, which is wrong with the slashes doubled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(same comment about docstring)
\s
is an invalid escape sequence in a string (this is a multiline string)
Before:
>>> from win32comext.authorization.demos import EditServiceSecurity
>>> print(EditServiceSecurity.__doc__)
Implements a permissions editor for services.
Service can be specified as plain name for local machine,
or as a remote service of the form \machinename\service
After:
>>> from win32comext.authorization.demos import EditServiceSecurity
>>> print(EditServiceSecurity.__doc__)
Implements a permissions editor for services.
Service can be specified as plain name for local machine,
or as a remote service of the form \\machinename\service
As for the path itself, I understand it's supposed to be like network paths?
win32/scripts/regsetup.py
Outdated
@@ -506,9 +506,9 @@ def RegisterShellInfo(searchPaths): | |||
as well as the 2 wierd spots to locate the core Python files (eg, Python.exe, | |||
pythonXX.dll, the standard library and Win32 Extensions. | |||
|
|||
"regsetup -a myappname . .\subdir" | |||
"regsetup -a myappname . .\\subdir" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an example of a command-line where the doubling is wrong too - perhaps the best fix here is to make that cleaner - eg, drop the quotes and add, say, %
or c:\>
as a prefix or similar?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind if you suggest a line replacement, and/or I can make this a raw-multiline-string and drop all the double slashes in this string from line 502-525
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, as above, let's make sure it works both as a docstring and reads correctly in the editor.
win32/test/test_odbc.py
Outdated
@@ -206,7 +206,7 @@ def testLongBinary(self): | |||
|
|||
def testRaw(self): | |||
## Test binary data | |||
self._test_val("rawfield", memoryview(b"\1\2\3\4\0\5\6\7\8")) | |||
self._test_val("rawfield", memoryview(b"\1\2\3\4\0\5\6\7\\8")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why just one of the slashes here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
\8
is not a valid escape sequence, \0
to \7
is. This keeps the same behaviour:
>>> b"\0\1\2\3\4\5\6\7\8"
b'\x00\x01\x02\x03\x04\x05\x06\x07\\8'
Although it seems that by the test's intention, it might be better to remove \8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh, right - I think I'm fine with either removing the 8 or adding a comment about it.
Follow-up to #2045 and improved over #2100 .
The first commit was done by running
ruff . --select=UP025 --exclude=Pythonwin/Scintilla --exclude=adodbapi --fix
The second one was done manually.