Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix invalid and accidental escape sequences #2112

Merged
merged 8 commits into from
Nov 1, 2023

Conversation

Avasam
Copy link
Collaborator

@Avasam Avasam commented Aug 22, 2023

Follow-up to #2045 and improved over #2100 .
The first commit was done by running ruff . --select=UP025 --exclude=Pythonwin/Scintilla --exclude=adodbapi --fix
The second one was done manually.

Copy link
Owner

@mhammond mhammond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, but while fixing invalid escapes are fine I don't see why we should avoid raw strings.

setup.py Outdated
@@ -213,7 +213,7 @@ def finalize_options(self, build_ext):
found_mfc = False
for incl in os.environ.get("INCLUDE", "").split(os.pathsep):
# first is a "standard" MSVC install, second is the Vista SDK.
for candidate in (r"..\src\occimpl.h", r"..\..\src\mfc\occimpl.h"):
for candidate in ("..\\src\\occimpl.h", "..\\..\\src\\mfc\\occimpl.h"):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using raw strings before was fine, right? I see no reason to avoid raw strings.

Copy link
Collaborator Author

@Avasam Avasam Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just out of styling/consistency between using r-strings or \\ for paths strings. If you'd prefer, I can revert those specific manual changes. (where I made non-regex r-strings regular strings for consistency purpose)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see a need to enforce consistency here - IMO r-strings exist purely because consistency is over-rated :)

Copy link
Collaborator Author

@Avasam Avasam Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just remembered the other reason, it was visual (though it'll slightly differ per editor)
Looks like we're trying to escape some characters (for use in a regex)
image
vs
image
but yeah that's minor anyway, I'm reverting atm to reduce changes

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh, I see that in vscode too. That's gotta be a bug though, right?

Copy link
Collaborator Author

@Avasam Avasam Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be the intent:
image
To let you know that those characters will be escaped when used as part of a regex string.
https://docs.python.org/3/howto/regex.html#the-backslash-plague

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't make sense to me. The link you pointed to basically says "if you don't want to double up your backslashes when matching them in a regex, use a raw string" - it doesn't seem to come close to saying "raw strings should only be used when you want to avoid using 4 backslashes in a regex". Eg, https://docs.python.org/3/reference/lexical_analysis.html says just "such strings are called raw strings and treat backslashes as literal characters." and makes no reference to regular expressions.

Copy link
Collaborator Author

@Avasam Avasam Sep 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not saying it can't be used for anything else than regex strings.
Just to answer your comment about That's gotta be a bug though, right?. It's intended semantic highlighting on microsoft's part to help with "the backslash plague" when using regex. Similarly to how it's done for JS/TS /regex/ strings

The highlight can be ignored if you use the r-string for other purposes (ie: writing raw text without escape characters)

setup.py Outdated Show resolved Hide resolved
@Avasam
Copy link
Collaborator Author

Avasam commented Sep 21, 2023

GitHub isn't showing my new commits. But it's definitely in the branch https://github.com/Avasam/pywin32/commits/invalid-escape-sequences

Maybe a temporary GitHub issue?

@@ -61,7 +61,7 @@ def GetObject(Pathname=None, Class=None, clsctx=None):
ob = GetObject(Class = "ProgID") or GetObject(Class = clsid) will
connect to an already running instance of the COM object.

ob = GetObject(r"c:\blah\blah\foo.xls") (aka the COM moniker syntax)
ob = GetObject(r"c:\\blah\\blah\\foo.xls") (aka the COM moniker syntax)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change is wrong.

Copy link
Collaborator Author

@Avasam Avasam Sep 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring itself is invalid, this changes fixes it.

Before:

>>> from win32com.client import GetObject
>>> print(GetObject.__doc__)

    Mimic VB's GetObject() function.

    ob = GetObject(Class = "ProgID") or GetObject(Class = clsid) will
    connect to an already running instance of the COM object.

    ob = GetObject(r"clalah♀oo.xls") (aka the COM moniker syntax)
    will return a ready to use Python wrapping of the required COM object.

    Note: You must specifiy one or the other of these arguments. I know
    this isn't pretty, but it is what VB does. Blech. If you don't
    I'll throw ValueError at you. :)

    This will most likely throw pythoncom.com_error if anything fails.

image

After:

>>> from win32com.client import GetObject
>>> print(GetObject.__doc__)

    Mimic VB's GetObject() function.

    ob = GetObject(Class = "ProgID") or GetObject(Class = clsid) will
    connect to an already running instance of the COM object.

    ob = GetObject(r"c:\blah\blah\foo.xls") (aka the COM moniker syntax)
    will return a ready to use Python wrapping of the required COM object.

    Note: You must specifiy one or the other of these arguments. I know
    this isn't pretty, but it is what VB does. Blech. If you don't
    I'll throw ValueError at you. :)

    This will most likely throw pythoncom.com_error if anything fails.

image


another option is to make the docstring a raw-multiline-string:

>>> from win32com.client import GetObject
>>> print(GetObject.__doc__)

    Mimic VB's GetObject() function.

    ob = GetObject(Class = "ProgID") or GetObject(Class = clsid) will
    connect to an already running instance of the COM object.

    ob = GetObject(r"c:\blah\blah\foo.xls") (aka the COM moniker syntax)
    will return a ready to use Python wrapping of the required COM object.

    Note: You must specifiy one or the other of these arguments. I know
    this isn't pretty, but it is what VB does. Blech. If you don't
    I'll throw ValueError at you. :)

    This will most likely throw pythoncom.com_error if anything fails.

image

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right - let's to the raw multiline doc string then as otherwise the code just looks wrong when read inline.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated for all changes. This PR is ready for a new review

@@ -1,7 +1,7 @@
"""
Implements a permissions editor for services.
Service can be specified as plain name for local machine,
or as a remote service of the form \\machinename\service
or as a remote service of the form \\\\machinename\\service
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is wrong too - it doesn't even pretend to be a string (ie, it's not quoted), it's a textual description of how Windows represents UNC paths, which is wrong with the slashes doubled.

Copy link
Collaborator Author

@Avasam Avasam Sep 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(same comment about docstring)
\s is an invalid escape sequence in a string (this is a multiline string)
Before:

>>> from win32comext.authorization.demos import EditServiceSecurity
>>> print(EditServiceSecurity.__doc__)

Implements a permissions editor for services.
Service can be specified as plain name for local machine,
or as a remote service of the form \machinename\service

image

After:

>>> from win32comext.authorization.demos import EditServiceSecurity
>>> print(EditServiceSecurity.__doc__)

Implements a permissions editor for services.
Service can be specified as plain name for local machine,
or as a remote service of the form \\machinename\service

As for the path itself, I understand it's supposed to be like network paths?
image

@@ -506,9 +506,9 @@ def RegisterShellInfo(searchPaths):
as well as the 2 wierd spots to locate the core Python files (eg, Python.exe,
pythonXX.dll, the standard library and Win32 Extensions.

"regsetup -a myappname . .\subdir"
"regsetup -a myappname . .\\subdir"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an example of a command-line where the doubling is wrong too - perhaps the best fix here is to make that cleaner - eg, drop the quotes and add, say, % or c:\> as a prefix or similar?

Copy link
Collaborator Author

@Avasam Avasam Sep 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't mind if you suggest a line replacement, and/or I can make this a raw-multiline-string and drop all the double slashes in this string from line 502-525

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, as above, let's make sure it works both as a docstring and reads correctly in the editor.

@@ -206,7 +206,7 @@ def testLongBinary(self):

def testRaw(self):
## Test binary data
self._test_val("rawfield", memoryview(b"\1\2\3\4\0\5\6\7\8"))
self._test_val("rawfield", memoryview(b"\1\2\3\4\0\5\6\7\\8"))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why just one of the slashes here?

Copy link
Collaborator Author

@Avasam Avasam Sep 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

\8 is not a valid escape sequence, \0 to \7 is. This keeps the same behaviour:

>>> b"\0\1\2\3\4\5\6\7\8"
b'\x00\x01\x02\x03\x04\x05\x06\x07\\8'

Although it seems that by the test's intention, it might be better to remove \8

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh, right - I think I'm fine with either removing the 8 or adding a comment about it.

@Avasam Avasam requested a review from mhammond September 22, 2023 15:42
@mhammond mhammond merged commit d26c5c8 into mhammond:main Nov 1, 2023
16 checks passed
@Avasam Avasam deleted the invalid-escape-sequences branch November 1, 2023 19:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants