Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use Python backslashreplace to avoid UNRESOLVED tests #4366

Merged
merged 1 commit into from
Feb 6, 2024

Conversation

StephanTLavavej
Copy link
Member

Fixes #4308.

Followup to #2145, which added a two-step process for decoding compiler and test output. See Python's Unicode documentation and bytes.decode documentation. Our decodeOutput(bytes) first tries bytes.decode(), i.e. bytes.decode(encoding='utf-8', errors='strict'), to handle EDG's UTF-8 output. 'strict' asks it to throw a UnicodeDecodeError (derived from UnicodeError). If that happens, we assume we're looking at MSVC's output in the active codepage, so we fall back to locale-aware decoding.

The problem was that this fallback was still 'strict', so if the test emitted unrecognized characters, we'd get another UnicodeDecodeError, this time uncaught. That causes the test to be reported as UNRESOLVED. It originally had incredibly confusing output, which #4323 improved by printing the contents of the UnicodeDecodeError.

How can we get a test printing unrecognized characters? I validated my fix with a deterministic puts("\x8d");. (I don't think it's worth adding a test to validate this part of the test harness.) We originally encountered it sporadically, due to heap corruption tracked by #4268. Due to an STL bug, those tests reliably corrupt the heap, but (due to a still-mysterious chain of events), as the UCRT attempts to print its "HEAP CORRUPTION DETECTED" message, when printing the block type (usually "Normal"), something is damaged and it occasionally prints garbage memory contents. (We did corrupt the heap, after all.)

By requesting 'backslashreplace' during the fallback conversion, we avoid uncaught exceptions here. This will allow the test to pass (if it simply happened to print bizarre characters during its execution) or fail (if it printed bizarre characters during its plunge into Mount Doom), with readable output captured in the logs thanks to the backslash escaping.

I considered also marking #4268's heap-corrupting tests as SKIPPED instead of FAIL. However, they do seem to be reliably failing with detected heap corruption, the only sporadic part was whether they corrupted the heap badly enough to damage the UCRT's message. Unless and until we start seeing sporadic "unexpected passes" here, I am inclined to leave them marked as FAIL. Thus #4308 will be truly fixed as we'll no longer encounter sporadic test run failures due to UNRESOLVED (the affected tests will FAIL but that's expected, so the test run as a whole will pass).

@StephanTLavavej StephanTLavavej added the test Related to test code label Feb 3, 2024
@StephanTLavavej StephanTLavavej requested a review from a team as a code owner February 3, 2024 12:38
@StephanTLavavej StephanTLavavej self-assigned this Feb 5, 2024
@StephanTLavavej
Copy link
Member Author

I'm speculatively mirroring this to the MSVC-internal repo - please notify me if any further changes are pushed.

@StephanTLavavej StephanTLavavej changed the title Use Python backslashreplace to avoid UNRESOLVED tests Use Python backslashreplace to avoid UNRESOLVED tests Feb 5, 2024
Copy link
Member

@CaseyCarter CaseyCarter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Approve with suggestions" that we can address in a followup.

@@ -77,8 +77,9 @@ def decodeOutput(bytes):
try:
return bytes.decode()
except UnicodeError:
# Use 'backslashreplace' to avoid throwing another exception for unrecognized characters.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not fond of "unrecognized characters" which suggests that the transcoder doesn't understand the source character set; the issue here is that tests sometimes emit byte sequences that aren't valid encoded characters. I'd prefer something like "when tests emit garbage bytes."

This is absolutely not worth resetting testing.

@StephanTLavavej StephanTLavavej merged commit 81314e3 into microsoft:main Feb 6, 2024
37 checks passed
@StephanTLavavej StephanTLavavej deleted the magic-decoder-ring branch February 6, 2024 08:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
test Related to test code
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

Sporadic unresolved test failures
2 participants