[Feature Request] Attribute for ANSI/Unicode Suffix #711

timsneath · 2021-10-27T18:31:06Z

It would be incredibly helpful if there were a way to programmatically identify structs and functions that have an A / W suffix denoting whether a Win32 API is ANSI / Unicode.

I mask this from consumers of my projection (as indeed, C does through the use of #define macros), which means that I strip off the suffix. But there's no easy heuristic for me to differentiate between (say) Windows.Win32.UI.Controls.TBBUTTONINFOA (an ANSI struct) and Windows.Win32.Devices.BiometricFramework.WINBIO_UNIT_SCHEMA (a struct that is neither).

I could create an elaborate exclusion list of false positives in my projection, but this really belongs in the source metadata, I think.


sotteson1 · 2021-10-27T20:03:01Z

Unfortunately that information isn't available from the headers. We would need to build up a big list of them and apply attributes when we generate the .winmd.

kennykerr · 2021-10-27T20:07:22Z

For what it's worth, the Rust projection simply provides both A and W functions unchanged and the developer decides which to use.

timsneath · 2021-10-27T21:07:35Z

Thanks for the quick reply both of you.

@kennykerr, yeah, that's a possibility here, but it creates lots of micro-frictions:

- Autocomplete can't resolve quickly because there are two near-identically named functions
- It's out of sync with the original C model, where you #define UNICODE and then just use the unsuffixed functions
- It causes the number of functions, structs etc. to balloon even more, which makes the package unwieldy
- It adds extra cognitive load for developers when (at least for Dart) there's little benefit in offering a choice

@sotteson1, that's essentially what I'm having to do within the projection. I believe there's a semi-heuristic approach here, however:

For structs that contain a PSTR or similar and contain an A suffix, they are ANSI; for structs that contain a PWSTR or similar and contain a W suffix, they are Unicode. Others are mostly string-neutral.
There are going to be some exceptions to that, perhaps, but the list isn't going to be different to other lists that we already maintain in the win32metadata repo. And it's going to be mostly static over time, so this is a one-time cost.
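The semi-heuristic above could be sketched roughly as follows. This is only an illustration: the `Struct` class, the `classify` function, and the two type-name sets are hypothetical, and the field lists in the usage lines are made up for the example rather than copied from the metadata.

```python
from dataclasses import dataclass

# Assumption: which type names count as "PSTR or similar" / "PWSTR or similar".
NARROW_STRING_TYPES = {"PSTR"}
WIDE_STRING_TYPES = {"PWSTR"}


@dataclass
class Struct:
    """Hypothetical, simplified view of a struct in the metadata."""
    name: str
    field_types: list


def classify(struct: Struct) -> str:
    """Return 'ansi', 'unicode', or 'neutral' from the suffix plus field types."""
    types = set(struct.field_types)
    if struct.name.endswith("A") and types & NARROW_STRING_TYPES:
        return "ansi"
    if struct.name.endswith("W") and types & WIDE_STRING_TYPES:
        return "unicode"
    # Everything else is treated as string-neutral; known exceptions would
    # still need a hand-maintained list.
    return "neutral"


# Illustrative field lists only.
ansi_struct = Struct("TBBUTTONINFOA", ["UInt32", "PSTR"])
wide_struct = Struct("TBBUTTONINFOW", ["UInt32", "PWSTR"])
neutral_struct = Struct("WINBIO_UNIT_SCHEMA", ["UInt32", "Guid"])
```

With these inputs, `classify` treats the first struct as ANSI and the second as Unicode, and leaves `WINBIO_UNIT_SCHEMA` as neutral even though its name ends in A.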

AArnott · 2021-10-27T22:05:13Z

In CsWin32 we generate structs only when asked for or when another API that was requested requires it. In those cases, we only need to generate one struct anyway (to match the request), and we leave the suffix in place.

Functions don't tend to be named in all caps, and CsWin32 has a cheap and seemingly adequate check on the name to assign each function to the narrow or wide character set. We then ignore all the ANSI functions where a wide-char version exists, unless the user specifically asks for them.

AArnott · 2021-10-27T22:07:58Z

BTW: the narrow character structs and APIs are not "ansi", as I understand it. They use whatever the process's current code page is: it may be UTF-8, it may be "Windows code page 1252", or it may be any of a number of other 8-bit character code pages.

More about code pages.
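A small illustration of why the code page matters: the same narrow bytes decode to different text depending on which code page is assumed. This uses Python's own codecs purely as a stand-in and is not a demonstration of Win32 behavior.

```python
# The two bytes that UTF-8 uses to encode "é".
raw = "é".encode("utf-8")

# Interpreted as UTF-8, the bytes round-trip to the original character.
as_utf8 = raw.decode("utf-8")

# Interpreted as Windows code page 1252, each byte becomes its own character.
as_cp1252 = raw.decode("cp1252")

print(repr(as_utf8), repr(as_cp1252))
```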

timsneath · 2021-10-27T22:35:51Z

Yeah, your (Microsoft's) docs need fixing for consistency here :)

https://docs.microsoft.com/en-us/windows/win32/learnwin32/working-with-strings#unicode-and-ansi-functions

I don't know if this is true in the world of UTF-8, but the recommendation on this page is to just use Unicode:

New applications should always call the Unicode versions. Many world languages require Unicode. If you use ANSI strings, it will be impossible to localize your application. The ANSI versions are also less efficient, because the operating system must convert the ANSI strings to Unicode at run time.

marler8997 · 2021-10-28T12:20:14Z

In my JSON projection I programmatically identify the A/W variants and include them in a list in each namespace, e.g.:

https://github.com/marlersoft/win32json/blob/ef937288bee6aea8763f0071cbfdf7d9fef62ff4/api/Storage.FileSystem.json#L17824

,"UnicodeAliases":[
	"WIN32_FIND_DATA"
	,"NTMS_DRIVEINFORMATION"
	,"NTMS_CHANGERINFORMATION"
	,"NTMS_PMIDINFORMATION"
	,"NTMS_PARTITIONINFORMATION"
	,"NTMS_DRIVETYPEINFORMATION"
	,"NTMS_CHANGERTYPEINFORMATION"
	,"NTMS_LIBREQUESTINFORMATION"
	,"NTMS_OPREQUESTINFORMATION"
	,"NTMS_OBJECTINFORMATION"
	,"NTMS_I1_LIBREQUESTINFORMATION"
	,"NTMS_I1_PMIDINFORMATION"
	,"NTMS_I1_PARTITIONINFORMATION"
	,"NTMS_I1_OPREQUESTINFORMATION"
	,"NTMS_I1_OBJECTINFORMATION"
	,"SearchPath"
	,"CreateDirectory"
	,"CreateFile"
	,"DefineDosDevice"

The algorithm to identify them is fairly straightforward. Most of the code is contained within this UnicodeAliasSet class: https://github.com/marlersoft/win32jsongen/blob/16f9bb264b1534dbd90e376ffcb2a5d8bf422039/Generator/JsonGenerator.cs#L1054

It takes all the top level symbols and detects if they end with A or W. If they do, then it will put them in a Candidates list, and if another symbol is detected that matches except it has the other suffix, then it's saved as a "unicode alias".

It's simple enough that each projection could implement this themselves, or the metadata could also leverage this logic to include this information somehow.

P.S. You also need to track all the symbols that don't end in A/W and verify that the base symbol for the ones that do end in A/W isn't already defined. There are a few dozen symbols where this applies.
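The approach described above, including the P.S. check, might look something like this minimal sketch. It is not a port of the UnicodeAliasSet class; the function name `find_unicode_aliases` and the `Foo`/`Bar` symbols are hypothetical.

```python
def find_unicode_aliases(symbols):
    """Return sorted base names that have both an A and a W variant.

    A base name is skipped if it is itself already defined as a symbol.
    """
    all_symbols = set(symbols)
    candidates = {}  # base name -> set of suffixes seen for it
    for name in symbols:
        if len(name) > 1 and name[-1] in ("A", "W"):
            candidates.setdefault(name[:-1], set()).add(name[-1])
    return sorted(
        base
        for base, suffixes in candidates.items()
        if suffixes == {"A", "W"} and base not in all_symbols
    )


symbols = [
    "CreateFileA", "CreateFileW",
    "WIN32_FIND_DATAA", "WIN32_FIND_DATAW",
    "WINBIO_UNIT_SCHEMA",      # ends in A but has no W counterpart
    "FooA", "FooW", "Foo",     # hypothetical: base symbol already defined
    "BarW",                    # hypothetical: W variant only
]
aliases = find_unicode_aliases(symbols)
```

Here `aliases` contains only `CreateFile` and `WIN32_FIND_DATA`: the lone-suffix symbols never find a partner, and `Foo` is rejected because the unsuffixed name already exists.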

mikebattista · 2023-04-19T20:30:20Z

What would the attribute(s) look like?

mikebattista · 2023-05-12T18:26:37Z

Ansi variants are now decorated with [Ansi] and Unicode variants are now decorated with [Unicode].
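A projection could use these attributes roughly as in the sketch below, which follows the approach discussed earlier in the thread (hide the narrow variants, strip the suffix from the wide ones). Reading attributes out of the .winmd is not shown; the `(name, attributes)` pairs and the `project` function are a hypothetical stand-in for whatever a metadata reader returns.

```python
def project(items):
    """Map (name, attributes) pairs to the names a projection would expose."""
    projected = []
    for name, attributes in items:
        if "Ansi" in attributes:
            continue  # hide narrow-character variants from consumers
        if "Unicode" in attributes:
            name = name.removesuffix("W")  # expose the unsuffixed name
        projected.append(name)
    return projected


# Hypothetical input; attribute names follow the comment above.
items = [
    ("CreateFileA", {"Ansi"}),
    ("CreateFileW", {"Unicode"}),
    ("WINBIO_UNIT_SCHEMA", set()),
]
exposed = project(items)
```

Because the decision is driven by the attribute rather than the trailing letter, `WINBIO_UNIT_SCHEMA` passes through untouched without needing an exclusion list.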

mikebattista assigned sotteson1 Nov 1, 2021

mikebattista added the enhancement (New feature or request) and usability (Touch-up to improve the user experience for a language projection) labels Nov 1, 2021

mikebattista closed this as completed in e19140c May 12, 2023

AArnott mentioned this issue May 24, 2023

Leverage Unicode/Ansi attributes in metadata microsoft/CsWin32#942

Open

mikebattista mentioned this issue Nov 3, 2023

Missing PRINT_INFO_* struct. #1729

Closed

halildurmus mentioned this issue Jan 11, 2024

Some functions and structs are missing Ansi/Unicode attributes #1817

Closed
