
[Feature Request] Attribute for ANSI/Unicode Suffix #711

Closed
timsneath opened this issue Oct 27, 2021 · 9 comments

Labels: enhancement (New feature or request), usability (Touch-up to improve the user experience for a language projection)

Comments

@timsneath
Contributor

It would be incredibly helpful if there were a way to programmatically identify structs and functions that have an A or W suffix denoting whether a Win32 API is the ANSI or the Unicode variant.

I mask this from consumers of my projection (as, indeed, C does through #define macros), which means that I strip off the suffix. But there's no easy heuristic for me to differentiate between (say) Windows.Win32.UI.Controls.TBBUTTONINFOA (an ANSI struct) and Windows.Win32.Devices.BiometricFramework.WINBIO_UNIT_SCHEMA (a struct whose name also ends in A but which is neither ANSI nor Unicode).

I could create an elaborate exclusion list of false positives in my projection, but this really belongs in the source metadata, I think.

@sotteson1
Contributor

Unfortunately that information isn't available from the headers. We would need to build up a big list of them and apply attributes when we generate the .winmd.

@kennykerr
Contributor

For what it's worth, the Rust projection simply provides both the A and W functions unchanged, and the developer decides which to use.

@timsneath
Contributor Author

timsneath commented Oct 27, 2021

Thanks for the quick reply both of you.

@kennykerr, yeah, that's a possibility here, but it creates lots of micro-frictions:

  • Autocomplete can't resolve quickly because there are two near-identically named functions
  • It's out of sync with the original C model, where you #define UNICODE and then just use the unsuffixed functions
  • It causes the number of functions, structs etc. to balloon even more, which makes the package unwieldy
  • It adds extra cognitive load to developers when (at least for Dart) there's little benefit of offering a choice

@sotteson1, that's essentially what I'm having to do within the projection. I believe there's a semi-heuristic approach here, however:

  • Structs that contain a PSTR or similar and have an A suffix are ANSI; structs that contain a PWSTR or similar and have a W suffix are Unicode. Most others are string-neutral.
  • There will probably be some exceptions to that, but an exception list would be no different from other lists we already maintain in the win32metadata repo. And it will be mostly static over time, so this is a one-time cost.
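A minimal Python sketch of the suffix-plus-field-type heuristic described above. The sets of "PSTR or similar" and "PWSTR or similar" types are assumptions, and a struct is modeled as a bare name plus a list of field type names rather than anything read from a real .winmd; classify_struct is a hypothetical helper.

```python
from typing import Iterable

# Assumed members of "PSTR or similar" / "PWSTR or similar".
NARROW_STRING_TYPES = {"PSTR", "PCSTR"}
WIDE_STRING_TYPES = {"PWSTR", "PCWSTR"}


def classify_struct(name: str, field_types: Iterable[str]) -> str:
    """Return 'ansi', 'unicode', or 'neutral' for a struct.

    A trailing A only counts if the struct also has a narrow string field,
    and a trailing W only if it has a wide string field.
    """
    types = set(field_types)
    if name.endswith("A") and types & NARROW_STRING_TYPES:
        return "ansi"
    if name.endswith("W") and types & WIDE_STRING_TYPES:
        return "unicode"
    return "neutral"


# Simplified, illustrative field lists (not the real struct layouts).
print(classify_struct("TBBUTTONINFOA", ["UINT", "PSTR", "INT"]))  # ansi
print(classify_struct("WINBIO_UNIT_SCHEMA", ["UINT", "ULONG"]))   # neutral
```

Structs that slip past this check in either direction would still need the explicit exception list mentioned above.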

@AArnott
Member

AArnott commented Oct 27, 2021

In CsWin32 we generate structs only when asked for or when another API that was requested requires it. In those cases, we only need to generate one struct anyway (to match the request), and we leave the suffix in place.

Functions don't tend to be named in all caps, and CsWin32 has a cheap and seemingly adequate check on the name that assigns each function to the narrow or wide character set. We then ignore all the ANSI functions for which a wide-char version exists, unless the user specifically asks for them.
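CsWin32's actual name check isn't shown in this thread, so the Python below is only a guess at what a cheap name-based classification plus the "skip ANSI when a wide version exists" rule could look like. char_set and select_functions are hypothetical names, and the all-caps test is an assumption based on the remark that functions aren't usually named in all caps.

```python
from typing import Iterable, List, Optional


def char_set(name: str) -> Optional[str]:
    """Guess a function's character set from its name (hypothetical check).

    All-caps names (usually structs or constants) are skipped; a mixed-case
    name ending in 'A' is treated as narrow and one ending in 'W' as wide.
    """
    if name.isupper() or len(name) < 2:
        return None
    if name.endswith("A"):
        return "narrow"
    if name.endswith("W"):
        return "wide"
    return None


def select_functions(names: Iterable[str], requested: Iterable[str] = ()) -> List[str]:
    """Drop narrow functions that have a wide twin, unless explicitly requested."""
    all_names = set(names)
    wanted = set(requested)
    kept = []
    for name in sorted(all_names):
        has_wide_twin = name[:-1] + "W" in all_names
        if char_set(name) == "narrow" and has_wide_twin and name not in wanted:
            continue
        kept.append(name)
    return kept


print(select_functions(["CreateFileA", "CreateFileW", "CloseHandle"]))
# ['CloseHandle', 'CreateFileW']
```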

@AArnott
Member

AArnott commented Oct 27, 2021

BTW: the narrow-character structs and APIs are not really "ANSI", as I understand it. They use whatever the process's current code page is. That may be UTF-8 or "Windows code page 1252", but it may also be any of a number of other 8-bit code pages.
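To illustrate the code page point, this Python snippet encodes the same string three ways using only the standard codecs cp1252 (Windows-1252), utf-8 (Windows code page 65001), and utf-16-le (the encoding the wide "W" APIs use). It is an illustration of encodings, not a call into any Win32 API.

```python
text = "caf\u00e9"  # "café"

cp1252_bytes = text.encode("cp1252")    # Windows-1252, a common "ANSI" code page
utf8_bytes = text.encode("utf-8")       # UTF-8, i.e. code page 65001
utf16_bytes = text.encode("utf-16-le")  # UTF-16LE, what the W functions take

print(cp1252_bytes)  # b'caf\xe9'
print(utf8_bytes)    # b'caf\xc3\xa9'

# Reading UTF-8 bytes under a different code page garbles the text, which is
# why the meaning of a narrow string depends on the process's code page.
print(utf8_bytes.decode("cp1252"))  # cafÃ©
```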

More about code pages.

@timsneath
Contributor Author

Yeah, your (Microsoft's) docs need fixing for consistency here :)

https://docs.microsoft.com/en-us/windows/win32/learnwin32/working-with-strings#unicode-and-ansi-functions

I don't know if this is true in the world of UTF-8, but the recommendation on this page is to just use Unicode:

New applications should always call the Unicode versions. Many world languages require Unicode. If you use ANSI strings, it will be impossible to localize your application. The ANSI versions are also less efficient, because the operating system must convert the ANSI strings to Unicode at run time.

@marler8997
Contributor

marler8997 commented Oct 28, 2021

In my JSON projection I programmatically identify the A/W variants and include them in a list in each namespace, e.g.:

https://github.com/marlersoft/win32json/blob/ef937288bee6aea8763f0071cbfdf7d9fef62ff4/api/Storage.FileSystem.json#L17824

,"UnicodeAliases":[
	"WIN32_FIND_DATA"
	,"NTMS_DRIVEINFORMATION"
	,"NTMS_CHANGERINFORMATION"
	,"NTMS_PMIDINFORMATION"
	,"NTMS_PARTITIONINFORMATION"
	,"NTMS_DRIVETYPEINFORMATION"
	,"NTMS_CHANGERTYPEINFORMATION"
	,"NTMS_LIBREQUESTINFORMATION"
	,"NTMS_OPREQUESTINFORMATION"
	,"NTMS_OBJECTINFORMATION"
	,"NTMS_I1_LIBREQUESTINFORMATION"
	,"NTMS_I1_PMIDINFORMATION"
	,"NTMS_I1_PARTITIONINFORMATION"
	,"NTMS_I1_OPREQUESTINFORMATION"
	,"NTMS_I1_OBJECTINFORMATION"
	,"SearchPath"
	,"CreateDirectory"
	,"CreateFile"
	,"DefineDosDevice"
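A projection consuming that JSON could turn each entry into an unsuffixed alias. The Python sketch below assumes a namespace document with a top-level "UnicodeAliases" array of base names, as in the excerpt; NAMESPACE_JSON is a trimmed stand-in, and alias_map and prefer_wide are hypothetical names, not part of win32json.

```python
import json
from typing import Dict

# Trimmed stand-in for a namespace file; only the "UnicodeAliases" key
# mirrors the excerpt above, everything else is omitted.
NAMESPACE_JSON = '{"UnicodeAliases": ["WIN32_FIND_DATA", "SearchPath", "CreateFile"]}'


def alias_map(namespace: dict, prefer_wide: bool = True) -> Dict[str, str]:
    """Map each unsuffixed alias to the suffixed symbol it should resolve to."""
    suffix = "W" if prefer_wide else "A"
    return {base: base + suffix for base in namespace.get("UnicodeAliases", [])}


aliases = alias_map(json.loads(NAMESPACE_JSON))
print(aliases["CreateFile"])       # CreateFileW
print(aliases["WIN32_FIND_DATA"])  # WIN32_FIND_DATAW
```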

The algorithm to identify them is fairly straightforward. Most of the code is contained within this UnicodeAliasSet class: https://github.com/marlersoft/win32jsongen/blob/16f9bb264b1534dbd90e376ffcb2a5d8bf422039/Generator/JsonGenerator.cs#L1054

It takes all the top-level symbols and checks whether they end with A or W. If they do, it puts them in a Candidates list; if another symbol is found that matches except for having the other suffix, the pair is saved as a "unicode alias".

It's simple enough that each projection could implement this themselves, or the metadata could also leverage this logic to include this information somehow.

P.S. You also need to track all the symbols that don't end in A/W and verify that the base name of a symbol ending in A/W isn't already defined. There are a few dozen symbols where this applies.
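The algorithm described above, including the P.S., can be sketched in a few lines of Python. This is a re-sketch for illustration, not a port of the actual UnicodeAliasSet class, whose details may differ; find_unicode_aliases is a hypothetical name.

```python
from typing import Dict, Iterable, List, Set


def find_unicode_aliases(symbols: Iterable[str]) -> List[str]:
    """Return base names that have both an A and a W variant.

    Bases already defined as symbols in their own right are excluded,
    per the P.S. above.
    """
    all_symbols = set(symbols)
    candidates: Dict[str, Set[str]] = {}  # base name -> suffixes seen
    for sym in all_symbols:
        if len(sym) > 1 and sym[-1] in ("A", "W"):
            candidates.setdefault(sym[:-1], set()).add(sym[-1])
    return sorted(
        base
        for base, suffixes in candidates.items()
        if suffixes == {"A", "W"} and base not in all_symbols
    )


print(find_unicode_aliases(
    ["CreateFileA", "CreateFileW", "WINBIO_UNIT_SCHEMA", "CloseHandle"]
))
# ['CreateFile']
```

WINBIO_UNIT_SCHEMA is left alone because no matching W-suffixed twin exists, which is exactly the false positive the original request was about.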

mikebattista added the enhancement and usability labels on Nov 1, 2021
@mikebattista
Collaborator

What would the attribute(s) look like?

@mikebattista
Collaborator

Ansi variants are now decorated with [Ansi] and Unicode variants are now decorated with [Unicode].
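With these attributes, a projection that follows the C #define UNICODE model no longer needs a heuristic. The Python sketch below models each API as a name plus a set of attribute names; the Api class and projected_names are hypothetical, and how the [Ansi]/[Unicode] attributes are actually read from the .winmd depends on the metadata reader in use.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set


@dataclass
class Api:
    name: str
    attributes: Set[str] = field(default_factory=set)  # e.g. {"Ansi"} or {"Unicode"}


def projected_names(apis: Iterable[Api]) -> Dict[str, str]:
    """Map exposed names to metadata names: hide [Ansi] variants, strip the
    W suffix from [Unicode] variants, and leave everything else alone."""
    result: Dict[str, str] = {}
    for api in apis:
        if "Ansi" in api.attributes:
            continue
        if "Unicode" in api.attributes and api.name.endswith("W"):
            result[api.name[:-1]] = api.name
        else:
            result[api.name] = api.name
    return result


apis = [
    Api("TBBUTTONINFOA", {"Ansi"}),
    Api("TBBUTTONINFOW", {"Unicode"}),
    Api("WINBIO_UNIT_SCHEMA"),
]
print(projected_names(apis))
# {'TBBUTTONINFO': 'TBBUTTONINFOW', 'WINBIO_UNIT_SCHEMA': 'WINBIO_UNIT_SCHEMA'}
```

Dropping [Ansi] variants outright is one projection's choice; a projection like the Rust one could instead keep both and use the attributes only for documentation or grouping.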
