Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VB default ByVal marshalling behavior is actually VBByRefStr and it truncates MBCS input #9944

Closed
jeffschwMSFT opened this issue Mar 15, 2018 · 4 comments

Comments

@jeffschwMSFT
Copy link
Member

VB defaults ByVal marshalling to VBByRefStr, which will truncate inputs that are MBCS. Here is the repro:

Suppose there is a DLL contains a function which requires char* argument as below.

void __stdcall func (char * pszText)
{
    return;
}

If you call this function via P/Invoke with multi-bytes character as below, the argument is corrupted.

Declare Sub func Lib "CppDll" (ByVal text As String)
Sub Main()

    Dim str As String = "あいうえお"
    Console.WriteLine(str)
    func(str)
    Console.WriteLine(str)

End Sub

Instead of using the declaration above, any of the followings can avoid the corruption.

Declare Sub func Lib "CppDll" (<MarshalAs(UnmanagedType.LPTStr)> ByVal text As String)
Declare Sub func Lib "CppDll" (<MarshalAs(UnmanagedType.LPStr)> ByVal text As String)
Declare Sub func Lib "CppDll" (<MarshalAs(UnmanagedType.BStr)> ByVal text As String)

Here is the analysis:
ByValStr marshaling concept is flawed when it comes to marshaling multi-byte characters, such as Japanese and Chinese. It marshales it out to native code correctly – allocating enough buffer and marshal it as ANSI characters. However, when it marshal the string back into native, it assumes the length of the string is the managed length of the string, which is 5 in this case. As you’ve already guessed by now, this will result in a partial string being read/convert into managed and you’ll see a truncation in the middle of the string. This is a bug in the CLR.

Refer to 270066 for more details, moving to .NET Core due to the potential breaking nature of this request.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@AaronRobinsonMSFT
Copy link
Member

Updated in repro in C#

public unsafe class Program
{
    static void PrintBytes(string str)
    {
        byte[] bytes = new byte[str.Length * sizeof(char)];
        System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);

        foreach (var b in bytes)
        {
            Console.Write($"{b:x} ");
        }
        Console.WriteLine();
    }

    [DllImport(@"NativeLibrary.dll")]
    extern static void Test2([MarshalAs(UnmanagedType.VBByRefStr)] ref string ff);
    static void Main(string[] args)
    {
        string ff = "あいうえお";

        PrintBytes(ff);
        Test2(ref ff);
        PrintBytes(ff);
    }
}
extern "C" void Test(char* s)
{
    return;
}

Bad output:

42 30 44 30 46 30 48 30 4a 30
3f 0 3f 0 3f 0 3f 0 3f 0

@AaronRobinsonMSFT
Copy link
Member

AaronRobinsonMSFT commented Apr 23, 2020

Here is the analysis:
ByValStr marshaling concept is flawed when it comes to marshaling multi-byte characters, such as Japanese and Chinese. It marshales it out to native code correctly – allocating enough buffer and marshal it as ANSI characters. However, when it marshal the string back into native, it assumes the length of the string is the managed length of the string, which is 5 in this case. As you’ve already guessed by now, this will result in a partial string being read/convert into managed and you’ll see a truncation in the middle of the string. This is a bug in the CLR.

This underlying issue here isn't length but the lossy nature of character encoding conversions.

  1. The string comes into the marshaler as expected and is converted via VBByValStrMarshaler::ConvertToNative() into an ANSI string (see line 454 below). This is the point where the mangling and data loss are occurring. The string length is computed using characters instead of bytes which results in a length of 5 and that is encoded in the returned string: 05 00 00 00 3F 3F 3F 3F 3F 00.

    internal static unsafe IntPtr ConvertToNative(string strManaged, bool fBestFit, bool fThrowOnUnmappableChar, ref int cch)
    {
    if (null == strManaged)
    {
    return IntPtr.Zero;
    }
    byte* pNative;
    cch = strManaged.Length;
    // length field at negative offset + (# of characters incl. the terminator) * max ANSI char size
    int nbytes = checked(sizeof(uint) + ((cch + 1) * Marshal.SystemMaxDBCSCharSize));
    pNative = (byte*)Marshal.AllocCoTaskMem(nbytes);
    int* pLength = (int*)pNative;
    pNative += sizeof(uint);
    if (0 == cch)
    {
    *pNative = 0;
    *pLength = 0;
    }
    else
    {
    byte[] bytes = AnsiCharMarshaler.DoAnsiConversion(strManaged, fBestFit, fThrowOnUnmappableChar, out int nbytesused);
    Debug.Assert(nbytesused < nbytes, "Insufficient buffer allocated in VBByValStrMarshaler.ConvertToNative");
    fixed (byte* pBytes = &bytes[0])
    {
    Buffer.Memcpy(pNative, pBytes, nbytesused);
    }
    pNative[nbytesused] = 0;
    *pLength = nbytesused;
    }
    return new IntPtr(pNative);
    }

  2. The native function is called with the converted string from (1).

  3. The string computed in (1) is then passed to VBByValStrMarshaler::ConvertToManaged() since it was a ref and we want to bring back any changes from the native call. However at this point we are converting the converted string (i.e. 05 00 00 00 3F 3F 3F 3F 3F 00).

    internal static unsafe string? ConvertToManaged(IntPtr pNative, int cch)
    {
    if (IntPtr.Zero == pNative)
    {
    return null;
    }
    return new string((sbyte*)pNative, 0, cch);
    }

  4. The converted string from (1), which was passed to (2), and then converted back in (3) is now passed back out as the ref dictates. The user then observes 3F 00 3F 00 3F 00 3F 00 3F 00 00 00 which is the converted string back into nonsense.

There are options here to fix this but I am dubious there is a lot of value in fixing this issue after so many years.

@AaronRobinsonMSFT
Copy link
Member

AaronRobinsonMSFT commented Apr 23, 2020

@jeffschwMSFT @jkoritzinsky @elinor-fung Any thoughts here?

My preference here would be to simply update the documentation about the nuances on VB.Net and mention MBCS strings so we can point people at something. I believe this kind of change should be pushed into the source generator approach for P/Invokes since it has so little impact given the number of reported issues.

@AaronRobinsonMSFT
Copy link
Member

Now that the LibraryImport generator (i.e., source generated DllImports) has been exposed as a publicly consumable tool, this issue should be explored relative to that effort. This specific issue is being close due to lower priority relative to other interop needs and should be solved using the source generator.

Support for marshalling customized types can be found in the new API described at #66121.

@ghost ghost locked as resolved and limited conversation to collaborators Jun 6, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants