What is the proper way to encode a std::wstring to UTF-16 in a std::vector<unsigned char>?

I am attempting to encode a std::wstring as UTF-16 and pass the bytes to a function that takes a pair of vector iterators. To accomplish this, I have tried the following:

std::vector<unsigned char> HashAlgorithm::ComputeHash(std::wstring value)
{
    std::wstring_convert<std::codecvt_utf16<wchar_t>> converter;

    std::string encodedString = converter.to_bytes(value);

    std::vector<unsigned char> encodedBytes(
        reinterpret_cast<unsigned char const *>(encodedString.c_str()),
        reinterpret_cast<unsigned char const *>(encodedString.c_str() + encodedString.size()));

    std::vector<unsigned char> hashedBytes = this->ComputeHash(encodedBytes.begin(), encodedBytes.end());
    return hashedBytes;
}
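Incidentally, one subtlety with the converter itself: std::codecvt_utf16 produces big-endian bytes by default, so if the hash should see the UTF-16LE layout Windows uses natively, the Mode template argument has to say so. A minimal sketch of that variant (0x10ffff is just the default Maxcode spelled out):

// Request little-endian UTF-16 output explicitly; the default is big-endian.
std::wstring_convert<
    std::codecvt_utf16<wchar_t, 0x10ffff, std::little_endian>> leConverter;

std::string leBytes = leConverter.to_bytes(value);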

It works fine for the most part, except that I know something is wrong because in debug mode I am seeing an assertion failure on the return of hashedBytes, which smells like some kind of stack corruption.

What is causing this error and how can I prevent it?

EDIT #1

Here are the support functions that I am using. I've been trying to break the problem down to figure out where the assertion is originating and why, but I've not been able to produce a minimal reproduction yet.

std::vector<unsigned char> HashAlgorithm::ComputeHash(std::vector<unsigned char>::const_iterator begin, std::vector<unsigned char>::const_iterator end)
{
    this->Process(begin, end);
    std::vector<unsigned char> hashedBytes = this->Complete();

    return hashedBytes;
}

void HashAlgorithm::Process(std::vector<unsigned char>::const_iterator begin, std::vector<unsigned char>::const_iterator end)
{
    NTSTATUS status = BCryptHashData(this->hash, const_cast<unsigned char *>(&(*begin)), std::distance(begin, end), 0);
}

std::vector<unsigned char> HashAlgorithm::Complete()
{
    std::vector<unsigned char> result(this->outputSize);

    NTSTATUS status = BCryptFinishHash(this->hash, result.data(), (ULONG)result.size(), 0);
    return result;
}
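For completeness, the members these functions rely on are set up roughly as follows. This is a sketch only; the member names, the ULONG type of outputSize, and the SHA-256 algorithm choice are assumptions, since the original setup code isn't shown:

#include <windows.h>
#include <bcrypt.h>
#pragma comment(lib, "bcrypt.lib")

HashAlgorithm::HashAlgorithm()
{
    // Assumed members: BCRYPT_ALG_HANDLE algorithm, BCRYPT_HASH_HANDLE hash,
    // ULONG outputSize. Error handling omitted for brevity.
    BCryptOpenAlgorithmProvider(&this->algorithm, BCRYPT_SHA256_ALGORITHM, nullptr, 0);

    // Ask CNG for the digest length so Complete() can size its buffer.
    ULONG bytesCopied = 0;
    BCryptGetProperty(this->algorithm, BCRYPT_HASH_LENGTH,
        reinterpret_cast<PUCHAR>(&this->outputSize), sizeof(this->outputSize),
        &bytesCopied, 0);

    // Passing nullptr/0 for the hash object buffer lets CNG allocate it
    // internally (supported on Windows 7 and later).
    BCryptCreateHash(this->algorithm, &this->hash, nullptr, 0, nullptr, 0, 0);
}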

Answers


std::wstring is not binary-compatible between Microsoft VC++ 2010 and VC++ 2015.

The problem is that std::wstring in the library code (VS 2010) and in the client code (VS 2015) differ in size by 4 bytes: the newer std::wstring is 32 bytes, while the older one is 28 bytes. When these objects are passed around by value, stack corruption occurs in the 4 bytes just past the smaller std::wstring, which trips the stack canaries used to guard against stack-based exploits.
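One way to catch a mismatch like this early, instead of debugging stack corruption after the fact, is to have the library export the size it was compiled with and check it from the client at startup. A minimal sketch, assuming a DLL boundary; the function names here are invented for illustration:

#include <cassert>
#include <string>

// Library side (built with VS2010): report the size this toolset uses.
extern "C" __declspec(dllexport) size_t GetLibraryWstringSize()
{
    return sizeof(std::wstring);   // 28 bytes under VS2010
}

// Client side (built with VS2015): check before passing strings by value.
void VerifyStringLayoutAtStartup()
{
    // sizeof(std::wstring) is 32 bytes under VS2015, so this fires on mismatch.
    assert(GetLibraryWstringSize() == sizeof(std::wstring));
}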


To ensure you don't lose any data along the way, you should hash the bytes directly:

std::vector<unsigned char> myClass::ComputeHash(std::wstring value)
{
    // Hash the string's in-memory representation directly, with no
    // codecvt conversion step in between.
    auto size_of_data = value.size() * sizeof(value[0]);
    auto pointer_to_data = reinterpret_cast<unsigned char const *>(value.data());

    std::vector<unsigned char> encodedBytes(pointer_to_data, pointer_to_data + size_of_data);
    std::vector<unsigned char> hashedBytes = this->ComputeHash(encodedBytes.begin(), encodedBytes.end());
    return hashedBytes;
}
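Note that this hashes the string's in-memory representation, which is UTF-16LE on Windows but typically UTF-32 on platforms where wchar_t is 4 bytes, so the resulting digests won't be portable across platforms.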

Try adding a banana (🍌 \U0001F34C) to see what happens to your data as you step through, e.g. std::wstring my_unicode_string{L"Test string 🍌\n"}; or std::wstring wstr = L"z\u00df\u6c34\U0001F34C"; // L"zß水🍌". The second example might be better if your .cpp file isn't saved as Unicode text.

You will probably get an exception thrown by to_bytes, because only code points in the Basic Multilingual Plane can be encoded into a single wchar_t. And if the conversion does succeed, it might map different higher code points to similar byte sequences, which would lead to the same hash for different strings.
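On Windows, where wchar_t is 16 bits, a small self-contained test along these lines should demonstrate the failure (a sketch; wstring_convert::to_bytes throws std::range_error when the conversion fails):

#include <codecvt>
#include <iostream>
#include <locale>
#include <stdexcept>
#include <string>

int main()
{
    std::wstring_convert<std::codecvt_utf16<wchar_t>> converter;

    // U+1F34C is outside the BMP; with 16-bit wchar_t it is stored as a
    // surrogate pair, which this converter rejects.
    std::wstring banana = L"\U0001F34C";

    try {
        std::string bytes = converter.to_bytes(banana);
        std::cout << "converted to " << bytes.size() << " bytes\n";
    } catch (const std::range_error &e) {
        std::cout << "to_bytes failed: " << e.what() << '\n';
    }
}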

