mbrtoc32 in MinGW-w64 buggy?


Werner Wenzel

Running the following MinGW-w64-built code on Windows 7 64-bit crashes
for me:

#include <stdio.h>
#include <uchar.h>

int main(void)
{
    mbstate_t mbstate;

    puts("So far okay ...");
    mbrtoc32(NULL, "", 1, &mbstate);
    puts("Not reached due to crash!");
    return 0;
}

It should not crash, as the problematic call is taken directly from ISO C11
(N1570) 7.28.1.3p2.

As far as I can see, this issue is caused by
\mingw-builds\sources\mingw-w64-v3.1.0\mingw-w64-crt\misc\uchar_mbrtoc32.c,
line 32, which in this special case dereferences NULL.
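For reference, 7.28.1.3p2 says two things. When s is a null pointer, mbrtoc32 behaves like the call mbrtoc32(NULL, "", 1, ps), and pc32 and n are ignored. Converting a null character also leaves the state in the initial conversion state. The sketch below checks both forms against whatever C library it is built with. The helper name `reset_forms_agree` is illustrative and not from the thread. On a conforming implementation it should return 1. Judging by the MinGW-w64 code quoted later in this thread, that implementation would likely fault on the first call instead.

```c
#include <stddef.h>  /* NULL, size_t */
#include <uchar.h>   /* mbrtoc32, char32_t, mbstate_t */
#include <wchar.h>   /* mbsinit */

/* Illustrative helper (hypothetical name): returns 1 if both forms that
   N1570 7.28.1.3p2 treats as equivalent return 0 and leave the state in
   the initial conversion state. */
int reset_forms_agree(void)
{
    mbstate_t st = {0};   /* start in the initial conversion state */

    /* Form 1: s is a null pointer, so pc32 and n are ignored. */
    size_t a = mbrtoc32(NULL, NULL, 0, &st);
    int a_initial = mbsinit(&st) != 0;

    /* Form 2: the equivalent call from the original post. pc32 is NULL,
       so the implementation must not store through it. */
    size_t b = mbrtoc32(NULL, "", 1, &st);
    int b_initial = mbsinit(&st) != 0;

    return a == 0 && b == 0 && a_initial && b_initial;
}
```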

Is this thought correct or am I missing something?

Werner
 

Eric Sosman


I am no expert on wide-character utilities and I have not looked
at the source code, but it looks to me like the `mbstate' variable
has never been initialized, and so may "contain garbage." What
happens if you use `mbstate_t mbstate = { 0 };' instead?
 

James Kuyper


I suspect that the problem is that your mbstate object is uninitialized.
mbstate_t objects can be zero-initialized, and zero is the only value
that the standard guarantees that they can be initialized with, so I'd
recommend using that. 7.28.1.3p2 doesn't address how the object pointed
at by ps was initialized.
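Here is a minimal sketch of the zero-initialization James recommends, assuming only <wchar.h>. N1570 7.29.6 notes that a zero-valued mbstate_t describes an initial conversion state. An object with static storage duration and no initializer is zero-initialized, so copying one is a portable way to reset an existing mbstate_t. The helper name `reset_state` is hypothetical. mbsinit() reports whether an object describes the initial conversion state.

```c
#include <wchar.h>   /* mbstate_t, mbsinit */

/* Hypothetical helper: put *ps into the initial conversion state by
   copying a zero-valued mbstate_t. */
void reset_state(mbstate_t *ps)
{
    static const mbstate_t initial;   /* static storage: implicitly zero-initialized */
    *ps = initial;
}
```

At the point of definition, `mbstate_t mbstate = { 0 };` achieves the same thing. The helper is only useful for reusing an object that already exists.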
 

Werner Wenzel

On 29.03.2014 14:08, Eric Sosman wrote:
> I am no expert on wide-character utilities and I have not looked
> at the source code, but it looks to me like the `mbstate' variable
> has never been initialized, and so may "contain garbage." What
> happens if you use `mbstate_t mbstate = { 0 };' instead?

Actually, it still crashes with "mbstate_t mbstate = { 0 };".

The cited MinGW-w64 code reads as follows:

size_t mbrtoc32 (char32_t *__restrict__ pc32,
                 const char *__restrict__ s,
                 size_t n,
                 mbstate_t *__restrict__ __UNUSED_PARAM(ps))
{
  if (*s == 0)
    {
      *pc32 = 0;
      return 0;
    }
  ....

In this special case the empty string (2nd argument) triggers the
assignment of 0 to *pc32, but pc32 is a null pointer and points nowhere.

In my opinion the questionable line should read:

if (pc32) *pc32 = 0;
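Until the library is fixed, a caller could route calls through a wrapper like the hypothetical `safe_mbrtoc32` below. This is a caller-side workaround sketch, not the MinGW-w64 patch. It never passes the library a null pc32: it substitutes a local dummy instead. It never passes a null s either: it rewrites that case into the equivalent call from 7.28.1.3p2. It does nothing about the ignored ps parameter. On a conforming library it should behave the same as calling mbrtoc32 directly.

```c
#include <stddef.h>  /* NULL, size_t */
#include <uchar.h>   /* mbrtoc32, char32_t, mbstate_t */

/* Hypothetical workaround wrapper: same contract as mbrtoc32, but it
   never hands the library a null pc32 or a null s. */
size_t safe_mbrtoc32(char32_t *restrict pc32, const char *restrict s,
                     size_t n, mbstate_t *restrict ps)
{
    char32_t dummy;   /* receives a value the caller did not ask for */

    if (s == NULL) {  /* 7.28.1.3p2: same as mbrtoc32(NULL, "", 1, ps) */
        s = "";
        n = 1;
        pc32 = NULL;
    }
    return mbrtoc32(pc32 != NULL ? pc32 : &dummy, s, n, ps);
}
```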

Werner
 

Eric Sosman


That looks to me like a bug; you might want to report it
to the Mingols.

It also seems to me your original code had a bug, which
didn't happen to make a difference with this implementation
of mbrtoc32() but might have made trouble with others.
 

Keith Thompson

Eric Sosman said:
> I am no expert on wide-character utilities and I have not looked
> at the source code, but it looks to me like the `mbstate' variable
> has never been initialized, and so may "contain garbage." What
> happens if you use `mbstate_t mbstate = { 0 };' instead?

mbstate is not initialized prior to the call, but that's not a problem.
Its address, not its value, is passed to mbrtoc32(), which updates
the pointed-to object.

N1570 7.28.1:

These functions have a parameter, ps, of type pointer to mbstate_t
that points to an object that can completely describe the current
conversion state of the associated multibyte character sequence,
which the functions alter as necessary.
 

Eric Sosman

> mbstate is not initialized prior to the call, but that's not a problem.
> Its address, not its value, is passed to mbrtoc32(), which updates
> the pointed-to object.
>
> N1570 7.28.1:
>
> These functions have a parameter, ps, of type pointer to mbstate_t
> that points to an object that can completely describe the current
> conversion state of the associated multibyte character sequence,
> which the functions alter as necessary.

Yabbut... I understand this to mean that the pointed-to
mbstate_t object is both an output *and* an input to the function.
What would be the point of having one call report the state, and
then having the next ignore state changes encountered by the first?

(Still, as I said before: "I am no expert.")
 

Keith Thompson

Eric Sosman said:
> On 3/29/2014 5:38 PM, Keith Thompson wrote:
> [...]
>> mbstate is not initialized prior to the call, but that's not a problem.
>> Its address, not its value, is passed to mbrtoc32(), which updates
>> the pointed-to object.
>>
>> N1570 7.28.1:
>>
>> These functions have a parameter, ps, of type pointer to mbstate_t
>> that points to an object that can completely describe the current
>> conversion state of the associated multibyte character sequence,
>> which the functions alter as necessary.
>
> Yabbut... I understand this to mean that the pointed-to
> mbstate_t object is both an output *and* an input to the function.
> What would be the point of having one call report the state, and
> then having the next ignore state changes encountered by the first?
>
> (Still, as I said before: "I am no expert.")

I don't see anything in the quoted text that implies that the functions
read the value of the pointed-to mbstate_t object.

Still, you make a good point. I guess I've got some reading to do.
 

James Kuyper

> I don't see anything in the quoted text that implies that the functions
> read the value of the pointed-to mbstate_t object.

"... can completely describe the current conversion state ... the
functions alter as necessary."
To me that at least suggests that the mbstate_t object should start out
correctly describing the current conversion state at the time the
function is called, even though it doesn't explicitly say so.
 
