FR accent characters

Bert Szoghy

Hello,

I am struggling with an extended stored procedure DLL coded in C. The
SQL Server database contains French accented characters. The DLL drops
them as it passes on the data to managed code.

I'm trying to wrap my mind around the wchar_t data type.

I tried doing a simple EXE which compiles and runs on Visual Studio
.NET 2003:

#include <string.h>

void main()
{
char xml[100] = "hello";
strcat(xml," world");
printf("%s",xml);
}

I wanted to do the same with wide characters. The following compiles
but chokes on the second line when run:

#include <string.h>
#include <wchar.h>

void main()
{
wchar_t * xml = "hello";
wcscat(xml," là mondé");
// The 2 accented characters above might show up wrong when posted on a web page;
// the special HTML characters are &agrave; and &eacute;
sprintf("%s",xml);
}

What would be the working code? Scoured the refs (K&R, C Unleashed)
they were unhelpful.

URLs would be welcome.

Thanks in advance!
Bert
 
Martin Ambuhl

Bert said:
Hello,

I am struggling with an extended stored procedure DLL coded in C. The
SQL Server database contains French accented characters. The DLL drops
them as it passes on the data to managed code.

DLL, SQL, etc. are not C and are off-topic. They have nothing to do
with your problem, either, so we'll ignore them.
I'm trying to wrap my mind around the wchar_t data type.

I tried doing a simple EXE which compiles and runs on Visual Studio
.NET 2003:

Then turn your diagnostics on ...
#include <string.h>

void main()
^^^^
This marks the coder as incompetent.
{
char xml[100] = "hello";
strcat(xml," world");
printf("%s",xml);
}

I wanted to do the same with wide characters. The following compiles
but chokes on the second line when run:

#include <string.h>
#include <wchar.h>

void main()
{
wchar_t * xml = "hello";
wcscat(xml," là mondé");
// The 2 accented characters above might show up wrong when posted on a web page;
// the special HTML characters are &agrave; and &eacute;
sprintf("%s",xml);
}

What would be the working code? Scoured the refs (K&R, C Unleashed)
they were unhelpful.

Try the following. Notice the differences in the form of the constant
strings and in the output format. What actually is produced is
implementation- and locale-specific.

#include <stdio.h>
#include <string.h>
#include <wchar.h>

void first_proc(void)
{
char xml[100] = "hello";
strcat(xml, " world");
printf("%s\n", xml);
}
void second_proc(void)
{
wchar_t *xml = L"hello";
wcscat(xml, L" là mondé");
printf("%ls\n", xml);
}

int main()
{
first_proc();
second_proc();
return 0;
}
 
Chris Croughton

Try the following. Notice the differences in the form of the constant
strings and in the output format. What actually is produced is
implementation- and locale-specific.

As in core dumps and other nasal demons...
#include <stdio.h>
#include <string.h>
#include <wchar.h>

void first_proc(void)
{
char xml[100] = "hello";
strcat(xml, " world");
printf("%s\n", xml);
}
void second_proc(void)
{
wchar_t *xml = L"hello";
wcscat(xml, L" là mondé");

Splat! You've just tried to write past the end of a string which is a
pointer to a wide string literal with 6 characters. Undefined
behaviour twice (writing to a string literal and writing off the end of
it)...

If you make the declaration of xml wchar_t xml[100] (to match the first
one) it works rather better.

However, putting accented characters in source is horribly undefined and
non-portable. Indeed, the wide character functions do not produce
anything portable, which is why most people using Unicode have their own
code to handle it, the functions in the C Standard aren't guaranteed to
do anything at all usable unless __STDC_ISO_10646__ is defined, which it
needn't be (if the "supported locales" for the implementation are
limited then wchar_t can be an 8 bit type).
printf("%ls\n", xml);
}

int main()
{
first_proc();
second_proc();
return 0;
}

The C library on my system doesn't support %ls, unfortunately, I used
the debugger to verify that the wcscat worked...

Chris C
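
For readers without a working %ls, here is a minimal sketch of the array-based fix described above, checked with wcscmp() instead of printed. The names wide_append and demo_wide_concat are hypothetical and do not come from anyone's post; the bounds check is an addition that the original code did not have.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: append src to dst only if the result, including
   the terminating null, fits in dst_cap elements. Returns 1 on success
   and 0 (leaving dst untouched) if it would overflow. */
int wide_append(wchar_t *dst, size_t dst_cap, const wchar_t *src)
{
    size_t used = wcslen(dst);
    size_t need = wcslen(src);

    if (used + need + 1 > dst_cap)
        return 0;
    wcscat(dst, src);
    return 1;
}

/* Builds the string in a writable array (not a string literal) and
   verifies it with wcscmp(), so no wide-character output is needed.
   Returns 1 if the concatenation produced the expected text. */
int demo_wide_concat(void)
{
    wchar_t xml[100] = L"hello";

    if (!wide_append(xml, sizeof xml / sizeof xml[0], L" world"))
        return 0;
    return wcscmp(xml, L"hello world") == 0;
}
```

Calling demo_wide_concat() from main and testing its return value is enough to confirm that the buffer version behaves; the accented characters are left out here on purpose, since how they compile depends on the source encoding.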
 
Martin Ambuhl

Chris Croughton wrote:
[..]
As in core dumps and other nasal demons...


Splat! You've just tried to write past the end of a string which is a
pointer to a wide string literal with 6 characters.

Thank you; I made the grievous error of correcting as little as possible
of the OP's code without noticing that errors were on almost every line
rather than only half of them.
Undefined
behaviour twice (writing to a string literal and writing off the end of
it)...

It's always good for regulars to be occasionally humiliated by someone
who only showed up last November 15th.
 
Bertrand Szoghy

Hello Martin and Chris,

Indeed the following code does not go Splat! (the correct technical term
requires an exclamation point)

#include <string.h>
#include <wchar.h>
#include <stdio.h>

void main()
{
wchar_t xml[100] = L"hello";
wcscat(xml, L" là mondé");
printf("%ls", xml);
}

But the result of the printf will give (on Windows XP) on the command prompt
"hello lÓ mondÚ"

The code I am looking for will never be ported to anything else than Windows.

Best regards,
Bert
 
lndresnick

Bertrand said:
Hello Martin and Chris,

Indeed the following code does not go Splat! (the correct technical term
requires an exclamation point)

#include <string.h>
#include <wchar.h>
#include <stdio.h>

void main()
{
wchar_t xml[100] = L"hello";
wcscat(xml, L" là mondé");
printf("%ls", xml);
}

But the result of the printf will give (on Windows XP) on the command prompt
"hello lÓ mondÚ"

The code I am looking for will never be ported to anything else than Windows.

Best regards,
Bert

void main() is bogus, try int main(void) and returning something.
And a newline after your printf is required to guarantee anything
is displayed. But neither of those is your issue.

You probably need some help from a windows group at this point, not
comp.lang.c. How characters actually display in your command window
on windows is off topic here and may depend on the language setup of
your system. <OT> I seem to remember you can change this stuff by
changing the code page with chcp, as in chcp 1252. Gives you a
starting point at least. But that is old, suspect knowledge, ask in a
windows group for better info. </OT>

-David
 
Mark McIntyre

void main()

this is still wrong. Please read Martin's original post.
But the result of the printf will give (on Windows XP) on the command prompt
"hello lÓ mondÚ"

This is probably entirely dependent on the codepage that the terminal
device uses for printing output. You'll need to ask Windows experts
about that part, as it's nothing to do with C.
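
One plausible reading of the "lÓ mondÚ" output, offered as an assumption rather than a diagnosis of this particular machine: à is U+00E0 and é is U+00E9, which are bytes 0xE0 and 0xE9 in Latin-1 and Windows-1252, and code page 850 (a common console default for Western European Windows setups) assigns those two bytes to Ó and Ú. The sketch below only illustrates that arithmetic. Both function names are hypothetical, and the lookup covers just the two bytes from this thread.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: given a byte written by a program that thinks in
   Latin-1 / Windows-1252, return the Unicode code point that a code page
   850 console would show for it. Only the two bytes from this thread are
   mapped; other values are returned unchanged, which is only correct
   for the ASCII range. */
unsigned cp850_glyph_for_byte(unsigned char b)
{
    switch (b) {
    case 0xE0: return 0x00D3;   /* CP850 0xE0 is capital O with acute */
    case 0xE9: return 0x00DA;   /* CP850 0xE9 is capital U with acute */
    default:   return b;
    }
}

/* Value of the n-th wide character of the thread's string, or 0 if n is
   out of range. The accents are written as universal character names so
   the encoding of the source file itself does not matter. */
unsigned thread_char(size_t n)
{
    static const wchar_t text[] = L"hello l\u00e0 mond\u00e9";
    size_t len = sizeof text / sizeof text[0] - 1;

    return n < len ? (unsigned)text[n] : 0u;
}
```

On an implementation where wchar_t holds ISO 10646 values, thread_char(7) is 0xE0 and thread_char(13) is 0xE9; narrowed to one byte each and shown by a CP850 console, they come out as the Ó and Ú seen above.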
 
Chris Croughton

Hello Martin and Chris,

Indeed the following code does not go Splat! (the correct technical term
requires an exclamation point)

And a sentence requires a period, question mark or exclamation mark at
the end.
#include <string.h>
#include <wchar.h>
#include <stdio.h>

void main()
{
wchar_t xml[100] = L"hello";
wcscat(xml, L" là mondé");
printf("%ls", xml);
}

But the result of the printf will give (on Windows XP) on the command prompt
"hello lÓ mondÚ"

Quite likely, as I said putting accented characters into the code is
non-portable. So are wide characters in general, there is no guarantee
that what is printed will resemble what was compiled in because it
depends on the locale set, the setup of the terminal on which it is
output (or if output to a file what program you use to read the file),
etc.
The code I am looking for will never be ported to anything else than Windows.

I suspect that what you need are the Windows-specific conversion
interfaces, for those you will need to ask on a Windows-specific
newsgroup. But almost certainly you won't easily be able to just write
the characters in the literal strings.

I have some functions to convert from UCS2 or UCS4 to UTF8 (and the
reverse) if you would find those useful. I wrote them from the relevant
RFCs, they are open source (non-contaminating licence based on the zlib
licence). They don't do any locale conversion, though...

Chris C
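
For readers who only want the idea behind a UCS-4 to UTF-8 conversion, here is a minimal sketch of the encoding step. It is not the code offered above, which is not reproduced in this thread; the function name utf8_encode is hypothetical and the bit layout follows the standard UTF-8 definition. It does no locale conversion either.

```c
#include <stddef.h>

/* Encode one Unicode scalar value as UTF-8 into out, which must have
   room for at least 4 bytes. Returns the number of bytes written, or 0
   for surrogate code points and values above U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {                        /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)       /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                     /* 3 bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                               /* outside the Unicode range */
}
```

With this layout, é (U+00E9) becomes the two bytes C3 A9 and à (U+00E0) becomes C3 A0. UCS-2 input that contains surrogate pairs would need to be combined into a single code point before calling it.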
 
Bertrand Szoghy

Hello Chris,

Yes, please, I would like to look at your routines. I know I will learn
something if I do. What is the URL?

My feeling about the subject is, wchar_t and associated functions,
libraries, and so on, are part of a recent C standard, the wchar_t datatype
is mentioned in passing in the second edition of K&R as "it's new", Petzold
also mentions it in passing in Chapter 2 of his 5th Edition "Programming for
Windows" without any example code, so overall this is a really good subject
for a C newsgroup discussion. It's not off topic. It's dead on topic, right
there in the grey zone of compilers handling something new in a myriad of ways.

Portability is a good thing, but as I said this question is for a project
that will never be ported away from Windows. Reading specifications is also
a good thing, fellow programmers.

Some of you mentioned that I shouldn't concatenate accented characters, and
I agree, and in fact I never intended to. In the actual system, we were
thinking of concatenating XML (without accents) in order to target the DLL
toward another programming language. The code I provided was reduced to the
smallest example I could muster. A good hint of that is that "hello world"
code is likely not part of a large system's source.

Thank you all for responding and have a nice day,
Bert Szoghy


Chris Croughton said:
Hello Martin and Chris,

Indeed the following code does not go Splat! (the correct technical term
requires an exclamation point)

And a sentence requires a period, question mark or exclamation mark at
the end.
#include <string.h>
#include <wchar.h>
#include <stdio.h>

void main()
{
wchar_t xml[100] = L"hello";
wcscat(xml, L" là mondé");
printf("%ls", xml);
}

But the result of the printf will give (on Windows XP) on the command
prompt
"hello lÓ mondÚ"

Quite likely, as I said putting accented characters into the code is
non-portable. So are wide characters in general, there is no guarantee
that what is printed will resemble what was compiled in because it
depends on the locale set, the setup of the terminal on which it is
output (or if output to a file what program you use to read the file),
etc.
The code I am looking for will never be ported to anything else than
Windows.

I suspect that what you need are the Windows-specific conversion
interfaces, for those you will need to ask on a Windows-specific
newsgroup. But almost certainly you won't easily be able to just write
the characters in the literal strings.

I have some functions to convert from UCS2 or UCS4 to UTF8 (and the
reverse) if you would find those useful. I wrote them from the relevant
RFCs, they are open source (non-contaminating licence based on the zlib
licence). They don't do any locale conversion, though...

Chris C
 
Chris Croughton

Hello Chris,

Yes, please, I would like to look at your routines. I know I will learn
something if I do. What is the URL?

Well, it wasn't actually on the web until you asked, I hadn't gotten
round to putting it there <g>.

The code is at

http://www.keristor.net/stuff/xutfstr.c

Documentation (produced by Doxygen) is at

http://www.keristor.net/stuff/xutfstr.html

(Note that the documentation has broken links to other pages, ignore
those...)
My feeling about the subject is, wchar_t and associated functions,
libraries, and so on, are part of a recent C standard, the wchar_t datatype
is mentioned in passing in the second edition of K&R as "it's new", Petzold
also mentions it in passing in Chapter 2 of his 5th Edition "Programming for
Windows" without any example code, so overall this is a really good subject
for a C newsgroup discussion. It's not off topic. It's dead on topic, right
there in the grey zone of compilers handling something new in a myriad of ways.

Certainly wchar_t and associated types and variables are on-topic for
comp.lang.c, but what they do on specific systems isn't. In other
words, noting that their action is implementation defined is on topic,
but asking about how to use them on Windows isn't.
Portability is a good thing, but as I said this question is for a project
that will never be ported away from Windows. Reading specifications is also
a good thing, fellow programmers.

Reading specifications is indeed a good thing, but you need to read
those specifications relevant to what you are doing. The C standard does
not say anything about what locales are supported on specific operating
systems, or what size wchar_t should be (except that it's at least as
big as unsigned char), for that you need to go to the specifications of
your system and compiler.
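
A quick way to see what a given compiler and library actually provide is to ask them. The sketch below is only an illustration; the function names are hypothetical, and it uses nothing beyond sizeof, CHAR_BIT and the __STDC_ISO_10646__ macro mentioned earlier in the thread.

```c
#include <limits.h>
#include <stdio.h>
#include <wchar.h>

/* Number of bits in wchar_t on this implementation. */
int wchar_bits(void)
{
    return (int)(sizeof(wchar_t) * CHAR_BIT);
}

/* Returns 1 if the implementation defines __STDC_ISO_10646__, meaning
   wchar_t values are ISO 10646 (Unicode) code points, and 0 otherwise. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Prints a one-line summary; the output differs between systems. */
void report_wchar(void)
{
    printf("wchar_t: %d bits, ISO 10646 promised: %s\n",
           wchar_bits(), wchar_is_iso10646() ? "yes" : "no");
}
```

Calling report_wchar() from main on two different systems is a cheap way to confirm that the answers are implementation-defined rather than fixed by the C standard.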

By the way, top-posting (replying at the top of the text to which you
are responding) is frowned on in comp.lang.c, it makes things harder to
read:

Terrible!
> How does he smell?

Chris C
 
