Problem with UTF-8

Charles · Nov 5, 2007

I'm designing a C++ application for the web (with FastCGI) and it has
to use UTF-8 because there will be users who will type Asian glyphs.
When I compile the application, if I use ANSI, no problem, it compiles
properly. But if I save the files as UTF-8, I get this error message:

%g++ -o cgi-bin/test.fcgi test.cpp
test.csp.cpp:1: error: stray '\239' in program
test.csp.cpp:1: error: stray '\187' in program
test.csp.cpp:1: error: stray '\191' in program
test.csp.cpp:1: error: invalid token
test.csp.cpp:1: error: expected constructor, destructor, or type
conversion before '<' token
test.csp.cpp: In function `int main()':
test.csp.cpp:5: error: `cout' was not declared in this scope
test.csp.cpp:5: error: `endl' was not declared in this scope
%

I guess this is because UTF-8 format adds some extra info in the
header of the file. Do you know how I could use UTF-8 with my
application? Other than that, do some of you use C++ and FastCGI? What
do you think? So far I've been really pleased with the low resource
usage and with the outstanding speed. Thanks.

Charles.

Nemanja Trifunovic · Nov 5, 2007

Charles said:
I'm designing a C++ application for the web (with FastCGI) and it has
to use UTF-8 because there will be users who will type Asian glyphs.
When I compile the application, if I use ANSI, no problem, it compiles
properly. But if I save the files as UTF-8, I get this error message:

%g++ -o cgi-bin/test.fcgi test.cpp
test.csp.cpp:1: error: stray '\239' in program
test.csp.cpp:1: error: stray '\187' in program
test.csp.cpp:1: error: stray '\191' in program
test.csp.cpp:1: error: invalid token
test.csp.cpp:1: error: expected constructor, destructor, or type
conversion before '<' token
test.csp.cpp: In function `int main()':
test.csp.cpp:5: error: `cout' was not declared in this scope
test.csp.cpp:5: error: `endl' was not declared in this scope
%

I guess this is because UTF-8 format adds some extra info in the
header of the file. Do you know how I could use UTF-8 with my
application?

You should be able to process UTF-8 encoded data without needing to
save your source files in that encoding. For instance, take a look at
http://utfcpp.sourceforge.net/
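
A minimal sketch of the idea, assuming the input is already valid UTF-8: the source file below contains only ASCII, yet the function handles multi-byte UTF-8 data at run time. `count_code_points` is a hypothetical helper written for illustration; it is not part of the utfcpp library and does no validation.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-8 byte string.
// Every code point starts with exactly one byte that is NOT a
// continuation byte (continuation bytes look like 10xxxxxx), so
// counting the non-continuation bytes gives the code point count.
// Assumption: the input is well-formed UTF-8.
std::size_t count_code_points(const std::string& utf8)
{
    std::size_t count = 0;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        unsigned char byte = static_cast<unsigned char>(utf8[i]);
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

Non-ASCII test data can still be written in an ASCII-only source file with hex escapes, for example `"\xE6\x97\xA5"` for a single three-byte character.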

Charles · Nov 5, 2007

Nemanja Trifunovic said:
You should process UTF-8 encoded data without a need to save your
source files in that encoding. For instance, take a look at
http://utfcpp.sourceforge.net/

Nice, thanks.

Charles.

James Kanze · Nov 6, 2007

Charles said:
I'm designing a C++ application for the web (with FastCGI) and it has
to use UTF-8 because there will be users who will type Asian glyphs.
When I compile the application, if I use ANSI, no problem, it compiles
properly. But if I save the files as UTF-8, I get this error message:

%g++ -o cgi-bin/test.fcgi test.cpp
test.csp.cpp:1: error: stray '\239' in program
test.csp.cpp:1: error: stray '\187' in program
test.csp.cpp:1: error: stray '\191' in program
test.csp.cpp:1: error: invalid token
test.csp.cpp:1: error: expected constructor, destructor, or type
conversion before '<' token
test.csp.cpp: In function `int main()':
test.csp.cpp:5: error: `cout' was not declared in this scope
test.csp.cpp:5: error: `endl' was not declared in this scope
%

Something funny is going on. First, of course, if the file only
contains characters in the basic source character set, whether
it is UTF-8 or ASCII shouldn't make a difference---all of the
characters in the basic source character set are identical in
the two encodings. Even stranger, however, are the error
messages: g++ normally displays the uninterpretable character in
*octal*. But octal with an 8 or 9 in it? Something is very
strange about your g++.

Charles said:
I guess this is because UTF-8 format adds some extra info in
the header of the file.

It shouldn't.

Charles said:
Do you know how I could use UTF-8 with my application?

My editor at home is configured to use UTF-8, and it saves my
C++ files in "UTF-8". And I've never had any problems. (When I
write the comments in French, they look funny on my machine at
work, because it doesn't have any UTF-8 fonts installed, but
other than that, the compiler doesn't complain.)

Before anything else, however, I'd try to find out why your
installation of g++ is inserting 8's and 9's into its octal.
Then I'd write a very, very simple program (hello, world) with
my editor, and look at a hex dump of it, to see what it is
actually writing to the file---if the editor automatically
inserts junk you didn't insert, it may not be usable for program
development.

Charles · Nov 6, 2007

James Kanze said:
Before anything else, however, I'd try to find out why your
installation of g++ is inserting 8's and 9's into its octal.
Then I'd write a very, very simple program (hello, world) with
my editor, and look at a hex dump of it, to see what it is
actually writing to the file---if the editor automatically
inserts junk you didn't insert, it may not be usable for program
development.

Thanks James, will do.

Charles.

Ron Natalie · Nov 6, 2007

Charles said:
%g++ -o cgi-bin/test.fcgi test.cpp
test.csp.cpp:1: error: stray '\239' in program
test.csp.cpp:1: error: stray '\187' in program
test.csp.cpp:1: error: stray '\191' in program
test.csp.cpp:1: error: invalid token
test.csp.cpp:1: error: expected constructor, destructor, or type
conversion before '<' token
test.csp.cpp: In function `int main()':
test.csp.cpp:5: error: `cout' was not declared in this scope
test.csp.cpp:5: error: `endl' was not declared in this scope
%

I guess this is because UTF-8 format adds some extra info in the
header of the file. Do you know how I could use UTF-8 with my
application? Other than that, do some of you use C++ and FastCGI? What
do you think? So far I've been really pleased with the low resource
usage and with the outstanding speed. Thanks.

The execution character set is INDEPENDENT of the character set the
program is written in. C++ only has barely adequate half-assed
wide character support. You must make sure that you have no characters
not in the basic set in the source file (outside of string/character
literals).

It looks like the first line has some cruft in it. Delete it and
retype it, being careful not to use any characters not in the basic
set. You may need to use a different text editor.

Ole Nielsby · Nov 7, 2007

Charles said:
I'm designing a C++ application for the web (with FastCGI) and it has
to use UTF-8 because there will be users who will type Asian glyphs.
When I compile the application, if I use ANSI, no problem, it compiles
properly. But if I save the files as UTF-8, I get this error message:

%g++ -o cgi-bin/test.fcgi test.cpp
test.csp.cpp:1: error: stray '\239' in program
test.csp.cpp:1: error: stray '\187' in program
test.csp.cpp:1: error: stray '\191' in program

Notepad inserts these very bytes in front of all utf-8 files. See

http://en.wikipedia.org/wiki/Utf-8#Windows
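
Those three bytes are 0xEF 0xBB 0xBF, the UTF-8 encoding of U+FEFF. If switching editors is not an option, the bytes can be stripped before compiling. A minimal sketch, with `has_utf8_bom` and `strip_utf8_bom` as hypothetical helper names operating on file contents already read into a string:

```cpp
#include <string>

// True if the text starts with the UTF-8 byte order mark EF BB BF.
bool has_utf8_bom(const std::string& text)
{
    return text.size() >= 3 &&
           static_cast<unsigned char>(text[0]) == 0xEF &&
           static_cast<unsigned char>(text[1]) == 0xBB &&
           static_cast<unsigned char>(text[2]) == 0xBF;
}

// Returns the text without a leading BOM; other input is returned unchanged.
std::string strip_utf8_bom(const std::string& text)
{
    return has_utf8_bom(text) ? text.substr(3) : text;
}
```

Read and write the file in binary mode when using this, so that no newline translation changes the remaining bytes.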

James Kanze · Nov 7, 2007

Ole Nielsby said:
Notepad inserts these very bytes in front of all utf-8 files. See

http://en.wikipedia.org/wiki/Utf-8#Windows

Interesting. However, in that case, I would expect to see
'\357', '\273' and '\277' as the stray bytes, rather than the
rather weird values he saw. (I wonder: is g++ assigning these
to a signed char, and doing the conversion to octal without
noticing that it's dealing with a negative value? But I see the
correct values when I try it with g++.)
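
One observation on the numbers themselves: 239, 187 and 191 read as decimal are exactly 0xEF, 0xBB and 0xBF, which in octal are 357, 273 and 277. Whether that g++ build printed the bytes in decimal is an assumption, but the arithmetic can be checked with a short sketch. `decimal_escape` and `octal_escape` are hypothetical helper names.

```cpp
#include <cstdio>
#include <string>

// Formats a byte as a backslash followed by its decimal value, e.g. "\239".
std::string decimal_escape(unsigned char byte)
{
    char buf[8];
    std::snprintf(buf, sizeof buf, "\\%u", static_cast<unsigned>(byte));
    return buf;
}

// Formats a byte as a backslash followed by three octal digits, e.g. "\357".
std::string octal_escape(unsigned char byte)
{
    char buf[8];
    std::snprintf(buf, sizeof buf, "\\%03o", static_cast<unsigned>(byte));
    return buf;
}
```

Feeding the BOM bytes 0xEF, 0xBB, 0xBF through both functions gives the values from the error output on one side and the expected octal values on the other.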

As far as I can tell, even if the compiler processed the file as
UTF-8, a BOM is illegal in a C++ program, unless the compiler
were to simply eliminate it in phase 1 (where it maps the
physical source file characters to the basic source character
set---in an implementation-defined manner). It might be worth
modifying the standard to require a few more characters to be
recognized as white space: requiring '\r' and the BOM would make
life a lot easier in practice.
