ifstream >> string with UTF-8?


Wolfnoliir

Hi,
Here is a question that must come up all the time, but I can't find a
solution.

I would like to get a word or a line from a UTF-8 encoded file into a
string, but I get '�'s ('?') instead.
The strange thing is, this works fine from standard input:
cin >> someString; //works fine
cout << someString;
but
someIfStream >> someString;
cout << someString;
prints out question marks instead of accented characters!
(I'm using Linux and g++ 4.3.3)

Does anyone have an idea why that is or a solution to the problem?
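For reference, here is a minimal sketch of the ifstream half of the question. The function name `read_words` is hypothetical. The sketch assumes a narrow `char` stream and the default "C" locale. Under those conditions no character conversion takes place, so the file's bytes land in the `std::string` unchanged, exactly as they would when reading from `cin`.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: read whitespace-separated words from a file.
// With a plain (narrow) std::ifstream and the default "C" locale there is
// no encoding conversion: a UTF-8 'é' ("\xC3\xA9") stays two bytes in the
// resulting std::string, and a Latin-1 'é' ("\xE9") stays one byte.
std::vector<std::string> read_words(const std::string& path) {
    std::vector<std::string> words;
    std::ifstream in(path);          // opened in text mode, no conversion
    std::string word;
    while (in >> word) {             // in the "C" locale, splits on ASCII whitespace only
        words.push_back(word);
    }
    return words;
}
```

If the words come back byte-for-byte intact but still print as '?' or '�', the bytes are most likely not in the encoding the terminal expects. The choice of `ifstream` over `cin` does not change the result.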
 

Victor Bazarov

Wolfnoliir said:
> I would like to get a word or a line from a UTF-8 encoded file into a
> string, but I get '�'s ('?') instead.
> The strange thing is, this works fine from standard input:
> cin >> someString; //works fine
> cout << someString;
> but
> someIfStream >> someString;
> cout << someString;
> prints out question marks instead of accented characters!
> (I'm using Linux and g++ 4.3.3)
>
> Does anyone have an idea why that is or a solution to the problem?

Use your "working" 'cin' solution, but redirect the input to be from
your file:

your_test_app < file_with_utf8

and see if there is any difference. As to the cause, my guess would be
that your file stream gets desynchronised from the encoding POV.

V
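A quick way to act on this suggestion is to look at the raw bytes rather than at what the terminal renders. The following is a minimal sketch; the names `hex_dump` and `hex_dump_file` are hypothetical. It works on any `std::istream`, so the same code serves `std::cin` and a `std::ifstream`. A UTF-8 'é' shows up as `c3 a9` and a Latin-1 'é' as `e9`.

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical diagnostic helper: read every byte from a stream and return
// it as space-separated lowercase hex, e.g. "c3 a9" for a UTF-8 'é'.
std::string hex_dump(std::istream& in) {
    std::string out;
    char c;
    while (in.get(c)) {
        char buf[4];
        // Cast through unsigned char so bytes >= 0x80 are not sign-extended.
        std::snprintf(buf, sizeof buf, "%02x",
                      static_cast<unsigned>(static_cast<unsigned char>(c)));
        if (!out.empty()) out += ' ';
        out += buf;
    }
    return out;
}

// Convenience wrapper for files; binary mode so nothing is translated.
std::string hex_dump_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return hex_dump(in);
}
```

Calling `std::cout << hex_dump(std::cin) << '\n';` from a test program run as `your_test_app < file_with_utf8` shows whether the accented letters are stored as one byte or as two.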
 

Wolfnoliir

Victor said:
> Use your "working" 'cin' solution, but redirect the input to be from
> your file:
>
> your_test_app < file_with_utf8
>
> and see if there is any difference. As to the cause, my guess would be
> that your file stream gets desynchronised from the encoding POV.
>
> V

Indeed, I get the same result when I do:
your_test_app < file_with_utf8

I'm not actually sure my file is UTF-8. It probably isn't, considering
that when I do this:
echo éoiàuè > txt
your_test_app < txt
it prints out correctly.

But how can I know what encoding my file is in?
Once I know that, I think I can just convert it with iconv.
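One rough programmatic check is to test whether the bytes are well-formed UTF-8. The function name `is_valid_utf8` is hypothetical, and the rules follow RFC 3629. Passing the test does not prove the file is UTF-8, since pure ASCII passes too. Latin-1 text containing accented letters, however, almost always fails, because a lone byte such as 0xE9 is not followed by the continuation bytes UTF-8 requires.

```cpp
#include <cstddef>
#include <string>

// Sketch of a UTF-8 well-formedness check (RFC 3629 rules): rejects stray
// continuation bytes, truncated sequences, overlong forms, UTF-16
// surrogates (U+D800..U+DFFF) and code points above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    std::size_t i = 0, n = s.size();
    while (i < n) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t len;
        unsigned cp;
        if (b < 0x80) { ++i; continue; }                 // plain ASCII byte
        if ((b & 0xE0) == 0xC0)      { len = 2; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
        else return false;           // stray continuation byte or 0xF8..0xFF
        if (i + len > n) return false;                   // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;        // must be 10xxxxxx
            cp = (cp << 6) | (c & 0x3F);
        }
        // Reject overlong encodings, surrogates and out-of-range values.
        if ((len == 2 && cp < 0x80) || (len == 3 && cp < 0x800) ||
            (len == 4 && cp < 0x10000) || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return false;
        i += len;
    }
    return true;
}
```

To apply it to a whole file, read the file in binary mode into a `std::string` (for example with `std::istreambuf_iterator<char>`) and pass that string in.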
 

Victor Bazarov

Wolfnoliir said:
> Indeed, I get the same result when I do:
> your_test_app < file_with_utf8
>
> I'm not actually sure my file is UTF-8.

Uh... Then why are you trying to treat it as such?

> It probably isn't, considering
> that when I do this:
> echo éoiàuè > txt
> your_test_app < txt
> it prints out correctly.

But you said that 'cin' worked OK, while your ifstream attempt didn't.
You need to find out what is different with your ifstream code compared
to the 'cin'.

> But how can I know what encoding my file is in?

Not sure it's a C++ question, to be honest. A file is a file; it
contains bytes. The encoding is something you think up, apply, and it's
not part of the file itself, AFAIUI. You get different results based on
different encodings you apply. The "correctness" of those results is
also in your head only.
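The point can be made concrete with the string from the earlier `echo` test. In UTF-8, "éoiàuè" is nine bytes. Read as UTF-8 those bytes are six characters. Read as Latin-1 they are nine characters, with C3 A9 appearing as "Ã©". The function names in this sketch are hypothetical, and the UTF-8 counter assumes well-formed input.

```cpp
#include <cstddef>
#include <string>

// Latin-1 (ISO-8859-1) interpretation: every byte is one character.
std::size_t count_chars_latin1(const std::string& bytes) {
    return bytes.size();
}

// UTF-8 interpretation: count only bytes that start a character, i.e.
// skip continuation bytes of the form 10xxxxxx. Assumes valid UTF-8.
std::size_t count_chars_utf8(const std::string& bytes) {
    std::size_t n = 0;
    for (unsigned char b : bytes)
        if ((b & 0xC0) != 0x80) ++n;
    return n;
}
```

Nothing in the bytes says which count is "right". That depends on the encoding the reader applies.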

> Once I know that, I think I can just convert it with iconv.

What's 'iconv'?

V
 

Wolfnoliir

Victor said:
> Uh... Then why are you trying to treat it as such?


> But you said that 'cin' worked OK, while your ifstream attempt didn't.
> You need to find out what is different with your ifstream code compared
> to the 'cin'.

There's nothing different. As I said in my last message, I was wrong.
It's just that my file has a different encoding from what my terminal
sends on standard input (probably UTF-8).

> Not sure it's a C++ question, to be honest. A file is a file; it
> contains bytes. The encoding is something you think up, apply, and it's
> not part of the file itself, AFAIUI. You get different results based on
> different encodings you apply. The "correctness" of those results is
> also in your head only.


> What's 'iconv'?

Iconv is a Unix utility that converts a text file from one encoding
(e.g. UTF-16) to another (e.g. UTF-8).
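If the file turns out to be ISO-8859-1 (Latin-1), the command-line conversion would be `iconv -f ISO-8859-1 -t UTF-8 file > file.utf8`. Latin-1 is only an assumption here, since the thread never establishes the actual encoding. For that particular pair, the conversion is simple enough to do by hand inside a program. Every Latin-1 byte equals its Unicode code point (U+0000..U+00FF), so it maps to either one or two UTF-8 bytes. The function name below is hypothetical. For general encoding pairs, the POSIX `iconv_open`/`iconv` C functions are the usual route.

```cpp
#include <string>

// Hand-rolled equivalent of `iconv -f ISO-8859-1 -t UTF-8` for one string,
// ASSUMING the input really is Latin-1 (this is not checked).
// Bytes below 0x80 are copied; the rest become 110000xx 10xxxxxx.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);
    for (unsigned char b : in) {
        if (b < 0x80) {
            out += static_cast<char>(b);
        } else {
            out += static_cast<char>(0xC0 | (b >> 6));    // lead byte: 0xC2 or 0xC3
            out += static_cast<char>(0x80 | (b & 0x3F));  // continuation byte
        }
    }
    return out;
}
```

Applied to the Latin-1 bytes of "éoiàuè" (E9 6F 69 E0 75 E8), this produces the same nine UTF-8 bytes that `echo` writes on a UTF-8 terminal.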

If nobody knows of a utility to find out what encoding my file is
using, I'll just go and look somewhere else then.

Thanks for your interest in my problem.
 

Richard Herring

Wolfnoliir said:
> There's nothing different. As I said in my last message, I was wrong.
> It's just that my file has a different encoding from what my terminal
> sends on standard input (probably UTF-8).

> Iconv is a Unix utility that converts a text file from one encoding
> (e.g. UTF-16) to another (e.g. UTF-8).

> If nobody knows of a utility to find out what encoding my file is
> using, I'll just go and look somewhere else then.

Is the 'file' command any help?
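`file` uses heuristics to guess a text file's encoding. One of the simplest clues such a tool can use is a byte order mark (BOM) at the very start of the data. The sketch below covers only that clue. The function name and labels are hypothetical, and many files, including most Latin-1 and plenty of UTF-8 files, carry no BOM at all. An "unknown" result therefore only means that this check could not decide.

```cpp
#include <cstddef>
#include <string>

// Hypothetical BOM sniffer: inspects the first bytes of the data.
// UTF-32LE is tested before UTF-16LE because both start with FF FE.
std::string detect_bom(const std::string& head) {
    auto starts = [&](const char* sig, std::size_t len) {
        return head.size() >= len && head.compare(0, len, sig, len) == 0;
    };
    if (starts("\xEF\xBB\xBF", 3))     return "UTF-8 (with BOM)";
    if (starts("\xFF\xFE\x00\x00", 4)) return "UTF-32LE";
    if (starts("\x00\x00\xFE\xFF", 4)) return "UTF-32BE";
    if (starts("\xFF\xFE", 2))         return "UTF-16LE";
    if (starts("\xFE\xFF", 2))         return "UTF-16BE";
    return "unknown";                  // no BOM: fall back to other heuristics
}
```

A UTF-16 file, like the one in the iconv example above, would normally be caught here. A Latin-1 file would not, so a BOM check has to be combined with other tests such as a UTF-8 validity check.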
 
