Why Am I Getting an Inverted Question Mark?

Phil Staite

Seems odd. Maybe, just maybe, there is an empty or blank line at the
beginning of your source file? In that case, the string line would be
empty during the first iteration of the while loop. Now, it *should* be OK
to call write with a char count of 0 and have it do nothing... But maybe
there is a problem with your stream code? Try adding a simple test:

while (getline(in, line)) {
    if (!line.empty()) {
        out.write(line.c_str(), line.size());
        out.put('\n');
    }
}
 
mary

When I read an HTML file starting with

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">

and then I write it into another file, say OUTPUT.txt, I get an
inverted question mark, "¿",
at the beginning of the OUTPUT.txt file. Why is that?
Thanks!

mary

P.S. I use:

string line;
while (getline(in, line)) {
    out.write(line.c_str(), line.size());
    out.put('\n');
}
 
mary

Phil,

Here is the code. It still does it with any file, no matter what it
starts with!
Thanks!

Mary

@@@@@@@@@@@@@@@@@@@@@@@

#include <iostream>
#include <fstream>
#include <string>

using namespace std;

string line;
int main()
{
    // Open the input file and bail out if it cannot be read.
    ifstream in("INPUT.txt", ios::in);
    if (!in) {
        cout << "Cannot Open the INPUT file.\n";
        return 1;
    }
    // Open (and truncate) the output file.
    ofstream out("OUTPUT.txt", ios::out);
    if (!out) {
        cout << "Cannot Open the OUTPUT file.\n";
        in.close();
        return 1;
    }
    // Copy line by line, skipping empty lines (Phil's test).
    while (getline(in, line)) {
        if (!line.empty()) {
            out.write(line.c_str(), line.size());
            out.put('\n');
        }
    }
    in.close();
    out.close();
    return 0;
}

@@@@@@@@@@@@@@@@@@@@@@@@
 
Phlip

mary said:
When I read an HTML file starting with

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">

and then I write it into another file, say OUTPUT.txt, I get an
inverted question mark, "¿",
at the beginning of the OUTPUT.txt file. Why is that?

Are you saving the file with Notepad.exe?

That program prefixes files that it perceives as Unicode (even UTF-8) with a
Byte Order Mark. If you use an editor to open your file in hex (or "binary")
mode, you might see the BOM, FEFF or FFEF, at the beginning.

Your output system does not interpret the codes as UTF-8, so it probably
uses ISO Latin-1. That has no glyph for FF or EF, so you get a "missing
glyph" symbol as ¿.

This could all be wrong, but the details are off-topic, so nobody is allowed
to contradict me.
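Phlip's suggestion to look at the file in hex can also be done from C++ itself. The following is a minimal sketch, not anyone's posted code: the helper names firstBytesHex and hasUtf8Bom are hypothetical. It opens the file in binary mode and reports its first few bytes; if the input was saved with a UTF-8 byte order mark, it would be expected to start with EF BB BF rather than 3C (the '<' of the META tag).

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper: returns the first `count` bytes of the file at
// `path` as space-separated two-digit hex, e.g. "EF BB BF 3C".
// Returns an empty string if the file cannot be opened or is empty.
std::string firstBytesHex(const std::string& path, std::size_t count = 4)
{
    // Binary mode: no newline translation, bytes are read as stored.
    std::ifstream in(path, std::ios::binary);
    std::string result;
    char c;
    for (std::size_t i = 0; i < count && in.get(c); ++i) {
        char buf[4];
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned char>(c));
        if (!result.empty()) result += ' ';
        result += buf;
    }
    return result;
}

// Hypothetical helper: true if the file starts with the UTF-8 BOM EF BB BF.
bool hasUtf8Bom(const std::string& path)
{
    return firstBytesHex(path, 3) == "EF BB BF";
}
```

For example, hasUtf8Bom("INPUT.txt") should return true if the editor saved INPUT.txt with a UTF-8 BOM, and firstBytesHex("INPUT.txt") shows exactly which bytes precede the HTML.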
 
Kurt Stutsman

mary said:
out.write(line.c_str(),line.size());
out.put('\n');

I don't see anything wrong with your code, but the above lines could be
simplified to:
out << line << '\n';
 
Sven Axelsson

Phlip said:

Are you saving the file with Notepad.exe?

That program prefixes files that it perceives as Unicode (even UTF-8) with a
Byte Order Mark. If you use an editor to open your file in hex (or "binary")
mode, you might see the BOM, FEFF or FFEF, at the beginning.

Your output system does not interpret the codes as UTF-8, so it probably
uses ISO Latin-1. That has no glyph for FF or EF, so you get a "missing
glyph" symbol as ¿.

This could all be wrong, but the details are off-topic, so nobody is allowed
to contradict me.

Well, your reasoning is correct, but not your facts. A UTF-16 ("Unicode")
file may start with FEFF or FFFE (not FFEF) to indicate endianness. A UTF-8
file, however, starts with the bytes EF BB BF if it has a BOM at all. But,
no doubt, the BOM is what the OP is seeing.
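One way to see where the "¿" itself can come from, under the assumption that the viewer decodes OUTPUT.txt as ISO Latin-1 (the thread does not say which viewer Mary used): Latin-1 maps byte value N directly to code point U+00NN, so EF is "ï", BB is "»" and BF is "¿". A Latin-1 viewer would therefore show the BOM as "ï»¿", ending in the inverted question mark. The sketch below (latin1ToUtf8 is a hypothetical name) performs that Latin-1 interpretation and re-encodes the result as UTF-8 so it can be printed on a UTF-8 terminal.

```cpp
#include <string>

// Hypothetical helper: interprets each byte of `bytes` as an ISO Latin-1
// character and returns the same text encoded as UTF-8. Bytes below 0x80
// are copied unchanged; bytes 0x80-0xFF become a two-byte UTF-8 sequence
// for code point U+0080..U+00FF.
std::string latin1ToUtf8(const std::string& bytes)
{
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            out += static_cast<char>(b);
        } else {
            out += static_cast<char>(0xC0 | (b >> 6));    // lead byte
            out += static_cast<char>(0x80 | (b & 0x3F));  // continuation byte
        }
    }
    return out;
}
```

Passing the three BOM bytes "\xEF\xBB\xBF" yields the UTF-8 encoding of "ï»¿"; on a console using a different code page, the same bytes would display differently.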
 
