istream altering text

dohboy

A kinda newbie here.
I've done a simple little program that reads a text file and counts the number of lines and words.
I had a heck of a time getting it to count properly, until I finally discovered that the problem was not my coding but the istream altering the incoming text.
What I was doing was checking each incoming character (seekg) and comparing it to hex '0a'. What I found was that text files end their lines with a '0d' (CR) and a '0a' (line feed). However, it was reading them off the istream as both being '0a'. It had changed the CR.
My questions are: are there any other little istream quirks like this I should be aware of? And is there some way to set the stream to not alter what is read?
TIA
-doh
 
Alf P. Steinbach

By default text streams translate the OS convention for newline into
'\n' on input, and vice versa, translate '\n' to OS convention on output.
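
To see the translation for yourself, one option is to count the carriage returns a stream delivers in each open mode. The following is only a sketch: `write_raw` and `count_cr` are hypothetical helper names, and the file name in the usage note is a placeholder.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Writes `bytes` to `path` exactly as given (binary mode, no translation).
// Hypothetical helper, used only to create a demonstration file.
void write_raw(const std::string& path, const std::string& bytes) {
    std::ofstream out(path.c_str(), std::ios::binary);
    out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
}

// Counts the carriage returns ('\r', hex 0d) delivered when `path` is read
// with the given open mode. Returns 0 if the file cannot be opened.
std::size_t count_cr(const std::string& path, std::ios::openmode mode) {
    std::ifstream in(path.c_str(), mode);
    std::size_t n = 0;
    char c;
    while (in.get(c)) {
        if (c == '\r') ++n;
    }
    return n;
}
```

After `write_raw("crlf_demo.txt", "one\r\ntwo\r\n")`, calling `count_cr("crlf_demo.txt", std::ios::in | std::ios::binary)` sees both CR bytes. With plain `std::ios::in` the result depends on the platform: on Windows the CR/LF pairs typically arrive as a single '\n' each, while on Unix-like systems nothing is translated and the CRs come through unchanged.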

Since you're counting lines it's probably best to work with that feature
instead of trying to turn it off.
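
Working with the feature could look roughly like the sketch below, which lets `getline` find the line ends and `>>` split the words instead of inspecting individual bytes. The names `Counts` and `count_lines_and_words` are made up for illustration, and a "word" is assumed to be any run of non-whitespace characters.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Result of one pass over the input.
struct Counts {
    std::size_t lines;
    std::size_t words;
};

// Counts lines and whitespace-separated words read from `in`.
Counts count_lines_and_words(std::istream& in) {
    Counts result = {0, 0};
    std::string line;
    while (std::getline(in, line)) {   // one iteration per line
        ++result.lines;
        std::istringstream fields(line);
        std::string word;
        // operator>> skips whitespace, so a stray '\r' left at the end of
        // a line does not count as a word.
        while (fields >> word) ++result.words;
    }
    return result;
}
```

Because the function takes a `std::istream&`, it can be handed an `std::ifstream` opened in the default text mode, or an `std::istringstream` when trying it out.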


Cheers, & hth.,

- Alf
 
Ian Collins

It's more of a normalisation than a quirk. It saves the programmer from
tedious platform-specific conversion code.

I guess another is the use of the eof flag to hide platform-specific
file endings.
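
As a rough illustration of relying on the eof flag: the loop below tests the read itself and only afterwards asks the stream why it stopped, so the code never looks for an end-of-file marker in the data. `read_to_end` is a hypothetical name used for this sketch.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>

// Reads every character from `in` and stores how many were read in `count`.
// Returns true if reading stopped because the end of input was reached,
// false if it stopped for some other reason (for example a read error).
bool read_to_end(std::istream& in, std::size_t& count) {
    count = 0;
    char c;
    while (in.get(c)) {   // the read is the loop condition
        ++count;
    }
    return in.eof();
}
```

The same pattern applies to `getline` and `>>`: use the extraction as the condition and check `eof()` after the loop, not before each read.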
 
Ron AF Greve

Hi,

I didn't test the following code, but you might want to use something like
this:

#include <algorithm>
#include <fstream>
#include <string>

using namespace std;

int main()
{
    string Filename = "input.txt"; // placeholder path, put your own here

    // Binary mode switches off the newline translation
    ifstream Input( Filename.c_str(), ios_base::binary );
    if( !Input ) return 1; // test if open!

    // So files are read the same on whatever system, you might want to get
    // rid of any carriage returns with something like
    string Line;
    while( getline( Input, Line ) )
    {
        // Plain remove() does the job of remove_if() with bind2nd() here
        Line.erase( remove( Line.begin(), Line.end(), '\r' ), Line.end() );

        // From here on Line is the same on unix and ms-windows
    }
}


Regards, Ron AF Greve

http://www.InformationSuperHighway.eu
 
