problem with seekg

Julian

Hi,
I am having problems with a function that I have been using in my program to
read sentences from a 'command file' and parse them into commands. the
surprising thing is that the program works fine on some computers and not so
fine on others. I tried debugging and cannot make any sense of it. I
narrowed it down to the seekg function and made this simple program which
(from what I understand) does not seem to be working as expected in all the
computers I have tried so far.
please let me know if I have got something wrong...

#include <iostream>
#include <fstream>
using namespace std;

int _tmain(int argc, _TCHAR* argv[])
{
ifstream is("test.txt");
char test[256];
is >> test;
cout << test << endl;
is.seekg(-2,ios::cur);
is >> test;
cout << test << endl;
return 0;
}

Here are the different versions of test.txt and their corresponding outputs.
In all the cases I expected that the output would be
createModel
el
what am I doing wrong ?

case 1: contents of test.txt :
createModel

OUTPUT:
createModel
del

case 2: contents of test.txt
createModel
testing
/*
*/
OUTPUT:
createModel
eModel

case 3: contents of test.txt :
createModel
testing
/*
*/
CreateElements
196
DMElasticElement
0 195 1
-1
exitCreateElements

OUTPUT:
createModel

thanks
Julian.
 
Larry Smith

Remember to account for the (invisible)
newline chars at the end of each line when
seeking backwards ("\r\n" on Windows, "\n"
on unix/linux, "\r" on Mac(?) ).

If 'test.txt' has just one line, it must end with a newline;
otherwise the result is not as expected. e.g. for Windows:

createModel\r\n
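
One way to see which terminators a file really contains is to read it in binary mode and print the bytes with CR and LF spelled out. The following is only a sketch: the helper names readAllBytes and showLineEndings are made up for illustration, and it assumes the file is small enough to hold in memory.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Hypothetical helper: reads the whole file as raw bytes. std::ios::binary
// matters here; in text mode the library may translate "\r\n" to "\n"
// before the program ever sees it.
std::string readAllBytes(const char* path) {
    std::ifstream in(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: returns a copy of `bytes` in which every CR and LF
// is written as the visible two-character escape \r or \n.
std::string showLineEndings(const std::string& bytes) {
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (bytes[i] == '\r') {
            out += "\\r";
        } else if (bytes[i] == '\n') {
            out += "\\n";
        } else {
            out += bytes[i];
        }
    }
    return out;
}
```

Printing showLineEndings(readAllBytes("test.txt")) for a one-line file saved with Windows line endings should give createModel\r\n, while a file saved with Unix line endings should show only \n.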
 
Julian

thanks for the reply... but I'm not sure if that explains everything. even
if I need to account for the invisible characters, the results don't make
sense... I tried all kinds of combinations.. but it seems like it is seeking
more than 2 offsets although I have just asked to offset 2. if I account for
the invisible characters, it should effectively offset only 1 visible
character... instead it is offsetting 3

another problem: why does the result change with the number of lines in the
file? ... even though I am only reading the first line!
the more number of lines I have, the more it is offsetting.
 
Larry Smith

Julian said:
another problem: why does the result change with the number of lines in the
file? ... even though I am only reading the first line!
the more number of lines I have, the more it is offsetting.

I don't know.

It worked fine for me once I added the two "test[0] = 0;"
lines to the code. I tested it with all three versions
of your example test.txt files (on SUSE Linux 10.0 with
the GCC g++ compiler v4.0.2). With all three versions
of your test.txt, the output was:

createModel
el

You are using the non-Standard "_tmain()" and "_TCHAR".
Is this a Borland compiler? You might ask in a
newsgroup specific to your compiler.

Here's the code I compiled (it's slightly modified
from your original):

// julian.cpp - compile with: g++ -o julian julian.cpp
#include <iostream>
#include <fstream>
using namespace std;

int main(int argc, char * argv[])
{
char test[256];

ifstream is("test.txt");

test[0] = 0;
is >> test;
cout << test << endl;

is.seekg(-2, ios::cur);
test[0] = 0;
is >> test;
cout << test << endl;

return 0;
}
 
James Kanze

Julian said:
I am having problems with a function that I have been using in my program to
read sentences from a 'command file' and parse them into commands. the
surprising thing is that the program works fine on some computers and not so
fine on others. I tried debugging and cannot make any sense of it. I
narrowed it down to the seekg function and made this simple program which
(from what I understand) does not seem to be working as expected in all the
computers I have tried so far.
please let me know if I have got something wrong...
#include <iostream>
#include <fstream>

Officially, you also need to include <istream>.

Julian said:
using namespace std;
int _tmain(int argc, _TCHAR* argv[])

This line doesn't compile on my systems. What's _TCHAR? (For
that matter, what's _tmain? I would have expected main here,
and in fact, must use main if I don't want an error at link
time.)

Julian said:
{
ifstream is("test.txt");
char test[256];
is >> test;
cout << test << endl;
is.seekg(-2,ios::cur);

The above line is undefined behavior. In a file opened in text
mode (as yours is), you are only allowed to seek to the
beginning, to the current position, or to a position returned
from a previous call to is.tellg().

Julian said:
is >> test;
cout << test << endl;
return 0;
}
here are the different versions of test.txt and their corresponding outputs
in all the cases I expected that the output would be
createModel
el

About the only way to reliably get this effect is to read
character by character, doing an is.tellg() after each
character.

Julian said:
what am I doing wrong ?

Trying to use direct positioning in a text file.

Generally speaking, they don't call them streams for nothing;
you can get away with some direct positioning in a binary file,
and you can place a "bookmark" to go back to in a text file, but
globally, they are designed for streamed input, i.e. sequential
access. You speak of parsing: all of the parsing technologies I
know are designed to work with sequential input, so I'm not sure
why you want to seek.

If worst comes to worst, read large chunks (or all) of your file
into memory, and use random access there. If you're not afraid
of system dependent issues, you might even consider memory
mapping the file. (Note that in a memory mapped file, you will
see the system specific line terminators.)
 
Julian

Larry Smith said:
It worked fine for me once I added the two "test[0] = 0;"
lines to the code.

I kinda got it figured out... and you were right, it had something to do
with the 'invisible' characters. though I don't know all the details behind
this.. maybe you can help.
I have been using this free program called notepad++ as a text editor. when
I hit the enter key after a line, it puts in a 'LF' character (found that
when I used the 'view all characters' option)
now when I open this file using windows notepad, all the individual lines
are combined into one line. and there is a weird character at the end of
each 'individual' line where it should have gone to a new line. once I got
rid of those characters and put in an actual enter key using notepad,
everything worked as expected.
I don't know how the files got corrupted like that... or why notepad++ is
putting just the 'LF' instead of a 'CR''LF'
seems like, in windows, if you do just an 'LF', it screws up everything.
maybe you can explain this or point me to some webpage that explains this ?

thanks,
Julian.
 
Marcus Kwok

I've never used notepad++, but it seems like it is saving files in Unix
mode instead of Windows mode. Look around in the options and see if you
can find anything relating to that. I know Vim is able to read and
write both formats, and you can change which mode it is using
on-the-fly.

Getting back on topic, read James Kanze's post. You are not really
supposed to be seeking on text files.
 
Julian

thanks... I created a fresh file with notepad++ and it uses CRLF... I think
if the file being edited has LF in the first place, then it continues to use
LF.
I move files back and forth from my school's supercomputer, so maybe that's
how it got messed up.

I will be responding to James Kanze's post... I should probably move to some
other method of parsing the files.

Julian.
 
Julian

Thank you very much for your reply. I don't consider myself to be a very
experienced c++ programmer so forgive me if the questions seem
mundane/trivial

James Kanze said:
Officially, you also need to include <istream>.

do you mean just for correctness ? because I see that istream is a base
class for iostream.. or is there some other reason for explicitly including
istream

James Kanze said:
This line doesn't compile on my systems. What's _TCHAR? (For
that matter, what's _tmain? I would have expected main here,
and in fact, must use main if I don't want an error at link
time.)

I'm sorry about that... I just created a default win32 console project
using VS2005 and that's what it gave me. I'm really not sure what the reason
for all that is either, but I think it somehow converts to the typical main()

James Kanze said:
The above line is undefined behavior. In a file opened in text
mode (as yours is), you are only allowed to seek to the
beginning, to the current position, or to a position returned
from a previous call to is.tellg().

can you tell me where is the most updated (or correct) documentation for
these functions? because all the places that I looked -basically MSDN and
google search results - do not mention this thing about seeking being
undefined for text mode.

if you read my other post, you'll see that the problem was with 'LF'
characters in the text file.

James Kanze said:
Trying to use direct positioning in a text file.
Generally speaking, they don't call them streams for nothing;
you can get away with some direct positioning in a binary file,
and you can place a "bookmark" to go back to in a text file, but
globally, they are designed for streamed input, i.e. sequential
access. You speak of parsing: all of the parsing technologies I
know are designed to work with sequential input, so I'm not sure
why you want to seek.
If worst comes to worst, read large chunks (or all) of your file
into memory, and use random access there. If you're not afraid
of system dependent issues, you might even consider memory
mapping the file. (Note that in a memory mapped file, you will
see the system specific line terminators.)

I have been using this legacy code that was written by one of my
predecessors... and its probably outdated (or the wrong) way to do things.
I am all for moving to a more commonly used (and free) parsing
technology...but I don't know where to start. I tried looking up parsing in
google once but I was overwhelmed by what was out there. I need to be able
to read a text file that contains strings and numbers... but ignore c-style
comments like '//' and '/*' and '*/'
Is there any easy to use parsing utility that can do that for me (in both
windows and unix) ?

thanks a lot for your help,
Julian.
 
James Kanze

Julian said:
Thank you very much for your reply. I don't consider myself to be a very
experienced c++ programmer so forgive me if the questions seem
mundane/trivial
do you mean just for correctness ? because I see that istream is a base
class for iostream.. or is there some other reason for explicitly including
istream

Because the standard says so:).

Officially, <iostream> is not required to define any class. It
is only required to provide external declarations (not
definitions) of the standard iostream objects (e.g. std::cout,
etc.). And you don't need a class definition to provide a
declaration. I'll spare you the real requirements, because they
are extremely (and IMHO unnecessarily) complicated, but it comes
out to roughly the equivalent of:

namespace std {
class ostream ;
class istream ;

extern istream cin ;
extern ostream cout ;
// ... same thing for all of the other objects,
// plus for the wide character classes and objects...
}

For convenience, all of the actual implementations of <iostream>
that I know of start by including <istream> and <ostream>.
People have gotten used to this, many books (including some very

Julian said:
I'm sorry about that... I just created a default win32 console project
using VS2005 and thats what it gave me. I'm really not sure whats the reason
for all that either, but i think it somehow converts to the typical main()

That's what I suspect as well, but since I don't normally have
access to a Windows machine...

Julian said:
can you tell me where is the most updated (or correct) documentation for
these functions? because all the places that I looked -basically MSDN and
google search results - do not mention this thing about seeking undefined
for text mode.

The official documentation would be the ISO standard for C++.
In this case, however, it refers to the C standard---everything
is "as if" such and such a C library function were used on a
FILE*. Which isn't necessarily a bad thing, since the C
standard is somewhat less unreadable than the C++ one:).

With regards as to where you should look for such information,
I'm not sure what to tell you. I was tracking the ANSI
C standard for a customer when it was being written, and writing
a C standard library for them, so I got in on the ground floor,
so to speak. I don't think that trying to read the standard is
a good way to learn.

I would hope that any text which teaches C++ IO would discuss
the issues, but apparently, yours didn't. And given my
background, I've not had the occasion to read such texts myself.
(I have copies of the C and the C++ standards, and the latest
draft for the next version of the C++ standard, on line, and
consult them when I'm unsure of anything. But IMHO, unless you
already have a very good idea about what is available and allowed,
they wouldn't be of much use.)

Julian said:
if you read my other post, you'll see that the problem was with 'LF'
characters in the text file.

And, presumably, certain implementations accepting them as end
of lines, and others not.

Note that this is a difficult problem in general. The Windows
convention is to use the two character sequence 0x0D,0x0A as an
end of line indicator. The Unix convention is a single 0x0A,
the traditional Mac convention a 0x0D, and most mainframes don't
use any character at all; the information is stored in the file
format. All of which wouldn't cause too many problems, except
that the C committee decided that within C, the Unix convention
would prevail, so some remapping is necessary at the interface
(and we get the distinction between text and binary files), and
of course the fact that today, thanks to the network, files
written on one system are being read on another, so you can't
really count on the file following the local conventions. IMHO,
a good implementation will handle the Windows, Unix and Mac
conventions transparently on input, and output according to the
native conventions, but I've also had various problems because
of inconsistencies.

The fact that some "char" might in fact be represented by a
varying number of bytes in the physical file is why there are
so many restrictions on where you can position to in text mode.
Note that the problem will become more difficult, not less, as
time goes on---UTF-8 is rapidly becoming a standard 8 bit code,
and with UTF-8, of course, the number of bytes in a single
character can vary from 1 to 4 (the original definition allowed
up to 6).

Julian said:
I have been using this legacy code that was written by one of my
predecessors... and its probably outdated (or the wrong) way to do things.
I am all for moving to a more commonly used (and free) parsing
technology...but I don't know where to start. I tried looking up parsing in
google once but I was overwhelmed by what was out there. I need to be able
to read a text file that contains strings and numbers... but ignore c-style
comments like '//' and '/*' and '*/'
Is there any easy to use parsing utility that can do that for me (in both
windows and unix) ?

Well, I tend to use lex (or flex) a lot. (Technically speaking,
I think your problem involves tokenizing, which is generally
viewed as a preliminary step to parsing.) Flex is available for
most platforms, although I don't know how you'd go about
integrating it into your builds if you use Visual Studio. (I
use GNU make everywhere, including with VC++ under Windows.) It
also generally takes a bit of hacking to make it work with C++.
You might want to have a look at the sources to my executable
kloc (http://kanze.james.neuf.fr/code/Exec/kloc/kloc.l); you
certainly won't be able to use it directly, but it should give
you some ideas about one way to handle comments directly in
flex.

Depending on the syntax, and the size of the files, you might be
able to either process the file line by line, or even the entire
file at once. Once you have a block of text in memory, of
course, random positioning, backing up, etc. is no problem.
And you can use tools like boost::regex on the in memory data.
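
As a rough illustration of working on the whole text in memory, the sketch below strips comments and splits what is left into whitespace-separated tokens. It uses a hand-written scan rather than flex or boost::regex, the names stripComments and tokenize are hypothetical, and it assumes the command file never has "//" or "/*" inside a quoted string.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: removes // line comments and /* ... */ block
// comments. Assumption: no quoted strings contain comment markers.
std::string stripComments(const std::string& text) {
    std::string out;
    std::size_t i = 0;
    while (i < text.size()) {
        if (text.compare(i, 2, "//") == 0) {
            // Skip to the end of the line; the newline itself is kept.
            while (i < text.size() && text[i] != '\n') {
                ++i;
            }
        } else if (text.compare(i, 2, "/*") == 0) {
            const std::size_t end = text.find("*/", i + 2);
            // An unterminated block comment swallows the rest of the input.
            i = (end == std::string::npos) ? text.size() : end + 2;
            out += ' ';  // keep the tokens on either side separated
        } else {
            out += text[i++];
        }
    }
    return out;
}

// Hypothetical helper: splits the comment-free text on whitespace.
std::vector<std::string> tokenize(const std::string& text) {
    std::istringstream in(stripComments(text));
    std::vector<std::string> tokens;
    std::string token;
    while (in >> token) {
        tokens.push_back(token);
    }
    return tokens;
}
```

Because the tokens end up in a std::vector, going back two tokens is just an index operation, with no seeking on the stream at all.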

If you don't want to rewrite everything, you might also try
reading the file as binary, and handling the new line
conventions yourself---it isn't that hard to treat \n, \r and
the two character sequence "\r\n" in an identical fashion.
(There will still be problems, of course, if you move to UTF-8
input, with accented or non-Latin characters, but this may not
be an issue for you.)
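
A minimal sketch of that normalization, assuming the raw bytes have already been read from a stream opened with std::ios::binary (the function name normalizeNewlines is made up for illustration):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: converts "\r\n" (Windows), lone "\r" (classic Mac)
// and lone "\n" (Unix) to a single '\n', so later code only ever sees
// one convention.
std::string normalizeNewlines(const std::string& bytes) {
    std::string out;
    out.reserve(bytes.size());
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        if (bytes[i] == '\r') {
            // Treat "\r\n" as one terminator by skipping the '\n'.
            if (i + 1 < bytes.size() && bytes[i + 1] == '\n') {
                ++i;
            }
            out += '\n';
        } else {
            out += bytes[i];
        }
    }
    return out;
}
```

After this step every character in the string is exactly one byte of the normalized text, so offsets such as "two characters back" mean the same thing on every platform (as long as the input is a single-byte encoding).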
 
Pete Becker

James said:
That's what I suspect as well, but since I don't normally have
access to a Windows machine...

FWIW, these are part of Microsoft's mechanism for adapting source
code to run with either narrow or wide characters, depending on
whether the controlling macro is defined. _TCHAR is a macro that is
replaced by either char or wchar_t, and _tmain becomes either main or (I
think) wmain. Much of the OS interface changes names as well, adding an
A or a W suffix to what you think the name is. This leads to mysterious
errors when you happen to use the name of an OS interface function as
the name of a member function: its actual name changes from file to
file, depending on whether you've included, directly or indirectly, the
header "windows.h":

// header myclass.h
class C
{
void whatever();
};

// source file myclass.cpp
#include "myclass.h"
#include "windows.h"

void C::whatever()
{
}

error: whateverA is not a member of class C.

Sorry about the phony name. I haven't done much Windows programming
recently, so can't come up with common examples off the top of my head.
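
The renaming effect can be reproduced without any Windows header. In the sketch below the macro "whatever" is a stand-in written for this example (it is not a real windows.h name), and STRINGIZE and expandedName are hypothetical helpers that only make the substituted name visible.

```cpp
#include <string>

// Stand-in for what a system header might do. Hypothetical macro: real
// headers map names such as Foo to FooA or FooW in the same way.
#define whatever whateverA

// Two-level stringizing so the argument is macro-expanded first.
#define STRINGIZE_IMPL(x) #x
#define STRINGIZE(x) STRINGIZE_IMPL(x)

class C {
public:
    // After preprocessing, this declares C::whateverA, not C::whatever.
    int whatever() { return 42; }
};

// Returns the text the preprocessor substitutes for `whatever`.
inline std::string expandedName() {
    return STRINGIZE(whatever);
}
```

Here every use of the name sees the macro, so the code is consistent. The error in the post arises when the class definition is compiled before the macro exists and the member definition after it.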

--

-- Pete
Roundhouse Consulting, Ltd. (www.versatilecoding.com)
Author of "The Standard C++ Library Extensions: a Tutorial and
Reference." (www.petebecker.com/tr1book)
 
Julian

thank you very much for your reply... you are right, my problem is more
concerned with tokenizing (although I must admit, till just now I didn't
know tokenizing and parsing were two different things). I will look into
flex and regex.

Julian.
 
