inconsistencies when compiling


rory

I'm having some inconsistencies in my program that I can't seem to
debug. Below is code that makes a copy of a binary file, appends a
unique ID and string to the copy of the binary file and then checks
the string by searching for the ID which is also a string and
retrieves the text that follows. I'm building the code with scons
along with some other applications that are part of the project. Here
is the strange thing. If I set a new unique string and compile on its
own, my code below works fine. If I then call scons to build all the
project files, including the code below, it doesn't work. If I then try
to build it on its own again, it still doesn't work. The only way to
get it to work again is to change the string ID and compile it again
on its own. I know this is a strictly C++ language list and my post
does refer to scons, but all scons does is call the g++ compiler. Can
anyone help me with this problem? Here is my code, although I don't
think the problem is with the code.

////////////////////////////////////////////////////////////////////////////////////////////////////
int main(int argc, char *argv[])
{

if(argc<2){
cout << "\nUsage: copy_append_read <csound file> <output filename>\n"
"Example: cabbage test.txt test.exe\n";
return 0;
}
std::string Text, str;
long begin, end;

//make a copy of binary cabbage.dat with new name
int ret = copyfileC("cabbage.dat", argv[2]);
if(!ret) cerr << "Error: Could not copy/create file, please make "
"sure that cabbage.dat is located in the same folder as cabbage";

//read contents of input file
ifstream file(argv[1]);
while(!file.eof())
{
getline(file, str);
Text = Text+str;
}
//write identifier and input file string to runtime
WriteData(argv[2], Text);

//test that data was written and can be retrieved
ReadData(argv[2]);

}

///////////////////////////////////////////////////////////////////////////////////////////
//////// Make a copy of the runtime binary
///////////////////////////////////////////////////////////////////////////////////////////
int copyfileC(std::string oldFile, std::string newFile)
{
int c;
FILE *in, *out;
in = fopen(oldFile.c_str(), "rb");

out = fopen(newFile.c_str(), "wb");
if(!in && !out)
return 0;
while((c=fgetc(in))!=EOF)
{
fputc(c,out);
if(ferror(out)) cout << "error when writing new file";
}
fclose(in);
fclose(out);
return 1;
}

///////////////////////////////////////////////////////////////////////////////////////////
//////// Append magic marker and Text to the runtime binary
///////////////////////////////////////////////////////////////////////////////////////////
void WriteData(std::string binfile, std::string Text)
{
ofstream myFile (binfile.c_str(), ios::out | ios::binary | ios::app);
if((myFile.rdstate() & ofstream::failbit | ofstream::badbit)!=0)
//write unique marker..
myFile.write ("porytest", 8);
myFile.write (Text.c_str(),Text.length());
if(!myFile) cout << "Write Error..." ;
myFile.close();
}

///////////////////////////////////////////////////////////////////////////////////////////
//////// Test function to see if data was written correctly
///////////////////////////////////////////////////////////////////////////////////////////
void ReadData(std::string binfile)
{
std::ifstream ifstr(binfile.c_str(),std::ios::binary);
std::stringstream temp;
temp << ifstr.rdbuf();
const std::string sentinel("porytest");
const std::string::size_type data_pos(temp.str().find(sentinel,
0)+sentinel.length());
const std::string myText(temp.str().substr(data_pos));
cout << myText;
ifstr.close();
}

Rory.
 

Alf P. Steinbach

* rory:
I'm having some inconsistencies in my program that I can't seem to
debug. Below is code that makes a copy of a binary file, appends a
unique ID and string to the copy of the binary file and then checks
the string by searching for the ID which is also a string and
retrieves the text that follows. I'm building the code with scons
along with some other applications that are part of the project. Here
is the strange thing. If I set a new unique string and compile on its
own, my code below works fine. If I then call scons to build all the
project files, including the code below, it doesn't work. If I then try
to build it on its own again, it still doesn't work. The only way to
get it to work again is to change the string ID and compile it again
on its own. I know this is a strictly C++ language list and my post
does refer to scons, but all scons does is call the g++ compiler. Can
anyone help me with this problem? Here is my code, although I don't
think the problem is with the code.

I can't help you with the main problem.

However, a few comments on the code.

////////////////////////////////////////////////////////////////////////////////////////////////////
int main(int argc, char *argv[])
{

if(argc<2){

Indentation is a good idea.

cout << "\nUsage: copy_append_read <csound file> <output filename>\n"
"Example: cabbage test.txt test.exe\n";
return 0;
}
std::string Text, str;
long begin, end;

Variables not used.

//make a copy of binary cabbage.dat with new name
int ret = copyfileC("cabbage.dat", argv[2]);

Make that a bool, and make it const.

if(!ret) cerr << "Error: Could not copy/create file, please make "
"sure that cabbage.dat is located in the same folder as cabbage";

Are you sure you really want to continue after issuing that message?

//read contents of input file
ifstream file(argv[1]);
while(!file.eof())
{
getline(file, str);

What if this call to getline fails?

Text = Text+str;

1. Unless you have a very smart compiler this construction will most
probably yield O(n^2) running time. Instead, do e.g.

Text += str;

2. Inconsistent naming convention.

3. Are you sure you want to strip out newlines from the file contents?
}
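For reference, here is a sketch of the loop with the stream checked on
each read and the quadratic concatenation replaced by +=. The readAll
helper and its newline handling are my own illustration, not code from
the original program:

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Check the stream state on every read instead of testing eof(),
// and append with += (amortized O(n) overall) rather than
// Text = Text + str, which copies the whole accumulated string
// on each iteration.
std::string readAll(std::istream& in)
{
    std::string text, line;
    while (std::getline(in, line)) {
        text += line;
        text += '\n';   // keep the line breaks, if that is what is wanted
    }
    return text;
}
```

Checking the return value of getline also ends the loop correctly on a
failed read, instead of processing the last line twice as the eof()
test can.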


//write identifier and input file string to runtime
WriteData(argv[2], Text);

//test that data was written and can be retrieved
ReadData(argv[2]);

}

///////////////////////////////////////////////////////////////////////////////////////////
//////// Make a copy of the runtime binary
///////////////////////////////////////////////////////////////////////////////////////////
int copyfileC(std::string oldFile, std::string newFile)

Should have result type bool.

Arguments should by default be "std::string const&".

{
int c;
FILE *in, *out;
in = fopen(oldFile.c_str(), "rb");

out = fopen(newFile.c_str(), "wb");
if(!in && !out)
return 0;

Confusing && and ||.

while((c=fgetc(in))!=EOF)
{
fputc(c,out);
if(ferror(out)) cout << "error when writing new file";

Are you sure you want to continue looping after issuing that message?

And shouldn't that message go to cerr?
}
fclose(in);
fclose(out);
return 1;
}
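Putting those comments together, a corrected sketch of the copy
function might look like this (bool result, const-reference
parameters, || rather than &&, and both files closed on every path;
illustrative only, not the poster's final code):

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Returns true only if the whole file was copied without error.
bool copyfileC(std::string const& oldFile, std::string const& newFile)
{
    std::FILE* in = std::fopen(oldFile.c_str(), "rb");
    if (!in)
        return false;
    std::FILE* out = std::fopen(newFile.c_str(), "wb");
    if (!out) {
        std::fclose(in);    // don't leak the input handle
        return false;
    }
    int c;
    while ((c = std::fgetc(in)) != EOF)
        std::fputc(c, out);
    // EOF can also mean a read error, so check both streams.
    bool const ok = !std::ferror(in) && !std::ferror(out);
    std::fclose(in);
    std::fclose(out);
    return ok;
}
```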

///////////////////////////////////////////////////////////////////////////////////////////
//////// Append magic marker and Text to the runtime binary
///////////////////////////////////////////////////////////////////////////////////////////
void WriteData(std::string binfile, std::string Text)


Arguments should by default be "std::string const&".

{
ofstream myFile (binfile.c_str(), ios::out | ios::binary | ios::app);
if((myFile.rdstate() & ofstream::failbit | ofstream::badbit)!=0)

The above condition will always be true unless badbit is defined as 0.

Compiler should warn about it.

If it doesn't, up the warning level.

//write unique marker..
myFile.write ("porytest", 8);

This is the only statement governed by the above 'if'.

myFile.write (Text.c_str(),Text.length());
if(!myFile) cout << "Write Error..." ;

Shouldn't that be directed to cerr?

myFile.close();
}
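Spelling the fix out: with parentheses around (failbit | badbit) the
test means what was intended, and inverting it writes the marker only
when the stream is good. A sketch, not the poster's final code:

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Parenthesize (failbit | badbit): & binds tighter than |, so the
// original test reduced to (rdstate() & failbit) | badbit, which is
// nonzero whenever badbit is nonzero -- i.e. always true.
void WriteData(std::string const& binfile, std::string const& text)
{
    std::ofstream myFile(binfile.c_str(),
                         std::ios::out | std::ios::binary | std::ios::app);
    if ((myFile.rdstate()
         & (std::ofstream::failbit | std::ofstream::badbit)) == 0) {
        myFile.write("porytest", 8);            // unique marker
        myFile.write(text.c_str(), text.length());
    }
    if (!myFile)
        std::cerr << "Write Error...";
}
```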

///////////////////////////////////////////////////////////////////////////////////////////
//////// Test function to see if data was written correctly
///////////////////////////////////////////////////////////////////////////////////////////
void ReadData(std::string binfile)

Argument should by default be "std::string const&".

{
std::ifstream ifstr(binfile.c_str(),std::ios::binary);
std::stringstream temp;

Use std::ostringstream when you don't intend to read from it.

temp << ifstr.rdbuf();
const std::string sentinel("porytest");
const std::string::size_type data_pos(temp.str().find(sentinel,
0)+sentinel.length());
const std::string myText(temp.str().substr(data_pos));
cout << myText;
ifstr.close();
}

Rory.

Cheers, & hth.,

- Alf
 

rory

I have looked into this problem further and the only thing that scons
does differently when compiling it is that it creates an object file
first. I don't do that and it works fine, but if I do g++ -c test.cpp
and then g++ test.o -o test.exe it no longer works. Anyone know why
this is?
 

Puppet_Sock

If I then call scons to built all the
project files including the code below it doesn't work.

Since you didn't mention in what way it "doesn't work"
it's pretty difficult to diagnose.

Still, it seems unlikely to be a language issue, so you
would seem to be posting your incomplete messages in the
wrong news group.
Socks
 

rory

Since you didn't mention in what way it "doesn't work"
it's pretty difficult to diagnose.

Still, it seems unlikely to be a language issue, so you
would seem to be posting your incomplete messages in the
wrong news group.
Socks

Sorry if I wasn't clear, but when it doesn't work it does not return
the correct string that follows the unique ID. Instead it returns
rubbish characters in an endless loop, beeping all the way. I'm running
Windows and MinGW. Excuse my ignorance, it's clear I'm not an
experienced programmer, but if it is not a language issue, what else
could it be?

Rory.


 

rory

I've tidied up my code as Alf suggested but there is still something
not right. I've tried to narrow it down as much as possible and here
is the current situation. I am building two applications. When I
compile and build the code presented above it works, i.e., it finds
the unique ID and then prints out the code that follows. But when I
then compile and build the other application, which resides in the same
folder, the code above no longer works; it finds the unique string ok
but then it starts spitting out rubbish characters in an endless loop.
How can one application affect another when they are not linked in any
way? Can anyone spot any memory problems with the code above?

Rory.
 

rory

The problem was occurring because the data file I was copying also
contains a string which matched the unique ID. When I ran my program
and searched for the ID it found it somewhere else in the file and
then started spitting out funny characters. Thanks for the help Alf.

Rory.
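An additional safeguard against exactly that failure: since the marker
is always appended at the very end of the copied binary, search for the
*last* occurrence rather than the first. The extractPayload helper
below is my own sketch of the idea, not code from the thread:

```cpp
#include <cassert>
#include <string>

// The marker is appended at the end of the file, so the last
// occurrence is the right one; rfind also lets us detect a missing
// marker instead of silently computing npos + length as an index.
std::string extractPayload(std::string const& image,
                           std::string const& sentinel)
{
    std::string::size_type const pos = image.rfind(sentinel);
    if (pos == std::string::npos)
        return "";                  // marker not found
    return image.substr(pos + sentinel.length());
}
```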
 

Alf P. Steinbach

* James Kanze:
* rory:
I'm having some inconsistencies in my program that I can't seem to
debug. Below is code that makes a copy of a binary file, appends a
unique ID and string to the copy of the binary file and then checks
^^^^^^

[...]
//read contents of input file
ifstream file(argv[1]);

He doubtlessly needs to specify binary mode. (In a larger
application, he'd probably want to imbue the "C" locale as well.)

I don't think binary mode is intended, because he treats the file as a
text file, using std::getline.

And as we know, this is NOT the way to read all of a file.

Yes, I commented on that two lines further down in the code, at the
unchecked call to std::getline.

Fixing that call would lead to fixing the loop condition as well.
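For comparison, the usual way to slurp a whole file without getline at
all, which sidesteps both the eof() loop and the lost newlines (a
sketch, assuming the whole file fits in memory; the name slurp is
invented for illustration):

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// Read an entire file byte for byte through its streambuf; binary
// mode keeps newline bytes untranslated (relevant on Windows).
std::string slurp(char const* filename)
{
    std::ifstream in(filename, std::ios::binary);
    std::ostringstream tmp;
    tmp << in.rdbuf();
    return tmp.str();
}
```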


Cheers,

- Alf
 

James Kanze

* James Kanze:
* rory:
I'm having some inconsistencies in my program that I can't seem to
debug. Below is code that makes a copy of a binary file, appends a
unique ID and string to the copy of the binary file and then checks
^^^^^^
[...]
//read contents of input file
ifstream file(argv[1]);
He doubtlessly needs to specify binary mode. (In a larger
application, he'd probably want to imbue the "C" locale as well.)
I don't think binary mode is intended, because he treats the
file as a text file, using std::getline.

That's true, but he does SAY binary. I'm supposing that getline
is the error, but of course, we don't really know.
Yes, I commented on that two lines further down in the code, at the
unchecked call to std::getline.
Fixing that call would lead to fixing the loop condition as well.

If he fixes it correctly, yes. It depends on how he fixes it.

I'm really thinking that someone should write up a beginner's
guide to using iostream, with the standard idioms, and post it
somewhere so we could point to it.
 

Jerry Coffin

[ ... ]
I'm really thinking that someone should write up a beginner's
guide to using iostream, with the standard idioms, and post it
somewhere so we could point to it.

Let's see. Reading input:

std::copy(std::istream_iterator<some_type>(instream),
          std::istream_iterator<some_type>(),
          std::back_inserter(your_collection));

Writing output:

std::copy(your_collection.begin(), your_collection.end(),
          std::ostream_iterator<some_type>(outstream, "\t"));

Covers an amazing percentage in two statements... :)
 

James Kanze

[ ... ]
I'm really thinking that someone should write up a beginner's
guide to using iostream, with the standard idioms, and post it
somewhere so we could point to it.
Let's see. Reading input:

Writing output:
std::copy(your_collection.begin(), your_collection.end(),
          std::ostream_iterator<some_type>(outstream, "\t"));
Covers an amazing percentage in two statements... :)

Do you know, I rarely use either. Once in a blue moon, I'll
occasionally use the second for a small set of elements. But
both cases are exceptional. For most simple cases, the input
is:

while ( std::getline( in, line ) ) {
    ++ lineNo ;
    boost::smatch fields ;
    if ( ! boost::regex_match( line, fields, syntax ) ) {
        // error...
    } else {
        // process fields...
    }
}

or:

while ( std::getline( in, line ) ) {
    std::istringstream data( line ) ;
    data >> field1 ... >> fieldLast >> std::ws ;
    if ( ! data || data.get() != EOF ) {
        // error...
    } else {
        // process fields ...
    }
}

Often, I'll use a mixture, using the regular expression for
syntax checking, but an istringstream for extracting the actual
data. (The istringstream will convert numeric values, for
example, which the regular expression won't. And my pre-Boost
regular expression class didn't handle \(...\) either, so I
couldn't use it to collect the fields.) I've also got a number
of ready made classes for breaking a line up into fields
according to "standard" rules, so I frequently write something
like:

while ( std::getline( in, line ) ) {
    Gabi::SomeFieldArray fields( line ) ;
    // here, fields is nothing more than an std::vector,
    // with fields[ 0 ] == line, fields[ 1 ] with the
    // first field, etc.

    // As you can see, I'm very used to AWK :).
}

In practice, those are about the only cases I use anything other
than character input (with lex/flex). (In any of them, of
course, if the syntax leads me to expect a table, I might set up
a filtering streambuf to return EOF at the end of the table, and
then use std::copy to read the table. But that doesn't seem to
happen very often in my own code.)

Output is simpler, but of course, no where have we begun to talk
about error detection and handling.

But I was actually suggesting a description at a slightly lower
level, the basic loop, for example, but not what you do in it.
 

Jerry Coffin

[ ... ]
Do you know, I rarely use either. Once in a blue moon, I'll
occasionally use the second for a small set of elements. But
both cases are exceptional. For most simple cases, the input
is:

while ( std::getline( in, line ) ) {
    ++ lineNo ;
    boost::smatch fields ;
    if ( ! boost::regex_match( line, fields, syntax ) ) {
        // error...
    } else {
        // process fields...
    }
}

It looks to me like this could be turned into a form that would fit
something similar to the previous patterns quite easily:

struct matched_line {
    // probably want a ctor that specifies the syntax and such...

    boost::smatch fields;
};

struct process_fields {
    void operator()(matched_line const &) {
        // ...
    }
};

std::istream &operator>>(std::istream &is, matched_line &m) {
    std::string line;
    std::getline(is, line);
    if (!boost::regex_match(line, m.fields, syntax))
        is.setstate(std::ios::failbit);
    return is;
}

std::transform(std::istream_iterator<matched_line>(in),
               std::istream_iterator<matched_line>(),
               std::inserter(output_sink, output_sink.end()),
               process_fields());

If you're summarizing the input, and producing basically a single output
at the end, you generally want to use std::accumulate instead of
std::transform.
or:

while ( std::getline( in, line ) ) {
    std::istringstream data( line ) ;
    data >> field1 ... >> fieldLast >> std::ws ;
    if ( ! data || data.get() != EOF ) {
        // error...
    } else {
        // process fields ...
    }
}

This looks much like the previous example to me -- you seem to be
reading fields that make up a record, so I'd make that explicit:

struct record {
    type1 field1;
    // ...
    typeLast fieldLast;
    // ...
};

std::istream &operator>>(std::istream &is, record &r) {
    std::string line;

    std::getline(is, line);
    std::istringstream data(line);
    data >> r.field1 >> /* r.field2, ... */ r.fieldLast >> std::ws;
    // check for errors, etc.
    return is;
}

Then the processing turns into something like:

std::transform(std::istream_iterator<record>(some_istream),
               std::istream_iterator<record>(),
               std::inserter(results, results.end()),
               process_fields());

As above, if you're (primarily) producing a summary of the data,
std::accumulate can make more sense than std::transform.
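To make the accumulate variant concrete, here is a minimal sketch with
a hypothetical one-field record; the record type, addRecord, and
sumOfRecords are all invented for illustration:

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <numeric>
#include <sstream>

// Hypothetical record type: one integer per whitespace-separated token.
struct record {
    int value;
};

std::istream& operator>>(std::istream& is, record& r)
{
    return is >> r.value;
}

int addRecord(int acc, record const& r)
{
    return acc + r.value;
}

// Summarizing with std::accumulate makes the "one result from the
// whole input" intent visible at the call site.
int sumOfRecords(std::istream& in)
{
    return std::accumulate(std::istream_iterator<record>(in),
                           std::istream_iterator<record>(),
                           0,
                           addRecord);
}
```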

While it's true that these will often (usually?) be at least marginally
larger in terms of lines of code, it's also true that most of the extra
code is more or less boilerplate -- extra class and function definition
"stuff" surrounding roughly the same bits of real code.

My own experience with this has been quite positive -- typing in the
code doesn't take significantly longer, and at least for me, debugging
is reduced substantially. Likewise, I find that the separation of
responsibility makes incomplete analysis more apparent. When the code
does need modification, the separation of responsibility generally makes
it much easier to find what to modify where, and the defined interfaces
between the pieces make it easier to limit the modification to what's
really desired.

[ ... ]
But I was actually suggesting a description at a slightly lower
level, the basic loop, for example, but not what you do in it.

Well, I'll openly admit I was partly being humorous. Nonetheless, there
was a serious point: much of the time, you can use a standard algorithm,
so you don't need to write a loop at all.
 

James Kanze

Jerry said:
[ ... ]
Do you know, I rarely use either. Once in a blue moon, I'll use

occasionally use the second for a small set of elements. But
both cases are exceptional. For most simple cases, the input
is:
while ( std::getline( in, line ) ) {
    ++ lineNo ;
    boost::smatch fields ;
    if ( ! boost::regex_match( line, fields, syntax ) ) {
        // error...
    } else {
        // process fields...
    }
}
It looks to me like this could be turned into a form that would fit
something similar to the previous patterns quite easily:
struct matched_line {
    // probably want a ctor that specifies the syntax and such...
    boost::smatch fields;
};
struct process_fields {
    void operator()(matched_line const &) {
        // ...
    }
};
std::istream &operator>>(std::istream &is, matched_line &m) {
    std::string line;
    std::getline(is, line);
    if (!boost::regex_match(line, m.fields, syntax))
        is.setstate(std::ios::failbit);
    return is;
}

If you're summarizing the input, and producing basically a single output
at the end, you generally want to use std::accumulate instead of
std::transform.

Certainly. You *can* hide all of the program in a few fancy >>
or << operators. Something like "aligneq" (see
http://kanze.james.neuf.fr/code-en.html, then navigate through
the sources in the Exec branch), where main() basically just
does:

std::vector< Line > lines ;
std::copy( std::istream_iterator< Line >( source ),
           std::istream_iterator< Line >(),
           std::back_inserter( lines ) ) ;
std::copy( lines.begin(), lines.end(),
           std::ostream_iterator< Line >( std::cout ) ) ;

once it's parsed the options. (In this case, it's definitely
necessary to save all of the data in a vector, since the output
formatting depends on data collected over the input
data---things like the length of each field.)

One thing at a time, though. And except for special cases, I'm
not sure that this isn't obfuscation. To tell the truth, I'm
not even sure that it isn't obfuscation here. But it was fun to
write, and I've found it quite easy to modify, adding additional
options as time goes on. But I'm not sure that it's a good
general solution---it works well here because the output is a
direct line by line mapping of the input. (And even here, the
Line class collects a lot of additional data during input, which
is used in output.)
This looks much like the previous example to me

It is.
-- you seem to be
reading fields that make up a record, so I'd make that explicit:
struct record {
    type1 field1;
    // ...
    typeLast fieldLast;
    // ...
};
std::istream &operator>>(std::istream &is, record & r) {
    std::string line;
    std::getline(is, line);
    std::istringstream data(line);
    data >> r.field1 >> /* r.field2, ... */ r.fieldLast >> std::ws;
    // check for errors, etc.
    return is;
}
Then the processing turns into something like:

As above, if you're (primarily) producing a summary of the
data, std::accumulate can make more sense than std::transform.
While it's true that these will often (usually?) be at least
marginally larger in terms of lines of code, it's also true
that most of the extra code is more or less boilerplate --
extra class and function definition "stuff" surrounding
roughly the same bits of real code.

I think it depends somewhat on the context. If it makes sense
for the parsed data to be a single class, then I'll go this way;
if it doesn't, then I probably won't. The choice of whether
there is a ParsedData class or not is made at a higher level,
according to the design of the application, and I rather think
introducing it only to be able to use istream_iterator is a bit
of obfuscation.

My own experience with this has been quite positive -- typing in the
code doesn't take significantly longer, and at least for me, debugging
is reduced substantially. Likewise, I find that the separation of
responsibility makes incomplete analysis more apparent. When the code
does need modification, the separation of responsibility generally makes
it much easier to find what to modify where, and the defined interfaces
between the pieces make it easier to limit the modification to what's
really desired.

My own experience has been varied, as I said. Often,
introducing something along the lines of a ParsedLine class is a
good idea, and in such cases, using istream_iterator is more or
less natural. In other cases, it's more a case of forcing the
design to fit a pre-conceived implementation technique, which
generally isn't a good idea.

It is, at any rate, useful enough that it should be presented in
any presentation of iostream. But I'd still present the basic
loop construct first.
[ ... ]
But I was actually suggesting a description at a slightly lower
level, the basic loop, for example, but not what you do in it.
Well, I'll openly admit I was partly being humorous.
Nonetheless, there was a serious point: much of the time, you
can use a standard algorithm, so you don't need to write a
loop at all.

At any rate, it caused me to reflect some. To the point where
if I ever do find the time to write up such an article, I'll
certainly mention this possibility.
 

Jerry Coffin


[ ... ]
Certainly. You *can* hide all of the program in a few fancy >>
or << operators. Something like "aligneq" (see
http://kanze.james.neuf.fr/code-en.html, then navigate through
the sources in the Exec branch), where main() basically just
does:

[ ... copy in and copy out ]

I'm not excited about this structure at all. Rather the contrary,
I think it really does obfuscate what the program does -- if main looks
like it just copies the data in and then copies it back out, that's
generally what it should do.

My point isn't to hide the guts of the program into fancy >> and <<
operators -- rather, it's to isolate the physical representation of the
data into a few specific routines, and let the rest of the program work
with the data in a purely logical representation.

At the same time, if it's doing processing, I think that processing
should be apparent, and to the extent reasonable the _type_ of
processing should be apparent from how its being called. That's why I
pointed out the use of accumulate vs. transform. It's also part of why I
stay away from for_each most of the time -- for_each doesn't even give a
clue about what sort of final result to expect from the processing.

[ ... ]
One thing at a time, though. And except for special cases, I'm
not sure that this isn't obfuscation. To tell the truth, I'm
not even sure that it isn't obfuscation here. But it was fun to
write, and I've found it quite easy to modify, adding additional
options as time goes on. But I'm not sure that it's a good
general solution---it works well here because the output is a
direct line by line mapping of the input. (And even here, the
Line class collects a lot of additional data during input, which
is used in output.)

I haven't looked through the code (yet) but your description sounds to
me like it really is obfuscation. To the extent possible, operator>>
should be devoted to reading in data and converting it from a physical
to a logical representation. Of course, it needs to deal with errors and
such, but it generally should NOT do processing beyond that.

That can lead to a problem: istream_iterator is basically purely
sequential, and in some cases you don't want to operate on everything in
sequence. An obvious example is parsing log files for records of a
particular type (or a few particular types). For a job like this, I'd
consider using a boost::filter_iterator.

At least from my viewpoint, the idea is not to obfuscate the program by
hiding all or most of the processing in operators << and >>. Rather,
it's to make the program more transparent by providing a clear division
of responsibility between input conversion, filtering, processing,
possible further filtering, and output.

That's also part of why I rarely use std::for_each -- it does nothing to
even give the reader a clue about what sort of results I'm expecting to
produce from this collection of data. If I use std::transform, that
gives a good idea that each record that's processed will produce a
record of output. By contrast, if I use std::accumulate, they can expect
that I'm producing some sort of summary about the data set as a whole.

The use of standard vs. filter iterators contributes as well -- I could
put the filtering part into the functor that does the processing, but
that hides the intent from the reader. My ultimate intent is for main()
(or whatever function) to give a clear, concise summary of what's being
done.

[ ... ]
I think it depends somewhat on the context. If it makes sense
for the parsed data to be a single class, then I'll go this way;
if it doesn't, then I probably won't. The choice of whether
there is a ParsedData class or not is made at a higher level,
according to the design of the application, and I rather think
introducing it only to be able to use istream_iterator is a bit
of obfuscation.

I definitely wouldn't introduce it _only_ to allow the use of
istream_iterator. At the same time, I'd have to wonder about the design
of a file format if it grouped fields together onto a line, but those
fields really weren't related.

In any case, I don't see it as a question of lines per se -- as I said
before, it's mostly about isolating the knowledge of the physical format
into one place, and as quickly and thoroughly as possible converting it
to a logical format that makes sense for the rest of the program.

[ ... ]
It is, at any rate, useful enough that it should be presented in
any presentation of iostream. But I'd still present the basic
loop construct first.

Well, that's certainly fair. Even I don't claim that this form of I/O is
always right -- and even if it was right for all new code, there would
still be a lot of existing code that doesn't use it.
 

James Kanze

Jerry said:
[ ... ]
Certainly. You *can* hide all of the program in a few fancy >>
or << operators. Something like "aligneq" (see
http://kanze.james.neuf.fr/code-en.html, then navigate through
the sources in the Exec branch), where main() basically just
does:
[ ... copy in and copy out ]

I'm not excited about this structure at all. Rather the contrary,
I think it really does obfuscate what the program does -- if main looks
like it just copies the data in and then copies it back out, that's
generally what it should do.

That's more or less what I was saying. Although in this case,
one could argue that that's all the program does---reads the
data in and then copies it back out. Of course, the output
format is not exactly the same as the input format (but only
white space is changed). But I'm still not totally convinced
that it's a good idea.
My point isn't to hide the guts of the program into fancy >>
and << operators -- rather, it's to isolate the physical
representation of the data into a few specific routines, and
let the rest of the program work with the data in a purely
logical representation.
At the same time, if it's doing processing, I think that
processing should be apparent, and to the extent reasonable
the _type_ of processing should be apparent from how it's being
called. That's why I pointed out the use of accumulate vs.
transform. It's also part of why I stay away from for_each
most of the time -- for_each doesn't even give a clue about
what sort of final result to expect from the processing.

I understand, sort of. In the case of the code I referred to,
the only "processing" is reformatting. And formatting is
traditionally the role of <<. On the other hand, I'd argue that
in most cases, parsing multiple fields is more than what one
expects from >>. To tell the truth, I just don't know. (If I
were doing it again, I'd probably read into the array, breaking
the line down into fields, then extract the necessary global
information from the array, and copy out, possibly using
transform with the global data as the functional object.)
[ ... ]
One thing at a time, though. And except for special cases, I'm
not sure that this isn't obfuscation. To tell the truth, I'm
not even sure that it isn't obfuscation here. But it was fun to
write, and I've found it quite easy to modify, adding additional
options as time goes on. But I'm not sure that it's a good
general solution---it works well here because the output is a
direct line by line mapping of the input. (And even here, the
Line class collects a lot of additional data during input, which
is used in output.)
I haven't looked through the code (yet) but your description sounds to
me like it really is obfuscation. To the extent possible, operator>>
should be devoted to reading in data and converting it from a physical
to a logical representation. Of course, it needs to deal with errors and
such, but it generally should NOT do processing beyond that.

All it does in addition, in this particular case, is collect
information.

The code has sort of "grown" in time, of course, with more and
more added features---it also uses static members in which to
collect the data, which is a sure sign that it wasn't well
designed. But it started out as a quick hack, to solve one
small problem, and then like most "throw away programs", got
reused and reused, each time with another feature being tacked
on. It probably (certainly) needs a major rewrite (as do one or
two other tools in that directory), but I never seem to find the
time.
That can lead to a problem: istream_iterator is basically
purely sequential, and in some cases you don't want to operate
on everything in sequence. An obvious example is parsing log
files for records of a particular type (or a few particular
types). For a job like this, I'd consider using a
boost::filter_iterator.

Another case is when only part of the file has a specific
format; I encounter this a lot. And of course, every text file
should allow empty lines and comments. The filter_iterator
would handle the first, and a filtering streambuf can usually be
used effectively for the second---or both---but you still do
want to be able to include the line number in case of an error.

When all of this is considered, I find it rare that
istream_iterator can be used that much. Most often, the loop
will look something like:

while ( std::getline( in, line ) ) {
    ++ lineNumber ;
    if ( Gabi::trim( line ) != "" ) {
        if ( ! parseLine( line ) ) {
            std::cerr << progName << ": [" << filename << ':'
                      << lineNumber << "]: syntax error" << std::endl ;
        } else {
            processData( ... ) ;
        }
    }
}

I can imagine ways of handling both the error message and the
line number in operator>> (converting an erroneous line into an
empty line, so that filter_iterator will skip it), but they
really are obfuscation (use of ios::iword(), for example, to
track the line number). Somehow, it just doesn't seem natural.
Where as the above seems like the standard processing idiom for
a text file.

Similarly for output. I almost never use ostream_iterator,
because much of the time, I'm doing something like:

int elementsInLine = 0 ;
for ( C::const_iterator iter = c.begin() ; iter != c.end() ; ++ iter ) {
    if ( elementsInLine == 0 ) {
        out << start of line...
    } else {
        out << element separator...
    }
    out << *iter ;
    ++ elementsInLine ;
    if ( elementsInLine == maxElementsInLine ) {
        out << '\n' ;   // or other end of line data...
        elementsInLine = 0 ;
    }
}
if ( elementsInLine != 0 ) {
    out << '\n' ;
}

Depending on the case, there may also be a test for iter ==
c.begin() in the loop, or I'll use a while, put the
incrementation in the loop, and test for iter == c.end() after
it, so that I don't get an extra separator. (But a lot of the
time, when I'm outputting text, it's C++ code, typically table
initializers, so an extra separator at the end doesn't matter.)
At least from my viewpoint, the idea is not to obfuscate the
program by hiding all or most of the processing in operators
<< and >>. Rather, it's to make the program more transparent
by providing a clear division of responsibility between input
conversion, filtering, processing, possible further filtering,
and output.

The problem is that in most people's eyes, I suspect that
breaking text up into fields is not just "input conversion".
That's why I said before: if the program design would logically
lead you to have a class containing the data in the line, then
fine. If it doesn't, I wouldn't force the issue, and create a
class just so that I could use this idiom; that seems to be the
tail wagging the dog to me.
That's also part of why I rarely use std::for_each -- it does
nothing to even give the reader a clue about what sort of
results I'm expecting to produce from this collection of data.

*IF* the functional object has a good, logical name, I don't
have any problem with it. Again, the problem occurs when you
are forced to create a class type solely to use for_each.

In practice, of course, transform and accumulate are probably
better choices most of the time. But there too: how far do you
push the idiom? My CRC, MD5 and SHA classes are "accumulators",
to be used with std::accumulate. Again, it was fun, and the
idiom looks cool, but is it really more readable? (There's also
the problem that the "accumulator" in std::accumulate gets
copied around an awful lot. In the case of CRC, that's not too
much of a problem---the accumulator may be a class type, but
it's only 16 or 32 bits in size. In the case of MD5 and the SHA
classes, however, it has a very noticeable impact on
performance---accumulating a single char is usually only a write
and an index manipulation, but you end up copying something like
48 bytes, twice.)
If I use std::transform, that gives a good idea that each
record that's processed will produce a record of output. By
contrast, if I use std::accumulate, they can expect that I'm
producing some sort of summary about the data set as a whole.

And what do you use when each output record is derived from
several input records:)?
The use of standard vs. filter iterators contributes as well
-- I could put the filtering part into the functor that does
the processing, but that hides the intent from the reader. My
ultimate intent is for main() (or whatever function) to give a
clear, concise summary of what's being done.

Agreed, but what is "clear" often depends on the reader's
expectations.
[ ... ]
I think it depends somewhat on the context. If it makes sense
for the parsed data to be a single class, then I'll go this way;
if it doesn't, then I probably won't. The choice of whether
there is a ParsedData class or not is made at a higher level,
according to the design of the application, and I rather think
introducing it only to be able to use istream_iterator is a bit
of obfuscation.
I definitely wouldn't introduce it _only_ to allow the use of
istream_iterator. At the same time, I'd have to wonder about the design
of a file format if it grouped fields together onto a line, but those
fields really weren't related.

The problem is often that file formats are designed around
pragmatic considerations, not just program design
considerations. If nothing else, you want to (must?) allow empty
lines and comments---it's not rare to expect continuation lines
as well. Depending on the case, it may be more or less
difficult to handle these, and still be able to output the
correct line number in case of error. Or it may have to deal
with two different formats (e.g. a Windows .ini file).

The question is always, how far do you take things? I've
experimented with using filtering streambufs to read .ini
files: the first level strips comments (if they're supported)
and empty lines and handles continuation lines, and tracks the
line number (since nothing downstream sees all of the '\n'
characters); the second is used to read the attribute value
pairs in a section: it's inserted after the [...] line has been
read, and declares EOF when it sees a line starting with a '['.
So you can use an istream_iterator< AttributeValuePair > and
std::copy to read an entire section. But it was just that, an
experiment. I feel that the results really were obfuscation,
and while I'm familiar enough with the idioms to be able to
follow it, I don't really expect that to be the case in general.
I suspect that for most readers, it would be total obfuscation,
and I now use a classical "while ( getline(...) )" loop, with a
couple of if's in the loop, depending on what regular expression
matches.
 

Jerry Coffin

Jerry Coffin wrote:

[ ... ]
Another case is when only part of the file has a specific
format; I encounter this a lot.

Yes, it's true that istream_iterators generally expect to read an entire
file. At times it's reasonable to set failbit in operator>> when you
detect the end of data to be read by that operator, at which time you
clear the stream and read the next type of data expected. That's not
always a good way to go either, of course, especially if you're dealing
with highly variable input data.
When all of this is considered, I find it rare that
istream_iterator can be used that much.

Hmm...this may be a difference in what sorts of data we typically
process. I write a lot of code that processes machine-generated files,
which makes the input format much more regular, and things like comments
are rarely an issue.

OTOH, while it's certainly true that some code has to handle variable
and/or error ridden input data robustly, I wouldn't expect that of most
beginners (which, as I recall, was the subject at hand).

[ ... ]
The problem is that in most people's eyes, I suspect that
breaking text up into fields is not just "input conversion".
That's why I said before: if the program design would logically
lead you to have a class containing the data in the line, then
fine. If it doesn't, I wouldn't force the issue, and create a
class just so that I could use this idiom; that seems to be the
tail wagging the dog to me.

I agree -- though it appears that I tend to see data as related/grouped
when quite a few other people don't (and that's not a matter of a hammer
making everything look like a nail -- for me, it started long before
iostreams came along).
*IF* the functional object has a good, logical name, I don't
have any problem with it. Again, the problem occurs when you
are forced to create a class type solely to use for_each.

Even when it does have a good logical name, I rarely find for_each the
right way to invoke that meaningful function.
In practice, of course, transform and accumulate are probably
better choices most of the time. But there too: how far do you
push the idiom? My CRC, MD5 and SHA classes are "accumulators",
to be used with std::accumulate. Again, it was fun, and the
idiom looks cool, but is it really more readable? (There's also
the problem that the "accumulator" in std::accumulate gets
copied around an awful lot. In the case of CRC, that's not too
much of a problem---the accumulator may be a class type, but
it's only 16 or 32 bits in size. In the case of MD5 and the SHA
classes, however, it has a very noticeable impact on
performance---accumulating a single char is usually only a write
and an index manipulation, but you end up copying something like
48 bytes, twice.)

I've never written one of these this way, so I'm hard put to say much
concrete about them. I can certainly see where they might be more of a
problem than a solution.
And what do you use when each output record is derived from
several input records:)?

Generally accumulate -- whether it's from several records or the entire
set, you're still accumulating data across a number of records, where
transform is really about transforming _each_ record into some output.
Of course, in conjunction with a filter_iterator, it may be every record
that satisfies some criteria, but the idea remains similar.
 

James Kanze

[...]
I've never written one of these this way, so I'm hard put to
say much concrete about them. I can certainly see where they
might be more of a problem than a solution.

The idiom for using them is very natural:
std::string md5
= std::accumulate( text.begin(), text.end(), MD5() ) ;
Or even:
text += std::accumulate( text.begin(), text.end(), MD5() ) ;
The problem is purely performance.
 
