Why am I so stupid?

N

none

Every little thing I try to do using STL turns into a five mile crawl
through raw sewage.

I want to use the STL to lex/parse some text. This must have been done
at least 5.0e+75 times before, but I can't find the solution in the FAQ
or in any examples.

This is my utterly failed attempt:


std::string s = "5.0 This is a string";
std::istringstream i(s);
std::cout << "Before: \"" << s << "\"" << std::endl;

float x;
i >> x;

std::cout << "Extracted a float: " << x << std::endl;
std::cout << "After: \"" << s << "\"" << std::endl;


And this is the output produced:


Before: "5.0 This is a string"
Extracted a float: 5
After: "5.0 This is a string"


All that I want in this world is for that output to be, instead, this:


Before: "5.0 This is a string"
Extracted a float: 5
After: " This is a string"


Then I can just continue parsing the remainder of the string. I have
tried the following, and all have failed:

1) Making the assignment "s = i.str();" after the line "i >> x;" in
hopes that "i.str()" would return only the un-extracted portion of the
string. It doesn't. It returns the entire string.
2) Calling "i.sync();" I don't know why I thought this would help. A
shot in the dark.
3) Calling "i.ignore();" a few times. No effect.
4) Changing the last line to:

std::cout << "After: \"" << i << "\"" << std::endl;

(displaying the value of "i" instead of "s"). It displays a hex number,
probably a pointer, I guess.

5) Changing the last line to:

std::cout << "After: \"" << i.str() << "\"" << std::endl;

I knew before I tried that it wouldn't help, but was desperate.


How do you get the >> operator to actually remove the characters from a
string?


Help me, Obi Wan. You're my only hope.


P.S.: Please, oh please, don't say Boost. I don't want to add 75000
files to my project just to remove three characters from a string.
 
A

Alf P. Steinbach

* none:
Every little thing I try to do using STL turns into a five mile crawl
through raw sewage.

I want to use the STL to lex/parse some text. This must have been done
at least 5.0e+75 times before, but I can't find the solution in the FAQ
or in any examples.

This is my utterly failed attempt:


std::string s = "5.0 This is a string";
std::istringstream i(s);
std::cout << "Before: \"" << s << "\"" << std::endl;

float x;
i >> x;

std::cout << "Extracted a float: " << x << std::endl;
std::cout << "After: \"" << s << "\"" << std::endl;


And this is the output produced:


Before: "5.0 This is a string"
Extracted a float: 5
After: "5.0 This is a string"


All that I want in this world is for that output to be, instead, this:


Before: "5.0 This is a string"
Extracted a float: 5
After: " This is a string"


Then I can just continue parsing the remainder of the string.

All you need to do is to continue to read from the istringstream (e.g. if you
really wanted to, you could read the remaining text into the string with
'getline( i, s );', but note that doing that repeatedly would take O(n^2) time).

The stream does not change the original string from which it is initialized.

It just changes its internal buffer read position.
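
As a minimal sketch of that suggestion (the helper name remainder_after_float is made up for illustration and is not part of any library), the extraction followed by getline could look like this:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: extracts one float from the front of `text` and
// returns whatever the stream has not consumed yet.
std::string remainder_after_float(const std::string& text, float& value)
{
    std::istringstream in(text);   // the stream works on its own copy of text
    in >> value;                   // only moves the stream's read position
    std::string rest;
    std::getline(in, rest);        // reads from the read position to end of line
    return rest;
}
```

Calling remainder_after_float("5.0 This is a string", x) sets x to 5 and returns " This is a string", while the string passed in is left untouched.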

I have
tried the following, and all have failed:

1) Making the assignment "s = i.str();" after the line "i >> x;" in
hopes that "i.str()" would return only the un-extracted portion of the
string. It doesn't. It returns the entire string.
2) Calling "i.sync();" I don't know why I thought this would help. A
shot in the dark.
3) Calling "i.ignore();" a few times. No effect.
4) Changing the last line to:

std::cout << "After: \"" << i << "\"" << std::endl;

(displaying the value of "i" instead of "s". It displays a hex number,
probably a pointer I guess.

5) Changing the last line to:

std::cout << "After: \"" << i.str() << "\"" << std::endl;

I knew before I tried that it wouldn't help, but was desperate.


How do you get the >> operator to actually remove the characters from a
string?

It removes characters from the input buffer. :)

Help me, Obi Wan. You're my only hope.


P.S.: Please, oh please, don't say Boost. I don't want to add 75000
files to my project just to remove three characters from a string.

Boost isn't a bad idea: it should be there anyway.

The problem with Boost is that, as you rightly note, it's an extreme amount of
baggage for just a small kernel of functionality that you regularly use.

And even though that's about the same as with many other libraries, one does
wish for some "Boost kernel" project...


Cheers & hth.,

- Alf
 
N

none

Alf said:
All you need to do is to continue to read from the istringstream

I can't really do that. Maybe there is a way, but it's not obvious. To
write a lexer, you need lots of little functions like "lex_float" and
"lex_int" and so on. Some of these functions rely on members of
std::string, like "find_first_of" and "find_first_not_of," etc. So the
text that's being parsed really needs to exist in a string, not an
istringstream. Each individual function can create an istringstream as a
helper, as in my example code, but the input itself needs to be in a
string.

if you really wanted to you could read the remaining text into the
string by 'getline( i, s );', but note that doing that thing
repeatedly would yield O(n^2) time).

This works! Many many thanks. Execution speed for a lexer isn't really
critical, but it needs to be reasonably efficient when large files are used
as input. I am open to other suggestions!

Boost isn't a bad idea: it should be there anyway.

The problem with Boost is that, as you rightly note, it's an extreme
amount of baggage for just a small kernel of functionality that you
regularly use.

And even though that's about the same as with many other libraries,
one does wish for some "Boost kernel" project...

Ugh. Boost. I have used it many times, and have developed a distaste for
it. Maybe the thing that would make it more friendly would be to split it
up. If all I want is boost::format, for "printf" functionality, then give
me a little package I can download that includes just that. I suppose the
problem is that boost::format relies on a hundred other boost functions and
classes. I don't know the solution.

Anyway, thanks again for the help.
 
B

Bart van Ingen Schenau

none said:
Every little thing I try to do using STL turns into a five mile crawl
through raw sewage.

I want to use the STL to lex/parse some text. This must have been
done at least 5.0e+75 times before, but I can't find the solution in
the FAQ or in any examples.

This is my utterly failed attempt:


std::string s = "5.0 This is a string";
std::istringstream i(s);
std::cout << "Before: \"" << s << "\"" << std::endl;

float x;
i >> x;

std::cout << "Extracted a float: " << x << std::endl;
std::cout << "After: \"" << s << "\"" << std::endl;

Change that last line to:
std::cout << "After: \"" << i.rdbuf() << "\"" << std::endl;
and you will have the requested output.
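
If the remaining text is wanted in a string rather than on std::cout, the same rdbuf() trick can be pointed at an ostringstream. This is only a sketch; unread_part is a hypothetical name:

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: returns the not-yet-extracted part of `in`.
// Note that this consumes the rest of the stream.
std::string unread_part(std::istream& in)
{
    std::ostringstream out;
    out << in.rdbuf();   // copies everything from the read position onward
    return out.str();
}
```

After "i >> x;" in the original example, unread_part(i) returns " This is a string".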

How do you get the >> operator to actually remove the characters from
a string?

You can't, because operator>> does not work on strings, only on streams.
It is the same as with a file: reading from it does not destroy the
contents of the original file.

Help me, Obi Wan. You're my only hope.


P.S.: Please, oh please, don't say Boost. I don't want to add 75000
files to my project just to remove three characters from a string.

Bart v Ingen Schenau
 
J

James Kanze

Every little thing I try to do using STL turns into a five
mile crawl through raw sewage.
I want to use the STL to lex/parse some text. This must have
been done at least 5.0e+75 times before, but I can't find the
solution in the FAQ or in any examples.

The STL is not a lexer/parser tool. It addresses a much lower
level. Parts of the STL can be used in parsing.
This is my utterly failed attempt:
std::string s = "5.0 This is a string";
std::istringstream i(s);
std::cout << "Before: \"" << s << "\"" << std::endl;
float x;
i >> x;
std::cout << "Extracted a float: " << x << std::endl;
std::cout << "After: \"" << s << "\"" << std::endl;
And this is the output produced:
Before: "5.0 This is a string"
Extracted a float: 5
After: "5.0 This is a string"
All that I want in this world is for that output to be,
instead, this:
Before: "5.0 This is a string"
Extracted a float: 5
After: " This is a string"

Why would you expect this? If you were using an ifstream, you
wouldn't expect characters extracted from the stream to be
removed from the file.
Then I can just continue parsing the remainder of the string.
I have tried the following, and all have failed:
1) Making the assignment "s = i.str();" after the line "i >> x;" in
hopes that "i.str()" would return only the un-extracted portion of the
string. It doesn't. It returns the entire string.
2) Calling "i.sync();" I don't know why I thought this would help. A
shot in the dark.
3) Calling "i.ignore();" a few times. No effect.
4) Changing the last line to:
std::cout << "After: \"" << i << "\"" << std::endl;
(displaying the value of "i" instead of "s". It displays a
hex number, probably a pointer I guess.

There is no << operator for an istream, so the compiler tries
any implicit conversions available. There's an implicit
conversion to void*; what you're seeing is a result of that.
But you're on the right track.

(
std::cout << "After: \"" << i.rdbuf() << "\"" << std::endl ;
will do what you are trying to do. But you don't want to do
what you are trying to do, see below.)
5) Changing the last line to:
std::cout << "After: \"" << i.str() << "\"" << std::endl;
I knew before I tried that it wouldn't help, but was desperate.
How do you get the >> operator to actually remove the
characters from a string?

You can't, because doing so would break too many expectations
and invariants. A stream is an access method into the
underlying sequence; an input stream should *never* modify the
underlying sequence, in any way.

The answer to your problem, of course, is to use istream for all
of your parsing. After extracting the float, continue
extracting using the same istream (istringstream, etc.). (FWIW:
I have an IteratorInputStream, in which the input sequence is
defined by two STL iterators---it's really simple to do. And
this streambuf has additional functions which return or set the
current iterators, in case part of the input should be parsed
using functions in <algorithm>. But in practice, such cases are
rare; algorithm doesn't contain much useful for parsing.)
 
J

James Kanze

I can't really do that. Maybe there is a way, but it's not
obvious. To write a lexer, you need lots of little functions
like "lex_float" and "lex_int" and so on. Some of these
functions rely on members of std::string, like "find_first_of"
and "find_first_not_of," etc. So the text that's being parsed
really needs to exist in a string, not an istringstream. Each
individual function can create an istringstream as a helper,
as in my example code, but the input itself needs to be in a
string.

Then you're doing it wrong. Functions in a lexer don't get
passed a complete input; they get passed a position in a stream.
Depending on the strategy used, in C++, this can be an iterator,
an istream or a streambuf, or some custom class which represents
a stream. You do NOT want a lexer or a parser to modify the
underlying data. It's very inefficient if the data are in
memory, and would be a disaster if the data were on disk.
(Imagine if every time you compiled a C++ source, the file you
compiled ended up empty.)
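
To illustrate the "position in a stream" idea with plain iterators, here is a rough sketch. The name lex_unsigned and the Iter typedef are invented for this example, and overflow is not checked:

```cpp
#include <string>

typedef std::string::const_iterator Iter;

// Hypothetical lexer helper: reads an unsigned decimal number starting at
// `pos`. On success `pos` is advanced past the digits; the text itself is
// never modified. No overflow checking is done in this sketch.
bool lex_unsigned(Iter& pos, Iter end, unsigned& value)
{
    Iter p = pos;
    unsigned result = 0;
    while (p != end && *p >= '0' && *p <= '9') {
        result = result * 10 + static_cast<unsigned>(*p - '0');
        ++p;
    }
    if (p == pos)
        return false;   // no digits: leave the position where it was
    value = result;
    pos = p;
    return true;
}
```

Each lex_xxx function takes the current position by reference and moves it forward only when it succeeds, so a failed attempt costs nothing and the input string stays intact.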

The member functions of std::string are, for the most part,
pretty useless. They all have more or less equivalents in
<algorithm>, which are slightly more useful, but on the whole,
when parsing, you don't go skipping to the next 'x', you extract
characters until some particular condition is reached.
 
B

Boris Schaeling

[...]It just changes its internal buffer read position.

Another idea is then to get the current read position and trim the string.
The code would look something like this after you read from the stream:

std::ios::pos_type pos = i.rdbuf()->pubseekoff(0, std::ios::cur, std::ios::in);
if (pos != std::ios::pos_type(std::ios::off_type(-1)))
    s.erase(0, pos);

Boris
 
N

none

James said:
The answer to your problem, of course, is to use istream for all
of your parsing. After extracting the float, continue
extracting using the same istream (istringstream, etc.).

Ok, you (and others) have sold me on the concept of using the istringstream
for parsing, instead of working directly with the string.

But istringstream makes the assumption that everything is separated by
whitespace, at least in the case of the >> operator.

A typical lexer/parser only knows about "tokens." So I might have this
list of tokens:

"(", ")", "+", "-", "*", "/", "sqrt", "sin", "cos", ...

and I might want to parse an input stream that looks like this:

"(1.2 * -sqrt(7.5e3))"

In other words, I can't rely on tokens to be separated by whitespace. I
need to "peek" before I actually extract, and I might need to peek at more
than one character -- for example, to match the "sqrt" token.

I understand the theory of parsing an input sequence using a grammar, and I
have one that works beautifully in C using a recursive-descent approach.
I'd like to bring it up to date by using an istringstream, or whatever STL
construct is appropriate. I'm finding the low-level nuts and bolts, like
tokenization, very difficult.
 
P

Pascal J. Bourguignon

none said:
Ok, you (and others) have sold me on the concept of using the istringstream
for parsing, instead of working directly with the string.

But istringstream makes the assumption that everything is separated by
whitespace, at least in the case of the >> operator.

A typical lexer/parser only knows about "tokens." So I might have this
list of tokens:

"(", ")", "+", "-", "*", "/", "sqrt", "sin", "cos", ...

and I might want to parse an input stream that looks like this:

"(1.2 * -sqrt(7.5e3))"

In other words, I can't rely on tokens to be separated by whitespace. I
need to "peek" before I actually extract, and I might need to peek at more
than one character -- for example, to match the "sqrt" token.

I understand the theory of parsing an input sequence using a grammar, and I
have one that works beautifully in C using a recursive-descent approach.
I'd like to bring it up to date by using an istringstream, or whatever STL
construct is appropriate. I'm finding the low-level nuts and bolts, like
tokenization, very difficult.


One word: ABSTRACTION

Now, of course it's more complex than that: you have to understand
what "abstraction" means. A good book for studying abstraction is SICP,
Structure and Interpretation of Computer Programs:
http://mitpress.mit.edu/sicp/full-text/book/book-Z-H-4.html
http://swiss.csail.mit.edu/classes/6.001/abelson-sussman-lectures/

http://eli.thegreenplace.net/category/programming/lisp/sicp/
http://www.codepoetics.com/wiki/index.php?title=Topics:SICP_in_other_languages

But specifically, you shouldn't care whether your source is
represented by a string or an istringstream or a list of characters or
whatever.

What is a source for your language? It's a sequence of characters.
Moreover, I notice that you may need to look ahead one character to
know when a token is complete. So you could define this abstract data
type:

class Source{
public:
    typedef unsigned char Character;

    Source(std::string text);
    virtual ~Source();

    virtual bool endOfSource();
    virtual Character currentCharacter();
    virtual Character nextCharacter();
    virtual void advance();
};



Then you can easily implement the getNextToken function:

Token getNextToken(Source* s){
    eatSpaces(s);
    switch(s->currentCharacter()){
    case '(': s->advance(); return Token::LeftParenthesis;
    case ')': s->advance(); return Token::RightParenthesis;
    ...
    case 'a':case 'b': ... case 'z': {
        // std::string has no constructor taking a single character,
        // so the one-character string is built with the (count, char) form.
        std::string identName(1, s->currentCharacter());
        s->advance();
        while(isalnum(s->currentCharacter())){
            identName.push_back(s->currentCharacter());
            s->advance();
        }
        return Token::makeIdentifier(identName);
    }
    default:
        error("Invalid character",s->currentCharacter());
    }
}



If you want to report source position in error message, you can add
to Source methods such as:

virtual Position currentPosition();

and define the methods of Token to take a position argument:

case ')':
pos=s->currentPosition();
s->advance();
return Token::makeRightParenthesis(pos);



Right, I didn't mention anything about how to get the characters from
a string, a file or an istringstream. It's not important, you can do
however you want. A simple way is to index the current character in a
std::string, or to read a file or stream character by character.
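
As one possible concrete version of the "index the current character in a std::string" option, here is a small sketch. StringSource and readIdentifier are hypothetical names; the class mirrors the interface above without deriving from it, and returning 0 at end of source is an assumption of this example:

```cpp
#include <cctype>
#include <string>

// Hypothetical concrete source that indexes into a std::string.
class StringSource {
public:
    typedef unsigned char Character;

    explicit StringSource(const std::string& text) : text_(text), pos_(0) {}

    bool endOfSource() const { return pos_ >= text_.size(); }

    // Both accessors return 0 past the end so callers can test them safely.
    Character currentCharacter() const
    {
        return endOfSource() ? Character(0) : static_cast<Character>(text_[pos_]);
    }
    Character nextCharacter() const
    {
        return pos_ + 1 < text_.size() ? static_cast<Character>(text_[pos_ + 1])
                                       : Character(0);
    }
    void advance() { if (!endOfSource()) ++pos_; }

private:
    std::string text_;
    std::string::size_type pos_;
};

// Reads an identifier such as "sqrt" starting at the current character.
std::string readIdentifier(StringSource& s)
{
    std::string name;
    while (std::isalnum(s.currentCharacter())) {
        name.push_back(static_cast<char>(s.currentCharacter()));
        s.advance();
    }
    return name;
}
```

With a source built from "sqrt(7)", readIdentifier returns "sqrt" and leaves the current character at '('.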
 
N

none

none said:
James said:
The answer to your problem, of course, is to use istream for all
of your parsing. After extracting the float, continue
extracting using the same istream (istringstream, etc.).

Ok, you (and others) have sold me on the concept of using the
istringstream for parsing, instead of working directly with the
string.
[...]

I'm finding the low-level nuts and bolts, like tokenization,
very difficult.


I managed to make this work using really awful looking loops and
combinations of istringstream::tellg() and istringstream::seekg().

For example, when you extract a float from the stream, you usually want
to save the actual text that was extracted for later use, e.g. error
reporting.

To do that with istringstream, I had to hack this together:

// lexer::m_last_match is a std::string that holds the last
// successfully parsed token, constant, etc, for reporting the
// location of a syntax error

float lexer::parse_constant(std::istringstream &iss)
{
    float retval;

    std::streamoff pos_before = iss.tellg();

    if (iss >> retval)
    {
        std::streamoff pos_after = iss.tellg();
        iss.seekg(pos_before);

        m_last_match.clear();

        while (iss.good() && iss.tellg() < pos_after)
        {
            char c;
            iss >> c;
            m_last_match += c;
        }
    }
    else
    // ...

I had to do something similar just to match a token at all... Use a
for-loop to extract the size() of the token to be matched from the
istringstream, do a regular string compare, if the compare returns
false, put the extracted characters back into the stream...
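
For comparison, the "extract, compare, roll back" step described above can be written without putting characters back one at a time, by remembering the read position and seeking back to it. This is only a sketch: match_token is a hypothetical name, and it assumes the stream is seekable and in a good state on entry (true for an istringstream that has not failed):

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: consumes `token` from `in` if the upcoming
// characters match it exactly; otherwise restores the read position.
bool match_token(std::istream& in, const std::string& token)
{
    const std::istream::pos_type start = in.tellg();
    for (std::string::size_type n = 0; n < token.size(); ++n) {
        if (in.get() != std::char_traits<char>::to_int_type(token[n])) {
            in.clear();        // get() at end of input sets eofbit and failbit
            in.seekg(start);   // roll back to where we started
            return false;
        }
    }
    return true;
}
```

On a stream holding "sqrt(7.5e3)", match_token(in, "sin") fails and leaves the position at the start, and match_token(in, "sqrt") then succeeds and leaves the stream positioned at '('.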

I guess, somewhere, I got the incorrect impression that the STL was
intended to make things easier. I hereby retract the question posed in
the subject of this thread and change it to "Why is the STL so stupid?"
 
C

Christof Donat

Hi,
I want to use the STL to lex/parse some text. This must have been
done at least 5.0e+75 times before, but I can't find the solution in
the FAQ or in any examples.

I see several fairly simple solutions to your problem:

1. Your grammar is simple - then you can still use C strings with C++. Be
careful with them as usual, but where they are the best tool, they are
the best tool. Alternatively you might try boost again and have a look
at its string algorithms.

2. Your lexical grammar is more complex, but your syntactical grammar is
simple - use regular expressions. I have noticed that you dislike boost,
but I still recommend it strongly. It includes a pretty good
regular expression library that should also be usable with little of the
other boost libraries.
In case you'd rather choose C strings over touching boost again, have a
look at lex (or the GNU variant flex).

3. Your lexical grammar and your syntactical grammar are complex - use
boost::spirit, or yacc (GNU variant: bison) together with flex or boost's
regular expression library. Both are capable of building your parser
using the BNF form of your syntactical grammar and regular expressions
for your lexer. spirit is more the C++ style while bison creates really
fast parsers. I have not worked with the original yacc yet, so I cannot
tell you anything about it from experience.

Christof
 
J

James Kanze

Ok, you (and others) have sold me on the concept of using the
istringstream for parsing, instead of working directly with
the string.

For the type of parsing you've been describing (or seem to have
been describing). For most of my own parsing, I use regular
expressions, or lex and yacc.
But istringstream makes the assumption that everything is
separated by whitespace, at least in the case of the >>
operator.

Partially. If you're reading an int, it will stop at the first
non-numeric character.
A typical lexer/parser only knows about "tokens." So I might
have this list of tokens:
"(", ")", "+", "-", "*", "/", "sqrt", "sin", "cos", ...
and I might want to parse an input stream that looks like this:
"(1.2 * -sqrt(7.5e3))"
In other words, I can't rely on tokens to be separated by
whitespace. I need to "peek" before I actually extract, and I
might need to peek at more than one character -- for example,
to match the "sqrt" token.

So define a type Token, and write an >> operator for it, which
does what you want. (But I'd still just use lex, and be done
with it.)
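
A rough sketch of what such a Token type and its >> operator might look like follows. The layout of Token is invented for this example, and it assumes a small language in which numbers start with a digit, symbols start with a letter, and every other non-space character is a one-character operator:

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical token type for a small expression language.
struct Token {
    enum Kind { Number, Symbol, Operator };
    Kind kind;
    std::string text;   // spelling of a symbol or operator, empty for numbers
    double value;       // only meaningful when kind == Number
    Token() : kind(Operator), value(0) {}
};

std::istream& operator>>(std::istream& in, Token& tok)
{
    char c;
    if (!(in >> c))                 // skips whitespace; fails at end of input
        return in;
    const unsigned char uc = static_cast<unsigned char>(c);
    tok.text.clear();
    tok.value = 0;
    if (std::isdigit(uc)) {
        in.putback(c);              // let the stream parse the whole number
        tok.kind = Token::Number;
        in >> tok.value;
    } else if (std::isalpha(uc)) {
        tok.kind = Token::Symbol;
        tok.text.assign(1, c);
        while (std::isalnum(in.peek()))   // peek() looks ahead without extracting
            tok.text += static_cast<char>(in.get());
    } else {
        tok.kind = Token::Operator;
        tok.text.assign(1, c);
    }
    return in;
}
```

A loop such as "while (in >> tok)" over "(1.2 * -sqrt(7.5e3))" then yields nine tokens without requiring any whitespace between them.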
I understand the theory of parsing an input sequence using a
grammar, and I have one that works beautifully in C using a
recursive-descent approach. I'd like to bring it up to date
by using an istringstream, or whatever STL construct is
appropriate. I'm finding the low-level nuts and bolts, like
tokenization, very difficult.

What are you using in C?
 
J

James Kanze

[...]
But specifically, you shouldn't care whether your source is
represented by a string or an istringstream or a list of
characters or whatever.

Yes and no. For some simple parsing jobs, istream contains 90%
of your parser, already implemented; other sequences might not.
For a relatively simple lexer, it might even be reasonable to
define a type Token, and write a >> operator which reads tokens;
I wouldn't recommend this for something like C++ (even without
the preprocessor), but if e.g. his language only uses numbers
(all of which are required to start with a digit), symbols (all
of which must start with an alpha) and a small set of single
character operators or punctuation, it could be an appropriate
solution.

For that matter, it's also possible to define a container or an
accumulator such that your entire parser is invoked by:
std::copy( std::istream_iterator< Token >( source ),
std::istream_iterator< Token >(),
std::back_inserter( parseTree ) ) ;
or
parseTree = std::accumulate(
std::istream_iterator< Token >( source ),
std::istream_iterator< Token >(),
ParseTree() ) ;
I'll admit that this looks more like obfuscation than anything
else to me, though. (But who knows? Maybe in some specific
cases...)
 
N

none

James said:
What are you using in C?

Just stdio. fopen(), fread(), fclose()... then basically just marching
a char* along the buffer, doing strncmp() and similar where needed.
Much of the stdio library is written in highly optimized assembly, so
it's hard to beat the efficiency.

Maybe it was a mistake to think that moving from that to C++ and STL was
the right thing to do. All I ever hear is that "char*" is the most
dangerous thing ever invented and one of the greatest failures of
mankind.

Ok, fine, so let's all be safe and use string and iostream. Well, you
can't do all the things you used to do with char*. Some of them have
replacements, sort of, and some don't. This is the key failing of the
STL, in my opinion. If you're going to make something new, and that new
thing is intended to be considered a "standard" that supersedes some old
thing, then it MUST provide all the functionality of the old thing.

If the attitude toward the STL was "Here's a bunch of new containers
that you can use *in addition to* your familiar old stdio tools," then
great. But that is NOT the attitude at all. The attitude is that
somehow stdio is horrible and should be avoided at all costs and should
be REPLACED by the STL. Ok, but if that's what you (not you personally,
but the ISO or SGI or whoever the hell thought STL was a good idea)
want, then do the work to make it an actual "replacement."

I have come across a (very) few problems that were easier to solve with
STL than without. std::map and std::vector come in handy often.
Unfortunately, far more often, STL only makes simple things
unnecessarily difficult.

Microsoft, for example, takes a lot of criticism for things like MFC.
"Why waste the effort making CString when there is already
std::string?" I can't say that Microsoft has done any better than STL,
but I can say that I understand why they didn't just jump right on STL
and adopt it. Maybe I'm just not an OO guy at heart, I don't know. I'm
OK with an "int" just being a block of bits in memory and not twelve
layers of inheritance. Yes, I know that an "int" is still just an
"int" in C++. I'm just making a point about the logic behind OO.

But I WANT to be an OO guy, or at least to give it a chance. I've been
"giving it a chance" for years and what I get in return, mostly, is
consistent disappointment.
 
J

James Kanze

Just stdio. fopen(), fread(), fclose()... then basically
just marching a char* along the buffer, doing strncmp() and
similar where needed.

The equivalent in C++ would be to read the file into a string,
then march along using iterators. Most of what is in <string.h>
can be done just as well using functions in <algorithm>. This
solution has, of course, the advantage of allowing unlimited
look-ahead. And the disadvantage of only working if the entire
file fits into memory.

A frequent compromise is to read line by line. For line
oriented input (where a "line" is significant in parsing), it
also provides a convenient resynchronization point.

There are a few things which aren't readily supported. Numeric
conversions, for example---there's no real equivalent to strtod
which uses iterators. The most "obvious" solution
is to manually find the end of the text you want to convert,
then use the two iterators to create a string, to initialize an
istringstream, and finally read from that. Most of the time,
however, I'll use a regular expression to validate the entire
line, then read the entire line from a single istringstream,
possibly (usually, in fact) using user defined extraction
operators. Alternatively, I've written an iterator based input
streambuf and istream, which supports extracting the current
iterator and setting it, so it's easy to move between istream
and iterators, using whichever is most convenient for the next
step.
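
The "obvious" solution described above (find the end of the text, build a string from the two iterators, read it through an istringstream) might be sketched as follows. The names lex_double and not_number_char are hypothetical, and the character set used to delimit a number is a deliberately crude assumption:

```cpp
#include <algorithm>
#include <sstream>
#include <string>

// Assumed, deliberately crude test for "cannot be part of a number".
bool not_number_char(char c)
{
    return std::string("0123456789.eE+-").find(c) == std::string::npos;
}

// Hypothetical helper: converts the number starting at `pos` and, on
// success, advances `pos` past it.
bool lex_double(std::string::const_iterator& pos,
                std::string::const_iterator end,
                double& value)
{
    // 1. Manually find the end of the text to convert.
    std::string::const_iterator stop = std::find_if(pos, end, not_number_char);
    // 2. Use the two iterators to create a string and a stream on it.
    const std::string digits(pos, stop);
    std::istringstream in(digits);
    // 3. Read from the stream; require that the whole range was used.
    double result;
    if (!(in >> result) || in.peek() != std::char_traits<char>::eof())
        return false;
    value = result;
    pos = stop;
    return true;
}
```

Given the text "7.5e3)", lex_double stores 7500 in value and leaves the iterator pointing at ')'.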

One final point: the C++ equivalent of <ctype.h> is found in
<locale>. Unlike the iterator idiom used in <algorithm> (which
is just moderately awkward), it is extremely verbose and awkward
to use. The first thing you should probably do is define a
couple of functional objects (predicates corresponding to the
isxxx functions) which use it, which you can use with the
standard algorithms.
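
Such predicates might look roughly like this. IsSpace and IsDigit are hypothetical names (they are not the CTypeFunctor mentioned elsewhere in this thread); they wrap the two-argument std::isspace and std::isdigit from <locale>:

```cpp
#include <algorithm>
#include <locale>
#include <string>

// Hypothetical predicate wrapping the <locale> version of isspace.
class IsSpace {
public:
    explicit IsSpace(const std::locale& loc = std::locale()) : loc_(loc) {}
    bool operator()(char c) const { return std::isspace(c, loc_); }
private:
    std::locale loc_;
};

// Hypothetical predicate wrapping the <locale> version of isdigit.
class IsDigit {
public:
    explicit IsDigit(const std::locale& loc = std::locale()) : loc_(loc) {}
    bool operator()(char c) const { return std::isdigit(c, loc_); }
private:
    std::locale loc_;
};
```

With these, std::find_if(text.begin(), text.end(), IsSpace()) finds the first whitespace character and std::count_if(text.begin(), text.end(), IsDigit()) counts the digits, with the verbose locale call hidden inside the functor.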
Much of the stdio library is written in highly optimized
assembly, so it's hard to beat the efficiency.

Both stdio and iostream deal with IO, and in any quality
implementation, it is the IO which should be the bottleneck.
The functions in <algorithm> are all templates, which means that
(in practice, today, at least), the compiler has direct access
to the source code, and can inline it when appropriate. On the
whole, in a quality implementation, I would expect the functions
in <algorithm> to outperform those in <string.h> or <stdlib.h>.
(But to be honest, the only one I've measured was sort---which
does outperform qsort on the implementations I've tested. But
if performance really is a problem, you might want to measure.)
Maybe it was a mistake to think that moving from that to C++
and STL was the right thing to do. All I ever hear is that
"char*" is the most dangerous thing ever invented and one of
the greatest failures of mankind.

C's handling of arrays, in general, is a bit of a disaster, and
C uses what is probably the worst possible implementation of
string. But C has a long history, and a lot of parsers have
been written in it. So it may have slightly better support in
the standard library for certain types of parsing. Still, I do
a lot of parsing in C++, and I've not found it a problem. At
the start, I did need to design a few tools: CTypeFunctor, as
mentioned above, or the iterator based istream. Or my own
RegularExpression class (decidedly pre-Boost, but I still use it
because it has some features particularly useful for parsing,
it's very fast, using a DFA, and it has support for generating
statically initialized tables, so you don't have to parse the
regular expression at runtime); the real plus in using C++ for
parsing is that such tools, once written, are an order of
magnitude easier to use than in C.
Ok, fine, so let's all be safe and use string and iostream.
Well, you can't do all the things you used to do with char*.
Some of them have replacements, sort of, and some don't.

And there are other functionalities which weren't present in C.
It's a somewhat different idiom, and in some cases, requires a
slightly different approach. I'll admit that for all but the
most trivial parsing, I use regular expressions (and did even
back in C), and it's a lot easier to have a regular expression
class, which manages all of the necessary memory automatically,
than it was to use regular expressions in C.
This is the key failing of the STL, in my opinion. If you're
going to make something new, and that new thing is intended to
be considered a "standard" that supersedes some old thing,
then it MUST provide all the functionality of the old thing.

Even when that functionality was broken? Surely you don't think
C++ needs something like strtok.
If the attitude toward the STL was "Here's a bunch of new
containers that you can use *in addition to* your familiar old
stdio tools," then great. But that is NOT the attitude at
all. The attitude is that somehow stdio is horrible and
should be avoided at all costs and should be REPLACED by the
STL. Ok, but if that's what you (not you personally, but the
ISO or SGI or whoever the hell thought STL was a good idea)
want, then do the work to make it an actual "replacement."
I have come across a (very) few problems that were easier to
solve with STL than without. std::map and std::vector come in
handy often. Unfortunately, far more often, STL only makes
simple things unnecessarily difficult.

There's some truth in what you're saying, and the two iterator
idiom is far from ideal, most of the time. But using the
standard library, in C++, is still generally an order of
magnitude easier than using [...]
Microsoft, for example, takes a lot of criticism for things
like MFC. "Why waste the effort making CString when there is
already std::string?" I can't say that Microsoft has done any
better than STL, but I can say that I understand why they
didn't just jump right on STL and adopt it.

Do you? The main reason they didn't jump on the STL bandwagon
is that STL didn't exist (or at least wasn't known) when MFC was
developed. There are a lot of libraries out there in this
situation, and they all have their own string, vector, map, etc.
Maybe I'm just not an OO guy at heart, I don't know. I'm OK
with an "int" just being a block of bits in memory and not
twelve layers of inheritance. Yes, I know that an "int" is
still just an "int" in C++. I'm just making a point about the
logic behind OO.
But I WANT to be an OO guy, or at least to give it a chance.
I've been "giving it a chance" for years and what I get in
return, mostly, is consistent disappointment.

Don't worry about OO here. OO is a tool, and not the only one
C++ supports. OO is very useful in what more complex parsers
produce---things like parse trees, for example; it's rather
irrelevant for most parsing issues (which are basically
procedural). And there's practically no OO (at least in the
classical sense) in <algorithm>. If I look at my parser tools,
about the only "OO" component in it is streambuf (and my custom
streambuf's); the rest is still pretty procedural. (Thus, for
example, in RegularExpression, the nodes in the parse tree are
polymorphic, but the parser itself is a classical recursive
descent parser, without the slightest hint of OO, and once I've
got the parse tree, it's rapidly converted into an NFA, which is
then converted, either lazily or by request, into a DFA, neither
of which makes the slightest use of OO.)

Where C++ beats C in parsing is not its support for OO, it is
the encapsulation. Thus, in C, my RegularExpression was a
struct, which required explicit initialization and liberation.
And I'd never even found a good means of merging (or'ing)
regular expressions; that had to wait for C++, with classes and
operator overloading. (FWIW, my RegularExpression class
supports things like:

RegularExpression decimal( "[1-9][0-9]*", 10 ) ;
RegularExpression octal( "0[0-7]*", 8 ) ;
RegularExpression hexadecimal( "0[xX][0-9a-fA-F]+", 16 ) ;
RegularExpression number( decimal | octal | hexadecimal ) ;

Matching number will return 10, 8 or 16, depending on which is
matched, and -1 if there is no match.)
 
J

Jorgen Grahn

Just stdio. fopen(), fread(), fclose()... then basically just marching
a char* along the buffer, doing strncmp() and similar where needed.
Much of the stdio library is written in highly optimized assembly, so
it's hard to beat the efficiency.

I don't believe that. At least on Linux and the BSDs, that stuff is
written in C. Those are the only libc implementations I have the
source code for.
Maybe it was a mistake to think that moving from that to C++ and STL was
the right thing to do. All I ever hear is that "char*" is the most
dangerous thing ever invented and one of the greatest failures of
mankind.

Ok, fine, so let's all be safe and use string and iostream. Well, you
can't do all the things you used to do with char*. Some of them have
replacements, sort of, and some don't. This is the key failing of the
STL, in my opinion. If you're going to make something new, and that new
thing is intended to be considered a "standard" that supersedes some old
thing, then it MUST provide all the functionality of the old thing.

If the attitude toward the STL was "Here's a bunch of new containers
that you can use *in addition to* your familiar old stdio tools," then
great. But that is NOT the attitude at all. The attitude is that
somehow stdio is horrible and should be avoided at all costs and should
be REPLACED by the STL.

You are confusing the containers ("STL") with iostreams. It seems to
me that you have problems with iostreams, but somehow focus your
frustration on the containers and std::string.

Try again. I am personally not a big fan of the input part of
iostreams ... for example I never attempt to 'std::cin >> foo'
anything -- if I use iostreams at all for input, I use it to read line
by line into std::strings. Then I do my own parsing. The fact that I
parse a std::string rather than a char array doesn't hinder me in any
way -- it just makes the job easier. I do not miss strlen() and friends.

And if you try again and fail, just go on using the C stuff for a
while longer. It's part of C++ and no one is going to take it away.

/Jorgen
 
