string tokenizing


Mike Wahler

David Rubin said:
Mike Wahler wrote:

[snip]
template <typename InsertIter>
void tokenize(const std::string& buf,
const std::string& delim,
InsertIter& ii)
[snip]
int main()
{
std::string buf("We* are/parsing [a---string");
std::string delim(" */[-");
std::deque<std::string> tokens;

tokenize(buf, delim, std::inserter(tokens, tokens.begin()));

This only works with my compiler if I do

std::insert_iterator<std::deque<std::string> > ii(tokens,
tokens.begin());

But you *did* get it to work with the reference parameter
for the iterator, right?

Interesting about having to 'spell it out' like that.
Both ways worked for me, (I also tried 'std::back_inserter' and
std::back_insert_iterator, both of which also worked for me as well).

tokenize(buf, delim, ii);

Otherwise, I get the same errors as before. I guess this means my
compiler is broken?

Seems so to me. Have you checked for a newer version
of g++?

-Mike
 

Frank Schmitt

Mike Wahler said:
David Rubin said:
Mike Wahler wrote:

[snip]
template <typename InsertIter>
void tokenize(const std::string& buf,
const std::string& delim,
InsertIter& ii)
[snip]
int main()
{
std::string buf("We* are/parsing [a---string");
std::string delim(" */[-");
std::deque<std::string> tokens;

tokenize(buf, delim, std::inserter(tokens, tokens.begin()));

This only works with my compiler if I do

std::insert_iterator<std::deque<std::string> > ii(tokens,
tokens.begin());

But you *did* get it to work with the reference parameter
for the iterator, right?

Interesting about having to 'spell it out' like that.
Both ways worked for me, (I also tried 'std::back_inserter' and
std::back_insert_iterator, both of which also worked for me as well).

tokenize(buf, delim, ii);

Otherwise, I get the same errors as before. I guess this means my
compiler is broken?

Seems so to me. Have you checked for a newer version
of g++?

FYI: both g++ 3.3.1 and the Intel compiler V7.0 reject the original version.
I guess the problem is that you are forming a non-const reference to a
temporary, which isn't allowed by the standard.
Changing the signature of tokenize to

void tokenize(const std::string& buf,
const std::string& delim,
const InsertIter& ii)

and copying ii inside tokenize to a local helper variable for the loop works.

HTH & kind regards
frank
 

Mike Wahler

Frank Schmitt said:
FYI: both g++ 3.3.1 and the Intel compiler V7.0 reject the original version.
I guess the problem is that you are forming a non-const reference to a
temporary, which isn't allowed by the standard.

Thanks. For some reason, that issue always seems to trip
me up. Perhaps I need to write this down a hundred times
on the chalkboard. :)

Changing the signature of tokenize to

void tokenize(const std::string& buf,
const std::string& delim,
const InsertIter& ii)

and copying ii inside tokenize to a local helper variable for the loop
works.

Thanks,
-Mike
 

David Rubin

Frank Schmitt wrote:

[snip]
FYI: both g++ 3.3.1 and the Intel compiler V7.0 reject the original version.
I guess the problem is that you are forming a non-const reference to a
temporary, which isn't allowed by the standard.

This is what I suspected, but I was thrown off because the diagnostic
was so abstruse. I was expecting something more along the lines of
"cannot form a non-const reference to a temporary." Anyway, it's
interesting that MSVC++6.0 compiles the code without warning. Using MS
as a point of reference always raises the question of which is correct :)

Changing the signature of tokenize to

void tokenize(const std::string& buf,
const std::string& delim,
const InsertIter& ii)

and copying ii inside tokenize to a local helper variable for the loop works.

What is the point of copying ii inside tokenize if you can just remove
the reference argument altogether and use pass-by-value? Isn't that the
same?

Much thanks,

/david
 

Frank Schmitt

David Rubin said:
What is the point of copying ii inside tokenize if you can just remove
the reference argument altogether and use pass-by-value? Isn't that the
same?

Yes, of course - I guess I just have gotten so used to const references instead
of pass-by-value that it has become a reflex for me not to consider
pass-by-value :)

kind regards
frank
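
The pass-by-value fix agreed on above can be sketched as follows. The thread snips the body of tokenize, so the loop below is an assumption for illustration, not the original implementation; the parameter name out and the helper example_tokens are likewise hypothetical. Because the iterator is taken by value, a temporary such as the result of std::inserter binds without trouble, whereas a non-const reference parameter would reject it.

```cpp
#include <deque>
#include <iterator>
#include <string>

// Sketch only: the loop body is assumed, since the thread elides it.
// Taking the iterator by value lets callers pass temporaries such as
// std::inserter(...) or std::back_inserter(...).
template <typename OutputIter>
void tokenize(const std::string& buf,
              const std::string& delim,
              OutputIter out)
{
    // Skip leading delimiters, then alternate between token and delimiter runs.
    std::string::size_type start = buf.find_first_not_of(delim);
    while (start != std::string::npos) {
        std::string::size_type end = buf.find_first_of(delim, start);
        *out++ = buf.substr(start, end - start);   // count is clipped at end of string
        start = buf.find_first_not_of(delim, end); // npos in, npos out
    }
}

// Hypothetical helper mirroring the call from the original main().
std::deque<std::string> example_tokens()
{
    std::string buf("We* are/parsing [a---string");
    std::string delim(" */[-");
    std::deque<std::string> tokens;
    tokenize(buf, delim, std::inserter(tokens, tokens.begin()));
    return tokens;
}
```

Insert iterators are cheap to copy (they hold little more than a pointer to the container and a position), so passing one by value costs nothing noticeable and avoids the extra local copy needed with a const reference parameter.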
 

David Rubin

Jerry said:
The facet header I'm using contains some code I posted a while back --
it looks like this:
#include <locale>
#include <algorithm>

template<class T>
class table {
    typedef typename std::ctype<T>::mask tmask;
    tmask *t;
public:
    table() : t(new std::ctype<T>::mask[std::ctype<T>::table_size]) {}
    ~table() { delete [] t; }
    tmask *the_table() { return t; }
};

template<class T>
class ctype_table : table<T>, public std::ctype<T> {
protected:
    typedef typename std::ctype<T>::mask tmask;
    enum inits { empty, classic };
    ctype_table(size_t refs = 0, inits init=classic)
        : std::ctype<T>(the_table(), false, refs)
    {
        if (classic == init)
            std::copy(classic_table(),
                      classic_table()+table_size,
                      the_table());
        else
            std::fill_n(the_table(), table_size, mask());
    }
public:
    tmask *table() {
        return the_table();
    }
};

I decided after a while that I liked your approach (using istringstream and
locale) better than my tokenize function. I was able to replace the
above code with

#include <algorithm>
#include <locale>

class ctype_table : public std::ctype<char> {
private:
    mask tab[table_size];

protected:
    enum Init {empty, classic};

    ctype_table(Init type=classic) : std::ctype<char>(tab)
    {
        if (type == classic)
            std::copy(classic_table(), classic_table()+table_size, tab);
        else
            std::fill_n(tab, table_size, space);
    }

public:
    mask *table() { return tab; }
};

You want to derive from std::ctype<char> rather than std::ctype<T> since
only the char specialization contains the functions and constants you
are using. Also, by deriving from std::ctype<T> [T=char], you can use
type mask and constant space freely (I think 'mask()' is a typo in your
code). Additionally, you don't really need the refs argument (at least
for my application). Lastly, I found on my platform that creating a
static table (tab) results in a smaller executable than allocating tab
off the heap (I assume this was the motivation for privately inheriting
from table).

/david
 

Jerry Coffin

[ ... ]
I decided after a while that I liked your approach (using istringstream and
locale) better than my tokenize function. I was able to replace the
above code with

[ code elided ]
You want to derive from std::ctype<char> rather than std::ctype<T> since
only the char specialization contains the functions and constants you
are using.

I'm on my laptop right now, so I don't have the standard handy to
check with, but I don't remember using anything that shouldn't work
with wchar_t, etc., as well.

Also, by deriving from std::ctype<T> [T=char], you can use
type mask and constant space freely (I think 'mask()' is a typo in your
code).

The use of mask() was intentional, and you'll almost certainly get all
sorts of strange errors if you try to substitute just "mask" where I
used mask(). Where I used mask(), it was to create a
default-initialized mask object with which to initialize the objects
in the array. Using mask instead would result only in compiler
errors because you're specifying a type where it wants an object.

Additionally, you don't really need the refs argument (at least
for my application).

For this application, that's probably right. That part of the code
was written with an eye to generality, not specifically for this
application.

Lastly, I found on my platform that creating a
static table (tab) results in a smaller executable than allocating tab
off the heap (I assume this was the motivation for privately inheriting
from table).

The result can be smaller code, or a _lot_ smaller code -- like none
at all. The header is not required to initialize table_size, and with
an implementation that doesn't initialize it _in the header_, your
code won't compile.

The private inheritance was because table only exists to ensure that
the initialization gets done in the right order. There's no reason to
support casting back to table or anything like that.
 

David Rubin

Jerry said:
[ ... ]
I decided after a while that I liked your approach (using istringstream and
locale) better than my tokenize function. I was able to replace the
above code with

[ code elided ]
You want to derive from std::ctype<char> rather than std::ctype<T> since
only the char specialization contains the functions and constants you
are using.

I'm on my laptop right now, so I don't have the standard handy to
check with, but I don't remember using anything that shouldn't work
with wchar_t, etc., as well.

For example, table_size, classic_table(), and the constructor taking a
const mask* argument are only defined in std::ctype<char> AFAIK.

Also, by deriving from std::ctype<T> [T=char], you can use
type mask and constant space freely (I think 'mask()' is a typo in your
code).

The use of mask() was intentional, and you'll almost certainly get all
sorts of strange errors if you try to substitute just "mask" where I
used mask(). Where I used mask(), it was to create a
default-initialized mask object with which to initialize the objects
in the array. Using mask instead would result only in compiler
errors because you're specifying a type where it wants an object.

I was suggesting that you use 'space' rather than 'mask', which, of
course, will give you a compile error. My understanding is that mask()
will create a mask temporary initialized to zero (since it's an integer
type). My *guess* (although there is no guarantee) is that mask() is
equivalent to space in most implementations. Even if it's not, an
'empty' table would then be full of spaces rather than some
implementation-defined value.

For this application, that's probably right. That part of the code
was written with an eye to generality, not specifically for this
application.

Agreed, but then you don't include a 'delete-when-done' argument, and
The result can be smaller code, or a _lot_ smaller code -- like none
at all. The header is not required to initialize table_size, and with
an implementation that doesn't initialize it _in the header_, your
code won't compile.

This is a subtle point. I don't have the standard in front of me, but
isn't this covered by C++PL3ed, 12.2.2:

Class objects are constructed from the bottom up: first the base,
then the members, and then the derived class itself.

This suggests to me that table_size is initialized (at least) by the
base class constructor, and is therefore available when the tab member
is "constructed"...

The private inheritance was because table only exists to ensure that
the initialization gets done in the right order. There's no reason to
support casting back to table or anything like that.

...Otherwise, I agree.

/david
 

Jerry Coffin

[ ... ]
For example, table_size, classic_table(), and the constructor taking a
const mask* argument are only defined in std::ctype<char> AFAIK.

Doing some looking, you're right. I may need to re-think the code a
bit.

[ ... ]
I was suggesting that you use 'space' rather than 'mask', which, of
course, will give you a compile error. My understanding is that mask()
will create a mask temporary initialized to zero (since it's an integer
type). My *guess* (although there is no guarantee) is that mask() is
equivalent to space in most implementations.

Your guess is wrong, AFAIK. mask() creates a value that basically says
the character doesn't fit _any_ classification. I.e. it's not a space
or a digit or alphabetic, or control, or anything else. mask is
required to be a bitmask type, and if no bits are set, it doesn't
classify the character as anything at all.
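
A small check along these lines illustrates the point. It is only a sketch: the function names classifies_as_space and empty_mask are hypothetical, and the comparison is written with bitwise operations so that it relies only on mask being a bitmask type.

```cpp
#include <locale>

// Returns true if the given mask carries the 'space' classification bit.
bool classifies_as_space(std::ctype_base::mask m)
{
    return (m & std::ctype_base::space) == std::ctype_base::space;
}

// A value-initialized mask, i.e. what mask() produces in the facet code.
// No classification bits are set.
std::ctype_base::mask empty_mask()
{
    return std::ctype_base::mask();
}
```

With these definitions, classifies_as_space(empty_mask()) is false while classifies_as_space(std::ctype_base::space) is true, so a table filled with mask() classifies no character as whitespace.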
Even if it's not, an
'empty' table would then be full of spaces rather than some
implementation-defined value.

...which would utterly _ruin_ its usefulness. The whole idea is to
produce a table that ONLY classifies a character as a space (for
example) if you say it should be a space. Filling the table
with a value that said everything was a space would produce utterly
useless results -- when you extract from an istream, it skips over
anything its locale says is a space character, so doing this would
produce a ctype that always skipped over all input.
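
To make the idea concrete, here is a minimal sketch of a facet that starts from an empty table and marks only the chosen delimiters as whitespace. It is not the code from the facet header discussed in this thread: the names mask_table, delim_ctype and split_on are hypothetical, the table lives in a std::vector so table_size is never used as an array bound, and a private base declared first ensures the table exists before std::ctype<char> is constructed.

```cpp
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Holds the classification table. Declared as the first base of the facet
// so it is constructed before std::ctype<char> receives a pointer to it.
struct mask_table {
    std::vector<std::ctype_base::mask> masks;

    explicit mask_table(const std::string& delims)
        : masks(std::ctype<char>::table_size, std::ctype_base::mask())
    {
        // Only the listed delimiters are classified as whitespace;
        // every other character keeps the empty mask.
        for (char c : delims)
            masks[static_cast<unsigned char>(c)] = std::ctype_base::space;
    }
};

class delim_ctype : private mask_table, public std::ctype<char> {
public:
    explicit delim_ctype(const std::string& delims, std::size_t refs = 0)
        : mask_table(delims),
          std::ctype<char>(masks.data(), false, refs)  // false: facet must not delete the table
    {}
};

// Splits text on any character in delims by letting operator>> skip them.
std::vector<std::string> split_on(const std::string& text,
                                  const std::string& delims)
{
    std::istringstream in(text);
    // The locale takes ownership of the facet (refs == 0).
    in.imbue(std::locale(in.getloc(), new delim_ctype(delims)));

    std::vector<std::string> tokens;
    std::string word;
    while (in >> word)
        tokens.push_back(word);
    return tokens;
}
```

Calling split_on("We* are/parsing [a---string", " */[-") should yield the five tokens We, are, parsing, a and string. Note that an ordinary space is a separator only if it appears in delims, which is exactly why the table must start out empty rather than filled with space.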

[ ... ]
This is a subtle point. I don't have the standard in front of me, but
isn't this covered by C++PL3ed, 12.2.2:

Class objects are constructed from the bottom up: first the base,
then the members, and then the derived class itself.

This suggests to me that table_size is initialized (at least) by the
base class constructor, and is therefore available when the tab member
is "constructed"...

Theoretically that might cover it. Practically speaking, a number of
compilers fail when/if you try to use table_size as the size of an
array. Since I don't care to ignore those compilers, my alternative is
to write code that works with them.
 
