Problem with user-defined IO streambuf for encoding purposes

hsmit.home

Hi everyone,

I'm having some difficulty with the following piece of code. I have
stripped it to its bare minimum to demonstrate the problem at hand.

Compiler: MS Visual C++ 2005 Express Edition (similar problem arises
with 2008)

Runtime Library: All multi-threaded variants have been seen to fail
[DLL/Static] | [Debug|Release]

Purpose: define a user-defined stream buffer that processes each
incoming character and translates it to an encoded value. Place the
encoded character into a local buffer for output. The simplest case
would be an encoder that translates each character to upper case. A
more complicated case would be an encoder that encodes plain text as
base64 (this is not a one-to-one character encoding; it's a
3-character to 4-character encoding, which is why an internal buffer
is needed).

Problem: The code throws an "Unhandled exception at 0x00529bcc in
hsl_d.exe: 0xC0000005: Access violation reading location 0x00000000."
after the "encoderbuf::underflow c = 51" character '3' has been read.
This basically tells me that deep down in the internals of the IO
library something is being dereferenced that is not allocated.

Question(s):
1) Does this occur with other compilers? (requires testing)
2) Is this a problem with the IO library? (unlikely I think)
3) Am I doing something stupid? (more than likely) And if so what?

References:
C++ Standard Library - A Tutorial and Reference (13.13.3 User-Defined
Stream Buffers)


Code:

#include <iostream>
#include <sstream>

class encoderbuf : public std::streambuf {

  char mCharBuf[128];
  int mBufLen;
  int mBufPos;

public:
  //--------------------------------------------------------------
  /** default constructor */
  encoderbuf()
    : std::streambuf()
    , mBufLen(0)
    , mBufPos(0)
  {
  }

  //--------------------------------------------------------------
  /** outgoing data */
  virtual int_type underflow () {
    int_type c = EOF;
    if (mBufPos < mBufLen) {
      c = mCharBuf[mBufPos++];
    }
    std::cout << "encoderbuf::underflow c = " << c << std::endl;

    return c;
  }

  //--------------------------------------------------------------
  /** incoming data */
  virtual int_type overflow (int_type c) {
    std::cout << "encoderbuf::overflow c = " << c << std::endl;
    //TODO: do encoding here
    mCharBuf[mBufLen++] = c;
    return c;
  }
};

//--------------------------------------------------------------
int main (int argc, char ** argv) {
  encoderbuf buf;
  std::iostream iostr(&buf);
  iostr << 12345 << std::endl;
  std::stringstream sstr;
  iostr >> sstr.rdbuf(); // EXCEPTION AT PROCESSING CHARACTER '3'

  std::string str = sstr.str();
  std::cout << "main str = " << str << std::endl;
}


Output:
encoderbuf::overflow c = 49
encoderbuf::overflow c = 50
encoderbuf::overflow c = 51
encoderbuf::overflow c = 52
encoderbuf::overflow c = 53
encoderbuf::overflow c = 10
encoderbuf::underflow c = 49
encoderbuf::underflow c = 50
encoderbuf::underflow c = 51
popup: Unhandled exception at 0x00529bcc in hsl_d.exe: 0xC0000005:
Access violation reading location 0x00000000.
 
Jim Langston

mCharBuf[mBufLen++] = c;

Most likely you meant
mCharBuf[mBufPos++] = c;
here. mBufPos, not mBufLen.
 
Alf P. Steinbach

* Jim Langston:
mCharBuf[mBufLen++] = c;

Most likely you meant
mCharBuf[mBufPos++] = c;
here. mBufPos, not mBufLen.

No, I think he meant mBufLen. :)


The call to rdbuf() looks suspicious. Since I try to avoid the unclean
iostreams as much as possible I don't know. But at least it sort of
sounds wrong.


Cheers,

- Alf
 
hsmit.home

No
mCharBuf[mBufPos++] = c;

is correct.

For clarity, here is the documentation for the member fields:
int mBufLen; // the number of characters ready to be read from the internal buffer
int mBufPos; // the position of the current character to be read

To really simplify things use this code instead (same problem occurs):

#include <iostream>
#include <sstream>

class encoderbuf : public std::streambuf {

std::stringbuf mBuf;
public:
//--------------------------------------------------------------
/** default constructor */
encoderbuf()
: std::streambuf()
{
}

//--------------------------------------------------------------
/** outgoing data */
virtual int_type underflow () {
int_type c = mBuf.sbumpc();
std::cout << "encoderbuf::underflow c = " << c << std::endl;

return c;
}

//--------------------------------------------------------------
/** incoming data */
virtual int_type overflow (int_type c) {
std::cout << "encoderbuf::overflow c = " << c << std::endl;
//TODO: do encoding here
mBuf.sputc(c);
return c;
}
};


//--------------------------------------------------------------
int main (int argc, char ** argv) {
encoderbuf buf;
std::iostream iostr(&buf);
iostr << 12345 << std::endl;
std::stringstream sstr;
iostr >> sstr.rdbuf(); // EXCEPTION AT PROCESSING CHARACTER '3'

std::string str = sstr.str();
std::cout << "main str = " << str << std::endl;
}
 
hsmit.home

Try this instead if you don't like rdbuf. Same problem.

void test_encoderbuf () {
encoderbuf buf;
std::iostream iostr(&buf);
iostr << 12345 << std::endl;
char str[20];
iostr >> str;
std::cout << "test_encoderbuf str = " << str << std::endl;
}
 
Jim Langston

It doesn't crash for me using Microsoft Visual C++ .net 2003, but it doesn't
produce the expected output either:

encoderbuf::overflow c = 49
encoderbuf::overflow c = 50
encoderbuf::overflow c = 51
encoderbuf::overflow c = 52
encoderbuf::overflow c = 53
encoderbuf::overflow c = 10
encoderbuf::underflow c = 49
encoderbuf::underflow c = 50
encoderbuf::underflow c = 51
main str = 2
 
hsmit.home

There is a good possibility that MS has modified its standard library
implementation since the .NET 2003 release.

The exception that is being thrown occurs in <iosfwd>
static int_type __CLRCALL_OR_CDECL to_int_type(const _Elem& _Ch)
{ // convert character to metacharacter
return ((unsigned char)_Ch); //THIS IS WHERE THE PROBLEM OCCURS
}


And this static method is being called from <streambuf>

virtual int_type __CLR_OR_THIS_CALL uflow()
{ // get a character from stream, point past it
return (_Traits::eq_int_type(_Traits::eof(), underflow())
? _Traits::eof() : _Traits::to_int_type(*_Gninc()));
}

So it looks like (*_Gninc()) is dereferencing a NULL pointer.
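
A quick way to check that hypothesis, as a sketch only: the standard requires a default-constructed streambuf to start with all of its get-area pointers null, and a small derived class can report whether that holds. The names probebuf and default_get_area_is_null are made up for this illustration and do not come from the MS headers.

```cpp
#include <streambuf>

// Hypothetical probe: exposes whether the get area was ever set up.
// eback(), gptr() and egptr() are protected, so a derived class is needed.
struct probebuf : std::streambuf {
    bool get_area_is_null() const {
        return eback() == nullptr && gptr() == nullptr && egptr() == nullptr;
    }
};

// Returns true when a freshly constructed streambuf has no get area,
// which is the state encoderbuf is left in because it never calls setg().
bool default_get_area_is_null() {
    probebuf b;
    return b.get_area_is_null();
}
```

If this returns true, then any library code that dereferences the get pointer after a successful underflow() is reading through a null pointer, which would match the access violation at address 0.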


If you wish to debug into MS's STL you need to compile against
the "Multi-threaded Debug (/MTd)" runtime library.

The documentation on the internet with respect to user-defined I/O
buffers is sparse.

Has anyone tried running this code under Linux? I don't have a Linux
machine handy at the moment, but I would be curious to know if this is
strictly a MS issue.

Any further insight into this problem would be very much appreciated.

Thanks.

Hans Smit
 
anon

Has anyone tried running this code under Linux? I don't have a Linux
machine handy at the moment, but I would be curious to know if this is
strictly a MS issue.

It crashes.
 
James Kanze

virtual int_type underflow () {
int_type c = EOF;
if (mBufPos < mBufLen) {
c = mCharBuf[mBufPos++];
}
std::cout << "encoderbuf::underflow c = " << c << std::endl;
return c;
}

Attention: this function does not meet the requirements for
underflow. Underflow must return the character, leaving it in
the stream. This normally involves at least a one character
buffer. In your case, of course, you've got a longer buffer,
so you could just use that:

    int_type c = EOF ;
    if ( mBufPos != mBufLen ) {
        setg( mCharBuf, mCharBuf + mBufPos, mCharBuf + mBufLen ) ;
        c = mCharBuf[ mBufPos ] ;
        mBufPos = mBufLen ;
    }
    return c ;

Although the problem being addressed is somewhat different,
you'll find a short discussion concerning the requirements of
underflow and overflow in my articles on filtering streambuf's
(http://kanze.james.neuf.fr/articles-en.html).
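
For reference, here is a minimal sketch of the original class with an underflow along those lines dropped in. The bounds check in overflow, the use of traits_type, the std::size_t members and the helper round_trip are additions for illustration and are not part of the original post; `override` assumes C++11 or later.

```cpp
#include <cstddef>
#include <iostream>
#include <sstream>
#include <streambuf>
#include <string>

// Sketch only: the original encoderbuf, with underflow() exposing the
// stored characters through the get area instead of consuming them itself.
class encoderbuf : public std::streambuf {
    char mCharBuf[128];
    std::size_t mBufLen;  // number of characters stored so far
    std::size_t mBufPos;  // index of the first character not yet exposed

public:
    encoderbuf() : mBufLen(0), mBufPos(0) {}

protected:
    // Called when the get area is empty. Must return the next character
    // WITHOUT consuming it; the stream advances gptr() on its own.
    int_type underflow() override {
        if (mBufPos == mBufLen) {
            return traits_type::eof();
        }
        setg(mCharBuf, mCharBuf + mBufPos, mCharBuf + mBufLen);
        mBufPos = mBufLen;  // everything stored so far is now in the get area
        return traits_type::to_int_type(*gptr());
    }

    // Called for every character written, because no put area is set.
    int_type overflow(int_type c) override {
        if (traits_type::eq_int_type(c, traits_type::eof())) {
            return traits_type::not_eof(c);
        }
        if (mBufLen == sizeof(mCharBuf)) {
            return traits_type::eof();  // buffer full: report failure
        }
        // TODO: do encoding here
        mCharBuf[mBufLen++] = traits_type::to_char_type(c);
        return c;
    }
};

// Hypothetical helper mirroring the original main(): writes 12345 and a
// newline, then copies everything back out through operator>>.
std::string round_trip() {
    encoderbuf buf;
    std::iostream iostr(&buf);
    iostr << 12345 << std::endl;
    std::stringstream sstr;
    iostr >> sstr.rdbuf();
    return sstr.str();
}
```

With this version, round_trip() should hand back the five digits followed by the newline that std::endl wrote, rather than crashing on the third character.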

I get the correct results with my version of underflow. Both
with g++ (under Linux) and VC++ (under Windows).

What's probably happening is that the implementation, having
read the character with underflow, increments the get pointer
(gptr()), which is null, since you've never set it. The next
time around, it compares this pointer with egptr(), finds that
they aren't equal, and accesses through it directly. (This is,
at least, what happens with g++ under Solaris, if the input
routine is using sgetn.)

The normal way of implementing input in a streambuf is to
override underflow, using a buffer of at least one character.
(The reason for this is to allow simple and direct support for
putback.)
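
As an illustration of that one-character approach, here is a small sketch of an input-only filter that upper-cases what it reads from another streambuf. The class name upper_inbuf and the helper read_upper are invented for this example, and `override` assumes C++11 or later.

```cpp
#include <cctype>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical input-only filter with a one-character get area.
class upper_inbuf : public std::streambuf {
    std::streambuf* mSrc;  // source of raw characters (not owned)
    char mCh;              // the one-character buffer

public:
    explicit upper_inbuf(std::streambuf* src) : mSrc(src), mCh(0) {}

protected:
    int_type underflow() override {
        // A character is still pending: return it without fetching more.
        if (gptr() < egptr()) {
            return traits_type::to_int_type(*gptr());
        }
        int_type c = mSrc->sbumpc();
        if (traits_type::eq_int_type(c, traits_type::eof())) {
            return traits_type::eof();
        }
        // c is a non-negative character value here, so toupper is safe.
        mCh = traits_type::to_char_type(std::toupper(c));
        // Make mCh the pending sequence; the base class advances gptr().
        setg(&mCh, &mCh, &mCh + 1);
        return traits_type::to_int_type(mCh);
    }
};

// Hypothetical helper: reads one whitespace-delimited word through the filter.
std::string read_upper(const std::string& text) {
    std::stringbuf src(text);
    upper_inbuf filter(&src);
    std::istream in(&filter);
    std::string word;
    in >> word;
    return word;
}
```

Because eback() stays at &mCh, a single unget() after reading a character still works, which is the putback support mentioned above.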
 
hsmit.home

Thank you James for all the useful info. Looks like I've got some
serious reading tonight.

Before I read your response I got my code to do what I wanted. It was
along the lines of what you described. It's still a work in progress.
Here is the result (comments/refinements/improvements of this code are
welcome, in fact they would very much be appreciated).

Code:
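#include <cctype>   // toupper
#include <cstdio>   // EOF
#include <cstring>  // memset
#include <iostream>
#include <sstream>
#include <string>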

//--------------------------------------------------------------
/**
  General purpose buffered filter / encoder.
  @todo: still requires testing and refinement.
 */
class encoderbuf : public std::streambuf {

  std::stringbuf mBuf;
  char mIBuf[3]; //TODO: make this dynamic

public:
  //--------------------------------------------------------------
  /** default constructor */
  encoderbuf() {
    mBuf.pubsetbuf(mIBuf, sizeof(mIBuf));
    setg(mIBuf, mIBuf, mIBuf+sizeof(mIBuf));
  }

private:
  //--------------------------------------------------------------
  /** default encoder routine - does nothing */
  virtual void                  encode (int_type c) {
    *gptr() = c;
    gbump(1);
  }

  //--------------------------------------------------------------
  /** flushes the buffered data to the final destination */
  int                            flush () {
    int_type c = mBuf.sbumpc();
    if (c == EOF) {
      return EOF;
    }
    while (c != EOF) {
      encode(c);
      if (gptr() >= egptr()) {
        break;
      }
      c = mBuf.sbumpc();
    }
    setg(mIBuf, mIBuf, gptr());

    return 0;
  }

  //--------------------------------------------------------------
  /** flush the buffered data to output */
  virtual int                   sync () {
    return flush();
  }


  //--------------------------------------------------------------
  /** buffer the incoming data */
  virtual int_type              overflow (int_type c) {
    return mBuf.sputc(c);
  }

  //--------------------------------------------------------------
  /** reset the internal input buffers and flush the buffered
      data to output
   */
  virtual int_type              underflow () {

    memset(mIBuf, 'x', sizeof(mIBuf));
    setg(mIBuf, mIBuf, mIBuf+sizeof(mIBuf));

    if (flush() == EOF) {
      return EOF;
    }

    return ((unsigned char)(*gptr()));
  }
};

//--------------------------------------------------------------
/**
  Example encoder
 */
class uppercasebuf : public encoderbuf {

public:
  //--------------------------------------------------------------
  /** upper case encoder */
  virtual void                  encode (int_type c) {
    *gptr() = toupper(c);
    gbump(1);
  }
};

//--------------------------------------------------------------
int main (int argc, char ** argv) {
  uppercasebuf buf;
  std::iostream iostr(&buf);
  iostr << "abcdef";
  iostr << "123HIG";
  iostr << "aabb" << std::endl;
  std::string str;
  iostr >> str;
  std::cout << "test_encoderbuf str = " << str << std::endl;
}

Cheers,

Hans Smit
 
James Kanze

* Jim Langston:

[...]
The call to rdbuf() looks suspicious. Since I try to avoid
the unclean iostreams as much as possible I don't know. But
at least it sort of sounds wrong.

It's the usual way of testing new, user-defined streams. It's
true that it's somewhat surprising, since it's a >> operator
that doesn't parse anything, but it's designed mainly for
debugging.

Writing your own streambuf isn't very difficult, but you do have
to respect the contract for the virtual functions you implement.
In the case of underflow(), the standard says very clearly that
unless it returns EOF, it must return "the first character of
the pending sequence, without moving the input sequence position
past it." It's possible to do so without using a buffer, but
that requires also overriding uflow() (whose default
implementation increments the pointer into the buffer), and
pbackfail() (since the implementation has to support at least
one character pushback). This has been discussed in published
articles---I know because I wrote them.
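
To make the unbuffered variant concrete, here is a rough sketch of a streambuf with no get area that forwards to another streambuf and overrides all three functions. The class name forwardbuf and the helper peek_get_unget are hypothetical, and this is one plausible arrangement under those assumptions, not a definitive recipe.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <streambuf>
#include <string>

// Hypothetical unbuffered filter: never calls setg(), so every read goes
// through one of the virtual functions below.
class forwardbuf : public std::streambuf {
    std::streambuf* mSrc;  // underlying source (not owned)

public:
    explicit forwardbuf(std::streambuf* src) : mSrc(src) {
        assert(src != nullptr);
    }

protected:
    // Look at the next character without moving past it.
    int_type underflow() override { return mSrc->sgetc(); }

    // Extract the next character. The default uflow() would read through
    // gptr(), which is null here, so it has to be replaced.
    int_type uflow() override { return mSrc->sbumpc(); }

    // Putback has no local buffer to fall back on, so delegate it too.
    int_type pbackfail(int_type c = traits_type::eof()) override {
        if (traits_type::eq_int_type(c, traits_type::eof())) {
            return mSrc->sungetc();
        }
        return mSrc->sputbackc(traits_type::to_char_type(c));
    }
};

// Hypothetical helper: peek, get, unget, get, get on the given text
// (which is assumed to hold at least two characters).
std::string peek_get_unget(const std::string& text) {
    std::stringbuf src(text);
    forwardbuf fb(&src);
    std::istream in(&fb);
    std::string out;
    out += static_cast<char>(in.peek());  // looks, does not consume
    out += static_cast<char>(in.get());   // consumes the first character
    in.unget();                           // steps back onto it
    out += static_cast<char>(in.get());   // first character again
    out += static_cast<char>(in.get());   // second character
    return out;
}
```

For the input "ab" the helper should see the first character three times before reaching the second, which shows peek, extraction and pushback all behaving without any buffer in the filter itself.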

And of course, if you think that iostream is unclean, you're
free to propose something better. The original iostream, by
Jerry Schwarz, is an impressively elegant solution to the
constraints he had at that time. IMHO, the standards committee
didn't improve it, and of course, if we were designing it from
scratch today, we'd do some things differently, but I've not
seen any concrete alternatives that are better.
 
hsmit.home

Thanks for this insight. Up until now I have not found any info w.r.t.
overriding uflow. I did notice it was virtual and I actually tried
overriding this yesterday with no luck in solving the problem described.
I'm sure a little added thought will guide me in the right direction.

And of course, if you think that iostream is unclean, you're
free to propose something better. [...]

I had developed my own iostream class a few years ago for embedded
systems. I came to the conclusion that this is an amazingly difficult
task to do "correctly". The iostream library is (in my opinion) quite a
remarkable piece of code. The only thing unclean about it is Microsoft's
version of it. It works, BUT it is amazingly difficult to follow/read
due to very strange coding guidelines and NO embedded documentation. I
think the only thing I would recommend changing is some of the method
names. Method names such as "showmanyc" are a little unclear when first
working with this library, i.e. I kept seeing "show many c" for some
strange reason. Of course, I don't expect changes will be made any time
soon.
 
Pete Becker

Method names such as
"showmanyc"
are a little unclear when first working with this library, i.e. I kept
seeing
"show many c" for some strange reason.

There's a footnote in the standard that addresses this: "The morphemes
of showmanyc are es-how-many-see", not "show-manic".
 
hsmit.home

That footnote is in quite a few references. The fact that a footnote
had to be issued for this validates my point that some of the naming
conventions in this library are non-intuitive. Nevertheless, the power
and heritage of this library invalidate any comments I have about
it ;-)
 
James Kanze

[...]
Thanks for this insight. Up until now I have not found any
info w.r.t. overriding uflow. I did notice it was virtual and
I actually tried overriding this yesterday with no luck in
solving the problem described. I'm sure a little added thought
will guide me in the right direction.

To be frank, I'm not too sure about what would be necessary
either. IIRC, it wasn't present in the classical iostreams
(which is what I first learned), or at least, it wasn't part of
the documented interface derived classes were supposed to
implement. If I understand correctly, the idea is that you can
implement an input streambuf with no look-ahead---if the public
functions want to just look at a character, they call underflow;
if they want to extract it as well, they call uflow. But how
underflow, uflow and the push back functions work together isn't
that well documented, and if you don't use at least a one
character buffer, you have to respect them. Thus, for example,
I can't find anywhere where it is explicitly stated that if you
don't establish a "pending sequence" (to use the standard
terminology) in underflow, you must also override uflow---you
have to read the default behavior of uflow in the base class
carefully, to realize that it can only work if the pointers for
the pending sequence are set so that there is at least one
character in it. A very indirect way of imposing the
requirement, if you ask me.
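
The division of labour described here can be observed with a small tracing sketch: a streambuf that never sets up a pending sequence and records which virtual function each public call lands in. The names tracebuf and trace_calls are invented for this illustration.

```cpp
#include <cstddef>
#include <streambuf>
#include <string>

// Hypothetical tracing streambuf with no get area: 'u' is logged for each
// underflow() call and 'f' for each uflow() call.
class tracebuf : public std::streambuf {
    std::string mData;
    std::size_t mPos;
    std::string mLog;

public:
    explicit tracebuf(const std::string& data) : mData(data), mPos(0) {}
    const std::string& log() const { return mLog; }

protected:
    // Look only: the position is left unchanged.
    int_type underflow() override {
        mLog += 'u';
        return mPos < mData.size() ? traits_type::to_int_type(mData[mPos])
                                   : traits_type::eof();
    }

    // Look and extract: the position moves past the character.
    int_type uflow() override {
        mLog += 'f';
        return mPos < mData.size() ? traits_type::to_int_type(mData[mPos++])
                                   : traits_type::eof();
    }
};

// Hypothetical helper: two peeks, one extraction, one more peek.
std::string trace_calls() {
    tracebuf buf("ab");
    buf.sgetc();   // peek   -> underflow
    buf.sgetc();   // peek   -> underflow
    buf.sbumpc();  // extract -> uflow
    buf.sgetc();   // peek   -> underflow
    return buf.log();
}
```

Without the uflow() override, the sbumpc() call would fall back to the base class version, which calls underflow() and then reads through the (null) get pointer; that is the indirect requirement discussed above.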

(The only person I know who could probably give a definitive
answer about overriding uflow, and the rest, is Dietmar Kuehl,
and professional reasons prevent him from posting at present.)

Most of the time, the solution is just to use a buffer (a single
character buffer is sufficient), and just override underflow.

I had developed my own iostream class a few years ago for
embedded systems. I came to the conclusion that this is an
amazingly difficult task to do "correctly".

Implementing iostream itself is a lot of work. It's a case
where a lot of additional complexity has been pushed down into
the library, so that the user doesn't have to worry about it.
Deriving your own streambuf, however, is almost trivial, if you
only have to support one direction, and don't need to support
seek. Issues like buffering, pointers for both directions, and
seek can make it more complex. And of course, the complexity
very much depends on the "device" you're supporting---filebuf is
not trivial, since it supports many different access modes, code
translation, etc., on top of just file access.

The iostream library is (in my opinion) quite a remarkable
piece of code. The only thing unclean about it, is
Microsoft's version of it. It works, BUT, it is amazingly
difficult to follow/read due to very strange coding guidelines
and NO embedded documentation.

There are a number of reasons for that, and typically, trying to
read code in the standard library is not as easy as one might
like. (I can assure you that Plauger, the author of the
Microsoft library, can write exceptionally readable code. He
cannot, however, change the constraints imposed by the standard
and his customers, just because they interfere with
readability.)
 
J

James Kanze

Before I read your response I got my code to do what I wanted.
It was along the lines of what you described. It's still a
work in progress. Here is the result
(comments/refinements/improvements of this code are welcome,
in fact they would be very much appreciated).

Just some quick comments:
Code:
[/QUOTE]
[QUOTE]
#include <iostream>
#include <sstream>[/QUOTE]
[QUOTE]
//--------------------------------------------------------------
/**
General purpose buffered filter / encoder.
@todo: still requires testing and refinement.
*/
class encoderbuf : public std::streambuf {[/QUOTE]
[QUOTE]
std::stringbuf mBuf;
char mIBuf[3]; //TODO: make this dynamic[/QUOTE]

and skip the stringbuf
[QUOTE]
public:
//--------------------------------------------------------------
/** default constructor */
encoderbuf() {
mBuf.pubsetbuf(mIBuf, sizeof(mIBuf));[/QUOTE]

Note that the behavior of stringbuf::setbuf is implementation
defined, and that an implementation is not required to respect
it.
[QUOTE]
setg(mIBuf, mIBuf, mIBuf+sizeof(mIBuf));[/QUOTE]

And that this statement basically means that you have two
different class instances (the base class and your mBuf member)
sharing the same buffer, in an unspecified manner.
[QUOTE]
}[/QUOTE]
[QUOTE]
private:
//--------------------------------------------------------------
/** default encoder routine - does nothing */
virtual void                  encode (int_type c) {
*gptr() = c;
gbump(1);[/QUOTE]

Uses the buffer defined in the base class.  I'd use the
std::vector, and then something like:

    mBuffer.push_back( c ) ;

Since you don't seem to be supporting buffering on input,
there's no reason to touch the put pointers.  And you definitely
shouldn't be messing with the get pointers on output; at the
very most, you might do something like:
    setg( &mBuffer[ 0 ],
          &mBuffer[ 0 ] + (gptr() - eback()),
          &mBuffer[ 0 ] + mBuffer.size() ) ;
(but you could just as well do this in underflow()).
[QUOTE]
}[/QUOTE]
[QUOTE]
//--------------------------------------------------------------
/** flushes the buffered data to the final destination */
int                            flush () {
int_type c = mBuf.sbumpc();
if (c == EOF) {
return EOF;
}
while (c != EOF) {
encode(c);[/QUOTE]

Note that encode manipulates mIBuf, which you passed to mBuf.
If your code works, it is probably because mBuf is ignoring the
setbuf.
[QUOTE]
if (gptr() >= egptr()) {
break;
}
c = mBuf.sbumpc();
}
setg(mIBuf, mIBuf, gptr());[/QUOTE]
[QUOTE]
return 0;
}[/QUOTE]
[QUOTE]
//--------------------------------------------------------------
/** flush the buffered data to output */
virtual int                   sync () {
return flush();
}[/QUOTE]
[QUOTE]
//--------------------------------------------------------------
/** buffer the incoming data */
virtual int_type              overflow (int_type c) {
return mBuf.sputc(c);
}[/QUOTE]
[QUOTE]
//--------------------------------------------------------------
/** reset the internal input buffers and flush the buffered
data to output
*/
virtual int_type              underflow () {[/QUOTE]
[QUOTE]
memset(mIBuf, 'x', sizeof(mIBuf));
setg(mIBuf, mIBuf, mIBuf+sizeof(mIBuf));[/QUOTE]
[QUOTE]
if (flush() == EOF) {
return EOF;
}
return ((unsigned char)(*gptr()));
}
};[/QUOTE]

In fact, if I've understood at all, you're really using two
different buffers, mBuf and mIBuf, and transferring between them
in flush.  As long as you are calling encode for each individual
character, and not on the complete buffer, I wouldn't bother.
Use an std::vector< char > as the buffer: overflow simply calls
encode, which does a push_back on whatever it generates.  (I'd
provide a protected function for this, rather than make the
std::vector directly accessible.)  Underflow simply updates the
get pointers and returns *gptr() if gptr() != egptr(), e.g.:

    encoderbuf::int_type
    encoderbuf::underflow()
    {
        if ( mBuffer.empty() ) {
            return traits_type::eof() ;
        }
        setg( &mBuffer[ 0 ],
              &mBuffer[ 0 ] + (gptr() - eback()),
              &mBuffer[ 0 ] + mBuffer.size() ) ;
        return gptr() == egptr()
            ?   traits_type::eof()
            :   traits_type::to_int_type( *gptr() ) ;
    }

In this particular case, I'd also ask myself if a streambuf is
really the correct solution.  On the whole, for memory to memory
translations, I'd rather go with a custom insertion_iterator,
invoked as the destination of std::copy, e.g.:

    std::copy( source.begin(), source.end(),
               EncodingInserter( dest ) ) ;

(If the encoding is one to one, of course, all you need is a
function and std::transform.)
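For the one to one case, a sketch along these lines should do. The function names `to_upper_char` and `encode_upper` are made up for the example, and upper-casing simply stands in for whatever per-character encoding is wanted.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Hypothetical one-to-one encoder: upper-cases a single character.
// The cast to unsigned char avoids undefined behavior in std::toupper
// for negative char values.
char to_upper_char(char c) {
    return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}

// Applies the encoder to every character of source and returns the result.
std::string encode_upper(const std::string& source) {
    std::string dest;
    std::transform(source.begin(), source.end(),
                   std::back_inserter(dest), to_upper_char);
    return dest;
}
```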

An insertion iterator is an output iterator (in other words, not
an iterator at all), which means that many of the constraints
normally present on iterators are absent.  Just be aware that it
may be (and in fact will be) copied, so if it needs modifiable
internal state, you have to be careful.
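A minimal sketch of such an inserter, assuming a simple 1 to N encoding that repeats each character. `RepeatInserter` and `repeat_each` are hypothetical names, not the EncodingInserter mentioned above. Its only state is a pointer to the destination and a fixed count, so it does not matter how often std::copy copies it.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>

// Hypothetical 1-to-N encoder written as an output iterator: every
// character assigned through it is appended mCount times to the
// destination string.
class RepeatInserter {
    std::string* mDest;    // shared by all copies of the iterator
    std::size_t  mCount;

public:
    typedef std::output_iterator_tag iterator_category;
    typedef void                     value_type;
    typedef std::ptrdiff_t           difference_type;
    typedef void                     pointer;
    typedef void                     reference;

    RepeatInserter(std::string& dest, std::size_t count)
        : mDest(&dest), mCount(count) {}

    // Assignment does the real work.
    RepeatInserter& operator=(char c) {
        mDest->append(mCount, c);
        return *this;
    }

    // Dereference and increment are no-ops, as in std::back_insert_iterator.
    RepeatInserter& operator*()     { return *this; }
    RepeatInserter& operator++()    { return *this; }
    RepeatInserter  operator++(int) { return *this; }
};

// Encodes source by repeating each character count times.
std::string repeat_each(const std::string& source, std::size_t count) {
    std::string dest;
    std::copy(source.begin(), source.end(), RepeatInserter(dest, count));
    return dest;
}
```

An encoder with modifiable state (a partially filled base64 group, say) would presumably keep that state outside the iterator, behind a pointer like mDest, so that all copies see the same data.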
 
H

hsmit.home

James, you have given me so much to chew on, I am deeply indebted.

Before I read your last post, I had finally gotten a reasonably
generic encoderbuf class implemented, there are still a few kinks in
it and I will continue working on this today. I will see if I can
integrate a few of your ideas, since these "kinks" I referred to make
me feel like I am doing something terribly wrong (even though it
works). I need to investigate further before I can properly articulate
this additional problem.

Your insertion iterator idea is something I will be looking into
sometime later (more as an exercise in understanding than anything
else). I am an old newbie to the STL library. Worked with it quite a
bit 7-9 years ago, put it aside for some time, and started working
with it again 6 months ago. I am finally discovering the beautiful
intricacies of this remarkable library.

I will post my "semi-final" solution when it's ready.
 
H

hsmit.home

My apologies for posting this large piece of code, but it is complete.
Who knows, it may actually be useful to someone else.

There are 3 sections:
1) the encoderbuf class
2) the 3 test classes: uppercasebuf, fillbuf, dividerbuf
3) the main entry point with the various test cases.

I have taken James' last post to heart and completely rewritten the
encoderbuf class. It is simpler and more to the point.

Comments and suggestions would be very much appreciated.

A couple questions for James:
1) Your suggested implementation of using mBuffer.push_back(c) is a
good one, but when does it ever get cleared (flushed)?
For example:
encoderbuf buf;
std::iostream iostr(&buf);
std::string str;
iostr << 12345 << std::endl;
iostr >> str ;
iostr << 12345 << std::endl;
iostr >> str ;
iostr << 12345 << std::endl;
iostr >> str ;
iostr << 12345 << std::endl;
iostr >> str ;

The mBuffer.size() is now 20 and growing if I continue in this trend.
The encoderbuf class code below addresses this, but my solution
doesn't feel right. Refer to can_empty() method. Any comments?

2) Back to your previous post: Could you explain the gptr() - eback()
part of this statement:
setg( &mBuffer[ 0 ],
&mBuffer[ 0 ] + (gptr() - eback()),
&mBuffer[ 0 ] + mBuffer.size() ) ;

The second argument refers to where the current internal buffer is
currently ready to be written to (IGNext). This I understand. But in
the underflow scenario gptr() should always equal eback(). Or am I
missing something here? My inclination is to write:
setg(&mBuffer[0], &mBuffer[0], &mBuffer[0]+mBuffer.size());
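One way to look at the offset, sketched under the assumption that reads and writes can interleave: gptr() - eback() is the number of characters already consumed, and it is only zero if nothing has been read since the get area was last set. The class `growing_buf` below is hypothetical. Unlike the code in this thread, it re-bases the get area in overflow, before the old pointers can go stale, which lets the default underflow do the rest.

```cpp
#include <cstddef>
#include <streambuf>
#include <vector>

// Hypothetical minimal read/write buffer, only to illustrate the offset.
class growing_buf : public std::streambuf {
    std::vector<char> mBuffer;

protected:
    virtual int_type overflow(int_type c) {
        if (traits_type::eq_int_type(c, traits_type::eof())) {
            return traits_type::not_eof(c);   // nothing to store
        }
        // Characters already consumed; nonzero once reads and writes
        // interleave. Taken before push_back, which may reallocate.
        const std::ptrdiff_t consumed = gptr() - eback();
        mBuffer.push_back(traits_type::to_char_type(c));
        // Re-base the get area on the (possibly moved) storage, keeping
        // the read position. Passing &mBuffer[0] as the second argument
        // instead would rewind to the first character.
        setg(&mBuffer[0], &mBuffer[0] + consumed,
             &mBuffer[0] + mBuffer.size());
        return c;
    }
};
```

If "ab" is written, one character is read, and more characters are then written, the next read continues at 'b'. With the three-argument form that always passes the start of the buffer as the read position, it would start again at 'a'.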

Thanks for all the help,

Hans Smit

Code:
#include <cctype>   // toupper
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

//--------------------------------------------------------------
/**
  General purpose buffered filter / encoder.
 */
class encoderbuf : public std::streambuf {

  std::vector<char> mEncBuf;

public:
  //--------------------------------------------------------------
  /** default constructor */
  encoderbuf()
  {
  }

protected:
  //--------------------------------------------------------------
  /**
    The sub class should call this method after a character is
    encoded, or call it multiple times in case of a 1 to N character
    encoding scheme, i.e. base64 encoding, url encoding, etc.

    @param c the encoded character that is to be placed in the
      input buffer.

    @return the number of characters placed in the input buffer
      Currently only 1 is returned, but I want to leave room for
      future development and overflow errors.
  */
  virtual int                    encode (int_type c) {

    if (can_empty()) {
      setg(&mEncBuf[0], &mEncBuf[0], &mEncBuf[0]);
      mEncBuf.clear();
    }

    mEncBuf.push_back(c);

    return 1;
  }

private:
  //--------------------------------------------------------------
  /**
    check to see if the internal buffer has been processed and
    is ready to be emptied. This ensures that memory is reused
    between << and >> shift operations.
    */
  bool                          can_empty () {
    size_t cp = gptr() - eback();
    return (cp != 0);
  }

  //--------------------------------------------------------------
  /**
    Re-point the get area at the encoded data and report the
    next unread character, or EOF if there is none.
   */
  virtual int_type              underflow () {

    if (mEncBuf.empty()) {
      // &mEncBuf[0] is not valid on an empty vector
      return traits_type::eof();
    }

    size_t cp = gptr() - eback(); //I don't get this.
    size_t sz = mEncBuf.size();

    setg(&mEncBuf[0], &mEncBuf[0] + cp, &mEncBuf[0] + sz);

    return gptr() == egptr()
        ?   traits_type::eof()
        :   traits_type::to_int_type( *gptr() ) ;
  }


  //--------------------------------------------------------------
  /**
    Encode the incoming character and place it in internal
    buffer
  */
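  virtual int_type              overflow (int_type c) {

    // overflow(eof) carries no character, so there is nothing to encode
    if (traits_type::eq_int_type(c, traits_type::eof())) {
      return traits_type::not_eof(c);
    }
    encode(c);

    return c;
  }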
};


//--------------------------------------------------------------
/**
  Example encoder: 1 to N character encoding
 */
class fillbuf : public encoderbuf {

  std::stringbuf mOverflowBuf;
  size_t mCount;
public:
  fillbuf (size_t count)
  : mCount(count)
  {
  }

  //--------------------------------------------------------------
  /** fill encoder */
  virtual int                   encode (int_type c) {

    int n = 0;
    for (size_t i = 0 ; i < mCount ; i++) {
      n += encoderbuf::encode(c);
    }

    return n;
  }
};


//--------------------------------------------------------------
/**
  Example encoder: an N to 1 character encoding
 */
class dividerbuf : public encoderbuf {

  std::stringbuf mBuf;
  size_t mCount;
  size_t mLen;
public:
  dividerbuf (size_t count)
  : mCount(count)
  , mLen(0)
  {
  }
  //--------------------------------------------------------------
  /** encoder */
  virtual int                   encode (int_type c) {

    if ((mLen+mCount) % mCount == 0) {
      encoderbuf::encode(c);
    }
    mLen++;

    return 1;
  }
};

//--------------------------------------------------------------
/**
  Example encoder: a 1 to 1 character encoding
 */
class uppercasebuf : public encoderbuf {

public:
  //--------------------------------------------------------------
  /** upper case encoder */
  virtual int                  encode (int_type c) {
    return encoderbuf::encode(toupper(c));
  }
};

//--------------------------------------------------------------
int main (int argc, char ** argv) {

  if (true) {
    encoderbuf buf;
    std::iostream iostr(&buf);
    std::stringstream sstr;
    std::string str;

    iostr << 12345 << std::endl;
    iostr >> str ;
    if (str == "12345") {
      std::cout << "ok. encoderbuf str = " << str << std::endl;
    } else {
      std::cout << "err. encoderbuf str = " << str << std::endl;
    }
    iostr << 6789 << std::endl;
    iostr >> str ;
    if (str == "6789") {
      std::cout << "ok. encoderbuf str = " << str << std::endl;
    } else {
      std::cout << "err. encoderbuf str = " << str << std::endl;
    }

  }
  if (true) {
    encoderbuf buf;
    std::iostream iostr(&buf);
    iostr << 12345 << "," << 1.234;// << std::endl;
    std::stringstream ostr;
    std::string str2;
    //iostr >> str2;
    //iostr.flush();
    //iostr.clear();
    iostr >> ostr.rdbuf();
    std::string str = ostr.str();
    if (str == "12345,1.234") {
      std::cout << "ok. encoderbuf str = " << str << std::endl;
    } else {
      std::cout << "err. encoderbuf str = " << str << std::endl;
    }
  }
  if (true) {
    for (int i = 2 ; i < 32 ; i += 1) {
      encoderbuf buf;
      std::iostream iostr(&buf);
      iostr << "abcdef";
      iostr << 12345;
      iostr << "123HIG";
      iostr << "aabb" << std::endl;
      std::string str;
      iostr >> str;
      if (str == "abcdef12345123HIGaabb") {
        std::cout << "ok. buf len = " << i << ". encoderbuf str = "
                  << str << std::endl;
      } else {
        std::cout << "err. buf len = " << i << ". encoderbuf str = "
                  << str << std::endl;
      }
    }
  }

  if (true) {
    uppercasebuf buf;
    std::iostream iostr(&buf);
    iostr << "abcdef";
    iostr << "123HIG";
    iostr << "aabb" << std::endl;
    std::string str;
    iostr >> str;
    if (str == "ABCDEF123HIGAABB") {
      std::cout << "ok. uppercasebuf str = " << str << std::endl;
    } else {
      std::cout << "err. uppercasebuf str = " << str << std::endl;
    }
  }
  if (true) {
    fillbuf buf(3);
    std::iostream iostr(&buf);
    iostr << "abcde" << std::endl;
    std::string str;
    iostr >> str;
    if (str == "aaabbbcccdddeee") {
      std::cout << "ok. fillbuf str = " << str << std::endl;
    } else {
      std::cout << "err. fillbuf str = " << str << std::endl;
    }
  }
  if (true) {
    dividerbuf buf(2);
    std::iostream iostr(&buf);
    iostr << "xy0123456789" << std::endl;
    std::string str;
    iostr >> str;
    if (str == "x02468") {
      std::cout << "ok. dividerbuf str = " << str << std::endl;
    } else {
      std::cout << "err. dividerbuf str = " << str << std::endl;
    }
  }
  return 0;
}
 
