making an istream from a char array

Discussion in 'C++' started by John Salmon, Dec 30, 2006.

  1. John Salmon

    I'm working with two libraries, one written
    in old school C, that returns a very large
    chunk of data in the form of a C-style,
    NUL-terminated string.

    The other written in a more modern C++
    is a parser for the chunk of bytes returned by
    the first. It expects a reference to a
    std::istream as its argument.

    The chunk of data is very large.
    I'd like to feed the output of the first to
    the second WITHOUT MAKING AN EXTRA IN-MEMORY COPY.

    My attempts to create an istringstream from the
    chunk of data all seem to at least double the
    amount of VM used. Here's a short program demonstrating
    what I've tried. Is there any way to get "inside"
    the istringstream and tell it to use the 'chunk'
    directly, rather than insisting on making a copy?

    Thanks,
    John Salmon

    [jsalmon@river c++]$ cat chararraytostream.cpp
    #include <string>
    #include <sstream>
    #include <cstdlib>
    #include <cstring>
    #include <cstdio>
    #include <unistd.h>   // getpid()
    using namespace std;

    char *getLotsOfBytes();
    istream& streamParser(istream &s);
    void linuxChkMem(const char *msg);

    void withImplicitString(){
        linuxChkMem("Before getLotsOfBytes: ");
        char *chunk = getLotsOfBytes();
        linuxChkMem("After getLotsOfBytes():");
        {
            istringstream iss(chunk);
            linuxChkMem("After iss(p): ");
            streamParser(iss);
            linuxChkMem("After streamParser(iss): ");
        }
        linuxChkMem("After iss goes out of scope: ");
        free(chunk);
        linuxChkMem("After free(p): ");
    }

    void withExplicitString(){
        linuxChkMem("Before getLotsOfBytes: ");
        char *chunk = getLotsOfBytes();
        linuxChkMem("After getLotsOfBytes():");
        {
            string s(chunk);
            linuxChkMem("After s(chunk): ");
            free(chunk);
            linuxChkMem("After free(p): ");
            istringstream iss(s);
            linuxChkMem("After iss(s): ");
            streamParser(iss);
            linuxChkMem("After streamParser(iss): ");
        }
        linuxChkMem("After iss goes out of scope: ");
    }

    int main(int argc, char **argv){
        printf("with an implicit string constructor\n");
        withImplicitString();
        printf("\nwith an explicit string constructor\n");
        withExplicitString();
        return 0;
    }

    // On Linux, tell us how much data space we're using
    // in the VM.
    void linuxChkMem(const char *msg){
        printf("%s", msg);
        fflush(stdout);
        char cmd[50];
        sprintf(cmd, "grep VmData /proc/%d/status", (int)getpid());
        system(cmd);
    }

    static const int SZ = 100*1024*1024;
    // A rough approximation to getLotsOfBytes. In the
    // real application, getLotsOfBytes has these characteristics:
    // - it returns a malloced pointer to a NUL-terminated array of chars.
    // - it is out of my control. E.g., I can't rewrite it in a way
    //   that might be more friendly to C++ streams.
    char *getLotsOfBytes(){
        char *p = (char *)malloc(SZ);
        memset(p, ' ', SZ);
        strcpy(p+SZ-50, "3.1415 2.718 1.414");
        return p;
    }

    // A rough approximation to streamParser. In the real
    // application, streamParser takes a ref to an istream
    // and does what it does. Again, I can't easily redefine
    // the interface.
    istream& streamParser(istream& s){
        double x, y, z;
        s >> x >> y >> z;
        printf("x: %f y: %f z: %f\n", x, y, z);
        return s;
    }

    [jsalmon@river c++]$ g++ -O3 chararraytostream.cpp
    [jsalmon@river c++]$ a.out
    with an implicit string constructor
    Before getLotsOfBytes: VmData: 40 kB
    After getLotsOfBytes():VmData: 102444 kB
    After iss(p): VmData: 204848 kB
    x: 3.141500 y: 2.718000 z: 1.414000
    After streamParser(iss): VmData: 204980 kB
    After iss goes out of scope: VmData: 102576 kB
    After free(p): VmData: 172 kB

    with an explicit string constructor
    Before getLotsOfBytes: VmData: 172 kB
    After getLotsOfBytes():VmData: 102576 kB
    After s(chunk): VmData: 204980 kB
    After free(p): VmData: 102576 kB
    After iss(s): VmData: 204980 kB
    x: 3.141500 y: 2.718000 z: 1.414000
    After streamParser(iss): VmData: 204980 kB
    After iss goes out of scope: VmData: 172 kB
    [jsalmon@river c++]$
    John Salmon, Dec 30, 2006
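The VmData figures fit one extra chunk-sized allocation per copy. Linux reports VmData in kB of 1024 bytes, so the arithmetic for the implicit-string run works out as:

```latex
\begin{aligned}
\mathtt{SZ} &= 100 \times 1024 \times 1024\ \text{B} = 104\,857\,600\ \text{B} = 102\,400\ \text{kB}\\
102\,444 - 40 &= 102\,404\ \text{kB} \quad \text{(after getLotsOfBytes)}\\
204\,848 - 102\,444 &= 102\,404\ \text{kB} \quad \text{(after iss(p))}
\end{aligned}
```

Each step is SZ plus about 4 kB, so the istringstream ends up holding a complete second copy of the chunk. The explicit-string run shows the same 102 404 kB step for s(chunk) and again for iss(s).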

  2. Hello John!
    John Salmon wrote:
    > My attempts to create an istringstream from the
    > chunk of data all seem to at least double the
    > amount of VM used.


    std::istringstream takes a std::string. For creating this
    std::string from a char array, a copy is created. This copy
    is then copied into the std::istringstream. For this purpose,
    you probably don't want to use an std::istringstream. Instead,
    you could use a simple homegrown stream buffer (see the
    code below).

    Good luck, Denise!
    --- CUT HERE ---
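    #include <algorithm>   // std::find
    #include <istream>
    #include <iostream>
    #include <streambuf>
    #include <string>
    #include <string.h>

    // Provided by the data-producing library; declared here so the example compiles.
    char* get_huge_buffer_with_data();

    // Read-only stream buffer over an existing array: setg() only stores
    // the begin/current/end pointers, so no characters are copied.
    struct membuf:
        std::streambuf
    {
        membuf(char* b, char* e) { this->setg(b, b, e); }
    };

    int main()
    {
        char* buffer = get_huge_buffer_with_data();
        membuf sbuf(buffer, std::find(buffer, buffer + strlen(buffer), 0));
        std::istream in(&sbuf);
        for (std::string line; std::getline(in, line); )
            std::cout << "line: " << line << "\n";
    }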
    Denise Kleingeist, Dec 30, 2006

  3. John Salmon wrote:
    > I'm working with two libraries, one written
    > in old school C, that returns a very large
    > chunk of data in the form of a C-style,
    > NUL-terminated string.
    >
    > The other written in a more modern C++
    > is a parser for the chunk of bytes returned by
    > the first. It expects a reference to a
    > std::istream as its argument.
    >
    > The chunk of data is very large.
    > I'd like to feed the output of the first to
    > the second WITHOUT MAKING AN EXTRA IN-MEMORY COPY.


    The "without making a copy" might be a little tricky with istringstream.

    I'm no expert on C++ streams, but something like this might work.

    #include <istream>
    #include <streambuf>

    // The streambuf base is listed first so it is fully constructed
    // before std::istream is handed a pointer to it.
    class Xistream
        : private std::streambuf,
          public std::istream
    {
    public:
        Xistream( const char * begin, const char * end )
            : std::istream( this )
        {
            // The get area is only read, never written, so casting away const is safe.
            setg( const_cast<char *>(begin), const_cast<char *>(begin),
                  const_cast<char *>(end) );
        }
    };

    #include <iostream>

    int main()
    {
        const char xx[] = "1 22 33";

        Xistream xi( xx, xx + sizeof(xx) -1);

        int i;
        xi >> i;

        std::cout << i << "\n";

        xi >> i;

        std::cout << i << "\n";
    }
    Gianni Mariani, Dec 30, 2006
  4. John Salmon

    >>>>> "Denise" == Denise Kleingeist writes:

    Denise> Hello John!
    Denise> John Salmon wrote:
    >> My attempts to create an istringstream from the
    >> chunk of data all seem to at least double the
    >> amount of VM used.


    Denise> std::istringstream takes a std::string. For creating this
    Denise> std::string from a char array, a copy is created. This copy
    Denise> is then copied into the std::istringstream. For this purpose,
    Denise> you probably don't want to use an std::istringstream. Instead,
    Denise> you could use a simple homegrown stream buffer (see the
    Denise> code below).

    Denise> Good luck, Denise!
    Denise> --- CUT HERE ---
    Denise> #include <istream>
    Denise> #include <iostream>
    Denise> #include <streambuf>
    Denise> #include <string>
    Denise> #include <string.h>

    Denise> struct membuf:
    Denise> std::streambuf
    Denise> {
    Denise> membuf(char* b, char* e) { this->setg(b, b, e); }
    Denise> };

    Denise> int main()
    Denise> {
    Denise> char* buffer = get_huge_buffer_with_data();
    Denise> membuf sbuf(buffer, std::find(buffer, buffer + strlen(buffer), 0));
    Denise> std::istream in(&sbuf);
    Denise> for (std::string line; std::getline(in, line); )
    Denise> std::cout << "line: " << line << "\n";
    Denise> }

    Thanks! This is exactly what I needed.

    One question - what's the point of the std::find()?

    I don't see how std::find(buffer, buffer+strlen(buffer), 0);
    could ever be different from buffer+strlen(buffer)??

    Cheers,
    John Salmon
    John Salmon, Dec 30, 2006
  5. Hello John!
    John Salmon wrote:
    > >>>>> "Denise" == Denise Kleingeist writes:

    > Denise> membuf sbuf(buffer, std::find(buffer, buffer + strlen(buffer), 0));
    > One question - what's the point of the std::find()?
    >
    > I don't see how std::find(buffer, buffer+strlen(buffer), 0);
    > could ever be different from buffer+strlen(buffer)??


    You are right: it is a leftover from a discarded attempt to use
    std::find() instead of strlen()! Just use buffer + strlen(buffer)
    instead.

    Sorry for any confusion caused, Denise!
    Denise Kleingeist, Dec 30, 2006
  6. P.J. Plauger

    "John Salmon" wrote in message
    news:...

    > I'm working with two libraries, one written
    > in old school C, that returns a very large
    > chunk of data in the form of a C-style,
    > NUL-terminated string.
    >
    > The other written in a more modern C++
    > is a parser for the chunk of bytes returned by
    > the first. It expects a reference to a
    > std::istream as its argument.
    >
    > The chunk of data is very large.
    > I'd like to feed the output of the first to
    > the second WITHOUT MAKING AN EXTRA IN-MEMORY COPY.
    >
    > My attempts to create an istringstream from the
    > chunk of data all seem to at least double the
    > amount of VM used. Here's a short program demonstrating
    > what I've tried. Is there any way to get "inside"
    > the istringstream and tell it to use the 'chunk'
    > directly, rather than insisting on making a copy?


    See the header <strstream>. It does exactly what you want,
    and it's part of the C++ Standard (albeit a bit old
    fashioned).

    P.J. Plauger
    Dinkumware, Ltd.
    http://www.dinkumware.com
    P.J. Plauger, Dec 30, 2006
  7. John Salmon

    >>>>> "PJ" == P J Plauger writes:

    PJ> "John Salmon" wrote in message
    PJ> news:...

    PJ> See the header <strstream>. It does exactly what you want,
    PJ> and it's part of the C++ Standard (albeit a bit old
    PJ> fashioned).

    Thanks to Usenet, I now have two workable solutions.

    Googling for strstream turns up lots of notes that "strstream is
    deprecated", with dire warnings that it may be removed from future
    versions of the standard. OTOH, an istrstream does exactly what I
    want, without any extra custom machinery (struct membuf : public
    streambuf).

    Other than simplicity and possible compatibility with future
    standards, is there any reason to prefer one approach over the
    other?

    Cheers,
    John Salmon
    John Salmon, Dec 30, 2006
  8. P.J. Plauger

    "John Salmon" wrote in message
    news:...

    >>>>>> "PJ" == P J Plauger writes:

    >
    > PJ> "John Salmon" wrote in message
    > PJ> news:...
    >
    > PJ> See the header <strstream>. It does exactly what you want,
    > PJ> and it's part of the C++ Standard (albeit a bit old
    > PJ> fashioned).
    >
    > Thanks to Usenet, I now have two workable solutions.
    >
    > Googling for strstream turns up lots of notes that "strstream is
    > deprecated", with dire warnings that it may be removed from future
    > versions of the standard. OTOH, an istrstream does exactly what I
    > want, without any extra custom machinery (struct membuf : public
    > streambuf).
    >
    > Other than simplicity and possible compatibility with future
    > standards, is there any reason to prefer one approach over the
    > other?


    You should prefer strstream because:

    1) it's exactly what you need

    2) it's still part of the C++ Standard

    3) there's no reason to believe it'll become nonstandard anytime
    soon, despite the dire warnings

    4) even if it does officially go away, there's not a sane vendor
    who'll stop supporting it for the next decade

    So what the hell.

    P.J. Plauger
    Dinkumware, Ltd.
    http://www.dinkumware.com
    P.J. Plauger, Dec 30, 2006