iostream and memory-mapped file

Discussion in 'C++' started by wakun@wakun.com, Feb 21, 2006.

  1. Guest

    Hi there,
    I am looking for the fastest way to load a BIG string and parse it in a
    given format. I have an extern function which returns a very large
    (char *) string. I am currently parsing it with an iterator as follows:

    char *str = return_a_big_size_str();
    istringstream ss(string(str), istringstream::in);
    istreambuf_iterator<char> bit(ss), eit;
    parsing(bit, eit);

    I found that the code shown above is very inefficient because of the
    size of str.

    BTW, I also saved the whole string to a file, say str.txt, and then
    loaded the file with ifstream:

    std::ifstream input("str.txt");
    std::istreambuf_iterator<char> bit(input), eit;
    parsing(bit, eit);

    I can't believe that the latter program is faster than the former.
    Anyway, I think memory-mapped IO may be a better choice. However, I
    have no idea how to associate a memory-mapped file with ifstream.
    , Feb 21, 2006
    #1

  2. Cory Nelson Guest

    it's slow because you are making a lot of copies.

    is your parser templatized to use any kind of char iterator? then it
    would be as easy as parsing(str, str+len). no copying required.
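A minimal sketch of the suggestion above, assuming the parser can be written as a function template over an iterator pair. The body of `parsing` here (it only counts newline characters) and the helper `parse_in_place` are hypothetical stand-ins, since the thread never shows the real parser.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the real parser: counts '\n' characters.
// Any input iterator over char works, including plain pointers.
template <typename InputIt>
std::size_t parsing(InputIt first, InputIt last)
{
    std::size_t lines = 0;
    for (; first != last; ++first) {
        if (*first == '\n') ++lines;
    }
    return lines;
}

// Parses a NUL-terminated buffer in place: no std::string, no stream, no copy.
inline std::size_t parse_in_place(char const* str)
{
    return parsing(str, str + std::strlen(str));
}
```

Because `char const*` already satisfies the iterator requirements, the same template still accepts `std::istreambuf_iterator<char>` when the data really does come from a stream.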
    Cory Nelson, Feb 21, 2006
    #2

  3. TB Guest

    wrote:
    > Hi there,
    > I am seeking a fastest way to load a BIG string and parse it as a
    > given format. I have a extern function which return a (char *)string in
    > BIG size. Now, I am going to parse it with a iterator as following
    >


    IO is slow, accept it.

    > char *str = return_a_big_size_str();
    > istringstream ss(string(str), istringstream::in);
    > istreambuf_iterator<char> bit(ss), eit;
    > parsing(bit, eit);
    >
    > I found the code shown above is so inefficient because of the big size
    > of str.
    >


    You could always write your own iterator:

    #include <cstddef>
    #include <iterator>
    #include <stdexcept>

    // Input iterator over a NUL-terminated string; a null pointer marks the end.
    class cstringiterator {
    public:
        typedef std::input_iterator_tag iterator_category;
        typedef char value_type;
        typedef std::ptrdiff_t difference_type;
        typedef char const * pointer;
        typedef char reference;

        // An empty string yields the end iterator right away.
        cstringiterator(char const * cstring = 0)
        : d_cstring(cstring && *cstring ? cstring : 0) { }

        value_type operator*() const {
            if(!d_cstring) throw std::runtime_error("Access Denied");
            return *d_cstring;
        }
        cstringiterator & operator++() {
            // Become the end iterator once the terminating NUL is reached.
            if(d_cstring && !*++d_cstring) {
                d_cstring = 0;
            }
            return *this;
        }
        cstringiterator operator++(int) {
            cstringiterator c(*this);
            ++*this;
            return c;
        }
        bool operator==(cstringiterator const & csi) const {
            return d_cstring == csi.d_cstring;
        }
        bool operator!=(cstringiterator const & csi) const {
            return d_cstring != csi.d_cstring;
        }

    private:
        char const * d_cstring;
    };

    #include <algorithm>
    #include <iostream>

    int main()
    {
        char const * c = "apa";
        // Copy the characters of the C string to standard output.
        std::copy(cstringiterator(c), cstringiterator(),
                  std::ostream_iterator<char>(std::cout));
        return 0;
    }

    > BTW, I also save the whole string to a file, says str.txt, and then
    > load the file with ifstream
    >
    > std::ifstream input("str.txt") ;
    > std::istreambuf_iterator bit(input), eit;
    > parsing(bit, eit);


    Use an iterator that relies on internal buffers and reads ahead only
    when needed, overwriting old buffers and allocating new ones as
    required, unless you must have access to the entire string at all
    times.
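One way to sketch the read-ahead idea is a small stream buffer that holds a single fixed-size chunk and refills it only when the reader runs out of data. The names `chunked_buf` and `read_all`, the use of `std::FILE*` as the source, and the 4096-byte chunk size are all assumptions made for illustration; `std::filebuf` already buffers in a similar way.

```cpp
#include <cstddef>
#include <cstdio>
#include <iterator>
#include <streambuf>
#include <string>

// Hypothetical read-ahead buffer: holds one fixed-size chunk at a time and
// overwrites it with the next chunk only when the reader runs out of data.
class chunked_buf : public std::streambuf {
public:
    explicit chunked_buf(std::FILE* file) : d_file(file)
    {
        // Empty get area: the first read triggers underflow().
        setg(d_buffer, d_buffer, d_buffer);
    }

protected:
    int_type underflow()
    {
        if (gptr() < egptr()) return traits_type::to_int_type(*gptr());
        std::size_t n = std::fread(d_buffer, 1, sizeof d_buffer, d_file);
        if (n == 0) return traits_type::eof();
        setg(d_buffer, d_buffer, d_buffer + n);
        return traits_type::to_int_type(*gptr());
    }

private:
    chunked_buf(chunked_buf const&);            // not copyable
    chunked_buf& operator=(chunked_buf const&);

    std::FILE* d_file;      // not owned; the caller closes it
    char d_buffer[4096];    // chunk size is an arbitrary choice
};

// Usage sketch: drain the file through stream buffer iterators.
inline std::string read_all(std::FILE* file)
{
    chunked_buf buffer(file);
    std::istreambuf_iterator<char> bit(&buffer), eit;
    return std::string(bit, eit);
}
```

`read_all` copies everything into a `std::string` only to keep the demonstration short; a real parser would take `bit` and `eit` directly, so that no more than one chunk is held in memory at a time.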

    >
    > I can't believe that the later program is faster than the previous one.
    > Anyway, I think memory-mapped IO maybe a better choice. However, I
    > have no idea how memory-mapped file associated with ifstream
    >


    Memory mapping a file is rather platform-specific, with its own set of
    native API calls. Derive a class from std::basic_filebuf that neatly
    handles it all.
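A rough sketch of such a wrapper, assuming a POSIX system (`open`, `fstat`, `mmap`, `munmap`). Note that it derives from `std::streambuf` rather than `std::basic_filebuf`, which is a simplification of the advice above, and that the class name `mapped_file_buf` is hypothetical. On Windows the equivalent native calls would be `CreateFileMapping` and `MapViewOfFile`.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

#include <cstddef>
#include <stdexcept>
#include <streambuf>

// Hypothetical read-only stream buffer whose get area is a memory-mapped file.
class mapped_file_buf : public std::streambuf {
public:
    explicit mapped_file_buf(char const* path) : d_data(0), d_size(0)
    {
        int fd = ::open(path, O_RDONLY);
        if (fd < 0) throw std::runtime_error("open failed");

        struct stat st;
        if (::fstat(fd, &st) != 0) {
            ::close(fd);
            throw std::runtime_error("fstat failed");
        }
        d_size = static_cast<std::size_t>(st.st_size);

        if (d_size != 0) {  // mmap rejects zero-length mappings
            void* p = ::mmap(0, d_size, PROT_READ, MAP_PRIVATE, fd, 0);
            if (p == MAP_FAILED) {
                ::close(fd);
                throw std::runtime_error("mmap failed");
            }
            d_data = static_cast<char*>(p);
        }
        ::close(fd);  // the mapping stays valid after the descriptor is closed

        // The whole file becomes the get area; nothing is copied.
        setg(d_data, d_data, d_data + d_size);
    }

    ~mapped_file_buf()
    {
        if (d_data) ::munmap(d_data, d_size);
    }

private:
    mapped_file_buf(mapped_file_buf const&);            // not copyable
    mapped_file_buf& operator=(mapped_file_buf const&);

    char* d_data;
    std::size_t d_size;
};
```

With a `mapped_file_buf buffer("str.txt");` in scope, `std::istreambuf_iterator<char> bit(&buffer), eit;` iterates directly over the mapped pages, and `std::istream in(&buffer);` gives a regular input stream on top of it.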

    --
    TB @ SWEDEN
    TB, Feb 21, 2006
    #3
  4. Dietmar Kuehl

    wrote:
    > char *str = return_a_big_size_str();
    > istringstream ss(string(str), istringstream::in);


    The above line creates at least two copies of the string, all of which
    exist at the same time. This is likely to cause swapping on your
    system (at least if the strings are really large). This is a
    tremendous performance hit.

    > istreambuf_iterator<char> bit(ss), eit;
    > parsing(bit, eit);


    Hold it! You are parsing your string using stream *buffer* iterators,
    i.e. you are not taking advantage of the formatting facilities of
    streams at all? Why don't you simply pass pointers as the iterators
    to the 'parsing()' function (which, of course, should be a function
    template)? Assuming, however, that 'parsing()' is not a function
    template, you still have the option to create a suitable stream buffer
    which is used just for the situation described:

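    #include <cstring>
    #include <iterator>
    #include <streambuf>

    // Read-only stream buffer over an existing NUL-terminated string (no copy).
    struct membuf: std::streambuf
    {
        membuf(char* str) { this->setg(str, str, str + std::strlen(str)); }
    };

    membuf buffer(str);
    std::istreambuf_iterator<char> bit(&buffer), eit;
    // ...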
    --
    <http://www.dietmar-kuehl.de/>
    <http://www.eai-systems.com> - Efficient Artificial Intelligence
    Dietmar Kuehl, Feb 23, 2006
    #4