Short string class

Discussion in 'C++' started by Brian, Apr 15, 2010.

  1. Brian

    Brian Guest

    Shalom

    Is there a short string class around? I've searched a little on the
    net and didn't find anything. I'm interested in a class that limits
    (possibly at compile time) the length to 255, so marshalling the
    string's length only requires one byte. Thanks in advance.


    Brian Wood
    http://webEbenezer.net
    (651) 251-9384
     
    Brian, Apr 15, 2010
    #1

  2. Brian wrote:
    > Is there a short string class around? I've searched a little on the
    > net and didn't find anything. I'm interested in a class that limits
    > (possibly at compile time) the length to 255, so marshalling the
    > string's length only requires one byte. Thanks in advance.


    Feel free to write one. :)

    All you have to do is replace size_t with unsigned char. But I doubt
    there is any real benefit.


    Marcel
     
    Marcel Müller, Apr 15, 2010
    #2

  3. Brian

    Brian Guest

    On Apr 15, 2:25 pm, Marcel Müller wrote:
    > Brian wrote:
    > > Is there a short string class around?  I've searched a little on the
    > > net and didn't find anything.  I'm interested in a class that limits
    > > (possibly at compile time) the length to 255, so marshalling the
    > > string's length only requires one byte.  Thanks in advance.

    >
    > Feel free to write one. :)
    >
    > All you have to do is to replace size_t by unsigned char. But I am in
    > doubt what the real benefit should be.
    >


    The benefit is saving 3 or more bytes in bandwidth in the marshalling
    process. Since strings are common it adds up. 255 chars is more
    than enough for 95% of my uses of strings.


    Brian Wood
     
    Brian, Apr 15, 2010
    #3
  4. Brian

    Brian Guest

    On Apr 15, 2:52 pm, Paavo Helde wrote:

    >
    > This seems like a special marshalling code would be needed, no need for a
    > special string class.
    >
    > Paavo



    Possibly. However, I can only get 128 values in a byte
    with my variable-length integer stuff, so there would be some
    advantage to a unique type. If the limit were given at compile
    time, the marshalling could use the minimum number of bytes and
    not have to use the more complicated variable length encoding.
    Maybe supporting both would be ideal.
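
To make the compile-time idea concrete, here is a minimal sketch rather than code from any library mentioned in this thread. The names `length_prefix` and `marshal_bounded` are hypothetical, and the big-endian byte order for multi-byte prefixes is an assumption. The limit is a template parameter, so the width of the length field is fixed at compile time and no variable-length encoding is needed.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <type_traits>

// Hypothetical helper: picks the narrowest unsigned type that can hold MaxLen.
template <std::size_t MaxLen>
struct length_prefix {
    static_assert(MaxLen <= 0xFFFFFFFFull, "limit must fit in 32 bits");
    typedef typename std::conditional<
        (MaxLen <= 0xFFu), std::uint8_t,
        typename std::conditional<(MaxLen <= 0xFFFFu), std::uint16_t,
                                  std::uint32_t>::type>::type type;
};

// Appends a fixed-width length prefix (big-endian, by assumption) followed
// by the bytes of s. Throws if s is longer than the compile-time limit.
template <std::size_t MaxLen>
void marshal_bounded(std::string& out, const std::string& s)
{
    typedef typename length_prefix<MaxLen>::type prefix_t;
    if (s.size() > MaxLen)
        throw std::length_error("string exceeds compile-time limit");

    // Emit the most significant prefix byte first.
    for (std::size_t i = sizeof(prefix_t); i-- > 0; )
        out.push_back(static_cast<char>((s.size() >> (8 * i)) & 0xFF));
    out.append(s);
}
```

With a limit of 255 the prefix is one byte; with a limit of 1000 it becomes two bytes, so `marshal_bounded<255>(buf, "abc")` appends four bytes in total.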


    Brian Wood
     
    Brian, Apr 15, 2010
    #4
  5. Paavo Helde wrote:
    > This seems like a special marshalling code would be needed, no need for a
    > special string class.


    Yes, indeed.

    If only storage counts and the protocol is your choice, you may use a
    dynamic 7 bit encoding:

    0x00 .. 0x7f -> 0 .. 127 Bytes length
    0x80 0x00 .. 0xff 0x7f -> 128 .. 16511
    0x80 0x80 0x00 .. 0xff 0xff 0x7f -> 16512 .. 2113663
    0x80 0x80 0x80 0x00 ... -> 2113664 .. 270549119
    0x80 0x80 0x80 0x80 0x00 ... -> 270549120 .. 3.46E+10
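
The ranges in the table above can be reproduced with a short sketch. The function names are hypothetical, and the byte order (least significant 7-bit group first) is an assumption, since the table does not pin it down. Every byte except the last has its high bit set, and subtracting one after each shift is what makes the two-byte range start at 128 instead of overlapping the one-byte range.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Appends n to out using 7 payload bits per byte. The high bit marks
// "more bytes follow". Assumed order: least significant group first.
inline void encode_length(std::string& out, std::uint32_t n)
{
    while (n >= 0x80u) {
        out.push_back(static_cast<char>(0x80u | (n & 0x7Fu)));
        n = (n >> 7) - 1;  // the -1 removes overlap between byte counts
    }
    out.push_back(static_cast<char>(n));
}

// Reads one length starting at in[pos] and advances pos past it.
// Returns false on truncated input or on more than five bytes.
inline bool decode_length(const std::string& in, std::size_t& pos,
                          std::uint64_t& n)
{
    n = 0;
    for (unsigned shift = 0; shift <= 28 && pos < in.size(); shift += 7) {
        const std::uint64_t byte = static_cast<unsigned char>(in[pos++]);
        // Every group after the first carries an implicit +1.
        n += ((byte & 0x7Fu) + (shift ? 1u : 0u)) << shift;
        if ((byte & 0x80u) == 0)
            return true;
    }
    return false;
}
```

Under these assumptions, 128 encodes as 0x80 0x00 and 16511 as 0xff 0x7f, matching the second row of the table.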

    Since the string content has dynamic length anyway, it should not be
    too complicated to deal with a dynamic length field too. If access
    performance counts, you could store the length at negative offsets
    from *this. (A C++ class with a custom allocator could wrap that.)

    But if you only want to conserve bytes, storing headerless ZLIB
    compressed strings might be significantly more effective.


    Marcel
     
    Marcel Müller, Apr 15, 2010
    #5
  6. Brian

    Brian Guest

    On Apr 15, 3:38 pm, Marcel Müller wrote:
    > Paavo Helde wrote:
    > > This seems like a special marshalling code would be needed, no need for a
    > > special string class.

    >
    > Yes, indeed.
    >


    I don't think it is so clear. I'm currently using this
    function:

    template <typename T>
    void
    stringGroupCount(Counter& cntr, T const& grp)
    {
      cntr.MultiplyAndAdd(grp.size(), sizeof(uint32_t) );

      typename T::const_iterator It = grp.begin();
      typename T::const_iterator End = grp.end();
      for (; It != End; ++It) {
        cntr.Add((*It).length());
      }
    }

    to count how many bytes are in a collection of strings.
    The line that multiplies the number of elements in the
    collection by the sizeof value would have to be replaced
    with a loop. So having a distinct short string class has
    some advantages.


    >
    > But if you only want to conserve bytes, storing headerless ZLIB
    > compressed strings might be significantly more effective.
    >


    I'm not sure what you mean by that.


    Brian Wood
     
    Brian, Apr 15, 2010
    #6
  7. Brian

    Brian Guest

    On Apr 15, 4:59 pm, Brian wrote:
    > On Apr 15, 3:38 pm, Marcel Müller <>
    > wrote:
    >
    > > Paavo Helde wrote:
    > > > This seems like a special marshalling code would be needed, no need for a
    > > > special string class.

    >
    > > Yes, indeed.

    >
    > I don't think it is so clear.  I'm currently using this
    > function:
    >
    > template <typename T>
    > void
    > stringGroupCount(Counter& cntr, T const& grp)
    > {
    >   cntr.MultiplyAndAdd(grp.size(), sizeof(uint32_t) );
    >
    >   typename T::const_iterator It = grp.begin();
    >   typename T::const_iterator End = grp.end();
    >   for (; It != End; ++It) {
    >     cntr.Add((*It).length());
    >   }
    >
    > }
    >
    > to count how many bytes are in a collection of strings.
    > The line that multiplies the multiplies the number of
    > elements in the collection times the sizeof value would
    > have to be replaced with a loop.  


    I guess that's not right. The number of bytes needed to
    marshall the string length could be calculated in the
    existing loop.
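
A minimal sketch of that single-loop version follows. It does not use the `Counter` class from the post, whose interface is not shown here; it returns a plain total instead. `prefix_bytes` and `string_group_bytes` are hypothetical names, and the thresholds are copied from the variable-length table earlier in the thread.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Bytes needed for the length prefix of a string of the given length,
// using the thresholds from the 7-bit table posted earlier in the thread.
inline std::size_t prefix_bytes(std::size_t len)
{
    if (len <= 127u) return 1;
    if (len <= 16511u) return 2;
    if (len <= 2113663u) return 3;
    if (len <= 270549119u) return 4;
    return 5;
}

// Total marshalled size of a collection of strings: each element
// contributes its own prefix size plus its payload, in one pass.
template <typename T>
std::size_t string_group_bytes(T const& grp)
{
    std::size_t total = 0;
    for (typename T::const_iterator it = grp.begin(); it != grp.end(); ++it)
        total += prefix_bytes(it->length()) + it->length();
    return total;
}
```

For a `std::vector<std::string>` holding an empty string, "abc", and a 200-character string, this gives 1 + 4 + 202 = 207 bytes.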


    Brian Wood
     
    Brian, Apr 15, 2010
    #7
  8. Ian Collins

    Ian Collins Guest

    On 04/16/10 07:31 AM, Brian wrote:
    > On Apr 15, 2:25 pm, Marcel Müller<>
    > wrote:
    >> Brian wrote:
    >>> Is there a short string class around? I've searched a little on the
    >>> net and didn't find anything. I'm interested in a class that limits
    >>> (possibly at compile time) the length to 255, so marshalling the
    >>> string's length only requires one byte. Thanks in advance.

    >>
    >> Feel free to write one. :)
    >>
    >> All you have to do is to replace size_t by unsigned char. But I am in
    >> doubt what the real benefit should be.
    >>

    >
    > The benefit is saving 3 or more bytes in bandwidth in the marshalling
    > process. Since strings are common it adds up. 255 chars is more
    > than enough for 95% of my uses of strings.


    Um, would you put as much effort into removing 3 characters from a
    string? This does look like a lot of effort for little gain.

    --
    Ian Collins
     
    Ian Collins, Apr 15, 2010
    #8
  9. tonydee

    tonydee Guest

    On Apr 16, 3:34 am, Brian wrote:
    > Is there a short string class around?  I've searched a little on the
    > net and didn't find anything.  I'm interested in a class that limits
    > (possibly at compile time) the length to 255, so marshalling the
    > string's length only requires one byte.  Thanks in advance.


    I'm not aware of one. But, given your interest is in marshalling and
    not memory use, why not extend std::string trivially...? (I know
    that's a touchy subject around here, but hey ;-P). By reserving the
    single value 255 as a "longer-string" sentinel, you can make it safer
    too. An illustration of this, which I haven't even tried to compile, follows...

    Cheers,
    Tony

    struct Short_String : std::string
    {
        std::ostream& marshall_to(std::ostream& os)
        {
            if (size() < 255)
                os << (uint8_t)size();
            else
            {
                os << (uint8_t)255;
                uint32_t nsize = htonl(size());
                // ostream::write takes a const char*, so a cast is needed
                os.write(reinterpret_cast<const char*>(&nsize), 4);
            }
            return os << *this;
        }

        // marshall_from similarly...
    };
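
For the reading side of the sentinel scheme, here is one possible sketch. It is not tonydee's code: `write_short_string` and `read_short_string` are hypothetical free functions, and the four-byte length is assembled by hand in big-endian order (the same order `htonl` produces) so the example does not depend on a sockets header.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Writes s with a one-byte length, or the sentinel 255 followed by a
// 4-byte big-endian length when s has 255 or more characters.
inline std::ostream& write_short_string(std::ostream& os, const std::string& s)
{
    if (s.size() < 255) {
        os.put(static_cast<char>(s.size()));
    } else {
        const std::uint32_t n = static_cast<std::uint32_t>(s.size());
        os.put(static_cast<char>(255));
        for (int shift = 24; shift >= 0; shift -= 8)
            os.put(static_cast<char>((n >> shift) & 0xFF));
    }
    return os.write(s.data(), static_cast<std::streamsize>(s.size()));
}

// Reads one string written by write_short_string. On failure the stream
// is left in a failed state and s is unchanged.
inline std::istream& read_short_string(std::istream& is, std::string& s)
{
    const int first = is.get();
    if (first == std::char_traits<char>::eof())
        return is;  // get() has already set failbit
    std::uint32_t n = static_cast<std::uint32_t>(first);
    if (first == 255) {  // sentinel: the real length follows in 4 bytes
        n = 0;
        for (int i = 0; i < 4; ++i) {
            const int b = is.get();
            if (b == std::char_traits<char>::eof())
                return is;
            n = (n << 8) | static_cast<std::uint32_t>(b);
        }
    }
    // Note: n comes from the input; real code should cap it before allocating.
    std::string tmp(n, '\0');
    if (n > 0)
        is.read(&tmp[0], static_cast<std::streamsize>(n));
    if (is)
        s.swap(tmp);
    return is;
}
```

Strings shorter than 255 characters cost one byte of overhead, and longer ones cost five, so nothing is rejected.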
     
    tonydee, Apr 16, 2010
    #9
  10. Brian

    Brian Guest

    On Apr 15, 8:41 pm, tonydee wrote:
    >
    > I'm not aware of one.  But, given your interest is in marshalling and
    > not memory use, why not extend std::string trivially...?  (I know
    > that's a touchy subject around here, but hey ;-P).  By reserving the
    > single value 255 as a "longer-string" sentinel, you can make it safer
    > too.  A illustration of this I haven't even tried to compile below...
    >
    > Cheers,
    > Tony
    >
    > struct Short_String : std::string
    > {
    >     std::ostream& marshall_to(std::ostream& os)
    >     {
    >         if (size() < 255)
    >             os << (uint8_t)size();
    >         else
    >         {
    >             os << (uint8_t)255;
    >             uint32_t nsize = htonl(size());
    >             os.write(&nsize, 4);
    >         }
    >         return os << *this;
    >     }
    >
    >     // marshall_from similarly...
    >
    > };
    >
    >


    I hacked up a "lil_string" now based on a string implementation
    by Christian Stigen Larsen -- http://sublevel3.org.

    http://webEbenezer.net/posts/lil_string.hh
    http://webEbenezer.net/posts/lil_string.cc
    http://webEbenezer.net/posts/lil_test.cc

    If an operation would cause the size to exceed 255 bytes,
    it throws an exception.
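
The throwing behaviour described above might look roughly like the sketch below. This is not the lil_string code from the links; `bounded_string` is a hypothetical wrapper around std::string, and std::length_error is an assumed choice of exception type.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical wrapper that refuses to grow past 255 characters.
class bounded_string {
public:
    static const std::size_t max_size = 255;

    bounded_string() {}

    explicit bounded_string(const std::string& s) : data_(s)
    {
        if (data_.size() > max_size)
            throw std::length_error("bounded_string: initial value too long");
    }

    // Appends s, or throws and leaves the contents unchanged if the
    // result would exceed max_size.
    bounded_string& append(const std::string& s)
    {
        if (s.size() > max_size - data_.size())
            throw std::length_error("bounded_string: append exceeds limit");
        data_.append(s);
        return *this;
    }

    std::size_t size() const { return data_.size(); }
    const std::string& str() const { return data_; }

private:
    std::string data_;  // invariant: data_.size() <= max_size
};
```

Because the invariant is checked before any modification, a failed append leaves the string as it was, and its length always fits in one byte when marshalled.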

    Brian Wood
     
    Brian, Apr 19, 2010
    #10
  11. Jorgen Grahn

    Jorgen Grahn Guest

    On Thu, 2010-04-15, Brian wrote:
    > On Apr 15, 3:38 pm, Marcel Müller <>
    > wrote:


    >> But if you only want to conserve bytes, storing headerless ZLIB
    >> compressed strings might be significantly more effective.
    >>

    >
    > I'm not sure what you mean by that.


    He means you can painfully invent and implement schemes for saving a
    bit here and there, but you are probably going to be beaten by the guy
    who ignores that and just slaps a standard compression algorithm on
    top of the file format or networking protocol. ZLIB is the most
    popular of these.

    /Jorgen

    --
    // Jorgen Grahn <grahn@ Oo o. . .
    \X/ snipabacken.se> O o .
     
    Jorgen Grahn, Apr 25, 2010
    #11