'binary string' and vector<byte> specialization

Jeffrey Walton

Hi All,

I have a need for a 'binary string'. I'm going to side step the size
of a byte and just assume it is 8 bits (in reality, all the platforms
I support use octets - no PDP-10s on the hardware list). So a byte
would be either unsigned char or uint8_t.

I noticed that neither msvc [1] nor stdc++/libg++ [2] appear to
provide a specialization for a vector<byte> (poking and prodding with
WinDbg and gdb). Does anyone have pointers to some existing code?

I'm interested in some basic_string operations (such as concatenation
and substr); but I'm not interested in others (such as find).
Conceptually, I don't think basic_string<byte> is appropriate. Plus,
there's a tax to be paid due to facets, locales, and traits.

I also can't see using a stock vector<byte> when taking into
consideration (1) there are no endian issues, (2) there are no
alignment issues, (3) vectors have been unambiguously contiguous since
TC1 (C++03), and (4) C functions, such as memcpy and memset, can be orders of
magnitude faster than the discrete C++ counterparts (such as
std::copy(...) and friends).

On the wish list: I would also like an O(1) delete from the beginning
of the binary string. I suspect a lazy delete coupled with a 'leading
offset' will do the trick.

Finally, I'm not interested in bringing another dependency, so
libraries such as boost need not apply (http://www.boost.org/doc/
libs/).

Jeff

[1] C Run-Time Libraries,
http://msdn.microsoft.com/en-us/library/abx4dbyh(VS.80).aspx
[2] Standard C++ Library, http://gcc.gnu.org/libstdc++/
 
Kai-Uwe Bux

Jeffrey said:
Hi All,

I have a need for a 'binary string'. I'm going to side step the size
of a byte and just assume it is 8 bits (in reality, all the platforms
I support use octets - no PDP-10s on the hardware list). So a byte
would be either unsigned char or uint8_t.

I noticed that neither msvc [1] nor stdc++/libg++ [2] appear to
provide a specialization for a vector<byte> (poking and prodding with
WinDbg and gdb). Does anyone have pointers to some existing code?

I'm interested in some basic_string operations (such as concatenation
and substr); but I'm not interested in others (such as find).
Conceptually, I don't think basic_string<byte> is appropriate. Plus,
there's a tax to be paid due to facets, locales, and traits.

I don't follow: could you explain in more detail why

basic_string< unsigned char >

is inappropriate? As for the "tax": the character traits hardly involve any
overhead--they just come as a template parameter telling basic_string which
c-string routines to use in the implementation; and there is no facet and no
locale involved with strings, they only enter the picture with streams.

What I would do is: create a class byte_string that offers precisely the
interface you are looking for and internally uses basic_string<unsigned
char> for the implementation. That way, you avoid facets and locales being
pulled in once you do something like std::cout << my_byte_string.

I also can't see using a stock vector<byte> when taking into
consideration (1) there are no endian issues, (2) there are no
alignment issues, (3) vectors have been unambiguously contiguous since
TC1 (C++03), and (4) C functions, such as memcpy and memset, can be orders of
magnitude faster than the discrete C++ counterparts (such as
std::copy(...) and friends).

On the wish list: I would also like an O(1) delete from the beginning
of the binary string. I suspect a lazy delete coupled with a 'leading
offset' will do the trick.

When profiling shows the need, you could add that feature to the little
wrapper class from above.
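
A minimal sketch of such a wrapper with the 'leading offset' idea, under stated assumptions: the class name byte_string and the members remove_prefix and compact are hypothetical, and the storage is a std::vector<unsigned char> rather than basic_string<unsigned char>, which sidesteps the char_traits question raised elsewhere in the thread.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical wrapper; all names here are illustrative, not from any library.
class byte_string {
public:
    typedef unsigned char byte;

    byte_string() : head_(0) {}
    byte_string(const byte* p, std::size_t n) : data_(p, p + n), head_(0) {}

    // Logical size excludes the lazily deleted prefix.
    std::size_t size() const { return data_.size() - head_; }
    bool empty() const { return size() == 0; }
    const byte* data() const { return data_.empty() ? 0 : &data_[0] + head_; }
    byte operator[](std::size_t i) const { return data_[head_ + i]; }

    // Concatenation. Self-append goes through a copy, because a range
    // insert must not be given iterators into the same container.
    byte_string& append(const byte_string& other) {
        if (&other == this) {
            byte_string copy(other);
            return append(copy);
        }
        data_.insert(data_.end(),
                     other.data_.begin() + other.head_, other.data_.end());
        return *this;
    }

    // Same clamping behaviour as basic_string::substr.
    byte_string substr(std::size_t pos, std::size_t n) const {
        if (pos > size()) throw std::out_of_range("byte_string::substr");
        if (n > size() - pos) n = size() - pos;
        return byte_string(data() + pos, n);
    }

    // Lazy delete from the front: only the offset moves, so this is O(1).
    void remove_prefix(std::size_t n) {
        if (n > size()) n = size();
        head_ += n;
    }

    // Optional O(n) step that actually releases the dead prefix.
    void compact() {
        data_.erase(data_.begin(), data_.begin() + head_);
        head_ = 0;
    }

private:
    std::vector<byte> data_;  // contiguous storage
    std::size_t head_;        // number of bytes logically removed from the front
};
```

The trade-off is that the removed prefix stays allocated until compact() is called, so when to compact is a policy decision best left to profiling.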


[...]

Best

Kai-Uwe Bux
 
James Kanze

Jeffrey Walton wrote:
I have a need for a 'binary string'. I'm going to side step the size
of a byte and just assume it is 8 bits (in reality, all the platforms
I support use octets - no PDP-10s on the hardware list). So a byte
would be either unsigned char or uint8_t.
I noticed that neither msvc [1] nor stdc++/libg++ [2] appear to
provide a specialization for a vector<byte> (poking and prodding with
WinDbg and gdb). Does anyone have pointers to some existing code?
I'm interested in some basic_string operations (such as concatenation
and substr); but I'm not interested in others (such as find).
Conceptually, I don't think basic_string<byte> is appropriate. Plus,
there's a tax to be paid due to facets, locales, and traits.
I don't follow: could you explain in more detail why
basic_string< unsigned char >
is inappropriate?

Possibly because it's undefined behavior.
"std::basic_string<unsigned char>" is the equivalent of
"std::basic_string<unsigned char, std::char_traits<unsigned
char> >". An implementation is not required to furnish
"std::char_traits<unsigned char>"; if it does, its contents are
up to the implementation, and you're not allowed to provide
a version yourself, since it's in std::. (In practice, I know
that both VC++ and g++ provide it, or did, and that their
versions were not compatible, and had different semantics.)
 
Kai-Uwe Bux

James said:
Jeffrey Walton wrote:
I have a need for a 'binary string'. I'm going to side step the size
of a byte and just assume it is 8 bits (in reality, all the platforms
I support use octets - no PDP-10s on the hardware list). So a byte
would be either unsigned char or uint8_t.
I noticed that neither msvc [1] nor stdc++/libg++ [2] appear to
provide a specialization for a vector<byte> (poking and prodding with
WinDbg and gdb). Does anyone have pointers to some existing code?
I'm interested in some basic_string operations (such as concatenation
and substr); but I'm not interested in others (such as find).
Conceptually, I don't think basic_string<byte> is appropriate. Plus,
there's a tax to be paid due to facets, locales, and traits.
I don't follow: could you explain in more detail why
basic_string< unsigned char >
is inappropriate?

Possibly because it's undefined behavior.
"std::basic_string<unsigned char>" is the equivalent of
"std::basic_string<unsigned char, std::char_traits<unsigned
char> >". An implementation is not required to furnish
"std::char_traits<unsigned char>"; if it does, its contents are
up to the implementation, and you're not allowed to provide
a version yourself, since it's in std::. (In practice, I know
that both VC++ and g++ provide it, or did, and that their
versions were not compatible, and had different semantics.)

I always forget about that. But it cannot be too hard to write a traits
class for unsigned char that does exactly what the OP wants. I don't think
that the perceived inappropriateness of basic_string<> lies there.


Best

Kai-Uwe Bux
 
James Kanze

But it cannot be too hard to write a traits
class for unsigned char that does exactly what the OP wants.

No. Especially as you can start by deriving from
std::char_traits<char>, which will give you a fair percentage of
the functions which should work "as is".
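
A sketch of what that might look like, with assumptions stated: byte_traits and ustring are hypothetical names, and the class inherits int_type, off_type, pos_type, state_type, eof(), not_eof() and eq_int_type() from std::char_traits<char>, while every member that mentions char_type is redefined for unsigned char.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical traits class for unsigned char (illustrative names only).
struct byte_traits : std::char_traits<char> {
    typedef unsigned char char_type;

    static void assign(char_type& dst, const char_type& src) { dst = src; }
    static bool eq(char_type a, char_type b) { return a == b; }
    static bool lt(char_type a, char_type b) { return a < b; }

    // memcmp compares as unsigned char, which is the ordering wanted here.
    static int compare(const char_type* a, const char_type* b, std::size_t n) {
        return n == 0 ? 0 : std::memcmp(a, b, n);
    }
    static std::size_t length(const char_type* s) {
        std::size_t n = 0;
        while (s[n] != 0) ++n;
        return n;
    }
    static const char_type* find(const char_type* s, std::size_t n,
                                 const char_type& c) {
        return n == 0 ? 0
                      : static_cast<const char_type*>(std::memchr(s, c, n));
    }
    static char_type* move(char_type* dst, const char_type* src, std::size_t n) {
        if (n != 0) std::memmove(dst, src, n);
        return dst;
    }
    static char_type* copy(char_type* dst, const char_type* src, std::size_t n) {
        if (n != 0) std::memcpy(dst, src, n);
        return dst;
    }
    static char_type* assign(char_type* dst, std::size_t n, char_type c) {
        if (n != 0) std::memset(dst, c, n);
        return dst;
    }
    static char_type to_char_type(int_type i) { return static_cast<char_type>(i); }
    static int_type to_int_type(char_type c) { return c; }
};

// The traits type is user-supplied, so nothing is added to namespace std.
typedef std::basic_string<unsigned char, byte_traits> ustring;
```

With this in place, ustring(ptr, len) handles embedded zero bytes, and operator+ and substr work as they do for std::string; stream insertion is deliberately not provided.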
 
Jeffrey Walton

James said:
Jeffrey Walton wrote:
I have a need for a 'binary string'. I'm going to side step the size
of a byte and just assume it is 8 bits (in reality, all the platforms
I support use octets - no PDP-10s on the hardware list). So a byte
would be either unsigned char or uint8_t.
I noticed that neither msvc [1] nor stdc++/libg++ [2] appear to
provide a specialization for a vector<byte> (poking and prodding with
WinDbg and gdb). Does anyone have pointers to some existing code?
I'm interested in some basic_string operations (such as concatenation
and substr); but I'm not interested in others (such as find).
Conceptually, I don't think basic_string<byte> is appropriate. Plus,
there's a tax to be paid due to facets, locales, and traits.
I don't follow: could you explain in more detail why
  basic_string< unsigned char >
is inappropriate?
Possibly because it's undefined behavior.
"std::basic_string<unsigned char>" is the equivalent of
"std::basic_string<unsigned char, std::char_traits<unsigned
char> >". An implementation is not required to furnish
"std::char_traits<unsigned char>"; if it does, its contents are
up to the implementation, and you're not allowed to provide
a version yourself, since it's in std::.  (In practice, I know
that both VC++ and g++ provide it, or did, and that their
versions were not compatible, and had different semantics.)

I always forget about that. But it cannot be too hard to write a traits
class for unsigned char that does exactly what the OP wants. I don't think
that the perceived inappropriateness of basic_string<> lies there.

I used basic_string<unsigned char> because it was quick and dirty (and
it included operations such as substr). I'm really interested in a
vector<unsigned char> already specialized, that includes optimizations
and substr/subvector semantics. Sorry about the confusion.
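
For what it's worth, substr-like and concatenation semantics can be layered on a stock std::vector<unsigned char> with two free functions. This is only a sketch: byte_vector, subvector and concat are hypothetical names, not part of any standard library.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

typedef std::vector<unsigned char> byte_vector;

// Hypothetical helper mirroring basic_string::substr: throws if pos is past
// the end, and clamps n to the number of bytes available.
inline byte_vector subvector(const byte_vector& v, std::size_t pos,
                             std::size_t n) {
    if (pos > v.size()) throw std::out_of_range("subvector");
    if (n > v.size() - pos) n = v.size() - pos;
    return byte_vector(v.begin() + pos, v.begin() + pos + n);
}

// Hypothetical helper mirroring operator+ on strings.
inline byte_vector concat(const byte_vector& a, const byte_vector& b) {
    byte_vector out;
    out.reserve(a.size() + b.size());  // a single allocation for the result
    out.insert(out.end(), a.begin(), a.end());
    out.insert(out.end(), b.begin(), b.end());
    return out;
}
```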

Jeff
 
Jeffrey Walton

Hi All,

I have a need for a 'binary string'. I'm going to side step the size
of a byte and just assume it is 8 bits (in reality, all the platforms
I support use octets - no PDP-10s on the hardware list). So a byte
would be either unsigned char or uint8_t.

I noticed that neither msvc [1] nor stdc++/libg++ [2] appear to
provide a specialization for a vector<byte> (poking and prodding with
WinDbg and gdb). Does anyone have pointers to some existing code?

I'm interested in some basic_string operations (such as concatenation
and substr); but I'm not interested in others (such as find).
Conceptually, I don't think basic_string<byte> is appropriate. Plus,
there's a tax to be paid due to facets, locales, and traits.

I also can't see using a stock vector<byte> when taking into
consideration (1) there are no endian issues, (2) there are no
alignment issues, (3) vectors have been unambiguously contiguous since
TC1 (C++03), and (4) C functions, such as memcpy and memset, can be orders of
magnitude faster than the discrete C++ counterparts (such as
std::copy(...) and friends).

Interestingly, it looks like C++0x addresses this: see the discussion
of PODs at http://www2.research.att.com/~bs/C++0xFAQ.html.

On the wish list: I would also like an O(1) delete from the beginning
of the binary string. I suspect a lazy delete coupled with a 'leading
offset' will do the trick.

Finally, I'm not interested in bringing another dependency, so
libraries such as boost need not apply (http://www.boost.org/doc/
libs/).

Jeff

[1] C Run-Time Libraries, http://msdn.microsoft.com/en-us/library/abx4dbyh(VS.80).aspx
[2] Standard C++ Library, http://gcc.gnu.org/libstdc++/
 
Alf P. Steinbach /Usenet

* Jeffrey Walton, on 15.09.2010 22:34:
Hi All,

I have a need for a 'binary string'. I'm going to side step the size
of a byte and just assume it is 8 bits (in reality, all the platforms
I support use octets - no PDP-10s on the hardware list). So a byte
would be either unsigned char or uint8_t.

I noticed that neither msvc [1] nor stdc++/libg++ [2] appear to
provide a specialization for a vector<byte> (poking and prodding with
WinDbg and gdb). Does anyone have pointers to some existing code?

I'm interested in some basic_string operations (such as concatenation
and substr); but I'm not interested in others (such as find).
Conceptually, I don't think basic_string<byte> is appropriate. Plus,
there's a tax to be paid due to facets, locales, and traits.

As discussed else-thread, nope.

I also can't see using a stock vector<byte> when taking into
consideration (1) there are no endian issues, (2) there are no
alignment issues, (3) vectors have been unambiguously contiguous since
TC1 (C++03), and (4) C functions, such as memcpy and memset, can be orders of
magnitude faster than the discrete C++ counterparts (such as
std::copy(...) and friends).

This seems a bit confused.

On the wish list: I would also like an O(1) delete from the beginning
of the binary string. I suspect a lazy delete coupled with a 'leading
offset' will do the trick.

Finally, I'm not interested in bringing another dependency, so
libraries such as boost need not apply (http://www.boost.org/doc/
libs/).

Jeff

[1] C Run-Time Libraries,
http://msdn.microsoft.com/en-us/library/abx4dbyh(VS.80).aspx
[2] Standard C++ Library, http://gcc.gnu.org/libstdc++/

Cheers & hth.,

- Alf
 
Jeffrey Walton

* Jeffrey Walton, on 15.09.2010 22:34:
I have a need for a 'binary string'. I'm going to side step the size
of a byte and just assume it is 8 bits (in reality, all the platforms
I support use octets - no PDP-10s on the hardware list). So a byte
would be either unsigned char or uint8_t.
I noticed that neither msvc [1] nor stdc++/libg++ [2] appear to
provide a specialization for a vector<byte> (poking and prodding with
WinDbg and gdb). Does anyone have pointers to some existing code?
I'm interested in some basic_string operations (such as concatenation
and substr); but I'm not interested in others (such as find).
Conceptually, I don't think basic_string<byte>  is appropriate. Plus,
there's a tax to be paid due to facets, locales, and traits.

As discussed else-thread, nope.

Interesting. The Technical Report on C++ Performance [1] seems to
indicate otherwise. Perhaps I read it wrong.

Jeff

[1] http://www.open-std.org/jtc1/sc22/wg21/docs/TR18015.pdf
 
Bo Persson

Jeffrey said:
* Jeffrey Walton, on 15.09.2010 22:34:
I have a need for a 'binary string'. I'm going to side step the
size of a byte and just assume it is 8 bits (in reality, all the
platforms I support use octets - no PDP-10s on the hardware
list). So a byte would be either unsigned char or uint8_t.
I noticed that neither msvc [1] nor stdc++/libg++ [2] appear to
provide a specialization for a vector<byte> (poking and prodding
with WinDbg and gdb). Does anyone have pointers to some existing
code?
I'm interested in some basic_string operations (such as
concatenation and substr); but I'm not interested in others (such
as find). Conceptually, I don't think basic_string<byte> is
appropriate. Plus, there's a tax to be paid due to facets,
locales, and traits.

As discussed else-thread, nope.
Interesting. The Technical Report on C++ Performance [1] seems to
indicate otherwise. Perhaps I read it wrong.

Probably. Using locale and its facets has a cost, but basic_string
isn't doing that. With a half-decent compiler, char_traits has no cost
either.

The reason that there is no specialization for std::vector<byte> or
std::copy(byte*...) is probably that the compilers understand enough
to produce the right code anyway. You would probably be surprised to
see that copying bytes with a for-loop produces the same code as a
call to memcpy. It does!
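
One way to check this on a given toolchain is to compile three equivalent copies with optimization enabled and compare the generated assembly. The sketch below only demonstrates that the three forms produce the same result; the function names are hypothetical, and whether the machine code actually matches depends on the compiler and the optimization level.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <vector>

typedef std::vector<unsigned char> byte_vector;

// Plain byte-by-byte loop.
inline void copy_loop(unsigned char* dst, const unsigned char* src,
                      std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) dst[i] = src[i];
}

// Standard algorithm; implementations are free to dispatch to a block
// move for trivially copyable element types.
inline void copy_std(unsigned char* dst, const unsigned char* src,
                     std::size_t n) {
    std::copy(src, src + n, dst);
}

// C library call; the ranges must not overlap.
inline void copy_memcpy(unsigned char* dst, const unsigned char* src,
                        std::size_t n) {
    if (n != 0) std::memcpy(dst, src, n);
}
```

Because vector storage is contiguous, &v[0] on a non-empty byte_vector can be passed to any of the three.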


Bo Persson
 
