Zero terminated strings

Flash Gordon · Aug 4, 2009

jameskuyper said:
It's C++ - you can't stop people from writing new string classes.
However, if you don't want your algorithm to be affected by the
existence of such classes, that's trivial to arrange.

Ah, but I want to use Fred Blog's library which will take strings, so I
need it to take The One True String Type. I'll be calling some of his
functions, passing strings, and also passing functions which take
strings as parameters. Oh, and I'll be passing functions that take
references to strings, and references to strings.

Keep going.

I don't want the overhead of indirect calls normally implied by the use
of classes for what should be a basic type

I initially thought that the two different string types you were
talking about were char* and wchar_t* in C; on the basis of that
assumption, I wrote:

Well, I don't want that...

However, here it sounds like you're referring to char* and
std::string.

and I don't want that either.

In that case, keep in mind that support for char* is
mandated by the fact that one of the design objectives for C++ has
always been to remain backwards compatible with C. That objective has
occasionally been sacrificed in order to meet other objectives, but
there's no compelling need to delete support for C-style null-
terminated strings.

I agree there are good reasons for C++ supporting C style strings, and
I'm not saying it should be dropped in C++. However, such support is one
reason for using a different language which does not have that baggage
when I want to do serious string processing.

Also, I don't see that it being implemented by a library (as far as user
level use is concerned) gives any benefit over a different language
where strings are a basic type in the language.

robertwessel2 · Aug 4, 2009

Ah, but I want to use Fred Blog's library which will take strings, so I
need it to take The One True String Type. I'll be calling some of his
functions, passing strings, and also passing functions which take
strings as parameters. Oh, and I'll be passing functions that take
references to strings, and references to strings.

Well, if you've written your own string class, you can create a
conversion operator that will automatically convert it to a standard
string when you're calling something that expects that. Conversion
functions are easy to overuse, so should be used sparingly, but they
can make that kind of thing a lot more transparent.
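
As a rough sketch of the conversion operator idea (MyString and library_length are made-up names for illustration, not anything from a real library), assuming the home-grown class can produce a std::string on demand:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical home-grown string class; the name MyString is illustrative only.
class MyString {
public:
    explicit MyString(std::string text) : text_(std::move(text)) {}

    // Conversion operator: lets a MyString be passed where a std::string is expected.
    operator std::string() const { return text_; }

private:
    std::string text_;  // storage delegated to std::string to keep the sketch short
};

// Stand-in for a third-party function that only knows about std::string.
std::size_t library_length(const std::string& s) { return s.size(); }

// The conversion happens implicitly at the call site.
void conversion_example() {
    MyString mine("hello");
    assert(library_length(mine) == 5);
}
```

Each implicit conversion here builds a temporary std::string, which is one reason to use such operators sparingly.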

I don't want the overhead of indirect calls normally implied by the use
of classes for what should be a basic type

There shouldn't be an indirect call to execute a member function for
something like std::string, since there are no virtual functions.
Implementation oddities aside, of course: I've never seen a
std::string that requires virtual function calls.
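
One way to check this at compile time is the standard std::is_polymorphic trait. A small sketch (WithVirtual is a made-up contrast class, not part of any library):

```cpp
#include <string>
#include <type_traits>

// A class with a virtual member is polymorphic: calls made through a base
// pointer or reference may need an indirect (vtable) call.
struct WithVirtual {
    virtual ~WithVirtual() = default;
    virtual int size() const { return 0; }
};

// std::string declares no virtual members, so it is not polymorphic.
static_assert(!std::is_polymorphic<std::string>::value,
              "std::string should have no virtual functions");
static_assert(std::is_polymorphic<WithVirtual>::value,
              "WithVirtual has virtual functions");
```

If the first assertion ever failed, the translation unit would simply not compile.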

Well, I don't want that...

and I don't want that either.

I agree there are good reasons for C++ supporting C style strings, and
I'm not saying it should be dropped in C++. However, such support is one
reason for using a different language which does not have that baggage
when I want to do serious string processing.

Well, std::string is only a specialization of the basic_string
template, specifically it's a basic_string composed of chars.
std::wstring is for strings of wchar_t's. You could define others if
you felt the need (although you'd probably have to provide a new
char_traits for the new type).
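
The relationship can be stated directly with static assertions. This sketch only restates what the standard library headers already define:

```cpp
#include <memory>
#include <string>
#include <type_traits>

// std::string and std::wstring are aliases for basic_string specializations.
static_assert(std::is_same<std::string, std::basic_string<char>>::value,
              "std::string is basic_string<char>");
static_assert(std::is_same<std::wstring, std::basic_string<wchar_t>>::value,
              "std::wstring is basic_string<wchar_t>");

// The same alias with the defaulted template arguments spelled out in full.
static_assert(
    std::is_same<std::string,
                 std::basic_string<char, std::char_traits<char>,
                                   std::allocator<char>>>::value,
    "traits and allocator default to char_traits<char> and allocator<char>");
```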

Also, std::string provides for automatic conversion from C style strings
in most cases, and a fairly straightforward way (you have to
explicitly request it with the c_str() function) to get a C style
string out of a std::string.
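
A short sketch of both directions (c_api_length and cpp_api_length are hypothetical stand-ins for C style and C++ style interfaces):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Stand-in for a C style API that expects a null-terminated char array.
std::size_t c_api_length(const char* s) { return std::strlen(s); }

// Stand-in for a C++ style API that expects a std::string.
std::size_t cpp_api_length(const std::string& s) { return s.size(); }

void interop_example() {
    const char* c_text = "hello";

    // C string to std::string: implicit, via the converting constructor.
    std::string s = c_text;
    assert(cpp_api_length(c_text) == 5);

    // std::string to C string: must be requested explicitly with c_str().
    assert(c_api_length(s.c_str()) == 5);
}
```

The pointer returned by c_str() is only valid while the std::string is alive and unmodified, so it should not be stored for later use.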

Also, I don't see that it being implemented by a library (as far as user
level use is concerned) gives any benefit over a different language
where strings are a basic type in the language.

Well, it doesn't necessarily hurt to do so either. And it might give
you more flexibility later - for example, you could easily define some
other string type by specializing basic_string differently.
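
As an illustration of that flexibility, here is a minimal sketch of a case-insensitive string built by giving basic_string a different traits class. The names ci_char_traits and ci_string are made up for this example, and the comparison assumes the "C" locale behaviour of std::tolower:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Traits class that compares characters without regard to case.
struct ci_char_traits : std::char_traits<char> {
    static char lower(char c) {
        return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    static bool eq(char a, char b) { return lower(a) == lower(b); }
    static bool lt(char a, char b) { return lower(a) < lower(b); }

    static int compare(const char* a, const char* b, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(a[i], b[i])) return -1;
            if (lt(b[i], a[i])) return 1;
        }
        return 0;
    }
    static const char* find(const char* s, std::size_t n, char c) {
        for (std::size_t i = 0; i < n; ++i) {
            if (eq(s[i], c)) return s + i;
        }
        return nullptr;
    }
};

// A different specialization of basic_string: same storage, new comparison rules.
using ci_string = std::basic_string<char, ci_char_traits>;

void ci_example() {
    ci_string a = "Hello";
    ci_string b = "hELLO";
    assert(a == b);            // equal when case is ignored
    assert(a.find('L') == 2);  // first 'l' or 'L' is at index 2
}
```

Note that ci_string and std::string are distinct types, so passing one where the other is expected needs an explicit conversion.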

spinoza1111 · Aug 5, 2009

spinoza1111 said:

Wrong. In the absence of understanding, terminology is meaningless,
and therefore unimportant. With understanding, however, comes the
realisation that communication is important. The proper use of
terminology greatly eases communication. That is why the proper use
of terminology is important. In the case of "sentinel" vs
"terminator", however, neither term is incorrect, and an
understanding of both is useful because some people use one and some
the other (and, no doubt, some use both).

The very idea that communication is even possible assumes the absence
of bullshit.

Furthermore, it is not true as a general law or saw that "the proper
use of terminology greatly eases communication", because I believe
that what you in turn consider "the proper use of terminology" is
something learned by rote.

In my poem, I draw a distinction between what "Master
Kong" (Confucius) meant by the "rectification of names" and "proper
terminology" because in my experience pedants can use proper
terminology without understanding "names". "Terminology" means
specific symbols whereas Kong's names were shared ideas in Husserl's
sense.

The terminology-obsessed flame and bully the person who understands
what they understand but uses a different language, whereas the
scholar through civility attempts to find common ground. I don't think
you're good at this.

The scholar really, really does not care what you call a NUL at the
end of the string. He searches instead in de Saussure's sense for a
structural relationship.

Phil Carmody · Aug 5, 2009

Beej Jorgensen said:
What I'm saying is: if you imagine that the string implementation is
simply a conforming black box, then you are still subject to the same
terminator-related security issues for every possible implementation,
both of arrays and linked lists. Eric provided an example of this.

OK. In-band versus out-of-band control. Absolutely.

Phil
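
To make the in-band versus out-of-band distinction concrete, here is a small sketch (the function names are made up): a std::string keeps its length out of band, so it can hold an embedded zero byte, while anything that relies on the in-band terminator stops at the first one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Builds a five-character string whose third character is a zero byte.
std::string make_with_embedded_nul() {
    return std::string("ab\0cd", 5);  // explicit length, so the NUL is kept
}

// Length as seen through the out-of-band count stored in the object.
std::size_t counted_length(const std::string& s) { return s.size(); }

// Length as seen by code that scans for the in-band terminator.
std::size_t terminated_length(const std::string& s) {
    return std::strlen(s.c_str());
}

void in_band_example() {
    const std::string s = make_with_embedded_nul();
    assert(counted_length(s) == 5);
    assert(terminated_length(s) == 2);  // scanning stops at the embedded NUL
}
```

The mismatch between the two lengths is the kind of disagreement that terminator-related security bugs tend to exploit.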

Nick Keighley · Aug 5, 2009

[zero terminated strings]
<snip>

One could inductively reason about
the list of requirements that led to the null-terminated design, but there
probably wasn't any such formal or thorough analysis. The analysis is easier
done in retrospect. I'm not sure what portion of the C/C++ programming
languages crowd has either opinion, but I have an inkling that most would
opt for a different solution if it was C language creation time right now.
For doing major string handling you want a language where strings are a
first class type, at which point whether the strings are counted, use a
sentinel or whatever becomes largely irrelevant to the person using the
language (there are ways of embedding a sentinel in a
sentinel-terminated blob of data).

by first class you mean assignable and capable of being compared
for equality?

string s;
string t = "hello";
s = "hello";
if (s == t) f (s);

You can write code like that in C++

And then if f is defined as

void f(string str)
{
/* Modify str */

}

Then str is passed by value (if other built in types are) so the
modification to str does not affect s in the caller...

All the memory management handled for you, just as it is for type int...
including when you do things like string concatenation etc

The type being available without having to take extra measures...
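
For comparison, a sketch of how the same properties look with std::string in C++ (append_world is a made-up name; the original f is not shown in full, so this is only an assumed stand-in):

```cpp
#include <cassert>
#include <string>

// Takes its argument by value, so it works on a private copy.
std::string append_world(std::string str) {
    str += " world";  // modifies the copy only
    return str;
}

void value_semantics_example() {
    std::string s;
    std::string t = "hello";
    s = "hello";  // assignment copies the characters, no manual allocation

    if (s == t) {  // compares contents, not addresses
        std::string modified = append_world(s);
        assert(s == "hello");  // the caller's string is untouched
        assert(modified == "hello world");
    }
    // Memory for s, t and modified is released automatically at scope exit.
}
```

The one extra measure needed is the #include of the string header, since the type comes from the library rather than the core language.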

a language that provides the right hooks can add first class types.
Eg. C++ (and maybe Navia-C)

in what way has C++ "faked it"? You can do some pretty clever things
with C++ strings.

Well, I can't say I've learned C++, so I don't know how closely it fits.

I'm not sure I'd like to rely on an application that relied
on "simple tricks" to recover from comms errors!

Why not use a decent link-level protocol?

Simple "tricks" are all you need to implement a decent link level
protocol. I know, I've done it. I did it so well that it worked with the
data going across 4 serial links, one of which was unshielded 3-wire
(data-in, data-out and signal ground) in an electrically noisy environment.

Wait for a valid first byte of a header (anything else you receive is
obviously invalid)

Check the header bytes (including message type and length, for which you
know the valid range) as they are coming in and assume that you have
dropped at least one byte if any of them are wrong. Check the checksum
at the end.
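
The scheme described above can be sketched as a byte-at-a-time state machine. Everything specific here is an assumption for illustration, since the post gives no frame layout: the start byte value, the valid type and length ranges, and the simple additive checksum are all invented.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed frame layout: start byte, type, length, payload bytes, checksum.
constexpr std::uint8_t kStartByte = 0xA5;  // assumed header marker
constexpr std::uint8_t kMaxType = 4;       // assumed valid types: 1..4
constexpr std::uint8_t kMaxLength = 32;    // assumed valid payload lengths: 0..32

class FrameReceiver {
public:
    // Feed one received byte; returns true when a complete valid frame has arrived.
    bool feed(std::uint8_t byte) {
        switch (state_) {
        case State::WaitStart:  // anything but the start byte is ignored
            if (byte == kStartByte) state_ = State::Type;
            return false;
        case State::Type:
            if (byte < 1 || byte > kMaxType) return resync(byte);
            sum_ = byte;
            state_ = State::Length;
            return false;
        case State::Length:
            if (byte > kMaxLength) return resync(byte);
            length_ = byte;
            sum_ = static_cast<std::uint8_t>(sum_ + byte);
            payload_.clear();
            state_ = (length_ == 0) ? State::Checksum : State::Payload;
            return false;
        case State::Payload:
            payload_.push_back(byte);
            sum_ = static_cast<std::uint8_t>(sum_ + byte);
            if (payload_.size() == length_) state_ = State::Checksum;
            return false;
        case State::Checksum:  // final check, then hunt for the next frame
            state_ = State::WaitStart;
            return byte == sum_;
        }
        return false;
    }

private:
    enum class State { WaitStart, Type, Length, Payload, Checksum };

    // A bad header field means at least one byte was probably dropped.
    bool resync(std::uint8_t byte) {
        state_ = (byte == kStartByte) ? State::Type : State::WaitStart;
        return false;
    }

    State state_ = State::WaitStart;
    std::uint8_t length_ = 0;
    std::uint8_t sum_ = 0;
    std::vector<std::uint8_t> payload_;
};

// Helper for experiments: builds a frame in the assumed layout.
std::vector<std::uint8_t> make_frame(std::uint8_t type,
                                     const std::vector<std::uint8_t>& payload) {
    std::vector<std::uint8_t> frame;
    frame.push_back(kStartByte);
    frame.push_back(type);
    frame.push_back(static_cast<std::uint8_t>(payload.size()));
    std::uint8_t sum = static_cast<std::uint8_t>(type + payload.size());
    for (std::uint8_t b : payload) {
        frame.push_back(b);
        sum = static_cast<std::uint8_t>(sum + b);
    }
    frame.push_back(sum);
    return frame;
}

// Counts how many valid frames a receiver finds in a byte stream.
std::size_t count_valid_frames(const std::vector<std::uint8_t>& bytes) {
    FrameReceiver rx;
    std::size_t count = 0;
    for (std::uint8_t b : bytes) {
        if (rx.feed(b)) ++count;
    }
    return count;
}
```

Rejecting a frame as soon as a header field is out of range is what lets the receiver recover quickly after a dropped byte, rather than waiting for a checksum that will never match.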

If you are happy with a checksum then you didn't have a very noisy
line.

"simple tricks" go something like

parity, checksum, CRC, Golay encoding (or other interleaving scheme), FEC

I submit that somewhere in the middle of that it stops being simple.

<snip basic link layer protocol>

Take a look at LAP-D or a variant.

What is relevant to my original point (giving people experience of
working on systems with highly unreliable data transfer, and having them
solve the problem) is that seeing the interface fail abysmally (before I
rewrote it as described) because there was *not* a link level protocol
seriously hammered into me how unreliable data is when it comes from
elsewhere, *even* when the "elsewhere" is another computer sitting
within about four feet of the final destination and I have complete
control over all systems involved!

I suppose inventing all this is educational (I've built such protocols
myself) but you'd really be better off not reinventing wheels.

Tanenbaum's books used to be good (they even have jokes in them!)

James Kuyper · Aug 5, 2009

Flash said:
jameskuyper wrote: ....

Ah, but I want to use Fred Blog's library which will take strings, so I
need it to take The One True String Type. I'll be calling some of his
functions, passing strings, and also passing functions which take
strings as parameters. Oh, and I'll be passing functions that take
references to strings, and references to strings.

You'll have to talk with Fred Blog about that, then. That will be
equally true of any sufficiently general language, even if it does
provide a built-in string type. Fred Blog can still choose to implement
his own alternative to the built-in type; the only way you can prevent
that is by making the language insufficiently powerful to implement such
alternatives.

I don't want the overhead of indirect calls normally implied by the use
of classes for what should be a basic type

As a general rule, despite the fact that indirect calls are implied,
they (or at least, the associated overheads) won't actually happen with
an implementation that provides decent optimization. That's because most
of the critical function calls will be inlined.

Conversely, with a sufficiently poor implementation of a language that does
have strings as a basic type, the overheads of indirect calls may occur
despite the fact that they are not implied.

Also, I don't see that it being implemented by a library (as far as user
level use is concerned) gives any benefit over a different language
where strings are a basic type in the language.

As a template library, it is potentially vastly more flexible than a
built-in type. You've indicated that you don't want to make any use of
such flexibility, and that's fine - you don't have to pay any attention
to the other ways the templates can be used, and you won't pay any
significant performance penalty just because those other uses are
possible. However, for someone who does want to provide their own
alternative char-like type and/or an alternative to
std::char_traits<char>, it's convenient to know that there's an existing
framework they can build upon, they don't need to build the whole thing
themselves.

However, I'm not in complete disagreement with you on this; I prefer C's
implementation of complex numbers to the implementation by C++,
precisely because I believe that it could, if implementors cared to, be
optimized more efficiently as a built-in type than as a C++ class. My
gut feeling is that this advantage would be smaller for a built-in
string type than for a built-in complex type. I would expect that many
implementations not specifically targeted at the numerical analysis
field don't put more than the minimum amount of effort into complex math
needed to qualify as conforming, in which case the difference could go
either way.

I've no actual measurements to back that up, but then my current work
doesn't involve complex math. If and when I ever have a need to do
complex math, and get to choose between using C++ and C99 to do it, then
I'll make appropriate measurements.

Flash Gordon · Aug 5, 2009

Nick said:

[zero terminated strings]

Simple "tricks" are all you need to implement a decent link level
protocol. I know, I've done it. I did it so well that it worked with the
data going across 4 serial links, one of which was unshielded 3-wire
(data-in, data-out and signal ground) in an electrically noisy environment.

Wait for a valid first byte of a header (anything else you receive is
obviously invalid)

Check the header bytes (including message type and length, for which you
know the valid range) as they are coming in and assume that you have
dropped at least one byte if any of them are wrong. Check the checksum
at the end.

If you are happy with a checksum then you didn't have a very noisy
line.

It wasn't just checksum. As I said, there were other checks on the header.

"simple tricks" go something like

parity,

Handled by the UART on a byte by byte basis. As are start/stop bits.

checksum, crc,

CRC is a type of checksum (that is what the last C stands for). I can't
remember what type of checksum was used.
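
For readers comparing the two, here is a sketch of a plain additive checksum next to a CRC. The project's actual algorithm is not known, so the CRC variant shown (CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF) is just one common choice picked for illustration.

```cpp
#include <cstdint>
#include <string>

// Additive checksum: sum of all bytes modulo 256.
std::uint8_t sum8(const std::string& data) {
    std::uint8_t sum = 0;
    for (char c : data) {
        sum = static_cast<std::uint8_t>(sum + static_cast<unsigned char>(c));
    }
    return sum;
}

// Bitwise CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF,
// no input or output reflection, no final XOR.
std::uint16_t crc16_ccitt(const std::string& data) {
    std::uint16_t crc = 0xFFFF;
    for (char c : data) {
        // Bring the next byte into the top of the 16-bit register.
        crc = static_cast<std::uint16_t>(
            crc ^ (static_cast<std::uint16_t>(static_cast<unsigned char>(c)) << 8));
        for (int bit = 0; bit < 8; ++bit) {
            if (crc & 0x8000) {
                crc = static_cast<std::uint16_t>((crc << 1) ^ 0x1021);
            } else {
                crc = static_cast<std::uint16_t>(crc << 1);
            }
        }
    }
    return crc;
}
```

An additive sum cannot tell "ab" from "ba" because addition ignores byte order, whereas the CRC of those two inputs differs. That is one reason a CRC is usually preferred on a noisy line.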

Golay encoding (or other interleaving scheme), FEC

I did not need to go that far to successfully reject the invalid
packets, and some of the time it was rejecting more packets than it was
receiving.

I submit that somewhere in the middle of that it stops being simple.

<snip basic link layer protocol>

Take a look at LAP-D or a variant.

I agree you can get on to far more complex schemes.

I suppose inventing all this is educational (I've built such protocols
myself) but you'd really be better off not reinventing wheels.

Some of it was already invented, and I was young and naive. It was also
easy to implement what I did in assembler, which was the only language I
had available on this project.

Tanenbaum's books used to be good (they even have jokes in them!)

Maybe, but I'm out of that market now. However, the original point
stands, that having to deal with something where you get enough visible
corruption teaches you the need to validate.

People need to actually experience bad data breaking things before they
really learn the need to validate it.

robertwessel2 · Aug 5, 2009

However, I'm not in complete disagreement with you on this; I prefer C's
implementation of complex numbers to the implementation by C++,
precisely because I believe that it could, if implementors cared to, be
optimized more efficiently as a built-in type than as a C++ class. My
gut feeling is that this advantage would be smaller for a built-in
string type than for a built-in complex type. I would expect that many
implementations not specifically targeted at the numerical analysis
field don't put more than the minimum amount of effort into complex math
needed to qualify as conforming, in which case the difference could go
either way.

I've no actual measurements to back that up, but then my current work
doesn't involve complex math. If and when I ever have a need to do
complex math, and get to choose between using C++ and C99 to do it, then
I'll make appropriate measurements.

Even that's up to the compiler implementer. Many compilers implement
intrinsics for many floating point functions, and if there was
benefit, that could certainly be applied to members of complex. Intel
actually does supply a (small) handful of complex intrinsics in ICC.
OTOH, I'm not sure how much better the compiler could do for addition,
subtraction and multiplication, the inlined versions of those are
certainly straight-forward enough, and should be subject to the usual
optimizations. Division is a bit messier, of course (although it does
fold to a pair of relatively simple expressions).
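
For reference, the textbook forms of those expressions can be sketched as follows (naive_multiply and naive_divide are made-up names; library implementations of std::complex are free to do something more careful):

```cpp
#include <complex>

using Complex = std::complex<double>;

// (a + bi)(c + di) = (ac - bd) + (ad + bc)i
Complex naive_multiply(Complex x, Complex y) {
    const double a = x.real(), b = x.imag();
    const double c = y.real(), d = y.imag();
    return Complex(a * c - b * d, a * d + b * c);
}

// (a + bi) / (c + di) = ((ac + bd) + (bc - ad)i) / (c*c + d*d)
Complex naive_divide(Complex x, Complex y) {
    const double a = x.real(), b = x.imag();
    const double c = y.real(), d = y.imag();
    const double denom = c * c + d * d;  // can overflow or underflow for extreme inputs
    return Complex((a * c + b * d) / denom, (b * c - a * d) / denom);
}
```

The division is messier in practice mainly because c*c + d*d can overflow or underflow even when the true quotient is representable, and because infinities and NaNs need special handling. This sketch ignores both concerns.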

Antoninus Twink · Aug 12, 2009

Is there something like "a cast of characters FAQ" for this NG
somewhere?

There was a thread a year or two ago categorizing the "regulars" into a
series of archetypal Usenet types as found on some webpage. It was funny,
insightful and fairly amusing. Unfortunately, Google's completely broken
Usenet search tool won't let me find it at the moment.

I wanna know whose posts I can skip over

You'll soon figure it out.

rather than when just feeling like watching a SITCOM!

Sadly, watching the regs' antics is about the best part of this group,
as they've done their best to stifle all interesting technical
discussions over many years.

Antoninus Twink · Aug 12, 2009

Is there ANYONE in this newsgroup not personally attacking someone??!

No one springs to mind.
