Zero terminated strings

Keith Thompson

Nobody said:
I think he's talking about NULs.

I think he's talking about null characters.

ASCII and EBCDIC use the term NUL to refer to the null character
(i.e., the character with encoding 0); the C standard does not.
 
Tech07

Keith Thompson said:
Richard Heathfield said:
spinoza1111 said: [snip]
Oh, was that a little close to the bone? I apologise. I didn't realise
you were sensitive to insanity references. I will try to remember not
to make such references in future - although I don't promise always
to succeed in that attempt. After all, almost everything you post is
redolent of insanity. Nevertheless, I'll do my best to refrain.

Richard, I suggest that it's time to stop feeding the troll. If you
refrained from responding to anything spinoza1111 posts (by use of
a killfile if necessary), I think it would substantially improve
this newsgroup's signal-to-noise ratio. Almost everyone here
already knows that spinoza1111 is wrong about nearly everything;
those few who don't won't be convinced otherwise.

Is there something like "a cast of characters FAQ" for this NG somewhere?
(Can you say "Desperate Housewives" counterparts? Hehehe.) I wanna know
whose posts I can skip over when I'm looking for on-topic info rather than
when just feeling like watching a SITCOM! :p
 
Eric Sosman

Tech07 said:
Is there something like "a cast of characters FAQ" for this NG somewhere?
(Can you say "Desperate Housewives" counterparts? Hehehe.) I wanna know
whose posts I can skip over when I'm looking for on-topic info rather than
when just feeling like watching a SITCOM! :p

There is, and I maintain it with the Tools->Message Filters
function of Thunderbird. I will, however, not share it; my
ideas of "signal" and "noise" are quite obviously not universal.

I revised my FAQ not long ago, and am regretting it.
 
Tech07

Eric Sosman said:
Tech07 said:
Nulls in linked-list links are not data. They are part of the machinery
of a program and under developer control.

This distinction is too subtle for me. You're saying that
charArray[42] is data but that ptr->next is not? And that
ptr->next is "under developer control" but charArray[42] is
not? What sort of "control" can the developer exert on a
piece of memory that is "not data" like ptr->next?

You tried to lump together into a single category every possible pattern
that uses some kind of sentinel. I've nothing more to say on that than what
I already have. Think about it some more and move on.
 
Tech07

Kenny McCormack said:
<snip>

Eric really is a complete and total loon - and should be completely
ignored (especially by you, Jacob), except as an object of ridicule (the
purpose of this post).

The level of "loon" that Eric boy displays, in every post, is simply,
to put it bluntly (and there is no other way to put it), breathtaking.

To so completely miss the point of everything (as he consistently does)
is, truly, a skill.

Is there ANYONE in this newsgroup not personally attacking someone??! Quit
with the ad hominem already, puh-leeez! Geez.
 
Tech07

Flash Gordon said:
Gareth said:
That depends. It's probably true for long strings. If your string
implementation looks roughly like

struct string
{
    unsigned len;
    char * str;
};

then for short strings, looking up the length could easily cause a
cache miss -- or even a page fault -- depending on the memory access
pattern, and string accesses require an extra level of indirection.
Null-terminated strings have guaranteed memory locality.

A more likely structure would be...

struct string
{
    unsigned len; /* possibly size_t rather than unsigned */
    char str[];
};

Either form is correct when applied against the appropriate goals of the
string. That is key. The first step is to define the requirements (and there
is not just one valid set of those), rather than trying to "find the problem
to solve given a solution".
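
To make the two layouts concrete, here is a minimal C99 sketch. The names ptr_string, flex_string and flex_string_new are made up for illustration; neither poster showed an allocation routine. The point it tries to show is that the flexible-array layout keeps the length and the characters in one block, so a single malloc call covers both.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Layout 1: length plus a pointer; the characters live elsewhere. */
struct ptr_string {
    size_t len;
    char  *str;
};

/* Layout 2: length followed directly by the characters
   (C99 flexible array member), so one allocation holds both. */
struct flex_string {
    size_t len;
    char   str[];
};

/* Hypothetical helper: build a flex_string from len bytes at src.
   A terminating null character is appended for convenience; it is
   not counted in len. Returns NULL if the allocation fails.
   The caller releases the result with free(). */
struct flex_string *flex_string_new(const char *src, size_t len)
{
    struct flex_string *s;

    assert(src != NULL);
    s = malloc(sizeof *s + len + 1);
    if (s == NULL)
        return NULL;
    s->len = len;
    memcpy(s->str, src, len);
    s->str[len] = '\0';
    return s;
}
```

A call such as flex_string_new("Hello", 5) yields one block holding both the count and the bytes. With ptr_string the struct and the character storage can end up far apart in memory, which is the locality concern raised above.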
 
Eric Sosman

Tech07 said:
Eric Sosman said:
Tech07 said:
But in any case, what's special about strings? If using a sentinel
to terminate a string is a Bad Thing, why isn't it also a Bad Thing to
use a sentinel to terminate other sequences? Linked lists, for example:
Do you stop traversing when you find a NULL link (or an "I'm the end"
bit), or do you stop when a node counter tells you to?

Nulls in linked-list links are not data. They are part of the machinery
of a program and under developer control.

This distinction is too subtle for me. You're saying that
charArray[42] is data but that ptr->next is not? And that
ptr->next is "under developer control" but charArray[42] is
not? What sort of "control" can the developer exert on a
piece of memory that is "not data" like ptr->next?

You tried to lump together into a single category every possible pattern
that uses some kind of sentinel. I've nothing more to say on that than what
I already have. Think about it some more and move on.

Okay, so your distinction between "data" and "machinery"
is at best a side-issue. But my question remains: If sentinels
are evil when they terminate strings, why are they not evil
when they terminate other sequential structures? Or turn it
around: What property of strings distinguishes them from other
sequential structures in such a way as to make sentinels evil?

Long, long ago I used a language that represented strings
as linked lists of characters: a 32-bit word for each, with
the character code in eight high-order bits and a 24-bit pointer
in the rest. The last character of the string had a zero link.
Was this sentinel value evil because it terminated a string? Or
was it excused because it was "machinery" for a linked list?
Did its string-ness or its list-ness carry the day? Why?
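
The representation described above can be sketched in C roughly as follows. This is an illustration under assumptions, not the original language's implementation: "memory" is modeled as an array of 32-bit cells, a link is an index into that array, and index 0 is reserved so that a zero link can mark the end. The names CELL_MAKE, CELL_CHAR, CELL_LINK and cell_string_read are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One cell: character code in the top 8 bits, 24-bit link below.
   A link of 0 marks the last character of the string. */
#define CELL_MAKE(ch, link) \
    (((uint32_t)(unsigned char)(ch) << 24) | ((uint32_t)(link) & 0xFFFFFFu))
#define CELL_CHAR(cell) ((unsigned char)((cell) >> 24))
#define CELL_LINK(cell) ((cell) & 0xFFFFFFu)

/* Walks the list that starts at cell index head and copies at most
   cap characters into out. Returns the number of characters in the
   list. A head of 0 denotes the empty string. */
size_t cell_string_read(const uint32_t *mem, uint32_t head,
                        unsigned char *out, size_t cap)
{
    size_t n = 0;
    uint32_t at = head;

    assert(mem != NULL && out != NULL);
    while (at != 0) {
        if (n < cap)
            out[n] = CELL_CHAR(mem[at]);
        n++;
        at = CELL_LINK(mem[at]);
    }
    return n;
}
```

In this sketch the end marker lives in the link field rather than in the character field, so a character code of 0 can appear inside the string without ending it.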
 
Alan Curry

The CA will issue the certificate for a domain like
PayPal.com\0.badguy.com because the hacker legitimately owns the root
domain badguy.com. Then, due to a flaw found in the way SSL is
implemented in many browsers, Firefox and others theoretically can be
fooled into reading his certificate as if it were one that came from the

Opportunity for "new string library" advocates: submit your patch to convert
Firefox completely to your favorite new string library, immediately solving
all bugs in this category. I'm sure they'll thank you.
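
For readers who want to see the mechanics of this class of bug, here is a small C sketch. It is not taken from any browser or SSL library; the function names are invented for illustration. It contrasts a comparison that treats the name as a null-terminated string with one that uses an explicit length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Compares as C strings: stops at the first null character
   in either input. */
int names_equal_cstr(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return strcmp(a, b) == 0;
}

/* Compares as counted byte sequences: every byte, including
   embedded null characters, takes part in the comparison. */
int names_equal_counted(const char *a, size_t alen,
                        const char *b, size_t blen)
{
    assert(a != NULL && b != NULL);
    return alen == blen && memcmp(a, b, alen) == 0;
}
```

Given the 22-byte name "PayPal.com\0.badguy.com" and the 10-byte name "PayPal.com", the first function reports a match because strcmp never looks past the embedded null character, while the second reports a mismatch because the lengths differ.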
 
Kenny McCormack

Tech07 said:
Is there ANYONE in this newsgroup not personally attacking someone??! Quit
with the ad hominem already, puh-leeez! Geez.

Mr. Sousedman never tires of beating up on Jacob.
I guess it is all the life he has.
 
James Kuyper

Tech07 wrote:
....
Is there ANYONE in this newsgroup not personally attacking someone??! Quit
with the ad hominem already, puh-leeez! Geez.

Until you've got a kill-file set up to implement your personal
preferences, this newsgroup can be pretty aggravating. Don't worry, it
gets better once you've jettisoned enough of the noise.
 
Tech07

Tech07 said:
"string" is actually the more general computer science term. "character
string" is a sub category. "string" is used as a synonym for "byte string"
where each byte may be 0 thru 255. In this thread though, "string" was
apparently being used as "character string"???

Above, of course, I was limiting the notion of "character string" to
something akin to "ASCII string". Throw in Unicode and it's a whole
different ballgame.
 
Tech07

Keith Thompson said:
I think he's talking about null characters.

ASCII and EBCDIC use the term NUL to refer to the null character
(i.e., the character with encoding 0); the C standard does not.

While that is factual, it is not correct (whoever made ASCII, not you).
While there are 128 ASCII codepoints (outside of "extended 8-bit ASCII"),
only a subset are characters or printables. 'NUL' is the name given to the
codepoint 0, and yes, apparently/unfortunately they call it "the null
character" even though it is not a character at all.
 
Tech07

Alan Curry said:
Opportunity for "new string library" advocates: submit your patch to convert
Firefox completely to your favorite new string library, immediately solving
all bugs in this category. I'm sure they'll thank you.

I don't think a library is sufficient. Primitive strings are inextricably
tied to the implementation (compiler level). To be comprehensive about it,
you need compiler/language support.
 
Tech07

Eric Sosman said:
Tech07 said:
Eric Sosman said:
Tech07 wrote:

But in any case, what's special about strings? If using a sentinel
to terminate a string is a Bad Thing, why isn't it also a Bad Thing to
use a sentinel to terminate other sequences? Linked lists, for example:
Do you stop traversing when you find a NULL link (or an "I'm the end"
bit), or do you stop when a node counter tells you to?

Nulls in linked-list links are not data. They are part of the machinery
of a program and under developer control.

This distinction is too subtle for me. You're saying that
charArray[42] is data but that ptr->next is not? And that
ptr->next is "under developer control" but charArray[42] is
not? What sort of "control" can the developer exert on a
piece of memory that is "not data" like ptr->next?

You tried to lump together into a single category every possible pattern
that uses some kind of sentinel. I've nothing more to say on that than
what I already have. Think about it some more and move on.

Okay, so your distinction between "data" and "machinery"
is at best a side-issue.

Not hardly. That you can't see the difference of concerns between terminated
character strings and a linked list node using a null or sentinel to
indicate end-of-list or beg-of-list behooves you to do some homework and
write an essay about it, IMO.

But my question remains: If sentinels

I wouldn't call the null terminator a sentinel, but I know what you mean.

are evil when they terminate strings, why are they not evil
when they terminate other sequential structures?

Again, I'm not having that discussion with you.

Or turn it
around: What property of strings distinguishes them from other
sequential structures in such a way as to make sentinels evil?

THAT is your homework assignment! (Part of it anyway).

Long, long ago I used a language that represented strings
as linked lists of characters: a 32-bit word for each, with
the character code in eight high-order bits and a 24-bit pointer
in the rest. The last character of the string had a zero link.
Was this sentinel value evil because it terminated a string? Or
was it excused because it was "machinery" for a linked list?
Did its string-ness or its list-ness carry the day? Why?

See, you have your essay introduction started already. Less than 20 pgs if
double spaced please.
 
Chris M. Thomasson

Richard Heathfield said:
Chris M. Thomasson said:
[Null-terminated strings] have advantages and disadvantages. They
are simple, they are supported, and they are fast
[...]

I am not exactly sure how using zero-terminated strings could be
faster than using a string abstraction which always knows its
length:

Well, I did say "fast", not "faster" or "fastest"!

My bad. Sorry about that.



Having said that,
allocation does take time, which is why we so often point out that it
is wise to minimise the number of malloc calls - and how are you
going to get space for your counted string if not via malloc?

I am probably totally misunderstanding you but what about something like:
______________________________________________________________________
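#include <stdio.h>
#include <stddef.h>


/* Counted string: the length is stored next to the pointer. */
struct string {
    size_t len;
    char* buf;
};


/* Works only with a string literal: sizeof counts the terminating
   null character, hence the - 1. */
#define STRING_STATICINIT(mp_str) { \
    sizeof(mp_str) - 1, mp_str \
}


int main(void) {
    struct string const str = STRING_STATICINIT("Hello");
    printf("len = %lu\nbuf = %s\n", (unsigned long)str.len, str.buf);
    getchar();
    return 0;
}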
______________________________________________________________________




Granted this string is immutable. But how about this hack:
______________________________________________________________________
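#include <stdio.h>
#include <stddef.h>


#define CONCAT_X(mp_token1, mp_token2) \
    mp_token1##mp_token2

#define CONCAT(mp_token1, mp_token2) \
    CONCAT_X(mp_token1, mp_token2)


struct string {
    size_t len;
    char* buf;
};


/* Defines a writable array holding the characters, then a struct string
   that points at it. The pointer is assigned in a separate statement,
   which keeps the initializer list constant (C90 requires that for
   automatic aggregates). Because the macro ends in a statement, use it
   after all other declarations in a C90 block. */
#define STRING_DEFINE(mp_name, mp_str) \
    char CONCAT(dummy_, mp_name)[] = mp_str; \
    struct string mp_name = { sizeof(mp_str) - 1, NULL }; \
    mp_name.buf = CONCAT(dummy_, mp_name)


int main(void) {
    STRING_DEFINE(str, "Hello");
    printf("len = %lu\nbuf = %s\n", (unsigned long)str.len, str.buf);
    getchar();
    return 0;
}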
______________________________________________________________________



Don't
get me wrong - it's a trade-off, and counted strings do have
significant speed advantages too, some of which Jacob Navia has
already pointed out. The longer the strings you have to deal with,
the more significant these gains are.

Agreed.
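
As a rough illustration of where the speed advantage for long strings comes from, here is a sketch of an append operation on a counted buffer. The struct cbuf and the function cbuf_append are hypothetical names, and the fixed-capacity design is an assumption made to keep the example short. Because the current length is stored, the append goes straight to the end of the data instead of scanning for a terminator the way strcat must.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted buffer with a fixed capacity.
   Invariant: len <= cap. */
struct cbuf {
    size_t len;   /* bytes currently in use */
    size_t cap;   /* total bytes available in data */
    char  *data;
};

/* Appends n bytes from src. No scan for a terminator is needed
   because the current length is already known.
   Returns 1 on success, 0 if there is not enough room. */
int cbuf_append(struct cbuf *b, const char *src, size_t n)
{
    assert(b != NULL && src != NULL);
    if (n > b->cap - b->len)
        return 0;
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 1;
}
```

Appending "Hel" and then "lo" to an empty 8-byte buffer leaves len at 5. The cost of each append depends only on the number of bytes added, not on how much is already stored.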
 
Kenny McCormack

Tech07 wrote:
...

Until you've got a kill-file set up to implement your personal
preferences, this newsgroup can be pretty aggravating. Don't worry, it
gets better once you've jettisoned enough of the noise.

But do keep in mind that most people here claim not to use killfiles (or
at least did so until very recently) and many of those who do claim to
be using one, are clearly lying. Proof of this last is easily available
by perusing the archives, paying particular attention to the works of
"Han From China".
 
spinoza1111

Above, of course, I was limiting the notion of "character string" to
something akin to "ASCII string". Throw in Unicode and it's a whole
different ballgame.

Yeah, and yer out. .Net and Java both seamlessly handle strings that
can contain 7 bit ASCII, 8 bit, and 16 bit world unicode because of OO
encapsulation. C++ does to an extent. C is UNUSABLE for international
data processing, and as Britain and America slowly become second-rank
powers vis a vis India and China, this will make C programs useless.
Britain's and America's savage attack on women and children (who
constitute the majority) in Iraq and Afghanistan, mislabeled "wars"
were gestures of rage because increasingly, their economies are being
hollowed-out.

In the City of London and on Wall street, thug programmers may still
make the big bucks for risky code written in C which bombs when it
receives a trade from India or China. But they are steps away from
structural unemployment, and better men than they are, are fighting
for the prize of central Asia, and losing.

I don't have any hard data as to whether Indian or Chinese developers
avoid C because of its lack of international capabilities. They may
like it because of the intellectual challenges it provides in getting
anything done (for example, I was building linked-list string handling
while working for a real estate appraiser: fun fun fun). But with .Net
you don't have to waste any time on internationalization.
 
spinoza1111

Tech07 said:
Eric Sosman said:
Tech07 wrote:

But in any case, what's special about strings? If using a sentinel
to terminate a string is a Bad Thing, why isn't it also a Bad Thing to
use a sentinel to terminate other sequences? Linked lists, for example:
Do you stop traversing when you find a NULL link (or an "I'm the end"
bit), or do you stop when a node counter tells you to?

Nulls in linked-list links are not data. They are part of the machinery
of a program and under developer control.

This distinction is too subtle for me. You're saying that
charArray[42] is data but that ptr->next is not? And that
ptr->next is "under developer control" but charArray[42] is
not? What sort of "control" can the developer exert on a
piece of memory that is "not data" like ptr->next?

You tried to lump together into a single category every possible pattern
that uses some kind of sentinel. I've nothing more to say on that than what
I already have. Think about it some more and move on.

     Okay, so your distinction between "data" and "machinery"
is at best a side-issue.  But my question remains: If sentinels
are evil when they terminate strings, why are they not evil
when they terminate other sequential structures?  Or turn it
around: What property of strings distinguishes them from other
sequential structures in such a way as to make sentinels evil?

     Long, long ago I used a language that represented strings
as linked lists of characters: a 32-bit word for each, with
the character code in eight high-order bits and a 24-bit pointer
in the rest.  The last character of the string had a zero link.
Was this sentinel value evil because it terminated a string?  Or
was it excused because it was "machinery" for a linked list?
Did its string-ness or its list-ness carry the day?  Why?

No it was not evil because it was a separate item. The problem with C
strings is that NUL can't be part of the data.

None of my goddamn business, but why only one char in each entry?
That's a lot of overhead. In the related area of multiple-precision
calculation using C, John Nash (a brilliant programmer as well as
mathematician) used arrays of long ints which at the time were 32
bits. He didn't use arrays of decimal digits, he used base 32768 like
a gentleman and a scholar. As I would.
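
For anyone unfamiliar with the technique, storing a multiple-precision number as an array of base-32768 digits can be sketched like this. It is a generic illustration with made-up names (LIMB_BASE, limbs_add), not a reconstruction of any particular person's code. Each digit fits in 15 bits, so sums and even products of two digits stay well inside a 32-bit long.

```c
#include <assert.h>
#include <stddef.h>

#define LIMB_BASE 32768L

/* Adds two n-digit numbers stored least significant digit first,
   each digit in the range 0..32767. Writes n digits to sum and
   returns the final carry (0 or 1). */
long limbs_add(const long *a, const long *b, long *sum, size_t n)
{
    long carry = 0;
    size_t i;

    assert(a != NULL && b != NULL && sum != NULL);
    for (i = 0; i < n; i++) {
        long t = a[i] + b[i] + carry;
        sum[i] = t % LIMB_BASE;
        carry = t / LIMB_BASE;
    }
    return carry;
}
```

Adding the two-digit values {32767, 0} and {1, 0} gives {0, 1} with no final carry: the low digit overflows the base and carries into the next one.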

Why didn't you consider putting a "string" in the node? Oops, I
forgot, C don't got strings. OK, howsabout a char array containing a
NUL terminated string, C's lameass quasi-string, with a NUL pointer to
this garbage when the string contains NULs? That sucks performance
wise. OK, how about a fixed length char array? That would be
isomorphic to the Nash solution for multiple-precision.

I don't understand a linked list with so much machinery and so little
data. It ain't elegant.
 
spinoza1111

Richard Heathfield said:
spinoza1111 said: [snip]
Oh, was that a little close to the bone? I apologise. I didn't realise
you were sensitive to insanity references. I will try to remember not
to make such references in future - although I don't promise always
to succeed in that attempt. After all, almost everything you post is
redolent of insanity. Nevertheless, I'll do my best to refrain.

Richard, I suggest that it's time to stop feeding the troll. If you
refrained from responding to anything spinoza1111 posts (by use of
a killfile if necessary), I think it would substantially improve
this newsgroup's signal-to-noise ratio. Almost everyone here
already knows that spinoza1111 is wrong about nearly everything;
those few who don't won't be convinced otherwise.

Is there something like "a cast of characters FAQ" for this NG somewhere?
(Can you say "Desperate Housewives" counterparts? Hehehe.) I wanna know
whose posts I can skip over when I'm looking for on-topic info rather than
when just feeling like watching a SITCOM! :p

Right, hurt people's feelings (so that they trigger flame wars and
worsen the problem you pretend to complain about). Destroy their
employability especially if they've actually accomplished something as
opposed to sitting in a chair eating Doritos at a succession of
meaningless jobs.

I would like to make it axiomatic that everyone is welcome UNTIL they
start, until they initiate, a personal attack on a person's
programming competence. If they do, I say the person targeted has the
right and duty to respond in kind with maximum force.

If a poster writes about a targeted poster in the third person, or
spreads unfounded rumors about his competence (in a medium where
people do not collaborate on common projects, and competence cannot
for this reason be determined), then the mark or target has the
absolute right of verbal self-defense. Instead of criticising him,
other posters should overcome the homophobia that fills this group
like mustard gas and post supportive comments.

Many of you have NEVER used the first name or patronym of your
enemies, not once. But it's amazing how you can overcome differences
by starting out a response with the person's name. You don't want to
use the first name because you're homophobes? Then use the patronym
with Mr or Ms as appropriate.

I've said it before and I say it again. People dip into the most
recent posts and do not do due diligence. As a result, they will see,
as a matter of statistics, responses to attacks and not the initial
attack, which will be hidden. The bravest posters, so subject to
attacks which may harm their employability, will appear to be trouble-
makers.

All this would end if people followed Gerald Weinberg's guidelines for
"structured walkthroughs" and spoke strictly about CODE here. But most
of you get a rather pathetic hard-on when you see your fellow man
being brutalized.
 
