trim whitespace v3

Seebs · Aug 28, 2010

OK. Let's agree to differ. I find extra statements (especially when
they have an implicit omission) highly suggestive but I agree there is
no rule against them.

I'm still wrestling with this. I think you may be persuading me that,
strictly speaking, the argv[] members are not guaranteed to stay stable,
and that's a QoI issue.

I am not 100% sure that this is a simple restatement. The reason is
that volatile-qualified objects *also* hold their last-stored value
throughout their lifetime -- it is just that the last store may be
external to the program. 6.2.4 is quite clear about that. The wording
in question seems to be saying something new that is not directly
related to volatile objects. I.e. it does not need to be said about any
object -- volatile or not and that makes me less sure that it can simply
be regarded as restating the obvious.

Hmm.

This is leaning me towards thinking that the things pointed to by
argv, and the things pointed to by those in turn, really *aren't*
"objects" in the standard way.

This would be a more persuasive line if it were not for the fact that
all the (other) situations where an object isn't modifiable seem to be
very explicitly stated. I suspect that the non-modifiability of argv[0]
through argv[argc-1] will turn out to be the only one that is not
explicit.

Could be.

At this point, I'm pretty sure this is some sort of defect, because we
have reasonable people reaching contradictory views on something that
I assume the standard intended to specify or imply. I agree that it
would be surprising for argv[] to change without being modified
by the program, or for anything to go wrong if you modified them, but
I am not sure either of these is formally guaranteed.

I can conceive of an implementation in which there would be some
technical reason for surprises to occur.

-s

Tim Rentsch · Aug 28, 2010

Ben Bacarisse said:
Seebs said:

My thoughts are:
1. Unless otherwise specified, everything retains its last stored value.
2. Unless otherwise specified, things are not necessarily modifiable.
3. It is not a violation of any rule for the standard to occasionally
specify something which was implicit in other information already
available.

OK. Let's agree to differ. I find extra statements (especially when
they have an implicit omission) highly suggestive but I agree there is
no rule against them.

I think that the statement that the contents of the strings retain their
last stored values is probably harmless but not necessary -- I think it
would have to be true anyway.

Not so. They're not qualified-volatile, therefore, they must not change
WHETHER OR NOT there's a restatement of that in this section.

I am not 100% sure that this is a simple restatement. The reason is
that volatile-qualified objects *also* hold their last-stored value
throughout their lifetime -- it is just that the last store may be
external to the program. 6.2.4 is quite clear about that. The wording
in question seems to be saying something new that is not directly
related to volatile objects. I.e. it does not need to be said about any
object -- volatile or not and that makes me less sure that it can simply
be regarded as restating the obvious.

If you took out the claim that the contents of the strings don't change,
I don't think the meaning of the spec would change.

I won't keep banging on about this. I am not sure we are getting any
further and, in truth, does either of us care? My lingering curiosity
is now as to why you read a sentence that excludes argv[x] from the two
properties it imparts to argv and to argv[x][y] as if one of the two
properties need not be mentioned at all.

Because the standard has an explicit statement elsewhere that objects
retain their last stored values, with some exceptions. This was never
identified as an exception. So it didn't need to be mentioned at all.

However, there's no general rule that all things you can have
pointers to are modifiable. String literals give us a nice example
of something that's not declared const, but which is not modifiable.

This would be a more persuasive line if it were not for the fact that
all the (other) situations where an object isn't modifiable seem to be
very explicitly stated. I suspect that the non-modifiability of argv[0]
through argv[argc-1] will turn out to be the only one that is not
explicit.

I realize I'm late to the party here, but I thought I'd throw my
two cents in.

I agree with the conclusion that the pointers in the argv array
keep their original values, but not using the same reasoning. Look
at the language of 5.1.2.2.1p2 item 3:

If the value of argc is greater than zero, the array members
argv[0] through argv[argc-1] inclusive shall contain
pointers to strings, which are given ...

What it says is that the array members shall contain pointers to
strings (and not just any strings, but the strings identified by the
subsequent "which" clause); not just that they do when the program
starts, but that they do. Since there is no time qualification, the
stated restrictions should apply throughout program execution, ie,
the array values are unchanging.

Furthermore, note the language in subsequent items of 5.1.2.2.1p2:

If the value of argc is greater than zero, the string
pointed to by argv[0] represents the program name;
argv[0][0] shall be the null character if the program name
is not available from the host environment. If the value of
argc is greater than one, the strings pointed to by argv[1]
through argv[argc-1] represent the program parameters.

The parameters argc and argv and the strings pointed to by
the argv array shall be modifiable by the program, and
retain their last-stored values between program startup and
program termination.

Use of the phrase "the strings" in these items implicitly carries a
statement that the pointers to these strings, ie, the array values,
are unchanging; otherwise the language would have been "whatever
strings pointed to by" or "any strings pointed to by" or something
similar. Clearly whoever wrote the paragraph is expecting that
the values in the argv array point to the same strings throughout
execution.

Finally, note that 5.1.2.2.1p2 item 3 contains a "shall" requirement.
By 4p2, if this requirement is violated, the behavior is undefined.
The wording of 5.1.2.2.1p2 item 3 seems clear enough that the strings
in question are those given "implementation-defined values by the
host environment". Any change to an element of the argv array would
violate the stated requirement, resulting in undefined behavior.

Ben Bacarisse · Aug 28, 2010

Tim Rentsch said:
I realize I'm late to the party here, but I thought I'd throw my
two cents in.

I agree with the conclusion that the pointers in the argv array
keep their original values, but not using the same reasoning. Look
at the language of 5.1.2.2.1p2 item 3:

If the value of argc is greater than zero, the array members
argv[0] through argv[argc-1] inclusive shall contain
pointers to strings, which are given ...

What it says is that the array members shall contain pointers to
strings (and not just any strings, but the strings identified by the
subsequent "which" clause); not just that they do when the program
starts, but that they do. Since there is no time qualification, the
stated restrictions should apply throughout program execution, ie,
the array values are unchanging.

I don't entirely agree that there is no time qualification. I'd say
the whole section name gives a time qualification.

However, since I don't dispute (and have never disputed) the claim that
none of these objects can change value spontaneously, I am very happy to
agree. My worry was simply the significance that has been attached to
the omission of argv[0] through argv[argc-1] from the wording of
5.1.2.2.1 p3.

If you are sure that excluding these from the clause about holding their
value has no significance (because of the argument you present here) are
you equally sure that their exclusion from the "modifiable" clause is
deliberate? I.e. do you hold that argv[1] = 0; (when argc > 1) renders
a program undefined, while argv = 0; and argv[1][0] = 0; are fine?

Furthermore, note the language in subsequent items of 5.1.2.2.1p2:

If the value of argc is greater than zero, the string
pointed to by argv[0] represents the program name;
argv[0][0] shall be the null character if the program name
is not available from the host environment. If the value of
argc is greater than one, the strings pointed to by argv[1]
through argv[argc-1] represent the program parameters.

The parameters argc and argv and the strings pointed to by
the argv array shall be modifiable by the program, and
retain their last-stored values between program startup and
program termination.

Use of the phrase "the strings" in these items implicitly carries a
statement that the pointers to these strings, ie, the array values,
are unchanging; otherwise the language would have been "whatever
strings pointed to by" or "any strings pointed to by" or something
similar. Clearly whoever wrote the paragraph is expecting that
the values in the argv array point to the same strings throughout
execution.

Finally, note that 5.1.2.2.1p2 item 3 contains a "shall" requirement.
By 4p2, if this requirement is violated, the behavior is undefined.
The wording of 5.1.2.2.1p2 item 3 seems clear enough that the strings
in question are those given "implementation-defined values by the
host environment". Any change to an element of the argv array would
violate the stated requirement, resulting in undefined behavior.

That is certainly the literal wording. But similarly, argc is
modifiable but "[t]he value of argc shall be nonnegative" (with, in your
view, no time restriction). Does that mean that argc = 42; is OK but
argc = -1; makes a program undefined?

BartC · Aug 28, 2010

Define ideas

Trim leading and trailing whitespace from any null terminated string
found in memory, no matter whether data or garbage. Reports failure
and quits if null terminator not found within system defined limit.

Parameters
char * to string
NULL, or size_t * for trimlen result

On success returns 0

On failure returns -1 and after setting errno to one of
EINVAL
EOVERFLOW
*/

Producing a single spec for trimming a string, which does all the different
things one might want, is difficult.

For example, whether the trimming is done in-place, whether the trimmed
string is shifted so it starts in the same place, whether a copy is made
instead, and whether the caller or callee allocates space, and so on.

Instead I've put together a spec that just returns details of a trimmed
string, without actually modifying anything. Then this can be used as a
basic building block for a number of trim functions:

char* trim_info(char* inpstr, int inplen, int *itrimlen);

Inputs:

inpstr    Point to string which may need trimming. Can be NULL, then
          result is NULL too. String can be empty, or consist entirely
          of white space.

inplen    Optional length of inpstr, if known. Otherwise can be left as
          0. (Will not cause problems for zero-length strings.)

Outputs:

itrimlen  Point to int to receive length of trimmed substring. Can be
          NULL, then is ignored.

Function result:

          Points to start of trimmed substring within inpstr. Will be
          NULL when inpstr is NULL. When inpstr is entirely white
          space, then will point to the nul terminator of inpstr, and
          *itrimlen will be zero.

Errors:

          When inpstr does not point to a valid string, or inplen is
          given but is incorrect, or itrimlen does not point to a
          valid int, then results are undefined.

(ints should be unsigned, but that clutters up the prototype decl. I don't
worry about strings longer than UINT_MAX)

Typical usage:

int newlen;
char *newstr;

newstr=trim_info(" bart ",0,&newlen);
printf("New str: <%.*s>\n",newlen,newstr);

(Tested with a 5-minute implementation that I won't bother to post.)
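
Since the implementation was not posted, here is one possible sketch of a function that follows the spec above. It is not the original code. Treating inplen == 0 as "length unknown" follows the spec; setting *itrimlen to 0 for a NULL input is an assumption, since the spec does not say what happens to it in that case.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Sketch of trim_info as specified above; nothing in inpstr is modified. */
char *trim_info(char *inpstr, int inplen, int *itrimlen)
{
    char *start;
    char *end;

    if (inpstr == NULL) {
        /* Assumption: report a zero length when there is no string. */
        if (itrimlen != NULL)
            *itrimlen = 0;
        return NULL;
    }

    /* inplen == 0 means "not supplied"; an empty string also gives 0. */
    if (inplen == 0)
        inplen = (int)strlen(inpstr);

    start = inpstr;
    end = inpstr + inplen;          /* one past the last character */

    /* Skip leading white space. */
    while (start < end && isspace((unsigned char)*start))
        ++start;

    /* Back up over trailing white space. */
    while (end > start && isspace((unsigned char)end[-1]))
        --end;

    if (itrimlen != NULL)
        *itrimlen = (int)(end - start);

    /* For an all-white-space string, start has reached the terminator. */
    return start;
}
```

Because only a pointer and a length come back, an in-place trim, a copying trim, or a shifting trim can each be built on top of this without repeating the scanning logic.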

Tim Rentsch · Aug 28, 2010

Ben Bacarisse said:
Tim Rentsch said:

I realize I'm late to the party here, but I thought I'd throw my
two cents in.

I agree with the conclusion that the pointers in the argv array
keep their original values, but not using the same reasoning. Look
at the language of 5.1.2.2.1p2 item 3:

If the value of argc is greater than zero, the array members
argv[0] through argv[argc-1] inclusive shall contain
pointers to strings, which are given ...

What it says is that the array members shall contain pointers to
strings (and not just any strings, but the strings identified by the
subsequent "which" clause); not just that they do when the program
starts, but that they do. Since there is no time qualification, the
stated restrictions should apply throughout program execution, ie,
the array values are unchanging.

I don't entirely agree that there is no time qualification. I'd say
the whole section name gives a time qualification.

However, since I don't dispute (and have never disputed) the claim that
none of these objects can change value spontaneously, I am very happy to
agree. My worry was simply the significance that has been attached to
the omission of argv[0] through argv[argc-1] from the wording of
5.1.2.2.1 p3.

If you are sure that excluding these from the clause about holding their
value has no significance (because of the argument you present here) are
you equally sure that their exclusion from the "modifiable" clause is
deliberate? I.e. do you hold that argv[1] = 0; (when argc > 1) renders
a program undefined, while argv = 0; and argv[1][0] = 0; are fine?

Furthermore, note the language in subsequent items of 5.1.2.2.1p2:

If the value of argc is greater than zero, the string
pointed to by argv[0] represents the program name;
argv[0][0] shall be the null character if the program name
is not available from the host environment. If the value of
argc is greater than one, the strings pointed to by argv[1]
through argv[argc-1] represent the program parameters.

The parameters argc and argv and the strings pointed to by
the argv array shall be modifiable by the program, and
retain their last-stored values between program startup and
program termination.

Use of the phrase "the strings" in these items implicitly carries a
statement that the pointers to these strings, ie, the array values,
are unchanging; otherwise the language would have been "whatever
strings pointed to by" or "any strings pointed to by" or something
similar. Clearly whoever wrote the paragraph is expecting that
the values in the argv array point to the same strings throughout
execution.

Finally, note that 5.1.2.2.1p2 item 3 contains a "shall" requirement.
By 4p2, if this requirement is violated, the behavior is undefined.
The wording of 5.1.2.2.1p2 item 3 seems clear enough that the strings
in question are those given "implementation-defined values by the
host environment". Any change to an element of the argv array would
violate the stated requirement, resulting in undefined behavior.

That is certainly the literal wording. But similarly, argc is
modifiable but "[t]he value of argc shall be nonnegative" (with, in your
view, no time restriction). Does that mean that argc = 42; is OK but
argc = -1; makes a program undefined?

Well, that wasn't the best argument I've ever made. In retrospect
it has more holes than Swiss cheese.

Looking at this afresh, it seems clear that 5.1.2.2.1p2 is talking
about constraints on what an implementation may (or may not) do with
respect to these parameters and parameter values, not what a program
may do. The implementation is obliged to make the two parameters
behave just like regular (non-const) objects, and that applies also
to the strings pointed at by the (original) argv array. However,
implementations are not obliged to make the argv array be modifiable
(since that condition is never stated). Hence it might not be, and
so any attempt by the program to modify its elements may result in
failure (presumably undefined behavior since what happens is never
defined). The situation is analogous to giving an argument whose
(pre-casted) type is 'char *const *'. However, the argv array need
not be declared in the regular C sense, so we can't talk about its
declaration or its type in the usual way.

Getting back to the earlier question about the "retaining their
last-stored values" condition -- I think I would argue that the last
item in 5.1.2.2.1p2 is meant to be read as giving requirements on
how these various objects respond to certain kinds of program
behavior, and that condition should be understood in that context.
If a program can't modify a particular object, there isn't any
reason to say anything about whether the object must retain the
/program's/ last-stored value, because there is no such animal.
The statement is meant to apply only to what happens in response
to program behavior; constraints on these values as far as the
implementation's actions are concerned are given in the other items.
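
To make the distinction concrete, here is a small sketch of how a program could sidestep the question entirely. The helper name copy_arg_pointers is hypothetical. It copies the pointers (not the strings) into an array the program owns, so reordering or clearing entries never writes to the elements of the original argv array, which is the one thing the quoted text does not list as modifiable.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: returns a malloc'd, NULL-terminated copy of the
 * argv pointer array. The strings themselves are shared, not copied.
 * Returns NULL on bad arguments or allocation failure; caller frees.
 *
 * Under the reading discussed above, inside main:
 *   argc = 1;            listed as modifiable
 *   argv = copy;         listed as modifiable (the parameter itself)
 *   argv[1][0] = '\0';   listed as modifiable (the strings)
 *   argv[1] = NULL;      NOT listed when argv still points at the
 *                        original array; write to the copy instead
 */
char **copy_arg_pointers(int argc, char **argv)
{
    char **copy;

    if (argc < 0 || argv == NULL)
        return NULL;

    copy = malloc(((size_t)argc + 1) * sizeof *copy);
    if (copy == NULL)
        return NULL;

    memcpy(copy, argv, (size_t)argc * sizeof *copy);
    copy[argc] = NULL;              /* mirror the argv[argc] terminator */
    return copy;
}
```

Whether writing to the original array elements is actually a problem on any real implementation is exactly what is being debated here; the copy simply makes the program independent of the answer.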

John Kelly · Aug 28, 2010

The parameters argc and argv and the strings pointed to by
the argv array shall be modifiable by the program, and
retain their last-stored values between program startup and
program termination.

Use of the phrase "the strings" in these items implicitly carries a
statement that the pointers to these strings, ie, the array values,
are unchanging;

It explicitly says "the argv array shall be modifiable." Any implicit
contradictory conclusion is clearly wrong.

otherwise the language would have been "whatever
strings pointed to by" or "any strings pointed to by" or something
similar. Clearly whoever wrote the paragraph is expecting that
the values in the argv array point to the same strings throughout
execution.

Not all programmers are good at writing prose, much less "standards."

Finally, note that 5.1.2.2.1p2 item 3 contains a "shall" requirement.
By 4p2, if this requirement is violated, the behavior is undefined.
The wording of 5.1.2.2.1p2 item 3 seems clear enough that the strings
in question are those given "implementation-defined values by the
host environment". Any change to an element of the argv array would
violate the stated requirement, resulting in undefined behavior.

The host environment gets you up and running, then hands the environment
over to you.

Some of these undefined behavior debates are just ludicrous.

Ben Bacarisse · Aug 28, 2010

John Kelly said:
It explicitly says "the argv array shall be modifiable." Any implicit
contradictory conclusion is clearly wrong.

No it does not, and the evidence is right there -- the first quoted text
is from the standard.

<snip>

Nick · Sep 2, 2010

Seebs said:
Ahh, it's not quite *utterly* pointless.

Wait a second. I just realized that I was wrong to begin with.
I was unconsciously assuming a "p++". But of course, this isn't a
p++, it's a ++p, and p was pointing to the last character, so ++p
is right.

If p had been pointing just past the last character, you could make a
case for "*p++" on the grounds that p should always point to the NEXT
character.

But now that I'm less sleepy, I actually think the ++ is almost certainly
correct and mandatory.

I think you're right, and I was too in awe of your legendary bug
spotting skills to stick up for myself enough (!).
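
The code being discussed is not quoted in this part of the thread, so the following is only an illustrative sketch of the situation Seebs describes: p ends up pointing AT the last character to keep, so the terminator has to be written one position further on, which is what the pre-increment does. The function name rtrim is hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Illustrative sketch: strip trailing white space from s in place. */
void rtrim(char *s)
{
    size_t len = strlen(s);
    char *p;

    if (len == 0)
        return;

    p = s + len - 1;                /* p points AT the last character */
    while (p > s && isspace((unsigned char)*p))
        --p;

    if (isspace((unsigned char)*p))
        *p = '\0';                  /* the whole string was white space */
    else
        *++p = '\0';                /* step past the kept character first */
}
```

Writing "*p++ = '\0'" in the last line would overwrite the final character that should be kept; that form would only fit a loop in which p points just past the last kept character.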

Seebs · Sep 2, 2010

I think you're right, and I was too in awe of your legendary bug
spotting skills to stick up for myself enough (!).

The reason I'm an awesome bug-spotter is that I spot a whole lot of things
which look suspicious, and some of them are bugs.

This comes down to a strategy imposed on me to some extent by the natural
traits of my brain. My brain is fundamentally a bit unreliable around the
edges, but extremely fast. I flunked third-grade math, because they had a
"worksheet" you had to do to complete the class, which consisted of 100
single-digit addition problems, and you had to get all of them right. I
couldn't. Obviously, I knew all the answers, but I never got under about
5 wrong. Why? Because my brain can't focus on something boring long enough
to complete such a task.

On the other hand, I'm fast. So I've gradually gotten trained to leap to
conclusions quickly, but place almost no weight on them -- I don't
particularly expect myself to be right very often, I don't commit emotionally
to my first guess. I just come up with theories and test them. A lot.

It's an interesting tradeoff. On the whole, it works reasonably well as
long as my code is reviewed by people who are better at methodical approaches.
As a code reviewer, I tend to find a half-dozen to a dozen things to ask about
in any given hunk of code, and while usually there are good explanations for
all of them, that means I tend to catch bugs that look plausible but it's hard
to be sure. And I also tend to catch a lot of things which aren't bugs...
Overall, it's a useful strategy as long as not everyone does it.

-s

Nick · Sep 3, 2010

io_x said:
So is it not possible that s->length < n?

Only if there are more consecutive spaces in s->value than the length
given in s->length. s->value is a C null-terminated string (which can
be passed to any standard C function), and so s->length should always
equal strlen(s->value). There's a bug elsewhere in the library, or
people have been poking around inside the structures (which they
shouldn't) if that is not true. One could, I suppose, ASSERT(s->length
== strlen(s->value)) but that's really something you want to turn off as soon
as possible as it would be horribly expensive (the whole purpose of
using s->length is to avoid having to walk the string to find the end).
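
A minimal sketch of that idea, assuming a structure with the two members mentioned above. The struct name cstr and both function names are hypothetical and not taken from the library under discussion. The standard assert macro is used so that the O(n) check disappears when NDEBUG is defined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: a C string plus its cached length. */
struct cstr {
    char  *value;    /* null-terminated, usable with standard functions */
    size_t length;   /* should always equal strlen(value) */
};

/* Walks the whole string, so it is only active in debug builds. */
static void cstr_check(const struct cstr *s)
{
    (void)s;         /* avoids an unused-parameter warning under NDEBUG */
    assert(s != NULL && s->value != NULL);
    assert(s->length == strlen(s->value));
}

/* O(1) in release builds: the point of caching the length. */
size_t cstr_length(const struct cstr *s)
{
    cstr_check(s);
    return s->length;
}
```

Compiling with -DNDEBUG removes both assertions, so the release build never walks the string.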
