detabbing again

F

Frank

Lovecreatesbeauty posted a soln to Malcolm's detabbing challenge which
I'd like to address again. It seems to behave:


F:\gfortran\dan>k t50 ot3.txt
main: strtol error

F:\gfortran\dan>k 555555555555555555555 ot3.txt
main: Result too large

F:\gfortran\dan>type k2.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <limits.h>

int exptab(int n, const char *path)
{
    FILE *fin, *fout;
    char *path_exp, *exp_suffix = ".exp";
    int c;
    const int cn = n;

    path_exp = malloc(strlen(path) + strlen(exp_suffix) + 1);
    if (!path_exp){
        fprintf(stderr, "%s: %s\n", __func__, "malloc error");
        return -1;
    }
    strcat(strcpy(path_exp, path), exp_suffix);
    fout = fopen(path_exp, "w");
    free(path_exp);
    if (!fout){
        fprintf(stderr, "%s(%d): %s\n", __FILE__, __LINE__, "fopen");
        return -1;
    }
    fin = fopen(path, "r");
    if (!fin){
        fprintf(stderr, "%s(%d): %s\n", __FILE__, __LINE__, "fopen");
        return -1;
    }
    while ((c=fgetc(fin)) != EOF){
        n = cn;
        if (c == '\t') while (n--) fputc(' ', fout);
        else fputc(c, fout);
    }
    fclose(fin);
    fclose(fout);
    return 0;

}

/*test*/
#include <errno.h>
int main(int argc, char *argv[])
{
    int n;
    char *endp;

    if (argc != 3){
        fprintf(stderr, "%s: <num> <filename>\n", argv[0]);
        return -1;
    }
    errno = 0;
    n = strtol(argv[1], &endp, 10);
    if (n == LONG_MIN || n == LONG_MAX){
        perror(__func__);
        return -1;
    }
    if (*endp != '\0'){
        fprintf(stderr, "%s: %s\n", __func__, "strtol error");
        return -1;
    }
    exptab(atoi(argv[1]), argv[2]);
    return 0;
}

// gcc k2.c -Wall -o k.exe

F:\gfortran\dan>

Wouldn't it be better if n were declared as a long, or if strtod were
used instead?

I can't quite figure out what's going on with endp using K&R as a
reference. Under what conditions is it null?

Thanks for your comment,
 
B

Ben Bacarisse

Frank said:
Lovecreatesbeauty posted a soln to Malcolm's detabbing challenge which
I'd like to address again.
    int n;
    char *endp;

    if (argc != 3){
        fprintf(stderr, "%s: <num> <filename>\n", argv[0]);
        return -1;
    }
    errno = 0;
    n = strtol(argv[1], &endp, 10);
    if (n == LONG_MIN || n == LONG_MAX){
        perror(__func__);
        return -1;
    }
    if (*endp != '\0'){
        fprintf(stderr, "%s: %s\n", __func__, "strtol error");
        return -1;
    }
Wouldn't it be better if n were declared as a long, or if strtod were
used instead?

I wouldn't bother with strtod but I agree that n is better declared
long. The parameter passed to the detab function can be int, of
course, but capturing the result of the conversion allows for more
range checking. In C99 I'd consider using strtoimax.
I can't quite figure out what's going on with endp using K&R as a
reference. Under what conditions is it null?

It is NULL at your bidding -- meaning you don't want strtol to tell
you where it stops -- strtol can't set it. I am just trying to be
clear. I think you mean "when is *endp == 0"?

It is == 0 when the scanning stops at the end of the string. This can
happen when there were no characters ("" is passed, for example) but
if there is no conversion error and *endp == 0 then the whole string
was valid and correctly converted.
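
To make the three cases concrete, here is a minimal sketch. The helper
name scan_status is hypothetical, not something from LCB's program; it
only looks at where strtol left endp and ignores range errors. Note
that for "" both endp == s and *endp == 0 hold, which is why *endp == 0
on its own does not prove that a number was read.

```c
#include <stdlib.h>

/* Hypothetical helper (not from the thread): reports how far strtol
 * got through s.
 *   0: no characters were converted (endp still equals s)
 *   1: a number was converted but other characters follow it
 *   2: a number was converted and the scan reached the end of s */
int scan_status(const char *s)
{
    char *endp;

    (void)strtol(s, &endp, 10);
    if (endp == s)
        return 0;       /* e.g. "" or "t50" */
    if (*endp != '\0')
        return 1;       /* e.g. "50t": endp points at the 't' */
    return 2;           /* e.g. "42": endp points at the final zero */
}
```

Calling scan_status(argv[1]) before trusting the converted value would
separate "t50" and "" from "42".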
 
F

Frank

I wouldn't bother with strtod but I agree that n is better declared
long.  The parameter passed to the detab function can be int, of
course, but capturing the result of the conversion allows for more
range checking.  In C99 I'd consider using strtoimax.

#include <stdlib.h>
long strtol( const char *start, char **end, int base );

ok so a pointer, a double pointer and an integer.
It is NULL at your bidding -- meaning you don't want strtol to tell
you where it stops -- strtol can't set it.  I am just trying to be
clear.  I think you mean "when is *endp == 0"?

It is == 0 when the scanning stops at the end of the string.  This can
happen when there were no characters ("" is passed, for example) but
if there is no conversion error and *endp == 0 then the whole string
was valid and correctly converted.

#include <stdlib.h>
long strtol( const char *start, char **end, int base );

Description:
The strtol() function returns whatever it encounters first in start as
a long, doing the conversion to base if necessary. end is set to point
to whatever is left in start after the long. If the result can not be
represented by a long, then strtol() returns either LONG_MAX or
LONG_MIN. Zero is returned upon error.

q1) I would guess that there were min/max constraints on LONG_MAX and
LONG_MIN. What are they in C99?

q2) What is likely to happen to LTB's strtol() call when a person
types
<executable> 42 ot3.txt
?

18 holes of golf in on this perfect day in the desert southwest. Pico
de gallo, biscuit, guacamole sauce, all on a morningly bowl of raisin
bran. Cheers,
 
F

Frank

path_exp = malloc(strlen(path) + strlen(exp_suffix) + 1);

An attacker could pass you a path with a length just below SIZE_MAX.
This calculation then wraps around and produces an allocated buffer of
one or two bytes, and opens the way for buffer overrun exploits.
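
For what it's worth, a sketch of how the length sum could be checked
before the malloc call. The helper name join_suffix is made up for
illustration and is not part of LCB's code; whether a string that long
can exist at all depends on the implementation.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the thread): returns a malloc'd copy
 * of path with suffix appended, or NULL if the combined size would not
 * fit in a size_t or the allocation fails. The caller frees it. */
char *join_suffix(const char *path, const char *suffix)
{
    size_t plen = strlen(path);
    size_t slen = strlen(suffix);
    char *out;

    /* plen + slen + 1 must not exceed SIZE_MAX; test without adding. */
    if (slen >= SIZE_MAX - plen)
        return NULL;
    out = malloc(plen + slen + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, path, plen);
    memcpy(out + plen, suffix, slen + 1);   /* copies the final zero too */
    return out;
}
```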

My mileage varies. An attacker has to get through me. I can't fit
through my doctor's front door without bending my head because a lot
of munchkins thought that I was outside of specs. Tja.

2 Each instance of these macros shall be replaced by a constant
expression suitable for use in #if preprocessing directives, and this
expression shall have the same type as would an expression that is an
object of the corresponding type converted according to the integer
promotions. Its implementation-defined value shall be equal to or
greater in magnitude (absolute value) than the corresponding value
given below, with the same sign. An implementation shall define only
the macros corresponding to those typedef names it actually provides.
-- limits of ptrdiff_t
PTRDIFF_MIN -65535
PTRDIFF_MAX +65535
-- limits of sig_atomic_t
SIG_ATOMIC_MIN see below
SIG_ATOMIC_MAX see below
-- limit of size_t
SIZE_MAX 65535

q3) This looks oddly concrete. Does that strike anyone else as weird?
 
B

Barry Schwarz

On Sun, 16 Aug 2009 16:45:07 -0700 (PDT), Frank

snip
#include <stdlib.h>
long strtol( const char *start, char **end, int base );

Description:
The strtol() function returns whatever it encounters first in start as
a long, doing the conversion to base if necessary. end is set to point
to whatever is left in start after the long. If the result can not be
represented by a long, then strtol() returns either LONG_MAX or
LONG_MIN. Zero is returned upon error.

You might want to use the description in n1256 which provides
considerably more detail.
q1) I would guess that there were min/max constraints on LONG_MAX and
LONG_MIN. What are they in C99?

You can find this in n1256 also.
q2) What is likely to happen to LTB's strtol() call when a person
types
<executable> 42 ot3.txt

Did you mean LCB? The return value will be 42L and endp will be set
to point to the '\0' that terminates the string pointed to by argv[1].
What did you expect to happen?
 
B

Ben Bacarisse

Frank said:
q1) I would guess that there were min/max constraints on LONG_MAX and
LONG_MIN. What are they in C99?

A link to a PDF of C99 (plus Technical Corrigenda) was recently
posted. It would be worth having one to hand if this sort of question
pops up a lot.

LONG_MAX can be no less than 2147483647 and LONG_MIN must be no bigger
than -2147483647 (i.e. long must have at least 31 value bits but the
range can be symmetric).
q2) What is likely to happen to LTB's strtol() call when a person
types
<executable> 42 ot3.txt
?

n = strtol(argv[1], &endp, 10);

argv[1] points to the '4' in "42". After the call, n will be 42
(having been converted from long to int) and endp will point to the
zero character just after the 2 (i.e. endp == &argv[1][2]) so *endp is
zero.

I may have misled you in an earlier reply. Whether the second
parameter to strtol is NULL is up to you (if you pass NULL strtol will
not be able to tell you where the conversion finished) but endp can
and does get set by strtol because a pointer to it is passed.
 
F

Frank

On Sun, 16 Aug 2009 16:45:07 -0700 (PDT), Frank


snip

You might want to use the description in n1256 which provides
considerably more detail.

You can find this in n1256 also.

ok. I made the download *again* but need some time, and fresh eyes,
to search through it.
q2) What is likely to happen to LTB's strtol() call when a person
types
<executable> 42 ot3.txt

Did you mean LCB?  The return value will be 42L and endp will be set
to point to the '\0' that terminates the string pointed to by argv[1].
What did you expect to happen?

LCB, yes. Does endp point to a pointer that points to zero?
 
F

Frank

A link to a PDF of C99 (plus Technical Corrigenda) was recently
posted.  It would be worth having one to hand if this sort of question
pops up a lot.

LONG_MAX can be no less than 2147483647 and LONG_MIN must be no bigger
than -2147483647 (i.e. long must have at least 31 value bits but the
range can be symmetric).

ok so without doing too much damage I could assume that they are
symmetrical and large.
q2) What is likely to happen to LTB's strtol() call when a person
types
<executable> 42 ot3.txt
?

  n = strtol(argv[1], &endp, 10);

argv[1] points to the '4' in "42".  After the call, n will be 42
(having been converted from long to int) and endp will point to the
zero character just after the 2 (i.e. endp == &argv[1][2]) so *endp is
zero.

I don't understand this syntax:
endp == &argv[1][2]
I may have misled you in an earlier reply.  Whether the second
parameter to strtol is NULL is up to you (if you pass NULL strtol will
not be able to tell you where the conversion finished) but endp can
and does get set by strtol because a pointer to it is passed.

I'm still looking for the pinata on this one. No sweat. Cheers,
 
B

Ben Bacarisse

Frank said:
q2) What is likely to happen to LTB's strtol() call when a person
types
<executable> 42 ot3.txt

Did you mean LCB?  The return value will be 42L and endp will be set
to point to the '\0' that terminates the string pointed to by argv[1].
What did you expect to happen?

LCB, yes. Does endp point to a pointer that points to zero?

The code is, in effect:

char *endp;
int n = strtol("42", &endp, 10);

Now to your question: "Does endp point to a pointer that points to
zero?". The answer is no, largely because the question is confused.

endp is not a pointer to a pointer so from that part on the question
is meaningless. The second parameter of strtol is a pointer to a
pointer and the argument we pass is &endp. Whenever the address of a
variable is passed (& is the key here) we know that the function can
(and probably will) change this variable -- there is little point
otherwise.
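
A small sketch of the same mechanism, using a made-up function in
place of strtol. The name skip_digits and its behaviour are assumptions
for illustration only; the point is that the callee writes through the
char ** it is given, so the caller's char * changes.

```c
#include <stddef.h>

/* Hypothetical stand-in for the way strtol reports its stopping point:
 * skips leading decimal digits in s and, unless end is NULL, stores
 * the address of the first non-digit in the caller's pointer. */
void skip_digits(const char *s, char **end)
{
    while (*s >= '0' && *s <= '9')
        s++;
    if (end != NULL)
        *end = (char *)s;   /* this assignment changes the caller's variable */
}
```

After char *endp; skip_digits("42", &endp); the variable endp holds the
address of the zero byte after the '2'. Passing NULL instead of &endp
is allowed, and then nothing is reported back.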
 
B

Ben Bacarisse

Frank said:
ok so without doing too much damage I could assume that they are
symmetrical and large.

Yes, if 2**31 is large to you (for some people it's quite small).
q2) What is likely to happen to LTB's strtol() call when a person
types
<executable> 42 ot3.txt
?

  n = strtol(argv[1], &endp, 10);

argv[1] points to the '4' in "42".  After the call, n will be 42
(having been converted from long to int) and endp will point to the
zero character just after the 2 (i.e. endp == &argv[1][2]) so *endp is
zero.

I don't understand this syntax:
endp == &argv[1][2]

It is a comparison, yes? It tells us a truth about endp; specifically
that endp is, after the call, the same in value as &argv[1][2].

&argv[1][2] is the address of the third character of the second
command-line argument. It is parsed: &((argv[1])[2]). argv[1] is
string "42". argv[1][2] is the zero byte that ends this string. & of
the whole lot gives us the address of (i.e. a pointer to) this zero
byte.

I just wanted another way to tell you what has happened to endp.
Seeing things expressed in lots of different ways sometimes helps.
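
Here is the same statement as a check that can be compiled, assuming an
array laid out like argv whose second element is a two-digit number.
The helper name endp_is_third_char is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: args is laid out like argv, with args[1]
 * holding a two-digit number such as "42". Returns 1 when strtol
 * leaves endp equal to &args[1][2], the address of the zero byte that
 * ends the string, and 0 otherwise. */
int endp_is_third_char(char *args[])
{
    char *endp;

    (void)strtol(args[1], &endp, 10);
    /* &args[1][2] parses as &((args[1])[2]), i.e. args[1] + 2. */
    return endp == &args[1][2] && endp == args[1] + 2 && *endp == '\0';
}
```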

<snip>
 
K

Keith Thompson

Frank said:
ok so without doing too much damage I could assume that they are
symmetrical and large.

Not quite. Typical implementations have slightly non-symmetric
ranges for signed integer types, for example LONG_MIN==-2147483648,
LONG_MAX==2147483647.

But it's safe to assume that LONG_MIN is either -LONG_MAX or
-LONG_MAX-1, i.e., the asymmetry is at most one extra negative value.

Practically all systems these days use 2's-complement representation
for signed integers and therefore have these non-symmetric ranges.
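
A quick way to see which case a given implementation falls into. The
function name long_asymmetry is just an illustrative choice.

```c
#include <limits.h>

/* Hypothetical helper: returns 0 if the range of long is symmetric,
 * 1 if there is exactly one extra negative value, -1 for anything
 * else (which the assumption above says should not happen). */
int long_asymmetry(void)
{
    if (LONG_MIN == -LONG_MAX)
        return 0;
    if (LONG_MIN == -LONG_MAX - 1)
        return 1;       /* the usual 2's-complement result */
    return -1;
}
```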

[...]
 
K

Keith Thompson

Frank said:
#include <stdlib.h>
long strtol( const char *start, char **end, int base );

ok so a pointer, a double pointer and an integer.
[...]

Let me encourage you to avoid the phrase "double pointer", unless
you're referring to type "double*". "end" above can be referred to as
a pointer to pointer.
 
B

Barry Schwarz

ok. I made the download *again* but need some time, and fresh eyes,
to search through it.

Most pdf, html, and text readers have a find function you can use.
LONG_MAX appears in only six places plus the index which might even be
easier to use.
 
F

Frank

Let me encourage you to avoid the phrase "double pointer", unless
you're referring to type "double*".  "end" above can be referred to as
a pointer to pointer.

Yeah, I didn't want to say double pointer but didn't have any other
words at hand. What would you call a pointer with 2 degrees of
indirection?
 
F

Frank

Most pdf, html, and text readers have a find function you can use.
LONG_MAX appears in only six places plus the index which might even be
easier to use.

Yeah, Barry, these PDFs make for a wonderfully useful tool. LONG_MIN
turned up 13 matches. I thought these were salient:

-- minimum value for an object of type long int
LONG_MIN -2147483647 // -(2^31 - 1)
-- maximum value for an object of type long int
LONG_MAX +2147483647 // 2^31 - 1

more or less, symmetrical and large

I think this is relevant to LCB's code:

    n = strtol(argv[1], &endp, 10);
    if (n == LONG_MIN || n == LONG_MAX){
        perror(__func__);
        return -1;
    }

The strtol, strtoll, strtoul, and strtoull functions return the
converted value, if any. If no conversion could be performed, zero is
returned. If the correct value is outside the range of representable
values, LONG_MIN, LONG_MAX, LLONG_MIN, LLONG_MAX, ULONG_MAX, or
ULLONG_MAX is returned (according to the return type and sign of the
value, if any), and the value of the macro ERANGE is stored in errno.

It would seem that strtol is within its right to return LLONG_MIN. I
think this code needs > and < where it now has equals.
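
For comparison, a sketch of a check driven by errno, since the passage
quoted above says ERANGE is stored there when the value is out of
range. The result is kept in a long, as Ben suggested, and narrowed to
int only after a range test. The helper name parse_int is hypothetical
and this is only one possible arrangement.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not from the thread): parses s as a decimal
 * int and stores it in *out. Returns 0 on success, -1 on failure. */
int parse_int(const char *s, int *out)
{
    char *endp;
    long v;

    errno = 0;
    v = strtol(s, &endp, 10);
    if (errno == ERANGE)                /* does not fit in a long */
        return -1;
    if (endp == s || *endp != '\0')     /* no digits, or trailing junk */
        return -1;
    if (v < INT_MIN || v > INT_MAX)     /* fits in long but not in int */
        return -1;
    *out = (int)v;
    return 0;
}
```

With this, main could call parse_int(argv[1], &n) and pass n straight
to exptab, without the second conversion through atoi.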

H.2.2 Integer types
1 The signed C integer types int, long int, long long int, and the
corresponding unsigned types are compatible with LIA-1. If an
implementation adds support for the LIA-1 exceptional values
"integer_overflow" and "undefined", then those types are LIA-1
conformant types. C's unsigned integer types are "modulo" in the LIA-1
sense in that overflows or out-of-bounds results silently wrap. An
implementation that defines signed integer types as also being modulo
need not detect integer overflow, in which case, only integer
divide-by-zero need be detected.
2 The parameters for the integer data types can be accessed by the
following:
maxint INT_MAX, LONG_MAX, LLONG_MAX, UINT_MAX, ULONG_MAX, ULLONG_MAX
minint INT_MIN, LONG_MIN, LLONG_MIN
3 The parameter "bounded" is always true, and is not provided. The
parameter "minint" is always 0 for the unsigned types, and is not
provided for those types.

No idea what they're talking about here. Cheers,
 
