Cannot compile with _FILE_OFFSET_BITS = 64

Scott.zhou

#define _FILE_OFFSET_BITS 64

int main(void) {
    int fd;

    if ((fd = open("file.hole", O_WRONLY | O_CREAT, 0664)) < 0) {
        perror("open:");
        exit(-1);
    }

    if (write(fd, "a", 1) != 1) {
        perror("write:");
        exit(-1);
    }

    if (lseek(fd, 4*1024*1024*1024, SEEK_SET) == -1) { // integer overflow
        perror("lseek:");
        exit(-1);
    }

    if (write(fd, "A", 1) != 1) {
        perror("write:");
        exit(-1);
    }

    close(fd);

    exit(0);
}
In this example, although I explicitly define _FILE_OFFSET_BITS as 64, I
get the warning "integer overflow".
In the manual page for lseek64, I saw this:
-------------------------------------------------------------------------------------------------
lseek
Prototype:

off_t lseek(int fd, off_t offset, int whence);

The library routine lseek() uses the type off_t. This is a
32-bit signed type on 32-bit architectures, unless one compiles
with

#define _FILE_OFFSET_BITS 64

in which case it is a 64-bit signed type.
 
Harald van Dijk

Scott.zhou said:
if (lseek(fd, 4*1024*1024*1024, SEEK_SET) == -1) { // integer overflow

In this example, although I explicitly define _FILE_OFFSET_BITS as 64, I
get the warning "integer overflow".

4 is an int. 1024 is an int. 1024 is an int. 1024 is an int. When you
multiply ints, you get an int, and the result must be within the range of
an int. Even if that int would later be converted to another type. You
first need to convert the 4 to a wider type, and then start multiplying.
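
A minimal sketch of that point, assuming a C99 compiler. The names hole_offset and show_sizes are made up for illustration. sizeof is used only to show which type each product has.

```c
#include <stdio.h>

/* Hypothetical helper: returns 4 GiB as a long long.
 * 4LL makes the first multiplication a long long multiplication, and
 * each later multiplication takes that long long result as its left
 * operand, so no step is ever done in int. */
long long hole_offset(void)
{
    return 4LL * 1024 * 1024 * 1024;
}

/* Prints the size of an int product next to a long long product. */
void show_sizes(void)
{
    printf("sizeof(4 * 1024)   = %zu\n", sizeof(4 * 1024));
    printf("sizeof(4LL * 1024) = %zu\n", sizeof(4LL * 1024));
    printf("offset             = %lld\n", hole_offset());
}
```

Calling show_sizes() from main should make the difference visible on any system where long long is wider than int.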
 
Ravishankar S

Scott.zhou said:
#define _FILE_OFFSET_BITS 64

int main(void) {
    int fd;

    if ((fd = open("file.hole", O_WRONLY | O_CREAT, 0664)) < 0) {
        perror("open:");
        exit(-1);
    }

    if (write(fd, "a", 1) != 1) {
        perror("write:");
        exit(-1);
    }

    if (lseek(fd, 4*1024*1024*1024, SEEK_SET) == -1) { // integer overflow

The compiler does not automatically convert to a 64-bit type. So make the
conversion explicit by giving one of the constants a 64-bit type. This makes
the entire expression 64-bit.

=> 4*1024*1024*1024LL // 64-bit constant made by the LL suffix.
 
Walter Roberson

Scott.zhou said:
if (lseek(fd, 4*1024*1024*1024, SEEK_SET) == -1) { // integer overflow

The maximum int required to be supported in C is 32767. For any
constant integer expression that exceeds that, you need to use L or LL
or UL or ULL on the constants, such as 4L*1024L*1024L .
However, the maximum long required to be supported in C is 2**31-1,
and 4*1024*1024*1024 exceeds that. You could naively try switching
to unsigned long, 4UL*1024UL*1024UL*1024UL, but that would be 2**32,
and the maximum unsigned long required to be supported in C is 2**32-1.

So, you will have to use a C99 compiler if you want to be sure of
being able to use a value of 4*1024*1024*1024 -- some C89 implementations
do provide large enough long or unsigned long, but relying on
that would not be portable. For portability you will need C99 and
to use long long or unsigned long long, 4LL*1024LL*1024LL*1024LL .
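
As a rough illustration of those limits, here is a sketch that checks which of the standard signed types on a given implementation can represent 4 GiB. It assumes C99 for long long and LLONG_MAX. The names FOUR_GIB, can_hold_four_gib and report_limits are invented for this example.

```c
#include <limits.h>
#include <stdio.h>

/* 4 GiB, with every multiplication done in long long (C99). */
#define FOUR_GIB (4LL * 1024LL * 1024LL * 1024LL)

/* Returns 1 if a type whose maximum value is `max` can represent 4 GiB. */
int can_hold_four_gib(long long max)
{
    return max >= FOUR_GIB;
}

/* Prints the local limits; the values differ between implementations. */
void report_limits(void)
{
    printf("INT_MAX   = %d (holds 4 GiB: %d)\n",
           INT_MAX, can_hold_four_gib(INT_MAX));
    printf("LONG_MAX  = %ld (holds 4 GiB: %d)\n",
           LONG_MAX, can_hold_four_gib(LONG_MAX));
    printf("LLONG_MAX = %lld (holds 4 GiB: %d)\n",
           LLONG_MAX, can_hold_four_gib(LLONG_MAX));
}
```

Only the long long line is guaranteed by C99 to report 1. The other two depend on the implementation.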

Next, you will need to somehow convert that value
4LL*1024LL*1024LL*1024LL into the type expected for that parameter
by lseek. lseek is not part of the C standard, so the C standard
does not say what that parameter should be. The particular machine
I looked at a moment ago documents its particular lseek as expecting
off_t . I do not happen to recall at the moment whether off_t
is defined as being an arithmetic type or whether it might be
allowed to be a non-arithmetic type such as a structure. You will
need to investigate that. The manual page you showed an excerpt of
defined off_t as an arithmetic type, but that might only happen
to be the case on that particular system; if you are following some
standard or other beyond C (e.g., POSIX) then if you want portability
to other systems that support that particular standard, you will
need to follow what that standard says, not what your local man page
says.
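
For what it is worth, as far as I know POSIX does specify off_t as a signed integer type, declared in <sys/types.h>. The sketch below checks what a particular system actually provides. It assumes a POSIX-like system, and that the implementation honours _FILE_OFFSET_BITS (glibc does, and expects it to be defined before any system header is included). The helper names are hypothetical.

```c
#define _FILE_OFFSET_BITS 64   /* assumption: must precede every system header */

#include <limits.h>
#include <stdio.h>
#include <sys/types.h>

/* Number of storage bits in off_t as seen by this translation unit. */
int off_t_bits(void)
{
    return (int)(sizeof(off_t) * CHAR_BIT);
}

/* Nonzero if off_t behaves as a signed type here. */
int off_t_is_signed(void)
{
    return (off_t)-1 < 0;
}

/* Prints what this system provides. */
void report_off_t(void)
{
    printf("off_t: %d bits, %s\n", off_t_bits(),
           off_t_is_signed() ? "signed" : "unsigned");
}
```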


_FILE_OFFSET_BITS and most of the routines you call upon in
your program are not part of the C standard. For the proper use
of those functions, you will need to consult a newsgroup specific
to your system. The only reason that I answered your question
here is that it happens that 4*1024*1024*1024 is a construct
analyzable with respect to what -is- specified for the C language.
 
Ulrich Eckhardt

Scott.zhou wrote roughly:
lseek64(fd, 4*1024*1024*1024, SEEK_SET) // integer overflow

Others already explained what is happening, but I'd suggest another approach
that works without C99's long long type: simply cast the expression to
off_t, which will automatically be the correct 64-bit type, regardless of
whether you use C89 or C99.

Scott.zhou wrote:
In this example, although I explicitly define _FILE_OFFSET_BITS as 64, I
get the warning "integer overflow" [...]

I hope you also understand where the problem comes from! The point has
nothing to do with lseek() but rather with how C handles arithmetic, in
particular that it doesn't suddenly switch to a bigger integer type.

Uli
 
Richard Tobin

Ulrich Eckhardt said:
lseek64(fd, 4*1024*1024*1024, SEEK_SET) // integer overflow

Others already explained what is happening, but I'd suggest another approach
that works without C99's long long type: simply cast the expression to
off_t, which will automatically be the correct 64-bit type, regardless of
whether you use C89 or C99.

Casting the expression won't help, if it's already overflowed as an
int.

-- Richard
 
Ulrich Eckhardt

Ulrich Eckhardt said:
Others already explained what is happening, but I'd suggest another
approach that works without C99's long long type: simply cast the
expression to off_t, which will automatically be the correct 64-bit type,
regardless of whether you use C89 or C99.

Richard Tobin said:
Casting the expression won't help, if it's already overflowed as an
int.

Argh, well caught. Casting the first constant in the above expression does
the job though.

Uli
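
To make the difference between the two casts concrete, here is a sketch that uses unsigned types on purpose: unsigned arithmetic wraps around in a well-defined way, whereas the signed overflow in the original expression is undefined and cannot be demonstrated safely. The function names are invented, and the wrap to 0 assumes a 32-bit unsigned int.

```c
#include <limits.h>

/* Nonzero if unsigned int has exactly 32 value bits on this system. */
int unsigned_int_is_32_bit(void)
{
    return UINT_MAX == 0xFFFFFFFFu;
}

/* Cast applied after the product: all three multiplications are done in
 * unsigned int, so with a 32-bit unsigned int the value has already
 * wrapped to 0 before the cast sees it. */
unsigned long long cast_after(void)
{
    return (unsigned long long)(4u * 1024u * 1024u * 1024u);
}

/* Cast applied to the first operand: every multiplication is done in
 * unsigned long long, so nothing wraps. */
unsigned long long cast_first(void)
{
    return (unsigned long long)4u * 1024u * 1024u * 1024u;
}
```

With a 32-bit unsigned int, cast_after() yields 0 and cast_first() yields 4294967296.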
 
James Kuyper

Ravishankar said:
The compiler does not automatically convert to a 64-bit type. So make the
conversion explicit by giving one of the constants a 64-bit type. This makes
the entire expression 64-bit.

=> 4*1024*1024*1024LL // 64-bit constant made by the LL suffix.

That's not sufficient. The above is equivalent to
(((4*1024)*1024)*1024LL). The very first multiplication is safe; but
the second one might overflow, depending upon the value of INT_MAX. You
could solve this by moving the LL to the second 1024, but I think that's
an overly subtle solution; a maintenance programmer might not realize
why the exact position of the 'LL' is critical. I'd apply the LL suffix
to all 4.
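
A small sketch of the two spellings side by side, assuming C99. The macro and function names are hypothetical. The preprocessor guard spells out the condition under which the LL-last form is safe.

```c
#include <limits.h>

/* LL on every operand: no multiplication is ever done in int. */
#define OFFSET_ALL_LL (4LL * 1024LL * 1024LL * 1024LL)

/* LL on the last operand only: 4*1024 and (4*1024)*1024 are int
 * products, so this form is only safe where int can hold 4194304 (2^22).
 * On narrower ints, fall back to the all-LL spelling. */
#if INT_MAX >= 4194304
#define OFFSET_LAST_LL (4 * 1024 * 1024 * 1024LL)
#else
#define OFFSET_LAST_LL OFFSET_ALL_LL
#endif

long long offset_all_ll(void)  { return OFFSET_ALL_LL; }
long long offset_last_ll(void) { return OFFSET_LAST_LL; }
```

The guard is more machinery than the all-LL form needs, which is rather the point: putting the suffix on every operand requires no reasoning about INT_MAX at all.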
 
Joe Wright

Ulrich said:
Argh, well caught. Casting the first constant in the above expression does
the job though.

Uli

You presume the first expression (4) will be evaluated before the last
(1024). Why?
 
Harald van Dijk

You presume the first expression (4) will be evaluated before the last
(1024). Why?

(off_t)4*1024*1024*1024
is equivalent to
(((((off_t)4)*1024)*1024)*1024)
regardless of which constant is evaluated first. The compiler is
certainly allowed to evaluate the expression right-to-left (or any other
order), but the last 1024 is multiplied by an expression of type off_t,
so the multiplication still happens in off_t arithmetic.
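
One way to see this without evaluating anything is to ask the compiler for the type of each partial product. The sketch below assumes a C11 compiler for _Generic. wide_t is a made-up stand-in for off_t (a typedef of long long), and the macro and function names are hypothetical.

```c
/* C11: yield 1 if the expression has the named type, 0 otherwise.
 * The operand of _Generic is not evaluated. */
#define IS_INT(x)       _Generic((x), int: 1, default: 0)
#define IS_LONG_LONG(x) _Generic((x), long long: 1, default: 0)

/* Stand-in for off_t, assumed here to be long long. */
typedef long long wide_t;

/* int * int stays int. */
int int_product_is_int(void)
{
    return IS_INT(4 * 1024);
}

/* The cast binds to the 4 alone, so the first product is already wide. */
int first_step_is_wide(void)
{
    return IS_LONG_LONG((wide_t)4 * 1024);
}

/* Left-to-right grouping carries the wide type through every step. */
int full_product_is_wide(void)
{
    return IS_LONG_LONG((wide_t)4 * 1024 * 1024 * 1024);
}
```

All three functions return 1. The type of each product is fixed by the grammar, not by the order in which the operands are evaluated.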
 
jameskuyper

Joe said:
You presume the first expression (4) will be evaluated before the last
(1024). Why?

He is making no assumptions about the order of evaluation of the
numeric literals. His statement is based upon the assumption, mandated
by the C grammar, that the multiplication expressions be evaluated in
order from left to right. It is those multiplications that require
type conversions, where needed to bring both sides to a common type,
and making sure that the first multiplication involves an operand that
is large enough to store the final result is sufficient to ensure that
all of the calculations are performed without overflow.
 
Keith Thompson

James Kuyper said:
That's not sufficient. The above is equivalent to
(((4*1024)*1024)*1024LL). The very first multiplication is safe; but
the second one might overflow, depending upon the value of
INT_MAX. You could solve this by moving the LL to the second 1024, but
I think that's an overly subtle solution; a maintenance programmer
might not realize why the exact position of the 'LL' is critical. I'd
apply the LL suffix to all 4.

Although lseek() is non-standard (it's POSIX, not standard C), I'll
mention that its second argument is of type off_t, which is not
necessarily the same as long long.

In the interest of writing what you actually mean, it might be better
to use a cast to off_t rather than the "LL" suffix. I'd also make the
value a named constant rather than a magic number:

#define THE_OFFSET ((off_t)4*1024*1024*1024)
...
if (lseek(fd, THE_OFFSET, SEEK_SET) == -1) {
...

Or you could apply the cast to all four constants, but if it's
isolated in a #define that's probably not as important.

The need for the conversion could be avoided by using a single literal
rather than a multiplication, but 4294967296 is a bit obscure. But
you *might* consider using 0x100000000. (Personally, I prefer the
cast.)
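
For comparison, a sketch of the three spellings side by side. THE_OFFSET is the macro from above. THE_OFFSET_DEC, THE_OFFSET_HEX and spellings_agree are names invented for this example. It assumes a POSIX system with a 64-bit off_t, and C99 rules for the type of the unsuffixed literals.

```c
#define _FILE_OFFSET_BITS 64   /* assumption: must precede every system header */

#include <sys/types.h>

/* Cast on the first operand: all multiplications happen in off_t. */
#define THE_OFFSET     ((off_t)4*1024*1024*1024)

/* Single literals: under C99 an unsuffixed literal takes the first of
 * int, long, long long that can represent it, so nothing overflows. */
#define THE_OFFSET_DEC ((off_t)4294967296)
#define THE_OFFSET_HEX ((off_t)0x100000000)

/* Returns 1 if all three spellings denote the same value. */
int spellings_agree(void)
{
    return THE_OFFSET == THE_OFFSET_DEC && THE_OFFSET_DEC == THE_OFFSET_HEX;
}
```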
 
CBFalconer

Scott.zhou said:
#define _FILE_OFFSET_BITS 64

int main(void) {
    int fd;

    if ((fd = open("file.hole", O_WRONLY | O_CREAT, 0664)) < 0) {
        perror("open:");
        exit(-1);
    }

There is no such routine in std C as "open". There are no such
macros as O_WRONLY or O_CREAT. C programs that do i/o normally
require #include <stdio.h>, maybe more. -1 is not a valid argument
for exit (you can use 0 always, or EXIT_FAILURE or EXIT_SUCCESS
when you have #include <stdlib.h>).

Since your program is not standard C, it is off-topic here.
Consider using standard C routines, such as fopen.
 
Richard Tobin

James Kuyper said:
That's not sufficient. The above is equivalent to
(((4*1024)*1024)*1024LL). The very first multiplication is safe; but
the second one might overflow, depending upon the value of INT_MAX.

Theoretically, but I doubt there's any implementation with 64-bit
lseek() and ints that can't hold 2^22.

James Kuyper said:
You could solve this by moving the LL to the second 1024, but I think
that's an overly subtle solution; a maintenance programmer might not
realize why the exact position of the 'LL' is critical. I'd apply the LL
suffix to all 4.

I think using it on the first operand should be a recognisable idiom
to a competent C programmer.

-- Richard
 
Keith Thompson

CBFalconer said:
There is no such routine in std C as "open". There are no such
macros as O_WRONLY or O_CREAT. C programs that do i/o normally
require #include <stdio.h>, maybe more. -1 is not a valid argument
for exit (you can use 0 always, or EXIT_FAILURE or EXIT_SUCCESS
when you have #include <stdlib.h>).

Since your program is not standard C, it is off-topic here.
Consider using standard C routines, such as fopen.

The problem he was having wasn't specific to the non-standard
functions he was using. He provided the prototype for the function on
which the error occurred. His actual problem had to do with an error
in an expression that used only standard constructs (and that problem
was solved some time ago).
 
James Kuyper

Richard said:
Theoretically, but I doubt there's any implementation with 64-bit
lseek() and ints that can't hold 2^22.

I am by nature a theoretician, and I don't know any reason why such an
implementation couldn't exist. Whether it does exist doesn't matter to
me; as long as it's possible for such an implementation to conform to
the relevant standards, one might eventually be created, even if there
are none in existence at this time. It's trivial to write code in such a
fashion that it would work on such an implementation, and I think the
code should therefore be so written.

I think using it on the first operand should be a recognisable idiom
to a competent C programmer.

Yes, but incompetent C programmers are commonplace, and from my past
experience I suspect that there is a good chance that sooner or later
one of them will be assigned to do maintenance work on some of my code.
It's not possible to make fool-proof code, fools are too ingenious.
However, it is possible to make code fool-resistant, and I do so
whenever the costs are not excessive.
 
Richard Tobin

Theoretically, but I doubt there's any implementation with 64-bit
lseek() and ints that can't hold 2^22.
I am by nature a theoretician, and I don't know any reason why such an
implementation couldn't exist. Whether it does exist doesn't matter to
me; as long as it's possible for such an implementation to conform to
the relevant standards, one might eventually be created, even if there
are none in existence at this time. It's trivial to write code in such a
fashion that it would work on such an implementation, and I think the
code should therefore be so written.

I suppose that depends on your aims. I've had to deal with computer
systems that were stupid in various ways. People had gone through
amazing contortions to make software run on them, and the unfortunate
result was that others ended up having to use them instead of better
machines. A system designed now with ints less than 32 bits would be
such a system. I'd rather such systems died for lack of use, so if I
have useful code I'd prefer that it didn't work on them.

-- Richard
 
