fseek on a file opened with _popen

thomas.mertes

Hello

Recently I discovered some problem. I have some C code
which determines how many bytes are available in a
file. Later I use this information to malloc a buffer of
the correct size before I read the bytes.
Determining the number of bytes available in a
file is done in 5 steps:

1. Use tell(aFile) to get the current position.
2. Use fseek(aFile, 0, SEEK_END) to move to the end.
3. Get the current position with tell(aFile) (this is the
size of the file in bytes).
4. I move to the position which I got in step 1 with fseek().
5. Subtract the current position from the file size to
get the number of bytes available.

This code is certainly not the most elegant solution but
it is portable. The code works for normal files under
windows and linux. The portability is also the reason
why I use tell() and fseek() instead of windows specific
code.
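
For reference, the five steps above might look roughly like the sketch below. It is only an illustration: the helper name bytes_available is made up here, ftell() is used where the steps say tell(), and a return value of -1 stands for "the size could not be determined".

```c
#include <stdio.h>

/* Returns the number of bytes between the current position and the end
   of aFile, or -1 if the stream reports that it is not seekable.
   bytes_available is a hypothetical helper name. */
long bytes_available (FILE *aFile)
{
    long current_position;
    long end_position;

    current_position = ftell(aFile);                      /* step 1 */
    if (current_position == -1L) {
        return -1L;
    }
    if (fseek(aFile, 0L, SEEK_END) != 0) {                /* step 2 */
        return -1L;
    }
    end_position = ftell(aFile);                          /* step 3 */
    if (fseek(aFile, current_position, SEEK_SET) != 0) {  /* step 4 */
        return -1L;
    }
    if (end_position == -1L) {
        return -1L;
    }
    return end_position - current_position;               /* step 5 */
}
```

If ftell() claims success on a stream that is really a pipe, this helper has no way to notice and simply reports whatever the two positions imply.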

When I open a file with _popen I get a different result:
- Under linux the tell() of step 1 returns -1 which means
the file is not seekable. I can recognize this situation
and react accordingly (I cannot malloc the buffer beforehand.
Instead I malloc a smaller buffer which is realloced until
all bytes are read).
- Under windows the tell() of step 1 returns 0 which
means the file is seekable and is currently at position 0.
The other calls of fseek() and ftell() succeed also and
indicate that the number of available bytes is 0.
Therefore my program thinks that there are no bytes
available in the file opened with _popen.

The information that it is a file opened with _popen is
not available at that place in my program.

Now my question:
Is it possible to find out that a file (available in a
variable of type FILE * ) was opened with _popen?

Something like: Turn the FILE * into a handle and ask a
function about the file type. It is no problem for me to
insert windows specific code under an #ifdef

Thanks in advance Thomas Mertes

Seed7 Homepage: http://seed7.sourceforge.net
Seed7 - The extensible programming language: User defined statements
and operators, abstract data types, templates without special
syntax, OO with interfaces and multiple dispatch, statically typed,
interpreted or compiled, portable, runs under linux/unix/windows.
 
Mark Bluemel

Hello

Recently I discovered some problem. I have some C code
which determines how many bytes are available in a
file.
[snip]

The code works for normal files under
windows and linux.
[snip]

When I open a file with _popen I get a different result:
- Under linux the tell() of step 1 returns -1 which means
the file is not seekable. I can recognize this situation
and react accordingly (I cannot malloc the buffer beforehand.
Instead I malloc a smaller buffer which is realloced until
all bytes are read).

I hope you don't grow it a byte at a time :)
- Under windows the tell() of step 1 returns 0 which
means the file is seekable and is currently at position 0.
The other calls of fseek() and ftell() succeed also and
indicate that the number of available bytes is 0.
Therefore my program thinks that there are no bytes
available in the file opened with _popen.
The information that it is a file opened with _popen is
not available at that place in my program.

Could you consider making it available, by providing a wrapper
mechanism? (That would probably be my favoured approach, rather
than looking for platform specifics...)
Now my question:
Is it possible to find out that a file (available in a
variable of type FILE * ) was opened with _popen?

As _popen is not part of the C standard, it's not really
something we would consider here. You'd probably do better
asking in a Windows newsgroup.
 
Joachim Schmitz

Hello

Recently I discovered some problem. I have some C code
which determines how many bytes are available in a
file. Later I use this information to malloc a buffer of
the correct size before I read the bytes.
Determining the number of bytes available in a
file is done in 5 steps:

1. Use tell(aFile) to get the current position.
Don't you mean ftell() rather than tell()?
If not, you're most probably lost here, as that won't be a standard function.
2. Use fseek(aFile, 0, SEEK_END) to move to the end.
3. Get the current position with tell(aFile) (this is the
size of the file in bytes).
4. I move to the position which I got in step 1 with fseek().
5. Subtract the current position from the file size to
get the number of bytes available.

This code is certainly not the most elegant solution but
it is portable. The code works for normal files under
windows and linux. The portability is also the reason
why I use tell() and fseek() instead of windows specific
code.

When I open a file with _popen I get a different result:
no function _popen() in standard C (I think). In POSIX there's popen() (i.e.
without the leading underscore)
- Under linux the tell() of step 1 returns -1 which means
the file is not seekable. I can recognize this situation
and react accordingly (I cannot malloc the buffer beforehand.
Instead I malloc a smaller buffer which is realloced until
all bytes are read).
- Under windows the tell() of step 1 returns 0 which
means the file is seekable and is currently at position 0.
The other calls of fseek() and ftell() succeed also and
indicate that the number of available bytes is 0.
Therefore my program thinks that there are no bytes
available in the file opened with _popen.

The information that it is a file opened with _popen is
not available at that place in my program.

Now my question:
Is it possible to find out that a file (available in a
variable of type FILE * ) was opened with _popen?

Something like: Turn the FILE * into a handle and ask a
function about the file type. It is no problem for me to
insert windows specific code under an #ifdef
OT here (I think) but "int fileno(FILE *stream);" might be what you're
looking for

Bye, Jojo
 
Richard Tobin

- Under linux the tell() of step 1 returns -1 which means
the file is not seekable. I can recognize this situation
and react accordingly (I cannot malloc the buffer beforehand.
Instead I malloc a smaller buffer which is realloced until
all bytes are read).

Why not use this strategy always?

As an optimisation, you could use the ftell() strategy to determine
the initial size to malloc().

-- Richard
 
thomas.mertes

Recently I discovered some problem. I have some C code
which determines how many bytes are available in a
file.
[snip]

The code works for normal files under
windows and linux.
[snip]

When I open a file with _popen I get a different result:
- Under linux the tell() of step 1 returns -1 which means
the file is not seekable. I can recognize this situation
and react accordingly (I cannot malloc the buffer beforehand.
Instead I malloc a smaller buffer which is realloced until
all bytes are read).

I hope you don't grow it a byte at a time :)

Actually I grow it in steps of 4096.
Could you consider making it available, by providing a wrapper
mechanism? (That would probably be my favoured approach, rather
than looking for platform specifics...)

A simplified version of the function using this functionality
is (please don't start nitpicking):

-----------------------------------------------
#include <stdlib.h>
#include <stdio.h>

#define READ_BLOCK_SIZE 4096
#define SIZ_STRI(len) ((sizeof(struct stristruct) - \
    sizeof(unsigned char)) + (len) * sizeof(unsigned char))

typedef struct stristruct {
    unsigned long int size;
    unsigned char mem[1];
} *stritype;

stritype filGets (FILE *aFile, long length)

{
    long current_file_position;
    unsigned long int bytes_requested;
    unsigned long int bytes_there;
    unsigned long int read_size_requested;
    unsigned long int block_size_read;
    unsigned long int allocated_size;
    unsigned long int result_size;
    unsigned char *memory;
    stritype resized_result;
    stritype result;

    /* filGets */
    if (length < 0) {
        result = NULL;
    } else {
        bytes_requested = (unsigned long int) length;
        allocated_size = bytes_requested;
        result = (stritype) malloc(SIZ_STRI(allocated_size));
        if (result == NULL) {
            /* Determine how many bytes are available in aFile */
            if ((current_file_position = ftell(aFile)) != -1) {
                fseek(aFile, 0, SEEK_END);
                bytes_there = (ftell(aFile) - current_file_position);
                fseek(aFile, current_file_position, SEEK_SET);
                /* Now we know that bytes_there bytes are available
                   in aFile */
                if (bytes_there < bytes_requested) {
                    allocated_size = bytes_there;
                    result = (stritype) malloc(SIZ_STRI(allocated_size));
                    if (result == NULL) {
                        return(NULL);
                    } /* if */
                } else {
                    return(NULL);
                } /* if */
            } /* if */
        } /* if */
        if (result != NULL) {
            /* We have allocated at least as many bytes as
               are available in the file */
            result_size = (unsigned long int) fread(result->mem, 1,
                (size_t) allocated_size, aFile);
        } else {
            /* We do not know how many bytes are available therefore we
               read blocks of READ_BLOCK_SIZE until we reach EOF */
            allocated_size = READ_BLOCK_SIZE;
            result = (stritype) malloc(SIZ_STRI(allocated_size));
            if (result == NULL) {
                return(NULL);
            } else {
                read_size_requested = READ_BLOCK_SIZE;
                if (read_size_requested > bytes_requested) {
                    read_size_requested = bytes_requested;
                } /* if */
                block_size_read = fread(result->mem, 1,
                    read_size_requested, aFile);
                result_size = block_size_read;
                while (block_size_read == READ_BLOCK_SIZE &&
                        result_size < bytes_requested) {
                    allocated_size = result_size + READ_BLOCK_SIZE;
                    resized_result = (stritype)
                        realloc(result, SIZ_STRI(allocated_size));
                    if (resized_result == NULL) {
                        free(result);
                        return(NULL);
                    } else {
                        result = resized_result;
                        memory = (unsigned char *) result->mem;
                        read_size_requested = READ_BLOCK_SIZE;
                        if (result_size + read_size_requested >
                                bytes_requested) {
                            read_size_requested = bytes_requested - result_size;
                        } /* if */
                        block_size_read = fread(&memory[result_size], 1,
                            read_size_requested, aFile);
                        result_size += block_size_read;
                    } /* if */
                } /* while */
            } /* if */
        } /* if */
        result->size = result_size;
        if (result_size < allocated_size) {
            resized_result = (stritype)
                realloc(result, SIZ_STRI(result_size));
            if (resized_result == NULL) {
                free(result);
                return(NULL);
            } else {
                result = resized_result;
            } /* if */
        } /* if */
    } /* if */
    return(result);
} /* filGets */
-------------------------------------------

The function _popen() is not a standard function, but popen()
is (in POSIX). Btw.: Under windows I use MinGW and there the function
is also popen(). The problem stays open:

If you open a file with popen() (MinGW, probably also cygwin)
under windows and you do a ftell() or fseek() you just
succeed as if it is an empty file. If you do the same in
linux the ftell() and fseek() functions return -1 which
indicates that the file is not seekable.

If someone has an idea: Please help.

Greetings Thomas Mertes

 
thomas.mertes

Why not use this strategy always?

As an optimisation, you could use the ftell() strategy to determine
the initial size to malloc().

This is just what I want. But for a pipe created with popen this
strategy is not possible: You cannot know how big a pipe can
grow. Therefore ftell() and fseek() return -1 for pipes.
Under windows it does not work for files (pipes) opened with
_popen() since ftell() and fseek() return 0 instead of -1.
Therefore I look for a possibility to recognize this situation.

Greetings Thomas Mertes

 
thomas.mertes

Don't you mean ftell() rather than tell()?

Yes, you are right: I mean ftell().

If not, you're most probably lost here, as that won't be a standard function.

no function _popen() in standard C (I think). In POSIX there's popen() (i.e.
without the leading underscore)

I looked at popen() more closely: I use popen()
under linux (gcc) and under windows (MinGW). The only place
using _popen() would be under windows (MSVC). But the actual
problem occurs under windows (MinGW). So I can claim that
I actually use the POSIX popen().

OT here (I think) but "int fileno(FILE *stream);" might be what you're
looking for

Does the fileno() function return a file handle under
windows?

Maybe I can use fstat() and check for S_ISFIFO.
If that works MinGW has a bug.

Greetings Thomas Mertes

 
Richard Tobin

Why not use this strategy always?
As an optimisation, you could use the ftell() strategy to determine
the initial size to malloc().
This is just what I want. But for a pipe created with popen this
strategy is not possible: You cannot know how big a pipe can
grow.

You misunderstand. *Don't* try to recognise the situation. *Always*
use the grow-the-buffer-as-you-read approach, so that you don't have
to know the size in advance.

But use the result of the ftell() strategy for the initial size.
It will be wrong if it happens to be a pipe, but it doesn't matter
that it's wrong - you'll just start with a buffer of zero bytes and
grow it to the right size as you read.
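
A minimal sketch of this idea, under some assumptions: the names read_rest and GROW_STEP are invented for illustration, the size hint is whatever the ftell()/fseek() calculation produced (0 for a pipe that wrongly claims to be empty), and the buffer grows in fixed steps of 4096 bytes as mentioned earlier in the thread.

```c
#include <stdio.h>
#include <stdlib.h>

#define GROW_STEP 4096  /* hypothetical step size, taken from the thread */

/* Reads from the current position to EOF. size_hint is only used for the
   first allocation, so a wrong hint (such as 0 for a pipe) is harmless.
   Returns a malloc()ed buffer and stores the number of bytes read in
   *length_out, or returns NULL if memory runs out. */
unsigned char *read_rest (FILE *aFile, size_t size_hint, size_t *length_out)
{
    size_t capacity;
    size_t length = 0;
    unsigned char *buffer;
    unsigned char *resized;

    if (size_hint == (size_t) -1) {
        return NULL;                   /* size_hint + 1 would overflow */
    }
    capacity = size_hint + 1;          /* one spare byte lets a correct hint
                                          reach EOF without any realloc */
    buffer = malloc(capacity);
    if (buffer == NULL) {
        return NULL;
    }
    for (;;) {
        length += fread(buffer + length, 1, capacity - length, aFile);
        if (length < capacity) {
            break;                     /* short read: EOF or read error */
        }
        capacity += GROW_STEP;
        resized = realloc(buffer, capacity);
        if (resized == NULL) {
            free(buffer);
            return NULL;
        }
        buffer = resized;
    }
    *length_out = length;
    return buffer;
}
```

With a correct hint the loop runs once and never reallocs; with a hint of 0 it degrades to plain chunked reading.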

-- Richard
 
Joachim Schmitz

Does the fileno() function return a file handle under
windows?
No idea, but it does in POSIX
$ man fileno
....
fileno - Maps a stream pointer to a file descriptor
....
The fileno() function returns the file descriptor of a stream
Maybe I can use fstat() and check for S_ISFIFO.
Indeed. But you could also use stat(), which works on a filename rather than
on a file descriptor.
If that works MinGW has a bug.

Bye, Jojo
 
thomas.mertes

You misunderstand. *Don't* try to recognise the situation. *Always*
use the grow-the-buffer-as-you-read approach, so that you don't have
to know the size in advance.

You are right: I misunderstood you, sorry.
But use the result of the ftell() strategy for the initial size.
It will be wrong if it happens to be a pipe, but it doesn't matter
that it's wrong - you'll just start with a buffer of zero bytes and
grow it to the right size as you read.

Sounds not bad, I will think over that.
The function does not always read the rest of a file.
It gets a length limit. The prototype of filGets is:

stritype filGets (FILE *aFile, long length)

My general strategy to the function is:

A) Do a malloc() for the requested length
B) Attempt to read the requested amount of bytes
(not all requested bytes may be available).
C) Realloc() the malloced area to the actual size.

So it is quite simple in the normal case.
But this function is also used to read whole files.
This is done by using very high values for 'length'.
Now two things can happen.

- The malloc() succeeds: The general strategy works.
- The malloc() fails: This is the case I was talking
about in this discussion.

If the malloc() fails it would still give higher
performance to use just one malloc() and one fread().
Therefore I started to write code to find out the
available bytes. I believed that the ftell()/fseek()
strategy would work exactly for all files where it
is possible to determine the available bytes. Well,
this was theory and MinGW under windows is something
different.

For me the 'read from the file in small chunks'
strategy is only the last resort. Not because I think
that the reading would be slower, but because it
needs lots of reallocs for a probably very big
buffer. So some bad things can happen:

a) The reallocs cost time.
b) It may fail because the heap was thrashed too
much (a single malloc would have succeeded).

Btw.: In the meantime I tried to use fstat() and
S_ISREG() and use the ftell()/fseek() strategy only
for regular files.

Greetings Thomas Mertes

 
thomas.mertes

No idea, but it does in POSIX
$ man fileno
...
fileno - Maps a stream pointer to a file descriptor
...
The fileno() function returns the file descriptor of a stream


Indeed. But you could also use stat(), which works on a filename rather than
on a file descriptor.

If I would know the filename at this place, I would
probably also know the type of the file without
referring to fstat().

Btw.: I tested with fstat() and it works under
linux and windows. Currently I use the ftell()/fseek()
strategy to determine the size of a file only for
regular files.

Since my solution works I would say that
MinGW has a bug when using ftell()/fseek() for pipes:
Instead of -1 the functions return 0 for pipes (at least for
the pipes opened with popen() ).

Greetings Thomas Mertes

 
Gordon Burditt

Recently I discovered some problem. I have some C code
which determines how many bytes are available in a
file. Later I use this information to malloc a buffer of
the correct size before I read the bytes.
Determining the number of bytes available in a
file is done in 5 steps:

1. Use tell(aFile) to get the current position.

Do you mean ftell() here?
2. Use fseek(aFile, 0, SEEK_END) to move to the end.

If the file is a binary file, SEEK_END need not be meaningfully supported.
Also, shouldn't that second argument of fseek() be 0L, not 0?
3. Get the current position with tell(aFile) (this is the
size of the file in bytes).

Do you mean ftell() here? ftell() on a text file need not return
a number of anything; it might be a bitfield combination of
track, train, sector, offset within sector, cylinder, etc., and
subtracting two of them may not give any meaningful result.
4. I move to the position which I got in step 1 with fseek().
5. Subtract the current position from the file size to
get the number of bytes available.

This code is certainly not the most elegant solution but
it is portable. The code works for normal files under
windows and linux. The portability is also the reason
why I use tell() and fseek() instead of windows specific
code.
When I open a file with _popen I get a different result:

There are a number of things that look like an open file but aren't
seekable. Tty devices (the console or serial ports), and pipes are
included. Sockets aren't seekable either. Certain magnetic tape
devices might have spotty support for seeking beyond rewind.
- Under linux the tell() of step 1 returns -1 which means
the file is not seekable. I can recognize this situation
and react accordingly (I cannot malloc the buffer beforehand.
Instead I malloc a smaller buffer which is realloced until
all bytes are read).

I claim you need to deal with the possibility of a growing file
*anyway*, so the approach of malloc()/realloc() always needs to
be used.
- Under windows the tell() of step 1 returns 0 which
means the file is seekable and is currently at position 0.
The other calls of fseek() and ftell() succeed also and
indicate that the number of available bytes is 0.
Therefore my program thinks that there are no bytes
available in the file opened with _popen.

If you know you opened the file with popen(), then you know it's
a pipe. Deal with it.
The information that it is a file opened with _popen is
not available at that place in my program.

Now my question:
Is it possible to find out that a file (available in a
variable of type FILE * ) was opened with _popen?

I suggest that if you skip steps 1 through 5 and replace it with
estimated_file_size = 4096;
then use your strategy of malloc()/realloc(), you cover all cases
without needing that information. The number 4096 for an initial
buffer size is chosen to be small enough to not be a huge waste of
memory on small files, and to be a reasonably efficient block size
for reading files. 3 bytes is too small and 50 megabytes is way
too big for typical files (except maybe video or large databases).
Tune for your application as appropriate. You can also tune how much
more memory to get each time when the initial buffer isn't enough.
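
One way to tune the growth, sketched here as an assumption and not as anything from the posted code, is to double the buffer each time it fills up. The names read_all_doubling and INITIAL_SIZE are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

#define INITIAL_SIZE 4096  /* hypothetical starting size, as suggested above */

/* Reads from the current position to EOF, doubling the buffer whenever it
   is full. Returns a malloc()ed buffer and stores the number of bytes read
   in *length_out, or returns NULL if memory runs out. */
unsigned char *read_all_doubling (FILE *aFile, size_t *length_out)
{
    size_t capacity = INITIAL_SIZE;
    size_t length = 0;
    unsigned char *buffer = malloc(capacity);
    unsigned char *resized;

    if (buffer == NULL) {
        return NULL;
    }
    for (;;) {
        length += fread(buffer + length, 1, capacity - length, aFile);
        if (length < capacity) {
            break;                         /* short read: EOF or read error */
        }
        if (capacity > (size_t) -1 / 2) {  /* doubling would overflow */
            free(buffer);
            return NULL;
        }
        capacity *= 2;
        resized = realloc(buffer, capacity);
        if (resized == NULL) {
            free(buffer);
            return NULL;
        }
        buffer = resized;
    }
    *length_out = length;
    return buffer;
}
```

The number of realloc() calls then grows with the logarithm of the file size: a file of about one megabyte needs on the order of ten calls, where a fixed 4096-byte step would need a few hundred. The price is that up to half of the final buffer may be unused until it is trimmed.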

I'll mention here that the assumption that a file will fit in memory,
especially on a 32-bit system, is somewhat shaky. Consider especially
that a DVD holds 4.7GB (more for DL), which is bigger than the
address space of a 32-bit system. Whether this is a problem depends
on the type of files your application uses.
Something like: Turn the FILE * into a handle and ask a
function about the file type. It is no problem for me to
insert windows specific code under an #ifdef

For POSIX systems, look up fileno() and fstat(), with particular
attention to the st_mode structure field.
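
A small sketch of what that lookup could look like on a POSIX system. The enum and the function name classify_stream are made up for this example; only fileno(), fstat() and the S_IS*() macros come from POSIX, and whether a given Windows C library reports pipes the same way is not something this sketch can promise.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* ask for fileno() and the S_IS*() macros */
#endif
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Hypothetical classification, not part of any library. */
enum stream_kind {
    KIND_UNKNOWN,      /* fileno() or fstat() failed */
    KIND_REGULAR,      /* regular file: ftell()/fseek() are meaningful */
    KIND_PIPE,         /* pipe or FIFO, e.g. a stream from popen() */
    KIND_CHAR_DEVICE,  /* terminal, serial port, ... */
    KIND_OTHER
};

/* Maps the stream to its file descriptor and inspects st_mode. */
enum stream_kind classify_stream (FILE *aFile)
{
    struct stat info;
    int fd = fileno(aFile);

    if (fd == -1 || fstat(fd, &info) != 0) {
        return KIND_UNKNOWN;
    }
    if (S_ISREG(info.st_mode)) {
        return KIND_REGULAR;
    }
    if (S_ISFIFO(info.st_mode)) {
        return KIND_PIPE;
    }
    if (S_ISCHR(info.st_mode)) {
        return KIND_CHAR_DEVICE;
    }
    return KIND_OTHER;
}
```

A library function that only receives a FILE * could call this first and use the ftell()/fseek() size calculation only for KIND_REGULAR.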
 
thomas.mertes

Do you mean ftell() here?

Yes I mean ftell(aFile).
If the file is a binary file, SEEK_END need not be meaningfully supported.
Also, shouldn't that second argument of fseek() be 0L, not 0?


Do you mean ftell() here?

Yes I mean ftell().
ftell() on a text file need not return
a number of anything; it might be a bitfield combination of
track, train, sector, offset within sector, cylinder, etc., and
subtracting two of them may not give any meaningful result.

On linux/unix/bsd/windows my approach works at least for
regular files. In which operating systems does ftell() return
a bitfield in the way you said?
There are a number of things that look like an open file but aren't
seekable. Tty devices (the console or serial ports), and pipes are
included. Sockets aren't seekable either. Certain magnetic tape
devices might have spotty support for seeking beyond rewind.

Yes, I know that there are open files which are not seekable.
I expect such files to return -1 on ftell() and fseek().
I claim you need to deal with the possibility of a growing file
*anyway*, so the approach of malloc()/realloc() always needs to
be used.

I consider the malloc()/realloc() approach more time
consuming and I assume that the heap could also be thrashed.
Therefore I want to use this approach only as a last resort.
If you know you opened the file with popen(), then you know it's
a pipe. Deal with it.

I am talking about a function which gets a FILE * parameter.
This function is part of a library. This library is used in
an interpreter or is linked to in compiled programs. If you
are interested: I am talking about the Seed7 interpreter
and about compiled Seed7 programs. The Seed7 programs are
compiled to C, further compiled with a C compiler and then
the library is linked to it. Therefore the programmer of
the Seed7 program knows that he is opening a pipe with
popen(), but the library has not this information.

For now I took the approach of finding out in the makefile
(when using 'make depend') whether ftell() works correctly for
a pipe opened with popen(). If it does not, I replace
ftell() with my own version, which checks the file type with
fstat(), calls the original ftell() only for a regular
file and returns -1 otherwise.
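
Such a replacement might look roughly like this. It is a sketch, not the actual Seed7 code: the names checked_ftell, lib_ftell and FTELL_WRONG_FOR_PIPE are hypothetical, and the macro stands in for whatever the makefile check would define.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* ask for fileno() and S_ISREG() */
#endif
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>

/* Calls ftell() only for regular files. Every other kind of stream
   (pipe, terminal, socket) and every fstat() failure yields -1. */
long checked_ftell (FILE *aFile)
{
    struct stat info;
    int fd = fileno(aFile);

    if (fd != -1 && fstat(fd, &info) == 0 && S_ISREG(info.st_mode)) {
        return ftell(aFile);
    }
    return -1L;
}

/* FTELL_WRONG_FOR_PIPE is a hypothetical macro that the build would
   define when the check finds that ftell() succeeds on a pipe. */
#ifdef FTELL_WRONG_FOR_PIPE
#define lib_ftell(aFile) checked_ftell(aFile)
#else
#define lib_ftell(aFile) ftell(aFile)
#endif
```

The rest of the library would then call lib_ftell() everywhere, so the extra fstat() is only paid on platforms that need it.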

I am just angry that I have to code around a bug of
windows/mingw.
I suggest that if you skip steps 1 through 5 and replace it with
estimated_file_size = 4096;
then use your strategy of malloc()/realloc(), you cover all cases
without needing that information. The number 4096 for an initial
buffer size is chosen to be small enough to not be a huge waste of
memory on small files, and to be a reasonably efficient block size
for reading files. 3 bytes is too small and 50 megabytes is way
too big for typical files (except maybe video or large databases).
Tune for your application as appropriate. You can also tune how much
more memory to get each time when the initial buffer isn't enough.

My malloc()/realloc() strategy works in steps of 4096.
I'll mention here that the assumption that a file will fit in memory,
especially on a 32-bit system, is somewhat shaky. Consider especially
that a DVD holds 4.7GB (more for DL), which is bigger than the
address space of a 32-bit system. Whether this is a problem depends
on the type of files your application uses.

Reading a whole file is just a special case use of the
function. The prototype of the library function is:

stritype filGets (FILE *aFile, long length);

This function can be used similar to the fgets() function
of C. While fgets() gets the buffer as parameter filGets()
mallocs the buffer instead of getting it. Therefore it
can malloc a buffer of exactly the right size. filGets() has
also the parameters in a different sequence. The name of
the function in a Seed7 program is gets(file, length).

Recognizing an out of memory situation and raising an
exception in Seed7 is part of the function's task. The
Seed7 user program would get an exception instead of the
characters read.
For POSIX systems, look up fileno() and fstat(), with particular
attention to the st_mode structure field.

This is what I do now.

Greetings Thomas Mertes

Seed7 Homepage: http://seed7.sourceforge.net
Seed7 - The extensible programming language: User defined statements
and operators, abstract data types, templates without special
syntax, OO with interfaces and multiple dispatch, statically typed,
interpreted or compiled, portable, runs under linux/unix/windows.
 
