sysopen failures

Marc Girod · Aug 6, 2010

Hello,

A script saves mails sent by a crontab--so, there may be bursts...
It uses sysopen, I assume to make sure it doesn't overwrite existing
files.
At times, we get bursts of errors (File exists), which I trace to the
sysopen call.
However, I cannot confirm that all the corresponding files actually
existed.
I read the doc and get to:

In many systems the "O_EXCL" flag is available for opening files in
exclusive mode. This is not locking: exclusiveness means here that if
the file already exists, sysopen() fails. "O_EXCL" may not work on
network filesystems, and has no effect unless the "O_CREAT" flag is set
as well.
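
A minimal sketch of the behaviour that paragraph describes, run in a scratch directory on a local disk (an assumption: it shows nothing about a particular NFS server). The file name is made up, and 0444 is the permission value the script passes as $mode. The second exclusive open of the same name fails and leaves EEXIST, i.e. "File exists", in $!:

```perl
use strict;
use warnings;
use Fcntl qw(O_CREAT O_EXCL O_WRONLY);
use Errno qw(EEXIST);
use File::Temp qw(tempdir);

# Scratch directory on local disk; the file name is only an example.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/mail.0";

# First exclusive open: the name is free, so the file is created.
# Mode 0444 only applies to later opens; this handle is still writable.
my $first_ok = sysopen(my $fh1, $file, O_EXCL | O_CREAT | O_WRONLY, 0444);
close($fh1) if $first_ok;

# Second exclusive open of the same name: it must fail instead of
# reopening the file, and errno is EEXIST.
my $second_ok    = sysopen(my $fh2, $file, O_EXCL | O_CREAT | O_WRONLY, 0444);
my $second_errno = $! + 0;    # copy errno right away

print "first open: ", ($first_ok ? "created" : "failed"), "\n";
print "second open: ", ($second_ok ? "succeeded" : "failed"),
      ($second_errno == EEXIST ? " (EEXIST)" : ""), "\n";
```

On a local filesystem the first line reports "created" and the second "failed (EEXIST)".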

The script does write to a network filesystem (home directory on a
remote filer, 4 ms round-trip).

Should I look for a replacement for sysopen?
Or for another theory to explain the problem?

The bit of code doing the open:

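# O_EXCL | O_CREAT (constants from Fcntl): fail with EEXIST instead of
# opening a file that already exists.
if (defined($mode)
        ? sysopen(FILE, $file, O_EXCL | O_CREAT | O_WRONLY, $mode)
        : sysopen(FILE, $file, O_EXCL | O_CREAT | O_WRONLY))
{
    _dump(*FILE, @$r_lines);
    return close(FILE);
}
return(0);    # sysopen failed; $! holds the reason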

Thanks,
Marc

John W. Krahn · Aug 7, 2010

Marc said:
> The bit of code doing the open:
>
> if (defined($mode)
>         ? sysopen(FILE, $file, O_EXCL | O_CREAT | O_WRONLY, $mode)
>         : sysopen(FILE, $file, O_EXCL | O_CREAT | O_WRONLY))
> {

Your code does not check the return value from sysopen() so the error
message you are receiving is not related to sysopen().

>     _dump(*FILE, @$r_lines);
>     return close(FILE);
> }
> return(0);

John

Marc Girod · Aug 7, 2010

> Your code does not check the return value from sysopen() so the error
> message you are receiving is not related to sysopen().

Doesn't it?
The block is an if block.
If the condition (the return from sysopen) is false, the function
returns 0 unconditionally.

Besides, the error I get from $!, at the time of reporting in the
calling function is, as I wrote it, 'File exists'.

And last: it is not *my* code.
Marc

Peter J. Holzer · Aug 7, 2010

> Doesn't it?
> The block is an if block.
> If the condition (the return from sysopen) is false, the function
> returns 0 unconditionally.

But you haven't shown where the error is reported. We could only guess
that it's the next thing after “return 0”.

> Besides, the error I get from $!, at the time of reporting in the
> calling function is, as I wrote it, 'File exists'.

There may be something between the sysopen and the printing of $! which
causes that error. Since we haven't seen that code we can't tell.
I do agree that it's very likely that “File exists” is set by sysopen in
this case.

> And last: it is not *my* code.

You posted it, so in the context of this thread it's your code.

hp

Peter J. Holzer · Aug 7, 2010

> A script saves mails sent by a crontab--so, there may be bursts...
> It uses sysopen, I assume to make sure it doesn't overwrite existing
> files.
> At times, we get bursts of errors (File exists), which I trace to the
> sysopen call.
> However, I cannot confirm that all the corresponding files actually
> existed.
> I read the doc and get to:
>
> In many systems the "O_EXCL" flag is available for opening files in
> exclusive mode. This is not locking: exclusiveness means here that if
> the file already exists, sysopen() fails. "O_EXCL" may not work on
> network filesystems, and has no effect unless the "O_CREAT" flag is set
> as well.

If the network filesystem doesn't support O_EXCL, then the open will
succeed even though it shouldn't (or it will fail every time with
EINVAL); either way, it won't fail with EEXIST when the file doesn't
exist.

> The script does write to a network filesystem (home directory on a
> remote filer, 4 ms round-trip).

> Should I look for a replacement for sysopen?

No. sysopen is the closest you can get to the OS. If sysopen reports
EEXIST, then the OS really thinks the file exists at the time.

> Or for another theory to explain the problem?

Yes. Most likely causes are IMHO:

* The file really exists at the time. You have to find out why
(maybe your script is supposed to remove the file before it
terminates and it either doesn't do it or a previous invocation
hasn't finished yet). If you know why, the solution is probably
obvious.
* The file did exist and has already been removed at the time the
script runs, but the information about the file's existence is still
cached by the OS. In this case you should check the configuration of
the file system (on both the client and the server).

hp

John W. Krahn · Aug 7, 2010

Marc said:
> Doesn't it?
> The block is an if block.
> If the condition (the return from sysopen) is false,

Or true.

> the function returns 0 unconditionally.
Correct.

> Besides, the error I get from $!, at the time of reporting in the
> calling function is, as I wrote it, 'File exists'.

By that time $! could have been changed by some other system function.

You need to capture or print the value of $! immediately after the
system function that sets it, for example:

sysopen my $FILE, $file, O_EXCL | O_CREAT | O_WRONLY, $mode or do {
    warn "Cannot open '$file': $!";    # report $! before anything else resets it
    return 0;
};

*Note that "$mode" is really the permissions field, and the MODE field
is actually "O_EXCL | O_CREAT | O_WRONLY".
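
To illustrate the point about $!, a minimal sketch with made-up file names: the errno from the failed sysopen is copied at once, and a later failing system call (here a file test on a name that does not exist, standing in for whatever the caller does before reporting) replaces the value in $!:

```perl
use strict;
use warnings;
use Fcntl qw(O_CREAT O_EXCL O_WRONLY);
use Errno qw(EEXIST ENOENT);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/mail.0";

# Set-up: make sure the target exists, so the exclusive open must fail.
open(my $pre, '>', $file) or die "setup: $!";
close($pre);

my $saved;    # errno as it was right after sysopen()
unless (sysopen(my $fh, $file, O_EXCL | O_CREAT | O_WRONLY, 0444)) {
    $saved = $! + 0;
}

# Stand-in for whatever the caller does before it reports the error:
# any failing system call overwrites $!.
my $exists = -e "$dir/no-such-file";
my $later  = $! + 0;

for my $pair (['right after sysopen', $saved], ['at report time', $later]) {
    local $! = $pair->[1];    # turn the number back into a message
    print "$pair->[0]: $!\n";
}
```

The first line shows the EEXIST message; the second shows the ENOENT message, which is what a caller that reads $! too late would report.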

John

Marc Girod · Aug 9, 2010

> If the network filesystem doesn't support O_EXCL, then the open will
> succeed even though it shouldn't (or it will fail every time with
> EINVAL); either way, it won't fail with EEXIST when the file doesn't
> exist.
Thanks.

> No. sysopen is the closest you can get to the OS.
OK.

> Yes. Most likely causes are IMHO:
>
> * The file really exists at the time. You have to find out why
> (maybe your script is supposed to remove the file before it
> terminates and it either doesn't do it or a previous invocation
> hasn't finished yet). If you know why, the solution is probably
> obvious.
> * The file did exist and has already been removed at the time the
> script runs, but the information about the file's existence is still
> cached by the OS. In this case you should check the configuration of
> the file system (on both the client and the server).

Well, I have to figure out how either could be possible.
I cannot see what in the code could remove anything...
Those files are left to be accessible from a web page.
There is a loop which increments a digit in the file name and tries
100 times.
I don't think this is very smart, but I cannot see how the 100
different files could exist.

The error returned in the end is of course only the one for the last
try.
So that I don't *know* why the 99 first attempts failed.
But in the general case, I cannot find the 100th file either.
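
Keeping the reason for every attempt, not only the last one, would answer that question directly. A minimal sketch; save_exclusive, the "$base.$n" naming and the rule of stopping on any error other than EEXIST are assumptions for illustration, not the script's actual code:

```perl
use strict;
use warnings;
use Fcntl qw(O_CREAT O_EXCL O_WRONLY);
use Errno qw(EEXIST);
use File::Temp qw(tempdir);

# Hypothetical save routine: try "$base.0", "$base.1", ... and record
# [name, errno number, errno text] for every attempt that fails.
sub save_exclusive {
    my ($base, $mode, $lines, $max_tries) = @_;
    my @failures;
    for my $n (0 .. $max_tries - 1) {
        my $file = "$base.$n";
        if (sysopen(my $fh, $file, O_EXCL | O_CREAT | O_WRONLY, $mode)) {
            print {$fh} @$lines;
            return ($file, \@failures) if close($fh);
            push @failures, [$file, $! + 0, "close: $!"];
            return (undef, \@failures);
        }
        push @failures, [$file, $! + 0, "$!"];    # copy errno at once
        # Only "File exists" can be cured by trying the next name.
        last unless $failures[-1][1] == EEXIST;
    }
    return (undef, \@failures);
}

# Usage in a scratch directory where mail.0 .. mail.2 already exist.
my $dir = tempdir(CLEANUP => 1);
for my $n (0 .. 2) {
    open(my $fh, '>', "$dir/mail.$n") or die "setup: $!";
    close($fh);
}
my ($saved, $failures) = save_exclusive("$dir/mail", 0444, ["line 1\n"], 100);
print "saved as $saved\n" if defined $saved;
print "attempt failed: $_->[0]: $_->[2]\n" for @$failures;
```

Stopping early is a design choice: a permission error or a stale NFS handle will not go away by incrementing a digit, and 99 more retries would only bury the first, informative error.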

I may leave this for a while: I have more urgent things to do, and the
symptom has stopped for now.
But I don't doubt it will come back.

Thanks.
Marc

Marc Girod · Aug 9, 2010

> Or true.
Indeed.

> By that time $! could have been changed by some other system function.
Yes.

> You need to capture or print the value of $! immediately after the
> system function that sets it, for example:
Yes...

> *Note that "$mode" is really the permissions field, and the MODE field
> is actually "O_EXCL | O_CREAT | O_WRONLY".

I saw that. It was set to 0444.
Thanks. I'll save a pointer to this and come back later.

Marc

Peter J. Holzer · Aug 9, 2010

[sysopen(... O_EXCL ...) fails with EEXIST although the file shouldn't
exist]

> Well, I have to figure out how either could be possible.
> I cannot see what in the code could remove anything...
> Those files are left to be accessible from a web page.
> There is a loop which increments a digit in the file name and tries
> 100 times.
> I don't think this is very smart, but I cannot see how the 100
> different files could exist.
>
> The error returned in the end is of course only the one for the last
> try.
> So that I don't *know* why the 99 first attempts failed.
> But in the general case, I cannot find the 100th file either.

I would log each failure with all the details I can think of. In this
case:

* exact name of the file to be created
* $! immediately after the failure
* cwd at the time of the failure
* try to lstat the file just after the failure (but after logging $!,
or you will change it!) and log all relevant information if the
file exists - this will help you to determine where the spurious
files come from
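
A sketch of such a log line; describe_failure and the field layout are hypothetical, and the pid is an extra field that helps tell concurrent runs apart. It copies $! before calling getcwd and lstat, in the order described above:

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use POSIX qw(strftime);
use Fcntl qw(O_CREAT O_EXCL O_WRONLY);
use File::Temp qw(tempdir);

# Hypothetical helper: one log line for a failed exclusive open.
# Call it immediately after the failure.
sub describe_failure {
    my ($file) = @_;
    my $errno = $! + 0;    # 1. copy $! before any other call
    my $err   = "$!";
    my $cwd   = getcwd() // '(getcwd failed)';
    my $line  = sprintf 'file=%s errno=%d (%s) cwd=%s pid=%d',
        $file, $errno, $err, $cwd, $$;
    # 2. Does the name exist now, and what is it?
    if (my @st = lstat($file)) {
        $line .= sprintf ' exists: ino=%d mode=%04o uid=%d size=%d mtime=%s',
            $st[1], $st[2] & 07777, $st[4], $st[7],
            strftime('%Y-%m-%d %H:%M:%S', localtime($st[9]));
    }
    else {
        $line .= " lstat: $!";
    }
    return $line;
}

# Usage: provoke one EEXIST in a scratch directory and log it.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/mail.0";
open(my $pre, '>', $file) or die "setup: $!";
close($pre);
my $log;
unless (sysopen(my $fh, $file, O_EXCL | O_CREAT | O_WRONLY, 0444)) {
    $log = describe_failure($file);
    print STDERR "$log\n";
}
```

If a line shows "exists" with an inode and mtime, the file really was there; if lstat itself fails right after an EEXIST, that points at the caching theory.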

> I may leave this for a while: I have more urgent things to do, and the
> symptom has stopped for now.
> But I don't doubt it will come back.

If you add the diagnostics now you will have the information next time
the problem occurs.

hp

Martijn Lievaart · Aug 9, 2010

> Well, I have to figure out how either could be possible. I cannot see
> what in the code could remove anything... Those files are left to be
> accessible from a web page. There is a loop which increments a digit in
> the file name and tries 100 times.
> I don't think this is very smart, but I cannot see how the 100 different
> files could exist.
>
> The error returned in the end is of course only the one for the last
> try.
> So that I don't *know* why the 99 first attempts failed. But in the
> general case, I cannot find the 100th file either.

Stupid suggestion and does not match exactly what you wrote above, but:
Are there maybe two instances of your program running and interfering
with each other?

M4

Marc Girod · Aug 10, 2010

> Stupid suggestion and does not match exactly what you wrote above, but:
> Are there maybe two instances of your program running and interfering
> with each other?

Thanks... This script is invoked via a ~/.forward file, i.e. for every
incoming mail. I guess it is very possible that the same mail is
presented twice, e.g. if it contains the address twice (?)
Again something I should log.
But each instance generates its own filenames, and sure, the first
would easily clash (protected only by timestamp), but it should impact
only the first of the two instances...

Marc

C.DeRykus · Aug 10, 2010

> Thanks... This script is invoked via a ~/.forward file, i.e. for every
> incoming mail. I guess it is very possible that the same mail is
> presented twice, e.g. if it contains the address twice (?)
> Again something I should log.
> But each instance generates its own filenames, and sure, the first
> would easily clash (protected only by timestamp), but it should impact
> only the first of the two instances...

Another wild speculation: a .forward loop could bombard
a server with the same message rapidly until the loop's
detected and shut down. A race condition producing these
errors might ensue... Still, I suppose a .forward loop shutdown
would've been under the microscope by now.

The added logging/diagnostics should flush this out too.

Martijn Lievaart · Aug 10, 2010

> Thanks... This script is invoked via a ~/.forward file, i.e. for every
> incoming mail. I guess it is very possible that the same mail is
> presented twice, e.g. if it contains the address twice (?) Again
> something I should log.
> But each instance generates its own filenames, and sure, the first would
> easily clash (protected only by timestamp), but it should impact only
> the first of the two instances...

In that case I suggest you make sure the filenames are really unique
(hint, use the pid, available in $$), or implement some sort of locking
mechanism.

M4

Peter J. Holzer · Aug 12, 2010

> In that case I suggest you make sure the filenames are really unique
> (hint, use the pid, available in $$),

Since the files are not removed before the program exits, using the pid
doesn't make the filenames unique. Even with a timestamp with a
resolution of one second there is a chance that the pid wraps around too
fast (although it's pretty unlikely).
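
A minimal sketch that combines both points; next_name, create_unique and the time.host.pid.counter layout are assumptions. The name only makes a clash unlikely; O_EXCL still decides, so a clash (for instance a recycled pid within the same second) costs one more try rather than an overwritten file:

```perl
use strict;
use warnings;
use Fcntl qw(O_CREAT O_EXCL O_WRONLY);
use Errno qw(EEXIST);
use Sys::Hostname qw(hostname);
use File::Temp qw(tempdir);

my $counter = 0;    # distinguishes names made by this process

# Hypothetical naming scheme: seconds since the epoch, host, pid, counter.
sub next_name {
    my ($dir) = @_;
    (my $host = hostname()) =~ s/[^A-Za-z0-9.-]/_/g;    # keep it filename-safe
    return sprintf '%s/%d.%s.%d.%d', $dir, time(), $host, $$, $counter++;
}

# Open a fresh name exclusively; on EEXIST just move on to the next name.
sub create_unique {
    my ($dir, $mode, $max_tries) = @_;
    for (1 .. $max_tries) {
        my $file = next_name($dir);
        if (sysopen(my $fh, $file, O_EXCL | O_CREAT | O_WRONLY, $mode)) {
            return ($fh, $file);
        }
        return unless $! == EEXIST;    # other errors won't go away
    }
    return;
}

my $dir = tempdir(CLEANUP => 1);
my ($fh1, $name1) = create_unique($dir, 0444, 10) or die "create: $!";
my ($fh2, $name2) = create_unique($dir, 0444, 10) or die "create: $!";
print "$name1\n$name2\n";
```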

> or implement some sort of locking
> mechanism.

O_EXCL is "some kind of locking mechanism": Any already existing file is
"locked" and won't be overwritten. It doesn't wait (wouldn't be very
useful since files aren't deleted) but fails immediately. Then the
program tries a new file name. This looks ok (it is also a usual method
for creating temporary files). Maybe the "try the next filename" part
doesn't work correctly - we haven't seen the code and it probably isn't
well tested.
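
Perl's File::Temp module packages this usual method: tempfile() builds a random name from a template, creates the file with O_CREAT|O_EXCL, and tries another name if that one is taken. A minimal sketch; the directory, the mail-XXXXXXXX template and the final chmod to 0444 are assumptions. Since it relies on the same O_EXCL flag, it carries the same caveat about network filesystems:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile tempdir);

my $dir = tempdir(CLEANUP => 1);    # stand-in for the real mail directory

# Random name from the template; the file is created exclusively and
# kept after the program exits because UNLINK is 0.
my ($fh, $filename) = tempfile('mail-XXXXXXXX', DIR => $dir, SUFFIX => '.txt', UNLINK => 0);
print {$fh} "saved message\n";
close($fh) or die "close $filename: $!";

# tempfile() creates files with mode 0600; widen it if e.g. a web server must read them.
chmod(0444, $filename) or die "chmod $filename: $!";
print "saved as $filename\n";
```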

Oh, and for everything running from a .forward file: Be sure to use the
correct exit codes from sysexits.h.
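
A sketch of those codes in a Perl script; the values are copied from <sysexits.h> and defined by hand, and exit_status_for is a hypothetical helper. With common MTAs such as sendmail and Postfix, exit status 75 (EX_TEMPFAIL) typically makes the MTA keep the message and retry later, while other non-zero codes such as EX_CANTCREAT (73) typically lead to a bounce:

```perl
use strict;
use warnings;

# Values copied from <sysexits.h>.
use constant {
    EX_OK        => 0,     # successful termination
    EX_CANTCREAT => 73,    # can't create (user) output file
    EX_TEMPFAIL  => 75,    # temp failure; user is invited to retry
};

# Hypothetical policy for a script run from ~/.forward: if the message
# could not be saved, ask the MTA to keep it and try again later.
sub exit_status_for {
    my ($saved_ok) = @_;
    return $saved_ok ? EX_OK : EX_TEMPFAIL;
}

# In the script itself this would end with something like:
#   exit exit_status_for($saved_ok);
printf "saved: %d, not saved: %d\n", exit_status_for(1), exit_status_for(0);
```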

hp

C.DeRykus · Aug 12, 2010

> O_EXCL is "some kind of locking mechanism": Any already existing file is
> "locked" and won't be overwritten. It doesn't wait (wouldn't be very
> useful since files aren't deleted) but fails immediately. Then the
> program tries a new file name. This looks ok (it is also a usual method
> for creating temporary files). Maybe the "try the next filename" part
> doesn't work correctly - we haven't seen the code and it probably isn't
> well tested.

It's not clear to me what O_EXCL will do, though,
since the files are remote and the doc the OP quoted says:

"O_EXCL" may not work on network filesystems ...

This makes me wonder if "not work" might well
manifest as a flurry of false positives for
an error at times but work normally at other
times.

I don't know if there are other network locking
mechanisms that'd be easy or reliable though.
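
One mechanism that does not depend on O_EXCL is described in the Linux open(2) manual page: create a file under a name only this process uses (for example including the hostname and pid), then link() it to the target name. link() fails if the target exists; if it reports an error anyway, a link count of 2 on the private file means the link was made after all (a lost NFS reply is one way that can happen). A minimal sketch; create_via_link and the private naming scheme are assumptions:

```perl
use strict;
use warnings;
use Sys::Hostname qw(hostname);
use File::Temp qw(tempdir);

# Create $target with $content without relying on O_EXCL.
# Returns true on success; on failure $! holds the error from link().
sub create_via_link {
    my ($target, $content) = @_;
    (my $host = hostname()) =~ s/[^A-Za-z0-9.-]/_/g;
    my $private = join '.', $target, $host, $$, 'tmp';    # hypothetical private name

    # A leftover from a crashed run could be a hard link to an old target,
    # so remove it instead of truncating it.
    unlink($private);
    open(my $fh, '>', $private) or return;
    print {$fh} $content;
    close($fh) or do { unlink($private); return };

    my $ok  = link($private, $target);
    my $err = $! + 0;
    # link() may report failure even though it worked; nlink 2 proves it did.
    $ok ||= ((stat $private)[3] // 0) == 2;
    unlink($private);    # the data stays reachable under $target
    $! = $err unless $ok;
    return $ok;
}

my $dir          = tempdir(CLEANUP => 1);
my $target       = "$dir/mail.0";
my $first        = create_via_link($target, "first\n");
my $second       = create_via_link($target, "second\n");
my $second_errno = $! + 0;    # copy the reason right away
my $second_err   = "$!";
print "first: ", ($first ? "created" : "failed"), "\n";
print "second: ", ($second ? "created" : "failed ($second_err)"), "\n";
```

The private file has to live on the same filesystem as the target, since hard links cannot cross filesystems; naming it after the target, as here, takes care of that.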

Peter J. Holzer · Aug 12, 2010

> It's not clear to me what O_EXCL will do, though,
> since the files are remote and the doc the OP quoted says:
>
> "O_EXCL" may not work on network filesystems ...
>
> This makes me wonder if "not work" might well
> manifest as a flurry of false positives for
> an error at times but work normally at other
> times.

I already covered this in my first posting in this thread. If you can
add anything substantial to that, please do.

hp

C.DeRykus · Aug 12, 2010

> I already covered this in my first posting in this thread. If you can
> add anything substantial to that, please do.

I must've missed it. I did see the remark that
'it's very likely that “File exists” is set by
sysopen' without explicit mention of sysopen not
being guaranteed to work over the network.

Michael Vilain · Aug 13, 2010

C.DeRykus said:
> I must've missed it. I did see the remark that
> 'it's very likely that “File exists” is set by
> sysopen' without explicit mention of sysopen not
> being guaranteed to work over the network.

I've been following this thread since it started. This is the first
mention of the file being on a networked filesystem.

C.DeRykus · Aug 13, 2010

> I've been following this thread since it started. This is the first
> mention of the file being on a networked filesystem.

See the original post:

"The script does write to a network filesystem
(home directory on a remote filer, 4 ms round-trip)."

Peter J. Holzer · Aug 13, 2010

> I must've missed it.

<[email protected]>

hp

FAQ 5.16 How come when I open a file read-write it wipes it out?	0	Apr 26, 2011
FAQ 5.7 How do I make a temporary file name?	0	Feb 17, 2011
problem with sysopen() on nfs	2	Sep 18, 2004
Is syswrite faster or print	12	Jan 27, 2009
flock not locking	5	Aug 10, 2006
[ANN] JRuby 1.1RC2 Released	1	Feb 16, 2008
GNU Smalltalk 3.0 released	1	Jan 7, 2008
[ANN] JRuby 1.2.0 Released	1	Mar 16, 2009

sysopen failures

Marc Girod

John W. Krahn

Marc Girod

Peter J. Holzer

Peter J. Holzer

John W. Krahn

Marc Girod

Marc Girod

Peter J. Holzer

Martijn Lievaart

Marc Girod

C.DeRykus

Martijn Lievaart

Peter J. Holzer

C.DeRykus

Peter J. Holzer

C.DeRykus

Michael Vilain

C.DeRykus

Peter J. Holzer

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads