opening a file


cartercc

If you don't care whether the file opens, why are you opening it?

My point was that you can conjure up an exception for every rule.

In my job, I constantly write and run scripts that open and close
files. Many of these scripts are on-the-fly scripts. I've been at work
for three hours today, and already I've written two of these kinds of
scripts and have run around two dozen. Very rarely do I bother
checking the return value of open(), and I don't ever recall having an
open fail. The biggest problem I have is with typographical errors,
and warnings catch those.

I totally agree with error checking, and the scripts that I write for
others to run ALWAYS include this kind of error checking. The scripts
that I write FOR MYSELF for processing files rarely do, however,
because open() never fails and I'm too lazy to take the extra effort
(small though it may be) to type out the 'or die' clause. My quibble
is not with the rule, but with the idea that the rule has no
exceptions. Every rule has exceptions.

If it's any consolation, I am the only one that suffers the penalty
for not checking whether or not a file opens in my own scripts, and I
am perfectly willing to risk incurring the penalty to keep from typing
a few extra keystrokes with every invocation of open. This is a
deliberate choice on my part, and I have never sought to impose that
choice on anyone else.

CC
 

Tad J McClellan

cartercc said:
Like other ironclad rules, this also has exceptions.


I have never seen one (even after reading all of your post).

Using the 'or die' construct


The "check the return value" part was meant to be ironclad.

The "what action to take part" (eg. die) was not meant to be ironclad.

If die() is not what you want to do when the open failed, then
replace it with code that does do what you want.
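
For instance, a warn-and-skip version (a minimal sketch; the file name
and the process_lines() routine are made up for the example):

sub process_lines {
    my ($fh) = @_;
    print while <$fh>;    # stand-in for the real work
}

if ( open my $fh, '<', 'input.txt' ) {
    process_lines($fh);
}
else {
    warn "Could not open 'input.txt': $!; continuing without it";
}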

has costs (albeit minimal) and when the costs outweigh
the benefit, you shouldn't use it.


I submit that the costs never outweigh the benefit.

You are free to disagree.

Example: opening the file is
tangential to the script so you don't care whether the file opens but
you DO care if the script dies.


I assume that your program is going to read or write from the
unopened filehandle?

If not, then why call open() at all?

If so, then your program will spew superfluous warnings, potentially
masking real warnings. (and if you are programming without warnings
enabled, then you deserve any pain you receive.)


At a minimum, you need to check before using the filehandle. eg:

# log stuff if we can:
if ( open my $log, '>>', '/tmp/logfile' ) {
    print $log "some logged info here\n";
}
# never use $log below here...

It's true that in the vast majority of cases a prudent script will
perform error checking for open(),


I still have not seen a single exception...

but it's also true that one
shouldn't follow the rule blindly forgetting the purpose and the
effects of error checking.


I cannot envision where blindly following my original advice
is unwarranted.

Which is exactly why I state it the way I do.

I've had this discussion many times before...

If it is ever disproved, then I'll change how I state it.
 

Tim Greer

cartercc said:
Like other ironclad rules, this also has exceptions. Using the 'or
die' construct has costs (albeit minimal) and when the costs outweigh
the benefit, you shouldn't use it. Example: opening the file is
tangential to the script so you don't care whether the file opens but
you DO care if the script dies.

It's true that in the vast majority of cases a prudent script will
perform error checking for open(), but it's also true that one
shouldn't follow the rule blindly forgetting the purpose and the
effects of error checking.

CC

Right, but checking the return value doesn't mean you have to die if it
fails to open. It just means not starting to process data that might
not exist (to give one of many examples). Checking the return value can
lead to many different ways of dealing with the code that follows,
depending on what you're doing. Of course, in the OP's case this was a
very specific thing, because they needed that data and didn't know why
the file wasn't opening. I'd imagine you always have a reason to check
the return value, in any program and in any situation; you just deal
with the result one way or the other (whatever is most appropriate).
Sometimes that's to die, and sometimes it's to skip the code that would
have processed the data, or any number of other logistical choices.
 

cartercc

This isn't a big deal. I often don't check the return value of open
and I'm quite willing to accept the consequences. However, this only
applies to scripts written for ME on MY system, and I know that system
intimately. When I'm busy, I do this literally hundreds of times a
week, perhaps (on a very long day) a hundred times a day. I know what
I'm doing.

On a philosophical note, OF COURSE (!) there are exceptions to every
rule. Please note that by 'rule' I refer to discretionary behavior
that we generally refer to as 'good practice'. The rule that every
statement in Perl ends with a semicolon is a mandatory rule, and a
script will throw a compile time error if we miss a semicolon.

For another example, consider warnings. One of my tasks is to process
a file which may consist of several hundred thousand rows, totaling
different kinds of values. When I run the script with warnings, I get
an uninitialized value warning for every row printed to the screen,
and the script takes a significant amount of time to run. When I run
the script without warnings, the script runs quickly. To silence the
warnings I can either (1) initialize a hash value for each row, or (2)
run without warnings. I choose (2).

Again, this isn't a big deal, and I TOTALLY AGREE that the return
value for open() should generally be checked, and that warnings should
generally be enabled. However, for the reasons I stated, I often don't
do this, and I am perfectly willing to accept the consequences.

CC
 

Jürgen Exner

cartercc said:
Like other ironclad rules, this also has exceptions. Using the 'or
die' construct has costs (albeit minimal)

Cost in terms of what? In terms of execution time or memory it should be
negligible except in very extreme cases, in particular because accessing
the file system is so expensive on the OS side anyway that you will
probably have difficulties even measuring the additional cost of die().

and when the costs outweigh the benefit, you shouldn't use it.

Of course.

Example: opening the file is
tangential to the script so you don't care whether the file opens but
you DO care if the script dies.

Then don't use die() but some other appropriate error handling, maybe
logging the event or whatever is suitable for that situation.
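
For instance, a log-and-continue open might look like this (a minimal
sketch, not anyone's posted code; the log path is made up):

#!/usr/bin/perl
use strict;
use warnings;

for my $file (@ARGV) {
    my $fh;
    unless ( open $fh, '<', $file ) {
        # record the event instead of dying
        if ( open my $log, '>>', '/tmp/myscript.log' ) {
            print $log scalar(localtime) . ": skipping '$file': $!\n";
        }
        next;
    }
    while (<$fh>) {
        # ... process the line ...
    }
}
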
It's true that in the vast majority of cases a prudent script will
perform error checking for open(), but it's also true that one
shouldn't follow the rule blindly forgetting the purpose and the
effects of error checking.

Fair enough. But I still can't imagine a situation where you wouldn't
care about the success of an open() call.

jue
 

Tim Greer

cartercc said:
My point was that you can conjure up an exception for every rule.

In my job, I constantly write and run scripts that open and close
files. Many of these scripts are on-the-fly scripts. I've been at work
for three hours today, and already I've written two of these kinds of
scripts and have run around two dozen. Very rarely do I bother
checking the return value of open(), and I don't ever recall having an
open fail. The biggest problem I have is with typographical errors,
and warnings catch those.

I totally agree with error checking, and the scripts that I write for
others to run ALWAYS include this kind of error checking. The scripts
that I write FOR MYSELF for processing files rarely do, however,
because open() never fails and I'm too lazy to take the extra effort
(small though it may be) to type out the 'or die' clause. My quibble
is not with the rule, but with the idea that the rule has no
exceptions. Every rule has exceptions.

If it's any consolation, I am the only one that suffers the penalty
for not checking whether or not a file opens in my own scripts, and I
am perfectly willing to risk incurring the penalty to keep from typing
a few extra keystrokes with every invocation of open. This is a
deliberate choice on my part, and I have never sought to impose that
choice on anyone else.

CC

I suppose you're aware, then, that if an open did fail on a file you
needed data from, you would spend far more than the second it takes to
add a warn, die, or whatever mechanism you use to note the problem;
you'd be digging through the script and adding the checks only after it
fails. If you didn't make a typo and the file exists and is readable,
then yeah, you'll probably never have a problem. But the same is true
if you do check the return value: you would still never have a problem
and never see it die, warn or log an error -- even if it silently
ignores the failure (based on the return value) and moves on in a
friendly manner. I can't imagine how it really saves any time at all to
leave out a simple check. If anything, create a very simple module or
library and use it for the open routines in all of your scripts, even
if it just logs and happily, blindly moves on (because yeah, sometimes
it's okay to just keep moving). Anyway, it's certainly up to you if
it's just your code and only you are affected.
 

Tim Greer

Tad said:
At a minimum, you need to check before using the filehandle. eg:

# log stuff if we can:
if ( open my $log, '>>', '/tmp/logfile' ) {
    print $log "some logged info here\n";
}
# never use $log below here...

And, for the sake of clarity, that above example _is_ checking the
return value with if (open...), before it continues working with the
data, so I can't help but agree with Tad. I absolutely can't imagine
how you'd do this otherwise, unless you want to run the risk of
open(my $fileh, '<', $filename); and then just start blindly working
with $fileh. That seems less like a preference and more like a high
risk. If you know you never make typing mistakes, and you know the
files are always there and always readable, and their data, their
output and your scripts are trivial, then go for it.
 

Tim Greer

cartercc said:
This isn't a big deal. I often don't check the return value of open
and I'm quite willing to accept the consequences. However, this only
applies to scripts written for ME on MY system, and I know that system
intimately. When I'm busy, I do this literally hundreds of times a
week, perhaps (on a very long day) a hundred times a day. I know what
I'm doing.

I don't think anyone's upset if you do this yourself on your own systems
for your own scripts, and it doesn't affect anyone else. Still, it
doesn't seem like a good idea out of laziness or being busy. Checking
the return of open should actually save you time in every case, so it
just seems strange you'd not _want_ to do it. Of course, if you choose
not to, that's cool, since you don't seem to do it that way for other
people.

Still, your reply to the thread appeared to suggest that not everyone
should check the return value in every case. I can't imagine any
circumstance or reason not to in even a single case, and it should
even save time and hassle. If you just meant not to use something
nasty like die(), then that's completely valid, though one really
should never have a reason not to check the return value. Saying you
know you should, but don't always do it or want to, is another thing,
but it seemed that you were suggesting more than that, and it struck
me as odd for that reason.

For another example, consider warnings. One of my tasks is to process
a file which may consist of several hundred thousand rows, totaling
different kinds of values. When I run the script with warnings, I get
an uninitialized value warning for every row printed to the screen,
and the script takes a significant amount of time to run. When I run
the script without warnings, the script runs quickly. To silence the
warnings I can either (1) initialize a hash value for each row, or (2)
run without warnings. I choose (2).

There is probably a better way to do it, where you don't have to
initialize each row (but that's just a guess without seeing the
script). Sometimes silencing things here and there can save time, but
when you do that, I don't know that you could ever fully trust the
output of the script, in which case why take the time to create or run
it at all, unless you're just after a general idea or a rough estimate
of the output?

Again, this isn't a big deal, and I TOTALLY AGREE that the return
value for open() should generally be checked, and that warnings should
generally be enabled. However, for the reasons I stated, I often don't
do this, and I am perfectly willing to accept the consequences.

I think people would disagree about it not being a big deal, but I also
don't think anyone minds how you run your own personal scripts. I've
never seen anyone that didn't make a goof in their code at some point,
and I'd prefer to have something catch it right away. I really can't
say I agree there are reasons, but to each their own.
 

cartercc

Jürgen Exner said:
Fair enough. But I still can't imagine a situation where you wouldn't
care about the success of an open() call.

I processed a file yesterday where I broke the input file into four
output files. In doing so, I wrote intermediate values to four
intermediate files simply for the sake of a sanity check. The output
was perfect, and I never looked at or even opened the intermediate
files. Yes, I could have checked to see if they had been opened and
all that, and if I had had a problem I certainly would have.

I'm not advocating not doing all the error checking, opening files,
warnings, and all that. All I'm saying is that IMO when you're writing
throw away scripts for your own convenience, a little sloppiness can
be excused ... not justified, but only excused. To be fair, I started
off doing the file open error checking, and only slipped into some
shortcuts after writing a lot of scripts. Again, in my work, I know
immediately when something doesn't work, like a typo in a file name --
but the consequence is the same: a file not existing or being empty
means a mistake, and it doesn't take any more effort to fix the
problem in one case or another.

Again, when I write scripts for others, I don't use shortcuts (which
also includes documentation of variables, functions, etc.)

CC
 

Charlton Wilbur

TG> I think people would disagree about it not being a big deal, but
TG> I also don't think anyone minds how you run your own personal
TG> scripts. I've never seen anyone that didn't make a goof in
TG> their code at some point, and I'd prefer to have something catch
TG> it right away. I really can't say I agree there are reasons,
TG> but to each their own.

More to the point, if you're posting to a newsgroup to ask a small group
of experts to debug your code, it's essential that you have done
everything you can to find the bug yourself.

This means that if there's an open call, you check the return value.
This means that you have use strict; and use warnings; right after the
shebang line.

If you are enough of an experienced Perl programmer to never ever need
help, you can do whatever you like with the result of your open calls.
But for the rest of us lesser mortals, any way we can automatically find
our mistakes without having to post to an audience of thousands is a
good thing.

Charlton
 

Tim Greer

Charlton said:
TG> I think people would disagree about it not being a big deal, but
TG> I also don't think anyone minds how you run your own personal
TG> scripts. I've never seen anyone that didn't make a goof in
TG> their code at some point, and I'd prefer to have something catch
TG> it right away. I really can't say I agree there are reasons,
TG> but to each their own.

More to the point, if you're posting to a newsgroup to ask a small
group of experts to debug your code, it's essential that you have done
everything you can to find the bug yourself.

This means that if there's an open call, you check the return value.
This means that you have use strict; and use warnings; right after the
shebang line.

If you are enough of an experienced Perl programmer to never ever need
help, you can do whatever you like with the result of your open calls.
But for the rest of us lesser mortals, any way we can automatically
find our mistakes without having to post to an audience of thousands
is a good thing.

Charlton

Good points, but in fairness (and I might have missed it), carter is
simply saying he doesn't do it for his own code. I don't think he'd
actually post code without checks if he were posting for help or
posting an answer. I get what he's saying, I just don't know that I
agree that there's ever a reason to save a few seconds of time for the
sake of it being your own code you're sure about, even if it's trivial
data it's working with. Copy and paste the simple check (whatever that
might be) and you get the best of both worlds. But, that's just my
view.
 

Tad J McClellan

cartercc said:
The rule that every
statement in Perl ends with a semicolon is a mandatory rule,


No it isn't.

It is in C. It is in Java.

It isn't in Perl.

In C, Java etc, the semicolon is a statement "terminator".

In Perl, the semicolon is a statement "separator".

The difference is subtle but is there nonetheless.

and a
script will throw a compile time error if we miss a semicolon.


This script won't:

----------------------
#!/usr/bin/perl
use warnings;
use strict;

# Look Ma! No semicolon...

print "Hello world\n"

# ... because the print() has no following statement to "separate"
----------------------


For another example, consider warnings. One of my tasks is to process
a file which may consist of several hundred thousand rows, totaling
different kinds of values. When I run the script with warnings, I get
an uninitialized value warning for every row printed to the screen,
and the script takes a significant amount of time to run. When I run
the script without warnings, the script runs quickly. To silence the
warnings I can either (1) initialize a hash value for each row, or (2)
run without warnings. I choose (2).


Actually, you can choose between

2a) run without warnings

2b) turn warnings off for the one line that is making warnings
that you don't care to see:

{
    no warnings 'uninitialized';
    print undef
}

(no semicolon yet again)

With 2b, you still get assistance from warnings everywhere else.

Again, this isn't a big deal,


It is for people who know where the costs of software development are
(in maintenance).

As I said here once way back in '01:

I'm just some guy on Usenet, so feel free to ignore me, but
if you worked for me, you wouldn't work for me.

:)
 

Hans Mulder

cartercc said:
In my job, I constantly write and run scripts that open and close
files. Many of these scripts are on-the-fly scripts. I've been at work
for three hours today, and already I've written two of these kinds of
scripts and have run around two dozen. Very rarely do I bother
checking the return value of open(), and I don't ever recall having an
open fail. The biggest problem I have is with typographical errors,
and warnings catch those.

You could consider adding:

use Fatal qw/:void open close/;

to the boilerplate code at the top of every script. That will add an
"...or die" to every 'open' or 'close' call whose return value isn't
otherwise used. This gives you useful error messages without any extra
keystrokes.
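
For example, with Fatal loaded, an unchecked open that fails raises an
error all by itself (a minimal sketch; the file name is made up):

#!/usr/bin/perl
use strict;
use warnings;
use Fatal qw/:void open close/;

# No 'or die' needed: a failed void-context open now dies on its own,
# with the failure reason from $! included in the message.
open my $fh, '<', '/no/such/file';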

Hope this helps,

-- HansM
 

Mart van de Wege

cartercc said:
This isn't a big deal. I often don't check the return value of open
and I'm quite willing to accept the consequences. However, this only
applies to scripts written for ME on MY system, and I know that system
intimately. When I'm busy, I do this literally hundreds of times a
week, perhaps (on a very long day) a hundred times a day. I know what
I'm doing.

On a philosophical note, OF COURSE (!) there are exceptions to every
rule.

On a philosophical note, when it comes to programming, you *always*
check the outcome of *any* statement that interacts with outside data.

You're free to disregard that rule, like you are free to disregard
other rules, but you set yourself up for bugs and security
vulnerabilities if you do.

Mart
 

Peter J. Holzer

If you don't care whether the file opens, why are you opening it?
At the very least I think you'd need to know it didn't open so you
don't attempt to do I/O with it.

I can think of one situation where you don't need to check whether the
open succeeded: When you try to open a file for reading and a
non-existent file is exactly equivalent to an empty file. Then

open(my $fh, '<', $optional_file);
while (<$fh>) {
    # whatever
}

is fine from a purely functional point of view. But I'd still write that
as

if (open(my $fh, '<', $optional_file)) {
    while (<$fh>) {
        # whatever
    }
}

just to assure the reader of the program (probably myself in six months)
that I haven't forgotten the check.

hp
 

Peter J. Holzer

cartercc said:
My point was that you can conjure up an exception for every rule.

In my job, I constantly write and run scripts that open and close
files. Many of these scripts are on-the-fly scripts. I've been at work
for three hours today, and already I've written two of these kinds of
scripts and have run around two dozen. Very rarely do I bother
checking the return value of open(), and I don't ever recall having an
open fail.

You have never, ever mistyped a file name, started a script in the wrong
directory or as the wrong user? You, sir, have my utmost admiration -
or you would have if I believed you, which I don't.

The biggest problem I have is with typographical errors,
and warnings catch those.

Warnings don't catch mistyped filenames. Checking the return value of
open does.
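
A two-line illustration (the file names are invented for the example):

open my $fh, '<', 'dat.txt';                     # typo for 'data.txt';
                                                 # no error, $fh is just unusable
open my $fh2, '<', 'dat.txt' or die "open: $!";  # the typo is reported at once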

hp

PS: I recall that not so long ago you started a thread with the subject
"crisis perl".
 

Eric Pozharski

Jürgen Exner said:
Cost in terms of what? In terms of execution time or memory it should be
negligible except in very extreme cases, in particular because accessing
the file system is so expensive on the OS side anyway that you will
probably have difficulties even measuring the additional cost of die().

I can't speak for cartercc, but I obviously do have difficulties
measuring it:

perl -wle '
    use Benchmark qw|cmpthese timethese|;
    my $t = timethese 50_000, {
        die   => sub { open my $fh, q|>|, q|/dev/null| or die; },
        maybe => sub { open(my $fh, q|>|, q|/dev/null|) || die; },
        live  => sub { open my $fh, q|>|, q|/dev/null|; },
    };
    cmpthese $t;
'
Benchmark: timing 50000 iterations of die, live, maybe...

die:   4 wallclock secs ( 1.80 usr + 0.91 sys = 2.71 CPU) @ 18450.18/s (n=50000)
live:  3 wallclock secs ( 1.60 usr + 0.99 sys = 2.59 CPU) @ 19305.02/s (n=50000)
maybe: 3 wallclock secs ( 1.72 usr + 0.90 sys = 2.62 CPU) @ 19083.97/s (n=50000)

         Rate   die maybe  live
die   18450/s    --   -3%   -4%
maybe 19084/s    3%    --   -1%
live  19305/s    5%    1%    --

Two further runs gave:

         Rate   die maybe  live
die   19608/s    --   -2%   -2%
maybe 20000/s    2%    --   -0%
live  20080/s    2%    0%    --

         Rate   die maybe  live
die   19380/s    --   -1%   -2%
maybe 19531/s    1%    --   -1%
live  19763/s    2%    1%    --

And once I even got this (though I failed to recreate it):

        Rate  live  die
live 20000/s    --  -1%
die  20161/s    1%   --
*CUT*
 

Peter J. Holzer

cartercc said:
For another example, consider warnings. One of my tasks is to process
a file which may consist of several hundred thousand rows, totaling
different kinds of values. When I run the script with warnings, I get
an uninitialized value warning for every row printed to the screen,
and the script takes a significant amount of time to run. When I run
the script without warnings, the script runs quickly.

Run-time is the least problem here. If your script spews out any
warnings (much less one warning for each line of useful output) during
normal operation, you are doing something wrong. How are you supposed to
notice the serious warnings if you always get a lot of spurious
warnings?

To silence the warnings I can either (1) initialize a hash value for
each row,

There are other ways. For example you can use the //, || or ?: operators
to use default values:

for (@keys) {
    printf("%-15s %10.2f\n", $_, $values{$_} // 0);
}
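
(The // defined-or operator needs perl 5.10 or later. On older perls the
same defaulting can be written with || or an explicit definedness test;
a sketch using the same @keys/%values as above:)

for (@keys) {
    printf("%-15s %10.2f\n", $_, $values{$_} || 0);   # also turns 0 and '' into 0
    printf("%-15s %10.2f\n", $_,
           defined $values{$_} ? $values{$_} : 0);    # exact pre-5.10 analogue of //
}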

or (2) run without warnings. I choose (2).

You are throwing the baby out with the bath water. You can turn off
specific warnings for a specific scope:


use warnings;
...

for my $row (@rows) {
    no warnings 'uninitialized';

    # silently prints all undef values in @$row as ''
    print join("\t", @$row), "\n";
}

hp
 

xhoster

Peter J. Holzer said:
You have never, ever mistyped a file name, started a script in the wrong
directory or as the wrong user? You, sir, have my utmost admiration -
or you would have if I believed you, which I don't.

I've done the first two. I don't recall doing the third. When I got
warnings about reads or writes on closed filehandles, I immediately knew
that I screwed up, and it was pretty easy to figure out how. (Just as I
would have had I checked and warned on the return value of open.)

Warnings don't catch mistyped filenames. Checking the return value of
open does.

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.
 

Tim Greer

I've done the first two. I don't recall doing the third. When I got
warnings about reads or writes on closed filehandles, I immediately
knew that I screwed up, and it was pretty easy to figure out how.
(Just as I would have had I checked and warned on the return value of
open.)

I'd imagine it'd be easy enough to figure out in a lot of cases,
especially if you use unique filehandle names and such, but if you
didn't and the script grew, it could open the potential for problems
that weren't immediately obvious, though I still imagine it wouldn't
take very long to find the issue. I suppose it varies with the risk:
how you're using the data, how important that data is, and how
important the results are -- and though none of those things
technically get worse by not checking the return value, they could in
some situations (but again, that would be due to poor logic in the
script anyway; if we were all perfect, we'd not need failures to be
reported, let alone return values). I'm certain people can (and have)
created scripts without good checking that work fine and may work fine
indefinitely, but in my opinion it's just easier to intentionally put
in checks and fail-safes as you go along.
 
