Good practice to detect empty string?

ipellew · Dec 21, 2004

Hi all;

Pls advise the perlophiliacs' method of deciding whether a string is empty.

I am using

if ( $@ || $c_var eq "" ) {

but I constantly read that `eq` is expensive.

For example, is

if ( $@ || ! length $c_var ) {

better, faster, cheaper?

Regards
Ian

Keith Keller · Dec 21, 2004

> Pls advise the perlophiliacs method of deciding a string is empty.
>
> I am using
> if ( $@ || $c_var eq "" ) {
> but constantly read `eq` is expensive.

....and using $@ in the absence of an eval is silly.

Is there something wrong with

if ($c_var)

? It's not exactly the same, but since you provide no context it's
hard to know what you really need.

--keith

Anno Siegel · Dec 21, 2004

> Hi all;
>
> Pls advise the perlophiliacs method of deciding a string is empty.
>
> I am using
> if ( $@ || $c_var eq "" ) {
> but constantly read `eq` is expensive.
>
> For example is
> if ( $@ || ! length $c_var ) {
> better, faster, cheaper

If you really need to know, "use Benchmark", but that's futile
micro-optimization. The idiomatic way is to test for length.
Be sure the string is defined at all.
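A minimal sketch of the idiom Anno describes, assuming a hypothetical helper named `is_empty` (it is not a built-in). It checks `defined` before `length`, so undef counts as empty without relying on how `length` treats undef on a particular perl version:

```perl
use strict;
use warnings;

# Hypothetical helper: true when $s is undef or has zero length.
# defined() is tested first so length() never sees undef.
sub is_empty {
    my ($s) = @_;
    return !(defined $s && length $s);
}

for my $value (undef, '', '0', 'text') {
    my $label = defined $value ? "'$value'" : 'undef';
    print "$label: ", (is_empty($value) ? 'empty' : 'not empty'), "\n";
}
```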

Anno

jl_post · Dec 22, 2004

> I am using
> if ( $@ || $c_var eq "" ) {
> but constantly read `eq` is expensive.
>
> For example is
> if ( $@ || ! length $c_var ) {
> better, faster, cheaper

Dear Ian,

I'm not convinced that $var eq "" is necessarily more expensive
than length($var). I think this because the eq operator can report
a false value as soon as it detects a character in the variable it is
examining, whereas the length() function must count every single
character in $var, even if $var is millions of characters long.

The method that is more expensive really depends on the
implementation of the two functions/operators. If you really want to
know which one is more expensive for the task at hand, use the
Benchmark module (read "perldoc Benchmark" to find out how to use it).
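A sketch of how such a comparison could be set up with Benchmark's `cmpthese`. The test strings and the sub names `count_eq` and `count_length` are made up for illustration; a negative count such as -1 tells Benchmark to run each sub for at least that many CPU seconds. No timings are shown because they depend on the machine and the perl version:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Arbitrary test data: an empty string, a one-character string,
# and a one-million-character string.
my @strings = ('', 'x', 'x' x 1_000_000);

# Count the empty strings using eq "".
sub count_eq {
    my $n = 0;
    for my $s (@strings) { $n++ if $s eq '' }
    return $n;
}

# Count the empty strings using !length.
sub count_length {
    my $n = 0;
    for my $s (@strings) { $n++ if !length $s }
    return $n;
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, { eq_empty => \&count_eq, not_length => \&count_length });
```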

But to be honest, it really doesn't matter which method is better,
faster, cheaper. They are pretty much the same in terms of efficiency.
Sure, one may use up a few more clock cycles than the other, but this
is a small constant value that is practically imperceptible, even by
computer standards (in fact, when I used the Benchmark module I saw
the warning "(warning: too few iterations for a reliable count)" even
when I used a count of ten million).

A lot of programmers fall into the trap of thinking that if they
always use the faster, more efficient operators, their code will
run much faster than before. This is true only if the algorithms used
in those operators behave better with large data (are you familiar with
Big-O notation?). So if your program can't handle large amounts of
data very well (that is, if it has a Big-O value of N-squared), simply
converting all your '$val eq ""' conditions to '!length($val)' isn't
going to make your program magically handle large amounts of data.
That's because eq and length() have roughly the same Big-O value. To
make your program run faster, you'd have to modify its algorithms so
that none of them are N-squared (or worse). At this point, the use of
eq versus length() is really a moot point.
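To make the Big-O point concrete, here is a sketch with two hypothetical subs that find duplicate strings in a list. The first compares every pair (on the order of N-squared comparisons); the second makes one pass with a hash. Swapping `eq` for some other test inside the first one's inner loop would not change how it scales:

```perl
use strict;
use warnings;

# Compare every pair of items: roughly N**2 / 2 comparisons.
sub dups_quadratic {
    my @items = @_;
    my %dup;
    for my $i (0 .. $#items) {
        for my $j ($i + 1 .. $#items) {
            $dup{ $items[$i] } = 1 if $items[$i] eq $items[$j];
        }
    }
    return sort keys %dup;
}

# One pass with a hash: each item is looked up once, roughly O(N).
sub dups_linear {
    my (%seen, %dup);
    for my $item (@_) {
        $dup{$item} = 1 if $seen{$item}++;
    }
    return sort keys %dup;
}

my @words = qw(apple pear apple fig pear apple);
print join(',', dups_quadratic(@words)), "\n";   # prints "apple,pear"
print join(',', dups_linear(@words)), "\n";      # prints "apple,pear"
```

Both give the same answer; only the second keeps working comfortably as the list grows.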

To illustrate, if using the length() function is one-millionth of a
second faster than using eq, it will only make a noticeable difference
if length() (or eq) is used (on the order of) one million times more
often than anything else (and then, the difference might only be one
second). That is, if you want to check for the existence of an empty
string only five, one hundred, or even a thousand times in your code,
it really won't make a difference whether you use eq or length().
Theoretically, one method will be faster than the other, but you
couldn't time this difference with a stopwatch, even if you had faster
reflexes than anybody else in the world. And like I mentioned above,
even Perl's Benchmark module has trouble perceiving this time
difference.

In my opinion, you should usually use the function/operation that is
more readable (and, of course, you have to decide for yourself which is
more readable). If you spend two minutes converting the code to
something that is theoretically faster, you might not even save one
second of total running time (from every time you run the program).
And if it takes someone in the future three extra minutes to figure out
what you were trying to do, that's more than four minutes and 59
seconds wasted changing your code, thinking that your code will become
faster, better, cheaper.

I realize I wrote a lot about this subject, but to summarize, let me
say this:

Making code run faster almost always means eliminating the
bottlenecks. Changing '$var eq ""' to '!length($var)' might make a
difference (probably super small) but it won't eliminate a bottleneck.

Here is a real-world analogy (if you like these kinds of things):

There is a ten-mile-long road that people drive their cars on. Most
of this road has two lanes. But for some reason, five miles along the
road, the two lanes merge into one lane, but only for 100 meters (after
which they become two lanes again).

Ordinarily this isn't a problem when there are few cars on the road.
As a car reaches the place where the two lanes become one, it switches
lanes (if needed), and then switches back when there are two lanes
again.

But during periods of heavy traffic, this lane merge causes a
bottleneck. Multiple cars are trying to squeeze into one lane at the
same time, backing up traffic for miles.
This is unacceptable, and a solution must be found.

Someone might say that the speed limit should be raised from 55 mph
to 60 mph, because 60 mph is faster, and therefore more efficient, and
will make the cars move faster. Another person might say to make the
stretch of road that only has one lane shorter so that there is more of
the road with two full lanes.

Their intentions are good, but neither of these solutions eliminates
the bottleneck, which is what is slowing down traffic. A solution that is
much better than either of those just listed would be to insert a
second lane (where there is currently only one lane) for cars to use
instead of having to merge. (In fact, you could even reduce the speed
limit to 50 mph with this solution and it would still work better than
the solution to only raise the speed limit to 60 mph!)

And while raising the speed limit to 60 mph sounds good, it won't
even save you a full minute when the bottleneck is present. With the
bottleneck, the traffic might be backed up for hours, so just
eliminating one minute won't make all that much difference. Eliminate
the bottleneck and hours of driving time will be saved, even when the
speed limit is significantly slower.

And that's why I think you shouldn't worry about whether you should
use eq or length(). Just go with the one that is more readable and
easier to maintain and understand, and you will end up saving more time
in the future by not having to figure out some possibly convoluted code
that might not make much difference in the end at all.

This quote is widely attributed to Donald Knuth:

"Premature optimization is the root of all evil."

The point of the quote is that if you try to optimize a section of code
before you can prove that it needs to be optimized, you may end up
writing obfuscated, difficult-to-read code for nothing.

I hope this helps, Ian.

-- Jean-Luc Romano

Uri Guttman · Dec 22, 2004

jpc> I'm not convinced that $var eq "" is necessarily more expensive
jpc> than length($var) . The reason I think this is because the eq
jpc> operator can report a false value as soon as it detects a character in
jpc> the variable it is examining, whereas the length() function must count
jpc> every single character in $var, even if $var is millions of characters
jpc> long.

why must length count all the chars? how will it know when the string
ends? does the string end in a zero byte? but perl strings can have any
binary data? so how does perl figure out the length of strings? hmmm.
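Uri's hint can be checked directly: a Perl string may hold "\0" bytes, so perl cannot be finding the end of a string by scanning for a terminating zero byte the way C's `strlen()` does. A small sketch with an arbitrary test string:

```perl
use strict;
use warnings;

# An arbitrary string with embedded NUL bytes.
my $binary = "ab\0cd\0";

# A C-style strlen() would stop at the first "\0" (offset 2),
# but Perl's length() reports all six bytes.
my $first_nul = index($binary, "\0");
print "first NUL at offset $first_nul\n";   # prints "first NUL at offset 2"
print "length is ", length($binary), "\n";  # prints "length is 6"
```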

<snip of overly massive tome on this subject>

jpc> This quote is widely attributed to Donald Knuth:

jpc> "Premature optimization is the root of all evil."

jpc> The point of the quote is that if you try to optimize a section of code
jpc> before you can prove that it needs to be optimized, you may end up
jpc> writing obfuscated, difficult-to-read code for nothing.
jpc> I hope this helps, Ian.

why didn't you just say that and cut out most of the rest (including
your comments on how length works in perl)?

uri

Joe Smith · Dec 23, 2004

> I'm not convinced that $var eq "" is necessarily more expensive
> than length($var) . The reason I think this is because the eq
> operator can report a false value as soon as it detects a character in
> the variable it is examining, whereas the length() function must count
> every single character in $var

Perl's length() function does not count characters.
The information is already present in the guts of a scalar value.
Therefore your reasoning is incorrect.
-Joe

jl_post · Dec 23, 2004

Uri said:
> why must length count all the chars? how will it know
> when the string ends? does the string end in a zero
> byte? but perl strings can have any binary data? so
> how does perl figure out the length of strings? hmmm.

Hmmm... I didn't think of that. You bring up a good point.
Reflecting on what you just said, I'm remembering the Devel::Peek
module. The Devel::Peek::Dump() function lists a string's length, so
I'm guessing that the length() function could probably get that
attribute from the same place that Devel::Peek::Dump() does.

Thanks for pointing that out.

> <snip of overly massive tome on this subject>
>
> jpc> This quote is widely attributed to Donald Knuth:
> jpc> "Premature optimization is the root of all evil."
>
> why didn't you just say that and cut out most of the rest (including
> your comments on how length works in perl)?

Since you asked, I'll explain.

This subject has come up several times with my peers, and I'm still
amazed at what some programmers will favor in the name of efficiency and
speed. For example, some people will refuse to ever use the line:

$i++;

when $i is just an integer. Instead, they will say the code is wrong
unless it is written as:

++$i;

or:

$i += 1;

The reason they think that using the post-increment operator is wrong
is because it makes an extra copy that is never used (which is slower
and less efficient).

Now, they might have a point if $i is a blessed reference pointing
to a huge structure, but when $i is just an integer, it won't make
any noticeable difference to use pre-increment instead of
post-increment.
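For anyone who wants to measure rather than argue, a sketch using Benchmark's `cmpthese` on the three spellings. The sub names and the loop size of 1,000 are arbitrary choices for illustration, and any differences reported will depend on the machine and the perl version:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Each sub increments a plain integer 1,000 times in void context.
sub post_inc { my $i = 0; for (1 .. 1_000) { $i++ }    return $i }
sub pre_inc  { my $i = 0; for (1 .. 1_000) { ++$i }    return $i }
sub plus_eq  { my $i = 0; for (1 .. 1_000) { $i += 1 } return $i }

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, { 'i++' => \&post_inc, '++i' => \&pre_inc, 'i+=1' => \&plus_eq });
```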

But I've had people challenge me on this. They say that if you're
writing code, it should be as efficient as possible because it could
get called in a very tight loop that gets called a large number of
times.

And while I agree that code should be efficient, I point out that if
the code they write is running slowly, changing a post-increment
operator to a (presumably faster) pre-increment operator isn't going to
speed up the program by any appreciable (or noticeable) amount. What will
make the difference instead is to re-write any algorithms with a Big-O
notation of N-squared (or worse) to be ones that have a Big-O notation
of N log(N) (or better).

And no matter how many times I try to convince them that a
bottleneck won't be eliminated just by replacing something as trivial
as a post-increment operator with a pre-increment operator, the person
I'm talking with often ends the discussion with: "Well... I'm still
going to use the more efficient code." Unfortunately, all too often
that means that their code will be more difficult to read and
understand (for others, of course), especially when they omit comments
explaining what their code is attempting to do and why it was written
that way. And often, their "more efficient" code is more bug-prone
than the equivalent "inferior, inefficient" code.

It seemed like you understood my point. But a lot of people don't.
They hear a cute little quote like this one I read from
http://www-106.ibm.com/developerworks/library/l-optperl.html :

All of this help, though, comes at a slight performance
cost. I keep warnings and strict on while programming
and debugging, and I switch it off once the script is
ready to be used in the real world. It won't save much,
but every millisecond counts.

I totally disagree with this (I won't go into the reasons why). But my
point is that many people will read this and use this as their
manifesto not to use warnings and strict.

I can counteract with another cute quote, but I've found that if a
person has been swayed by a cute-sy quote, they generally won't get
swayed back by another.

Judging by the way the original poster worded his message, he seemed to
think that the faster method was good while all the rest were bad! He
may have obtained this notion the same way I did: when a computer
science professor gave a lecture on operations and how expensive they
are and how they ultimately cost money.

To answer your question, one quote alone is usually not enough to
sway a person's beliefs, so I felt the need to back it up with a
real-world example and scenario in the hopes that it would educate the
original poster.

I didn't mean to offend you or any other poster on this newsgroup
with my long response, but it's a pet peeve of mine when others write
obfuscated code in the name of efficiency, particularly when the
amount of time saved from the total run-times of every run of the
"efficient" program amounts to less than a second. That's why I felt
that a thorough response was in order.

I hope this makes sense, Uri. (And thanks for pointing out that
thing about using length().)

-- Jean-Luc

jl_post · Dec 23, 2004

Joe said:
Perl's length() function does not count characters.
The information is already present in the guts of a
scalar value. Therefore your reasoning is incorrect.

I see I was wrong. Thanks for pointing that out.

I realized later that I could see this information by using the
Devel::Peek module, like this:

perl -MDevel::Peek -e "Dump('perl')"

SV = PV(0x225208) at 0x1823e98
REFCNT = 1
FLAGS = (PADBUSY,PADTMP,POK,READONLY,pPOK)
PV = 0x182ac34 "perl"\0
CUR = 4
LEN = 5

Again, thanks.

-- Jean-Luc

A. Sinan Unur · Dec 23, 2004

Uri Guttman wrote:
....

> By the way the original poster posted his message, he seemed to
> think that the faster method was good while all the rest were bad! He
> may have obtained this notion the same way I did: when a computer
> science professor gave a lecture on operations and how expensive they
> are and how they ultimately cost money.

I can see why Jean-Luc responded the way he did (it was a little long for
my taste, though).

I have seen people attempt to find the least cost path through a graph with
a bazillion edges by first enumerating all the possible paths. They even
react by not believing the simple calculations that prove their program
will have to run for eons before it can ever come up with an answer. The
same people tend to also be overly impressed with obscure optimization
tricks.

That makes anyone who wants to optimize the following a little suspect and
possibly in need of some advice.

my $var;

# ...

$var = 'default' unless defined $var and length $var;
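A sketch wrapping Sinan's one-liner in a hypothetical `with_default` sub, next to the defined-or assignment `//=` for contrast. Note that `//=` needs perl 5.10 or later (newer than this 2004 thread) and only replaces undef, so it leaves an empty string alone:

```perl
use strict;
use warnings;

# Hypothetical helper: fall back to 'default' when $var is undef
# OR the empty string (the idiom quoted above).
sub with_default {
    my ($var) = @_;
    $var = 'default' unless defined $var and length $var;
    return $var;
}

# For contrast: defined-or assignment only replaces undef.
sub with_defined_or {
    my ($var) = @_;
    $var //= 'default';
    return $var;
}

for my $in (undef, '', '0', 'value') {
    my $label = defined $in ? "'$in'" : 'undef';
    printf "%-7s with_default=%s with_defined_or='%s'\n",
        $label, with_default($in), with_defined_or($in);
}
```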

Jürgen Exner · Dec 23, 2004

> This subject has come up several times with my peers, and I'm still
> amazed what some programmers will favor in the name of efficiency and
> speed. For example, some people will refuse to ever use the line:
>
> $i++;
>
> when $i is just an integer. Instead, they will say the code is wrong
> unless it is written as:
>
> ++$i;
>
> or:
>
> $i += 1;
>
> The reason they think that using the post-increment operator is wrong
> is because it makes an extra copy that is never used (which is slower
> and less efficient).

Tell them to take a class in basic compiler construction. Well, compile time
optimizations are an advanced topic, so they will have to take two classes.
But any compiler that does not fold all three statements into the most
efficient form is not worth its money, even if it's free.

jue

Anno Siegel · Dec 23, 2004

> This quote is widely attributed to Donald Knuth:
>
> "Premature optimization is the root of all evil."

The origin of this popular saying is not clear. Knuth did use it in
_Structured Programming with goto Statements_: "We /should/ forget about
small efficiencies, say about 97% of the time: premature optimization
is the root of all evil."

However, when interviewed about it, Knuth attributed it to Tony
"Quicksort" Hoare. Hoare again doesn't want to own up and vaguely
blames it on Edsger "Harmful" Dijkstra. Dijkstra apparently hasn't
commented (and won't, he died in 2002).

Anno

Using symbolic references to invoke package methods - good or bad practice?	15	Apr 20, 2012
How to return boolean true and false	3	Dec 4, 2022
Why is str(None) == 'None' and not an empty string?	6	Aug 28, 2013
Re-raising a RuntimeError - good practice?	6	Oct 24, 2013
App_Code Class Help: good practice	2	Aug 13, 2006
Data saving in condition of changing reality	0	Apr 29, 2022
How to use PDF-lib and how to center each line of texts on the page?	1	Aug 16, 2023
HCaptcha - How to stop page from refreshing on submit if captcha is not checked/validated	1	Aug 29, 2023
