Good practice to detect empty string?

Discussion in 'Perl Misc' started by ipellew@pipemedia.co.uk, Dec 21, 2004.

  1. Guest

    Hi all;

    Pls advise the perlophiliacs method of deciding a string is empty.

    I am using
    if ( $@ || $c_var eq "" ) {
    but constantly read `eq` is expensive.

    For example is
    if ( $@ || ! length $c_var ) {
    better, faster, cheaper

    Regards
    Ian
    , Dec 21, 2004
    #1

  2. Keith Keller Guest

    On 2004-12-21, <> wrote:
    >
    > Pls advise the perlophiliacs method of deciding a string is empty.
    >
    > I am using
    > if ( $@ || $c_var eq "" ) {
    > but constantly read `eq` is expensive.


    ....and using $@ in the absence of an eval is silly.

    Is there something wrong with

    if ($c_var)

    ? It's not exactly the same, but since you provide no context it's
    hard to know what you really need.
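
    To illustrate the "not exactly the same" remark, here is a minimal sketch comparing a plain truth test with an empty-string test. The helper name is_empty is hypothetical and exists only for this illustration; the sample values are arbitrary.

```perl
use strict;
use warnings;

# Hypothetical helper: true only for undef or the zero-length string.
sub is_empty {
    my ($value) = @_;
    return !( defined $value && length $value );
}

# Perl's boolean test treats undef, '', 0 and '0' as false,
# so "if ($c_var)" also rejects the non-empty string '0'.
for my $c_var ( undef, '', '0', '0.0', 'abc' ) {
    my $shown = defined $c_var ? "'$c_var'" : 'undef';
    printf "%-6s truthy=%-3s empty=%s\n",
        $shown,
        ( $c_var           ? 'yes' : 'no' ),
        ( is_empty($c_var) ? 'yes' : 'no' );
}
```

    The row to look at is '0': it is not empty, yet the plain truth test treats it as false.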

    --keith

    --
    -francisco.ca.us
    (try just my userid to email me)
    AOLSFAQ=http://wombat.san-francisco.ca.us/cgi-bin/fom
    Keith Keller, Dec 21, 2004
    #2

  3. Anno Siegel Guest

    <> wrote in comp.lang.perl.misc:
    > Hi all;
    >
    > Pls advise the perlophiliacs method of deciding a string is empty.
    >
    > I am using
    > if ( $@ || $c_var eq "" ) {
    > but constantly read `eq` is expensive.
    >
    > For example is
    > if ( $@ || ! length $c_var ) {
    > better, faster, cheaper


    If you really need to know, "use Benchmark", but that's futile
    micro-optimization. The idiomatic way is to test for length.
    Be sure the string is defined at all.
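
    A small sketch of why the defined check matters, assuming warnings are enabled. The __WARN__ handler is there only to count warnings for the demonstration and is not something the original code used.

```perl
use strict;
use warnings;

my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

my $c_var;    # deliberately left undefined

# Comparing undef with eq "works" but emits an "uninitialized" warning.
my $eq_says_empty = ( $c_var eq '' ) ? 1 : 0;

# Checking defined first short-circuits, so length never sees undef.
my $has_content = ( defined $c_var && length $c_var ) ? 1 : 0;

print "eq says empty: $eq_says_empty\n";
print "has content:   $has_content\n";
print "warnings seen: ", scalar(@warnings), "\n";
```

    Only the eq comparison warns here; the defined-then-length form stays quiet whatever the variable holds.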

    Anno
    Anno Siegel, Dec 21, 2004
    #3
  4. Guest

    wrote:
    >
    > I am using
    > if ( $@ || $c_var eq "" ) {
    > but constantly read `eq` is expensive.
    >
    > For example is
    > if ( $@ || ! length $c_var ) {
    > better, faster, cheaper



    Dear Ian,

    I'm not convinced that $var eq "" is necessarily more expensive
    than length($var) . The reason I think this is because the eq
    operator can report a false value as soon as it detects a character in
    the variable it is examining, whereas the length() function must count
    every single character in $var, even if $var is millions of characters
    long.

    Which method is more expensive really depends on the
    implementation of the two functions/operators. If you really want to
    know which one is more expensive for the task at hand, use the
    Benchmark module (read "perldoc Benchmark" to find out how to use it).
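
    For anyone who wants to try it, a minimal sketch of such a benchmark might look like the following. The string length and the labels eq_empty and not_length are arbitrary choices for this illustration, and the numbers will vary by machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# A deliberately long, non-empty string (length chosen arbitrarily).
my $c_var = 'x' x 1_000_000;

my %candidates = (
    eq_empty   => sub { return $c_var eq '' },
    not_length => sub { return !length $c_var },
);

# A negative count asks Benchmark to run each candidate for at least
# that many CPU seconds, then print a comparison table.
cmpthese( -1, \%candidates );
```

    Both closures also pay the cost of a subroutine call, which may well dwarf the operation being measured, so treat the table as a rough indication only.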

    But to be honest, it really doesn't matter which method is better,
    faster, cheaper. They are pretty much the same in terms of efficiency.
    Sure, one may use up a few more clock cycles than the other, but this
    is a small constant value that is practically imperceptible, even by
    computer standards (in fact, when I used the Benchmark module I saw
    the warning: "(warning: too few iterations for a reliable count)" even
    when I used a count of ten million).

    A lot of programmers fall into the trap of thinking that if they
    always use the faster, more efficient operators that their code will
    run much faster than before. This is true only if the algorithms used
    in these options behave better with large data (are you familiar with
    Big-O notation?). So if your program can't handle large amounts of
    data very well (that is, if it had a Big-O value of N-squared), simply
    converting all your '$val eq ""' conditions to '!length($val)' isn't
    going to make your program magically handle large amounts of data.
    That's because eq and length() have roughly the same Big-O value. To
    make your program run faster, you'd have to modify its algorithms so
    that none of them are N-squared (or worse). At this point, the use of
    eq versus length() is really a moot point.
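
    As a rough sketch of the kind of algorithmic change meant here, compare two ways of detecting a duplicate in a list. Both subroutine names are hypothetical and the task is just an example; the first does on the order of N-squared comparisons, the second makes a single pass using a hash.

```perl
use strict;
use warnings;

# Quadratic: compares every pair of elements.
sub has_duplicate_quadratic {
    my @items = @_;
    for my $i ( 0 .. $#items ) {
        for my $j ( $i + 1 .. $#items ) {
            return 1 if $items[$i] eq $items[$j];
        }
    }
    return 0;
}

# Linear on average: one pass, remembering seen values in a hash.
sub has_duplicate_linear {
    my %seen;
    for my $item (@_) {
        return 1 if $seen{$item}++;
    }
    return 0;
}

my @words = ( 'a' .. 'z', 'm' );
print has_duplicate_quadratic(@words), "\n";
print has_duplicate_linear(@words),    "\n";
```

    Note that the quadratic version uses eq in its inner loop; swapping that for some other comparison would not change how badly it scales.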

    To illustrate, if using the length() function is one-millionth of a
    second faster than using eq, it will only make a noticeable difference
    if length() (or eq) is used (on the order of) one million times more
    often than anything else (and then, the difference might only be one
    second). That is, if you want to check for the existence of an empty
    string only five, one hundred, or even a thousand times in your code,
    it really won't make a difference whether you use eq or length().
    Theoretically, one method will be faster than the other, but you
    couldn't time this difference with a stopwatch, even if you had faster
    reflexes than anybody else in the world. And like I mentioned above,
    even Perl's Benchmark module has trouble perceiving this time
    difference.

    In my opinion, you should usually use the function/operation that is
    more readable (and, of course, you have to decide for yourself which is
    more readable). If you spend two minutes converting the code to
    something that is theoretically faster, you might not even save one
    second of total running time (from every time you run the program).
    And if it takes someone in the future three extra minutes to figure out
    what you were trying to do, that's more than four minutes and 59
    seconds wasted changing your code, thinking that your code will become
    faster, better, cheaper.

    I realize I wrote a lot about this subject, but to summarize, let me
    say this:

    Making code run faster almost always means eliminating the
    bottlenecks. Changing '$var eq ""' to '!length($var)' might make a
    difference (probably super small) but it won't eliminate a bottleneck.

    Here is a real-world analogy (if you like these kinds of things):

    There is a ten-mile-long road that people drive their cars on. Most
    of this road has two lanes. But for some reason, five miles along the
    road, the two lanes merge into one lane, but only for 100 meters (after
    which they become two lanes again).

    Ordinarily this isn't a problem when there are few cars on the road.
    As a car reaches the place where the two lanes become one, it switches
    lanes (if needed), and then switches back when there are two lanes
    again.

    But during periods of heavy traffic, this lane merge causes a
    bottleneck. Multiple cars are trying to squeeze into one lane at the
    same time, creating a bottleneck and backing up traffic for miles.
    This is unacceptable, and a solution must be found.

    Someone might say that the speed limit should be raised from 55 mph
    to 60 mph, because 60 mph is faster, and therefore more efficient, and
    will make the cars move faster. Another person might say to make the
    stretch of road that only has one lane shorter so that there is more of
    the road with two full lanes.

    Their intentions are good, but neither of these solutions eliminates the
    bottleneck, which is what is slowing down traffic. A solution that is
    much better than either of those just listed would be to insert a
    second lane (where there is currently only one lane) for cars to use
    instead of having to merge. (In fact, you could even reduce the speed
    limit to 50 mph with this solution and it would still work better than
    the solution to only raise the speed limit to 60 mph!)

    And while raising the speed limit to 60 mph sounds good, it won't
    even save you a full minute when the bottleneck is present. With the
    bottleneck, the traffic might be backed up for hours, so just
    eliminating one minute won't make all that much difference. Eliminate
    the bottleneck and hours of driving time will be saved, even when the
    speed limit is significantly slower.

    And that's why I think you shouldn't worry about whether you should
    use eq or length(). Just go with the one that is more readable and
    easier to maintain and understand, and you will end up saving more time
    in the future by not having to figure out some possibly convoluted code
    that might not make much difference in the end at all.

    This quote is widely attributed to Donald Knuth:

    "Premature optimization is the root of all evil."

    The point of the quote is that if you try to optimize a section of code
    before you can prove that it needs to be optimized, you may end up
    writing obfuscated, difficult-to-read code for nothing.
    I hope this helps, Ian.

    -- Jean-Luc Romano
    , Dec 22, 2004
    #4
  5. Uri Guttman Guest

    >>>>> "jpc" == jl post@hotmail com <> writes:

    jpc> I'm not convinced that $var eq "" is necessarily more expensive
    jpc> than length($var) . The reason I think this is because the eq
    jpc> operator can report a false value as soon as it detects a character in
    jpc> the variable it is examining, whereas the length() function must count
    jpc> every single character in $var, even if $var is millions of characters
    jpc> long.

    why must length count all the chars? how will it know when the string
    ends? does the string end in a zero byte? but perl strings can have any
    binary data? so how does perl figure out the length of strings? hmmm.

    <snip of overly massive tome on this subject>

    jpc> This quote is widely attributed to Donald Knuth:

    jpc> "Premature optimization is the root of all evil."

    jpc> The point of the quote is that if you try to optimize a section of code
    jpc> before you can prove that it needs to be optimized, you may end up
    jpc> writing obfuscated, difficult-to-read code for nothing.
    jpc> I hope this helps, Ian.

    why didn't you just say that and cut out most of the rest (including
    your comments on how length works in perl)?

    uri

    --
    Uri Guttman ------ -------- http://www.stemsystems.com
    --Perl Consulting, Stem Development, Systems Architecture, Design and Coding-
    Search or Offer Perl Jobs ---------------------------- http://jobs.perl.org
    Uri Guttman, Dec 22, 2004
    #5
  6. Joe Smith Guest

    wrote:

    > I'm not convinced that $var eq "" is necessarily more expensive
    > than length($var) . The reason I think this is because the eq
    > operator can report a false value as soon as it detects a character in
    > the variable it is examining, whereas the length() function must count
    > every single character in $var


    Perl's length() function does not count characters.
    The information is already present in the guts of a scalar value.
    Therefore your reasoning is incorrect.
    -Joe
    Joe Smith, Dec 23, 2004
    #6
  7. Guest

    Uri Guttman wrote:
    >
    > why must length count all the chars? how will it know
    > when the string ends? does the string end in a zero
    > byte? but perl strings can have any binary data? so
    > how does perl figure out the length of strings? hmmm.



    Hmmm... I didn't think of that. You bring up a good point.
    Reflecting on what you just said, I'm remembering the Devel::Peek
    module. The Devel::Peek::Dump() function lists a string's length, so
    I'm guessing that the length() function could probably get that
    attribute from the same place that Devel::Peek::Dump() does.

    Thanks for pointing that out.


    > <snip of overly massive tome on this subject>
    >
    > jpc> This quote is widely attributed to Donald Knuth:
    > jpc> "Premature optimization is the root of all evil."
    >
    > why didn't you just say that and cut out most of the rest (including
    > your comments on how length works in perl)?



    Since you asked, I'll explain.

    This subject has come up several times with my peers, and I'm still
    amazed what some programmers will favor in the name of efficiency and
    speed. For example, some people will refuse to ever use the line:

    $i++;

    when $i is just an integer. Instead, they will say the code is wrong
    unless it is written as:

    ++$i;

    or:

    $i += 1;

    The reason they think that using the post-increment operator is wrong
    is because it makes an extra copy that is never used (which is slower
    and less efficient).

    Now, they might have a point if $i is a blessed reference pointing
    to a huge structure, but when $i is just an integer, it won't save you
    any noticeable difference to use pre-increment instead of
    post-increment.
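
    A short sketch of the point, with variable names chosen only for this illustration: when the result of the expression is thrown away, all three forms leave the variable in the same state, and they differ only when the value of the expression itself is used.

```perl
use strict;
use warnings;

# In void context (result unused) all three forms leave the variable the same.
my ( $post, $pre, $plus ) = ( 5, 5, 5 );
$post++;
++$pre;
$plus += 1;
print "$post $pre $plus\n";

# The forms differ only when the expression's value is used.
my $i   = 5;
my $old = $i++;    # $old gets the value before the increment
my $j   = 5;
my $new = ++$j;    # $new gets the value after the increment
print "$old $new\n";
```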

    But I've had people challenge me on this. They say that if you're
    writing code, it should be as efficient as possible because it could
    get called in a very tight loop that gets called a large number of
    times.

    And while I agree that code should be efficient, I point out that if
    the code they write is running slowly, changing a post-increment
    operator to a (presumably faster) pre-increment operator isn't going to
    speed up the program by any satisfying (or noticeable) amount. What will
    make the difference instead is to re-write any algorithms with a Big-O
    notation of N-squared (or worse) to be ones that have a Big-O notation
    of N log(N) (or better).

    And no matter how many times I try to convince them that a
    bottleneck won't be eliminated just by replacing something as trivial
    as a post-increment operator with a pre-increment operator, the person
    I'm talking with often ends the discussion with: "Well... I'm still
    going to use the more efficient code." Unfortunately, all too often
    that means that their code will be more difficult to read and
    understand (for others, of course), especially when they omit comments
    explaining what their code is attempting to do and why it was written
    that way. And often, their "more efficient" code is more bug-prone
    than the equivalent "inferior, inefficient" code.

    It seemed like you understood my point. But a lot of people don't.
    They hear a cute little quote like this one I read from
    http://www-106.ibm.com/developerworks/library/l-optperl.html :

    > All of this help, though, comes at a slight performance
    > cost. I keep warnings and strict on while programming
    > and debugging, and I switch it off once the script is
    > ready to be used in the real world. It won't save much,
    > but every millisecond counts.


    I totally disagree with this (I won't go into the reasons why). But my
    point is that many people will read this and use this as their
    manifesto not to use warnings and strict.

    I can counteract with another cute quote, but I've found that if a
    person has been swayed by a cute-sy quote, they generally won't get
    swayed back by another.

    By the way the original poster posted his message, he seemed to
    think that the faster method was good while all the rest were bad! He
    may have obtained this notion the same way I did: when a computer
    science professor gave a lecture on operations and how expensive they
    are and how they ultimately cost money.

    To answer your question, one quote alone is usually not enough to
    sway a person's beliefs, so I felt the need to back it up with a
    real-world example and scenario in the hopes that it would educate the
    original poster.

    I didn't mean to offend you or any other poster on this newsgroup
    with my long response, but it's a pet peeve of mine when others write
    obfuscated code in the name of efficiency, particularly when the
    amount of time saved from the total run-times of every run of the
    "efficient" program amounts to less than a second. That's why I felt
    that a thorough response was in order.

    I hope this makes sense, Uri. (And thanks for pointing out that
    thing about using length().)

    -- Jean-Luc
    , Dec 23, 2004
    #7
  8. Guest

    Joe Smith wrote:
    >
    > Perl's length() function does not count characters.
    > The information is already present in the guts of a
    > scalar value. Therefore your reasoning is incorrect.



    I see I was wrong. Thanks for pointing that out.

    I realized later that I could see this information by using the
    Devel::Peek module, like this:


    > perl -MDevel::Peek -e "Dump('perl')"

    SV = PV(0x225208) at 0x1823e98
    REFCNT = 1
    FLAGS = (PADBUSY,PADTMP,POK,READONLY,pPOK)
    PV = 0x182ac34 "perl"\0
    CUR = 4
    LEN = 5


    Again, thanks.

    -- Jean-Luc
    , Dec 23, 2004
    #8
  9. [OT] Re: Good practice to detect empty string?

    "" <> wrote in
    news::

    > Uri Guttman wrote:


    >
    >> <snip of overly massive tome on this subject>
    >>
    >> jpc> This quote is widely attributed to Donald Knuth:
    >> jpc> "Premature optimization is the root of all evil."
    >>
    >> why didn't you just say that and cut out most of the rest (including
    >> your comments on how length works in perl)?


    ....

    > By the way the original poster posted his message, he seemed to
    > think that the faster method was good while all the rest were bad! He
    > may have obtained this notion the same way I did: when a computer
    > science professor gave a lecture on operations and how expensive they
    > are and how they ultimately cost money.


    I can see why Jean-Luc responded the way he did (it was a little long for
    my taste though :)

    I have seen people attempt to find the least cost path through a graph with
    a bazillion edges by first enumerating all the possible paths. They even
    react by not believing the simple calculations that prove their program
    will have to run for eons before it can ever come up with an answer. The
    same people tend to also be overly impressed with obscure optimization
    tricks.

    That makes anyone who wants to optimize the following a little suspect and
    possibly in need of some advice.

    my $var;

    # ...

    $var = 'default' unless defined $var and length $var;


    --
    A. Sinan Unur
    d
    (remove '.invalid' and reverse each component for email address)
    A. Sinan Unur, Dec 23, 2004
    #9
  10. Jürgen Exner

    wrote:
    > This subject has come up several times with my peers, and I'm still
    > amazed what some programmers will favor in the name of efficiency and
    > speed. For example, some people will refuse to ever use the line:
    >
    > $i++;
    >
    > when $i is just an integer. Instead, they will say the code is wrong
    > unless it is written as:
    >
    > ++$i;
    >
    > or:
    >
    > $i += 1;
    >
    > The reason they think that using the post-increment operator is wrong
    > is because it makes an extra copy that is never used (which is slower
    > and less efficient).


    Tell them to take a class in basic compiler construction. Well, compile-time
    optimizations are an advanced topic, so they will have to take two classes.
    But any compiler that does not fold all three statements into the most
    efficient form is not worth its money, even if it's free.

    jue
    Jürgen Exner, Dec 23, 2004
    #10
  11. Anno Siegel Guest

    <> wrote in comp.lang.perl.misc:

    > This quote is widely attributed to Donald Knuth:
    >
    > "Premature optimization is the root of all evil."


    The origin of this popular saying is not clear. Knuth did use it in
    _Structured Programming with go to Statements_: "We /should/ forget about
    small efficiencies, say about 97% of the time: premature optimization
    is the root of all evil."

    However, when interviewed about it, Knuth attributed it to Tony
    "Quicksort" Hoare. Hoare again doesn't want to own up and vaguely
    blames it on Edsger "Harmful" Dijkstra. Dijkstra apparently hasn't
    commented (and won't, he died in 2002).

    Anno
    Anno Siegel, Dec 23, 2004
    #11