Catching invalid string comparisons

Discussion in 'Java' started by Ulrich Eckhardt, Apr 9, 2008.

  1. Greetings!

    I guess many of you have already seen code (especially from C++ programmers)
    like the following:

    String s = someFunction();
    if(s == "0")
    doThis();
    else
    doThat();

    I wonder, is there a way to catch such comparisons where a non-fundamental
    type is compared to a constant? I'm actually looking for a way to avoid
    errors that occur due to this, maybe a warning when a comparison is done
    for identity where one of the two sides required boxing (Is that the term?)
    first.

    Further, I wonder about this case:

    Integer n = otherFunction();
    if(n == 42)
    dontPanic();

    Other than the above, I haven't actually tried it, but I would expect this
    code to also not do what a naive programmer would expect.


    Just for the record, in case you didn't spot the problem in above code, this
    is how the above should be written:

    if(s.equals("0")) ...
    if(n.equals(42)) ...
    // Note: I'm not sure about the latter.

    The reason is that '==' determines identity for objects while it determines
    equality for builtin types like 'int' (please correct me if I'm wrong or if
    this explanation is misleading in any way).

    thanks

    Uli

    --
    Sator Laser GmbH
    Geschäftsführer: Michael Wöhrmann, Amtsgericht Hamburg HR B62 932
    Ulrich Eckhardt, Apr 9, 2008
    #1
    1. Advertising

  2. Lew wrote:
    > Ulrich Eckhardt wrote:
    >> Greetings!
    >>
    >> I guess many of you have already seen code (especially from C++
    >> programmers) like the following:
    >>
    >> String s = someFunction();
    >> if(s == "0")
    >> doThis();
    >> else
    >> doThat();
    >>
    >> I wonder, is there a way to catch such comparisons where a
    >> non-fundamental type is compared to a constant? I'm actually looking for
    >> a way to avoid errors that occur due to this, maybe a warning when a
    >> comparison is done for identity where one of the two sides required
    >> boxing (Is that the term?) first.

    >
    > There is no boxing with Strings, and String is a pretty fundamental type,
    > so I'm not quite sure I get these comments.


    Bear with me, I probably didn't express myself correctly because I'm not
    that fluent in Java.

    The problem I'm facing is this:

    String s1 = "foo";
    String s2 = s1.substring(1,1); // yields empty string!
    if(s2 == "")
    System.out.println("s2 is empty");
    else
    System.out.println("s2 is not empty");

    This code actually checks if s2 is identical to an empty string literal, but
    the programmer expected it to check if it is equal to an empty string.

    > Anyhow, it's usually a mistake to compare String values with ==, except
    > when comparing against null. It's such a common mistake that IDEs
    > include an option to warn you of it.
    >
    >> Further, I wonder about this case:
    >>
    >> Integer n = otherFunction();
    >> if(n == 42)
    >> dontPanic();
    >>
    >> Other than the above, I haven't actually tried it, but I would expect
    >> this code to also not do what a naive programmer would expect.

    >
    > What would a naive programmer expect? What do you expect this snippet to
    > do?


    Actually, I would expect this to behave just like the code using a string
    above, i.e. that it checks if 'n' is identical to the integer literal '42'
    and not if they are equal, but that is because I know that Java makes this
    distinction between identity and equality.

    Note that I find this behaviour a bit confusing, because if I had written it
    like this:

    int n = yetAnotherFunction();
    if(n == 42)
    doPanic();

    really checks if 'n' is equal to '42', right?


    >> Just for the record, in case you didn't spot the problem in above code,
    >> this is how the above should be written:
    >>
    >> if(s.equals("0")) ...
    >> if(n.equals(42)) ...

    >
    > The 's' case is correct, but the 'n' case is just silly.


    Why silly? I don't get it. To me it's the same principle in both cases, in
    order to compare an object (like String or Integer) for equality with a
    constant, you use the equals() function and _not_ ==, which would check for
    identity.


    Uli

    --
    Sator Laser GmbH
    Geschäftsführer: Michael Wöhrmann, Amtsgericht Hamburg HR B62 932
    Ulrich Eckhardt, Apr 9, 2008
    #2
    1. Advertising

  3. Ulrich Eckhardt

    Guest

    In article <>,
    Ulrich Eckhardt <> wrote:
    > Lew wrote:
    > > Ulrich Eckhardt wrote:


    [ snip ]

    > The problem I'm facing is this:
    >
    > String s1 = "foo";
    > String s2 = s1.substring(1,1); // yields empty string!
    > if(s2 == "")
    > System.out.println("s2 is empty");
    > else
    > System.out.println("s2 is not empty");
    >
    > This code actually checks if s2 is identical to an empty string literal, but
    > the programmer expected it to check if it is equal to an empty string.
    >
    > > Anyhow, it's usually a mistake to compare String values with ==, except
    > > when comparing against null. It's such a common mistake that IDEs
    > > include an option to warn you of it.
    > >
    > >> Further, I wonder about this case:
    > >>
    > >> Integer n = otherFunction();
    > >> if(n == 42)
    > >> dontPanic();
    > >>
    > >> Other than the above, I haven't actually tried it, but I would expect
    > >> this code to also not do what a naive programmer would expect.

    > >
    > > What would a naive programmer expect? What do you expect this snippet to
    > > do?

    >
    > Actually, I would expect this to behave just like the code using a string
    > above, i.e. that it checks if 'n' is identical to the integer literal '42'
    > and not if they are equal, but that is because I know that Java makes this
    > distinction between identity and equality.
    >
    > Note that I find this behaviour a bit confusing, because if I had written it
    > like this:
    >
    > int n = yetAnotherFunction();
    > if(n == 42)
    > doPanic();
    >
    > really checks if 'n' is equal to '42', right?
    >
    >
    > >> Just for the record, in case you didn't spot the problem in above code,
    > >> this is how the above should be written:
    > >>
    > >> if(s.equals("0")) ...
    > >> if(n.equals(42)) ...

    > >
    > > The 's' case is correct, but the 'n' case is just silly.

    >
    > Why silly? I don't get it. To me it's the same principle in both cases, in
    > order to compare an object (like String or Integer) for equality with a
    > constant, you use the equals() function and _not_ ==, which would check for
    > identity.


    To me it makes better sense if you realize that the variable s is
    not actually a String object, but a *reference to* a String object.
    "==" works on such variables exactly the way it works on primitives --
    compares for equality -- but two (non-null) references are equal
    if, well, they're equal -- which for references means that they
    point to the same object. Comparing the objects themselves,
    rather than the references, requires use of the equals() method.

    Does that help? If you're coming from C++ this behavior will
    likely seem baffling at first, but IMO it makes good sense if
    you realize that non-primitive variables in Java are references
    to objects rather than objects.

    --
    B. L. Massingill
    ObDisclaimer: I don't speak for my employers; they return the favor.
    , Apr 9, 2008
    #3
  4. Ulrich Eckhardt

    Roedy Green Guest

    On Wed, 09 Apr 2008 11:17:07 +0200, Ulrich Eckhardt
    <> wrote, quoted or indirectly quoted someone
    who said :

    >I wonder, is there a way to catch such comparisons where a non-fundamental
    >type is compared to a constant? I


    I'm sure others will give you all sorts of suggestions, but the way I
    do it is using the code inspector (lint) that comes with Intellij
    Idea. That is just one of thousands of errors or suspect codings it
    catches.
    --

    Roedy Green Canadian Mind Products
    The Java Glossary
    http://mindprod.com
    Roedy Green, Apr 9, 2008
    #4
  5. Ulrich Eckhardt

    Roedy Green Guest

    On Wed, 09 Apr 2008 11:17:07 +0200, Ulrich Eckhardt
    <> wrote, quoted or indirectly quoted someone
    who said :

    >
    > Integer n = otherFunction();
    > if(n == 42)
    > dontPanic();
    >
    >Other than the above, I haven't actually tried it, but I would expect this
    >code to also not do what a naive programmer would expect.


    If you need boxing and you are using an old Java, the compiler will
    complain. If you need boxing and you are using a new compiler it will
    autobox for you. So don't worry about that.
    --

    Roedy Green Canadian Mind Products
    The Java Glossary
    http://mindprod.com
    Roedy Green, Apr 9, 2008
    #5
  6. Ulrich Eckhardt

    Mark Space Guest

    Matt Humphrey wrote:

    > I think what you're saying is that (for the benefit of those less
    > experienced with Java) you would like the IDE to highlight cases where == is


    I think what he's say is that he'd like the Java compiler to do this,
    emit warnings when there's questionable use of == operator. If the IDE
    wants to do it also, that's great.
    Mark Space, Apr 9, 2008
    #6
  7. Ulrich Eckhardt

    Mark Space Guest

    wrote:

    > To me it makes better sense if you realize that the variable s is
    > not actually a String object, but a *reference to* a String object.


    I agree. 'if ( s == "" )' checks to see if s is the same object as the
    null string object that the compiler/runtime has created to represent
    your literal. This could actually work, if s has been interned.

    String s1 = "foo";
    String s2 = s1.substring(1,1); // yields empty string!
    s2 = s2.intern();
    if(s2 == "")
    System.out.println("s2 is empty");
    else
    System.out.println("s2 is not empty");

    I believe this will work (I didn't test it). == compares object
    references. It doesn't check to see if two objects are "identical" it
    checks to see if they are in fact the same object.

    Basically, == does a bit-wise check for identity on any primitive. (Or
    at least that's how I think of it, it might actually work differently
    internally.) In this respect, comparing references in Java is exactly
    like comparing pointers in C.
    Mark Space, Apr 9, 2008
    #7
  8. Ulrich Eckhardt <> writes:

    > I guess many of you have already seen code (especially from C++ programmers)
    > like the following:
    >
    > String s = someFunction();
    > if(s == "0")
    > doThis();
    > else
    > doThat();


    This (anti-)pattern is common enough that most static
    analyzers will check for it. I've had great experiences
    with FindBugs (findbugs.sourceforge.net) and PMD
    (pmd.sourceforge.net).



    --
    Mark Jeffcoat
    Austin, TX
    Mark Jeffcoat, Apr 9, 2008
    #8
  9. wrote:
    > In article <>,
    > Ulrich Eckhardt <> wrote:
    >> Lew wrote:
    >> > Ulrich Eckhardt wrote:
    >> >> Just for the record, in case you didn't spot the problem in above
    >> >> code, this is how the above should be written:
    >> >>
    >> >> if(s.equals("0")) ...
    >> >> if(n.equals(42)) ...
    >> >
    >> > The 's' case is correct, but the 'n' case is just silly.

    >>
    >> Why silly? I don't get it. To me it's the same principle in both cases,
    >> in order to compare an object (like String or Integer) for equality with
    >> a constant, you use the equals() function and _not_ ==, which would check
    >> for identity.

    >
    > To me it makes better sense if you realize that the variable s is
    > not actually a String object, but a *reference to* a String object.


    Yes, this is a better way to express it.

    > "==" works on such variables exactly the way it works on primitives --
    > compares for equality -- but two (non-null) references are equal
    > if, well, they're equal -- which for references means that they
    > point to the same object. Comparing the objects themselves,
    > rather than the references, requires use of the equals() method.


    Right, I understood that.

    > If you're coming from C++ this behavior will likely seem baffling
    > at first, but IMO it makes good sense if you realize that non-primitive
    > variables in Java are references to objects rather than objects.


    Yes, I had to wrap my head around that first, but it makes sense in Java way
    of doing things.

    However, there is one thing which I don't understand yet:

    Integer n = new Integer(42);

    'n' is now a reference to an Integer object. So, if I do this:

    if(n==42) ...

    it should check if 'n' refers to the same object as '42' here. However, it
    doesn't, it actually checks if the Integer that 'n' refers to has the same
    value, which - to me - seems inconsistent with the String case. Is there a
    way to explain this inconsistency so that it makes sense again? Or do I
    just have to learn the rules for each type?

    thanks

    Uli

    --
    Sator Laser GmbH
    Geschäftsführer: Michael Wöhrmann, Amtsgericht Hamburg HR B62 932
    Ulrich Eckhardt, Apr 10, 2008
    #9
  10. Mark Space wrote:
    > wrote:
    >> To me it makes better sense if you realize that the variable s is
    >> not actually a String object, but a *reference to* a String object.

    >
    > I agree. 'if ( s == "" )' checks to see if s is the same object as the
    > null string object that the compiler/runtime has created to represent
    > your literal. This could actually work, if s has been interned.
    >
    > String s1 = "foo";
    > String s2 = s1.substring(1,1); // yields empty string!
    > s2 = s2.intern();
    > if(s2 == "")
    > System.out.println("s2 is empty");
    > else
    > System.out.println("s2 is not empty");
    >
    > I believe this will work (I didn't test it).


    In a test, this printed "s2 is not empty", because 's2' doesn't refer to the
    same object as '""' does. However, I could well imagine that a smarter
    compiler replaces all references to an empty string with references to the
    same string, which could then result in the other output.

    Uli

    --
    Sator Laser GmbH
    Geschäftsführer: Michael Wöhrmann, Amtsgericht Hamburg HR B62 932
    Ulrich Eckhardt, Apr 10, 2008
    #10
  11. Ulrich Eckhardt <> wrote:
    > However, there is one thing which I don't understand yet:
    > Integer n = new Integer(42);
    > 'n' is now a reference to an Integer object. So, if I do this:
    > if(n==42) ...


    If you compiled this, and ran javap on it, you'd see
    that n is really unboxed for the comparison, so you
    get a comparison of two int-primitives.

    If you had two Integer objects, then the comparison would
    be strictly on the references. So you will get "true" or
    "false" from the comparison, basically depending on whether
    the references just point to the same object anyway (due to
    previous copying of the reference), or if their "box" was
    autogenerated *and* the value was within byte-range.

    Integer n=42, m=42; // ==> m==n
    Integer n=new Integer(42), m=<whatever>(42); // ==> m!=n
    Integer n=4200, m=4200; // ==> m!=n

    For those not wishing to play lottery, I recommend .equals(),
    or explicit cast to primitives.
    Andreas Leitgeb, Apr 10, 2008
    #11
  12. Andreas Leitgeb <> wrote:
    > Ulrich Eckhardt <> wrote:
    >> However, there is one thing which I don't understand yet:
    >> Integer n = new Integer(42);
    >> 'n' is now a reference to an Integer object. So, if I do this:
    >> if(n==42) ...


    > If you compiled this, and ran javap on it, you'd see
    > that n is really unboxed for the comparison, so you
    > get a comparison of two int-primitives.


    I forgot the other question...
    Since Strings are not just boxes wrapped around primitives,
    this logic of course does not apply at all to Strings.

    The logic of comparing two String objects does share some
    of the peculiarities of comparing numeric box-objects, by
    which I mean, that references "sometimes" are equal for
    equal Stings, even if one is not a direct copy of the other
    reference. For String, however, this is more explicit
    and (imo) easier understandable and even somewhat controllable
    through it's method intern() - in contrast you can't enforce
    a canonical Integer-object for values outside the byte-range.
    Andreas Leitgeb, Apr 10, 2008
    #12
  13. Ulrich Eckhardt

    Guest

    In article <>,
    Ulrich Eckhardt <> wrote:
    > wrote:
    > > In article <>,
    > > Ulrich Eckhardt <> wrote:
    > >> Lew wrote:
    > >> > Ulrich Eckhardt wrote:
    > >> >> Just for the record, in case you didn't spot the problem in above
    > >> >> code, this is how the above should be written:
    > >> >>
    > >> >> if(s.equals("0")) ...
    > >> >> if(n.equals(42)) ...
    > >> >
    > >> > The 's' case is correct, but the 'n' case is just silly.
    > >>
    > >> Why silly? I don't get it. To me it's the same principle in both cases,
    > >> in order to compare an object (like String or Integer) for equality with
    > >> a constant, you use the equals() function and _not_ ==, which would check
    > >> for identity.

    > >
    > > To me it makes better sense if you realize that the variable s is
    > > not actually a String object, but a *reference to* a String object.

    >
    > Yes, this is a better way to express it.
    >
    > > "==" works on such variables exactly the way it works on primitives --
    > > compares for equality -- but two (non-null) references are equal
    > > if, well, they're equal -- which for references means that they
    > > point to the same object. Comparing the objects themselves,
    > > rather than the references, requires use of the equals() method.

    >
    > Right, I understood that.


    Just for the record -- I wasn't meaning to imply that you didn't,
    just restating it as part of an attempt to suggest a perspective
    that might help it make (more) sense.

    > > If you're coming from C++ this behavior will likely seem baffling
    > > at first, but IMO it makes good sense if you realize that non-primitive
    > > variables in Java are references to objects rather than objects.

    >
    > Yes, I had to wrap my head around that first, but it makes sense in Java way
    > of doing things.
    >
    > However, there is one thing which I don't understand yet:
    >
    > Integer n = new Integer(42);
    >
    > 'n' is now a reference to an Integer object. So, if I do this:
    >
    > if(n==42) ...
    >
    > it should check if 'n' refers to the same object as '42' here. However, it
    > doesn't, it actually checks if the Integer that 'n' refers to has the same
    > value, which - to me - seems inconsistent with the String case. Is there a
    > way to explain this inconsistency so that it makes sense again? Or do I
    > just have to learn the rules for each type?


    Others have explained about what's actually happening in that case
    (autoboxing).

    Maybe this one is easier to understand if you think "well, there's
    no sensible way to directly compare n, which is a reference to
    an Integer, with the integer primitive 42, so the compiler puts
    in code to perform appropriate conversions" ?

    Very much off the top of my head, but I feel sure someone will correct
    me if I've misspoken. :)

    --
    B. L. Massingill
    ObDisclaimer: I don't speak for my employers; they return the favor.
    , Apr 10, 2008
    #13
  14. Ulrich Eckhardt

    Mark Space Guest

    Ulrich Eckhardt wrote:

    >
    > In a test, this printed "s2 is not empty", because 's2' doesn't refer to the
    > same object as '""' does. However, I could well imagine that a smarter
    > compiler replaces all references to an empty string with references to the
    > same string, which could then result in the other output.


    Um, ok so I tested it myself.

    init:
    deps-jar:
    compile:
    run:
    s2 is empty
    BUILD SUCCESSFUL (total time: 1 second)


    What the heck are you using to test? Which compiler and runtime? This is
    runtime thing however, not a compile thing btw.

    Do you mean my code snippet exactly or another test? Are there any
    ClassLoaders involved besides the bootstrap one? I'm still somewhat
    surprised. Could you test my snippet exactly? Or maybe your runtime is
    buggered-up.
    Mark Space, Apr 10, 2008
    #14
  15. Mark Space wrote:
    > s2 is empty
    > BUILD SUCCESSFUL (total time: 1 second)
    >
    >
    > What the heck are you using to test? Which compiler and runtime? This is
    > runtime thing however, not a compile thing btw.


    Argh, I didn't actually run your snippet but rather mine, which lacks the
    call

    s2 = s2.intern();

    and that makes a big difference! ;)

    sorry for the confusion

    Uli

    --
    Sator Laser GmbH
    Geschäftsführer: Michael Wöhrmann, Amtsgericht Hamburg HR B62 932
    Ulrich Eckhardt, Apr 11, 2008
    #15
  16. Ulrich Eckhardt

    Mark Space Guest

    Ulrich Eckhardt wrote:

    > sorry for the confusion


    Ah, ok. To be fair, intern() does use a pool of strings, which I assume
    is limited in capacity. So a call to intern() could possibly fail, or
    bump an older string out of the pool. Something to be wary of ....
    Mark Space, Apr 11, 2008
    #16
  17. Ulrich Eckhardt

    Roedy Green Guest

    On Wed, 09 Apr 2008 07:43:53 -0400, Lew <> wrote,
    quoted or indirectly quoted someone who said :

    > It's such
    >a common mistake that IDEs include an option to warn you of it.


    You can use == with interned strings, but strings conceivably might
    become uninterned via serialisation. It is safe to use equals all the
    time, even with interned Strings, though a little slower. You still
    get a quick == match when the strings are interned as part of the
    equals check.
    --

    Roedy Green Canadian Mind Products
    The Java Glossary
    http://mindprod.com
    Roedy Green, Apr 24, 2008
    #17
  18. Lew <> wrote:
    > One thing that irritates me about the IDE feature to check for == (or != ) in
    > lieu of equals() comparisons is that I've seen it flag
    > if ( s == null )
    > as a violation.


    I'd consider it a bug in the particular IDE.
    It renders the whole feature almost useless.
    Andreas Leitgeb, Apr 24, 2008
    #18
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    1
    Views:
    413
    Mike Schilling
    Jun 27, 2005
  2. Horn

    string comparisons

    Horn, Oct 3, 2003, in forum: C++
    Replies:
    7
    Views:
    367
    Ashish
    Oct 6, 2003
  3. Patrick.O.Ige
    Replies:
    1
    Views:
    1,938
    Patrick.O.Ige
    Jul 2, 2006
  4. Sam R
    Replies:
    1
    Views:
    495
  5. kevin
    Replies:
    0
    Views:
    952
    kevin
    Jan 16, 2008
Loading...

Share This Page