contains

Discussion in 'Java' started by bob, Sep 8, 2011.

  1. bob

    bob Guest

    Is there any case-insensitive version of the String contains method?
     
    bob, Sep 8, 2011
    #1

  2. On 11-09-08 04:44 AM, bob wrote:
    > Is there any case-insensitive version of the String contains method?
    >

    Not that I'm aware of. An easy way to do the deed is to use Pattern;
    something like

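    // requires: import java.util.regex.Pattern;
    // Pattern.quote treats the search text as a literal, not as a regex.
    boolean isContained =
        Pattern.compile(Pattern.quote("isThisStringContained"),
                        Pattern.CASE_INSENSITIVE)
               .matcher("stringThatMayContainOther").find();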

    does the trick.
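A minimal sketch of the Pattern approach wrapped in a reusable method. The class and method names (`RegexContains`, `containsIgnoreCase`) are hypothetical, not part of the JDK. Two assumptions go beyond the post: `Pattern.quote` is used so that characters like `.` in the search text are matched literally, and `Pattern.UNICODE_CASE` is added so case folding is not limited to US-ASCII.

```java
import java.util.regex.Pattern;

final class RegexContains {
    // Hypothetical helper; the JDK has no case-insensitive String.contains.
    static boolean containsIgnoreCase(String haystack, String needle) {
        // Quote the needle so regex metacharacters are matched literally.
        // UNICODE_CASE extends CASE_INSENSITIVE beyond the US-ASCII range.
        Pattern p = Pattern.compile(Pattern.quote(needle),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        return p.matcher(haystack).find();
    }

    public static void main(String[] args) {
        System.out.println(containsIgnoreCase("Hello World", "WORLD")); // prints "true"
        System.out.println(containsIgnoreCase("version 1x0", "1.0"));   // prints "false"
    }
}
```

If the same search text is used many times, it is probably worth compiling the Pattern once and keeping it around rather than recompiling on every call.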

    A possibly preferable alternative to doing case-insensitive string
    operations is simply to uppercase (or lowercase) both Strings before
    doing the operation. .toLowerCase() and .toUpperCase() are String
    methods that are available for this purpose. If you plan to do this a
    lot you can write up a small utility method.
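The "small utility method" could look roughly like the sketch below. The names are hypothetical, and passing `Locale.ROOT` is an assumption added here so that the result does not depend on the default locale of the machine (for example, Turkish casing rules for "i").

```java
import java.util.Locale;

final class LowercaseContains {
    // Hypothetical helper: lowercase both sides, then use String.contains.
    static boolean containsIgnoreCase(String haystack, String needle) {
        // Locale.ROOT keeps the casing rules locale-independent.
        return haystack.toLowerCase(Locale.ROOT)
                       .contains(needle.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        System.out.println(
            containsIgnoreCase("stringThatMayContainOther", "MAYCONTAIN")); // prints "true"
    }
}
```

This allocates two new Strings per call, which is usually fine for occasional checks but may matter in a tight loop.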

    AHS
     
    Arved Sandstrom, Sep 8, 2011
    #2

  3. Lew

    Lew Guest

    Arved Sandstrom wrote:
    > bob wrote:
    >> Is there any case-insensitive version of the String contains method?
    >>

    > Not that I'm aware of. An easy way to do the deed is to use Pattern;
    > something like
    >
    > boolean isContained =
    > Pattern.compile("isThisStringContained",
    > Pattern.CASE_INSENSITIVE)
    > .matcher("stringThatMayContainOther").find();
    >
    > does the trick.
    >
    > A possibly preferable alternative to doing case-insensitive string
    > operations is simply to uppercase (or lowercase) both Strings before
    > doing the operation. .toLowerCase() and .toUpperCase() are String
    > methods that are available for this purpose. If you plan to do this a
    > lot you can write up a small utility method.


    Beware of uppercasing and lowercasing - the results can be surprising.

    String whatThe = "ß".toUpperCase().toLowerCase();

    What should be the value of '"ß".equalsIgnoreCase("ss")'?

    What should be the value of '"ß".toUpperCase().toLowerCase().equals("ß")'?
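For readers who want to check the answers, here is a small sketch that evaluates the expressions above. The class and method names are hypothetical; the sharp s is written as the escape `\u00DF` so the result does not depend on source-file encoding, and `Locale.ROOT` is an added assumption to keep the casing locale-independent.

```java
import java.util.Locale;

final class EszettDemo {
    // Uppercase, then lowercase again, as in the expression above.
    static String roundTrip(String s) {
        return s.toUpperCase(Locale.ROOT).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String eszett = "\u00DF"; // the sharp s

        // toUpperCase expands the one-character sharp s to "SS".
        System.out.println(roundTrip(eszett));                // prints "ss"

        // equalsIgnoreCase compares char by char; the lengths differ.
        System.out.println(eszett.equalsIgnoreCase("ss"));    // prints "false"

        // The round trip does not give back the original string.
        System.out.println(roundTrip(eszett).equals(eszett)); // prints "false"
    }
}
```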

    --
    Lew
     
    Lew, Sep 8, 2011
    #3
  4. On 11-09-08 10:37 AM, Lew wrote:
    > Arved Sandstrom wrote:
    >> bob wrote:
    >>> Is there any case-insensitive version of the String contains method?
    >>>

    >> Not that I'm aware of. An easy way to do the deed is to use Pattern;
    >> something like
    >>
    >> boolean isContained =
    >> Pattern.compile("isThisStringContained",
    >> Pattern.CASE_INSENSITIVE)
    >> .matcher("stringThatMayContainOther").find();
    >>
    >> does the trick.
    >>
    >> A possibly preferable alternative to doing case-insensitive string
    >> operations is simply to uppercase (or lowercase) both Strings before
    >> doing the operation. .toLowerCase() and .toUpperCase() are String
    >> methods that are available for this purpose. If you plan to do this a
    >> lot you can write up a small utility method.

    >
    > Beware of uppercasing and lowercasing - the results can be surprising.
    >
    > String whatThe = "ß".toUpperCase().toLowerCase();
    >
    > What should be the value of '"ß".equalsIgnoreCase("ss")'?
    >
    > What should be the value of '"ß".toUpperCase().toLowerCase().equals("ß")'?
    >

    That's a good point, but it's no mystery that uppercasing/lowercasing
    outside the standard 26-letter Latin alphabet has the odd pitfall here
    and there, like Eszett rules.

    So my second suggestion applies in particular to Strings that contain
    Latin characters. For anything else you'd best be aware of the rules and
    the nature of your text.

    AHS
     
    Arved Sandstrom, Sep 8, 2011
    #4
  5. Lew

    Lew Guest

    On Thursday, September 8, 2011 7:33:34 AM UTC-7, Arved Sandstrom wrote:

    Your post had trouble, being in "UTC-7" and dealing with 8-bit characters.

    > Lew wrote:
    >> Arved Sandstrom wrote:
    >>> bob wrote:
    >>>> Is there any case-insensitive version of the String contains method?
    >>>>
    >>> Not that I'm aware of. An easy way to do the deed is to use Pattern;
    >>> something like
    >>>
    >>> boolean isContained =
    >>> Pattern.compile("isThisStringContained",
    >>> Pattern.CASE_INSENSITIVE)
    >>> .matcher("stringThatMayContainOther").find();
    >>>
    >>> does the trick.
    >>>
    >>> A possibly preferable alternative to doing case-insensitive string
    >>> operations is simply to uppercase (or lowercase) both Strings before
    >>> doing the operation. .toLowerCase() and .toUpperCase() are String
    >>> methods that are available for this purpose. If you plan to do this a
    >>> lot you can write up a small utility method.

    >>
    >> Beware of uppercasing and lowercasing - the results can be surprising.
    >>
    >> String whatThe = "ß".toUpperCase().toLowerCase();
    >>
    >> What should be the value of '"ß".equalsIgnoreCase("ss")'?
    >>
    >> What should be the value of '"ß".toUpperCase().toLowerCase().equals("ß")'?
    >>

    > That's a good point, but it's no mystery that uppercasing/lowercasing
    > outside the standard 26-letter Latin alphabet has the odd pitfall here
    > and there, like Eszett rules.
    >
    > So my second suggestion applies in particular to Strings that contain
    > Latin characters. For anything else you'd best be aware of the rules and
    > the nature of your text.


    Which means, for all practical purposes, always be aware of the rules and nature of your text. Even here in the U.S. of A., we use lots of letters that don't fall into your (apparent) definition of "Latin" characters, for example, Latin-American names and loan words. Never mind that in almost any context that a programmer cares about, you have to deal with locales. Advising a programmer to deal with just the restriction to ASCII is a ludicrous suggestion. You pretty much always have to be aware, at a minimum, of eight-bit characters, and really all of UTF-8. To do otherwise is very irresponsible.

    Also, by definition "ß" is a Latin character, in the sense that it's in the Latin-1 (ISO8859-1) character set.

    If you write for the 128
    Then you deserve your horrible fate.

    --
    Lew
     
    Lew, Sep 8, 2011
    #5
  6. Lew wrote:
    > On Thursday, September 8, 2011 7:33:34 AM UTC-7, Arved Sandstrom wrote:
    > Your post had trouble, being in "UTC-7" and dealing with 8-bit characters.


    What does (your?) timezone UTC-7 have to do with Arved's ISO-8859-1 encoding?
     
    Andreas Leitgeb, Sep 8, 2011
    #6
  7. On 11-09-08 11:47 AM, Lew wrote:
    > On Thursday, September 8, 2011 7:33:34 AM UTC-7, Arved Sandstrom wrote:
    >
    > Your post had trouble, being in "UTC-7" and dealing with 8-bit characters.
    >
    >> Lew wrote:
    >>> Arved Sandstrom wrote:
    >>>> bob wrote:
    >>>>> Is there any case-insensitive version of the String contains method?
    >>>>>
    >>>> Not that I'm aware of. An easy way to do the deed is to use Pattern;
    >>>> something like
    >>>>
    >>>> boolean isContained =
    >>>> Pattern.compile("isThisStringContained",
    >>>> Pattern.CASE_INSENSITIVE)
    >>>> .matcher("stringThatMayContainOther").find();
    >>>>
    >>>> does the trick.
    >>>>
    >>>> A possibly preferable alternative to doing case-insensitive string
    >>>> operations is simply to uppercase (or lowercase) both Strings before
    >>>> doing the operation. .toLowerCase() and .toUpperCase() are String
    >>>> methods that are available for this purpose. If you plan to do this a
    >>>> lot you can write up a small utility method.
    >>>
    >>> Beware of uppercasing and lowercasing - the results can be surprising.
    >>>
    >>> String whatThe = "ß".toUpperCase().toLowerCase();
    >>>
    >>> What should be the value of '"ß".equalsIgnoreCase("ss")'?
    >>>
    >>> What should be the value of '"ß".toUpperCase().toLowerCase().equals("ß")'?
    >>>

    >> That's a good point, but it's no mystery that uppercasing/lowercasing
    >> outside the standard 26-letter Latin alphabet has the odd pitfall here
    >> and there, like Eszett rules.
    >>
    >> So my second suggestion applies in particular to Strings that contain
    >> Latin characters. For anything else you'd best be aware of the rules and
    >> the nature of your text.

    >
    > Which means, for all practical purposes, always be aware of the rules and nature of your text. Even here in the U.S. of A., we use lots of letters that don't fall into your (apparent) definition of "Latin" characters, for example, Latin-American names and loan words. Never mind that in almost any context that a programmer cares about, you have to deal with locales. Advising a programmer to deal with just the restriction to ASCII is a ludicrous suggestion. You pretty much always have to be aware, at a minimum, of eight-bit characters, and really all of UTF-8. To do otherwise is very irresponsible.


    I'm not exactly advising any programmer to deal with just ASCII; what I
    am saying here is that if you know that your text is ASCII text (*way*
    more common than you make out), lowercasing and uppercasing in this
    particular situation is a potential approach. By ASCII text I still mean
    Unicode; simply the ASCII subset thereof.
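If the "known to be ASCII" assumption needs to be verified rather than trusted, a guard along these lines is one option. This is only a sketch with hypothetical names; what to do when the check fails (reject, log, fall back to another comparison) is left open because the thread does not say.

```java
final class AsciiCheck {
    // Hypothetical guard: true only if every char is in the 7-bit ASCII range.
    static boolean isAscii(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0x7F) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAscii("Hello, World!")); // prints "true"
        System.out.println(isAscii("Stra\u00DFe"));   // prints "false"
    }
}
```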

    Incidentally, as a Canadian programmer who deals with a lot of
    government information systems, and no small number of business
    information systems, I can tell you straight up that it's not *my*
    decision as to what characters are acceptable - that's decided by
    analysts and business stakeholders. In fact it is still not uncommon for
    many government systems at provincial levels not to deal with French
    accents, let alone anything else outside the ASCII subset.

    For example, one provincial vital information registry I've worked on,
    which must communicate with other registries and also with federal
    government systems, specifically has severe restrictions (including no
    French-accented characters) so as to radically reduce the potential for
    string operation errors in various use cases. There are excellent
    reasons for doing this, namely that you have to deal with legacy data in
    legacy systems that don't know from accents at all. This is common.

    I'd be surprised if the majority of US-developed software is quite as
    accommodating as you suggest. Partially for the same reason as I mention
    above - as soon as one system starts using the full gamut of Unicode for
    storing text information, any other systems that it talks to must have
    the same data, or you must translate. If you're starting from scratch
    then go full monty with Unicode and have fun; if you're dealing with all
    those legacy datasets out there then it's not so easy.

    And it's also not the programmer's decision - or shouldn't be - to start
    using techniques such as those described in
    http://www.rgagnon.com/javadetails/java-0456.html.

    > Also, by definition "ß" is a Latin character, in the sense that it's in the Latin-1 (ISO8859-1) character set.
    >
    > If you write for the 128
    > Then you deserve your horrible fate.
    >

    On a go-forward basis starting with a clean slate, absolutely, I agree.

    AHS
     
    Arved Sandstrom, Sep 8, 2011
    #7
  8. Lew

    Lew Guest

    Andreas Leitgeb wrote:
    > Lew wrote:
    >> On Thursday, September 8, 2011 7:33:34 AM UTC-7, Arved Sandstrom wrote:
    >> Your post had trouble, being in "UTC-7" and dealing with 8-bit characters.

    >
    > What does (your?) timezone UTC-7 have to do with Arved's ISO-8859-1 encoding?


    Nothing. I highlighted the wrong thing. What I meant to say was that his post didn't properly display the sharp-S character, and I got confused. My bad.

    --
    Lew
     
    Lew, Sep 9, 2011
    #8
  9. Arne Vajhøj

    Arne Vajhøj Guest

    On 9/8/2011 3:44 AM, bob wrote:
    > Is there any case-insensitive version of the String contains method?


    Not built in, but a loop and String.equalsIgnoreCase should solve
    the problem as well as possible.
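One way the loop could be written is sketched below. It swaps in `String.regionMatches` with `ignoreCase` set to `true` instead of calling `equalsIgnoreCase` on each substring; that is a substitution made here to avoid creating a new String per position, not something the post specifies. The class and method names are hypothetical.

```java
final class LoopContains {
    // Hypothetical helper: slide a window over the haystack and compare
    // each window to the needle, ignoring case.
    static boolean containsIgnoreCase(String haystack, String needle) {
        int n = needle.length();
        for (int i = 0; i + n <= haystack.length(); i++) {
            // regionMatches(true, ...) compares the two regions char by char.
            if (haystack.regionMatches(true, i, needle, 0, n)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(containsIgnoreCase("Hello World", "o w")); // prints "true"
        System.out.println(containsIgnoreCase("Hello", "xyz"));       // prints "false"
    }
}
```

Because the comparison is per character, it never matches strings of different lengths, so "Straße" is not found when searching for "STRASSE".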

    Arne
     
    Arne Vajhøj, Sep 9, 2011
    #9
  10. On 08/09/2011 3:37 PM, Arved Sandstrom wrote:
    > I'm not exactly advising any programmer to deal with just ASCII; what I
    > am saying here is that if you know that your text is ASCII text (*way*
    > more common than you make out) that lowercasing and uppercasing in this
    > particular situation is a potential approach. By ASCII text I still mean
    > Unicode; simply the ASCII subset thereof.


    Not only that -- if everything is passed through
    .toLowerCase().toUpperCase() then the input set of strings gets
    projected down onto a particular set of canonical representations. Some
    stuff will get conflated, but I think they amount only to alternative
    spellings of the same thing -- so finding matches among them does amount
    to there being substrings in common among the original inputs.
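A short sketch of that projection, using hypothetical names and `Locale.ROOT` as an added assumption. It shows two spellings being conflated into one canonical form and a substring match succeeding as a result.

```java
import java.util.Locale;

final class CanonicalForm {
    // Lowercase, then uppercase, so every input lands on one representation.
    static String canonical(String s) {
        return s.toLowerCase(Locale.ROOT).toUpperCase(Locale.ROOT);
    }

    // Hypothetical helper: compare the canonical forms instead of the inputs.
    static boolean containsCanonical(String haystack, String needle) {
        return canonical(haystack).contains(canonical(needle));
    }

    public static void main(String[] args) {
        // Both spellings project onto the same canonical string.
        System.out.println(canonical("Stra\u00DFe")); // prints "STRASSE"
        System.out.println(canonical("strasse"));     // prints "STRASSE"
        System.out.println(
            containsCanonical("Hauptstra\u00DFe", "STRASSE")); // prints "true"
    }
}
```

Whether treating such spellings as equal is acceptable depends on the data, which is the point raised earlier in the thread.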
     
    supercalifragilisticexpialadiamaticonormalizeringe, Sep 9, 2011
    #10
