regex problem: 'greater than' 'less than' and 'equals' not matching!

Discussion in 'Java' started by falcon, Feb 22, 2006.

  1. falcon

    falcon Guest

    I have a very strange problem. I want to replace everything in a
    string except letters, numbers, spaces, and certain symbols listed in
    the regex below:

    "blahblah !@#$%^&*()--.,<>=".replaceAll("[^A-Za-z0-9/-?:().,'+^| ]","")

    I expect to get the following string back:
    "blahblah ^().,"

    but I actually get the following:
    "blahblah ^().,<>="

    Notice the greater/less than and equals signs are still there!

    I did a quick check using this site:
    http://www.fileformat.info/tool/regex.htm and I get the same result
    back. What's going on here???
    falcon, Feb 22, 2006
    #1

  2. falcon

    colirl Guest

    Hi,

    I have not actually run the regex with my fixes, but first of all there
    are a few problems. Characters like $, |, [, ), \, / and so on are
    special cases in regular expressions. If you want to match one of
    them literally, you have to precede it with a backslash. So:

    \| # Vertical bar
    \[ # An open square bracket
    \) # A closing parenthesis
    \* # An asterisk
    \^ # A caret symbol
    \/ # A slash
    \\ # A backslash

    Try this for your ( and ) and see if it makes any difference! I don't
    have the time to test the fix, but that's what you need to do for special
    characters.
    colirl, Feb 22, 2006
    #2

  3. falcon

    falcon Guest

    colirl,
    That doesn't seem to work. Besides, it is replacing most of the right
    characters with blanks; for some reason it keeps the relational symbols
    (><=).
    falcon, Feb 22, 2006
    #3
  4. falcon

    falcon Guest

    Sorry, it does work. I had to move some chars in my regex string
    around, but the missing backslashes on the characters which have special
    meaning apparently were the problem. Thanks colirl!
    falcon, Feb 22, 2006
    #4
  5. falcon

    colirl Guest

    OK, you need to escape the - symbol because, as I said, special
    characters need to be escaped.

    Try [^A-Za-z0-9/\-?:().,'+^| ] as your expression. It gets rid of
    the ><= for me.
    colirl, Feb 22, 2006
    #5
  6. falcon

    colirl Guest

    so then you get


    blahblah ^()--.,


    Solution (the backslash is doubled because it sits inside a Java string literal):
    "blahblah !@#$%^&*()--.,<>=".replaceAll("[^A-Za-z0-9/\\-?:().,'+^| ]","")


    Enjoy :)
    colirl, Feb 22, 2006
    #6
  7. falcon writes:

    > I have a very strange problem. I want to replace everything in a
    > string except letters, numbers, spaces, and certain symbols listed in
    > the regex below:
    >
    > "blahblah !@#$%^&*()--.,<>=".replaceAll("[^A-Za-z0-9/-?:().,'+^| ]","")


    /-? is a range, and it contains :;<=>.
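
    To make that concrete, here is a minimal sketch (the class and method
    names are made up for illustration, they are not from falcon's code)
    that lists the characters covered by the range from '/' to '?' and runs
    both the original and the escaped character class on the same input:

```java
class CharClassRangeDemo {
    // falcon's original class: "/-?" is read as the range U+002F..U+003F
    static final String ORIGINAL = "[^A-Za-z0-9/-?:().,'+^| ]";
    // Same class with the hyphen escaped, so '-' is a literal character.
    // The backslash is doubled because this is a Java string literal.
    static final String ESCAPED = "[^A-Za-z0-9/\\-?:().,'+^| ]";

    // Removes every character that the (negated) class matches.
    static String strip(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    // Builds the string of all characters from 'from' to 'to', inclusive.
    static String rangeChars(char from, char to) {
        StringBuilder sb = new StringBuilder();
        for (char c = from; c <= to; c++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "blahblah !@#$%^&*()--.,<>=";
        System.out.println(rangeChars('/', '?'));   // /0123456789:;<=>?
        System.out.println(strip(input, ORIGINAL)); // blahblah ^().,<>=
        System.out.println(strip(input, ESCAPED));  // blahblah ^()--.,
    }
}
```

    Note that with the original class the two '-' characters are removed
    as well, because the hyphen is consumed as the range operator and is
    never itself a member of the class.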
    Jussi Piitulainen, Feb 22, 2006
    #7
  8. falcon

    falcon Guest

    Jussi,
    I already fixed the problem, but it's amusing that I missed seeing /-?
    as a range *from '/' to '?'*

    Thanks :)
    falcon, Feb 22, 2006
    #8
  9. falcon

    Rob Skedgell Guest

    colirl wrote:

    > OK, you need to escape the - symbol because, as I said, special
    > characters need to be escaped.
    >
    > try [^A-Za-z0-9/\-?:().,'+^| ] as your expression. gets rid of
    > the ><= for me.


    Or you can put a '-' unescaped as the last character in a character
    class, since there it doesn't form part of a range, e.g. "[a-z-]" will
    match "-". Admittedly the 1.5.0 javadocs for java.util.regex.Pattern at
    <http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html#cc>
    don't make that clear: "inside a character class ... the expression -
    becomes a range forming metacharacter."
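
    A small sketch of that alternative, with the hyphen moved to the end of
    falcon's class instead of being escaped (the class and method names here
    are hypothetical, chosen only for the example):

```java
class TrailingHyphenDemo {
    // The hyphen is the last character before ']', so it cannot start a
    // range and is treated as a literal '-'. No backslash is needed.
    static final String TRAILING = "[^A-Za-z0-9/?:().,'+^| -]";

    // Removes every character that is not in the allowed set.
    static String strip(String input) {
        return input.replaceAll(TRAILING, "");
    }

    public static void main(String[] args) {
        System.out.println("-".matches("[a-z-]"));               // true
        System.out.println(strip("blahblah !@#$%^&*()--.,<>=")); // blahblah ^()--.,
    }
}
```

    Keeping the literal hyphen at the very end also avoids the doubled
    backslash in the Java source, at the cost of relying on position.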

    --
    Rob Skedgell
    GnuPG/PGP: 7DA3 1579 C0DD 8748 C05A B984 E2A2 3234 D14B 6DD7
    Rob Skedgell, Feb 22, 2006
    #9
  10. falcon

    colirl Guest

    Rob,
    Yeah, I had forgotten about that one. :)
    colirl, Feb 22, 2006
    #10
  11. falcon

    Roedy Green Guest

    On 22 Feb 2006 09:29:23 -0800, "falcon" <> wrote,
    quoted or indirectly quoted someone who said :

    >Sorry, it does work. I had to move some chars in my regex string
    >around, but the missing backslashes on the characters which have special
    >meaning apparently were the problem. Thanks colirl!


    It is pretty hairy, since \ is used for quoting by both regex and
    string literals, so a literal \ becomes \\\\ in Java source.
    See http://mindprod.com/jgloss/regex.html#QUOTING
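
    A short sketch of the two layers of quoting (the class and method names
    are invented for illustration): the Java compiler turns "\\\\" into the
    two-character regex \\, which in turn matches one literal backslash.
    Pattern.quote is shown as a way to skip the hand-escaping.

```java
import java.util.regex.Pattern;

class BackslashDemo {
    // Source text "\\\\" -> string value \\ -> regex matching one backslash.
    static boolean isSingleBackslash(String s) {
        return s.matches("\\\\");
    }

    // Replaces every literal backslash with a forward slash.
    static String toForwardSlashes(String s) {
        return s.replaceAll("\\\\", "/");
    }

    public static void main(String[] args) {
        String oneBackslash = "\\";                        // a single character
        System.out.println(oneBackslash.length());         // 1
        System.out.println(isSingleBackslash(oneBackslash)); // true
        System.out.println(toForwardSlashes("a\\b"));      // a/b

        // Pattern.quote builds a regex that matches its argument literally.
        System.out.println(oneBackslash.matches(Pattern.quote("\\"))); // true

        // The escaped hyphen from earlier in the thread follows the same
        // rule: the regex \- is written "\\-" in Java source.
        System.out.println("-".matches("[a-z\\-]"));       // true
    }
}
```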
    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Java custom programming, consulting and coaching.
    Roedy Green, Feb 24, 2006
    #11
