Regex doesn't recognize single quote

Discussion in 'Java' started by Jerric, Jan 6, 2012.

  1. Jerric

    Jerric Guest

    Hi, I need to remove special characters, except \w and single quotes,
    from a string, can someone please help me on the regex?

    For example, I have "ab'de+fg" and I want to get "ab'defg". I tried
    the following code, but it removed the single quote. It seems to me
    that Java cannot handle a pattern like [^'].

    String val = "ab'de+fg";
    val = val.replaceAll("[^\\w']+", "");

    Thanks a lot,
    Jerric, Jan 6, 2012
    #1

  2. On Fri, 06 Jan 2012 14:08:49 -0800, Jerric wrote:

    > Hi, I need to remove special characters, except \w and single quotes,
    > from a string, can someone please help me on the regex?
    >
    > for example, I have "ab'de+fg", I want to get "ab'defg", and I tried the
    > following code, but it removed single quote. seems to me java cannot
    > handle the pattern like [^'].
    >
    > String val = "ab'de+fg";
    > val = val.replaceAll("[^\\w']+", "");
    >

    Did you try escaping the single quote?


    --
    martin@ | Martin Gregorie
    gregorie. | Essex, UK
    org |
    Martin Gregorie, Jan 6, 2012
    #2
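A quick way to check the escaping question is to run both forms side by side. The sketch below is only an illustration (the class and method names are made up for this example); it relies on the documented rule that a backslash before a non-alphabetic character in a Java regex simply quotes that character.

```java
class QuoteEscapeDemo {
    // Apostrophe left unescaped inside the negated character class.
    static String stripPlain(String s) {
        return s.replaceAll("[^\\w']+", "");
    }

    // Apostrophe escaped at the regex level; \' still means a literal apostrophe.
    static String stripEscaped(String s) {
        return s.replaceAll("[^\\w\\']+", "");
    }

    public static void main(String[] args) {
        String val = "ab'de+fg";
        System.out.println(stripPlain(val));   // ab'defg
        System.out.println(stripEscaped(val)); // ab'defg
    }
}
```

Both calls give the same result, so escaping the apostrophe should make no difference here.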

  3. Daniel Pitts

    Daniel Pitts Guest

    On 1/6/12 2:08 PM, Jerric wrote:
    > Hi, I need to remove special characters, except \w and single quotes,
    > from a string, can someone please help me on the regex?
    >
    > for example, I have "ab'de+fg", I want to get "ab'defg", and I tried
    > the following code, but it removed single quote. seems to me java
    > cannot handle the pattern like [^'].
    >
    > String val = "ab'de+fg";
    > val = val.replaceAll("[^\\w']+", "");
    >
    > Thanks a lot,

    It works for me, which indicates the problem is somewhere in the code
    you didn't post. Here is an SSCCE:

    public class Works {
        public static void main(String[] args) {
            String val = "ab'de+fg";
            System.out.println(val.replaceAll("[^\\w']+", ""));
        }
    }

    Try posting exactly the code which causes the problem.
    Daniel Pitts, Jan 6, 2012
    #3
  4. Roedy Green

    Roedy Green Guest

    On Fri, 6 Jan 2012 14:08:49 -0800 (PST), Jerric <>
    wrote, quoted or indirectly quoted someone who said :

    >Hi, I need to remove special characters, except \w and single quotes,
    >from a string, can someone please help me on the regex?


    That is not what a regex is for. Just use a StringBuilder the length
    of your String. Then loop through the chars with charAt. If the
    character is a ' or \w, ignore it, else append. If it gets complex,
    use a switch or if it gets really complicated use a BitSet.


    --
    Roedy Green Canadian Mind Products
    http://mindprod.com
    If you can't remember the name of some method,
    consider changing it to something you can remember.
    Roedy Green, Jan 7, 2012
    #4
  5. Stefan Ram

    Stefan Ram Guest

    Roedy Green <> writes:
    >That is not what a regex is for.


    How do you know what it is for?

    >Just use a StringBuilder the length of your String. Then
    >loop through the chars with charAt. If the character is a
    >' or \w, ignore it, else append. If it gets complex, use a
    >switch or if it gets really complicated use a BitSet.


    This might be needless optimization (as far as we know right now)
    that bloats the code and reduces its readability. It is low-level
    thinking, which might be required sometimes but does not serve as
    a general rule. Still, it is nice to know how it could be done if
    required.
    Stefan Ram, Jan 7, 2012
    #5
  6. Roedy Green

    Roedy Green Guest

    On 7 Jan 2012 11:42:26 GMT, -berlin.de (Stefan Ram) wrote,
    quoted or indirectly quoted someone who said :

    >>That is not what a regex is for.

    >
    > How do you know what it is for?


    Regexes are for searching for patterns. Transforming or deleting
    characters is much simpler done with a for loop.

    How do I know what a regex is for? I am familiar with the API. I have
    attempted to use them for various purposes and discovered they were
    suitable for some and not for others.
    >
    >>Just use a StringBuilder the length of your String. Then
    >>loop through the chars with charAt. If the character is a
    >>' or \w, ignore it, else append. If it gets complex, use a
    >>switch or if it gets really complicated use a BitSet.

    >
    > This might be needless (as far as we know right now)
    > optimization bloating the code reducing its readability and
    > low-level thinking, which might be required sometimes, but
    > does not serve as a general rule. Still it is nice to know
    > how it could be done if required.


    What is your simpler implementation?

    /** remove ' and \w from string
    * @param s string to process
    * @return string without ' or \w
    */
    private static String scrunch( final String s )
    {
        final Stringbuilder sb = new StringBuilder( s.length() );
        for (int i=0; i<s.length(); i++ )
        {
            char c = s.charAt(i);
            if ( !( c = '\'' || c = '\w' ) )
            {
                sb.append ( c );
            }
        }
        return sb.toString();
    }
    --
    Roedy Green Canadian Mind Products
    http://mindprod.com
    If you can't remember the name of some method,
    consider changing it to something you can remember.
    Roedy Green, Jan 7, 2012
    #6
  7. Roedy Green

    Roedy Green Guest

    On 7 Jan 2012 11:42:26 GMT, -berlin.de (Stefan Ram) wrote,
    quoted or indirectly quoted someone who said :

    > How do you know what it is for?


    I see what you mean. I saw the problem as the pattern translation of
    various characters to various other characters. The problem is
    actually simpler than that. It translates various different
    characters all to the same empty "character".

    I find the replace methods dangerous. They are improperly named and
    thus it is easy to accidentally use a regex or non-regex. They also
    have to compile the pattern every time. I tend to avoid them.
    --
    Roedy Green Canadian Mind Products
    http://mindprod.com
    If you can't remember the name of some method,
    consider changing it to something you can remember.
    Roedy Green, Jan 7, 2012
    #7
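If recompiling the pattern on every call is a concern, one common workaround is to compile it once and reuse it. This is a minimal sketch, not anything posted in the thread; the class name and the constant UNWANTED are assumptions chosen for illustration. It uses the same regex as the original post.

```java
import java.util.regex.Pattern;

class PrecompiledStrip {
    // Compiled once at class initialization and reused on every call.
    private static final Pattern UNWANTED = Pattern.compile("[^\\w']+");

    static String strip(String s) {
        // Equivalent in result to s.replaceAll("[^\\w']+", "").
        return UNWANTED.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("ab'de+fg")); // ab'defg
    }
}
```

Whether the saving matters depends on how often the method is called; for a one-off replacement String.replaceAll is probably fine.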
  8. On 08/01/12 08:41, Roedy Green wrote:
    > On 7 Jan 2012 11:42:26 GMT, -berlin.de (Stefan Ram) wrote,
    > quoted or indirectly quoted someone who said :
    >
    >>> That is not what a regex is for.

    >>
    >> How do you know what it is for?

    >
    > Regexes are for searching for patterns. Transforming or deleting
    > characters is much simpler done with a for loop.
    >
    > How do I know what a regex is for? I am familiar with the API. I have
    > attempted to use them for various purposes and discovered they were
    > suitable for some and not for others.
    >>
    >>> Just use a StringBuilder the length of your String. Then
    >>> loop through the chars with charAt. If the character is a
    >>> ' or \w, ignore it, else append. If it gets complex, use a
    >>> switch or if it gets really complicated use a BitSet.

    >>
    >> This might be needless (as far as we know right now)
    >> optimization bloating the code reducing its readability and
    >> low-level thinking, which might be required sometimes, but
    >> does not serve as a general rule. Still it is nice to know
    >> how it could be done if required.

    >
    > What is your simpler implementation?
    >
    > /** remove ' and \w from string
    > * @param s string to process
    > * @return string without ' or \w
    > */
    > private static String scrunch( final String s )
    > {
    > final Stringbuilder sb = new StringBuilder( s.length() );
    > for (int i=0; i<s.length(); i++ )
    > {
    > char c = s.charAt(i);
    > if ( !( c = '\'' || c = '\w' ) )
    > {
    > sb.append ( c );
    > }
    > }
    > return sb.toString();
    > }


    In most cases it is better to use a StringBuilder to perform replacements,
    but in this particular case String.replaceAll() is better. By the way,
    the escape sequence \w is not a regular Java escape sequence but belongs
    to the pattern syntax (although you should already know about it, as you
    say you are familiar with the API).

    Anyway a simpler implementation (and one which works, because yours
    doesn't):

    /** remove ' and \w from string
     * @param s string to process
     * @return string without ' or \w
     */
    private static String scrunch( final String s ) {
        return s.replaceAll("[^'\\w]+", "");
    }
    Rafael Villar, Jan 7, 2012
    #8
  9. Mea culpa, sorry: it seems Roedy misunderstood the original problem,
    and I in turn misunderstood what Roedy was trying to do (sorry, Roedy).

    Anyway, a simpler method that does what Roedy intends to do:

    /** remove ' and \w from string
     * @param s string to process
     * @return string without ' or \w
     */
    private static String scrunch( final String s ) {
        return s.replaceAll("['\\w]+", "");
    }

    However, the cause of the original problem remains unknown, since the
    original code actually works.
    Rafael Villar, Jan 7, 2012
    #9
  10. Stefan Ram

    Stefan Ram Guest

    Roedy Green <> writes:
    >What is your simpler implementation?
    >/** remove ' and \w from string
    > * @param s string to process
    > * @return string without ' or \w
    > */
    >private static String scrunch( final String s )
    >{
    >final Stringbuilder sb = new StringBuilder( s.length() );
    >for (int i=0; i<s.length(); i++ )
    > {
    > char c = s.charAt(i);
    > if ( !( c = '\'' || c = '\w' ) )


    Even with »==« instead of »=« (a »final « in front of the
    »char c« should help to detect such errors) and »\\w«
    instead of »\w« (»\w« is an illegal escape character in
    Java), the comparison with '\\w' is not what the OP actually
    wanted.

    Maybe you just want other people not to use regular
    expressions because you personally can't read them, but why
    should your personal knowledge (which is a by-product of your
    personal history) be a limitation on anyone else's work?

    > {
    > sb.append ( c );
    > }
    > }
    >return sb.toString();
    >}


    static String scrunch( final String s )
    { final java.lang.String string = s.toString();
    final java.lang.String result = s.replaceAll( "('|\\\\w)", "" );
    return new String( result ); }

    (Assuming the class »String« has an appropriate constructor.)

    (This implements your documentation, not what the OP wanted.)
    Stefan Ram, Jan 7, 2012
    #10
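To make the difference between the two patterns concrete, here is a small sketch (the class and method names are invented for this illustration). It assumes an input that really contains a backslash followed by the letter w, which is what the pattern with four backslashes looks for.

```java
class BackslashWDemo {
    // Regex ('|\\w): an apostrophe, or a literal backslash followed by the letter w.
    static String removeLiteral(String s) {
        return s.replaceAll("('|\\\\w)", "");
    }

    // Regex ['\w]+: runs of apostrophes and word characters.
    static String removeClass(String s) {
        return s.replaceAll("['\\w]+", "");
    }

    public static void main(String[] args) {
        String input = "a'b\\wc"; // six characters: a ' b \ w c
        System.out.println(removeLiteral(input)); // abc
        System.out.println(removeClass(input));   // \
    }
}
```

The first method removes only the apostrophe and the two-character sequence backslash-w; the second removes every word character and apostrophe, leaving just the backslash.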
  11. On 12-01-07 06:48 PM, Roedy Green wrote:
    > On 7 Jan 2012 11:42:26 GMT, -berlin.de (Stefan Ram) wrote,
    > quoted or indirectly quoted someone who said :
    >
    >> How do you know what it is for?

    >
    > I see what you mean. I saw the problem as the pattern translation of
    > various characters to various other characters. The problem is
    > actually simpler than that. It translates various different
    > characters all to the same empty "character".
    >
    > I find the replace methods dangerous. They are improperly named and
    > thus it is easy to accidentally use a regex or non-regex. They also
    > have to compile the pattern every time. I tend to avoid them.


    The methods that accept 'char' or 'CharSequence' are named 'replace'.
    The two methods that use regexes are called 'replaceAll' and
    'replaceFirst'. I don't see a possibility of accidents here.

    The methods are not remotely improperly named: they replace text. That
    some of them use literals, and others use regular expressions, to
    specify what text is to be replaced, does not alter that central fact.

    AHS

    --
    ....wherever the people are well informed they can be trusted with their
    own government...
    -- Thomas Jefferson, 1789
    Arved Sandstrom, Jan 8, 2012
    #11
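The distinction between the literal and regex variants can be seen with a string containing a regex metacharacter. The following is a minimal sketch with made-up class and method names; it only uses String.replace(CharSequence, CharSequence) and String.replaceAll(String, String).

```java
class ReplaceVariants {
    // replace(CharSequence, CharSequence): "." is treated as a literal dot.
    static String literal(String s) {
        return s.replace(".", "");
    }

    // replaceAll(String, String): "." is a regex matching any character
    // (except line terminators by default).
    static String regex(String s) {
        return s.replaceAll(".", "");
    }

    public static void main(String[] args) {
        System.out.println(literal("a.b.c"));         // abc
        System.out.println(regex("a.b.c").isEmpty()); // true
    }
}
```

Note that replace also replaces every occurrence, not just the first; the difference from replaceAll is only in how the first argument is interpreted.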
  12. Lew

    Lew Guest

    > Roedy Green wrote:
    >> What is your simpler implementation?
    >>
    >> /** remove ' and \w from string
    >> * @param s string to process
    >> * @return string without ' or \w
    >> */
    >> private static String scrunch( final String s )
    >> {
    >> final Stringbuilder sb = new StringBuilder( s.length() );
    >> for (int i=0; i<s.length(); i++ )
    >> {
    >> char c = s.charAt(i);
    >> if ( !( c = '\'' || c = '\w' ) )
    >> {
    >> sb.append ( c );
    >> }
    >> }
    >> return sb.toString();
    >> }


    That will not perform the specified action, which is to remove non-word
    characters and to _keep_ apostrophes. '\w' is not legitimate Java syntax,
    thus will cause a compilation error.
    "It is a compile-time error if the character following a backslash in an
    escape is not an ASCII b, t, n, f, r, ", ', \, 0, 1, 2, 3, 4, 5, 6, or 7."
    <http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#3.10.6>

    The simpler approach was already posted by Daniel Pitts, and has the added
    virtues of both meeting the requirement and compiling:

    public class Works {
        public static void main(String[] args) {
            String val = "ab'de+fg";
            System.out.println(val.replaceAll("[^\\w']+", ""));
        }
    }

    --
    Lew
    Honi soit qui mal y pense.
    http://upload.wikimedia.org/wikipedia/commons/c/cf/Friz.jpg
    Lew, Jan 8, 2012
    #12
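For comparison, a loop-based version that does meet the stated requirement (keep word characters and apostrophes, drop everything else) might look like the sketch below. It is not code from the thread. It assumes \w means the default ASCII set [a-zA-Z_0-9]; the class name and the isWordChar helper are hypothetical.

```java
class LoopStrip {
    // Assumption: mirrors the default (ASCII-only) meaning of \w in java.util.regex.
    static boolean isWordChar(char c) {
        return (c >= 'a' && c <= 'z')
            || (c >= 'A' && c <= 'Z')
            || (c >= '0' && c <= '9')
            || c == '_';
    }

    static String strip(String s) {
        final StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            final char c = s.charAt(i);
            // Keep apostrophes and word characters; skip everything else.
            if (c == '\'' || isWordChar(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("ab'de+fg")); // ab'defg
    }
}
```

It does the same job as the one-line replaceAll call, at the cost of noticeably more code.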
  13. Jim Janney

    Jim Janney Guest

    Daniel Pitts <> writes:

    > On 1/6/12 2:08 PM, Jerric wrote:
    >> Hi, I need to remove special characters, except \w and single quotes,
    >> from a string, can someone please help me on the regex?
    >>
    >> for example, I have "ab'de+fg", I want to get "ab'defg", and I tried
    >> the following code, but it removed single quote. seems to me java
    >> cannot handle the pattern like [^'].
    >>
    >> String val = "ab'de+fg";
    >> val = val.replaceAll("[^\\w']+", "");
    >>
    >> Thanks a lot,

    > It works for me, which indicates the problem is somewhere in the code
    > you didn't post. Here is an SSCCE:
    >
    > public class Works {
    > public static void main(String[] args) {
    > String val = "ab'de+fg";
    > System.out.println(val.replaceAll("[^\\w']+", ""));
    >
    > }
    > }
    >
    > Try posting exactly the code which causes the problem.


    Since replaceAll is being used, the closure (the trailing +) is
    unnecessary, so this can be shortened by one character :)

    --
    Jim Janney
    Jim Janney, Jan 8, 2012
    #13
  14. Daniel Pitts

    Daniel Pitts Guest

    On 1/7/12 6:02 PM, Jim Janney wrote:
    > Daniel Pitts<> writes:
    >
    >> On 1/6/12 2:08 PM, Jerric wrote:
    >>> Hi, I need to remove special characters, except \w and single quotes,
    >>> from a string, can someone please help me on the regex?
    >>>
    >>> for example, I have "ab'de+fg", I want to get "ab'defg", and I tried
    >>> the following code, but it removed single quote. seems to me java
    >>> cannot handle the pattern like [^'].
    >>>
    >>> String val = "ab'de+fg";
    >>> val = val.replaceAll("[^\\w']+", "");
    >>>
    >>> Thanks a lot,

    >> It works for me, which indicates the problem is somewhere in the code
    >> you didn't post. Here is an SSCCE:
    >>
    >> public class Works {
    >> public static void main(String[] args) {
    >> String val = "ab'de+fg";
    >> System.out.println(val.replaceAll("[^\\w']+", ""));
    >>
    >> }
    >> }
    >>
    >> Try posting exactly the code which causes the problem.

    >
    > Since replaceAll is being used, the closure is unnecessary, so this can
    > be shortened by one character :)
    >

    Perhaps, but I wouldn't be surprised if there was a performance
    difference in the two. I'm not saying there definitely is, but there
    very well could be.

    Also, they are only equivalent because the replacement string is zero
    length.
    Daniel Pitts, Jan 9, 2012
    #14
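The point about the replacement string can be checked directly. This sketch (class and method names are invented for the example) runs the pattern with and without the trailing + against an input that has two adjacent unwanted characters.

```java
class PlusQuantifierDemo {
    // With +, a whole run of unwanted characters is one match.
    static String withPlus(String s, String replacement) {
        return s.replaceAll("[^\\w']+", replacement);
    }

    // Without +, each unwanted character is its own match.
    static String withoutPlus(String s, String replacement) {
        return s.replaceAll("[^\\w']", replacement);
    }

    public static void main(String[] args) {
        String val = "ab+*cd";
        System.out.println(withPlus(val, ""));     // abcd
        System.out.println(withoutPlus(val, ""));  // abcd
        System.out.println(withPlus(val, "-"));    // ab-cd
        System.out.println(withoutPlus(val, "-")); // ab--cd
    }
}
```

With an empty replacement the two agree; with any non-empty replacement they differ as soon as two unwanted characters are adjacent.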
  15. Daniel Pitts

    Daniel Pitts Guest

    Wow, that is some of the worst String manipulation code I've seen.

    On 1/7/12 3:36 PM, Stefan Ram wrote:
    > static String scrunch( final String s )
    > { final java.lang.String string = s.toString();

    s.toString() == s for all non-null instances of String. Unneeded.

    > final java.lang.String result = s.replaceAll( "('|\\\\w)", "" );

    You don't need an intermediate here.

    > return new String( result ); }

    Strings are (mostly) immutable. There are extremely few good reasons to
    invoke the String(String) constructor manually. Not to mention
    s.replaceAll() will already potentially return a new String.
    >
    > (Assuming the class »String« has an appropriate constructor.)

    It does, but why use it unless you want to guarantee that they are
    .equals, but !=.



    I'm not even going to comment on your insane style, as I think you've
    rebuffed all comments in the past. What I will comment on is the lack
    of consistency in this snippet. Some places use "String" and others
    "java.lang.String".


    > (This implements your documentation, not what the OP wanted.)

    So does this, but with less waste and confusion.
    static String scrunch( final String s ) {
        return s.replaceAll( "('|\\\\w)", "" );
    }
    Daniel Pitts, Jan 9, 2012
    #15
  16. Stefan Ram

    Stefan Ram Guest

    Daniel Pitts <> writes:
    >I'm not even going to comment on your insane style, as I think you've
    >rebuffed all comments in the past. What I will comment on is the lack
    >of consistency in this snippet. Some places use use "String" and others
    >"java.lang.String".


    »String« is a class name used by Roedy.

    The actual class bound to the name of »String« depends on
    the context the snippet given by Roedy will be placed in.

    Since I have no information on that class »String«,
    I started by converting the String instance into a
    java.lang.String instance. Then, I was able to apply the
    operations of java.lang.String, which /are/ known to me.
    In the final end, I had to convert the java.lang.String
    instance back to an instance of the class »String«,
    because this was required by the interface of that method
    as given by Roedy.
    Stefan Ram, Jan 9, 2012
    #16
  17. Daniel Pitts

    Daniel Pitts Guest

    On 1/9/12 10:23 AM, Stefan Ram wrote:
    > Daniel Pitts<> writes:
    >> I'm not even going to comment on your insane style, as I think you've
    >> rebuffed all comments in the past. What I will comment on is the lack
    >> of consistency in this snippet. Some places use use "String" and others
    >> "java.lang.String".

    >
    > »String« is a class name used by Roedy.
    >
    > The actual class bound to the name of »String« depends on
    > the context the snippet given by Roedy will be placed in.
    >
    > Since I have no information on that class »String«,
    > I started by converting the String instance into a
    > java.lang.String instance. Then, I was able to apply the
    > operations of java.lang.String, which /are/ known to me.
    > In the final end, I had to convert the java.lang.String
    > instance back to an instance of the class »String«,
    > because this was required by the interface of that method
    > as given by Roedy.
    >

    Since String is in the java.lang package, it is safe to assume that
    "String" refers to the java.lang.String class, unless you are given
    context otherwise.
    Daniel Pitts, Jan 9, 2012
    #17