Remove punctuation from String?

Discussion in 'Java' started by dfhLASST, Nov 11, 2004.

  1. dfhLASST

    dfhLASST Guest

    What is the best way to remove all non-alphabetic characters (e.g. symbols,
    spaces etc.) from a String?

    My original plan was to loop over the chars in the String and add each one to
    an array if its value is alphabetic (i.e. >=65 and <=122).
    I've run into problems with this, and it seems more complex than the problem
    should be.

    Any suggestions?
     
    dfhLASST, Nov 11, 2004
    #1

  2. Michael Borgwardt

    dfhLASST wrote:
    > What is the best way to remove all non-alphabetic characters (e.g. symbols,
    > spaces etc.) from a String?
    >
    > My original plan was to loop over the chars in the String and add each one to
    > an array if its value is alphabetic (i.e. >=65 and <=122).
    > I've run into problems with this, and it seems more complex than the problem
    > should be.


    The problem is more complex than you think. Are you absolutely sure that
    you're only ever going to process English text? If not, use Character.isLetter()
    for the condition.

    For the accumulation of the output string, use StringBuffer (I guess that's
    where you encountered obvious problems).
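
    A minimal sketch of that suggestion, assuming a helper class named
    LetterFilter (a hypothetical name, not from this thread). It walks the
    String char by char, tests each one with Character.isLetter(), and
    accumulates the matches in a StringBuffer. It works on UTF-16 chars, so
    letters outside the Basic Multilingual Plane are not handled.

```java
// Hypothetical helper; the class and method names are illustrative only.
class LetterFilter {

    /** Returns s with every char that is not a letter removed. */
    static String lettersOnly(String s) {
        StringBuffer out = new StringBuffer(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Character.isLetter accepts letters from any script, not just A-Z.
            if (Character.isLetter(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00fc and \u00df are the German letters u-umlaut and sharp s.
        System.out.println(lettersOnly("Gr\u00fc\u00dfe, World! 123"));
    }
}
```

    On Java 5 or later, StringBuilder can be swapped in for StringBuffer when
    the buffer is only used by one thread.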
     
    Michael Borgwardt, Nov 11, 2004
    #2

  3. Chris Smith

    Chris Smith Guest

    dfhLASST wrote:
    > What is the best way to remove all non-alphabetic characters (e.g. symbols,
    > spaces etc.) from a String?
    >
    > My original plan was to loop over the chars in the String and add each one to
    > an array if its value is alphabetic (i.e. >=65 and <=122).
    > I've run into problems with this, and it seems more complex than the problem
    > should be.
    >
    > Any suggestions?


    str = str.replaceAll("[^A-Za-z]", "");

    or, if you want more than just ASCII characters:

    str = str.replaceAll("[^\\p{L}]", "");
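
    If the same replacement runs many times, the patterns can be compiled
    once and reused. The sketch below assumes a class named RegexStrip (a
    hypothetical name, not from this thread); "\\P{L}" is the negated
    property class and matches the same characters as "[^\\p{L}]".

```java
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
class RegexStrip {

    // Anything that is not an ASCII letter.
    private static final Pattern NON_ASCII_LETTER = Pattern.compile("[^A-Za-z]");

    // Anything that is not in the Unicode "Letter" category.
    private static final Pattern NON_LETTER = Pattern.compile("\\P{L}");

    static String asciiLettersOnly(String s) {
        return NON_ASCII_LETTER.matcher(s).replaceAll("");
    }

    static String unicodeLettersOnly(String s) {
        return NON_LETTER.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String in = "caf\u00e9 au lait, 2004!"; // \u00e9 is e-acute
        System.out.println(asciiLettersOnly(in));   // drops the accented letter too
        System.out.println(unicodeLettersOnly(in)); // keeps the accented letter
    }
}
```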

    --
    www.designacourse.com
    The Easiest Way To Train Anyone... Anywhere.

    Chris Smith - Lead Software Developer/Technical Trainer
    MindIQ Corporation
     
    Chris Smith, Nov 11, 2004
    #3
  4. dfhLASST

    dfhLASST Guest

    "Michael Borgwardt" <> wrote in message
    news:...
    > dfhLASST wrote:
    > > What is the best way to remove all non-alphabetic characters (e.g. symbols,
    > > spaces etc.) from a String?
    > >
    > > My original plan was to loop over the chars in the String and add each one to
    > > an array if its value is alphabetic (i.e. >=65 and <=122).
    > > I've run into problems with this, and it seems more complex than the
    > > problem should be.
    >
    > The problem is more complex than you think. Are you absolutely sure that
    > you're only ever going to process English text? If not, use
    > Character.isLetter() for the condition.
    >
    > For the accumulation of the output string, use StringBuffer (I guess that's
    > where you encountered obvious problems).


    Thanks, yeah I used that.

    For future reference for anyone else here is my method:


    public String stripPunctuation(String s) {
        StringBuffer sb = new StringBuffer();

        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Keep only ASCII letters: 'A'..'Z' (65..90) and 'a'..'z' (97..122).
            if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) {
                sb.append(c);
            }
        }

        return sb.toString();
    }
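
    The two separate ranges matter here. The single range from the first
    post (>=65 and <=122) also covers the six ASCII characters that sit
    between 'Z' and 'a'. A small sketch to check this, assuming a class
    named RangeCheck (a hypothetical name, not from this thread):

```java
// Hypothetical helper; the class and method names are illustrative only.
class RangeCheck {

    /** The single range from the first post: codes 65..122 inclusive. */
    static boolean inSingleRange(char c) {
        return c >= 65 && c <= 122;
    }

    /** The two-range test: 'A'..'Z' or 'a'..'z'. */
    static boolean isAsciiLetter(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    /** Collects the ASCII chars that the single range accepts but that are not letters. */
    static String acceptedNonLetters() {
        StringBuffer sb = new StringBuffer();
        for (char c = 0; c < 128; c++) {
            if (inSingleRange(c) && !isAsciiLetter(c)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(acceptedNonLetters());
    }
}
```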
     
    dfhLASST, Nov 11, 2004
    #4
  5. Woebegone

    Woebegone Guest

    "Michael Borgwardt" <> wrote in message
    news:...
    > dfhLASST wrote:
    >> What is the best way to remove all non-alphabetic characters (e.g. symbols,
    >> spaces etc.) from a String?

    8<

    > The problem is more complex than you think. Are you absolutely sure that
    > you're only ever going to process English text? If not, use
    > Character.isLetter() for the condition.
    >
    > For the accumulation of the output string, use StringBuffer (I guess that's
    > where you encountered obvious problems).


    I've used something like the following in cases where I know the processing
    is constrained to a given (relatively small) set of characters, e.g. English
    text. It has the advantage of allowing easy extension by adding characters
    to ALPHABET without necessarily requiring char codes.

    /* */
    public class StringCleanser {
        public static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZ" +
            "abcdefghijklmnopqrstuvwxyz";

        public static boolean isAlphabetic(char c) {
            return StringCleanser.ALPHABET.indexOf(c) != -1;
        }

        public static String cleanse(String s) {
            StringBuffer buf = new StringBuffer();
            for (int i = 0; i < s.length(); i++) {
                if (StringCleanser.isAlphabetic(s.charAt(i))) {
                    buf.append(s.charAt(i));
                }
            }
            return buf.toString();
        }

        public static void main(String[] args) {
            String in = "L e,f.t/o;v'e[r]L1e.2t3t4e ,5r6s7";
            System.out.println(StringCleanser.cleanse(in));
        }
    }
    /* */
    --
    Regards,
    Sean.
     
    Woebegone, Nov 11, 2004
    #5
