Newbie Question: delete all non-alphanumeric characters

Discussion in 'Ruby' started by Theallnighter Theallnighter, Jul 21, 2006.

  1. Hi all,
    how can I delete all non-alphanumeric characters in a string? Thanks.
     
    Theallnighter Theallnighter, Jul 21, 2006
    #1

  2. string.gsub(/[0-9a-z]+/i, '')
     
    Logan Capaldo, Jul 21, 2006
    #2

  3. That deletes all alphanumeric characters. To delete all non-alphanumeric ones:

    string.gsub(/[^0-9a-z]/i, '')

    --
    Tom Werner
    Helmets to Hardhats
    Software Developer

    www.helmetstohardhats.org
     
    Tom Werner, Jul 21, 2006
    #3
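Running the pattern from post #2 and the corrected one from post #3 on the same string makes the difference obvious. This is a minimal sketch; the method names and the sample string are made up for illustration.

```ruby
# Post #2's pattern: a run of ASCII letters/digits. Deleting it leaves only
# the punctuation and whitespace behind (the opposite of what was asked).
def strip_alnum(str)
  str.gsub(/[0-9a-z]+/i, '')
end

# Post #3's pattern: "^" as the first character inside [...] negates the
# class, so every character that is NOT an ASCII letter or digit is removed.
def strip_non_alnum(str)
  str.gsub(/[^0-9a-z]/i, '')
end

sample = "abc 123!?"         # hypothetical input
p strip_alnum(sample)        # => " !?"
p strip_non_alnum(sample)    # => "abc123"
```

The /i flag makes a-z cover A-Z as well, so the class does not need to list both cases.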
  4. Doh! I'm obviously not awake yet this, err, afternoon.
     
    Logan Capaldo, Jul 21, 2006
    #4
  5. I've also just started to learn Ruby, so I thought I'd reply for the practice.
    Here's one solution:


    ------------------------------------------------------------------------
    #!/usr/bin/ruby

    x = "There are 2007 beans and 15234 grains of rice in this bag."
    puts x
    x.gsub!(/\W/, '')
    puts x

    ------------------------------------------------------------------------

    output:

    There are 2007 beans and 15234 grains of rice in this bag.
    Thereare2007beansand15234grainsofriceinthisbag

     
    Jim Cochrane, Jul 21, 2006
    #5
  6. Well, the only "problem" with that is

    x = '\w includes_under_scores_too'
     
    Logan Capaldo, Jul 21, 2006
    #6
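A small sketch of the point in post #6, using Logan's own string. It assumes Ruby's default (ASCII) meaning of \w, which the Ruby Regexp documentation gives as [a-zA-Z0-9_]; the method names are hypothetical.

```ruby
# \W means "not a word character". Ruby's word characters are
# [a-zA-Z0-9_], so \W never matches an underscore.
def strip_non_word(str)
  str.gsub(/\W/, '')
end

# Putting "_" next to \W in one class (same effect as /\W|_/)
# removes underscores as well.
def strip_non_word_and_underscore(str)
  str.gsub(/[\W_]/, '')
end

x = '\w includes_under_scores_too'   # single quotes: the backslash is kept
p strip_non_word(x)                  # => "wincludes_under_scores_too"
p strip_non_word_and_underscore(x)   # => "wincludesunderscorestoo"
```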
  7. Whoa! Thanks for pointing that out. It looks like
    http://www.ruby-doc.org/docs/ruby-doc-bundle/UsersGuide/rg/regexp.html
    has a bug:

    \w letter or digit; same as [0-9A-Za-z]

    It's missing a _.

    Here's a fixed version:


    #!/usr/bin/ruby

    x = "There are 2007 beans_and 15234 grains of rice in this bag."
    puts x
    x.gsub!(/\W/, '')
    puts x
    x.gsub!(/\W|_/, '')
    puts "fixed:"
    puts x
     
    Jim Cochrane, Jul 21, 2006
    #7
  8. For fun, I started irb and typed:

    "567576hgjhgjh&**)".gsub(/^[0-9a-z]/i, '')

    It returned:

    67576hgjhgjh&**)

     
    dominique.plante, Jul 21, 2006
    #8
  9. Oops, the above has a bug (although it still "works"). Here's a fixed
    version, with an opposite example further demonstrating the bug on the
    ruby-doc site:


    #!/usr/bin/ruby

    s = "There are 2007 beans_and 15234 grains of rice in this bag."
    x = s.dup
    y = s.dup
    puts "original:"
    puts x
    x.gsub!(/\W/, '')
    puts "\nbroken:"
    puts x
    y.gsub!(/\W|_/, '')
    puts "\nfixed:"
    puts y

    puts "\nopposite:"
    z = s.dup
    z.gsub!(/\w/, '')
    puts z

    output:

    original:
    There are 2007 beans_and 15234 grains of rice in this bag.

    broken:
    Thereare2007beans_and15234grainsofriceinthisbag

    fixed:
    Thereare2007beansand15234grainsofriceinthisbag

    opposite:
     
    Jim Cochrane, Jul 21, 2006
    #9
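The dup calls in post #9 are only needed because gsub! modifies its receiver in place. Below is a minimal sketch of the non-destructive alternative on the same sentence (the constant and method names are hypothetical). It also shows a gsub! gotcha: it returns nil when nothing was substituted.

```ruby
SENTENCE = "There are 2007 beans_and 15234 grains of rice in this bag."

# gsub (no bang) returns a NEW string and leaves its receiver alone,
# so the input does not need to be dup'ed first.
def letters_and_digits(str)
  str.gsub(/\W|_/, '')   # \W does not match "_", so remove it explicitly
end

puts letters_and_digits(SENTENCE)  # prints Thereare2007beansand15234grainsofriceinthisbag
puts SENTENCE                      # prints the original sentence, unchanged

# gsub! edits its receiver in place and returns nil when it made no
# substitution, so its return value is not safe to chain.
word = "abc123".dup
p word.gsub!(/\W/, '')             # => nil
```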
  10. The caret goes inside the brackets (there it negates the character class).

    Tom

     
    Tom Werner, Jul 21, 2006
    #10
  11. Quoting post #8: "for fun, I started irb, then typed"
    No wonder. There was only one character at the beginning of the string...



    Regards,
    Rimantas
     
    Rimantas Liubertas, Jul 21, 2006
    #11
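To illustrate posts #10 and #11, here is a small sketch contrasting ^ outside the brackets (an anchor) with ^ inside them (negation). The method names are hypothetical. Note that in Ruby ^ anchors to the start of each line, not just the start of the string.

```ruby
# ^ OUTSIDE the brackets is an anchor. In Ruby it matches at the start of
# each line, so this can only remove a letter/digit that begins a line.
def remove_leading_alnum(str)
  str.gsub(/^[0-9a-z]/i, '')
end

# ^ as the first character INSIDE the brackets negates the class instead.
def remove_non_alnum(str)
  str.gsub(/[^0-9a-z]/i, '')
end

s = "567576hgjhgjh&**)"            # the string from post #8
p remove_leading_alnum(s)          # => "67576hgjhgjh&**)"
p remove_non_alnum(s)              # => "567576hgjhgjh"

# Because ^ is a line anchor, it fires once per line; \A would anchor
# to the start of the whole string instead.
p remove_leading_alnum("1a\n2b")   # => "a\nb"
```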
  12. And it should look like this:

    "567576hgjhgjh&**)".sub(/[^0-9a-zA-Z]+/i, '')

    Note the +
    --
    Jeremy Tregunna



    "One serious obstacle to the adoption of good programming languages
    is the notion that everything has to be sacrificed for speed. In
    computer languages as in life, speed kills." -- Mike Vanier
     
    Jeremy Tregunna, Jul 21, 2006
    #12
  13. #sub only does one replacement; adding a + will replace one chunk of
    non-alphanumeric characters, but not any others in the string.

    Tom

    --
    Tom Werner
    Helmets to Hardhats
    Software Developer

    www.helmetstohardhats.org
     
    Tom Werner, Jul 21, 2006
    #13
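Post #13's point about sub versus gsub only shows up when a string has more than one run of non-alphanumeric characters. Here is a minimal sketch with a made-up input; the method names are hypothetical.

```ruby
# sub replaces only the FIRST match; with "+" that first match is a whole
# run of non-alphanumerics, but later runs are left alone.
def strip_first_run(str)
  str.sub(/[^0-9a-z]+/i, '')
end

# gsub replaces every match, so every run disappears.
def strip_all_runs(str)
  str.gsub(/[^0-9a-z]+/i, '')
end

messy = "ab!!cd??ef"           # hypothetical input with two separate runs
p strip_first_run(messy)       # => "abcd??ef"
p strip_all_runs(messy)        # => "abcdef"

# Post #12's string has a single run, at the end, so sub happens to work there.
p strip_first_run("567576hgjhgjh&**)")   # => "567576hgjhgjh"
```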
  14. typo, sorry.
     
    Jeremy Tregunna, Jul 21, 2006
    #14
  15.  
    Logan Capaldo, Jul 22, 2006
    #15
  16. TMTOWTDI (there's more than one way to do it):

    username.delete('^A-Za-z0-9')

    ...I just thought I'd add a little variety to this collection of
    Regexp-centric solutions.
     
    Joe Karma, Jul 22, 2006
    #16
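Post #16's String#delete approach uses character-set notation rather than a regexp: ranges like A-Z, and a leading ^ to negate the set. This is a minimal sketch comparing it with the regexp answers above. The method name and sample string are made up for illustration.

```ruby
# String#delete takes character-set specs (the same notation as String#count
# and String#tr): "a-z" is a range and a leading "^" negates the set.
def delete_non_alnum(str)
  str.delete('^A-Za-z0-9')
end

sample = "joe_karma-42!"                     # hypothetical input
p delete_non_alnum(sample)                   # => "joekarma42"

# Same result as the regexp-based answers earlier in the thread.
p delete_non_alnum(sample) == sample.gsub(/[^0-9a-z]/i, '')   # => true
```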
