Newbie Question: delete all non alphanumeric characters

Discussion in 'Ruby' started by Theallnighter Theallnighter, Jul 21, 2006.

  1. Theallnighter Theallnighter, Jul 21, 2006
    #1

  2. On Jul 21, 2006, at 1:53 PM, Theallnighter Theallnighter wrote:

    > Hi all,
    > How can I delete all non-alphanumeric characters in a string? Thanks.
    >
    > --
    > Posted via http://www.ruby-forum.com/.
    >


    string.gsub(/[0-9a-z]+/i, '')
     
    Logan Capaldo, Jul 21, 2006
    #2

  3. Tom Werner Guest

    Logan Capaldo wrote:
    > string.gsub(/[0-9a-z]+/i, '')
    >
    >
    >

    That deletes all alphanumeric. To delete all non-alphanumeric:

    string.gsub(/[^0-9a-z]/i, '')

    --
    Tom Werner
    Helmets to Hardhats
    Software Developer

    www.helmetstohardhats.org
     
    Tom Werner, Jul 21, 2006
    #3
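The difference between the two patterns above can be checked directly. This is a minimal sketch; the method names are hypothetical and only the two regular expressions come from the posts.

```ruby
# Hypothetical helper names, used only for illustration.

# The first suggestion: matches runs of letters and digits, so it removes
# exactly the characters the original poster wanted to keep.
def strip_alphanumeric(text)
  text.gsub(/[0-9a-z]+/i, '')
end

# The corrected version: the ^ inside the brackets negates the class, so each
# character that is not a letter or digit is removed.
def strip_non_alphanumeric(text)
  text.gsub(/[^0-9a-z]/i, '')
end

puts strip_alphanumeric("Hello, World! 42")      # prints ", ! "
puts strip_non_alphanumeric("Hello, World! 42")  # prints "HelloWorld42"
```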
  4. On Jul 21, 2006, at 2:05 PM, Tom Werner wrote:

    > That deletes all alphanumeric. To delete all non-alphanumeric:
    >
    > string.gsub(/[^0-9a-z]/i, '')


    Doh! I'm obviously not awake yet this ---err-- afternoon.
     
    Logan Capaldo, Jul 21, 2006
    #4
  5. Jim Cochrane Guest

    On 2006-07-21, Theallnighter Theallnighter wrote:
    > Hi all,
    > how can i delete all non alphanumeric characters in a string ? thanks
    >


    I've also just started to learn Ruby, so thought I'd reply for the practice -
    Here's one solution:


    ------------------------------------------------------------------------
    #!/usr/bin/ruby

    x = "There are 2007 beans and 15234 grains of rice in this bag."
    puts x
    x.gsub!(/\W/, '')
    puts x

    ------------------------------------------------------------------------

    output:

    There are 2007 beans and 15234 grains of rice in this bag.
    Thereare2007beansand15234grainsofriceinthisbag

    --
     
    Jim Cochrane, Jul 21, 2006
    #5
  6. On Jul 21, 2006, at 3:40 PM, Jim Cochrane wrote:

    > On 2006-07-21, Theallnighter Theallnighter
    > <> wrote:
    >> Hi all,
    >> how can i delete all non alphanumeric characters in a string ? thanks
    >>

    >
    > I've also just started to learn Ruby, so thought I'd reply for the
    > practice -
    > Here's one solution:
    >
    >
    > ------------------------------------------------------------------------
    > #!/usr/bin/ruby
    >
    > x = "There are 2007 beans and 15234 grains of rice in this bag."
    > puts x
    > x.gsub!(/\W/, '')
    > puts x
    >
    > ------------------------------------------------------------------------
    >
    > output:
    >
    > There are 2007 beans and 15234 grains of rice in this bag.
    > Thereare2007beansand15234grainsofriceinthisbag
    >
    > --
    >
    >


    Well the only "problem" with that is

    x = '\w includes_under_scores_too'
     
    Logan Capaldo, Jul 21, 2006
    #6
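The underscore point can be seen with a short sketch. The method names below are hypothetical, and `[\W_]` is one possible way to add the underscore to the pattern, not something taken from the post.

```ruby
# Hypothetical helper names, used only for illustration.

# \W is the complement of \w, and \w covers letters, digits and the
# underscore, so underscores survive this substitution.
def strip_non_word(text)
  text.gsub(/\W/, '')
end

# Putting the underscore in the same character class removes it as well.
def strip_non_word_and_underscore(text)
  text.gsub(/[\W_]/, '')
end

puts strip_non_word("2007 beans_and 15234 grains.")                 # prints "2007beans_and15234grains"
puts strip_non_word_and_underscore("2007 beans_and 15234 grains.")  # prints "2007beansand15234grains"
```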
  7. Jim Cochrane Guest

    On 2006-07-21, Logan Capaldo wrote:
    >
    > On Jul 21, 2006, at 3:40 PM, Jim Cochrane wrote:
    >
    >> On 2006-07-21, Theallnighter Theallnighter
    >> <> wrote:
    >>> Hi all,
    >>> how can i delete all non alphanumeric characters in a string ? thanks
    >>>

    >> ...
    >> #!/usr/bin/ruby
    >>
    >> x = "There are 2007 beans and 15234 grains of rice in this bag."
    >> puts x
    >> x.gsub!(/\W/, '')
    >> puts x
    >> ...
    >>
    >>

    >
    > Well the only "problem" with that is
    >
    > x = '\w includes_under_scores_too'
    >


    Whoa! Thanks for pointing that out. It looks like
    http://www.ruby-doc.org/docs/ruby-doc-bundle/UsersGuide/rg/regexp.html
    has a bug:

    \w letter or digit; same as [0-9A-Za-z]

    It's missing the underscore: \w is actually the same as [0-9A-Za-z_].

    Here's a fixed version:


    #!/usr/bin/ruby

    x = "There are 2007 beans_and 15234 grains of rice in this bag."
    puts x
    x.gsub!(/\W/, '')
    puts x
    x.gsub!(/\W|_/, '')
    puts "fixed:"
    puts x
     
    Jim Cochrane, Jul 21, 2006
    #7
  8. Guest

    for fun, I started irb, then typed

    "567576hgjhgjh&**)".gsub(/^[0-9a-z]/i, '')

    It returned

    67576hgjhgjh&**)

     
    , Jul 21, 2006
    #8
  9. Jim Cochrane Guest

    On 2006-07-21, Jim Cochrane wrote:
    >
    > Woah! Thanks for pointing that out. It looks like
    > http://www.ruby-doc.org/docs/ruby-doc-bundle/UsersGuide/rg/regexp.html
    > has a bug:
    >
    > \w letter or digit; same as [0-9A-Za-z]
    >
    > It's missing a _.
    >
    > Here's a fixed version:
    >
    >
    > #!/usr/bin/ruby
    >
    > x = "There are 2007 beans_and 15234 grains of rice in this bag."
    > puts x
    > x.gsub!(/\W/, '')
    > puts x
    > x.gsub!(/\W|_/, '')
    > puts "fixed:"
    > puts x


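    Oops - the above has a bug (although it still "works"): both gsub! calls
    modify the same string, so the second one runs on text the first has
    already stripped. Here's a fixed version that works on copies of the
    original, with an opposite example further demonstrating the bug in the
    ruby doc site: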


    #!/usr/bin/ruby

    s = "There are 2007 beans_and 15234 grains of rice in this bag."
    x = s.dup
    y = s.dup
    puts "original:"
    puts x
    x.gsub!(/\W/, '')
    puts "\nbroken:"
    puts x
    y.gsub!(/\W|_/, '')
    puts "\nfixed:"
    puts y

    puts "\nopposite:"
    z = s.dup
    z.gsub!(/\w/, '')
    puts z

    --

    original:
    There are 2007 beans_and 15234 grains of rice in this bag.

    broken:
    Thereare2007beans_and15234grainsofriceinthisbag

    fixed:
    Thereare2007beansand15234grainsofriceinthisbag

    opposite:
     
    Jim Cochrane, Jul 21, 2006
    #9
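The script above uses dup so that each gsub! call starts from the untouched original. A minimal sketch of how gsub and gsub! differ in that respect follows; the method names are hypothetical and the `[\W_]` pattern is just one way to cover the underscore.

```ruby
# Hypothetical helper names, used only for illustration.

# gsub returns a new string and leaves its receiver alone, so the caller
# does not need to dup anything to keep the original.
def clean_copy(text)
  text.gsub(/[\W_]/, '')
end

# gsub! changes the receiver in place. It returns the receiver when at least
# one substitution happened and nil when nothing matched.
def clean_in_place!(text)
  text.gsub!(/[\W_]/, '')
end

s = "beans_and rice".dup
puts clean_copy(s)                 # prints "beansandrice"
puts s                             # prints "beans_and rice" (unchanged)
clean_in_place!(s)
puts s                             # prints "beansandrice"
puts clean_in_place!(s).inspect    # prints "nil" (nothing left to remove)
```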
  10. Tom Werner Guest

    wrote:
    > for fun, I started irb, then typed
    >
    > "567576hgjhgjh&**)".gsub(/^[0-9a-z]/i, '')
    >
    > It returned
    >
    > 67576hgjhgjh&**)
    >
    >


    The caret goes inside the brackets (it inverts the character class).

    Tom

     
    Tom Werner, Jul 21, 2006
    #10
  11. > for fun, I started irb, then typed
    >
    > "567576hgjhgjh&**)".gsub(/^[0-9a-z]/i, '')
    >
    > It returned
    >
    > 67576hgjhgjh&**)


    No wonder. With the ^ outside the brackets the pattern is anchored to the
    start of a line, and there is only one character at the beginning of the
    string.



    Regards,
    Rimantas
    --
    http://rimantas.com/
     
    Rimantas Liubertas, Jul 21, 2006
    #11
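The two caret positions discussed above can be compared side by side on the same input. This is a sketch with hypothetical method names; the input string and both patterns are the ones from the thread.

```ruby
# Hypothetical helper names, used only for illustration.

# Caret outside the brackets: an anchor for the start of a line, so at most
# the first character of each line can match.
def remove_leading_alnum(text)
  text.gsub(/^[0-9a-z]/i, '')
end

# Caret as the first character inside the brackets: negates the class, so
# every character that is not a letter or digit matches.
def remove_non_alnum(text)
  text.gsub(/[^0-9a-z]/i, '')
end

puts remove_leading_alnum("567576hgjhgjh&**)")  # prints "67576hgjhgjh&**)"
puts remove_non_alnum("567576hgjhgjh&**)")      # prints "567576hgjhgjh"
```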
  12. On 21-Jul-06, at 4:19 PM, Tom Werner wrote:

    > wrote:
    >> for fun, I started irb, then typed
    >>
    >> "567576hgjhgjh&**)".gsub(/^[0-9a-z]/i, '')
    >>
    >> It returned
    >>
    >> 67576hgjhgjh&**)
    >>
    >>

    >
    > The caret goes inside the brackets (it inverts the character class)


    And it should look like this:

    "567576hgjhgjh&**)".sub(/[^0-9a-zA-Z]+/i, '')

    Note the +

    > Tom


    --
    Jeremy Tregunna



    "One serious obstacle to the adoption of good programming languages
    is the notion that everything has to be sacrificed for speed. In
    computer languages as in life, speed kills." -- Mike Vanier
     
    Jeremy Tregunna, Jul 21, 2006
    #12
  13. Tom Werner Guest

    Jeremy Tregunna wrote:
    >
    > And it should look like this:
    >
    > "567576hgjhgjh&**)".sub(/[^0-9a-zA-Z]+/i, '')
    >
    > Note the +
    >


    #sub only does one replacement; adding a + will replace one chunk of
    non-alphanumerics, but not any others in the string. Use #gsub to
    replace them all.

    Tom

     
    Tom Werner, Jul 21, 2006
    #13
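A short sketch of the sub versus gsub behaviour described above. The method names and the sample string are assumptions made for illustration.

```ruby
# Hypothetical helper names, used only for illustration.

# sub stops after the first match; with + that match is one whole run of
# non-alphanumeric characters, but later runs are left in place.
def strip_first_run(text)
  text.sub(/[^0-9a-z]+/i, '')
end

# gsub keeps going to the end of the string; the + is optional here and only
# changes how many characters each individual match consumes.
def strip_all_runs(text)
  text.gsub(/[^0-9a-z]+/i, '')
end

puts strip_first_run("ab--cd!!ef")  # prints "abcd!!ef"
puts strip_all_runs("ab--cd!!ef")   # prints "abcdef"
```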
  14. On 21-Jul-06, at 4:44 PM, Tom Werner wrote:

    > Jeremy Tregunna wrote:
    >>
    >> And it should look like this:
    >>
    >> "567576hgjhgjh&**)".sub(/[^0-9a-zA-Z]+/i, '')
    >>
    >> Note the +
    >>

    >
    > #sub only does one replacement; adding a + will replace one chunk
    > of non-alphas, but not any others in the string.


    typo, sorry.

    > Tom


    --
    Jeremy Tregunna



     
    Jeremy Tregunna, Jul 21, 2006
    #14
  15. On Jul 21, 2006, at 6:15 PM, Jeremy Tregunna wrote:

    >
    > On 21-Jul-06, at 4:44 PM, Tom Werner wrote:
    >
    >> Jeremy Tregunna wrote:
    >>>
    >>> And it should look like this:
    >>>
    >>> "567576hgjhgjh&**)".sub(/[^0-9a-zA-Z]+/i, '')
    >>>
    >>> Note the +
    >>>

    >>
    >> #sub only does one replacement; adding a + will replace one chunk
    >> of non-alphas, but not any others in the string.

    >
    > typo, sorry.


    Speaking of typos, use either a-zA-Z or a-z with the /i flag; you don't
    need both <g>

     
    Logan Capaldo, Jul 22, 2006
    #15
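For plain English text the two spellings behave the same way, as this sketch suggests. The method names and the sample string are hypothetical.

```ruby
# Hypothetical helper names, used only for illustration.

# Lowercase range plus the case-insensitive flag.
def strip_with_flag(text)
  text.gsub(/[^0-9a-z]/i, '')
end

# Both ranges spelled out, no flag needed.
def strip_with_explicit_ranges(text)
  text.gsub(/[^0-9a-zA-Z]/, '')
end

puts strip_with_flag("Mixed CASE, 123!")             # prints "MixedCASE123"
puts strip_with_explicit_ranges("Mixed CASE, 123!")  # prints "MixedCASE123"
```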
  16. Joe Karma Guest

    On 7/21/06, Theallnighter Theallnighter wrote:
    > Hi all,
    > how can i delete all non alphanumeric characters in a string ? thanks
    >
    > --
    > Posted via http://www.ruby-forum.com/.
    >
    >


    TMTOWTDI:

    username.delete('^A-Za-z0-9')

    ...I just thought I'd add a little variety to this collection of
    Regexp-centric solutions.
     
    Joe Karma, Jul 22, 2006
    #16
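The String#delete approach above can be tried out with a small sketch. The method name and the sample input are assumptions made for illustration.

```ruby
# Hypothetical helper name, used only for illustration.

# String#delete takes a character set rather than a regex. A leading ^
# negates the set, so this removes everything except A-Z, a-z and 0-9.
# It returns a new string; the receiver is not modified.
def keep_alphanumeric(text)
  text.delete('^A-Za-z0-9')
end

puts keep_alphanumeric("user.name_42!")  # prints "username42"
```

Because the set is negated explicitly, the underscore is removed here without any special handling.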