Newbie Question: delete all non-alphanumeric characters


Theallnighter

Hi all,
how can I delete all non-alphanumeric characters in a string? Thanks
 

Jim Cochrane

> Hi all,
> how can I delete all non-alphanumeric characters in a string? Thanks

I've also just started to learn Ruby, so I thought I'd reply for the practice.
Here's one solution:


------------------------------------------------------------------------
#!/usr/bin/ruby

x = "There are 2007 beans and 15234 grains of rice in this bag."
puts x
x.gsub!(/\W/, '')
puts x

------------------------------------------------------------------------

output:

There are 2007 beans and 15234 grains of rice in this bag.
Thereare2007beansand15234grainsofriceinthisbag

--
 

Logan Capaldo

> I've also just started to learn Ruby, so I thought I'd reply for the
> practice.
> Here's one solution:
>
> ------------------------------------------------------------------------
> #!/usr/bin/ruby
>
> x = "There are 2007 beans and 15234 grains of rice in this bag."
> puts x
> x.gsub!(/\W/, '')
> puts x
>
> ------------------------------------------------------------------------
>
> output:
>
> There are 2007 beans and 15234 grains of rice in this bag.
> Thereare2007beansand15234grainsofriceinthisbag

Well, the only "problem" with that is that \w also matches underscores, so they are not removed:

x = '\w includes_under_scores_too'
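A quick irb-style check of that point, using Logan's string (the variable name `stripped` is just for illustration). `\W` is the complement of `\w`, and since `\w` counts the underscore as a word character, `gsub(/\W/, '')` leaves underscores in place:

```ruby
# Logan's example string. In a single-quoted Ruby string, \w is not an escape,
# so the string really starts with a backslash followed by "w".
x = '\w includes_under_scores_too'

# \W matches any character that is NOT a word character. The underscore is a
# word character, so only the backslash and the space are removed.
stripped = x.gsub(/\W/, '')
puts stripped   # prints wincludes_under_scores_too
```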
 

dominique.plante

For fun, I started irb, then typed

"567576hgjhgjh&**)".gsub(/^[0-9a-z]/i, '')

It returned

67576hgjhgjh&**)

Tom said:
> Logan said:
> > > Hi all,
> > > how can I delete all non-alphanumeric characters in a string? Thanks
> >
> > string.gsub(/[0-9a-z]+/i, '')
>
> That deletes all alphanumeric. To delete all non-alphanumeric:
>
> string.gsub(/[^0-9a-z]/i, '')
>
> --
> Tom Werner
> Helmets to Hardhats
> Software Developer
> (e-mail address removed)
> www.helmetstohardhats.org
 

Jim Cochrane

Well the only "problem" with that is

x = '\w includes_under_scores_too'

Woah! Thanks for pointing that out. It looks like
http://www.ruby-doc.org/docs/ruby-doc-bundle/UsersGuide/rg/regexp.html
has a bug:

\w letter or digit; same as [0-9A-Za-z]

It's missing a _.
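One way to confirm this in irb is to test the underscore against `\w` and against the class quoted from the guide. In Ruby, `\w` is the ASCII class `[a-zA-Z0-9_]`; the sample string below is made up:

```ruby
# =~ returns the match position, or nil when there is no match.
p("_" =~ /\w/)            # prints 0: the underscore counts as a word character
p("_" =~ /[0-9A-Za-z]/)   # prints nil: the class quoted from the guide misses it
p("_" =~ /[0-9A-Za-z_]/)  # prints 0

# On ASCII text, \W and [^0-9A-Za-z_] therefore delete the same characters.
sample = "beans_and 2007 grains!"
puts sample.gsub(/\W/, '')              # prints beans_and2007grains
puts sample.gsub(/[^0-9A-Za-z_]/, '')   # prints beans_and2007grains
```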

Here's a fixed version:


#!/usr/bin/ruby

x = "There are 2007 beans_and 15234 grains of rice in this bag."
puts x
x.gsub!(/\W/, '')
puts x
x.gsub!(/\W|_/, '')
puts "fixed:"
puts x

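Oops, the above has a bug (although it still "works"): the second gsub!
runs on x after the first one has already stripped it, so "fixed" never
starts from the original string. Here's a corrected version, with an
"opposite" example further demonstrating the bug in the Ruby doc site: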


#!/usr/bin/ruby

s = "There are 2007 beans_and 15234 grains of rice in this bag."
x = s.dup
y = s.dup
puts "original:"
puts x
x.gsub!(/\W/, '')
puts "\nbroken:"
puts x
y.gsub!(/\W|_/, '')
puts "\nfixed:"
puts y

puts "\nopposite:"
z = s.dup
z.gsub!(/\w/, '')
puts z

--

original:
There are 2007 beans_and 15234 grains of rice in this bag.

broken:
Thereare2007beans_and15234grainsofriceinthisbag

fixed:
Thereare2007beansand15234grainsofriceinthisbag

opposite:
          .
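As an aside, the `dup` calls are only needed because `gsub!` modifies the string in place. A sketch of the same comparison with the non-destructive `gsub`, which returns a new string and leaves `s` untouched (the variable names mirror Jim's labels but are otherwise arbitrary):

```ruby
s = "There are 2007 beans_and 15234 grains of rice in this bag."

# gsub (no !) returns a new string and leaves s alone, so no dup is needed.
broken   = s.gsub(/\W/, '')     # the underscore survives
fixed    = s.gsub(/\W|_/, '')   # the underscore is removed as well
opposite = s.gsub(/\w/, '')     # only the non-word characters remain

puts broken
puts fixed
p opposite   # p shows the spaces that puts would make hard to see
puts s       # s still holds the original sentence
```

`gsub!` also returns `nil` when it makes no substitution, which is one more reason to prefer `gsub` when you want the result as a value.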
 

Tom Werner

> For fun, I started irb, then typed
>
> "567576hgjhgjh&**)".gsub(/^[0-9a-z]/i, '')
>
> It returned
>
> 67576hgjhgjh&**)

The caret goes inside the brackets (there it negates the character class).
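The two meanings of `^` side by side, on dominique's string (variable names are illustrative only):

```ruby
s = "567576hgjhgjh&**)"

# ^ OUTSIDE the brackets is an anchor: "one letter or digit at the start of
# a line". Only the leading "5" matches.
anchored = s.gsub(/^[0-9a-z]/i, '')

# ^ as the first character INSIDE the brackets negates the class: "any
# character that is not a letter or digit".
negated = s.gsub(/[^0-9a-z]/i, '')

puts anchored  # prints 67576hgjhgjh&**)
puts negated   # prints 567576hgjhgjh
```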

Tom

 

Rimantas Liubertas

> For fun, I started irb, then typed
> "567576hgjhgjh&**)".gsub(/^[0-9a-z]/i, '')
>
> It returned
>
> 67576hgjhgjh&**)

No wonder. Outside the brackets, ^ anchors the match to the beginning of the line, and there was only one character at the beginning of the string.
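A detail worth knowing here: in Ruby, `^` matches at the start of every line, not only at the start of the string; `\A` is the start-of-string anchor. A small sketch with a made-up two-line string:

```ruby
# A made-up two-line string.
text = "567\n890"

# In Ruby, ^ matches at the start of every line...
per_line = text.gsub(/^[0-9]/, '')
# ...while \A matches only at the start of the whole string.
per_string = text.gsub(/\A[0-9]/, '')

p per_line    # prints "67\n90"
p per_string  # prints "67\n890"
```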



Regards,
Rimantas
 

Jeremy Tregunna

> > For fun, I started irb, then typed
> >
> > "567576hgjhgjh&**)".gsub(/^[0-9a-z]/i, '')
> >
> > It returned
> >
> > 67576hgjhgjh&**)
>
> The caret goes inside the brackets (there it negates the character class).

And it should look like this:

"567576hgjhgjh&**)".sub(/[^0-9a-zA-Z]+/i, '')

Note the +
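What the `+` changes is how many characters each match covers, which `String#scan` makes visible. The string below is made up for illustration:

```ruby
s = "ab&&cd**ef"   # made-up string with two runs of punctuation

# Without +, every non-alphanumeric character is its own match...
p s.scan(/[^0-9a-zA-Z]/)    # prints ["&", "&", "*", "*"]
# ...with +, each whole run is a single match.
p s.scan(/[^0-9a-zA-Z]+/)   # prints ["&&", "**"]

# Under gsub both forms delete the same characters; + only means fewer,
# longer replacements.
puts s.gsub(/[^0-9a-zA-Z]/, '')   # prints abcdef
puts s.gsub(/[^0-9a-zA-Z]+/, '')  # prints abcdef
```

Incidentally, once `A-Z` is in the class, the `/i` flag in the pattern above is redundant (though harmless), which is why the sketch leaves it out.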

--
Jeremy Tregunna
(e-mail address removed)


"One serious obstacle to the adoption of good programming languages
is the notion that everything has to be sacrificed for speed. In
computer languages as in life, speed kills." -- Mike Vanier
 

Tom Werner

Jeremy said:
> And it should look like this:
>
> "567576hgjhgjh&**)".sub(/[^0-9a-zA-Z]+/i, '')
>
> Note the +

#sub only does one replacement; with the +, it replaces the first run of
non-alphanumeric characters but leaves any later runs in the string alone.
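A sketch of the difference between `#sub` and `#gsub`, using a made-up string with two runs of punctuation:

```ruby
s = "ab&&cd**ef"   # made-up string with two runs of punctuation

once     = s.sub(/[^0-9a-zA-Z]+/, '')   # #sub replaces only the first match
all_runs = s.gsub(/[^0-9a-zA-Z]+/, '')  # #gsub replaces every match

puts once      # prints abcd**ef
puts all_runs  # prints abcdef

# Jeremy's string has only one such run ("&**)"), so #sub happens to give
# the right answer there.
puts "567576hgjhgjh&**)".sub(/[^0-9a-zA-Z]+/, '')  # prints 567576hgjhgjh
```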

Tom

 

Jeremy Tregunna

> Jeremy said:
> > And it should look like this:
> >
> > "567576hgjhgjh&**)".sub(/[^0-9a-zA-Z]+/i, '')
> >
> > Note the +
>
> #sub only does one replacement; with the +, it replaces the first run of
> non-alphanumeric characters but leaves any later runs in the string alone.

Typo, sorry: that should have been #gsub.

 

Joe Karma

> Hi all,
> how can I delete all non-alphanumeric characters in a string? Thanks

TMTOWTDI (there's more than one way to do it):

username.delete('^A-Za-z0-9')

...I just thought I'd add a little variety to this collection of
Regexp-centric solutions.
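Here is Joe's `String#delete` approach applied to Jim's sample sentence instead of `username`. `String#delete` builds its character set with the same rules as `String#count` (a leading `^` negates, `a-z` style ranges are allowed), so no Regexp is involved, and unlike `\W` the explicit set also drops the underscore:

```ruby
s = "There are 2007 beans_and 15234 grains of rice in this bag."

# '^A-Za-z0-9' means "every character that is not an ASCII letter or digit".
cleaned = s.delete('^A-Za-z0-9')
puts cleaned  # prints Thereare2007beansand15234grainsofriceinthisbag

# delete returns a new string; delete! modifies the receiver in place.
s.delete!('^A-Za-z0-9')
puts s == cleaned  # prints true
```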
 
