remove commas from string

  • Thread starter Jason Lillywhite

Jason Lillywhite

I have the following string:

s = "B747-400, 8,357 miles, 561 mph, 4 Pratt & Whitney PW 4056
turbofans, 56,000 lbs."

I want to remove the comma only from the numbers (8,357 miles and 56,000
lbs) separating the thousands. I want the string to read as follows:

"B747-400, 8357 miles, 561 mph, 4 Pratt & Whitney PW 4056 turbofans,
56000 lbs."

I thought the following regex would do the trick because it
successfully isolated the right commas on rubular.com:

s.gsub(/\d+(,)\d+/, "")

It turns out that my regex removes the entire number, not just the
comma.

Am I wrong in saying that my regex searches for one or more digits
surrounding a comma and replaces just the comma with ""?

Thank you.
 
Suresh Kk

One simple option, though note that it removes every comma in the string, not just the thousands separators:
new_string = s.gsub(",","")
p new_string
 
steve

Hi Jason

There is an example of how to do this in the gsub documentation:

irb(main):007:0> s = "B747-400, 8,357 miles, 561 mph, 4 Pratt & Whitney
PW 4056 turbofans, 56,000 lbs"
=> "B747-400, 8,357 miles, 561 mph, 4 Pratt & Whitney PW 4056 turbofans,
56,000 lbs"
irb(main):008:0> s.gsub(/(\d+),(\d+)/,'\1\2')
=> "B747-400, 8357 miles, 561 mph, 4 Pratt & Whitney PW 4056 turbofans,
56000 lbs"

The regexp matches a group of digits, a comma, and another group of digits.
The parentheses capture each group so that it can be referenced later.
\1 substitutes the first captured group and \2 the second. Note that you need
single quotes around the replacement string; in double quotes, \1 is read as
the octal escape 001 and \2 as octal 002.
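
To make the quoting point concrete, here is a minimal sketch. The helper name strip_commas_with_backrefs is made up for illustration; the pattern and replacement are the ones shown above.

```ruby
# Hypothetical helper wrapping the gsub call from the post above.
def strip_commas_with_backrefs(text)
  # '\1\2' keeps both digit groups and drops the comma between them.
  text.gsub(/(\d+),(\d+)/, '\1\2')
end

single  = '\1\2'   # four characters: backslash, 1, backslash, 2
escaped = "\\1\\2" # the same four characters, written with double quotes
octal   = "\1\2"   # two control characters (bytes 1 and 2), not backreferences

puts strip_commas_with_backrefs("8,357 miles, 56,000 lbs") # prints: 8357 miles, 56000 lbs
puts single == escaped                                     # prints: true
p octal.bytes                                              # prints: [1, 2]
```

So if you prefer double quotes, each backslash has to be doubled.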

Hope this helps

Steve
 
7stud --

Jason said:
Am I wrong in saying that my regex searches for 1 or more numbers
surrounding a comma and replaces just the comma with ""?

Yes, you are wrong:

s = "|yes|"
puts s.gsub(/y(e)s/, "")

--output:--
||

gsub() replaces the whole match with the specified replacement. gsub()
does not pick out a parenthesized group in the regex and replace only that
group. However, there is a block form of
gsub:

s = "yes, 1,234, yes, 4,567"

result = s.gsub(/(\d),(\d)/) do |match|
"#{$1}#{$2}"
end

puts result

--output:--
yes, 1234, yes, 4567

Inside the block, $1, $2, $3, etc. refer to the matches for each
parenthesized group in the regex. The return value of the block is used
as the replacement.
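
As a small sketch of the same idea (the method names here are invented for illustration), the block can rebuild the whole match from its groups, changing only the part you care about:

```ruby
# Hypothetical helper: the block's return value replaces the WHOLE match,
# so both captured digits are put back and only the comma is left out.
def strip_commas_with_block(text)
  text.gsub(/(\d),(\d)/) { "#{$1}#{$2}" }
end

# Hypothetical helper: rebuild "yes" by hand, altering only the captured "e".
def upcase_middle(text)
  text.gsub(/y(e)s/) { "y#{$1.upcase}s" }
end

puts strip_commas_with_block("yes, 1,234, yes, 4,567") # prints: yes, 1234, yes, 4567
puts upcase_middle("|yes|")                            # prints: |yEs|
puts "|yes|".gsub(/y(e)s/, "")                         # prints: ||
```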
 
7stud --

7stud said:
Inside the block, $1, $2, $3, etc. refer to the matches for each
parenthesized group in the regex. The return value of the block is used
as the replacement.

But note that once again, the entire match is replaced by the return
value of the block.
 
Lars Haugseth

* Jason Lillywhite said:
I have following string:

s = "B747-400, 8,357 miles, 561 mph, 4 Pratt & Whitney PW 4056
turbofans, 56,000 lbs."

I want to remove the comma only from the numbers (8,357 miles and 56,000
lbs) separating the thousands. I want the string to read as follows:

"B747-400, 8357 miles, 561 mph, 4 Pratt & Whitney PW 4056 turbofans,
56000 lbs."

Ruby 1.9 supports look-behind in regular expressions (Ruby 1.8
only supports look-ahead):

$ irb1.9

irb(main):001:0> s = "B747-400, 8,357 miles, 561 mph, 56,000 lbs."
=> "B747-400, 8,357 miles, 561 mph, 56,000 lbs."

irb(main):002:0> s.gsub(/(?<=\d),(?=\d)/, '')
=> "B747-400, 8357 miles, 561 mph, 56000 lbs."
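
A short sketch of why this works, assuming Ruby 1.9 or later (strip_digit_commas is a made-up name): the look-behind and look-ahead only assert that a digit sits on each side, so the match itself is just the comma and nothing needs to be put back.

```ruby
# Hypothetical helper; requires look-behind support (Ruby 1.9+).
def strip_digit_commas(text)
  # (?<=\d) and (?=\d) are zero-width: the digits are checked, not consumed.
  text.gsub(/(?<=\d),(?=\d)/, '')
end

s = "B747-400, 8,357 miles, 561 mph, 56,000 lbs."
puts strip_digit_commas(s)                 # prints: B747-400, 8357 miles, 561 mph, 56000 lbs.
puts strip_digit_commas("12,134,650 lbs")  # prints: 12134650 lbs
```

Because no digits are consumed, numbers with several separators are handled in one pass.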
 
Robert Klemme

2009/8/5 Lars Haugseth said:
Ruby 1.9 supports look-behind in regular expressions (Ruby 1.8
only supports look-ahead):

$ irb1.9

irb(main):001:0> s = "B747-400, 8,357 miles, 561 mph, 56,000 lbs."
=> "B747-400, 8,357 miles, 561 mph, 56,000 lbs."

irb(main):002:0> s.gsub(/(?<=\d),(?=\d)/, '')
=> "B747-400, 8357 miles, 561 mph, 56000 lbs."

I'd rather do this to be a bit more robust:

irb(main):003:0> s.gsub(/(?<=\d),(?=\d{3})/, '')
=> "B747-400, 8357 miles, 561 mph, 56000 lbs."
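
To illustrate the difference, here is a small comparison on a made-up input that mixes a digit list with a real thousands separator. The constant and method names are assumptions for this sketch only, and it needs Ruby 1.9 or later for the look-behind.

```ruby
# Hypothetical names for the two patterns discussed above.
LOOSE  = /(?<=\d),(?=\d)/     # any comma between two digits
STRICT = /(?<=\d),(?=\d{3})/  # comma must be followed by three digits

def strip_loose(text)
  text.gsub(LOOSE, '')
end

def strip_strict(text)
  text.gsub(STRICT, '')
end

text = "rows 1,2 and 3,4 weigh 56,000 lbs"
puts strip_loose(text)   # prints: rows 12 and 34 weigh 56000 lbs
puts strip_strict(text)  # prints: rows 1,2 and 3,4 weigh 56000 lbs
```

The stricter look-ahead leaves short comma-separated digit lists alone.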

Kind regards

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 
s.ross

I'd rather do this to be a bit more robust:

irb(main):003:0> s.gsub(/(?<=\d),(?=\d{3})/, '')
=> "B747-400, 8357 miles, 561 mph, 56000 lbs."

Kind regards

robert

Why so complex? Perhaps:
=> "B747-400, 8357 miles, 561 mph, 56000 lbs."

Is there a corner case I'm missing here?
 
Nick Brown

Ruby has the neat ability to pass a block to gsub. This can be a more
versatile solution than using backreferences. It also allows Jason to
use his original, straight-forward regex. The matched string is passed
to the block, and whatever the block evaluates to is used as the
replacement string. Check it out:

irb(main):012:0> s
=> "B747-400, 8357 miles, 561 mph, 4 Pratt & Whitney PW 4056 turbofans,
turbofans, 56,000 lbs."
irb(main):013:0> s.gsub(/\d+,\d+/) { |subs| subs.gsub(',', '') }
=> "B747-400, 8357 miles, 561 mph, 4 Pratt & Whitney PW 4056 turbofans,
turbofans, 56000 lbs."
 
Rüdiger Bahns

s.ross wrote:
....
Why so complex? Perhaps:

=> "B747-400, 8357 miles, 561 mph, 56000 lbs."

Is there a corner case I'm missing here?

This would turn 2,344,667 into 2344,667, leaving the second comma in place. (Why
the "\b"s and not just /(\d),(\d{3})/ ?)

To clean up sums like the ones being handed to banks as gifts these
days (like 1,000,000,000,000) in 1.8, you need something like

s.gsub(/\d(,\d{3})+/) { |mo| mo.gsub(',', '') }
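
A sketch of both behaviours side by side. The method names are invented, and the '\1\2' replacement paired with the /(\d),(\d{3})/ pattern is an assumption, since the original one-liner is not shown in this thread.

```ruby
# Hypothetical helper: pairwise replacement (replacement string is assumed).
# The digit before the second comma was already consumed by the first match,
# so that comma has no (\d) left to pair with.
def strip_pairwise(text)
  text.gsub(/(\d),(\d{3})/, '\1\2')
end

# Hypothetical helper: match the whole run of ",ddd" groups at once and
# strip the commas inside the block. No look-behind needed, so 1.8 is fine.
def strip_grouped(text)
  text.gsub(/\d(,\d{3})+/) { |number| number.gsub(',', '') }
end

puts strip_pairwise("2,344,667")          # prints: 2344,667
puts strip_grouped("2,344,667")           # prints: 2344667
puts strip_grouped("1,000,000,000,000")   # prints: 1000000000000
```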


R.
 
Robert Klemme

2009/8/6 Nick Brown said:
Ruby has the neat ability to pass a block to gsub. This can be a more
versatile solution than using backreferences.

It isn't needed though in this case. Please also note that the block
form is usually slower.

The block form is most appropriate if you need to calculate each
replacement individually.
It also allows Jason to
use his original, straight-forward regex. The matched string is passed
to the block, and whatever the block evaluates to is used as the
replacement string. Check it out:

irb(main):012:0> s
=> "B747-400, 8357 miles, 561 mph, 4 Pratt & Whitney PW 4056 turbofans,
turbofans, 56,000 lbs."
irb(main):013:0> s.gsub(/\d+,\d+/) { |subs| subs.gsub(',', '') }
=> "B747-400, 8357 miles, 561 mph, 4 Pratt & Whitney PW 4056 turbofans,
turbofans, 56000 lbs."

Frankly, I'd rather use any of the other non-block solutions than the
one with a block. My 0.02EUR.
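
If you want to check the speed difference on your own machine, a rough sketch using the standard Benchmark module could look like this. The method names, sample string, and iteration count are arbitrary choices for illustration, and timings will vary with Ruby version and input.

```ruby
require 'benchmark'

# Hypothetical helper: string replacement with backreferences.
def strip_with_string(text)
  text.gsub(/(\d+),(\d+)/, '\1\2')
end

# Hypothetical helper: block form, with a nested gsub on each match.
def strip_with_block(text)
  text.gsub(/\d+,\d+/) { |match| match.gsub(',', '') }
end

sample     = "B747-400, 8,357 miles, 561 mph, 56,000 lbs."
iterations = 20_000 # arbitrary; raise it for steadier numbers

string_time = Benchmark.realtime { iterations.times { strip_with_string(sample) } }
block_time  = Benchmark.realtime { iterations.times { strip_with_block(sample) } }

puts format("string replacement: %.4fs", string_time)
puts format("block replacement:  %.4fs", block_time)
```

Both helpers return the same string for this sample, so only the elapsed times differ.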

Kind regards

robert
 
Jason Lillywhite

Steve said:
Why so complex? Perhaps:

=> "B747-400, 8357 miles, 561 mph, 56000 lbs."

Is there a corner case I'm missing here?

So I think I would like to try a non-block option but this code above
misses the second comma if I have say 12,134,650 lbs instead of 56,000
lbs in my string.

It looks like I need Ruby 1.9 to do this: s.gsub(/(?<=\d),(?=\d{3})/,
'')

If I have not yet installed Ruby 1.9, what would be a good non-block
regex that is more robust?
 
Robert Klemme

2009/8/6 Jason Lillywhite said:
So I think I would like to try a non-block option but this code above
misses the second comma if I have say 12,134,650 lbs instead of 56,000
lbs in my string.

It looks like I need Ruby 1.9 to do this: s.gsub(/(?<=\d),(?=\d{3})/,
'')

If I have not yet installed Ruby 1.9, what would be a good non-block
regex that is more robust?

s.gsub(/(\d),(?=\d{3})/,'\\1')
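
A quick sketch of how that behaves on a number with two separators (strip_thousands_commas is a made-up name). Only one digit before the comma is captured and put back, while the three digits after it are checked by the look-ahead without being consumed, so they are still available for the next match.

```ruby
# Hypothetical helper; uses only look-ahead, so it does not need Ruby 1.9.
def strip_thousands_commas(text)
  # In single quotes, '\1' and '\\1' are the same two characters.
  text.gsub(/(\d),(?=\d{3})/, '\1')
end

s = "B747-400, 8,357 miles, 561 mph, 4 Pratt & Whitney PW 4056 turbofans, 12,134,650 lbs."
puts strip_thousands_commas(s)
# prints: B747-400, 8357 miles, 561 mph, 4 Pratt & Whitney PW 4056 turbofans, 12134650 lbs.
puts '\1' == '\\1' # prints: true
```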

Cheers

robert

--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
 
Jason Lillywhite

Robert said:
s.gsub(/(\d),(?=\d{3})/,'\\1')

Thank you very much Robert.

And thanks to everyone else. This has been a good learning experience
for me.
 
