extracting numbers from a string

Matt Jones

I have filenames from various digital cameras: DSC_1234.jpg,
CRW1234.jpg, etc. What I really want is the numeric portion of that
filename. How would I extract just that portion?


I expect it to involve the regex /\d+/, but I'm unclear how to extract a
portion of a string matching a regex.

Thank you
 
Dan Zwell

This may be the simplest (and arguably the most Ruby-esque) way:
str = "DSC_1234.jpg"
num = str.scan(/\d+/)[0]

Other ways to do it:
num = str.match(/\d+/)[0]

OR
num = (/\d+/).match(str)[0]

OR
# scan with a block returns str itself, so capture the match inside the block
num = nil
str.scan(/\d+/) { |match| num = match; break }

OR
num = str =~ /(\d+)/ ? $1 : nil

That is,
num = if str =~ /(\d+)/
  $1
else
  nil
end

OR
if str =~ /\d+/
  num = $~[0]
end
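The forms above all return "1234" for DSC_1234.jpg, but they differ when a name has no digits: str.match(/\d+/)[0] and (/\d+/).match(str)[0] raise NoMethodError because match returns nil, while scan(...)[0], str[/\d+/] and the $1 form quietly return nil. A small comparison sketch (the method names are hypothetical, made up for this comparison):

```ruby
# Each helper returns the first run of digits in str, or nil if there is none.

def first_digits_scan(str)
  str.scan(/\d+/)[0]        # scan returns [] on no match, and [][0] is nil
end

def first_digits_match(str)
  m = str.match(/\d+/)      # nil when nothing matches
  m && m[0]                 # guard: calling [0] on nil would raise NoMethodError
end

def first_digits_index(str)
  str[/\d+/]                # String#[] with a Regexp returns nil on no match
end

def first_digits_global(str)
  str =~ /(\d+)/ ? $1 : nil # $1 is local to this method call
end

["DSC_1234.jpg", "CRW1234.jpg", "no_digits.jpg"].each do |name|
  results = [first_digits_scan(name), first_digits_match(name),
             first_digits_index(name), first_digits_global(name)]
  puts "#{name}: #{results.inspect}"
end
```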

Some Ruby proponents have called Perl's motto, "There is more than one
way to do it," a curse. But the same is true of Ruby. Still, it seems
to me that most people learn reasonable idioms, and common sense
prevails.

Dan
 
Michael W. Ryder

a = "DSC_1234.jpg"
b = a.gsub(/[^[:digit:]]/, '')
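Note that this gsub keeps every digit in the name, not just one run. That is harmless for DSC_1234.jpg, but for a name with two digit runs, such as the 100_5142.jpg style Rob mentions later in the thread, the runs get glued together. A quick comparison (the helper names are hypothetical):

```ruby
# gsub deletes every non-digit character, so all digit runs are concatenated.
def all_digits(name)
  name.gsub(/[^[:digit:]]/, '')
end

# String#[] with /\d+/ keeps only the first run of digits.
def first_digit_run(name)
  name[/\d+/]
end

name = "100_5142.jpg"
puts all_digits(name)       # prints 1005142
puts first_digit_run(name)  # prints 100
```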
 
come

If you just want to extract one number from a string, you could write
something like this. Given

a = "DSC_1234.jpg"

a[/\d+/] returns the first run of digits (matched greedily, so the whole
run), which here is "1234".

If you want to be more precise, you can use parentheses to capture
exactly the portion you want, like:

a[/DSC_(\d+)\.jpg/, 1] (equivalent to a.match(/DSC_(\d+)\.jpg/)[1])

or even, anchored to the whole string: a[/\ADSC_(\d+)\.jpg\Z/, 1]
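The DSC_ pattern is precise, but it returns nil for the other names in the question, such as CRW1234.jpg. Below is a sketch of one looser alternative; the ANY_PREFIX pattern and the camera_number helper are hypothetical, not from the thread. It takes any prefix (matched lazily) and captures the digit run sitting directly before ".jpg". It uses \z, the strict end of string; \Z, as in the pattern above, also allows one trailing newline.

```ruby
# come's exact pattern only recognizes the DSC_ prefix.
DSC_ONLY = /\ADSC_(\d+)\.jpg\Z/

# Hypothetical looser pattern: any prefix, then the digits right before ".jpg".
# The /i flag also accepts ".JPG".
ANY_PREFIX = /\A.*?(\d+)\.jpg\z/i

def camera_number(name, pattern)
  name[pattern, 1]  # String#[regexp, n] returns capture group n, or nil
end

%w[DSC_1234.jpg CRW1234.jpg 100_5142.jpg].each do |name|
  puts "#{name}: #{camera_number(name, DSC_ONLY).inspect} / " \
       "#{camera_number(name, ANY_PREFIX).inspect}"
end
```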
 
Bas van Gils

Some solutions have been posted already, but here's mine:

irb(main):001:0> s="DSC_1234.jpg"
=> "DSC_1234.jpg"
irb(main):002:0> s.sub(/\D+(\d+).*/,'\1')
=> "1234"

Basically the regexp looks for:

- one or more non-digits
- one or more digits => because this is in parentheses you can refer to
it with \1 later on
- anything after that

The digits (safely stored in \1) are all you want to keep. This assumes you
are only interested in the first sequence of digits.
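Two edge cases are worth knowing about with this sub. Because the pattern is not anchored, it starts at the first non-digit it finds, so a name that begins with digits, such as 100_5142.jpg, keeps its leading "100" and comes out as "1005142". And if there are no digits at all, sub returns the string unchanged rather than nil. The anchored variant below is a hypothetical tweak, not Bas's code: \A pins the match to the start and \D* allows zero leading non-digits.

```ruby
# Bas's expression, unchanged.
def bas_sub(name)
  name.sub(/\D+(\d+).*/, '\1')
end

# Hypothetical anchored variant: really the first digit run in the name.
def anchored_sub(name)
  name.sub(/\A\D*(\d+).*/, '\1')
end

puts bas_sub("100_5142.jpg")       # prints 1005142
puts anchored_sub("100_5142.jpg")  # prints 100
puts anchored_sub("no_digits.jpg") # no match, so the name comes back unchanged
```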

Cheers

Bas

--
Bas van Gils, http://www.van-gils.org
 
Robert Klemme

Replying to come's post above. Or even simpler:

irb(main):001:0> "DSC_1234.jpg"[/\d+/]
=> "1234"
irb(main):002:0> Integer("DSC_1234.jpg"[/\d+/])
=> 1234
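One caution with Integer() for camera numbers: it honours radix prefixes, so a zero-padded number like the one in DSC_0089.jpg is read as octal. "0089" then raises ArgumentError, and "0123" silently becomes 83. A sketch of the safer options, assuming a Ruby version whose Integer accepts a base argument (the 1.8 series current when this thread was written does not); String#to_i is base 10 by default but returns 0 instead of raising on junk:

```ruby
digits = "DSC_0089.jpg"[/\d+/]  # the string "0089"

begin
  Integer(digits)               # leading 0 means octal, and 8 is not an octal digit
rescue ArgumentError => e
  puts "Integer(#{digits.inspect}) raised: #{e.message}"
end

puts Integer(digits, 10)        # explicit base 10: prints 89
puts digits.to_i                # prints 89
puts Integer("0123")            # read as octal: prints 83
```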

Kind regards

robert
 
Rob Biedenharn

Last November (2006), there was a series of postings to the Columbus
Ruby Brigade list beginning with:
http://groups.google.com/group/columbusrb/browse_frm/thread/9c2e682f9926bad0

This was the pattern that I used when responding to Bill's code
because many of *my* pictures had names like "100_5142.jpg",
"100_5143.jpg", etc.

NUMBERED_FILE_PATTERN = %r{^(.*\D)?(\d+)(.+)$}

It became a constant since I used it in three places.
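To see what the three groups capture, here is a sketch that applies the pattern to the filenames from this thread; the split_numbered helper is hypothetical and not taken from the linked thread. Because (.*\D)? is greedy, the number captured is the last digit run that still has at least one character after it, and group 1 does not participate (nil) for a name that starts with digits. Since (.+) needs at least one character, a name with no suffix gives up its final digit to group 3.

```ruby
NUMBERED_FILE_PATTERN = %r{^(.*\D)?(\d+)(.+)$}

# Hypothetical helper: split a name into [prefix, number, suffix], or nil.
def split_numbered(name)
  m = NUMBERED_FILE_PATTERN.match(name)
  return nil unless m
  [m[1].to_s, m[2], m[3]]  # m[1] is nil when the optional group didn't match
end

%w[DSC_1234.jpg CRW1234.jpg 100_5142.jpg 1234.jpg].each do |name|
  puts "#{name}: #{split_numbered(name).inspect}"
end
```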

Rob Biedenharn http://agileconsultingllc.com
 
