extracting numbers from a string

Discussion in 'Ruby' started by Matt Jones, Jun 12, 2007.

  1. Matt Jones

    Matt Jones Guest

    I have filenames from various digital cameras: DSC_1234.jpg,
    CRW1234.jpg, etc. What I really want is the numeric portion of that
    filename. How would I extract just that portion?


    I expect it to involve the regex /\d+/, but I'm unclear how to extract a
    portion of a string matching a regex.

    Thank you
     
    Matt Jones, Jun 12, 2007
    #1

  2. Dan Zwell

    Dan Zwell Guest

    This may be the simplest (and arguably the most Ruby-esque):
    str = "DSC_1234.jpg"
    num = str.scan(/\d+/)[0]

    Other ways to do it:
    num = str.match(/\d+/)[0]

    OR
    num = (/\d+/).match(str)[0]

    OR
    num = nil
    str.scan(/\d+/) {|match| num ||= match}

    OR
    num = str =~ /(\d+)/ ? $1 : nil

    That is,
    num = if str =~ /(\d+)/
            $1
          else
            nil
          end

    OR
    if str =~ /\d+/
      num = $~[0]
    end
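
The variants above agree when the string contains digits, but they differ when it does not. A minimal sketch of that difference, assuming two hypothetical helper names (`first_number_scan`, `first_number_match`) and a made-up filename without digits:

```ruby
# Hypothetical helpers wrapping two of the approaches listed above.

# scan returns an array of all matches, so [0] on an empty array is nil.
def first_number_scan(str)
  str.scan(/\d+/)[0]
end

# match returns nil when nothing matches, so calling [0] on the result
# directly would raise NoMethodError; check for nil first.
def first_number_match(str)
  m = str.match(/\d+/)
  m ? m[0] : nil
end

first_number_scan("DSC_1234.jpg")    # "1234"
first_number_scan("README.txt")      # nil
first_number_match("DSC_1234.jpg")   # "1234"
first_number_match("README.txt")     # nil
```

Without the nil check, `str.match(/\d+/)[0]` fails on a name like "README.txt", which may matter when looping over a whole directory.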

    Some proponents of Ruby have said that Perl's "There is more than one
    way to do it" is a curse, but the same is true of Ruby. However, it
    seems to me that most people learn reasonable idioms and common sense
    prevails.

    Dan
     
    Dan Zwell, Jun 12, 2007
    #2

  3. a = "DSC_1234.jpg"
    b = a.gsub(/[^[:digit:]]/, '')
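
One thing worth knowing about the gsub approach: it deletes every non-digit, so a name with more than one run of digits has its runs joined together. A small sketch, where "IMG2007_0012.jpg" is a hypothetical filename chosen only to show this:

```ruby
single = "DSC_1234.jpg"
multi  = "IMG2007_0012.jpg"   # hypothetical name with two separate digit runs

single_digits = single.gsub(/[^[:digit:]]/, '')   # "1234"
joined_digits = multi.gsub(/[^[:digit:]]/, '')    # "20070012" (both runs joined)
first_run     = multi[/\d+/]                      # "2007" (first run only)

# String#delete with a negated character range gives the same result as the gsub.
deleted = single.delete("^0-9")                   # "1234"
```

Whether joining the runs is desirable depends on the camera's naming scheme.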
     
    Michael W. Ryder, Jun 12, 2007
    #3
  4. come

    come Guest

    If you just want to extract one number from a string, you could write
    something like this:

    Given a = "DSC_1234.jpg",

    a[/\d+/] will give you the first run of consecutive digits (matched
    greedily, so as long as possible), which here is "1234".

    If you want to be more precise, you could use parentheses to extract
    the exact portion you want, like:

    a[/DSC_(\d+)\.jpg/,1] (equivalent, when the pattern matches, to
    a.match(/DSC_(\d+)\.jpg/)[1])

    or even : a[/\ADSC_(\d+)\.jpg\Z/,1]
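
To illustrate what the anchors buy you, here is a rough sketch. The constant and method names are made up for the example, and it uses \z rather than \Z (\Z also accepts a trailing newline, \z does not):

```ruby
# Hypothetical names; the DSC_ prefix and .jpg suffix come from the example above.
DSC_PATTERN = /\ADSC_(\d+)\.jpg\z/

# Returns the captured digits, or nil unless the whole name fits the pattern.
def dsc_number(filename)
  filename[DSC_PATTERN, 1]
end

dsc_number("DSC_1234.jpg")        # "1234"
dsc_number("old_DSC_1234.jpg")    # nil (extra text before DSC_)
dsc_number("DSC_1234.jpg.bak")    # nil (extra text after .jpg)

# The unanchored version still matches inside a longer name.
unanchored = "old_DSC_1234.jpg.bak"[/DSC_(\d+)\.jpg/, 1]   # "1234"
```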
     
    come, Jun 12, 2007
    #4
  5. Bas van Gils

    Bas van Gils Guest

    Some solutions have been posted already, but here's mine:

    irb(main):001:0> s="DSC_1234.jpg"
    => "DSC_1234.jpg"
    irb(main):002:0> s.sub(/\D+(\d+).*/,'\1')
    => "1234"

    Basically, the regexp looks for:

    - one or more non-digits
    - one or more digits => because this is between parentheses you can refer to
    it with \1 later on
    - something more

    The digits (safely stored in \1) are all you want to keep... this assumes you
    are only interested in the first sequence of numbers.
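
A short sketch of how the sub approach behaves on a few inputs other than the one shown. The extra filenames are illustrative assumptions, and SUB_PATTERN is just a name for the regexp above:

```ruby
SUB_PATTERN = /\D+(\d+).*/

# Typical case: leading non-digits, digits, then the rest.
typical = "DSC_1234.jpg".sub(SUB_PATTERN, '\1')         # "1234"

# No digits at all: the pattern does not match, so sub returns the string unchanged.
no_digits = "README.txt".sub(SUB_PATTERN, '\1')         # "README.txt"

# Name starting with digits: the match begins at "_", so the leading "100"
# is outside the replaced part and stays in the result.
leading_digits = "100_5142.jpg".sub(SUB_PATTERN, '\1')  # "1005142"
```

So the pattern relies on the name starting with at least one non-digit and containing a number; for other names, a plain s[/\d+/] may be easier to reason about.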

    Cheers

    Bas

    --
    Bas van Gils, http://www.van-gils.org
    [[[ Thank you for not distributing my E-mail address ]]]

    Quod est inferius est sicut quod est superius, et quod est superius est sicut
    quod est inferius, ad perpetranda miracula rei unius.
     
    Bas van Gils, Jun 12, 2007
    #5
  6. Or even simpler

    irb(main):001:0> "DSC_1234.jpg"[/\d+/]
    => "1234"
    irb(main):002:0> Integer("DSC_1234.jpg"[/\d+/])
    => 1234
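
A caveat with Integer(), sketched with hypothetical filenames whose numbers have leading zeros: without an explicit base, Integer() treats a leading "0" as an octal prefix.

```ruby
digits = "IMG_0123.jpg"[/\d+/]     # "0123" (hypothetical name with a leading zero)

as_octal   = Integer(digits)       # 83: the leading "0" is read as an octal prefix
as_decimal = Integer(digits, 10)   # 123: base forced to 10
via_to_i   = digits.to_i           # 123: to_i defaults to base 10

# "0089" is not a valid octal number, so Integer() without a base raises.
octal_error =
  begin
    Integer("DSC_0089.jpg"[/\d+/])
    false
  rescue ArgumentError
    true
  end
```

Passing 10 as the second argument keeps the strictness of Integer() while avoiding the octal surprise.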

    Kind regards

    robert
     
    Robert Klemme, Jun 12, 2007
    #6
  7. Last November (2006), there was a series of postings to the Columbus
    Ruby Brigade list beginning with:
    http://groups.google.com/group/columbusrb/browse_frm/thread/9c2e682f9926bad0

    This was the pattern that I used when responding to Bill's code
    because many of *my* pictures had names like "100_5142.jpg",
    "100_5143.jpg", etc.

    NUMBERED_FILE_PATTERN = %r{^(.*\D)?(\d+)(.+)$}

    It became a constant since I used it in three places.
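
For readers who want to see what the three groups capture, here is a sketch based only on the pattern itself. The `split_numbered` helper is a made-up name, not code from the original thread:

```ruby
NUMBERED_FILE_PATTERN = %r{^(.*\D)?(\d+)(.+)$}

# Hypothetical helper: returns [prefix, number, rest], or nil if there is no match.
def split_numbered(filename)
  m = NUMBERED_FILE_PATTERN.match(filename)
  m && m.captures
end

# The greedy (.*\D)? takes everything up to the last non-digit that is
# directly followed by digits, so the number picked is the last run
# that still has text after it.
split_numbered("100_5142.jpg")   # ["100_", "5142", ".jpg"]
split_numbered("CRW1234.jpg")    # ["CRW", "1234", ".jpg"]
split_numbered("1234.jpg")       # [nil, "1234", ".jpg"]
```

Unlike /\d+/ on its own, this picks "5142" rather than "100" for names like "100_5142.jpg".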

    Rob Biedenharn http://agileconsultingllc.com
     
    Rob Biedenharn, Jun 12, 2007
    #7
  8. Matt Jones

    Matt Jones Guest

    A big thanks to everybody and all the creative solutions!
     
    Matt Jones, Jun 16, 2007
    #8