Simple regex question.

Discussion in 'Ruby' started by Peter Bailey, Jun 26, 2009.

  1. Peter Bailey

    Peter Bailey Guest

    Hello.
    I need to parse through thousands of TIFF files and do some re-naming.
    These files have underscores in them followed by a sequential number. I
    need to grab just the "root" of the filename, without the underscore or
    the numbers.
    Dir.chdir("L:/infocontiffs/ehs-g7917741")
    files = Dir.glob("*.tiff")
    file = files[0]
    puts file
    file = file.gsub(/^(.*)_[0-9]+\.tiff/, "#{$1}")
    puts file
    What I get with this is:
    ehs-g7917741_01.tiff
    Why doesn't it give me my root filename?
    Thanks,
    Peter
    --
    Posted via http://www.ruby-forum.com/.
     
    Peter Bailey, Jun 26, 2009
    #1

  2. Peter Bailey

    Tim Hunter Guest

    Peter Bailey wrote:
    > Hello.
    > I need to parse through thousands of TIFF files and do some re-naming.
    > These files have underscores in them followed by a sequential number. I
    > need to grab just the "root" of the filename, without the underscore or
    > the numbers.
    > Dir.chdir("L:/infocontiffs/ehs-g7917741")
    > files = Dir.glob("*.tiff")
    > file = files[0]
    > puts file
    > file = file.gsub(/^(.*)_[0-9]+\.tiff/, "#{$1}")
    > puts file
    > What I get with this is:
    > ehs-g7917741_01.tiff
    > Why doesn't it give me my root filename?
    > Thanks,
    > Peter


    Is this what you want?

    while fname = DATA.gets
      m = fname.match(/(.*?)_\d+\.tiff/)
      if m
        puts "Match: '#{m[1]}'"
      else
        puts "No match: #{fname}"
      end
    end

    __END__
    ehs-g7917741_01.tiff
    asadsasd_12345.tiff
    ljhkjhkh_1_2_3.tiff
    xxxx__1.tiff
    xxxx_.tiff
    xxxx.tiff
    xxxx
    _.tiff
    _01.tiff
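
A rough, self-contained way to try the same pattern without a DATA section. The sample names are copied from the list above; the `samples` and `results` names are only illustrative. Although `(.*?)` is lazy, the rest of the pattern still has to match, so the capture runs up to the last underscore that is followed by digits and `.tiff`.

```ruby
# Sample names copied from the post above; `samples` and `results` are illustrative names.
samples = %w[
  ehs-g7917741_01.tiff asadsasd_12345.tiff ljhkjhkh_1_2_3.tiff
  xxxx__1.tiff xxxx_.tiff xxxx.tiff xxxx _.tiff _01.tiff
]

pattern = /(.*?)_\d+\.tiff/

# Pair each name with its captured root, or nil when the pattern does not match.
results = samples.map do |name|
  m = name.match(pattern)
  [name, m && m[1]]
end

results.each do |name, root|
  if root
    puts "Match: '#{root}' (from #{name})"
  else
    puts "No match: #{name}"
  end
end
```

Note that `_01.tiff` matches with an empty root and `xxxx__1.tiff` keeps a trailing underscore in its root, so a real renaming script would probably want an extra check on the captured value.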
     
    Tim Hunter, Jun 26, 2009
    #2

  3. Peter Bailey wrote:
    > Hello.
    > I need to parse through thousands of TIFF files and do some re-naming.
    > These files have underscores in them followed by a sequential number. I
    > need to grab just the "root" of the filename, without the underscore or
    > the numbers.
    > Dir.chdir("L:/infocontiffs/ehs-g7917741")
    > files = Dir.glob("*.tiff")
    > file = files[0]
    > puts file
    > file = file.gsub(/^(.*)_[0-9]+\.tiff/, "#{$1}")


    The argument "#{$1}" is expanded once, before gsub even executes. You
    probably want the block form:

    file = file.sub(/^(.*)_\d+\.tiff/) { $1 }
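
To make the timing difference concrete, here is a small sketch comparing three forms side by side. The method names are hypothetical, and the sample name is the one from the original post. Each form lives in its own method so that `$1` starts out as nil there, which should mimic a fresh script.

```ruby
# Interpolated form: "#{$1}" is built before gsub is called, from whatever
# $1 holds at that moment (nil here, because nothing has matched yet in this method).
def root_interpolated(name)
  name.gsub(/^(.*)_\d+\.tiff/, "#{$1}")
end

# Block form: the block runs after the match, so $1 is the fresh capture.
def root_block(name)
  name.sub(/^(.*)_\d+\.tiff/) { $1 }
end

# Backreference form: single quotes keep \1 literal, and sub fills it in itself.
def root_backref(name)
  name.sub(/^(.*)_\d+\.tiff/, '\1')
end

name = "ehs-g7917741_01.tiff"
puts root_interpolated(name).inspect  # not the root: the replacement was built too early
puts root_block(name)
puts root_backref(name)
```

The backreference form avoids the global `$1` entirely, which some people find easier to reason about; the block form is handier when the replacement needs extra Ruby logic.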
     
    Brian Candler, Jun 26, 2009
    #3
  4. Peter Bailey

    Peter Bailey Guest

    Tim Hunter wrote:
    > Peter Bailey wrote:
    >> Hello.
    >> I need to parse through thousands of TIFF files and do some re-naming.
    >> These files have underscores in them followed by a sequential number. I
    >> need to grab just the "root" of the filename, without the underscore or
    >> the numbers.
    >> Dir.chdir("L:/infocontiffs/ehs-g7917741")
    >> files = Dir.glob("*.tiff")
    >> file = files[0]
    >> puts file
    >> file = file.gsub(/^(.*)_[0-9]+\.tiff/, "#{$1}")
    >> puts file
    >> What I get with this is:
    >> ehs-g7917741_01.tiff
    >> Why doesn't it give me my root filename?
    >> Thanks,
    >> Peter

    >
    > Is this what you want?
    >
    > while fname = DATA.gets
    > m = fname.match /(.*?)_\d+\.tiff/
    > if m
    > puts "Match: '#{m[1]}'"
    > else
    > puts "No match: #{fname}"
    > end
    > end
    >
    > __END__
    > ehs-g7917741_01.tiff
    > asadsasd_12345.tiff
    > ljhkjhkh_1_2_3.tiff
    > xxxx__1.tiff
    > xxxx_.tiff
    > xxxx.tiff
    > xxxx
    > _.tiff
    > _01.tiff


    Well, you gave me a good idea, using match. Here's what I did, and it
    worked. Thank you very much, Tim.

    Dir.chdir("L:/infocontiffs/ehs-g7917741")
    files = Dir.glob("*.tiff")
    file = files[0]
    puts file
    file = file.match(/^(.*)_[0-9]+\.tiff/)
    #file = file.to_i
    puts $1
    #end
    gives me:
    ehs-g7917741_01.tiff
    ehs-g7917741

    Program exited with code 0
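
A slightly tidier variant of the script above, offered only as a sketch: it keeps the filename in `file`, stores the match result separately, and reads the capture from the MatchData object rather than from `$1`. The literal filename stands in for `files[0]` so the snippet runs without the directory; `m` and `root` are illustrative names.

```ruby
# Hypothetical stand-in for files[0]; no directory access is needed to try this.
file = "ehs-g7917741_01.tiff"

# String#match returns a MatchData object, or nil when nothing matches.
m = file.match(/^(.*)_[0-9]+\.tiff/)
root = m ? m[1] : nil

puts file
puts root
```

Checking for nil matters once the script loops over thousands of files, since any name that does not fit the pattern would otherwise raise on `m[1]`.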
     
    Peter Bailey, Jun 26, 2009
    #4
  5. Hi --

    On Fri, 26 Jun 2009, Peter Bailey wrote:

    > Hello.
    > I need to parse through thousands of TIFF files and do some re-naming.
    > These files have underscores in them followed by a sequential number. I
    > need to grab just the "root" of the filename, without the underscore or
    > the numbers.
    > Dir.chdir("L:/infocontiffs/ehs-g7917741")
    > files = Dir.glob("*.tiff")
    > file = files[0]
    > puts file
    > file = file.gsub(/^(.*)_[0-9]+\.tiff/, "#{$1}")
    > puts file
    > What I get with this is:
    > ehs-g7917741_01.tiff
    > Why doesn't it give me my root filename?


    Here's another good use of the string[//] technique:

    >> file = "ehs-g7917741_01.tiff"

    => "ehs-g7917741_01.tiff"
    >> file[/[^_]+/] # match non-underscore characters

    => "ehs-g7917741"
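
One thing to keep in mind with the unanchored `[^_]+` form is that it returns the first run of non-underscore characters wherever it occurs, so names that do not follow the convention still produce a result. The sketch below contrasts it with a stricter, anchored pattern read through `string[regex, 1]`; the `loose` and `strict` names and the extra sample filenames are assumptions for illustration.

```ruby
loose  = /[^_]+/                 # first run of non-underscore characters
strict = /\A(.*)_\d+\.tiff\z/    # whole name must fit; group 1 is the root

good      = "ehs-g7917741_01.tiff"[loose]       # the root, as in the session above
no_root   = "_01.tiff"[loose]                   # nothing before the underscore, yet a value comes back
no_suffix = "xxxx.tiff"[loose]                  # no underscore at all: the whole name comes back

strict_good      = "ehs-g7917741_01.tiff"[strict, 1]
strict_no_suffix = "xxxx.tiff"[strict, 1]       # nil, because the name does not fit

p good, no_root, no_suffix, strict_good, strict_no_suffix
```

The loose form is fine when every file is known to follow the naming scheme; the strict form trades brevity for a nil result on anything unexpected.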


    David

    --
    David A. Black / Ruby Power and Light, LLC
    Ruby/Rails consulting & training: http://www.rubypal.com
    Now available: The Well-Grounded Rubyist (http://manning.com/black2)
    "Ruby 1.9: What You Need To Know" Envycasts with David A. Black
    http://www.envycasts.com
     
    David A. Black, Jun 26, 2009
    #5
  6. Peter Bailey

    Peter Bailey Guest

    Peter Bailey, Jun 26, 2009
    #6
  7. 2009/6/26 David A. Black <>:
    > On Fri, 26 Jun 2009, Peter Bailey wrote:


    > Here's another good use of the string[//] technique:
    >
    >>> file = "ehs-g7917741_01.tiff"

    >
    > => "ehs-g7917741_01.tiff"
    >>>
    >>> file[/[^_]+/] # match non-underscore characters

    >
    > => "ehs-g7917741"


    Combining all the good suggestions this is probably what I'd do:

    files = Dir.glob("L:/infocontiffs/ehs-g7917741/*.tiff")
    files.each do |f|
      base = File.basename f
      root = base[/^([^_]+)_\d+\.tiff$/, 1]

      if root
        # rename or whatever
      else
        $stderr.puts "Dunno what to do with #{f}"
      end
    end

    The reason I left in the matching of underscores and digits is to be
    sure that the complete name matches the pattern that we required in
    order to detect other files that might accidentally have been placed
    in that directory.
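
As a rough illustration of that point, the sketch below sorts a list of names into those that fit the full pattern and those that do not, without touching the filesystem. The sample names and the `roots` and `strays` variables are hypothetical; in the real script the names would come from `Dir.glob` and `File.basename`.

```ruby
# Hypothetical file names standing in for the directory listing.
names = ["ehs-g7917741_01.tiff", "ehs-g7917741_02.tiff", "notes.txt", "xxxx_.tiff"]

# Root without underscores, then an underscore, digits, ".tiff", and nothing else.
strict = /\A([^_]+)_\d+\.tiff\z/

roots  = {}   # root => names that share it
strays = []   # names that do not fit the pattern

names.each do |name|
  root = name[strict, 1]   # nil when the whole name does not fit
  if root
    (roots[root] ||= []) << name
  else
    strays << name
  end
end

p roots
p strays
```

Collecting the strays instead of printing them one by one makes it easy to review everything unexpected in a single pass before any renaming happens.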

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
     
    Robert Klemme, Jun 26, 2009
    #7
