regexing a file's contents without reading the whole thing?

Discussion in 'Ruby' started by Roger Pack, Nov 30, 2009.

  1. Roger Pack

    Roger Pack Guest

    I see that it is possible currently to parse through a file without
    reading the whole thing into RAM, a la

    a = File.open('a', 'r')
    a.lines{|line|
    if line =~ /some regex/
    ...
    end
    }

    But what if I want to do something like
    a = File.read('a').scan /some regex/

    is that possible?

    Thanks.
    -r
    --
    Posted via http://www.ruby-forum.com/.
    Roger Pack, Nov 30, 2009
    #1

  2. Roger Pack wrote:
    > I see that it is possible currently to parse through a file without
    > reading the whole thing into RAM, a la
    >
    > a = File.open('a', 'r')
    > a.lines{|line|
    > if line =~ /some regex/
    > ...
    > end
    > }
    >
    > But what if I want to do something like
    > a = File.read('a').scan /some regex/
    >
    > is that possible?
    >
    > Thanks.
    > -r


    File.open('/usr/share/dict/words').grep /ruby/i
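A minimal sketch of the same idea, for anyone who wants the file closed afterwards: IO objects are Enumerable over lines, so grep only ever holds one line at a time plus the matches. File.foreach without a block returns an enumerator that opens and closes the file itself. The helper name matching_lines and the sample data are made up for illustration.

```ruby
require "tempfile"

# Hypothetical helper: returns the lines of the file at +path+ that match
# +pattern+, reading line by line rather than slurping the whole file.
def matching_lines(path, pattern)
  File.foreach(path).grep(pattern)
end

# Throwaway sample file standing in for /usr/share/dict/words.
Tempfile.create("words") do |f|
  f.write("ruby\nperl\nRuby on Rails\n")
  f.flush
  p matching_lines(f.path, /ruby/i) # prints ["ruby\n", "Ruby on Rails\n"]
end
```

Note that grep returns whole matching lines, not the matched substrings that String#scan would give.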

    --
    vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407
    Joel VanderWerf, Nov 30, 2009
    #2

  3. 2009/11/30 Roger Pack wrote:
    > I see that it is possible currently to parse through a file without
    > reading the whole thing into RAM, a la
    >
    > a = File.open('a', 'r')
    > a.lines{|line|
    > if line =~ /some regex/
    > ...
    > end
    > }
    >
    > But what if I want to do something like
    > a = File.read('a').scan /some regex/
    >
    > is that possible?


    If you know that matches will never cross line breaks you can do

    a = []
    File.foreach("a") do |line|
      line.scan(/regex/) do |m|
        a << m
      end
      # alternative to the inner block above (use one or the other):
      # a.concat(line.scan(/regex/))
    end

    If matches can cross line breaks the whole story becomes more
    complicated and your solution with File.read is probably the simplest
    way to do it (if files aren't too large).
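For files that are too large for File.read, one possible approach is to read fixed-size chunks and carry a tail of the buffer over between reads. This is only a sketch under assumptions that are not from the thread: no match is longer than max_match bytes, the pattern does not rely on anchors or lookaround, and the data can be treated as raw bytes. The name scan_in_chunks and its parameters are invented for illustration.

```ruby
require "stringio"

# Hypothetical helper: scans +io+ in fixed-size chunks and returns every
# match of +regex+, like String#scan for a pattern without capture groups.
# Assumes no match is longer than +max_match+ bytes.
def scan_in_chunks(io, regex, max_match: 4096, chunk_size: 64 * 1024)
  matches = []
  buffer = String.new # binary buffer holding the unprocessed tail
  until io.eof?
    buffer << io.read(chunk_size)
    at_eof = io.eof?
    cutoff = buffer.length - max_match
    pos = 0
    while (m = regex.match(buffer, pos))
      # A match starting in the last max_match bytes could still change
      # once more data arrives, so defer it to the next round.
      break if !at_eof && m.begin(0) > cutoff
      matches << m[0]
      # Step past empty matches so the loop always advances.
      pos = m.end(0) > m.begin(0) ? m.end(0) : m.end(0) + 1
    end
    # Keep only the part that may still take part in a later match.
    keep_from = at_eof ? buffer.length : [pos, cutoff].max
    buffer = buffer[keep_from..-1] || String.new
  end
  matches
end

# Tiny chunks force the matches to straddle chunk boundaries.
text = "xxfoo\nbarxxfoo\nbar"
found = scan_in_chunks(StringIO.new(text), /foo\nbar/, max_match: 8, chunk_size: 5)
p found == text.scan(/foo\nbar/) # prints true
```

Memory use stays around chunk_size + max_match bytes plus the collected matches. With a real file you would pass File.open("a", "rb") instead of the StringIO.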

    Kind regards

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, Dec 1, 2009
    #3
  4. On 11/30/09, Roger Pack wrote:
    > I see that it is possible currently to parse through a file without
    > reading the whole thing into RAM, a la
    >
    > a = File.open('a', 'r')
    > a.lines{|line|
    > if line =~ /some regex/
    > ...
    > end
    > }
    >
    > But what if I want to do something like
    > a = File.read('a').scan /some regex/
    >
    > is that possible?


    The library which makes this possible is sequence. I'm coding this
    from memory, so I'm likely to get something wrong, but the equivalent
    in sequence looks more or less like this:

    require 'rubygems'
    require 'sequence'
    require 'sequence/file'

    seq=Sequence.new(File.open('a'))
    seq.scan_until(/some regex/)

    Keep the following in mind:
    1) Sequence#scan works like StringScanner#scan, not String#scan.
    2) The pattern to be matched must have a max length (4k by default, I
    think; it can be changed).
    3) If your pattern is guaranteed not to contain a newline, you're
    better off reading line by line, as Robert said.
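To illustrate point 1, here is the difference using the standard library's StringScanner on an in-memory string. It does not use the sequence gem, so whether Sequence behaves identically in every detail is an assumption based on the description above.

```ruby
require "strscan"

text = "aa bb aa"

# String#scan collects every match anywhere in the string.
all = text.scan(/aa/)        # ["aa", "aa"]

ss = StringScanner.new(text)
# StringScanner#scan only matches at the current scan position.
miss  = ss.scan(/bb/)        # nil, because "bb" is not at position 0
first = ss.scan(/aa/)        # "aa", and the position advances to 2
# scan_until searches forward and returns everything from the current
# position up to and including the match.
rest  = ss.scan_until(/aa/)  # " bb aa"

p all, miss, first, rest, ss.eos?
```

So a loop over scan_until is the closest analogue of String#scan, keeping in mind that each call also returns the text skipped before the match.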
    Caleb Clausen, Dec 2, 2009
    #4
