Knocking Lines Out Of A Multiline String

Discussion in 'Ruby' started by Andrew Stewart, Mar 22, 2007.

  1. Hello,

    What's a (good!) way to remove lines matching a pattern from a
    multiline string?

    For example, I would like to remove lines matching /usr/local/lib
    from the multiline string:

    /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.1/lib/action_controller/test_process.rb:382:in `process'
    /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.1/lib/action_controller/test_process.rb:353:in `post'
    test/functional/orders_controller_test.rb:241:in `test_should_handle_errors_on_edit'

    ...to give:

    test/functional/orders_controller_test.rb:241:in `test_should_handle_errors_on_edit'

    I tried matching the pattern-to-remove with gsub and substituting an
    empty string, but that leaves me with lots of blank lines and not
    really any nearer to the answer.

    Thanks and regards,
    Andy Stewart
     
    Andrew Stewart, Mar 22, 2007
    #1

  2. On 22.03.2007 15:43, Andrew Stewart wrote:
    > What's a (good!) way to remove lines matching a pattern from a multiline
    > string?
    >
    > For example, I would like to remove lines matching /usr/local/lib from
    > the multiline string:
    >
    >
    > /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.1/lib/action_controller/test_process.rb:382:in
    > `process'
    >
    > /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.1/lib/action_controller/test_process.rb:353:in
    > `post'
    > test/functional/orders_controller_test.rb:241:in
    > `test_should_handle_errors_on_edit'
    >
    > ..to give:
    >
    > test/functional/orders_controller_test.rb:241:in
    > `test_should_handle_errors_on_edit'
    >
    > I tried matching the pattern-to-remove with gsub and substituting an
    > empty string, but that leaves me with lots of blank lines and not really
    > any nearer to the answer.


    Convert it to an array and select like

    >> "foo\nbar\n".to_a.select {|l| /^f/ =~ l}

    => ["foo\n"]

    Kind regards

    robert
     
    Robert Klemme, Mar 22, 2007
    #2

  3. On 22.03.2007 16:09, Robert Klemme wrote:
    > On 22.03.2007 15:43, Andrew Stewart wrote:
    >> What's a (good!) way to remove lines matching a pattern from a
    >> multiline string?
    >>
    >> For example, I would like to remove lines matching /usr/local/lib from
    >> the multiline string:
    >>
    >>
    >> /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.1/lib/action_controller/test_process.rb:382:in
    >> `process'
    >>
    >> /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.1/lib/action_controller/test_process.rb:353:in
    >> `post'
    >> test/functional/orders_controller_test.rb:241:in
    >> `test_should_handle_errors_on_edit'
    >>
    >> ..to give:
    >>
    >> test/functional/orders_controller_test.rb:241:in
    >> `test_should_handle_errors_on_edit'
    >>
    >> I tried matching the pattern-to-remove with gsub and substituting an
    >> empty string, but that leaves me with lots of blank lines and not
    >> really any nearer to the answer.

    >
    > Convert it to an array and select like
    >
    > >> "foo\nbar\n".to_a.select {|l| /^f/ =~ l}

    > => ["foo\n"]


    Bullshit: just use select:

    >> "foo\nbar\n".select {|l| /^f/ =~ l}

    => ["foo\n"]

    Sorry for the noise.
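
    A note for readers on newer Rubies: the one-liner above relies on
    String being Enumerable, which was true in Ruby 1.8 only. From 1.9 on,
    String#select and String#to_a are gone, so you go through each_line (or
    lines) first. A minimal sketch, assuming a reasonably recent Ruby (2.3+
    for the <<~ heredoc); the variable names are just for illustration:

```ruby
# Sample trace taken from the original question.
trace = <<~TRACE
  /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.1/lib/action_controller/test_process.rb:382:in `process'
  /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.1/lib/action_controller/test_process.rb:353:in `post'
  test/functional/orders_controller_test.rb:241:in `test_should_handle_errors_on_edit'
TRACE

# each_line yields every line with its trailing "\n" intact, so joining
# the surviving lines rebuilds a multiline string with no blank lines.
filtered = trace.each_line.reject { |line| line.include?("/usr/local/lib") }.join

# The modern spelling of the select example from this post.
selected = "foo\nbar\n".lines.select { |l| l =~ /^f/ }

puts filtered
p selected
```

    The reject/join pair is what the original question needs; select is the
    same idea when you want to keep the matching lines instead.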

    robert
     
    Robert Klemme, Mar 22, 2007
    #3
  4. On 3/22/07, Andrew Stewart <> wrote:
    > Hello,
    >
    > What's a (good!) way to remove lines matching a pattern from a
    > multiline string?
    >
    > For example, I would like to remove lines matching /usr/local/lib
    > from the multiline string:
    >
    > /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.1/lib/
    > action_controller/test_process.rb:382:in `process'
    > /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.1/lib/
    > action_controller/test_process.rb:353:in `post'
    > test/functional/orders_controller_test.rb:241:in
    > `test_should_handle_errors_on_edit'
    >
    > ...to give:
    >
    > test/functional/orders_controller_test.rb:241:in
    > `test_should_handle_errors_on_edit'
    >
    > I tried matching the pattern-to-remove with gsub and substituting an
    > empty string, but that leaves me with lots of blank lines and not
    > really any nearer to the answer.


    You could change your lines into an array of lines and then remove the
    lines that match:

    lines = []
    File.new("text.txt").read.each_line {|line| lines << line }
    lines.delete_if {|line| line =~ /\/usr\/local\/lib/}




    --
    If you could create a machine that copies hamburgers -- you put one
    hamburger in and two equally good hamburgers come out the other side --
    it would be unethical not to do so and make it freely available.
     
    Leslie Viljoen, Mar 22, 2007
    #4
  5. On Mar 22, 2007, at 10:43 AM, Andrew Stewart wrote:

    > Hello,
    >
    > What's a (good!) way to remove lines matching a pattern from a
    > multiline string?
    >
    > For example, I would like to remove lines matching /usr/local/lib
    > from the multiline string:
    >
    > /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.1/lib/
    > action_controller/test_process.rb:382:in `process'
    > /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.1/lib/
    > action_controller/test_process.rb:353:in `post'
    > test/functional/orders_controller_test.rb:241:in
    > `test_should_handle_errors_on_edit'
    >
    > ...to give:
    >
    > test/functional/orders_controller_test.rb:241:in
    > `test_should_handle_errors_on_edit'
    >
    > I tried matching the pattern-to-remove with gsub and substituting
    > an empty string, but that leaves me with lots of blank lines and
    > not really any nearer to the answer.
    >
    > Thanks and regards,
    > Andy Stewart


    >> input = " /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.1/lib/action_controller/test_process.rb:382:in `process'
     /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.1/lib/action_controller/test_process.rb:353:in `post'
     test/functional/orders_controller_test.rb:241:in `test_should_handle_errors_on_edit'
    "
    => " /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.1/lib/action_controller/test_process.rb:382:in `process'\n /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.1/lib/action_controller/test_process.rb:353:in `post'\n test/functional/orders_controller_test.rb:241:in `test_should_handle_errors_on_edit'\n"

    >> input.gsub(%r{^.*/usr/local/lib/.*\n?},'')

    => " test/functional/orders_controller_test.rb:241:in `test_should_handle_errors_on_edit'\n"


    If you showed your code, an explanation could be added as to your
    regexp, but the concept certainly works as I've shown.

    -Rob


    Rob Biedenharn http://agileconsultingllc.com
     
    Rob Biedenharn, Mar 22, 2007
    #5
  6. Andrew Stewart

    Phrogz Guest

    On Mar 22, 9:15 am, "Leslie Viljoen" <> wrote:
    > You could change your lines into an array of lines and then remove the
    > lines that match:
    >
    > lines = []
    > File.new("text.txt").read.each_line {|line| lines << line }
    > lines.delete_if {|line| line =~ /\/usr\/local\/lib/}


    Leslie, as a public service announcement, you should be aware of
    IO.readlines:

    C:\>qri IO.readlines
    ----------------------------------------------------------
    IO::readlines
    IO.readlines(name, sep_string=$/) => array
    ------------------------------------------------------------------------
    Reads the entire file specified by _name_ as individual lines,
    and
    returns those lines in an array. Lines are separated by
    _sep_string_.

    a = IO.readlines("testfile")
    a[0] #=> "This is line one\n"


    For that matter, you should also be aware of IO.read:
    ---------------------------------------------------------------
    IO::read
    IO.read(name, [length [, offset]] ) => string
    ------------------------------------------------------------------------
    Opens the file, optionally seeks to the given offset, then
    returns
    _length_ bytes (defaulting to the rest of the file). +read+
    ensures
    the file is closed before returning.

    IO.read("testfile") #=> "This is line one\nThis is
    line two\nThis is line three\nAnd so on...\n"
    IO.read("testfile", 20) #=> "This is line one\nThi"
    IO.read("testfile", 20, 10) #=> "ne one\nThis is line "

    You should also be aware of the block form of #open, which opens the
    IO object and then closes it when done.

    What you wrote creates a new File object and opens it, but never
    closes it. I'm not really sure what badness can result from this, but
    I gather it's not a good idea.
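
    To make the suggestions above concrete, here is a rough sketch of the
    same filter written with File.readlines and with the block form of
    File.open. The temp file and the sample text are stand-ins for the
    text.txt in Leslie's snippet (assumptions for the sake of a runnable
    example), and Tempfile.create needs Ruby 2.1 or later:

```ruby
require "tempfile"

# Hypothetical sample data standing in for the thread's text.txt.
sample = <<~TRACE
  /usr/local/lib/ruby/gems/1.8/gems/actionpack-1.13.1/lib/action_controller/test_process.rb:382:in `process'
  test/functional/orders_controller_test.rb:241:in `test_should_handle_errors_on_edit'
TRACE

from_readlines = nil
from_block     = nil

Tempfile.create("trace") do |tmp|
  tmp.write(sample)
  tmp.flush

  # File.readlines opens the file, returns its lines as an array,
  # and closes the file before returning.
  from_readlines = File.readlines(tmp.path)
                       .reject { |line| line =~ %r{/usr/local/lib} }
                       .join

  # Block form of File.open: the handle is closed when the block returns,
  # even if the block raises.
  from_block = File.open(tmp.path) do |f|
    f.each_line.reject { |line| line =~ %r{/usr/local/lib} }.join
  end
end

puts from_readlines
```

    Both variants give the same string; neither leaves a file handle
    lying around.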
     
    Phrogz, Mar 22, 2007
    #6
  7. On 22 Mar 2007, at 15:15, Robert Klemme wrote:
    > >> "foo\nbar\n".select {|l| /^f/ =~ l}

    > => ["foo\n"]


    Thanks for that. So simple once you've seen it.

    Regards,
    Andy Stewart
     
    Andrew Stewart, Mar 22, 2007
    #7
  8. On 22 Mar 2007, at 15:15, Leslie Viljoen wrote:
    > lines.delete_if {|line| line =~ /\/usr\/local\/lib/}


    Leslie, thanks for that. That works for me (with a join chained on
    the end).

    Regards,
    Andy Stewart
     
    Andrew Stewart, Mar 22, 2007
    #8
  9. On 22 Mar 2007, at 15:22, Rob Biedenharn wrote:
    > >> input.gsub(%r{^.*/usr/local/lib/.*\n?},'')

    >
    > If you showed your code, an explanation could be added as to your
    > regexp, but the concept certainly works as I've shown.


    Aha! You have proved that I chose my regexp poorly. Here's what I
    tried:

    input.gsub(%r{^.*/usr/local/lib/.*$}i, '')

    The difference is that yours consumes the new line character but mine
    doesn't. I should have just matched it explicitly like you rather
    than using an anchor.
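
    For anyone who wants to see the difference side by side, a small
    sketch (the input string is made up for illustration):

```ruby
# Hypothetical three-line input; two lines mention /usr/local/lib/.
input = "a /usr/local/lib/x\nkeep me\nb /usr/local/lib/y\n"

# $ matches just before the newline without consuming it, so each
# removed line leaves its "\n" behind as a blank line.
with_anchor = input.gsub(%r{^.*/usr/local/lib/.*$}, "")

# Matching \n? explicitly swallows the line terminator as well
# (the ? keeps it working for a final line with no trailing newline).
with_newline = input.gsub(%r{^.*/usr/local/lib/.*\n?}, "")

p with_anchor   # => "\nkeep me\n\n"
p with_newline  # => "keep me\n"
```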

    Thanks and regards,
    Andy Stewart
     
    Andrew Stewart, Mar 22, 2007
    #9
  10. On 3/22/07, Phrogz <> wrote:
    > On Mar 22, 9:15 am, "Leslie Viljoen" <> wrote:
    > > You could change your lines into an array of lines and then remove the
    > > lines that match:
    > >
    > > lines = []
    > > File.new("text.txt").read.each_line {|line| lines << line }
    > > lines.delete_if {|line| line =~ /\/usr\/local\/lib/}

    >
    > Leslie, as a public service announcement, you should be aware of
    > IO.readlines:
    >
    > C:\>qri IO.readlines
    > ----------------------------------------------------------
    > IO::readlines
    > IO.readlines(name, sep_string=$/) => array
    > ------------------------------------------------------------------------
    > Reads the entire file specified by _name_ as individual lines,
    > and
    > returns those lines in an array. Lines are separated by
    > _sep_string_.
    >
    > a = IO.readlines("testfile")
    > a[0] #=> "This is line one\n"
    >
    >
    > For that matter, you should also be aware of IO.read:
    > ---------------------------------------------------------------
    > IO::read
    > IO.read(name, [length [, offset]] ) => string
    > ------------------------------------------------------------------------
    > Opens the file, optionally seeks to the given offset, then
    > returns
    > _length_ bytes (defaulting to the rest of the file). +read+
    > ensures
    > the file is closed before returning.
    >
    > IO.read("testfile") #=> "This is line one\nThis is
    > line two\nThis is line three\nAnd so on...\n"
    > IO.read("testfile", 20) #=> "This is line one\nThi"
    > IO.read("testfile", 20, 10) #=> "ne one\nThis is line "
    >
    > You should also be aware of the block form of #open, which opens the
    > IO object and then closes it when done.
    >
    > What you wrote creates a new File object and opens it, but never
    > closes it. I'm not really sure what badness can result from this, but
    > I gather it's not a good idea.


    This does sound rather frightening! What *is* the effect of opening a
    file and not closing it?
    ;-)

    Also, doesn't the above say that IO.read closes the file afterwards?


    --
    If you could create a machine that copies hamburgers -- you put one
    hamburger in and two equally good hamburgers come out the other side --
    it would be unethical not to do so and make it freely available.
     
    Leslie Viljoen, Mar 22, 2007
    #10
  11. Andrew Stewart

    Phrogz Guest

    On Mar 22, 2:56 pm, "Leslie Viljoen" <> wrote:
    > On 3/22/07, Phrogz <> wrote:
    > > On Mar 22, 9:15 am, "Leslie Viljoen" <> wrote:
    > > > lines = []
    > > > File.new("text.txt").read.each_line {|line| lines << line }
    > > > lines.delete_if {|line| line =~ /\/usr\/local\/lib/}

    [snip]
    > > What you wrote creates a new File object and opens it, but never
    > > closes it. I'm not really sure what badness can result from this, but
    > > I gather it's not a good idea.

    >
    > This does sound rather frightening! What *is* the effect of opening a
    > file and not closing it?
    > ;-)
    >
    > Also, doesn't the above say that IO.read closes the file afterwards?


    IO.read (the class method) does open/close the file, but IO#read (the
    instance method) does not. Manually managing an IO object, you need
    something like:
    f = File.new('foo.txt')
    f.read
    f.close
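
    A slightly fuller sketch of the three options, in case it helps. The
    temp file stands in for foo.txt and the variable names are made up;
    the ensure is there so the manual version still closes the file if
    read raises:

```ruby
require "tempfile"

manual_text = block_text = class_text = nil
manual_closed = block_closed = nil

Tempfile.create("foo") do |tmp|
  tmp.write("line one\nline two\n")
  tmp.flush

  # 1. Manual management: you own the handle, so you close it.
  f = File.new(tmp.path)
  begin
    manual_text = f.read
  ensure
    f.close
  end
  manual_closed = f.closed?

  # 2. Block form: File.open closes the handle when the block exits.
  handle = nil
  block_text = File.open(tmp.path) do |io|
    handle = io
    io.read
  end
  block_closed = handle.closed?

  # 3. Class method: opens, reads and closes in a single call.
  class_text = File.read(tmp.path)
end

p [manual_closed, block_closed]  # => [true, true]
```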
     
    Phrogz, Mar 22, 2007
    #11
  12. On 3/22/07, Robert Klemme <> wrote:

    > Bullshit: just use select:
    >
    > >> "foo\nbar\n".select {|l| /^f/ =~ l}

    > => ["foo\n"]


    Or, since he's really trying to exclude lines which include /usr/local/lib:

    str.reject {|s| Regexp.new("/usr/local/lib").match(s)}

    --
    Rick DeNatale

    My blog on Ruby
    http://talklikeaduck.denhaven2.com/
     
    Rick DeNatale, Mar 23, 2007
    #12
  13. On Sat, Mar 24, 2007 at 03:39:42AM +0900, Rick DeNatale wrote:
    > On 3/22/07, Robert Klemme <> wrote:
    >
    > >Bullshit: just use select:
    > >
    > > >> "foo\nbar\n".select {|l| /^f/ =~ l}

    > >=> ["foo\n"]

    >
    > Or, since he's really trying to exclude lines which include /usr/local/lib:
    >
    > str.reject {|s| Regexp.new("/usr/local/lib").match(s)}


    Why does nobody seem to anchor regexps? I have seen so many documentation
    examples now which suggest that something like

    raise "Invalid data" unless /[A-Za-z0-9]+/ =~ data

    is a good example of data validation :-(
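
    To illustrate the gripe, a short sketch with a made-up hostile input.
    One extra wrinkle worth knowing: in Ruby, ^ and $ anchor to lines, not
    to the whole string, so \A and \z are the anchors you want for
    validation. (match? assumes Ruby 2.4 or later.)

```ruby
# Hypothetical input: a valid-looking first line followed by junk.
data = "abc123\n; rm -rf /"

# Unanchored: succeeds because "abc123" appears somewhere in the string.
unanchored_ok = /[A-Za-z0-9]+/.match?(data)

# ^ and $ only anchor to a line, so the first line alone satisfies this.
line_anchored_ok = /^[A-Za-z0-9]+$/.match?(data)

# \A and \z anchor to the start and end of the whole string.
fully_anchored_ok = /\A[A-Za-z0-9]+\z/.match?(data)

p [unanchored_ok, line_anchored_ok, fully_anchored_ok]  # => [true, true, false]
```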

    It would be a drift away from Perl, but I wonder if regexps should be
    anchored by default, and you'd have to add a flag to them to make them
    unanchored...

    Sorry, just my gripe of the day.

    Regards,

    Brian.
     
    Brian Candler, Mar 23, 2007
    #13
  14. Andrew Stewart

    Vince H&K Guest

    Brian Candler wrote:
    > On Sat, Mar 24, 2007 at 03:39:42AM +0900, Rick DeNatale wrote:
    >> On 3/22/07, Robert Klemme <> wrote:
    >>
    >>> Bullshit: just use select:
    >>>
    >>>>> "foo\nbar\n".select {|l| /^f/ =~ l}
    >>> => ["foo\n"]

    >> Or, since he's really trying to exclude lines which include /usr/local/lib:
    >>
    >> str.reject {|s| Regexp.new("/usr/local/lib").match(s)}

    >
    > Why does nobody seem to anchor regexps? I have seen so many documentation
    > examples now which suggest that something like
    >
    > raise "Invalid data" unless /[A-Za-z0-9]+/ =~ data
    >
    > is a good example of data validation :-(


    I agree that it isn't... However, most of my uses of regexps are to
    extract some pieces of data from a bigger text - such as

    `identify biniou.jpg` =~ /(\d+)x(\d+)/

    It would be a pain to have to unanchor those... Moreover, it is way
    better (in my opinion), to write something like

    raise "Invalid data" if /[^A-Za-z0-9]/ =~ data

    if you really require the full data to be alnum. Or, rather, if you
    want to be somehow more flexible (stripping whitespace and other):

    raise "Invalid data" unless /([A-Za-z0-9]+)/ =~ data

    data = $1 # Cleaning up data...
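
    Wrapped up as two small helpers, roughly (the method names are
    invented for this sketch). Note that the negated-class check accepts
    an empty string, since an empty string contains no forbidden
    character; add an explicit emptiness test if that matters to you.

```ruby
# Strict check: reject the data if any non-alphanumeric character appears.
def strictly_alnum?(data)
  data !~ /[^A-Za-z0-9]/
end

# Lenient check: pull out the first alphanumeric run, or nil if none.
def first_alnum_run(data)
  data[/[A-Za-z0-9]+/]
end

p strictly_alnum?("abc123")     # => true
p strictly_alnum?("abc 123")    # => false
p strictly_alnum?("")           # => true
p first_alnum_run("  abc123  ") # => "abc123"
p first_alnum_run("!!!")        # => nil
```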

    Cheers,

    Vincent

    --
    Vincent Fourmond, PhD student (not for long anymore)
    http://vincent.fourmond.neuf.fr/
     
    Vince H&K, Mar 23, 2007
    #14
  15. On 3/23/07, Brian Candler <> wrote:
    > On Sat, Mar 24, 2007 at 03:39:42AM +0900, Rick DeNatale wrote:
    > > On 3/22/07, Robert Klemme <> wrote:
    > >
    > > >Bullshit: just use select:
    > > >
    > > > >> "foo\nbar\n".select {|l| /^f/ =~ l}
    > > >=> ["foo\n"]

    > >
    > > Or, since he's really trying to exclude lines which include /usr/local/lib:
    > >
    > > str.reject {|s| Regexp.new("/usr/local/lib").match(s)}

    >
    > Why does nobody seem to anchor regexps?


    Well, the OP said "I would like to remove lines matching
    /usr/local/lib from the multiline string:", no mention of at the start
    of a line.

    I did actually consider pointing out that he might have meant at the
    beginning of each line, but he didn't so I didn't.
    --
    Rick DeNatale

    My blog on Ruby
    http://talklikeaduck.denhaven2.com/
     
    Rick DeNatale, Mar 23, 2007
    #15
  16. On 23 Mar 2007, at 21:29, Rick DeNatale wrote:
    > On 3/23/07, Brian Candler <> wrote:
    >> On Sat, Mar 24, 2007 at 03:39:42AM +0900, Rick DeNatale wrote:
    >> > On 3/22/07, Robert Klemme <> wrote:
    >> >
    >> > >Bullshit: just use select:
    >> > >
    >> > > >> "foo\nbar\n".select {|l| /^f/ =~ l}
    >> > >=> ["foo\n"]
    >> >
    >> > Or, since he's really trying to exclude lines which include /usr/

    >> local/lib:
    >> >
    >> > str.reject {|s| Regexp.new("/usr/local/lib").match(s)}


    That's neat. I should have realised that Enumerable's methods come
    into play with a multiline string. No need to mess about with \n
    characters -- let Ruby handle those.

    >> Why does nobody seem to anchor regexps?

    >
    > Well, the OP said "I would like to remove lines matching
    > /usr/local/lib from the multiline string:", no mention of at the start
    > of a line.
    >
    > I did actually consider pointing out that he might have meant at the
    > beginning of each line, but he didn't so I didn't.


    And you were right -- the '/usr/local/lib' sequence isn't at the
    start of the line. (Though the part between the start of the line
    and this sequence is predictable and could be worked into the regexp.)

    For those interested, the context of the problem was filtering stack
    traces produced under autotest:

    http://blog.airbladesoftware.com/2007/3/22/filtering-autotest-s-output

    Thanks for all the help. I greatly appreciate it.

    Regards,
    Andy Stewart
     
    Andrew Stewart, Mar 26, 2007
    #16
  17. Andrew Stewart

    Phrogz Guest

    On Mar 23, 12:39 pm, "Rick DeNatale" <> wrote:
    > str.reject {|s| Regexp.new("/usr/local/lib").match(s)}


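    Note that the above results in Regexp.new being called for each line
    in the string; not very efficient:

    require 'benchmark'   # lowercase: 'Benchmark' fails on case-sensitive filesystems

    s = "foobar"
    N = 1000000

    Benchmark.bmbm{ |x|
      # Builds a fresh Regexp object on every iteration.
      x.report( 'Regexp.new' ){
        N.times{ Regexp.new( "foobar" ) =~ s }
      }
      # A regexp literal without interpolation is compiled once.
      x.report( 'inline literal' ){
        N.times{ /foobar/ =~ s }
      }
      # Same regexp, hoisted into a local variable.
      x.report( 'as variable' ){
        r = /foobar/
        N.times{ r =~ s }
      }
    }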

    #=> Rehearsal --------------------------------------------------
    #=> Regexp.new 20.844000 1.875000 22.719000 ( 22.750000)
    #=> inline literal 0.906000 0.000000 0.906000 ( 0.906000)
    #=> as variable 1.094000 0.000000 1.094000 ( 1.094000)
    #=> ---------------------------------------- total: 24.719000sec
    #=>
    #=> user system total real
    #=> Regexp.new 21.234000 1.671000 22.905000 ( 22.922000)
    #=> inline literal 0.891000 0.000000 0.891000 ( 0.891000)
    #=> as variable 1.047000 0.000000 1.047000 ( 1.047000)



    If you just want to avoid having to escape the forward slashes in the
    literal, you can use the %r notation for a regexp literal. For
    example:
    str.reject{ |s| %r{/usr/local/lib} =~ s }
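
    And if the pattern really does have to be built at run time (say the
    prefix comes from a setting), you can still construct it once outside
    the block. A sketch, with a made-up prefix variable and sample string,
    written for a newer Ruby where you go through each_line and where
    match? exists (2.4+):

```ruby
# Hypothetical input and prefix; in real code the prefix might come from config.
trace  = "/usr/local/lib/ruby/foo.rb:1:in `bar'\ntest/unit/baz_test.rb:2:in `test_baz'\n"
prefix = "/usr/local/lib"

# Regexp.escape guards against metacharacters in the prefix;
# the Regexp object is created once, not once per line.
pattern = Regexp.new(Regexp.escape(prefix))

filtered = trace.each_line.reject { |line| pattern.match?(line) }.join

puts filtered
```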
     
    Phrogz, Mar 26, 2007
    #17
  18. On 3/26/07, Phrogz <> wrote:
    > On Mar 23, 12:39 pm, "Rick DeNatale" <> wrote:
    > > str.reject {|s| Regexp.new("/usr/local/lib").match(s)}

    >
    > Note that the above results in Regexp#new being called for each line
    > in the string; not very efficient:


    Quoting my old friend Kent Beck:

    "Make it run before you make it fast."

    --
    Rick DeNatale

    My blog on Ruby
    http://talklikeaduck.denhaven2.com/
     
    Rick DeNatale, Mar 27, 2007
    #18
  19. Andrew Stewart

    Phrogz Guest

    On Mar 27, 10:12 am, "Rick DeNatale" <> wrote:
    > On 3/26/07, Phrogz <> wrote:
    >
    > > On Mar 23, 12:39 pm, "Rick DeNatale" <> wrote:
    > > > str.reject {|s| Regexp.new("/usr/local/lib").match(s)}

    >
    > > Note that the above results in Regexp#new being called for each line
    > > in the string; not very efficient:

    >
    > Quoting my old friend Kent Beck:
    >
    > "Make it run before you make it fast."


    On the one hand:
    "Premature optimization is the root of all evil"[1]

    On the other hand, knowing that calling a constructor on each
    iteration of a block is much slower than using a literal that doesn't
    change, and that the literal is fewer characters to type as well, might
    be considered not so much premature optimization as...well...reasonable
    code planning.

    [1] http://hans.gerwitz.com/2004/08/12/premature-optimization-is-the-root-of-all-evil.html
     
    Phrogz, Mar 28, 2007
    #19