regexp exclusion search - find matches NOT ending with a string?

Discussion in 'Ruby' started by BrendanC, Jul 17, 2009.

  1. BrendanC

    BrendanC Guest

    I have the following text in a file:

    1 a1.html
    2 b.doc
    3 c.xml
    4 d.tiff
    5 e.jpeg
    6 f.html
    ....

    I need a regexp to match all lines except those ending in ".html" -
    iow - I want lines 2-5 above. I believe this may require a negative
    lookbehind match. I tried the following but Ruby (1.8) gives an
    undefined sequence error:

    $(?<!\.html) # <---- this seems to work with other engines

    Before you jump on the Ruby version: I also tested this at
    http://www.rubyxp.com/ and got an invalid expression error (FYI, that
    site tests with Ruby 1.9). Any ideas/alternatives?

    TIA,
    BC
    BrendanC, Jul 17, 2009
    #1

  2. Xavier Noria

    Xavier Noria Guest

    On Fri, Jul 17, 2009 at 2:35 AM, BrendanC wrote:

    > I have the following text in a file:
    >
    > 1 a1.html
    > 2 b.doc
    > 3 c.xml
    > 4 d.tiff
    > 5 e.jpeg
    > 6 f.html
    > ....
    >
    > I need a regexp to match all lines except those ending in ".html"


    The easiest path is to negate that it matches, say for instance:

    if filename !~ /\.html\z/
      # non-HTML here
    end
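
    To apply that test to every line of the file, one option is a small
    filter like the sketch below. The method name `non_html_lines` and the
    sample array are made up for illustration. Note that `\z` anchors at the
    very end of the string, so lines that still carry a trailing newline are
    chomped first; `\Z`, which also matches just before a final newline,
    would be an alternative.

```ruby
# Keep only the lines that do NOT end in ".html".
# `non_html_lines` is an illustrative name, not something from the thread.
def non_html_lines(lines)
  # chomp first: \z matches only at the absolute end of the string,
  # so a trailing "\n" would otherwise hide the ".html" suffix.
  lines.select { |line| line.chomp !~ /\.html\z/ }
end

sample = ["1 a1.html\n", "2 b.doc\n", "3 c.xml\n", "4 d.tiff\n",
          "5 e.jpeg\n", "6 f.html\n"]
puts non_html_lines(sample)
```

    With the sample data from the question this prints lines 2 through 5.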

    -- fxn
    Xavier Noria, Jul 17, 2009
    #2

  3. David A. Black

    Hi --

    On Fri, 17 Jul 2009, BrendanC wrote:

    > I have the following text in a file:
    >
    > 1 a1.html
    > 2 b.doc
    > 3 c.xml
    > 4 d.tiff
    > 5 e.jpeg
    > 6 f.html
    > ....
    >
    > I need a regexp to match all lines except those ending in ".html" -
    > iow - I want lines 2-5 above. I believe this may require a negative
    > lookbehind match. I tried the following but Ruby (1.8) gives an
    > undefined sequence error:
    >
    > $(?<!\.html) # <---- this seems to work with other engines
    >
    > Before you jump on the Ruby version: I also tested this at
    > http://www.rubyxp.com/ and got an invalid expression error (FYI, that
    > site tests with Ruby 1.9). Any ideas/alternatives?


    I would probably do:

    lines.reject {|line| line =~ /html$/ }
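
    One thing to watch with that pattern: /html$/ also rejects a line that
    merely ends in the letters "html", with no dot in front. If only a real
    ".html" extension should count, escaping the dot tightens it. A small
    sketch with made-up sample data (`reject_html` is a hypothetical helper
    name, not part of the suggestion above):

```ruby
# Drop every line that ends in ".html".
# In Ruby, $ matches at the end of a line, i.e. just before a "\n" too,
# so the lines do not need to be chomped here.
def reject_html(lines)
  lines.reject { |line| line =~ /\.html$/ }
end

lines = ["1 a1.html\n", "2 b.doc\n", "3 notes_html\n", "4 f.html\n"]

p lines.reject { |line| line =~ /html$/ }  # loose: drops "3 notes_html" as well
p reject_html(lines)                       # strict: keeps "3 notes_html"
```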


    David

    --
    David A. Black / Ruby Power and Light, LLC
    Ruby/Rails consulting & training: http://www.rubypal.com
    Now available: The Well-Grounded Rubyist (http://manning.com/black2)
    Training! Intro to Ruby, with Black & Kastner, September 14-17
    (More info: http://rubyurl.com/vmzN)
    David A. Black, Jul 17, 2009
    #3
  4. Robert Dober

    Robert Dober Guest

    On 7/17/09, BrendanC wrote:
    > I have the following text in a file:
    >
    > 1 a1.html
    > 2 b.doc
    > 3 c.xml
    > 4 d.tiff
    > 5 e.jpeg
    > 6 f.html
    > ....
    >
    > I need a regexp to match all lines except those ending in ".html" -
    > iow - I want lines 2-5 above. I believe this may require a negative
    > lookbehind match. I tried the following but Ruby (1.8) gives an
    > undefined sequence error:
    >
    > $(?<!\.html) # <---- this seems to work with other engines
    >
    > Before you jump on the Ruby version: I also tested this at
    > http://www.rubyxp.com/ and got an invalid expression error (FYI, that
    > site tests with Ruby 1.9). Any ideas/alternatives?

    Xavier and David gave good advice.
    If however you really have to have a matching regex

    %r($(?<!\.html)\z) # is that what you meant above?

    works fine. I believe that you can install Oniguruma on 1.8 as a gem
    for that purpose.
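
    For what it's worth, here is a rough sketch of the lookbehind approach
    on Ruby 1.9 or later, where the built-in engine accepts (?<!...). The
    `$` is left out because `\z` already anchors at the end of the string.
    The constant and method names are illustrative only, and the input is
    assumed to be chomped already.

```ruby
# Matches only at the end of the string, and only when the five
# characters before that point are not ".html".
NOT_HTML = /(?<!\.html)\z/

# Illustrative helper: select the names that the pattern matches.
def without_html(names)
  names.select { |name| name =~ NOT_HTML }
end

names = ["a1.html", "b.doc", "c.xml", "d.tiff", "e.jpeg", "f.html"]
p without_html(names)
```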
    HTH
    Robert



    --
    Toutes les grandes personnes ont d’abord été des enfants, mais peu
    d’entre elles s’en souviennent.

    All adults have been children first, but not many remember.

    [Antoine de Saint-Exupéry]
    Robert Dober, Jul 17, 2009
    #4
  5. Robert Dober

    Robert Dober Guest

    >
    > %r($(?<!\.html)\z) # is that what you meant above?

    Where does this $ come from?
    Robert Dober, Jul 17, 2009
    #5
  6. Rob Biedenharn

    On Jul 17, 2009, at 11:30 AM, Glenn Jackman wrote:

    > At 2009-07-16 08:59PM, "David A. Black" wrote:
    >> On Fri, 17 Jul 2009, BrendanC wrote:
    >>> $(?<!\.html) # <---- this seems to work with other engines

    >>
    >> I would probably do:
    >>
    >> lines.reject {|line| line =~ /html$/ }

    >
    > Is the Ruby regular expression syntax documented anywhere?
    >
    > I was attempting to use a look-behind, but it's not supported.
    >
    > The syntax is not documented in the RegExp rdocs, and I haven't seen a
    > site that spells out all the nitty-gritty details and pokes into the
    > dark corners.
    >
    > I'm looking for the Ruby equivalent of:
    > http://www.tcl.tk/man/tcl8.5/TclCmd/re_syntax.htm
    > http://docs.python.org/library/re.html#regular-expression-syntax
    > http://perldoc.perl.org/perlre.html
    >
    > Does it exist?
    >
    >
    > --
    > Glenn Jackman
    > Write a wise saying and your name will live forever. -- Anonymous
    >



    You could try the Regular Expressions section of the Standard Types
    chapter of Programming Ruby. Be advised that this is the online
    version of the 1st edition, which is now 8 years old. Since you seem
    to be using Ruby 1.8.x, the Regexp parts are going to be mostly the
    same.

    http://www.ruby-doc.org/docs/ProgrammingRuby/

    -Rob

    Rob Biedenharn http://agileconsultingllc.com
    Rob Biedenharn, Jul 17, 2009
    #6
  7. Robert Dober

    Robert Dober Guest

    On 7/17/09, Glenn Jackman wrote:
    > At 2009-07-16 08:59PM, "David A. Black" wrote:
    >> On Fri, 17 Jul 2009, BrendanC wrote:
    >> > $(?<!\.html) # <---- this seems to work with other engines

    >>
    >> I would probably do:
    >>
    >> lines.reject {|line| line =~ /html$/ }

    >
    > Is the Ruby regular expression syntax documented anywhere?
    >
    > I was attempting to use a look-behind, but it's not supported.
    >
    > The syntax is not documented in the RegExp rdocs, and I haven't seen a
    > site that spells out all the nitty-gritty details and pokes into the
    > dark corners.
    >
    > I'm looking for the Ruby equivalent of:
    > http://www.tcl.tk/man/tcl8.5/TclCmd/re_syntax.htm
    > http://docs.python.org/library/re.html#regular-expression-syntax
    > http://perldoc.perl.org/perlre.html
    >
    > Does it exist?

    For Oniguruma I found this most helpful
    http://manual.macromates.com/en/regular_expressions#regular_expressions
    >
    >
    > --
    > Glenn Jackman
    > Write a wise saying and your name will live forever. -- Anonymous

    Nice one

    Cheers
    Robert
    Robert Dober, Jul 17, 2009
    #7
  8. 7stud --

    7stud -- Guest

    BrendanC wrote:
    > I have the following text in a file:
    >
    > 1 a1.html
    > 2 b.doc
    > 3 c.xml
    > 4 d.tiff
    > 5 e.jpeg
    > 6 f.html
    > ....
    >
    > I need a regexp to match all lines except those ending in ".html" -
    > iow - I want lines 2-5 above.


    Some alternate means to the same end:

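    IO.foreach("data.txt") do |line|

      # 1: compare the text after the last dot
      if line.chomp.split(".")[-1] != "html"
        puts line
      end

      # 2: four characters starting five from the end (skips the newline)
      if line[-5, 4] != "html"
        print line
      end

      # 3: the same slice as a range; stop at -2 so the newline is excluded
      if line.slice(-5..-2) != "html"
        print line
      end

      puts
    end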

    --output:--
    2 b.doc
    2 b.doc
    2 b.doc

    3 c.xml
    3 c.xml
    3 c.xml

    4 d.tiff
    4 d.tiff
    4 d.tiff

    5 e.jpeg
    5 e.jpeg
    5 e.jpeg
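
    Two more non-regexp variants in the same spirit, sketched under the
    assumption that each line is "number, space, file name" as in the
    sample. String#end_with? and File.extname are core Ruby methods
    (end_with? needs 1.8.7 or newer, if memory serves); the helper names
    are invented for this example.

```ruby
# Variant A: plain suffix test on the chomped line.
def reject_by_suffix(lines)
  lines.reject { |line| line.chomp.end_with?(".html") }
end

# Variant B: let File.extname pull the extension off the last field.
# to_s guards against blank lines, where split returns an empty array.
def reject_by_extname(lines)
  lines.reject { |line| File.extname(line.split.last.to_s) == ".html" }
end

lines = ["1 a1.html\n", "2 b.doc\n", "3 c.xml\n", "6 f.html\n"]
puts reject_by_suffix(lines)
puts reject_by_extname(lines)
```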
    --
    Posted via http://www.ruby-forum.com/.
    7stud --, Jul 18, 2009
    #8
  9. Brian Candler

    Glenn Jackman wrote:
    > Is the Ruby regular expression syntax documented anywhere?
    >
    > I was attempting to use a look-behind, but it's not supported.
    >
    > The syntax is not documented in the RegExp rdocs


    In my opinion, documentation is Ruby's weakest aspect by far - and the
    deficiency has gotten substantially worse with Ruby 1.9.

    The best available information is in third-party books, which presumably
    have been reverse-engineered from the source code. I fairly often resort
    to irb to check that behaviour is what I want, and have on occasion had
    to resort to reading the source.
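
    As an example of that kind of check, something like the following could
    be pasted into irb to find out whether the running interpreter accepts
    lookbehind at all. This is only a sketch: `lookbehind_supported?` is a
    made-up name, and it relies on Regexp.new raising RegexpError for a
    pattern the engine cannot compile.

```ruby
# Returns true if this Ruby's regexp engine compiles a negative lookbehind.
def lookbehind_supported?
  # Single quotes keep the backslashes intact for the regexp compiler.
  Regexp.new('(?<!\.html)\z')
  true
rescue RegexpError
  false
end

puts "negative lookbehind supported: #{lookbehind_supported?}"
```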
    --
    Posted via http://www.ruby-forum.com/.
    Brian Candler, Jul 18, 2009
    #9
