(Maybe) a simple question about regex

Discussion in 'Ruby' started by Sam Kong, Mar 24, 2005.

  1. Sam Kong Guest

    Hello!

    I think that I am missing a very simple concept about regex.

    s = '0123456789'
    s.scan(/\d\d/) #-> ["01", "23", "45", "67", "89"]

    Now I want to exclude "45".
    How can I express it in the regex?
    When it's only one character, I can use ^.
    But for 2 characters, I don't think I can use it.

    What I want is:

    s = '0123456789'
    s.scan(some_regex) #-> ["01", "23", "67", "89"]

    What should some_regex be?

    Can somebody help me?

    Sam
    Sam Kong, Mar 24, 2005
    #1

  2. Assaph Mehr Guest


    > s = '0123456789'
    > s.scan(/\d\d/) #-> ["01", "23", "45", "67", "89"]
    >
    > Now I want to exclude "45".
    > How can I express it in the regex?
    > When it's only one character, I can use ^.
    > But for 2 characters, I don't think I can use it.
    >
    > What I want is:
    >
    > s = '0123456789'
    > s.scan(some_regex) #-> ["01", "23", "67", "89"]


    Negative lookahead:
    s.scan /(?!4|5)\d\d/
    Note the OR sign ('|') between the digits, otherwise it would produce:
    ["01", "23", "56", "78"]

    You need to tune it to your exact domain.

    Cheers,
    Assaph
    Assaph Mehr, Mar 24, 2005
    #2

  3. Carlos Guest

    [Sam Kong <>, 2005-03-24 02.49 CET]
    > Hello!
    >
    > I think that I am missing a very simple concept about regex.
    >
    > s = '0123456789'
    > s.scan(/\d\d/) #-> ["01", "23", "45", "67", "89"]
    >
    > Now I want to exclude "45".
    > How can I express it in the regex?
    > When it's only one character, I can use ^.
    > But for 2 characters, I don't think I can use it.


    You can use a "negative lookahead assertion":

    s.scan(/(?!45)\d\d/)

    This means, at every point the regex tries to match, "if the next two
    characters aren't "45", match \d\d".

    HTH.
    --
    Carlos, Mar 24, 2005
    #3
  4. Jason Sweat Guest

    On Thu, 24 Mar 2005 10:49:49 +0900, Sam Kong <> wrote:
    > Hello!
    >
    > I think that I am missing a very simple concept about regex.
    >
    > s = '0123456789'
    > s.scan(/\d\d/) #-> ["01", "23", "45", "67", "89"]
    >
    > Now I want to exclude "45".
    > How can I express it in the regex?
    > When it's only one character, I can use ^.
    > But for 2 characters, I don't think I can use it.
    >
    > What I want is:
    >
    > s = '0123456789'
    > s.scan(some_regex) #-> ["01", "23", "67", "89"]
    >
    > What should some_regex be?


    You can use a negative assertion to say you want to skip "45", but the
    scan will then bump forward one character and you will end up with the
    last matches being "56" and "78":

    >> s.scan(/(?!45)\d\d/)

    => ["01", "23", "56", "78"]

    So with a little uglier assertion, you can say:

    >> s.scan(/(?!45|5)\d\d/)

    => ["01", "23", "67", "89"]

    and get what you specified, but though it works for your toy case, I
    would be worried that this might not extrapolate out to your real goal
    well.

    HTH

    Regards,
    Jason
    http://blog.casey-sweat.us/
    Jason Sweat, Mar 24, 2005
    #4
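
To make the "bump forward" in post #4 visible, here is a minimal sketch that records where each match starts. It uses only String#scan in block form and Regexp.last_match; the variable name `offsets` is made up for illustration.

```ruby
s = '0123456789'

# Record each match together with the offset where it starts.
offsets = []
s.scan(/(?!45)\d\d/) { |m| offsets << [Regexp.last_match.begin(0), m] }

offsets.each { |pos, m| puts "offset #{pos}: #{m}" }
# offset 0: 01
# offset 2: 23
# offset 5: 56
# offset 7: 78
```

The lookahead rejects a match at offset 4, so the engine retries at offset 5 and the pairs after that point are no longer aligned with the original ones.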
  5. What they said, but also if you can be more precise about your real
    problem, we might be able to better model a solution. You might find
    matching the expression you want and then scanning it to be more
    flexible for example.


    On Thu, 24 Mar 2005 11:09:51 +0900, Assaph Mehr <> wrote:
    >
    > > s = '0123456789'
    > > s.scan(/\d\d/) #-> ["01", "23", "45", "67", "89"]
    > >
    > > Now I want to exclude "45".
    > > How can I express it in the regex?
    > > When it's only one character, I can use ^.
    > > But for 2 characters, I don't think I can use it.
    > >
    > > What I want is:
    > >
    > > s = '0123456789'
    > > s.scan(some_regex) #-> ["01", "23", "67", "89"]

    >
    > Negative lookahead:
    > s.scan /(?!4|5)\d\d/
    > Note the OR sign ('|') between the digits, otherwise it would produce:
    > ["01", "23", "56", "78"]
    >
    > You need to tune it to your exact domain.
    >
    > Cheers,
    > Assaph
    >
    >
    Patrick Hurley, Mar 24, 2005
    #5
  6. "Assaph Mehr" <> wrote in message
    news:...
    >
    > > s = '0123456789'
    > > s.scan(/\d\d/) #-> ["01", "23", "45", "67", "89"]
    > >
    > > Now I want to exclude "45".
    > > How can I express it in the regex?
    > > When it's only one character, I can use ^.
    > > But for 2 characters, I don't think I can use it.
    > >
    > > What I want is:
    > >
    > > s = '0123456789'
    > > s.scan(some_regex) #-> ["01", "23", "67", "89"]

    >
    > Negative lookahead:
    > s.scan /(?!4|5)\d\d/
    > Note the OR sign ('|') between the digits, otherwise it would produce:
    > ["01", "23", "56", "78"]


    But:

    >> s = '01234567894657'

    => "01234567894657"
    >> s.scan /(?!4|5)\d\d/

    => ["01", "23", "67", "89", "65"]
    >> s.scan /\d\d/

    => ["01", "23", "45", "67", "89", "46", "57"]

    IOW, you lose "46" and "57".

    I prefer a non-RE solution in these cases, as it's simpler:

    >> s.scan(/\d\d/).reject {|x| "45" == x}

    => ["01", "23", "67", "89", "46", "57"]

    Otherwise the RE becomes really complex if you want to get it right, if
    it's possible at all (see other postings).

    Kind regards

    robert
    Robert Klemme, Mar 24, 2005
    #6
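
For readers who still want a regex-only variant that keeps the pairs aligned, one possible sketch is to let the unwanted pair match without capturing it and capture every other pair. This is not a technique proposed in the thread; the helper name `pairs_except` is hypothetical.

```ruby
# Hypothetical helper: returns aligned two-digit pairs, skipping `unwanted`.
def pairs_except(s, unwanted)
  # The alternation tries the unwanted pair first; it is consumed but not
  # captured, so its group is nil. Every other pair lands in group 1.
  re = /#{Regexp.escape(unwanted)}|(\d\d)/
  s.scan(re).flatten.compact
end

short  = pairs_except('0123456789', '45')
longer = pairs_except('01234567894657', '45')

p short   # ["01", "23", "67", "89"]
p longer  # ["01", "23", "67", "89", "46", "57"]
```

Because the unwanted pair is consumed rather than skipped by a lookahead, the scan position stays on pair boundaries, which gives the same result as the reject-based version in post #6 for these inputs.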
  7. Sam Kong Guest

    Thank you and the other posters for the answers.
    Actually, s.scan(/(?!45)\d\d/) is sufficient for my real problem.

    What I was trying to solve was
    extracting URLs from an HTML source that includes a list of sites.
    They are formatted like <a href="something.html">.
    But I wanted to exclude <a href="index.html"> from the list.
    So (?!index.html) will do.
    Actually, my toy case was not well defined (I realized this later), and
    thus it required more complex solutions like your second case,
    s.scan(/(?!45|5)\d\d/).

    I think a non-RE solution would be better, as Mr. Robert Klemme said.
    But I wanted to learn some RE.
    But I wanted to learn some RE.

    Thanks.
    Sam
    Sam Kong, Mar 24, 2005
    #7
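
The use case described in post #7 might look roughly like the sketch below. The sample `html` string is invented for illustration, and the pattern assumes the links are written exactly as <a href="...">, as the post states; real-world HTML is usually messier.

```ruby
# Hypothetical sample input, not taken from the thread.
html = '<a href="a.html">A</a> <a href="index.html">Home</a> ' \
       '<a href="stuff.html">Stuff</a>'

# Capture each href value unless it is exactly index.html.
# The dot is escaped so that it only matches a literal ".".
links = html.scan(/<a href="(?!index\.html")([^"]+)"/).flatten

p links  # ["a.html", "stuff.html"]
```

Here the lookahead sits at a fixed position right after the opening quote, so there is no alignment problem like the one in the two-digit toy case.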
  8. On Thu, 24 Mar 2005 18:09:50 +0900, Sam Kong <> wrote:
    > To extract url's from an html source which includes list of sites.
    > They are formatted like <a href="something.html">.
    > But I wanted to exclude <a href="index.html"> from the list.
    > So (?!index.html) will do.



    does this help?

    ary=%w(a.html index.html other.txt evil.html.exe stuff.html)
    ary.select{|s| s =~ /\A(?!index).*\.html\z/ } #=> ["a.html", "stuff.html"]


    --
    Simon Strandgaard
    Simon Strandgaard, Mar 24, 2005
    #8
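
One thing to keep in mind with the pattern in post #8: (?!index) rejects every name that merely starts with "index". If only index.html itself should be excluded, the lookahead can be anchored, as in this sketch. The extra entry "indexes.html" and the names `loose` and `strict` are added here for illustration.

```ruby
ary = %w(a.html index.html indexes.html other.txt evil.html.exe stuff.html)

# Rejects anything beginning with "index", including indexes.html.
loose = ary.select { |s| s =~ /\A(?!index).*\.html\z/ }

# Rejects only the exact name index.html.
strict = ary.select { |s| s =~ /\A(?!index\.html\z).*\.html\z/ }

p loose   # ["a.html", "stuff.html"]
p strict  # ["a.html", "indexes.html", "stuff.html"]
```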
  9. Csaba Henk Guest

    On 2005-03-24, Sam Kong <> wrote:
    > What I was trying to solve was...
    > To extract url's from an html source which includes list of sites.
    > They are formatted like <a href="something.html">.
    > But I wanted to exclude <a href="index.html"> from the list.
    > So (?!index.html) will do.
    > Actually my toy case was not well-defined (I realized this later) and
    > thus it required more complex solutions like your second case -
    > s.scan(/(?!45|5)\d\d/) .


    Why don't you use a dedicated HTML parser? E.g. there's htmltokenizer,
    available at RubyForge, quite lightweight and very easy to use, but
    there are others, of course.

    > I think non-RE solution would be better like Mr. Robert Klemme said.
    > But I wanted to learn some RE.


    This thread was useful, I admit :)

    Csaba
    Csaba Henk, Mar 25, 2005
    #9