(Maybe) a simple question about regex

Sam Kong · Mar 24, 2005

Hello!

I think that I am missing a very simple concept about regex.

s = '0123456789'
s.scan(/\d\d/) #-> ["01", "23", "45", "67", "89"]

Now I want to exclude "45".
How can I express it in the regex?
When it's only one character, I can use ^ in a character class (e.g. [^4]).
But for 2 characters, I don't think I can use it.

What I want is:

s = '0123456789'
s.scan(some_regex) #-> ["01", "23", "67", "89"]

What should some_regex be?

Can somebody help me?

Sam

Assaph Mehr · Mar 24, 2005

Negative lookahead:
s.scan /(?!4|5)\d\d/
Note the OR sign ('|') between the digits, otherwise it would produce:
["01", "23", "56", "78"]

You need to tune it to your exact domain.

Cheers,
Assaph

Carlos · Mar 24, 2005

You can use a "negative lookahead assertion":

s.scan(/(?!45)\d\d/)

This means, at every point the regex tries to match, "if the next two
characters aren't "45", match \d\d".
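A minimal sketch of how the assertion behaves, using made-up input strings that are not from the thread. The lookahead consumes nothing; it only vetoes a match at positions where the next two characters are "45".

```ruby
re = /(?!45)\d\d/

# At position 0 the next two characters are "45", so the match is vetoed
# there; the engine retries at position 1 and takes "50".
first = '450'[re]

# Here position 0 starts with "14", which is allowed, so "14" matches even
# though "45" appears one character later.
second = '145'[re]

# With nothing but "45" there is no position where \d\d can match.
third = '45'[re]

p first, second, third
```

This is why the scan can come back with pairs that straddle the excluded one: the check is made per starting position, not per pair.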

HTH.

Jason Sweat · Mar 24, 2005

You can use a negative assertion to say you want to skip "45", but the
scan will then bump forward one character, and you will end up with the
last matches being "56" and "78":
=> ["01", "23", "56", "78"]

So with a slightly uglier assertion, you can say:
=> ["01", "23", "67", "89"]

and get what you specified. Though it works for your toy case, I
would be worried that this might not extrapolate well to your real
goal.
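The code lines in this post did not survive, so the following is only a sketch that reproduces the two results shown. It uses patterns that appear elsewhere in the thread: /(?!45)\d\d/ from Carlos and /(?!45|5)\d\d/ as quoted later by Sam. Whether these are exactly what Jason typed is an assumption.

```ruby
s = '0123456789'

# Plain negative lookahead: "45" is refused at index 4, but the scan
# resumes at index 5 and pairs up "56" and "78" instead.
shifted = s.scan(/(?!45)\d\d/)

# Also refusing any match that starts with "5" keeps the pairs aligned
# for this particular string.
aligned = s.scan(/(?!45|5)\d\d/)

# Record where each match starts to make the shift visible.
starts = []
s.scan(/(?!45)\d\d/) { starts << Regexp.last_match.begin(0) }

p shifted, aligned, starts
```

The start offsets jump from 2 to 5, which is the one-character bump described above.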

HTH

Regards,
Jason
http://blog.casey-sweat.us/

Patrick Hurley · Mar 24, 2005

What they said, but also, if you can be more precise about your real
problem, we might be able to better model a solution. You might find
matching the expression you want and then scanning it to be more
flexible, for example.

Robert Klemme · Mar 24, 2005

Assaph Mehr said:
Negative lookahead:
s.scan /(?!4|5)\d\d/
Note the OR sign ('|') between the digits, otherwise it would produce:
["01", "23", "56", "78"]

But:

s = '01234567894657'
s.scan /(?!4|5)\d\d/ #=> ["01", "23", "67", "89", "65"]
s.scan /\d\d/ #=> ["01", "23", "45", "67", "89", "46", "57"]

IOW, you lose "46" and "57".

I prefer a non-RE solution in these cases, as it's simpler:
=> ["01", "23", "67", "89", "46", "57"]

Otherwise the RE becomes really complex if you want to get it right, if it's
possible at all (see other postings).
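The code for the non-RE approach is missing from the post. One way to get the result shown, offered as a guess rather than as Robert's actual code, is to scan for all pairs and filter afterwards:

```ruby
s = '01234567894657'

# Split into pairs first, then drop the unwanted pair with ordinary
# Array methods; the regex itself stays trivial.
pairs = s.scan(/\d\d/)
kept  = pairs.reject { |pair| pair == '45' }

p kept
```

Because the pairing is fixed before anything is excluded, "46" and "57" survive.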

Kind regards

robert

Sam Kong · Mar 24, 2005

Thank you and the other posters for the answers.
Actually, s.scan(/(?!45)\d\d/) is sufficient for my real problem.

What I was trying to solve was
extracting URLs from an HTML source that includes a list of sites.
They are formatted like <a href="something.html">.
But I wanted to exclude <a href="index.html"> from the list.
So (?!index.html) will do.
Actually, my toy case was not well defined (I realized this later), and
thus it required more complex solutions like your second case,
s.scan(/(?!45|5)\d\d/).

I think a non-RE solution would be better, as Mr. Robert Klemme said.
But I wanted to learn some RE.
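For the URL case described above, a rough sketch with a made-up HTML snippet, assuming every link looks exactly like <a href="...">. Here the dot is escaped and the closing quote is included in the lookahead; the unescaped (?!index.html) would also reject names such as index.html.bak or indexxhtml.

```ruby
# Hypothetical sample input; the real page is not shown in the thread.
html = <<~HTML
  <a href="index.html">Home</a>
  <a href="ruby.html">Ruby</a>
  <a href="index2.html">Second index</a>
  <a href="perl.html">Perl</a>
HTML

# Capture the href value unless it is exactly index.html.
# scan returns one array per match when the pattern has a group,
# so flatten turns [["ruby.html"], ...] into a flat list.
links = html.scan(/<a href="(?!index\.html")([^"]+)"/).flatten

p links
```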

Thanks.
Sam

Simon Strandgaard · Mar 24, 2005

does this help?

ary=%w(a.html index.html other.txt evil.html.exe stuff.html)
ary.select{|s| s =~ /\A(?!index).*\.html\z/ } #=> ["a.html", "stuff.html"]
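One caveat with this pattern: (?!index) rejects every name that begins with "index", not just index.html. If only the exact file should be dropped, a tighter variant might look like the sketch below (the extra indexes.html entry is added here for illustration and is not in the original list).

```ruby
ary = %w(a.html index.html indexes.html other.txt evil.html.exe stuff.html)

# Original pattern: anything starting with "index" is rejected.
loose = ary.select { |s| s =~ /\A(?!index).*\.html\z/ }

# Tighter pattern: only the exact name index.html is rejected.
strict = ary.select { |s| s =~ /\A(?!index\.html\z).*\.html\z/ }

p loose, strict
```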

Csaba Henk · Mar 25, 2005

Sam Kong said:
To extract url's from an html source which includes list of sites.
They are formatted like <a href="something.html">.
But I wanted to exclude <a href="index.html"> from the list.

Why don't you use a dedicated HTML parser? E.g. there's htmltokenizer,
available at RubyForge, quite lightweight and very easy to use, but
there are others, of course.

Sam Kong said:
I think non-RE solution would be better like Mr. Robert Klemme said.
But I wanted to learn some RE.

This thread was useful, I admit.

Csaba
