regular expression match and exclude

Azalar ---

I am parsing a web page full of image links that also contain links to
the thumbnails for those images.

Here is my test data:

galleries/image1.jpg
galleries/image1thumb.jpg
galleries/image2.jpg
galleries/image2thumb.jpg
galleries/image3.jpg
galleries/image3thumb.jpg

If I use this expression:
/galleries.*(?=thumb).*jpg/

the results are just those lines containing the word thumb.

What I want to do is invert this and return only the lines that
DON'T contain the word thumb.
 
Jesús Gabriel y Galán

If you have all the links in an array, I would do the opposite: match
the ones that contain thumb, and reject those from the array:

irb(main):001:0> links = %w{galleries/image1.jpg
galleries/image1thumb.jpg galleries/image2.jpg
galleries/image2thumb.jpg galleries/image3.jpg
galleries/image3thumb.jpg}
=> ["galleries/image1.jpg", "galleries/image1thumb.jpg",
"galleries/image2.jpg", "galleries/image2thumb.jpg",
"galleries/image3.jpg", "galleries/image3thumb.jpg"]

irb(main):007:0> links.reject {|l| l =~ /thumb/}
=> ["galleries/image1.jpg", "galleries/image2.jpg", "galleries/image3.jpg"]

You can be as specific as you need with the regexp:

irb(main):008:0> links.reject {|l| l =~ /galleries\/.*thumb.*jpg/}
=> ["galleries/image1.jpg", "galleries/image2.jpg", "galleries/image3.jpg"]
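The same filtering can also be written without a block. A minimal sketch, assuming Ruby 2.3 or later for Array#grep_v (it did not exist when this thread was written); the variable names are only illustrative, and the pattern is anchored to the end of the name as an extra assumption:

```ruby
links = %w[
  galleries/image1.jpg galleries/image1thumb.jpg
  galleries/image2.jpg galleries/image2thumb.jpg
  galleries/image3.jpg galleries/image3thumb.jpg
]

# grep_v returns the elements that do NOT match the pattern (Ruby 2.3+).
full_size = links.grep_v(/thumb\.jpg\z/)

# partition splits the array into [matching, non-matching] in a single pass,
# which is handy when the thumbnails are needed as well.
thumbs, images = links.partition { |l| l =~ /thumb\.jpg\z/ }

p full_size  # ["galleries/image1.jpg", "galleries/image2.jpg", "galleries/image3.jpg"]
p thumbs     # ["galleries/image1thumb.jpg", "galleries/image2thumb.jpg", "galleries/image3thumb.jpg"]
```

partition is available in older Rubies too, so it may be the safer choice if the version is unknown.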


Hope this helps,

Jesus.
 
Azalar ---

I should mention that "image" isn't a fixed word in this case; I just
used it as an example.
It represents whatever the name of the image is.

So the test data could be:

galleries/blah.jpg
galleries/blahthumb.jpg
galleries/landscape.jpg
galleries/landscapethumb.jpg
galleries/foo.jpg
galleries/foothumb.jpg
 
Axel Etzold

-------- Original Message --------
Date: Mon, 8 Sep 2008 02:46:12 +0900
From: Azalar --- (e-mail address removed)
To: (e-mail address removed)
Subject: Re: regular expression match and exclude
I should mention that image isn't a fixed word in this case I just used
that as an example.
It represents whatever the name of the image is

So the test data could be..

galleries/blah.jpg
galleries/blahthumb.jpg
galleries/landscape.jpg
galleries/landscapethumb.jpg
galleries/foo.jpg
galleries/foothumb.jpg

Dear Azalar,

To search for a text pattern that is not followed by something else, you need
regular expressions with negative lookahead:

http://www.regular-expressions.info/lookaround.html

As far as I know, this is not supported by Ruby's 1.8.x regexps, but there is a special library, Oniguruma,
with Ruby bindings available for this.

http://oniguruma.rubyforge.org/

Best regards,

Axel
 
Robert Klemme

2008/9/7 Axel Etzold said:
to achieve searching for some text pattern that is not followed by something else, you need
regular expressions with negative lookahead

http://www.regular-expressions.info/lookaround.html

As far as I know, this is not supported by Ruby's 1.8.x regexps,

Negative lookahead *is* supported by that (and earlier) versions.

The problem is that, with the test data given, the negative lookahead
is difficult to get right:

irb(main):016:0> %w{foo.jpg foothumb.jpg}.each do |s|
irb(main):017:1* p [s, /\A\w+(?!thumb)\.jpg\z/ =~ s]
irb(main):018:1> end
["foo.jpg", 0]
["foothumb.jpg", 0]
=> ["foo.jpg", "foothumb.jpg"]

I am not saying that it won't work but off the top of my head I do not
have a solution that works. In any case, it's easier to exclude
matches outside the regular expression engine. One way would be to
make thumb a capturing group and check whether the group is present,
like

irb(main):022:0> %w{foo.jpg foothumb.jpg}.each do |s|
irb(main):023:1* p [s, /\A\w+?(thumb)?\.jpg\z/ =~ s, $1]
irb(main):024:1> end
["foo.jpg", 0, nil]
["foothumb.jpg", 0, "thumb"]
=> ["foo.jpg", "foothumb.jpg"]

and use that criterion for exclusion.

Now you can meditate on why the negative lookahead did not work. :)
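For readers who would rather not meditate, here is a sketch of one explanation. It is not part of the original reply. The lookbehind variant is a swapped-in technique that, as far as I know, needs the regexp engine of Ruby 1.9 or later:

```ruby
names = %w[foo.jpg foothumb.jpg]

# Lookahead: once \w+ has consumed "foothumb", the text that follows is
# ".jpg". That is not "thumb", so (?!thumb) succeeds and both names match.
lookahead = /\A\w+(?!thumb)\.jpg\z/

# Lookbehind (assumed Ruby 1.9+): asserts that the text immediately
# before ".jpg" is not "thumb", which is the condition actually wanted.
lookbehind = /\A\w+(?<!thumb)\.jpg\z/

p names.select { |s| s =~ lookahead }   # ["foo.jpg", "foothumb.jpg"]
p names.select { |s| s =~ lookbehind }  # ["foo.jpg"]
```

The assertion has to look at the text on the same side of the match position as the word being excluded; here "thumb" sits before ".jpg", not after it.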

Kind regards

robert
 
Peña, Botp

From: Azalar --- (e-mail address removed)
# The array is created when i use the scan method so if i used
# reject it would become a two pass process which is what i am doing
# anyway so I was wondering if regular expressions had built in
# support for this.

well if it's thumb for thumbs sake, we can be stubborn about it :)

irb(main):036:0> re2
=> /galleries.*?([^t][^h][^u][^m][^b]).jpg/
irb(main):038:0> g.select{|x| x=~re2 }
=> ["galleries/image1.jpg", "galleries/image2.jpg", "galleries/image3.jpg"]

kind regards -botp
 
Robert Klemme

2008/9/8 Caleb Clausen said:
irb(main):036:0> re2
=> /galleries.*?([^t][^h][^u][^m][^b]).jpg/
irb(main):038:0> g.select{|x| x=~re2 }
=> ["galleries/image1.jpg", "galleries/image2.jpg", "galleries/image3.jpg"]

Unfortunately this way will fail to match (for instance) "bomb.jpg".
If you want to go this route, you need to do something like this:
(untested!)

/^galleries.*([^t]humb|[^h]umb|[^u]mb|[^m]b|[^b])\.jpg$/ #bleah

Patrick said:
/^galleries\/((?!thumb).)+\.jpg$/

Ah, brilliant! But it doesn't quite work. It fails to match
"thumb_foo.jpg", which probably should match here. A simple
modification should fix it, though:

/^galleries\/((?!thumb\.jpg$).)+\.jpg$/

There. That's a pretty nice pure regexp.

Now just make the group non capturing and you're a tad more efficient. :)

%r{\Agalleries/(?:(?!thumb\.jpg\z).)+\.jpg\z}
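A quick check of that last pattern against names mentioned in this thread, as a minimal sketch. The constant names, the sample HTML, and the unanchored IN_PAGE variant for use with scan are assumptions made for illustration, not something posted above:

```ruby
# The anchored pattern from the thread: every character after "galleries/"
# is accepted only if "thumb.jpg" does not start at that position.
NOT_THUMB = %r{\Agalleries/(?:(?!thumb\.jpg\z).)+\.jpg\z}

samples = %w[
  galleries/blah.jpg galleries/blahthumb.jpg
  galleries/thumb_foo.jpg galleries/bomb.jpg
  galleries/foothumb.jpg
]

kept = samples.select { |s| s =~ NOT_THUMB }
p kept  # ["galleries/blah.jpg", "galleries/thumb_foo.jpg", "galleries/bomb.jpg"]

# Hypothetical single-pass variant for String#scan on page text: the \A and
# \z anchors are dropped and the name is limited to word characters and "-".
IN_PAGE = %r{galleries/(?:(?!thumb\.jpg)[\w-])+\.jpg}

html = '<a href="galleries/image1.jpg"><img src="galleries/image1thumb.jpg"></a>' \
       '<a href="galleries/bomb.jpg"><img src="galleries/bombthumb.jpg"></a>'

found = html.scan(IN_PAGE)
p found  # ["galleries/image1.jpg", "galleries/bomb.jpg"]
```

Because the group is non-capturing, scan returns the whole matches rather than arrays of captures, which is what makes the one-pass use possible.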

Cheers

robert


--
use.inject do |as, often| as.you_can - without end
 
