Does string contain A, and if so, does a section of string contain B

Jason Carlton

Tricky subject, sorry.

I'm wanting to check a textarea to see if it contains "<img", and if
so, does the section between "<img" and the following ">" contain
"mydomain.com".

This is particularly tricky since there can be more than one
"<img...>" in the field.

I can do this in Perl easily enough:

while ($comment =~ /(<img[^>]+?>)/sgxi) {
if ($1 =~ /mydomain\.com/gi) {
# do whatever
}
}


But how do I create something similar in Javascript?

TIA,

Jason
 
Evertjan.

Jason Carlton wrote on 07 dec 2009 in comp.lang.javascript:
Tricky subject, sorry.

No it is not.

I'm wanting to check a textarea to see if it contains "<img", and if
so, does the section between "<img" and the following ">" contain
"mydomain.com".

This is particularly tricky since there can be more than one
"<img...>" in the field.

I can do this in Perl easily enough:

while ($comment =~ /(<img[^>]+?>)/sgxi) {
if ($1 =~ /mydomain\.com/gi) {
# do whatever
}
}

If you think that is easy, look at JavaScript!
But how do I create something similar in Javascript?


var booleanResult = /<img[^>]+mydomain\.com[^>]*>/i.test(str)
 
Jason Carlton

Jason Carlton wrote on 07 dec 2009 in comp.lang.javascript:
Tricky subject, sorry.

No it is not.
I'm wanting to check a textarea to see if it contains "<img", and if
so, does the section between "<img" and the following ">" contain
"mydomain.com".
This is particularly tricky since there can be more than one
"<img...>" in the field.
I can do this in Perl easily enough:
while ($comment =~ /(<img[^>]+?>)/sgxi) {
  if ($1 =~ /mydomain\.com/gi) {
    # do whatever
  }
}

Do you think that is easy, look at javascript!
But how do I create something similar in Javascript?

var booleanResult = /<img[^>]+mydomain\.com[^>]*>/i.test(str)


Awesome! Thanks, Evertjan, that is easy. I couldn't find anything on
the i.test() function you used, though. Is there a different name for
that function?

Similarly, how do I do the opposite and test if any of the "<img...>"
tags do NOT contain mydomain.com?
 
Thomas 'PointedEars' Lahn

Jason said:
Evertjan. said:
Jason said:
I can do this in Perl easily enough:

while ($comment =~ /(<img[^>]+?>)/sgxi) {
if ($1 =~ /mydomain\.com/gi) {
# do whatever
}
}

I presume this can be done better in Perl, too.
[...]
var booleanResult = /<img[^>]+mydomain\.com[^>]*>/i.test(str)

That is not equivalent to what you are doing in Perl above, though.
Incidentally, you should not assume people know other languages than those
discussed in the target newsgroup, although it is often the case. When in
doubt, explain what the code in the other language does.
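For readers who do not know Perl: the loop quoted above walks through every "<img ...>" tag in $comment and runs the body for each tag that mentions mydomain.com. A rough JavaScript counterpart, as a sketch only, could use exec() with the g flag. The function name imgTagsMentioning and its parameters are made up for illustration and do not come from the thread.

```javascript
// Hypothetical helper: returns every <img ...> tag in `comment`
// whose text matches `domainPattern`.
function imgTagsMentioning(comment, domainPattern) {
  // With the g flag, exec() resumes after the previous match on each call.
  const tagPattern = /<img[^>]+?>/gi;
  const hits = [];
  let match;
  while ((match = tagPattern.exec(comment)) !== null) {
    // domainPattern should not carry the g flag, or test() would keep state.
    if (domainPattern.test(match[0])) {
      hits.push(match[0]); // "do whatever" goes here
    }
  }
  return hits;
}

const sample = "A <img src='http://www.mydomain.com/logo.gif'> B " +
  "<IMG src='http://www.yahoo.com/logo.gif'>";
const found = imgTagsMentioning(sample, /mydomain\.com/i);
console.log(found);
```

Unlike a single test() call, this visits the tags one at a time, which is what the Perl loop does.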
Awesome! Thanks, Evertjan, that is easy. I couldn't find anything on
the i.test() function you used, though.

It is _not_ the i.test() function. The `i' (case-*i*nsensitive) belongs to
the RegExp literal, like in Perl. I am getting the idea here that you do
not know Perl (and Perl-compatible Regular Expressions) either.
Is there a different name for that function?

Any name you want to give it. The property name stands for a reference to a
Function object; that object can have any number of references to it.
(However, it is required here that the base object of the reference is a
RegExp instance).
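To make the point about the flag concrete, here is a small sketch (all variable names are illustrative only). The i is a flag on the RegExp literal, test is a method of the resulting RegExp object, and the same method can be reached through another reference as long as it is called on a RegExp instance.

```javascript
// The same pattern written as a literal with the i flag and via the constructor.
const literal = /mydomain\.com/i;
const built = new RegExp("mydomain\\.com", "i");

const upper = "HTTP://WWW.MYDOMAIN.COM/LOGO.GIF";
const literalHit = literal.test(upper); // case-insensitive because of the flag
const builtHit = built.test(upper);

// Another reference to the same function; it still needs a RegExp as `this`.
const check = RegExp.prototype.test;
const calledHit = check.call(literal, upper);

console.log(literalHit, builtHit, calledHit);
```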
Similarly, how do I do the opposite and test if any of the "<img...>"
tags do NOT contain mydomain.com?

Possibility: Non-capturing negative lookahead (borrowed from PCRE, too).
RTFM.


PointedEars
 
Jason Carlton

I presume this can be done better in Perl, too.

TIMTOWTDI.

It is _not_ the i.test() function.  The `i' (case-*i*nsensitive) belongs to
the RegExp literal, like in Perl.  I am getting the idea here that you do
not know Perl (and Perl-compatible Regular Expressions) either.

Don't be a douche. I'd never seen the switch followed by .test, and
really have never used a switch in Javascript, so I didn't catch that
this is what that was. Sue me.

Possibility: Non-capturing negative lookahead (borrowed from PCRE, too).
RTFM.

I looked into that before posting, but I'm not sure that (a) I'm doing
it right, and (b) it's going to do what I'm needing.

This just returns true on everything:

booleanResult = /(?!<img[^>]+mydomain\.com[^>]*>)/gi.test(comment);


This returns false if there's only one <img...> tag that doesn't
contain mydomain.com, but if I have multiple tags then it returns true
if any of them do not contain mydomain.com:

booleanResult = /(?=<img[^>]+mydomain\.com[^>]*>)/gi.test(comment);

Which means that it would return this as false:

var comment = "Test <img src='http://www.yahoo.com/logo.gif'>";

But this as true:

var comment = "Test <img src='http://www.mydomain.com/logo.gif'><br>Test <img src='http://www.yahoo.com/logo.gif'>";


I need it to return false if ANY of the instances existed that didn't
contain mydomain.com.
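One reading of the two attempts above: a lookahead on its own matches an empty string, so /(?!...)/ succeeds at any position where the tag pattern does not match (there is always one, even in an empty string), and /(?=...)/ gives the same answer as the plain pattern. A sketch of an "every tag must contain it" check, assuming it is acceptable to pull the tags out first and test them one by one. The name everyImgTagMentions is made up, and treating a comment with no "<img" tags as passing is also an assumption.

```javascript
// Hypothetical helper: true only if every <img ...> tag matches domainPattern.
function everyImgTagMentions(comment, domainPattern) {
  // match() with the g flag returns all tags, or null when there are none.
  const tags = comment.match(/<img[^>]*>/gi) || [];
  // domainPattern must not use the g flag, so test() stays stateless.
  return tags.every(function (tag) {
    return domainPattern.test(tag);
  });
}

const onlyYahoo = "Test <img src='http://www.yahoo.com/logo.gif'>";
const mixed = "Test <img src='http://www.mydomain.com/logo.gif'><br>" +
  "Test <img src='http://www.yahoo.com/logo.gif'>";
const onlyMine = "Test <img src='http://www.mydomain.com/logo.gif'>";

console.log(everyImgTagMentions(onlyYahoo, /mydomain\.com/i));
console.log(everyImgTagMentions(mixed, /mydomain\.com/i));
console.log(everyImgTagMentions(onlyMine, /mydomain\.com/i));
```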
 
abozhilov

Evertjan. said:
var booleanResult = /<img[^>]+mydomain\.com[^>]*>/i.test(str)

[^>]+

+ is greedy, so the engine has to backtrack once it reaches `>`. You
can see this in RegexBuddy with the string:

<img src="mydomain.com" alt="" /> => the regex engine takes 66 steps
before it matches.

If you make the plus lazy:

<img[^>]+?mydomain\.com[^>]*> => 30 steps

Regards.
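The step counts are RegexBuddy's and will differ between engines. What can be checked directly is that the greedy and lazy forms find the same match on that string, so switching to the lazy plus changes only the work done, not the result. A minimal sketch, with illustrative variable names:

```javascript
const tag = '<img src="mydomain.com" alt="" />';

const greedy = /<img[^>]+mydomain\.com[^>]*>/i;  // original pattern
const lazy = /<img[^>]+?mydomain\.com[^>]*>/i;   // lazy plus

const greedyMatch = greedy.exec(tag);
const lazyMatch = lazy.exec(tag);

// Both patterns match the whole tag.
console.log(greedyMatch[0] === lazyMatch[0]);
```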
 
Csaba Gabor

Which means that it would return this as false:

var comment = "Test <img src='http://www.yahoo.com/logo.gif'>";

But this as true:

var comment = "Test <img src='http://www.mydomain.com/logo.gif'><br>Test <img src='http://www.yahoo.com/logo.gif'>";

I need it to return false if ANY of the instances existed that didn't
contain mydomain.com.

I would try something like:
if (!(1 + comment.replace(
        /<img[^>]+?mydomain\.com[^>]*?>/gi, "<img>")
      .search(/<img[^>]+?>/i)))
  alert("all have mydomain.com");
else
  alert("non mydomain.com detected");

That first replace is for degenerate cases of <img> in the string.
The second replace replaces all properly formed <img ...> elements
with a dummy element. The search then checks for any rogue
elements still left.

However, what about the case of something like:
<img src='otherdomain.com' title='<img src="mydomain.com">'>
Everything discussed so far will fail on that - a
broader approach is necessary if you want to protect
against more complicated strings.
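To see the failure on that kind of string, here is a small sketch using the pattern from earlier in the thread (the domain is spelled otherdomain.com in this example, and the variable names are illustrative). The pattern stops at the first `>`, which sits inside the title attribute, and it finds mydomain.com there even though the src points elsewhere.

```javascript
// An <img> whose title attribute contains text that looks like another tag.
const nested = `<img src='otherdomain.com' title='<img src="mydomain.com">'>`;

const pattern = /<img[^>]+mydomain\.com[^>]*>/i;
const accepted = pattern.test(nested);  // reports a match
const matched = pattern.exec(nested);   // the match ends at the first ">"

console.log(accepted, matched[0]);
```

The regular expressions in this thread cannot tell attribute text from markup, so a string like this is wrongly reported as containing a mydomain.com image.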

Csaba Gabor from Vienna
 
Evertjan.

Thomas 'PointedEars' Lahn wrote on 07 dec 2009 in comp.lang.javascript:
var booleanResult = /<img[^>]+mydomain\.com[^>]*>/i.test(str)

That is not equivalent to what you are doing in Perl above, though.
Incidentally, you should not assume people know other languages than
those discussed in the target newsgroup, although it is often the
case. When in doubt, explain what the code in the other language
does.

Indeed, I don't know a perl from a swine.
It is _not_ the i.test() function. The `i' (case-*i*nsensitive)
belongs to the RegExp literal, like in Perl. I am getting the idea
here that you do not know Perl (and Perl-compatible Regular
Expressions) either.


Any name you want to give it. The property name stands for a
reference to a Function object; that object can have any number of
references to it. (However, it is required here that the base object
of the reference is a RegExp instance).


Possibility: Non-capturing negative lookahead (borrowed from PCRE,
too). RTFM.

No lookahead needed,
if "none" of the tags is meant.

var invertedBooleanResult = !/<img[^>]+mydomain\.com[^>]*>/i.test(str)
 
Thomas 'PointedEars' Lahn

Jason said:
Don't be a douche. I'd never seen the switch followed by .test, and
really have never used a switch in Javascript, so I didn't catch that
this is what that was. Sue me.



I looked into that before posting, but I'm not sure that (a) I'm doing
it right, and (b) it's going to do what I'm needing.

That's too bad.


Score adjusted

PointedEars
 
Asen Bozhilov

Csaba said:
I would try something like:
if (!(1+comment.replace(
    /<img[^>]+?mydomain\.com[^>]*?>/gi,"<img>").
    search(/<img[^>]+?>/i)))
     alert ("all have mydomain.com");
else alert ("non mydomain.com detected");

That first replace is for degenerate cases of <img> in the string.
The second replace replaces all properly formed <img ...> elements
with a dummy element.  The search then checks for any rogue
elements still left.

Interesting. But your approach needs two passes to fully analyze the
input string.
What about this one:

/<img(?:(?!mydomain\.com)[^>])+?>/i;

It will match the first image tag that doesn't contain "mydomain.com".
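A short sketch of how that pattern could be used against the sample comments from earlier in the thread. The helper name firstForeignImg is made up for illustration.

```javascript
// Matches the first <img ...> tag in which "mydomain.com" never appears:
// each character is consumed only if "mydomain.com" does not start there.
const foreignImg = /<img(?:(?!mydomain\.com)[^>])+?>/i;

// Hypothetical helper: returns the offending tag, or null if there is none.
function firstForeignImg(comment) {
  const match = foreignImg.exec(comment);
  return match ? match[0] : null;
}

const mixedComment = "Test <img src='http://www.mydomain.com/logo.gif'><br>" +
  "Test <img src='http://www.yahoo.com/logo.gif'>";
const cleanComment = "Test <img src='http://www.mydomain.com/logo.gif'>";

console.log(firstForeignImg(mixedComment));
console.log(firstForeignImg(cleanComment));
```

Negating the result of test() with this pattern gives the "false if ANY tag lacks mydomain.com" behaviour in a single pass.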

Regards ;~)
 
Jason Carlton


Thanks to all of you! This really helped a lot.

- Jason
 
Dr J R Stockton

In comp.lang.javascript message <bed20c49-b2fd-4f53-ade1-299b64ede909@g31g2000vbr.googlegroups.com>, Sun, 6 Dec 2009 15:16:35, Jason Carlton wrote:
I'm wanting to check a textarea to see if it contains "<img", and if
so, does the section between "<img" and the following ">" contain
"mydomain.com".

This is particularly tricky since there can be more than one
"<img...>" in the field.


Under such circumstances, and particularly if you are not fully familiar
with all the features of the latest JavaScript RegExps, it may help to
tackle the problem in more than one pass.

In this case, consider first replacing all "<img" with a single
character that is not in the string already (Unicode offers tens of
thousands). You can then more easily express the condition that a
substring must not contain the consecutive characters < i m g .
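One way to sketch the multi-pass idea, under stated assumptions: the marker U+E000 (a private-use character) is assumed not to occur in the input, and the function name allImgTagsMention is made up for illustration. The first pass replaces every "<img" with the marker, and the second pass inspects each piece up to its first ">".

```javascript
// Hypothetical helper: true if every "<img ... >" section mentions `domain`.
function allImgTagsMention(text, domain) {
  const MARK = "\uE000"; // assumption: this character is not in `text`
  // Pass 1: collapse each "<img" (any letter case) into a single marker.
  const pieces = text.replace(/<img/gi, MARK).split(MARK);
  const wanted = domain.toLowerCase();
  // Pass 2: pieces[0] precedes the first tag; every later piece starts
  // right after an "<img" and runs to the next marker.
  for (let i = 1; i < pieces.length; i++) {
    const end = pieces[i].indexOf(">");
    const tagBody = end === -1 ? pieces[i] : pieces[i].slice(0, end);
    if (tagBody.toLowerCase().indexOf(wanted) === -1) {
      return false;
    }
  }
  return true;
}

const twoTags = "Test <img src='http://www.mydomain.com/logo.gif'><br>" +
  "Test <IMG src='http://www.yahoo.com/logo.gif'>";
console.log(allImgTagsMention(twoTags, "mydomain.com"));
```

Each step stays simple enough to check by eye, at the cost of walking the string more than once.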
 
