Using =~ to Include and Omit in same line

Walt · Apr 21, 2006

In the following single line @URLs will contain all urls from $omsg.

@URLs = ($omsg =~/(\Shttp:\/\/| http:\/\/|\shttp:\/\/)(.*?)( |\n|\r|
)/gi);

What I would like to have is a way to scan in the same line if there
is an equal sign in front of http and if so, omit it.

Is there a way to do it within the single line?

David Squire · Apr 21, 2006

Walt wrote again, having just done so under the subject "PERL Search and
Replace":

In the following single line @URLs will contain all urls from $omsg.

@URLs = ($omsg =~/(\Shttp:\/\/| http:\/\/|\shttp:\/\/)(.*?)( |\n|\r|
)/gi);

What I would like to have is a way to scan in the same line if there
is an equal sign in front of http and if so, omit it.

Is there a way to do it within the single line?

Please do not post the same thing twice.

Regards,

DS

Xicheng Jia · Apr 21, 2006

Walt said:
In the following single line @URLs will contain all urls from $omsg.

@URLs = ($omsg =~/(\Shttp:\/\/| http:\/\/|\shttp:\/\/)(.*?)( |\n|\r|

What do you want to achieve here:

(\Shttp:\/\/| http:\/\/|\shttp:\/\/)

which is about the same (if you don't count newline) as:

(.http:\/\/)

)/gi);

What I would like to have is a way to scan in the same line if there
is an equal sign in front of http and if so, omit it.

try the following:
([^=]http:\/\/)
(?!=)(.http:\/\/)
(.(?<!=)http:\/\/)

(untested)
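
For readers who want to check these untested suggestions, here is a minimal sketch. The sample text and the (\S+) capture are assumptions made for illustration and are not from the thread. It compares the three patterns with a bare negative lookbehind, which is the form the thread settles on later.

```perl
use strict;
use warnings;

# Hypothetical sample text (not from the thread): one URL at the very start,
# one directly after '=', and one after a space.
my $text = "http://one.example href=http://two.example and http://three.example\n";

# The first three follow the suggestions above; (\S+) is an assumed capture
# added here so the results are easy to compare.
my @patterns = (
    [ 'negated class'    => qr{[^=]http://(\S+)}    ],
    [ 'lookahead + dot'  => qr{(?!=).http://(\S+)}  ],
    [ 'dot + lookbehind' => qr{.(?<!=)http://(\S+)} ],
    [ 'lookbehind only'  => qr{(?<!=)http://(\S+)}  ],
);

my %found;
for my $entry (@patterns) {
    my ($name, $re) = @$entry;
    $found{$name} = [ $text =~ /$re/g ];   # list context: one capture per match
    print "$name: @{ $found{$name} }\n";
}
```

With this sample, the first three patterns need some character in front of http, so they skip a URL at the very start of the string. The lookbehind-only form consumes no character and still rejects the one preceded by '='.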

Xicheng

J. Gleixner · Apr 21, 2006

Walt said:
In the following single line @URLs will contain all urls from $omsg.

@URLs = ($omsg =~/(\Shttp:\/\/| http:\/\/|\shttp:\/\/)(.*?)( |\n|\r|
)/gi);

What I would like to have is a way to scan in the same line if there
is an equal sign in front of http and if so, omit it.

Is there a way to do it within the single line?

? - Matches 1 or 0 times.

perldoc perlretut

You probably want to look at what \S and \s actually match, or don't match.

Also you can use other delimiters, like { and }, which will clean up
your expression, a lot.

ClubK · Apr 21, 2006

So far none of these ideas worked.

Examples:

Find http://www/google.com (or any plain-text URL) in a text file and
add it to the array.

But don't add href=http://www.google.com

I wanted to do it in a single line of code if I can.

So far the line I wrote works already except when a text url starts a
line.

See here: http://www.clubknowledge.com/Car_Audio_FAQ/?g4210_529

ClubK · Apr 21, 2006

In my forum, the gold members have permission to use HTML, and I don't
want it to include the URLs that are already formatted. I just want to
handle the ones from users who don't know HTML, and I don't want to make
them learn BBCode etc.

Xicheng Jia · Apr 21, 2006

ClubK said:
So far none of these ideas worked.

Examples:

Find http://www/google.com (or any plain-text URL) in a text file and
add it to the array.

But don't add href=http://www.google.com

I wanted to do it in a single line of code if I can.

So far the line I wrote works already except when a text url starts a
line.

Let's see your regex:

@URLs = ($omsg =~/(\Shttp:\/\/|
http:\/\/|\shttp:\/\/)(.*?)( |\n|\r|)/gi);

1) You have three capturing parentheses, so for each match you get
three elements in the array @URLs. I guess this is not what you
wanted.

2) What do you want to end your URLs: by , \n or \r? So:

@URLs = ( $omsg =~ /(?<!=)(http:\/\/.*?)(?: |\n|\r)/gi );

(untested)

In your regex, you need exactly one capturing group instead of three.
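
The point about capturing groups can be checked with a small sketch. The sample text is made up, and the patterns are simplified from the ones above. In list context, m//g returns every capture of every match, so three groups give three elements per URL.

```perl
use strict;
use warnings;

# Hypothetical sample text with exactly one URL.
my $text = "see http://one.example now\n";

# Three capturing groups: prefix, URL body, terminator.
my @three = $text =~ /(\s)http:\/\/(.*?)( |\n|\r)/g;

# Same shape, but only the URL body is captured; (?:...) groups without capturing.
my @one = $text =~ /(?:\s)http:\/\/(.*?)(?: |\n|\r)/g;

print scalar(@three), " elements with three groups\n";
print scalar(@one),   " element with one group: $one[0]\n";
```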

Xicheng

ClubK · Apr 21, 2006

I want URLs that start with http and not =http, and I needed a way to
find the end of the URL. In my case it can end in , \n , \r or a
space.

So my regex gets everything BETWEEN http:// and , \n, \r or space.

It returns www.test.com or test.com, and I add the http:// back in later.

Your example returns an array of http:// for each URL found.

Xicheng Jia · Apr 21, 2006

ClubK said:
I want URLs that start with http and not =http, and I needed a way to
find the end of the URL. In my case it can end in , \n , \r or a
space.

So my regex gets everything BETWEEN http:// and , \n, \r or space.

It returns www.test.com or test.com, and I add the http:// back in later.

Your example returns an array of http:// for each URL found.

So just move 'http:\/\/' out of the parentheses, like:

@URLs = ( $omsg =~ /(?<!=)http:\/\/(.*?)(?: |\n|\r)/gi );

Xicheng

ClubK · Apr 21, 2006

Thank you very much, works great!

Dr.Ruud · Apr 21, 2006

Xicheng Jia wrote:

@URLs = ( $omsg =~ /(?<!=)http:\/\/(.*?)(?: |\n|\r)/gi );

To get rid of the sawtooths, pick a different separator, like ~

@URLs = ( $omsg =~ m~(?<!=)http://(.*?)(?: |\n|\r)~gi );

or, if you prefer them tall'n'skinny: !

@URLs = ( $omsg =~ m!(?<!=)http://(.*?)(?: |\n|\r)!gi );

or use brackets and the "/x" modifier and sprinkle some whitespace

@URLs = ( $omsg =~ m{ (?<!=) # no '=' in front
http://(.*?) # URL
(?: [ ] |\n|\r) # why not \s ?
}xgi ); # loop
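
One detail of /x is worth checking before reusing a pattern like this: under /x, unescaped whitespace in the pattern is ignored, so a literal space has to be written as [ ] or as a backslash-escaped space. A minimal sketch with made-up sample text follows. The negative lookbehind is left out to keep the comparison short.

```perl
use strict;
use warnings;

# Hypothetical sample text with one URL followed by a space.
my $text = "go to http://one.example now\n";

# The bare space is dropped under /x, which leaves an empty alternative:
# the lazy (.*?) is satisfied immediately and captures an empty string.
my @bare = $text =~ m{ http://(.*?) (?: |\n|\r) }xg;

# Inside a bracketed class the space is kept, so the URL body is captured.
my @literal = $text =~ m{ http://(.*?) (?: [ ] |\n|\r) }xg;

print "bare space:  '$bare[0]'\n";
print "[ ] literal: '$literal[0]'\n";
```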

ClubK · Apr 21, 2006

Do you think one of these methods is faster or more efficient than
the others?

Jürgen Exner · Apr 21, 2006

ClubK said:
Do you think one of these methods is faster or more efficient than
the others?

Which two methods are you talking about? =~ and what?

jue

Tad McClellan · Apr 22, 2006

Walt said:
In the following single line @URLs will contain all urls from $omsg.

@URLs = ($omsg =~/(\Shttp:\/\/| http:\/\/|\shttp:\/\/)(.*?)( |\n|\r|
)/gi);

What I would like to have is a way to scan in the same line if there
is an equal sign in front of http and if so, omit it.

So you want to filter the list that is returned from the m//g.

Is there a way to do it within the single line?

grep() is the Right Tool for filtering a list:

@URLs = grep !/=http/, $omsg =~/(\Shttp:\/\/| http:\/\/|\shttp:\/\/)(.*?)( |\n|\r|)/gi;
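
As a sketch of the grep idea: grep tests each returned element on its own, so the '=' has to be part of the element being tested. The pattern and sample text below are assumptions made for illustration, using one capturing group that keeps an optional leading character together with the URL.

```perl
use strict;
use warnings;

# Hypothetical sample text: one plain URL and one inside an href attribute.
my $text = "x http://one.example href=http://two.example \n";

# One capture per match: an optional non-space character, then the URL.
my @all = $text =~ m{(\S?http://\S+)}g;

# Keep only the elements that do not contain "=http".
my @kept = grep { !/=http/ } @all;

print "all:  @all\n";
print "kept: @kept\n";
```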

ClubK · Apr 22, 2006

Here is what I ended up using. So far works great.

@URLs = ( $omsg =~ m{ (?<!=) # no '=' in front
http://(.*?) # URL
(?: [ ] |\n|\r) # why not \s ?
}xgi ); # loop

foreach $ur (@URLs)
{
    $urlx="http:\/\/".$ur;
    # escape regex metacharacters so the URL matches literally below
    $ur=~s/\?/\\\?/g;
    $ur=~s/\(/\\\(/g;
    $ur=~s/\)/\\\)/g;
    $ur=~s/\*/\\\*/g;
    $ur=~s/\+/\\\+/g;
    $ur="http:\/\/".$ur;
    $nur="<a href=$urlx target=_blank>$urlx</a>";
    $omsg=~ s/$ur/$nur/g;
}
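
A possible shortcut for the escaping steps in this loop is Perl's \Q...\E (quotemeta), which quotes every regex metacharacter in an interpolated variable. This is a minimal sketch, not the code used in the thread. The sample message and the variable names $rest, $url and $link are made up, and the lookbehind in the substitution is an added assumption so that URLs already inside href= stay untouched.

```perl
use strict;
use warnings;

# Hypothetical message: one plain URL and one already inside href=.
my $omsg = "Try http://one.example/faq?id=1 or href=http://two.example/ \n";

# Collect the part after http:// for URLs not preceded by '='.
my @URLs = $omsg =~ m{(?<!=)http://(.*?)(?: |\n|\r)}gi;

for my $rest (@URLs) {
    my $url  = "http://$rest";
    my $link = "<a href=$url target=_blank>$url</a>";
    # \Q...\E makes '?', '.', '+', '(' and friends match literally.
    $omsg =~ s/(?<!=)\Q$url\E/$link/g;
}

print $omsg;
```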

Anno Siegel · Apr 22, 2006

ClubK said:
Here is what I ended up using. So far works great.

@URLs = ( $omsg =~ m{ (?<!=) # no '=' in front
http://(.*?) # URL
(?: [ ] |\n|\r) # why not \s ?
}xgi ); # loop

Do you really have the comment on the next-to-last line in your working
code? It doesn't explain the code; it is a question asked of *you*. Change
the line accordingly, or don't, but take out the comment. It has no
business in production code.

Don't blindly accept code, from here or anywhere. Spend some thought and
adapt it to its new environment, even if it does its job as is. Your
source will become a mess if you don't. That includes comments.

Anno

ClubK · Apr 23, 2006

Actually I did not, but thank you for the heads-up.

Dr.Ruud · Apr 29, 2006

Stan R. wrote:

Don't forget to escape the ! in (?<!=), so it should look like

... =~ m!(?<\!=) ...

Yes, thanks for that catch. These tall'n'skinny ones can be trouble.

#!/usr/bin/perl
use strict;
use warnings;

while (<DATA>)
{
    print;
    print "~\n" if m~ (?<!=) http:// (.*?) (?: | \s ) ~xi;
    print "!\n" if m! (?<\!=) http:// (.*?) (?: | \s ) !xi;
    print "{}\n" if m{ (?<!=) http:// (.*?) (?: | \s ) }xi;
    print "|\n" if m| (?<!=) http:// (.*?) (?: \| \s ) |xi;
}
__DATA__
href=http://test1
href="http://test2"
This is http://test3 and now comes
http://test4

I tested it wrongly, I thought the following would report anomalies:

perl -e '$r=qr{m!(?<!=)http://(.*?)(?: |\n|\r)!gi};print $r'
(?-xism:m!(?<!=)http://(.*?)(?: |\n|\r)!gi)

perl -e '$r=qr{m~(?<!=)http://(.*?)(?: |\n|\r)~gi};print $r'
(?-xism:m~(?<!=)http://(.*?)(?: |\n|\r)~gi)

These catch it:

$ perl -ce 'm!(?<!=)http://(.*?)(?: |\n|\r)!gi'
Sequence (?<...) not recognized in regex; marked by <-- HERE in m/(?<
<-- HERE äèèèìèdèôèoèüè/ at -e line 1.

perl -MO=Deparse -e 'm!(?<!=)http://(.*?)(?: |\n|\r)!gi'
Sequence (?<...) not recognized in regex; marked by <-- HERE in m/(?<
<-- HERE äèO::/ at -e line 1.

but the reports seem to have a disease themselves.
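
The delimiter problem can also be reproduced without a shell one-liner. This is a rough sketch, assuming a made-up sample line: string eval compiles each match at run time, so a pattern that fails to compile shows up as an undefined result with the message in $@.

```perl
use strict;
use warnings;

# Hypothetical one-line sample.
my $line = "This is http://test3 and more\n";

# Escaped: Perl drops the backslash before the delimiter, so the regex
# engine sees (?<!=).
my $escaped = eval q{ $line =~ m!(?<\!=)http://! ? 1 : 0 };

# Unescaped: the pattern is cut off after "(?<", so compilation fails,
# eval returns undef, and the message lands in $@.
my $unescaped = eval q{ $line =~ m!(?<!=)http://! ? 1 : 0 };
my $error = $@;

print "escaped delimiter:   ", ($escaped ? "matched" : "no match"), "\n";
print "unescaped delimiter: ",
      (defined $unescaped ? "compiled" : "failed to compile"), "\n";
```

The exact wording of the error varies between Perl versions, so the sketch only checks whether compilation succeeded.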

Dr.Ruud · Apr 29, 2006

Stan R. wrote:

No error about the unescaped ! in the pattern?

(m!(?<!= ...

OK, after running some tests with the above, I realized that you have
m!...! inside qr{}... is there any reason why you did that?

I already explained that: as a test, because I thought [that test] would
report anomalies.

For some errors, qr// reports them:

$ perl -le '$r=qr/*./; print $r'

Yes, but you're not using the qr construct this time.

Yes, because I found out that qr// did not croak on this error, so I
tried a different method.

It's the unescaped ! in the lookbehind... the test with the qr that you
did worked because qr just simply parsed everything as a regex pattern
(compiles it, if you will), and spits out the result, which can be used
in the pattern operator. The m!...! part was enclosed in the qr{...},
so no syntax error.

Why would an invalid regular expression inside a qr// not be a syntax
error?
