regex to match any url

A. Sinan Unur

(e-mail address removed) wrote in @g14g2000cwa.googlegroups.com:
I am struggling way too much with this. Does someone have a regex that
will match any url-ish string like these? Not worried about mail links.

http://sd.org
www.dssd.com
ibm.mil
https://sdsdsd.jobs
xyz.travel

Please show what you have tried and what has not worked so that we can
help you with what you don't know rather than acting as a "write-my-
code-for-me" service.

#!/usr/bin/perl

use strict;
use warnings;

while ( <DATA> ) {
    print if m{ \A (?: https?:// )? \w+ (?: \. \w+)+ \n \z }x;
}

__DATA__
http://sd.org
www.dssd.com
ibm.mil
https://sdsdsd.jobs
xyz.travel

D:\Home\asu1\UseNet\clpmisc> u
http://sd.org
www.dssd.com
ibm.mil
https://sdsdsd.jobs
xyz.travel
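The sketch below wraps the same idea in a reusable test. It assumes the strings have already been chomped, so it anchors with \z alone instead of \n \z. That way a last line without a trailing newline is still accepted. The sub name looks_like_url and the two extra test strings are hypothetical, added only for illustration.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper: same pattern as above, but it expects chomped input,
# so it anchors with \z only and does not require a trailing "\n".
sub looks_like_url {
    my ($s) = @_;
    return $s =~ m{ \A (?: https?:// )? \w+ (?: \. \w+ )+ \z }x ? 1 : 0;
}

my @candidates = (
    'http://sd.org', 'www.dssd.com', 'ibm.mil',
    'https://sdsdsd.jobs', 'xyz.travel',
    'localhost', 'not a url',
);

for my $c (@candidates) {
    printf "%-20s %s\n", $c, looks_like_url($c) ? 'match' : 'no match';
}
```

With these inputs, the five strings from the question match and the last two do not. Note that \w does not include the hyphen, so this pattern would also reject a host such as my-host.example.com.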
 
Jürgen Exner

I am struggling way too much with this. Does someone have a regex
that will match any url-ish string like these? Not worried about mail
links.

http://sd.org
www.dssd.com
ibm.mil
https://sdsdsd.jobs
xyz.travel

That's easy: /.*/ will match not only all of your examples but any URL you
can imagine.

Now, having said that, maybe it actually was a different question you wanted
to ask?

jue
 
axel

A. Sinan Unur said:
(e-mail address removed) wrote in @g14g2000cwa.googlegroups.com:
Please show what you have tried and what has not worked so that we can
help you with what you don't know rather than acting as a "write-my-
code-for-me" service.
#!/usr/bin/perl

use strict;
use warnings;

while ( <DATA> ) {
print if m{ \A (?: https?:// )? \w+ (?: \. \w+)+ \n \z }x;
Perhaps the final + here (the one after the (?: \. \w+) group) should be
changed to * to reflect one-word valid URLs such as 'localhost' :)

Axel
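The sketch below illustrates Axel's suggestion, again assuming chomped input. With the dot group quantified by * instead of +, a single word such as 'localhost' is accepted. The sub name and the test strings are hypothetical.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper: the (?: \. \w+ ) group now uses * instead of +,
# so zero dotted parts are allowed and one-word hosts match.
sub looks_like_url_or_host {
    my ($s) = @_;
    return $s =~ m{ \A (?: https?:// )? \w+ (?: \. \w+ )* \z }x ? 1 : 0;
}

for my $s ( 'localhost', 'http://localhost', 'www.dssd.com', 'hello', 'two words' ) {
    printf "%-18s %s\n", $s, looks_like_url_or_host($s) ? 'match' : 'no match';
}
```

Here 'hello' matches as well, which shows the trade-off: relaxing the pattern to allow localhost also admits every bare word.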
 
Andreas Puerzer

DJ said:
three words: Regexp::Common::URI

-jp

Hm, let me have a look again at what the OP wrote:

I am struggling way too much with this. Does someone have a regex that
will match any url-ish string like these? Not worried about mail links.

http://sd.org
www.dssd.com
ibm.mil
https://sdsdsd.jobs
xyz.travel

Thanks!

I read this as: 'I want a RE that matches all of my example URIs, because they
all look url-ish.' (A very vague and, at least in my eyes, error-prone
criterion, tempting me to give /.*\.\w{2,6}/ as an answer.) To the OP:
what, exactly, do you want to accomplish?
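To make the "error-prone" point concrete, the sketch below runs the deliberately loose /.*\.\w{2,6}/ against a few strings. The non-URL test strings are made up for illustration. The pattern is unanchored and only asks for a dot followed by two to six word characters somewhere, so ordinary file names and decimal numbers pass too.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# The deliberately loose pattern from the post. It is not anchored,
# so it succeeds if a dot plus 2-6 word characters appears anywhere.
my $loose = qr/.*\.\w{2,6}/;

for my $s ( 'xyz.travel', 'readme.txt', 'version 1.50', 'localhost' ) {
    printf "%-14s %s\n", $s, $s =~ $loose ? 'match' : 'no match';
}
```

Because nothing anchors the end, a string such as 'xyz.travelling' would match as well.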

If my assumption of the OP's intention is correct, then you're out of luck with
Regexp::Common, as it will only match valid URIs, as shown here:

D:\Temp\test_area>cat stunks.pl
#!/usr/bin/perl

use warnings;
use strict;

use Regexp::Common qw/URI/;

chomp ( my @uris = ( <DATA> ) );
foreach ( @uris ) {
    /$RE{URI}{-keep}/ ? print "Found: $1\n" : print "Discarding: $_\n";
}

__DATA__
http://sd.org
www.dssd.com
ibm.mil
https://sdsdsd.jobs
xyz.travel

D:\Temp\test_area>perl stunks.pl
Found: http://sd.org
Discarding: www.dssd.com
Discarding: ibm.mil
Discarding: https://sdsdsd.jobs
Discarding: xyz.travel

If I misunderstood the OP, I sincerely apologize for jumping at you when you
were giving a perfectly valid solution (though I still see some issues coming
up with the https URIs..., but hey, here's where the Fun(tm) begins: hooking
your own REs into Regexp::Common :-> )


Greetings,
Andreas Pürzer
 
A. Sinan Unur

(e-mail address removed) wrote in @text.news.blueyonder.co.uk:
....

Perhaps the final + (the one after the (?: \. \w+) group) should be
changed to * to reflect one-word valid URLs such as 'localhost' :)

I wrote it to match the strings the OP provided. Further extension is
left to the reader as an exercise ;-)

Sinan
 
DJ Stunks

Andreas said:
If my assumption of the OP's intention is correct, then you're out of luck with
Regexp::Common, as it will only match valid URIs, as shown here:

D:\Temp\test_area>cat stunks.pl
#!/usr/bin/perl

use warnings;
use strict;

use Regexp::Common qw/URI/;

chomp ( my @uris = ( <DATA> ) );
foreach ( @uris ) {
    /$RE{URI}{-keep}/ ? print "Found: $1\n" : print "Discarding: $_\n";
}

__DATA__
http://sd.org
www.dssd.com
ibm.mil
https://sdsdsd.jobs
xyz.travel

D:\Temp\test_area>perl stunks.pl
Found: http://sd.org
Discarding: www.dssd.com
Discarding: ibm.mil
Discarding: https://sdsdsd.jobs
Discarding: xyz.travel

Thanks, I had assumed that "www.dssd.com", for instance, would have
matched. Clearly, ibm.mil and xyz.travel are ambiguous enough to be
almost anything, and I can understand them not matching.

I suppose one would have to insist on valid IANA gTLDs in the regex, e.g.
my @TLDs = qw{aero biz cat com coop info jobs mobi museum}, and so
on...
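The sketch below shows one way to apply the whitelist idea, assuming chomped input. The @TLDs list is a hypothetical and deliberately incomplete sample: the list from the post plus mil, org and travel, so that the OP's examples pass. A real list would have to follow the IANA root zone.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical sample list, NOT complete.
my @TLDs = qw{aero biz cat com coop info jobs mobi museum mil org travel};

# Join the TLDs into one alternation; quotemeta guards against metacharacters.
my $tld_alt = join '|', map { quotemeta($_) } @TLDs;

my $url_re = qr{
    \A
    (?: https?:// )?      # optional http:// or https://
    (?: \w+ \. )+         # one or more labels, each followed by a dot
    (?: $tld_alt )        # the final label must be a listed TLD
    \z
}xi;

for my $s ( 'http://sd.org', 'www.dssd.com', 'ibm.mil', 'https://sdsdsd.jobs',
            'xyz.travel', 'readme.txt', 'localhost' ) {
    printf "%-20s %s\n", $s, $s =~ $url_re ? 'match' : 'no match';
}
```

This still leaves the ambiguity mentioned above: ibm.mil and xyz.travel now match, but so would any file name whose extension happens to be a listed TLD, such as notes.info.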
If I misunderstood the OP, I sincerely apologize for jumping at you when you
were giving a perfectly valid solution (though I still see some issues coming
up with the https URIs..., but hey, here's where the Fun(tm) begins: hooking
your own REs into Regexp::Common :-> )

I don't feel jumped on :) but from the docs regarding https:

$RE{URI}{HTTP}{-scheme}

If -scheme => P is specified the pattern P is used as the
scheme. By default P is qr/http/. https and https? are
reasonable alternatives.

altering this value could also allow a match of "www.dssd.com", but
then you're getting to such a generic regex that it would open you
up to a lot of false positives.

-jp
 
Andreas Puerzer

DJ Stunks wrote:

[previous discussion about matching ambiguous URIs with Regexp::Common snipped]
I don't feel jumped on :) but from the docs regarding https:

$RE{URI}{HTTP}{-scheme}

If -scheme => P is specified the pattern P is used as the
scheme. By default P is qr/http/. https and https? are
reasonable alternatives.

altering this value could also allow a match of "www.dssd.com", but
then you're getting to such a generic regex that it would open you
up to a lot of false positives.

-jp

Aaaargh, shame on me! I don't know how I missed that part of the description, as
it is the second sentence in the POD!!

<blush>
Now, where's my BOfH-Excuse-Generator? Ah, I see, it's due to the Phases of the
Jupiter-moons and the reversed magnetic field of the Sun on a Wednesday the 15th
in a non-leap-year that I overlooked these pretty obvious sentences...
</blush>
;->

Thanks for pointing this out,
Andreas Pürzer