Tuxedo
Can anyone suggest a way to wrap bare URLs in <a href> tags?
open(my $fh, '<', 'urls.txt') or die "Cannot open urls.txt: $!";
while (my $line = <$fh>) {
    $line =~ s[...]   # match http or https instances
              [...]s; # replace with enclosing hrefs
    print $line;
}
close($fh);
Each line of input may contain one or more URLs.
Each URL begins with either http:// or https://, though not necessarily as
the first string on a line.
Each URL ends at either the end of the line or a whitespace character.
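A minimal Perl sketch of the URL boundaries described above, assuming a URL is simply "http://" or "https://" followed by a run of non-whitespace characters. find_urls is a hypothetical helper name; it only extracts candidates and does not yet skip URLs that sit inside existing tags.

```perl
use strict;
use warnings;

# Return every http:// or https:// URL found in a line. Each URL runs from
# the scheme up to the next whitespace character or the end of the string,
# because \S+ stops at whitespace (including the trailing newline).
# Call in list context: m//g with a capture returns all captures.
sub find_urls {
    my ($line) = @_;
    return $line =~ m{(https?://\S+)}g;
}

my @urls = find_urls(
    "bla https://www.example.com/a_page.htm plus a string not part of the URL\n");
print "$_\n" for @urls;
```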
For example, the input file might look like this:
---------- urls.txt -------
http://www.example.com/hello
http://www.example.com/
bla https://www.example.com/a_page.htm plus a string not part of the URL
-----------
If an http or https string is already preceded by the closing ">" of an
HTML tag, as in:
<a href=http://bla.com>http://bla.com</a>
then it should be excluded and left unchanged.
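One way to express this exclusion, sketched in Perl under the assumption that a bare URL always starts a line or follows whitespace (the conditions stated in the next paragraph). A literal (?<!>) check would not be enough on its own: in the example tag, the first URL follows "href=", not ">", and \S+ would then swallow "http://bla.com>http://bla.com</a>". The variable name $bare_url is hypothetical.

```perl
use strict;
use warnings;

# (?<!\S) is a negative lookbehind meaning "not preceded by a non-space
# character": the URL must start at the beginning of the string or right
# after whitespace. A URL directly after ">" or after "href=" fails it.
my $bare_url = qr{(?<!\S)(https?://\S+)};

for my $line ('<a href=http://bla.com>http://bla.com</a>',
              'hello http://bla.com also w/ a whitespace before') {
    my @hits = $line =~ /$bare_url/g;
    printf "%d bare URL(s) in: %s\n", scalar(@hits), $line;
}
```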
Two conditions hold in the input file: the 'http' or 'https' part will
always either begin at the first character of a line or be immediately
preceded by whitespace, like:
http://someurl.com line w/ whitespace before
http://someother.com
hello http://bla.com also w/ a whitespace before
The search-and-replace output for the three lines above would then be:
<a href=http://someurl.com>http://someurl.com</a> line w/ whitespace before
<a href=http://someother.com>http://someother.com</a>
hello <a href=http://bla.com>http://bla.com</a> also w/ a whitespace before
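A sketch of the question's loop with the substitution filled in, assuming the unquoted href=... style shown in the expected output. linkify is a hypothetical helper, and an in-memory filehandle stands in for urls.txt so the example runs on its own.

```perl
use strict;
use warnings;

# Wrap each bare http/https URL in an <a href=...> tag. The (?<!\S)
# lookbehind skips URLs that neither start the line nor follow whitespace,
# such as those inside an existing <a href=...>...</a> tag.
sub linkify {
    my ($line) = @_;
    $line =~ s{(?<!\S)(https?://\S+)}{<a href=$1>$1</a>}g;
    return $line;
}

# Same loop shape as the question; the sample text replaces urls.txt.
my $sample = <<'END';
http://someurl.com line w/ whitespace before
http://someother.com
hello http://bla.com also w/ a whitespace before
END

open(my $fh, '<', \$sample) or die "Cannot open sample: $!";
while (my $line = <$fh>) {
    print linkify($line);
}
close($fh);
```

The /g modifier is what lets one line hold several URLs; the /s in the skeleton only changes how "." treats newlines, so it is not needed here. Because \S+ stops only at whitespace, punctuation glued to a URL (a trailing comma, say) becomes part of the link.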
If something is written as http://bla, which (as in this sentence) isn't a
link, it would inadvertently be converted into one, but that would be a
rare occurrence. In other words, without additional validity checking, the
regex would be a best-guess procedure. For a stricter procedure, each match
could perhaps be checked with the is_web_uri($...) function from
Data::Validate::URI, which validates http and https URIs specifically. That
said, any example that illustrates the basic search-and-replace concept
would be much appreciated, even if it's only a best-guess procedure.
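For the stricter variant, the replacement can call a validator per match via the /e modifier. The sketch below uses a hypothetical looks_like_web_uri() heuristic (the host part must contain a dot) so that it runs without CPAN modules; is_web_uri() from Data::Validate::URI could be substituted for it. With this heuristic, the "http://bla," from the sentence above is left as plain text.

```perl
use strict;
use warnings;

# Hypothetical stand-in validator: accept the URL only if the host part
# contains a dot. Data::Validate::URI's is_web_uri() could replace this.
sub looks_like_web_uri {
    my ($url) = @_;
    return $url =~ m{^https?://[^/\s]+\.[^/\s]+};
}

# Wrap a candidate only if it passes the check; otherwise return it as-is.
sub link_if_valid {
    my ($url) = @_;
    return looks_like_web_uri($url) ? qq{<a href=$url>$url</a>} : $url;
}

sub linkify_checked {
    my ($line) = @_;
    # /e evaluates the replacement as Perl code, so each bare URL is
    # passed through link_if_valid() instead of being wrapped unconditionally.
    $line =~ s{(?<!\S)(https?://\S+)}{link_if_valid($1)}ge;
    return $line;
}

print linkify_checked("written as http://bla, which isn't a link\n");
print linkify_checked("see http://www.example.com/ now\n");
```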
Many thanks for any bright ideas!
Tuxedo