Need regexp to rejoin URL links broken by \n

Tony · Jun 22, 2005

Hi regular expression experts.

Can someone help me with a regular expression that removes \n's from
the middle of URL's?

I have an email inside a variable $A like so:

Hello, this is an
email which has
been formatted to
fit a narrow
column. Here is a
URL: http://test.
com/hello?test=op
tion1&test2=optio
n2. Thanks for
reading.

As you can see, the link has been wrapped into the column by a number
of \n's. Obviously, this means the link can't be clicked on.

I'd like to pass this through a regular expression that removes all the
\n's between http:// and the next dot followed by a space (that is:
'. '), so that the result looks like this:

Hello, this is an
email which has
been formatted to
fit a narrow
column. Here is a
URL: http://test.com/hello?test=option1&test2=option2.
Thanks for
reading.

Any ideas?

Thank you!

(And are there any tools to help construct and test regular
expressions?)

Greg Bacon · Jun 22, 2005

: Can someone help me with a regular expression that removes \n's from
: the middle of URL's?
: [...]

My first thought was to suggest stripping all runs of whitespace and
feeding the result to URI::Find, but then I realized that you're
trying to reformat the message for human consumption.

Below is a cut at it:

$ cat try
#! /usr/local/bin/perl

use warnings;
use strict;

chomp(my $A = <<EOMessage);
Hello, this is an
email which has
been formatted to
fit a narrow
column. Here is a
URL: http://test.
com/hello?test=op
tion1&test2=optio
n2. Thanks for
reading.
EOMessage

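# Copy the match into a lexical (rather than the global $a used by sort),
# delete its newlines, and put a newline where the trailing space was.
$A =~ s!(http://.+?\.) !(my $url = $1) =~ tr/\n//d; "$url\n"!se;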

print $A, "\n";

$ ./try
Hello, this is an
email which has
been formatted to
fit a narrow
column. Here is a
URL: http://test.com/hello?test=option1&test2=option2.
Thanks for
reading.

Using /\. / as a terminator strikes me as being *very* brittle, but
that only shows the truth of mjd's words: "Of course, this is a
heuristic, which is a fancy way of saying that it doesn't work."
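
To make the brittleness concrete, here is a small sketch that wraps the
substitution from the script above in a sub and runs it on a message
where the URL sits in the middle of a sentence. The sub name rejoin_url
and the sample text are made up for illustration. Everything up to the
next '. ' is treated as part of the URL, so an unrelated line break is
deleted as well.

```perl
#! /usr/local/bin/perl

use warnings;
use strict;

# Hypothetical helper: the same substitution as in the script above,
# applied to a copy of the text so the caller's string is untouched.
sub rejoin_url {
    my ($text) = @_;
    $text =~ s!(http://.+?\.) !(my $url = $1) =~ tr/\n//d; "$url\n"!se;
    return $text;
}

# Made-up sample: the sentence carries on after the URL.
my $msg = "See http://test.\ncom/hello for\nmore details. Thanks.";

# The match runs from "http://" to "details. ", so the newline between
# "for" and "more" is removed along with the one inside the URL.
print rejoin_url($msg), "\n";
```

On this input the URL is rejoined, but "for" and "more" end up fused
into "formore", because the pattern cannot tell where the URL stops.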

Hope this helps,
Greg

Tad McClellan · Jun 22, 2005

Tony said:
: I have an email inside a variable $A like so:
:
: Hello, this is an
: email which has
: been formatted to
: fit a narrow
: column. Here is a
: URL: http://test.
: com/hello?test=op
: tion1&test2=optio
: n2. Thanks for
: reading.
:
: removes all the
: \n's between http:// and the next dot followed by a space
:
: Hello, this is an
: email which has
: been formatted to
: fit a narrow
: column. Here is a
: URL: http://test.com/hello?test=option1&test2=option2.
: Thanks for
: reading.

$A =~ s{(http://.*?)\. }
{my $s=$1; $s=~tr/\n//d; "$s.\n"}egsi;

Martien Verbruggen · Jun 27, 2005

: Can someone help me with a regular expression that removes \n's from
: the middle of URL's?
[snip]

: I'd like to pass this through a regular expression that removes all the
: \n's between http:// and the next dot followed by a space (that is:
: '. ')

While Tad's solution gives you that, it isn't going to solve your
problem. Text wrapped like your example can just as well contain a URL
whose final full stop is followed by a newline rather than a space:

Here is another
URL: http://test.
com/hello?test=op
tion1&test2=optio
n2&test3=option3.
Thanks for reading.

Looking for ( |\n) following a full stop also won't work, as the first
full stop in that URL (the one after "test") would then signify the end
of the URL. I can't really think of a RE that would work in the generic
case. You'd probably have to build something that also checks that the
rejoined URL is valid to get closer.
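
Both failures can be reproduced with a rough sketch like the one below.
It reuses the shape of the substitution posted earlier in the thread,
with the terminator passed in as a character class. The sub name rejoin
and its interface are invented here for illustration only.

```perl
use warnings;
use strict;

# Hypothetical helper: delete newlines from "http://" up to the first
# full stop that is followed by one of the characters in $term_class.
sub rejoin {
    my ($text, $term_class) = @_;
    $text =~ s{(http://.*?\.)[$term_class]}
              {(my $url = $1) =~ tr/\n//d; "$url\n"}egs;
    return $text;
}

my $msg = <<'EOMessage';
Here is another
URL: http://test.
com/hello?test=op
tion1&test2=optio
n2&test3=option3.
Thanks for reading.
EOMessage

# Terminator '. ' only: no full stop in this message is followed by a
# space, so nothing matches and the text comes back unchanged.
print rejoin($msg, ' ');

# Terminator '. ' or a full stop plus newline: the match ends at
# "http://test.", which contains no newline, so the URL stays broken.
print rejoin($msg, " \n");
```

Both calls print the message exactly as it went in, which is the point:
neither terminator finds the real end of the URL.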

Martien
