Perl is the answer?

Andries · Apr 19, 2004

Hello there,

I hope someone can help me.
This is my problem:
I have a list of thousands and thousands of lines like the following:
----------------------------------------------------------------------
<a href="hs80.htm#halveringstijd"target="topic">halveringstijd</a> 
<a href="hs80.htm#hartkleppen" target="topic"></a> 
<a href="hs80.htm#hartvolume" target="topic"></a> 
<a href="hs80.htm#hemoglobine" target="topic"></a> 
<a href="hs80.htm#heteroseksueel " target="topic"></a> 
<a href="hs80.htm#hijgen" target="topic"></a> 
<a href="hs80.htm#histamine" target="topic"></a> 
--------------------------------------------------------------------------------------
I need to copy the word between the # and " and put it after the > and
</a>

It can be done by hand, like the first line, but it can be automated with a
Perl script, can't it?

If so, I still have a problem: can anyone tell me how?

TIA
Andries Meijer

Paul Lalli · Apr 19, 2004

Are these the only lines in the file, or is this an actual HTML file? If
these are the only lines in the entire file, this is (relatively) simple:

perl -pi.bkp -e 's/#(.*?)"([^>]+?)>/#$1"$2>$1/' file.html

The -i.bkp switch makes a backup copy of your file (file.html.bkp), just in
case the substitution doesn't do exactly what you wanted.
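
For anyone who would rather see the switches spelled out, here is a minimal sketch of roughly what that one-liner does when written as a script. The sub names fill_link_text and edit_in_place are made up for illustration; $^I is Perl's in-place-edit variable, set here to the same .bkp suffix.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Apply the substitution from the one-liner to a single line.
sub fill_link_text {
    my ($line) = @_;
    $line =~ s/#(.*?)"([^>]+?)>/#$1"$2>$1/;
    return $line;
}

# Edit the named files in place, keeping each original as NAME.bkp.
sub edit_in_place {
    my (@files) = @_;
    local $^I   = '.bkp';    # same effect as the -i.bkp switch
    local @ARGV = @files;    # <> reads the files listed in @ARGV
    while (my $line = <>) {
        print fill_link_text($line);    # output goes back into the file
    }
}

edit_in_place(@ARGV) if @ARGV;
```

Saved under a hypothetical name such as fill_links.pl, it would be run as "perl fill_links.pl file.html", leaving the untouched original in file.html.bkp.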

If you have an actual HTML file, I would suggest visiting CPAN and looking
at the various HTML Parsing modules. http://search.cpan.org

Paul Lalli

Glenn Jackman · Apr 19, 2004

perl -pi -we 's{#(.*?)(".*>)(</a>)}{#$1$2$1$3}' file.html ...

Lack Mr G M · Apr 19, 2004

|>
|> I hope someone can help me.
|> This is my problem:
|> I have a list of thousands and thousands of the next lines:
|> ----------------------------------------------------------------------
|>...
|> <a href="hs80.htm#hartkleppen" target="topic"></a> 
|> --------------------------------------------------------------------------------------
|> I need to copy the word between the # and " and put it after the > and
|> </a>
|>
|> It can done by hand like the first line but it can be automated with a
|> perl script isn't it?

An emacs keyboard macro is a possibility too.

|> If so I still have a problem can anyone tell me how?

Well, on the first line you showed (the one you had edited) you'd lost
a space before target=.

Jim Cochrane · Apr 19, 2004

Here's one way to do it:

#!/usr/bin/perl -n

use strict;
use warnings;

my ($part1, $part2, $part3, $part4) = /([^#]*#)([^"]*)("[^>]*><\/a>)(.*)/;
if ($part4) {
    print "${part1}${part2}${part3}${part2}${part4}\n";
} else {
    print;
}

Julia deSilva · Apr 19, 2004

Can be done very simply in any programming language, but if you are only
using it once or twice I'd just use a macro in a good text viewer. I like
www.textpad.com but Word could be used.

Matt Garrish · Apr 19, 2004

Jim Cochrane said:
Here's one way to do it:

#!/usr/bin/perl -n

use strict;
use warnings;

my ($part1, $part2, $part3, $part4) = /([^#]*#)([^"]*)("[^>]*><\/a>)(.*)/;
if ($part4) {
    print "${part1}${part2}${part3}${part2}${part4}\n";
} else {
    print;
}

Ick! Why not just in one line?

while (my $line = <DATA>) {
    $line =~ s/(#([^"]+)"[^>]*>)/$1$2/;
    print $line;
}

__DATA__
<a href="hs80.htm#hartkleppen" target="topic"></a> 
<a href="hs80.htm#hartvolume" target="topic"></a> 
<a href="hs80.htm#hemoglobine" target="topic"></a> 
<a href="hs80.htm#heteroseksueel " target="topic"></a> 
<a href="hs80.htm#hijgen" target="topic"></a> 
<a href="hs80.htm#histamine" target="topic"></a>

axel · Apr 19, 2004

Julia deSilva said:
Can be done very simply in any programming language, but if you are only
using it once or twice I'd just use a macro in a good text viewer. I like
www.textpad.com but Word could be used.

Macros, Word... shudder! That would probably take longer than creating
and running a simple Perl script!

But the one-line Perl scripts can easily be adapted to the vi editor...
assuming all the lines in the file are of the same format as the
examples, the following command will achieve the same and edit the
entire file:

:% s/\(#\)\(.*\)\(" .*ic">\)\(<.*\)/\1\2\3\2\4/

But seriously there is an advantage in running a short script over
thousands of lines of input as it is easy to alter if it does not work
quite as expected and it can also be used to check for anomalies in the
data... just in case.
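
As a rough illustration of that last point, here is a small sketch that only reports lines which do not look like the empty links in the examples, without changing anything. The helper names is_expected_line and find_anomalies are hypothetical, and the pattern assumes every good line has exactly the shape shown in the original post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if the line is an empty link of the expected shape.
# A trailing space inside the anchor name (as in "heteroseksueel ") is accepted.
sub is_expected_line {
    my ($line) = @_;
    return $line =~ m/^<a href="[^"#]*#[^"]+"\s*target="topic"><\/a>\s*$/ ? 1 : 0;
}

# Returns the 1-based numbers of the lines that do not fit the pattern.
sub find_anomalies {
    my (@lines) = @_;
    return grep { !is_expected_line($lines[$_ - 1]) } 1 .. scalar @lines;
}

# When given file names, list every line a blind substitution might mangle.
if (@ARGV) {
    my @lines = <>;
    print "line $_: $lines[$_ - 1]" for find_anomalies(@lines);
}
```

Running this before the real substitution gives a list of lines to inspect by hand, such as the already edited first line in the example data.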

Axel

Jim Cochrane · Apr 20, 2004

Matt Garrish said:
Ick! Why not just in one line?

$line =~ s/(#([^"]+)"[^>]*>)/$1$2/;

Hmm - there are two differences between our versions; apparently we read
the "spec" differently. Your version copies the captured string before
the </a> and mine puts it after it. Probably the OP meant: "put it
after the > and before the </a>" instead of "put it after the > and </a>",
so your version does what is apparently expected.

But the other difference is that mine just prints the anomalous first line
as is, without modification. Yours changes it into:

<a href="hs80.htm#halveringstijd"target="topic">halveringstijdhalveringstijd</a>

which I think is not what he wants.

Of course, rereading the OP, it looks like you understood his English
better than I did - It appears the 1st line was not part of the input.
If the file indeed has no anomalies, your version is more appropriate.
With anomalies, mine is probably more appropriate, with the fix:

#!/usr/bin/perl -n

use strict;
use warnings;

my ($part1, $part2, $part3, $part4) = /([^#]*#)([^"]*)("[^>]*>)(<\/a>.*)/;
if ($part4) {
    print "${part1}${part2}${part3}${part2}${part4}\n";
} else {
    print;
}

Of course, you're right that mine is wordier than necessary. Here's a
shorter version:

#!/usr/bin/perl -p

use strict;
use warnings;

s/([^#]*#)([^"]*)("[^>]*>)(<\/a>.*)/$1$2$3$2$4/;

It uses substitution, like your version, but, like the original
version, it also attempts to leave "anomalous" lines unchanged.
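
A quick way to check that claim is to wrap the substitution in a sub and feed it one already edited line and one empty link. This is only a sketch; the sub name fill_empty_link is made up, and the two sample lines are taken from the original post without their trailing spaces.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The substitution from the -p version, wrapped in a sub for testing.
sub fill_empty_link {
    my ($line) = @_;
    $line =~ s/([^#]*#)([^"]*)("[^>]*>)(<\/a>.*)/$1$2$3$2$4/;
    return $line;
}

my @samples = (
    # Already filled in by hand: </a> does not directly follow the >,
    # so the pattern fails to match and the line is left alone.
    '<a href="hs80.htm#halveringstijd"target="topic">halveringstijd</a>',
    # Empty link: the anchor name is copied in before the </a>.
    '<a href="hs80.htm#hartkleppen" target="topic"></a>',
);

print fill_empty_link($_), "\n" for @samples;
```

The first line comes out unchanged and the second gains "hartkleppen" as its link text.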

Brad Baxter · Apr 20, 2004

Perl isn't the answer. Perl is the question. Yes! is the answer.

Sorry, I couldn't resist.

Regards,

Brad

George Kinley · Apr 20, 2004

I bet the OP never tried it himself;
helping him like this is spoon-feeding.

Jürgen Exner · Apr 20, 2004

Andries said:
I have a list of thousands and thousands of the next lines:

Where are those lines? In one file?
Scattered in hundreds or thousands of files?
In an array? On your computer monitor as output of some SQL command?

The best approach very much depends upon _how_ you can access those lines.

If your data is as simple and as regular as stated above and if all your
"lines" are actually the content in one file without much junk around them
(these are a lot of "ifs" but I have a strong feeling your data might
actually be arranged like that), then writing a Perl script for this simple
task is overkill.
Just use the substitute-regexp or replace-regexp function in any decent text
editor and you are done faster than writing even this single-line Perl
script.

jue
