Perl is the answer?

Andries

Hello there,

I hope someone can help me.
This is my problem:
I have a list of thousands and thousands of the next lines:
----------------------------------------------------------------------
<a href="hs80.htm#halveringstijd"target="topic">halveringstijd</a><br>
<a href="hs80.htm#hartkleppen" target="topic"></a><br>
<a href="hs80.htm#hartvolume" target="topic"></a><br>
<a href="hs80.htm#hemoglobine" target="topic"></a><br>
<a href="hs80.htm#heteroseksueel " target="topic"></a><br>
<a href="hs80.htm#hijgen" target="topic"></a><br>
<a href="hs80.htm#histamine" target="topic"></a><br>
--------------------------------------------------------------------------------------
I need to copy the word between the # and " and put it after the > and
</a>

It can be done by hand, like the first line, but it can be automated with a
Perl script, can't it?

If so, I still have a problem: can anyone tell me how?


TIA
Andries Meijer
 
Paul Lalli

Are these the only lines in the file, or is this an actual HTML file? If
these are the only lines in the entire file, this is (relatively) simple:

perl -pi.bkp -e 's/#(.*?)"([^>]+?)>/#$1"$2>$1/' file.html

The -i.bkp switch makes a backup copy of your file (file.html.bkp), just in case
the substitution doesn't do exactly what you want.
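
To try the substitution before running it over the real file, something like the following rough sketch could be used. It wraps the same regex in a helper sub and applies it to a couple of the sample lines. The sub name fill_link_text and the @samples array are made up for illustration and are not part of the one-liner.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch): applies the same
# substitution as the one-liner to a single line and returns the result.
sub fill_link_text {
    my ($line) = @_;
    # $1 = the name after '#', $2 = everything between the closing quote
    # of the href and the '>' that ends the opening tag
    $line =~ s/#(.*?)"([^>]+?)>/#$1"$2>$1/;
    return $line;
}

# Sample input copied from the question
my @samples = (
    '<a href="hs80.htm#hartkleppen" target="topic"></a><br>',
    '<a href="hs80.htm#hijgen" target="topic"></a><br>',
);

print fill_link_text($_), "\n" for @samples;
```

Note that a line whose link text is already filled in (like the first line in the question) matches as well, so it would get the name appended a second time.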

If you have an actual HTML file, I would suggest visiting CPAN and looking
at the various HTML Parsing modules. http://search.cpan.org

Paul Lalli
 
Glenn Jackman

Andries said:
Hello there,

I hope someone can help me.
This is my problem:
I have a list of thousands and thousands of the next lines:
----------------------------------------------------------------------
<a href="hs80.htm#halveringstijd"target="topic">halveringstijd</a><br>
<a href="hs80.htm#hartkleppen" target="topic"></a><br>
<a href="hs80.htm#hartvolume" target="topic"></a><br>
<a href="hs80.htm#hemoglobine" target="topic"></a><br>
<a href="hs80.htm#heteroseksueel " target="topic"></a><br>
<a href="hs80.htm#hijgen" target="topic"></a><br>
<a href="hs80.htm#histamine" target="topic"></a><br>


perl -pi -we 's{#(.*?)(".*>)(</a>)}{#$1$2$1$3}' file.html ...
 
Lack Mr G M

|>
|> I hope someone can help me.
|> This is my problem:
|> I have a list of thousands and thousands of the next lines:
|> ----------------------------------------------------------------------
|>...
|> <a href="hs80.htm#hartkleppen" target="topic"></a><br>
|> --------------------------------------------------------------------------------------
|> I need to copy the word between the # and " and put it after the > and
|> </a>
|>
|> It can be done by hand, like the first line, but it can be automated with a
|> Perl script, can't it?

An emacs keyboard macro is a possibility too.

|> If so, I still have a problem: can anyone tell me how?

Well, on the first line you showed (the one you had edited) you'd lost
a space before target=.
 
Jim Cochrane

Here's one way to do it:

#!/usr/bin/perl -n

use strict;
use warnings;

my ($part1, $part2, $part3, $part4) = /([^#]*#)([^"]*)("[^>]*><\/a>)(.*)/;
if ($part4) {
    print "${part1}${part2}${part3}${part2}${part4}\n";
} else {
    print;
}
 
Julia deSilva

I hope someone can help me.
This is my problem:
I have a list of thousands and thousands of the next lines:
I need to copy the word between the # and " and put it after the > and
</a>

It can be done by hand, like the first line, but it can be automated with a
Perl script, can't it?


Can be done very simply in any programming language, but if you are only
using it once or twice I'd just use a macro in a good text viewer. I like
www.textpad.com but Word could be used.
 
Matt Garrish

Jim Cochrane said:
Here's one way to do it:

#!/usr/bin/perl -n

use strict;
use warnings;

my ($part1, $part2, $part3, $part4) = /([^#]*#)([^"]*)("[^>]*><\/a>)(.*)/;
if ($part4) {
print "${part1}${part2}${part3}${part2}${part4}\n";
} else {
print
}

Ick! Why not just in one line?

while (my $line = <DATA>) {
    $line =~ s/(#([^"]+)"[^>]*>)/$1$2/;
    print $line;
}

__DATA__
<a href="hs80.htm#hartkleppen" target="topic"></a><br>
<a href="hs80.htm#hartvolume" target="topic"></a><br>
<a href="hs80.htm#hemoglobine" target="topic"></a><br>
<a href="hs80.htm#heteroseksueel " target="topic"></a><br>
<a href="hs80.htm#hijgen" target="topic"></a><br>
<a href="hs80.htm#histamine" target="topic"></a><br>
 
axel

Julia deSilva said:
Can be done very simply in any programming language, but if you are only
using it once or twice I'd just use a macro in a good text viewer. I like
www.textpad.com but Word could be used.

Macros, Word... shudder! That would probably take longer than creating
and running a simple Perl script!

But the one-line Perl scripts can easily be adapted to the vi editor...
assuming all the lines in the file are of the same format as the
examples, the following command will achieve the same and edit the
entire file:

:% s/\(#\)\(.*\)\(" .*ic">\)\(<.*\)/\1\2\3\2\4/

But seriously there is an advantage in running a short script over
thousands of lines of input as it is easy to alter if it does not work
quite as expected and it can also be used to check for anomalies in the
data... just in case.
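
As a rough illustration of that last point, a short script could list the lines that do not have the expected shape before anything is changed. This is only a sketch: the pattern in $expected is my guess at what a "regular" line looks like, based on the samples in the question, and the sub name find_anomalies is made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed shape of a "regular" line, inferred from the samples in the
# question: an empty link with a #name in the href and target="topic".
my $expected = qr{^<a href="[^"#]*#[^"]+" target="topic"></a><br>\s*\z};

# Hypothetical helper: returns the 1-based numbers of the lines that do
# not match the expected format.
sub find_anomalies {
    my @lines = @_;
    my @odd;
    for my $i (0 .. $#lines) {
        push @odd, $i + 1 unless $lines[$i] =~ $expected;
    }
    return @odd;
}

# Sample input copied from the question; the first line is the hand-edited one
my @samples = (
    '<a href="hs80.htm#halveringstijd"target="topic">halveringstijd</a><br>',
    '<a href="hs80.htm#hartkleppen" target="topic"></a><br>',
    '<a href="hs80.htm#hartvolume" target="topic"></a><br>',
);

print "line $_ does not match the expected format\n"
    for find_anomalies(@samples);
```

With these three sample lines, only line 1 is reported.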

Axel
 
Jim Cochrane

Hmm - there are two differences between our versions; apparently we read
the "spec" differently. Your version copies the captured string before
the </a> and mine puts it after it. Probably the OP meant: "put it
after the > and before the </a>" instead of "put it after the > and </a>",
so your version does what is apparently expected.

But the other difference is that mine just prints the anomalous first line
as is, without modification. Yours changes it into:

<a href="hs80.htm#halveringstijd"target="topic">halveringstijdhalveringstijd</a><br>

which I think is not what he wants.

Of course, rereading the OP, it looks like you understood his English
better than I did - It appears the 1st line was not part of the input.
If the file indeed has no anomalies, your version is more appropriate.
With anomalies, mine is probably more appropriate, with the fix:

#!/usr/bin/perl -n

use strict;
use warnings;

my ($part1, $part2, $part3, $part4) = /([^#]*#)([^"]*)("[^>]*>)(<\/a>.*)/;
if ($part4) {
    print "${part1}${part2}${part3}${part2}${part4}\n";
} else {
    print;
}

Of course, you're right that mine is more wordy than necessary. Here's a
shorter version:

#!/usr/bin/perl -p

use strict;
use warnings;

s/([^#]*#)([^"]*)("[^>]*>)(<\/a>.*)/$1$2$3$2$4/;


It uses substitution, like your version, but it also, like the original
version, attempts to leave "anomalous" lines unchanged.
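
A quick way to check that claim is to run the substitution on one empty link and on the already-filled first line. This is just a sketch for experimenting: the wrapper name fill_empty_link and the two test strings are chosen for illustration, while the regex is the one from the shorter version.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper (name invented for this sketch) around the
# substitution from the shorter version, so it can be tried on strings.
sub fill_empty_link {
    my ($line) = @_;
    # The pattern only matches when '</a>' directly follows the '>' of the
    # opening tag, i.e. when the link text is still empty.
    $line =~ s/([^#]*#)([^"]*)("[^>]*>)(<\/a>.*)/$1$2$3$2$4/;
    return $line;
}

my $empty  = '<a href="hs80.htm#hartkleppen" target="topic"></a><br>';
my $filled = '<a href="hs80.htm#halveringstijd"target="topic">halveringstijd</a><br>';

print fill_empty_link($empty),  "\n";   # link text gets filled in
print fill_empty_link($filled), "\n";   # printed as is
```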
 
Brad Baxter

Perl isn't the answer. Perl is the question. Yes! is the answer.


Sorry, I couldn't resist.

Regards,

Brad
 
Jürgen Exner

Andries said:
I have a list of thousands and thousands of the next lines:

Where are those lines? In one file?
Scattered in hundreds or thousands of files?
In an array? On your computer monitor as output of some SQL command?

The best approach very much depends upon _how_ you can access those lines.
----------------------------------------------------------------------
<a href="hs80.htm#halveringstijd"target="topic">halveringstijd</a><br>
<a href="hs80.htm#hartkleppen" target="topic"></a><br>
<a href="hs80.htm#hartvolume" target="topic"></a><br>
<a href="hs80.htm#hemoglobine" target="topic"></a><br>
<a href="hs80.htm#heteroseksueel " target="topic"></a><br>
<a href="hs80.htm#hijgen" target="topic"></a><br>
<a href="hs80.htm#histamine" target="topic"></a><br>
--------------------------------------------------------------------------------------
I need to copy the word between the # and " and put it after the > and
</a>

It can be done by hand, like the first line, but it can be automated with a
Perl script, can't it?

If your data is as simple and as regular as stated above and if all your
"lines" are actually the content in one file without much junk around them
(these are a lot of "ifs" but I have a strong feeling your data might
actually be arranged like that), then writing a Perl script for this simple
task is overkill.
Just use the substitute-regexp or replace-regexp function in any decent text
editor and you are done faster than writing even this single-line Perl
script.

jue
 
