Reformatting file - basic question

per

I have very limited knowledge of Perl (I used to know the basics, but
that was a while ago) and hope someone can help with advice.

I need to reformat a text file made up of several segments like the one
listed below. The reformatting consists of extracting the label strings
(cats, dogs, fish, etc. in the example below) and copying them into the
"CATEGORIES:" part of the text. The source file is an exported blog and
has several entries, so this procedure has to be repeated for each entry.

*** This is what I have (only showing the relevant segment for one
entry) ***

CATEGORIES:

DATE: 03/20/2007 04:13:00 PM
-----
BODY:
text here text here some urls as well more text.

Labels: <a rel='tag' href="http://blogname.wordpress.com/tag/
cats">cats</a>,
<a rel='tag' href="http://blogname.wordpress.com/tag/dogs">dogs</a>,
<a rel='tag' href="http://blogname.wordpress.com/tag/fish">fish</a>,
<a rel='tag' href="http://blogname.wordpress.com/tag/
chameleons">chameleons</a>

*** And this is how it should look when finished ***

CATEGORIES: cats
CATEGORIES: dogs
CATEGORIES: fish
CATEGORIES: chameleons

DATE: 03/20/2007 04:13:00 PM
-----
BODY:
text here text here some urls as well more text.

Labels: <a rel='tag' href="http://blogname.wordpress.com/tag/
cats">cats</a>,
<a rel='tag' href="http://blogname.wordpress.com/tag/dogs">dogs</a>,
<a rel='tag' href="http://blogname.wordpress.com/tag/fish">fish</a>,
<a rel='tag' href="http://blogname.wordpress.com/tag/
chameleons">chameleons</a>

..............................

I initially tried using BK ReplaceEm, but it seems to not be the right
application for this task. Someone at a Regular Expression forum
suggested I use this Perl script instead:

perl -0777pe 's{
^CATEGORIES:\s*(.*?Labels:\s*)
((?:<a[^>]*>(.*?)(?{$x .="CATEGORIES: $3\n"})</a>(,\s)?)*)
}{$x\n$1$2}msx
' mydoc.txt

This should do the job, but I need help with making it work for me.

So the question is: What needs to be added?

I have ActivePerl installed on Windows XP, and I need to read from
one text file and write the result to another (copying the whole
file, and adding the labels where it says "CATEGORIES:"). The
source file has a series of entries like the one shown above, so the
Perl script needs to do this for each one.
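
For reference, here is a minimal sketch of what such a script could look
like if saved to a file instead of being run as a one-liner (the Windows
command prompt does not treat single quotes as quoting characters, so the
one-liner above is unlikely to run as typed there). It is not the forum
script itself but a simpler split-and-replace approach. The sub name
add_categories and the file name add_categories.pl are hypothetical, and
the sketch assumes every entry starts with a line beginning "CATEGORIES:"
and that the label links always carry rel='tag' after the word "Labels:".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes the whole export as one string and returns it
# with each "CATEGORIES:" line replaced by one line per label found in
# that entry. Entries without labels are returned unchanged.
sub add_categories {
    my ($text) = @_;

    # Split in front of every line that starts with "CATEGORIES:".
    # The lookahead is zero-width, so no text is lost by the split.
    my @chunks = split /^(?=CATEGORIES:)/m, $text;

    for my $chunk (@chunks) {
        next unless $chunk =~ /\ACATEGORIES:/;

        # Only consider links that come after "Labels:" in this entry.
        my ($labels) = $chunk =~ /Labels:(.*)/s or next;

        # Assumption: label links look like <a rel='tag' ...>name</a>.
        # [^>]* also matches newlines, so wrapped hrefs are fine.
        my @names = $labels =~ m{<a\b[^>]*\brel='tag'[^>]*>(.*?)</a>}gs;
        next unless @names;

        my $lines = join '', map { "CATEGORIES: $_\n" } @names;
        $chunk =~ s/\ACATEGORIES:[^\n]*\n/$lines/;
    }

    return join '', @chunks;
}

# Runs only when called with an input and an output file name.
if (@ARGV == 2) {
    my ($in, $out) = @ARGV;

    open my $ifh, '<', $in or die "Cannot read $in: $!";
    my $text = do { local $/; <$ifh> };    # slurp the whole file
    close $ifh;

    open my $ofh, '>', $out or die "Cannot write $out: $!";
    print {$ofh} add_categories($text);
    close $ofh or die "Cannot close $out: $!";
}
```

Assuming the script is saved as add_categories.pl, it would be run from
the command prompt as:

perl add_categories.pl mydoc.txt mydoc_out.txt

where mydoc_out.txt is a made-up name for the output file. The input file
is left untouched, so the result can be compared against the original.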

Any help is greatly appreciated!

Per
 
