Help needed with reg exp please

Aristotle

Could you please help me out with regular expressions. I'm trying to
write a Perl script that processes some text, and I'm stuck at the
following:

I need to remove from the text below all words starting and ending with
lower-case letters. Words may or may not be followed by a dot "." (most
are), and may contain a "-" character:


e.g.:

---> Apis calc. Carb-v. cham. dendr-pol. halia-lac. hep. lac-leo. lyc.
Med. nat-m. nit-ac. nux-v. OPIUM plat. polys. PULS. rauw. sal-fr.
Sanguis-s sil. sulph. Tarent. tung-met. VERAT. viol-o vio-zinc.
zinc-c.

should yield:

---> Apis Carb-v. Med. OPIUM PULS. Sanguis-s Tarent. VERAT.


i.e. words starting with a capital letter must remain untouched.


I've tried various combinations of reg exp before posting here, but
could not find the right one.
I'd really appreciate your help.
 
Gunnar Hjalmarsson

Aristotle said:
Could you please help me out with regular expressions.

I've tried various combinations of reg exp before posting here,

Show us!

And consult e.g. "perldoc perlrequick", if you haven't done so already.
 
Aristotle

Gunnar said:
Aristotle said:
I've tried various combinations of reg exp before posting here,

Show us!

I managed to get the desired effect by using the following code; it
gets the job done, but it looks ugly:

{
$parts[1] =~ s/ ([a-z]+[a-z]) / /g;
$parts[1] =~ s/ ([a-z]+[a-z])./ /g;
$parts[1] =~ s/ ([a-z]+[a-z]) / /g;
$parts[1] =~ s/ ([a-z]+[a-z])./ /g;
$parts[1] =~ s/ ([a-z]) / /g;
$parts[1] =~ s/ ([a-z])./ /g;
}

However, that was after trying MANY, MANY expressions, e.g.:

$parts[1] =~ s/([a-z]+[a-z]\.)//g;
$parts[1] =~ s/([a-z]*[a-z]\.)//g;
$parts[1] =~ s/([a-z][a-z]+\-[a-z]\.)//g;
$parts[1] =~ s/([a-z][a-z]+\-.[a-z])//g;
$parts[1] =~ s/([a-z][a-z]+[a-z])//g;

I'm no expert; I did what I could...
If you think you can help, please do so without questioning me.
 
Gunnar Hjalmarsson

Aristotle said:
I'm no expert; I did what I could...
If you think you can help, please do so without questioning me.

There are all too many lazy people who have no real interest in
learning Perl, and who believe that groups like this one are just free
help desks. I asked you to prove that you are not one of those by
posting code. You need to live with that, whatever you call it, or
else few people will be willing to assist.

Anyway, this is one way to do it with one substitution:

s/\s+[a-z][-\w]*[a-z]\.?//g;
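
As a quick check, here is a minimal sketch that applies this substitution to the sample text from the question. It assumes the text has been joined onto a single line; the variable names $text and $kept are only illustrative and not from the original script.

```perl
use strict;
use warnings;

# Sample text from the question, joined onto one line (an assumption;
# the original script keeps it in $parts[1]).
my $text = 'Apis calc. Carb-v. cham. dendr-pol. halia-lac. hep. lac-leo. lyc. '
         . 'Med. nat-m. nit-ac. nux-v. OPIUM plat. polys. PULS. rauw. sal-fr. '
         . 'Sanguis-s sil. sulph. Tarent. tung-met. VERAT. viol-o vio-zinc. '
         . 'zinc-c.';

# Remove every whitespace-preceded word that starts and ends with a
# lower-case letter, together with an optional trailing dot.
(my $kept = $text) =~ s/\s+[a-z][-\w]*[a-z]\.?//g;

print "$kept\n";
# prints "Apis Carb-v. Med. OPIUM PULS. Sanguis-s Tarent. VERAT."
```

Note that the pattern needs whitespace in front of each word and at least two characters, so a lower-case word at the very start of the string, or a single-letter word, would be left in place.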
 
Gunnar Hjalmarsson

Gunnar said:
Anyway, this is one way to do it with one substitution:

s/\s+[a-z][-\w]*[a-z]\.?//g;

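Should better be, so that a lower-case word at the very start of the
string is caught too:

s/(?:^|\s+)[a-z][-\w]*[a-z]\.?//g;

(Plain \s* in front would let the match begin in the middle of a word.)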
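
A further sketch, not from the thread: if the leading whitespace is made optional with \s*, nothing stops the match from starting inside a capitalised word. One possible way around that is to test the word edges with lookarounds instead of consuming whitespace, which also lets single-letter words be handled. The helper name strip_lowercase_words is hypothetical, and collapsing the leftover whitespace is an assumption about the desired output.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread.
sub strip_lowercase_words {
    my ($text) = @_;

    # (?<![-\w])             the token must not continue to the left
    # [a-z](?:[-\w]*[a-z])?  one lower-case letter, or a longer token that
    #                        also ends in a lower-case letter
    # \.?                    optional trailing dot
    # (?![-\w])              the token must not continue to the right
    $text =~ s/(?<![-\w])[a-z](?:[-\w]*[a-z])?\.?(?![-\w])//g;

    # Tidy up the whitespace left behind by the removed tokens.
    $text =~ s/\s+/ /g;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

# With optional whitespace (\s*) the match may begin inside a word:
(my $cut = 'Apis calc.') =~ s/\s*[a-z][-\w]*[a-z]\.?//g;
print "$cut\n";    # prints "A"

print strip_lowercase_words('calc. Apis a b. Carb-v. viol-o'), "\n";
# prints "Apis Carb-v."
```

The lookbehind and lookahead are zero-width, so they check the neighbouring characters without removing them; that is why the whitespace is cleaned up in a separate step.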
 
