Help needed with reg exp please

Discussion in 'Perl' started by Aristotle, Sep 4, 2004.

  1. Aristotle

    Could you please help me out with regular expressions. I'm trying to
    write a Perl script that processes some text, and I'm stuck on the
    following:

    I need to remove from the text below all words that start and end with
    lowercase letters. Words may or may not be followed by a dot "." (most
    are), and may contain a "-" character:


    e.g.:

    ---> Apis calc. Carb-v. cham. dendr-pol. halia-lac. hep. lac-leo. lyc.
    Med. nat-m. nit-ac. nux-v. OPIUM plat. polys. PULS. rauw. sal-fr.
    Sanguis-s sil. sulph. Tarent. tung-met. VERAT. viol-o vio-zinc.
    zinc-c.

    should yield:

    ---> Apis Carb-v. Med. OPIUM PULS. Sanguis-s Tarent. VERAT.


    i.e., words starting with a capital letter must remain untouched.
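    A minimal sketch of the rule as stated, assuming the text can be split on whitespace and rejoined with single spaces: each token is tested on its own, so there is no question of where a word begins inside a longer string. The sub names `is_lowercase_word` and `strip_lowercase_words` are hypothetical, and the pattern also treats a single lowercase letter (with or without a dot) as a word to remove.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the token starts with a lowercase letter,
# ends with a lowercase letter (before one optional trailing dot), and has
# only word characters or hyphens in between. A single lowercase letter,
# with or without a dot, also counts.
sub is_lowercase_word {
    my ($token) = @_;
    return $token =~ /^[a-z](?:[-\w]*[a-z])?\.?$/;
}

# Hypothetical helper: split on whitespace, drop the lowercase words,
# and rejoin the remaining tokens with single spaces.
sub strip_lowercase_words {
    my ($text) = @_;
    return join ' ', grep { !is_lowercase_word($_) } split ' ', $text;
}

my $sample = 'Apis calc. Carb-v. cham. dendr-pol. halia-lac. hep. lac-leo. lyc. '
           . 'Med. nat-m. nit-ac. nux-v. OPIUM plat. polys. PULS. rauw. sal-fr. '
           . 'Sanguis-s sil. sulph. Tarent. tung-met. VERAT. viol-o vio-zinc. '
           . 'zinc-c.';

print strip_lowercase_words($sample), "\n";
# prints: Apis Carb-v. Med. OPIUM PULS. Sanguis-s Tarent. VERAT.
```

    Note that this normalizes spacing (line breaks and runs of spaces become single spaces), which may or may not matter for the real input.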


    I've tried various combinations of reg exp before posting here, but
    could not find the right one.
    I'd really appreciate your help.
    Aristotle, Sep 4, 2004
  2. Gunnar Hjalmarsson

    Aristotle wrote:
    > Could you please help me out with regular expressions.


    <snip>

    > I've tried various combinations of reg exp before posting here,


    Show us!

    And consult e.g. "perldoc perlrequick", if you haven't done so already.

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Sep 4, 2004
  3. Aristotle

    Aristotle wrote:
    > I've tried various combinations of reg exp before posting here,

    Gunnar Hjalmarsson wrote:
    >Show us!


    I managed to get the desired effect by using the following code; it
    gets the job done, but it looks ugly:

    {
    $parts[1] =~ s/ ([a-z]+[a-z]) / /g;
    $parts[1] =~ s/ ([a-z]+[a-z])./ /g;
    $parts[1] =~ s/ ([a-z]+[a-z]) / /g;
    $parts[1] =~ s/ ([a-z]+[a-z])./ /g;
    $parts[1] =~ s/ ([a-z]) / /g;
    $parts[1] =~ s/ ([a-z])./ /g;
    }
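    A small sketch of why these substitutions have to be listed twice, using a hypothetical sub `one_pass` that applies the second line once. The unescaped `.` matches any character, so in "dendr-pol." it swallows the hyphen; and because `s///g` finds all its matches in the string as it was before any replacement, the leftover "pol." is not seen as preceded by a space until the next pass. Perl's `1 while ...` idiom repeats a substitution until it stops matching, instead of hard-coding the number of passes.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the second substitution from the post once.
# The unescaped "." matches ANY character, not just a literal dot.
sub one_pass {
    my ($s) = @_;
    $s =~ s/ ([a-z]+[a-z])./ /g;
    return $s;
}

my $text  = 'Apis dendr-pol. Med.';
my $once  = one_pass($text);   # " dendr-" becomes " ", leaving "Apis pol. Med."
my $twice = one_pass($once);   # " pol." becomes " ", leaving "Apis  Med."

# Repeat the substitution until it no longer matches anything.
my $loop = $text;
1 while $loop =~ s/ ([a-z]+[a-z])./ /g;

print "[$once]\n[$twice]\n[$loop]\n";
```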

    However, that was after trying MANY, MANY expressions, e.g.:

    $parts[1] =~ s/([a-z]+[a-z]\.)//g;
    $parts[1] =~ s/([a-z]*[a-z]\.)//g;
    $parts[1] =~ s/([a-z][a-z]+\-[a-z]\.)//g;
    $parts[1] =~ s/([a-z][a-z]+\-.[a-z])//g;
    $parts[1] =~ s/([a-z][a-z]+[a-z])//g;
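    Running the first of these attempts (wrapped in a hypothetical sub `first_attempt`) on three words from the sample shows two separate problems: nothing ties the match to the start of a word, so "Med." and "Tarent." lose everything after their capital letter, and `[a-z]+[a-z]\.` needs at least two lowercase letters right before the dot, so "nat-m." is not touched at all.

```perl
use strict;
use warnings;

# Hypothetical helper: the first attempt from the post, unchanged.
# Nothing ties the match to the start of a word.
sub first_attempt {
    my ($s) = @_;
    $s =~ s/([a-z]+[a-z]\.)//g;
    return $s;
}

print first_attempt('Med. Tarent. nat-m.'), "\n";
# prints: M T nat-m.
```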

    I'm no expert; I did what I could...
    If you think you can help, please do so without questioning me.
    Aristotle, Sep 4, 2004
  4. Gunnar Hjalmarsson

    Aristotle wrote:
    > Gunnar Hjalmarsson wrote:
    >> Aristotle wrote:
    >>> I need to remove from the text below all words that start and
    >>> end with lowercase letters. Words may or may not be followed by a
    >>> dot "." (most are), and may contain a "-" character:
    >>>
    >>> e.g.:
    >>>
    >>> ---> Apis calc. Carb-v. cham. dendr-pol. halia-lac. hep.
    >>> lac-leo. lyc. Med. nat-m. nit-ac. nux-v. OPIUM plat. polys.
    >>> PULS. rauw. sal-fr. Sanguis-s sil. sulph. Tarent. tung-met.
    >>> VERAT. viol-o vio-zinc. zinc-c.
    >>>
    >>> should yield:
    >>>
    >>> ---> Apis Carb-v. Med. OPIUM PULS. Sanguis-s Tarent. VERAT.
    >>>
    >>> i.e., words starting with a capital letter must remain untouched.
    >>>
    >>> I've tried various combinations of reg exp before posting here,

    >>
    >> Show us!

    >
    > I managed to get the desired effect by using the following code; it
    > gets the job done, but it looks ugly:
    >
    > {
    > $parts[1] =~ s/ ([a-z]+[a-z]) / /g;
    > $parts[1] =~ s/ ([a-z]+[a-z])./ /g;
    > $parts[1] =~ s/ ([a-z]+[a-z]) / /g;
    > $parts[1] =~ s/ ([a-z]+[a-z])./ /g;
    > $parts[1] =~ s/ ([a-z]) / /g;
    > $parts[1] =~ s/ ([a-z])./ /g;
    > }
    >
    > However, that was after trying MANY, MANY expressions, e.g.:
    >
    > $parts[1] =~ s/([a-z]+[a-z]\.)//g;
    > $parts[1] =~ s/([a-z]*[a-z]\.)//g;
    > $parts[1] =~ s/([a-z][a-z]+\-[a-z]\.)//g;
    > $parts[1] =~ s/([a-z][a-z]+\-.[a-z])//g;
    > $parts[1] =~ s/([a-z][a-z]+[a-z])//g;
    >
    > I'm no expert; I did what I could...
    > If you think you can help, please do so without questioning me.


    There are all too many lazy people who have no real interest in
    learning Perl, and who believe that groups like this one are just free
    help desks. I asked you to prove that you are not one of those by
    posting code. You need to live with that, whatever you call it, or
    else few people will be willing to assist.

    Anyway, this is one way to do it with one substitution:

    s/\s+[a-z][-\w]*[a-z]\.?//g;
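    A quick check of this substitution against the sample line from the first post, wrapped in a hypothetical sub `strip_with_plus` so it can be called on any string.

```perl
use strict;
use warnings;

# At least one whitespace character, then a word that starts and ends with
# a lowercase letter, then an optional dot. The whole match, including the
# leading whitespace, is deleted.
sub strip_with_plus {
    my ($s) = @_;
    $s =~ s/\s+[a-z][-\w]*[a-z]\.?//g;
    return $s;
}

my $sample = 'Apis calc. Carb-v. cham. dendr-pol. halia-lac. hep. lac-leo. lyc. '
           . 'Med. nat-m. nit-ac. nux-v. OPIUM plat. polys. PULS. rauw. sal-fr. '
           . 'Sanguis-s sil. sulph. Tarent. tung-met. VERAT. viol-o vio-zinc. '
           . 'zinc-c.';

print strip_with_plus($sample), "\n";
# prints: Apis Carb-v. Med. OPIUM PULS. Sanguis-s Tarent. VERAT.
```

    Because `\s+` needs at least one whitespace character in front of the word, a lowercase word at the very start of the string is left in place.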

    Gunnar Hjalmarsson, Sep 4, 2004
  5. Gunnar Hjalmarsson

    Gunnar Hjalmarsson wrote:
    > Anyway, this is one way to do it with one substitution:
    >
    > s/\s+[a-z][-\w]*[a-z]\.?//g;


    That should rather be:

    s/\s*[a-z][-\w]*[a-z]\.?//g;
    --------^
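    A side-by-side sketch of this pattern and one possible alternative (both sub names are hypothetical, and the lookbehind variant is not from the thread). With `\s*` the whitespace becomes optional, so the match can also begin inside a capitalized word: "Apis" loses "pis" and is reduced to "A". Adding the negative lookbehind `(?<!\S)` lets a word match only at the start of the string or directly after whitespace.

```perl
use strict;
use warnings;

# The corrected pattern from the post: leading whitespace is optional.
sub strip_with_star {
    my ($s) = @_;
    $s =~ s/\s*[a-z][-\w]*[a-z]\.?//g;
    return $s;
}

# Hypothetical variant (not from the thread): (?<!\S) only lets the word
# start at the beginning of the string or directly after whitespace.
sub strip_with_lookbehind {
    my ($s) = @_;
    $s =~ s/\s*(?<!\S)[a-z][-\w]*[a-z]\.?//g;
    return $s;
}

for my $text ('Apis Tarent.', 'calc. Apis') {
    printf "%-14s star: [%s]  lookbehind: [%s]\n",
        $text, strip_with_star($text), strip_with_lookbehind($text);
}
```

    With the lookbehind, a leading lowercase word is removed as well, though the space that followed it stays in place.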

    Gunnar Hjalmarsson, Sep 4, 2004