regular expression

Discussion in 'Perl Misc' started by dj, Jul 2, 2003.

  1. dj

    Hi,
    I am writing a script that parses an HTML file (which has been retrieved as
    a scalar by LWP::UserAgent). The script looks for everything in between the
    first <P> tag and the last </P> tag, with any number of <P> and </P> tags in
    between. I am sure I have done something like this before, but for the life
    of me I can't remember how... (maybe I did it before in lex). Anyone got
    any neato suggestions?

    Thanks for any help,
    Drew
     
    dj, Jul 2, 2003

  2. On Tuesday 01 July 2003 07:18 pm, dj <> wrote
    in <3f02410e$0$30568$>:

    > Hi,
    > I am writing a script that parses an HTML file (which has been retrieved
    > as a scalar by LWP::UserAgent). The script looks for everything in between
    > the first <P> tag and the last </P> tag, with any number of <P> and </P>
    > tags in between. I am sure I have done something like this before, but for
    > the life of me I can't remember how... (maybe I did it before in lex).
    > Anyone got any neato suggestions?


    Are you looking for the /s modifier? It makes '.' match newlines, too. I'd
    probably do it like this (the /i makes the match case-insensitive, since
    some people capitalize their tags and some don't):

    /<p>(.*)<\/p>/si
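To see what this pattern captures, here is a minimal sketch run against a made-up page string (the `$html` contents are hypothetical, standing in for what LWP::UserAgent would return). Because `.*` is greedy, the match starts at the first `<p>` and extends to the last `</p>`, which is exactly the "first <P> to last </P>" behaviour asked for. It assumes bare `<p>` tags with no attributes; `<p class="x">` would not match.

```perl
use strict;
use warnings;

# Hypothetical sample page; in the real script this string would come
# from LWP::UserAgent.
my $html = "<html><body>\n<P>First para</P>\n<p>Second\npara</p>\n</body></html>";

# The leftmost <p> (case-insensitive thanks to /i) starts the match.
# Greedy .* then backtracks only as far as the LAST </p>, and /s lets
# '.' cross the newlines in between.
my ($inner) = $html =~ /<p>(.*)<\/p>/si;

print defined $inner ? "$inner\n" : "no match\n";
```

Since the slash in `</p>` has to be escaped inside `/.../`, a different delimiter such as `m{<p>(.*)</p>}si` is often easier to read; it matches the same text.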


    --
    Nicholas Knight <>
     
    Nicholas Knight, Jul 2, 2003

  3. dj

    Hi Nicholas,

    Yep, I had something along these lines:

    while ($_ =~ s/.+<P>(.+)<\/P>.+/$1/gsi) {
        print;
    }

    but no substitution occurs. I have tried a few combinations, but nothing
    matches :)
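The post does not say what `$_` holds or what the page looks like, so the exact cause can only be guessed at. The sketch below uses a hypothetical `$content` string to show one way this pattern fails: the `.+` on each side of the tags demands at least one character before the chosen `<P>` and at least one after the closing `</P>`. It then shows the plain match that does work. Separately, if the page was stored in some variable other than `$_`, the loop would also do nothing.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the page fetched with LWP::UserAgent.
my $content = "<P>one</P>\n<P>two</P>";

# The loop in the post ran s/.+<P>(.+)<\/P>.+/$1/gsi against $_. Here:
#   * the leading .+ needs at least one character before <P>, and being
#     greedy it picks the LAST <P> that still lets the rest succeed,
#     not the first one;
#   * the trailing .+ needs at least one character after </P>.
# With nothing before the first <P> and nothing after the last </P>,
# no combination works, so nothing is substituted.
my $copy = $content;
my $n = ($copy =~ s/.+<P>(.+)<\/P>.+/$1/gsi) || 0;
print "substitutions made: $n\n";

# Matching (instead of substituting) without the padding .+ captures
# the text between the first <P> and the last </P>.
if (my ($inner) = $content =~ /<P>(.*)<\/P>/si) {
    print "$inner\n";
}
```

Using `m//` in list context to pull out `$1` avoids rewriting the original string at all, which also removes the need for a `while` loop.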

     
    dj, Jul 2, 2003
  4. On 02 Jul 2003 03:02:47 GMT,
    Martien Verbruggen <> wrote:

    > A _very_ simpleminded approach could do this:
    >
    > my ($stuff) = /<P>(.*)<\/P>/i;


    Addition: You also need the /s flag so that '.' matches newlines. But
    again, I wouldn't use it.
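A small sketch of the difference /s makes, using a hypothetical two-line paragraph. Without /s, '.' stops at "\n", so `(.*)` cannot reach a `</P>` on the next line. (Inside `/.../` the slash in `</P>` has to be escaped; otherwise the pattern ends at that slash.)

```perl
use strict;
use warnings;

# Hypothetical sample with a newline between the opening and closing tag.
my $html = "<P>line one\nline two</P>";

# Without /s, '.' does not match "\n", so (.*) cannot span the line
# break and the match fails.
my ($without_s) = $html =~ /<P>(.*)<\/P>/i;

# With /s, '.' matches "\n" as well, so the capture spans both lines.
my ($with_s) = $html =~ /<P>(.*)<\/P>/is;

print defined $without_s ? "without /s: matched\n" : "without /s: no match\n";
print defined $with_s    ? "with /s: $with_s\n"     : "with /s: no match\n";
```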

    Martien
    --
    Martien Verbruggen     | If at first you don't succeed, try again.
    Trading Post Australia | Then quit; there's no use being a damn fool
                           | about it.
     
    Martien Verbruggen, Jul 2, 2003

Similar Threads
  1. Keith-Earl
    Replies: 1, Views: 457
    Mary Chipman, Jun 15, 2004
  2. VSK
    Replies: 2, Views: 2,307
  3. moop™
    Matching abitrary expression in a regular expression
    moop™, Dec 1, 2005, in forum: Java
    Replies: 8, Views: 851
    Alan Moore, Dec 2, 2005
  4. GIMME
    Replies: 3, Views: 11,974
    vforvikash, Dec 29, 2008
  5. Noman Shapiro
    Replies: 0, Views: 235
    Noman Shapiro, Jul 17, 2013