multiline regular expression, is it possible?

Discussion in 'Perl Misc' started by Leif Wessman, Oct 28, 2003.

  1. Leif Wessman

    Leif Wessman Guest

    I have a variable $myregexp with the following regexp (multiline):

    <tr><td>
    id: (\d{5}[AX])
    <\/td><\/tr>

    I also have another variable $results that has some html, like this:

    <table>
    <tr><td>
    id: 45434X
    </td></tr>
    <tr><td>
    id: 95434A
    </td></tr>
    </table>

    In php I'm doing the following:

    preg_match_all("/$myregexp/", $results, $matches);

    But I get an error. Why is this?

    Leif
    Leif Wessman, Oct 28, 2003
    #1

  2. Leif Wessman wrote:
    > I have a variable $myregexp with the following regexp (multiline):
    >
    > <tr><td>
    > id: (\d{5}[AX])
    > <\/td><\/tr>
    >
    > I also have another variable $results that has some html, like this:
    >
    > <table>
    > <tr><td>
    > id: 45434X
    > </td></tr>
    > <tr><td>
    > id: 95434A
    > </td></tr>
    > </table>
    >
    > In php I'm doing the following:
    >
    > preg_match_all("/$myregexp/", $results, $matches);
    >
    > But I get an error. Why is this?


    Have no idea. But in Perl (this is a Perl group, you know) you can do:

    @matches = $results =~ /$myregexp/g;

    --
    Gunnar Hjalmarsson
    Email: http://www.gunnar.cc/cgi-bin/contact.pl
    Gunnar Hjalmarsson, Oct 28, 2003
    #2

  3. (Leif Wessman) wrote in
    news::

    > In php I'm doing the following:
    >
    > preg_match_all("/$myregexp/", $results, $matches);
    >
    > But I get an error. Why is this?


    Perhaps you could ask in a PHP newsgroup? :)

    --
    Eric
    $_ = reverse sort $ /. r , qw p ekca lre uJ reh
    ts p , map $ _. $ " , qw e p h tona e and print
    Eric J. Roode, Oct 28, 2003
    #3
  4. Leif Wessman

    Leif Wessman Guest

    Yes, maybe I could. But I have more faith in Perl programmers... And
    regexps work almost the same in the two languages...

    Leif

    "Eric J. Roode" <> wrote in message news:<Xns9422BFB87EEB0sdn.comcast@216.196.97.136>...
    > (Leif Wessman) wrote in
    > news::
    >
    > > In php I'm doing the following:
    > >
    > > preg_match_all("/$myregexp/", $results, $matches);
    > >
    > > But I get an error. Why is this?

    >
    > Perhaps you could ask in a PHP newsgroup? :)
    Leif Wessman, Oct 29, 2003
    #4
  5. Ben Morrow

    Ben Morrow Guest

    [please don't top-post]

    (Leif Wessman) wrote:
    > "Eric J. Roode" <> wrote in message
    > news:<Xns9422BFB87EEB0sdn.comcast@216.196.97.136>...
    > > (Leif Wessman) wrote in
    > > news::
    > >
    > > > In php I'm doing the following:
    > > >
    > > > preg_match_all("/$myregexp/", $results, $matches);
    > > >
    > > > But I get an error. Why is this?

    > >
    > > Perhaps you could ask in a PHP newsgroup? :)

    >
    > Yes, maybe I could. But I have more faith in Perl programmers... And
    > regexps work almost the same in the two languages...


    In that case, let us translate your script into Perl:

    #!/usr/bin/perl -l

    use warnings;
    use strict;

    my $myregexp = <<RE;
    <tr><td>
    id: (\\d{5}[AX])
    <\\/td><\\/tr>
    RE

    my $results = <<RES;
    <table>
    <tr><td>
    id: 45434X
    </td></tr>
    <tr><td>
    id: 95434A
    </td></tr>
    </table>
    RES

    print join ", ", $results =~ /$myregexp/g;

    __END__

    Worksforme.

    ~% ./php
    45434X, 95434A
    ~%

    Now, what was your Perl problem?

    Ben

    --
    "If a book is worth reading when you are six, *
    it is worth reading when you are sixty." - C.S.Lewis
    Ben Morrow, Oct 29, 2003
    #5
  6. Bart Lateur

    Bart Lateur Guest

    Leif Wessman wrote:

    >In php I'm doing the following:
    >
    >preg_match_all("/$myregexp/", $results, $matches);
    >
    >But I get an error. Why is this?


    Because PHP is stupid.

    You may not like that answer, but you have to make sure the interpolated
    string looks like a proper regexp, complete with slashes. For this
    simple case, this implies that there must be backslashes in front of
    every slash in the string.

    Don't reach for addslashes() here, tempting as it is in a language as
    hackish as PHP: it escapes quotes and backslashes, not forward slashes,
    so the delimiter problem remains, and doubling the backslashes would
    turn \d into a literal backslash followed by "d".

    Instead, you could use alternative delimiters on the regexp, something
    not in the string, for example "!":

    preg_match_all("!$myregexp!", $results, $matches); # untested


    Perl programmers will still be shocked when they realize what is going
    on here, but it's the best one can do in such a braindead language,
    short of writing a very elaborate library to get the level of
    smartness Perl has all by itself.
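
For contrast, a minimal Perl sketch of the delimiter point. Perl locates the closing delimiter before it interpolates the variable, so slashes inside $myregexp are harmless, and a different delimiter such as "!" removes the need for \/ in a literal pattern. The unescaped pattern text below is made up for illustration; the pattern in the original post already had \/.

```perl
use strict;
use warnings;

# The HTML from the original post, joined with newlines.
my $results = join "\n",
    '<table>',
    '<tr><td>', 'id: 45434X', '</td></tr>',
    '<tr><td>', 'id: 95434A', '</td></tr>',
    '</table>', '';

# Hypothetical pattern text with plain, unescaped slashes, as it might
# arrive from a config file or a form field.
my $myregexp = "<tr><td>\nid: (\\d{5}[AX])\n</td></tr>";

# The slashes inside the variable cannot end the pattern early, because
# the delimiters are found before $myregexp is interpolated.
my @with_slashes = $results =~ /$myregexp/g;

# With "!" as the delimiter, a literal pattern needs no \/ at all.
my @with_bang = $results =~ m!<tr><td>\nid: (\d{5}[AX])\n</td></tr>!g;

print join(", ", @with_slashes), "\n";   # 45434X, 95434A
print join(", ", @with_bang), "\n";      # 45434X, 95434A
```

Whether the PHP call behaves the same way depends on how the pattern string is built there, so treat this only as the Perl side of the comparison.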

    --
    Bart.
    Bart Lateur, Oct 29, 2003
    #6
  7. (Leif Wessman) wrote in
    news::

    > Yes, maybe I could. But I have more faith in Perl programmers... And
    > regexps work almost the same in the two languages...


    "Almost" probably isn't good enough. From a Perl point of view, there's
    nothing wrong with your regular expression. I don't know PHP, so I don't
    know if there's some difference in how the two languages do regular
    expressions which is causing your problem. If I did, I'd probably hang out
    in a PHP newsgroup. Assuming there are PHP newsgroups.

    --
    Eric
    $_ = reverse sort $ /. r , qw p ekca lre uJ reh
    ts p , map $ _. $ " , qw e p h tona e and print
    Eric J. Roode, Oct 29, 2003
    #7
  8. Zachary Kent

    Zachary Kent Guest

    "Eric J. Roode" <> wrote in message
    news:Xns942347700787Esdn.comcast@216.196.97.136...

    > "Almost" probably isn't good enough. From a Perl point of view, there's
    > nothing wrong with your regular expression. I don't know PHP, so I don't
    > know if there's some difference in how the two languages do regular
    > expressions which is causing your problem. If I did, I'd probably hang
    > out in a PHP newsgroup. Assuming there are PHP newsgroups.


    The idea behind "preg..." in PHP is to utilize perl regex in PHP. However,
    I don't know if it is using perl to power the preg regex or just borrowing
    perl's syntax for consistency.
    Zachary Kent, Oct 29, 2003
    #8
  9. Ben Morrow

    Ben Morrow Guest

    Re: [OT] pcre in PHP

    "Zachary Kent" <> wrote:
    >
    > "Eric J. Roode" <> wrote in message
    > news:Xns942347700787Esdn.comcast@216.196.97.136...
    >
    > > "Almost" probably isn't good enough. From a Perl point of view, there's
    > > nothing wrong with your regular expression. I don't know PHP, so I don't
    > > know if there's some difference in how the two languages do regular
    > > expressions which is causing your problem. If I did, I'd probably hang
    > > out in a PHP newsgroup. Assuming there are PHP newsgroups.

    >
    > The idea behind "preg..." in PHP is to utilize perl regex in PHP. However,
    > I don't know if it is using perl to power the preg regex or just borrowing
    > perl's syntax for consistency.


    It uses libpcre, Perl-Compatible Regular Expressions.

    An absolute life-saver when working with PHP :).

    Ben

    --
    The cosmos, at best, is like a rubbish heap scattered at random.
    - Heraclitus
    Ben Morrow, Oct 29, 2003
    #9
  10. Bill

    Bill Guest

    Ben Morrow <> wrote in message news:<bno022$71s$>...
    > #!/usr/bin/perl -l
    >
    > use warnings;
    > use strict;
    >
    > my $myregexp = <<RE;
    > <tr><td>
    > id: (\\d{5}[AX])
    > <\\/td><\\/tr>
    > RE
    >
    > my $results = <<RES;
    > <table>
    > <tr><td>
    > id: 45434X
    > </td></tr>
    > <tr><td>
    > id: 95434A
    > </td></tr>
    > </table>
    > RES
    >
    > print join ", ", $results =~ /$myregexp/g;
    >
    > __END__
    >
    > Worksforme.


    Is the space between id: and \d accounted for here? Is that version
    specific and does it matter in PHP?
    Bill, Oct 29, 2003
    #10
  11. Ben Morrow

    Ben Morrow Guest

    (Bill) wrote:
    > Ben Morrow <> wrote in message
    > news:<bno022$71s$>...
    > > my $myregexp = <<RE;
    > > <tr><td>
    > > id: (\\d{5}[AX])
    > > <\\/td><\\/tr>
    > > RE

    >
    > Is the space between id: and \d accounted for here?


    If you mean 'is it required to be there for the regex to match' then,
    since I didn't use the /x switch, yes, it is.
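
A small sketch of what the /x switch changes, using a one-line sample string that is assumed here rather than taken from the script: without /x the space in the pattern must match a space in the text, and with /x unescaped whitespace in the pattern is ignored.

```perl
use strict;
use warnings;

my $text = "id: 45434X";

# Without /x the space after "id:" is part of the pattern and must match.
my ($plain) = $text =~ /id: (\d{5}[AX])/;

# With /x, unescaped whitespace in the pattern is ignored, so this asks
# for "id:" followed directly by five digits and fails on $text.
my ($x_ignored) = $text =~ /id: (\d{5}[AX])/x;

# Under /x a literal space has to be written explicitly, here as "\ ".
my ($x_escaped) = $text =~ /id:\ (\d{5}[AX])/x;

for my $case ( [ 'no /x', $plain ], [ '/x', $x_ignored ], [ '/x, escaped space', $x_escaped ] ) {
    my ( $label, $got ) = @$case;
    print "$label: ", ( defined $got ? $got : 'no match' ), "\n";
}
```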

    > Is that version specific and does it matter in PHP?


    It is not specific to a particular version of Perl[1]. With regard to
    PHP, I suggest you consult the docs for libpcre.

    Ben

    [1] Leaving aside Perl6 for the moment... :)

    --
    Like all men in Babylon I have been a proconsul; like all, a slave ... During
    one lunar year, I have been declared invisible; I shrieked and was not heard,
    I stole my bread and was not decapitated.
    ~ ~ Jorge Luis Borges, 'The Babylon Lottery'
    Ben Morrow, Oct 29, 2003
    #11
  12. Bill

    Bill Guest

    Ben Morrow <> wrote in message news:<bnou1b$k9a$>...


    > > Is the space between id: and \d accounted for here?

    >
    > If you mean 'is it required to be there for the regex to match' then,
    > since I didn't use the /x switch, yes, it is.


    When parsing HTML pages, one problem I often have is failing to allow
    for and match whitespace properly. So much so that it usually pays to
    preprocess the whitespace variations out before doing the regex.

    I guess that was not the PHP problem though. Never mind :).
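
The whitespace preprocessing described above might be sketched like this. The helper name squeeze_ws and the messy sample input are assumptions for illustration, not anything from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse every run of whitespace to a single
# space and trim the ends, so later patterns see one predictable layout.
sub squeeze_ws {
    my ($s) = @_;
    $s =~ s/\s+/ /g;     # spaces, tabs, CR and LF all become one space
    $s =~ s/^ | $//g;    # drop a leading or trailing space
    return $s;
}

# Made-up input with mixed line endings, tabs and extra spaces.
my $messy = "<tr><td>\r\n\t id:   45434X \n</td></tr>\n";
my $flat  = squeeze_ws($messy);

# After squeezing, at most one space can separate the pieces.
my ($id) = $flat =~ m!<td> ?id: (\d{5}[AX]) ?</td>!;

print "$flat\n";
print defined $id ? "$id\n" : "no match\n";
```

Squeezing first keeps the pattern short; the alternative is to sprinkle \s* through the pattern itself.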
    Bill, Oct 30, 2003
    #12
  13. On Thu, 29 Oct 2003, Bill wrote:

    > When parsing HTML pages, one problem I often have is failing to allow
    > for and match whitespace properly.


    I dare say that the problem you mention isn't nearly so great as the
    fact that a regex is the wrong tool for parsing HTML.

    It might be good enough for simplified HTML constructs that you've
    carefully controlled yourself, but if you need to handle the full
    range of (even) valid HTML that you'd get from other sources, you'd be
    scuppered.

    (And that's not starting on the truly vast amounts of "Sturgeon's Law
    Evidence" that relies almost entirely on error fixup in browsers to
    achieve the author's intentions. But I digress, probably.)
    Alan J. Flavell, Oct 30, 2003
    #13
  14. Bill

    Bill Guest

    "Alan J. Flavell" <> wrote in message news:<>...

    > It might be good enough for simplified HTML constructs that you've
    > carefully controlled yourself, but if you need to handle the full
    > range of (even) valid HTML that you'd get from other sources, you'd be
    > scuppered.
    >
    > (And that's not starting on the truly vast amounts of "Sturgeon's Law
    > Evidence" that relies almost entirely on error fixup in browsers to
    > achieve the author's intentions. But I digress, probably.)


    Interesting...the last time I came across this was with parsing the
    'clit' formatted pages of the ebook-to-html translator by that name
    into a multipage, indexed format. I used HTML::TreeBuilder to parse
    it, but to teach the script to more or less understand what is in the
    book you still have to use regex on the actual tag and text
    content--spaces, returns, escaped characters, tabs and all.
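
A rough sketch of that division of labour, assuming the CPAN module HTML::TreeBuilder is installed (it is not part of core Perl) and reusing the table from the first post: the parser finds the cells, and the regex only has to cope with the text inside one cell.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;    # CPAN module, assumed to be installed

# The table from the first post in this thread.
my $html = join "\n",
    '<table>',
    '<tr><td>', 'id: 45434X', '</td></tr>',
    '<tr><td>', 'id: 95434A', '</td></tr>',
    '</table>';

# Let the parser deal with tags, nesting and whitespace between them.
my $tree = HTML::TreeBuilder->new_from_content($html);

my @ids;
for my $cell ( $tree->look_down( _tag => 'td' ) ) {
    # Only the cell text is matched; \s* tolerates whatever spacing
    # the parser leaves around the value.
    my $text = $cell->as_text;
    push @ids, $1 if $text =~ /id:\s*(\d{5}[AX])/;
}

$tree->delete;    # release the parse tree when done

print join( ", ", @ids ), "\n";
```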
    Bill, Oct 30, 2003
    #14
  15. Bill

    Bill Guest

    "Bernard El-Hagin" <> wrote in message news:<Xns94246338DD8E1elhber1lidotechnet@62.89.127.66>...
    > (Bill) wrote:
    >
    > [...]
    >
    > > Interesting...the last time I came across this was with parsing the
    > > 'clit' formatted pages
    >   ^^^^^^
    >
    > Come on, admit it, you made that name up! ;-)


    No, but I had to turn off the Internet filter on the PC to send the posting :>.

    Google for 'clit HTML'
    Bill, Oct 30, 2003
    #15