Multiline regular expression: is it possible?

Leif Wessman

I have a variable $myregexp with the following regexp (multiline):

<tr><td>
id: (\d{5}[AX])
<\/td><\/tr>

I also have another variable $results that holds some HTML, like this:

<table>
<tr><td>
id: 45434X
</td></tr>
<tr><td>
id: 95434A
</td></tr>
</table>

In PHP I'm doing the following:

preg_match_all("/$myregexp/", $results, $matches);

But I get an error. Why is this?

Leif
 
Gunnar Hjalmarsson

Leif said:
I have a variable $myregexp with the following regexp (multiline):

<tr><td>
id: (\d{5}[AX])
<\/td><\/tr>

I also have another variable $results that holds some HTML, like this:

<table>
<tr><td>
id: 45434X
</td></tr>
<tr><td>
id: 95434A
</td></tr>
</table>

In PHP I'm doing the following:

preg_match_all("/$myregexp/", $results, $matches);

But I get an error. Why is this?

Have no idea. But in Perl (this is a Perl group, you know) you can do:

@matches = $results =~ /$myregexp/g;
 
Eric J. Roode

Leif Wessman wrote:
In PHP I'm doing the following:

preg_match_all("/$myregexp/", $results, $matches);

But I get an error. Why is this?

Perhaps you could ask in a PHP newsgroup? :)
 
Leif Wessman

Yes, maybe I could. But I have more faith in Perl programmers... And
regexps work almost the same in the two languages...

Leif
 
Ben Morrow

[please don't top-post]

Yes, maybe I could. But I have more faith in Perl programmers... And
regexps work almost the same in the two languages...

In that case, let us translate your script into Perl:

#!/usr/bin/perl -l

use warnings;
use strict;

my $myregexp = <<RE;
<tr><td>
id: (\\d{5}[AX])
<\\/td><\\/tr>
RE

my $results = <<RES;
<table>
<tr><td>
id: 45434X
</td></tr>
<tr><td>
id: 95434A
</td></tr>
</table>
RES

print join ", ", $results =~ /$myregexp/g;

__END__

Worksforme.

~% ./php
45434X, 95434A
~%

Now, what was your Perl problem?

Ben
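
The doubled backslashes in the script above are only there because an unquoted here-document interpolates like a double-quoted string. A minimal sketch of the same match with a single-quoted here-document, so the regexp can be written exactly as Leif posted it; the names $source, $regexp and extract_ids are made up for this illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A single-quoted here-document does no interpolation and leaves
# backslashes alone, so \d and \/ arrive in the string unchanged.
my $source = <<'RE';
<tr><td>
id: (\d{5}[AX])
<\/td><\/tr>
RE
chomp $source;               # drop the newline the here-document adds at the end

my $regexp = qr/$source/;    # compile once, reuse below

# Hypothetical helper: return every id captured from $html.
sub extract_ids {
    my ($html) = @_;
    my @ids = $html =~ /$regexp/g;
    return @ids;
}

my $results = <<'RES';
<table>
<tr><td>
id: 45434X
</td></tr>
<tr><td>
id: 95434A
</td></tr>
</table>
RES

print join(", ", extract_ids($results)), "\n";
```

The chomp is optional for this input, since every closing row tag is followed by a newline anyway, but it keeps the pattern from silently requiring one.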
 
Bart Lateur

Leif said:
In PHP I'm doing the following:

preg_match_all("/$myregexp/", $results, $matches);

But I get an error. Why is this?

Because PHP is stupid.

You may not like that answer, but you have to make sure the interpolated
string looks like a proper regexp, complete with slashes. For this
simple case, this implies that there must be backslashes in front of
every slash in the string.

It is tempting to reach for addslashes() on the regexp before wrapping
slashes around it, but that is the wrong tool: it escapes quotes,
backslashes and NUL bytes, not slashes, and doubling the backslashes
would break escapes such as \d.

A simpler fix is to use alternative delimiters on the regexp, something
not in the string, for example "!":

preg_match_all("!$myregexp!", $results, $matches); # untested


Perl programmers will still be shocked when they realize what is going
on here, but it's the best one can do on such a braindead language,
except for writing a very elaborate library to get the level of
smartness Perl has all by itself.
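
For comparison, a small sketch of the Perl side of this point. Perl locates the closing delimiter of a match operator before it interpolates variables, so a pattern held in a variable may contain unescaped slashes even when the operator is written with slashes. The helper name find_ids and the sample data are made up for this illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every id captured from $html using $pattern.
sub find_ids {
    my ($pattern, $html) = @_;
    # The two slashes below are found when the source is parsed, before
    # $pattern is interpolated, so "/" inside $pattern needs no escaping.
    my @ids = $html =~ /$pattern/g;
    return @ids;
}

# Note the unescaped "/" in the closing tags.
my $pattern = "<tr><td>\nid: (\\d{5}[AX])\n</td></tr>";
my $html    = "<table>\n"
            . "<tr><td>\nid: 45434X\n</td></tr>\n"
            . "<tr><td>\nid: 95434A\n</td></tr>\n"
            . "</table>\n";

print join(", ", find_ids($pattern, $html)), "\n";   # 45434X, 95434A
```

In PHP the delimiters are part of the string handed to preg_match_all(), which is why the choice of delimiter matters there and not here.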
 
Eric J. Roode

Leif Wessman wrote:
Yes, maybe I could. But I have more faith in Perl programmers... And
regexps work almost the same in the two languages...

"Almost" probably isn't good enough. From a Perl point of view, there's
nothing wrong with your regular expression. I don't know PHP, so I don't
know if there's some difference in how the two languages do regular
expressions which is causing your problem. If I did, I'd probably hang out
in a PHP newsgroup. Assuming there are PHP newsgroups.

--
Eric
$_ = reverse sort $ /. r , qw p ekca lre uJ reh
ts p , map $ _. $ " , qw e p h tona e and print

 
Zachary Kent

"Almost" probably isn't good enough. From a Perl point of view, there's
nothing wrong with your regular expression. I don't know PHP, so I don't
know if there's some difference in how the two languages do regular
expressions which is causing your problem. If I did, I'd probably hang out
in a PHP newsgroup. Assuming there are PHP newsgroups.

The idea behind "preg..." in PHP is to use Perl-style regexes in PHP. However,
I don't know whether it uses Perl itself to power the preg functions or just
borrows Perl's syntax for consistency.
 
Ben Morrow

Zachary Kent said:
The idea behind "preg..." in PHP is to use Perl-style regexes in PHP. However,
I don't know whether it uses Perl itself to power the preg functions or just
borrows Perl's syntax for consistency.

It uses libpcre, Perl-Compatible Regular Expressions.

An absolute life-saver when working with PHP :).

Ben
 
Bill

Ben Morrow said:
#!/usr/bin/perl -l

use warnings;
use strict;

my $myregexp = <<RE;
<tr><td>
id: (\\d{5}[AX])
<\\/td><\\/tr>
RE

my $results = <<RES;
<table>
<tr><td>
id: 45434X
</td></tr>
<tr><td>
id: 95434A
</td></tr>
</table>
RES

print join ", ", $results =~ /$myregexp/g;

__END__

Worksforme.

Is the space between id: and \d accounted for here? Is that version
specific and does it matter in PHP?
 
Ben Morrow

Ben Morrow said:
my $myregexp = <<RE;
<tr><td>
id: (\\d{5}[AX])
<\\/td><\\/tr>
RE

Bill said:
Is the space between id: and \d accounted for here?

If you mean 'is it required to be there for the regex to match' then,
since I didn't use the /x switch, yes, it is.

Bill said:
Is that version specific and does it matter in PHP?

It is not specific to a particular version of Perl[1]. With regard to
PHP, I suggest you consult the docs for libpcre.

Ben

[1] Leaving aside Perl6 for the moment... :)
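
A short sketch of what the /x remark means in practice: without /x a space in the pattern is a literal space, while under /x unescaped whitespace in the pattern is ignored and has to be spelled out (here with \s). The helper capture_id and the three pattern names are made up for this illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the captured id, or undef when nothing matches.
sub capture_id {
    my ($regexp, $text) = @_;
    my ($id) = $text =~ $regexp;
    return $id;
}

my $line = "id: 45434X";

my $plain    = qr/id: (\d{5}[AX])/;      # the space is a literal space
my $extended = qr/id: (\d{5}[AX])/x;     # /x ignores it: same as id:(\d{5}[AX])
my $explicit = qr/id: \s (\d{5}[AX])/x;  # under /x, whitespace is written as \s

for my $case ([plain => $plain], [extended => $extended], [explicit => $explicit]) {
    my ($name, $regexp) = @$case;
    my $id = capture_id($regexp, $line);
    print "$name: ", (defined $id ? $id : "(no match)"), "\n";
}
```

Against this input the plain and explicit patterns capture 45434X, and the extended one fails because the pattern no longer accounts for the space after the colon.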
 
Bill

If you mean 'is it required to be there for the regex to match' then,
since I didn't use the /x switch, yes, it is.

When parsing HTML pages, one problem I often have is failing to allow
for and match whitespace properly. So much so that it usually pays to
preprocess the whitespace variations out before doing the regex.

I guess that was not the PHP problem though. Never mind :).
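
The two usual ways of coping with whitespace variations can be sketched as follows: either collapse the whitespace before matching, or leave the input alone and allow optional whitespace in the pattern. Both helpers and the messy sample input are made up for this illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collapse every run of whitespace to one space,
# then match against a pattern that allows at most one space per gap.
sub ids_after_normalizing {
    my ($html) = @_;
    (my $flat = $html) =~ s/\s+/ /g;
    my @ids = $flat =~ m!<tr>\s?<td>\s?id:\s?(\d{5}[AX])\s?</td>\s?</tr>!g;
    return @ids;
}

# Hypothetical helper: keep the input as it is and let the pattern
# accept any amount of whitespace between tokens.
sub ids_with_tolerant_pattern {
    my ($html) = @_;
    my @ids = $html =~ m!<tr>\s*<td>\s*id:\s*(\d{5}[AX])\s*</td>\s*</tr>!g;
    return @ids;
}

# Mixed line endings, tabs, indentation, and a missing space after "id:".
my $messy = "<table>\r\n  <tr>\t<td>id:45434X</td>\r\n</tr>\n"
          . "<tr><td>\n\n   id:   95434A\n</td></tr></table>";

print join(", ", ids_after_normalizing($messy)), "\n";
print join(", ", ids_with_tolerant_pattern($messy)), "\n";
```

Both approaches only cover whitespace; attributes, uppercase tags or comments inside the row would still defeat either pattern.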
 
Alan J. Flavell

When parsing HTML pages, one problem I often have is failing to allow
for and match whitespace properly.

I dare say that the problem you mention isn't nearly so great as the
fact that a regex is the wrong tool for parsing HTML.

It might be good enough for simplified HTML constructs that you've
carefully controlled yourself, but if you need to handle the full
range of (even) valid HTML that you'd get from other sources, you'd be
scuppered.

(And that's not starting on the truly vast amounts of "Sturgeon's Law
Evidence" that relies almost entirely on error fixup in browsers to
achieve the author's intentions. But I digress, probably.)
 
Bill

Alan J. Flavell said:
It might be good enough for simplified HTML constructs that you've
carefully controlled yourself, but if you need to handle the full
range of (even) valid HTML that you'd get from other sources, you'd be
scuppered.

(And that's not starting on the truly vast amounts of "Sturgeon's Law
Evidence" that relies almost entirely on error fixup in browsers to
achieve the author's intentions. But I digress, probably.)

Interesting...the last time I came across this was with parsing the
'clit' formatted pages of the ebook-to-html translator by that name
into a multipage, indexed format. I used HTML::TreeBuilder to parse
it, but to teach the script to more or less understand what is in the
book you still have to use regex on the actual tag and text
content--spaces, returns, escaped characters, tabs and all.
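
A rough sketch of that division of labour, using the table from the start of the thread: HTML::TreeBuilder (a CPAN module that has to be installed separately) finds the cells, and a regex is applied only to the text inside each cell. The helper name ids_from_cells is made up for this illustration, and this is not the script described above:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;   # CPAN module, not part of core Perl

# Hypothetical helper: collect the ids found in the text of every <td>.
sub ids_from_cells {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my @ids;
    for my $cell ($tree->look_down(_tag => 'td')) {
        # The parser has already dealt with the markup; the regex
        # only has to understand the text content of one cell.
        push @ids, $1 if $cell->as_text =~ /id:\s*(\d{5}[AX])/;
    }
    $tree->delete;       # release the parse tree
    return @ids;
}

my $results = <<'RES';
<table>
<tr><td>
id: 45434X
</td></tr>
<tr><td>
id: 95434A
</td></tr>
</table>
RES

print join(", ", ids_from_cells($results)), "\n";
```

Because the tags are handled by the parser, changes in attributes, tag case or line breaks between tags do not affect the regex.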
 
Bill

Bernard El-Hagin said:
(Bill) wrote:

[...]
Interesting...the last time I came across this was with parsing the
'clit' formatted pages
^^^^^^


Come on, admit it, you made that name up! ;-)

No, but I had to turn off the Internet filter on the PC to send the posting :>.

Google for 'clit HTML'
 
