Regular Expression Question

Börni · Jan 8, 2009

Hi

This is probably very easy, but I don't get it.

Example:
#!perl -w
use strict;

my $string = '<meta name="Keywords" content="" lang="fr">';

my ($keywords) = $string =~ /.*?meta name="Keywords".*?content="(.*?)">/;

print "[$keywords]\n";
exit 0;

In the example above I'd expect $keywords to be empty. Instead it is
[" lang="fr].

What is the correct expression to match everything
<meta name="Keywords" content="-->IN HERE<--" lang="fr">
even when it's empty?

Regards Bernard

Tim Greer · Jan 8, 2009

In your code above, the regex is doing exactly what you told it to do.
Using your current example, make the following change:

my ($keywords) = $string =~ /^.*?meta name="Keywords".*?content="([^"]*)"/;

The character class [^"]* matches zero or more characters that are not
a double quote, so the capture runs from the opening double quote of
content=" up to the next double quote and stops there. That text,
possibly empty, is what $keywords gets. You could probably just write
that as:

my ($keywords) = $string =~ /^.*?content="([^"]*)"/;

if that's what you want to stick with. Notice I've added the start of
the string with ^ in my examples. If it's not going to be the start of
the string in real code, just adjust accordingly.
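
To make the difference concrete, here is a minimal sketch that runs the regex from the question and the negated-character-class version side by side. The second sample tag and its keyword text are made up for illustration, as are the array names.

```perl
use strict;
use warnings;

# Two sample tags: the one from the question (empty content) and a
# hypothetical one with some keywords in it.
my @tags = (
    '<meta name="Keywords" content="" lang="fr">',
    '<meta name="Keywords" content="perl, regex" lang="fr">',
);

my @original;   # captures from the regex in the question
my @fixed;      # captures from the negated character class

for my $string (@tags) {
    # Lazy group followed by '">': has to grow until it finds '">'.
    my ($lazy)  = $string =~ /.*?meta name="Keywords".*?content="(.*?)">/;

    # Negated class: can never run past the closing double quote.
    my ($class) = $string =~ /^.*?meta name="Keywords".*?content="([^"]*)"/;

    push @original, $lazy;
    push @fixed,    $class;

    print "original: [$lazy]\n";
    print "fixed:    [$class]\n";
}
```

For the tag from the question, the first line printed is the [" lang="fr] result reported above, and the second is the expected [].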

Charlton Wilbur · Jan 8, 2009

B> Hi This is probably very easy, but I don't get it.

That's because you're using regular expressions to parse HTML.

You will save yourself considerable pain if you use a parser, such as
HTML::Parser, to parse HTML.
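
For comparison, a rough sketch of the parser approach using the CPAN module HTML::Parser (version 3 handler API). It assumes the module is installed; the handler name start_tag is made up for this example, and only the first matching meta tag is of interest here.

```perl
use strict;
use warnings;
use HTML::Parser;

my $string = '<meta name="Keywords" content="" lang="fr">';

my $keywords;

# Called for every start tag. $attr is a hash reference holding the tag's
# attributes; the parser lowercases attribute names by default.
sub start_tag {
    my ($tagname, $attr) = @_;
    return unless $tagname eq 'meta';
    return unless defined $attr->{name} && lc $attr->{name} eq 'keywords';
    $keywords = defined $attr->{content} ? $attr->{content} : '';
}

my $parser = HTML::Parser->new(
    api_version => 3,
    start_h     => [ \&start_tag, 'tagname, attr' ],
);
$parser->parse($string);
$parser->eof;

print defined $keywords ? "[$keywords]\n" : "no keywords tag found\n";
```

Because the parser hands over the attribute value directly, it does not matter which attributes follow content or in what order they appear.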

Charlton

Börni · Jan 9, 2009

Thank you very much for your help everybody! (Of course my problem was the
">" character)

Tim Greer · Jan 9, 2009

Börni said:
Thank you very much for your help everybody! (Of course my problem was
the ">" character)

(top posting fixed)

Actually, the problem wasn't the ">" character. The problem was that
the match went all the way to the last character, which happened to be
the > character. The actual problem was that (.*?) was grabbing
everything after the opening double quote of content=" all the way to
the closing ">, and that captured text happened to be " lang="fr.

Tim McDaniel · Jan 9, 2009

No, he's right: the problem was that '>' was in the regexp. .*? is
non-greedy matching. If the terminal '>' had not been in the regexp,
it would have stopped at the second ".
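
This is easy to check with a small sketch that keeps the lazy group and changes only whether the pattern ends in '>'. The patterns are shortened to the content=" part for clarity, and the variable names are made up for the example.

```perl
use strict;
use warnings;

my $string = '<meta name="Keywords" content="" lang="fr">';

# With the trailing '>', the lazy group must keep growing until the text
# right after it is '">', which only happens at the end of the tag.
my ($with_gt) = $string =~ /content="(.*?)">/;

# Without it, the lazy group stops at the first '"' that lets the match
# succeed, which is the closing quote of content="".
my ($without_gt) = $string =~ /content="(.*?)"/;

print "with '>':    [$with_gt]\n";
print "without '>': [$without_gt]\n";
```

Non-greedy means "as few characters as possible while still letting the whole pattern match", so whatever follows the group decides where it ends.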

Tim Greer · Jan 9, 2009

I suppose it's just a matter of wording it. I read it as the OP meaning
it was the character, rather than the formatting of the regex and the
location of it. I just think the preferable way would be to match with
([^"]*), but I suppose it's up to the individual.
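
One practical difference between the two styles, sketched below: if the stray '>' is left in the pattern, the lazy group silently captures too much, while the negated class simply fails to match, because it is never allowed to cross a double quote. The variable names are made up for the example.

```perl
use strict;
use warnings;

my $string = '<meta name="Keywords" content="" lang="fr">';

# Same pattern tail ('">') in both cases; only the capture group differs.
my ($lazy)  = $string =~ /content="(.*?)">/;
my ($class) = $string =~ /content="([^"]*)">/;

# A failed match in list context assigns nothing, leaving the variable undef.
print "lazy:  ", (defined $lazy  ? "[$lazy]"  : "no match"), "\n";
print "class: ", (defined $class ? "[$class]" : "no match"), "\n";
```

A failed match is usually easier to notice and debug than a capture that quietly contains the wrong text.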
