Regex for <option> ... </option>

John

Hi

I have an <option> ... </option> list whose contents I need to extract.

The following should work:

if ($option_list =~ m|(option[\x00-\xff]+option)|) {$w=$1}; print $w;

But it doesn't. Any ideas?

Regards
John
 
Steve Roscio

Hi John -

It works for me, Perl 5.10.0, if you want to include the beginning
"option>" and the closing "</option" partial parts. But I don't think
that's exactly what you want. You might find this kind of regex more
suited:

<TAG\b[^>]*>(.*?)</TAG>

The above won't handle nested tags. If you expect that what you're
matching will split across lines (as it probably will), include the 's'
modifier.

$string =~ m|<option\b[^>]*>(.*?)</option>|si
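As a rough sketch of how the pattern above might be applied to collect every option body in one pass: the sample markup and the variable names `$html` and `@contents` are invented for illustration, since the original post does not show the real data.

```perl
use strict;
use warnings;

# Hypothetical sample input; the real $option_list is not shown in the thread.
my $html = <<'HTML';
<select name="color">
  <option value="r">Red</option>
  <OPTION value="g" selected>Green
  </OPTION>
  <option>Blue</option>
</select>
HTML

# /g collects every match, /s lets . cross newlines, /i ignores tag case.
my @contents = $html =~ m|<option\b[^>]*>(.*?)</option>|gsi;

# Trim leading and trailing whitespace from each captured body.
s/^\s+|\s+$//g for @contents;

print "$_\n" for @contents;
```

In list context with /g, the match returns all captures at once, so no explicit loop over $1 is needed. As noted above, this still does not cope with nested tags.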

And as always, there's more than one way to do it.

- Steve
 
Jim Gibson

John said:
Hi

I have an <option> ... </option> list whose contents I need to extract.

The following should work:

if ($option_list =~ m|(option[\x00-\xff]+option)|) {$w=$1}; print $w;

But it doesn't. Any ideas?

It works for me. Perhaps you have some characters in your string that
do not fall in the range \x00-\xff. Try '.+?' instead.
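A small sketch of the difference, using made-up input strings (`$wide` and `$two` are hypothetical; the poster's actual data is unknown). It also shows a second way the original pattern can surprise: the greedy `+` runs to the last "option" in the string, not the nearest one. Note that `.` only matches newlines when the 's' modifier is given.

```perl
use strict;
use warnings;

# Hypothetical input containing a character above \xff (U+263A).
my $wide = "<option>smile \x{263A}</option>";

# [\x00-\xff] cannot match U+263A, so the original pattern fails here.
my $class_ok = $wide =~ m|(option[\x00-\xff]+option)| ? 1 : 0;

# With /s, . matches any character, including newlines and wide characters.
my $dot_ok = $wide =~ m|(option.+?option)|s ? 1 : 0;

# Greedy vs. non-greedy on two adjacent elements.
my $two = "<option>a</option><option>b</option>";
my ($greedy) = $two =~ m|(option[\x00-\xff]+option)|;
my ($lazy)   = $two =~ m|(option.+?option)|s;

print "class matched: $class_ok, dot matched: $dot_ok\n";
print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```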

If you are serious about parsing HTML, you shouldn't be using regular
expressions. There are simply too many variations and special cases
possible. You should be using a real parser, such as HTML::Parser,
instead.
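For comparison, a minimal sketch of the parser route, assuming the CPAN module HTML::Parser is installed (it is not part of core Perl). The sample markup and the names `$html`, `@contents`, and `$in_option` are invented for illustration.

```perl
use strict;
use warnings;
use HTML::Parser ();   # CPAN module; assumed to be installed

# Hypothetical sample input.
my $html = '<select><option value="r">Red</option><option>Blue</option></select>';

my @contents;       # one entry per <option> element
my $in_option = 0;  # true while inside an <option> element

my $parser = HTML::Parser->new(
    api_version => 3,
    # Start tag: begin a new entry when an <option> opens.
    start_h => [ sub {
        my ($tag) = @_;
        if ($tag eq 'option') { $in_option = 1; push @contents, ''; }
    }, 'tagname' ],
    # Text: append decoded text to the current entry.
    text_h => [ sub {
        my ($text) = @_;
        $contents[-1] .= $text if $in_option;
    }, 'dtext' ],
    # End tag: stop collecting when the </option> closes.
    end_h => [ sub {
        my ($tag) = @_;
        $in_option = 0 if $tag eq 'option';
    }, 'tagname' ],
);

$parser->parse($html);
$parser->eof;

print "$_\n" for @contents;
```

The parser lowercases tag names and decodes entities in the text, two of the special cases a hand-written regex would have to handle itself.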
 
sln

John said:
Hi

I have an <option> ... </option> list whose contents I need to extract.

The following should work:

if ($option_list =~ m|(option[\x00-\xff]+option)|) {$w=$1}; print $w;

But it doesn't. Any ideas?

It works for me. Perhaps you have some characters in your string that
do not fall in the range \x00-\xff. Try '.+?' instead.

If you are serious about parsing HTML, you shouldn't be using regular
expressions. There are simply too many variations and special cases
possible. You should be using a real parser, such as HTML::Parser,
instead.

I am a little more than worried that you keep saying all the time, in
fact, continuously, "can't" and "regex" when it comes to parsing markup.

Markup and 'C' are somehow magically compatible when it comes to parsing,
but Regular Expressions aren't somehow ???

XHTML/XML/SGML and HTML standards are now and have been for quite some time
defined with REGULAR EXPRESSIONS exclusively. Maybe you should consult a
doctor about your dementia !

http://www.w3.org/TR/xml11/#NT-AttValue
 
Tad J McClellan

XHTML/XML/SGML and HTML standards are now and have been for quite some time
defined with REGULAR EXPRESSIONS exclusively.


No they haven't.

They are defined with a context free grammar, not a regular grammar.
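One way to picture the distinction, as a sketch only: a regex is well suited to recognizing individual tags (the token level), but checking that arbitrarily nested tags pair up correctly needs some memory of what is still open, here a stack. The function name `is_balanced` is hypothetical, and the tag pattern is deliberately simplified (no comments, CDATA, or self-closing tags).

```perl
use strict;
use warnings;

# Returns 1 if every <name> is closed by a matching </name> in proper
# nesting order, 0 otherwise. Simplified: treats every tag as a container.
sub is_balanced {
    my ($markup) = @_;
    my @stack;   # names of the elements that are currently open

    # The regex only finds tags; the stack does the nesting check.
    while ($markup =~ m{<(/?)([A-Za-z][\w-]*)[^>]*>}g) {
        my ($closing, $name) = ($1, lc $2);
        if ($closing) {
            # A close tag must match the most recently opened element.
            return 0 unless @stack && pop(@stack) eq $name;
        }
        else {
            push @stack, $name;
        }
    }
    return @stack ? 0 : 1;   # anything left open means unbalanced
}

print is_balanced('<a><b>x</b></a>') ? "balanced\n" : "unbalanced\n";
print is_balanced('<a><b>x</a></b>') ? "balanced\n" : "unbalanced\n";
```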
 
sln

No they haven't.

They are defined with a context free grammar, not a regular grammar.

Right, the syntax of data, including all markup extracting data, is defined with Regular Expressions,
not what you do with the data in any assumption loaded with stupidity !!! Easily fixed with a
below-average IQ.

sln
 
Tim McDaniel

And they aren't standards anyway. Those are specifications.

What criteria distinguish a standard from a specification?
(That's not an accusation or implying that you're wrong;
I'm interested in definitions.)
 
Peter J. Holzer

What criteria distinguish a standard from a specification?
(That's not an accusation or implying that you're wrong;
I'm interested in definitions.)

AFAICS there is no semantic difference. Some organisations (mostly the
national and international standardization organizations, like ANSI,
DIN, ÖNORM, ISO, but also some industry consortiums like IEEE, ECMA, ITU)
deem themselves to be important enough to call their specifications
"standards", and others are more modest (the W3C for example, only
issues "recommendations").

So SGML is an "ISO standard", but XML is a "W3C recommendation".

hp
 
Eric Pozharski

What criteria distinguish a standard from a specification?
(That's not an accusation or implying that you're wrong;
I'm interested in definitions.)

I'm in no way wrong; I've checked -- all of the XHTML, XML, SGML, and HTML
"papers" call themselves "Specification". That could be a template issue,
though.

Anyway, that's what I have on hand. I asked the WordNet dictionary:

standard
n 1: a basis for comparison; a reference point against which
other things can be evaluated; "they set the measure for
all subsequent work" [syn: {criterion}, {measure},
{touchstone}]

specification
n 1: a detailed description of design criteria for a piece of
work [syn: {spec}]

After reading these (there are some more, lost in the copy-paste), my
only understanding of the difference is: a "standard" is a "paper" that
describes what the status quo is, while a "specification" is a "paper"
about something that does not exist yet. I obviously lack historical
perspective on some of these issues.

p.s. In a misguided attempt to mess everything up again: criminal law is
a standard, while the Ten Amendments are a spec.
 
Eric Pozharski

And they aren't standards anyway. Those are specifications.

I'm wrong, again (lack of curiosity issue). Just checked -- SGML is the
only standard; all the others (among those mentioned) are specifications.
 
