Regex for <option> ... </option>

Discussion in 'Perl Misc' started by John, Jan 23, 2009.

  1. John

    John Guest

    Hi

    I have a <option ... </option> list where I need to extract the contents.

    The following should work.

    if ($option_list =~ m|(option[\x00-\xff]+option)|) {$w=$1}; print $w;

    But doesn't. Any ideas?

    Regards
    John
     
    John, Jan 23, 2009
    #1

  2. John

    Steve Roscio Guest

    Hi John -

    It works for me, Perl 5.10.0, if you want to include the beginning
    "option>" and the closing "</option" partial parts. But I don't think
    that's exactly what you want. You might find this kind of regex more
    suited:

    <TAG\b[^>]*>(.*?)</TAG>

    The above won't handle nested tags. If you expect that what you're
    matching will split across lines (as it probably will), include the 's'
    modifier.

    $string =~ m|<option\b[^>]*>(.*?)</option>|si
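As a minimal sketch of how that pattern might be used to pull out every option's contents, the following adds the /g modifier and evaluates the match in list context. The sample markup and the variable names are assumptions made for illustration; they are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical sample input; the attribute values and labels are made up.
my $option_list = <<'HTML';
<select name="colour">
  <option value="r">Red</option>
  <OPTION value="g" selected>Green
  </OPTION>
  <option value="b">Blue</option>
</select>
HTML

# /g in list context returns every capture, /s lets . match newlines,
# /i makes the tag name case-insensitive.
my @contents = $option_list =~ m|<option\b[^>]*>(.*?)</option>|gsi;

# Trim surrounding whitespace from each captured label.
s/^\s+|\s+$//g for @contents;

print "$_\n" for @contents;
```

The non-greedy (.*?) stops at the first closing tag, which is why this approach does not cope with nested tags of the same name.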

    And as always, there's more than one way to do it.

    - Steve
     
    Steve Roscio, Jan 23, 2009
    #2

  3. John

    Jim Gibson Guest

    In article <gld0hn$8h8$>, John <>
    wrote:

    > Hi
    >
    > I have a <option ... </option> list where I need to extract the contents.
    >
    > The following should work.
    >
    > if ($option_list =~ m|(option[\x00-\xff]+option)|) {$w=$1}; print $w;
    >
    > But doesn't. Any ideas?


    It works for me. Perhaps you have some characters in your string that
    do not fall in the range \x00-\xff. Try '.+?' instead.
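A small sketch of that failure mode, assuming a made-up input string with a single character above \xff between the two markers. The /s modifier on the second pattern is an addition here so that '.' also matches newlines.

```perl
use strict;
use warnings;

# Hypothetical input: one character above \xff (U+263A) between the markers.
my $wide = "option\x{263A}option";

# The class [\x00-\xff] cannot match U+263A, so this match fails.
my $class_ok = $wide =~ m|(option[\x00-\xff]+option)| ? 1 : 0;

# With /s, . matches any character, including newlines and wide characters.
my $dot_ok = $wide =~ m|(option.+?option)|s ? 1 : 0;

print "character class: $class_ok, dot: $dot_ok\n";
```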

    If you are serious about parsing HTML, you shouldn't be using regular
    expressions. There are simply too many variations and special cases
    possible. You should be using a real parser, such as HTML::Parser,
    instead.
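A rough sketch of what the parser route could look like with HTML::Parser from CPAN (it is not a core module, so this assumes it is installed). The sample markup, the handler bodies and the variable names are illustrative assumptions, not code from this thread.

```perl
use strict;
use warnings;
use HTML::Parser;   # CPAN module, not part of core Perl

# Hypothetical sample input.
my $html = '<select><option value="r">Red</option>'
         . '<option value="g">Green</option></select>';

my @contents;        # text collected from each <option>
my $in_option = 0;   # true while the parser is inside an <option> element

my $parser = HTML::Parser->new(
    api_version => 3,
    # Start tag: open a new slot when an <option> begins.
    start_h => [ sub {
        my ($tag) = @_;
        if ($tag eq 'option') { $in_option = 1; push @contents, ''; }
    }, 'tagname' ],
    # Text: append decoded text to the current slot.
    text_h => [ sub {
        my ($text) = @_;
        $contents[-1] .= $text if $in_option;
    }, 'dtext' ],
    # End tag: stop collecting when </option> is seen.
    end_h => [ sub {
        my ($tag) = @_;
        $in_option = 0 if $tag eq 'option';
    }, 'tagname' ],
);

$parser->parse($html);
$parser->eof;

print "$_\n" for @contents;
```

Because the parser reports tag names in lower case by default and decodes entities in the text handler, the handlers do not need to deal with those variations themselves.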

    --
    Jim Gibson
     
    Jim Gibson, Jan 23, 2009
    #3
  4. John

    Guest

    On Fri, 23 Jan 2009 13:30:48 -0800, Jim Gibson <> wrote:

    >In article <gld0hn$8h8$>, John <>
    >wrote:
    >
    >> Hi
    >>
    >> I have a <option ... </option> list where I need to extract the contents.
    >>
    >> The following should work.
    >>
    >> if ($option_list =~ m|(option[\x00-\xff]+option)|) {$w=$1}; print $w;
    >>
    >> But doesn't. Any ideas?

    >
    >It works for me. Perhaps you have some characters in your string that
    >do not fall in the range \x00-\xff. Try '.+?' instead.
    >
    >If you are serious about parsing HTML, you shouldn't be using regular
    >expressions. There are simply too many variations and special cases
    >possible. You should be using a real parser, such as HTML::Parser,
    >instead.


    I am a little more than worried that you keep saying all the time, in
    fact, continuously, "can't" and "regex" when it comes to parsing markup.

    Markup and 'C' are somehow magically compatible when it comes to parsing,
    but Regular Expressions aren't somehow ???

    XHTML/XML/SGML and HTML standards are now and have been for quite some time
    defined with REGULAR EXPRESSIONS exclusively. Maybe you should consult a
    doctor about your dementia !

    http://www.w3.org/TR/xml11/#NT-AttValue
     
    , Jan 28, 2009
    #4
  5. <> wrote:

    > XHTML/XML/SGML and HTML standards are now and have been for quite some time
    > defined with REGULAR EXPRESSIONS exclusively.



    No they haven't.

    They are defined with a context free grammar, not a regular grammar.


    --
    Tad McClellan
    email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"
     
    Tad J McClellan, Jan 29, 2009
    #5
  6. John

    Guest

    On Wed, 28 Jan 2009 18:00:09 -0600, Tad J McClellan <> wrote:

    > <> wrote:
    >
    >> XHTML/XML/SGML and HTML standards are now and have been for quite some time
    >> defined with REGULAR EXPRESSIONS exclusively.

    >
    >
    >No they haven't.
    >
    >They are defined with a context free grammar, not a regular grammar.


    Right, the syntax of data, including all markup extracting data, is defined with Regular Expressions,
    not what you do with the data in any assumption loaded with stupidity !!! Easily fixed with a
    below-average IQ.

    sln
     
    , Jan 29, 2009
    #6
  7. On 2009-01-29, Tad J McClellan <> wrote:
    > <> wrote:
    >
    >> XHTML/XML/SGML and HTML standards are now and have been for quite some time
    >> defined with REGULAR EXPRESSIONS exclusively.

    >
    >
    > No they haven't.
    >
    > They are defined with a context free grammar, not a regular grammar.


    And they aren't standards anyway. Those are specifications.


    --
    Torvalds' goal for Linux is very simple: World Domination
    Stallman's goal for GNU is even simpler: Freedom
     
    Eric Pozharski, Jan 29, 2009
    #7
  8. John

    Tim McDaniel Guest

    In article <>,
    Eric Pozharski <> wrote:
    >On 2009-01-29, Tad J McClellan <> wrote:
    >> <> wrote:
    >>
    >>> XHTML/XML/SGML and HTML standards are now and have been for quite
    >>> some time defined with REGULAR EXPRESSIONS exclusively.

    >>
    >> No they haven't.
    >>
    >> They are defined with a context free grammar, not a regular
    >> grammar.

    >
    >And they aren't standards anyway. Those are specifications.


    What criteria distinguish a standard from a specification?
    (That's not an accusation or implying that you're wrong;
    I'm interested in definitions.)

    --
    Tim McDaniel,
     
    Tim McDaniel, Jan 29, 2009
    #8
  9. On 2009-01-29 16:27, Tim McDaniel <> wrote:
    > In article <>,
    > Eric Pozharski <> wrote:
    >>On 2009-01-29, Tad J McClellan <> wrote:
    >>> <> wrote:
    >>>> XHTML/XML/SGML and HTML standards are now and have been for quite
    >>>> some time defined with REGULAR EXPRESSIONS exclusively.
    >>>
    >>> No they haven't.
    >>>
    >>> They are defined with a context free grammar, not a regular
    >>> grammar.

    >>
    >>And they aren't standards anyway. Those are specifications.

    >
    > What criteria distinguish a standard from a specification?
    > (That's not an accusation or implying that you're wrong;
    > I'm interested in definitions.)


    AFAICS there is no semantic difference. Some organisations (mostly the
    national and international standardization organizations, like ANSI,
    DIN, ÖNORM, ISO, but also some industry consortiums like IEEE, ECMA, ITU)
    deem themselves to be important enough to call their specifications
    "standards", and others are more modest (the W3C for example, only
    issues "recommendations").

    So SGML is an "ISO standard", but XML is a "W3C recommendation".

    hp
     
    Peter J. Holzer, Jan 29, 2009
    #9
  10. Regex for standard and specification (was: Regex for <option> ...</option>)

    On 2009-01-29, Tim McDaniel <> wrote:
    > In article <>,
    > Eric Pozharski <> wrote:
    >>On 2009-01-29, Tad J McClellan <> wrote:
    >>> <> wrote:
    >>>
    >>>> XHTML/XML/SGML and HTML standards are now and have been for quite
    >>>> some time defined with REGULAR EXPRESSIONS exclusively.
    >>>
    >>> No they haven't.
    >>>
    >>> They are defined with a context free grammar, not a regular
    >>> grammar.

    >>
    >>And they aren't standards anyway. Those are specifications.

    >
    > What criteria distinguish a standard from a specification?
    > (That's not an accusation or implying that you're wrong;
    > I'm interested in definitions.)


    I'm in no way wrong, I've checked -- all of them (the XHTML, XML, SGML,
    and HTML "papers") call themselves "Specification". That can be a
    template issue though.

    Anyway, that's what I have on hand. I've asked the WordNet dictionary:

    standard
    n 1: a basis for comparison; a reference point against which
    other things can be evaluated; "they set the measure for
    all subsequent work" [syn: {criterion}, {measure},
    {touchstone}]

    specification
    n 1: a detailed description of design criteria for a piece of
    work [syn: {spec}]

    After reading about these (there are some more, missing in copy-paste) my
    sole understanding of the difference is: a "standard" is a "paper" that
    pictures what the status quo is, while a "specification" is a "paper"
    about a subject that doesn't exist yet. I obviously lack historic vision
    of some issues.

    p.s. In a misguided attempt to mess everything up again: criminal law is
    a standard, while the Ten Amendments are a spec.


    --
    Torvalds' goal for Linux is very simple: World Domination
    Stallman's goal for GNU is even simpler: Freedom
     
    Eric Pozharski, Jan 29, 2009
    #10
  11. On 2009-01-29, Eric Pozharski <> wrote:
    > On 2009-01-29, Tad J McClellan <> wrote:
    >> <> wrote:
    >>
    >>> XHTML/XML/SGML and HTML standards are now and have been for quite some time
    >>> defined with REGULAR EXPRESSIONS exclusively.

    >>
    >>
    >> No they haven't.
    >>
    >> They are defined with a context free grammar, not a regular grammar.

    >
    > And they aren't standards anyway. Those are specifications.


    I'm wrong, again (lack of curiosity issue). Just checked -- SGML is the
    only standard; all the others (among those mentioned) are specifications.


    --
    Torvalds' goal for Linux is very simple: World Domination
    Stallman's goal for GNU is even simpler: Freedom
     
    Eric Pozharski, Jan 29, 2009
    #11