RE: xpath question

Discussion in 'Python' started by bruce, Jul 2, 2006.

  1. bruce

    bruce Guest

    hi

    is there anyone with XPath expertise here? i'm trying to figure out if
    there's a way to use regex expressions with an xpath query? i've seen
    references to the ability to use regex and xpath/xml, but i'm not sure how
    to do it...

    i have a situation where i have something like:
    /html/table/..../[@class='foo']

    is it possible to do something like [@class~=/fo/] so i'd match the class
    attribute with fo....

    i'm trying to parse HTML/Web docs...

    thanks

    -bruce
     
    bruce, Jul 2, 2006
    #1

  2. bruce

    Simon Forman Guest

    bruce wrote:
    > hi
    >
    > is there anyone with XPath expertise here? i'm trying to figure out if
    > there's a way to use regex expressions with an xpath query? i've seen
    > references to the ability to use regex and xpath/xml, but i'm not sure how
    > to do it...
    >
    > i have a situation where i have something like:
    > /html/table/..../[@class='foo']
    >
    > is it possible to do something like [@class~=/fo/] so i'd match the class
    > attribute with fo....
    >
    > i'm trying to parse HTML/Web docs...
    >
    > thanks
    >
    > -bruce


    I'll take this one...

    Dude, this is a *python* mailing list, not an xml/xpath/regex one. In
    addition, the regex syntax you're using above (~=/fo/) looks like
    *perl* code-- but I wouldn't know 'cause I don't use perl myself.

    Now it's entirely possible that there are *many* people here that are
    xml/xpath/regex Kung Fu Masters, *and* it's entirely possible that one
    or more of them are about to answer your question informatively and in
    exhaustive detail. It's also entirely possible that this is the most
    friendly and informative reply that you're going to get, here.


    Try a more appropriate newsgroup, and good luck.
     
    Simon Forman, Jul 2, 2006
    #2

  3. bruce

    Simon Forman Guest

    bruce wrote:
    > simon..
    >
    > you may not.. but lots of people use python and xpath for html/xml
    > functionality.. check google "python xpath"...
    >
    > later..
    >

    ....
    > > i have a situation where i have something like:
    > > /html/table/..../[@class='foo']
    > >
    > > is it possible to do something like [@class~=/fo/] so i'd match the class
    > > attribute with fo....
    > >



    So I did some checking, starting with the google search you suggested,
    and I found out that lxml, 4Suite, and Amara (which is apparently based
    on 4Suite somehow) all seem to be capable of doing what you're talking
    about. I don't know how to do it with lxml, but I bet the people on
    the lxml mailing list would be happy to explain it to you. As for
    Amara and 4Suite I think it might be as simple as saying "Match(your
    regex here in python re module form)" in your Xpath statement..


    In the meantime, you could just use XPath to extract a superset of the
    elements you're interested in and then filter them in Python with the re
    module.
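
    A minimal sketch of that superset-then-filter approach, assuming well-formed
    markup and using the standard library's ElementTree (which supports only a
    subset of XPath) rather than a full XPath engine. The sample document and
    the helper name select_by_class_regex are made up for illustration:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample markup; real-world HTML usually needs a forgiving
# parser before it can be treated as XML.
DOC = """<html><body><table>
<tr><td class="foo">a</td><td class="bar">b</td></tr>
<tr><td class="fob">c</td><td>d</td></tr>
</table></body></html>"""


def select_by_class_regex(root, path, pattern):
    """Return elements found by `path` whose class attribute matches `pattern`."""
    rx = re.compile(pattern)
    # Step 1: the path expression selects a superset of candidates.
    # Step 2: the regex narrows that set down in plain Python.
    return [el for el in root.findall(path) if rx.search(el.get("class", ""))]


root = ET.fromstring(DOC)
# ".//td[@class]" picks every td that has a class attribute at all.
matches = select_by_class_regex(root, ".//td[@class]", r"^fo")
print([el.text for el in matches])
```

    The regex never enters the path expression, so this works with any
    library that can return a list of elements.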


    I avoid xml if I can help it... My new favorite HTML editor, however,
    is python and ElementTree...
     
    Simon Forman, Jul 3, 2006
    #3
  4. bruce

    Guest

    bruce wrote:
    > is there anyone with XPath expertise here? i'm trying to figure out if
    > there's a way to use regex expressions with an xpath query? i've seen
    > references to the ability to use regex and xpath/xml, but i'm not sure how
    > to do it...
    >
    > i have a situation where i have something like:
    > /html/table/..../[@class='foo']
    >
    > is it possible to do something like [@class~=/fo/] so i'd match the class
    > attribute with fo....
    >
    > i'm trying to parse HTML/Web docs...


    4Suite [1] supports regex in XPath using the EXSLT community standard's
    regex module [2]. It would be something like:

    [re:match(@class, 'fo.*')]

    With the re prefix set as required by the EXSLT module.

    [1] http://4Suite.org
    [2] http://www.exslt.org/regexp/

    --
    Uche Ogbuji Fourthought, Inc.
    http://uche.ogbuji.net http://fourthought.com
    http://copia.ogbuji.net http://4Suite.org
    Articles: http://uche.ogbuji.net/tech/publications/
     
    , Jul 3, 2006
    #4
  5. bruce

    Simon Forman Guest

    wrote:
    > bruce wrote:
    > > is there anyone with XPath expertise here? i'm trying to figure out if
    > > there's a way to use regex expressions with an xpath query? i've seen
    > > references to the ability to use regex and xpath/xml, but i'm not sure how
    > > to do it...
    > >
    > > i have a situation where i have something like:
    > > /html/table/..../[@class='foo']
    > >
    > > is it possible to do something like [@class~=/fo/] so i'd match the class
    > > attribute with fo....
    > >
    > > i'm trying to parse HTML/Web docs...

    >
    > 4Suite [1] supports regex in XPath using the EXSLT community standard's
    > regex module [2]. It would be something like:
    >
    > [re:match(@class, 'fo.*')]
    >
    > With the re prefix set as required by the EXSLT module.
    >
    > [1] http://4Suite.org
    > [2] http://www.exslt.org/regexp/
    >
    > --
    > Uche Ogbuji Fourthought, Inc.
    > http://uche.ogbuji.net http://fourthought.com
    > http://copia.ogbuji.net http://4Suite.org
    > Articles: http://uche.ogbuji.net/tech/publications/


    Well shut my mouth! There *is* an xml/xpath python Guru here.

    *sigh* Sorry Bruce, (and everybody else on this newsgroup) I apologize
    for mouthing off and not contributing to a greater signal-to-noise
    ratio.

    I guess that should teach me not to post so quickly when I'm in a bad
    mood. I'll do better in the future.


    Peace,
    ~Simon
     
    Simon Forman, Jul 3, 2006
    #5
  6. wrote:
    > bruce wrote:
    >> is there anyone with XPath expertise here? i'm trying to figure out if
    >> there's a way to use regex expressions with an xpath query? i've seen
    >> references to the ability to use regex and xpath/xml, but i'm not sure how
    >> to do it...
    >>
    >> i have a situation where i have something like:
    >> /html/table/..../[@class='foo']
    >>
    >> is it possible to do something like [@class~=/fo/] so i'd match the class
    >> attribute with fo....
    >>
    >> i'm trying to parse HTML/Web docs...

    >
    > 4Suite [1] supports regex in XPath using the EXSLT community standard's
    > regex module [2]. It would be something like:
    >
    > [re:match(@class, 'fo.*')]
    >
    > With the re prefix set as required by the EXSLT module.


    Same for lxml, although it's currently only enabled in XSLT:
    http://codespeak.net/lxml/api.html#xslt

    Guess I should change that for 1.1...
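
    For readers on later lxml releases, the EXSLT regular-expression functions
    can also be reached from a plain xpath() call. A rough sketch, assuming
    lxml is installed (it is a third-party package, hence the import guard);
    the sample markup and the function name cells_with_fo_class are
    hypothetical. It uses re:test, which returns a boolean, instead of
    re:match:

```python
try:
    from lxml import etree  # third-party; not part of the standard library
except ImportError:
    etree = None

# Namespace URI of the EXSLT regular-expressions module.
EXSLT_RE_NS = "http://exslt.org/regular-expressions"

# Hypothetical sample markup for illustration only.
DOC = (
    "<html><body><table><tr>"
    "<td class='foo'>a</td><td class='bar'>b</td><td class='fob'>c</td>"
    "</tr></table></body></html>"
)


def cells_with_fo_class(doc):
    """Return td elements whose class attribute starts with 'fo'."""
    tree = etree.fromstring(doc)
    # The "re" prefix must be bound to the EXSLT namespace for this call.
    return tree.xpath(
        "//td[re:test(@class, '^fo')]",
        namespaces={"re": EXSLT_RE_NS},
    )


if etree is not None:
    print([td.text for td in cells_with_fo_class(DOC)])
```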

    Stefan
     
    Stefan Behnel, Jul 4, 2006
    #6