How to apply text changes to HTML, keeping it intact if inside "a" tags

Discussion in 'Python' started by vbfoobar@gmail.com, Sep 27, 2006.

  1. Guest

    Hello,

    I have HTML input to which I apply some changes.

    Feature 1:
    =======
    I want to transform all the text, but if the text is inside
    an "a href" tag, I want to leave the text as it is.

    The HTML is not necessarily well-formed, so
    I would like to do that using BeautifulSoup (or
    maybe another tolerant parser).

    As a test case, suppose I want to uppercase all the text
    except the text that is within "a href" tags:

    ExampleString = """
    <footag>Lorem Ipsum</footag> is simply
    dummy text of <a href="junk.html">the printing</a> and
    <a href="junk2.html">typesetting <b>industry</b>.</a>
    Thanks."""

    When applying the text transform, I want to obtain:

    <footag>LOREM IPSUM</footag> IS SIMPLY
    DUMMY TEXT OF <a href="junk.html">the printing</a> AND
    <a href="junk2.html">typesetting <b>industry</b>.</a>
    THANKS.
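
A minimal sketch of Feature 1, assuming the standard library's `html.parser.HTMLParser` is an acceptable stand-in for the "tolerant parser" (it is not BeautifulSoup, and nobody in the thread proposed it). The names `TransformOutsideLinks` and `transform_outside_links` are made up for illustration. The parser re-emits every tag as it was written and only transforms text while no `<a href=...>` element is open.

```python
from html.parser import HTMLParser


class TransformOutsideLinks(HTMLParser):
    """Re-emit the markup, transforming text that is not inside <a href=...>."""

    def __init__(self, transform=str.upper):
        # convert_charrefs=False hands entities to the handlers below unchanged.
        super().__init__(convert_charrefs=False)
        self.transform = transform
        self.out = []
        self.open_anchors = []  # one bool per open <a>: True if it has an href

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.open_anchors.append(any(name == "href" for name, _ in attrs))
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through verbatim.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self.open_anchors:
            self.open_anchors.pop()
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        inside_link = any(self.open_anchors)
        self.out.append(data if inside_link else self.transform(data))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def transform_outside_links(html, transform=str.upper):
    """Hypothetical helper: run the parser and join the re-emitted pieces."""
    parser = TransformOutsideLinks(transform)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


example = """
<footag>Lorem Ipsum</footag> is simply
dummy text of <a href="junk.html">the printing</a> and
<a href="junk2.html">typesetting <b>industry</b>.</a>
Thanks."""

print(transform_outside_links(example))
```

This sketch does not repair badly nested markup; it only copies what it sees. It would also transform the contents of `<script>` and `<style>` elements, so those would need a similar guard in real input.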


    Feature 2:
    ========
    Another thing I may want to do: If the text I would normally
    transform is inside an "a href" tag, then do not transform it,
    but insert the result of text transformation just after the "</a>".

    Using the same example as input, applying
    Feature 2 would give something like this:

    <footag>LOREM IPSUM</footag> IS SIMPLY
    DUMMY TEXT OF <a href="junk.html">the printing</a><feat2>THE
    PRINTING</feat2> AND
    <a href="junk2.html">typesetting
    <b>industry</b>.</a><feat2>TYPESETTING <b>INDUSTRY</b>.</feat2>
    THANKS.
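
A minimal sketch of Feature 2 under the same assumption, namely that the standard library's `html.parser.HTMLParser` is tolerant enough for the input. The class name `EchoLinksTransformed`, the helper `echo_links_transformed`, and the `wrapper` parameter are hypothetical. While an `<a href=...>` element is open, the parser copies the link contents through unchanged and also collects a transformed copy, which it emits inside `<feat2>` right after the `</a>`.

```python
from html.parser import HTMLParser


class EchoLinksTransformed(HTMLParser):
    """Transform text outside links; after each link, append a transformed copy."""

    def __init__(self, transform=str.upper, wrapper="feat2"):
        # convert_charrefs=False hands entities to the handlers below unchanged.
        super().__init__(convert_charrefs=False)
        self.transform = transform
        self.wrapper = wrapper
        self.out = []
        self.copy = None  # list of transformed pieces while inside <a href=...>

    def _emit(self, markup):
        # Markup (tags, entities) goes to the output and to the copy, if any.
        self.out.append(markup)
        if self.copy is not None:
            self.copy.append(markup)

    def handle_starttag(self, tag, attrs):
        starts_link = (tag == "a" and self.copy is None
                       and any(name == "href" for name, _ in attrs))
        self._emit(self.get_starttag_text())
        if starts_link:
            self.copy = []  # start collecting after the <a ...> tag itself

    def handle_startendtag(self, tag, attrs):
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "a" and self.copy is not None:
            # Simplification: the first </a> closes the link (no nested anchors).
            copied, self.copy = "".join(self.copy), None
            self.out.append(f"</a><{self.wrapper}>{copied}</{self.wrapper}>")
        else:
            self._emit(f"</{tag}>")

    def handle_data(self, data):
        if self.copy is None:
            self.out.append(self.transform(data))
        else:
            self.out.append(data)                     # original text in the link
            self.copy.append(self.transform(data))    # transformed text for later

    def handle_entityref(self, name):
        self._emit(f"&{name};")

    def handle_charref(self, name):
        self._emit(f"&#{name};")


def echo_links_transformed(html, transform=str.upper):
    """Hypothetical helper: run the parser and join the re-emitted pieces."""
    parser = EchoLinksTransformed(transform)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


example = """
<footag>Lorem Ipsum</footag> is simply
dummy text of <a href="junk.html">the printing</a> and
<a href="junk2.html">typesetting <b>industry</b>.</a>
Thanks."""

print(echo_links_transformed(example))
```

Unlike the hand-wrapped sample above, the output keeps the line breaks of the input, so `<feat2>THE PRINTING</feat2>` stays on one line. A link that is never closed is copied through without a `<feat2>` echo.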

    ========
    Thanks for your help
     
    , Sep 27, 2006
    #1

  2. vbfoobar@gmail.com wrote:

    > Hello,
    >
    > I have HTML input to which I apply some changes.
    >
    > Feature 1:
    > =======
    > I want to transform all the text, but if the text is inside
    > an "a href" tag, I want to leave the text as it is.
    >
    > The HTML is not necessarily well-formed, so
    > I would like to do that using BeautifulSoup (or
    > maybe another tolerant parser).
    >


    <snip/>

    Use BeautifulSoup + XSL. Writing your two features in XSL is close to a
    no-brainer, and it is certainly the best tool for the job.

    And there are a few XSL implementations available for Python.

    Diez
     
    Diez B. Roggisch, Sep 27, 2006
    #2
