encoding of script tags in html

Discussion in 'XML' started by Andy Fish, Jun 2, 2008.

  1. Andy Fish

    Andy Fish Guest

    hi,

    this is more an html parsing question than an XML question but I think it's
    the kind of thing that folks in an XML newsgroup would be more likely to
    help with, so please excuse me if it's a little off topic. please be aware
    that I am primarily talking about HTML rather than XHTML but I would also
    like to understand how XHTML works for when I prepare to convert the app to
    XHTML.

    I have recently discovered that this:

    <script>var x='</script>';</script>

    is not valid HTML - the fact that there is an end script tag in quotes
    causes the parser to stop recognising the script. initially my reaction was
    that this is not a surprise because I had failed to HTML encode the script
    contents, so my second attempt was this:

    <script>var x='&lt;/script&gt;';</script>

    however this DOES NOT WORK - the variable ends up containing the text
    "&lt;/script&gt;"

    can someone point me at part of the w3c specification that states how script
    tags are parsed differently to other tags in HTML.

    interestingly i have also discovered that this:

    <script>if (3<5);</script>

    IS valid html and seems even to be valid XHTML even though it is not valid
    XML

    Andy
     
    Andy Fish, Jun 2, 2008
    #1

  2. Andy Fish wrote:
    > <script>var x='</script>';</script>


    Escape the '/' in your script code:

    var x='<\/script>';
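
A minimal sketch of how that escaping could be automated on the server side. The helper name `escapeScriptContent` is hypothetical (not from this thread), and the sketch assumes that "</" only ever occurs inside string or regular-expression literals, where `\/` means the same as `/`:

```javascript
// Hypothetical helper: make JavaScript source safe to embed inside an
// HTML <script> element by breaking up every "</" sequence.
function escapeScriptContent(jsSource) {
  // Inside a JavaScript string literal "<\/" evaluates to "</",
  // but the HTML parser no longer sees the start of an end tag.
  return jsSource.replace(/<\//g, "<\\/");
}

const unsafe = "var x='</script>';";
const safe = escapeScriptContent(unsafe);

console.log(safe); // var x='<\/script>';
```

Outside string and regular-expression literals the inserted backslash could change the meaning of the code, so this is only suitable for the literal case discussed here.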
    --
    Johannes Koch
    In te domine speravi; non confundar in aeternum.
    (Te Deum, 4th cent.)
     
    Johannes Koch, Jun 2, 2008
    #2

  3. Andy Fish wrote:

    > can someone point me at part of the w3c specification that states how script
    > tags are parsed differently to other tags in HTML.


    See http://www.w3.org/TR/html4/types.html#type-script:
    "Please note that script data that is element content may not contain
    character references, but script data that is the value of an attribute
    may contain them."
    and http://www.w3.org/TR/html4/appendix/notes.html#notes-specifying-data:
    "The DTD defines script and style data to be CDATA for both element
    content and attribute values. SGML rules do not allow character
    references in CDATA element content but do allow them in CDATA attribute
    values."
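
To make the difference concrete, here is a small sketch of what the script engine would receive in each case. It is only an illustration of the rule quoted above, not a real HTML parser; `decodeBasicCharRefs` is a hypothetical helper that handles just four character references:

```javascript
// Hypothetical helper: decode a few character references in a single pass,
// so that "&amp;lt;" becomes "&lt;" rather than "<".
const CHAR_REFS = { "&lt;": "<", "&gt;": ">", "&amp;": "&", "&quot;": '"' };

function decodeBasicCharRefs(text) {
  return text.replace(/&(?:lt|gt|amp|quot);/g, (ref) => CHAR_REFS[ref]);
}

const written = "var x='&lt;/script&gt;';";

// Element content (<script>...</script>): passed to the script engine as written.
const seenInScriptElement = written;

// Attribute value (e.g. onclick="..."): character references are decoded first.
const seenInAttribute = decodeBasicCharRefs(written);

console.log(seenInScriptElement); // var x='&lt;/script&gt;';
console.log(seenInAttribute);     // var x='</script>';
```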

    > interestingly i have also discovered that this:
    >
    > <script>if (3<5);</script>
    >
    > IS valid html and seems even to be valid XHTML even though it is not valid
    > XML


    That snippet is not well-formed so it can't be valid XML or XHTML as it
    is not even XML.


    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
     
    Martin Honnen, Jun 2, 2008
    #3
  4. Andy Fish

    Andy Fish Guest

    thanks to both for the quick replies

    wow - what a minefield this has turned out to be !!

    i previously had just one server-side utility function to escape a string as
    a literal javascript string. now i realise i need to have 2 separate
    functions, one for when the javascript literal is to be placed inside an
    HTML attribute value (e.g. onclick="...") and a different one for when it is
    inside a script block, because character references are recognised in
    attribute values but not in script element content (both are CDATA in the DTD)
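
A rough sketch of what such a pair of functions might look like, written in JavaScript for illustration. All three function names are hypothetical, and the string escaping is deliberately not exhaustive (it covers backslashes, single quotes and line breaks only):

```javascript
// Hypothetical helper: escape a value as a single-quoted JavaScript string literal.
function toJsStringLiteral(value) {
  const escaped = value
    .replace(/\\/g, "\\\\")
    .replace(/'/g, "\\'")
    .replace(/\r/g, "\\r")
    .replace(/\n/g, "\\n");
  return "'" + escaped + "'";
}

// For a <script> block: character references are not recognised there,
// so only break up "</" to stop the HTML parser ending the element.
function jsLiteralForScriptBlock(value) {
  return toJsStringLiteral(value).replace(/<\//g, "<\\/");
}

// For an attribute value such as onclick="...": character references are
// recognised, so HTML-escape the finished literal.
function jsLiteralForAttribute(value) {
  return toJsStringLiteral(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

const value = 'say "hi" </script>';
console.log(jsLiteralForScriptBlock(value)); // 'say "hi" <\/script>'
console.log(jsLiteralForAttribute(value));   // 'say &quot;hi&quot; &lt;/script&gt;'
```

Both functions share the same JavaScript-level escaping and differ only in the final HTML-level step, which is the part that depends on where the literal is placed.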

    Andy


     
    Andy Fish, Jun 2, 2008
    #4
  5. Peter Flynn

    Peter Flynn Guest

    Martin Honnen wrote:
    > Andy Fish wrote:
    >
    >> can someone point me at part of the w3c specification that states how
    >> script tags are parsed differently to other tags in HTML.

    [...]
    >> interestingly i have also discovered that this:
    >>
    >> <script>if (3<5);</script>
    >>
    >> IS valid html and seems even to be valid XHTML even though it is not
    >> valid XML

    >
    > That snippet is not well-formed so it can't be valid XML or XHTML as it
    > is not even XML.


    It is, however, valid HTML (SGML): the < sign is valid unescaped in
    CDATA declared content (and would be valid elsewhere, as the digit
    following it cannot be taken to be the beginning of an element type
    name).

    > wow - what a minefield this has turned out to be !!


    Don't guess ("seems to be..."). Install a standalone validating parser
    that handles both SGML and XML (e.g. onsgmls, part of SP), and copies of
    the relevant DTDs; or a schema validator and copies of the schemas,
    and test any files you create for validity. A good XML editor will do
    this for you anyway.

    ///Peter
    --
    XML FAQ: http://xml.silmaril.ie/
     
    Peter Flynn, Jun 2, 2008
    #5
