Writing a parser

Discussion in 'XML' started by Jan Danielsson, Aug 17, 2005.

  1. Hello all,

    I guess this is a question for people who have written a parser.

    Does an XML parser ever need to be recursive? I mean like:

    &fo&bar;o;

    I know this particular example is in the XML specs, and it says that
    it will not happen. But are there some really wild constructions that
    are allowed, that would require recursive parsing?

    Like.. <tag <!-- Comment <tag2 attr="<fo&ou<!-- comment!
    -->ml;o/>"></tag2> -->></tag>

    Please don't start taking that apart; I know all the errors in it.
    However, what I want to demonstrate is the level of complexity I'm
    wondering about. Any case where recursion is needed?

    --
    Kind Regards,
    Jan Danielsson
    Te audire no possum. Musa sapientum fixa est in aure.
     
    Jan Danielsson, Aug 17, 2005
    #1

  2. Hello,

    Jan Danielsson wrote:
    >
    > I guess this is a question for people who have written a parser.
    >
    > Does an XML parser ever need to be recursive? I mean like:
    >
    > &fo&bar;o;
    >
    > I know this particular example is in the XML specs, and it says that
    > it will not happen. But are there some really wild constructions that
    > are allowed, that would require recursive parsing?
    >
    > Like.. <tag <!-- Comment <tag2 attr="<fo&ou<!-- comment!
    > -->ml;o/>"></tag2> -->></tag>
    >
    > Please don't start taking that apart; I know all the errors in it.
    > However, what I want to demonstrate is the level of complexity I'm
    > wondering about. Any case where recursion is needed?
    >


    I'm no expert, but AFAIK an XML parser will have to stop if the XML
    file is not well-formed. The above example contains errors (you
    said it), so it is not well-formed. There's no need for a parser
    to accept the above construct. I even think that a parser is not
    allowed to accept it.

    Gerald
     
    Gerald Aichholzer, Aug 17, 2005
    #2

  3. Jan Danielsson wrote:

    > However, what I want to demonstrate is the level of complexity I'm
    > wondering about. Any case where recursion is needed?


    Why do you worry about recursion?
    Recursive functions usually make parsers easier to implement.
    If you *really* can't recurse in your implementation, use stacks
    for holding the context.
     
    Jürgen Kahrs, Aug 17, 2005
    #3
  4. In article <4303a842$>,
    Jan Danielsson <> wrote:

    >Does an XML parser ever need to be recursive? I mean like:


    Yes, but not in the way your examples are.

    Elements may contain other elements:

    <foo>...<bar>...</bar>...</foo>

    Even if you don't return this as a nested structure (for example,
    a SAX parser just returns start and end tags), you need to maintain
    a stack of open elements so you can detect errors like this:

    <foo>...<bar>...</bar>...</wrong>
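
    A minimal sketch of that stack of open elements, in Python. The
    names TAG and check_nesting are made up for illustration, and the
    tag pattern is deliberately simplified (no attributes, comments,
    CDATA or processing instructions), so this is not a real parser:

```python
import re

# Simplified tag pattern: matches <name>, </name> and <name/> only.
# Attributes, comments, CDATA and processing instructions are not handled.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)\s*(/?)>")


def check_nesting(text):
    """Return None if the tags nest properly, else an error message."""
    open_elements = []  # stack of element names that are still open
    for match in TAG.finditer(text):
        closing, name, self_closing = match.groups()
        if self_closing:
            continue  # <name/> opens and closes at once
        if not closing:
            open_elements.append(name)  # start tag: push
        elif not open_elements:
            return f"unexpected </{name}>"
        elif open_elements[-1] != name:
            return f"expected </{open_elements[-1]}>, found </{name}>"
        else:
            open_elements.pop()  # matching end tag: pop
    if open_elements:
        return f"unclosed <{open_elements[-1]}>"
    return None


print(check_nesting("<foo>...<bar>...</bar>...</foo>"))
print(check_nesting("<foo>...<bar>...</bar>...</wrong>"))
```

    The stack is needed even when nothing nested is returned to the
    caller, because the end tag can only be checked against the most
    recently opened element.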

    The replacement text of entities may contain references to other
    entities:

    <!ENTITY foo "some text">
    <!ENTITY bar "contains this [ &bar; ] text">

    So that a reference in the document to "&foo;" must be expanded
    to "contains this [ some text ] text".

    And similarly for external entities.

    -- Richard
     
    Richard Tobin, Aug 17, 2005
    #4
  5. Jürgen Kahrs wrote:
    >>However, what I want to demonstrate is the level of complexity I'm
    >>wondering about. Any case where recursion is needed?

    >
    > Why do you worry about recursion?
    > Recursive functions usually make parsers easier to implement.
    > If you *really* can't recurse in your implementation, use stacks
    > for holding the context.


    I'm sorry, but I was talking about recursive *expressions* in *XML*,
    not as in "a function calling itself". I already have a stack-based
    parser, but I'm beginning to wonder whether it is worth the trouble. I
    haven't actually seen any examples where I would need the stack-based
    design, and there's a much neater way to solve it, imho, but it would
    make certain recursions *in* *XML* impossible.

    Sorry for the confusion.

    --
    Kind Regards,
    Jan Danielsson
    Te audire no possum. Musa sapientum fixa est in aure.
     
    Jan Danielsson, Aug 17, 2005
    #5
  6. Richard Tobin wrote:
    > <!ENTITY foo "some text">
    > <!ENTITY bar "contains this [ &bar; ] text">
    >
    > So that a reference in the document to "&foo;" must be expanded
    > to "contains this [ some text ] text".
    >

    Surely you mean
    > <!ENTITY bar "contains this [ &foo; ] text">


    ?

    Soren
     
    Soren Kuula, Aug 18, 2005
    #6
  7. In article <bmPMe.65151$>,
    Soren Kuula <> wrote:

    >> <!ENTITY bar "contains this [ &bar; ] text">


    >Surely you mean
    > > <!ENTITY bar "contains this [ &foo; ] text">


    Yes, of course.

    The one I typed is illegal (and must be reported as such by an XML
    parser if it is used).
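
    A rough sketch of both cases in Python: expanding a reference
    whose replacement text refers to another entity, and rejecting an
    entity that refers to itself. The names REF and expand are made
    up for illustration, and the entities are a plain dict rather
    than anything read from a real DTD:

```python
import re

# Matches a general entity reference such as &foo;
REF = re.compile(r"&([A-Za-z_][\w.-]*);")


def expand(name, entities, active=()):
    """Expand entity `name`, following references in its replacement text.

    `active` holds the names currently being expanded, so that a direct
    or indirect self-reference is reported instead of looping forever.
    """
    if name in active:
        raise ValueError(f"entity '{name}' refers to itself")
    if name not in entities:
        raise ValueError(f"entity '{name}' is not declared")
    return REF.sub(
        lambda m: expand(m.group(1), entities, active + (name,)),
        entities[name],
    )


declared = {
    "foo": "some text",
    "bar": "contains this [ &foo; ] text",
}
print(expand("bar", declared))

try:
    expand("bar", {"bar": "contains this [ &bar; ] text"})
except ValueError as err:
    print(err)
```

    The recursion here follows the entity declarations, not the
    document text, which is why a reference like &fo&bar;o; never
    needs to be handled.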

    -- Richard
     
    Richard Tobin, Aug 18, 2005
    #7
