XML Parser

Discussion in 'XML' started by an0047@gmail.com, Jul 2, 2007.

  1. Guest

    Hello

    I would like to develop a simple XML parser with my own commands.

    The approach is first to develop a state machine and later implement it
    in C. I had a look at some posts about lexical analysers, but the
    information I found was not helpful.

    I know there are some books about building tables and complicated
    equations to analyse text, but I don't know how to look for them.

    Any recommendations, tips on how to implement the parser, or
    literature references (book, paper) would be greatly appreciated.

    Best Regards
    , Jul 2, 2007
    #1

  2. Many good XML parsers exist. It sounds like you're going to need to do a
    fair amount of homework before constructing your own. Reinventing the
    wheel is probably not very useful unless you have a real interest in
    learning how parsers function.

    One standard reference which covers this topic: "Compilers:
    Principles, Techniques, and Tools" (Aho, Ullman, and others). You can
    ignore the code-generation and optimization sections, but the parsing
    portions of the task are essentially the same, and the typechecking
    chapter may be relevant if you want to implement validation.
    Joseph Kesselman, Jul 2, 2007
    #2

  3. wrote:

    > Any recommendations, tips on how to implement the parser, or
    > literature references (book, paper) would be greatly appreciated.


    This question has been answered here several times.
    Google for it. Usually, we warn newbies who want to
    write their own parsers. You will be surprised about
    the tricky details. Have you ever heard of a BOM ?
    Are you prepared to process 32-bit-characters ?
    Juergen Kahrs, Jul 3, 2007
    #3
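    The BOM question in the post above can be made concrete. The following
    is a minimal sketch in C, not code from any poster in this thread, of how
    a parser might check the first bytes of a document for a byte order mark
    before reading anything else. The names encoding, starts_with and
    detect_bom are hypothetical; the byte sequences are the standard Unicode
    BOM values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encodings that a leading byte order mark (BOM) can announce. */
typedef enum {
    ENC_UNKNOWN,    /* no BOM: the caller has to decide by other means */
    ENC_UTF8,
    ENC_UTF16_BE,
    ENC_UTF16_LE,
    ENC_UTF32_BE,
    ENC_UTF32_LE
} encoding;

/* Returns 1 if buf begins with the bom_size bytes stored in bom. */
static int starts_with(const unsigned char *buf, size_t len,
                       const unsigned char *bom, size_t bom_size)
{
    return len >= bom_size && memcmp(buf, bom, bom_size) == 0;
}

/*
 * Report which BOM, if any, opens the buffer. *bom_len receives the
 * number of bytes to skip before the real document content starts.
 */
encoding detect_bom(const unsigned char *buf, size_t len, size_t *bom_len)
{
    static const unsigned char utf32_be[] = { 0x00, 0x00, 0xFE, 0xFF };
    static const unsigned char utf32_le[] = { 0xFF, 0xFE, 0x00, 0x00 };
    static const unsigned char utf8[]     = { 0xEF, 0xBB, 0xBF };
    static const unsigned char utf16_be[] = { 0xFE, 0xFF };
    static const unsigned char utf16_le[] = { 0xFF, 0xFE };

    assert(buf != NULL && bom_len != NULL);
    *bom_len = 0;

    /* UTF-32 is tested first: its little-endian BOM begins with FF FE,
       which would otherwise be mistaken for the UTF-16 one. */
    if (starts_with(buf, len, utf32_be, sizeof utf32_be)) {
        *bom_len = sizeof utf32_be;
        return ENC_UTF32_BE;
    }
    if (starts_with(buf, len, utf32_le, sizeof utf32_le)) {
        *bom_len = sizeof utf32_le;
        return ENC_UTF32_LE;
    }
    if (starts_with(buf, len, utf8, sizeof utf8)) {
        *bom_len = sizeof utf8;
        return ENC_UTF8;
    }
    if (starts_with(buf, len, utf16_be, sizeof utf16_be)) {
        *bom_len = sizeof utf16_be;
        return ENC_UTF16_BE;
    }
    if (starts_with(buf, len, utf16_le, sizeof utf16_le)) {
        *bom_len = sizeof utf16_le;
        return ENC_UTF16_LE;
    }
    return ENC_UNKNOWN;
}
```

    A caller would skip bom_len bytes and then decode the rest according to
    the reported encoding. A parser that only ever reads its own 8-bit files
    could instead treat anything other than ENC_UNKNOWN or ENC_UTF8 as an
    error.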
  4. Juergen Kahrs wrote:
    > Have you ever heard of a BOM ? Are you prepared to process 32-bit-characters ?


    The usual estimate is that a complete XML parser is about the right size
    to be a serious term project for a college student who already
    understands the basics of writing parsers.

    You can rattle off a subset in less time than that. But, again, unless
    you have very special needs (such as a language where nobody has written
    one yet and which can't link to existing parsers), the question is "why".

    --
    () ASCII Ribbon Campaign | Joe Kesselman
    /\ Stamp out HTML e-mail! | System architexture and kinetic poetry
    Joe Kesselman, Jul 3, 2007
    #4
  5. Guest

    On 3 Jul., 00:56, Joseph Kesselman <> wrote:
    > Many good XML parsers exist. It sounds like you're going to need to do a
    > fair amount of homework before constructing your own. Reinventing the
    > wheel is probably not very useful unless you have a real interest in
    > learning how parsers function.
    >
    > One standard reference which covers this topic: "Compilers:
    > Principles, Techniques, and Tools" (Aho, Ullman, and others). You can
    > ignore the code-generation and optimization sections, but the parsing
    > portions of the task are essentially the same, and the typechecking
    > chapter may be relevant if you want to implement validation.


    Thanks for your answer and the reference! I'm not trying to reinvent the
    wheel; I'm trying to write a very simple and reliable parser for
    commercial software. The ones out there are very complex and big, and
    possible license violations need to be taken into consideration. If you
    know of a very simple one written in C, please let me know.

    I do indeed have a real interest in learning how parsers function; that's
    why I asked for a book reference. As of now, the new state machine
    has 5 states and can more or less handle some simple XML tags.

    Regards
    , Jul 3, 2007
    #5
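    A state machine of the size described in the post above might look like
    the following minimal sketch. The five states, the callback type and the
    names scan_tags and log_event are assumptions made for illustration; the
    poster's actual states are not given in the thread. The sketch only
    recognises <name>, </name> and plain text in 8-bit input, with no
    attributes, comments, entities or self-closing tags, so it is a tag
    scanner for a private subset rather than an XML parser.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* The five (hypothetical) states of the scanner. */
typedef enum { S_TEXT, S_TAG_OPEN, S_START_NAME, S_END_NAME, S_ERROR } state;

typedef enum { EV_START, EV_END, EV_TEXT } event_kind;
typedef void (*event_fn)(event_kind kind, const char *s, size_t len, void *user);

/* ASCII-only simplification of the XML name rules. */
static int is_name_start(unsigned char c) { return isalpha(c) || c == '_' || c == ':'; }
static int is_name_char(unsigned char c)
{
    return isalnum(c) || c == '_' || c == ':' || c == '-' || c == '.';
}

/* Scan a NUL-terminated string; returns 0 on success, -1 on malformed input. */
int scan_tags(const char *in, event_fn emit, void *user)
{
    state st = S_TEXT;
    const char *start = in;     /* start of the current text run or name */
    const char *p;

    assert(in != NULL && emit != NULL);
    for (p = in; *p != '\0' && st != S_ERROR; p++) {
        unsigned char c = (unsigned char)*p;
        switch (st) {
        case S_TEXT:
            if (c == '<') {
                if (p > start)
                    emit(EV_TEXT, start, (size_t)(p - start), user);
                st = S_TAG_OPEN;
            }
            break;
        case S_TAG_OPEN:        /* just saw '<' */
            if (c == '/') { st = S_END_NAME; start = p + 1; }
            else if (is_name_start(c)) { st = S_START_NAME; start = p; }
            else st = S_ERROR;
            break;
        case S_START_NAME:
        case S_END_NAME:
            if (c == '>') {
                if (p == start) {           /* "</>" has an empty name */
                    st = S_ERROR;
                } else {
                    emit(st == S_START_NAME ? EV_START : EV_END,
                         start, (size_t)(p - start), user);
                    st = S_TEXT;
                    start = p + 1;
                }
            } else if (!(p == start ? is_name_start(c) : is_name_char(c))) {
                st = S_ERROR;
            }
            break;
        case S_ERROR:
            break;
        }
    }
    if (st == S_TEXT && p > start)          /* trailing text after last tag */
        emit(EV_TEXT, start, (size_t)(p - start), user);
    return st == S_TEXT ? 0 : -1;
}

/* Sample callback: appends "S:name;", "E:name;" or "T:text;" to a log. */
typedef struct { char buf[256]; size_t used; } event_log;

void log_event(event_kind kind, const char *s, size_t len, void *user)
{
    static const char tag[] = { 'S', 'E', 'T' };   /* indexed by event_kind */
    event_log *log = user;
    size_t room = sizeof log->buf - log->used;
    int n = snprintf(log->buf + log->used, room, "%c:%.*s;", tag[kind], (int)len, s);
    if (n > 0 && (size_t)n < room)
        log->used += (size_t)n;
}
```

    Calling scan_tags("<a>hi</a>", log_event, &log) with a zeroed event_log
    reports a start tag, a text run and an end tag in that order. Checking
    that end tags match their start tags would need a stack on top of this,
    which the sketch leaves out.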
  6. Guest

    On 3 Jul., 09:49, Juergen Kahrs <> wrote:
    > wrote:
    > > Any recommendations, tips on how to implement the parser, or
    > > literature references (book, paper) would be greatly appreciated.
    >
    > This question has been answered here several times.
    > Google for it. Usually, we warn newbies who want to
    > write their own parsers. You will be surprised about
    > the tricky details. Have you ever heard of a BOM ?
    > Are you prepared to process 32-bit-characters ?


    Hi, thanks for your answer and for the warning too. As a newbie
    I need and want to learn about parsers. Actually, I don't even know if
    I'm posting in the right group; unfortunately I didn't find any
    information on the web that satisfied my search, and that is the reason
    for my post. The characters are still 8 bits long and I think they will
    remain like that. For your amusement, I had great problems handling
    chars and strings in C. I don't know which question you are referring to,
    but if you could point me to posts that talk about the implementation
    (and state machine) of the kind of parser described above I would
    greatly appreciate it. Have a nice day and best regards
    , Jul 3, 2007
    #6
  7. Guest

    On 3 Jul., 14:06, Joe Kesselman <> wrote:
    > Juergen Kahrs wrote:
    > > Have you ever heard of a BOM ? Are you prepared to process 32-bit-characters ?

    >
    > The usual estimate is that a complete XML parser is about the right size
    > to be a serious term project for a college student who already
    > understands the basics of writing parsers.
    >
    > You can rattle off a subset in less time than that. But, again, unless
    > you have very special needs (such as a language where nobody has written
    > one yet and which can't link to existing parsers), the question is "why".


    Why? I think the answer is in the posts above.

    >
    > --
    > () ASCII Ribbon Campaign | Joe Kesselman
    > /\ Stamp out HTML e-mail! | System architexture and kinetic poetry
    , Jul 3, 2007
    #7
  8. The existing parsers are complex because that's what's required to do a
    good job. Supporting a trivial subset of XML is near-trivial, but there
    is a lot more that has to be dealt with if you want your code to survive
    contact with real data and real users.

    Everything should be as simple as possible, but not simpler.

    If you want to learn about parsers, implementing a sloppy subset really
    won't teach you much.

    "Try not. Do! ... Or do not."

    There are royalty-free parsers out there, if that's your concern. I
    don't know what's available in plain C these days, but Apache's Xerces
    parser is available in a C++ version.

    --
    Joe Kesselman / Beware the fury of a patient man. -- John Dryden
    Joseph Kesselman, Jul 3, 2007
    #8
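    One example of the detail that a trivial subset tends to skip is
    reference handling. As a hedged illustration only, the sketch below
    replaces the five entity references that XML predefines (&amp;, &lt;,
    &gt;, &apos;, &quot;) and rejects everything else. The function name
    decode_predefined is hypothetical, and numeric character references and
    DTD-declared entities are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst, replacing the five predefined XML entity references.
 * Returns the decoded length, or -1 if a reference is unknown or
 * unterminated, or if dst is too small for the result and its terminator.
 */
long decode_predefined(const char *src, char *dst, size_t dst_size)
{
    /* Names are stored without the leading '&' but with the closing ';'. */
    static const struct { const char *name; char value; } table[] = {
        { "amp;", '&' }, { "lt;", '<' }, { "gt;", '>' },
        { "apos;", '\'' }, { "quot;", '"' }
    };
    const size_t entries = sizeof table / sizeof table[0];
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    while (*src != '\0') {
        char c = *src++;
        if (c == '&') {
            size_t i;
            for (i = 0; i < entries; i++) {
                size_t len = strlen(table[i].name);
                if (strncmp(src, table[i].name, len) == 0) {
                    c = table[i].value;
                    src += len;
                    break;
                }
            }
            if (i == entries)
                return -1;      /* numeric or custom reference: not handled */
        }
        if (out + 1 >= dst_size)
            return -1;          /* keep room for the terminating NUL */
        dst[out++] = c;
    }
    if (dst_size == 0)
        return -1;
    dst[out] = '\0';
    return (long)out;
}
```

    Even this small piece has decisions hiding in it, such as what to do with
    a bare & in the input; a conforming parser has to treat that as a
    well-formedness error, which is what the -1 return stands in for here.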
  9. (If your parser doesn't support all of XML, it isn't an XML parser.)

    --
    Joe Kesselman / Beware the fury of a patient man. -- John Dryden
    Joseph Kesselman, Jul 3, 2007
    #9
  10. Joseph Kesselman wrote:
    > There are royalty-free parsers out there, if that's your concern.


    For what it's worth, the W3C's own website just suggests you do a
    websearch for "XML parser" to get a list of the available parsers.
    Adding "in C" and "free" to that suggests that you might want to look at
    libxml2, XMLTok, expat, and possibly others.

    (I haven't used any C-based XML parser in years, so I can't offer
    opinions on any of these.)
    Joseph Kesselman, Jul 3, 2007
    #10
  11. wrote:

    > Hi, thanks for your answer and for the warning too. As a newbie
    > I need and want to learn about parsers. Actually, I don't even know if


    Use one of these as a starting point:

    http://www.grinninglizard.com/tinyxmldocs/index.html
    http://www.danoneverythingelse.com/articles/Softwarebxmlnode.html

    > I'm posting in the right group; unfortunately I didn't find any
    > information on the web that satisfied my search, and that is the reason
    > for my post. The characters are still 8 bits long and I think they will
    > remain like that. For your amusement, I had great problems handling


    OK, then you can focus on processing XML files that you
    produce yourself.

    > chars and strings in C. I don't know which question you are referring to,
    > but if you could point me to posts that talk about the implementation
    > (and state machine) of the kind of parser described above I would
    > greatly appreciate it. Have a nice day and best regards


    The links above are the main points.
    Jürgen Kahrs, Jul 4, 2007
    #11
  12. Guest

    > If you want to learn about parsers, implementing a sloppy subset really
    > won't teach you much.


    Then what will teach me, other than using a finished, ready-made XML
    library?

    > There are royalty-free parsers out there, if that's your concern.


    No that's not my main concern

    > don't know what's available in plain C these days, but Apache's Xerces
    > parser is available in a C++ version.


    I didn't know either, and that is the reason for my post.
    , Jul 4, 2007
    #12
  13. Guest

    On 3 Jul., 23:45, Joseph Kesselman <> wrote:
    > (If your parser doesn't support all of XML, it isn't an XML parser.)


    It won't be an XML parser; if I manage to finish it, it will be an
    own command XML parser
    , Jul 4, 2007
    #13
  14. Guest

    On 4 Jul., 19:07, Jürgen Kahrs <> wrote:
    > The links above are the main points.


    Thank you very much :)
    , Jul 4, 2007
    #14
  15. wrote:
    > It won't be an XML parser; if I manage to finish it, it will be an
    > own command XML parser


    I don't know what "own command" means in this context, I'm afraid. But
    I'm not sure I need to; I've raised the relevant questions, I think.


    --
    () ASCII Ribbon Campaign | Joe Kesselman
    /\ Stamp out HTML e-mail! | System architexture and kinetic poetry
    Joe Kesselman, Jul 4, 2007
    #15
  16. Guest

    On 5 Jul., 00:01, Joe Kesselman <> wrote:
    > I don't know what "own command" means in this context, I'm afraid. But
    > I'm not sure I need to; I've raised the relevant questions, I think.


    Own dataset? Thank you for the reference; I just got the book you
    recommended.
    , Jul 4, 2007
    #16