parsing non-well-formed XML (SAX)

Discussion in 'Java' started by Timo Nentwig, Jun 4, 2004.

  1. Timo Nentwig

    Timo Nentwig Guest

    Hi!

    I need to parse multi-MByte "XML" files which are not well-formed, i.e.
    there are plenty of <TAGS> in there instead of <TAGS />. I'm also not
    sure about case sensitivity.

    Any ready-to-use solutions? :)

    Timo
     
    Timo Nentwig, Jun 4, 2004
    #1

  2. Andy Fish

    Andy Fish Guest

    Well, I shouldn't think there are any XML parsers you can use.

    The trouble with documents that are not well-formed is that only you know
    which kinds of non-well-formedness are acceptable and how to interpret
    them. Any piece of information that is not a well-formed XML document is
    a badly formed XML document!!

    So the key to a successful solution is to write down your definition of a
    valid input document. Only once you have done this can you evaluate
    different approaches.

    If there are only a few well-known kinds of badly formed tags, you could
    pre-process the input first to generate XML. E.g., say you knew that the
    TAGS element could never have any content but might be missing the
    end-tag delimiter (like <br> in HTML); it would be easy to detect and fix.
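
The pre-processing idea could be sketched roughly as follows. This is only an illustration under assumptions: the class name TagFixer and its two methods are made up for this example, TAGS is assumed to be the only offending element and never to have content, and the regex skips tags whose attributes contain a slash. The repaired text is then handed to the JDK's standard SAX parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: repairs unclosed <TAGS> elements before SAX parsing.
final class TagFixer {

    // Matches "<TAGS ...>" in any letter case, but not "<TAGS />" or "</TAGS>".
    // Assumption: attribute values never contain '/', '<' or '>'.
    private static final Pattern UNCLOSED_TAGS =
            Pattern.compile("<TAGS\\b([^<>/]*)>", Pattern.CASE_INSENSITIVE);

    // Rewrites every unclosed TAGS element as an empty element,
    // normalizing the element name to upper case along the way.
    static String fix(String input) {
        return UNCLOSED_TAGS.matcher(input).replaceAll("<TAGS$1/>");
    }

    // Parses the repaired text with SAX and collects element names in order.
    static List<String> elementNames(String repairedXml) throws Exception {
        List<String> names = new ArrayList<>();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(repairedXml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        names.add(qName);
                    }
                });
        return names;
    }
}
```

Calling TagFixer.elementNames(TagFixer.fix(text)) on the file contents would then parse it as ordinary XML. The sketch holds the whole input in memory as a String; for much larger inputs the same replacement would have to be applied in a streaming fashion, which is not shown here.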

    Failing that, ANTLR is a well-known parser generator which would be a
    building block on the way to making your own parser.

     
    Andy Fish, Jun 4, 2004
    #2

  3. Timo Nentwig

    Timo Nentwig Guest

    Andy Fish wrote:
    > well I shouldn't think there are any XML parsers you can use.


    Something like NekoHTML's HTMLTagBalancer...
     
    Timo Nentwig, Jun 4, 2004
    #3
