XML expat error

Discussion in 'Python' started by dirkheld, Feb 27, 2008.

  1. dirkheld

    dirkheld Guest

    Hi,

    I have written a piece of code that reads all XML files in a directory
    in order to retrieve one element from each of these files. All files
    have the same XML structure. After file 123 I receive the following
    error:

    xml.parsers.expat.ExpatError: not well-formed (invalid token): line
    554, column 20

    I guess that the element I try to read or the XML (which would be
    strange since they have been created with the same code) can't be
    retrieved.

    Two questions:
    1. Is there a way to fix this problem so that I can retrieve the element?
    2. Is there a way to skip the invalid file after such an error, so that
    the program continues reading the subsequent files? Some sort of
    error handling?

    Here is the code I use :

    from xml.dom import minidom
    import os
    path = "/Documents/programming/data/xml/"


    dirList = os.listdir(path)
    url_file=open('/Documents/programming/data/xml/test.txt','w')
    for file in dirList:
        xmldoc = minidom.parse('/Documents/programming/data/xml/'+file)
        xml_elem = xmldoc.getElementsByTagName('webpage')
        web_elem = xml_elem[0]
        url = web_elem.attributes['uri']
        url_file.write(url.value + '\n')
    url_file.close()
     
    dirkheld, Feb 27, 2008
    #1

  2. "dirkheld" <> wrote in message
    news:...

    > xml.parsers.expat.ExpatError: not well-formed (invalid token): line
    > 554, column 20
    >
    > I guess that the element I try to read or the XML (which would be
    > strange since they have been created with the same code) can't be
    > retrieved.


    It's fairly easy to write non-robust XML generating code, and also
    quick to test if one file is always bad. Drop it into a text editor or
    Firefox, and take a quick look at line 554. Most likely some random
    control character has sneaked in; it only takes (for example) one NUL
    to make the document ill-formed.
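
One way to run that check without opening an editor is to scan the raw bytes for control characters that XML 1.0 forbids. This is only a sketch, assuming Python 3; the helper name `find_illegal_control_bytes` is made up for illustration. It reports 1-based line numbers and 0-based byte columns.

```python
import re

# Control bytes that XML 1.0 does not allow anywhere in a document:
# everything below 0x20 except tab (0x09), line feed (0x0A) and
# carriage return (0x0D).
_ILLEGAL = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def find_illegal_control_bytes(data):
    """Return a list of (line, column, byte_value) for each illegal byte.

    `data` is the raw file content as bytes. Lines are 1-based,
    columns are 0-based byte offsets within the line.
    """
    hits = []
    for match in _ILLEGAL.finditer(data):
        pos = match.start()
        line = data.count(b"\n", 0, pos) + 1
        column = pos - (data.rfind(b"\n", 0, pos) + 1)
        hits.append((line, column, data[pos]))
    return hits
```

To use it on a suspect file, read the file in binary mode (`open(name, "rb").read()`) and pass the result in. An empty list only means no forbidden control byte was found; the file may still be ill-formed for other reasons, such as a bare `&` or a wrong encoding declaration.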
     
    Richard Brodie, Feb 27, 2008
    #2

  3. dirkheld

    dirkheld Guest

    On 27 feb, 17:18, "Richard Brodie" <> wrote:
    > "dirkheld" <> wrote in message
    > news:...
    >
    > > xml.parsers.expat.ExpatError: not well-formed (invalid token): line
    > > 554, column 20
    > >
    > > I guess that the element I try to read or the XML (which would be
    > > strange since they have been created with the same code) can't be
    > > retrieved.
    >
    > It's fairly easy to write non-robust XML generating code, and also
    > quick to test if one file is always bad. Drop it into a text editor or
    > Firefox, and take a quick look at line 554. Most likely some random
    > control character has sneaked in; it only takes (for example) one NUL
    > to make the document ill-formed.


    Something strange here. The XML file causing the problem has only 361
    lines. Isn't there a way to catch this error, ignore it and continue
    with the rest of the other files?
    This is the full error report:

    Traceback (most recent call last):
      File "xmltest.py", line 10, in <module>
        xmldoc = minidom.parse('/Documents/programming/data/xml/'+file)
      File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/xml/dom/minidom.py", line 1913, in parse
        return expatbuilder.parse(file)
      File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/xml/dom/expatbuilder.py", line 924, in parse
        result = builder.parseFile(fp)
      File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/xml/dom/expatbuilder.py", line 207, in parseFile
        parser.Parse(buffer, 0)
    xml.parsers.expat.ExpatError: not well-formed (invalid token): line 554, column 20
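
The position in that last line is also available on the exception object itself, which can help when the reported line does not seem to match the file you are looking at. The following is a sketch rather than a fix, assuming Python 3; `describe_parse_error` is a hypothetical helper name. It relies on the `code`, `lineno` and `offset` attributes of `ExpatError` and on `xml.parsers.expat.ErrorString`.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError, ErrorString


def describe_parse_error(data):
    """Parse XML given as bytes.

    Returns None if the document parses, otherwise a dict with the
    expat error message and the position where parsing stopped.
    """
    try:
        doc = minidom.parseString(data)
    except ExpatError as err:
        return {
            "message": ErrorString(err.code),
            "line": err.lineno,
            "column": err.offset,
        }
    doc.unlink()  # release the DOM tree, it is not needed here
    return None
```

Logging this dict together with the file name for every failing input shows which file expat was actually reading when it gave up.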
     
    dirkheld, Feb 27, 2008
    #3
  4. On Wed, 27 Feb 2008 14:02:25 -0800, dirkheld wrote:

    > Something strange here. The XML file causing the problem has only 361
    > lines. Isn't there a way to catch this error, ignore it and continue
    > with the rest of the other files?


    Yes of course: handle the exception instead of letting it propagate to the
    top level and ending the program.
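
A minimal sketch of that advice applied to the loop from the first post, assuming Python 3. The function name `collect_uris` and the `.xml` filter are illustrative additions, not part of the original script. The filter is there because the original code writes `test.txt` into the same directory it lists, so `os.listdir` would hand that file to the parser as well.

```python
import os
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def collect_uris(directory):
    """Return (uris, failures) for the .xml files in `directory`.

    `uris` holds the 'uri' attribute of the first <webpage> element of
    each file that parsed; `failures` holds (filename, error message)
    for each file that expat rejected.
    """
    uris, failures = [], []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".xml"):
            continue  # skip the output file and anything else that is not XML
        full_path = os.path.join(directory, name)
        try:
            xmldoc = minidom.parse(full_path)
        except ExpatError as err:
            # Remember the bad file and carry on with the next one.
            failures.append((name, str(err)))
            continue
        elements = xmldoc.getElementsByTagName("webpage")
        if elements and elements[0].hasAttribute("uri"):
            uris.append(elements[0].getAttribute("uri"))
        xmldoc.unlink()
    return uris, failures
```

Catching only `ExpatError` is deliberate: a bare `except:` would also hide typos and other bugs in the loop body. Printing `failures` at the end gives a list of files to inspect by hand.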

    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, Feb 28, 2008
    #4
  5. dirkheld

    dirkheld Guest

    On 28 feb, 08:18, Marc 'BlackJack' Rintsch <> wrote:
    > On Wed, 27 Feb 2008 14:02:25 -0800, dirkheld wrote:
    > > Something strange here. The XML file causing the problem has only 361
    > > lines. Isn't there a way to catch this error, ignore it and continue
    > > with the rest of the other files?
    >
    > Yes of course: handle the exception instead of letting it propagate to the
    > top level and ending the program.
    >
    > Ciao,
    > Marc 'BlackJack' Rintsch


    Ehm, maybe a stupid question... how? I'm rather new to Python and I
    have never used error handling.
     
    dirkheld, Feb 28, 2008
    #5
  6. dirkheld wrote:
    > On 28 feb, 08:18, Marc 'BlackJack' Rintsch <> wrote:
    >> On Wed, 27 Feb 2008 14:02:25 -0800, dirkheld wrote:
    >>> Something strange here. The XML file causing the problem has only 361
    >>> lines. Isn't there a way to catch this error, ignore it and continue
    >>> with the rest of the other files?
    >>
    >> Yes of course: handle the exception instead of letting it propagate to the
    >> top level and ending the program.
    >>
    >> Ciao,
    >> Marc 'BlackJack' Rintsch
    >
    > Ehm, maybe a stupid question... how? I'm rather new to Python and I
    > have never used error handling.


    Care to read the tutorial?

    Stefan
     
    Stefan Behnel, Feb 28, 2008
    #6
  7. On Thu, 28 Feb 2008 12:37:10 -0800, dirkheld wrote:

    >> Yes of course: handle the exception instead of letting it propagate to the
    >> top level and ending the program.
    >
    > Ehm, maybe a stupid question... how? I'm rather new to Python and I
    > have never used error handling.


    Then you should work through the tutorial in the docs, at least until
    section 8.3 Handling Exceptions:

    http://docs.python.org/tut/node10.html#SECTION0010300000000000000000

    Ciao,
    Marc 'BlackJack' Rintsch
     
    Marc 'BlackJack' Rintsch, Feb 28, 2008
    #7