xml.dom.minidom losing the XML document type attribute

Discussion in 'Python' started by Johannes Bauer, Jun 10, 2009.

  1. Hello group,

    when I read in an XML document with the xml.dom.minidom parser and write
    it out again, an attribute is lost:

    Input:

    <?xml version="1.0" encoding="utf-8" ?>
    [...]

    Output:
    <?xml version="1.0" ?>

    How can I fix this? I'm using Python 3.0rc2 (r30rc2:67114, Nov 16 2008,
    15:24:36).

    Kind regards,
    Johannes

    --
    "My counterclaim against you will then be for deliberate mendacity,
    slander of God, the Bible and me, and deliberate blasphemy."
    -- Prophet and visionary Hans Joss aka HJP in de.sci.physik
     
    Johannes Bauer, Jun 10, 2009
    #1

  2. Johannes Bauer wrote:
    > when I read in an XML document with the xml.dom.minidom parser and write
    > it out again, an attribute is lost:
    >
    > Input:
    >
    > <?xml version="1.0" encoding="utf-8" ?>
    > [...]
    >
    > Output:
    > <?xml version="1.0" ?>
    >
    > How can I fix this?


    You don't have to. UTF-8 is the default encoding, so the two lines above
    are equivalent.
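
    A minimal sketch of that equivalence, assuming a current Python 3 and a
    made-up sample document (the element name and text are only
    illustrative): both byte strings parse to the same content, because a
    parser falls back to UTF-8 when no encoding is declared.

```python
from xml.dom import minidom

# Hypothetical sample data; the only difference is the encoding attribute.
with_decl = '<?xml version="1.0" encoding="utf-8" ?><root>Gebühr</root>'.encode("utf-8")
without_decl = '<?xml version="1.0" ?><root>Gebühr</root>'.encode("utf-8")

# Parse both variants and pull out the text of the root element.
text_a = minidom.parseString(with_decl).documentElement.firstChild.data
text_b = minidom.parseString(without_decl).documentElement.firstChild.data

print(text_a == text_b)
```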

    Stefan
     
    Stefan Behnel, Jun 11, 2009
    #2

  3. Stefan Behnel wrote:
    > Johannes Bauer wrote:
    >> when I read in an XML document with the xml.dom.minidom parser and write
    >> it out again, an attribute is lost:
    >>
    >> Input:
    >>
    >> <?xml version="1.0" encoding="utf-8" ?>
    >> [...]
    >>
    >> Output:
    >> <?xml version="1.0" ?>
    >>
    >> How can I fix this?

    >
    > You don't have to. UTF-8 is the default encoding, so the two lines above
    > are equivalent.


    Can I somehow force Python to generate it anyways? I have software which
    complains if an explicit encoding is missing...

    Kind regards,
    Johannes

     
    Johannes Bauer, Jun 11, 2009
    #3
  4. Johannes Bauer wrote:
    > Stefan Behnel wrote:
    >> Johannes Bauer wrote:
    >>> when I read in an XML document with the xml.dom.minidom parser and write
    >>> it out again, an attribute is lost:
    >>>
    >>> Input:
    >>>
    >>> <?xml version="1.0" encoding="utf-8" ?>
    >>> [...]
    >>>
    >>> Output:
    >>> <?xml version="1.0" ?>
    >>>
    >>> How can I fix this?

    >> You don't have to. UTF-8 is the default encoding, so the two lines above
    >> are equivalent.

    >
    > Can I somehow force Python to generate it anyways?


    Did you try passing encoding='UTF-8' on serialisation?


    > I have software which
    > complains if an explicit encoding is missing...


    Well, to parse XML, it's best to use an XML parser. ;)

    Stefan
     
    Stefan Behnel, Jun 11, 2009
    #4
  5. Stefan Behnel wrote:

    >> Can I somehow force Python to generate it anyways?

    >
    > Did you try passing encoding='UTF-8' on serialisation?


    Uhm... nope - how can I do that?

    >
    >> I have software which
    >> complains if an explicit encoding is missing...

    >
    > Well, to parse XML, it's best to use an XML parser. ;)


    Well, I'm not speaking about my software :) Actually it's Gnucash which
    complains if the tag is not explicitly set. This is because they
    apparently had an ancient version which did not specify the charset, but
    used a different one than UTF-8. Kind of annoying, but fixing my XML
    output seems to be easier than convincing the Gnucash people to change
    their software :)

    Kind regards,
    Johannes

     
    Johannes Bauer, Jun 11, 2009
    #5
  6. Johannes Bauer wrote:
    > Stefan Behnel wrote:
    >
    >>> Can I somehow force Python to generate it anyways?

    >> Did you try passing encoding='UTF-8' on serialisation?

    >
    > Uhm... nope - how can I do that?


    Well, depends on what your code currently does.

    Maybe you could use something like

    doc.writexml(..., encoding='UTF-8')
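
    A hedged sketch of what that could look like on a current Python 3
    (behaviour on 3.0rc2 may differ). The sample document is made up for
    illustration. Passing `encoding` to `toxml()` returns bytes with the
    encoding named in the declaration; with `writexml()` the `encoding`
    argument only sets the declaration text, so the writer you pass in has
    to take care of the actual encoding itself.

```python
import io
from xml.dom import minidom

# Hypothetical sample input, parsed from bytes as it would be read from a file.
source = '<?xml version="1.0" encoding="utf-8" ?><root><item>Gebühr</item></root>'.encode("utf-8")
doc = minidom.parseString(source)

# Without an encoding argument the declaration carries no encoding attribute.
default_out = doc.toxml()

# With an encoding argument the result is bytes and the declaration names it.
explicit_out = doc.toxml(encoding="UTF-8")

# writexml() writes text to a writer object; encoding only affects the header.
buffer = io.StringIO()
doc.writexml(buffer, encoding="UTF-8")
written = buffer.getvalue()

print(default_out)
print(explicit_out)
print(written)
```

    When writing to disk, opening the file in binary mode and writing the
    bytes returned by `toxml(encoding="UTF-8")` keeps the declared and the
    actual encoding in agreement.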

    Stefan
     
    Stefan Behnel, Jun 11, 2009
    #6
  7. On Thu, Jun 11, 2009 at 9:20 AM, Johannes Bauer wrote:
    > Well, I'm not speaking about my software :) Actually it's Gnucash which
    > complains if the tag is not explicitly set. This is because they
    > apparently had an ancient version which did not specify the charset, but
    > used a different one than UTF-8. Kind of annoying, but fixing my XML
    > output seems to be easier than convincing the Gnucash people to change
    > their software :)


    from the GnuCash web page:
    How can you help?

    Testing: Test it and help us discover all bugs that might show up in
    there. Please enter each and every bug into bugzilla.

    Translating: The new release comes with some new translation strings.
    If you consider contributing a translation, we invite you to test this
    release already. A string freeze will be announced in one of the later
    2.3.x releases. Please check
    http://wiki.gnucash.org/wiki/Translation_Status for updates on this.

    We would like to encourage people to test this and any further
    releases as much as possible and submit bug reports in order that we
    can polish GnuCash to be as stable as possible for the 2.4.0 release
    in a few weeks. Then post any bugs you find to bugzilla
    (http://bugzilla.gnome.org/enter_bug.cgi?product=GnuCash)
     
    David Robinow, Jun 11, 2009
    #7
  8. David Robinow wrote:
    > On Thu, Jun 11, 2009 at 9:20 AM, Johannes Bauer wrote:
    >> Well, I'm not speaking about my software :) Actually it's Gnucash which
    >> complains if the tag is not explicitly set. This is because they
    >> apparently had an ancient version which did not specify the charset, but
    >> used a different one than UTF-8. Kind of annoying, but fixing my XML
    >> output seems to be easier than convincing the Gnucash people to change
    >> their software :)

    >
    > from the GnuCash web page:
    > How can you help?


    Well, it's not as if it's a bug of GnuCash. This is a deliberate
    decision used to ensure backwards compatibility with older versions of
    GnuCash. So a bug report wouldn't really do any good at all
    ("Please remove your backwards compatibility feature, it annoys me and I
    only use recent versions anyways").

    Kind regards,
    Johannes

     
    Johannes Bauer, Jun 11, 2009
    #8