xml-filter with XMLFilterBase() and XMLGenerator() shuffles attributes

Dmitry Teslenko

Hello!
I've made a trivial xml filter to modify some attributes on-the-fly:

....
from __future__ import with_statement
import os
import sys

from xml import sax
from xml.sax import saxutils

class ReIdFilter(saxutils.XMLFilterBase):
    def __init__(self, upstream, downstream):
        saxutils.XMLFilterBase.__init__(self, upstream)

        self.__downstream = downstream
        return

    def startElement(self, name, attrs):
        self.__downstream.startElement(name, attrs)
        return

    def startElementNS(self, name, qname, attrs):
        self.__downstream.startElementNS(name, qname, attrs)
        return

    def endElement(self, name):
        self.__downstream.endElement(name)
        return

    def endElementNS(self, name, qname):
        self.__downstream.endElementNS(name, qname)
        return

    def processingInstruction(self, target, body):
        self.__downstream.processingInstruction(target, body)
        return

    def comment(self, body):
        self.__downstream.comment(body)
        return

    def characters(self, text):
        self.__downstream.characters(text)
        return

    def ignorableWhitespace(self, ws):
        self.__downstream.ignorableWhitespace(ws)
        return
....
with open(some_file_path, 'w') as f:
    parser = sax.make_parser()
    downstream_handler = saxutils.XMLGenerator(f, 'cp1251')
    filter_handler = ReIdFilter(parser, downstream_handler)
    filter_handler.parse(file_path)

I want to prevent it from shuffling attributes, i.e. preserve the original
file's attribute order. Is there any ContentHandler.features*
responsible for that?
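
As a side note on the wiring above: XMLFilterBase already forwards every event to whatever handler is registered with setContentHandler(), so a filter that only changes attributes needs to override startElement alone. The following is a minimal sketch in Python 3 (the original post targets Python 2). The class name PrefixIdFilter, the helper rewrite_ids, the "id" attribute and the "new-" prefix are all hypothetical, chosen only for illustration.

```python
import io
from xml import sax
from xml.sax import saxutils
from xml.sax.xmlreader import AttributesImpl


class PrefixIdFilter(saxutils.XMLFilterBase):
    """Hypothetical filter: rewrites the 'id' attribute, forwards everything else."""

    def startElement(self, name, attrs):
        # Copy the attributes into a plain dict so they can be modified.
        updated = dict(attrs.items())
        if "id" in updated:
            updated["id"] = "new-" + updated["id"]
        # The base class forwards the event to the registered content handler.
        saxutils.XMLFilterBase.startElement(self, name, AttributesImpl(updated))


def rewrite_ids(xml_text):
    """Run xml_text through the filter and return the regenerated XML."""
    out = io.StringIO()
    xml_filter = PrefixIdFilter(sax.make_parser())
    # Downstream handler: all events not overridden above pass straight through.
    xml_filter.setContentHandler(saxutils.XMLGenerator(out, "utf-8"))
    xml_filter.parse(io.BytesIO(xml_text.encode("utf-8")))
    return out.getvalue()


output = rewrite_ids('<root><item id="7" name="x"/></root>')
```

This does not by itself address attribute order; it only shows the shorter way to chain a filter to a generator.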
 
infidel

> def startElement(self, name, attrs):
>     self.__downstream.startElement(name, attrs)
>     return
> I want to prevent it from shuffling attributes, i.e. preserve the original
> file's attribute order. Is there any ContentHandler.features*
> responsible for that?

I suspect not. attrs is a dictionary which does not maintain order,
and XML attributes are unordered to begin with. Is there any reason
other than aesthetics that you want the order preserved? It shouldn't
matter to any downstream consumer of the filtered XML.
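
To illustrate the point that attribute order carries no meaning for an XML consumer, here is a small sketch in Python 3 using xml.etree.ElementTree (not the SAX code from the question). The element and attribute names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Two serializations that differ only in attribute order.
first = ET.fromstring('<item id="1" name="x"/>')
second = ET.fromstring('<item name="x" id="1"/>')

# Attributes are exposed as a mapping, and mapping equality ignores order,
# so a consumer comparing the parsed elements sees no difference.
same = first.tag == second.tag and first.attrib == second.attrib
```

Here `same` evaluates to True, so only tools that compare the raw text (diffs, checksums, human readers) would notice the reordering.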
 
Brian Smith

>> I want to prevent it from shuffling attributes, i.e. preserve original
> I suspect not. attrs is a dictionary which does not maintain
> order, and XML attributes are unordered to begin with. Is
> there any reason other than aesthetics that you want the
> order preserved? It shouldn't matter to any downstream
> consumer of the filtered XML.

I had the same requirements. I also had the requirement to preserve
namespace prefixes. Luckily, for my application I was able to use the
XML_STRING property to handle my requirements. Otherwise, if you drop
down to using PyExpat (not SAX) then you can do what you want. If you
want to keep using SAX, then you need to use a non-default Python SAX
implementation or use one of the Java SAX parsers that have this option.
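
A minimal sketch of the PyExpat route, written for Python 3: setting the parser's ordered_attributes flag makes expat report attributes as a flat list [name1, value1, name2, value2, ...] in document order instead of a dictionary. The function name attribute_order and the sample element are hypothetical, and writing the output back out in that order is left to the caller.

```python
import xml.parsers.expat


def attribute_order(xml_text):
    """Return [(element_name, [(attr, value), ...]), ...] in document order."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()
    # Report attributes as a flat list in the order found in the document.
    parser.ordered_attributes = True

    def start_element(name, attrs):
        # attrs is [name1, value1, name2, value2, ...]; pair the entries up.
        pairs = list(zip(attrs[0::2], attrs[1::2]))
        seen.append((name, pairs))

    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)
    return seen


result = attribute_order('<item zeta="1" alpha="2" mid="3"/>')
# result == [('item', [('zeta', '1'), ('alpha', '2'), ('mid', '3')])]
```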


BTW, I have never been able to get XMLGenerator to work; it seems really
buggy regarding namespaces. I had to write my own version of it.

- Brian
 
