François Robert
Dear Newsgroup,
I am looking for a way to search and replace some strings inside various
XML documents while at the same time binary-preserving all the
whitespace of each document (in particular the line ending convention,
the white space both *inside* the markup and inside the content).
So far, this sounds more like a plain text search-and-replace, but the
twist is that the strings should only be replaced if they match a
certain XML context (say: replace attribute name "jarfile" in any
element <jar> with attribute name "destfile", or change the entire
content of element <value>, but only when <value> immediately
follows an element <key> with a content of "OutputFile", etc.)
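To make the first case concrete, here is a minimal Python sketch of the
behaviour I am after. It is only an illustration, not a solution: the
helper name rename_attribute is hypothetical, it works on raw text with
regular expressions rather than a real parser, and it assumes well-formed
start tags. It would also wrongly touch a <jar ...> that appears inside a
comment or a CDATA section.

```python
import re

# One attribute inside a start tag: leading whitespace, name,
# the '=' with its surrounding whitespace, and the quoted value.
ATTR_RE = re.compile(r'''(\s+)([^\s=/>]+)(\s*=\s*)("[^"]*"|'[^']*')''')


def rename_attribute(text, element, old, new):
    """Rename attribute `old` to `new` in start tags of `element`.

    Hypothetical helper for illustration: every character outside the
    attribute name itself is copied through unchanged.
    """
    # A start tag of `element`; quoted values are matched as units so
    # that a '>' inside an attribute value does not end the tag early.
    tag_re = re.compile(
        r'<' + re.escape(element)
        + r'''(?=[\s/>])(?:[^>"']|"[^"]*"|'[^']*')*>'''
    )

    def fix_attr(m):
        name = new if m.group(2) == old else m.group(2)
        return m.group(1) + name + m.group(3) + m.group(4)

    def fix_tag(m):
        return ATTR_RE.sub(fix_attr, m.group(0))

    return tag_re.sub(fix_tag, text)


if __name__ == "__main__":
    sample = '<jar   jarfile = "a.jar"\r\n\t\tbasedir=\'x\' />'
    print(repr(rename_attribute(sample, "jar", "jarfile", "destfile")))
```

To keep line endings intact, the file would have to be read and written
without newline translation (for instance open(path, newline="") in
Python). The second case (<value> following a particular <key>) needs
real context tracking, which is where this kind of approach stops scaling.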
I don't even know if my problem has a "canonical" name, which pretty
much precludes a meaningful search on Google...
I know XSLT can do some (all?) of that, but:
a) These substitutions need to occur on many different XML files and the
XML contexts / search strings may differ from file to file, so I will
need many different stylesheets. (which could be generated
automatically, I suppose)
b) What guarantee do I have on binary-preservation of all whitespace ?
(BTW this "weird" requirement arises from the need to keep the ability
to make plain textual diff of those XML documents which are stored
inside a source control system)
I have also looked at SAX parsers, thinking that maybe I could rely on
event notifications, but it seems that the events are not granular
enough for my situation (e.g., AFAICT, no notification will tell me that I
have encountered a block of contiguous whitespace inside an element tag,
or what that block is made of, for instance 3 SPC + LF + LF + TAB + TAB).
Also, the SAX parser does not seem to be able to tell me the exact
'slices' of input characters that it identified as element name,
attribute name, attribute value, whitespaces, entity reference, etc...
AFAICT, SAX will not tell me the difference between 'attr="!"' and
the same value written with a character reference, such as
'attr="&#33;"'?
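A small experiment along these lines, assuming Python's standard xml.sax
module (the class and function names are my own, purely for
illustration), seems to confirm the suspicion: two start tags that differ
in whitespace and in how the value is spelled produce identical events.

```python
import xml.sax


class AttrCollector(xml.sax.ContentHandler):
    """Record each start tag as (element name, attribute dict)."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        values = {key: attrs.getValue(key) for key in attrs.getNames()}
        self.seen.append((name, values))


def sax_view(document):
    """Return what a SAX handler gets to see for `document` (bytes)."""
    handler = AttrCollector()
    xml.sax.parseString(document, handler)
    return handler.seen


if __name__ == "__main__":
    literal = b'<jar attr="!"/>'
    escaped = b'<jar   attr = "&#33;"  />'
    # Both documents yield the same events: the spacing inside the tag
    # and the character reference are gone by the time the handler runs.
    print(sax_view(literal))
    print(sax_view(escaped))
```

So the handler only ever receives the decoded value, never the original
slice of input characters.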
Pointers, suggestions & comments appreciated.
Regards
_______________________________________________________
François Robert
(to mail me, reverse character order in reply address)