truncating specific lines from xml

Discussion in 'XML' started by foolproofplan@gmail.com, Jan 30, 2007.

  1. Guest

    I have a somewhat simple task I need to do, but since I am new at xml,
    I need help:

    Right now, I have xml files that are output from tests I do with an
    automated testing program. I want to compare these files back to the
    originals I have, but there is one little complication: the xml files
    have lines of code added in them with unique ids which are included in
    the xml file when it is run. These unique ids are currently throwing
    off the xml tester. How can I go about getting rid of these lines of
    unique ids so that the files compared are the same again?

    Thanks in advance!
    , Jan 30, 2007
    #1

  2. Guest

    On Jan 30, 5:01 pm, wrote:
    > Right now, I have xml files that are output from tests I
    > do with an automated testing program. I want to compare
    > these files back to the originals I have, but there is
    > one little complication: the xml files have lines of code
    > added in them with unique ids which are included in the
    > xml file when it is run. These unique ids are currently
    > throwing off the xml tester. How can I go about getting
    > rid of these lines of unique ids so that the files
    > compared are the same again?


    Your question is pretty much impossible to answer as it is.
    You should've provided some (possibly simplified) examples
    to get your meaning across to group readers. For one thing,
    speaking of 'lines' in XML is quite meaningless.

    It sounds as if XSLT would fit the bill, but that would
    depend on some factors. If you need to remove some easily
    distinguishable nodes, there probably isn't a better
    solution than XSLT identity with exclusions. But in case
    the stuff you need removed is buried within the text nodes,
    XSLT suddenly becomes a much less attractive proposition--
    it's just not that good at juggling strings, it was never
    meant for that.
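
    For the second case, where the ids sit inside text nodes, a general-purpose language is one alternative to XSLT. The following is only a minimal sketch in Python (the language the comparison script is reported to use). The id shape in `ID_PATTERN`, the function name `scrub_text_ids`, and the `ID` placeholder are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Assumed id shape: colon-delimited digit groups such as ":123:456:".
ID_PATTERN = re.compile(r":\d+:\d+:")


def scrub_text_ids(xml_text, placeholder="ID"):
    """Replace id-like substrings inside text nodes with a fixed placeholder."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # elem.text is the text before the first child, elem.tail the text
        # that follows the element's closing tag.
        if elem.text:
            elem.text = ID_PATTERN.sub(placeholder, elem.text)
        if elem.tail:
            elem.tail = ID_PATTERN.sub(placeholder, elem.tail)
    return ET.tostring(root, encoding="unicode")
```

    Attribute values are left alone here; the sketch only covers ids embedded in element text.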

    --
    Pavel Lepin
    , Jan 30, 2007
    #2

  3. Andy Dingley Guest

    On 30 Jan, 15:01, wrote:
    > How can I go about getting rid of these lines of
    > unique ids so that the files compared are the same again?


    You need to suppress these ids (and datestamps / usernames etc.) and
    also to canonicalise the XML serialisation. Ideally we wouldn't need
    to do the second, we'd just use an XML-aware comparison tool.
    However you're probably using some old unix command-line textfile
    comparator that doesn't understand XML whitespace equivalence.
    Serialise it first to something with each tag unindented on its own
    line, and a repeatable text format output for comparable XML input.
    XSLT can do this.

    Run them through XSLT, using the "identity copy" template (search for
    it) modified to recognise the ids and to output nothing for them.
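
    The same two steps (suppress the ids, then serialise repeatably with one tag per line) can also be sketched without XSLT. This is a swapped-in technique, not the XSLT route described above: it assumes Python 3.8 or later, where `xml.etree.ElementTree.canonicalize` is available, and the function name `normalise` and the default `drop_attrs` value are hypothetical.

```python
import xml.etree.ElementTree as ET


def normalise(xml_text, drop_attrs=("id",)):
    """Return a repeatable text form with one tag per line and ids removed."""
    canonical = ET.canonicalize(
        xml_text,
        strip_text=True,                # drop indentation-only whitespace
        exclude_attrs=set(drop_attrs),  # leave the volatile attributes out
    )
    # Canonical XML escapes "<" and ">" in text and "<" in attribute values,
    # so the sequence "><" can only occur between two adjacent tags.
    return canonical.replace("><", ">\n<")
```

    Canonicalisation also sorts attributes, so two files that differ only in attribute order, indentation, or the excluded attributes produce identical text for a plain line-based diff.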
    Andy Dingley, Jan 30, 2007
    #3
  4. Guest

    The tester is using a Python script (which I did not create) to
    compare the xml files. Is there a way we can work with this?

    On Jan 30, 11:21 am, "Andy Dingley" <> wrote:
    > On 30 Jan, 15:01, wrote:
    >
    > > How can I go about getting rid of these lines of
    > > unique ids so that the files compared are the same again?
    >
    > You need to suppress these ids (and datestamps / usernames etc.) and
    > also to canonicalise the XML serialisation. Ideally we wouldn't need
    > to do the second, we'd just use an XML-aware comparison tool.
    > However you're probably using some old unix command-line textfile
    > comparator that doesn't understand XML whitespace equivalence.
    > Serialise it first to something with each tag unindented on its own
    > line, and a repeatable text format output for comparable XML input.
    > XSLT can do this.
    >
    > Run them through XSLT, using the "identity copy" template (search for
    > it) modified to recognise the ids and to output nothing for them
    , Jan 30, 2007
    #4
  5. Guest

    here is an example of two xml files that are exactly the same, except
    for the fact that they have different ids:

    XML file one:

    <?xml version="1.0" encoding="UTF-8"?>

    <EnCapta>
    <Document type="Part" id=":1156453195:1262379012:" name="New Document" >
    <FileName>\New Document</FileName>
    <Unit/>
    <ApplicationData id=":1156453207:1327785362:" name="CAD_Note" >
    <ApplicationReference id_ref=":91005593:790373312:" >
    <Name>CAD_Note</Name>
    <MajorVersion>0</MajorVersion>
    <MinorVersion>0</MinorVersion>
    </ApplicationReference>
    <Note template_id=":96227828:304003723:" id=":1156453207:1116306377:" name="Note1" >
    <Name type="FixedString" >Note1</Name>
    <Author type="FixedString" >SHO</Author>
    <CreationDate type="DateTime" >2006-08-24T17:00:07</CreationDate>
    <ModificationDate type="DateTime" >2006-08-24T17:00:07</ModificationDate>
    <RelatingTo type="FixedString" >Engineering</RelatingTo>
    <Description type="String" >1234</Description>
    </Note>
    </ApplicationData>
    </Document>
    </EnCapta>

    XML file two:

    <?xml version="1.0" encoding="UTF-8"?>

    <EnCapta>
    <Document type="Part" id=":1170176183:1209286222:" name="New Document" >
    <FileName>\New Document</FileName>
    <Unit/>
    <ApplicationData id=":1170176190:357510851:" name="CAD_Note" >
    <ApplicationReference id_ref=":91005593:790373312:" >
    <Name>CAD_Note</Name>
    <MajorVersion>0</MajorVersion>
    <MinorVersion>0</MinorVersion>
    </ApplicationReference>
    <Note template_id=":96227828:304003723:" id=":1170176190:655829958:" name="Note1" >
    <Name type="FixedString" >Note1</Name>
    <Author type="FixedString" >SHO</Author>
    <CreationDate type="DateTime" >2000-01-01T12:00:01</CreationDate>
    <ModificationDate type="DateTime" >2000-01-01T12:00:01</ModificationDate>
    <RelatingTo type="FixedString" >Engineering</RelatingTo>
    <Description type="String" >1234</Description>
    </Note>
    </ApplicationData>
    </Document>
    </EnCapta>

    On Jan 30, 10:29 am, wrote:
    > On Jan 30, 5:01 pm, wrote:
    >
    > > Right now, I have xml files that are output from tests I
    > > do with an automated testing program. I want to compare
    > > these files back to the originals I have, but there is
    > > one little complication: the xml files have lines of code
    > > added in them with unique ids which are included in the
    > > xml file when it is run. These unique ids are currently
    > > throwing off the xml tester. How can I go about getting
    > > rid of these lines of unique ids so that the files
    > > compared are the same again?
    >
    > Your question is pretty much impossible to answer as it is.
    > You should've provided some (possibly simplified) examples
    > to get your meaning across to group readers. For one thing,
    > speaking of 'lines' in XML is quite meaningless.
    >
    > It sounds as if XSLT would fit the bill, but that would
    > depend on some factors. If you need to remove some easily
    > distinguishable nodes, there probably isn't a better
    > solution than XSLT identity with exclusions. But in case
    > the stuff you need removed is buried within the text nodes,
    > XSLT suddenly becomes a much less attractive proposition--
    > it's just not that good at juggling strings, it was never
    > meant for that.
    >
    > --
    > Pavel Lepin
    , Jan 30, 2007
    #5
  6. Andy Dingley Guest

    On 30 Jan, 16:58, wrote:
    > The tester is using a python script (which i did not create) to
    > compare the xml files. Is there the way we can work with this?


    Use XSLT first, as I described.

    Or re-write the Python comparator so that it ignores the ids, in
    the same way it presumably already ignores insignificant XML whitespace.
    Andy Dingley, Jan 30, 2007
    #6
  7. Guest

    Please don't top-post. Top-posting fixed.

    On Jan 30, 7:03 pm, wrote:
    > On Jan 30, 10:29 am, wrote:
    > > On Jan 30, 5:01 pm, wrote:
    > > > Right now, I have xml files that are output from
    > > > tests I do with an automated testing program. I want
    > > > to compare these files back to the originals I have
    > >
    > > Your question is pretty much impossible to answer as it
    > > is. You should've provided some (possibly simplified)
    > > examples to get your meaning across to group readers.
    > > For one thing, speaking of 'lines' in XML is quite
    > > meaningless.
    > >
    > > It sounds as if XSLT would fit the bill
    >
    > here is an example of two xml files that are exactly the
    > same, except for the fact that they have different ids:


    [snip]

    It seems it wouldn't be possible without transforming both
    files (unless you're willing to write a tool for comparing
    them in XSLT). The following transformation strips the id
    attributes from all elements:

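    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
      <!-- Identity template: copy every attribute and node unchanged. -->
      <xsl:template match="@*|node()">
        <xsl:copy>
          <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
      </xsl:template>
      <!-- Exclusion: emit nothing for any attribute named id. -->
      <xsl:template match="@id"/>
    </xsl:stylesheet>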

    Testing results:

    pavel@debian:~/dev/xslt$ saxon -novw test1.xml strip_id.xsl >test1_prc.xml

    pavel@debian:~/dev/xslt$ saxon -novw test2.xml strip_id.xsl >test2_prc.xml

    pavel@debian:~/dev/xslt$ diff test1_prc.xml test2_prc.xml
    14,15c14,15
    < <CreationDate type="DateTime">2006-08-24T17:00:07</CreationDate>
    < <ModificationDate type="DateTime">2006-08-24T17:00:07</ModificationDate>
    ---
    > <CreationDate type="DateTime">2000-01-01T12:00:01</CreationDate>
    > <ModificationDate type="DateTime">2000-01-01T12:00:01</ModificationDate>


    Uh oh. It seems there are a couple more differences in
    those files. Anyway, if you know precisely what you need
    stripped, the transformation given above should serve as a
    good starting point.
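
    Once the list of things to strip is known, the same preprocessing can be sketched in Python for use alongside a Python comparison script. This is an illustration, not a port of the stylesheet: the function name `strip_volatile` is hypothetical, and treating CreationDate and ModificationDate as ignorable is an assumption based on the diff output above, not something the original poster confirmed.

```python
import xml.etree.ElementTree as ET


def strip_volatile(root, attrs=("id",),
                   tags=("CreationDate", "ModificationDate")):
    """Remove the named attributes and blank the text of the named elements.

    The tree is modified in place and returned for convenience.
    """
    for elem in root.iter():
        for name in attrs:
            # pop() with a default does nothing when the attribute is absent.
            elem.attrib.pop(name, None)
        if elem.tag in tags:
            # Keep the element so the structure is still compared,
            # but discard its run-specific value.
            elem.text = None
    return root
```

    Applying it to both parsed documents and then serialising them with `ET.tostring` gives two outputs that differ only where something outside `attrs` and `tags` really changed.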

    --
    Pavel Lepin
    , Jan 31, 2007
    #7
  8. Joe Kesselman Guest

    wrote:
    > It seems it wouldn't be possible without transforming both
    > files (unless you're willing to write a tool for comparing
    > them in XSLT).


    Or in another programming language, e.g. by using a SAX or DOM parser and
    writing a parallel tree-walker that understands which differences are
    meaningful and which aren't.

    Note that a text diff is often not the right tool anyway, because there
    are things which XML itself doesn't consider meaningful -- order of
    attributes, whitespace in some places, that sort of thing. So if you're
    doing a serious test suite, you usually wind up having to write some
    special-purpose code anyway, or find something you can swipe for the
    purpose.

    For example: You might want to look at the compare code used in the
    Xalan processor's regression test suite, and either adapt that to also
    ignore the things you don't consider meaningful or (as Pavel suggested)
    preprocess those away before comparing. Another approach I've seen
    (which again would require preprocessing) involved canonicalizing the
    two documents (which theoretically suppresses most or all of the
    insignificant differences) and then doing a text diff against the results.
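
    A parallel tree-walker of the kind described in the first paragraph might look roughly like this in Python. It is a sketch under stated assumptions, not the Xalan comparison code: it uses `xml.etree.ElementTree` in place of a SAX or DOM parser, the name `diff_trees` and the default `ignore_attrs` are hypothetical, and it treats leading and trailing whitespace in element text as not meaningful.

```python
import xml.etree.ElementTree as ET


def diff_trees(a, b, ignore_attrs=frozenset({"id"}), path=""):
    """Walk two element trees in parallel and yield the meaningful differences."""
    here = f"{path}/{a.tag}"
    if a.tag != b.tag:
        yield f"{here}: tag {a.tag!r} != {b.tag!r}"
        return
    # Attributes are compared by name, so their order never matters.
    for name in sorted((set(a.attrib) | set(b.attrib)) - ignore_attrs):
        if a.get(name) != b.get(name):
            yield f"{here}/@{name}: {a.get(name)!r} != {b.get(name)!r}"
    # Surrounding whitespace in element text is treated as insignificant.
    if (a.text or "").strip() != (b.text or "").strip():
        yield f"{here}: text {a.text!r} != {b.text!r}"
    if len(a) != len(b):
        yield f"{here}: {len(a)} children != {len(b)} children"
    # Children are paired by position; extra children are only counted above.
    for child_a, child_b in zip(a, b):
        yield from diff_trees(child_a, child_b, ignore_attrs, here)
```

    Calling `list(diff_trees(root_one, root_two))` on two parsed documents returns an empty list when they differ only in the ignored attributes, attribute order, or indentation. Text that follows a child element (the `tail`) is not compared in this sketch.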


    --
    () ASCII Ribbon Campaign | Joe Kesselman
    /\ Stamp out HTML e-mail! | System architexture and kinetic poetry
    Joe Kesselman, Jan 31, 2007
    #8