XML Oddity

Discussion in 'XML' started by Mark Johnson, Mar 30, 2005.

  1. Mark Johnson

    Mark Johnson Guest

    >>DELURK<<

    Over the last few weeks, we've been working on building an online
    portfolio using XML to pass content to an HTML page via PHP. In the
    process, we've run across a rather inexplicable error which we've been
    unable to find any reference to elsewhere. Hopefully, someone who
    reads this will know what's going on and be able to provide some
    assistance.

    Here is our XML:
    http://www.uky.edu/AuxServ/creativegraphics/clients/test/portfolio_xml.txt

    Here is our HTML and PHP:
    http://www.uky.edu/AuxServ/creativegraphics/clients/test/portfolio_php.txt

    And here is the page in action:
    http://www.uky.edu/AuxServ/creativegraphics/clients/test/portfolio.php

    The problem is this: When a user clicks the third link under the
    "Digital" heading, as you can see from the XML, the following text
    ought to be displayed:

    ==begin==
    Such has been the patient sufferance of these Colonies; and such is now
    the necessity which constrains them to alter their former Systems of
    Government. The history of the present King of Great Britain [George
    III] is a history of repeated injuries and usurpations, all having in
    direct object the establishment of an absolute Tyranny over these
    States. To prove this, let Facts be submitted to a candid world. He
    has refused his Assent to Laws, the most wholesome and necessary for
    the public good. He has forbidden his Governors to pass Laws of
    immediate and pressing importance, unless suspended in their operation
    till his Assent should be obtained; and when so suspended, he has
    utterly neglected to attend to them.
    ==end==

    However, rather than that text being displayed in its entirety, the
    following is all that displays:
    ==begin==
    sing importance, unless suspended in their operation till his Assent
    should be obtained; and when so suspended, he has utterly neglected to
    attend to them.
    ==end==

    Somehow, everything prior to that point has been eaten.

    This is what we know: this error occurs on Windows XP, Mac OS X, and
    Red Hat Linux. It occurs regardless of whether IE or a Gecko-based
    browser is used. It occurs regardless of what type of server the files
    are uploaded to. If all elements are edited to contain the exact same
    number of characters, the error seems to disappear, but doing so
    renders the code useless for our purposes. No other errors have been
    noted. Changing the code so that no elements are undisplayed has no
    effect. The question is this: what is causing this error, and how can
    it be avoided? Any assistance would be greatly appreciated.

    Mark Johnson
    Mark Johnson, Mar 30, 2005
    #1

  2. In reply to Mark Johnson:

    Caveat: I know nothing about the PHP XML parser. However, I suspect
    that the problem is a failure to separate the physical reading of input
    blocks from the logical parsing of the data they contain. My reason for
    saying this is that the truncated phrase you quote "sing importance,
    unless suspended ..." is at the start of the second 4096-byte block in
    the file.

    I would guess that the parser handed you the first part of this data
    content, you placed it in your array variable, and then it handed you
    the second part ... Little suspecting this, you promptly overwrote the
    variable with this second chunk. You can easily test this hypothesis by
    changing the block size and seeing if the position of the error changes.

    If this is the case, you'll have to be a bit smarter about processing
    character data. Or get a better parser ...
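
    A minimal sketch of the suspected failure, written in Python rather
    than PHP (an assumption made purely for illustration; it is not the
    code from portfolio_php.txt). Python's xml.parsers.expat is fed a
    hypothetical two-chunk input that splits the text mid-word, and the
    character data handler either overwrites or appends:

```python
import xml.parsers.expat

TEXT = "immediate and pressing importance"
# Hypothetical input, split mid-word the way a block boundary might split it.
CHUNKS = ["<item>immediate and pres", "sing importance</item>"]


def parse(chunks, append):
    """Feed the chunks to expat; collect text by appending or overwriting."""
    result = {"text": ""}

    def on_text(data):
        if append:
            result["text"] += data  # keep every piece the parser delivers
        else:
            result["text"] = data   # bug: only the last piece survives

    parser = xml.parsers.expat.ParserCreate()
    parser.CharacterDataHandler = on_text
    for i, chunk in enumerate(chunks):
        # The second argument marks the final chunk of the document.
        parser.Parse(chunk, i == len(chunks) - 1)
    return result["text"]


overwritten = parse(CHUNKS, append=False)
appended = parse(CHUNKS, append=True)
print(repr(overwritten))
print(repr(appended))
```

    If the hypothesis holds for the PHP code, the same change would apply
    there: append each piece to the current element's text, and reset
    that text only when a new element starts.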

    Richard Light

    --
    Richard Light
    SGML/XML and Museum Information Consultancy
    Richard Light, Mar 31, 2005
    #2

  3. Richard Light wrote:

    : Caveat: I know nothing about the PHP XML parser. However, I suspect
    : that the problem is a failure to separate the physical reading of input
    : blocks from the logical parsing of the data they contain. My reason for
    : saying this is that the truncated phrase you quote "sing importance,
    : unless suspended ..." is at the start of the second 4096-byte block in
    : the file.

    : I would guess that the parser handed you the first part of this data
    : content, you placed in your array variable, and then it handed you the
    : second part ... Little suspecting this, you promptly overwrote the
    : variable with this second chunk. You can easily test this hypothesis by
    : changing the block size and seeing if the position of the error changes.

    : If this is the case, you'll have to be a bit smarter about processing
    : character data. Or get a better parser ...
                      ^^^^^^^^^^^^^^^^^^^^^^

    Sounds like a likely scenario.

    However, that doesn't mean there's anything wrong with the parser. A SAX
    parser is not required to deliver all of a contiguous run of character
    data in a single call, and in fact a parser that did so could be
    considered a problem.

    Imagine an XML document with a gigabyte of contiguous character data.
    One of the points of a SAX parser is that it can feed that data to the
    handler in smaller, more memory-efficient chunks and not have to load
    the entire string into memory.
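
    To illustrate the memory argument, here is a rough Python sketch
    (again an assumption: Python's xml.sax stands in for whatever SAX-style
    parser is in use, and the LengthCounter class and document() generator
    are made up for the example). The handler measures a large run of
    character data while keeping only a running total, never the text:

```python
import xml.sax


class LengthCounter(xml.sax.ContentHandler):
    """Count characters of text without keeping the text in memory."""

    def __init__(self):
        super().__init__()
        self.total_chars = 0
        self.calls = 0

    def characters(self, content):
        # Each call may carry only part of a contiguous run of text.
        self.total_chars += len(content)
        self.calls += 1


def document(blocks):
    """Yield a hypothetical document one 4096-character block at a time."""
    yield "<doc>"
    for _ in range(blocks):
        yield "x" * 4096
    yield "</doc>"


def count_text(pieces):
    handler = LengthCounter()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    for piece in pieces:
        parser.feed(piece)  # incremental parsing, one block at a time
    parser.close()
    return handler


handler = count_text(document(100))
print(handler.total_chars, handler.calls)
```

    The number of characters() calls is up to the parser; only the total
    is something the handler can rely on.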




    --

    This space not for rent.
    Malcolm Dew-Jones, Mar 31, 2005
    #3
  4. Malcolm Dew-Jones writes:

    >however that doesn't mean there's anything wrong with the parser. a SAX
    >parser has no requirement to feed all of some contiguous character data in
    >a single call, and in fact a parser that did so could be considered a
    >problem.
    >
    >Imagine if I had an xml document that had a giga byte of contiguous
    >character data. One of the points of the SAX parser is that it can feed
    >that data to the handler in smaller, more memory efficient chunks, and not
    >have to load the entire string in to memory.


    I would agree with that principle entirely. However, from a software
    engineering point of view, I would expect as the user of such a parser
    to be able to control the "text chunk" size, and not have character data
    cut into arbitrary chunks based on where the block boundaries in the
    input stream happen to fall.
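
    As one example of the kind of control described here, Python's
    binding to expat exposes buffer_text and buffer_size attributes on
    the parser object. Whether the PHP parser offers anything equivalent
    is not established in this thread, so treat the following only as a
    sketch of the idea, using a made-up three-line document:

```python
import xml.parsers.expat

DOC = "<item>line one\nline two\nline three</item>"


def text_callbacks(buffer_text):
    """Parse DOC in one call and return each piece the handler received."""
    pieces = []
    parser = xml.parsers.expat.ParserCreate()
    # When true, the binding coalesces character data before calling back.
    parser.buffer_text = buffer_text
    parser.CharacterDataHandler = pieces.append
    parser.Parse(DOC, True)
    return pieces


unbuffered = text_callbacks(False)
buffered = text_callbacks(True)
print(len(unbuffered), len(buffered))
```

    Even with buffering switched on, text longer than the buffer would
    still arrive in more than one piece, so a handler that appends
    remains the safe default.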

    Richard
    --
    Richard Light
    SGML/XML and Museum Information Consultancy
    Richard Light, Mar 31, 2005
    #4