XML Not good for Big Files (vs Flat Files)

Discussion in 'Java' started by Homer, Apr 4, 2006.

  1. Homer

    Homer Guest

    I am a little bit tired of this obsession people have with XML and XML
    technology. Please share your thoughts and let me know if I am thinking
    about this the wrong way. I believe some people are overusing XML all
    over the place. Nowadays the Canadian Government is pushing XML on its
    organizations as the standard for data/file transfer. Huge files moving
    between companies now include tons of XML tags repeating all over the
    file, slowing down networks and crashing applications because of their size.
    I am not objecting to the whole technology. I know the advantages of XML
    and use it all the time for config files and our web-oriented
    applications, but using it as the standard for moving big files is going
    too far. Here is an example:

    John,Smith,5555555,37 Finch Ave.

    Is now:

    <FirstName>John</FirstName>
    <LastName>Smith</LastName>
    <PhoneNum>5555555</PhoneNum>
    <Address>37 Finch Ave.</Address>

    And Tags are repeating and repeating:

    <FirstName>....</FirstName>
    <LastName>....</LastName>
    <PhoneNum>....</PhoneNum>
    <Address>....</Address>

    <FirstName>....</FirstName>
    <LastName>....</LastName>
    <PhoneNum>....</PhoneNum>
    <Address>....</Address>


    Please let me know what you think.


    Regards,

    Homer
     
    Homer, Apr 4, 2006
    #1

  2. Homer

    James McGill Guest

    On Tue, 2006-04-04 at 08:27 -0700, Homer wrote:
    >
    > And Tags are repeating and repeating:


    XML markup does tend to bloat the data.

    I personally believe you should use serializable objects that can be
    represented according to an XML schema when that's appropriate, but that
    also can be serialized into a tightly packed format when that is
    appropriate as well. So I should be able to marshal/unmarshal the
    serialized object to and from XML, but I should also be able to stream
    that object without marshalling it -- and the other end should be able
    to unmarshal to xml, validate according to the schema, etc.

    Likewise, database bindings should be informed by the xml schema, but
    the XML markup shouldn't be what you store in the db.
     
    James McGill, Apr 4, 2006
    #2

  3. Homer

    mtp Guest

    Homer wrote:
    > I am a little bit tired of this obsession people have with XML and XML
    > technology. Please share your thoughts and let me know if I am thinking
    > about this the wrong way. I believe some people are overusing XML all
    > over the place. Nowadays the Canadian Government is pushing XML on its
    > organizations as the standard for data/file transfer. Huge files moving
    > between companies now include tons of XML tags repeating all over the
    > file, slowing down networks and crashing applications because of their size.


    You can use indexing, binary XML, or compression.

    > Please let me know what you think.


    Maybe one of the computing service providers wanted more money for its
    services with this big project?

    Maybe everybody thinks "newer is better"?
     
    mtp, Apr 4, 2006
    #3
  4. Homer

    Guest

    Homer wrote:
    > Please let me know what you think.


    Yes, that does seem like a network killer. It depends on the intended
    use of the file at the other end and on the client receiving it. If
    they *have to* use XML, certain optimizations can be done for just the
    transfer part...

    <header>
    <firstName>A15</firstName>
    <lastName>A15</lastName>
    <phone>A10</phone>
    <address>A10</address>
    </header>
    <data>
    <![CDATA[
    fixed width data goes here
    ]]>
    </data>

    OR

    <header>
    <fieldSeparator>;</fieldSeparator>
    <field>firstName</field>
    <field>lastName</field>
    <field>phone</field>
    <field>address</field>
    </header>
    <data>
    <![CDATA[
    delimited data goes here
    ]]>
    </data>

    OR a combination of the above.

    In short, XML should be preferred only if documentation and
    discoverability are more important than performance.
     
    , Apr 4, 2006
    #4
  5. Homer

    RC Guest

    Homer wrote:


    > Please let me know what you think.


    XML was never designed to replace a database server.

    You can use an XML file to transfer a portion of the data
    from a database,
    e.g.

    SELECT lastname,firstname,phonenumber,address
    FROM phonebook
    WHERE state = 'NY' AND city = 'somewhere';

    With a flat file like this

    William|John|12345678|84 5th Ave

    I don't know which column is the last name and which is the first name.
    Is the 3rd column a person ID or a phone number?

    You need to let the programmers know which column is which.

    Next time, if someone changes the flat file format to

    85 5th Ave|John|William|12345678

    then your database will be incorrect after the update.
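
    To make the column-order problem concrete, here is a minimal sketch in
    which the field names travel with the data (as a header list, like the
    delimited-with-header variant suggested earlier in the thread), so a
    reordered file still maps to the same record. The class name
    NamedFields, the parse method and the field names are hypothetical,
    chosen only for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper: labels each value of a delimited line with a declared field name.
class NamedFields {

    /** Splits one delimited line and pairs each value with its field name. */
    static Map<String, String> parse(List<String> fieldNames, String separator, String line) {
        // Pattern.quote stops "|" being read as regex alternation; -1 keeps empty trailing fields.
        String[] values = line.split(Pattern.quote(separator), -1);
        if (values.length != fieldNames.size()) {
            throw new IllegalArgumentException(
                    "expected " + fieldNames.size() + " fields but found " + values.length);
        }
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < values.length; i++) {
            record.put(fieldNames.get(i), values[i]);
        }
        return record;
    }

    public static void main(String[] args) {
        // Original column order.
        Map<String, String> before = parse(
                Arrays.asList("lastName", "firstName", "phone", "address"),
                "|", "William|John|12345678|84 5th Ave");
        // Reordered columns, with the header reordered to match.
        Map<String, String> after = parse(
                Arrays.asList("address", "firstName", "lastName", "phone"),
                "|", "84 5th Ave|John|William|12345678");
        System.out.println(before.get("lastName") + " / " + after.get("lastName"));
        System.out.println(before.equals(after));
    }
}
```

    Both lookups return William, because the consumer asks for a field by
    name instead of by position. A purely positional reader would have
    stored the street address as the last name after the reorder.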


    True, XML creates large files.
    But it makes our lives easier.

    You can make up your own tags
    <lastName> or <Last_Name>, etc.
    the tags can be in English, Spanish, French, Russian, Japanese, etc.
     
    RC, Apr 4, 2006
    #5
  6. Homer

    James McGill Guest

    On Tue, 2006-04-04 at 09:06 -0700, wrote:
    >
    > OR a combination of the above.


    You're almost touching on the big problem: Misconception of what it
    means to be "standard".

    XML has (several) standardized markup frameworks, but it is silent as to
    content or utilization. It is ridiculous for a government entity to
    demand that "XML" be "the standard" for data interchange. They need to
    bless certain schemas if that's their goal, but it also needs to be
    abstract enough that systems can be designed efficiently.

    In your examples, the designers can claim that they are using "XML", and
    therefore "are standardized" on it, but the three examples we've seen so
    far are not at all interchangeable...
     
    James McGill, Apr 4, 2006
    #6
  7. Homer

    Timbo Guest

    Homer wrote:
    > John,Smith,5555555,37 Finch Ave.
    >
    > Is now:
    >
    > <FirstName>John</FirstName>
    > <LastName>Smith</LastName>
    > <PhoneNum>5555555</PhoneNum>
    > <Address>37 Finch Ave.</Address>
    >

    It's true that the XML data in your example is bulky, but what it
    has that the unstructured version doesn't is meta-level information,
    such as "John" being someone's first name. If the parties involved
    (i.e. the sender and receiver of this information) have an
    agreement as to the meaning of "FirstName", then they are sharing
    more than just text... it has some implicit meaning. If you send
    it unstructured, then the receiver has to know how to parse the
    data into this agreed meaning, which means it needs to know the
    format of the data.

    Then, on the other hand, if the data is just stored in a database
    or something with no definition of what the tags mean, then I
    agree with you... using XML is of little use.
     
    Timbo, Apr 4, 2006
    #7
  8. Homer

    Oliver Wong Guest

    "Homer" <> wrote in message
    news:...
    > Please let me know what you think.


    If your complaint is file size during network transfer, compress the
    file before sending it.

    If your complaint is file size during parsing, use SAX instead of DOM,
    and don't keep the whole file in memory at once.
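
    A minimal sketch of the SAX approach, using the JDK's built-in
    javax.xml.parsers and org.xml.sax classes. The <People>/<Person>
    wrapper elements and the names SaxPhoneNumbers and PhoneHandler are
    assumptions for illustration; the handler keeps only the <PhoneNum>
    values, so the rest of the document is never held in memory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxPhoneNumbers {

    /** Collects the text of every <PhoneNum> element and ignores everything else. */
    static class PhoneHandler extends DefaultHandler {
        final List<String> phones = new ArrayList<>();
        private final StringBuilder text = new StringBuilder();
        private boolean inPhone = false;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            if ("PhoneNum".equals(qName)) {
                inPhone = true;
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may deliver one text node in several chunks.
            if (inPhone) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("PhoneNum".equals(qName)) {
                phones.add(text.toString());
                inPhone = false;
            }
        }
    }

    /** Streams through the XML once and returns the phone numbers it saw. */
    static List<String> phoneNumbers(String xml) {
        try {
            PhoneHandler handler = new PhoneHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.phones;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<People>"
                + "<Person><FirstName>John</FirstName><LastName>Smith</LastName>"
                + "<PhoneNum>5555555</PhoneNum><Address>37 Finch Ave.</Address></Person>"
                + "</People>";
        System.out.println(phoneNumbers(xml));
    }
}
```

    For a genuinely large file you would hand the parser an InputStream
    over the file instead of a String, and process each record in
    endElement rather than collecting results in a list.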

    Use the right tool for the job. If for whatever problem you're trying to
    solve, you've got a better tool than XML, then use it. But if the problem is
    "The government requires me to use XML", then I can't think of a better tool
    than XML to solve that particular problem (except maybe emigration ;)).

    - Oliver
     
    Oliver Wong, Apr 4, 2006
    #8
  9. Homer

    James McGill Guest

    On Tue, 2006-04-04 at 16:44 +0000, Oliver Wong wrote:

    > except maybe emigration


    You say that as though anyone would ever leave the utopian paradise that
    is Canada...
     
    James McGill, Apr 4, 2006
    #9
  10. "Homer" <> writes:

    > I am a little bit tired of this obsession people have with XML and XML
    > technology.


    Hear hear!
    Seems some people think XML is the solution to all problems.
    I'd rather classify it as the lowest common denominator for exchanging
    tree-structured data - and definitely not something fit for humans to
    read or write directly.

    > John,Smith,5555555,37 Finch Ave.
    >
    > Is now:
    >
    > <FirstName>John</FirstName>
    > <LastName>Smith</LastName>
    > <PhoneNum>5555555</PhoneNum>
    > <Address>37 Finch Ave.</Address>
    >
    > And Tags are repeating and repeating:


    > Please let me know what you think.


    Apart from what everybody else has said, zipping such a file
    should yield a *very* high compression factor.

    /L
    --
    Lasse Reichstein Nielsen -
    DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
    'Faith without judgement merely degrades the spirit divine.'
     
    Lasse Reichstein Nielsen, Apr 4, 2006
    #10
  11. Homer

    Joe Attardi Guest

    > John,Smith,5555555,37 Finch Ave.
    >
    > Is now:
    >
    > <FirstName>John</FirstName>
    > <LastName>Smith</LastName>
    > <PhoneNum>5555555</PhoneNum>
    > <Address>37 Finch Ave.</Address>


    Yes, but now we know what all the data means. Your example is quite
    clear, but what about this one:

    Lawrence,David,Maynard,MA

    Could mean several things:
    (1) Lawrence David lives in Maynard, MA.
    (2) David Lawrence lives in Maynard, MA
    (3) David Maynard lives in Lawrence, MA
    (4) Maynard David lives in Lawrence, MA
    etc. You see where I'm going with this.

    Whereas
    <FirstName>Lawrence</FirstName>
    <LastName>David</LastName>
    <City>Maynard</City>
    <State>MA</State>

    leaves no question.

    Yes, we as humans know intuitively that city and state go together. But
    for an application using this data, there has to be some specification
    defined and all systems that use it must be aware of it.
     
    Joe Attardi, Apr 4, 2006
    #11
  12. Homer

    Homer Guest

    I guess these responses are proving my point. You all know that the
    best solution for transferring huge files between two parties is a simple
    flat file whose format both sender and receiver have agreed upon, sent
    over a secure line. But you still defend adding tons of tags to a file
    whose format both sender and receiver already know. I believe lots
    of people are using XML because it's cool and new. And these people
    give advice to companies and organizations.

    Some points about your suggestions:

    1- Marshalling/Object Stream: Too advanced for places like government.
    2- Mixed XML/raw data: Then what is the point of having XML at the
    top? Unless you are sending the file to an unknown place that doesn't
    know what it is getting.
    3- Compression: There is no good standard for compression (Unix is not
    really ZIP friendly unless you add some open source or buy a Zip product)
    and the mainframe is another story. Even for Windows you need to buy the
    product (or use open source, which most companies don't like). Also, why
    triple the file size and then compress it?


    Let me give you another example of coolness (sorry, it's a bit off
    the topic but it's about coolness):

    I got a job in telecommunication company (cell phone) to convert their
    code from C to C++ because OO was so cool those days but application
    was working with no problem.
    I did my job, converted the code/building class library for one year,
    and left the company.

    One year later they hired bunch of other people to come and convert the
    whole thing to Java because Java was the Best.

    3 years later they hired me again to convert everything again to J2EE
    because J2EE is (guess what) the Best.


    Regards,

    Homer
     
    Homer, Apr 4, 2006
    #12
  13. Homer

    James McGill Guest

    On Tue, 2006-04-04 at 11:08 -0700, Homer wrote:
    > I believe lots
    > of people are using XML because it's cool and new.


    It's anything but "cool". And as for it being "new", XML isn't old
    enough to vote, but SGML is. If you aren't seeing the benefits of
    logical structure and validation, standardized processing, etc.,
    that may be because you aren't exploiting those things in your
    application.

    One of your complaints is directly counter to an explicit design goal,
    from the beginning of the XML spec: "Terseness in XML markup is of
    minimal importance."

    XML markup is deliberately intended to favor clarity over conciseness.

    But most of your complaint seems to derive from the fact that you work
    in a bureaucratic government situation, where you have no authority to
    make decisions, and where there is a limited backchannel for your
    recommendations. That is unfortunate, but isn't it a choice you made
    when you went to work for a government?

    I've always been led to believe that the Canadian government is a
    prototype of efficiency and reason, one that should make Americans feel
    ashamed. Are you suggesting that it too may be clogged with
    bureaucratic nonsense? I would be shocked to hear that!
     
    James McGill, Apr 4, 2006
    #13
  14. Homer

    Homer Guest

    Very good guess, but no, I don't work for the government. All I am saying
    is that in these cases the sender and receiver both know the file format
    by heart. They know and their applications know. That's how they were
    moving files in the past, and if they want to establish a new file transfer
    they will let each other know about the upcoming file format for sure.
    There is no reason to send the file format along with each file every
    time they have a file transfer (unless you wear a name tag at
    home so your family knows your name).
     
    Homer, Apr 4, 2006
    #14
  15. Homer

    James McGill Guest

    On Tue, 2006-04-04 at 12:06 -0700, Homer wrote:
    > All I am saying
    > is that in these cases the sender and receiver both know the file format
    > by heart. They know and their applications know.


    The interesting thing with XML is that in its case, the *document*
    knows. In a well designed system, the DTD can change and applications
    can cope.

    >There is no reason to send the file format along with each file every
    >time they have a file transfer


    But you aren't sending the file format. You're sending a notice with a
    URI that locates the format (schema, DTD, etc.), and then sending data
    that's marked up according to that format.
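
    A minimal sketch of what the receiver can do with such a format
    description, using the JDK's javax.xml.validation package. To stay
    self-contained it assumes a small inline XSD rather than one fetched
    from a URI; the schema, the <Person> wrapper element and the names
    SchemaCheck and isValid are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class SchemaCheck {

    // Hypothetical schema: a Person with four string fields in a fixed order.
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='Person'><xs:complexType><xs:sequence>"
            + "<xs:element name='FirstName' type='xs:string'/>"
            + "<xs:element name='LastName' type='xs:string'/>"
            + "<xs:element name='PhoneNum' type='xs:string'/>"
            + "<xs:element name='Address' type='xs:string'/>"
            + "</xs:sequence></xs:complexType></xs:element>"
            + "</xs:schema>";

    /** Returns true if the document conforms to the schema above, false otherwise. */
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            // With no custom error handler, validate() throws on the first violation.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<Person><FirstName>John</FirstName><LastName>Smith</LastName>"
                + "<PhoneNum>5555555</PhoneNum><Address>37 Finch Ave.</Address></Person>";
        String missingLastName = "<Person><FirstName>John</FirstName>"
                + "<PhoneNum>5555555</PhoneNum><Address>37 Finch Ave.</Address></Person>";
        System.out.println(isValid(good));
        System.out.println(isValid(missingLastName));
    }
}
```

    The point of the sketch is that the check is generic: the receiving
    code contains no knowledge of the record layout beyond the schema
    itself, so a record with a missing or misplaced field is rejected
    before it reaches the application.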

    >(unless you wear a name tag at
    >home so your family knows your name).


    Or like wearing a badge at a workplace, perhaps?
     
    James McGill, Apr 4, 2006
    #15
  16. Homer

    Daniel Dyer Guest

    On Tue, 04 Apr 2006 22:15:58 +0200, Roedy Green
    <> wrote:

    > On 4 Apr 2006 08:27:51 -0700, "Homer" <> wrote,
    > quoted or indirectly quoted someone who said :
    >
    >> <FirstName>....</FirstName>
    >> <LastName>....</LastName>
    >> <PhoneNum>....</PhoneNum>
    >> <Address>....</Address>
    >>
    >>
    >> Please let me know what you think.

    >
    > see http://mindprod.com/jgloss/xml.html
    >
    > Pay particular attention to the images and the XML "logo".
    >
    > XML needs a binary format both for compactness and automatic format
    > compliance.


    http://www.w3.org/XML/Binary/
    http://asn1.elibel.tm.fr/xml/

    Dan.


    --
    Daniel Dyer
    http://www.dandyer.co.uk
     
    Daniel Dyer, Apr 4, 2006
    #16
  17. Homer wrote:
    > I guess these responses are proving my point. You all know that the
    > best solution for transferring huge files between two parties is a simple
    > flat file whose format both sender and receiver have agreed upon, sent
    > over a secure line. But you still defend adding tons of tags to a file
    > whose format both sender and receiver already know. I believe lots
    > of people are using XML because it's cool and new. And these people
    > give advice to companies and organizations.
    >

    Here's another thought: use ASN.1 encoding. Have a look here
    <http://asn1.elibel.tm.fr/> if you haven't heard of it.

    It does virtually everything XML does in terms of tagged fields and the
    ability to completely omit optional fields and structures, but it uses
    binary tags and can encapsulate binary data. Like XML you can take a
    data description (written in BNF notation) and use it to generate file
    encoders and decoders, or you can write fast interpretive decoders (as I
    have). It's a standard in the telecoms industry, where it's routinely used
    to transfer multi-megabyte files as well as individual short messages.

    Java ASN.1 schema compilers are available.

    Translating a file between ASN.1 and XML should be a doddle: the site I
    mentioned has a tool for doing just that.


    --
    martin@ | Martin Gregorie
    gregorie. | Essex, UK
    org |
     
    Martin Gregorie, Apr 4, 2006
    #17
  18. Homer

    Joe Attardi Guest

    Homer wrote:
    > I believe lots of people are using XML because it's cool and new. And these people
    > give advice to companies and organizations.

    XML isn't new. It's been around almost ten years. The first working
    draft for the XML spec was put together in November of 1996.

    > 3- Compression: There is no good standard for compression (Unix is not
    > really ZIP friendly unless you add some open source or buy a Zip product)

    Gzip? In fact, IIRC, the gzip algorithm takes advantage of strings that
    are repeated over and over (like the tag names), which helps its
    compression.
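
    A rough sketch for checking this on your own data, using the JDK's
    java.util.zip.GZIPOutputStream. The record generator, the
    <People>/<Person> wrapper elements and the class name GzipSizeDemo are
    made up for illustration, and the actual ratios depend entirely on
    the data you feed it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

class GzipSizeDemo {

    /** Builds n synthetic records in the tagged form from the first post. */
    static String xmlRecords(int n) {
        StringBuilder sb = new StringBuilder("<People>\n");
        for (int i = 0; i < n; i++) {
            sb.append("<Person><FirstName>John").append(i).append("</FirstName>")
              .append("<LastName>Smith").append(i).append("</LastName>")
              .append("<PhoneNum>").append(5550000 + i).append("</PhoneNum>")
              .append("<Address>").append(i).append(" Finch Ave.</Address></Person>\n");
        }
        return sb.append("</People>\n").toString();
    }

    /** Builds the same n records as comma-separated lines. */
    static String csvRecords(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append("John").append(i).append(",Smith").append(i).append(',')
              .append(5550000 + i).append(',').append(i).append(" Finch Ave.\n");
        }
        return sb.toString();
    }

    /** Returns the number of bytes the text occupies after gzip compression. */
    static int gzipSize(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream is closed here, so the gzip trailer has been written.
        return buffer.size();
    }

    public static void main(String[] args) {
        String xml = xmlRecords(10000);
        String csv = csvRecords(10000);
        System.out.println("XML raw:  " + xml.length() + "  gzipped: " + gzipSize(xml));
        System.out.println("CSV raw:  " + csv.length() + "  gzipped: " + gzipSize(csv));
    }
}
```

    Running it prints the raw and compressed sizes side by side, which
    lets you see how much of the tag overhead survives compression for
    records shaped like yours.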

    > (or use open source that most companies don't like).

    That most companies don't like? I don't think you researched this much
    before making this statement. Look how many of the huge players (Sun,
    IBM, etc.) have strong support for open source. In addition, open
    source is being adopted all over the place.

    > Let me give you another example of coolness (sorry, it's a bit off
    > the topic but it's about coolness):

    It's not just because XML is "the cool thing". It's perfectly suited
    for the exchange of data like this. The data describes itself!
     
    Joe Attardi, Apr 4, 2006
    #18
  19. On 2006-04-04, Homer penned:
    > I guess these responses are proving my point. You all know that
    > the best solution for transferring huge files between two parties is
    > a simple flat file whose format both sender and receiver have agreed
    > upon, sent over a secure line. But you still defend adding tons of
    > tags to a file whose format both sender and receiver already
    > know.


    I guess that you are wrong. I guess that the word "best" is meaningless
    unless it is qualified by something. If you want a format that is best
    at clarity, then flat files lose. I guess that you don't really
    understand when to use XML, and that it doesn't really matter because
    you don't have the authority to change things in the environment in
    which it's causing you trouble, so you've developed a grudge against
    XML rather than against whoever decided to use it inappropriately or
    whoever decided to create an excessively verbose schema.

    > I believe lots of people are using XML because it's cool and
    > new. And these people give advice to companies and organizations.


    XML isn't new enough to offer the glamour factor you think it has.

    --
    monique

    Ask smart questions, get good answers:
    http://www.catb.org/~esr/faqs/smart-questions.html
     
    Monique Y. Mudama, Apr 4, 2006
    #19
  20. Homer

    Roedy Green Guest

    On 4 Apr 2006 08:27:51 -0700, "Homer" <> wrote,
    quoted or indirectly quoted someone who said :

    ><FirstName>....</FirstName>
    ><LastName>....</LastName>
    ><PhoneNum>....</PhoneNum>
    ><Address>....</Address>
    >
    >
    >Please let me know what you think.


    see http://mindprod.com/jgloss/xml.html

    Pay particular attention to the images and the XML "logo".

    XML needs a binary format both for compactness and automatic format
    compliance.
    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Java custom programming, consulting and coaching.
     
    Roedy Green, Apr 4, 2006
    #20