Edit large xml files

Discussion in 'XML' started by billsahiker@yahoo.com, Feb 18, 2008.

  1. Guest

    What editor do you use for large (50MB+) XML files that checks
    well-formedness?

    I am looking for something that can quickly load, edit and save large
    XML files and make sure they are well-formed. I have some that are over
    100MB, some with 200K nodes. These are computer-generated files (they
    come to us from large financial institutions), but many have content
    errors: the markup seems to be correct, but the data is wrong, and the
    errors are only in the first few lines of the file. We drop them into
    SQL Server and look for the errors; we know pretty much what to look
    for. But I have to hand the files over to IT, and it takes time before
    I can access the file. Since the errors are only at the beginning of
    the file, it would be nice to skip that process and just edit them
    with an XML editor.

    Has anyone successfully edited, with an xml editor, files this large?

    Bill
     
    , Feb 18, 2008
    #1

  2. wrote:

    > What editor do you use for large (50MB+) XML files that checks
    > well-formedness?



    Are you sure you need an editor to check for well-formedness?
    I use xmllint for checking:

    xmllint --noout file.xml

    Other readers recommend some other tools.
    This is a FAQ here.
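
Where xmllint is not installed, a similar check can be sketched with Python's standard-library expat bindings. This is only an illustration, not a replacement for xmllint: the function name `check_well_formed` is made up for this example, and it tests well-formedness only, not validity against a schema or DTD.

```python
import xml.parsers.expat


def check_well_formed(path):
    """Return None if the file parses cleanly, else (line, column, message)."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        with open(path, "rb") as handle:
            # ParseFile pulls data from the file object as it goes, so the
            # document is never built up as a tree in memory.
            parser.ParseFile(handle)
    except xml.parsers.expat.ExpatError as err:
        message = xml.parsers.expat.ErrorString(err.code)
        return (err.lineno, err.offset, message)
    return None
```

Calling `check_well_formed("file.xml")` gives back `None` for a well-formed file, or the position and description of the first error the parser hits.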
     
    Jürgen Kahrs, Feb 18, 2008
    #2

  3. Guest

    As I stated, "many have content errors." I need to correct those, and
    when I save the file, I want to make sure I did not mess up the markup
    or do anything that would make it not well-formed.

    Bill

    On Feb 18, 10:31 am, Jürgen Kahrs wrote:
    > wrote:
    >
    > > What editor do you use for large (50MB+) XML files that checks
    > > well-formedness?

    >
    > Are you sure you need an editor to check for well-formedness?
    > I use xmllint for checking:
    >
    >   xmllint --noout file.xml
    >
    > Other readers recommend some other tools.
    > This is a FAQ here.
     
    , Feb 18, 2008
    #3
  4. wrote:

    > As I stated, "many have content errors." I need to correct those, and
    > when I save the file, I want to make sure I did not mess up the markup
    > or do anything that would make it not well-formed.


    OK, so I suggest this approach:

    until xmllint --noout file.xml    # exits non-zero while errors remain
    do
        vi file.xml                   # or emacs, or any plain-text editor
    done

    This works with files that contain several hundred MB of XML data.
     
    Jürgen Kahrs, Feb 18, 2008
    #4
  5. wrote:
    >
    > I am looking for something that can quickly load, edit and save large
    > xml files and makes sure it is well-formed.
    > (...)
    > Has anyone successfully edited, with an xml editor, files this large?
    >


    I would use vi for editing and xmlwf (or xmllint) for checking
    well-formedness.

    Opening a 160M xml file, editing something in the first lines and saving
    the file back takes me less than 1 minute (on an older laptop). xmlwf
    needs 8 seconds for checking well-formedness of the same file.
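
Since the errors are said to sit in the first few lines, the edit can also be scripted so that only the head of the file is touched and the rest is copied through untouched. The sketch below is one possible approach. The name `patch_head` and its parameters are invented for this example, and it assumes the file has line breaks and an ASCII-compatible encoding such as UTF-8.

```python
import os
import shutil
import tempfile


def patch_head(path, n_lines, fix_line):
    """Pass the first n_lines through fix_line (bytes -> bytes); copy the rest as-is."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final swap is a rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as dst, open(path, "rb") as src:
            for _ in range(n_lines):
                line = src.readline()
                if not line:          # file is shorter than n_lines
                    break
                dst.write(fix_line(line))
            # Stream the remainder in chunks instead of loading it into memory.
            shutil.copyfileobj(src, dst)
        shutil.copymode(path, tmp_path)   # keep the original permission bits
        os.replace(tmp_path, path)        # swap in the patched copy
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

For example, `patch_head("file.xml", 5, lambda line: line.replace(b"old", b"new"))` would rewrite only the first five lines. Running a well-formedness check afterwards is still advisable, because nothing here parses the XML.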

    Hermann
     
    Hermann Peifer, Feb 18, 2008
    #5
  6. Guest

    On Feb 18, 12:41 pm, Jürgen Kahrs wrote:
    Thanks. That sounds like a good solution if I can get approval; policy
    may not allow it. I am surprised that by now no vendor has come out
    with something like vi. Why don't they offer an editor with a built-in
    XML parser? I would think there would be big demand for it, especially
    for something that lets you edit XML without touching the markup.
    There are lots of those, but none that I have found can handle large
    files.

    Bill

    > wrote:
    > > As I stated, "many have content errors." I need to correct those, and
    > > when I save the file, I want to make sure I did not mess up the markup
    > > or do anything that would make it not well-formed.

    >
    > OK, so I suggest this approach:
    >
    >   until xmllint --noout file.xml    # exits non-zero while errors remain
    >   do
    >     vi file.xml                     # or emacs, or any plain-text editor
    >   done
    >
    > This works with files that contain several hundred MB of XML data.
     
    , Feb 18, 2008
    #6
  7. Andy Dingley Guest

    On 18 Feb, 15:14, wrote:
    > What editor do you use for large (50MB+) XML files that checks
    > well-formedness?


    I don't. Files this size (IMHE) have to be perfect, or else I reject
    them outright. XML just doesn't work well as a "document" this size.
    It's OK if they're automatically generated and correct, but it's just
    not a good idea to start treating them as hand-editable.

    I certainly _don't_ edit supplied inbound data files like this. If it
    works it's tiresome for me, if it goes wrong, it's my fault. This is
    your supplier's problem for not being able to generate the things
    correctly - get _them_ to fix it!

    I also don't just check well-formedness, I check validity. If I'm
    throwing this much data around regularly and no-one has yet thought to
    define the schema for it, something is wrong.
     
    Andy Dingley, Feb 19, 2008
    #7
  8. Guest

    Good points. Nonetheless, I would expect the marketplace to have a
    virtual XML editor for files that are too big for the current crop of
    editors, but not so big that they could not have been composed
    manually (or semi-automatically). I see lots of XML files in that
    category. If such a tool exists, I would like to know about it.

    Bill
    On Feb 19, 3:09 am, Andy Dingley wrote:
    > On 18 Feb, 15:14, wrote:
    >
    > > What editor do you use for large (50MB+) XML files that checks
    > > well-formedness?

    >
    > I don't. Files this size (IMHE) have to be perfect, or else I reject
    > them outright. XML just doesn't work well as a "document" this size.
    > It's OK if they're automatically generated and correct, but it's just
    > not a good idea to start treating them as hand-editable.
    >
    > I certainly _don't_ edit supplied inbound data files like this. If it
    > works it's tiresome for me, if it goes wrong, it's my fault. This is
    > your supplier's problem for not being able to generate the things
    > correctly - get _them_ to fix it!
    >
    > I also don't just check well-formedness, I check validity. If I'm
    > throwing this much data around regularly and no-one has yet thought to
    > define the schema for it, something is wrong.
     
    , Feb 19, 2008
    #8
  9. Peter Flynn Guest

    Jürgen Kahrs wrote:
    > wrote:
    >
    >> What editor do you use for large (50MB+) XML files that checks
    >> well-formedness?

    >
    >
    > Are you sure you need an editor to check for well-formedness?
    > I use xmllint for checking:
    >
    > xmllint --noout file.xml
    >
    > Other readers recommend some other tools.
    > This is a FAQ here.


    It certainly is.
    http://xml.silmaril.ie/authors/parsers

    I'll update it with the results

    ///Peter
     
    Peter Flynn, Feb 19, 2008
    #9
  10. Peter Flynn Guest

    Jürgen Kahrs wrote:
    > wrote:
    >
    >> As I stated, "many have content errors." I need to correct those, and
    >> when I save the file, I want to make sure I did not mess up the markup
    >> or do anything that would make it not well-formed.

    >
    > OK, so I suggest this approach:
    >
    > until xmllint --noout file.xml    # exits non-zero while errors remain
    > do
    >     vi file.xml                   # or emacs, or any plain-text editor
    > done
    >
    > This works with files that contain several hundred MB of XML data.


    If you use emacs with psgml-mode you can omit the loop.
    C-c C-v will parse the document (for well-formedness or validity).

    As will every other XML editor...what would worry me is that the OP
    appears not to have been using an XML editor to start with...

    ///Peter
    --
    XML FAQ: http://xml.silmaril.ie/
     
    Peter Flynn, Feb 19, 2008
    #10
  11. Peter Flynn Guest

    wrote:
    > On Feb 18, 12:41 pm, Jürgen Kahrs wrote:
    > Thanks. That sounds like a good solution if I can get approval; policy
    > may not allow it.


    If they're asking you to work with XML without an XML editor then you
    need a new job, and they will shortly need a new company.

    > I am surprised that by now no vendor has come out with something
    > like vi. Why don't they offer an editor with a built-in XML parser?


    All XML editors have a built-in parser.
    That's what makes them XML editors.

    > I would think there would be big demand for it, especially for
    > something that lets you edit XML without touching the markup. There
    > are lots of those, but none that I have found can handle large files.


    Emacs. I used it today to edit a 450MB OOXML document.

    ///Peter
     
    Peter Flynn, Feb 19, 2008
    #11
  12. Peter Flynn wrote:
    > If they're asking you to work with XML without an XML editor then you
    > need a new job, and they will shortly need a new company.


    Depends on the application. Large XML docs are often machine-generated,
    and I find small ones (order of 10KB) are often just as easy to
    manipulate with an ordinary text editor. Though I do use Emacs, which
    provides some assistance.

    >> Why don't they offer an editor with a built-in XML parser?


    They exist. On the other hand, if you want that kind of assistance with
    editing XML you probably want much more than a simple XML parser; you'll
    probably want something that can be directed by the schema, at least,
    and that means fairly deep integration into the editor.


    --
    Joe Kesselman / Beware the fury of a patient man. -- John Dryden
     
    Joseph Kesselman, Feb 19, 2008
    #12
