How to edit a large xml file (250MB)?

Discussion in 'XML' started by setar, Aug 23, 2006.

  1. setar

    setar Guest

    How can I edit an xml file which has 250MB? I tried UltraEdit, Visual
    Studio, Eclipse, Stylus Studio and XMLSpy, but these programs can't
    read the file because it is too big. SmEdit reads only the first MB of the
    file and doesn't support UTF-8 (I need a program which supports it). For now
    I use XVI32, which is a hexadecimal editor, but it is useful only for editing
    a small number of characters; deleting and inserting characters in large
    files is very tiring.



    I don't need an XML editor. It can be any text editor, without XML validation
    etc. I don't know how such a program should work, but in my opinion
    such a program should exist.
    setar, Aug 23, 2006
    #1

  2. Andy Dingley

    Andy Dingley Guest

    setar wrote:

    > How can I edit an xml file which has 250MB?


    Don't make XML files that are 250MB in size.

    Editing is simple. So if you can't even edit it, how are you going to
    process it? If you run XPath on it, what do you think performance will
    be like?

    There are (rare) times when XML works in these volumes, but in general
    it doesn't. If you're looking for a stream-based format (easy to work
    with in huge volumes) then XML's single root element constraint works
    against you. If you're trying to build a database, then XML's lack of
    efficient querying is a performance hit. If you want 250MB files as an
    encapsulated data format (maybe ETL on a database) then it's workable,
    but the document lifecycle is a fairly short
    create-transfer-load-delete.

    So if your application requires a 250MB data entity, then think
    carefully about the tools you're using. Life might be simpler that way.

    I also have lots of 250MB files around, but I don't edit them by hand.
    I have computers to do that sort of thing for me instead.
    Andy Dingley, Aug 23, 2006
    #2

  3. setar wrote:

    > I don't need an XML editor. It can be any text editor, without XML validation
    > etc. I don't know how such a program should work, but in my opinion
    > such a program should exist.


    Use vim, the improved vi editor. I have edited such
    large XML files with vi several times and you hardly
    notice the difference between 10 MB and 200 MB files.
    Current versions of vim (when configured properly)
    can also edit any UTF-8 characters, for example Japanese.
    Juergen Kahrs, Aug 23, 2006
    #3
  4. setar wrote:
    > How can I edit an xml file which has 250MB?


    Emacs also supports UTF-8, of course.

    How much swap space have you got? That's what's going to control your
    maximum buffer size, assuming you've got a reasonably intelligent editor
    implementation.

    Another alternative is a stream editor -- the Unix tool "sed" or
    something equivalent. Downside of that is that it isn't interactive; you
    have to essentially write a program that tells it how to find the points
    you want changed and what you want done with them.
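
    For a one-off fix, the same non-interactive idea can be sketched in a few
    lines of Python instead of sed. This is only an illustration, not something
    from the thread: the function name stream_replace and the example strings
    are made up, and it assumes the file contains line breaks (a 250MB file
    that is one single line would still be read in one go).

```python
def stream_replace(src_path, dst_path, old, new, encoding="utf-8"):
    """Copy src_path to dst_path line by line, replacing old with new.

    Only one line is held in memory at a time.
    Returns the number of lines that were changed.
    """
    changed = 0
    # newline="" leaves the original line endings (LF or CRLF) untouched.
    with open(src_path, "r", encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        for line in src:
            fixed = line.replace(old, new)
            if fixed != line:
                changed += 1
            dst.write(fixed)
    return changed
```

    A call such as stream_replace("big.xml", "big.fixed.xml", "Jhon", "John")
    writes a new file rather than editing in place, so the original stays
    around as a backup.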

    If you'd rather stay in the XML world, you could find or write a stream
    editor based on SAX streams; this is one of the classic situations where
    SAX can have advantages over DOM-based processing.
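
    A minimal sketch of such a SAX-based stream editor, using only Python's
    standard xml.sax module. The class name TextFixer, the function fix_file
    and the element name passed in are hypothetical. It assumes the target
    element contains only text (no child elements), and note that XMLGenerator
    does not reproduce comments or a DOCTYPE, so it only suits files where
    those don't matter.

```python
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class TextFixer(XMLFilterBase):
    """Pass SAX events through, rewriting the text of one element type."""

    def __init__(self, parent, element, old, new):
        super().__init__(parent)
        self._element = element
        self._old = old
        self._new = new
        self._buffer = None  # collects text while inside the target element

    def startElement(self, name, attrs):
        if name == self._element:
            self._buffer = []
        super().startElement(name, attrs)

    def characters(self, content):
        # The parser may deliver one text node in several pieces, so the
        # pieces are buffered and the replacement is done on the whole text.
        if self._buffer is not None:
            self._buffer.append(content)
        else:
            super().characters(content)

    def endElement(self, name):
        if name == self._element and self._buffer is not None:
            text = "".join(self._buffer).replace(self._old, self._new)
            super().characters(text)
            self._buffer = None
        super().endElement(name)


def fix_file(src_path, dst_path, element, old, new):
    """Stream src_path to dst_path, replacing old with new inside <element>."""
    with open(dst_path, "w", encoding="utf-8") as out:
        writer = XMLGenerator(out, encoding="utf-8")
        fixer = TextFixer(xml.sax.make_parser(), element, old, new)
        fixer.setContentHandler(writer)
        fixer.parse(src_path)
```

    Memory use stays roughly constant regardless of file size, because only
    the text of the element currently being processed is held at any time.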

    Or find/write a tool that will handle your document in chunks, either
    text-based or SAX-based. Again, that presumes that what you're doing
    divides up nicely.

    Which of these approaches/tools makes the most sense depends on exactly
    what you're trying to do to the file.


    --
    () ASCII Ribbon Campaign | Joe Kesselman
    /\ Stamp out HTML e-mail! | System architexture and kinetic poetry
    Joe Kesselman, Aug 23, 2006
    #4
  5. setar wrote:
    > How can I edit an xml file which has 250MB? I tried UltraEdit, Visual
    > Studio, Eclipse, Stylus Studio and XMLSpy, but these programs can't
    > read the file because it is too big. SmEdit reads only the first MB of the
    > file and doesn't support UTF-8 (I need a program which supports it). For now
    > I use XVI32, which is a hexadecimal editor, but it is useful only for editing
    > a small number of characters; deleting and inserting characters in large
    > files is very tiring.
    >
    >
    >
    > I don't need an XML editor. It can be any text editor, without XML validation
    > etc. I don't know how such a program should work, but in my opinion
    > such a program should exist.
    >
    >


    Use a native XML database to store your XML data, and edit it using
    XQuery. There are already databases that support XML file sizes in the
    multi-GB range:

    http://exist.sourceforge.net/
    http://xml.apache.org/xindice/
    Tjerk Wolterink, Aug 23, 2006
    #5
  6. Tjerk Wolterink wrote:
    > Use a native XML database to store your XML data, and edit it using XQuery.
    > There are already databases that support XML file sizes in the
    > multi-GB range:
    >
    > http://exist.sourceforge.net/
    > http://xml.apache.org/xindice/


    IBM's DB2 now has a native-XML data format, making it a world-class XML
    database as well as a world-class relational database.

    --
    () ASCII Ribbon Campaign | Joe Kesselman
    /\ Stamp out HTML e-mail! | System architexture and kinetic poetry
    Joe Kesselman, Aug 23, 2006
    #6
  7. setar

    Guest

    In case you haven't got the hang of vim yet :) ...

    If you're on Windows you could try TextPad (you can get a full-featured
    evaluation version to test) or EmEditor (free standard version with
    most features). Obviously your system's resources will determine
    whether this works for you and how well, but I can open a 250MB text
    file with those text editors and it looks as though I could edit it.
    Performance seems better in EmEditor. TextPad doesn't have full Unicode
    display support, but it seems like it might cope... That said, I've never
    opened such large files except out of curiosity...

    Also check that you aren't using UTF-16 as a file encoding --
    conversion to UTF-8 could save you some space.
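
    If the file does turn out to be UTF-16, the conversion can be done in
    fixed-size chunks so the whole file never has to fit in memory. This is a
    rough sketch; the function name transcode and the chunk size are
    arbitrary choices of mine. One thing it does not do: if the XML
    declaration says encoding="UTF-16", that attribute still has to be changed
    to UTF-8 by hand afterwards, otherwise parsers will complain.

```python
def transcode(src_path, dst_path, src_encoding="utf-16",
              dst_encoding="utf-8", chunk_chars=1 << 20):
    """Re-encode a text file chunk by chunk.

    The "utf-16" codec reads the byte order mark to detect endianness
    and does not pass the mark on to the output.
    """
    # newline="" keeps line endings exactly as they are in the source.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        while True:
            chunk = src.read(chunk_chars)  # up to chunk_chars characters
            if not chunk:
                break
            dst.write(chunk)
```

    How much space this saves depends on the content: mostly-ASCII markup
    shrinks to about half, while text that is largely CJK may not shrink at all.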

    XML editors will obviously have problems opening such large files
    because they have to parse the file (some XML editors have an option
    which you can set so that files aren't automatically parsed on
    opening). One good open-source XML editor which aims at efficiency is
    XML Copy Editor, which you'll find on SourceForge. It won't manage files
    of that size, though.

    Tim

    setar wrote:
    > How can I edit an xml file which has 250MB? I tried UltraEdit, Visual
    > Studio, Eclipse, Stylus Studio and XMLSpy, but these programs can't
    > read the file because it is too big. SmEdit reads only the first MB of the
    > file and doesn't support UTF-8 (I need a program which supports it). For now
    > I use XVI32, which is a hexadecimal editor, but it is useful only for editing
    > a small number of characters; deleting and inserting characters in large
    > files is very tiring.
    >
    >
    >
    > I don't need an XML editor. It can be any text editor, without XML validation
    > etc. I don't know how such a program should work, but in my opinion
    > such a program should exist.
    , Aug 23, 2006
    #7
  8. setar

    setar Guest

    User "Andy Dingley" wrote:
    > Don't make XML files that are 250MB in size.


    The file wasn't created by me. It contains about 100,000 records which I
    import into my program. Everything works. Unfortunately, several records
    in the file have errors which I want to correct. I don't want to write
    additional code just to correct imported data; I prefer to make the
    changes in the source file. Of course I could write code for editing imported
    data, but I don't need this functionality except for correcting these
    errors. I also have no access to the editor which exported the XML file.

    User "Juergen Kahrs" wrote:
    > Use vim, the improved vi editor. I have edited such
    > large XML files with vi several times ....


    Thanks! I've checked it and it's a good solution for me.
    With this configuration:
    - set enc=utf-8 (UTF-8 encoding)
    - set undolevels=-1 (maybe vim is faster with this ...)
    the timings for individual editing tasks in gvim are:
    - opening the 250MB XML file: 15 seconds
    - searching for a word (case sensitive): up to 20 seconds (depending on its
      position in the file).
      In my opinion this could be better, because for example Total
      Commander's default viewer takes only 2 seconds!
      But it is acceptable, because I only want to make a few dozen
      changes.
    - going to a specific line of the file by line number or by
      dragging the vertical slider with the mouse: veeeery slow, so don't do this!
    - making small changes (for example inserting and deleting a few lines of
      text, or typing something): smooth
    - writing the changes to the file (for example once all changes are done): 15
      seconds
    I have an Athlon 2500 with 1GB RAM. gvim uses only 300MB, so 512MB of RAM was
    free.

    User "Juergen Kahrs" wrote:
    > ... and you hardly
    > notice the difference between 10 MB and 200 MB files.
    > Current versions of vim (when configured properly)
    > can also edit any UTF-8 characters, for example Japanese.


    I can notice the difference between searches which take 2 seconds and 20
    seconds :) But you are right that "making small changes (for example
    inserting and deleting a few lines of text, or typing something)" is very fast.

    User "Joe Kesselman" wrote:
    >Another alternative is a stream editor -- the Unix tool "sed" or
    >something equivalent. Downside of that is that it isn't interactive; you
    >have to essentially write a program that tells it how to find the points
    >you want changed and what you want done with them.


    I would prefer something interactive, because every change will be different
    ... I don't want to write a program every time ...

    >Or find/write a tool that will handle your document in chunks, either
    >text-based or SAX-based. Again, that presumes that what you're doing
    >divides up nicely.


    Unfortunately I can't find such a tool ...

    User wrote:
    >If you're on Windows you could try TextPad (you can get a full-featured
    >evaluation version to test) or EmEditor (free standard version with
    >most features).


    Here are the statistics with the default configuration: ;)
    - opening the 250MB XML file: 70 seconds
    - searching for a word at the end of the file: 45 seconds
    - dragging the vertical slider with the mouse: smooth :)
    - making small changes (for example inserting and deleting a few lines of
      text, or typing something): sometimes 0.5 seconds, sometimes 30 seconds :(((
      30 seconds is long, but maybe it will be acceptable for someone ...
    - writing the changes to the file (for example once all changes are done): not
      tested ;)

    P.S. Sorry for errors, my English isn't good.
    setar, Aug 23, 2006
    #8
  9. setar wrote:

    > the timings for individual editing tasks in gvim are:
    > - opening the 250MB XML file: 15 seconds


    7 seconds on my AMD Sempron 2800+ (SuSE Linux 10.1).

    > - searching for a word (case sensitive): up to 20 seconds (depending on its
    >   position in the file).


    18 seconds on my PC for searching until end of file.

    > - going to a specific line of the file by line number or by
    >   dragging the vertical slider with the mouse: veeeery slow, so don't do this!


    You shouldn't use gvim, but the original vim on Linux.
    Going to line number 5000000 works instantly on my PC.

    > - writing the changes to the file (for example once all changes are done): 15
    >   seconds


    15 seconds also on my PC.

    > I have an Athlon 2500 with 1GB RAM. gvim uses only 300MB, so 512MB of RAM was
    > free.


    300 MB used by vim on my PC also.

    > I can notice the difference between searches which take 2 seconds and 20
    > seconds :) But you are right that "making small changes (for example
    > inserting and deleting a few lines of text, or typing something)" is very fast.


    That's true, I also noticed a "slight" difference.

    >> Or find/write a tool that will handle your document in chunks, either
    >> text-based or SAX-based. Again, that presumes that what you're doing
    >> divides up nicely.

    >
    > Unfortunately I can't find such a tool ...


    Before you choose a tool you have to find out whether you
    can assume that the XML files are well-formed. If they _are_
    well-formed, then you can choose among a large set of
    tools on the market. Otherwise, you have to use an editor.
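
    One way to find that out without loading the whole file is to run it
    through a streaming parser and see whether it reaches the end. A small
    sketch using the expat bindings from Python's standard library follows;
    the function name check_well_formed is just an illustrative choice.

```python
import xml.parsers.expat


def check_well_formed(path):
    """Return None if the file is well-formed XML.

    Otherwise return (line, column, message) for the first error found.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        # ParseFile reads the file in blocks, so memory use stays small.
        with open(path, "rb") as f:
            parser.ParseFile(f)
    except xml.parsers.expat.ExpatError as err:
        return err.lineno, err.offset, xml.parsers.expat.ErrorString(err.code)
    return None
```

    The reported line number of the first error is also a handy place to
    jump to in the editor.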

    I guess you are better off using vim.
    But if you consider using a tool, have a look at this one:

    http://home.vrweb.de/~juergen.kahrs/gawk/XML/

    Good luck.
    Jürgen Kahrs, Aug 23, 2006
    #9
  10. Peter Flynn

    Peter Flynn Guest

    setar wrote:
    > How can I edit an xml file which has 250MB? I tried UltraEdit, Visual
    > Studio, Eclipse, Stylus Studio and XMLSpy, but these programs can't
    > read the file because it is too big. SmEdit reads only the first MB of the
    > file and doesn't support UTF-8 (I need a program which supports it). For now
    > I use XVI32, which is a hexadecimal editor, but it is useful only for editing
    > a small number of characters; deleting and inserting characters in large
    > files is very tiring.


    Emacs. With psgml and xxml and onsgmls if you want DTD validation.

    ///Peter
    Peter Flynn, Aug 23, 2006
    #10
  11. setar

    setar Guest

    User "Peter Flynn" wrote:
    > Emacs. With psgml and xxml and onsgmls if you want DTD validation.


    I installed GNU Emacs 21.3 on Windows XP. Emacs displays this message while
    opening the file:
    "find-file-noselect-1: Maximum buffer size exceeded"
    and doesn't load the file.
    I found this information in the gnu.emacs.help newsgroup, written by Stefan
    Monnier on 11 January 2005:

    --------------------------------------------------
    > Emacs 21.3.1 did not open a 150MB text file in Windows XP. Is there
    > a way to make emacs open larger files ?


    On 32bit systems, the maximum file size in Emacs-21.3 is 128MB.
    In Emacs-CVS, it's been pushed to 256MB.
    It can fairly easily be pushed further to 512MB, tho the corresponding
    patch is not in Emacs-CVS.

    If that's not good enough:
    1 - use a 64bit system (with an Emacs compiled accordingly).
    2 - split your file into smaller chunks.
    3 - use XEmacs whose max is 1GB.
    --------------------------------------------------
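
    On option 2 in the quoted list (splitting the file into smaller chunks),
    here is a rough sketch of how that could be done while cutting only at
    line breaks. The names split_file and join_files and the ".partNNN"
    naming are made up for illustration. It works on raw bytes, which is safe
    for UTF-8 (a newline byte never occurs inside a multi-byte character) but
    not for UTF-16, and a single line longer than max_bytes simply ends up in
    an oversized part of its own.

```python
import shutil


def split_file(src_path, max_bytes):
    """Split src_path into parts of at most max_bytes, cutting at line ends.

    Returns the list of part file names, in order.
    """
    parts = []
    out = None
    written = 0
    with open(src_path, "rb") as src:
        for line in src:
            # Start a new part for the first line, or when this line
            # would push the current part over the limit.
            if out is None or (written and written + len(line) > max_bytes):
                if out is not None:
                    out.close()
                part_path = f"{src_path}.part{len(parts):03d}"
                parts.append(part_path)
                out = open(part_path, "wb")
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return parts


def join_files(part_paths, dst_path):
    """Concatenate the parts, in the given order, back into one file."""
    with open(dst_path, "wb") as dst:
        for part in part_paths:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, dst)
```

    The individual parts are not well-formed XML on their own, so this only
    helps with plain text editors, which is what is wanted here anyway.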

    So ... did you mean using XEmacs?
    setar, Aug 24, 2006
    #11
  12. mikem789

    mikem789

    how to edit a large file 250mb

    The link below might help. The tool is said to support gigabyte and even terabyte size files, but I must stress I've not used it myself, so I don't know how good it is. I would welcome feedback from others, though, as you can download it for a 30-day trial.
    http://www.liquid-technologies.com/Large-File-Editor.aspx
    mikem789, Apr 1, 2011
    #12