How to edit a large xml file (250MB)?

setar

How can I edit a 250MB XML file? I tried UltraEdit, Visual Studio, Eclipse,
Stylus Studio and XMLSpy, but these programs can't read the file because it
is too big. SmEdit reads only the first MB of the file and doesn't support
UTF-8 (I need a program which supports it). For now I use XVI32, which is a
hexadecimal editor, but it is useful only when editing a small number of
characters; deleting and inserting characters in large files is very
tiring.



I don't need an XML editor. It can be any text editor without XML validation
etc. I don't know how such a program should work, but in my opinion such a
program should exist.
 
Andy Dingley

setar said:
How can I edit an xml file which has 250MB?

Don't make XML files that are 250MB in size.

Editing is simple. So if you can't even edit it, how are you going to
process it? If you run XPath on it, what do you think performance will
be like?

There are (rare) times when XML works in these volumes, but in general
it doesn't. If you're looking for a stream-based format (easy to work
with in huge volumes) then XML's single root element constraint works
against you. If you're trying to build a database, then XML's lack of
efficient querying is a performance hit. If you want 250MB files as an
encapsulated data format (maybe ETL on a database) then it's workable,
but the document lifecycle is a fairly short
create-transfer-load-delete.

So if your application requires a 250MB data entity, then think
carefully about the tools you're using. Life might be simpler that way.

I also have lots of 250MB files around, but I don't edit them by hand.
I have computers to do that sort of thing for me instead.
 
Juergen Kahrs

setar said:
I don't need xml editor. It can be any text editor without xml validation
etc. I don't know how such a program should work, but in my opinion there
should be such a program.

Use vim, the improved vi editor. I have edited such
large XML files with vi several times and you hardly
notice the difference between 10 MB and 200 MB files.
Current versions of vim (when configured properly)
can also edit any UTF-8 characters, for example Japanese.
 
Joe Kesselman

setar said:
How can I edit an xml file which has 250MB?

Emacs also supports UTF-8, of course.

How much swap space have you got? That's what's going to control your
maximum buffer size, assuming you've got a reasonably intelligent editor
implementation.

Another alternative is a stream editor -- the Unix tool "sed" or
something equivalent. Downside of that is that it isn't interactive; you
have to essentially write a program that tells it how to find the points
you want changed and what you want done with them.
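
A sed-like, non-interactive pass can be sketched in a few lines of Python. This is only an illustration and not something from the thread: the function name `stream_substitute`, the file names and the pattern are made up. It assumes the file is UTF-8 and contains line breaks; a file that is one enormous line would still be read into memory in one go.

```python
import re


def stream_substitute(src_path, dst_path, pattern, replacement, encoding="utf-8"):
    """Copy src_path to dst_path line by line, applying a regex substitution.

    Only one line is held in memory at a time, so the file size does not
    matter much. Returns the number of substitutions made.
    """
    regex = re.compile(pattern)
    changed = 0
    # newline="" keeps the original line endings untouched on read and write.
    with open(src_path, "r", encoding=encoding, newline="") as src, \
         open(dst_path, "w", encoding=encoding, newline="") as dst:
        for line in src:
            new_line, count = regex.subn(replacement, line)
            changed += count
            dst.write(new_line)
    return changed
```

A call such as `stream_substitute("records.xml", "records.fixed.xml", r"Jon Smith", "John Smith")` writes a corrected copy and leaves the original file alone, which makes it easy to compare the two before replacing anything.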

If you'd rather stay in the XML world, you could find or write a stream
editor based on SAX streams; this is one of the classic situations where
SAX can have advantages over DOM-based processing.
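
A minimal sketch of such a SAX-based stream editor, using only Python's standard `xml.sax` module. The class and function names (`FixText`, `fix_records`) and the element name in the usage note are hypothetical. It assumes the input is well-formed and that the element being corrected contains only text, no child elements. Note that the output is re-serialized rather than copied byte for byte, so things like comments and the DOCTYPE are not carried over by this simple version.

```python
import xml.sax
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class FixText(XMLFilterBase):
    """Pass SAX events through unchanged, except the text of one element."""

    def __init__(self, parent, element, old, new):
        super().__init__(parent)
        self._element, self._old, self._new = element, old, new
        self._inside = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == self._element:
            self._inside = True
            self._buffer = []
        super().startElement(name, attrs)

    def characters(self, content):
        # The parser may deliver text in several pieces, so collect them.
        if self._inside:
            self._buffer.append(content)
        else:
            super().characters(content)

    def endElement(self, name):
        if name == self._element:
            text = "".join(self._buffer)
            if text == self._old:
                text = self._new
            super().characters(text)
            self._inside = False
        super().endElement(name)


def fix_records(src, dst, element, old, new):
    """Stream src (path or binary file) to dst (binary file), fixing one value."""
    reader = FixText(xml.sax.make_parser(), element, old, new)
    reader.setContentHandler(XMLGenerator(dst, encoding="utf-8"))
    reader.parse(src)
```

For example, `fix_records("records.xml", open("records.fixed.xml", "wb"), "city", "Krakow", "Kraków")` would rewrite every `<city>Krakow</city>` while reading the input in small blocks, so memory use should stay roughly constant regardless of file size.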

Or find/write a tool that will handle your document in chunks, either
text-based or SAX-based. Again, that presumes that what you're doing
divides up nicely.

Which of these approaches/tools makes the most sense depends on exactly
what you're trying to do to the file.
 
Tjerk Wolterink

setar said:
How can I edit an xml file which has 250MB? I tried to use UltraEdit, Visual
Studio, Eclipse, Stylus Studio and XMLSpy editors but these programs can't
read this file because it is too big. SmEdit reads only the first MB of the
file and doesn't support UTF-8 (I need program which supports it). Now I use
XVI32 which is hexadecimal editor, but it can be useful only when editing
small number of characters - deleting and inserting characters to large
files is very tiring.



I don't need xml editor. It can be any text editor without xml validation
etc. I don't know how such a program should work, but in my opinion there
should be such a program.

Use a native XML database to store your XML data, and edit it using
XQuery. There are already databases that support XML file sizes in the
multi-GB range:

http://exist.sourceforge.net/
http://xml.apache.org/xindice/
 
acristata

In case you haven't got the hang of vim yet :) ...

If you're on Windows you could try TextPad (you can get a full-featured
evaluation version to test) or EmEditor (free standard version with
most features). Obviously your system's resources will determine
whether this works for you and how well, but I can open a 250MB text
file with those text editors and it looks as though I could edit it.
Performance seems better on EmEditor; TextPad doesn't have full Unicode
display support but seems like it might cope... That said, I've never
opened such large files except out of curiosity...

Also check that you aren't using UTF-16 as a file encoding --
conversion to UTF-8 could save you some space.
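
If the file does turn out to be UTF-16, the conversion can be done in a streaming fashion. The following is a rough sketch; the function name `reencode` and its parameters are invented for illustration. It assumes the XML declaration, if present, sits at the very start of the file, and it rewrites the `encoding=` pseudo-attribute there so the declaration matches the new encoding.

```python
import re


def reencode(src_path, dst_path, src_encoding="utf-16", dst_encoding="utf-8",
             chunk_chars=1 << 20):
    """Re-encode a text file chunk by chunk without loading it all at once."""
    # Matches the encoding pseudo-attribute inside a leading XML declaration.
    decl = re.compile(r"^(<\?xml[^>]*?encoding=)([\"'])[^\"']*\2")
    first = True
    # newline="" keeps line endings exactly as they are in the source.
    with open(src_path, "r", encoding=src_encoding, newline="") as src, \
         open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        while True:
            chunk = src.read(chunk_chars)
            if not chunk:
                break
            if first:
                # Update the declared encoding so parsers accept the new file.
                chunk = decl.sub(r"\1\g<2>" + dst_encoding + r"\g<2>", chunk, count=1)
                first = False
            dst.write(chunk)
```

For text that is mostly ASCII markup, the UTF-8 copy should come out at roughly half the size of the UTF-16 original.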

XML editors will obviously have problems opening such large files
because they have to parse the file (some XML editors have an option
which you can set so that files aren't automatically parsed on
opening). One good open-source XML editor which aims at efficiency is
XML Copy Editor which you'll find on sourceforge. It won't manage files
of that size, though.

Tim
 
setar

User said:
Don't make XML files that are 250MB in size.

I didn't create this file. It contains about 100,000 records which I
import into my program. Everything works. Unfortunately, several records
in the file have errors which I want to correct. I don't want to write
additional code just to correct imported data; I prefer to make some
changes in the source file. Of course I could write code for editing
imported data, but I don't need this functionality except for correcting
the errors mentioned. I also have no access to the editor which exported
the XML file.

Juergen Kahrs said:
Use vim, the improved vi editor. I have edited such
large XML files with vi several times ....

Thanks! I've checked it and it's a good solution for me.
With this configuration:
- set enc=utf-8 (UTF-8 encoding)
- set undolevels=-1 (disables undo; maybe vim is faster with this ...)
the timings for editing subtasks in gvim are:
- opening the 250MB XML file: 15 seconds
- searching for a word (case sensitive): up to 20 seconds (depending on
its place in the file)
In my opinion it could be better, because for example in Total
Commander's default viewer it takes only 2 seconds!
But it is acceptable, because I only want to make a few dozen
changes.
- going to a specified line of the file by giving the line number or by
dragging the vertical slider with the mouse: veeeery long, so don't do this!
- making small changes (for example inserting and deleting some lines of
text; writing something): smooth
- writing changes to the file (for example when all changes are done): 15
seconds
I have an Athlon 2500 with 1GB RAM. gvim uses only 300MB, so 512MB of RAM
were free.

Juergen Kahrs said:
... and you hardly
notice the difference between 10 MB and 200 MB files.
Current versions of vim (when configured properly)
can also edit any UTF-8 characters, for example Japanese.

I can notice the difference between searches which take 2 seconds and 20
seconds :) But you are right that "making small changes (for example
inserting and deleting some lines of text; writing something)" is very fast.

Joe Kesselman said:
Another alternative is a stream editor -- the Unix tool "sed" or
something equivalent. Downside of that is that it isn't interactive; you
have to essentially write a program that tells it how to find the points
you want changed and what you want done with them.

I would prefer something interactive, because every change will be
different ... I don't want to write a program every time ...

Joe Kesselman said:
Or find/write a tool that will handle your document in chunks, either
text-based or SAX-based. Again, that presumes that what you're doing
divides up nicely.

Unfortunately I can't find such a tool ...

acristata said:
If you're on Windows you could try TextPad (you can get a full-featured
evaluation version to test) or EmEditor (free standard version with
most features).

Here are statistics with default configuration: ;)
- opening the 250MB XML file: 70 seconds
- searching for a word at the end of the file: 45 seconds
- dragging the vertical slider with the mouse: smooth :)
- making small changes (for example inserting and deleting some lines of
text; writing something): sometimes 0.5 seconds, sometimes 30 seconds :(((
30 seconds is long, but maybe it will be acceptable for someone ...
- writing changes to the file (for example when all changes are done): not
tested ;)

P.S. Sorry for errors, my English isn't good.
 
Jürgen Kahrs

setar said:
efficiencies for subtasks of editing in gvim are:
- opening 250MB xml file: 15 seconds

7 seconds on my AMD Sempron 2800+ (SuSE Linux 10.1).

setar said:
- searching word (case sensitive): up to 20 seconds (depending on its place
in file)

18 seconds on my PC for searching until end of file.

setar said:
- going to specified line of the file by specifying line number or by
dragging vertical slider by mouse: veeeery long, so don't do this!

You shouldn't use gvim but the original vim on Linux.
Going to line number 5000000 works instantly on my PC.

setar said:
- writing changes to file (for example when we will do all changes): 15
seconds

15 seconds also on my PC.

setar said:
I have Athlon 2500 with 1GB RAM. gvim uses only 300MB, so 512MB of RAM were
free.

300 MB used by vim on my PC also.

setar said:
I can notice difference between searches which take 2 seconds and 20
seconds:) But you are right that "making small changes (for example
inserting and deleting some lines of text; writing something)" is very fast.

That's true, I also noticed a "slight" difference.

setar said:
Unfortunately I can't find such a tool ...

Before you choose a tool you have to find out whether you
can assume that the XML files are well-formed. If they _are_
well-formed, then you can choose among a large set of
tools on the market. Otherwise, you have to use an editor.

I guess you are better off using vim.
But if you consider using a tool, have a look at this one:

http://home.vrweb.de/~juergen.kahrs/gawk/XML/

Good luck.
 
Peter Flynn

setar said:
How can I edit an xml file which has 250MB? I tried to use UltraEdit, Visual
Studio, Eclipse, Stylus Studio and XMLSpy editors but these programs can't
read this file because it is too big. SmEdit reads only the first MB of the
file and doesn't support UTF-8 (I need program which supports it). Now I use
XVI32 which is hexadecimal editor, but it can be useful only when editing
small number of characters - deleting and inserting characters to large
files is very tiring.

Emacs. With psgml and xxml and onsgmls if you want DTD validation.

///Peter
 
setar

User said:
Emacs. With psgml and xxml and onsgmls if you want DTD validation.

I installed GNU Emacs 21.3 on Windows XP. Emacs displays this message when
opening the file:
"find-file-noselect-1: Maximum buffer size exceeded"
and doesn't load it.
I found this information in the gnu.emacs.help newsgroup, written by Stefan
Monnier on 11 January 2005:

--------------------------------------------------
Emacs 21.3.1 did not open a 150Mb text file in windows XP. Is there
a way to make emacs open larger files ?

On 32bit systems, the maximum file size in Emacs-21.3 is 128MB.
In Emacs-CVS, it's been pushed to 256MB.
It can fairly easily be pushed further to 512MB, tho the corresponding
patch is not in Emacs-CVS.

If that's not good enough:
1 - use a 64bit system (with an Emacs compiled accordingly).
2 - split your file into smaller chunks.
3 - use XEmacs whose max is 1GB.
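
For option 2 in the list above, splitting and rejoining can be scripted. The sketch below is only an illustration: the helper names `split_by_lines` and `join_parts` and the `.partNNN` naming are made up. It cuts at line boundaries and works on raw bytes, so it assumes UTF-8 or another ASCII-compatible encoding (not UTF-16). The individual parts are not well-formed XML on their own; they are only meant to be opened in a plain text editor and then concatenated again.

```python
import shutil


def split_by_lines(path, max_bytes):
    """Split a file into parts of at most max_bytes, cutting only after a newline.

    A single line longer than max_bytes is kept whole in its own part.
    Returns the list of part file names, in order.
    """
    part_paths = []
    out, written = None, 0
    with open(path, "rb") as src:
        for line in src:
            # Start a new part for the first line, or when this one is full.
            if out is None or (written and written + len(line) > max_bytes):
                if out is not None:
                    out.close()
                part_path = f"{path}.part{len(part_paths):03d}"
                part_paths.append(part_path)
                out = open(part_path, "wb")
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return part_paths


def join_parts(part_paths, dst_path):
    """Concatenate the (possibly edited) parts back into one file."""
    with open(dst_path, "wb") as dst:
        for part_path in part_paths:
            with open(part_path, "rb") as part:
                shutil.copyfileobj(part, dst)
```

With the 128MB limit quoted above, something like `split_by_lines("records.xml", 100 * 1024 * 1024)` would give three parts for a 250MB file, each small enough for that Emacs version.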
 
