ReplaceAll causes an OUT OF MEMORY error!


John Guilbert

Hi,

Can someone help?

I have a string that contains 16 million characters, and when I call
replaceAll on it I get an out-of-memory error
(java.lang.reflect.Invocation - Out of Memory). Is there any way I can
fix this, or is there a better way to handle this many characters?

Thanks,

John.
 

Tor Iver Wilhelmsen

John said:
I have a string that contains 16 million characters, and when I call
replaceAll on it I get an out-of-memory error
(java.lang.reflect.Invocation - Out of Memory). Is there any way I can
fix this, or is there a better way to handle this many characters?

You can increase the heap available to the Java VM with the -Xmx
parameter, e.g. -Xmx512m lets the heap grow to as much as 512 MB.
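
To confirm that an -Xmx setting actually reached the VM running the code, the limit can be read back at run time. This is only a minimal sketch: the class name HeapCheck and its method are made up for illustration, while Runtime.maxMemory() is the standard call.

```java
// Hypothetical helper; the name HeapCheck does not come from this thread.
class HeapCheck {
    /**
     * Returns the maximum amount of heap the VM will attempt to use,
     * in whole megabytes, as reported by Runtime.maxMemory().
     */
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }
}
```

Printing HeapCheck.maxHeapMegabytes() from inside the application (for example, from the servlet itself) shows the limit of the VM that is really hosting it, which may not be the one you configured.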
 

Michael Borgwardt

John said:
I have a string that contains 16 million characters, and when I call
replaceAll on it I get an out-of-memory error
(java.lang.reflect.Invocation - Out of Memory). Is there any way I can
fix this, or is there a better way to handle this many characters?

You bet there is!!!

In fact, you should never even HAVE a String that large. Such an
amount of data should not be processed as one chunk, as it is an unnecessary
waste of RAM.

Use readers/writers instead, and process the data one line at a time.
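
A rough sketch of what that could look like, assuming the pattern never needs to match across a line break. The class and method names are invented for illustration, and every line is written back with a plain '\n' terminator regardless of what the input used.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.regex.Pattern;

// Hypothetical helper; not code from the original poster's application.
class StreamingReplace {
    /**
     * Copies text from in to out one line at a time, applying the regex
     * replacement to each line. Only one line is held in memory at once.
     * Assumption: no match ever spans a line terminator.
     */
    static void replaceAll(Reader in, Writer out, Pattern pattern, String replacement)
            throws IOException {
        BufferedReader reader = new BufferedReader(in);
        BufferedWriter writer = new BufferedWriter(out);
        String line;
        while ((line = reader.readLine()) != null) {
            writer.write(pattern.matcher(line).replaceAll(replacement));
            writer.write('\n'); // normalizes line endings; the last line also gets one
        }
        writer.flush(); // flush but do not close; the caller owns the streams
    }
}
```

In a servlet or JSP the Writer would typically be the response writer, so the rewritten HTML goes straight to the client instead of being collected into one huge String first.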
 

Chris Uppal

John said:
I have a string that contains 16 million characters, and when I call
replaceAll on it I get an out-of-memory error
(java.lang.reflect.Invocation - Out of Memory). Is there any way I can
fix this, or is there a better way to handle this many characters?

Questions that could affect the answer:

What is the regexp you are replacing all occurrences of ? What are you
replacing them with ?

Do you actually need the full power of regexps ?

Typically, how many occurrences are there in the input String ?

For many combinations of answers to these questions, I'd /expect/ replaceAll()
to be able to run in roughly the space required by the input string plus that
required for the output, so it sounds as if something odd's going on.
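
To make the "input plus output" expectation concrete, here is a sketch of a replace loop that appends into a single buffer sized up front. It is not a claim about how any particular replaceAll() is implemented; the class name and the expectedLength parameter are assumptions for illustration. Matcher.appendReplacement and Matcher.appendTail are the standard java.util.regex calls.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names are illustrative only.
class PresizedReplace {
    /**
     * Replaces every match of pattern in input, building the result in one
     * StringBuffer pre-sized to expectedLength so that it should not need
     * to grow if the estimate is good. As with String.replaceAll, '$' and
     * '\' in the replacement have special meaning.
     */
    static String replaceAll(String input, Pattern pattern, String replacement,
                             int expectedLength) {
        StringBuffer out = new StringBuffer(expectedLength);
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            // Appends the text between the previous match and this one,
            // followed by the replacement.
            m.appendReplacement(out, replacement);
        }
        m.appendTail(out); // appends whatever follows the last match
        return out.toString();
    }
}
```

If the estimate is too small the buffer still grows on its own, so a bad guess costs extra copying rather than correctness.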

More questions:

Can you do the same replacement with a much shorter input String, or does that
use lots of / run out of memory too ?

Also, holding that much text data in a single String /may/ not be the best way
to handle things, but it depends on what you are doing with it (and how often,
and on what class of machine -- if it's a PDA then you're stuffed ;-). So what
/are/ you doing with it ?

-- chris
 

Chris Uppal

I said:
For many combinations of answers to these questions, I'd /expect/
replaceAll() to be able to run in roughly the space required by the input
string plus that required for the output, so it sounds as if something
odd's going on.

[Damn, hit the send button, /then/ think. Sigh...]

No, it's not odd. 16M characters is 32 MBytes, plus (presumably) something
similar for the output string = probably more than will fit in a default heap on
many JVMs. The other questions are still relevant, though, unless you want to
just throw memory at the problem until it goes away (which might not be a bad
idea -- that depends too).
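
The arithmetic, spelled out as a sketch. It assumes character data is stored at 2 bytes per char (a Java char is a 16-bit UTF-16 code unit), ignores object headers, and uses made-up class and method names.

```java
// Hypothetical helper for back-of-the-envelope estimates only.
class StringFootprint {
    // Assumption: each char of a String occupies 2 bytes of heap.
    static final long BYTES_PER_CHAR = 2L;

    /** Bytes needed for the character data of a string of the given length. */
    static long charDataBytes(long charCount) {
        return charCount * BYTES_PER_CHAR;
    }

    /**
     * Lower bound for a replace operation that must hold the input and
     * the finished output at the same time. Intermediate buffers are
     * not counted.
     */
    static long replaceLowerBoundBytes(long inputChars, long outputChars) {
        return charDataBytes(inputChars) + charDataBytes(outputChars);
    }
}
```

For 16,000,000 characters in and about the same out, this gives 32,000,000 bytes for the input alone and 64,000,000 bytes for input plus output, before anything else in the application is counted.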

-- chris
 

John Guilbert

Guys,

I see both your points of view.

I am using Crystal Reports with Java. The guy who coded the existing
web app embeds the HTML returned from the Crystal Reports server (all
CR pages of the report) in one web page, rather than using a frame and
fetching one page at a time as he should. Crystal Reports does return
all the HTML to Java as it should. I got rid of the replaceAll calls
(they were used to tighten the formatting, as CR does not do a great
job), and it now generates this error:

Apache Tomcat/4.0.6 - Http Status 500 - Internal Server Error

Type: Exception Report

Message: Internal Server Error

Description: The server encountered an internal error(Internal Server
Error) that prevented it from fulfilling this request.

Exception:

javax.servlet.ServletException
at org.apache.jasper.runtime.PageContextImpl.handlePageException(PageContextImpl.java:471)
......
......
root cause

java.lang.OutOfMemoryError


Could it be that Apache needs more memory, not just the JVM?
My JVM cache has been set to 128MB.

Help Appreciated.

Thanks,

John.
 

John C. Bollinger

Chris said:
I wrote:

For many combinations of answers to these questions, I'd /expect/
replaceAll() to be able to run in roughly the space required by the input
string plus that required for the output, so it sounds as if something
odd's going on.


[Damn, hit the send button, /then/ think. Sigh...]

No, it's not odd. 16M characters is 32 MBytes, plus (presumably) something
similar for the output string = probably more than will fit in a default heap on
many JVMs.

But wait, there's more! Depending on the implementation and the number
of replacements to be performed, replaceAll could end up generating a
large number of String or char[] objects comparable in size to the
original (especially if the replacement text is larger than the typical
match to the pattern). A common strategy to reduce the number of such
additional objects would instead produce at least one that was much
larger than the original -- e.g. twice the size.

The other questions are still relevant, though, unless you want to
just throw memory at the problem until it goes away (which might not be a bad
idea -- that depends too).

Yes, the other questions are still relevant. I think, though, that
Michael Borgwardt's response is better general advice: don't attempt to
process data in monolithic chunks [my paraphrase]. Ignoring that advice
is likely to lead to problems later, even if the app can be twiddled to
work OK now.


John Bollinger
 

Chris Uppal

John said:
[...]

But wait, there's more! Depending on the implementation and the number
of replacements to be performed, replaceAll could end up generating a
large number of String or char[] objects comparable in size to the
original (especially if the replacement text is larger than the typical
match to the pattern).

Agreed. However I'd hope that the implementation didn't do that -- there's no
need to take more space than is needed by the sum of the input and output
streams. I think that the possibility that the implementation might be broken
in that sense, was what I had in mind when I said "something odd's going on"
(before reflecting on the sizes involved).

(Incidentally, the Sun implementation appears not to be broken in this sense --
at a quick glance, anyway.)

A common strategy to reduce the number of such
additional objects would instead produce at least one that was much
larger than the original -- e.g. twice the size.

Sorry, what strategy are you talking about here ?

The other questions are still relevant, though, unless you
want to just throw memory at the problem until it goes away (which
might not be a bad idea -- that depends too).

Yes, the other questions are still relevant. I think, though, that
Michael Borgwardt's response is better general advice: don't attempt to
process data in monolithic chunks [my paraphrase]. Ignoring that advice
is likely to lead to problems later, even if the app can be twiddled to
work OK now.

Agreed with that too, in general. There are specific exceptions, though, such
as one-off code, or when the 16M case is a wild outlier that almost never
happens, and it is not worth complicating the existing code (written for short
strings) when that code can (just) handle it. (It seems that the OP was not in
one of these situations.)

-- chris
 

John C. Bollinger

Chris said:
John C. Bollinger wrote:
But wait, there's more! Depending on the implementation and the number
of replacements to be performed, replaceAll could end up generating a
large number of String or char[] objects comparable in size to the
original (especially if the replacement text is larger than the typical
match to the pattern).
[...]
A common strategy to reduce the number of such
additional objects would instead produce at least one that was much
larger than the original -- e.g. twice the size.


Sorry, what strategy are you talking about here ?

The alternative to creating a new char[] (and possibly a new String
around it) at every replacement is to start by creating a new char[]
that is much larger than the original, and to bump up the size in
another large increment whenever the array becomes too small (thus
requiring creation of another new array). There are various ways to
choose the increment in this kind of scenario, but a common one is that
each new array be double the size of the previous (full) one. That
tends to minimize the number of new arrays (and accompanying array copy
operations) that are needed, while achieving passable memory
utilization (never less than 50%). If there is enough memory to do so,
a right-size array can be created at the end to hold the final result.
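
A minimal sketch of that doubling strategy, for illustration only. The class and its members are invented names, not the code of any real replaceAll implementation, and integer overflow for extremely large capacities is ignored.

```java
// Hypothetical illustration of a buffer that doubles when it fills up.
class DoublingCharBuffer {
    private char[] data;
    private int length;        // number of chars actually in use
    private int reallocations; // how many times a bigger array was created

    DoublingCharBuffer(int initialCapacity) {
        data = new char[Math.max(1, initialCapacity)];
    }

    /** Appends text, doubling the backing array until the text fits. */
    void append(CharSequence text) {
        int needed = length + text.length();
        if (needed > data.length) {
            int newCapacity = data.length;
            while (newCapacity < needed) {
                newCapacity *= 2;
            }
            char[] bigger = new char[newCapacity];
            System.arraycopy(data, 0, bigger, 0, length);
            data = bigger; // old and new arrays are both live during the copy
            reallocations++;
        }
        for (int i = 0; i < text.length(); i++) {
            data[length++] = text.charAt(i);
        }
    }

    int capacity() { return data.length; }

    int reallocations() { return reallocations; }

    /** Copies the used portion into a right-sized String. */
    String toRightSizedString() {
        return new String(data, 0, length);
    }
}
```

Note that while a reallocation is in progress the old array and the new, larger one exist side by side, and the final right-sized copy adds one more array of the result's size. That is where the extra memory beyond "input plus output" comes from.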


John Bollinger
 
