How do I do a LOT of non-regular-


Stryder

I'm trying to do a lot (several hundred) of replacements in a string.
These are just string replacements I'm trying to do, not regular
expressions. Here's the WRONG way to do it, but hopefully it tells
you what it is I'm trying to do...

static String unescapeString(String string) {
    Iterator i = entitiesHashMap.keySet().iterator();
    for (String key : entitiesHashMap.keySet()) {
        string = string.replaceAll(key, (String) entitiesHashMap.get(key));
    }

    return string;
}

entitiesHashMap is a HashMap with literally hundreds of entries.

Any help would be greatly appreciated.
 

Andreas Leitgeb

Stryder said:
I'm trying to do a lot (several hundred) of replacements in a string.
These are just string replacements I'm trying to do, not regular
expressions. Here's the WRONG way to do it, but hopefully it tells
you what it is I'm trying to do...
static String unescapeString(String string) {
    Iterator i = entitiesHashMap.keySet().iterator();
    for (String key : entitiesHashMap.keySet()) {
        string = string.replaceAll(key,
                (String) entitiesHashMap.get(key));
    }
    return string;
}
entitiesHashMap is a HashMap with literally hundreds of entries.

For just hundreds of elements in the map, your approach isn't all that
bad, except that it may re-replace results from earlier replacements.
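To make the re-replacement problem concrete, here is a minimal sketch. The two-entry table and the class name ReReplaceDemo are made up for illustration, and a LinkedHashMap is used only so that the iteration order is predictable; with a plain HashMap the order is unspecified, so the bug may or may not show up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReReplaceDemo {
    // Hypothetical entity table; LinkedHashMap keeps insertion order.
    static final Map<String, String> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("&amp;", "&");
        ENTITIES.put("&lt;", "<");
    }

    // Same shape as the loop in the original post: one full pass per key.
    static String sequentialUnescape(String s) {
        for (Map.Entry<String, String> e : ENTITIES.entrySet()) {
            s = s.replace(e.getKey(), e.getValue());
        }
        return s;
    }

    public static void main(String[] args) {
        // The input encodes the literal text "&lt;".
        // Pass 1 turns "&amp;" into "&", which leaves "&lt;";
        // pass 2 then turns that into "<", which is not what was encoded.
        System.out.println(sequentialUnescape("&amp;lt;")); // prints "<"
    }
}
```

Whether this matters depends on the data: if no replacement value can ever form part of another key, the sequential loop is safe.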

If you know the lengths of the longest and shortest keys in the
HashMap, then you could check all feasible substrings in the
HashMap, like:

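for (int startIdx = 0; startIdx < string.length(); startIdx++) {
    // Never look past the end of the string.
    int maxLen = Math.min(lenLongestKey, string.length() - startIdx);
    // Try the longest candidate first so that longer keys win.
    for (int len = maxLen; len >= lenShortestKey; len--) {
        // substring() takes (beginIndex, endIndex), not (beginIndex, length).
        String sub = string.substring(startIdx, startIdx + len);
        // Java's Map has containsKey(), not count().
        if (entitiesHashMap.containsKey(sub)) {
            String replacementString = entitiesHashMap.get(sub);
            string = string.substring(0, startIdx)
                    + replacementString
                    + string.substring(startIdx + len);
            // Skip the inserted text so it is not scanned again.
            startIdx += replacementString.length() - 1;
            break;
        }
    }
}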

If you don't know these min/max lengths, you can
1) obtain them from iterating the keySet(), or
2) you can let the inner loop always go
from string.length()-startIdx down to 1

If the strings are rather short, and the Map very very large (rather
in the range of hundreds of thousands, than just hundreds) and you
also don't want to maintain these min/max lengths together with the
map, then "2)" wins.

PS: you can optimize away the separate existence check before the get(),
by just checking the replacementString for null.
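For reference, the scan described above can also be written so that it builds the result in a StringBuilder instead of re-slicing the string on every hit. This is only a sketch under a few assumptions: the table is a Map<String, String>, the min/max key lengths are computed from keySet() on each call (option 1 above), and the names SinglePassReplacer and replaceAllKeys are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class SinglePassReplacer {
    // Replaces every key of 'table' found in 'input' in one left-to-right pass.
    // At each position the longest matching key wins, and text that has been
    // written to the output is never scanned again.
    static String replaceAllKeys(String input, Map<String, String> table) {
        int shortest = Integer.MAX_VALUE;
        int longest = 0;
        for (String key : table.keySet()) {
            if (key.isEmpty()) continue; // an empty key can never be matched here
            shortest = Math.min(shortest, key.length());
            longest = Math.max(longest, key.length());
        }
        if (longest == 0) {
            return input; // no usable keys
        }

        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            String replacement = null;
            int matched = 0;
            int maxLen = Math.min(longest, input.length() - i);
            for (int len = maxLen; len >= shortest; len--) {
                replacement = table.get(input.substring(i, i + len));
                if (replacement != null) {
                    matched = len;
                    break;
                }
            }
            if (replacement != null) {
                out.append(replacement);
                i += matched;
            } else {
                out.append(input.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> table = new HashMap<>();
        table.put("&amp;", "&");
        table.put("&lt;", "<");
        table.put("&gt;", ">");
        // prints "a < b && &lt;"
        System.out.println(replaceAllKeys("a &lt; b &amp;&amp; &amp;lt;", table));
    }
}
```

Note that "&amp;lt;" comes out as "&lt;" here, because the "&" that was just written is not looked at again.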
 

Albert

Stryder wrote:
I'm trying to do a lot (several hundred) of replacements in a string.
These are just string replacements I'm trying to do, not regular
expressions. Here's the WRONG way to do it, but hopefully it tells
you what it is I'm trying to do...

static String unescapeString(String string) {
    Iterator i = entitiesHashMap.keySet().iterator();
    for (String key : entitiesHashMap.keySet()) {
        string = string.replaceAll(key, (String) entitiesHashMap.get(key));
    }

    return string;
}

entitiesHashMap is a HashMap with literally hundreds of entries.

Any help would be greatly appreciated.

You could use String.replace(CharSequence target, CharSequence replacement),
available since JDK 1.5, and also loop over the entrySet instead of the keySet.
 

Mark Space

Stryder wrote:
... snip ...
expressions. Here's the WRONG way to do it, but hopefully it tells
you what it is I'm trying to do...
... snip ...
return string;


I think if you use "return new String( string );" here it will gather up
all the temporary objects made during replacement and create a new
string with one single contiguous memory block. That should release a
lot of memory.

Other than that, I don't see a fundamental error in what you are doing.
Can you tell us WHY you think this is the wrong way to do it. Memory
problems? Performance problems? Thread safety? Incorrect results?
You haven't really told us anything.
 

charlesbos73

Can you tell us WHY you think this is the wrong way to do it. Memory
problems? Performance problems? Thread safety? Incorrect results?
You haven't really told us anything.

OP: These are just string replacements I'm trying to do, *not regular
OP: expressions*.

OP: string = string.*replaceAll*(key, ...

From what I understood from what I've enclosed inside '*', it's not
solving his problem for he wants to do non-regex string replacement.

Albert pointed him to "@Since 1.5"'s String:

replace(CharSequence target, CharSequence replacement)
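
A small sketch of the difference, in case it helps. The class and method names are invented for the example; the String, Pattern and Matcher calls are the standard JDK ones. The third method shows one way to get literal behaviour out of replaceAll() by quoting both arguments.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceVsReplaceAll {
    // replaceAll(): 'target' is a regular expression, and '$' and '\'
    // have special meaning inside 'replacement'.
    static String regex(String s, String target, String replacement) {
        return s.replaceAll(target, replacement);
    }

    // replace(CharSequence, CharSequence): both arguments are taken literally.
    static String literal(String s, String target, String replacement) {
        return s.replace(target, replacement);
    }

    // replaceAll() made literal by quoting both arguments.
    static String quotedRegex(String s, String target, String replacement) {
        return s.replaceAll(Pattern.quote(target),
                            Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(regex("a.b", ".", "!"));         // prints "!!!"
        System.out.println(literal("a.b", ".", "!"));       // prints "a!b"
        System.out.println(quotedRegex("a.b", ".", "$1"));  // prints "a$1b"
    }
}
```

With keys such as "&amp;" the two methods happen to agree, since none of those characters is a regex metacharacter, but any key or value containing ".", "$", "\" or similar will behave differently.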
 

Mark Space

charlesbos73 said:
replace(CharSequence target, CharSequence replacement)


Oh. I read the OP's post as "I don't need REGEX but they're fine, it's
working as is."

So this is an RTFM issue?

So Albert was pointing him at the FM, not trying to speed things up a tad.
 

Daniel Pitts

Mark said:
Stryder wrote:
... snip ...


I think if you use "return new String( string );" here it will gather up
all the temporary objects made during replacement and create a new
string with one single contiguous memory block. That should release a
lot of memory.

Uh, no.
replace() will return either the same String or a new String, so there is
no need to re-allocate and copy afterwards.
 

Albert

Mark Space wrote:
Oh. I read the OP's post as "I don't need REGEX but they're fine, it's
working as is."

So this is an RTFM issue?

So Albert was pointing him at the FM, not trying to speed things up a tad.

Well, you might guess that searching for a regexp match is slower than
searching for a CharSequence...

And using entrySet avoids calling get(), so there is no extra lookup to do.
 

Stryder

Here's what I settled on. Thanks for all the help! The CharSequences
sped things up the most, and using an Entry set helped also. I'm now
just a tiny bit less of a newbie 8^)

static String unescapeString(String inputString) {
    Set<Map.Entry<CharSequence, CharSequence>> set =
            entitiesHashMap.entrySet();

    for (Map.Entry<CharSequence, CharSequence> me : set) {
        inputString = inputString.replace(me.getKey(), me.getValue());
    }

    return inputString;
}
 

Mark Space

Daniel said:
Uh, no.
replace() will return either the same String or a new String, so there is
no need to re-allocate and copy afterwards.


I did a quick check. replace() just calls replaceAll(), which I thought
was interesting since everyone seemed to be assuming that replace()
didn't use regex or something (I assumed that, at least).

replace() does call Pattern.compile() and Matcher.quoteReplacement(), so
that's how it achieves its "regex free" behavior.

replaceAll() uses a StringBuffer to compose the new String, so you're
correct, there's no need to call "new String()". Sorry about the bum
steer. Although I wonder about the need for synchronization, the
StringBuffer exists only on the stack.
 

Lew

Stryder said:
Here's what I settled on. Thanks for all the help! The CharSequences

Please do not top-post.

sped things up the most, and using an Entry set helped also. I'm now
just a tiny bit less of a newbie 8^)

static String unescapeString(String inputString) {
    Set<Map.Entry<CharSequence, CharSequence>> set =
            entitiesHashMap.entrySet();

    for (Map.Entry<CharSequence, CharSequence> me : set) {

It's slightly more compact just to put the 'entrySet()' call after the colon
in lieu of declaring a separate variable for it, but it does no harm to
declare the variable.

        inputString = inputString.replace(me.getKey(), me.getValue());
    }

    return inputString;
}

This approach creates a lot of intermediate String objects. If you use a
StringBuilder you can avoid the intermediate objects at the expense of rather
more complex code.

Unless String objects represented a huge burden, I would usually prefer the
way you did it.
 

Andreas Leitgeb

This approach creates a lot of intermediate String objects. If you use a
StringBuilder you can avoid the intermediate objects at the expense of rather
more complex code.

For StringBuilder, .replace takes a start and an end index rather than
a needle word. Yes, it would probably be faster, but he'd have to
write a loop around StringBuilder's indexOf to get all such replacements done.

Unless String objects represented a huge burden, I would usually prefer the
way you did it.

That's fine only if he doesn't care that something like \\n could be
mistranslated into backslash-newline (assuming some common way of
unescaping) rather than into backslash-'n', just because the '\n'
translation may come before the '\\' translation in the map's iteration
order. Unless that isn't a problem, I'd rather step through the string
char by char and look for replaceable substrings (as in my other followup
to the original posting).
 

Lew

Andreas said:
For StringBuilder, the .replace takes start and end index, rather than
a needle-word. yes, it would probably be faster, but he'd have to
make a loop with SB's indexOf, to get all such replacements done.

That qualifies as "rather more complex code."
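
For anyone curious what that loop around indexOf might look like, here is a rough sketch. The class and method names are made up, and the table is assumed to be a Map<String, String>. It avoids creating one String per key, but every StringBuilder.replace() call still shifts the tail of the buffer, and it has the same ordering problem as the String version, so treat it as an illustration rather than a guaranteed speed-up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BuilderReplace {
    // Replaces every literal occurrence of 'target' inside 'sb', in place.
    static void replaceAll(StringBuilder sb, String target, String replacement) {
        if (target.isEmpty()) {
            return; // an empty needle would match everywhere
        }
        int from = 0;
        int idx;
        while ((idx = sb.indexOf(target, from)) >= 0) {
            sb.replace(idx, idx + target.length(), replacement);
            // Continue after the inserted text so it is not matched again
            // by the same key.
            from = idx + replacement.length();
        }
    }

    // One pass per table entry, all on the same buffer.
    static String unescape(String input, Map<String, String> table) {
        StringBuilder sb = new StringBuilder(input);
        for (Map.Entry<String, String> e : table.entrySet()) {
            replaceAll(sb, e.getKey(), e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> table = new LinkedHashMap<>();
        table.put("&lt;", "<");
        table.put("&gt;", ">");
        System.out.println(unescape("1 &lt; 2 &gt; 0", table)); // prints "1 < 2 > 0"
    }
}
```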
 
