NOBODY said:
The use cases are short vs long strings (5 vs ~100 chars) and strings that need
escaping vs strings that do not need escaping (reserved chars are quotes,
CR, LF, TAB, BS, \, \z: you can guess that looks like sql escaping....
I'm not sure what you mean by \z so I've ignored it in my tests (below).
But don't get me sidetracked on prepared statements! That is another
game!)
OK, I'll skip the lecture on security, but if you are going to execute these
strings as SQL it still raises a couple of technical points. (1) since you can
obviously trust the supplier of this data, the supplier is presumably under
your control; if so then it might be easier for them to pre-escape the data for
you. (2) the cost of string copies, etc, will be /tiny/ compared with the cost
of parsing SQL, let alone executing it.
The encoder scans the strings first to find reserved chars (and counts
the extra space it would require). If none, return the original jstring.
Super fast. Otherwise, it mallocs a new jchar* of the extended size
and loops again to copy & escape the input chars.
Here are the results. [snipped]
Thanks. Interesting numbers.
It seems that you are using an inefficient Java implementation, and an oddly
slow machine. My own experiments run more than an order of magnitude faster.
I don't know what kind of kit you are using, but I presume this is intended to
run on a server class machine. I would expect that to be a lot faster than my
laptop; even if it didn't have a faster clock speed (afaik, clock speed is not
a major factor for most servers) it would still have high-spec memory, large
caches, and high bandwidth between memory and CPU -- which is just what this
test needs.
I'm running JDK 1.5 on a 1.5 GHz WinXP laptop, using the -server flag and
allowing the JITer time to warm up before measuring. The JNI code was compiled
with MS VS.net 2003 with default optimisations for "release" mode.
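For anyone wanting to repeat the measurement, the timing loop can be sketched roughly as follows. This is only an illustration of the warm-up-then-measure pattern, not the harness used for the numbers in this post: the class EscapeTimer, the Escaper interface and the run counts in main() are made up for the example, and the identity "escaper" is just a placeholder to be swapped for the real Java or JNI method.

```java
class EscapeTimer {
    /** Hypothetical stand-in for whichever implementation is being timed. */
    interface Escaper {
        String escape(String input);
    }

    /** Returns the average cost of one call, in nanoseconds. */
    static double averageNanos(Escaper e, String input, int warmupRuns, int timedRuns) {
        int sink = 0;

        // Untimed warm-up so the JIT gets a chance to compile the hot path.
        for (int i = 0; i < warmupRuns; i++) {
            sink += e.escape(input).length();
        }

        // Timed section.
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            sink += e.escape(input).length();
        }
        long elapsed = System.nanoTime() - start;

        // Use the accumulated value so the loops cannot be optimised away.
        if (sink == Integer.MIN_VALUE) {
            System.out.println(sink);
        }
        return (double) elapsed / timedRuns;
    }

    public static void main(String[] args) {
        // Placeholder implementation: returns its input unchanged.
        Escaper identity = new Escaper() {
            public String escape(String s) { return s; }
        };
        double ns = averageNanos(identity, "hello", 1000000, 10000000);
        System.out.println("avg " + ns + " ns per call");
    }
}
```

Run it with the -server flag (java -server EscapeTimer) to match the setup described above. The absolute figures will of course depend on the machine and JVM.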
I'll append my Java code at the end of this message. My JNI code is very
similar (I can post that too if you want). It uses the same algorithm as the
Java code; the only essential difference is that the JNI code has a small
optimisation to avoid the overhead of malloc()-ing a temporary buffer when
copying small strings. That saves around 0.1 usecs.
(All the following data averaged over at least 10'000'000 runs)
Short string that does not require escaping: "hello"
java: avg 2.99 us
jni: avg 7.56 us (2.53x slower)
For me:
Java: 0.06 us
JNI: 0.47 us
Short string that requires escaping: "hel'lo"
java: avg 3.69 us
jni: avg 21.9 us (5.9x slower)
For me:
Java: 0.24 us
JNI: 0.90 us
Long string no escaping:
"aaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccc
ccddddddddddddddddddeeeeeeeeeeeeeeeeeeeee"
Java: avg 44.9 us
JNI: avg 17.2 us (2.61x FASTER)
For me:
Java: 0.58 us
JNI: 0.92 us
Long string + escaping:
"aaaaa'aaaaaaaa'aaaaaaaa'bbbbbbbb'bbbbbbbbbb'bbbbbbbbb'ccccccccc'cccccccc
cc'ccccccc'ddddddd'dddd'ddddddd'eeeee'eeeeeeeeee'eeeeee"
Java: avg 54.0 us
JNI: avg 59.0 us (1.09x slower)
For me:
Java: 3.07 us
JNI: 2.92 us
The above tests are a bit ad-hoc. More careful testing gives (note that the
following are in /nano/seconds):
Java (no escaping)
overhead: 23
nanos/char: 5
JNI (no escaping)
overhead: 438
nanos/char: 5
Java (10% escaped)
overhead: 102
nanos/char: 22
JNI (10% escaped)
overhead: 812
nanos/char: 21
(Note that all the strings used for the JNI test fell within the scope of my
malloc() optimisation; longer strings would have made JNI relatively slower)
You'll see that in the case where no copying needs to be done, the Java and JNI
versions run at the same speed per character; they differ only in the fixed
overhead of a JNI call. A very similar observation holds for the case where 10%
of characters have to be escaped. So there seems to be no benefit in using JNI
even for very long strings, since the curves don't cross. That's on my
machine/JVM combo, of course; other machines may differ.
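To make the "overhead plus nanos/char" figures concrete, here is a small sketch that treats each case as a straight line, time = overhead + nanos/char * length, and plugs in the no-escaping numbers quoted above. The class and method names are invented for the example, and the straight-line model itself is an assumption about how the two figures combine.

```java
class CostModel {
    /** Predicted time in nanoseconds: fixed call overhead plus a per-character cost. */
    static long predictedNanos(long overheadNanos, long nanosPerChar, int length) {
        return overheadNanos + nanosPerChar * length;
    }

    public static void main(String[] args) {
        int[] lengths = { 5, 100, 1000 };
        for (int n : lengths) {
            long java = predictedNanos(23, 5, n);   // Java, no escaping
            long jni = predictedNanos(438, 5, n);   // JNI, no escaping
            System.out.println(n + " chars: java " + java + " ns, jni " + jni
                    + " ns, gap " + (jni - java) + " ns");
        }
    }
}
```

For a 100-character string this predicts 523 ns for Java and 938 ns for JNI, in the same region as the 0.58 and 0.92 us measured for the long unescaped string. Because both lines have the same slope, the gap stays at 415 ns whatever the length. The 10% escaped figures can be plugged in the same way.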
That encoding is the hotspot of our app (invoked
a minimum of about 3000 times per second).
Since this machine can process even the slowest of your four example inputs in
around 3 usecs, it would be able to handle that workrate at < 1% CPU load
(3000 x 3 usecs is only 9 ms of work per second).
Maybe you should replace your server with a two-year-old "ultra-portable"
laptop like mine ;-)
Come to think of it, even with the times you posted, and even if the entire
workload were the slowest case, that workrate would still only give about an
18% CPU load (3000 x 59 usecs)... Is your profiler lying to you?
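The load arithmetic is just calls per second multiplied by time per call. A throwaway sketch, with made-up class and method names, and assuming a single CPU doing nothing but escaping:

```java
class LoadEstimate {
    /** Percentage of one CPU used by the given call rate and per-call cost. */
    static double cpuLoadPercent(double callsPerSecond, double microsPerCall) {
        double busySecondsPerSecond = callsPerSecond * microsPerCall / 1000000.0;
        return busySecondsPerSecond * 100.0;
    }

    public static void main(String[] args) {
        // 3000 calls/sec at roughly 3 us each (slowest case on the laptop).
        System.out.println("laptop: " + cpuLoadPercent(3000, 3.0) + " %");
        // 3000 calls/sec at 59 us each (slowest of the posted times).
        System.out.println("posted: " + cpuLoadPercent(3000, 59.0) + " %");
    }
}
```

That works out to roughly 0.9% for the 3 us case and roughly 17.7% for the 59 us case.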
-- chris
======== code ===========
public String
escape(String input)
{
    int length = input.length();

    // first pass: work out how many chars the escaped result will need
    int count = length;
    for (int i = 0; i < length; i++)
    {
        switch (input.charAt(i))
        {
        case '\'': case '"': case '\\':
        case '\n': case '\r': case '\t':
        case 0x08:    // BS
            count++;
        }
    }

    // nothing to escape, so return the original without copying
    if (count == length)
        return input;

    // second pass: copy, putting a backslash before each reserved char
    char[] b = new char[count];
    int pos = 0;
    for (int i = 0; i < length; i++)
    {
        char ch = input.charAt(i);
        switch (ch)
        {
        case '\'': case '"': case '\\':
        case '\n': case '\r': case '\t':
        case 0x08:    // BS
            b[pos++] = '\\';
        }
        b[pos++] = ch;
    }
    return new String(b);
}