JNI question: where is the 'jstring' defined?

NOBODY

Hi,

I'm trying to read the char[] of a String from JNI code.
I got the jchar* from GetStringChars(..., &isCopy) for now, and of
course it works.

But isCopy comes back as JNI_TRUE, so a performance cost is knocking at the door...

I don't intend to violate the jstring object received, but I would love to
read its actual jchar* (with the offset and size attributes of String, of
course). I just can't find the stupid .h or .c file in the JVM source code.

Any idea where to look?
I'm trying to write a simple string transform (it creates an encoded/decoded
string).

Thanks.
 
Gordon Beaton

> I don't intend to violate the jstring object received, but I would love to
> read its actual jchar* (with the offset and size attributes of String, of
> course). I just can't find the stupid .h or .c file in the JVM source code.

jstring is a pointer to an opaque datatype, and is likely just one of
several aliases for jobject. If you really want to look inside it, you
need to find its "real" definition from the source code of your
specific JVM. If it's even available at all it will be a separate
download from the JDK itself.

/gordon
 
Roedy Green

> Any idea where to look?
> I'm trying to write a simple string transform (it creates an encoded/decoded
> string).

When you run javah, it will generate a file something like this:

/* DO NOT EDIT THIS FILE - it is machine generated */
#include <jni.h>

/* Header for class com_mindprod_pcclock_PCClock */

#ifndef _Included_com_mindprod_pcclock_PCClock
#define _Included_com_mindprod_pcclock_PCClock

#ifdef __cplusplus
extern "C" {
#endif

/* Inaccessible static: UTC */

/*
 * Class:     com_mindprod_pcclock_PCClock
 * Method:    nativeSetClock
 * Signature: (IIIIIII)I
 */
JNIEXPORT jint JNICALL Java_com_mindprod_pcclock_PCClock_nativeSetClock
  (JNIEnv *, jobject, jint, jint, jint, jint, jint, jint, jint);


#ifdef __cplusplus
}
#endif

#endif


I think you can begin your searches in jni.h
 
Gordon Beaton

> I think you can begin your searches in jni.h

Unfortunately for him, the relevant datatypes are completely opaque,
i.e. they are not publicly defined in the header files that come with
Sun's JDK, and I suspect the same is true of other common JDKs.

/gordon
 
Roedy Green

> Unfortunately for him, the relevant datatypes are completely opaque,
> i.e. they are not publicly defined in the header files that come with
> Sun's JDK, and I suspect the same is true of other common JDKs.

That suggests Sun is telling you the structures are free to change at
any time without warning. If you trace them and crack the structure,
your code may not work on any other release.
 
Chris Uppal

NOBODY said:
> I'm trying to read the char[] of a String from JNI code.
> I got the jchar* from GetStringChars(..., &isCopy) for now, and of
> course it works.
>
> But isCopy comes back as JNI_TRUE, so a performance cost is knocking at the door...

If you aren't already bothered by the cost of calling a JNI method from Java
(or a Java method from JNI) then I doubt if the cost of the copy is going to
bother you much. I'd guess that the string would have to be several thousand
characters long before the cost of the copy was greater than the overhead of a
JNI call.

> I don't intend to violate the jstring object received, but I would love to
> read its actual jchar* (with the offset and size attributes of String, of
> course). I just can't find the stupid .h or .c file in the JVM source
> code.

That information is not available (as Gordon has already said).

If you /really want/ to violate encapsulation, write fragile
implementation-dependent code, etc, etc, then you could access the String's
internal char[] value, int size, and int offset variables directly from JNI.
But then you'd have to use GetCharArrayElements() and there's no reason to
suppose that would be any quicker....

-- chris
 
NOBODY

> NOBODY said:
> > I'm trying to read the char[] of a String from JNI code.
> > I got the jchar* from GetStringChars(..., &isCopy) for now, and of
> > course it works.
> >
> > But isCopy comes back as JNI_TRUE, so a performance cost is knocking at the
> > door...
>
> If you aren't already bothered by the cost of calling a JNI method
> from Java (or a Java method from JNI) then I doubt if the cost of the
> copy is going to bother you much. I'd guess that the string would
> have to be several thousand characters long before the cost of the
> copy was greater than the overhead of a JNI call.


I ran many tests before asking.
The use cases are short vs. long strings (5 vs. ~100 chars) and strings that need
escaping vs. strings that do not (reserved chars are quotes,
CR, LF, TAB, BS, \, \z: you can guess that this looks like SQL escaping...
But don't get me sidetracked on prepared statements! That is another
game!)

The encoder first scans the string to find reserved chars (and counts
the extra space they would require). If there are none, it returns the original
jstring. Super fast. Otherwise, it mallocs a new jchar* buffer of the extended
size and loops again to copy and escape the input chars.

Here are the results. JNI call overhead is below 10% with ~100-char
strings (and most of our strings are long, usually above 100 chars). Note:
the Java version uses StringBuffer.append(char) on every char, tested in
a switch statement to find out whether it is reserved. So the Java version never
returns the original string. This is because I would have needed about 8
calls to String.indexOf(reservedChar) just to figure out whether the string needs
escaping. Pulling the char[] out of the String (to iterate the same way
I do in C) is not an option (its cost is as prohibitive as that of GetStringChars).
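
For reference, the Java version described above might look roughly like the following minimal sketch. The class name AppendEscaper and the method name escapeWithBuffer are hypothetical, the reserved set is taken from the description (with \z left out), and the real code may well differ. Because it always builds a new StringBuffer, it never hands back the original string.

```java
// Sketch only: hypothetical names, reconstructed from the prose description.
final class AppendEscaper {
    // Appends every char to a StringBuffer, prefixing reserved chars with a backslash.
    static String escapeWithBuffer(String input) {
        StringBuffer out = new StringBuffer(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            switch (ch) {
                case '\'': case '"': case '\\':
                case '\n': case '\r': case '\t':
                case '\b':
                    out.append('\\'); // escape prefix for a reserved char
                    break;
                default:
                    break;
            }
            out.append(ch); // the char itself is always copied
        }
        return out.toString(); // always a new String, even if nothing was escaped
    }

    public static void main(String[] args) {
        System.out.println(escapeWithBuffer("hel'lo"));
    }
}
```

The per-char append is what makes this version pay a copying cost even for strings with no reserved chars.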


(for 1'000'000 calls)

Short string that does not require escaping: "hello"
java: avg 2.99 us
jni: avg 7.56 us (2.53x slower)

Short string that requires escaping: "hel'lo"
java: avg 3.69 us
jni: avg 21.9 us (5.9x slower)

(from here, 100'000 calls)

Long string no escaping:
"aaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccc
ccddddddddddddddddddeeeeeeeeeeeeeeeeeeeee"
Java: avg 44.9 us
JNI: avg 17.2 us (2.61x FASTER)

Long string + escaping:
"aaaaa'aaaaaaaa'aaaaaaaa'bbbbbbbb'bbbbbbbbbb'bbbbbbbbb'ccccccccc'cccccccc
cc'ccccccc'ddddddd'dddd'ddddddd'eeeee'eeeeeeeeee'eeeeee"

Java: avg 54.0 us
JNI: avg 59.0 us (1.09x slower)


So, the benefit of JNI lies in the fact that it scales better than Java
for long strings, which compensates for its base call cost. That cost is
the JNI overhead plus the GetStringChars() copy of the original jstring, which I'm
trying to eliminate. That encoding is the hotspot of our app, invoked
a minimum of about 3000 times per second.

I have the JVM source. That's where I searched for the text 'jstring'.
It's too much to read, of course, so I'm asking those who might already
know.

Thanks.


 
Roedy Green

> If you /really want/ to violate encapsulation, write fragile
> implementation-dependent code, etc, etc, then you could access the String's
> internal char[] value, int size, and int offset variables directly from JNI.
> But then you'd have to use GetCharArrayElements() and there's no reason to
> suppose that would be any quicker....

If there were a generic, safe way to do that, surely Sun would have
used it. Even if you figure out a way, it will surely have a major
catch.
 
Chris Uppal

NOBODY said:
> The use cases are short vs. long strings (5 vs. ~100 chars) and strings that need
> escaping vs. strings that do not (reserved chars are quotes,
> CR, LF, TAB, BS, \, \z: you can guess that this looks like SQL escaping...

I'm not sure what you mean by \z so I've ignored it in my tests (below).

> But don't get me sidetracked on prepared statements! That is another
> game!)

OK, I'll skip the lecture on security, but if you are going to execute these
strings as SQL it still raises a couple of technical points. (1) since you can
obviously trust the supplier of this data, the supplier is presumably under
your control; if so then it might be easier for them to pre-escape the data for
you. (2) the cost of string copies, etc, will be /tiny/ compared with the cost
of parsing SQL, let alone executing it.

> The encoder first scans the string to find reserved chars (and counts
> the extra space they would require). If there are none, it returns the original
> jstring. Super fast. Otherwise, it mallocs a new jchar* buffer of the extended
> size and loops again to copy and escape the input chars.
>
> Here are the results. [snipped]

Thanks. Interesting numbers.

It seems that you are using an inefficient Java implementation, and an oddly
slow machine. My own experiments run more than an order of magnitude faster.
I don't know what kind of kit you are using, but I presume this is intended to
run on a server class machine. I would expect that to be a lot faster than my
laptop; even if it didn't have a faster clock speed (afaik, clock speed is not
a major factor for most servers) it still would have high spec-ed memory, large
caches, and high bandwidth between memory and CPU -- which is just what this
test needs.

I'm running JDK 1.5 on a 1.5 GHz WinXP laptop, using the -server flag and
allowing the JITer time to warm up before measuring. The JNI code was compiled
with MS VS.net 2003 with default optimisations for "release" mode.
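
A timing loop of the kind described (warm up first, then average over many calls) could be sketched as below. The class name EscapeTimer, the iteration counts, and the stand-in workload are all assumptions for illustration, not the harness actually used for these numbers. System.nanoTime() needs JDK 1.5 or later.

```java
// Sketch only: hypothetical harness, not the one used for the figures in this thread.
final class EscapeTimer {
    // Stand-in workload: counts the chars that would need escaping.
    static int countReserved(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            switch (s.charAt(i)) {
                case '\'': case '"': case '\\':
                case '\n': case '\r': case '\t': case '\b':
                    n++;
                    break;
                default:
                    break;
            }
        }
        return n;
    }

    // Returns the average nanoseconds per call, measured after a warm-up phase.
    static double averageNanos(String input, int warmupCalls, int timedCalls) {
        long sink = 0;
        for (int i = 0; i < warmupCalls; i++) {
            sink += countReserved(input); // give the JIT time to compile the hot path
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedCalls; i++) {
            sink += countReserved(input);
        }
        long elapsed = System.nanoTime() - start;
        if (sink < 0) {
            System.out.println(sink); // cannot be negative here; only keeps the result in use
        }
        return (double) elapsed / timedCalls;
    }

    public static void main(String[] args) {
        double avg = averageNanos("hel'lo", 1000000, 10000000);
        System.out.println("avg ns/call: " + avg);
    }
}
```

Without the warm-up loop, early interpreted iterations would be averaged in and inflate the per-call figure.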

I'll append my Java code at the end of this message. My JNI code is very
similar (I can post that too if you want). It uses the same algorithm as the
Java code; the only essential difference is that the JNI code has a small
optimisation to avoid the overhead of malloc()-ing a temporary buffer when
copying small strings. That saves around 0.1 usecs.

(All the following data averaged over at least 10'000'000 runs)

> Short string that does not require escaping: "hello"
> java: avg 2.99 us
> jni: avg 7.56 us (2.53x slower)

For me:
Java: 0.06 us
JNI: 0.47 us

> Short string that requires escaping: "hel'lo"
> java: avg 3.69 us
> jni: avg 21.9 us (5.9x slower)

For me:
Java: 0.24 us
JNI: 0.90 us

> Long string no escaping:
> "aaaaaaaaaaaaaaaaaaaaabbbbbbbbbbbbbbbbbbbbbbbbbbbcccccccccccccccccccccccc
> ccddddddddddddddddddeeeeeeeeeeeeeeeeeeeee"
> Java: avg 44.9 us
> JNI: avg 17.2 us (2.61x FASTER)

For me:
Java: 0.58 us
JNI: 0.92 us

> Long string + escaping:
> "aaaaa'aaaaaaaa'aaaaaaaa'bbbbbbbb'bbbbbbbbbb'bbbbbbbbb'ccccccccc'cccccccc
> cc'ccccccc'ddddddd'dddd'ddddddd'eeeee'eeeeeeeeee'eeeeee"
>
> Java: avg 54.0 us
> JNI: avg 59.0 us (1.09x slower)

For me:
Java: 3.07 us
JNI: 2.92 us

The above tests are a bit ad-hoc. More careful testing gives (note that the
following are in /nano/seconds):

Java (no escaping)
overhead: 23
nanos/char: 5

JNI (no escaping)
overhead: 438
nanos/char: 5

Java (10% escaped)
overhead: 102
nanos/char: 22

JNI (10% escaped)
overhead: 812
nanos/char: 21

(Note that all the strings used for the JNI test fell within the scope of my
malloc() optimisation, longer strings would have made JNI relatively slower)

You'll see that in the case where no copying needs to be done, the Java and JNI
versions run at the same speed, they only differ in the fixed overhead of a JNI
call. A very similar observation holds for the case where 10% of characters
have to be escaped. So there seems to be no benefit in using JNI even for very
long strings, since the curves don't cross. That's on my machine/JVM combo, of
course, other machines may differ.
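
The figures above amount to a simple linear model: time = overhead + nanos/char x length. A minimal sketch of that arithmetic for the no-escaping case, using the measured constants quoted above (the class and method names are hypothetical):

```java
// Sketch only: plugs the measured no-escaping constants into a linear cost model.
final class CostModel {
    // Estimated time in nanoseconds for one call on a string of the given length.
    static long estimateNanos(long overheadNanos, long nanosPerChar, int length) {
        return overheadNanos + nanosPerChar * length;
    }

    public static void main(String[] args) {
        int[] lengths = {5, 100, 1000, 32000};
        for (int i = 0; i < lengths.length; i++) {
            int n = lengths[i];
            long java = estimateNanos(23, 5, n);  // Java: 23 ns overhead, 5 ns/char
            long jni = estimateNanos(438, 5, n);  // JNI: 438 ns overhead, 5 ns/char
            System.out.println(n + " chars: Java " + java + " ns, JNI " + jni
                    + " ns, gap " + (jni - java) + " ns");
        }
    }
}
```

With equal per-char costs the gap stays at the 415 ns difference in fixed overhead, whatever the string length.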

> That encoding is the hotspot of our app, invoked
> a minimum of about 3000 times per second.

Since this machine can process even the slowest of your four example inputs in
around 3 usecs, it would be able to handle that workrate at < 1% CPU load
(3000 x 3 usecs is about 9 msecs of work per second).
Maybe you should replace your server with a two-year-old "ultra-portable"
laptop like mine ;-)

Come to think of it, even with the times you posted, and even if the entire
workload were the slowest case, that workrate would still only give about a 15%
CPU load... Is your profiler lying to you ?
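
The load estimates here are just calls per second multiplied by time per call. A small sketch of that arithmetic (hypothetical names; the 3000 calls/s, 3 us and 59.0 us inputs are the figures from this thread):

```java
// Sketch only: back-of-envelope CPU load from call rate and per-call time.
final class LoadEstimate {
    // Percentage of one CPU consumed by callsPerSecond calls of microsPerCall each.
    static double cpuLoadPercent(double callsPerSecond, double microsPerCall) {
        double busyMicrosPerSecond = callsPerSecond * microsPerCall;
        return busyMicrosPerSecond / 1000000.0 * 100.0;
    }

    public static void main(String[] args) {
        System.out.println("3 us/call:    " + cpuLoadPercent(3000, 3.0) + " %");
        System.out.println("59.0 us/call: " + cpuLoadPercent(3000, 59.0) + " %");
    }
}
```

At 3 us per call this comes to roughly 0.9% of one CPU, and at the slowest posted time of 59.0 us to roughly 18%.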

-- chris

======== code ===========
public String
escape(String input)
{
    // First pass: count how many extra chars the escaped copy needs.
    int length = input.length();
    int count = length;
    for (int i = 0; i < length; i++)
    {
        switch (input.charAt(i))
        {
        case '\'': case '"': case '\\':
        case '\n': case '\r': case '\t':
        case 0x08: // backspace
            count++;
        }
    }

    // Nothing to escape: return the original string without copying.
    if (count == length)
        return input;

    // Second pass: copy, putting a backslash before each reserved char.
    char[] b = new char[count];
    int pos = 0;
    for (int i = 0; i < length; i++)
    {
        char ch = input.charAt(i);
        switch (ch)
        {
        case '\'': case '"': case '\\':
        case '\n': case '\r': case '\t':
        case 0x08: // backspace
            b[pos++] = '\\';
        }
        b[pos++] = ch;
    }

    return new String(b);
}
 
NOBODY

> > If you /really want/ to violate encapsulation, write fragile
> > implementation-dependent code, etc, etc, then you could access the
> > String's internal char[] value, int size, and int offset variables
> > directly from JNI. But then you'd have to use GetCharArrayElements()
> > and there's no reason to suppose that would be any quicker....
>
> If there were a generic, safe way to do that, surely Sun would have
> used it.

Sun would use their own implementation secrets if they had to provide such
a custom encoder, I agree. But they didn't have to, and their implementation
of jstring is clearly kept obscure to prevent tampering with in-memory objects.
(Honestly, there are CPU instructions that can help implement
String.indexOf() and .equals() in much faster ways (on Intel, SSTOS and
SSCANS if I recall correctly, which sadly are no longer on the P4 for some
reason, but could be implemented in gate-array electronics and run 1024-char
loops in 1 CPU clock)... but now I'm drifting off topic!)

> Even if you figure out a way, it will surely have a major catch.

I cannot see one, since the jstring object is the mirror of an immutable
object, the JNI ref is valid and used only in that stack frame, and it
is only read. Thanks anyway. But still, where is the 'jstring' .c file?!!

:)
 
NOBODY

Thanks for your generous assistance.
I have not seen such attention for a long time.
My comments below.

> I'm not sure what you mean by \z so I've ignored it in my tests
> (below).

I believe it is EOL (end of line) or EOF (end of file); it is ASCII 26
anyway.

> (1) since you can obviously trust the supplier of this data, the
> supplier is presumably under your control; if so then it might be
> easier for them to pre-escape the data for you. (2) the cost of
> string copies, etc, will be /tiny/ compared with the cost of parsing
> SQL, let alone executing it.

For (1), the SQL assembly is under our control, yes. But some of the
sensitive strings are user-provided strings and so need escaping to
prevent SQL injection (the security reason you assumed). For (2), I have
little concern for code I cannot optimize (mysqld).

> It seems that you are using an inefficient Java implementation, and an
> oddly slow machine. My own experiments run more than an order of
> magnitude faster.

Don't worry about the JVM/machine. It is JDK 1.4.2_03 on a crappy dual
P2. But the test was also run on a dual Opteron 2.2 GHz with 4 GB of DDR RAM
at 333 MHz, and the proportions were similar, which is the important thing.

So, in a nutshell: JNI won't make it faster in most cases.
Got the point.

> Come to think of it, even with the times you posted, and even if the
> entire workload were the slowest case, that workrate would still only
> give about a 15% CPU load... Is your profiler lying to you ?

I'm not escaping such simple strings. They were only for test purposes.
We are escaping strings from 1 to 32k chars, and I mentioned that we are
processing a minimum of 3000/s. In some cases, we could see bursts of
20000/s. (Of course, that is on our dual Opteron...)
Don't pay attention to any absolute values or to the provisioning of the CPU.

Your Java code closely mirrors my C code. Interesting to note
that my brain stopped working at the sight of yet another String.charAt(),
which led me to believe that I needed many indexOf() calls...

Hehehe. Thanks a lot, Chris!
 
Roedy Green

> there are CPU instructions that can help implement
> String.indexOf() and .equals() in much faster ways (on Intel, SSTOS and
> SSCANS if I recall correctly, which sadly are no longer on the P4 for some
> reason, but could be implemented in gate-array electronics and run 1024-char
> loops in 1 CPU clock)... but now I'm drifting off topic!)

I would be astounded if those instructions disappeared. All kinds of
code would stop working. What you may have read is that they are not
implemented in hardware, but in microcode, so are not as fast as the
equivalent longhand mov instructions.
 
Chris Uppal

NOBODY said:
> I believe it is EOL (end of line) or EOF (end of file); it is ASCII 26
> anyway.

Control-Z. Used to be used as an end-of-file indicator by certain primitive
OSes that were too dumb even to know how long their own files were...

BTW, you perhaps also ought to check for 0x0 since some tools might take that
as end-of-string while others wouldn't. And that kind of disagreement over the
meaning of a string can be a godsend for a cracker.
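
Extending the reserved set to cover 0x00 and Control-Z (0x1A) might look like the sketch below. It assumes MySQL-style escape sequences, where NUL is written as \0 and Control-Z as \Z rather than as a backslash followed by the raw character. That mapping is an assumption about the target database (the thread only mentions mysqld in passing), and the class and method names are hypothetical.

```java
// Sketch only: two-pass escaper extended with NUL and Control-Z.
// Assumes MySQL-style \0 and \Z sequences; other databases may differ.
final class ExtendedEscaper {
    static boolean isReserved(char ch) {
        switch (ch) {
            case 0x00: case 0x1A:               // NUL and Control-Z
            case '\'': case '"': case '\\':
            case '\n': case '\r': case '\t': case '\b':
                return true;
            default:
                return false;
        }
    }

    static String escape(String input) {
        int length = input.length();
        int extra = 0;
        for (int i = 0; i < length; i++) {
            if (isReserved(input.charAt(i))) {
                extra++;
            }
        }
        if (extra == 0) {
            return input; // nothing to escape: hand back the original string
        }
        char[] b = new char[length + extra];
        int pos = 0;
        for (int i = 0; i < length; i++) {
            char ch = input.charAt(i);
            if (isReserved(ch)) {
                b[pos++] = '\\';
                if (ch == 0x00) {
                    ch = '0';      // NUL becomes the two chars \0
                } else if (ch == 0x1A) {
                    ch = 'Z';      // Control-Z becomes the two chars \Z
                }
            }
            b[pos++] = ch;
        }
        return new String(b);
    }

    public static void main(String[] args) {
        System.out.println(escape("a" + (char) 0x1A + "b"));
    }
}
```

Replacing the two control characters with printable codes avoids leaving a raw NUL in the output, which is the character tools are most likely to disagree about.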

> For (1), the SQL assembly is under our control, yes. But some of the
> sensitive strings are user-provided strings and so need escaping to
> prevent SQL injection (the security reason you assumed).

Just a thought, but if only some of the strings are "untrusted" (and if you
know which ones they are), can you save time by only escaping those ?

> Don't worry about the JVM/machine. It is JDK 1.4.2_03 on a crappy dual
> P2. But the test was also run on a dual Opteron 2.2 GHz with 4 GB of DDR RAM
> at 333 MHz, and the proportions were similar, which is the important thing.

Hmm, that /should/ be faster than my laptop. Puzzling....

> I'm not escaping such simple strings. They were only for test purposes.
> We are escaping strings from 1 to 32k chars, and I mentioned that we are
> processing a minimum of 3000/s. In some cases, we could see bursts of
> 20000/s. (Of course, that is on our dual Opteron...)

Ah, yes. 50 usec / job doesn't give you a lot of time to play with.

-- chris
 
NOBODY

> I would be astounded if those instructions disappeared. All kinds of
> code would stop working. What you may have read is that they are not
> implemented in hardware, but in microcode, so are not as fast as the
> equivalent longhand mov instructions.


Sounds familiar. I think you are right. My brain's memory lacks parity
bits...! But yeah, I think I remember something about it not being a 1-clock
operation anymore...
 
NOBODY

> It seems that you are using an inefficient Java implementation, and
> Hmm, that /should/ be faster than my laptop. Puzzling....

I really meant 'proportions'. On the dual Opteron (although only 1 CPU is
used, I guess), everything is about 20x faster (I can't remember the exact
ratio).

Thanks again.
 
