stat() help

rudy.martono

Hi,

I am writing a JNI function that receives a jstring filename and returns
the creation date, based on the stat function.

The issue is when I am supposed to handle a Unicode filename.
For example:
Εχ.txt ==> "\u0395\u03c7.txt"

Please correct me if I am wrong.

function header:
JNIEXPORT jlong JNICALL Java_getAccessedDate
(JNIEnv * env, jclass obj, jstring filename)

Using GetStringUTFRegion, I am able to translate the jstring
filename into UTF-8 format.

(*env)->GetStringUTFRegion(env, filename, 0, len, rtn);
where
filename is the parameter jstring
rtn is (char *)
and len is (*env)->GetStringLength(env, filename)

When I print it out, it looks like it gives the right value.
But when I double-check the value by calling fopen to see whether the file
exists, it returns NULL.
Therefore, I assume stat will return 0, but it returns 724466048.

I am still not familiar with Unicode or UTF-8.
Does UTF-8 need 2 bytes per character?
If that is true, then I should use wchar_t instead of char, _wfopen (to
detect whether the file exists), and _wstat (to get the file's info).

Thank you,

Rudy
 
rudy.martono

I set a flag if stat returns -1.
So it looks like the problem is coming from the translation.
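
For reference, stat reports success or failure through its return value (0 or -1) and writes the timestamps into the struct stat it is given, so a number such as 724466048 probably comes from reading a struct stat that was never filled in rather than from stat itself. A minimal sketch of the check described above, assuming a narrow char path; get_access_time is a hypothetical helper name, not something from the original code:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical helper: returns 0 and stores the last-access time in *out
 * when stat() succeeds; returns -1 and leaves *out untouched when it fails. */
int get_access_time(const char *path, time_t *out)
{
    struct stat st;

    assert(path != NULL && out != NULL);

    /* stat() returns 0 on success and -1 on failure (errno says why). */
    if (stat(path, &st) != 0) {
        perror(path);
        return -1;
    }

    /* The timestamps live in the struct, not in the return value.
     * st_mtime and st_ctime are read the same way. */
    *out = st.st_atime;
    return 0;
}
```

Note that on Windows st_ctime holds the creation time, whereas on Unix-like systems it is the time of the last status change.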
 
Roland de Ruiter

UTF-8 is a variable-length character encoding requiring 1, 2, 3 or 4
bytes per character. See <http://en.wikipedia.org/wiki/UTF-8>.

JNI however uses a so-called modified form of UTF-8, which, among other
differences, only uses 1, 2 or 3 bytes per character. See
<http://java.sun.com/j2se/1.5.0/docs/guide/jni/spec/types.html#wp16542>
<http://java.sun.com/j2se/1.5.0/docs/api/java/io/DataInput.html#modified-utf-8>

The UTF-8 bytes of the string Εχ.txt are (hexadecimal notation):
ce 95 | cf 87 | 2e | 74 | 78 | 74
Ε \u0395: 2 bytes: ce 95
χ \u03c7: 2 bytes: cf 87
. \u002e: 1 byte: 2e
t \u0074: 1 byte: 74
x \u0078: 1 byte: 78
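
The byte values listed above can be reproduced with a small encoder. This is only a sketch of standard UTF-8 (not JNI's modified form, which differs for U+0000 and for characters above U+FFFF); utf8_encode is a hypothetical helper name, and it does not reject surrogate code points:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encodes one code point (at most U+10FFFF) as
 * standard UTF-8 into out[0..3] and returns the number of bytes written. */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    assert(cp <= 0x10FFFFUL);

    if (cp < 0x80) {                       /* 1 byte:  0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    /* 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

For Εχ.txt this gives 2 + 2 + 1 + 1 + 1 + 1 = 8 bytes for a string of 6 characters, so a buffer sized only from GetStringLength would be too small to hold the UTF-8 form.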

Probably stat/wstat and fopen/wfopen expect a fixed size char as
filename parameter. Which encoding do they expect?
 
Bill Medland

Since you mention _wfopen and _wstat, you are presumably talking about a
Microsoft Windows platform. As far as I know, Windows does not normally use
UTF-8 for filenames. Your best bet, on Windows, would probably be to use
the wide-character functions and GetStringChars.

(Subtle complication: if you are not on Windows, then watch out for jchar
possibly not matching wchar_t, which might well be 4 bytes wide.)
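
The size mismatch mentioned above can be handled by converting unit by unit instead of casting the pointer. The following is a rough sketch, not tested against a real JVM: utf16_unit is a stand-in for jchar (an unsigned 16-bit type) so that the code compiles without jni.h, and utf16_to_wcs is a hypothetical helper name. Where wchar_t is wider than 16 bits it also joins surrogate pairs into one character.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <wchar.h>

/* Stand-in for JNI's jchar so the sketch builds without jni.h. */
typedef uint16_t utf16_unit;

/* Hypothetical helper: copies len UTF-16 units into a newly allocated,
 * NUL-terminated wchar_t string. Returns NULL if allocation fails.
 * The caller must free() the result. */
wchar_t *utf16_to_wcs(const utf16_unit *src, size_t len)
{
    wchar_t *dst;
    size_t i, n = 0;

    assert(src != NULL || len == 0);

    dst = malloc((len + 1) * sizeof *dst);
    if (dst == NULL)
        return NULL;

    for (i = 0; i < len; i++) {
        unsigned long c = src[i];

        /* With a 32-bit wchar_t, merge a high/low surrogate pair into a
         * single code point; with a 16-bit wchar_t, copy units as they are. */
        if (sizeof(wchar_t) > 2 && c >= 0xD800 && c <= 0xDBFF &&
            i + 1 < len && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            c = 0x10000UL + ((c - 0xD800UL) << 10) + (src[i + 1] - 0xDC00UL);
            i++;
        }
        dst[n++] = (wchar_t)c;
    }
    dst[n] = L'\0';   /* GetStringChars does not promise a terminator */
    return dst;
}
```

In a JNI function the source pointer and length would come from GetStringChars and GetStringLength, and on Windows the result could then be handed to _wstat or _wfopen.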
 
rudy.martono

Well,

I am not sure about the encoded part. The filename can be anything.
Basically I want to be able to retrieve the date created from it.

Is it correct to convert the jstring filename into wide characters
every time, and use _wstat to get the date created?

I have changed the code so that it uses GetStringChars( env,
filename, NULL )
to get the Unicode value instead of GetStringUTFChars.

const jchar *file = (*env)->GetStringChars(env, filename, NULL);

and use the WideCharToMultiByte function:

WideCharToMultiByte(CP_ACP, 0, (LPCWSTR)file,
                    (*env)->GetStringLength(env, filename) * 2,
                    new_filename,
                    (*env)->GetStringLength(env, filename) * 2 + 1,
                    NULL, NULL);

I tested it with sampletest_ù.txt, and it works.

When I test it again with Εχ.txt, I get ΕÇ.txt.

Thank you,

Rudy
 
rudy.martono

I think I have found the solution.
Someone posted the same question, and the solution is to use memcpy to
copy the characters from the jchar* buffer into a wchar_t buffer.

I will do more testing and post the result.

Thank you,

Rudy
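
A minimal sketch of the memcpy approach described above, under two assumptions that are not stated in the thread: the copy is only valid where wchar_t is 16 bits wide (as on Windows), and the terminator has to be added by hand because GetStringChars does not promise one. uint16_t stands in for jchar so the code builds without jni.h, and wcs_from_units_memcpy is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: memcpy-based copy of len 16-bit units into a new,
 * NUL-terminated wchar_t buffer. Returns NULL when wchar_t is not 16 bits
 * wide (a raw copy would scramble the text) or when allocation fails.
 * The caller must free() the result. */
wchar_t *wcs_from_units_memcpy(const uint16_t *src, size_t len)
{
    wchar_t *dst;

    assert(src != NULL || len == 0);

    /* A byte-for-byte copy only makes sense if the element sizes match. */
    if (sizeof(wchar_t) != sizeof(uint16_t))
        return NULL;

    dst = malloc((len + 1) * sizeof *dst);
    if (dst == NULL)
        return NULL;

    if (len > 0)
        memcpy(dst, src, len * sizeof *dst);
    dst[len] = L'\0';   /* terminate explicitly */
    return dst;
}
```

On a platform with a 4-byte wchar_t this helper deliberately returns NULL, and a unit-by-unit conversion is needed instead.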
 
