stat() help

Discussion in 'Java' started by rudy.martono@gmail.com, Aug 4, 2006.

  1. Guest

    Hi,

    I am writing a JNI function that receives a jstring filename and returns
    the creation date based on the stat function.

    The issue is when I am supposed to handle a Unicode filename.
    For example:
    Εχ.txt ==> "\u0395\u03c7.txt"

    Please correct me if I am wrong.

    function header:
    JNIEXPORT jlong JNICALL Java_getAccessedDate
    (JNIEnv * env, jclass obj, jstring filename)

    Using GetStringUTFRegion, I am able to translate the jstring
    filename into UTF-8 format.

    (*env)->GetStringUTFRegion(env, filename, 0, len, rtn);
    where
    filename is the parameter jstring
    rtn is (char *)
    and len is (*env)->GetStringLength(env, filename)

    When I print it out, it looks like it gives the right value.
    But when I double-check the value by using fopen to see whether the file
    exists, it returns NULL.
    Therefore, I assume stat will return 0, but it returns 724466048.

    I am still not familiar with Unicode or UTF-8.
    Does UTF-8 need 2 bytes per character?
    If that is true, then I should use wchar_t instead of char, _wfopen (to
    detect whether the file exists), and _wstat (to get the file's info).

    Thank you,

    Rudy
    , Aug 4, 2006
    #1

  2. Guest

    I set a flag if stat returns -1.
    So it looks like the problem is coming from the translation.



    , Aug 4, 2006
    #2

  3. Roland de Ruiter

    UTF-8 is a variable-length character encoding requiring 1, 2, 3 or 4
    bytes per character. See <http://en.wikipedia.org/wiki/UTF-8>.

    JNI however uses a so-called modified form of UTF-8, which, among other
    differences, only uses 1, 2 or 3 bytes per 16-bit Java char (a
    supplementary character is written as two 3-byte surrogates). See
    <http://java.sun.com/j2se/1.5.0/docs/guide/jni/spec/types.html#wp16542>
    <http://java.sun.com/j2se/1.5.0/docs/api/java/io/DataInput.html#modified-utf-8>

    The UTF-8 bytes of the string Εχ.txt are (hexadecimal notation):
    ce 95 | cf 87 | 2e | 74 | 78 | 74
    Ε \u0395: 2 bytes: ce 95
    χ \u03c7: 2 bytes: cf 87
    . \u002e: 1 byte: 2e
    t \u0074: 1 byte: 74
    x \u0078: 1 byte: 78
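
As a rough cross-check of the byte table above, here is a minimal C sketch of a standard UTF-8 encoder. The names `utf8_encode` and `utf8_encode_all` are made up for this illustration; they are not part of JNI or the C library. It produces standard UTF-8, not JNI's modified form, but the two agree for this filename because it contains neither U+0000 nor supplementary characters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: encode one code point as standard UTF-8.
 * Writes 1 to 4 bytes into out and returns the byte count,
 * or 0 if cp is a surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp < 0x80) {                      /* 1 byte: 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 2 bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 3 bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* lone surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 4 bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: encode n code points into out (capacity cap bytes).
 * Returns the number of bytes written, or 0 on invalid input or overflow.
 * No terminating NUL is appended. */
size_t utf8_encode_all(const uint32_t *cps, size_t n,
                       unsigned char *out, size_t cap)
{
    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        unsigned char tmp[4];
        size_t len = utf8_encode(cps[i], tmp);
        if (len == 0 || used + len > cap)
            return 0;
        memcpy(out + used, tmp, len);
        used += len;
    }
    return used;
}
```

Feeding it the code points 0395 03c7 002e 0074 0078 0074 gives the eight bytes ce 95 cf 87 2e 74 78 74. So a char buffer for this name needs 8 bytes plus a terminator, even though the string is only 6 characters long.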

    Probably stat/wstat and fopen/wfopen expect a fixed size char as
    filename parameter. Which encoding do they expect?
    --
    Regards,

    Roland
    Roland de Ruiter, Aug 4, 2006
    #3
  4. Bill Medland Guest



    Presumably, since you mention _wfopen and _wstat, you are talking about a
    Microsoft Windows platform. As far as I know Windows does not normally use
    UTF-8 for filenames. Your best bet, on Windows, would probably be to use
    the wide format functions and GetStringChars.

    (Subtle complication: if you are not on Windows, then watch out for jchar
    possibly not matching wchar_t, which might well be 4 bytes wide.)
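
To make the jchar versus wchar_t point concrete, here is a small portable sketch, not a tested JNI implementation. It uses `uint16_t` as a stand-in for `jchar` (which is an unsigned 16-bit type) so that it compiles without `<jni.h>`, and the function name `utf16_to_wcs` is invented for this example. It copies element by element, combines surrogate pairs when wchar_t is wider than 16 bits, and adds the terminating NUL itself, since the array returned by GetStringChars should not be assumed to be NUL-terminated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: convert len UTF-16 code units (e.g. the jchar
 * array from GetStringChars) into a NUL-terminated wchar_t string.
 * cap is the capacity of dst in wchar_t elements, terminator included.
 * Returns the number of wchar_t written, not counting the terminator.
 * Input that does not fit is silently truncated, so a caller handling
 * filenames should pass cap >= len + 1. */
size_t utf16_to_wcs(const uint16_t *src, size_t len,
                    wchar_t *dst, size_t cap)
{
    size_t i = 0, o = 0;

    assert(src != NULL && dst != NULL);
    if (cap == 0)
        return 0;

    while (i < len && o + 1 < cap) {
        uint32_t c = src[i++];

        /* With a 32-bit wchar_t (typical outside Windows), a surrogate
         * pair has to be merged into one code point. With a 16-bit
         * wchar_t the units are copied through unchanged. */
        if (sizeof(wchar_t) > 2 &&
            c >= 0xD800 && c <= 0xDBFF &&
            i < len && src[i] >= 0xDC00 && src[i] <= 0xDFFF) {
            c = 0x10000 + ((c - 0xD800) << 10) + (src[i++] - 0xDC00u);
        }
        dst[o++] = (wchar_t)c;
    }
    dst[o] = L'\0';
    return o;
}
```

In a real JNI function the source array and length would presumably come from GetStringChars and GetStringLength, the result would go to _wstat on Windows, and the array would be handed back with ReleaseStringChars afterwards.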

    --
    Bill Medland
    Bill Medland, Aug 4, 2006
    #4
  5. Guest

    Well,

    I am not sure about the encoded part. The filename can be anything.
    Basically I want to be able to retrieve the date created from it.

    Is it correct to convert the jstring filename into wide characters
    every time, and use _wstat to get the date created?

    I have changed the code so that it uses GetStringChars( env,
    filename, NULL )
    to get the Unicode value instead of GetStringUTFChars.

    jchar* file = (*env)->GetStringChars( env, filename, NULL )

    and use WideCharToMultiByte function

    WideCharToMultiByte( CP_ACP, 0, (LPCWSTR)filename,
    (*env)->GetStringLength(env, filename)*2,
    new_filename,
    ((*env)->GetStringLength(env,
    filename)*2+1), NULL, NULL )

    I tested it with sampletest_ù.txt, and it works.

    When I test it again with Εχ.txt, I get ΕÇ.txt

    Thank you,

    Rudy

    , Aug 7, 2006
    #5
  6. Guest

    I think I have found the solution.
    Someone posted the same question, and the solution is using memcpy to
    copy the value between jchar* and wchar_t.
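
The memcpy idea could look roughly like the sketch below. This is only an illustration under stated assumptions: `uint16_t` stands in for `jchar`, the function name `copy_utf16_to_wcs` is invented, and the shortcut is only valid where wchar_t is 16 bits wide (as on Windows). The code refuses to run the copy anywhere else and writes the terminating NUL explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: raw copy of len UTF-16 code units into dst,
 * followed by a terminating NUL. cap is the capacity of dst in
 * wchar_t elements. Returns 0 on success, or -1 if wchar_t is not
 * 16 bits wide on this platform or dst is too small. */
int copy_utf16_to_wcs(const uint16_t *src, size_t len,
                      wchar_t *dst, size_t cap)
{
    assert(src != NULL && dst != NULL);

    /* A byte-for-byte copy only makes sense when both element types
     * have the same size; with a 4-byte wchar_t it would pack two
     * code units into each wide character. */
    if (sizeof(wchar_t) != sizeof(uint16_t))
        return -1;
    if (cap < len + 1)
        return -1;

    memcpy(dst, src, len * sizeof(uint16_t));
    dst[len] = L'\0';   /* the source array is not assumed to be terminated */
    return 0;
}
```

On a platform with a 4-byte wchar_t this returns -1 instead of producing a garbled name, which is the case where an element-by-element conversion is needed.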

    I will do more testing and post the result.

    Thank you,

    Rudy

    , Aug 7, 2006
    #6
