JNI byte array to string

Discussion in 'Java' started by static, Jun 15, 2004.

  1. static

    static Guest

    Hi,

    I use JNI to call a C function and get a record converted from MARC-8
    to UNICODE and return the data in a byteArray. It works fine and the
    byteArray is correctly populated. I have written it out to a file and
    verified the data.

    The problem is converting the byte array to a string. If I do

    String n = new String(test);

    Then about 6 of the characters get replaced with question marks. Is
    there a way to retain all of the data from a byte array and convert it
    to a String?

    I also tried

    String n = new String(test,"UTF-8");

    and that didn't work. A few characters got replaced with question
    marks.

    Any ideas will be greatly appreciated.

    Ashley
     
    static, Jun 15, 2004
    #1

  2. static

    ak Guest

    > I use JNI to call a C function and get a record converted from MARC-8
    > to UNICODE and return the data in a byteArray. It works fine and the
    > byteArray is correctly populated. I can write it out to a file and
    > verified the data.
    >
    > The problem is converting the byte array to a string. If I do
    >
    > String n = new String(test);
    >
    > Then about 6 of the characters get replaced with question marks. Is
    > there a way to retain all of the data from a byte array and convert it
    > to a String?
    >
    > I also tried
    >
    > String n = new String(test,"UTF-8");


    Don't create a String; read it with DataInputStream#readUTF() instead.

    --
    http://uio.dev.java.net Unified I/O for Java
    http://reader.imagero.com Java image reader
     
    ak, Jun 15, 2004
    #2

  3. static

    static Guest

    I tried the following and am still getting some characters in the byte
    array changed to question marks.

    try {
        DataInputStream dis = new DataInputStream(
                new ByteArrayInputStream(unicode_byte_array));
        orig = dis.readLine();
    }
    catch (IOException e) {
        //System.out.println(e);
    }

    Since readLine is deprecated, is there another way to read the data
    from the byte array without changing it? readString will change it,
    and since the data is already in UTF-8, doing a readUTF8 corrupts it
    by translating something that is already in UTF-8.

    Thanks in advance.

    Ashley

    "ak" <> wrote in message news:<canf8t$rf4$>...
    > > I use JNI to call a C function and get a record converted from MARC-8
    > > to UNICODE and return the data in a byteArray. It works fine and the
    > > byteArray is correctly populated. I can write it out to a file and
    > > verified the data.
    > >
    > > The problem is converting the byte array to a string. If I do
    > >
    > > String n = new String(test);
    > >
    > > Then about 6 of the characters get replaced with question marks. Is
    > > there a way to retain all of the data from a byte array and convert it
    > > to a String?
    > >
    > > I also tried
    > >
    > > String n = new String(test,"UTF-8");

    >
    > don't create String, but read it with DataInputStream#readUTF();
     
    static, Jun 16, 2004
    #3
  4. static

    Roedy Green Guest

    On 16 Jun 2004 12:58:10 -0700, (static)
    wrote or quoted :

    >
    >Since readLine is deprecated, is there another way to read the data
    >from the byte array and not change it. readString will change it and
    >since it is already in UTF-8, doing a readUTF8 corrupts the data by
    >translating something that is already in utf8.


    There are many possible things you could be trying to do. You first
    have to get clear on just what your data are.

    1. 16-bit unicode
    2. 8-bit chars in some encoding
    3. binary data.
    4. serialised objects.

    Then you can ask the File I/O amanuensis to generate the necessary
    code to read it.

    See http://mindprod.com/fileio.html

    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
     
    Roedy Green, Jun 16, 2004
    #4
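
One way to get clear on which of the cases listed in #4 applies is to look at the first few raw bytes before decoding anything. The following is only a minimal sketch, assuming Java 7 or later; the class name HexPeek and the helper hex() are made up for illustration and do not come from the thread or from mindprod.com.

```java
import java.nio.charset.StandardCharsets;

class HexPeek {
    // Render up to max leading bytes as two-digit hex values separated by spaces.
    static String hex(byte[] data, int max) {
        StringBuilder sb = new StringBuilder();
        int n = Math.min(max, data.length);
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String alef = "\u05D0"; // Hebrew letter alef
        System.out.println(hex(alef.getBytes(StandardCharsets.UTF_8), 16));    // D7 90
        System.out.println(hex(alef.getBytes(StandardCharsets.UTF_16BE), 16)); // 05 D0
    }
}
```

In UTF-8 a Hebrew letter shows up as a 0xD7 lead byte followed by one more byte, while in big-endian 16-bit Unicode the same letter starts with 0x05. Running the same dump over the byte array returned from JNI would show which form it actually holds.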
  5. static

    ak Guest

    ak, Jun 17, 2004
    #5
  6. static

    Roedy Green Guest

    On Thu, 17 Jun 2004 09:32:38 +0200, "ak" <> wrote or
    quoted :

    >readUTF() doesn't create UTF, but reads data which is in UTF format.


    UTF is not just unicode-8. It is a special binary format with counted
    strings. It is not designed to be human readable.

    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
     
    Roedy Green, Jun 17, 2004
    #6
  7. Roedy Green wrote:
    >>readUTF() doesn't create UTF, but reads data which is in UTF format.

    >
    >
    > UTF is not just unicode-8. It is a special binary format with counted
    > strings. It is not designed to be human readable.


    There's no such thing as "unicode-8", and UTF-8 is exactly as "human readable" as
    ASCII (to which it is downwards-compatible) or any other text encoding.
    The readUTF() method simply expects a sequence of UTF-8 encoded characters
    prepended by two bytes specifying the length of the sequence.
     
    Michael Borgwardt, Jun 17, 2004
    #7
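
The framing described in #7 can be checked with a short round trip. This is a sketch under stated assumptions: it assumes Java 8 or later, and the class ReadUtfDemo with its helpers writeUtf() and readUtf() is hypothetical, not code from the thread. Strictly speaking, readUTF() reads Java's "modified UTF-8", which matches standard UTF-8 for ordinary Hebrew and Latin text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ReadUtfDemo {
    // writeUTF emits a 2-byte big-endian byte count, then the encoded characters.
    static byte[] writeUtf(String s) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bos)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    // readUTF treats the first two bytes as that count, so it only suits writeUTF output.
    static String readUtf(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "\u05DE\u05D0\u05D4"; // three Hebrew letters
        byte[] plain = text.getBytes(StandardCharsets.UTF_8);
        byte[] framed = writeUtf(text);
        System.out.println(plain.length);                  // 6
        System.out.println(framed.length);                 // 8 (2-byte count + 6 bytes)
        System.out.println(readUtf(framed).equals(text));  // true

        // Plain text without the count: "01" is read as a length of 0x3031 bytes.
        try {
            readUtf("01830cam".getBytes(StandardCharsets.US_ASCII));
        } catch (UncheckedIOException e) {
            System.out.println(e.getCause().getClass().getSimpleName()); // EOFException
        }
    }
}
```

If a byte array holds plain UTF-8 text with no count in front, readUTF() misreads its first two bytes as a length, so it either fails or returns a truncated string. An empty catch block like the one in #3 would hide such a failure.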
  8. static

    static Guest

    Guys, I tried readUTF(), but if I print out orig, the output doesn't
    match the output from the unicode_byte_array. The whole string looks
    as if the byte array had been shrunk down. I would like to print the
    String and have the output match the byte array. I also tried writing
    the data to a file and reading it with

    InputStream ba = new FileInputStream("test");
    DataInputStream dis = new DataInputStream(ba);
    orig = dis.readUTF();

    but when I print out orig, the output is different. I'll be glad to
    mail you my data file which is about 1830 bytes for you to try.

    Thanks so much for the input. Any other ideas?

    Ashley

    "ak" <> wrote in message news:<carhef$u4k$>...
    > readUTF() doesn't create UTF, but reads data which is in UTF format.
    > see http://java.sun.com/j2se/1.3/docs/api/java/io/DataInput.html#readUTF()
    >
    > try {
    > DataInputStream dis = new DataInputStream(new
    > ByteArrayInputStream(unicode_byte_array));
    > orig = dis.readUTF();
    > }
    > catch (IOException e)
    > {
    > //System.out.println(e);
    > }
     
    static, Jun 17, 2004
    #8
  9. static

    Roedy Green Guest

    On Thu, 17 Jun 2004 17:47:57 +0200, Michael Borgwardt
    <> wrote or quoted :

    >
    >There's no such thing as "unicode-8", and UTF-8 is exactly as "human readable" as
    >ASCII (to which it is downwards-compatible) or any other text encoding.
    >The readUTF() method simply expects a sequence of UTF-8 encoded characters
    >prepended by two bytes specifying the length of the sequence.


    People try to use writeUTF to create human-readable files. The files
    are not human-readable, because of the length fields.

    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
     
    Roedy Green, Jun 17, 2004
    #9
  10. static

    ak Guest

    > but when I print out orig, the output is different. I'll be glad to
    > mail you my data file which is about 1830 bytes for you to try.
    >


    Post an attachment, and don't forget to also post the original string.

    --
    http://uio.dev.java.net Unified I/O for Java
    http://reader.imagero.com Java image reader
     
    ak, Jun 17, 2004
    #10
  11. static wrote:
    > guys I tried the readUTF() but if I print out orig, the output doesn't
    > match the output from the unicode_byte_array.


    That's because your input doesn't contain the length fields that
    readUTF() expects.

    The method is not meant to be used for processing text files, rather for
    processing text embedded in binary files.

    Instead, use the Reader classes.
     
    Michael Borgwardt, Jun 18, 2004
    #11
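
The suggestion in #11 to use the Reader classes might look roughly like this. It is a minimal sketch, assuming Java 8 or later and assuming the byte array really holds plain UTF-8 text; the class Utf8ReaderDemo and the method decodeUtf8() are illustrative names only.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class Utf8ReaderDemo {
    // Decode a byte array as UTF-8 text through an InputStreamReader.
    static String decodeUtf8(byte[] data) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "heb " followed by the Hebrew letters mem, alef, he, encoded as UTF-8.
        byte[] data = "heb \u05DE\u05D0\u05D4".getBytes(StandardCharsets.UTF_8);
        String text = decodeUtf8(data);
        System.out.println(data.length);    // 10 bytes
        System.out.println(text.length());  // 7 chars
        System.out.println(text.equals("heb \u05DE\u05D0\u05D4")); // true
    }
}
```

The String comes out shorter than the byte array because each Hebrew letter takes two bytes in UTF-8 but one char in a String. That is expected and does not mean data was lost. For an array that is already fully in memory, new String(data, StandardCharsets.UTF_8) gives the same result.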
  12. static

    static Guest

    Here's what my byte array contains

    01830cam a22003734a 45000010009000000050017000090080041000269060045000679250042001129550123001540100017002770200015002940350026003090400024003350420008003590430012003670500025003792450191004042460052005952600068006473000041007155040066007566500045008226510057008676500064009247000026009887000026010148800203010408800085012438800040013288800032013689230030014009520026014301211021220020226122429.00
    0717s2000 is a b 001 0 heb 
    a7bcbccorignewd2encipf20gn-rlinjack0 aacquireb1 shelf
    copyxdefault policy amb12 to RCCD 07/17/00; desc ye91 09-21-00; to
    ye19 09-21-00 (Heidi Lerner); ye19 to sl 01-19-01; ye04 to BCCD
    02-08-01 a 00377460  a9654484749 a(CStRLIN)DCLH00-B1877
    aDLC-RcDLC-RdDLC-R apcc aa-is---00aNX573.7.A1bA15
    2000106880-01a1900-2000 :bmeʾah shenot tarbut : ha-yetsirah
    ha-ʻIvrit be-Erets-Yiśraʾel = hundred years of Hebrew
    culture in Eretz Israel /c[ʻorkhim], Orah Aḥimeʾir,
    Ḥayim Beʾer.30aHundred years of Hebrew culture in Eretz
    Israel 6880-02aTel Aviv :bʻAm ʻoved :bYediʻot
    aḥaronot,cc2000. a548 p. :bill. (some col.) ;c31 cm.
    aIncludes bibliographical references (p. 512-517) and indexes.
    0aArts, Israeliy20th centuryvChronology. 0aIsraelxIntellectual
    lifey20th centuryvChronology. 0aPopular
    culturezIsraelxHistoryy20th centuryvChronology.1
    6880-03aAhimeir, Ora.1 6880-04aBeʾer,
    Haim.106245-01/raמאה שנות
    תרבות
    :bהיצירה
    העברית
    בארץ־ישראל
    = hundred years of Hebrew culture in Eretz Israel
    /c[עורכים],
    אורה
    אחימאיר,
    חיים באר.
    6260-02/raתל אביב
    :bעם עובד
    :bידיעות
    אחרונות,cc2000.1
    6700-03/raאחימאיר,
    אורה.1 6700-04/raבאר,
    חיים. d20000430n12287s93005373
    a02/19/02 T;11/07/01 T

    A few of the Hebrew characters are getting replaced with question
    marks when I do the readUTF. I hope no characters got changed by
    copying and pasting them here. Thanks for the help.

    Ashley
     
    static, Jun 18, 2004
    #12
  13. static

    static Guest

    Here's the String after I tried to convert the byte array to a String.
    It ends up losing several characters.

    01830cam a22003734a 45000010009000000050017000090080041000269060045000679250042001129550123001540100017002770200015002940350026003090400024003350420008003590430012003670500025003792450191004042460052005952600068006473000041007155040066007566500045008226510057008676500064009247000026009887000026010148800203010408800085012438800040013288800032013689230030014009520026014301211021220020226122429.00
    0717s2000 is a b 001 0 heb 
    a7bcbccorignewd2encipf20gn-rlinjack0 aacquireb1 shelf
    copyxdefault policy amb12 to RCCD 07/17/00; desc ye91 09-21-00; to
    ye19 09-21-00 (Heidi Lerner); ye19 to sl 01-19-01; ye04 to BCCD
    02-08-01 a 00377460  a9654484749 a(CStRLIN)DCLH00-B1877
    aDLC-RcDLC-RdDLC-R apcc aa-is---00aNX573.7.A1bA15
    2000106880-01a1900-2000 :bmeʾah shenot tarbut : ha-yetsirah
    ha-Ê»Ivrit be-Erets-YisÌ?raʾel = hundred years of Hebrew culture in
    Eretz Israel /c[ʻorkhim], Orah Aḥimeʾir, Ḥayim
    Beʾer.30aHundred years of Hebrew culture in Eretz Israel
    6880-02aTel Aviv :bʻAm ʻoved :bYediʻot aḥaronot,cc2000.
    a548 p. :bill. (some col.) ;c31 cm. aIncludes bibliographical
    references (p. 512-517) and indexes. 0aArts, Israeliy20th
    centuryvChronology. 0aIsraelxIntellectual lifey20th
    centuryvChronology. 0aPopular culturezIsraelxHistoryy20th
    centuryvChronology.1 6880-03aAhimeir, Ora.1 6880-04aBeʾer,
    Haim.106245-01/raמ×?×" שנות תרבות :b×"יציר×"
    ×"עברית ב×?רץ־ישר×?ל = hundred years of Hebrew culture in
    Eretz Israel /c[עורכי×?], ×?ור×" ×?חימ×?יר, ×—×™×™×?
    ב×?ר. 6260-02/raתל ×?ביב :b×¢×? עוב×" :b×™×"יעות
    ×?חרונות,cc2000.1 6700-03/ra×?חימ×?יר, ×?ור×".1
    6700-04/raב×?ר, ×—×™×™×?. d20000430n12287s93005373
    a02/19/02 T;11/07/01 T
     
    static, Jun 18, 2004
    #13
  14. static

    Roedy Green Guest

    On 18 Jun 2004 07:45:46 -0700, (static)
    wrote or quoted :

    >Here's the String after I tried to convert the byte array to a String.
    > It ends up losing several characters.


    That's because a translation occurred. Perhaps you used the wrong
    encoding.

    See http://mindprod.com/jgloss/encoding.html

    --
    Canadian Mind Products, Roedy Green.
    Coaching, problem solving, economical contract programming.
    See http://mindprod.com/jgloss/jgloss.html for The Java Glossary.
     
    Roedy Green, Jun 18, 2004
    #14
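
The kind of translation suggested in #14 can be reproduced in a few lines. This is only a sketch, assuming Java 7 or later, and the class WrongCharsetDemo and its methods are hypothetical. It does not prove which charset was involved in the output posted in #13, but the "×" characters there resemble UTF-8 bytes decoded with a single-byte Western charset.

```java
import java.nio.charset.StandardCharsets;

class WrongCharsetDemo {
    // Correct decoding: the bytes are UTF-8 and are read as UTF-8.
    static String asUtf8(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }

    // Wrong decoding: every byte becomes one char, so 0xD7 turns into U+00D7 (the multiplication sign).
    static String asLatin1(byte[] b) {
        return new String(b, StandardCharsets.ISO_8859_1);
    }

    // Encoding to a charset without Hebrew letters replaces each of them with '?'.
    static String throughAscii(String s) {
        byte[] ascii = s.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String hebrew = "\u05DE\u05D0\u05D4"; // mem, alef, he
        byte[] utf8 = hebrew.getBytes(StandardCharsets.UTF_8);

        System.out.println(asUtf8(utf8).equals(hebrew));              // true
        System.out.println(asLatin1(utf8).length());                  // 6
        System.out.println((int) asLatin1(utf8).charAt(0) == 0x00D7); // true
        System.out.println(throughAscii(hebrew));                     // ???
    }
}
```

The question marks can appear on output as well as on input: a String that was decoded correctly still prints as "?" if it is written through a stream or console whose encoding has no Hebrew letters. It is therefore worth checking the String's char values directly before blaming the conversion from the byte array.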