JNI byte array to string

static · Jun 15, 2004

Hi,

I use JNI to call a C function that converts a record from MARC-8
to Unicode and returns the data in a byte array. It works fine and the
byte array is correctly populated. I can write it out to a file and
verify the data.

The problem is converting the byte array to a string. If I do

String n = new String(test);

Then about 6 of the characters get replaced with question marks. Is
there a way to retain all of the data from a byte array and convert it
to a String?

I also tried

String n = new String(test,"UTF-8");

and that didn't work. A few characters got replaced with question
marks.

Any ideas will be greatly appreciated.

Ashley

ak · Jun 15, 2004

I use JNI to call a C function that converts a record from MARC-8
to Unicode and returns the data in a byte array. It works fine and the
byte array is correctly populated. I can write it out to a file and
verify the data.

The problem is converting the byte array to a string. If I do

String n = new String(test);

Then about 6 of the characters get replaced with question marks. Is
there a way to retain all of the data from a byte array and convert it
to a String?

I also tried

String n = new String(test,"UTF-8");

Don't create the String directly; read it with DataInputStream#readUTF().

static · Jun 16, 2004

I tried the following and am still getting some characters in the byte
array changed to question marks.

try {
    DataInputStream dis = new DataInputStream(
            new ByteArrayInputStream(unicode_byte_array));
    orig = dis.readLine();
} catch (IOException e) {
    // System.out.println(e);
}

Since readLine is deprecated, is there another way to read the data
from the byte array without changing it? readString will change it, and
since the data is already in UTF-8, calling readUTF corrupts it by
translating something that is already UTF-8.

Thanks in advance.

Ashley

Roedy Green · Jun 16, 2004

Since readLine is deprecated, is there another way to read the data
from the byte array without changing it? readString will change it, and
since the data is already in UTF-8, calling readUTF corrupts it by
translating something that is already UTF-8.

There are many possible things you could be trying to do. You first
have to get clear on just what your data are.

1. 16-bit unicode
2. 8-bit chars in some encoding
3. binary data.
4. serialised objects.

Then you can ask the File I/O amanuensis to generate the necessary
code to read it.

See http://mindprod.com/fileio.html
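
One way to get clear on what the data are is to look at the raw bytes before decoding anything. The following is only a minimal sketch: the class and method names are hypothetical, and the sample array stands in for whatever the native call returns. In UTF-8, Hebrew letters show up as two-byte pairs beginning with d7, while in 16-bit Unicode plain ASCII text would have a 00 byte next to every letter.

```java
import java.nio.charset.StandardCharsets;

final class ByteInspector {
    /** Returns up to max leading bytes as two-digit hex values separated by spaces. */
    static String hexPrefix(byte[] data, int max) {
        StringBuilder sb = new StringBuilder();
        int n = Math.min(max, data.length);
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative byte values print as two hex digits.
            sb.append(String.format("%02x", data[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the array returned by the native call (an assumption).
        byte[] sample = "a\u05D0".getBytes(StandardCharsets.UTF_8);
        System.out.println(hexPrefix(sample, 16)); // prints "61 d7 90"
    }
}
```

Dumping the first few dozen bytes of the real array this way should show which of the cases in the list above applies.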

ak · Jun 17, 2004

readUTF() doesn't create UTF; it reads data that is already in UTF format.
See http://java.sun.com/j2se/1.3/docs/api/java/io/DataInput.html#readUTF()

try {
    DataInputStream dis = new DataInputStream(
            new ByteArrayInputStream(unicode_byte_array));
    orig = dis.readUTF();
} catch (IOException e) {
    // System.out.println(e);
}

Roedy Green · Jun 17, 2004

readUTF() doesn't create UTF; it reads data that is already in UTF format.

UTF is not just unicode-8. It is a special binary format with counted
strings. It is not designed to be human readable.

Michael Borgwardt · Jun 17, 2004

Roedy said:
UTF is not just unicode-8. It is a special binary format with counted
strings. It is not designed to be human readable.

There's no such thing as "unicode-8", and UTF-8 is exactly as "human readable" as
ASCII (to which it is downwards-compatible) or any other text encoding.
The readUTF() method simply expects a sequence of UTF-8 encoded characters
prepended by two bytes specifying the length of the sequence.
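
The length prefix described above can be seen by round-tripping a short string through writeUTF() and readUTF(). This is a rough sketch with hypothetical class and method names. Note that the Javadoc calls the format "modified UTF-8", which matches standard UTF-8 for the ASCII text used here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class ReadUtfDemo {
    /** Encodes text with writeUTF: a 2-byte length followed by the encoded characters. */
    static byte[] writeUtf(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    /** Decodes bytes that were produced by writeUTF. */
    static String readUtf(byte[] framed) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] framed = writeUtf("abc");
        // The bytes are 00 03 61 62 63: the first two hold the length 3.
        System.out.println(framed.length);   // prints 5
        System.out.println(readUtf(framed)); // prints abc
    }
}
```

Raw UTF-8 text has no such prefix, so readUTF() would treat its first two bytes as a length rather than as characters.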

static · Jun 17, 2004

Guys, I tried readUTF(), but when I print orig, the output doesn't
match the contents of unicode_byte_array. The whole string looks as if
the byte array had been shrunk. I would like to print the String
and have the output match the byte array. I also tried writing the
data to a file and reading it with

InputStream ba = new FileInputStream("test");
DataInputStream dis = new DataInputStream(ba);
orig = dis.readUTF();

but when I print out orig, the output is different. I'll be glad to
mail you my data file which is about 1830 bytes for you to try.

Thanks so much for the input. Any other ideas?

Ashley

Roedy Green · Jun 17, 2004

There's no such thing as "unicode-8", and UTF-8 is exactly as "human readable" as
ASCII (to which it is downwards-compatible) or any other text encoding.
The readUTF() method simply expects a sequence of UTF-8 encoded characters
prepended by two bytes specifying the length of the sequence.

People try to use writeUTF to create human-readable files. The result
is not human-readable because of the length fields.

ak · Jun 17, 2004

but when I print out orig, the output is different. I'll be glad to
mail you my data file which is about 1830 bytes for you to try.

Post an attachment, and don't forget to post the original string as well.

Michael Borgwardt · Jun 18, 2004

static said:
Guys, I tried readUTF(), but when I print orig, the output doesn't
match the contents of unicode_byte_array.

That's because your input doesn't contain the length fields that
readUTF() expects.

The method is not meant to be used for processing text files, rather for
processing text embedded in binary files.

Instead, use the Reader classes.
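
A minimal sketch of the Reader approach, assuming the byte array really holds plain UTF-8 with no length prefix. The class and method names are hypothetical, and StandardCharsets requires Java 7 or later; on older runtimes the charset name "UTF-8" can be passed to the InputStreamReader constructor instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class Utf8ReaderDemo {
    /** Decodes a UTF-8 byte array into a String by reading it through a Reader. */
    static String decode(byte[] utf8) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(utf8), StandardCharsets.UTF_8)) {
            char[] chunk = new char[1024];
            int n;
            // read() returns -1 once the underlying stream is exhausted.
            while ((n = reader.read(chunk)) != -1) {
                sb.append(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Four Hebrew letters, eight bytes in UTF-8 (stand-in for the JNI result).
        byte[] sample = "\u05D7\u05D9\u05D9\u05DD".getBytes(StandardCharsets.UTF_8);
        System.out.println(sample.length);           // prints 8
        System.out.println(decode(sample).length()); // prints 4
    }
}
```

The character count is smaller than the byte count because each Hebrew letter takes two bytes in UTF-8; that difference alone does not mean data was lost.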

static · Jun 18, 2004

Here's what my byte array contains

01830cam a22003734a 45000010009000000050017000090080041000269060045000679250042001129550123001540100017002770200015002940350026003090400024003350420008003590430012003670500025003792450191004042460052005952600068006473000041007155040066007566500045008226510057008676500064009247000026009887000026010148800203010408800085012438800040013288800032013689230030014009520026014301211021220020226122429.00
0717s2000 is a b 001 0 heb
a7bcbccorignewd2encipf20gn-rlinjack0 aacquireb1 shelf
copyxdefault policy amb12 to RCCD 07/17/00; desc ye91 09-21-00; to
ye19 09-21-00 (Heidi Lerner); ye19 to sl 01-19-01; ye04 to BCCD
02-08-01 a 00377460 a9654484749 a(CStRLIN)DCLH00-B1877
aDLC-RcDLC-RdDLC-R apcc aa-is---00aNX573.7.A1bA15
2000106880-01a1900-2000 :bmeʾah shenot tarbut : ha-yetsirah
ha-ʻIvrit be-Erets-Yiśraʾel = hundred years of Hebrew
culture in Eretz Israel /c[ʻorkhim], Orah Aḥimeʾir,
Ḥayim Beʾer.30aHundred years of Hebrew culture in Eretz
Israel 6880-02aTel Aviv :bʻAm ʻoved :bYediʻot
aḥaronot,cc2000. a548 p. :bill. (some col.) ;c31 cm.
aIncludes bibliographical references (p. 512-517) and indexes.
0aArts, Israeliy20th centuryvChronology. 0aIsraelxIntellectual
lifey20th centuryvChronology. 0aPopular
culturezIsraelxHistoryy20th centuryvChronology.1
6880-03aAhimeir, Ora.1 6880-04aBeʾer,
Haim.106245-01/raמאה שנות
תרבות
:bהיצירה
העברית
בארץ־ישראל
= hundred years of Hebrew culture in Eretz Israel
/c[עורכים],
אורה
אחימאיר,
חיים באר.
6260-02/raתל אביב
:bעם עובד
:bידיעות
אחרונות,cc2000.1
6700-03/raאחימאיר,
אורה.1 6700-04/raבאר,
חיים. d20000430n12287s93005373
a02/19/02 T;11/07/01 T

A few of the Hebrew characters are getting replaced with question
marks when I do the readUTF. I hope no characters got
translated by copying and pasting here. Thanks for the help.

Ashley

static · Jun 18, 2004

Here's the String after I tried to convert the byte array to a String.
It ends up losing several characters.

01830cam a22003734a 45000010009000000050017000090080041000269060045000679250042001129550123001540100017002770200015002940350026003090400024003350420008003590430012003670500025003792450191004042460052005952600068006473000041007155040066007566500045008226510057008676500064009247000026009887000026010148800203010408800085012438800040013288800032013689230030014009520026014301211021220020226122429.00
0717s2000 is a b 001 0 heb
a7bcbccorignewd2encipf20gn-rlinjack0 aacquireb1 shelf
copyxdefault policy amb12 to RCCD 07/17/00; desc ye91 09-21-00; to
ye19 09-21-00 (Heidi Lerner); ye19 to sl 01-19-01; ye04 to BCCD
02-08-01 a 00377460 a9654484749 a(CStRLIN)DCLH00-B1877
aDLC-RcDLC-RdDLC-R apcc aa-is---00aNX573.7.A1bA15
2000106880-01a1900-2000 :bmeÊ¾ah shenot tarbut : ha-yetsirah
ha-Ê»Ivrit be-Erets-YisÌ?raÊ¾el = hundred years of Hebrew culture in
Eretz Israel /c[Ê»orkhim], Orah AhÌ£imeÊ¾ir, HÌ£ayim
BeÊ¾er.30aHundred years of Hebrew culture in Eretz Israel
6880-02aTel Aviv :bÊ»Am Ê»oved :bYediÊ»ot ahÌ£aronot,cc2000.
a548 p. :bill. (some col.) ;c31 cm. aIncludes bibliographical
references (p. 512-517) and indexes. 0aArts, Israeliy20th
centuryvChronology. 0aIsraelxIntellectual lifey20th
centuryvChronology. 0aPopular culturezIsraelxHistoryy20th
centuryvChronology.1 6880-03aAhimeir, Ora.1 6880-04aBeÊ¾er,
Haim.106245-01/ra×ž×?×" ×©× ×•×ª ×ª×¨×‘×•×ª :b×"×™×¦×™×¨×"
×"×¢×‘×¨×™×ª ×‘×?×¨×¥Ö¾×™×©×¨×?×œ = hundred years of Hebrew culture in
Eretz Israel /c[×¢×•×¨×›×™×?], ×?×•×¨×" ×?×—×™×ž×?×™×¨, ×—×™×™×?
×‘×?×¨. 6260-02/ra×ª×œ ×?×‘×™×‘ :b×¢×? ×¢×•×‘×" :b×™×"×™×¢×•×ª
×?×—×¨×•× ×•×ª,cc2000.1 6700-03/ra×?×—×™×ž×?×™×¨, ×?×•×¨×".1
6700-04/ra×‘×?×¨, ×—×™×™×?. d20000430n12287s93005373
a02/19/02 T;11/07/01 T

Roedy Green · Jun 18, 2004

Here's the String after I tried to convert the byte array to a String.
It ends up losing several characters.

That's because a translation occurred. Perhaps you used the wrong
encoding.

See http://mindprod.com/jgloss/encoding.html
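
The effect of a wrong encoding can be reproduced with one word from the record. This sketch uses hypothetical names and only illustrates the two symptoms seen in this thread; it does not establish which step went wrong in the original program. Decoding UTF-8 bytes as ISO-8859-1 turns each multi-byte character into several unrelated ones, and encoding a String to a charset that lacks a character substitutes a question mark.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    /** Decodes the bytes with the encoding they were actually written in. */
    static String asUtf8(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    /** Decodes the same bytes with the wrong encoding, one char per byte. */
    static String asLatin1(byte[] data) {
        return new String(data, StandardCharsets.ISO_8859_1);
    }

    /** Round-trips text through US-ASCII; unmappable characters become '?'. */
    static String toAscii(String text) {
        byte[] ascii = text.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // "me\u02BEah" is five characters but six bytes in UTF-8.
        byte[] data = "me\u02BEah".getBytes(StandardCharsets.UTF_8);
        System.out.println(asUtf8(data).length());   // prints 5
        System.out.println(asLatin1(data).length()); // prints 6
        System.out.println(toAscii(asUtf8(data)));   // prints me?ah
    }
}
```

A String that decodes correctly can still print as question marks if the console or output stream uses a charset without those characters, so it may help to compare the String's char values directly rather than the printed text.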
