In need of something like RandomAccessFile.read(char[] cAr, int off, int len)

Knute Johnson

~
http://java.sun.com/j2se/1.4.2/docs/api/java/io/RandomAccessFile.html
~
has:
~
public int read(byte[] b,
int off,
int len)
throws IOException
~
But I need to read in and compare chars/Unicode
~
I have tried many things but I haven't been able to find out how
~
What kind of carpentry do you do with I/O objects to achieve such a
thing?
~
Thanks
lbrtchx

Besides RandomAccessFile.readChar() you could get a FileChannel, read in
a buffer and convert it to a CharBuffer. But seek() and readChar()
ought to be adequate.
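
A minimal sketch of the seek() plus readChar() approach, assuming the file holds big-endian 16-bit chars (the layout that RandomAccessFile.writeChar() produces). The class name ReadCharsDemo and the helper readChars are made up for illustration. Text stored as ISO-8859-1 or UTF-8 would need a charset decoder instead.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

final class ReadCharsDemo {

    /**
     * Reads up to len chars starting at byte offset pos, assuming the file
     * holds big-endian UTF-16 code units. Returns the number of chars read.
     */
    static int readChars(RandomAccessFile raf, long pos, char[] buf, int off, int len)
            throws IOException {
        raf.seek(pos);                                  // pos is a byte offset
        long available = (raf.length() - pos) / 2;      // whole chars left in the file
        int n = (int) Math.min(len, Math.max(available, 0));
        for (int i = 0; i < n; i++) {
            buf[off + i] = raf.readChar();              // consumes two bytes per char
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("chars", ".bin");
        f.deleteOnExit();
        try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
            raf.writeChars("hello world");
            char[] buf = new char[5];
            // char index 6 corresponds to byte offset 12
            int n = readChars(raf, 6 * 2, buf, 0, buf.length);
            System.out.println(new String(buf, 0, n));  // prints "world"
        }
    }
}
```

Each readChar() call goes to the file for two bytes, so for large reads a buffered approach would probably be faster.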
 
Mike Schilling

ISTM that there should be an InputStream subclass that reads bytes
from a RandomAccessFile starting at a given offset. It's not
difficult to construct, but why doesn't it come standard?
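
A rough sketch of what such an adapter might look like. The class name RandomAccessFileInputStream is hypothetical (it is not part of the JDK), and the sketch assumes the caller keeps ownership of the RandomAccessFile and closes it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;

/** Hypothetical adapter: an InputStream view of a RandomAccessFile. */
final class RandomAccessFileInputStream extends InputStream {

    private final RandomAccessFile raf;

    /** Positions raf at startOffset (a byte offset) and reads from there. */
    RandomAccessFileInputStream(RandomAccessFile raf, long startOffset) throws IOException {
        this.raf = raf;
        raf.seek(startOffset);
    }

    @Override
    public int read() throws IOException {
        return raf.read();              // next byte as 0..255, or -1 at end of file
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return raf.read(b, off, len);   // bytes read, or -1 at end of file
    }

    // close() is inherited as a no-op, so the caller still owns raf.
}
```

It could then be wrapped as new InputStreamReader(new RandomAccessFileInputStream(raf, offset), "UTF-8") to read chars in a chosen encoding. The reader buffers ahead, so the file pointer of raf afterwards will usually be past the last char actually consumed.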
 
lbrtchx

OK,
~
I am trying to do something like this.
~
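import java.io.File;
import java.io.FileInputStream;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
. . .
FileInputStream FIS = new FileInputStream(new File(aIFl));
FileChannel FlChnl = FIS.getChannel();
long lFlChnlL = FlChnl.size();
// map the whole file read-only
MappedByteBuffer MptBytBfr =
    FlChnl.map(FileChannel.MapMode.READ_ONLY, 0, lFlChnlL);
// __
String aChrSet = "ISO-8859-1"; // UTF8 or whatever
Charset ChrSt = Charset.forName(aChrSet);
CharsetDecoder ChrStDkdr = ChrSt.newDecoder();
CharBuffer ChrBfr = ChrStDkdr.decode(MptBytBfr);
// copy out exactly the decoded chars; ChrBfr.array() can be longer than the data
char[] cArFl = new char[ChrBfr.remaining()];
ChrBfr.get(cArFl);
. . .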
~
and I know the offsets in the files where certain sequences appear
and the length of the sequences, so then I go:
~
. . .
for (int i = 0; i < iSeqL; ++i) { aB.append(cArFl[(int) lFfst + i]); }
aS = aB.toString();
. . .
~
to grab the actual sequence. However, it does not seem to be working.
Could you please point me to a full example?
~
I like that you can set a CharsetDecoder to the actual file since I
may be using non "ISO-8859-1" text files, but I still wonder about:
~
1. what the speed gains really are
~
2. if "offsets" somehow change based on the CharsetDecoder
~
3. how safe the use of memory maps + CharsetDecoder is while reading
file-based data feeds
~
I find Java's I/O a bit confusing, and I also think that the
RandomAccessFile class should take care of the inner plumbing needed
to offer something like what I need, namely
RandomAccessFile.read(char[] buffer, int start, int end)
~
Since it isn't there, I suspect there are some safety issues behind
its absence.
~
Thanks
lbrtchx
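
On question 2: byte offsets in the file and char indices in the decoded array coincide only for single-byte encodings such as ISO-8859-1. The sketch below illustrates this under that assumption. OffsetDemo and decodeRange are made-up names, and the sample text is invented. It decodes only a byte range of the buffer, so the known byte offsets can be used directly.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class OffsetDemo {

    /** Decodes only bytes [byteOffset, byteOffset + byteLen) of the buffer. */
    static String decodeRange(ByteBuffer bytes, int byteOffset, int byteLen, Charset cs)
            throws CharacterCodingException {
        ByteBuffer slice = bytes.duplicate();   // independent position and limit
        slice.limit(byteOffset + byteLen);
        slice.position(byteOffset);
        // A fresh decoder reports malformed input by default, so a range that
        // starts in the middle of a multi-byte character throws an exception.
        return cs.newDecoder().decode(slice).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        String text = "na\u00efve caf\u00e9";   // 10 chars, two of them non-ASCII
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1); // 10 bytes
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);        // 12 bytes

        // The last word starts at char index 6 in both cases, but at
        // byte offset 6 in ISO-8859-1 and byte offset 7 in UTF-8.
        String a = decodeRange(ByteBuffer.wrap(latin1), 6, 4, StandardCharsets.ISO_8859_1);
        String b = decodeRange(ByteBuffer.wrap(utf8), 7, 5, StandardCharsets.UTF_8);
        System.out.println(a.equals(b));        // prints "true"
    }
}
```

The same decodeRange call should work on a MappedByteBuffer, since it is a ByteBuffer too.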
 
Knute Johnson

Mike said:
ISTM that there should be an InputStream subclass that reads bytes
from a RandomAccessFile starting at a given offset. It's not
difficult to construct, but why doesn't it come standard?

It does. He wanted to read chars and that is a little more complicated
because you have to read two bytes at a time.

Of course that is what he really wanted.
 
Knute Johnson

So are these files written with a Java program, and do they use Java's
16-bit Unicode characters? Do you actually need random access, or are
you just going to read parts out of the file once and go on? If the
characters are encoded in full Unicode, how would you know where to look
for them? If the files are encoded in ASCII or UTF-8 you should be able
to use a BufferedReader and select an appropriate character set. Just
skip over the bytes you don't want to read.
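
A small sketch of the skip-then-read idea, assuming the byte offset of the wanted text is known. SkipAndRead and readAt are illustrative names, not an existing API.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

final class SkipAndRead {

    /** Skips byteOffset bytes, then decodes up to charCount chars using cs. */
    static String readAt(String path, long byteOffset, int charCount, Charset cs)
            throws IOException {
        try (FileInputStream in = new FileInputStream(path)) {
            // skip() may skip fewer bytes than asked, so loop until done
            long remaining = byteOffset;
            while (remaining > 0) {
                long skipped = in.skip(remaining);
                if (skipped <= 0) break;
                remaining -= skipped;
            }
            BufferedReader reader = new BufferedReader(new InputStreamReader(in, cs));
            char[] buf = new char[charCount];
            int total = 0;
            while (total < charCount) {
                int n = reader.read(buf, total, charCount - total);
                if (n < 0) break;               // end of file
                total += n;
            }
            return new String(buf, 0, total);
        }
    }
}
```

The skip has to happen on the byte stream before the reader is attached, because the reader counts in chars rather than bytes.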
 
Mike Schilling

Knute said:
It does. He wanted to read chars and that is a little more complicated
because you have to read two bytes at a time.

If there were an InputStream, you could attach an InputStreamReader
to it and read characters in any encoding you like. Unfortunately,
there isn't one.
 
lbrtchx

... are these files written with a Java program?
~
Not necessarily. They are mostly texts downloaded from the Internet.
~
... If there were an InputStream
~
option as an argument to a RandomAccessFile ctor, things would be
easier, but I think, and I may be wrong, that there is a fundamental
problem here.
~
InputStreams and RandomAccessFiles should not be mixed because when
you go:
~
RandomAccessFile.seek((long) lThere)
~
you cannot be absolutely sure that:
~
1) you will land at the start of a byte sequence forming a
character,
~
2) belonging to the encoding you specified in the InputStream
~
3) at, ... where actually? lThere? the API says:
~
http://java.sun.com/j2se/1.5.0/docs/api/java/io/RandomAccessFile.html#seek(long)
~
<API>
seek: public void seek(long pos) throws IOException
~
Sets the file-pointer offset, measured from the beginning of this
file, at which the next read or write occurs. The offset may be set
beyond the end of the file. Setting the offset beyond the end of the
file does not change the file length. The file length will change only
by writing after the offset has been set beyond the end of the file.
Parameters: pos - the offset position, measured in bytes from the
beginning of the file, at which to set the file pointer.
Throws: IOException - if pos is less than 0 or if an I/O error
occurs.
</API>
~
The unclear bit is "measured from the beginning of this file": how
exactly is it measured? Why not simply say "in bytes"?
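
For what it is worth, the Parameters line quoted above does say that pos is measured in bytes. The worry in point 1 is still real for multi-byte encodings. A hedged sketch, assuming the file is UTF-8 (Utf8Seek and seekToCharStart are made-up names): after seeking, back up while the byte under the pointer is a UTF-8 continuation byte.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

final class Utf8Seek {

    /**
     * Seeks to pos (a byte offset). If that byte is a UTF-8 continuation
     * byte (bit pattern 10xxxxxx), backs up to the lead byte of the
     * character. Returns the adjusted offset; only meaningful for UTF-8.
     */
    static long seekToCharStart(RandomAccessFile raf, long pos) throws IOException {
        while (pos > 0) {
            raf.seek(pos);
            int b = raf.read();
            if (b < 0 || (b & 0xC0) != 0x80) {
                break;              // end of file, or not a continuation byte
            }
            pos--;                  // inside a character: step back one byte
        }
        raf.seek(pos);              // leave the file pointer at the char start
        return pos;
    }
}
```

This only works because UTF-8 is self-synchronizing. Other multi-byte encodings generally give no such guarantee, which may be part of why a char-based seek is not offered.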
~
I think that java should let the programmer do something like what I
illustrate with a piece of pseudo code below
~
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
// __
int iChrsRd, iChrBfrSz = 4096;
char[] cChrAr = new char[iChrBfrSz];
RandomAccessFile RAxFl = null;
// "UTF8", "ISO-8859-1", "" or whatever encoding your texts are written in
String aEnc = ...;
// __
try{
 FileInputStream FIS = new FileInputStream(IFl);
 InputStreamReader ISRdr = new InputStreamReader(FIS, aEnc);
 // wished-for constructor; RandomAccessFile has nothing like it
 RAxFl = new RandomAccessFile(ISRdr);
 // . . .
 RAxFl.seek(lThere);
 // wished-for method: reads iChrsRd chars into cChrAr, provided
 // iChrBfrSz can fully take them
 iChrsRd = RAxFl.read(cChrAr, 0, iChrBfrSz);
 // . . .
}catch(FileNotFoundException FlNtFX){ FlNtFX.printStackTrace(); }
catch(IOException IOX){ IOX.printStackTrace(); }
// __
finally{
 if(RAxFl != null){ try{ RAxFl.close(); }catch(IOException IOXcptn){ ; } }
}
~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~
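
Something close to the pseudo code can probably be had with the standard library by going through the file's channel. The sketch below is one possible arrangement, not an official API: CharReads and its read method are invented names, and it assumes pos is a byte offset that falls on a character boundary.

```java
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.io.Reader;
import java.nio.channels.Channels;
import java.nio.charset.Charset;

final class CharReads {

    /**
     * Seeks to byte offset pos and decodes up to len chars into buf using cs.
     * Returns the number of chars read, or -1 if nothing could be read.
     */
    static int read(RandomAccessFile raf, long pos, Charset cs,
                    char[] buf, int off, int len) throws IOException {
        raf.seek(pos);
        // The channel shares its position with raf, so reading starts at pos.
        Reader reader = new InputStreamReader(
                Channels.newInputStream(raf.getChannel()), cs);
        int total = 0;
        while (total < len) {
            int n = reader.read(buf, off + total, len - total);
            if (n < 0) break;                   // end of file
            total += n;
        }
        // The reader is not closed here: closing it would close the channel,
        // and with it raf. It may have buffered ahead, so the file pointer of
        // raf should be treated as unspecified after this call.
        return (total == 0 && len > 0) ? -1 : total;
    }
}
```

A call such as CharReads.read(raf, lThere, Charset.forName("UTF-8"), cChrAr, 0, cChrAr.length) would then play the role of the wished-for RAxFl.read(...), with the encoding passed per call instead of per file.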
~
Dealing with Java's I/O and internationalization is not exactly easy
~
lbrtchx
 
