offsets in a FileChannel ...

Q

qwertmonkey

What is missing in this code snippet to get the offsets in the underlying
FileChannel on which the MappedByteBuffer and then the CharBuffer are built?
~
CharBuffer.position() gives you the char position all right, but how do you get
the actual byte offset of certain characters in the data exposed through the
FileInputStream?
~
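import java.io.FileInputStream;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

// IFl (the input file) and ChrStDkdr (a UTF-8 CharsetDecoder) are declared elsewhere
char c;
long lPsx;
FileInputStream FIS = new FileInputStream(IFl);
FileChannel FlChnl = FIS.getChannel();
MappedByteBuffer MptbChnlBfr = FlChnl.map(FileChannel.MapMode.READ_ONLY,
 0, FlChnl.size());
CharBuffer cBfrUTF8 = ChrStDkdr.decode(MptbChnlBfr);
// __
while(cBfrUTF8.hasRemaining()){
 lPsx = cBfrUTF8.position(); // char index of c; after get() it would be one past c
 c = cBfrUTF8.get();
 System.err.println("// __ |" + lPsx + "|" + c + "|" + (int)c + "|");
}
// __
FlChnl.close();
FIS.close();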
~
Or do you know of any other way to basically do the same thing?
~
thanks,
lbrtchx
comp.lang.java.programmer: offsets in a FileChannel ...
 
R

Robert Klemme

UTF8 is not an encoding with a fixed width: a character takes one to four
bytes. You would have to write more complex code if you want to map char
positions to byte positions.
Basically you need to read the file from the beginning and observe the
width of every char as it is being decoded. You could of course apply
heuristics if you have more knowledge about the file but I guess that
soon gets messy.
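A minimal sketch of "observe the width of every char as it is being decoded", assuming the input is well-formed UTF-8 (malformed bytes replaced by U+FFFD would throw the count off). The class and method names are made up for illustration. It decodes the bytes and then walks the chars by code point, adding the UTF-8 length of each one to a running byte offset; `start` would be 0 for a buffer mapped from the beginning of the file, as in the original snippet.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

public class Utf8CharOffsets {

    // Number of bytes UTF-8 uses to encode one Unicode code point.
    static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4; // supplementary character, i.e. two Java chars
    }

    // For every char of the decoded buffer, the offset of the first byte of its
    // UTF-8 sequence, counted from 'start' (the byte offset where decoding began).
    // Both halves of a surrogate pair get the offset of the same 4-byte sequence.
    static long[] byteOffsets(CharBuffer chars, long start) {
        int n = chars.remaining();
        long[] offsets = new long[n];
        long byteOffset = start;
        int i = 0;
        while (i < n) {
            int cp = Character.codePointAt(chars, i); // index is relative to position()
            int units = Character.charCount(cp);
            for (int k = 0; k < units; k++) {
                offsets[i + k] = byteOffset;
            }
            byteOffset += utf8Length(cp);
            i += units;
        }
        return offsets;
    }

    public static void main(String[] args) {
        // 'a', e-acute, euro sign, U+1F600 (a surrogate pair), 'z'
        byte[] data = "a\u00e9\u20ac\uD83D\uDE00z".getBytes(StandardCharsets.UTF_8);
        CharBuffer chars = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(data));
        long[] offsets = byteOffsets(chars, 0);
        for (int i = 0; i < offsets.length; i++) {
            System.out.printf("char %d U+%04X at byte %d%n", i, (int) chars.charAt(i), offsets[i]);
        }
    }
}
```

For that sample the byte offsets come out as 0, 1, 3, 6, 6, 10. In the original loop, offsets[lPsx] would then be the byte offset of c in the file.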

Cheers

robert
 
R

Roedy Green

> UTF8 is not an encoding with a fixed width.

You could use UTF-16. Then you could interconvert byte and char
offsets with a simple shift.
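A sketch of the shift, assuming the file is UTF-16BE and the caller knows whether a byte-order mark precedes the text (0 or 2 bytes). The names are hypothetical. The shift holds per Java char, i.e. per UTF-16 code unit, not per Unicode character: a character outside the Basic Multilingual Plane occupies two chars and four bytes, which is why UTF-16 is still not fixed-width per character.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

public class Utf16Offsets {

    // Byte offset of Java char number 'charIndex' in a UTF-16 stream: every char
    // is one 16-bit code unit, so the offset is a one-bit shift plus the BOM length.
    static long byteOffset(long charIndex, int bomBytes) {
        return bomBytes + (charIndex << 1);
    }

    // The inverse: the char index that starts at an (even) byte offset.
    static long charIndex(long byteOffset, int bomBytes) {
        return (byteOffset - bomBytes) >> 1;
    }

    public static void main(String[] args) {
        // U+1F600 is a surrogate pair: two chars, four bytes.
        byte[] data = "ab\uD83D\uDE00c".getBytes(StandardCharsets.UTF_16BE); // no BOM
        CharBuffer chars = StandardCharsets.UTF_16BE.decode(ByteBuffer.wrap(data));
        for (int i = 0; i < chars.length(); i++) {
            System.out.printf("char %d U+%04X at byte %d%n", i, (int) chars.charAt(i), byteOffset(i, 0));
        }
    }
}
```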

You could build a table of interesting byte offsets when you construct
the stream.
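A minimal sketch of such a table, assuming the stream is written as UTF-8 phrase by phrase (class and method names are illustrative). Each phrase is encoded up front, so its offset is just the running total of encoded byte counts; any buffering in the stream it is written to does not affect the numbers. A phrase can then be decoded directly from its byte range.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PhraseOffsetTable {

    // Writes each phrase as UTF-8 and returns the byte offset where each one starts.
    // Offsets come from the encoded byte counts, not from the stream, so buffering
    // inside 'out' cannot throw them off.
    static List<Long> writePhrases(OutputStream out, List<String> phrases) throws IOException {
        List<Long> offsets = new ArrayList<>();
        long pos = 0;
        for (String phrase : phrases) {
            byte[] bytes = phrase.getBytes(StandardCharsets.UTF_8);
            offsets.add(pos);
            out.write(bytes);
            pos += bytes.length;
        }
        return offsets;
    }

    // Decodes phrase i straight from its byte range, without decoding the others.
    static String readPhrase(byte[] data, List<Long> offsets, int i) {
        int from = offsets.get(i).intValue();
        int to = i + 1 < offsets.size() ? offsets.get(i + 1).intValue() : data.length;
        return new String(data, from, to - from, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        List<Long> offsets = writePhrases(out, Arrays.asList("h\u00e9llo\n", "world\n", "\u20ac\n"));
        byte[] data = out.toByteArray();
        System.out.println(offsets + " -> " + readPhrase(data, offsets, 1).trim());
    }
}
```

Here the table is [0, 7, 13]: the e-acute takes two bytes, so "world" starts at byte 7 even though it is the seventh char.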

You could embed binary counts (in bytes or chars) at the head of phrases.
You build and take the stream apart with ByteArrayOutputStream and
ByteArrayInputStream.
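One way to get byte counts at the head of phrases without writing the framing by hand is DataOutputStream.writeUTF, which writes a 2-byte unsigned length followed by the string in Java's modified UTF-8 (limited to 65535 bytes per phrase). The sketch below, with illustrative names, jumps to phrase n by reading each count and skipping that many bytes, so earlier phrases are never decoded. The result is binary framing, not a plain text file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class LengthPrefixedPhrases {

    // Each phrase becomes: 2-byte unsigned byte count + modified UTF-8 bytes.
    static byte[] encode(List<String> phrases) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (String p : phrases) {
                out.writeUTF(p);
            }
        }
        return bytes.toByteArray(); // still valid after close()
    }

    // Returns phrase number n, skipping earlier phrases by their byte counts.
    static String phrase(byte[] data, int n) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < n; i++) {
                int len = in.readUnsignedShort(); // the count writeUTF put in front
                if (in.skipBytes(len) != len) throw new EOFException("truncated phrase " + i);
            }
            return in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = encode(Arrays.asList("alpha", "b\u00e9ta", "\u20ac"));
        System.out.println(data.length + " bytes, phrase 1 = " + phrase(data, 1));
    }
}
```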
 
R

Robert Klemme

> You could use UTF-16. Then you could interconvert byte and char
> offsets with a simple shift.

I don't. And he doesn't either, since UTF-16 isn't a fixed-width encoding:
characters outside the Basic Multilingual Plane are stored as surrogate
pairs (four bytes).
http://www.unicode.org/faq/utf_bom.html#gen6
http://www.unicode.org/versions/Unicode6.2.0/ch03.pdf#G28070

> You could build a table of interesting byte offsets when you construct
> the stream.

So you would augment the file with an index file. This is certainly not
a general solution as you do not always have the option to transport
that extra data with the file. Plus, aligning offsets while writing
might prove as difficult as when reading (e.g. because of buffering).

> You could embed binary counts in bytes/chars at the head of phrases.
> You build and take the stream apart with ByteArrayStreams.

That's no longer a text document.

robert
 
R

Roedy Green

> So you would augment the file with an index file. This is certainly not
> a general solution as you do not always have the option to transport
> that extra data with the file.

In one application I wrote, on load I compose a temporary RandomAccessFile
(RAF) from sequential files, with an in-RAM ArrayList of the offsets where
records start. It is a primitive form of hermit crab.

Now that I have RAM and address space to burn, I could put the whole
thing in RAM.
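A sketch of building such an in-RAM offset list on load, assuming newline-terminated UTF-8 records (names are illustrative, and this is not Roedy's actual code). One pass records the byte offset after every '\n'; scanning raw bytes is safe because the byte 0x0A never occurs inside a multi-byte UTF-8 sequence. A record is then fetched with a positional FileChannel.read, which leaves the channel's own position untouched.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class RecordIndex {

    // One pass over the file: the byte offset where each line (record) starts.
    static List<Long> index(FileChannel ch) throws IOException {
        List<Long> starts = new ArrayList<>();
        long size = ch.size();
        if (size > 0) starts.add(0L);
        ByteBuffer buf = ByteBuffer.allocate(64 * 1024);
        long pos = 0; // file offset of buf's first byte
        int r;
        while ((r = ch.read(buf, pos)) > 0) {
            for (int i = 0; i < r; i++) {
                long next = pos + i + 1;
                if (buf.get(i) == '\n' && next < size) starts.add(next);
            }
            pos += r;
            buf.clear();
        }
        return starts;
    }

    // Reads record n (without its trailing '\n') using a positional read.
    static String record(FileChannel ch, List<Long> starts, int n) throws IOException {
        long from = starts.get(n);
        long to = n + 1 < starts.size() ? starts.get(n + 1) : ch.size();
        ByteBuffer buf = ByteBuffer.allocate((int) (to - from));
        while (buf.hasRemaining()) {
            if (ch.read(buf, from + buf.position()) < 0) break;
        }
        buf.flip();
        String s = StandardCharsets.UTF_8.decode(buf).toString();
        return s.endsWith("\n") ? s.substring(0, s.length() - 1) : s;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("records", ".txt");
        Files.write(tmp, "first\nsecond \u20ac\nthird".getBytes(StandardCharsets.UTF_8));
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            List<Long> starts = index(ch);
            System.out.println(starts + " -> " + record(ch, starts, 2));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

With "Now that I have RAM and address space to burn", the same offset list works just as well over a byte[] holding the whole file.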
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,767
Messages
2,569,570
Members
45,045
Latest member
DRCM

Latest Threads

Top