offsets in a FileChannel ...

Discussion in 'Java' started by qwertmonkey@syberianoutpost.ru, Feb 23, 2013.

  1. Guest

    What is missing in this code snippet to get the offsets in the underlying
    FileChannel on which the MappedByteBuffer and then the CharBuffer are built?
    ~
    CharBuffer.position() gives you the position alright, but how about wanting
    to get the actual offset of certain characters in the actual data feed exposed
    through the FileInputStream?
    ~
    char c;
    long lPsx;
    FIS = new FileInputStream(IFl);
    FileChannel FlChnl = FIS.getChannel();
    MappedByteBuffer MptbChnlBfr = FlChnl.map(FileChannel.MapMode.READ_ONLY,
    0, FlChnl.size());
    CharBuffer cBfrUTF8 = ChrStDkdr.decode(MptbChnlBfr);
    // __
    while(cBfrUTF8.hasRemaining()){
    c = cBfrUTF8.get();
    lPsx = cBfrUTF8.position();
    System.err.println("// __ |" + lPsx + "|" + c + "|" + (int)c + "|");
    }
    // __
    FlChnl.close();
    FIS.close();
    ~
    Or do you know of any other way to basically do the same thing?
    ~
    thanks,
    lbrtchx
    comp.lang.java.programmer: offsets in a FileChannel ...
    , Feb 23, 2013
    #1

  2. On 23.02.2013 15:11, wrote:
    > What is missing in this code snippet to get the offsets in the underlying
    > FileChannel on which the MappedByteBuffer and then the CharBuffer are built?
    > ~
    > CharBuffer.position() gives you the position alright, but how about wanting
    > to get the actual offset of certain characters in the actual data feed exposed
    > through the FileInputStream?
    > ~
    > char c;
    > long lPsx;
    > FIS = new FileInputStream(IFl);
    > FileChannel FlChnl = FIS.getChannel();
    > MappedByteBuffer MptbChnlBfr = FlChnl.map(FileChannel.MapMode.READ_ONLY,
    > 0, FlChnl.size());
    > CharBuffer cBfrUTF8 = ChrStDkdr.decode(MptbChnlBfr);
    > // __
    > while(cBfrUTF8.hasRemaining()){
    > c = cBfrUTF8.get();
    > lPsx = cBfrUTF8.position();
    > System.err.println("// __ |" + lPsx + "|" + c + "|" + (int)c + "|");
    > }
    > // __
    > FlChnl.close();
    > FIS.close();
    > ~
    > Or do you know of any other way to basically do the same thing?


    UTF-8 is not a fixed-width encoding: a single character takes between
    one and four bytes. You would have to write more complex code if you
    want to align char position and byte position. Basically you need to
    read the file from the beginning and observe the width of every char
    as it is being decoded. You could of course apply heuristics if you
    have more knowledge about the file, but I guess that soon gets messy.
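
    A minimal sketch of the approach described above, assuming the input is
    valid UTF-8 and fits in a byte array: walk the bytes from the start, take
    the width of each sequence from its lead byte, and record where it begins.
    The class and method names (Utf8Offsets, sequenceLength, charToByteOffsets)
    are made up for illustration. Two details to keep in mind: a four-byte
    sequence decodes to two Java chars (a surrogate pair), so both chars map to
    the same byte offset, and in the original loop position() is read after
    get(), so it is one past the index of the char just returned.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class Utf8Offsets {
    /** Length in bytes of the UTF-8 sequence whose lead byte is b (valid input assumed). */
    static int sequenceLength(byte b) {
        int u = b & 0xFF;
        if (u < 0x80) return 1; // 0xxxxxxx: ASCII
        if (u < 0xE0) return 2; // 110xxxxx
        if (u < 0xF0) return 3; // 1110xxxx
        return 4;               // 11110xxx: supplementary code point
    }

    /** Entry i is the byte offset at which the sequence for decoded char i starts. */
    static List<Long> charToByteOffsets(byte[] utf8) {
        List<Long> offsets = new ArrayList<>();
        int pos = 0;
        while (pos < utf8.length) {
            int len = sequenceLength(utf8[pos]);
            offsets.add((long) pos);
            if (len == 4) {
                // Decodes to a surrogate pair: the second char shares the offset.
                offsets.add((long) pos);
            }
            pos += len;
        }
        return offsets;
    }

    public static void main(String[] args) {
        // 'a', e-acute, euro sign, one emoji (as a surrogate pair), 'b'
        String text = "a\u00e9\u20ac\uD83D\uDE00b";
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        List<Long> offsets = charToByteOffsets(bytes);
        for (int i = 0; i < text.length(); i++) {
            System.out.println("char " + i + " (U+"
                    + Integer.toHexString(text.charAt(i)).toUpperCase()
                    + ") starts at byte " + offsets.get(i));
        }
    }
}
```

    With a MappedByteBuffer the same walk can be done with absolute get(int)
    calls instead of a byte array. Malformed input would need extra checks
    that this sketch leaves out.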

    Cheers

    robert

    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, Feb 23, 2013
    #2

  3. Roedy Green Guest

    On Sat, 23 Feb 2013 15:39:08 +0100, Robert Klemme
    <> wrote, quoted or indirectly quoted
    someone who said :

    >UTF8 is not an encoding with a fixed width.


    You could use UTF-16. Then you could interconvert byte and char
    offsets with a simple shift.

    You could build a table of interesting byte offsets when you construct
    the stream.

    You could embed binary counts in bytes/chars at the head of phrases.
    You build and take the stream apart with ByteArrayOutputStream and
    ByteArrayInputStream.
    --
    Roedy Green Canadian Mind Products http://mindprod.com
    One thing I love about having a website, is that when I complain about
    something, I only have to do it once. It saves me endless hours of
    grumbling.
    Roedy Green, Feb 25, 2013
    #3
  4. On 25.02.2013 13:09, Roedy Green wrote:
    > On Sat, 23 Feb 2013 15:39:08 +0100, Robert Klemme
    > <> wrote, quoted or indirectly quoted
    > someone who said :
    >
    >> UTF8 is not an encoding with a fixed width.

    >
    > You could use UTF-16. Then you could interconvert 8 byte and char
    > offsets. with a simple shift.


    No, you couldn't, and neither could he, since UTF-16 isn't a
    fixed-width encoding either.
    http://www.unicode.org/faq/utf_bom.html#gen6
    http://www.unicode.org/versions/Unicode6.2.0/ch03.pdf#G28070
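
    A small sketch of what "not fixed width" means for UTF-16, using only
    standard String methods (the class name Utf16Widths and the helper
    utf16Bytes are illustrative). A code point outside the Basic Multilingual
    Plane is encoded as two 16-bit code units. Since a Java char is one UTF-16
    code unit, a byte offset in BOM-less UTF-16 is still twice the char index,
    but the char index no longer equals the character (code point) index.

```java
import java.nio.charset.StandardCharsets;

class Utf16Widths {
    /** Number of bytes the string occupies in UTF-16BE (this charset writes no BOM). */
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    static void report(String label, String s) {
        System.out.println(label
                + ": bytes=" + utf16Bytes(s)
                + " chars=" + s.length()
                + " codePoints=" + s.codePointCount(0, s.length()));
    }

    public static void main(String[] args) {
        report("U+0041 (BMP)", "A");
        // U+1F600 written as its surrogate pair
        report("U+1F600 (supplementary)", "\uD83D\uDE00");
    }
}
```

    Whether the shift is good enough therefore depends on whether offsets are
    wanted per Java char or per Unicode character.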

    > You could build a table of interesting byte offsets when you construct
    > the stream.


    So you would augment the file with an index file. This is certainly not
    a general solution as you do not always have the option to transport
    that extra data with the file. Plus, aligning offsets while writing
    might prove as difficult as when reading (e.g. because of buffering).

    > You could embed binary counts in bytes/chars at the head of phrases.
    > You build and take the stream apart with ByteArrayStreams.


    That's no longer a text document.

    robert


    --
    remember.guy do |as, often| as.you_can - without end
    http://blog.rubybestpractices.com/
    Robert Klemme, Feb 25, 2013
    #4
  5. Roedy Green Guest

    On Mon, 25 Feb 2013 21:50:18 +0100, Robert Klemme
    <> wrote, quoted or indirectly quoted
    someone who said :

    >So you would augment the file with an index file. This is certainly not
    >a general solution as you do not always have the option to transport
    >that extra data with the file.


    In one application I wrote, on load I compose a temporary RAF
    (RandomAccessFile) from sequential files, with an in-RAM ArrayList of
    the offsets where records start. It is a primitive form of hermit crab.

    Now that I have RAM and address space to burn, I could put the whole
    thing in RAM.
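
    A rough sketch of the offset-table idea, not the application described
    above: it assumes records are UTF-8 lines terminated by '\n', builds the
    list of record-start byte offsets in one pass, and then uses
    RandomAccessFile.seek to jump to a record. RecordIndex, indexRecords and
    readRecord are hypothetical names. Scanning for the byte 0x0A is safe in
    UTF-8 because every byte of a multi-byte sequence is 0x80 or above.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class RecordIndex {
    /** Byte offsets at which each '\n'-terminated record starts. */
    static List<Long> indexRecords(byte[] data) {
        List<Long> starts = new ArrayList<>();
        if (data.length > 0) {
            starts.add(0L);
        }
        // A newline that is the very last byte does not start a new record.
        for (int i = 0; i < data.length - 1; i++) {
            if (data[i] == '\n') {
                starts.add((long) (i + 1));
            }
        }
        return starts;
    }

    /** Seeks to record n and returns it without its trailing newline. */
    static String readRecord(Path file, List<Long> starts, int n) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long start = starts.get(n);
            long end = (n + 1 < starts.size()) ? starts.get(n + 1) : raf.length();
            byte[] buf = new byte[(int) (end - start)];
            raf.seek(start);
            raf.readFully(buf);
            String s = new String(buf, StandardCharsets.UTF_8);
            return s.endsWith("\n") ? s.substring(0, s.length() - 1) : s;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("records", ".txt");
        try {
            byte[] data = "alpha\nb\u00e9ta\ngamma\n".getBytes(StandardCharsets.UTF_8);
            Files.write(tmp, data);
            List<Long> starts = indexRecords(data);
            System.out.println("record starts: " + starts);
            System.out.println("record 1: " + readRecord(tmp, starts, 1));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

    The index lives only in memory here, so it has to be rebuilt on every
    load, which sidesteps the objection about shipping extra data with the
    file at the cost of one sequential scan.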
    --
    Roedy Green Canadian Mind Products http://mindprod.com
    One thing I love about having a website, is that when I complain about
    something, I only have to do it once. It saves me endless hours of
    grumbling.
    Roedy Green, Feb 26, 2013
    #5