Re: number of bytes for each (uni)code point while using utf-8 as encoding ...

Discussion in 'Java' started by Jason Bailey, Jul 12, 2012.

  1. Jason Bailey


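    There's an incorrect assumption here. CharBuffer.get returns a char, and
    a char is not a code point. A Java char is always a 16-bit UTF-16 code
    unit; how many bytes it turns into depends on the encoding, and in
    UTF-8 a single char maps to 1, 2 or 3 bytes. Code points above U+FFFF
    (the supplementary characters) need two chars, a surrogate pair, which
    together encode to 4 bytes in UTF-8.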
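    To make the char/code point distinction concrete, here is a small sketch
    (the class name Utf8Width is hypothetical) that computes the UTF-8 width
    of one code point using the RFC 3629 ranges and compares char count,
    code point count and encoded byte count for a short string. It does not
    special-case the surrogate range U+D800..U+DFFF, which is not valid on
    its own in UTF-8.

```java
import java.nio.charset.StandardCharsets;

// Sketch: UTF-8 byte length of a single Unicode code point (RFC 3629 ranges).
// Surrogate code points (U+D800..U+DFFF) are not special-cased here.
final class Utf8Width {
    static int utf8Length(int codePoint) {
        if (codePoint < 0 || codePoint > Character.MAX_CODE_POINT) {
            throw new IllegalArgumentException("not a code point: " + codePoint);
        }
        if (codePoint < 0x80) return 1;      // U+0000..U+007F (ASCII)
        if (codePoint < 0x800) return 2;     // U+0080..U+07FF
        if (codePoint < 0x10000) return 3;   // U+0800..U+FFFF (rest of the BMP)
        return 4;                            // U+10000..U+10FFFF (supplementary)
    }

    public static void main(String[] args) {
        String s = "A\u00e9\u20ac\uD83D\uDE00"; // A, e-acute, euro sign, U+1F600 (a surrogate pair)
        System.out.println(s.length());                        // 5 chars
        System.out.println(s.codePointCount(0, s.length()));   // 4 code points
        // Print each code point in hex with its UTF-8 width.
        s.codePoints().forEach(cp ->
            System.out.println(Integer.toHexString(cp) + " -> " + utf8Length(cp)));
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 1+2+3+4 = 10
    }
}
```

    The surrogate pair counts as two chars but one code point, and only the
    code point as a whole has a meaningful UTF-8 width.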

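    If you want to determine whether your char needs more than one byte in
    UTF-8, just do a comparison:

    boolean isMultiByte = (MptChrBfr.get() >> 7) != 0; // true for anything >= U+0080

    Here you're taking the char and shifting it right by 7 bits. If any bits
    are left, the value is above 0x7F, outside the ASCII range, and UTF-8
    needs more than one byte for it. Note that get() also advances the
    buffer's position, so keep the char in a variable if you need it again.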
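    The per-char comparison idea can be pushed further to count the UTF-8
    size of a whole buffer without encoding it. A minimal sketch, assuming
    well-formed UTF-16 (every high surrogate followed by a low one); the
    class and method names are hypothetical. Each surrogate half is charged
    2 bytes so that a pair adds up to the 4 bytes its code point needs.

```java
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

// Sketch: how many UTF-8 bytes each Java char contributes.
// Assumes well-formed UTF-16 input (surrogates always come in pairs).
final class CharUtf8Bytes {
    static int bytesFor(char c) {
        if (c < 0x80) return 1;                  // ASCII
        if (c < 0x800) return 2;                 // two-byte sequence
        if (Character.isSurrogate(c)) return 2;  // each half of a pair: 4 bytes / 2 chars
        return 3;                                // remaining BMP characters
    }

    static int utf8Length(CharBuffer buf) {
        int total = 0;
        while (buf.hasRemaining()) {
            total += bytesFor(buf.get());        // get() advances the position by one char
        }
        return total;
    }

    public static void main(String[] args) {
        String s = "A\u00e9\u20ac\uD83D\uDE00";
        System.out.println(utf8Length(CharBuffer.wrap(s)));             // 10
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 10
    }
}
```

    For lone (unpaired) surrogates the count will not match what an encoder
    actually emits, since Java's encoder substitutes a replacement byte.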

    If you want to know whether the char you received is part of a larger
    code point, the Character class now has a number of supporting methods:

    Character.isHighSurrogate(MptChrBfr.get());

    tells you whether it is the leading (high) half of a surrogate pair,
    i.e. the first char of a supplementary code point.
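    Combined with Character.isLowSurrogate and Character.toCodePoint, that
    check is enough to walk a CharBuffer one code point at a time. A sketch
    under the assumption that unpaired surrogates should simply be passed
    through as their own values (the class name CodePointReader is
    hypothetical):

```java
import java.nio.CharBuffer;
import java.util.ArrayList;
import java.util.List;

// Sketch: rebuild whole code points from a CharBuffer by pairing surrogates.
final class CodePointReader {
    static List<Integer> codePoints(CharBuffer buf) {
        List<Integer> result = new ArrayList<>();
        while (buf.hasRemaining()) {
            char c = buf.get();
            // Peek at the next char with an absolute get, which does not move the position.
            if (Character.isHighSurrogate(c) && buf.hasRemaining()
                    && Character.isLowSurrogate(buf.get(buf.position()))) {
                // Leading half of a supplementary code point: consume the trailing half too.
                result.add(Character.toCodePoint(c, buf.get()));
            } else {
                // BMP character, or an unpaired surrogate passed through as-is.
                result.add((int) c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (int cp : codePoints(CharBuffer.wrap("A\uD83D\uDE00B"))) {
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase()
                    + " chars=" + Character.charCount(cp));
        }
    }
}
```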

    I'd look at the newer methods on the Character and String classes;
    dealing with chars directly is a bit cumbersome. Just load everything
    into a String: getBytes(StandardCharsets.UTF_8).length tells you how
    many bytes it takes up in UTF-8, and if you want to know the number of
    code points, use String.codePointCount.
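    A short sketch of that String-based approach; the class and helper
    names are hypothetical, and it assumes the file is small enough to read
    into memory at once.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class StringCounts {
    // Load a whole file as a String, decoding it as UTF-8.
    static String loadUtf8(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8ByteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of Unicode code points (a surrogate pair counts once).
    static int codePointCount(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = "na\u00efve \uD83D\uDE00"; // "naive" with i-diaeresis, a space, U+1F600
        System.out.println(s.length());          // 8 chars
        System.out.println(codePointCount(s));   // 7 code points
        System.out.println(utf8ByteCount(s));    // 11 bytes
    }
}
```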

    -jason


    On 7/10/2012 6:21 AM, lbrt chx _ gemale wrote:

    <snip>
    > for (int j = 0; (j< MptChrBfr.length()); ++j){
    > MptChrBfr.get();
    > }
    > ...
    > ~
    > each time you get() a unicode point from the buffer, you will get from 1 to 4 bytes and the sum of all "lengths" should equal the file length in bytes, right?
    > ~
    > I am using the (new) nio in java 7 and I wonder if Sun made changes which make hard getting lengths of bytes a unicode point needs
    > ~
    > How can you get the number of bytes you "get()"?
    > ~
    > thank you
    > lbrtchx
    > comp.lang.java.programmer: number of bytes for each (uni)code point while using utf-8 as encoding ...
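    One thing worth noting about the quoted loop: CharBuffer.length() is the
    number of chars remaining (it is equivalent to remaining()), so it
    shrinks with every get() while j grows, and the loop stops after about
    half the buffer. The sketch below (class name and default file name are
    hypothetical) loops on hasRemaining() instead, pairs surrogates into
    code points, and sums their UTF-8 widths so the total can be compared
    with the file length. It assumes the file is valid UTF-8:
    Charset.decode replaces malformed input with U+FFFD, which would make
    the two numbers differ.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: sum the UTF-8 width of every decoded code point and compare it
// with the raw byte count. Assumes the input is valid UTF-8.
final class Utf8Tally {
    static int widthOf(int cp) {
        if (cp < 0x80) return 1;
        if (cp < 0x800) return 2;
        if (cp < 0x10000) return 3;
        return 4;
    }

    static int sumOfWidths(byte[] utf8) {
        CharBuffer chars = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(utf8));
        int total = 0;
        // Loop on hasRemaining(), not on a counter compared with chars.length().
        while (chars.hasRemaining()) {
            char c = chars.get();
            int cp = c;
            if (Character.isHighSurrogate(c) && chars.hasRemaining()
                    && Character.isLowSurrogate(chars.get(chars.position()))) {
                cp = Character.toCodePoint(c, chars.get()); // consume the low surrogate
            }
            total += widthOf(cp);
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "input.txt"); // hypothetical default
        byte[] bytes = Files.readAllBytes(file);
        System.out.println(bytes.length + " bytes, sum of widths = " + sumOfWidths(bytes));
    }
}
```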
     
    Jason Bailey, Jul 12, 2012

Similar Threads
  1. Daniele Futtorovic
    Replies: 0, Views: 216
    Daniele Futtorovic
    Jul 10, 2012
  2. Lew
    Replies: 0, Views: 227
  3. Daniele Futtorovic
    Replies: 1, Views: 316
  4. Robert Klemme
    Replies: 0, Views: 232
    Robert Klemme
    Jul 11, 2012
  5. Lew
    Replies: 0, Views: 214