Re: number of bytes for each (uni)code point while using utf-8 as encoding ...

Discussion in 'Java' started by Lew, Jul 10, 2012.

  1. Lew

    Lew Guest

    On Tuesday, July 10, 2012 12:45:07 PM UTC-7, (unknown) wrote:
    > > On 10/07/2012 12:21, lbrt chx _ gemale allegedly wrote:
    >
    > > > How can you get the number of bytes you "get()"?
    >
    > > Well, UTF-8 always encodes the same char to the same (number of) bytes,
    > > doesn't it?
    > ~
    > What about files which (their authors) claim to be UTF-8 encoded but aren't, and/or somehow get corrupted in transit? There are quite a few "monkeys" (us) messing with the metadata headers of HTML pages
    > ~
    > Sometimes you must double check every file you keep in a text bank/corpus, because, through associations, one mistake may propagate and create other kinds of problems
    > ~
    > > So you could just build a map char -> size /a priori/.
    > ~
    > ...
    > ~
    > > But really, what's the use? ...
    > ~
    > To you there is none, but I am trying to pinpoint as closely as I possibly can:
    > ~
    > .onMalformedInput(CodingErrorAction.REPORT);
    > .onUnmappableCharacter(CodingErrorAction.REPORT);
    > ~
    > errors
    > ~
    > There should be a way to get sizes as you get UTF-8 encoded sequences from a file. Also, I have found that quite a few files get corrupted in transmission, and sometimes I wonder how safe that naive mapping you mention is, since those file formats don't have any kind of built-in error correction measures
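
For reference, the "map char -> size" idea in the quoted text does not need a lookup table: for well-formed UTF-8 the byte count follows directly from the code point's range. What follows is only a minimal sketch; the class name `Utf8Sizes` and the method `utf8Length` are made up for illustration and do not come from the thread or from any library.

```java
import java.nio.charset.StandardCharsets;

class Utf8Sizes {
    /** Number of bytes UTF-8 uses for one Unicode scalar value (hypothetical helper). */
    static int utf8Length(int codePoint) {
        // Surrogates (U+D800..U+DFFF) and values above U+10FFFF have no UTF-8 form.
        if (codePoint < 0 || codePoint > 0x10FFFF
                || (codePoint >= 0xD800 && codePoint <= 0xDFFF)) {
            throw new IllegalArgumentException("not a Unicode scalar value: " + codePoint);
        }
        if (codePoint < 0x80) return 1;     // ASCII
        if (codePoint < 0x800) return 2;    // U+0080..U+07FF
        if (codePoint < 0x10000) return 3;  // rest of the Basic Multilingual Plane
        return 4;                           // supplementary planes
    }

    public static void main(String[] args) {
        String s = "a\u00e9\u20ac\uD83D\uDE00"; // one sample code point per size class
        int total = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);      // reads a surrogate pair as one code point
            int n = utf8Length(cp);
            System.out.printf("U+%04X -> %d byte(s)%n", cp, n);
            total += n;
            i += Character.charCount(cp);   // advance by 1 or 2 chars
        }
        // Cross-check against the JDK's own encoder.
        System.out.println(total == s.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

Note that this only gives the size of a code point that has already been decoded correctly; it says nothing about whether the bytes in a given file are well-formed.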


    It isn't the job of the file format to correct errors but of the transmission protocol.

    Are you saying "quite a few files get corrupted" when reading directly from disk
    or over some other wire protocol? If it's from disk, I'd blame the disk drive, not
    Java.

    You aren't going to fix a bad disk with good programming.
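
What a program can do is detect input that is not well-formed UTF-8, which is what the `CodingErrorAction.REPORT` settings quoted above are for. Here is a minimal sketch of such a check, assuming the whole content fits in memory; the class name `StrictUtf8Check` and the method `isValidUtf8` are hypothetical names chosen for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictUtf8Check {
    /** Returns true if the bytes decode as UTF-8 without any reported error. */
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes)); // decodes the entire buffer
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] good = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        byte[] bad = {(byte) 'c', (byte) 0xC3}; // two-byte sequence cut off after its lead byte
        System.out.println(isValidUtf8(good));
        System.out.println(isValidUtf8(bad));
    }
}
```

Passing this check only shows that the bytes are well-formed UTF-8. Corruption that happens to produce another valid sequence goes unnoticed, which is why integrity belongs to the transport or storage layer.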

    --
    Lew
    Lew, Jul 10, 2012
    #1