Peculiar issue with French characters

Discussion in 'Java' started by sumitra@gmail.com, Jan 30, 2006.

  1. Guest

    Hello All,

    I need to print out French characters
    (ççÇÇààÀÀèèÈÈééÉÉ) in a PDF file by running my code on
    Unix. I'm using iText to create the PDF. The configurations in iText
    for the fonts include BaseFont.IDENTITY_H for encoding and
    BaseFont.EMBEDDED.

    The PDF encoding I have given is:
    /BaseFont /Courier /Encoding /WinAnsiEncoding

    which generates the PDFs with the French text fine on Windows. Should I
    be changing this??

    The problem is that with these parameters, on Unix, all I get is
    garbled text in my pdf doc.

    Compiling with -encoding ISO-8859-1 does not help because these French
    values are picked up at run time from a Hashtable. I have checked the
    Hashtable contents and they look good.

    My code uses a lot of StringWriter() and I would like to know if I need
    to explicitly set the encoding here to "8859_1" and if so, how?? I've
    tried the ByteArrayOutputStream approach to replace the StringWriter
    and wrapped that in OutputStreamWriter with the encoding 8859_1. That
    did not help.

    I also tried the getBytes() method of StringWriter and tried to convert
    it to another encoding, but that did not help either!!

    I really am at a loss now as to how to resolve my problem.
    If anyone out there has an idea do let me know please!
    Thanks in advance.

    --Sum
    , Jan 30, 2006
    #1

  2. opalinski from opalpaweb, Jan 30, 2006
    #2

  3. wrote:
    >
    > My code uses a lot of StringWriter() and I would like to know if I need
    > to explicitly set the encoding here to "8859_1" and if so, how?? I've
    > tried the ByteArrayOutputStream approach to replace the StringWriter
    > and wrapped that in OutputStreamWriter with the encoding 8859_1. That
    > did not help.
    >
    > I also tried the getBytes() method of StringWriter and tried to convert
    > it to another encoding, but that did not help either!!


    Character encoding matters at the point where you encode characters as
    bytes (or, going the other way, decode bytes into characters).

    Lots of APIs confuse the matter by picking the encoding up from the
    system defaults. So code may work on one setup, but not on another. To
    get around a fatal bug in Adobe Acrobat Reader I had to change
    encodings, meaning I could get different results depending upon which
    window/tab I launched an application from.

    FileWriter doesn't let you specify a character encoding (it always
    uses the platform default), so don't use that class.
    OutputStreamWriter has constructors to take character encodings, and one
    which doesn't (so don't use that one). StringWriter.getBytes does not
    exist. Swing has various methods which may depend upon configured
    encoding, a specified encoding or just chopping the top byte off each
    character (including surrogates).
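
    A minimal sketch of the OutputStreamWriter advice above, assuming a
    Java 8 or later runtime. The class name ExplicitEncodingDemo and the
    helper encode() are made up for illustration and are not from the
    original poster's code. The point is only that the charset is passed
    in explicitly, so the result does not depend on the machine's default.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitEncodingDemo {

    // Hypothetical helper: turns characters into bytes using the charset
    // the caller names, never the platform default.
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Use the constructor that takes a charset; the one-argument
        // constructor would silently fall back to the default encoding.
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The writer is closed (and therefore flushed) at this point.
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file itself encoding-neutral.
        String french = "\u00e7\u00c7\u00e0\u00c0\u00e8\u00c8\u00e9\u00c9";
        System.out.println(encode(french, StandardCharsets.ISO_8859_1).length); // 8
        System.out.println(encode(french, StandardCharsets.UTF_8).length);      // 16
    }
}
```

    The same eight characters take 8 bytes in ISO-8859-1 and 16 in UTF-8,
    which is why the reader of those bytes has to agree with the writer.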

    Tom Hawtin
    --
    Unemployed English Java programmer
    http://jroller.com/page/tackline/
    Thomas Hawtin, Jan 30, 2006
    #3
  4. Sum Guest

    When I generate the pdf on Unix and view it on Windows, I see only
    garbled text.
    Sum, Jan 31, 2006
    #4
  5. Sum Guest

    My bad, I meant the String.getBytes() method and not
    StringWriter.getBytes(), which as you rightly pointed out, does not
    exist.

    What I noticed while running my app on Unix was that the French string
    being returned to my program was:

    ççÃÃà à ÃÃèèÃÃééÃÃ

    whereas I expected to see:

    ççÇÇààÀÀèèÈÈééÉÉ

    This does not happen on Windows. Also, I actually compile my code on
    Windows, and put the tarball onto Unix.
    What do you suppose is happening now??
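
    For what it is worth, a leading "Ã" in front of each accented letter
    is the usual signature of UTF-8 bytes being decoded as ISO-8859-1.
    Whether that is exactly what happened here is an assumption; the
    thread does not confirm which two encodings were involved. A small
    sketch that reproduces the pattern (MojibakeDemo and misdecode() are
    illustrative names only):

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {

    // Encodes the text as UTF-8, then (wrongly) decodes those bytes as
    // ISO-8859-1, which maps every byte to exactly one character.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String garbled = misdecode("\u00e9"); // e-acute
        // Print code points rather than glyphs so the console's own
        // encoding cannot distort the result.
        System.out.println(garbled.length());        // 2
        System.out.println((int) garbled.charAt(0)); // 195, i.e. U+00C3 "Ã"
        System.out.println((int) garbled.charAt(1)); // 169, i.e. U+00A9 "©"
    }
}
```

    One accented character becomes two, and the first of the pair is
    always U+00C3 for the letters in the test string.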
    Sum, Jan 31, 2006
    #5
  6. Roedy Green Guest

    On 30 Jan 2006 20:41:59 -0800, "Sum" <> wrote, quoted
    or indirectly quoted someone who said :

    >This does not happen on Windows. Also, I actually compile my code on
    >Windows, and put the tarball onto Unix.
    >What do you suppose is happening now??


    There is an implied default encoding used to map any conversion byte
    <=> String. See http://mindprod.com/jgloss/encoding.html
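
    A quick way to see that implied default on a given machine is to
    print it and compare the bytes it produces with those of an explicit
    charset. This is only a sketch; DefaultEncodingDemo and
    defaultMatches() are hypothetical names, and ISO-8859-1 is used as
    the comparison only because it is the encoding mentioned earlier in
    the thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DefaultEncodingDemo {

    // True when String.getBytes() with no argument (platform default)
    // yields the same bytes as encoding with the given charset.
    static boolean defaultMatches(String text, Charset charset) {
        return Arrays.equals(text.getBytes(), text.getBytes(charset));
    }

    public static void main(String[] args) {
        String french = "\u00e7\u00c7\u00e0\u00c0\u00e8\u00c8\u00e9\u00c9";
        // What the JVM will use whenever no encoding is specified.
        System.out.println("default charset: " + Charset.defaultCharset().name());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
        // The answer can differ between the Windows and Unix machines.
        System.out.println("same as ISO-8859-1: "
                + defaultMatches(french, StandardCharsets.ISO_8859_1));
    }
}
```

    Running this on both machines, from the same shell that launches the
    real application, should show whether the two defaults differ.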
    --
    Canadian Mind Products, Roedy Green.
    http://mindprod.com Java custom programming, consulting and coaching.
    Roedy Green, Jan 31, 2006
    #6
  7. Guest

    Figured it out. The one thing that I did not do was to start the
    application (in Unix) from the same session where I had set LANG to
    fr_FR. I assumed that setting LANG=fr_FR would take effect
    system-wide; however, it applied only to that telnet
    session!
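
    A startup check along these lines would have caught the problem
    earlier. It is a sketch under assumptions: the class
    StartupEncodingCheck, the helper usesExpectedCharset(), and the
    choice of ISO-8859-1 as the expected encoding are all illustrative,
    not part of the actual application. Note also that newer JVMs
    (Java 18 and later) default to UTF-8 regardless of LANG.

```java
import java.nio.charset.Charset;

class StartupEncodingCheck {

    // Hypothetical helper: compares the charset actually in effect with
    // the one the application was written to expect.
    static boolean usesExpectedCharset(Charset actual, String expectedName) {
        return actual.equals(Charset.forName(expectedName));
    }

    public static void main(String[] args) {
        // LANG is inherited from the shell that launched the JVM, so it
        // may be unset (null) or differ between sessions.
        String lang = System.getenv("LANG");
        Charset actual = Charset.defaultCharset();
        if (!usesExpectedCharset(actual, "ISO-8859-1")) {
            System.err.println("Warning: default charset is " + actual.name()
                    + " (LANG=" + lang + "), expected ISO-8859-1");
        }
    }
}
```

    Passing explicit encodings everywhere remains the more robust fix,
    since it removes the dependency on the launching session entirely.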

    Thanks for the help everyone. :-D
    , Feb 6, 2006
    #7