Reading MacRoman

Discussion in 'Java' started by cndc, Jul 24, 2003.

  1. cndc

    cndc Guest

    Hi,

    I have a text file created on a Macintosh, and its encoding is
    MacRoman. Unfortunately, I'm having difficulty working with this
    encoding. As a test case, I wrote this simple class that should read
    in the MacRoman file and produce an ISO-8859-1 file:

    import java.io.*;
    import java.nio.charset.*;

    class Cheesy {
        public static void main(String[] args) {
            for (int i = 0; i < args.length; i++) {
                try {
                    // Read each file named on the command line as MacRoman.
                    InputStreamReader r = new InputStreamReader(new FileInputStream(args[i]), "MacRoman");
                    // Write to standard output as ISO-8859-1 ("8859_1" is an alias).
                    OutputStreamWriter o = new OutputStreamWriter(System.out, "8859_1");
                    int c;
                    while ((c = r.read()) != -1) {
                        o.write(c);
                    }
                    o.flush(); // make sure buffered output reaches System.out
                    r.close();
                } catch (IOException e) {
                    System.err.println(e.toString());
                }
            }
        }
    }

    Sadly, however, many of the weird characters in MacRoman continue to
    be converted to question marks as opposed to their normal character.

    Am I doing something wrong?

    Thank you,
    Elizabeth
     
    cndc, Jul 24, 2003
    #1

  2. Jon Skeet

    Jon Skeet Guest

    cndc wrote:

    <snip>

    > Sadly, however, many of the weird characters in MacRoman continue to
    > be converted to question marks as opposed to their normal character.
    >
    > Am I doing something wrong?


    Well, are the characters you're reading actually *in* ISO-8859-1?
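
    One way to answer that question from code is to ask the ISO-8859-1
    encoder directly. This is only a minimal sketch: the class and method
    names (Latin1Check, fitsInLatin1) are made up for illustration, and it
    relies on the standard CharsetEncoder.canEncode call.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

class Latin1Check {
    // Returns true if the character has a slot in ISO-8859-1.
    static boolean fitsInLatin1(char c) {
        CharsetEncoder encoder = Charset.forName("ISO-8859-1").newEncoder();
        return encoder.canEncode(c);
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) exists in ISO-8859-1.
        System.out.println(fitsInLatin1('\u00E9')); // prints true
        // U+201C (left double quotation mark) does not.
        System.out.println(fitsInLatin1('\u201C')); // prints false
    }
}
```

    Any character for which this returns false is one the writer has to
    replace with something else, which is where the question marks come from.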

    --
    Jon Skeet
    http://www.pobox.com/~skeet/
    If replying to the group, please do not mail me too
     
    Jon Skeet, Jul 24, 2003
    #2

  3. cndc

    cndc Guest

    Jon writes:

    > > Sadly, however, many of the weird characters in MacRoman continue to
    > > be converted to question marks as opposed to their normal character.
    > >
    > > Am I doing something wrong?

    >
    > Well, are the characters you're reading actually *in* ISO-8859-1?


    Hi Jon,

    No. They're in MacRoman format. The idea of the code is to convert
    the input stream from MacRoman and send it out in ISO-8859-1.

    Elizabeth
     
    cndc, Jul 24, 2003
    #3
  4. cndc

    cndc Guest

    Jon writes:

    > > No. They're in MacRoman format.

    >
    > I know they are originally - but my point was to ask whether or not
    > the actual character is in the ISO-8859-1 set as well.


    I'm not sure whether or not it is.

    I have a text file that was generated on a Macintosh and, having looked
    at the Macintosh's character numberings, I have determined that it
    uses the MacRoman charset. I'd like to be able to work with this data
    internally, but because the charsets differ, some kind of translation
    is necessary.

    > > The idea of the code is to convert the input stream from MacRoman
    > > and send it out in ISO-8859-1.

    >
    > But my point is that you can't convert a character which doesn't
    > even *exist* in ISO-8859-1 into a value in that character encoding.
    > Which Unicode character is it you're trying to convert?


    I'd like to change some of the characters used in MacRoman to character
    entities, such as 0xD2 to &ldquo;, for example.

    Does reading a file with its charset parameter set not
    automatically convert the incoming stream to some kind of normalized,
    internal format?

    Thank you for your help,

    Elizabeth
     
    cndc, Jul 24, 2003
    #4
  5. Jon A. Cruz

    Jon A. Cruz Guest

    cndc wrote:
    > Jon writes:
    >
    >
    >>> No. They're in MacRoman format.

    >>
    >>I know they are originally - but my point was to ask whether or not
    >>the actual character is in the ISO-8859-1 set as well.

    >
    >
    > I'm not sure whether or not it is.
    >
    > I have a text file that was generated on a Macintosh and, having looked
    > at the Macintosh's character numberings, I have determined that it
    > uses the MacRoman charset. I'd like to be able to work with this data
    > internally, but because the charsets differ, some kind of translation
    > is necessary.


    Use Unicode.



    > I'd like to change some of the characters used in MacRoman to character
    > entities, such as 0xD2 to &ldquo;, for example.


    Then do that before writing.


    > Does reading a file with its charset parameter set not
    > automatically convert the incoming stream to some kind of normalized,
    > internal format?


    It does convert it.
    The internal format as far as the Java programmer is concerned is always
    Unicode.

    So, in Java, all chars are Unicode. Once you read the file with the
    right charset, that's what you'll have.
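
    To see that conversion happen, here is a small sketch that decodes a few
    raw MacRoman bytes and prints the resulting Unicode code units. The class
    and method names are hypothetical, and the sketch assumes the JRE ships
    the MacRoman charset (it is an extended charset, so the code checks
    Charset.isSupported first rather than taking it for granted).

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

class MacRomanDecode {
    // Decodes raw bytes with the named charset.
    // Returns null if this Java runtime does not provide that charset.
    static String decode(byte[] bytes, String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            return null;
        }
        return Charset.forName(charsetName).decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        // 0xD2 and 0xD3 are the curly double quotes in MacRoman.
        byte[] quoted = { (byte) 0xD2, 'H', 'i', (byte) 0xD3 };
        String s = decode(quoted, "MacRoman");
        if (s == null) {
            System.out.println("MacRoman is not available in this runtime");
            return;
        }
        // Print each decoded char as a Unicode code unit.
        for (char c : s.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

    Where the charset is available, byte 0xD2 comes out as U+201C and 0xD3 as
    U+201D, matching the mapping table linked below. Both values are above
    255, which is why they cannot be written as ISO-8859-1.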

    Here's the "official" MacRoman Unicode mapping.

    http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/ROMAN.TXT

    However... you shouldn't need that.

    If something has a Unicode value over 255, then it won't map into
    Latin-1 (AKA ISO-8859-1). Simple, huh?

    This might help in those cases:
    http://www.w3.org/TR/REC-html40/sgml/entities.html
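
    Putting the two points together (replace the entities before writing,
    and treat anything over 255 as unmappable), a rough sketch might look
    like this. The class name, method name, and the choice of which named
    entities to handle are assumptions for illustration; everything else
    over 255 falls back to a numeric character reference. It does not
    escape &, < or >, which real HTML output would also need.

```java
class EntityEscaper {
    // Replaces characters that do not fit in ISO-8859-1 with HTML entities.
    // A handful of quotation marks get named entities (hypothetical selection);
    // any other code point above 0xFF becomes a numeric character reference.
    static String toLatin1Safe(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp); // step over surrogate pairs as one unit
            switch (cp) {
                case '\u201C': out.append("&ldquo;"); break;
                case '\u201D': out.append("&rdquo;"); break;
                case '\u2018': out.append("&lsquo;"); break;
                case '\u2019': out.append("&rsquo;"); break;
                default:
                    if (cp > 0xFF) {
                        out.append("&#").append(cp).append(';');
                    } else {
                        out.appendCodePoint(cp);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Curly quotes around "Hello", followed by a bullet (U+2022).
        System.out.println(toLatin1Safe("\u201CHello\u201D \u2022"));
        // prints &ldquo;Hello&rdquo; &#8226;
    }
}
```

    The result contains only characters at or below 255, so it can be passed
    to an ISO-8859-1 writer without any question marks appearing.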
     
    Jon A. Cruz, Jul 25, 2003
    #5
  6. cndc

    cndc Guest

    "Jon" writes:

    > It does convert it. The internal format as far as the Java
    > programmer is concerned is always Unicode.
    >
    > So, in Java, all chars are Unicode. Once you read the file with the
    > right charset, that's what you'll have.
    >
    > Here's the "official" MacRoman Unicode mapping.
    >
    > http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/ROMAN.TXT
    >
    > However... you shouldn't need that.
    >
    > If something has a Unicode value over 255, then it won't map into
    > Latin-1 (AKA ISO-8859-1). Simple, huh?
    >
    > This might help in those cases:
    > http://www.w3.org/TR/REC-html40/sgml/entities.html


    Thank you, both Jons. Yes, it is very nice how Java converts it into
    Unicode right from the get-go.

    Elizabeth
     
    cndc, Jul 25, 2003
    #6