Reading and writing extended ASCII characters

Discussion in 'Java' started by Geoff Warnock, Mar 9, 2005.

  1. I am writing a Java program (using JDK 1.5.0) for a project. It
    has to read in words from an input file. It is a language translator,
    so each line consists of an English word, then the German, French,
    Spanish, etc. words. Some of these words contain accents and other
    characters with ASCII values beyond 127.

    My problem is that I have a punctuation removal method which strips
    all words of unnecessary characters before adding each line to a new
    Binary Search Tree Node. The method just drops the characters with
    values greater than 127 and does not store them. Why is this happening?


    // Assumed (declarations were not posted): data is a String[] of lines,
    // c, lastChar and nextChar are chars, temp is an initially empty String.
    // The "[i]" subscripts appear to have been stripped from the post.
    for (int i = 0; i < data.length; i++)
    {
        int j = 0;
        while (j != data[i].length())
        {
            c = data[i].charAt(j);
            if ((c <= 'z') && (c >= 'a'))
                temp = temp + c;
            else if ((c <= 'Z') && (c >= 'A'))
                temp = temp + c;
            else if (c == 39)   // apostrophe
                temp = temp + c;
            else if ((c > (char)127) && (c < (char)168))
                temp = temp + c;
            else if (c == ' ')
            {
                // keep a space only when it sits between two word characters
                if ((j > 0) && (j < data[i].length() - 1))
                {
                    lastChar = data[i].charAt(j-1);
                    nextChar = data[i].charAt(j+1);
                    if (((lastChar <= 'z' && lastChar >= 'a') || (lastChar <= 'Z' &&
                        lastChar >= 'A') || (lastChar == 39)) && ((nextChar <=
                        'z' && nextChar >= 'a') || (nextChar <= 'Z' && nextChar >= 'A') ||
                        (nextChar == 39)))
                        temp = temp + c;
                }
            }
            j++;
        }
        data[i] = temp;
        temp = "";
    }


    It is the "else if((c > (char)127) && (c < (char)168))" test that
    will not work. Any ideas?

    Thanks, Geoff Warnock.
     
    Geoff Warnock, Mar 9, 2005
    #1

  2. On 8 Mar 2005 23:19:41 -0800, Geoff Warnock wrote:
    > My problem is that i have a punctuation removal method which strips
    > all words of unnecessary characters before adding each line to a new
    > Binary Search Tree Node. The method just drops the ascii values
    > greater than 127 and does not store them. Why is this happening?


    [...]

    > It is the "else if((c > (char)127) && (c < (char)168)) that will not
    > work. Any ideas??


    Probably it has to do with how "c" and "data" are declared or
    initialized, but you didn't post those parts.

    I'll bet that you've read your data from the file without specifying
    the correct character encoding. Have you confirmed (e.g. with
    System.out.println()) that lines containing those characters have been
    read correctly?
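A minimal sketch of reading lines with an explicit character encoding, so the platform default is never used. The class name `EncodingAwareReader`, the method `readLines`, and the sample bytes are hypothetical, and ISO-8859-1 is only an assumption about how the input file was saved. It uses `StandardCharsets`, which is newer than the JDK 1.5 mentioned in the thread.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class EncodingAwareReader {
    // Decode every line of the stream with the given charset
    // instead of relying on the platform default encoding.
    static List<String> readLines(InputStream in, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample: "café" stored as ISO-8859-1 (é is the single byte 0xE9).
        byte[] sample = {'c', 'a', 'f', (byte) 0xE9, '\n'};
        for (String line : readLines(new ByteArrayInputStream(sample), StandardCharsets.ISO_8859_1)) {
            // Print the line and the numeric value of its last char as a sanity check.
            System.out.println(line + " -> " + (int) line.charAt(line.length() - 1));
        }
    }
}
```

For a real file, the same method could be given a `FileInputStream` and whichever charset the file was actually written in.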

    That said, I'd like to suggest that instead of doing it "the hard
    way", and making assumptions about the values of various characters,
    you consider using a simple test like this:

    if (Character.isLetter(c)) {
    }
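As a rough illustration of that test, the sketch below wraps `Character.isLetter` in a small predicate that also accepts the apostrophe (code 39 in the original post). The class and method names are hypothetical.

```java
class LetterTest {
    // True for any Unicode letter (accented ones included) and for the apostrophe.
    static boolean isWordChar(char c) {
        return Character.isLetter(c) || c == '\'';
    }

    public static void main(String[] args) {
        System.out.println(isWordChar('a'));      // true
        System.out.println(isWordChar('\u00E9')); // true (é)
        System.out.println(isWordChar('!'));      // false
    }
}
```

This avoids hard-coding numeric ranges, so it does not depend on where a particular encoding happens to place accented letters.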

    /gordon

    --
    [ do not email me copies of your followups ]
    g o r d o n + n e w s @ b a l d e r 1 3 . s e
     
    Gordon Beaton, Mar 9, 2005
    #2

  3. Geoff Warnock wrote:
    [some really scary code]
    > It is the "else if((c > (char)127) && (c < (char)168)) that will not
    > work. Any ideas??


    How is it not working?

    BTW, temp is a String. Looping and appending one char at a time to it
    is very inefficient.
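One way to avoid the repeated concatenation is to append into a `StringBuilder` and convert once at the end. The sketch below is a simplification, not the original method: it keeps every space rather than only spaces between word characters, and the names `BuilderFilter` and `keepWordChars` are hypothetical.

```java
class BuilderFilter {
    // Keep letters, apostrophes and spaces; everything else is dropped.
    // Characters are appended to a StringBuilder instead of a String.
    static String keepWordChars(String data) {
        StringBuilder sb = new StringBuilder(data.length());
        for (int j = 0; j < data.length(); j++) {
            char c = data.charAt(j);
            if (Character.isLetter(c) || c == '\'' || c == ' ') {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(keepWordChars("don't, stop!")); // don't stop
    }
}
```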

    BTW2, your input is a String; you should look at regular expressions
    to filter out unwanted characters.
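A possible regular-expression version, assuming the goal is to keep letters, apostrophes and spaces only. The pattern uses the Unicode letter class `\p{L}`. The class name `RegexFilter` and the method `strip` are hypothetical.

```java
import java.util.regex.Pattern;

class RegexFilter {
    // Matches any character that is NOT a Unicode letter, an apostrophe or a space.
    private static final Pattern UNWANTED = Pattern.compile("[^\\p{L}' ]");

    static String strip(String data) {
        return UNWANTED.matcher(data).replaceAll("");
    }

    public static void main(String[] args) {
        // Prints the input with the comma and exclamation mark removed.
        System.out.println(strip("Gr\u00FC\u00DFe, Welt!"));
    }
}
```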

    BTW3, Strings are Unicode and your source is (most likely) something
    else, so anything you read gets translated to Unicode. On many platforms
    the default system encoding is ISO-8859-1, which has no printable
    characters for values > 127 and < 160.
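To make the point concrete, the sketch below decodes the bytes for "é" under two encodings and prints the resulting char value. Once decoded correctly the value is 233, which lies outside the 128 to 167 range tested in the original code. That range may have been taken from a DOS-style code page, but this is only a guess. `CodePointDemo` and `firstCharCode` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CodePointDemo {
    // Decode the bytes with the given charset and return the first char's numeric value.
    static int firstCharCode(byte[] bytes, Charset charset) {
        return new String(bytes, charset).charAt(0);
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xE9};            // é encoded as ISO-8859-1
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9}; // é encoded as UTF-8

        System.out.println(firstCharCode(latin1, StandardCharsets.ISO_8859_1)); // 233
        System.out.println(firstCharCode(utf8, StandardCharsets.UTF_8));        // 233
        // Decoding UTF-8 bytes with the wrong charset yields a different char.
        System.out.println(firstCharCode(utf8, StandardCharsets.ISO_8859_1));   // 195
    }
}
```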
     
    Daniel Tryba, Mar 9, 2005
    #3
