Re: convert unicode characters to visibly similar ascii characters

Discussion in 'Python' started by Laszlo Nagy, Jul 1, 2008.

  1. Laszlo Nagy

    Laszlo Nagy Guest

    Peter Bulychev wrote:
    > Hello.
    >
    > I want to convert unicode character into ascii one.
    > The method ".encode('ASCII')" can convert only those unicode
    > characters that fit into the 0..127 range.
    >
    > But there are still lots of characters beyond this range, which can be
    > manually converted to some visibly similar ascii characters. For
    > instance, there are several quotation marks in unicode, which can be
    > converted into ascii quotation mark.

    Please be more specific. There is no general solution. Unicode can
    handle Latin, Cyrillic (Russian), Chinese, Japanese and Arabic characters
    in the same string. There are thousands of possible non-ASCII characters
    and many of them are not similar to any ASCII character.

    If you only want this to work for a subset, please define that subset.
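
    As an illustration of what "defining a subset" could look like, here is
    a minimal sketch that assumes the subset is "Latin letters with
    diacritics, plus compatibility forms such as ligatures". The function
    name ascii_fold and the "?" fallback are hypothetical choices, not
    something proposed in this thread. It relies only on the standard
    unicodedata module.

```python
import unicodedata


def ascii_fold(text, fallback="?"):
    """Fold text to ASCII for one assumed subset: decomposable Latin characters."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            # Already ASCII, keep as-is.
            out.append(ch)
            continue
        # NFKD splits e.g. 'é' into 'e' followed by a combining accent.
        decomposed = unicodedata.normalize("NFKD", ch)
        # Keep only the ASCII parts of the decomposition.
        kept = "".join(c for c in decomposed if ord(c) < 128)
        # Characters outside the assumed subset get the fallback marker.
        out.append(kept if kept else fallback)
    return "".join(out)


print(ascii_fold("na\u00efve caf\u00e9"))  # accented Latin letters
print(ascii_fold("\u0416"))                # CYRILLIC CAPITAL LETTER ZHE
```

    Characters such as curly quotation marks have no decomposition, so
    under this assumption they fall through to the fallback and would
    still need a hand-written table.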

    Laszlo
    Laszlo Nagy, Jul 1, 2008
    #1

  2. Laszlo Nagy

    Jim Guest

    Peter Bulychev wrote:
    > I want to convert unicode character into ascii one.

    You have to make some arbitrary choices of what to translate. Based
    on some materials on effbot's site, and a recipe, I made
    ftp://alan.smcvt.edu/hefferon/unicode2ascii.py
    which has at least some of what you are looking for.
    $ grep HYPHEN unicode2ascii.py
    u'\N{SOFT HYPHEN}':u'-',
    u'\N{HYPHEN}':u'-',
    u'\N{NON-BREAKING HYPHEN}':u'-',
    u'\N{SOFT HYPHEN}': '-',
    No doubt I have some terrible gaffes and some things missing.
    Corrections appreciated.
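
    For readers who cannot fetch the file, a hand-made table of this kind
    might be applied roughly as follows. This is a sketch, not the
    contents of unicode2ascii.py: the entries, the name LOOKALIKES and the
    function to_ascii are hypothetical, and the choices are as arbitrary
    as described above. It is written for Python 3, where str.translate
    takes a dict keyed by code point.

```python
# Hypothetical by-hand table: Unicode character -> ASCII look-alike.
LOOKALIKES = {
    "\N{HYPHEN}": "-",
    "\N{NON-BREAKING HYPHEN}": "-",
    "\N{EN DASH}": "-",
    "\N{LEFT DOUBLE QUOTATION MARK}": '"',
    "\N{RIGHT DOUBLE QUOTATION MARK}": '"',
    "\N{LEFT SINGLE QUOTATION MARK}": "'",
    "\N{RIGHT SINGLE QUOTATION MARK}": "'",
}

# str.translate wants integer code points as keys.
TABLE = {ord(char): repl for char, repl in LOOKALIKES.items()}


def to_ascii(text):
    """Replace the characters listed in LOOKALIKES; leave everything else alone."""
    return text.translate(TABLE)


print(to_ascii("\u201cwell\u2010known\u201d"))
```

    Anything not listed in the table passes through unchanged, so a final
    .encode('ascii') would still fail on unmapped characters.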

    Jim
    Jim, Jul 2, 2008
    #2

  4. Laszlo Nagy

    John Machin Guest

    On Jul 2, 9:55 am, Jim <> wrote:
    > Peter Bulychev wrote:
    > > I want to convert unicode character into ascii one.

    >
    > You have to make some arbitrary choices of what to translate. Based
    > on some materials on effbot's site, and a recipe, I made
    > ftp://alan.smcvt.edu/hefferon/unicode2ascii.py
    > which has at least some of what you are looking for.
    > $ grep HYPHEN unicode2ascii.py
    > u'\N{SOFT HYPHEN}':u'-',
    > u'\N{HYPHEN}':u'-',
    > u'\N{NON-BREAKING HYPHEN}':u'-',
    > u'\N{SOFT HYPHEN}': '-',
    > No doubt I have some terrible gaffes and some things missing.
    > Corrections appreciated.


    Comments on the above grep output:
    1. You have SOFT HYPHEN twice, mapping it to u'-' and '-'
    2. The idea of a soft hyphen is as a hint to a hyphenator about where
    to insert a hyphen if one is necessary and the hyphenator is suspected
    of acting cluelessly without the hint. IMHO, asciification should
    substitute u'', not u'-'.
    3. Read PEP 8. s/:/: /
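
    Points 1 and 2 can be sketched in a few lines of Python 3. The names
    below are hypothetical, and the dict literal only illustrates what
    would happen if both SOFT HYPHEN entries ended up in the same dict.

```python
SOFT_HYPHEN = "\N{SOFT HYPHEN}"  # U+00AD

# A dict literal with a repeated key silently keeps only the last value.
duplicated = {SOFT_HYPHEN: "-", SOFT_HYPHEN: ""}

# str.translate expects code points as keys; mapping to "" deletes the character.
TABLE = {ord(SOFT_HYPHEN): ""}


def strip_soft_hyphens(text):
    """Drop soft hyphens instead of turning them into visible '-' characters."""
    return text.translate(TABLE)


print(len(duplicated))
print(strip_soft_hyphens("co\u00adoperate"))
```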

    Cheers,
    John
    John Machin, Jul 2, 2008
    #4
  5. Laszlo Nagy

    Jim Guest

    On Jul 1, 8:29 pm, John Machin <> wrote:
    > On Jul 2, 9:55 am, Jim <> wrote:
    >
    > Comments on the above grep output:
    > 1. You have SOFT HYPHEN twice, mapping it to u'-' and '-'

    Hmph. I'll correct that. Thanks.
    > 2. The idea of a soft hyphen is as a hint to a hyphenator about where
    > to insert a hyphen if one is necessary and the hyphenator is suspected
    > of acting cluelessly without the hint. IMHO, asciification should
    > substitute u'', not u'-'.

    Thanks also here. I'll think about it.
    > 3. Read PEP 8. s/:/: /

    I don't like the spacing in PEP 8, personally.

    Thanks,
    Jim
    Jim, Jul 2, 2008
    #5
  7. Laszlo Nagy

    Jim Guest

    On Jul 1, 8:42 pm, Jim <> wrote:
    > On Jul 1, 8:29 pm, John Machin <> wrote:
    > > Comments on the above grep output:
    > > 1. You have SOFT HYPHEN twice, mapping it to u'-' and '-'

    >
    > Hmph. I'll correct that. Thanks.

    Well, maybe not. I forgot that I got the by-hand conversions from
    three different sources and that's why that character appears in two
    different places. (I thought that listing all cases for each source
    was less confusing. Arguable, for sure.)
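
    One way to keep the per-source listings and still notice
    disagreements is to merge the tables in code and report conflicts. A
    minimal sketch follows; merge_tables and the two sample sources are
    hypothetical and do not reflect how unicode2ascii.py is organized.

```python
def merge_tables(*sources):
    """Merge mapping dicts in order; later sources win, conflicts are recorded."""
    merged = {}
    conflicts = []
    for source in sources:
        for char, repl in source.items():
            if char in merged and merged[char] != repl:
                # Same character, different replacement in an earlier source.
                conflicts.append((char, merged[char], repl))
            merged[char] = repl
    return merged, conflicts


# Two made-up sources that disagree about SOFT HYPHEN.
source_a = {"\N{SOFT HYPHEN}": "-", "\N{HYPHEN}": "-"}
source_b = {"\N{SOFT HYPHEN}": ""}

merged, conflicts = merge_tables(source_a, source_b)
print(conflicts)
```

    Identical duplicates pass silently, so only real disagreements between
    sources show up in the conflict list.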

    > > 2. The idea of a soft hyphen is as a hint to a hyphenator about where
    > > to insert a hyphen if one is necessary and the hyphenator is suspected
    > > of acting cluelessly without the hint. IMHO, asciification should
    > > substitute u'', not u'-'.

    >
    > Thanks also here. I'll think about it.

    Googling "soft hyphen" showed me that the question is not perfectly
    clear (some people seem to have very elaborate opinions on the
    topic), but I've gone with your suggestion. Thank you.

    Again, I'd appreciate additional corrections. Not only do I speak only
    ASCII :-( but I admit to entering the data while watching a basketball
    game, so no doubt there are some real blunders.

    Thanks,
    Jim
    Jim, Jul 2, 2008
    #7
