Problem processing Chinese character with Python

Discussion in 'Python' started by Anthony Liu, Mar 6, 2004.

  1. Anthony Liu

    Anthony Liu Guest

    Andrew gave me some sample code which lets me read a text
    file sentence by sentence.

    Suppose I just wanna read the part between 2 full
    stops each time.

    It works nicely with English text files, where the
    full stop is a dot (.).

    But when I tried to read Chinese text files, I found
    that it sometimes reads a few sentences at one time.

    I guess the reason is that in Chinese, the full stop
    is not a dot (.), but a little circle, as many of you
    probably know.

    Indeed, if I replace the Chinese full stop with the
    dot, it nicely gets only one sentence each time.

    So, how should I fix this problem? I am really having a
    headache processing Chinese characters with Python.

    Here is the sample code that Andrew offered:

    def bytes(f):
        # Below: f.read(2) to process Chinese
        for byte in iter(lambda: f.read(1), ''):
            yield byte

    def sentences(iterable):
        sentence = ''
        for char in iterable:
            sentence += char
            # The little circle is the Chinese
            # full stop. Some of you might not be able
            # to view it if you don't have
            # East Asian language support.
            if char in ('。','.'):
                yield sentence.strip()
                sentence = ''
        sentence = sentence.strip()
        if sentence:
            yield sentence
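
    A likely cause of the behaviour described above: the Chinese full
    stop is a multi-byte sequence in encodings such as GBK or UTF-8, so
    a single byte from f.read(1) can never equal it, and f.read(2) only
    lines up when no one-byte ASCII characters shift the alignment.
    Below is a minimal sketch of one way around this, not a definitive
    fix. It assumes Python 3 and that the file's encoding is known (GBK
    is only a guess here). The helper name chars, the stops parameter,
    and the sample text are hypothetical and are not part of Andrew's
    code.

```python
import io

def chars(f):
    # Yield one decoded character (not one byte) at a time.
    # f must be a text-mode file object, so read(1) returns a whole character.
    for ch in iter(lambda: f.read(1), ''):
        yield ch

def sentences(iterable, stops=('\u3002', '.')):
    # '\u3002' is the ideographic full stop (the little circle).
    sentence = ''
    for ch in iterable:
        sentence += ch
        if ch in stops:
            yield sentence.strip()
            sentence = ''
    # Emit any trailing text that has no final full stop.
    sentence = sentence.strip()
    if sentence:
        yield sentence

# In-memory demonstration: GBK-encoded bytes are decoded by TextIOWrapper,
# standing in for a real file opened in text mode with encoding='gbk'.
raw = '你好。Hello. 再见。'.encode('gbk')
with io.TextIOWrapper(io.BytesIO(raw), encoding='gbk') as f:
    result = list(sentences(chars(f)))

for s in result:
    print(s)
```

    For a real file, the same generators could be fed from a file
    opened in text mode with an explicit encoding argument, replacing
    'gbk' with whatever encoding the file actually uses.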


     
    Anthony Liu, Mar 6, 2004
    #1