string u'hyv\xe4' to file as 'hyvä'

Discussion in 'Python' started by gintare, Dec 26, 2010.

  1. gintare

    gintare Guest

    Could you please help me with saving special characters to a file?

    I need to write the string u'hyv\xe4' to a file.
    When I open the file, I would like to see the line 'hyvä'.

    import codecs
    word= u'hyv\xe4'
    F=codecs.open(/opt/finnish.txt, 'w+','Latin-1')

    F.writelines(item.encode('Latin-1'))
    F.writelines(item.encode('utf8'))
    F.writelines(item)

    F.close()

    All three writelines calls give the same result in finnish.txt: hyv\xe4
    I would like to find 'hyvä'.

    regards,
    gintare
     
    gintare, Dec 26, 2010

  2. MRAB

    MRAB Guest

    On 26/12/2010 22:43, gintare wrote:
    > Could you please help me with special characters saving to file.
    >
    > I need to write the string u'hyv\xe4' to file.
    > I would like to open file and to have line 'hyvä'
    >
    > import codecs
    > word= u'hyv\xe4'
    > F=codecs.open(/opt/finnish.txt, 'w+','Latin-1')


    This opens the file using the Latin-1 encoding (although only if you
    put the filename in quotes).
    >
    > F.writelines(item.encode('Latin-1'))


    This encodes the Unicode item (did you mean 'word'?) to a bytestring
    using the Latin-1 encoding. You opened the file using Latin-1 encoding,
    so this is pointless. You should pass a Unicode string; it will encode
    it for you.

    You're also passing a bytestring to the .writelines method, which
    expects a list of strings.

    What you should be doing is this:

    F.write(word)
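A minimal sketch of MRAB's point, assuming Python 3 and using io.open, which is swapped in here for codecs.open (both take an encoding and encode Unicode text on write). The temporary path stands in for /opt/finnish.txt and is an assumption. Reading the file back in binary mode shows the bytes that were actually written:

```python
import io
import os
import tempfile

word = u'hyv\xe4'  # 4 characters; the last one is U+00E4 (a-umlaut)

# Assumption: a temporary directory stands in for /opt, which may not be writable.
path = os.path.join(tempfile.mkdtemp(), 'finnish.txt')

# Text mode with an explicit encoding: pass Unicode, the file object encodes it.
# newline='\n' disables platform newline translation so the bytes are predictable.
with io.open(path, 'w', encoding='latin-1', newline='\n') as f:
    f.write(word)                # a single string goes to .write()
    f.writelines([u'\n', word])  # .writelines() takes a list of strings

# Binary mode shows exactly which bytes landed on disk.
with io.open(path, 'rb') as f:
    raw = f.read()

print(raw)  # b'hyv\xe4\nhyv\xe4': a-umlaut is the single Latin-1 byte E4
```

No .encode() call appears anywhere: the encoding is stated once, when the file is opened.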

    > F.writelines(item.encode('utf8'))


    This encodes the Unicode item to a bytestring using the UTF-8 encoding.
    This is also pointless. You shouldn't be encoding to UTF-8 and then
    trying to write it to a file which was opened using Latin-1 encoding!
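To see why mixing the two encodings goes wrong, this Python 3 sketch encodes the word to UTF-8 and then decodes those bytes as Latin-1, which is what a Latin-1 reader makes of UTF-8 bytes:

```python
word = u'hyv\xe4'

utf8_bytes = word.encode('utf-8')      # a-umlaut becomes two bytes: C3 A4
latin1_bytes = word.encode('latin-1')  # a-umlaut is one byte: E4

# Latin-1 maps every byte to a character, so decoding UTF-8 bytes as Latin-1
# never raises; it silently turns C3 A4 into two characters (mojibake).
garbled = utf8_bytes.decode('latin-1')

print(len(utf8_bytes), len(latin1_bytes))  # 5 4
print(garbled == u'hyv\xc3\xa4')           # True, i.e. 'hyvÃ¤' rather than 'hyvä'
```

Because Latin-1 can decode any byte, the mismatch produces no error, only wrong text.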

    > F.writelines(item)
    >
    > F.close()
    >
    > All three writelines gives the same result in finnish.txt: hyv\xe4
    > i would like to find 'hyvä'.
    >
     
    MRAB, Dec 26, 2010

  3. gintare

    gintare Guest


    Hello,
    It STILL does not work. What should be done?

    import codecs
    item=u'hyv\xe4'
    F=codecs.open('/opt/finnish.txt', 'w+', 'utf8')
    F.writelines(item.encode('utf8'))
    F.close()

    In the file I find 'hyv\xe4' instead of 'hyvä'.

    (Sorry for the mistake about 'latin-1' in my previous letter. I had been
    trying all possible combinations after the normal example syntax did not
    work, before writing to this forum.)

    regards,
    gintare

     
    gintare, Dec 27, 2010
  4. Mark Tolonen

    Mark Tolonen Guest


    "gintare" <> wrote in message
    news:...
    > Hello,
    > STILL do not work. WHAT to be done.
    >
    > import codecs
    > item=u'hyv\xe4'
    > F=codecs.open('/opt/finnish.txt', 'w+', 'utf8')
    > F.writelines(item.encode('utf8'))
    > F.close()
    >
    > In file i find 'hyv\xe4' instead of hyvä.


    When you open a file with codecs.open(), it expects Unicode strings to be
    written to the file. Don't encode them again. Also, .writelines() expects
    a list of strings. Use .write():

    import codecs
    item=u'hyv\xe4'
    F=codecs.open('/opt/finnish.txt', 'w+', 'utf8')
    F.write(item)
    F.close()
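The same fix, sketched for Python 3 with io.open swapped in for codecs.open and a with block so the file is closed even if an exception occurs. The temporary path is an assumption replacing /opt/finnish.txt. The raw bytes show that UTF-8 stores a-umlaut as the two bytes C3 A4:

```python
import io
import os
import tempfile

item = u'hyv\xe4'

# Assumption: a temporary file stands in for /opt/finnish.txt.
path = os.path.join(tempfile.mkdtemp(), 'finnish.txt')

# Write Unicode text; the UTF-8 encoding is applied by the file object.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(item)

# The raw bytes on disk: a UTF-8 aware editor displays these as hyvä.
with io.open(path, 'rb') as f:
    raw = f.read()

# Reading in text mode with the same encoding gives back the original string.
with io.open(path, 'r', encoding='utf-8') as f:
    back = f.read()

print(raw)           # b'hyv\xc3\xa4'
print(back == item)  # True
```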

    An additional comment: if you save the script in UTF-8, you can inform Python
    of that fact with a special comment, and actually use the correct characters
    in your string constants (ä instead of \xe4). Make sure to use a text
    editor that can save in UTF-8, or use the correct coding comment for whatever
    encoding in which you save the file.

    # coding: utf8
    import codecs
    item=u'hyvä'
    F=codecs.open('finnish.txt', 'w+', 'utf8')
    F.write(item)
    F.close()
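A small check of the coding-comment advice, assuming this snippet is itself saved as UTF-8 (in Python 3, UTF-8 is already the default source encoding, so the comment is optional there). It confirms that the typed literal and the escaped one are the same string:

```python
# -*- coding: utf-8 -*-
import unicodedata

literal = u'hyvä'     # typed directly; valid because the file is saved as UTF-8
escaped = u'hyv\xe4'  # the same text spelled with an escape sequence

print(literal == escaped)             # True
print(len(literal))                   # 4
print(unicodedata.name(literal[-1]))  # LATIN SMALL LETTER A WITH DIAERESIS
```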

    -Mark
     
    Mark Tolonen, Dec 27, 2010
  5. Alex Willmer

    Alex Willmer Guest


    On Dec 27, 6:47 am, "Mark Tolonen" <> wrote:
    > "gintare" <> wrote in message
    > > In file i find 'hyv\xe4' instead of hyvä.
    >
    > When you open a file with codecs.open(), it expects Unicode strings to be
    > written to the file.  Don't encode them again.  Also, .writelines() expects
    > a list of strings.  Use .write():
    >
    >     import codecs
    >     item=u'hyv\xe4'
    >     F=codecs.open('/opt/finnish.txt', 'w+', 'utf8')
    >     F.write(item)
    >     F.close()


    Gintare, Mark's code is correct. When you are reading the file back,
    make sure you understand what you are seeing:

    >>> F2 = codecs.open('finnish.txt', 'r', 'utf8')
    >>> item2 = F2.read()
    >>> item2
    u'hyv\xe4'

    That might look as though item2 is 7 characters long and contains
    a backslash followed by x, e, 4. However, item2 is identical to item:
    they both contain 4 characters, the final one being a-umlaut. Python
    has shown the string using a backslash escape because printing a
    non-ASCII character might fail. You can see this directly if your Python
    session is running in a terminal (or GUI) that can handle non-ASCII
    characters:

    >>> print item2
    hyvä
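In Python 3 the display rules changed: repr() no longer escapes printable non-ASCII characters, and ascii() produces the escaped form that Python 2's repr showed above. A sketch of the distinction, printing only ASCII so it runs in any terminal:

```python
item2 = u'hyv\xe4'

print(len(item2))                   # 4: h, y, v and a-umlaut; no backslash is stored
print(u'\\' in item2)               # False
print(ascii(item2))                 # 'hyv\xe4', the escaped form Python 2's repr used
print(repr(item2) == u"'hyv\xe4'")  # True: Python 3's repr keeps the a-umlaut as is

# print(item2) displays hyvä only if the terminal's encoding can represent
# U+00E4; otherwise it can raise UnicodeEncodeError.
```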
     
    Alex Willmer, Dec 27, 2010
  6. MRAB

    MRAB Guest


    On 27/12/2010 05:56, gintare wrote:
    > Hello,
    > STILL do not work. WHAT to be done.
    >
    > import codecs
    > item=u'hyv\xe4'
    > F=codecs.open('/opt/finnish.txt', 'w+', 'utf8')
    > F.writelines(item.encode('utf8'))
    > F.close()


    As I said in my previous post, you shouldn't be using .writelines, and
    you shouldn't encode it when writing it to the file because codecs.open
    will do that for you, that's its purpose:

    import codecs
    item = u'hyv\xe4'
    F = codecs.open('/opt/finnish.txt', 'w+', 'utf8')
    F.write(item)
    F.close()

     
    MRAB, Dec 27, 2010
