unicode and socket

zyqnews

Hello all,
I am new to Python, and I have a problem with unicode.
I have a unicode string. When I tried to send it out through a
socket with send(), I got an exception. How can I send the unicode string
to the remote end of the socket as it is, without any encoding
conversion, so that the remote end receives a unicode string?

Thanks
 
aurora

You cannot. Unicode is an abstract data type. It must be encoded into
octets in order to be sent via a socket, and the other end must decode the
octets to retrieve the unicode string. Needless to say, the encoding scheme
must be consistent and understood by both ends.
 
Irmen de Jong

aurora said:
You cannot. Unicode is an abstract data type. It must be encoded
into octets in order to be sent via a socket, and the other end must decode
the octets to retrieve the unicode string. Needless to say, the encoding
scheme must be consistent and understood by both ends.

So use pickle.

--Irmen
 
Irmen de Jong

Irmen said:
So use pickle.

--Irmen

Well, on second thought: don't use pickle.
If all you want to transfer is unicode strings (or normal strings)
it's safer to just encode them to, say, UTF-8, transfer
that octet stream across, and on the other side, decode the
UTF-8 octets back into a unicode string.
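
The round trip described here can be sketched as follows. This is only an illustration, assuming Python 3 (where text is `str` and octets are `bytes`); the helper names `send_text` and `recv_text` are made up for this example, and `socket.socketpair()` stands in for a real network connection.

```python
import socket

def send_text(sock, text, encoding="utf-8"):
    """Encode the text to octets and send all of them (hypothetical helper)."""
    sock.sendall(text.encode(encoding))

def recv_text(sock, encoding="utf-8", bufsize=4096):
    """Read until the peer closes its end, then decode the collected octets."""
    chunks = []
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:          # empty result means the peer closed the connection
            break
        chunks.append(chunk)
    # Decode only once everything has arrived, so a multi-byte character
    # split across two recv() calls is not a problem.
    return b"".join(chunks).decode(encoding)

# Demo on a connected pair of local sockets.
left, right = socket.socketpair()
send_text(left, "eggs and ham")
left.close()
print(recv_text(right))
right.close()
```

Both ends have to agree on the encoding; here that agreement is simply the shared default argument.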


--Irmen
 
Lion Kimbro

You probably want to use UTF-16 or UTF-8 on both sides of the socket.

See http://www.python.org/moin/Unicode for more information.

So, we have a Unicode string: u'eggs and ham'

Now we want to send it over, so we encode it: 'eggs and ham'

It's encoded in UTF-8 now. The other side decodes it and gets back:

u'eggs and ham'

You have transferred a unicode string. {:)}=
 
zyqnews

It's really funny: I cannot send a unicode stream through a socket with
Python, while all the other languages, such as Perl, C and Java, can do it.
Then how about converting the unicode string to a binary stream? Is it
possible to send binary data through a socket with Python?
 
Serge Orlov

It's really funny: I cannot send a unicode stream through a socket with
Python, while all the other languages, such as Perl, C and Java, can do it.

You may really start laughing loudly <wink> after you find out that you
can send arbitrary Python objects over sockets. If you want a
language-specific way of sending objects, see Irmen's first answer: use pickle.

Then how about converting the unicode string to a binary stream?

Sure, there are already three answers in this thread that suggest
doing that. Use the encode method of unicode strings.

Is it possible to send binary data through a socket with Python?

Sure. If it weren't possible to send bytes through sockets with Python,
what else do you think could be sent? Perhaps you're confused because
bytes are stored in byte strings in Python, which are often just called strings in
documentation and conversations? This will be fixed in Python 3.0, but
these days you have to store bytes in the str type.
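
For readers on Python 3, where the split did happen (text is `str`, octets are `bytes`), the distinction can be seen directly at the socket. The sketch below is only an illustration; the function name `demo_send_types` is made up, and `socket.socketpair()` stands in for a real connection.

```python
import socket

def demo_send_types():
    """Try to send text, then bytes, over a local socket pair (hypothetical demo)."""
    left, right = socket.socketpair()
    try:
        try:
            left.send("eggs and ham")      # text: send() refuses it
            error_name = None
        except TypeError as exc:
            error_name = type(exc).__name__
        left.send("eggs and ham".encode("utf-8"))   # bytes: accepted
        received = right.recv(1024)
    finally:
        left.close()
        right.close()
    return error_name, received

error_name, received = demo_send_types()
print(error_name, received)
```

What arrives on the other end is a `bytes` object; turning it back into text is the receiver's job, via `received.decode("utf-8")`.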

Serge.
 
aurora

It's really funny: I cannot send a unicode stream through a socket with
Python, while all the other languages, such as Perl, C and Java, can do it.
Then how about converting the unicode string to a binary stream? Is it
possible to send binary data through a socket with Python?

I was answering your specific question:

"How can I send the unicode string to the remote end of the socket as it
is, without any encoding conversion"

The answer is that you cannot. It's not that you cannot send unicode, but that
you have to encode it. The same applies to Perl, C or Java. The only difference
is the detail of how strings get encoded.

A few posts in this thread suggest various means. Or you can check out
codecs.getwriter(), which more closely resembles Java's way.
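
A rough sketch of the codecs.getwriter() approach, assuming Python 3 and a local `socket.socketpair()` in place of a real connection. The writer wraps a binary file object obtained from the socket and encodes text on the way out; `codecs.getreader()` does the reverse on the receiving side. Variable names are illustrative only.

```python
import codecs
import socket

left, right = socket.socketpair()

# Sending side: wrap the socket in a binary file object, then in a UTF-8 writer.
raw_out = left.makefile("wb")
writer = codecs.getwriter("utf-8")(raw_out)
writer.write("eggs and ham \u00e9")   # text in, UTF-8 octets out
writer.flush()
raw_out.close()
left.close()                          # signals end-of-stream to the reader

# Receiving side: wrap the socket in a UTF-8 reader and read until EOF.
raw_in = right.makefile("rb")
reader = codecs.getreader("utf-8")(raw_in)
received_text = reader.read()
raw_in.close()
right.close()

print(received_text)
```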
 
Fredrik Lundh

anonymous coward said:
It's really funny: I cannot send a unicode stream through a socket with
Python, while all the other languages, such as Perl, C and Java, can do it.

Are you sure you understand what Unicode is, and how sockets work?

Sockets are used to transfer byte streams. If you want to transfer
a python-level object, you have to decide how to encode it as a
byte stream. For integers, you have to decide whether to use a single
byte, a string of decimal ascii characters, netstring syntax, etc. For
text, you have to decide what character encoding to use. For arbitrary
objects, you have to decide what serialisation protocol to use. etc.

(and yes, the same applies to all other languages. Java sockets and C
sockets are no different from Python sockets...)
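
To make the choices listed above concrete, here is a small sketch (Python 3 assumed; the function names are invented for this example) that turns the same integer into three different byte streams, and a piece of text into a netstring.

```python
import struct

def int_as_single_byte(n):
    """One unsigned byte; only works for 0..255."""
    return struct.pack("B", n)

def int_as_decimal_ascii(n):
    """The decimal digits as ASCII characters."""
    return str(n).encode("ascii")

def as_netstring(payload):
    """Netstring framing: <length in decimal>:<payload>,"""
    return str(len(payload)).encode("ascii") + b":" + payload + b","

def text_as_netstring(text, encoding="utf-8"):
    """For text, the character encoding has to be chosen first."""
    return as_netstring(text.encode(encoding))

print(int_as_single_byte(42))
print(int_as_decimal_ascii(42))
print(as_netstring(int_as_decimal_ascii(42)))
print(text_as_netstring("eggs and ham"))
```

None of these is the "right" one; the point is that sender and receiver must pick the same one.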

</F>
 
Christos TZOTZIOY Georgiou

It's really funny: I cannot send a unicode stream through a socket with
Python, while all the other languages, such as Perl, C and Java, can do it.

I don't know about Perl. What you mean by unicode in C is most probably
wchar_t, which holds Unicode encoded as 'ucs-2' or 'utf-16' (little or big
endian, depending on your platform), or maybe a 4-byte int, for which I don't
know a Python equivalent. And I /assume/ that in Java, Unicode is equivalent to
'utf-16' encoded strings on input/output.

Perhaps Unicode encoded as 'utf-16' is what you're after. However, Unicode
encoded as 'utf-8' (like others also suggested) might be what you /should/ be
using, given that this encoding has some attractive properties (no null bytes,
no spurious control characters etc).
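
The difference is easy to see by encoding the same text both ways. A small sketch, assuming Python 3 and using 'utf-16-le' to fix the byte order explicitly (plain 'utf-16' would also prepend a byte order mark):

```python
text = "eggs"

utf16_bytes = text.encode("utf-16-le")   # two octets per character here
utf8_bytes = text.encode("utf-8")        # one octet per ASCII character

print(utf16_bytes)
print(utf8_bytes)

# ASCII text encoded as UTF-16 is full of null bytes; as UTF-8 it has none.
has_null_utf16 = b"\x00" in utf16_bytes
has_null_utf8 = b"\x00" in utf8_bytes

# A non-ASCII character becomes several octets in UTF-8, all of them >= 0x80,
# so they cannot be mistaken for ASCII control characters.
e_acute = "\u00e9".encode("utf-8")
all_high = all(octet >= 0x80 for octet in e_acute)
```

The "no null bytes" property holds for text that does not itself contain the character U+0000.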

Don't interpret the explicitness Python requires as a weakness.
 
