Same character shows different codes on two machines

Ryan Smith · Feb 7, 2010

One Chinese character shows up as different byte codes on two different machines.

machine A: \243\244
machine B: \302\245

So I have to use a different pattern on each machine, like this:
machine A: text.split("\243\244")
machine B: text.split("\302\245")

I know this is not the proper way, but could someone tell me:
What is the root cause?
What is the difference between machine A and B?
What is the proper way to handle this?

Thanks very much!

-ryan

Ryan Smith · Feb 8, 2010

Thanks, Walton,

Do I need to include something?

irb(main):006:0> "Hello".encoding
NoMethodError: undefined method `encoding' for "Hello":String
from (irb):6

Brian Candler · Feb 8, 2010

Ryan said:
One Chinese character shows up as different byte codes on two different machines.

machine A: \243\244
machine B: \302\245

In hex those are: \xa3\xa4
\xc2\xa5
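
The octal-to-hex conversion can be checked in Ruby itself. This is only a minimal sketch: the helper name `hex_bytes` is made up for illustration, and it relies on nothing but `String#unpack`, so it should not need the 1.9 encoding API.

```ruby
# Show each byte of a string as two hex digits.
# `hex_bytes` is a hypothetical helper name, not part of Ruby.
def hex_bytes(str)
  str.unpack("C*").map { |byte| "%02x" % byte }
end

p hex_bytes("\243\244")  # machine A => ["a3", "a4"]
p hex_bytes("\302\245")  # machine B => ["c2", "a5"]
```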

The first is not valid UTF-8. I suppose it might be UTF-16: U+A3A4 or
U+A4A3 depending on little or big-endian. Or it could be some older
proprietary Asian encoding.

The second of these could be UTF-8. If so it would be codepoint 165, the
'yen' symbol. Or it could be U+C2A5 in UTF-16.
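
The "codepoint 165" figure follows from the UTF-8 two-byte layout (110xxxxx 10xxxxxx). A rough sketch of that check using plain integer arithmetic is below. `decode_two_byte_utf8` is a hypothetical helper written for this example, and it deliberately handles only the two-byte case.

```ruby
# Decode a two-byte UTF-8 sequence by hand.
# Returns the codepoint, or nil if the bytes are not a valid
# two-byte sequence. Hypothetical helper, two-byte case only.
def decode_two_byte_utf8(b1, b2)
  # Lead byte must look like 110xxxxx; 0xC0 and 0xC1 are overlong forms.
  lead_ok = (b1 & 0xE0) == 0xC0 && b1 >= 0xC2
  # Continuation byte must look like 10xxxxxx.
  cont_ok = (b2 & 0xC0) == 0x80
  return nil unless lead_ok && cont_ok

  ((b1 & 0x1F) << 6) | (b2 & 0x3F)
end

p decode_two_byte_utf8(0xC2, 0xA5)  # machine B => 165
p decode_two_byte_utf8(0xA3, 0xA4)  # machine A => nil
```

The machine A pair fails because 0xA3 has the bit pattern of a continuation byte, which cannot start a UTF-8 sequence.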

Marnen Laibow-Koser · Feb 8, 2010

Ryan said:
Thanks, Walton,

Do I need to include something?

irb(main):006:0> "Hello".encoding
NoMethodError: undefined method `encoding' for "Hello":String
from (irb):6

No, I don't think that method exists in 1.8.

Best,
--
Marnen Laibow-Koser
http://www.marnen.org
(e-mail address removed)

Brian Candler · Feb 8, 2010

Ryan said:
The first is not valid UTF-8. I suppose it might be UTF-16: U+A3A4 or
U+A4A3 depending on little or big-endian. Or it could be some older
proprietary Asian encoding.

[Ryan] How can I correct this (to UTF-8)? It is an English XP Pro machine
with PRC as the system locale.

Sorry, I have no idea. Are you sure that \xa3\xa4 corresponds exactly to
that one character? Is the rest of the encoding variable length or fixed
length? (e.g. are all characters two bytes long, even a western letter
"A"?)

Questions about Microsoft operating systems and what encodings they use
really belong in a Microsoft users' forum, as it's not anything to do
with Ruby.

Ryan Smith · Feb 8, 2010

Brian said:
Ryan said:

The first is not valid UTF-8. I suppose it might be UTF-16: U+A3A4 or
U+A4A3 depending on little or big-endian. Or it could be some older
proprietary Asian encoding.

[Ryan] How can I correct this (to UTF-8)? It is an English XP Pro machine
with PRC as the system locale.

Sorry, I have no idea. Are you sure that \xa3\xa4 corresponds exactly to
that one character? Is the rest of the encoding variable length or fixed
length? (e.g. are all characters two bytes long, even a western letter
"A"?)

Questions about Microsoft operating systems and what encodings they use
really belong in a Microsoft users' forum, as it's not anything to do
with Ruby.

I have no idea either, but I will upgrade to Ruby 1.9 to make use of the
String#encoding feature. Thank you.
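
As a sketch of what that could look like on Ruby 1.9 or later: transcode each machine's input to UTF-8 first, then split with a single pattern. This assumes machine A's bytes are GBK, which is plausible for a PRC system locale but was never confirmed in the thread. In GBK, \xa3\xa4 is the fullwidth yen sign U+FFE5, which remains a different character from the plain yen sign U+00A5 even after transcoding, so the pattern accepts both. The names `YEN_PATTERN` and `split_on_yen` are made up for illustration.

```ruby
# Matches the plain yen sign (U+00A5) and the fullwidth one (U+FFE5).
YEN_PATTERN = /[\u00A5\uFFE5]/

# Tag the raw bytes with the encoding they are assumed to be in,
# transcode to UTF-8, then split on either yen sign.
# `source_encoding` is an assumption the caller must supply per machine.
def split_on_yen(raw, source_encoding)
  text = raw.dup.force_encoding(source_encoding).encode("UTF-8")
  text.split(YEN_PATTERN)
end

p split_on_yen("100\xA3\xA4200", "GBK")    # machine A input (assumed GBK)
p split_on_yen("100\xC2\xA5200", "UTF-8")  # machine B input
```

If the GBK guess is wrong, `encode` may raise an error or produce the wrong characters, so it is worth confirming the real source encoding on machine A before relying on this.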
