XMLRPC (REXML) incorrectly handles UTF-8 data

Petr Klima

Hi,
I'm running ruby 1.9.2-p0 on Centos 5.5 x86_64 along with rails 2.3.8.

I have an XMLRPC server on another Windows machine (ruby 1.9.1) and an XMLRPC
client on the Centos machine. I need to return UTF-8 encoded data from
server to client and this is where I'm stuck.

The server seems to be sending correct UTF-8 encoded data, but the client is
unable to parse the XML. If the XML contains ASCII-only strings,
everything's OK, but once there is any multi-byte UTF-8 character, ruby
bails out and outputs this:

----------------------
REXML::ParseException (#<Encoding::CompatibilityError: incompatible
encoding regexp match (UTF-8 regexp with ASCII-8BIT string)>
/usr/local/lib/ruby/1.9.1/rexml/source.rb:212:in `match'
/usr/local/lib/ruby/1.9.1/rexml/source.rb:212:in `match'
/usr/local/lib/ruby/1.9.1/rexml/parsers/baseparser.rb:425:in `pull'
/usr/local/lib/ruby/1.9.1/rexml/parsers/streamparser.rb:16:in `parse'
/usr/local/lib/ruby/1.9.1/rexml/document.rb:204:in `parse_stream'
/usr/local/lib/ruby/1.9.1/xmlrpc/parser.rb:717:in `parse'
/usr/local/lib/ruby/1.9.1/xmlrpc/parser.rb:460:in `parseMethodResponse'
/usr/local/lib/ruby/1.9.1/xmlrpc/client.rb:421:in `call2'
/usr/local/lib/ruby/1.9.1/xmlrpc/client.rb:410:in `call'
......
----------------------

There seems to be something wrong with REXML's parsing of non-ASCII data, or
maybe with its encoding detection. I've tracked it down to the "match" method in
the IOSource wrapper class in rexml/source.rb. The problem seems to be
that the @buffer the method matches against is sometimes tagged as an
ASCII-8BIT string. Strangely, it fails only when the buffer contains some
non-ASCII data. If there are only ASCII characters in @buffer, it
happily proceeds as UTF-8.
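
The behaviour described above can be reproduced without XMLRPC or REXML at all. This is a minimal sketch, assuming a fixed-encoding UTF-8 regexp similar in spirit to the ones REXML matches with; `UTF8_PATTERN` and `try_match` are names made up for the illustration, not REXML internals.

```ruby
# A regexp with the /u flag has a fixed UTF-8 encoding.
UTF8_PATTERN = /<string>/u

# Returns :ok when the match can be attempted, or the exception class
# when Ruby refuses to match because of incompatible encodings.
def try_match(buffer)
  UTF8_PATTERN.match(buffer)
  :ok
rescue Encoding::CompatibilityError => e
  e.class
end

# ASCII-only bytes tagged as binary: compatible with any ASCII-based regexp.
ascii_only = "<string>ok</string>".dup.force_encoding("ASCII-8BIT")

# The same kind of buffer with two non-ASCII bytes (UTF-8 for a Cyrillic letter).
cyrillic_binary = "<string>\xD0\x9E</string>".dup.force_encoding("ASCII-8BIT")

# Identical bytes, but labelled as UTF-8.
cyrillic_utf8 = cyrillic_binary.dup.force_encoding("UTF-8")

puts try_match(ascii_only)       # match is attempted normally
puts try_match(cyrillic_binary)  # raises Encoding::CompatibilityError internally
puts try_match(cyrillic_utf8)    # match is attempted normally
```

An ASCII-8BIT string that holds only 7-bit bytes is treated as compatible with a UTF-8 regexp, which would explain why the error appears only once a multi-byte character shows up.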

BTW, my client script looks like this:
----------------------
module SubmitFilesHelper

  @rpc_server_url='http://172.16.1.2:3000'

  def self.sendToServer(filename,language)
    require 'xmlrpc/client'
    server = XMLRPC::Client.new2(@rpc_server_url)
    result = server.call('check', filename,language)
  end
end
----------------------


Centos has its locale set to en_US.UTF-8.

Is there anything I'm doing wrong, or is it a ruby bug?

Thanks,
Petr
 
Petr Klima

Hm, it's possible to encode the offending string to base64 before
handing it to xmlrpc, effectively bypassing any ruby 1.9 encoding
awareness. Not exactly what I would like to see...

Anyway, is there a correct solution to my problem? Base64 encoding is
a working solution, but not a correct one, as I'm manually bypassing a language
feature worth having.
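
For anyone wanting to see what the Base64 detour amounts to, here is a minimal sketch of the round trip using only core `pack`/`unpack`. It assumes the string is encoded on the server and decoded on the client; it does not show how the value is wired into an actual XMLRPC call.

```ruby
# "Определен" written with escapes so the example does not depend on
# the encoding of the source file.
original = "\u041E\u043F\u0440\u0435\u0434\u0435\u043B\u0435\u043D"

# Server side: strict Base64 ("m0") yields a pure-ASCII string, so the
# XML parser on the other end never sees a multi-byte character.
encoded = [original].pack("m0")

# Client side: decoding returns the raw bytes as a binary string.
decoded = encoded.unpack("m0").first

# The bytes are intact, but the UTF-8 label has to be restored by hand.
restored = decoded.dup.force_encoding("UTF-8")

puts encoded.ascii_only?
puts decoded.encoding
puts restored == original
```

The manual `force_encoding` at the end is the part that sidesteps ruby's encoding awareness: the transport carries bytes only, and the label is reattached by convention.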

Cheers,
Petr
 
botp

REXML::ParseException (#<Encoding::CompatibilityError: incompatible
encoding regexp match (UTF-8 regexp with ASCII-8BIT string)>

try,
Encoding.default_internal = Encoding.default_external = "UTF-8"

best regards -botp
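
A minimal sketch of what these two settings change, assuming they are set early, before any I/O takes place. The pipe below is only a stand-in for an external data source; whether the setting reaches the buffer REXML matches against depends on the ruby and REXML version in use.

```ruby
# Set both defaults as early as possible in the program.
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8

# A pipe stands in for external input (an assumption for illustration).
reader, writer = IO.pipe
writer.write("\xD0\x9E\xD0\xBF")  # UTF-8 bytes for two Cyrillic letters
writer.close

# Data read from an IO that has no explicit encoding is tagged with
# the default external encoding.
data = reader.read
reader.close

puts data.encoding
puts data.valid_encoding?
```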
 
Petr Klima

botp wrote in post #961846:
try,
Encoding.default_internal = Encoding.default_external = "UTF-8"

Damn, I have seen this before and I would swear I tried it and it didn't
help (I was using 1.9.1 at the time). Hm, probably somehow slipped
between my fingers. Thanks a lot, works now :)
 
Kouhei Sutou

Hi,

In <[email protected]>
"XMLRPC (REXML) incorrectly handles UTF-8 data" on Tue, 16 Nov 2010 23:37:48 +0900,
Petr Klima said:
I have an XMLRPC server on another Windows machine (ruby 1.9.1) and an XMLRPC
client on the Centos machine. I need to return UTF-8 encoded data from
server to client and this is where I'm stuck.

The server seems to be sending correct UTF-8 encoded data, but the client is
unable to parse the XML. If the XML contains ASCII-only strings,
everything's OK, but once there is any multi-byte UTF-8 character, ruby
bails out and outputs this:

Could you show us a reproducible example? We need at least
the HTTP response header and the XML response from your
XML-RPC server.


Thanks,
 
Petr Klima

Hi,
here is the reply from XMLRPC server:

HTTP header:
---------------
HTTP/1.1 200: OK
Content-Length: 921
Content-Type: text/xml; charset=utf-8
Server: WEBrick/1.3.1 (Ruby/1.9.1/2010-01-10)
Date: Thu, 18 Nov 2010 07:57:17 GMT
Connection: Keep-Alive
---------------

XML response (should be one line):
---------------
<?xml version="1.0"
?><methodResponse><params><param><value><struct><member><name>result</name><value><string>ok
</string></value></member><member><name>program_ver</name><value><string>10.0.1153</string></value></member><member><na
me>engine_ver</name><value><string>10.0.424</string></value></member><member><name>virus_db_ver</name><value><string>42
4/3263
2010-11-1 [...]
<string>Определен
вирус EICAR_Test </s
tring></value></member><member><name>infections_found</name><value><string>1</string></value></member><member><name>pup
s_found</name><value><string>0</string></value></member><member><name>infections_healed [...]
</value></member><member><name>warnings</name
<value><string>0</string></value></member></struct></value></param></params></methodResponse>
---------------
As you can see, there's a correct UTF-8 string in Cyrillic in the middle
of the XML.
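
The charset in the Content-Type header is what tells a client how the body bytes should be labelled. A minimal sketch of using it, assuming the body arrives as a binary string; `charset_from` and `tag_body` are hypothetical helpers for illustration, not part of xmlrpc or net/http.

```ruby
# Hypothetical helper: pull the charset parameter out of a Content-Type
# header value. Returns nil when no charset is given.
def charset_from(content_type)
  content_type[/charset=([^;\s]+)/i, 1]
end

# Hypothetical helper: return a copy of the body labelled with the
# charset the server declared. Encoding.find raises ArgumentError for
# a charset name ruby does not know.
def tag_body(body, content_type)
  charset = charset_from(content_type)
  return body unless charset
  body.dup.force_encoding(Encoding.find(charset))
end

# Raw bytes as they might come off the socket, tagged as binary.
raw_body = "<string>\xD0\x9E\xD0\xBF</string>".dup.force_encoding("ASCII-8BIT")

tagged = tag_body(raw_body, "text/xml; charset=utf-8")

puts tagged.encoding
puts tagged.valid_encoding?
```

Quoted charset values and missing headers are not handled here; the sketch only covers the header form shown above.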

BTW, botp's suggested solution (Encoding.default_internal =
Encoding.default_external = "UTF-8") doesn't work in Apache module
Passenger 3.0.0

 
Kouhei Sutou

Hi,

In <[email protected]>
"Re: XMLRPC (REXML) incorrectly handles UTF-8 data" on Thu, 18 Nov 2010 17:21:45 +0900,
Petr Klima said:
Hi,
here is the reply from XMLRPC server:

HTTP header: ...
XML response (should be one line): ...

As you can see, there's a correct UTF-8 string in Cyrillic in the middle
of the XML.

Thanks. I can reproduce it.
This has been fixed in trunk.

This is a problem in REXML, but the following code may
work around it. (I haven't tried it, sorry.)

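module SubmitFilesHelper
  # Works around the REXML failure by relabelling the raw response as UTF-8.
  module XMLRPCWorkAround
    def do_rpc(request, async=false)
      data = super
      # force_encoding changes only the encoding label, not the bytes.
      data.force_encoding("UTF-8")
      data
    end
  end

  @rpc_server_url='http://172.16.1.2:3000'

  def self.sendToServer(filename,language)
    require 'xmlrpc/client'
    server = XMLRPC::Client.new2(@rpc_server_url)
    # Mix the workaround into this client instance only.
    server.extend(XMLRPCWorkAround)
    result = server.call('check', filename,language)
  end
end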
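
The workaround relies on `extend` placing the module in front of the client's own methods, so that `super` reaches the original `do_rpc`. A minimal sketch of that mechanism without any network access; `FakeClient` and `ForceUtf8` are stand-ins invented for the illustration and only mimic the shape of XMLRPC::Client.

```ruby
# Stand-in for the real client: do_rpc returns the raw response as a
# binary string, the way bytes read from a socket would be tagged.
class FakeClient
  def call
    do_rpc("<request/>")
  end

  private

  def do_rpc(request, async = false)
    "<string>\xD0\x9E</string>".dup.force_encoding("ASCII-8BIT")
  end
end

# Same idea as the workaround above: call the original method through
# super, then relabel the result as UTF-8.
module ForceUtf8
  def do_rpc(request, async = false)
    super.force_encoding("UTF-8")
  end
end

client = FakeClient.new
before = client.call       # original behaviour: binary string

client.extend(ForceUtf8)   # affects this one object only
after = client.call        # same bytes, now labelled UTF-8

puts before.encoding
puts after.encoding
```

Because `extend` touches a single object, other client instances keep the original behaviour, which keeps the patch local to the call site.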


Thanks,
 
