Problem in writing ruby extension.

Gyoung-Yoon Noh · Nov 5, 2005

Hi,

I am writing a Ruby binding for a C library that implements an automaton
for non-English keyboard input. The following shows an example
usage:

require 'hangul'

hic = Hangul::InputContext.new(Hangul::KEYBOARD_2)
input = "fnql gksrmf fkdlqmfjfl xptmxm"
buffer = ''
input.each_byte do |c|
  ret = hic.filter(c)          # filtering [a-zA-Z] for automaton
  commit = hic.commit_string   # output produced by automaton
  buffer << commit if commit
  buffer << c.chr unless ret   # just append unfiltered chars.
end
hic.flush
buffer << hic.commit_string.to_s

It works as expected when I paste the code into IRB or run it under the
built-in Ruby debugger (-r debug). See the actual session:

$ irb -f
irb(main):001:0> require 'hangul'
=> true
irb(main):002:0> hic = Hangul::InputContext.new(Hangul::KEYBOARD_2)
=> #<Hangul::InputContext:0xb7d50494>
(...)
irb(main):012:0> buffer << hic.commit_string.to_s # bottom line
=> "\353\243\250\353\271\204 \355\225\234\352\270\200
\353\235\274\354\235\264\353\270\214\353\237\254\353\246\254
\355\205\214\354\212\244\355\212\270"

FYI, the variable 'buffer' ends up as a UTF-8 encoded string, which is
the expected result.

But it returns a strange result when I run the code directly from the
command line. After running the code, 'buffer' contains only
3 spaces (0x20), and 'hic.commit_string' always returns nil. I don't
know why the results differ. It doesn't seem to be a concurrency issue.

My extension code can be found at:
http://nohmad.sub-port.net/tmp/ruby-hangul/

Any comment will be appreciated.

ts · Nov 5, 2005

G> Any comment will be appreciated.

Can you test the value of len in rbhic_commit_string() ?

wcstombs() can return -1 and in this case

cbuf[len] = '\0';

can do something strange.
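
To illustrate the point about len: wcstombs() signals failure by returning (size_t)-1, so the result has to be checked before it is used as an index. The following is only a minimal sketch; wide_to_multibyte is a hypothetical helper name, not a function from the extension, and the real rbhic_commit_string() may be structured differently.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper, not part of the actual extension.
 * Converts src into buf (bufsize bytes) using the current LC_CTYPE locale.
 * Returns 0 on success, -1 if wcstombs() could not convert a character. */
static int
wide_to_multibyte(const wchar_t *src, char *buf, size_t bufsize)
{
    size_t len;

    assert(src != NULL && buf != NULL && bufsize > 0);

    /* Leave one byte free so the terminator always fits. */
    len = wcstombs(buf, src, bufsize - 1);
    if (len == (size_t)-1) {
        /* Conversion failed: len must not be used as an index here. */
        buf[0] = '\0';
        return -1;
    }
    buf[len] = '\0';
    return 0;
}
```

A caller in the extension could then return nil when the helper reports -1, instead of writing through an out-of-range index.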

Guy Decoux

Park Heesob · Nov 5, 2005

Hi,

The behaviour of wcstombs depends on the LC_CTYPE category of the current
locale.
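
A small sketch of that locale dependence, assuming the process starts in the default "C" locale (which is what the C standard specifies when setlocale() has not been called). Both function names are hypothetical helpers introduced here for illustration only.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper for illustration only.
 * Returns the number of bytes wcstombs() produces for src under the
 * current LC_CTYPE locale, or (size_t)-1 if a character cannot be
 * represented in that locale's encoding. */
static size_t
converted_length(const wchar_t *src)
{
    char buf[64];

    assert(src != NULL);
    return wcstombs(buf, src, sizeof buf);
}

/* Compares the "C" locale with the locale taken from the environment
 * (LC_ALL, LC_CTYPE, LANG). Returns 1 if src converts only under the
 * environment locale, 0 otherwise. */
static int
fails_only_in_c_locale(const wchar_t *src)
{
    size_t in_c, in_env;

    setlocale(LC_CTYPE, "C");
    in_c = converted_length(src);

    setlocale(LC_CTYPE, "");   /* may still be "C" if the environment is unset */
    in_env = converted_length(src);

    return in_c == (size_t)-1 && in_env != (size_t)-1;
}
```

Under a UTF-8 or EUC-KR environment locale, fails_only_in_c_locale(L"\uD55C\uAE00") would be expected to return 1. The actual result depends on which locales are installed on the machine.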

Modify hangul.c like this:

#include "ruby.h"
#include "hangul.h"
#include <locale.h> // ADDED

static void
rbhic_free(HangulInputContext *hic)
{
    hangul_ic_delete(hic);
}

static VALUE
rbhic_alloc(VALUE klass)
{
    HangulInputContext *hic = hangul_ic_new(HANGUL_KEYBOARD_2);
    setlocale(LC_CTYPE, "ko_KR.eucKR"); // ADDED
    return Data_Wrap_Struct(klass, 0, rbhic_free, hic);
}
....

HTH,

Park Heesob

Gyoung-Yoon Noh · Nov 5, 2005

> G> Any comment will be appreciated.
>
> Can you test the value of len in rbhic_commit_string() ?
>
> wcstombs() can return -1 and in this case
>
> cbuf[len] = '\0';
>
> can do something strange.
>
> Guy Decoux

Thanks for the comment.

You're right. That statement is unnecessary and dangerous.
I've removed it, but I still get the same results.

Anyway, it seems that wcstombs(3) does not fill 'cbuf', at least
when run from the command line. But why is it filled with a proper
multibyte string under IRB or the Ruby debugger?

Gyoung-Yoon Noh · Nov 5, 2005

Hi,

> The behaviour of wcstombs depends on the LC_CTYPE category of the current
> locale.
>
> Modify hangul.c like this:
>
> #include "ruby.h"
> #include "hangul.h"
> #include <locale.h> // ADDED
>
> static void
> rbhic_free(HangulInputContext *hic)
> {
>     hangul_ic_delete(hic);
> }
>
> static VALUE
> rbhic_alloc(VALUE klass)
> {
>     HangulInputContext *hic = hangul_ic_new(HANGUL_KEYBOARD_2);
>     setlocale(LC_CTYPE, "ko_KR.eucKR"); // ADDED
>     return Data_Wrap_Struct(klass, 0, rbhic_free, hic);
> }
> ....
>
> HTH,
>
> Park Heesob

Thanks, it works great!

I think hard-coding a specific locale would not be a good idea,
so I changed the setlocale() call to respect the user's environment:

static VALUE
rbhic_alloc(VALUE klass)
{
    HangulInputContext *hic = hangul_ic_new(HANGUL_KEYBOARD_2);
    setlocale(LC_CTYPE, "");
    return Data_Wrap_Struct(klass, 0, rbhic_free, hic);
}
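
One detail that may be worth guarding against: setlocale() returns NULL when the locale named by the environment is not installed, and the call only needs to happen once rather than on every allocation. A minimal sketch follows; init_ctype_from_environment is a hypothetical name, and falling back to "C" is an assumption about the desired behaviour, not something taken from the extension.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper, not part of the actual extension.
 * Selects LC_CTYPE from the environment (LC_ALL, LC_CTYPE, LANG) and
 * falls back to the "C" locale if that locale is unavailable.
 * Returns the name of the locale now in effect. */
static const char *
init_ctype_from_environment(void)
{
    const char *name = setlocale(LC_CTYPE, "");

    if (name == NULL) {
        /* Assumed fallback: the "C" locale is always available. */
        name = setlocale(LC_CTYPE, "C");
    }
    assert(name != NULL);
    return name;
}
```

In a Ruby extension this would probably be called once from the library's Init_ function instead of from rbhic_alloc(), so the process locale is not reset each time an object is created.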
