Problem in writing ruby extension.

  • Thread starter Gyoung-Yoon Noh

Gyoung-Yoon Noh

Hi,

I am writing a Ruby binding for a C library that implements an automaton
to support non-English keyboard input. The following shows example
usage:

require 'hangul'

hic = Hangul::InputContext.new(Hangul::KEYBOARD_2)
input = "fnql gksrmf fkdlqmfjfl xptmxm"
buffer = ''
input.each_byte do |c|
  ret = hic.filter(c) # filtering [a-zA-Z] for automaton
  commit = hic.commit_string # output produced by automaton
  buffer << commit if commit
  buffer << c.chr unless ret # just append unfiltered chars.
end
hic.flush
buffer << hic.commit_string.to_s

It works as I expected when I paste the code into IRB, or run it with the
built-in Ruby debugger (-r debug). See the actual session:

$ irb -f
irb(main):001:0> require 'hangul'
=> true
irb(main):002:0> hic = Hangul::InputContext.new(Hangul::KEYBOARD_2)
=> #<Hangul::InputContext:0xb7d50494>
(...)
irb(main):012:0> buffer << hic.commit_string.to_s # bottom line
=> "\353\243\250\353\271\204 \355\225\234\352\270\200 \353\235\274\354\235\264\353\270\214\353\237\254\353\246\254 \355\205\214\354\212\244\355\212\270"

FYI, the variable 'buffer' ends up as a UTF-8 encoded string, which is
the expected result.

But I get a strange result when I run the code directly from the
command line. After running it, 'buffer' contains only 3 spaces (0x20),
and 'hic.commit_string' always returns nil. I don't know why the results
differ. It doesn't seem to be related to concurrency issues.

My extension codes can be found at:
http://nohmad.sub-port.net/tmp/ruby-hangul/

Any comments would be appreciated.

ts

G> Any comment will be appreciated.

Can you test the value of len in rbhic_commit_string()?

wcstombs() can return (size_t)-1 and in this case

cbuf[len] = '\0';

can do something strange.
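
To illustrate the check suggested above, here is a minimal sketch of a conversion wrapper that tests the return value of wcstombs() before touching the buffer. The helper name wide_to_multibyte and its return convention are assumptions made for this example; they are not taken from hangul.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper (not from hangul.c): convert a wide string into a
 * caller-supplied buffer of dstsize bytes.
 * Returns the number of bytes written (excluding the terminator),
 * or -1 if the conversion failed or the result did not fit.
 */
long
wide_to_multibyte(const wchar_t *src, char *dst, size_t dstsize)
{
    size_t len;

    assert(src != NULL && dst != NULL);
    if (dstsize == 0)
        return -1;

    len = wcstombs(dst, src, dstsize);
    if (len == (size_t)-1) {
        /* A character has no representation in the current locale.
         * Using len as an index here would write far out of bounds. */
        dst[0] = '\0';
        return -1;
    }
    if (len == dstsize) {
        /* Buffer filled completely: no room was left for the terminator. */
        dst[0] = '\0';
        return -1;
    }
    /* len < dstsize, so wcstombs has already written the terminating '\0'. */
    return (long)len;
}
```

With a wrapper like this, the caller can map -1 to nil (or raise) instead of indexing the buffer with an invalid length.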


Guy Decoux

Park Heesob

Hi,
In reply to:
From: Gyoung-Yoon Noh
To: ruby-talk ML
Subject: Problem in writing ruby extension.
Date: Sat, 5 Nov 2005 20:56:29 +0900

The behaviour of wcstombs depends on the LC_CTYPE category of the current
locale.
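
The locale dependence can be observed outside Ruby with a small probe. This is only a sketch: the function name bytes_needed_in_locale and its return codes are invented for illustration, and the locale names passed to it are whatever happens to be installed on the system at hand.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical probe (not part of the extension): switch LC_CTYPE to
 * locale_name and ask wcstombs how many bytes ws would need.
 * Returns the byte count, -1 if ws cannot be converted in that locale,
 * or -2 if the locale is not installed on this system.
 */
int
bytes_needed_in_locale(const char *locale_name, const wchar_t *ws)
{
    size_t len;

    assert(locale_name != NULL && ws != NULL);
    if (setlocale(LC_CTYPE, locale_name) == NULL)
        return -2;

    /* With a NULL destination, POSIX wcstombs only computes the length. */
    len = wcstombs(NULL, ws, 0);

    /* Return to the startup default so later calls are unaffected. */
    setlocale(LC_CTYPE, "C");

    return len == (size_t)-1 ? -1 : (int)len;
}
```

A C program starts in the "C" locale until setlocale() is called, so a Hangul syllable such as L"\uD55C" is expected to fail there, while a UTF-8 locale (if one is installed) should report 3 bytes for it. That would be consistent with commit_string returning nil when the script runs directly from the command line.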

Modify hangul.c like this:

#include "ruby.h"
#include "hangul.h"
#include <locale.h> // ADDED

static void
rbhic_free(HangulInputContext *hic)
{
    hangul_ic_delete(hic);
}

static VALUE
rbhic_alloc(VALUE klass)
{
    HangulInputContext *hic = hangul_ic_new(HANGUL_KEYBOARD_2);
    setlocale(LC_CTYPE,"ko_KR.eucKR"); // ADDED
    return Data_Wrap_Struct(klass, 0, rbhic_free, hic);
}
....

HTH,

Park Heesob

Gyoung-Yoon Noh

Thanks for the comment.

You're right. The statement cbuf[len] = '\0'; is unnecessary and dangerous.
I've removed it, but I still get the same results.

Anyway, it seems that wcstombs(3) does not fill 'cbuf', at least
when run from the command line. But why is it filled with a proper
multibyte string under IRB or the Ruby debugger?

Gyoung-Yoon Noh

Thanks, it works great!

I think hard-coding a specific locale would not be a good idea,
so I changed the setlocale() call to respect the user's environment:

static VALUE
rbhic_alloc(VALUE klass)
{
    HangulInputContext *hic = hangul_ic_new(HANGUL_KEYBOARD_2);
    setlocale(LC_CTYPE, "");
    return Data_Wrap_Struct(klass, 0, rbhic_free, hic);
}
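
For reference, here is a small standalone sketch of what setlocale(LC_CTYPE, "") does: it takes the locale from the environment (LC_ALL, then LC_CTYPE, then LANG). The helper name adopt_env_ctype is made up for this example and is not part of the extension.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdlib.h>

/*
 * Hypothetical helper: adopt LC_CTYPE from the environment and report
 * the resulting character encoding through *codeset.
 * Returns MB_CUR_MAX (the longest multibyte character in that locale),
 * or 0 if the environment names a locale that is not installed.
 */
size_t
adopt_env_ctype(const char **codeset)
{
    assert(codeset != NULL);

    if (setlocale(LC_CTYPE, "") == NULL)
        return 0;

    *codeset = nl_langinfo(CODESET);
    return MB_CUR_MAX;
}
```

Since setlocale() changes state for the whole process, it may be preferable to call it once when the extension is loaded rather than on every allocation. That is a design suggestion, not something the code above requires.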