How to use rb_enc_str_new() to create a String with UTF-8 encoding?

  • Thread starter Iñaki Baz Castillo

Iñaki Baz Castillo

Hi, when I create a Ruby String from a C extension using "rb_str_new(s, len)", I get a String with US-ASCII encoding.

I don't want to call String#force_encoding("UTF-8") afterwards; instead I'd like to use the rb_enc_str_new() function from string.c:


VALUE
rb_enc_str_new(const char *ptr, long len, rb_encoding *enc)
{
    VALUE str = rb_str_new(ptr, len);
    rb_enc_associate(str, enc);
    return str;
}

But I have no idea how to set the 'enc' parameter to UTF-8.
How should I fill in the third 'enc' argument?

Thanks a lot.

--
Iñaki Baz Castillo <[email protected]>
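The rb_enc_str_new() definition quoted above does two things: it creates the String, then associates an encoding with it. A toy model in plain C makes the second step concrete. This is not Ruby's implementation: `toy_encoding`, `toy_string`, `toy_str_new`, `toy_enc_associate` and `toy_enc_str_new` are hypothetical stand-ins, and the real functions manage Ruby objects and do bookkeeping this sketch leaves out. What it shows is that associating an encoding only stores a pointer to an existing, shared descriptor; nothing about the encoding is copied or allocated.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for rb_encoding: only a name. */
typedef struct {
    const char *name;
} toy_encoding;

/* Hypothetical stand-in for a String: owned bytes plus a borrowed encoding pointer. */
typedef struct {
    char *ptr;
    long len;
    const toy_encoding *enc;
} toy_string;

/* Statically allocated descriptors, shared by every string and never freed. */
static const toy_encoding toy_usascii = { "US-ASCII" };
static const toy_encoding toy_utf8 = { "UTF-8" };

/* Plays the role of rb_str_new(): copy the bytes and attach a default
   encoding (US-ASCII here, matching what the original post reports). */
static toy_string *toy_str_new(const char *ptr, long len)
{
    toy_string *str = malloc(sizeof *str);
    if (str == NULL)
        return NULL;
    str->ptr = malloc((size_t)len + 1);
    if (str->ptr == NULL) {
        free(str);
        return NULL;
    }
    memcpy(str->ptr, ptr, (size_t)len);
    str->ptr[len] = '\0';
    str->len = len;
    str->enc = &toy_usascii;
    return str;
}

/* Plays the role of rb_enc_associate(): only the pointer is stored. */
static void toy_enc_associate(toy_string *str, const toy_encoding *enc)
{
    assert(str != NULL && enc != NULL);
    str->enc = enc;
}

/* Same two-step shape as rb_enc_str_new(): create, then associate. */
static toy_string *toy_enc_str_new(const char *ptr, long len, const toy_encoding *enc)
{
    toy_string *str = toy_str_new(ptr, len);
    if (str != NULL)
        toy_enc_associate(str, enc);
    return str;
}

static void toy_str_free(toy_string *str)
{
    if (str != NULL) {
        free(str->ptr); /* the encoding is shared, so it is not freed */
        free(str);
    }
}
```

Freeing a string never touches its encoding, which is the property the rest of the thread relies on.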
 

Brian Candler

Iñaki Baz Castillo said:
> VALUE
> rb_enc_str_new(const char *ptr, long len, rb_encoding *enc)
> {
>     VALUE str = rb_str_new(ptr, len);
>     rb_enc_associate(str, enc);
>     return str;
> }
>
> But I have no idea how to set the 'enc' parameter to UTF-8.
> How should I fill in the third 'enc' argument?

I'd say give it a pointer to an rb_encoding object.

Have a look in encoding.c; this particular function might be useful:

rb_encoding *
rb_enc_find(const char *name)
{
    int idx = rb_enc_find_index(name);
    if (idx < 0) idx = 0;
    return rb_enc_from_index(idx);
}
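Combining the two functions quoted so far, the call the original post asks for would be `rb_enc_str_new(s, len, rb_enc_find("UTF-8"))`; Ruby 1.9's ruby/encoding.h also declares `rb_utf8_encoding()`, which returns the UTF-8 entry without a name lookup. One detail of rb_enc_find() is worth noticing: an unknown name does not fail, it silently falls back to index 0. The plain-C model below imitates only that lookup logic; the table and the `toy_*` names are hypothetical, and treating entry 0 as ASCII-8BIT is an assumption based on the io.c excerpt later in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;
} toy_encoding;

/* Hypothetical encoding table. Entry 0 is the fallback; the io.c excerpt
   later in the thread suggests Ruby's fallback is ASCII-8BIT. */
static const toy_encoding toy_enc_table[] = {
    { "ASCII-8BIT" },
    { "UTF-8" },
    { "US-ASCII" },
};

#define TOY_ENC_COUNT (sizeof toy_enc_table / sizeof toy_enc_table[0])

/* Like rb_enc_find_index(): the table index for name, or -1 if unknown. */
static int toy_enc_find_index(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < TOY_ENC_COUNT; i++) {
        if (strcmp(toy_enc_table[i].name, name) == 0)
            return (int)i;
    }
    return -1;
}

/* Like rb_enc_from_index(): a pointer into the table, no allocation. */
static const toy_encoding *toy_enc_from_index(int idx)
{
    if (idx < 0 || (size_t)idx >= TOY_ENC_COUNT)
        return NULL;
    return &toy_enc_table[idx];
}

/* Same logic as the rb_enc_find() quoted above. */
static const toy_encoding *toy_enc_find(const char *name)
{
    int idx = toy_enc_find_index(name);
    if (idx < 0)
        idx = 0; /* unknown names silently fall back to entry 0 */
    return toy_enc_from_index(idx);
}
```

The practical consequence: a misspelled name such as "UTF8-" gives back the fallback encoding rather than an error, so it can be worth checking the result when the name comes from user input.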
 

Iñaki Baz Castillo

On Wednesday, 2 December 2009, Brian Candler wrote:
> I'd say give it a pointer to an rb_encoding object.
>
> Have a look in encoding.c; this particular function might be useful:
>
> rb_encoding *
> rb_enc_find(const char *name)
> {
>     int idx = rb_enc_find_index(name);
>     if (idx < 0) idx = 0;
>     return rb_enc_from_index(idx);
> }

Hmm, it involves allocating memory for the rb_encoding object and so on... not as trivial as I'd hoped :)
But that's the way to do it. Thanks a lot.



Brian Candler

Iñaki Baz Castillo said:
> On Wednesday, 2 December 2009, Brian Candler wrote:
>
> Hmm, it involves allocating memory for the rb_encoding object

Why? AFAICS, you can just pass a pointer to an existing encoding object.
They are not mutated.

There are other examples, e.g. this one from io.c:

#ifdef _WIN32
    if (utf16 == (rb_encoding *)-1) {
        utf16 = rb_enc_find("UTF-16LE");
        if (utf16 == rb_ascii8bit_encoding())
            utf16 = NULL;
    }
    if (utf16) {
        VALUE wfname = rb_str_encode(fname, rb_enc_from_encoding(utf16),
                                     0, Qnil);
        rb_enc_str_buf_cat(wfname, "", 1, utf16); /* workaround */
        data.fname = RSTRING_PTR(wfname);
        data.wchar = 1;
    }
    else {
        data.wchar = 0;
    }
#endif

It looks like rb_enc_from_encoding() takes the rb_encoding pointer
returned from rb_enc_find and turns it into a VALUE (the Ruby-level Encoding object).
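Both of Brian's points, that encodings are shared rather than allocated and that io.c compares the lookup result with rb_ascii8bit_encoding(), can be illustrated with another plain-C toy model. All `toy_*` names and the table are hypothetical, not Ruby's API. Every lookup hands back a pointer into one static table, so two lookups of the same name compare equal and nothing needs freeing. Because an unknown name falls back to entry 0, comparing against that entry appears to be how the io.c code decides that UTF-16LE is unavailable. rb_enc_from_encoding() only comes in when a Ruby-level Encoding VALUE is required, as for rb_str_encode() there; rb_enc_str_new() takes the rb_encoding pointer directly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *name;
} toy_encoding;

/* Hypothetical table of shared descriptors; entry 0 is the fallback. */
static const toy_encoding toy_enc_table[] = {
    { "ASCII-8BIT" },
    { "UTF-8" },
    { "UTF-16LE" },
};

#define TOY_ENC_COUNT (sizeof toy_enc_table / sizeof toy_enc_table[0])

/* Like rb_ascii8bit_encoding(): returns the fallback entry. */
static const toy_encoding *toy_ascii8bit_encoding(void)
{
    return &toy_enc_table[0];
}

/* Like rb_enc_find(): returns a pointer into the shared table;
   unknown names fall back to entry 0. */
static const toy_encoding *toy_enc_find(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < TOY_ENC_COUNT; i++) {
        if (strcmp(toy_enc_table[i].name, name) == 0)
            return &toy_enc_table[i];
    }
    return toy_ascii8bit_encoding();
}

/* Mirrors the io.c check: treat the fallback as "not available".
   As in io.c, this only makes sense when name is not "ASCII-8BIT" itself. */
static const toy_encoding *toy_enc_find_optional(const char *name)
{
    const toy_encoding *enc = toy_enc_find(name);
    if (enc == toy_ascii8bit_encoding())
        enc = NULL;
    return enc;
}
```

Pointer equality is enough for the comparison because each encoding exists exactly once in the table.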
 

Iñaki Baz Castillo

On Wednesday, 2 December 2009, Brian Candler wrote:
> Why? AFAICS, you can just pass a pointer to an existing encoding object.
> They are not mutated.

OK, so the rb_encoding objects already exist and I just need to use a pointer to one.
Thanks a lot.

