Interfacing with the Ruby garbage collector when returning a value from a C extension to Ruby

Benjie Chen

Hi

I am working my way through the Ferret (Ruby port of Lucene) code to track
down a bug. Ferret is mainly a C extension to Ruby. I am running into
some issues with the garbage collector. I managed to fix it, but I
don't completely understand my fix =) I am hoping someone with deeper
knowledge of Ruby and C extensions (this is my 3rd day with Ruby) can
elaborate. Thanks.

Here is the situation:

Somewhere in the Ferret C code, I am returning a "Token" to Ruby land.
The code looks like:

static VALUE get_token (...)
{
...
RToken *token = ALLOC(RToken);
token->text = rb_str_new2("some text");
return Data_Wrap_Struct(..., &frt_token_mark, &frt_token_free, token);
}

frt_token_mark calls rb_gc_mark(token->text), and frt_token_free
just frees the token with free(token).
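To make the mark/free contract concrete, here is a toy model in plain C. It is not Ruby's real API: the heap, toy_wrap, toy_gc and ToyToken are hypothetical stand-ins for Data_Wrap_Struct, the collector and RToken. It only sketches the intended behavior: while the wrapper is reachable, its mark callback keeps the text alive; once the wrapper becomes unreachable, the sweep calls the free callback.

```c
/* Toy model only: hypothetical names, not Ruby's C API. */
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define HEAP_SLOTS 8

typedef struct Obj {
    bool in_use, marked;
    void *data;                /* wrapped C struct; NULL for a plain object */
    void (*mark)(void *data);  /* reports objects referenced from data */
    void (*dfree)(void *data); /* releases data when the wrapper dies */
} Obj;

static Obj heap[HEAP_SLOTS];
static Obj *root;              /* stands in for the Ruby variable "token" */
static int tokens_freed;

static Obj *toy_alloc(void) {
    for (int i = 0; i < HEAP_SLOTS; i++) {
        if (!heap[i].in_use) {
            memset(&heap[i], 0, sizeof heap[i]);
            heap[i].in_use = true;
            return &heap[i];
        }
    }
    return NULL;               /* heap full; this toy never grows */
}

static void toy_gc_mark(Obj *o) {
    if (o == NULL || o->marked) return;
    o->marked = true;
    if (o->mark) o->mark(o->data);  /* plays the role of frt_token_mark */
}

static void toy_gc(void) {
    toy_gc_mark(root);
    for (int i = 0; i < HEAP_SLOTS; i++) {
        Obj *o = &heap[i];
        if (o->in_use && !o->marked) {
            if (o->dfree) o->dfree(o->data);  /* plays the role of frt_token_free */
            o->in_use = false;
        }
        o->marked = false;
    }
}

typedef struct { Obj *text; } ToyToken;   /* hypothetical RToken */

static void toy_token_mark(void *p) { toy_gc_mark(((ToyToken *)p)->text); }
static void toy_token_free(void *p) { free(p); tokens_freed++; }

static Obj *toy_wrap(void *data, void (*mark)(void *), void (*dfree)(void *)) {
    Obj *o = toy_alloc();
    if (o != NULL) { o->data = data; o->mark = mark; o->dfree = dfree; }
    return o;
}
```

The point of the sketch is that the only path from the root to the text goes through the mark callback; the collector never looks inside data on its own.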

On the Ruby side, this corresponds to the following:

token = @input.next

Basically, @input is set to some object; calling the next method on it
triggers the get_token C call, which returns a token object.

In Ruby land, I then do something like w = token.text.scan(/\w+/)

When I run this code inside a while 1 loop (to isolate my problem), at
some point (roughly when my Ruby process's memory footprint reaches 256MB,
probably some GC threshold), Ruby dies with errors like

scan method called on terminated object

Or it just core dumps. My guess was that token.text was garbage collected.

I don't know enough about Ruby C extensions to know what happens with
objects returned by Data_Wrap_Struct. It seems to me the assignment in
Ruby land, token = ..., should create a reference to it.

My "work-around"/"fix" is to create a Ruby instance variable in the
object referred to by @input and store the token text there, to
get an extra reference to it. So the C code looks like:

RToken *token = ALLOC(RToken);
token->text = rb_str_new2(tk->text);
/* added code: prevent garbage collection */
rb_ivar_set(input, id_curtoken, token->text);
return Data_Wrap_Struct(cToken, &frt_token_mark, &frt_token_free, token);

So now I've created a "curtoken" instance variable on the input object
and saved a reference to the text there. I've taken care to remove
this reference in the free callback of the class for @input.

With this code, I no longer get the terminated object error.

The fix seems to make sense to me: curtoken holds an extra reference
to the token.text string, so that string won't be collected until the
next time @input.next is called (at which point a different token.text
replaces the old value in curtoken).

My question is: why did it not work before? Shouldn't
Data_Wrap_Struct return an object that, when assigned in Ruby land,
has a valid reference and is not removed by Ruby?

Thanks,
Benjie
 
Caleb Clausen


> In Ruby land, I then do something like w = token.text.scan(/\w+/)

What if you change this to:
text = token.text
w = text.scan(/\w+/)

It's that text object that your error message is complaining about;
what if we make an explicit reference to it on the Ruby side?

Or maybe token.text is being freed between when the token is
created and when it's assigned to a Ruby variable? There are no
Ruby-land references to it, or to the token that refers to it, during
that brief time, so if the garbage collector happens to be invoked
there... but presumably there is a reference to token somewhere on the
C call stack, so that shouldn't be an issue.

> When I run this code inside a while 1 loop (to isolate my problem), at
> some point (roughly when my ruby process mem footprint goes to 256MB,
> probably some GC threshold), Ruby dies with errors like
>
> scan method called on terminated object
>
> Or just core dumps. My guess was that token.text was garbage collected.

Weird. At first glance, the code you cite seems to be doing
everything right.

> I don't know enough about Ruby C extension to know what happens with
> Data_Wrap_Struct returned objects. Seems to me the assignment in Ruby
> land, token =, should create a reference to it.

To the token... but it's the text field in token that the warning was
specifically about.

> My "work-around"/"fix" is to create a Ruby instance variable in the
> object referred to by @input, and stores the token text in there, to
> get an extra reference to it. So the C code looks like

That shouldn't be necessary; the mark routine you showed should be
enough to keep ruby's gc informed as to the liveness of your
token.texts. This isn't a proper fix but only a hack; it may well be a
good interim measure, but something deeper is wrong and needs to be
understood.

> My question is: why did it not work before? Shouldn't
> Data_Wrap_Struct return an object that, when assigned in Ruby land,
> has a valid reference and not be removed by Ruby?

Maybe this never really worked... at least not 100% of the time.
Ferret has its bugs.
 
Benjie Chen

> Or maybe the token.text is being freed between when the token is
> created and when it's assigned to a ruby variable? There are no
> ruby-land references to it or the token which refers to it during that
> brief time, so if the garbage collector happens to be invoked there...
> but presumably there is a reference to token somewhere on the c call
> stack, so that shouldn't be an issue.

Hmm, can you elaborate on where the reference in C is created to
prevent GC? If you look at the code, it has

RToken *token = ALLOC(RToken);
token->text = rb_str_new2(tk->text);
return Data_Wrap_Struct(cToken, &frt_token_mark, &frt_token_free, token);

My question is, where is the reference to the Ruby token object
created to keep GC from reaping it?

1. token is a pointer to memory allocated with ALLOC. Does ALLOC create
memory pointers that keep a reference? I don't believe so, because
RToken is just a struct, and there are no fancy C++ copy constructors
to bump reference counts on assignment.

2. Does Data_Wrap_Struct automatically create a Ruby object with 1
reference? Or a new, unassigned Ruby object with 0 references, waiting
to be assigned to a Ruby variable or a temporary? In the latter case,
we'd have an issue, because if GC runs before any assignment, GC would
get rid of the token and the token->text memory.

Note that I am assuming in Ruby, if I do t = token, then a reference
to token is created on the assignment and that keeps token->text alive
as well.
 
Benjie Chen

Actually, I believe a better solution to the problem I described is,
before returning a VALUE from a C function to Ruby land, to keep a
reference to it somewhere in Ruby. So the code looks like:

VALUE v;
...
v = Data_Wrap_Struct(...);
rb_ivar_set(..., v);
return v;

That way, what I return, v, is not removed by GC before it's
assigned or referenced in Ruby land.

Does this sound like the right approach?

Thanks,

Benjie
 
Caleb Clausen

> Hmm, can you elaborate on where the reference in C is created to
> prevent GC? If you look at the code, it has
>
> RToken *token = ALLOC(RToken);
> token->text = rb_str_new2(tk->text);
> return Data_Wrap_Struct(cToken, &frt_token_mark, &frt_token_free, token);

So, Data_Wrap_Struct returns a VALUE, that is, something that ruby's
GC is responsible for freeing when it dies. That VALUE contains a
c-level pointer to your token struct. So, when the wrapping VALUE
dies, that token struct also needs to be freed. Which is what
frt_token_free does. So far, so good.

But, when the GC is scanning thru memory looking for objects to mark
(as being not eligible to free, because they're already referenced
somewhere), if it sees that the wrapping VALUE is live, it calls
frt_token_mark. Which lets the GC know about the c-level reference to
another VALUE, the token.text. The GC would otherwise miss this
reference, since it does not scan or manage objects created via ALLOC.
The mark callback passed to Data_Wrap_Struct helps the GC know about
such references. And frt_token_mark is doing the right thing; manually
marking token.text for the gc.

Ah-hah, but, what happens if the GC happens to get invoked during the
call to Data_Wrap_Struct? There's no ruby-level reference to the token
struct; that's ok, the GC won't try to free it. There's no ruby-level
reference to the VALUE wrapping the token struct, that's what
Data_Wrap_Struct is trying to create. Still ok so far, but...

!! There's no reference to the token.text !!
!! Which is a VALUE, managed by the GC !!
!! So, GC thinks it's ok to free token.text !!

Normally, there should be a reference to that token.text VALUE
somewhere on the C call stack (which is part of the roots used by the
GC), but in this case it's being assigned directly into a struct
member, which is not stored on the stack. And the GC does not know to
scan the token struct, because it doesn't follow pointers to C objects
and doesn't yet have a wrapper VALUE that tells it what to do.
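The window described above can be reproduced in a toy model. Again, everything here is hypothetical plain C, not Ruby internals: gc_stress forces a collection on every allocation so the unlucky timing happens every time, and stack_roots stands in for the VALUEs the real collector finds by scanning the C stack. In the unsafe version the text is reachable only through a malloc'd struct while the wrapper is being allocated, so the sweep reclaims it and the slot is immediately reused; the guarded version keeps the text in a root for that window, which is what a local volatile VALUE text aims for.

```c
/* Toy model only: hypothetical names, not Ruby's C API. */
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define HEAP_SLOTS 8
#define MAX_ROOTS 4

typedef struct Obj {
    bool in_use, marked;
    void *data;
    void (*mark)(void *data);
    void (*dfree)(void *data);
} Obj;

static Obj heap[HEAP_SLOTS];
static Obj *stack_roots[MAX_ROOTS]; /* models VALUEs found on the C stack */
static bool gc_stress = true;       /* collect on every allocation */
static int plain_swept;             /* plain objects (our "strings") reclaimed */

static void toy_gc_mark(Obj *o) {
    if (o == NULL || o->marked) return;
    o->marked = true;
    if (o->mark) o->mark(o->data);
}

static void toy_gc(void) {
    for (int i = 0; i < MAX_ROOTS; i++) toy_gc_mark(stack_roots[i]);
    for (int i = 0; i < HEAP_SLOTS; i++) {
        Obj *o = &heap[i];
        if (o->in_use && !o->marked) {
            if (o->dfree) o->dfree(o->data);
            else plain_swept++;
            o->in_use = false;
        }
        o->marked = false;
    }
}

static Obj *toy_alloc(void) {
    if (gc_stress) toy_gc();        /* the GC runs inside the allocator */
    for (int i = 0; i < HEAP_SLOTS; i++) {
        if (!heap[i].in_use) {
            memset(&heap[i], 0, sizeof heap[i]);
            heap[i].in_use = true;
            return &heap[i];
        }
    }
    return NULL;
}

typedef struct { Obj *text; } ToyToken;  /* hypothetical RToken */

static void toy_token_mark(void *p) { toy_gc_mark(((ToyToken *)p)->text); }
static void toy_token_free(void *p) { free(p); }

static Obj *toy_wrap(ToyToken *tok) {
    Obj *w = toy_alloc();           /* a collection may run before w exists */
    w->data = tok;
    w->mark = toy_token_mark;
    w->dfree = toy_token_free;
    return w;
}

/* Like the original get_token: text lives only inside the malloc'd struct. */
static Obj *get_token_unsafe(ToyToken **out) {
    ToyToken *tok = malloc(sizeof *tok);
    tok->text = toy_alloc();
    *out = tok;
    return toy_wrap(tok);
}

/* Like the volatile-local version: text is a root while the wrapper is built. */
static Obj *get_token_guarded(ToyToken **out) {
    ToyToken *tok = malloc(sizeof *tok);
    tok->text = toy_alloc();
    stack_roots[0] = tok->text;
    *out = tok;
    Obj *w = toy_wrap(tok);
    stack_roots[0] = NULL;          /* the local goes out of scope */
    return w;
}
```

In real MRI the C stack is scanned conservatively, so a local VALUE normally counts as a root; volatile is there to discourage the compiler from discarding the local after its last use. The model only illustrates this hypothesis; it does not prove it is the cause of the Ferret crash.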

Try rewriting those 3 lines as:

RToken *token = ALLOC(RToken);
volatile VALUE text = rb_str_new2(tk->text);
token->text = text;
return Data_Wrap_Struct(cToken, &frt_token_mark, &frt_token_free, token);

That should ensure that the reference to text remains on the c call
stack. The volatile may be overkill, but you can't be too careful....

I think this may be your problem. I've had similar issues with ferret,
which I never managed to track down. (I wonder how many other places
in ferret have the same problem...?)

> My question is, where is the reference to the Ruby token object
> created to keep GC from reaping it?

Hopefully, the explanation above will be informative to you. If not,
ask again and I'll try to clarify.

> 1. token is a ptr to an allocated memory from ALLOC. Does ALLOC create
> memory pts that keeps reference? I don't believe so, because RToken is
> just a struct, and there's no fancy C++ copy constructors to bump
> references on assignment.

There's no reference counting in ruby; it's a mark-sweep garbage collector.
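A toy mark-sweep collector shows what "no reference counting" means in practice: survival depends on reachability from the roots, not on how many pointers an object has. In this sketch (hypothetical names, not MRI's collector), a and b point at each other, so each has an incoming reference, but nothing reachable points at them, so both are reclaimed. By the same logic, a VALUE stored only inside an ALLOC'd struct is unprotected: a pointer the collector never traverses does not count.

```c
/* Toy model only: hypothetical names, not MRI's collector. */
#include <stdbool.h>
#include <string.h>

#define HEAP_SLOTS 8

typedef struct Obj {
    bool in_use, marked;
    struct Obj *ref;      /* a single outgoing reference */
} Obj;

static Obj heap[HEAP_SLOTS];
static Obj *root;

static Obj *toy_alloc(void) {
    for (int i = 0; i < HEAP_SLOTS; i++) {
        if (!heap[i].in_use) {
            memset(&heap[i], 0, sizeof heap[i]);
            heap[i].in_use = true;
            return &heap[i];
        }
    }
    return NULL;
}

/* Mark phase: follow references starting from the root. */
static void toy_mark(Obj *o) {
    while (o != NULL && !o->marked) {
        o->marked = true;
        o = o->ref;
    }
}

/* Sweep phase: reclaim every unmarked object; returns how many. */
static int toy_gc(void) {
    int swept = 0;
    toy_mark(root);
    for (int i = 0; i < HEAP_SLOTS; i++) {
        if (heap[i].in_use && !heap[i].marked) {
            heap[i].in_use = false;
            swept++;
        }
        heap[i].marked = false;
    }
    return swept;
}
```

A reference-counting scheme would keep the a/b cycle alive forever; the mark-sweep pass reclaims it because neither is reachable from root.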

> 2. Does Data_Wrap_Struct automatically creates a Ruby object with 1
> reference? Or a new, un-assigned Ruby object w/ reference 0 and
> waiting to be assigned to a Ruby variable or a temporary? In the
> latter case, then we'd have an issue, because if GC runs before any
> assignment, GC would get rid of the token and token->text memory.

The reference to the VALUE returned by Data_Wrap_Struct kept on the c
call stack should keep it from being garbage collected.

> Note that I am assuming in Ruby, if I do t = token, then a reference
> to token is created on the assignment and that keeps token->text alive
> as well.

Yes.


> Actually, I believe a better solution to the problem I described is
> before returning a VALUE from a C proc to Ruby land, keeps a reference
> of it somewhere in Ruby. So the code looks like
>
> VALUE v;
> ...
> v = Data_Wrap_Struct(...);
> rb_ivar_set(..., v);
> return v;
>
> So that what you returned, v, is not removed by GC before it's
> assigned or referenced in Ruby land.
>
> Does this sound like the right approach?

This is still hacky; you shouldn't need to create any more references
in ruby-level variables. If the diagnosis I gave above is correct this
won't solve the problem for you anyway, since it's the token.text that
seems to be vulnerable to a premature free, not token itself.
 
Benjie Chen

Caleb,

Thanks for continuing with this.
> Try rewriting those 3 lines as:
>
> RToken *token = ALLOC(RToken);
> volatile VALUE text = rb_str_new2(tk->text);
> token->text = text;
> return Data_Wrap_Struct(cToken, &frt_token_mark, &frt_token_free, token);

I tried this, but it did not fix the problem. I get the same error,
"scan" method called on terminated object, and I was calling scan on
token->text in Ruby land.

I believe your analysis, which led to this suggestion, is mostly
right, so it's possible that keeping the volatile VALUE text on the
stack is right too. (That said, since Ruby, at least the version I use,
has cooperative threads, I don't think it would ever interrupt the
middle of this particular C extension to run GC; there's no point in
this C extension where the thread blocks and gives up control of the
processor.)

However, suppose GC runs after the procedure returns (so the stack
variable "VALUE text" is gone) but before the result of
Data_Wrap_Struct is assigned to anything. Then it's possible that the
GC doesn't know about token yet, and token gets removed. Without
looking at the GC code, I am not clear on exactly how the GC works, and
I can't think of a scenario where this would happen, since
Data_Wrap_Struct's result, once returned, should be on the caller's
stack.

Right now the only fix I have is to do something like:

VALUE v = Data_Wrap_Struct(...);
rb_ivar_set(..., v);
return v;

This really suggests a couple of things: 1) it's token that gets
destroyed, and since I always use token->text first, that's why it
seems like token->text is at fault; 2) after Data_Wrap_Struct returned
in the original code, GC snuck in and reaped the returned value...
 
Caleb Clausen

> I tried this, but it did not fix the problem. I get the same error,
> "scan" method called on terminated object, and I was calling scan on
> token->text in Ruby land.

Damn. I had my hopes up.

> I believe your analysis is mostly right, leading to this suggestion,
> so I think it's possible that having the volatile VALUE text in the
> stack is right too (although, since Ruby, at least the version I use,
> has cooperative threads, I don't think it would ever interrupt in the
> middle of this particularly C extension to run GC, in this C extension
> there's no point where the thread blocks and gives up control of the
> processor).

Ruby's GC doesn't run in a separate thread; AFAIK, it's invoked whenever
the memory manager runs out of room in its current heap and is about
to ask the system for more memory. So any time a Ruby VALUE gets
allocated (by Data_Wrap_Struct, rb_str_new2, ...), the GC could
potentially run.
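As a rough sketch of that point, here is a toy allocator (hypothetical, plain C, single-threaded) that runs the collector synchronously, inside the allocation call, when it finds no free slot. Unlike the real heap, which per the description above would then ask the system for more memory, this toy never grows. The takeaway is that whichever call happens to allocate can be the one that collects, with no thread switch involved.

```c
/* Toy model only: hypothetical names, loosely following the description above. */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define HEAP_SLOTS 4

typedef struct { bool in_use, marked; } Obj;

static Obj heap[HEAP_SLOTS];
static Obj *root;             /* the only live reference in this toy */
static int gc_runs;

static void toy_gc(void) {
    gc_runs++;
    if (root) root->marked = true;
    for (int i = 0; i < HEAP_SLOTS; i++) {
        if (heap[i].in_use && !heap[i].marked) heap[i].in_use = false;
        heap[i].marked = false;
    }
}

static Obj *find_free(void) {
    for (int i = 0; i < HEAP_SLOTS; i++)
        if (!heap[i].in_use) return &heap[i];
    return NULL;
}

/* Collect only when no free slot is left, in the calling thread. */
static Obj *toy_alloc(void) {
    Obj *o = find_free();
    if (o == NULL) {
        toy_gc();
        o = find_free();
    }
    if (o != NULL) {
        memset(o, 0, sizeof *o);
        o->in_use = true;
    }
    return o;
}
```

Filling the heap triggers no collection; the next allocation runs one before it returns.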

> However, if GC is called after the procedure returns, therefore the
> stack "VALUE text" is destroyed, and before the result of the
> Data_Wrap_Struct is assigned to anything, then it's possible that GC
> may not know about token yet, and token gets removed. I am not clear,
> w/o looking at the GC code, exactly how the GC works. I can't think of
> a scenario why this is the case though, since Data_Wrap_Struct's
> result, when returned, should be in the caller's stack.

You could try invoking the GC manually (e.g. with rb_gc()) at the point
where you think the race condition is. Right after the call to
rb_str_new2 is where I would try first. That won't fix anything, but it
may make the problem easier to reproduce, so you don't have to run a
loop 25 million times.

> Right now the only fix I have is to do something like
>
> VALUE v = Data_Wrap_Struct(...);
> rb_ivar_set(..., v);
> return v;

At least you do have a fix that works...

> This really suggests a couple of things: 1) it's token that gets
> destroyed, and since I always use token->text first, that's why it
> seems like token->text is at fault; 2) after the return
> Data_Wrap_Struct in the original code, GC snuck in and reaped the
> returned value...

This shouldn't be the case because there should still be a reference
somewhere on the c call stack to the result of Data_Wrap_Struct, which
would prevent it from being freed.
 
