Using array instead of hash and still have efficient cross-referencing

Intransition · May 5, 2010

Have a look at this class:

http://github.com/proutils/lemon/blob/master/lib/lemon/snapshot.rb

I am storing a collection of OfModule instances in a hash (@modules)
indexed by the modules they correspond to. I'd rather just use an
array, since that is the basic intent. But I ran into difficulty when
implementing the #- method, which led me to use the hash instead.

Is there an efficient way to implement the #- method if @modules were
an array instead of a hash?

Feel free to offer any other critiques of this code too, btw.

trans.

Caleb Clausen · May 5, 2010

Have a look at this class:

http://github.com/proutils/lemon/blob/master/lib/lemon/snapshot.rb

I am storing a collection of OfModule instances in a hash (@modules)
indexed by the modules they correspond to. I'd rather just use an
array, since that is the basic intent. But I ran into difficulty when
implementing the #- method, which led me to use the hash instead.

Is there an efficient way to implement the #- method if @modules were
an array instead of a hash?

So, am I understanding you correctly? The core problem in on line 77?
You need an efficient way to tell if c has a particular module in it
or not, so you can know whether to fool around with the methods it
advertises?

Why not create a temporary hash before the other.modules.each loop and
use it to tell which modules are present.... something like (assuming
you rewrite @modules as an array):

known_mods={}
c.modules.each{|mod| known_mods[mod.base]=mod }
other.modules.each do |ofmod|
if known_mods[ofmod.base]
...
end
end

Your Snapshot#- won't be as efficient as it is now, since you have to
build that index up every time its called, but it should be a fairly
minor performance degradation, I would think.

Feel free to offer any other critiques of this code too, btw.

I am curious. Why do you need this Snapshot class? And what does it do exactly?

Intransition · May 5, 2010

Have a look at this class:

I am storing a collection of OfModule instances in a hash (@modules)
indexed by the modules they correspond to. I'd rather just use an
array, since that is the basic intent. But I ran into difficulty when
implementing the #- method, which led me to use the hash instead.

Click to expand...

Is there an efficient way to implement the #- method if @modules were
an array instead of a hash?

Click to expand...

So, am I understanding you correctly? The core problem in on line 77?
You need an efficient way to tell if c has a particular module in it
or not, so you can know whether to fool around with the methods it
advertises?

Why not create a temporary hash before the other.modules.each loop and
use it to tell which modules are present.... something like (assuming
you rewrite @modules as an array):

=A0 known_mods=3D{}
=A0 c.modules.each{|mod| known_mods[mod.base]=3Dmod }
=A0 other.modules.each do |ofmod|
=A0 =A0 =A0if known_mods[ofmod.base]
=A0 =A0 =A0 =A0...
=A0 =A0 =A0end
=A0 end

Your Snapshot#- won't be as efficient as it is now, since you have to
build that index up every time its called, but it should be a fairly
minor performance degradation, I would think.

That was my first alternative idea too. I'm just not sure if it's
worth the efficiency trade-off. I was thinking there might be a way to
do it were the two arrays are sorted by name and then iterate down the
list popping off one or the other and merging base on <=3D>, but I
haven't worked it out yet. Even though there's two sorts involved it
should be just as fast I think.

I am curious. Why do you need this Snapshot class? And what does it do ex=

actly?

Snapshot is used to records a list of all modules/classes and their
methods in the current process at a given moment. Eg.

s1 =3D Snapshot.capture
require 'foo'
s2 =3D Snapshot.capture
d =3D s2 - s1

now d will contain only the class/modules and methods that were
defined by loading 'foo'.

Lemon is unit testing framework that has a strict testcase<->class/
module and unit<->method correspondence. By taking a snapshot of the
system before and after a target library is loaded it can provide test
coverage information.

Intransition · May 5, 2010

BTW, Caleb, I watched your Ocelot talk last night. Pretty neat idea to
use the unit tests for type information. I looked at the code, I was
surprise by how little code it took (but then I did not look at
RedParse). Is it possible to compile anything yet?

Caleb Clausen · May 5, 2010

Why not create a temporary hash before the other.modules.each loop and
use it to tell which modules are present.... something like (assuming
you rewrite @modules as an array):

known_mods={}
c.modules.each{|mod| known_mods[mod.base]=mod }
other.modules.each do |ofmod|
if known_mods[ofmod.base]
...
end
end

Your Snapshot#- won't be as efficient as it is now, since you have to
build that index up every time its called, but it should be a fairly
minor performance degradation, I would think.

Click to expand...

That was my first alternative idea too. I'm just not sure if it's
worth the efficiency trade-off.

From what you say about how this is used, it doesn't sound like this
method is called all that much. Once per file in the lib tested? I'd
say just use a temporary hash and get on with your life until and
unless the performance actually proves to be a problem. Or stick with
your current solution of a permanent hash.

It's worth noting that (if I recall correctly) Array#-, which you are
calling 6 times inside that if statement, creates a temporary hash of
the receiver's contents in order to do its own work efficiently. So,
the cost of your one temporary hash is probably vastly dwarfed by the
cost of the 6 temporary hashes created by stdlib on every loop
iteration.

I expect that a temp hash would perform reasonably even with thousands
of files to scan and/or thousands of classes/modules in ObjectSpace.
So, it should scale to all but the very largest projects, and those
should probably expect a performance hit for their large size.

I was thinking there might be a way to
do it were the two arrays are sorted by name and then iterate down the
list popping off one or the other and merging base on <=>, but I
haven't worked it out yet. Even though there's two sorts involved it
should be just as fast I think.

It's a neat idea.... but sounds a little complex to implement.

Lemon is unit testing framework that has a strict testcase<->class/
module and unit<->method correspondence. By taking a snapshot of the
system before and after a target library is loaded it can provide test
coverage information.

Is this just verifying that there is a test of some sort for every
method? Or actually a deeper level of coverage (line coverage) like
what rcov does?

Intransition · May 5, 2010

Why not create a temporary hash before the other.modules.each loop and
use it to tell which modules are present.... something like (assuming
you rewrite @modules as an array):
=A0 known_mods=3D{}
=A0 c.modules.each{|mod| known_mods[mod.base]=3Dmod }
=A0 other.modules.each do |ofmod|
=A0 =A0 =A0if known_mods[ofmod.base]
=A0 =A0 =A0 =A0...
=A0 =A0 =A0end
=A0 end
Your Snapshot#- won't be as efficient as it is now, since you have to
build that index up every time its called, but it should be a fairly
minor performance degradation, I would think.

Click to expand...

Click to expand...

That was my first alternative idea too. I'm just not sure if it's
worth the efficiency trade-off.

Click to expand...

From what you say about how this is used, it doesn't sound like this
method is called all that much. Once per file in the lib tested? I'd
say just use a temporary hash and get on with your life until and
unless the performance actually proves to be a problem. Or stick with
your current solution of a permanent hash.

It's worth noting that (if I recall correctly) Array#-, which you are
calling 6 times inside that if statement, creates a temporary hash of
the receiver's contents in order to do its own work efficiently. So,
the cost of your one temporary hash is probably vastly dwarfed by the
cost of the 6 temporary hashes created by stdlib on every loop
iteration.

I expect that a temp hash would perform reasonably even with thousands
of files to scan and/or thousands of classes/modules in ObjectSpace.
So, it should scale to all but the very largest projects, and those
should probably expect a performance hit for their large size.

Interesting. I may go ahead do it this way then. Thanks.

It's a neat idea.... but sounds a little complex to implement.

Which is why I haven't yet tried it. I may give it a shot. If it
works, great. If not, I'll go with the above.

Is this just verifying that there is a test of some sort for every
method? Or actually a deeper level of coverage (line coverage) like
what rcov does?

Just verifying --nothing like rcov. It's main purpose is to act as a
guide.

Having a new test library is perhaps overkill for what it does. I have
been thinking about refactoring it into an extension for Test::Unit/
MiniTest. But I do not particularly relish working with either of
those code bases. We'll see.

Object instead of array? Any problems?	13	Jun 7, 2007
ANN: Sequel 3.5.0 Released	0	Oct 1, 2009
ANN: Sequel 2.12.0 Released	0	Apr 3, 2009
ANN: Sequel 3.1.0 Released	0	Jun 4, 2009
SQL Server and .NET Interview questions free download	0	Oct 28, 2006
Download the JAVA , .NET and SQL Server interview with answers	0	Sep 14, 2006
Download the JAVA , .NET and SQL Server interview PDF	0	Sep 17, 2006
comp.lang.vhdl FAQ part 1 of 4: general	0	Jul 8, 2003

Using array instead of hash and still have efficient cross-referencing

Intransition

Caleb Clausen

Intransition

Intransition

Caleb Clausen

Intransition

Ask a Question

Similar Threads

Members online

Forum statistics

Latest Threads