Using array instead of hash and still have efficient cross-referencing

Discussion in 'Ruby' started by Intransition, May 5, 2010.

  1. Intransition

    Intransition Guest

    Have a look at this class:

    http://github.com/proutils/lemon/blob/master/lib/lemon/snapshot.rb

    I am storing a collection of OfModule instances in a hash (@modules)
    indexed by the modules they correspond to. I'd rather just use an
    array, since that is the basic intent. But I ran into difficulty when
    implementing the #- method, which led me to use the hash instead.

    Is there an efficient way to implement the #- method if @modules were
    an array instead of a hash?

    Feel free to offer any other critiques of this code too, btw.

    trans.
    Intransition, May 5, 2010
    #1
    1. Advertising

  2. On 5/5/10, Intransition <> wrote:
    > Have a look at this class:
    >
    > http://github.com/proutils/lemon/blob/master/lib/lemon/snapshot.rb
    >
    > I am storing a collection of OfModule instances in a hash (@modules)
    > indexed by the modules they correspond to. I'd rather just use an
    > array, since that is the basic intent. But I ran into difficulty when
    > implementing the #- method, which led me to use the hash instead.
    >
    > Is there an efficient way to implement the #- method if @modules were
    > an array instead of a hash?


    So, am I understanding you correctly? The core problem in on line 77?
    You need an efficient way to tell if c has a particular module in it
    or not, so you can know whether to fool around with the methods it
    advertises?

    Why not create a temporary hash before the other.modules.each loop and
    use it to tell which modules are present.... something like (assuming
    you rewrite @modules as an array):

    known_mods={}
    c.modules.each{|mod| known_mods[mod.base]=mod }
    other.modules.each do |ofmod|
    if known_mods[ofmod.base]
    ...
    end
    end

    Your Snapshot#- won't be as efficient as it is now, since you have to
    build that index up every time its called, but it should be a fairly
    minor performance degradation, I would think.

    > Feel free to offer any other critiques of this code too, btw.


    I am curious. Why do you need this Snapshot class? And what does it do exactly?
    Caleb Clausen, May 5, 2010
    #2
    1. Advertising

  3. Intransition

    Intransition Guest

    On May 5, 12:58=A0pm, Caleb Clausen <> wrote:
    > On 5/5/10, Intransition <> wrote:
    >
    > > Have a look at this class:

    >
    > > =A0http://github.com/proutils/lemon/blob/master/lib/lemon/snapshot.rb

    >
    > > I am storing a collection of OfModule instances in a hash (@modules)
    > > indexed by the modules they correspond to. I'd rather just use an
    > > array, since that is the basic intent. But I ran into difficulty when
    > > implementing the #- method, which led me to use the hash instead.

    >
    > > Is there an efficient way to implement the #- method if @modules were
    > > an array instead of a hash?

    >
    > So, am I understanding you correctly? The core problem in on line 77?
    > You need an efficient way to tell if c has a particular module in it
    > or not, so you can know whether to fool around with the methods it
    > advertises?
    >
    > Why not create a temporary hash before the other.modules.each loop and
    > use it to tell which modules are present.... something like (assuming
    > you rewrite @modules as an array):
    >
    > =A0 known_mods=3D{}
    > =A0 c.modules.each{|mod| known_mods[mod.base]=3Dmod }
    > =A0 other.modules.each do |ofmod|
    > =A0 =A0 =A0if known_mods[ofmod.base]
    > =A0 =A0 =A0 =A0...
    > =A0 =A0 =A0end
    > =A0 end
    >
    > Your Snapshot#- won't be as efficient as it is now, since you have to
    > build that index up every time its called, but it should be a fairly
    > minor performance degradation, I would think.


    That was my first alternative idea too. I'm just not sure if it's
    worth the efficiency trade-off. I was thinking there might be a way to
    do it were the two arrays are sorted by name and then iterate down the
    list popping off one or the other and merging base on <=3D>, but I
    haven't worked it out yet. Even though there's two sorts involved it
    should be just as fast I think.

    > > Feel free to offer any other critiques of this code too, btw.

    >
    > I am curious. Why do you need this Snapshot class? And what does it do ex=

    actly?

    Snapshot is used to records a list of all modules/classes and their
    methods in the current process at a given moment. Eg.

    s1 =3D Snapshot.capture
    require 'foo'
    s2 =3D Snapshot.capture
    d =3D s2 - s1

    now d will contain only the class/modules and methods that were
    defined by loading 'foo'.

    Lemon is unit testing framework that has a strict testcase<->class/
    module and unit<->method correspondence. By taking a snapshot of the
    system before and after a target library is loaded it can provide test
    coverage information.
    Intransition, May 5, 2010
    #3
  4. Intransition

    Intransition Guest

    BTW, Caleb, I watched your Ocelot talk last night. Pretty neat idea to
    use the unit tests for type information. I looked at the code, I was
    surprise by how little code it took (but then I did not look at
    RedParse). Is it possible to compile anything yet?
    Intransition, May 5, 2010
    #4
  5. On 5/5/10, Intransition <> wrote:
    > On May 5, 12:58 pm, Caleb Clausen <> wrote:
    >> Why not create a temporary hash before the other.modules.each loop and
    >> use it to tell which modules are present.... something like (assuming
    >> you rewrite @modules as an array):
    >>
    >> known_mods={}
    >> c.modules.each{|mod| known_mods[mod.base]=mod }
    >> other.modules.each do |ofmod|
    >> if known_mods[ofmod.base]
    >> ...
    >> end
    >> end
    >>
    >> Your Snapshot#- won't be as efficient as it is now, since you have to
    >> build that index up every time its called, but it should be a fairly
    >> minor performance degradation, I would think.

    >
    > That was my first alternative idea too. I'm just not sure if it's
    > worth the efficiency trade-off.


    From what you say about how this is used, it doesn't sound like this
    method is called all that much. Once per file in the lib tested? I'd
    say just use a temporary hash and get on with your life until and
    unless the performance actually proves to be a problem. Or stick with
    your current solution of a permanent hash.

    It's worth noting that (if I recall correctly) Array#-, which you are
    calling 6 times inside that if statement, creates a temporary hash of
    the receiver's contents in order to do its own work efficiently. So,
    the cost of your one temporary hash is probably vastly dwarfed by the
    cost of the 6 temporary hashes created by stdlib on every loop
    iteration.

    I expect that a temp hash would perform reasonably even with thousands
    of files to scan and/or thousands of classes/modules in ObjectSpace.
    So, it should scale to all but the very largest projects, and those
    should probably expect a performance hit for their large size.

    > I was thinking there might be a way to
    > do it were the two arrays are sorted by name and then iterate down the
    > list popping off one or the other and merging base on <=>, but I
    > haven't worked it out yet. Even though there's two sorts involved it
    > should be just as fast I think.


    It's a neat idea.... but sounds a little complex to implement.

    > Lemon is unit testing framework that has a strict testcase<->class/
    > module and unit<->method correspondence. By taking a snapshot of the
    > system before and after a target library is loaded it can provide test
    > coverage information.


    Is this just verifying that there is a test of some sort for every
    method? Or actually a deeper level of coverage (line coverage) like
    what rcov does?
    Caleb Clausen, May 5, 2010
    #5
  6. Intransition

    Intransition Guest

    On May 5, 2:40=A0pm, Caleb Clausen <> wrote:
    > On 5/5/10, Intransition <> wrote:
    >
    >
    >
    >
    >
    > > On May 5, 12:58 pm, Caleb Clausen <> wrote:
    > >> Why not create a temporary hash before the other.modules.each loop and
    > >> use it to tell which modules are present.... something like (assuming
    > >> you rewrite @modules as an array):

    >
    > >> =A0 known_mods=3D{}
    > >> =A0 c.modules.each{|mod| known_mods[mod.base]=3Dmod }
    > >> =A0 other.modules.each do |ofmod|
    > >> =A0 =A0 =A0if known_mods[ofmod.base]
    > >> =A0 =A0 =A0 =A0...
    > >> =A0 =A0 =A0end
    > >> =A0 end

    >
    > >> Your Snapshot#- won't be as efficient as it is now, since you have to
    > >> build that index up every time its called, but it should be a fairly
    > >> minor performance degradation, I would think.

    >
    > > That was my first alternative idea too. I'm just not sure if it's
    > > worth the efficiency trade-off.

    >
    > From what you say about how this is used, it doesn't sound like this
    > method is called all that much. Once per file in the lib tested? I'd
    > say just use a temporary hash and get on with your life until and
    > unless the performance actually proves to be a problem. Or stick with
    > your current solution of a permanent hash.
    >
    > It's worth noting that (if I recall correctly) Array#-, which you are
    > calling 6 times inside that if statement, creates a temporary hash of
    > the receiver's contents in order to do its own work efficiently. So,
    > the cost of your one temporary hash is probably vastly dwarfed by the
    > cost of the 6 temporary hashes created by stdlib on every loop
    > iteration.
    >
    > I expect that a temp hash would perform reasonably even with thousands
    > of files to scan and/or thousands of classes/modules in ObjectSpace.
    > So, it should scale to all but the very largest projects, and those
    > should probably expect a performance hit for their large size.


    Interesting. I may go ahead do it this way then. Thanks.

    > > I was thinking there might be a way to
    > > do it were the two arrays are sorted by name and then iterate down the
    > > list popping off one or the other and merging base on <=3D>, but I
    > > haven't worked it out yet. Even though there's two sorts involved it
    > > should be just as fast I think.

    >
    > It's a neat idea.... but sounds a little complex to implement.


    Which is why I haven't yet tried it. I may give it a shot. If it
    works, great. If not, I'll go with the above.

    > > Lemon is unit testing framework that has a strict testcase<->class/
    > > module and unit<->method correspondence. By taking a snapshot of the
    > > system before and after a target library is loaded it can provide test
    > > coverage information.

    >
    > Is this just verifying that there is a test of some sort for every
    > method? Or actually a deeper level of coverage (line coverage) like
    > what rcov does?


    Just verifying --nothing like rcov. It's main purpose is to act as a
    guide.

    Having a new test library is perhaps overkill for what it does. I have
    been thinking about refactoring it into an extension for Test::Unit/
    MiniTest. But I do not particularly relish working with either of
    those code bases. We'll see.
    Intransition, May 5, 2010
    #6
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. rp
    Replies:
    1
    Views:
    512
    red floyd
    Nov 10, 2011
  2. Anthony Martinez
    Replies:
    4
    Views:
    268
    Robert Klemme
    Jun 11, 2007
  3. Michal Suchanek
    Replies:
    6
    Views:
    226
    Nobuyoshi Nakada
    Jun 13, 2007
  4. Srijayanth Sridhar
    Replies:
    19
    Views:
    608
    David A. Black
    Jul 2, 2008
  5. * Tong *

    What's more efficient, hash or array

    * Tong *, Dec 11, 2003, in forum: Perl Misc
    Replies:
    5
    Views:
    148
    J├╝rgen Exner
    Dec 11, 2003
Loading...

Share This Page