[SUMMARY] Records and Arrays (#170)

Discussion in 'Ruby' started by Matthew Moss, Jul 24, 2008.

  1. Matthew Moss

    Matthew Moss Guest

    It is worthwhile to note that, for most real situations, this week's
    quiz problem isn't necessary. From a human perspective, we naturally
    organize things in aggregate, as structured records. Whether it be
    object-oriented data, rows in a SQL table, XML records or other,
    that's how we naturally organize things. Except for a few specialized
    conditions, collections of records are sufficient for our
    organizational needs as either programmers or users.

    What are those specialized conditions? I can think of a couple situations:

    1. *Performance.* When dealing with older hardware, or perhaps
    specialized hardware (e.g. vector units on some processors or GPUs),
    it can be quite beneficial to "de-organize" data. Tight, little loops
    can then quickly process individual bits of data, and not be required
    to either process nor hold the whole data record.

    2. *Interface.* If the data must be provided to another library or
    component that expects parallel bits of data (e.g. a graphing
    component as suggested by the quiz), then obviously we need to provide
    the data as requested.

    In the quiz, I asked that first for a data moving solution. This is
    often appropriate for performance situations, since an adapter does
    not provide the compact data that performance may require. (For
    example, some vector units may have limited memory, so it's important
    to extract only the data needed for vector processing.)

    The extra-credit adapter (wrapper) solution is more appropriate for
    interface situations. Here, especially with Ruby, we can easily wrap a
    new interface around the old, and make one set of data work in
    numerous situations.

    Let's start by looking at a solution that moves data into a new
    "record of arrays" structure, submitted by _Andrea Fazzi_.

    def aor_to_roa(arr)
    hash = { }
    arr.each do |record|
    record.marshal_dump.each_key { |field|
    (hash[field] ||= []) << record.send(field)
    }
    end
    OpenStruct.new(hash)
    end

    This function accepts arrays of `OpenStruct` instances, as specified
    by the quiz. The outer loop enumerates over each record. For each
    record, `marshal_dump` is called: this is an instance method of
    `OpenStruct` and generates a hash consisting of field names associated
    to the field values.

    For each key in that hash, Andrea first ensures that the new hash is
    either already initialized, or gets initialized to an empty array. The
    old record is sent the field name as a message (which returns the
    field value), which is appended to the new hash's array for that
    field. Finally, a new `OpenStruct` instance is returned, which takes
    the new hash as its content.

    (On a side note... Technically, for vector units and similar
    situations, you'd need to move the *actual* data into a memory block,
    rather than mere copying of object references as done above... but the
    code is sufficient for displaying the basic idea.)

    I'm going to pass on talking about the opposite function: that is,
    moving data from a "record of arrays" to an "array of records." Given
    that we usually store data as an "array of records," this is less
    important since we can refer to the original data. Still, take a look
    at Andrea's `roa_to_aor` method, which maintains a similar approach
    and simplicity.

    I'm also not going to talk about pushing updates back to the original
    data for this case. Generally, this isn't desireable for the typical
    cases (i.e. performance) that move/duplicate data. Either the data
    isn't being modified, or if being modified, it gets pushed out the end
    of the pipe and *intentionally* lost. (Had I considered this when
    writing the quiz, I would have skipped my request for the `roa_to_aor`
    method.)

    Situations that focus on interface, however, will prefer to work with
    the original data. In fact, it's important that the adapter allow the
    original data to be modified to the extent it was originally
    modifiable: read-only data should not suddenly become read-write.
    (Another thing I should have put into the original quiz description...
    sorry.)

    First, I want to look at the adapter from _James Gray_, which is quite
    simple and satifies the quiz description as was written. (I've left
    out the method `__enum__`, since it only existed to support
    `roa_to_aor`.)

    class MapToEnum
    instance_methods.each { |meth|
    undef_method(meth) unless meth =~ /\A__/
    }

    def initialize(enum)
    @enum = enum
    end

    def method_missing(meth, *args, &block)
    @enum.map { |o| o.send(meth, *args, &block) }
    end
    end

    The first line undefines almost all the instance methods of this class
    (excepting those that start with two underscores), and does so
    immediately. To see what this leaves behind, try the following in irb:

    > class WhatIsLeft
    > instance_methods.each { |m| undef_method(m) unless m =~ /\A__/ }
    > instance_methods
    > end


    => ["__id__", "__send__"]

    Basically, the intent here is that `MapToEnum` do very little on its
    own: most work will be passed onto the data, which we'll see in a
    moment. Handly little thing... at least when using the class, but
    better to comment it out while debugging it.

    The `MapToEnum` initializer simply remembers the array passed in,
    which is then used in `method_missing`. Here, any methods called
    except those above are sent to each of the objects in `@enum`. Using
    `@enum.map` means that arrays will be returned. And because all
    instance methods were cleared out first, you can do some other things
    besides getting the field names.

    # Assuming people is an array of Person structs...

    > peeps = MapToEnum.new(people)
    > peeps.name

    => ["James", "Dana", "Matthew"]

    > peeps.name[2]

    => "Matthew"

    > peeps.class

    => [Person, Person, Person]

    What doesn't work with James' solution is modifying the data.

    > peeps.name[2] = "Bob"

    => "Bob"

    > peeps.name

    => ["James", "Dana", "Matthew"]

    > people.map { |x| x.name }

    => ["James", "Dana", "Matthew"]

    This is because the attempt to modify `peeps.name[2]` actually changes
    the reference contained in the third slot of the array generated by
    `peeps.name`. To change the source data, `peeps.name[2]` would need to
    represent the `name` field of the original record.

    Let's look at one more solution, this one from _Frederick Cheung_.

    class ArrayOfRecordsWrapper
    def initialize(array)
    @array = array
    end

    def method_missing(name, *args)
    FieldProxy.new(@array, name)
    end

    class FieldProxy
    instance_methods.each { |m| undef_method m unless m =~ /(^__)/ }

    def initialize array, name
    @array, @name = array, name
    end

    def [](index)
    @array[index].send @name
    end

    def []=(index,value)
    @array[index].send(@name.to_s + '=', value)
    end

    def to_a
    @field_array = @array.collect {|a| a.send @name}
    end

    def method_missing(*args, &block)
    to_a.send *args, &block
    end
    end
    end

    Frederick's solution is, in many ways, similar to James' solution,
    though Frederick adds in a proxy object to manage named fields. In
    fact, if you remove the square bracket methods `[]` and `[]=` in the
    proxy, simplify and refactor, you'd essentially have James' solution.

    Yet, Frederick's solution _does_ work for manipulating the data.

    > people[2].name

    => "Matthew"

    > peeps = ArrayOfRecordsWrapper.new(people)


    > peeps.name

    => ["James", "Dana", "Matthew"]

    > peeps.name[2] = "Bob"

    => "Bob"

    > peeps.name

    => ["James", "Dana", "Bob"]

    > people[2].name

    => "Bob"

    This works specifically because Frederick provided the square bracket
    methods, which manipulate the original data directly, rather than
    through the array generated from `to_a` and `method_missing`. If we
    attempt to change the original data through those (rather than the
    proxy supplied square bracket methods), it won't work.

    > peeps.name

    => ["James", "Dana", "Bob"]

    > peeps.name.to_a[2] = "Matthew"

    => "Matthew"

    > peeps.name

    => ["James", "Dana", "Bob"] # not changed!

    It might be worthwhile to hide `to_a`, though I expect that's not the
    only way to circumvent the goal of modifying the original data. (And
    with a dynamic language such as Ruby, I'm pretty sure of it.) If we
    set a goal of defining and supporting common usage, then Frederick's
    solution is a pretty satisfactory one.



    --
    Matthew Moss <>
     
    Matthew Moss, Jul 24, 2008
    #1
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Drew Volpe
    Replies:
    0
    Views:
    359
    Drew Volpe
    Nov 25, 2003
  2. Luke Airig
    Replies:
    0
    Views:
    795
    Luke Airig
    Dec 31, 2003
  3. Philipp
    Replies:
    21
    Views:
    1,130
    Philipp
    Jan 20, 2009
  4. Matthew Moss

    [QUIZ] Records and Arrays (#170)

    Matthew Moss, Jul 18, 2008, in forum: Ruby
    Replies:
    12
    Views:
    350
    SHINDO Motoaki
    Jul 22, 2008
  5. Junkone
    Replies:
    0
    Views:
    314
    Junkone
    May 25, 2011
Loading...

Share This Page