Performance question

Discussion in 'Ruby' started by Pito Salas, Jul 13, 2009.

  1. Pito Salas

    Pito Salas Guest

    I have a scenario where I will be storing thousands of objects in an
    array. Each one will have, say, two attributes, name and age. Some of
    them may have other attributes in addition.

    Note: I don't need to have any methods, inheritance or any other
    'class-ish' behavior. Just the data.

    Here are three representations. Which one will be faster? Which one will
    be smaller?

    class O
    def initialize (n, a)
    @name = n
    @age = a
    end

    O.new("jack", 11)

    OR

    {:name => "Jack", :age => 11}

    OR

    Struct.new("O", :name, :age)
    O.new("Jack", 11)
    --
    Posted via http://www.ruby-forum.com/.
     
    Pito Salas, Jul 13, 2009
    #1
    1. Advertising

  2. Pito Salas wrote:
    > I have a scenario where I will be storing thousands of objects in an
    > array. Each one will have, say, two attributes, name and age. Some of
    > them may have other attributes in addition.
    >
    > Note: I don't need to have any methods, inheritance or any other
    > 'class-ish' behavior. Just the data.
    >
    > Here are three representations. Which one will be faster? Which one will
    > be smaller?


    Probably the hash representation, but what about an array of

    [ [name, age], ...]

    Can't beat that for fast access, and storing a fixnum (age) in an array
    is particularly space-efficient (only 4 bytes for this field, per
    entry). Another option is the arrayfields gem, if you want array storage
    but keyword interface.

    --
    vjoel : Joel VanderWerf : path berkeley edu : 510 665 3407
     
    Joel VanderWerf, Jul 13, 2009
    #2
    1. Advertising

  3. Pito Salas

    Ben Giddings Guest

    On Monday 13 July 2009 15:15:16 Pito Salas wrote:
    > Here are three representations. Which one will be faster? Which one will
    > be smaller?


    Thousands of records, each with two fields is a very small amount of data.

    With "Name" and "Age" as an example, if you have a name like "Apu
    Nahasapeemapetilon" and set aside 4 bytes for the age, 100,000 records would
    take up roughly: 21+4 * 100,000 => 25 * 100,000 bytes, or 2.5 megabytes.
    There will be some overhead as well, but even if you double that to 5
    megabytes, it's tiny by the standards of modern memory.

    Speed will probably also not be a major issue. All you're doing is looking up
    a value stored in memory.

    Rather than worry too much about whether to use structures, objects or hashes
    for speed, why not choose the data type that makes the rest of the program
    easiest to write and to read. If later you decide it needs to be faster or
    smaller, you can adjust it, but unless you're running on an embedded system,
    it's unlikely you'll have to.

    Ben
     
    Ben Giddings, Jul 13, 2009
    #3
  4. Pito Salas

    Pito Salas Guest

    Ben Giddings wrote:

    > Rather than worry too much about whether to use structures, objects or
    > hashes for speed, why not choose the data type that makes the rest of the
    > program easiest to write and to read. If later you decide it needs to be
    > faster or smaller, you can adjust it, but unless you're running on an
    > embedded system, it's unlikely you'll have to.


    The example I gave was purposely oversimplified to make it easy to
    explain and understand. In reality the records will be far more complex
    and the numbers perhaps in the hundreds of thousands.

    But still I do agree with you. I was just trying to see if one of the
    three choices was clearly brain dead or clearly the best one. Would
    using a hash repeat over and over the text of the keys (I assume 'no')
    or have far slower access? Would using a class that never would have a
    method incur a major performance overhead because accessing each value
    required a method call anyway?

    - Pito





    --
    Posted via http://www.ruby-forum.com/.
     
    Pito Salas, Jul 14, 2009
    #4
  5. Pito Salas

    Roger Pack Guest


    > Here are three representations. Which one will be faster? Which one will
    > be smaller?


    My answer is try them each and find out :)
    I remember open structs using up a lot of memory, for some reason
    though.
    =r
    --
    Posted via http://www.ruby-forum.com/.
     
    Roger Pack, Jul 14, 2009
    #5
  6. Pito Salas

    Ben Giddings Guest

    On Jul 13, 2009, at 21:23, Pito Salas wrote:
    > The example I gave was purposely oversimplified to make it easy to
    > explain and understand. In reality the records will be far more
    > complex
    > and the numbers perhaps in the hundreds of thousands.


    Ok, well if you have an average number of bytes per record, you can
    probably guesstimate how much space each record will type.

    > But still I do agree with you. I was just trying to see if one of the
    > three choices was clearly brain dead or clearly the best one. Would
    > using a hash repeat over and over the text of the keys (I assume 'no')


    When you use a string (or anything else) as a hash key, Ruby
    uses .hash to figure out a hash key for it. The access time for
    people["matz"].age should only be slightly slower than matz.age,
    because people["matz"] has to find the hash value of "matz", look up
    the value in the hash, then look up the data, whereas matz.age only
    has to look up the data.

    I don't know the internals of Ruby's struct vs. class implementations,
    but they should be pretty similar.

    > or have far slower access? Would using a class that never would have a
    > method incur a major performance overhead because accessing each value
    > required a method call anyway?


    I'm not sure quite what you mean -- a class that never would have a
    method? Do you mean a class that has no associated instance methods,
    other than the attribute accessors? If you mean the attribute
    accessor methods, it does add a tiny bit of overhead, but I'm pretty
    sure most of that is implemented in C. Whether the classes have
    additional methods associated (but never called) should not slow them
    down or add additional space per instance. If you never call the
    methods they're just additional memory used once in the object
    definition. In memory, each unique instance of the class should just
    have the data associated with each instance of a class.

    If you happen to tack an extra method onto one particular instance of
    a class, you'll end up with unique method data for that one instance.

    In other words, somewhere in memory you should have something like:

    bob = Employee.new("bob smith", 42)
    ang = Employee.new("angela carter", 35)
    jj = Employee.new("jason james", 73)
    def jj.retired?
    true
    end

    Employees:
    initialize: (name, age); @name = var, @age = age
    to_s: "Name: #{name}, Age: #{age}"
    name: name
    name=: name = var
    age: age
    age=: age = var
    ...

    Data:
    {employee, bob smith, 42}
    {employee, angela carter, 35}
    {employee, jason james, 73, retired?: true}
    ...

    (I'm not completely sure how Ruby does the internals, esp instance
    methods on objects, but it should be something like that).

    Overall, all three implementations you talked about should be
    relatively similar in speed and relatively similar in size. I think
    it's easiest just to throw some sample data at them if you're
    interested in how they perform. You can even use ObjectSpace to get
    an idea of the count of objects in memory at various points in your
    program.

    Ben
     
    Ben Giddings, Jul 14, 2009
    #6
  7. On Monday 13 July 2009 08:23:26 pm Pito Salas wrote:
    > Would
    > using a hash repeat over and over the text of the keys (I assume 'no')


    Definitely no, because it looks like you're using symbols, not strings. In 1.8,
    at least, it should be roughly as efficient as using integers.
     
    David Masover, Jul 14, 2009
    #7
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Don Beal
    Replies:
    13
    Views:
    843
    Richard Grimes [MVP]
    Sep 29, 2003
  2. jm
    Replies:
    1
    Views:
    512
    alien2_51
    Dec 12, 2003
  3. Cris Rock

    Performance related Question.....

    Cris Rock, Feb 12, 2004, in forum: ASP .Net
    Replies:
    1
    Views:
    313
    Stefano Mostarda
    Feb 12, 2004
  4. cjl
    Replies:
    3
    Views:
    993
    John Nagle
    May 21, 2007
  5. Software Engineer
    Replies:
    0
    Views:
    335
    Software Engineer
    Jun 10, 2011
Loading...

Share This Page