Struct is slow

Wayne Magor

I have a script in which I was using a 2-element array where a struct
would be used in another language, so I decided to give the Ruby class
Struct a try.

My script went from taking 2 seconds to taking 27 seconds from this
simple change! The use of this 2-element array is only one small part
of the script so it was very surprising to me that it could even cause
the script to take twice as long no matter how inefficient it might be.
To take more than 10 times longer for the entire script to execute was a
shocker.

I will probably shy away from the use of the Struct class after this
brief experience and use arrays with constants for the indexes. I'm
wondering if this is an erroneous conclusion and perhaps I am missing
something.

What are the advantages/disadvantages of using the Struct class in Ruby?
 
Robert Dober

I have a script in which I was using a 2-element array where a struct
would be used in another language, so I decided to give the Ruby class
Struct a try.

My script went from taking 2 seconds to taking 27 seconds from this
simple change! The use of this 2-element array is only one small part
of the script so it was very surprising to me that it could even cause
the script to take twice as long no matter how inefficient it might be.
To take more than 10 times longer for the entire script to execute was a
shocker.
Hmm, some code to look at would be nice; this seems difficult to believe indeed.

Robert
 
Alex Fenton

Wayne said:
I have a script in which I was using a 2-element array where a struct
would be used in another language, so I decided to give the Ruby class
Struct a try.

My script went from taking 2 seconds to taking 27 seconds from this
simple change!

This doesn't sound likely, even if your script did nothing else but use
the Struct class. I'd look elsewhere for the source of slowness (try
running your script with -rprofile).

A quick benchmark suggests that Struct is somewhat slower to instantiate
(+100%), marginally slower to set values in (+33%), and the same speed
to fetch values from as an Array:

BENCHMARK:

require 'benchmark'

GC.disable
Foo = Struct.new(:foo, :bar)

repetitions = 500_000

puts 'struct-init', Benchmark::measure {
  repetitions.times { f = Foo.new('x', 666) }
}

puts 'array-init', Benchmark::measure {
  repetitions.times { f = ['x', 666] }
}

puts 'struct-get', Benchmark::measure {
  f = Foo.new('x', 666)
  repetitions.times { g = f.foo }
}

puts 'array-get', Benchmark::measure {
  f = ['x', 666]
  repetitions.times { g = f[0] }
}

puts 'struct-set', Benchmark::measure {
  f = Foo.new('x', 666)
  repetitions.times { f.foo = 'y' }
}

puts 'array-set', Benchmark::measure {
  f = ['x', 666]
  repetitions.times { f[0] = 'y' }
}


__END__

RESULTS (ruby 1.8.4 (2005-12-24) [i386-mswin32])

struct-init
1.359000 0.031000 1.390000 ( 1.390000)
array-init
0.563000 0.016000 0.579000 ( 0.579000)
struct-get
0.281000 0.000000 0.281000 ( 0.281000)
array-get
0.297000 0.000000 0.297000 ( 0.297000)
struct-set
0.422000 0.000000 0.422000 ( 0.422000)
array-set
0.281000 0.000000 0.281000 ( 0.281000)


alex
 
Wayne Magor

I understand your skepticism; I totally agree it DOES seem impossible,
yet that is what I am seeing.

I thought maybe I had done something wrong, so I recreated my experiment
(I had already deleted the file). I was running it from Eclipse, so I
opened a DOS window and ran it directly and got the same result.

The script is too large to post, but what it does is read C header files
and generate an array of these "structs", each holding two strings: the
type and the name of a typedef. It then goes through the list, converting
them to somewhat "primary" types for another script to process. Here
are some examples of the output:

float UBT_Feet_Type
uint_8 UBT_Turn_Directions_Type
struct NCBYTERD_PositionConvRecType
uint_32 NCTPLYGN_PolygonIterationsType
struct[4] NCTPLYGN_QuadralateralPolygonVerticesType
struct NCTPLYGN_QuadralateralClassType
struct[8] NCTPLYGN_QuadralateralPolygonSetType
uint_8 NCPRMYTP_DataStatusType
double NCPRMYTP_DegreesType
enum NCPRMYTP_PeriodsTypeEnum

One thing I just noticed is that the script with the Struct class
doesn't work correctly. It doesn't make the types "primary" and it
doesn't generate as many lines. Obviously, there's something I don't
understand about how to use Struct.

Here is the difference between the two scripts (the array version is on the right).

FILE COMPARISON
Produced: 10/16/2007 1:27:12 PM

Mode: Just Differences

Left file:  C:\workspace\CreateSfct\h_file_types_diff.txt
Right file: C:\workspace\CreateSfct\h_file_types (before struct mod).rb
L19 $typedef_rec = Struct.new("Typedef", :typedef_type,
:typedef_name)
R19 TYPEDEF_TYPE = 0
TYPEDEF_NAME = 1
------------------------------------------------------------------------
------------------------------------------------------------------------
L74 typedef_type = type_arr[j][:typedef_type]
typedef_name = type_arr[j][:typedef_name]
R75 typedef_type = type_arr[j][TYPEDEF_TYPE]
typedef_name = type_arr[j][TYPEDEF_NAME]
------------------------------------------------------------------------
------------------------------------------------------------------------
L88 type_arr[j][:typedef_type].sub!(/\w+/, primary_t)
R89 type_arr[j][TYPEDEF_TYPE].sub!(/\w+/, primary_t)
------------------------------------------------------------------------
------------------------------------------------------------------------
L91 if type_arr[j][:typedef_type] =~ /\[([A-Z]\w*)\]/ then
type_arr[j][:typedef_type].sub!($1,define_lookup($1,define_hash))
R92 if type_arr[j][TYPEDEF_TYPE] =~ /\[([A-Z]\w*)\]/ then
type_arr[j][TYPEDEF_TYPE].sub!($1,define_lookup($1,define_hash))
 
MenTaLguY

If type_arr[j] is a struct, then try type_arr[j].typedef_type
rather than type_arr[j][:typedef_type], and see if that makes
a difference.

-mental
 
Wayne Magor

Mental said:
If type_arr[j] is a struct, then try type_arr[j].typedef_type
rather than type_arr[j][:typedef_type], and see if that makes
a difference.

Well, I tried it and, as expected, the syntax change made no difference.
The version with Structs still takes 27 seconds and doesn't work correctly.
It's not clear to me why it should behave differently from just using a
2-element array (shrug).
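A quick check confirms that the two forms name the same slot on a Ruby Struct, so no behavioral difference was expected from the access syntax alone:

```ruby
Typedef = Struct.new(:typedef_type, :typedef_name)

td = Typedef.new("float", "UBT_Feet_Type")

# Struct members can be read via attribute methods, or via []
# with a symbol, a string, or a positional index -- all equivalent.
td.typedef_type       # => "float"
td[:typedef_type]     # => "float"
td["typedef_name"]    # => "UBT_Feet_Type"
td[1]                 # => "UBT_Feet_Type"
```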
 
Robert Klemme

Mental said:
If type_arr[j] is a struct, then try type_arr[j].typedef_type
rather than type_arr[j][:typedef_type], and see if that makes
a difference.

Well, I tried it and, as expected, the syntax change made no difference.
The version with Structs still takes 27 seconds and doesn't work correctly.
It's not clear to me why it should behave differently from just using a
2-element array (shrug).

Did you actually also use the same block call semantics?

L546 type_arr.each do |x|
print(x[:typedef_type], "\t\t", x[:typedef_name], "\n")
R547 type_arr.each do |typedef_type, typedef_name|
print(typedef_type, "\t\t", typedef_name, "\n")

Make the latter

R547 type_arr.each do |x|
print(x.typedef_type, "\t\t", x.typedef_name, "\n")

Where x is your struct type.

robert
 
Robert Klemme

Wayne said:
I have a script in which I was using a 2-element array where a struct
would be used in another language, so I decided to give the Ruby class
Struct a try.

My script went from taking 2 seconds to taking 27 seconds from this
simple change!

Alex said:
This doesn't sound likely, even if your script did nothing else but use
the Struct class. I'd look elsewhere for the source of slowness (try
running your script with -rprofile).

A quick benchmark suggests that Struct is somewhat slower to instantiate
(+100%), marginally slower to set values in (+33%), and the same speed
to fetch values from as an Array:

BENCHMARK:

require 'benchmark'

GC.disable
Foo = Struct.new(:foo, :bar)

repetitions = 500_000

puts 'struct-init', Benchmark::measure {
  repetitions.times { f = Foo.new('x', 666) }
}

puts 'array-init', Benchmark::measure {
  repetitions.times { f = ['x', 666] }
}

puts 'struct-get', Benchmark::measure {
  f = Foo.new('x', 666)
  repetitions.times { g = f.foo }
}

puts 'array-get', Benchmark::measure {
  f = ['x', 666]
  repetitions.times { g = f[0] }
}

puts 'struct-set', Benchmark::measure {
  f = Foo.new('x', 666)
  repetitions.times { f.foo = 'y' }
}

puts 'array-set', Benchmark::measure {
  f = ['x', 666]
  repetitions.times { f[0] = 'y' }
}


__END__

RESULTS (ruby 1.8.4 (2005-12-24) [i386-mswin32])

struct-init
1.359000 0.031000 1.390000 ( 1.390000)
array-init
0.563000 0.016000 0.579000 ( 0.579000)
struct-get
0.281000 0.000000 0.281000 ( 0.281000)
array-get
0.297000 0.000000 0.297000 ( 0.297000)
struct-set
0.422000 0.000000 0.422000 ( 0.422000)
array-set
0.281000 0.000000 0.281000 ( 0.281000)

I'm not sure whether you are familiar with Benchmark#bmbm, which does a
rehearsal - personally I'd rather not switch off GC, since in realistic
situations GC time belongs in the mix. But the results are rather similar:

Robert@Babelfish2 /cygdrive/c/TEMP
$ ruby array-struct.rb
Rehearsal --------------------------------------------------
struct init 3.812000 0.000000 3.812000 ( 3.923000)
array init 1.672000 0.000000 1.672000 ( 1.709000)
struct get 0.437000 0.000000 0.437000 ( 0.440000)
array get 0.485000 0.000000 0.485000 ( 0.485000)
struct set 0.718000 0.000000 0.718000 ( 0.716000)
array set 0.500000 0.000000 0.500000 ( 0.510000)
----------------------------------------- total: 7.624000sec

user system total real
struct init 3.891000 0.000000 3.891000 ( 3.984000)
array init 1.640000 0.000000 1.640000 ( 1.690000)
struct get 0.438000 0.000000 0.438000 ( 0.450000)
array get 0.469000 0.000000 0.469000 ( 0.469000)
struct set 0.703000 0.000000 0.703000 ( 0.715000)
array set 0.484000 0.000000 0.484000 ( 0.504000)

Robert@Babelfish2 /cygdrive/c/TEMP
$ cat array-struct.rb
require 'benchmark'

REP = 500_000

Foo = Struct.new :foo, :bar

data = "foo"
c1 = Foo.new data, data
c2 = [data, data]

Benchmark.bmbm 15 do |x|
  x.report 'struct init' do
    REP.times { Foo.new data, data }
  end

  x.report 'array init' do
    REP.times { [data, data] }
  end

  x.report 'struct get' do
    REP.times { c1.bar }
  end

  x.report 'array get' do
    REP.times { c2[1] }
  end

  x.report 'struct set' do
    REP.times { c1.bar = data }
  end

  x.report 'array set' do
    REP.times { c2[1] = data }
  end
end

Kind regards

robert
 
Sylvain Joyeux

I'm not sure whether you are familiar with Benchmark#bmbm, which does a
rehearsal - personally I'd rather not switch off GC, since in realistic
situations GC time belongs in the mix. But the results are rather
similar:
Sure, GC does belong in the mix. But with GC enabled you cannot compare
anything. Let's assume GC runs during 'array init': you will say 'hey,
struct init is faster'. If it runs during 'struct init' instead, the
result may change ...

Keeping GC is meaningful when benchmarking a whole application. In
microbenchmarks like these, it is simply noise.
 
Joel VanderWerf

Sylvain said:
Sure, GC does belong in the mix. But with GC enabled you cannot compare
anything. Let's assume GC runs during 'array init': you will say 'hey,
struct init is faster'. If it runs during 'struct init' instead, the
result may change ...

Keeping GC is meaningful when benchmarking a whole application. In
microbenchmarks like these, it is simply noise.

For fairness, you could do it this way (assuming REP is large enough):

...
Benchmark.bmbm 15 do |x|
  GC.start
  x.report 'struct init' do
    REP.times { Foo.new data, data }
    GC.start
  end

  GC.start
  x.report 'array init' do
    REP.times { [data, data] }
    GC.start
  end
...
 
Robert Klemme

But you have a *lot* of invocations, and it's highly unlikely that GC runs
during every init. On the other hand, if you allocate a lot of memory
from the OS, that may slow things down. And this will happen without GC.

Also, IMHO GC overhead is part of runtime
performance. If you compare two approaches and one allocates a lot
more memory than the other, then GC times belong in the measurement,
because in a real application that approach will also lead to
increased GC overhead.
Keeping GC is meaningful when benchmarking a whole application. In
microbenchmarks like these, it is simply noise.

For fairness, you could do it this way (assuming REP is large enough):

...
Benchmark.bmbm 15 do |x|
  GC.start
  x.report 'struct init' do
    REP.times { Foo.new data, data }
    GC.start
  end

  GC.start
  x.report 'array init' do
    REP.times { [data, data] }
    GC.start
  end
...

GC.start is not guaranteed to actually run GC AFAIK. I'd probably rather do

Benchmark.bmbm do
  GC.disable
  # tests
  GC.start
end

There is a chance that GC.start will clean up all the memory allocated
during the first run.

Kind regards

robert
 
Rick DeNatale

But you have a *lot* of invocations, and it's highly unlikely that GC runs
during every init. On the other hand, if you allocate a lot of memory
from the OS, that may slow things down. And this will happen without GC.

Also, IMHO GC overhead is part of runtime
performance. If you compare two approaches and one allocates a lot
more memory than the other, then GC times belong in the measurement,
because in a real application that approach will also lead to
increased GC overhead.

Spot on, IMHO
 
Wayne Magor

Robert said:
Make the latter

R547 type_arr.each do |x|
print(x.typedef_type, "\t\t", x.typedef_name, "\n")

Where x is your struct type.

Yes, that's exactly what I did, which was no different. You wouldn't
expect it to be different, would you? My understanding is that the two
are exactly the same, just different syntax, right?

Anyway, the clue to the problem is that it doesn't work the same way.
I'm on a trip in another state right now and can't work on it, but in a
few weeks if I have time I'll run it with a debugger and maybe I'll
understand why using Struct would be different from using a 2-element
array.

The difference file shows exactly what I did. It's correct Ruby, isn't
it? If there's something I did wrong, I don't see it.
 
Wayne Magor

Just wanted to follow up so people don't think there might be problems
with the Struct class. It isn't actually slow at all. The problem was
just that an inner loop had not been converted from the array version,
so it didn't work and used a lot of CPU resources. Once fixed, I can't
tell any difference between the two versions.

It was this:

type_arr.each do |td_type, td_name|
  if t == td_name then
    primary_t = td_type
    break
  end
end

needed to be converted to this:

type_arr.each do |x|
  if t == x.typedef_name then
    primary_t = x.typedef_type
    break
  end
end

I used a different naming scheme in that loop (td_name instead of
typedef_name) which caused me not to find it when doing a search for all
places that needed to be converted to a struct.
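The pitfall is easy to reproduce in isolation: an Array is silently destructured into multiple block parameters, but a Struct is not (it defines no to_ary), so the first parameter receives the whole struct and the second is nil - which is why the comparison in my inner loop never matched:

```ruby
Typedef = Struct.new(:typedef_type, :typedef_name)

# Array element: destructured into the two block parameters as expected.
[["float", "UBT_Feet_Type"]].each do |td_type, td_name|
  # td_type == "float", td_name == "UBT_Feet_Type"
end

# Struct element: NOT destructured.
[Typedef.new("float", "UBT_Feet_Type")].each do |td_type, td_name|
  # td_type is the whole Typedef struct, td_name is nil,
  # so a comparison like `t == td_name` silently fails every time.
end
```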

This is one example where strong typing could have made a difference.
Had the array been declared as an array of arrays or array of structs,
perhaps another language could have detected the error. I'm not
advocating for strongly typed languages, I'm just saying that some kinds
of errors could be harder to find in Ruby. What do you folks think?
 
Robert Dober

Just wanted to follow up so people don't think there might be problems
with the Struct class. It isn't actually slow at all.
Actually I did not think so ;) but others might, kudos for telling us!!
Cheers
Robert
 
Rick DeNatale

Just wanted to follow up so people don't think there might be problems
with the Struct class. It isn't actually slow at all. The problem was
just that an inner loop had not been converted from the array version,
so it didn't work and used a lot of CPU resources. Once fixed, I can't
tell any difference between the two versions.

It was this:

type_arr.each do |td_type, td_name|
  if t == td_name then
    primary_t = td_type
    break
  end
end

needed to be converted to this:

type_arr.each do |x|
  if t == x.typedef_name then
    primary_t = x.typedef_type
    break
  end
end

I used a different naming scheme in that loop (td_name instead of
typedef_name) which caused me not to find it when doing a search for all
places that needed to be converted to a struct.

This is one example where strong typing could have made a difference.
Had the array been declared as an array of arrays or array of structs,
perhaps another language could have detected the error. I'm not
advocating for strongly typed languages, I'm just saying that some kinds
of errors could be harder to find in Ruby. What do you folks think?

What would have made a bigger difference would be to develop that code
in the context of specifications written as tests.

This would allow you to first concentrate on making the code do the
right thing before trying to optimize it.

I usually find that it really doesn't matter how fast things run if
they produce the wrong results.
 
Robert Dober

What would have made a bigger difference would be to develop that code
in the context of specifications written as tests.

This would allow you to first concentrate on making the code do the
right thing before trying to optimize it.

I usually find that it really doesn't matter how fast things run if
they produce the wrong results.
Funny you say that Rick, because I was thinking about the role testing
should have in this.
I kind of ponder writing two test suites for my projects, one which
clearly is BDD (shall be) and one that kind of replaces a very defensive
programming style full of assertions.
Now this is indeed a dangerous road because one might fall
into the trap of implementing a straitjacket, killing the duck ;) etc.

I still believe, however, that such "microtests" can be a valuable tool
and might be more useful than debuggers for finding some errors,
depending on the type of error of course.
Any thoughts, experiences about/with that approach?

Cheers
Robert
 
Rick DeNatale

On 10/24/07, Rick DeNatale <[email protected]> wrote:
Paraphrasing, you should test before you optimize, and use tdd/bdd.
Funny you say that Rick, because I was thinking about the role testing
should have in this.
I kind of ponder writing two test suites for my projects, one which
clearly is BDD (shall be) and one that kind of replaces a very defensive
programming style full of assertions.
Now this is indeed a dangerous road because one might fall
into the trap of implementing a straitjacket, killing the duck ;) etc.

I still believe, however, that such "microtests" can be a valuable tool
and might be more useful than debuggers for finding some errors,
depending on the type of error of course.
Any thoughts, experiences about/with that approach?

I should, and probably will eventually, answer this more fully on my
blog, but off the top of my head

1) I'm not sure what you mean by the second type of test suite. I
don't think you're talking about putting test assertions into the
code. If you're not, I'm not sure I see the real distinction.

2) That said, I firmly believe that tests belong outside of the code
under test. We talked about this some time ago when Trans asked for
opinions of the inplace tests in facets. I distinguish between
instrumentation and test equipment. Instrumentation is writing code
to be testable, but the 'bulky' test equipment doesn't weigh down the
F1 car being tested.

3) The chicken typing I recently wrote about (and I like the term the
more I use it) is just that: weighing down the car unnecessarily. If
the code will fail, it's going to fail with or without the chicken
tests, and usually in a way that the chicken tests fail to anticipate.
Furthermore, introducing chicken tests adds ways
for the code to fail unnecessarily when the chicken tests report false
positives.

4) Getting back to the distinction between two types of test suite.
Personally, the attitude and approach to testing is much more
important than Sapir-Whorf considerations, syntax, and whether you
call it TDD or BDD or call them tests or specs. IMHO BDD is a sect of
the main test-first religion, and things like RSpec are kind of like
fancy exercise equipment. If it helps you acquire the proper attitude
towards test first design that's great. To my mind, when I've looked
at the BDD tools, although they have some nice ancillary features, I've
sensed that they are a little too intrusive on the code under test for
my taste, so I'd rather just use 'free weights' such as Test::Unit and
discipline.

Just like exercise equipment, neither does much good if you just use
it for a clothes rack! <G>
 
Gary Wright

1) I'm not sure what you mean by the second type of test suite. I
don't think you're talking about putting test assertions into the
code. If you're not, I'm not sure I see the real distinction.

I interpreted the two types of testing as follows.

Assume you have a method that expects a single non-negative integer
argument. You'll probably want to test that it behaves correctly
for the following:
foo(0) # edge case
foo(1) # typical call
foo(36) # random case from all values
foo(100000000) # large magnitude
foo(1023) # testing near powers of two
foo(1024)
foo(1025)

These tests will be designed to make sure that foo works correctly
on *expected* input.

Then there is the question about whether you should test:
foo(-1)
foo(nil)
foo("bogus")
foo("too", "many", :arguments)
foo()

and so on. These tests are throwing *unexpected* input at the
method. Bertrand Meyer in his writings about design by contract
programming would say that foo isn't obligated to do anything at
all when called with arguments that are outside its 'contract'.
If foo blows up on that input it isn't the failure of 'foo' but
instead is the failure of the caller to adhere to the contract.

So the philosophical question is should you extend foo's contract
to say that it will raise a particular exception (ArgumentError),
add the code to foo, and keep the tests or instead should you
decide that the tests are superfluous and instead focus on
better tests for the code that eventually calls foo (i.e. make
sure your code never calls foo incorrectly).
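As a sketch - foo here is hypothetical, and its add-one behavior is invented purely for illustration - extending the contract to raise ArgumentError on bad input might look like:

```ruby
# Hypothetical method whose extended contract rejects anything
# outside the non-negative Integers with an ArgumentError.
def foo(n)
  unless n.is_a?(Integer) && n >= 0
    raise ArgumentError, "non-negative Integer expected, got #{n.inspect}"
  end
  n + 1  # invented placeholder behavior
end

foo(0)      # => 1    (edge case)
foo(1023)   # => 1024 (near a power of two)

begin
  foo("bogus")
rescue ArgumentError
  # unexpected input now fails loudly and predictably
end
```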

Gary Wright
 
Robert Dober

Paraphrasing, you should test before you optimize, and use tdd/bdd.


I should, and probably will eventually, answer this more fully on my
blog, looking forward to it
but off the top of my head

1) I'm not sure what you mean by the second type of test suite. I
don't think you're talking about putting test assertions into the
code. If you're not, I'm not sure I see the real distinction.
Well, better to give an example: let us assume I want to write a package
that does, err, what, no idea, just translation; then I would do
TDD/BDD somehow like the following. Sorry, I am not fluent in RSpec, but
I will just use something similar.

"Vive la France".shall.translate.to "Viva Italia"
"Je ne suis pas disponible".shall.translate.to "Gone fishing"

Now that is normally considered enough. But for my personal comfort,
and anticipating problems and bugs inside my great plan of
implementation - experience speaking ;) - I will add some tests very
closely related to the implementation. You see, something totally
opposed to the BDD thing.
E.g.

assert_kind_of? Enumeration, my_thingy.instance_variable_get("@words")
but better would be duck type tests
assert_respond_to? :each, my_thing.....
assert [:each, :map, :size] <= my_thingy.ivarget("@words").methods

The trick would be to identify a few strategic test points so as to get
alerted to any misconception or coding error.
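In actual Test::Unit syntax, by the way, the assertions are spelled assert_kind_of(klass, obj) and assert_respond_to(obj, :method) - no question marks. Framework aside, such a duck-type microtest can be stated directly (my_thingy is of course hypothetical; a plain Array stands in for @words here):

```ruby
# Hypothetical object under test; @words could be any collection
# as long as it quacks like an Enumerable.
class MyThingy
  def initialize
    @words = %w[bonjour monde]
  end
end

words = MyThingy.new.instance_variable_get(:@words)

# Duck-type microtest: assert on the protocol, not on the class.
[:each, :map, :size].each do |m|
  raise "@words should respond to ##{m}" unless words.respond_to?(m)
end
```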
2) That said, I firmly believe that tests belong outside of the code
under test.
I believe it too, but less firmly.
We talked about this some time ago when Trans asked for
opinions of the inplace tests in facets. I distinguish between
instrumentation and test equipment. Instrumentation is writing code
to be testable, but the 'bulky' test equipment doesn't weigh down the
F1 car being tested.
True, but I find Trans' tests rather instructive, and he does not
really want an F1 car; generally, though, I agree with you. Libraries like
Facets and Labrador are somewhat different because they are so
reflective and their code pretty well describes what it does.
3) The chicken typing I recently wrote about (and I like the term the
more I use it) is just that: weighing down the car unnecessarily. If
the code will fail, it's going to fail with or without the chicken
tests, and usually in a way that the chicken tests fail to anticipate.
Here again I am with you; even instrumentation is a pain, I guess, and
that is why these microtests would be almost an exercise in
metaprogramming - I guess you would know some Smalltalk stories about
that.
The idea would be to identify strategic points where being chicken
makes the duck more confident elsewhere.
But I am not sure at all whether that might work!
Furthermore, introducing chicken tests adds ways
for the code to fail unnecessarily when the chicken tests report false
positives.
Here I am losing you: should we not test because of the fear of false positives?
4) Getting back to the distinction between two types of test suite.
Personally, the attitude and approach to testing is much more
important than Sapir-Whorf considerations, syntax, and whether you
call it TDD or BDD or call them tests or specs. IMHO BDD is a sect of
the main test-first religion, and things like RSpec are kind of like
fancy exercise equipment. If it helps you acquire the proper attitude
towards test first design that's great. I guess it does for me
To my mind, when I've looked
at the BDD tools, although they have some nice ancillary features, I've
sensed that they are a little too intrusive
that was true when I tested RSpec about 1.5 years ago - is it still?
on the code under test for
my taste, so I'd rather just use 'free weights' such as Test::Unit and
discipline.

Just like exercise equipment, neither does much good if you just use
it for a clothes rack! <G>
Err how did you know? ;)
R.
 
