Idiomatic conversion of yielding block to array

D

David Brady

So I have a function that generates like 300 lines of text and I want to
unit test it. If it fails, it barfs a dozen pages of text then says
there's a difference in there somewhere and we wish you the best of luck
trying to find it.

So, yeah.

I'd like to break this out into a line-by-line test. The following code
doesn't work because each_line is a generator function, not an array:

def test.to_html
# test the Calendar.to_html method
cal = Calendar.new(@date_test)
line_no = 0
@html_reference.each_line.zip(cal.to_html.each_line).each do |test_line|
line_no += 1
assert_equal( test_line[0], cal_line[1], "Calendar#to_html output
incorrect at line #{line_no}"
# Test::Unit will then emit the reference line and the failed test
line.
end

Okay, so I want these each_lines done up as arrays.

ref_lines = []; @html_reference.each_line {|line| ref_lines << line }
cal_lines = []; cal.to_html.each_line {|line| cal_lines << line }
line_no = 0
ref_lines.zip(cal_lines).each do |test_line|
line_no += 1
assert_equal( test_line[0], test_line[1], "Calendar.to_html
incorrect on line #{line_no}")
end

This smells very kludgy and procedural. I'm using an index/counter
variable, explicit Array declarations, and zipping two arrays together
only to split them apart again. Ugh!

Can someone help me make this more elegant? C++ has pooped Ruby's pants.

-dB
 
L

Levin Alexander

David Brady said:
cal_lines =3D []; cal.to_html.each_line {|line| cal_lines << line }

require 'enumerator'
cal_lines =3D cal.to_html.to_enum:)each_line).to_a

-Levin
 
S

Simon Kröger

Levin said:
cal_lines = []; cal.to_html.each_line {|line| cal_lines << line }


require 'enumerator'
cal_lines = cal.to_html.to_enum:)each_line).to_a

-Levin

You don't need the to_a but still zip will create 3 arrays.

interesting...

Simon
 
D

David Brady

Levin said:
require 'enumerator'
cal_lines = cal.to_html.to_enum:)each_line).to_a
Innnteresting. This condenses the loop initialization into a single
line... but it's pretty hairy:


@html_reference.to_enum:)each_line).zip(cal.to_html.to_enum:)each_line)).each_with_index
do |test_line, line_no|
assert_equal( test_line[0], test_line[1], "Calendar#to_html
incorrect on line #{line_no+1}")
end

I don't think we've really cleaned much up, though. We've moved from
C++ pooping Ruby's pants to Ruby pooping its own pants. It's idiomatic,
but we haven't reached elegance yet.

-dB
 
S

Simon Kröger

Simon said:
You don't need the to_a but still zip will create 3 arrays.

this does not, but its still not very ruby like:
-------------------------------------------------------------
s1 =3D ['this', 'is', 'a', 'test'].join("\n")
s2 =3D ['this', 'is', 'the', 'test'].join("\n")

q1, q2 =3D SizedQueue.new(1), SizedQueue.new(1)
Thread.new {s1.each_line {|l| q1 << l}; q1 << nil}
Thread.new {s2.each_line {|l| q2 << l}; q2 << nil}

line_no =3D 0
while true
line_no +=3D 1
t1, t2 =3D q1.pop, q2.pop
break if !t1 || !t2
puts "difference on line #{line_no}" if t1 !=3D t2
end
-------------------------------------------------------------

perhaps better, but still kind of hardcore:

-------------------------------------------------------------
require 'enumerator'
require 'thread'

module Enumerable
def each_zip(other)
q =3D SizedQueue.new(1)
Thread.new {other.each{|l| q << l}}
self.each{|a| yield a, q.pop}
end
end

s1 =3D ['this', 'is', 'a', 'test'].join("\n")
s2 =3D ['this', 'is', 'the', 'test'].join("\n")

s1.to_enum:)each_line).enum_for:)each_zip,
s2.to_enum:)each_line)).each_with_index do |t, l|
puts "difference on line: #{l+1}" if t[0] !=3D t[1]
end
 
L

Levin Alexander

food for thoughts...

require 'generator'

s1 =3D %w{this is the first line}.join("\n")
s2 =3D %w{this is the second line}.join("\n")

compare =3D SyncEnumerator.new(s1.to_enum:)each_line), s2.to_enum:)each_lin=
e))
compare.each { |a,b|
# ...
}


-Levin
 
S

Simon Kröger

Levin said:
=20
=20
=20
=20
require 'generator'
=20
s1 =3D %w{this is the first line}.join("\n")
s2 =3D %w{this is the second line}.join("\n")
=20
compare =3D SyncEnumerator.new(s1.to_enum:)each_line), s2.to_enum:)each= _line))
compare.each { |a,b|
# ...
}
=20
=20
-Levin
=20
=20

Yes, perfekt, that's what i was looking for (and implemented :) )

the trivial solution is of course:
----------------------------------------------------------
s1 =3D ['this', 'is', 'a', 'test'].join("\n")
s2 =3D ['this', 'is', 'the', 'test'].join("\n")

(0...s1.size).inject(1) do |l, i|
puts "difference on line: #{l}" or break if s1 !=3D s2
l =3D (s1 =3D=3D 10) ? l+1 : l
end
 
J

Jacob Fugal

Good heavens, no! Neither of those are thread safe. Criminy!

Why not something simple like so:

require 'enumerator'

def compare_by_line( expected, actual, message )
expected_lines =3D expected.to_enum:)each_line)
actual_lines =3D actual.to_enum:)each_line)
expected_lines.zip(actual_lines).each_with_index do |pair, index|
assert_equal( *pair, "#{message} on line #{index + 1}")
end
end

compare_by_line( @html_reference, cal.to_html, "Calendar.to_html incorrect"=
)

There's no need for threads...

Jacob Fugal
 
J

Jacob Fugal

Good heavens, no! Neither of those are thread safe. Criminy!

(Sorry, for those in non-tree-threading clients, that was in reply to
ruby-talk:154785)

Jacob Fugal
 
S

Simon Kröger

Jacob said:
Good heavens, no! Neither of those are thread safe. Criminy!

Why not something simple like so:

require 'enumerator'

def compare_by_line( expected, actual, message )
expected_lines = expected.to_enum:)each_line)
actual_lines = actual.to_enum:)each_line)
expected_lines.zip(actual_lines).each_with_index do |pair, index|
assert_equal( *pair, "#{message} on line #{index + 1}")
end
end

compare_by_line( @html_reference, cal.to_html, "Calendar.to_html incorrect" )

because zip converts each argument to an array, which isn't what you
want if there are many lines.
There's no need for threads...

True SyncEnumerator does it with continuations but i wasn't ready for
this mind bending stuff (yet). (and now i don't need to make knots in my
brain, because Levin was so nice to tell us about SyncEnumerator)

But what exactly isn't thread safe? SizedQueue is.
Jacob Fugal

Threads aren't evil.

cheers

Simon
 
J

Jacob Fugal

s1 =3D ['this', 'is', 'a', 'test'].join("\n")
s2 =3D ['this', 'is', 'the', 'test'].join("\n")
=20
(0...s1.size).inject(1) do |l, i|
puts "difference on line: #{l}" or break if s1 !=3D s2
l =3D (s1 =3D=3D 10) ? l+1 : l
end


What's the second to last line for?

l =3D (s1 =3D=3D 10) ? l+1 : l

That will always return false (since s1 is always a string, and
string/integer comparison is always false), so your line count will
always be 1.

Jacob Fugal
 
S

Simon Kröger

Jacob said:
[...]
What's the second to last line for?

l = (s1 == 10) ? l+1 : l

That will always return false (since s1 is always a string, and
string/integer comparison is always false), so your line count will
always be 1.


s1 is in fact never a string:

str[fixnum] => fixnum or nil
http://www.ruby-doc.org/core/classes/String.html#M001328

so it is for counting newline chars. It could be written shorter

(s1 == 10) ? l+1 : l

without the assignment.
Jacob Fugal

cheers

Simon
 
J

Jacob Fugal

because zip converts each argument to an array, which isn't what you
want if there are many lines.

Ah, yes. I'd probably do it with SyncEnumerator rather than zip as
well, then[1]. I just used zip since its what you started with and you
didn't specify performance as an issue.
True SyncEnumerator does it with continuations but i wasn't ready for
this mind bending stuff (yet). (and now i don't need to make knots in my
brain, because Levin was so nice to tell us about SyncEnumerator)

Very true. Thanks, Levin.
But what exactly isn't thread safe? SizedQueue is.

Unless SizedQueue is built with threads in mind and does alternating
blocking on pushes (<<) and pops. If it does, I apologize.

Assuming I'm not wasting time by not looking at the SizedQueue
documentation, the problem is that there's no guarantee that the main
thread won't run several pops before returning control to the thread
that's pushing into the queue.
Threads aren't evil.

No, but naive non-thread-safe code is. Well, it's broken, at least...
not necessarily *evil*. :)

Jacob Fugal
 
J

Jacob Fugal

=20
Unless SizedQueue is built with threads in mind and does alternating
blocking on pushes (<<) and pops. If it does, I apologize.

Again, I need to proofread my posts better. That should read "Only if"
instead of "Unless".

Jacob Fugal
 
J

Jacob Fugal

Jacob Fugal wrote:
=20
[...]
What's the second to last line for?

l =3D (s1 =3D=3D 10) ? l+1 : l

That will always return false (since s1 is always a string, and
string/integer comparison is always false), so your line count will
always be 1.

=20
s1 is in fact never a string:


Ah, sorry, I missed the fact that you were iterating over characters
instead of lines. 10 =3D=3D ?\n. Gotcha.

But isn't that different than the original functionality? He
originally wanted a warning (specifically assert_equal as part of
Test::Unit) for each line that differed. Your code will only match the
first such case then break out of the loop.

Jacob Fugal
 
S

Simon Kröger

Jacob said:
=20
=20
Again, I need to proofread my posts better. That should read "Only if"
instead of "Unless".
=20
Jacob Fugal

It is a _sized_ queue wich i initailized to be of size 1, so there is=20
only 1 item max in the queue at any given time. This comes from a lib=20
called 'thread' so i would assume that it is in fact "built with threads=20
in mind". It blocks if there is an item in it and you want to write and=20
it block if there is no such item and you want to read.

cheers

Simon
 
S

Simon Kröger

[...]
Ah, sorry, I missed the fact that you were iterating over characters
instead of lines. 10 == ?\n. Gotcha.

But isn't that different than the original functionality? He
originally wanted a warning (specifically assert_equal as part of
Test::Unit) for each line that differed. Your code will only match the
first such case then break out of the loop.

Jacob Fugal

hmm, true .. on the other hand assert_equal throws an
AssertionFailedError, so this is kind of breaking out of the loop also.
(But right, its not the same thing)

cheers

Simon
 
J

Jacob Fugal

hmm, true .. on the other hand assert_equal throws an
AssertionFailedError, so this is kind of breaking out of the loop also.
(But right, its not the same thing)

Indeed, I forgot about that as well. Most my programming has been in
PHP lately (*shudder*) and the testing framework I use continues past
failed assertions, so you can have many failed assertions in one test.
So if you replaced the puts "..." or break if ... with an assertEqual,
you'd pretty much have it right on. :) And since you fail at the first
mismatched character, your prior newlines are sure to have matched up.
Cool beans.

Jacob Fugal
 
J

Jacob Fugal

Performance would probably be the wrong motivation to switch to
SyncEnumerator. On my machine it is about 100 times slower then
Enumerable#zip.
=20
sync-enum: 1.676
zip-enum: 0.018
sizedqueue: 0.226

Yeah, but that performance is probably linear (again, I'm shooting
from my hip here). So in the extreme cases, SyncEnumerable would
outperform #zip. And even before it takes the lead in execution speed,
it definitely has an advantage in memory, which I was more concerned
about. SizedQueue however seems to be a much better choice than
SyncEnumerable for those situations.

Jacob Fugal.
 
J

Jacob Fugal

It is a _sized_ queue wich i initailized to be of size 1, so there is
only 1 item max in the queue at any given time. This comes from a lib
called 'thread' so i would assume that it is in fact "built with threads
in mind". It blocks if there is an item in it and you want to write and
it block if there is no such item and you want to read.

Yeah, I should have looked into that before my first post. Again, my
apologies :)

Jacob Fugal
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,755
Messages
2,569,536
Members
45,020
Latest member
GenesisGai

Latest Threads

Top