Loop weirdness

J

Jonathan Leighton

Hi,

I'm going ever so slightly crazy over some looping behaviour. Here's a
simplified test case I made:

----
def categorise(items)
categorised = []
add = proc { |item| categorised << { :num => item[:num], :items => [ item[:foo] ] } }

items.each do |item|
if categorised.empty?
add.call(item)
else
categorised.each do |cat|
if cat[:num] == item[:num]
cat[:items] << item[:foo]
break
elsif cat == categorised.last
add.call(item)
break
end
end
end
end

categorised
end

p categorise( [ { :num => 1, :foo => 'foo1' }, { :num => 45, :foo => 'foo45' } ] )
p categorise( [ { :num => 1, :foo => 'foo1' }, { :num => 45, :foo => 'foo45' }, { :num => 1, :foo => 'foo1' } ] )
----

Which produces:

[{:num=>1, :items=>["foo1"]}, {:num=>45, :items=>["foo45"]}]
[{:num=>1, :items=>["foo1", "foo1"]}, {:num=>45, :items=>["foo45"]}]

Good-oh.

HOWEVER, if I remove the breaks from the categorised loop, it produces
all sorts of spectacular results (try it yourself). This I cannot
understand:

* For the first one, if cat[:num] == item[:num], this implies that
there are no other instances of this particular number. Which of course
is what I want. The break perhaps speeds things up because it doesn't
have to iterate more than needed, but it shouldn't affect the end result
as far as I can see.
* For the second one, it should only be run if cat == categorised.last,
ie we're on the last item in the array. Which means that the iteration
should not be run again. So why does the break affect things?

I'm really getting quite confused about this so I'd be really glad of an
explanation.

Thanks a lot
 
E

Eero Saynatkari

Hi,

I'm going ever so slightly crazy over some looping behaviour. Here's a
simplified test case I made:

I am feeling really dense today so I may be totally off mark, but...
----
def categorise(items)
categorised = []
add = proc { |item| categorised << { :num => item[:num], :items => [ item[:foo] ] } }

items.each do |item|
if categorised.empty?
add.call(item)

This is only done once, because categorised will no longer be empty.
else
categorised.each do |cat|

cat is the first and last element of categorised.
if cat[:num] == item[:num]
cat[:items] << item[:foo]
break
elsif cat == categorised.last

This will be true.
add.call(item)
break
end
end
end
end

categorised
end

p categorise( [ { :num => 1, :foo => 'foo1' }, { :num => 45, :foo => 'foo45' } ] )
p categorise( [ { :num => 1, :foo => 'foo1' }, { :num => 45, :foo => 'foo45' }, { :num => 1, :foo => 'foo1' } ] )
----

Which produces:

[{:num=>1, :items=>["foo1"]}, {:num=>45, :items=>["foo45"]}]
[{:num=>1, :items=>["foo1", "foo1"]}, {:num=>45, :items=>["foo45"]}]

Good-oh.

HOWEVER, if I remove the breaks from the categorised loop, it produces
all sorts of spectacular results (try it yourself). This I cannot
understand:

* For the first one, if cat[:num] == item[:num], this implies that
there are no other instances of this particular number. Which of course
is what I want. The break perhaps speeds things up because it doesn't
have to iterate more than needed, but it shouldn't affect the end result
as far as I can see.
* For the second one, it should only be run if cat == categorised.last,
ie we're on the last item in the array. Which means that the iteration
should not be run again. So why does the break affect things?

I'm really getting quite confused about this so I'd be really glad of an
explanation.

Thanks a lot

Jonathan Leighton


E
 
J

Johan Veenstra

------=_Part_7021_26703939.1138526312172
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

The without the breaks if the last "cat[:num] =3D=3D item[:num]" isn't true=
, "
add.call(item)" is executed, even if the item was already added to an
'earlier' category

by the way, your solution seems a bit complex, if you can live with a Hash
as the result, instead of an Array:

def categorise( items )
catHash =3D Hash.new { |h, k| Array.new }
items.each do | item |
catHash[ item[ :num ] ] +=3D [ item[ :foo ] ]
end
catHash
end

p categorise( [ { :num =3D> 1, :foo =3D> 'foo1' }, { :num =3D> 45, :foo =3D=
} ] )
p categorise( [ { :num =3D> 1, :foo =3D> 'foo1' }, { :num =3D> 45, :foo =3D=
}, { :num =3D> 1, :foo =3D> 'foo1' } ] )

{45 =3D> [ "foo45" ], 1 =3D> [ "foo1" ]}
{45 =3D> [ "foo45" ], 1 =3D> [ "foo1", "foo1" ]}

if you can't live with that:

def categorise( items )
catHash =3D Hash.new { |h, k| Array.new }
items.each do | item |
catHash[ item[ :num ] ] +=3D [ item[ :foo ] ]
end
catArray =3D []
catHash.each do | num, foo |
catArray << { :num =3D> num, :foo =3D> foo }
end
catArray
end

p categorize( [ { :num =3D> 1, :foo =3D> 'foo1' }, { :num =3D> 45, :foo =3D=
} ] )
p categorize( [ { :num =3D> 1, :foo =3D> 'foo1' }, { :num =3D> 45, :foo =3D=
}, { :num =3D> 1, :foo =3D> 'foo1' } ] )

[{:num=3D>1, :items=3D>["foo1"]}, {:num=3D>45, :items=3D>["foo45"]}]
[{:num=3D>1, :items=3D>["foo1", "foo1"]}, {:num=3D>45, :items=3D>["foo45"]}=
]

don't know about efficiency, but it look pretty simple

Hi,

I'm going ever so slightly crazy over some looping behaviour. Here's a
simplified test case I made:

----
def categorise(items)
categorised =3D []
add =3D proc { |item| categorised << { :num =3D> item[:num], :items =3D=
[
item[:foo] ] } }

items.each do |item|
if categorised.empty?
add.call(item)
else
categorised.each do |cat|
if cat[:num] =3D=3D item[:num]
cat[:items] << item[:foo]
break
elsif cat =3D=3D categorised.last
add.call(item)
break
end
end
end
end

categorised
end

p categorise( [ { :num =3D> 1, :foo =3D> 'foo1' }, { :num =3D> 45, :foo = =3D>
'foo45' } ] )
p categorise( [ { :num =3D> 1, :foo =3D> 'foo1' }, { :num =3D> 45, :foo = =3D>
'foo45' }, { :num =3D> 1, :foo =3D> 'foo1' } ] )
----

Which produces:

[{:num=3D>1, :items=3D>["foo1"]}, {:num=3D>45, :items=3D>["foo45"]}]
[{:num=3D>1, :items=3D>["foo1", "foo1"]}, {:num=3D>45, :items=3D>["foo45"= ]}]

Good-oh.

HOWEVER, if I remove the breaks from the categorised loop, it produces
all sorts of spectacular results (try it yourself). This I cannot
understand:

* For the first one, if cat[:num] =3D=3D item[:num], this implies that
there are no other instances of this particular number. Which of course
is what I want. The break perhaps speeds things up because it doesn't
have to iterate more than needed, but it shouldn't affect the end result
as far as I can see.
* For the second one, it should only be run if cat =3D=3D categorised.las= t,
ie we're on the last item in the array. Which means that the iteration
should not be run again. So why does the break affect things?

I'm really getting quite confused about this so I'd be really glad of an
explanation.

Thanks a lot

------=_Part_7021_26703939.1138526312172--
 
R

Robert Klemme

Johan Veenstra said:
The without the breaks if the last "cat[:num] == item[:num]" isn't
true, " add.call(item)" is executed, even if the item was already
added to an 'earlier' category

by the way, your solution seems a bit complex, if you can live with a
Hash as the result, instead of an Array:

def categorise( items )
catHash = Hash.new { |h, k| Array.new }
items.each do | item |
catHash[ item[ :num ] ] += [ item[ :foo ] ]
end
catHash
end

p categorise( [ { :num => 1, :foo => 'foo1' }, { :num => 45, :foo =>
'foo45' } ] )
p categorise( [ { :num => 1, :foo => 'foo1' }, { :num => 45, :foo =>
'foo45' }, { :num => 1, :foo => 'foo1' } ] )

{45 => [ "foo45" ], 1 => [ "foo1" ]}
{45 => [ "foo45" ], 1 => [ "foo1", "foo1" ]}

if you can't live with that:

def categorise( items )
catHash = Hash.new { |h, k| Array.new }

I guess you rather meant:

cat_hash = Hash.new {|h,k| h[k]=[]}

Your code never inserts something into the Hash.
items.each do | item |
catHash[ item[ :num ] ] += [ item[ :foo ] ]
end
catArray = []
catHash.each do | num, foo |
catArray << { :num => num, :foo => foo }
end
catArray
end

But otherwise I agree: using a Hash here seems a better choice.

Kind regards

robert
 
J

Jonathan Leighton

The without the breaks if the last "cat[:num] == item[:num]" isn't true, "
add.call(item)" is executed, even if the item was already added to an
'earlier' category

Ah yes! Thanks, I didn't spot that. However, I still don't understand
why the second break is necessary. Without it I get:

[{:num=>1, :items=>["foo1"]}, {:num=>45, :items=>["foo45", "foo45"]}]
[{:num=>1, :items=>["foo1", "foo1"]}, {:num=>45, :items=>["foo45", "foo45"]}]

Which should be:

[{:num=>1, :items=>["foo1"]}, {:num=>45, :items=>["foo45"]}]
[{:num=>1, :items=>["foo1", "foo1"]}, {:num=>45, :items=>["foo45"]}]

Any ideas on that?
by the way, your solution seems a bit complex, if you can live with a Hash
as the result, instead of an Array:

def categorise( items )
catHash = Hash.new { |h, k| Array.new }
items.each do | item |
catHash[ item[ :num ] ] += [ item[ :foo ] ]
end
catHash
end

p categorise( [ { :num => 1, :foo => 'foo1' }, { :num => 45, :foo => 'foo45'
} ] )
p categorise( [ { :num => 1, :foo => 'foo1' }, { :num => 45, :foo => 'foo45'
}, { :num => 1, :foo => 'foo1' } ] )

{45 => [ "foo45" ], 1 => [ "foo1" ]}
{45 => [ "foo45" ], 1 => [ "foo1", "foo1" ]}

if you can't live with that:

def categorise( items )
catHash = Hash.new { |h, k| Array.new }
items.each do | item |
catHash[ item[ :num ] ] += [ item[ :foo ] ]
end
catArray = []
catHash.each do | num, foo |
catArray << { :num => num, :foo => foo }
end
catArray
end

p categorize( [ { :num => 1, :foo => 'foo1' }, { :num => 45, :foo => 'foo45'
} ] )
p categorize( [ { :num => 1, :foo => 'foo1' }, { :num => 45, :foo => 'foo45'
}, { :num => 1, :foo => 'foo1' } ] )

[{:num=>1, :items=>["foo1"]}, {:num=>45, :items=>["foo45"]}]
[{:num=>1, :items=>["foo1", "foo1"]}, {:num=>45, :items=>["foo45"]}]

don't know about efficiency, but it look pretty simple

These are interesting but unfortunately I need the order of the original
items array to be maintained. This is because for the *actual* purpose,
it's sorting a number of rows from a database and categorising them by a
date field. The rows are returned in a certain order from the database
and that order must be kept.

Cheers
Hi,

I'm going ever so slightly crazy over some looping behaviour. Here's a
simplified test case I made:

----
def categorise(items)
categorised = []
add = proc { |item| categorised << { :num => item[:num], :items => [
item[:foo] ] } }

items.each do |item|
if categorised.empty?
add.call(item)
else
categorised.each do |cat|
if cat[:num] == item[:num]
cat[:items] << item[:foo]
break
elsif cat == categorised.last
add.call(item)
break
end
end
end
end

categorised
end

p categorise( [ { :num => 1, :foo => 'foo1' }, { :num => 45, :foo =>
'foo45' } ] )
p categorise( [ { :num => 1, :foo => 'foo1' }, { :num => 45, :foo =>
'foo45' }, { :num => 1, :foo => 'foo1' } ] )
----

Which produces:

[{:num=>1, :items=>["foo1"]}, {:num=>45, :items=>["foo45"]}]
[{:num=>1, :items=>["foo1", "foo1"]}, {:num=>45, :items=>["foo45"]}]

Good-oh.

HOWEVER, if I remove the breaks from the categorised loop, it produces
all sorts of spectacular results (try it yourself). This I cannot
understand:

* For the first one, if cat[:num] == item[:num], this implies that
there are no other instances of this particular number. Which of course
is what I want. The break perhaps speeds things up because it doesn't
have to iterate more than needed, but it shouldn't affect the end result
as far as I can see.
* For the second one, it should only be run if cat == categorised.last,
ie we're on the last item in the array. Which means that the iteration
should not be run again. So why does the break affect things?

I'm really getting quite confused about this so I'd be really glad of an
explanation.

Thanks a lot
 
J

Johan Veenstra

------=_Part_8062_25616106.1138539971312
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Content-Disposition: inline

Hmm interesting, I've added line number, couldn't explain it other wise:

def categorise(items)
1 categorised =3D []
2 add =3D proc { |item| categorised said:
item[:foo] ] } }
3 items.each do |item|
4 if categorised.empty?
5 add.call(item)
6 else
7 categorised.each do |cat|
8 if cat[:num] =3D=3D item[:num]
9 cat[:items] << item[:foo]
10 break
11 elsif cat =3D=3D categorised.last
12 add.call(item)
13 end
14 end
15 end
16 end
17 categorised
end

p categorise( [ { :num =3D> 1, :foo =3D> 'foo1' }, { :num =3D> 45, :foo =3D=
} ] )

1: categorised =3D []
3: first item( item[:num] =3D=3D 1 )
4: categorised is empty
5: {:num=3D>1, :items=3D>["foo1"]} is added to categorised, so far so good
3: next item( item[:num] =3D 45 )
4: categorised is not empty
7: first category (cat[:num] =3D=3D 1)
8: item[:num] is not equal to cat[:num] (45 !=3D 1 )
11: cat is last category
12: { :num =3D> 45, :foo =3D> ['foo45'] } is added to categorised, just wha=
t we
need
7: next category (yes there is a next category, number 45 that we've just
added a moment ago) ( cat[:num] =3D=3D 45)
8: added ['foo45'] to category 45, oh boy
10:break
3: no more items
18: return categorised

------=_Part_8062_25616106.1138539971312--
 
J

Jonathan Leighton

Okay, so let me get this right... if you do

foo.each { }

foo is re-evaluated with each iteration? Hence the bug? Why?

Jon

Hmm interesting, I've added line number, couldn't explain it other wise:

def categorise(items)
1 categorised = []
2 add = proc { |item| categorised << { :num => item[:num], :items => [
item[:foo] ] } }
3 items.each do |item|
4 if categorised.empty?
5 add.call(item)
6 else
7 categorised.each do |cat|
8 if cat[:num] == item[:num]
9 cat[:items] << item[:foo]
10 break
11 elsif cat == categorised.last
12 add.call(item)
13 end
14 end
15 end
16 end
17 categorised
end

p categorise( [ { :num => 1, :foo => 'foo1' }, { :num => 45, :foo => 'foo45'
} ] )

1: categorised = []
3: first item( item[:num] == 1 )
4: categorised is empty
5: {:num=>1, :items=>["foo1"]} is added to categorised, so far so good
3: next item( item[:num] = 45 )
4: categorised is not empty
7: first category (cat[:num] == 1)
8: item[:num] is not equal to cat[:num] (45 != 1 )
11: cat is last category
12: { :num => 45, :foo => ['foo45'] } is added to categorised, just what we
need
7: next category (yes there is a next category, number 45 that we've just
added a moment ago) ( cat[:num] == 45)
8: added ['foo45'] to category 45, oh boy
10:break
3: no more items
18: return categorised
 
D

dblack

Hi --

Okay, so let me get this right... if you do

foo.each { }

foo is re-evaluated with each iteration? Hence the bug? Why?

The length of foo is re-evaluated each time; that's how the decision
is made whether or not to keep iterating. Array#each essentially
works like this:

def each
i = 0
while i < length
yield self
i += 1
end
self
end

So if the length of the array increases in mid-stream, the iteration
will continue, based on the while test.

You can always get around this with something like:

array.length.times do |i|
# do something with array
end

where length won't be re-evaluated.


David

--
David A. Black
(e-mail address removed)

"Ruby for Rails", from Manning Publications, coming May 1, 2006!
http://www.manning.com/books/black
 
J

Jonathan Leighton

Hi --

Okay, so let me get this right... if you do

foo.each { }

foo is re-evaluated with each iteration? Hence the bug? Why?

The length of foo is re-evaluated each time; that's how the decision
is made whether or not to keep iterating. Array#each essentially
works like this:

def each
i = 0
while i < length
yield self
i += 1
end
self
end

So if the length of the array increases in mid-stream, the iteration
will continue, based on the while test.

You can always get around this with something like:

array.length.times do |i|
# do something with array
end

where length won't be re-evaluated.


Thanks very much! I understand now.
 
S

Simon Kröger

Jonathan said:
Hi,

I'm going ever so slightly crazy over some looping behaviour. Here's a
simplified test case I made:

Is this what you are trying to do?

def categorise(items)
items.inject(Hash.new([])) do |h, item|
h[item[:num]] += [item[:foo]]; h
end.map{|k, v| {:num => k, :items => v}}
end

(sorry, i know this wasn't your question - i love refactoring and
hope there is a trick or two in it you didn't came across yet)

cheers

Simon
 
R

Robert Klemme

Jonathan said:
Hi --

Okay, so let me get this right... if you do

foo.each { }

foo is re-evaluated with each iteration? Hence the bug? Why?

The length of foo is re-evaluated each time; that's how the decision
is made whether or not to keep iterating. Array#each essentially
works like this:

def each
i = 0
while i < length
yield self
i += 1
end
self
end

So if the length of the array increases in mid-stream, the iteration
will continue, based on the while test.

You can always get around this with something like:

array.length.times do |i|
# do something with array
end

where length won't be re-evaluated.


Thanks very much! I understand now.


Adding to this I'd say it's generally a good idea not to modify the
container you're iterating (unless it's an array and you iterate by index)
because all sorts of strange things may happen. (Java folks have
ConcurrentModificationException for this...) At least you should look
twice whether you need to actually modify the container you walk though.

Kind regards

robert
 
J

Jonathan Leighton

Jonathan Leighton wrote:
=20
Hi,
=20
I'm going ever so slightly crazy over some looping behaviour. Here's = a
simplified test case I made:
=20
Is this what you are trying to do?
=20
def categorise(items)
items.inject(Hash.new([])) do |h, item|
h[item[:num]] +=3D [item[:foo]]; h
end.map{|k, v| {:num =3D> k, :items =3D> v}}
end

That's cool (although crowded), but it doesn't fit the bill for me as I
need the order to be retained.

Thanks anyway

Jon

--=20
Jonathan Leighton
http://turnipspatch.com/ | http://jonathanleighton.com/ | http://digital-=
proof.org/
 
R

Robert Klemme

2006/1/30 said:
Jonathan said:
Hi,

I'm going ever so slightly crazy over some looping behaviour. Here's = a
simplified test case I made:

Is this what you are trying to do?

def categorise(items)
items.inject(Hash.new([])) do |h, item|
h[item[:num]] +=3D [item[:foo]]; h
end.map{|k, v| {:num =3D> k, :items =3D> v}}
end

That's cool (although crowded), but it doesn't fit the bill for me as I
need the order to be retained.

What exactly do you want to do? Can you describe it other than in
code? Input, output etc.

Cheers

robert
 
J

Jonathan Leighton

2006/1/30 said:
Jonathan Leighton wrote:

Hi,

I'm going ever so slightly crazy over some looping behaviour. Her= e's a
simplified test case I made:

Is this what you are trying to do?

def categorise(items)
items.inject(Hash.new([])) do |h, item|
h[item[:num]] +=3D [item[:foo]]; h
end.map{|k, v| {:num =3D> k, :items =3D> v}}
end

That's cool (although crowded), but it doesn't fit the bill for me as= I
need the order to be retained.
=20
What exactly do you want to do? Can you describe it other than in
code? Input, output etc.

I did, back in the thread a little:

----
These are interesting but unfortunately I need the order of the original
items array to be maintained. This is because for the *actual* purpose,
it's sorting a number of rows from a database and categorising them by a
date field. The rows are returned in a certain order from the database
and that order must be kept.
----

But don't worry about it, I am happy with the solution I've found. (That
is, there is no longer a *problem*, although I'm always interested to
see better/leaner/cleaner ways of doing things).

Jon

--=20
Jonathan Leighton
http://turnipspatch.com/ | http://jonathanleighton.com/ | http://digital-=
proof.org/
 
S

Simon Kröger

I did, back in the thread a little:

Sorry, missed that.
But don't worry about it, I am happy with the solution I've found. (That
is, there is no longer a *problem*, although I'm always interested to
see better/leaner/cleaner ways of doing things).

:) another try:

def categorise(items)
items.inject([]) do |a, item|
ary = a.assoc(item[:num]) || (a << [item[:num], []]).last
ary.last << item[:foo]; a
end.map{|k, v| {:num => k, :items => v}}
end

cheers

Simon
 
R

Robert Klemme

Jonathan said:
2006/1/30 said:
Hi,

I'm going ever so slightly crazy over some looping behaviour.
Here's a simplified test case I made:

Is this what you are trying to do?

def categorise(items)
items.inject(Hash.new([])) do |h, item|
h[item[:num]] += [item[:foo]]; h
end.map{|k, v| {:num => k, :items => v}}
end

That's cool (although crowded), but it doesn't fit the bill for me
as I need the order to be retained.

What exactly do you want to do? Can you describe it other than in
code? Input, output etc.

I did, back in the thread a little:

Sorry, that slipped by me.
----
These are interesting but unfortunately I need the order of the
original
items array to be maintained. This is because for the *actual*
purpose,
it's sorting a number of rows from a database and categorising them
by a
date field. The rows are returned in a certain order from the database
and that order must be kept.
----

That description sounds as if a straightforward hash solution would work:

def cat(records)
ha = Hash.new {|h,k| h[k]=[]}
records.each do |rec|
ha[rec[:date]] << rec
end
ha
end

But apparently that's not what you want. (Is it?) You could of course
return two items (records and ha) from this method to preserve original
order.

Cheers

robert
 
J

Jonathan Leighton

=20
I did, back in the thread a little:
=20
Sorry, missed that.
=20
But don't worry about it, I am happy with the solution I've found. = (That
is, there is no longer a *problem*, although I'm always interested to
see better/leaner/cleaner ways of doing things).
=20
:) another try:
=20
def categorise(items)
items.inject([]) do |a, item|
ary =3D a.assoc(item[:num]) || (a << [item[:num], []]).last
ary.last << item[:foo]; a
end.map{|k, v| {:num =3D> k, :items =3D> v}}
end

That's pretty slick :). One question though:

The docs say Array#map only yeilds on value: the value of the item in
the array. However, you are using two in your block. I understand (from
looking at the code), that this assigns each value in the item array to
k and v respectively, but how does that work? Is there any docs about
that anywhere?

Cheers

--=20
Jonathan Leighton
http://turnipspatch.com/ | http://jonathanleighton.com/ | http://digital-=
proof.org/
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,774
Messages
2,569,598
Members
45,158
Latest member
Vinay_Kumar Nevatia
Top