String#to_ary and Test::Unit


Trans

In Facets I offer:

class String
  def to_ary
    self.split(//)
  end
end

This proves useful in adding other methods to Enumerable that can act
on all enumerables and on strings as character arrays. For instance:

# Generates a hash mapping each unique element to the frequency with
# which it appears.
#--
# Credit goes to Derek.
#++
def freq
  arr = respond_to?(:to_ary) ? self.to_ary : self.to_a
  probHash = Hash.new
  size = arr.size.to_f
  arr.uniq.each do |i|
    ct = arr.inject(0) do |mem,obj|
      obj.eql?(i) ? (mem+1) : mem
    end
    probHash[i] = ct.to_f/size
  end
  probHash
end

I had hoped that by using String#to_ary I would avoid any problems,
because it is not defined by default. Alas, I don't seem to be so
fortunate. Test::Unit chokes on it:

irb(main):001:0> require 'facet/string/to_ary'
=> true
irb(main):002:0> require 'test/unit'
=> true
irb(main):003:0> class TC < Test::Unit::TestCase
irb(main):004:1> def test01
irb(main):005:2> assert_equal( ['a','b','c'], 'abc'.to_ary )
irb(main):006:2> end
irb(main):007:1> end
=> nil
irb(main):008:0> require 'test/unit/ui/console/testrunner'
=> true
irb(main):009:0> Test::Unit::UI::Console::TestRunner.run(TC)
SystemStackError: stack level too deep
        from /usr/lib/ruby/1.8/test/unit/ui/console/testrunner.rb:105:in `puts'
        from /usr/lib/ruby/1.8/test/unit/ui/console/testrunner.rb:105:in `output'
        from /usr/lib/ruby/1.8/test/unit/ui/console/testrunner.rb:49:in `setup_mediator'
        from /usr/lib/ruby/1.8/test/unit/ui/console/testrunner.rb:37:in `start'
        from /usr/lib/ruby/1.8/test/unit/ui/testrunnerutilities.rb:27:in `run'
        from (irb):10

Why is this and what can be done about it?

Thanks,
T.
 

David A. Black

Hi --


This is just a sidenote, but here's a fun way to write freq:

module Enumerable
  def freq
    arr = to_ary rescue to_a
    probs = Hash.new {|h,k| h[k] = arr.find_all {|e| e == k }.size }
    arr.uniq.each {|k| probs[k] /= arr.size.to_f }
    probs
  end
end

:)

Anyway...
Trans said:
I had hoped that by using String#to_ary I would avoid any problems,
because it is not defined by default. Alas, I don't seem to be so
fortunate. Test::Unit chokes on it:

It's not test/unit related, except that line 105 of testrunner.rb
happens to call puts. Here's what I think is happening.

When you call puts x, there's a test somewhere to see if it's
an array (including via to_ary). If it is, then puts x turns into
puts x.to_ary.

In the case of a string (assuming your String#to_ary), that means
that:

puts "abc"

becomes

puts ["a","b","c"]

which starts with

puts "a"

but that triggers puts "a".to_ary, which becomes puts ["a"], which
starts with puts "a", which triggers puts "a".to_ary, which becomes
puts ["a"], etc., in an infinite loop.

(I haven't tracked this down in the source; it's inference from the
Ruby side.)
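
The loop described above can be modelled without touching String or the real puts. What follows is only a sketch of the inferred behaviour, not Ruby's actual implementation: CharBox, TooDeep, naive_puts and the depth limit are all made-up names for illustration.

```ruby
# Toy model of the behaviour inferred above; this is NOT Ruby's real puts.
# CharBox, TooDeep and naive_puts are hypothetical names for this sketch.
class TooDeep < StandardError; end

# Stand-in for a String that defines to_ary as "split into characters".
class CharBox
  def initialize(text)
    @text = text
  end

  # Every element is again a CharBox, so it again responds to to_ary.
  def to_ary
    @text.split(//).map { |c| CharBox.new(c) }
  end

  def to_s
    @text
  end
end

# Expands anything that responds to to_ary, as puts is assumed to do.
# The depth limit stands in for the SystemStackError seen in the thread.
def naive_puts(obj, out = [], depth = 0, limit = 25)
  raise TooDeep, "still expanding at depth #{depth}" if depth > limit
  if obj.respond_to?(:to_ary)
    obj.to_ary.each { |e| naive_puts(e, out, depth + 1, limit) }
  else
    out << obj.to_s
  end
  out
end
```

Calling naive_puts(CharBox.new("abc")) raises TooDeep, because each one-character element expands to a one-element array of itself and never reaches the else branch. A nested array of ordinary strings, which do not respond to to_ary, terminates normally.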

To see this in action:

class String
  def to_ary
    split(/#{p self}/)
  end
end

puts "abc"

and get ready to hit ctrl-c :)


David
 
Yukihiro Matsumoto

Hi,

In message "Re: String#to_ary and Test::Unit"

|In Facets I offer:
|
| class String
|   def to_ary
|     self.split(//)
|   end
| end

I don't think it is a good idea to provide to_ary for objects which are
not really arrays. In this case, puts recurses into objects with to_ary,
and each array returned from to_ary contains objects with to_ary
(strings), hence the infinite recursion.

matz.
 
Martin DeMello

Yukihiro Matsumoto said:
I don't think it is a good idea to provide to_ary for objects which are
not really arrays. In this case, puts recurses into objects with to_ary,
and each array returned from to_ary contains objects with to_ary
(strings), hence the infinite recursion.

I'd go further, and say that String should not implement Enumerable - as
it is, any code that wants to recursively traverse collections has to
include a check for String so that it doesn't get into the infinite
"String is a collection of Strings" recursion. each_byte, each_word and
each_line (without any default 'each') should provide most of the
functionality people need anyway - I can't remember ever needing to call
map or inject on a String unless I'd already #split or #scanned it into
an Array first.

martin
 
Brian Schröder

David A. Black said (replying to Trans's post of Mon, 8 Aug 2005):
This is just a sidenote, but here's a fun way to write freq:

module Enumerable
  def freq
    arr = to_ary rescue to_a
    probs = Hash.new {|h,k| h[k] = arr.find_all {|e| e == k }.size }
    arr.uniq.each {|k| probs[k] /= arr.size.to_f }
    probs
  end
end


Another sidenote that removes the need for to_ary completely and is a
lot faster:

module Enumerable
  def freq_brian
    probs = Hash.new(0.0)
    size = 0.0
    each do | e |
      probs[e] += 1.0
      size += 1.0
    end
    probs.keys.each do | e | probs[e] /= size end
    probs
  end
end


           user     system      total        real
Derek  6.760000   0.790000   7.550000 (  8.286280)
David  4.100000   1.310000   5.410000 (  5.698582)
Brian  0.120000   0.010000   0.130000 (  0.134124)

require 'test/unit'

# Assumes the three implementations above are defined on Enumerable as
# freq_derek, freq_david and freq_brian.
class TC_Frequency < Test::Unit::TestCase
  ARRAYS = [
    [1,2,1,1,1,4,5,6],
    [1,2,3,4],
    [],
    [0,0,0],
    Array.new(10) { rand(10) },
    Array.new(100) { rand(10) },
    Array.new(1000) { rand(10) },
    Array.new(10000) { rand(10) },
    Array.new(10000) { rand(100) }
  ]

  def test_brian_david
    ARRAYS.each do | a |
      assert_equal(a.freq_david, a.freq_brian, "Brian's is not equal to David's on #{a}")
    end
  end

  def test_brian_derek
    ARRAYS.each do | a |
      assert_equal(a.freq_derek, a.freq_brian, "Brian's is not equal to Derek's on #{a}")
    end
  end
end

require 'benchmark'

Benchmark.bm do | b |
  b.report('Derek') do TC_Frequency::ARRAYS.each do | a | a.freq_derek end end
  b.report('David') do TC_Frequency::ARRAYS.each do | a | a.freq_david end end
  b.report('Brian') do TC_Frequency::ARRAYS.each do | a | a.freq_brian end end
end

regards,

Brian

--
http://ruby.brian-schroeder.de/

Stringed instrument chords: http://chordlist.brian-schroeder.de/
 
Trans

Martin said:
I'd go further, and say that String should not implement Enumerable - as
it is, any code that wants to recursively traverse collections has to
include a check for String so that it doesn't get into the infinite
"String is a collection of Strings" recursion. each_byte, each_word and
each_line (without any default 'each') should provide most of the
functionality people need anyway - I can't remember ever needing to call
map or inject on a String unless I'd already #split or #scanned it into
an Array first.

Yes and no.

No, because I don't see any reason for String not to have a _default_
representation as an array. The problem is that right now that
"default" depends on a global setting, $/, via String's definition of
#each and its use of the #to_a method from Enumerable. Using a global
smells. Globally changing the way #each behaves, along with the methods
that depend on it, is awfully fragile. It means you can't depend on
their behavior.
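
To illustrate the dependence being described: line-based iteration splits on a record separator that defaults to the global $/. The sketch below uses each_line (the line-based iterator that String#each was a synonym for on Ruby 1.8) and passes the separator explicitly rather than reassigning the global. The sample string and variable names are made up for the example.

```ruby
# Line-based iteration depends on which record separator is in effect.
text = "a,b\nc,d"

# With the default separator ("\n", unless $/ has been reassigned):
by_newline = text.each_line.to_a

# With an explicit separator, independent of the global setting:
by_comma = text.each_line(",").to_a

# Character-wise splitting does not consult any separator at all:
chars = text.split(//)
```

The same string therefore yields two, three or seven elements depending on how it is iterated, which is the inconsistency the proposed split(//) definition of String#to_a is meant to avoid.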

I know that matz endlessly insists that a string is not an array of
chars, even though other languages do represent them as such. That's
fine with me. String is not an Array. And so I understand about not
having a String#to_ary. I was just trying to find a way around the
above problem.

So also yes: I agree with you that it's hardly useful for String to
include Enumerable, the way things are. But if String#to_a were
defined as split(//), so that there was a _consistent_ result, then it
would be quite useful. Unfortunately, one can't just redefine
String#to_a in this way because, like I said, the whole thing is so
fragile, and one might break other code that depends on the
globally settable Enumerable version.

T.
 

Trans

Wow, thanks Brian. I'll use that!

Oh, and thanks David for the explanation as to why it breaks. That makes
sense.
 
Martin DeMello

Trans said:
No b/c I don't see any reason for String not to have a _default_
representation as an array. The problem is that right now that

As I said, my primary objection is not the inconsistency, but the fact
that the array elements are Strings again.

Consider the following fragment

def to_tree
  if self.respond_to?(:each) and self.class != String # <-- ugly!
    self.node << self.map {|i| i.node}
  else
    self.node
  end
end

Similarly if you have a to_ary (as you yourself found out): the way
things stand, *any* code that recursively descends an arbitrary object
this way will have to special-case Strings.
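
A minimal, self-contained sketch of the kind of recursive descent being described, with the String special case made explicit. The helper name leaves and its accumulator argument are hypothetical and chosen only for this example.

```ruby
# Hypothetical helper: collects the leaves of an arbitrarily nested
# structure. Where String responds to each (as on Ruby 1.8), it has to
# be excluded explicitly, or "abc" would be treated as a collection of
# strings and the descent would never bottom out.
def leaves(obj, out = [])
  if obj.respond_to?(:each) && !obj.is_a?(String)
    obj.each { |child| leaves(child, out) }
  else
    out << obj
  end
  out
end

nested = [1, ["ab", [2, "c"]], 3]
flat = leaves(nested)
```

On Ruby versions where String no longer mixes in Enumerable, the is_a?(String) test is redundant but harmless, so the sketch behaves the same either way.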

Limited use case? Sure, but I've definitely run into it more often than
I've needed a String#random_Enumerable_method.

martin
 
David A. Black

Hi --

Brian Schröder said:
Another sidenote that removes the need for to_ary completely and is a
lot faster:

It's not needed if it's OK just to call #each, but I thought the
#to_ary call was part of Tom's original requirement.


David

 
Trans

David said:
It's not needed if it's OK just to call #each, but I thought the
#to_ary call was part of Tom's original requirement.

Unfortunately David is correct. In looking more closely at your
(Brian's) code offering, I see that it still uses #each. So in the
context of String, the same problem arises: I can't count on a string
being iterated over by chars. I'll have to create an overriding #freq
method specifically for String, and do the same for the other methods
that work the same way.

T.
 
Trans

Martin said:
each_byte, each_word and
each_line (without any default 'each') should provide most of the
functionality people need anyway

Those really aren't very good. One ends up creating all sorts of methods
along these lines:

freq_bytes, freq_word, freq_lines.

I suppose the best way would be:

arr.bytes.freq, arr.words.freq, etc.
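
The "pick a view first, then count" style suggested here might look as follows. This is a sketch only: freq is a stand-alone hypothetical helper rather than the Facets method, and split stands in for the words accessor, which is not a core String method.

```ruby
# Hypothetical stand-alone helper; not the Facets implementation.
# Works on anything that responds to each.
def freq(enum)
  counts = Hash.new(0)
  total = 0
  enum.each do |e|
    counts[e] += 1
    total += 1
  end
  result = {}
  counts.each { |key, count| result[key] = count.to_f / total }
  result
end

text = "to be or not to be"

char_freq = freq(text.split(//))  # per character
word_freq = freq(text.split)      # per whitespace-separated word
byte_freq = freq(text.bytes)      # per byte value
```

Because the caller decides how the string is broken up, a single freq covers all three cases and nothing depends on how String#each happens to iterate.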

So I agree with you even more. There are really two reasonable choices.
Either make String#to_a consistent, with the obvious definition being
split(//), or remove Enumerable altogether. As it stands it is a bit
confusing, mostly useless, and a likely source of creeping bugs.

T.
 

Brian Schröder

Trans said:
Unfortunately David is correct. In looking more closely at your
(Brian's) code offering, I see that it still uses #each. So in the
context of String, the same problem arises: I can't count on a string
being iterated over by chars. I'll have to create an overriding #freq
method specifically for String, and do the same for the other methods
that work the same way.

Yes, but why not use:

probability_hash = "this is my string".split(//).freq

Each object that could support to_ary can also support each, while not
every object that has an each makes sense with to_ary. So it would
seem better to me to put this functionality into the enumerable and
supply an enumerable.

Then you can even do

module CharString
  def each(&block)
    self.split(//).each(&block)
  end
end

module WordString
  def each(&block)
    self.split(/\s+/).each(&block)
  end
end

charstring = "some chars"
class << charstring
  include CharString
end

charstring.freq_brian
=> {" "=>0.1, "a"=>0.1, "m"=>0.1, "c"=>0.1, "o"=>0.1, "e"=>0.1,
"r"=>0.1, "h"=>0.1, "s"=>0.2}

wordstring = "This is my word string"
class << wordstring
  include WordString
end

wordstring.freq_brian
=> {"my"=>0.2, "word"=>0.2, "This"=>0.2, "string"=>0.2, "is"=>0.2}

hth,

brian

 
David A. Black

Hi --

On Mon, 8 Aug 2005, Brian Schröder wrote:
Yes, but why not use:

probability_hash = "this is my string".split(//).freq

Each object that could support to_ary can also support each, while not
every object that has an each makes sense with to_ary. So it would
seem better to me to put this functionality into the enumerable and
supply an enumerable.

I got the impression Tom was just trying to make things as transparent
as possible (though given the String#to_ary infinite loop, transparent
as possible may not be very transparent).

Brian Schröder said:
charstring = "some chars"
class << charstring
  include CharString
end

I thought you disliked the "<<" notation :)

charstring.extend(CharString)
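
A self-contained sketch of the extend variant follows. Including Enumerable inside the module is an addition made for this example, so that it also works where String does not already mix in Enumerable; String.new is used only to make sure the object is not frozen.

```ruby
# Sketch of per-object character iteration via extend.
# Including Enumerable in the module is an assumption of this example.
module CharString
  include Enumerable

  def each(&block)
    split(//).each(&block)
  end
end

charstring = String.new("some chars")
charstring.extend(CharString)

# Enumerable methods on this one object now iterate by character.
upcased = charstring.map { |c| c.upcase }
as_array = charstring.to_a
```

Only the extended object changes behaviour. Other strings keep whatever iteration the running Ruby version gives them, which is the point of doing this per object rather than reopening String.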


David

 

Brian Schröder

David A. Black said:
I thought you disliked the "<<" notation :)

Yes I do :-(
But I like the concept!

David A. Black said:
charstring.extend(CharString)

Thank you for pointing this out, there are always so many ways to do
it and I too often stick with what I find before I find the best.


best regards,

Brian

