FEATURE SUGGESTION: Accept default value for to_f and to_i

Mr Magpie

I suggest that to_i() and to_f() have an optional parameter added with
the default value of 0 (for backwards compatibility).

This would allow code like

if astring.to_f(nil)
# valid, so use it
else
# not a valid float, nil was returned, so handle error
end

Currently, from the output of these functions, a conversion error is
indistinguishable from a valid input of 0.

It would allow even more succinct code when, say, reading in a
configuration value :
delay = configuration['delay'].to_f(DEFAULT_DELAY)

I find this pattern of providing a default that is returned in the event
of an error (instead of throwing an exception or returning nil or 0)
allows for simple, safe and succinct code.
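
For readers on a newer Ruby, here is a minimal sketch of the default-on-error pattern described above, written as a plain helper rather than as a change to to_f. The name float_or and the sample values are hypothetical, and the `exception: false` keyword on Float() is an assumption about your Ruby version: it exists only in Ruby 2.6 and later, which postdates this thread.

```ruby
# Hypothetical helper (not part of Ruby): parse a Float, or return a default.
# Assumes Ruby 2.6+, where Float() accepts `exception: false`.
def float_or(str, default = nil)
  value = Float(str, exception: false) # nil instead of raising on bad input
  value.nil? ? default : value
end

default_delay = 2.5                    # illustrative value
configuration = { 'delay' => 'soon' }  # illustrative bad input

delay = float_or(configuration['delay'], default_delay)
puts delay                  # falls back to default_delay
puts float_or('0').inspect  # a valid zero stays distinguishable from an error
```

With a nil default the result can be tested directly in an if, which is the usage the proposal is after.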

Another example is xpath_value(aRootNode,aXPath,aDefault=nil) which
returns aDefault if there is any problem returning the value selected by
aXPath (it often doesn't matter what the problem is).
 
Mr Magpie

Yukihiro said:
I think Float(astring) that raises an exception for invalid string can
do the work for you.

|It would allow even more succinct code when, say, reading in a
|configuration value :
|delay = configuration['delay'].to_f(DEFAULT_DELAY)

Try

delay = Float(configuration['delay']) rescue DEFAULT_DELAY

matz.

Thank you for replying, matz. I very much like the approach of the
existing to_i and to_f methods, which never throw exceptions: they allow
things like single-line method chaining without fear of exceptions in
the majority of cases, where 0 is the desired default.

I'm merely suggesting that the error default value be customisable to
distinguish bad input from valid input,
e.g. "0".to_i and "dds".to_i both return 0 and sometimes that's fine, but
other times we want to know whether the input was a valid integer or
not.
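
A short sketch of the ambiguity, using only built-in methods. The strict_int helper and its :invalid marker are hypothetical, chosen only for this illustration.

```ruby
# String#to_i never raises; it returns 0 when no leading digits are found.
puts '0'.to_i      # 0
puts 'dds'.to_i    # 0
puts '12abc'.to_i  # 12 (a numeric prefix is accepted)

# Kernel#Integer is strict and raises ArgumentError on bad input.
def strict_int(str)
  Integer(str)
rescue ArgumentError
  :invalid # illustrative marker, chosen only for this sketch
end

puts strict_int('0').inspect    # 0
puts strict_int('dds').inspect  # :invalid
```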

I believe exceptions are a performance drag, and these little functions
are often called thousands of times in a loop for processing input, so
I'd prefer to avoid a method that potentially causes exceptions. I think
they would be among the more performance critical of all Ruby methods,
which is why I'm suggesting these changes for the C based core rather
than just making my own monkey patch.

Anyway, thanks for Ruby and replying to me.
 
ara.t.howard

I believe exceptions are a performance drag, and these little functions
are often called thousands of times in a loop for processing input, so
I'd prefer to avoid a method that potentially causes exceptions. I think

but, unless you use #to_i exactly as it is now that is still the
case? with your suggestion this code

'forty-two'.to_i(nil).abs

raises a NoMethodError

so i think the point is that you either have

'forty-two'.to_i.abs # let zero be the default

Integer 'forty-two' # need to handle exceptions

and nothing in between because a to_i with a default that is non-
numeric doesn't provide anything over the built-in Integer or Float.
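
A small sketch of the failure mode with a nil default, using nil directly since to_i(nil) does not exist. The helper name abs_of_default is hypothetical.

```ruby
# Calling a numeric method on a nil default fails at the call site.
def abs_of_default
  nil.abs
rescue NoMethodError => e
  e.class
end

puts abs_of_default                               # NoMethodError
puts NoMethodError.ancestors.include?(NameError)  # true
```

NoMethodError is a subclass of NameError, so a rescue for either class catches it.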

fwiw i use this a lot:

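#
# convert a string to an integer of any base
#
def strtod(s, opts = {})
  # was: getopt 'base', opts, 10 (a helper not shown in this post)
  base = opts.fetch('base', 10)
  case base
  when 2
    s = "0b#{ s }" unless s =~ %r/^0b/i
  when 8
    s = "0#{ s }" unless s =~ %r/^0/
  when 16
    # the prefix check must not require a decimal digit: "0xff" is valid
    s = "0x#{ s }" unless s =~ %r/^0x/i
  end
  Integer s
end
#
# convert a string to an integer
#
def atoi(s, opts = {})
  # strip leading zeros (otherwise read as octal) but keep a lone "0"
  strtod("#{ s }".sub(%r/^0+(?=\d)/, ''), 'base' => 10)
end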
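
As an aside, Kernel#Integer also accepts the base as an optional second argument, which gives a similar result without building a prefix by hand. A minimal sketch; the valid_hex? helper is hypothetical.

```ruby
# Kernel#Integer accepts an optional base as its second argument.
puts Integer('101', 2)   # 5
puts Integer('17', 8)    # 15
puts Integer('ff', 16)   # 255
puts Integer('007', 10)  # 7 (explicit base 10 avoids octal interpretation)

# Hypothetical helper: strict validity check for hexadecimal input.
def valid_hex?(str)
  Integer(str, 16)
  true
rescue ArgumentError
  false
end

puts valid_hex?('ff')  # true
puts valid_hex?('zz')  # false
```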



a @ http://codeforpeople.com/
 
Mr Magpie

ara.t.howard said:
but, unless you use #to_i exactly as it is now that is still the
case? with your suggestion this code

'forty-two'.to_i(nil).abs

raises a NoMethodError

Of course if you were chaining class-specific methods like abs you would
have to use a default supporting that method.

so i think the point is that you either have

'forty-two'.to_i.abs # let zero be the default

Integer 'forty-two' # need to handle exceptions

and nothing in between because a to_i with a default that is non-
numeric doesn't provide anything over the built-in Integer or Float.

Yes, that is the choice at present.

The benefits my suggestion provides are:
1) allows an application-specific default (of any type) to be supplied,
reducing the code required.
2) allows bad input to be unambiguously detected (it can distinguish
"fds".to_i from "0".to_i)
3) because the result of to_i always evaluates to true, you can't do
num.to_i ? 'valid int' : 'invalid int'
but with my suggestion you could do
num.to_i(false) ? 'valid int' : 'invalid int'
4) would be a minuscule change to the existing optimised C, unlike some
monkey patch I could do
5) would avoid performance-sapping exceptions
6) would avoid expensive regular expressions
7) as a default parameter, wouldn't affect existing code.

I don't think any other approach satisfies all of the above.
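
Point 3 can be sketched with today's Ruby. Both helper names are hypothetical, and the second one stands in for the proposed to_i(false) by using Integer() with a rescue modifier, so it does not have the performance properties argued for in points 4 and 5.

```ruby
# In Ruby only nil and false are falsy, so 0 from to_i is always truthy.
def always_valid(num)
  num.to_i ? 'valid int' : 'invalid int'
end

# Stand-in for the proposed num.to_i(false): false on bad input.
def checked(num)
  parsed = (Integer(num) rescue false)
  parsed ? 'valid int' : 'invalid int'
end

puts always_valid('fds')  # valid int
puts checked('fds')       # invalid int
puts checked('0')         # valid int
```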

Thanks for your reply and the code examples.

Regards,

magpie
 
MonkeeSage

The benefits my suggestion provides are:
1) allows an application-specific default (of any type) to be supplied,
reducing the code required.
2) allows bad input to be unambiguously detected (it can distinguish
"fds".to_i from "0".to_i)
3) because the result of to_i always evaluates to true, you can't do
num.to_i ? 'valid int' : 'invalid int'
but with my suggestion you could do
num.to_i(false) ? 'valid int' : 'invalid int'
4) would be a minuscule change to the existing optimised C, unlike some
monkey patch I could do
5) would avoid performance-sapping exceptions
6) would avoid expensive regular expressions
7) as a default parameter, wouldn't affect existing code.

I could be dense; well, I probably am. No, I'm sure about it. ;) But
let me give it a go anyhow...

All of the functionality you mention can be had now, it's just that it
wouldn't be as fast. So most of the points are moot. Only 5 & 6
remain. Also, 7 isn't exactly true, since it would require an extra
compare operation in the back-end to see if a default was given and
return that, or else return 0. But that is probably negligible.

Regarding 5 & 6. I benchmarked some code against the default to_i/_f.
Here are the code and the results:

$ cat test.rb && ./test.rb
#!/usr/bin/env ruby

class String
  def to_i2(default=0)
    Integer(self) rescue default
  end
  def to_f2(default=0)
    Float(self) rescue default
  end
  def num?
    self =~ /^[-+.0-9]+$/
  end
  def to_i3(default=0)
    self.num? ? self.to_i : default
  end
  def to_f3(default=0)
    self.num? ? self.to_f : default
  end
end

require 'benchmark'

s1 = "10"
s2 = "10a"
s3 = "1.0"
s4 = "1.0a"

n = 1000000
Benchmark.bm { |x|
  x.report("to_i valid   ") { n.times { s1.to_i } }
  x.report("to_i invalid ") { n.times { s2.to_i } }
  x.report("to_f valid   ") { n.times { s3.to_f } }
  x.report("to_f invalid ") { n.times { s4.to_f } }
  x.report("to_i2 valid  ") { n.times { s1.to_i2 } }
  x.report("to_i2 invalid") { n.times { s2.to_i2 } }
  x.report("to_f2 valid  ") { n.times { s3.to_f2 } }
  x.report("to_f2 invalid") { n.times { s4.to_f2 } }
  x.report("to_i3 valid  ") { n.times { s1.to_i3 } }
  x.report("to_i3 invalid") { n.times { s2.to_i3 } }
  x.report("to_f3 valid  ") { n.times { s3.to_f3 } }
  x.report("to_f3 invalid") { n.times { s4.to_f3 } }
}


                   user     system      total        real
to_i valid      1.160000   0.110000   1.270000 (  1.307932)
to_i invalid    1.180000   0.100000   1.280000 (  1.318455)
to_f valid      1.570000   0.190000   1.760000 (  1.788322)
to_f invalid    1.980000   0.090000   2.070000 (  2.105102)
to_i2 valid     2.310000   0.350000   2.660000 (  2.703812)
to_i2 invalid  39.640000   1.240000  40.880000 ( 42.264511)
to_f2 valid     2.880000   0.310000   3.190000 (  3.377140)
to_f2 invalid  40.680000   1.100000  41.780000 ( 43.211592)
to_i3 valid     6.470000   0.390000   6.860000 (  6.975072)
to_i3 invalid   3.400000   0.350000   3.750000 (  3.959219)
to_f3 valid     7.250000   0.320000   7.570000 (  7.605764)
to_f3 invalid   3.600000   0.380000   3.980000 (  4.005525)


As you can see, you were correct about point 5 when it is the
exceptional case: rescuing a failed conversion is roughly 20 to 30
times slower than the built-in. However, regarding point 6, the regexp
versions stay within an order of magnitude of the built-in to_i/_f
(roughly 2 to 5 times slower). That's not too awful.
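
To make the comparison concrete, the slowdown factors can be computed from the "total" column of the to_i rows above. The helper names here are hypothetical; the numbers are copied from the results.

```ruby
# "total" column (seconds) copied from the benchmark results for to_i.
def benchmark_totals
  {
    'to_i valid'    => 1.27,
    'to_i invalid'  => 1.28,
    'to_i2 valid'   => 2.66,
    'to_i2 invalid' => 40.88,
    'to_i3 valid'   => 6.86,
    'to_i3 invalid' => 3.75
  }
end

# How many times slower `variant` is than `baseline`, to one decimal place.
def slowdown(totals, variant, baseline)
  (totals.fetch(variant) / totals.fetch(baseline)).round(1)
end

totals = benchmark_totals
puts slowdown(totals, 'to_i2 invalid', 'to_i invalid')  # 31.9 (rescue, bad input)
puts slowdown(totals, 'to_i2 valid', 'to_i valid')      # 2.1  (rescue, good input)
puts slowdown(totals, 'to_i3 invalid', 'to_i invalid')  # 2.9  (regexp, bad input)
puts slowdown(totals, 'to_i3 valid', 'to_i valid')      # 5.4  (regexp, good input)
```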

If I may make three counter-points against your suggestion:

1.) It is weird and completely unintuitive for to_i to return anything
*other than an integer*! Maybe it's just me, but that would be like
calling to_a and getting back a String. Holy return types Batman, what
gives?

2.) Would a non-zero default really be used enough (or in cases where
the speed of using something like the code I listed above with regexps
is not fast enough) to warrant inclusion? Do you have any real world
examples that are not just corner-cases?

3.) (Like Ara said...) If you're worried about the performance of
exceptions, how helpful is it to do something like: "10a".to_i(nil) %
2? That's either going to terminate with a NoMethodError, or you'll
have to rescue it (eating just as many cycles).

Regards,
Jordan
 
Robert Dober

Hi,

In message "Re: FEATURE SUGGESTION: Accept default value for to_f and to_i"

|3) because the result of to_i always evaluates to true, you can't do
| num.to_i ? 'valid int' : 'invalid int'
|but with my sugestion you could do
| num.to_i(false) ? 'valid int' : 'invalid int'

Argument for String#to_i is already taken for base specification, i.e.

"abcd".to_i(16) # => 43981

matz.

Not wanting to enter into the discussion, I believe that the OP's idea
is a sound one; it might, however, be better to let the default behavior
be expressed by a block.

def to_i &blk
  return conversion if valid   # pseudocode: the converted integer
  return blk.call if blk
  # The tricky part here:
  # nil or 0? well, 0 for backward compatibility
  0
end

Now I would use this very often

s.to_i do raise MyError, "What a numba??" end

better to raise MyError than what #Integer(str) raises, right ;).
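
A runnable sketch of the block idea, written as a standalone helper so that String#to_i itself is left alone. The name to_i_with_fallback, the MyError class definition and the simplified validity regexp are all assumptions made for illustration; a real implementation would share to_i's own parsing.

```ruby
class MyError < StandardError; end

# Hypothetical helper standing in for the proposed block-taking to_i.
def to_i_with_fallback(str)
  return str.to_i if str =~ /\A[-+]?\d+\z/  # simplified validity check
  return yield(str) if block_given?          # invalid: let the caller decide
  0                                          # no block: keep to_i's current result
end

puts to_i_with_fallback('42')     # 42
puts to_i_with_fallback('numba')  # 0

fallback = to_i_with_fallback('numba') { -1 }
puts fallback                     # -1

begin
  to_i_with_fallback('numba') { raise MyError, 'What a numba??' }
rescue MyError => e
  puts e.message                  # What a numba??
end
```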

cheers
R.
 
Robert Klemme

2007/11/27 said:
If I may make three counter-points against your suggestion:

1.) It is weird and completely unintuitive for to_i to return anything
*other than an integer*! Maybe it's just me, but that would be like
calling to_a and getting back a String. Holy return types Batman, what
gives?

2.) Would a non-zero default really be used enough (or in cases where
the speed of using something like the code I listed above with regexps
is not fast enough) to warrant inclusion? Do you have any real world
examples that are not just corner-cases?

3.) (Like Ara said...) If you're worried about the performance of
exceptions, how helpful is it to do something like: "10a".to_i(nil) %
2? That's either going to terminate with a NoMethodError, or you'll
have to rescue it (eating just as many cycles).

Another point you did not mention (as far as I can see): optimizing
the performance of the /exceptional/ case is likely to yield only
minor benefits if at all.

Kind regards

robert
 
Trans

I suggest that to_i() and to_f() have an optional parameter added with
the default value of 0 (for backwards compatibility).

This would allow code like

if astring.to_f(nil)
# valid, so use it
else
# not a valid float, nil was returned, so handle error
end

if (num = astring.to_f) == 0
  # may or may not be valid
  begin
    num = Float(astring)
  rescue
    # not a valid float, Float() raised, so handle error
  end
end

# valid num, so use it

You can wrap it in a "monkey patch" if you like.
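
One caveat worth checking before wrapping this up: to_f accepts a leading numeric prefix, so a non-zero result does not by itself prove the whole string was valid. A minimal sketch of the difference; the valid_float? helper is hypothetical.

```ruby
# to_f parses a leading numeric prefix and ignores the rest.
puts '1.0a'.to_f  # 1.0
puts 'abc'.to_f   # 0.0

# Hypothetical helper: Float() is strict about the whole string.
def valid_float?(str)
  Float(str)
  true
rescue ArgumentError, TypeError
  false
end

puts valid_float?('1.0')   # true
puts valid_float?('1.0a')  # false
puts valid_float?('0')     # true
```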

T.
 
Mr Magpie

All of the functionality you mention can be had now, it's just that it
wouldn't be as fast. So most of the points are moot. Only 5 & 6
remain. Also, 7 isn't exactly true, since it would require an extra
compare operation in the back-end to see if a default was given and
return that, or else return 0. But that is probably negligible.

Wow, thanks for doing the numbers Jordan.

I know it can be done now, but such basic functionality is best done
fast and right, i.e. in C. There would be zillions of examples of tight
loops in frameworks, libraries and people's applications out there that
do string to number conversions, e.g. SQL results to a Fixnum.

Some have said that performance is less of an issue in the exceptional
case, but just how exceptional bad input is depends on the application,
and it shouldn't cause a 20x time difference. E.g. if 1 in 20 input
values are bad, the conversion takes twice as long.
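
The arithmetic behind that last figure, as a small sketch. The function name is hypothetical and the 20x penalty is the round figure used above, not a measured value.

```ruby
# Average cost per conversion, relative to a good input costing 1.0,
# when a fraction of the inputs is bad and each bad input costs `penalty`.
def expected_cost(bad_fraction, penalty)
  (1.0 - bad_fraction) + bad_fraction * penalty
end

puts expected_cost(0.05, 20.0).round(2)  # 1.95, i.e. about twice as long
```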

If I may make three counter-points against your suggestion:

1.) It is weird and completely unintuitive for to_i to return anything
*other than an integer*! Maybe it's just me, but that would be like
calling to_a and getting back a String. Holy return types Batman, what
gives?

I get this, but it would only do so because "you asked for it". This
kind of thing isn't uncommon in Ruby though.

2.) Would a non-zero default really be used enough (or in cases where
the speed of using something like the code I listed above with regexps
is not fast enough) to warrant inclusion? Do you have any real world
examples that are not just corner-cases?

If I was implementing Ruby I would lean towards nil as the default (0
would come a close second best in my mind). It would allow the 'or'
operators to be used for any defaults eg. (aString.to_i || 0) would
achieve a default of 0.

The most common example that comes to mind is when reading in
configuration where you are reading a value from a string source eg. xml
and if a value isn't provided you return a sensible default which isn't
normally 0.

3.) (Like Ara said...) If you're worried about the performance of
exceptions, how helpful is it to do something like: "10a".to_i(nil) %
2? That's either going to terminate with a NoMethodError, or you'll
have to rescue it (eating just as many cycles).

In that example, you asked for a nil default, and that's what you got.

matz reminds us that to_i already takes a base argument. I guess the
default value would have to be the second default argument - not so
pretty.

Robert suggests a block handler. I don't know what the performance
implications are of blocks, but I guess it would work, and obviously
allow more advanced handling. Most of the time however I would just
return a value, not do any logic.

<Suggestion>

Because of the existing base argument on to_i, and the need to keep such
basic methods simple and fast, and the 7 points I listed previously, I
propose the following :

as_i(default=nil) and as_f(default=nil) methods added to Fixnum, Float,
String.
For Float#as_i, NaN, Infinity etc. would return the default.

If I'm outnumbered on the default argument, then as_i and as_f could
simply be equivalent to to_i and to_f, just with a nil default. I would
then use (aString.as_i || DEFAULT_VALUE).

If enough people would use an optional block and it's not a significant
performance drag, that could be added too.

</Suggestion>
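
A rough pure-Ruby sketch of what the suggestion above might look like, for trying out the semantics only; the proposal itself is for a C implementation. The behaviour shown is an interpretation of the suggestion, not an existing API. It assumes Ruby 2.6+ for the `exception: false` keyword, and it patches Integer because Fixnum was merged into Integer in Ruby 2.4.

```ruby
# Hypothetical as_i/as_f, sketched in plain Ruby.
class String
  def as_i(default = nil)
    # Base 10 is forced so that "010" is read as ten, not as octal.
    Integer(self, 10, exception: false) || default
  end

  def as_f(default = nil)
    Float(self, exception: false) || default
  end
end

class Float
  def as_i(default = nil)
    finite? ? to_i : default # NaN and Infinity return the default
  end

  def as_f(_default = nil)
    self
  end
end

class Integer
  def as_i(_default = nil)
    self
  end

  def as_f(_default = nil)
    to_f
  end
end

puts 'dds'.as_i.inspect        # nil
puts('dds'.as_i || 5)          # 5
puts '0'.as_i                  # 0
puts Float::NAN.as_i(-1)       # -1
```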

Thanks again Jordan for the numbers,

magpie.
 
MonkeeSage

Thanks again Jordan for the numbers,

magpie.

NP. I was curious about the performance penalty myself. Might I
suggest, if you think it is truly worthy, that you write a small Ruby
extension in C to add as_i/_f to class String? You could get the
behavior and speed you desire and still be compatible with MRI, and
if enough people found it useful it could find its way into the
standard lib.

Regards,
Jordan
 
Daniel Sheppard

6) would avoid expensive regular expressions

First, you'd have to conjure up some expensive regular expressions;
you'll find that regular expressions are much more efficient than you
might think.

Pointless micro-benchmark time. String input of 'ab', 1 million
iterations.

                                                       user     system      total        real
string.to_i                                        0.625000   0.000000   0.625000 (  0.657000)
Integer(string) rescue 57                         32.422000   0.782000  33.204000 ( 34.844000)
/^-?\d+$/===string ? string.to_i : 57              1.125000   0.000000   1.125000 (  1.218000)
string.to_f                                        0.718000   0.000000   0.718000 (  0.843000)
Float(string) rescue 57                           32.281000   0.765000  33.046000 ( 34.750000)
/^-?\d+(?=\.\d+)?$/===string ? string.to_f : 57    0.672000   0.000000   0.672000 (  0.734000)


The only advantage to your proposal is to optimise an exceptional case.
If it's not an exceptional case, regex validation gives you almost as
much speed as you'd get with raw C.
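
A sketch of the validate-then-convert approach as reusable helpers. The names int_or and float_or_default are hypothetical. The patterns differ from the benchmarked ones in two ways that are my assumptions about what is wanted: \A and \z anchor the whole string (^ and $ only anchor a line), and the optional fractional part is a non-capturing group.

```ruby
# Hypothetical helpers: validate with a regexp, convert only on a match.
def int_or(str, default)
  /\A-?\d+\z/ === str ? str.to_i : default
end

def float_or_default(str, default)
  /\A-?\d+(?:\.\d+)?\z/ === str ? str.to_f : default
end

puts int_or('ab', 57)              # 57
puts int_or('-12', 57)             # -12
puts float_or_default('1.5', 57)   # 1.5
puts float_or_default('1.', 57)    # 57
```

No exception is raised on bad input, so the cost stays about the same whether or not the input is valid.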

Once you've written an application with this functionality, benchmarked
it, and found that validation of string data as numeric is your
problem, you can go off and write a C extension to do what you want.
Raising this discussion before that point is just wasting your time.

Dan.
 
