FEATURE SUGGESTION: Accept default value for to_f and to_i

Mr Magpie · Nov 27, 2007

I suggest that to_i() and to_f() have an optional parameter added with
the default value of 0 (for backwards compatibility).

This would allow code like

if astring.to_f(nil)
  # valid, so use it
else
  # not a valid float, nil was returned, so handle error
end

Currently, from the output of these functions, a conversion error is
indistinguishable from a valid input of 0.

It would allow even more succinct code when, say, reading in a
configuration value :
delay = configuration['delay'].to_f(DEFAULT_DELAY)

I find this pattern of providing a default that is returned in the event
of an error (instead of throwing an exception or returning nil or 0)
allows for simple, safe and succinct code.

Another example is xpath_value(aRootNode,aXPath,aDefault=nil) which
returns aDefault if there is any problem returning the value selected by
aXPath (it often doesn't matter what the problem is).

Mr Magpie · Nov 27, 2007

Yukihiro said:
I think Float(astring) that raises an exception for invalid string can
do the work for you.

|It would allow even more succinct code when, say, reading in a
|configuration value :
|delay = configuration['delay'].to_f(DEFAULT_DELAY)

Try

delay = Float(configuration['delay']) rescue DEFAULT_DELAY

matz.

Thank you for replying, matz. I very much like the way the existing to_i
and to_f methods never throw exceptions, as this allows things like
single-line method chaining without fear of exceptions in the majority
of cases where 0 is the desired default.

I'm merely suggesting that the error default value be customisable, to
distinguish bad input from valid input.
E.g. "0".to_i and "dds".to_i both return 0, and sometimes that's fine, but
other times we want to know whether the input was a valid integer or
not.

I believe exceptions are a performance drag, and these little functions
are often called thousands of times in a loop for processing input, so
I'd prefer to avoid a method that potentially causes exceptions. I think
they would be among the more performance critical of all Ruby methods,
which is why I'm suggesting these changes for the C based core rather
than just making my own monkey patch.

Anyway, thanks for Ruby and replying to me.
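
For reference, a short sketch of how the rescue-modifier idiom behaves. The CONFIGURATION hash, the DEFAULT_DELAY value and both method names are made up for illustration. The second method assumes Ruby 2.6 or later, where Float() accepts exception: false; that keyword did not exist when this thread was written.

```ruby
DEFAULT_DELAY = 2.5                        # illustrative value
CONFIGURATION = { 'delay' => 'soon' }      # illustrative config with a bad entry

# matz's idiom: Float() raises on bad input, the rescue modifier swaps in the default.
def delay_with_rescue(config)
  Float(config['delay']) rescue DEFAULT_DELAY
end

# Ruby 2.6+ only: exception: false makes Float() return nil instead of raising.
def delay_without_exception(config)
  Float(config['delay'], exception: false) || DEFAULT_DELAY
end

p delay_with_rescue(CONFIGURATION)            # bad value, falls back to the default
p delay_with_rescue({ 'delay' => '0' })       # a genuine zero survives
p delay_without_exception(CONFIGURATION)      # bad value, no exception raised
p delay_without_exception({})                 # missing key, falls back as well
```

Either form keeps a valid "0" distinct from bad input; only the second avoids the cost of raising.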

ara.t.howard · Nov 27, 2007

I believe exceptions are a performance drag, and these little functions
are often called thousands of times in a loop for processing input, so
I'd prefer to avoid a method that potentially causes exceptions. I think

but, unless you use #to_i exactly as it is now that is still the
case? with your suggestion this code

'forty-two'.to_i(nil).abs

raises a NoMethodError

so i think the point is that you either have

'forty-two'.to_i.abs # let zero be the default

Integer 'forty-two' # need to handle exceptions

and nothing in between because a to_i with a default that is non-
numeric doesn't provide anything over the built-in Integer or Float.

fwiw i use this a lot:

#
# convert a string to an integer of any base
#
def strtod(s, opts = {})
  base = getopt 'base', opts, 10
  case base
  when 2
    s = "0b#{ s }" unless s =~ %r/^0b\d/
  when 8
    s = "0#{ s }" unless s =~ %r/^0\d/
  when 16
    # allow hex letters after the prefix, so "0xff" is not prefixed twice
    s = "0x#{ s }" unless s =~ %r/^0x[\da-f]/i
  end
  Integer s
end
#
# convert a string to an integer
#
def atoi(s, opts = {})
  strtod("#{ s }".gsub(%r/^0+/,''), 'base' => 10)
end

a @ http://codeforpeople.com/
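
A minimal sketch of the same idea without the getopt helper, assuming a Ruby version in which Kernel#Integer accepts a base as its second argument (1.9 and later, so not the Ruby of this thread). The name parse_int is hypothetical, and the default base is read with Hash#fetch.

```ruby
# Hypothetical helper: strict string-to-integer conversion in a given base.
# Raises ArgumentError on invalid input, just like Integer() itself.
def parse_int(s, opts = {})
  Integer(s.to_s, opts.fetch('base', 10))
end

p parse_int('42')                      # 42
p parse_int('ff',   { 'base' => 16 })  # 255
p parse_int('0xff', { 'base' => 16 })  # 255, a matching prefix is accepted
p parse_int('101',  { 'base' => 2 })   # 5
p parse_int('0042')                    # 42, leading zeros are not read as octal
```

With an explicit base there is no need to add or strip prefixes by hand.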

MonkeeSage · Nov 27, 2007

base = getopt 'base', opts, 10

Why not opts.fetch('base', 10)? Does getopt do something fancy?

Regards,
Jordan

Mr Magpie · Nov 27, 2007

ara.t.howard said:
but, unless you use #to_i exactly as it is now that is still the
case? with your suggestion this code

'forty-two'.to_i(nil).abs

raises a NoMethodError

Of course if you were chaining class-specific methods like abs you would
have to use a default supporting that method.

so i think the point is that you either have

'forty-two'.to_i.abs # let zero be the default

Integer 'forty-two' # need to handle exceptions

and nothing in between because a to_i with a default that is non-
numeric doesn't provide anything over the built-in Integer or Float.

Yes, that is the choice at present.

The benefits my suggestion provides are:
1) allows an application-specific default (of any type) to be supplied,
reducing code required.
2) allows bad input to be unambiguously detected (can distinguish
"fds".to_i from "0".to_i)
3) because the result of to_i always evaluates to true, you can't do
num.to_i ? 'valid int' : 'invalid int'
but with my suggestion you could do
num.to_i(false) ? 'valid int' : 'invalid int'
4) would be a minuscule change to the existing optimised C, unlike some
monkey patch I could do
5) would avoid performance-sapping exceptions
6) would avoid expensive regular expressions
7) as a default parameter, wouldn't affect existing code.

I don't think any other approach satisfies all of the above.

Thanks for your reply and the code examples.

Regards,

magpie
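
Points 2 and 3 rest on two facts about current Ruby that are easy to check: to_i turns unparsable input into 0, and 0 is truthy. A small sketch (the label helper is just for illustration):

```ruby
# to_i never fails: unparsable input quietly becomes 0.
puts "0".to_i == "fds".to_i   # true, the two inputs cannot be told apart

# In Ruby only nil and false are falsy, so 0 counts as true.
def label(value)
  value ? 'valid int' : 'invalid int'
end

puts label("fds".to_i)   # valid int, although the input was junk
puts label(nil)          # invalid int, what a nil or false default would report
```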

MonkeeSage · Nov 27, 2007

The benefits my suggestion provides are:
1) allows an application-specific default (of any type) to be supplied,
reducing code required.
2) allows bad input to be unambiguously detected (can distinguish
"fds".to_i from "0".to_i)
3) because the result of to_i always evaluates to true, you can't do
num.to_i ? 'valid int' : 'invalid int'
but with my suggestion you could do
num.to_i(false) ? 'valid int' : 'invalid int'
4) would be a minuscule change to the existing optimised C, unlike some
monkey patch I could do
5) would avoid performance-sapping exceptions
6) would avoid expensive regular expressions
7) as a default parameter, wouldn't affect existing code.

I could be dense; well, I probably am. No, I'm sure about it. But
let me give it a go anyhow...

All of the functionality you mention can be had now, it's just that it
wouldn't be as fast. So most of the points are moot. Only 5 & 6
remain. Also, 7 isn't exactly true, since it would require an extra
compare operation in the back-end to see if a default was given and
return that, or else return 0. But that is probably negligible.

Regarding 5 & 6. I benchmarked some code against the default to_i/_f.
Here are the code and the results:

$ cat test.rb && ./test.rb
#!/usr/bin/env ruby

class String
  def to_i2(default=0)
    Integer(self) rescue default
  end
  def to_f2(default=0)
    Float(self) rescue default
  end
  def num?
    self =~ /^[-+.0-9]+$/
  end
  def to_i3(default=0)
    self.num? ? self.to_i : default
  end
  def to_f3(default=0)
    self.num? ? self.to_f : default
  end
end

require 'benchmark'

s1 = "10"
s2 = "10a"
s3 = "1.0"
s4 = "1.0a"

n = 1000000
Benchmark.bm { |x|
  x.report("to_i valid ") { n.times { s1.to_i } }
  x.report("to_i invalid ") { n.times { s2.to_i } }
  x.report("to_f valid ") { n.times { s3.to_f } }
  x.report("to_f invalid ") { n.times { s4.to_f } }
  x.report("to_i2 valid ") { n.times { s1.to_i2 } }
  x.report("to_i2 invalid") { n.times { s2.to_i2 } }
  x.report("to_f2 valid ") { n.times { s3.to_f2 } }
  x.report("to_f2 invalid") { n.times { s4.to_f2 } }
  x.report("to_i3 valid ") { n.times { s1.to_i3 } }
  x.report("to_i3 invalid") { n.times { s2.to_i3 } }
  x.report("to_f3 valid ") { n.times { s3.to_f3 } }
  x.report("to_f3 invalid") { n.times { s4.to_f3 } }
}

                    user     system      total        real
to_i  valid     1.160000   0.110000   1.270000 (  1.307932)
to_i  invalid   1.180000   0.100000   1.280000 (  1.318455)
to_f  valid     1.570000   0.190000   1.760000 (  1.788322)
to_f  invalid   1.980000   0.090000   2.070000 (  2.105102)
to_i2 valid     2.310000   0.350000   2.660000 (  2.703812)
to_i2 invalid  39.640000   1.240000  40.880000 ( 42.264511)
to_f2 valid     2.880000   0.310000   3.190000 (  3.377140)
to_f2 invalid  40.680000   1.100000  41.780000 ( 43.211592)
to_i3 valid     6.470000   0.390000   6.860000 (  6.975072)
to_i3 invalid   3.400000   0.350000   3.750000 (  3.959219)
to_f3 valid     7.250000   0.320000   7.570000 (  7.605764)
to_f3 invalid   3.600000   0.380000   3.980000 (  4.005525)

As you can see, you were correct about point 5 when it is the
exceptional case; however, regarding point 6, performance is close to
within an order of magnitude of the built-in versions of to_i/_f.
That's not too awful.

If I may make three counter-points against your suggestion:

1.) It is weird and completely unintuitive for to_i to return anything
*other than an integer*! Maybe it's just me, but that would be like
calling to_a and getting back a String. Holy return types Batman, what
gives?

2.) Would a non-zero default really be used enough (or in cases where
the speed of using something like the code I listed above with regexps
is not fast enough) to warrant inclusion? Do you have any real world
examples that are not just corner-cases?

3.) (Like Ara said...) If you're worried about the performance of
exceptions, how helpful is it to do something like:
"10a".to_i(nil) % 2? That's either going to terminate with a
NoMethodError, or you'll have to rescue it (eating just as many cycles).

Regards,
Jordan
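
One caveat on the regexp variant: the character class in num? also accepts strings such as "1.2.3" or "+-", and ^ and $ anchor at line boundaries rather than at the ends of the string. A tighter sketch follows. The names INT_RE, FLOAT_RE, int_or and float_or are hypothetical, and the patterns are one reasonable definition of "valid", not the only one (no exponents, no underscores, no surrounding whitespace).

```ruby
INT_RE   = /\A[-+]?\d+\z/
FLOAT_RE = /\A[-+]?\d+(?:\.\d+)?\z/   # no exponent support in this sketch

# Return the parsed integer, or the caller's default when the whole string
# is not a plain decimal integer.
def int_or(s, default = 0)
  s =~ INT_RE ? s.to_i : default
end

# Same idea for floats.
def float_or(s, default = 0.0)
  s =~ FLOAT_RE ? s.to_f : default
end

p int_or("10")             # 10
p int_or("10a", nil)       # nil
p float_or("1.0")          # 1.0
p float_or("1.2.3", nil)   # nil, whereas the looser pattern would accept it
```

No exception is raised on either path, so the valid and invalid cases cost about the same.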

Robert Dober · Nov 27, 2007

Hi,

In message "Re: FEATURE SUGGESTION: Accept default value for to_f and to_i"

|3) because the result of to_i always evaluates to true, you can't do
| num.to_i ? 'valid int' : 'invalid int'
|but with my suggestion you could do
| num.to_i(false) ? 'valid int' : 'invalid int'

Argument for String#to_i is already taken for base specification, i.e.

"abcd".to_i(16) # => 43981

matz.

Not wanting to enter into the discussion, I believe that the OP's idea is a
sound one; it might however be better to allow the default behavior to be
expressed by a block.

def to_i &blk
  return conversion if valid
  return blk.call if blk
  # The tricky part here:
  # nil or 0? Well, 0 for backward compatibility
  0
end

Now I would use this very often

s.to_i do raise MyError, "What a numba??" end

Better to raise MyError than whatever Integer(str) raises, right?

cheers
R.
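
A runnable sketch of the block idea, written as a stand-alone method rather than a change to String#to_i. The name int_or_else is hypothetical, and validity is delegated to Kernel#Integer with an explicit base (Ruby 1.9 or later), which is stricter than to_i: trailing junk such as "42abc" counts as invalid here.

```ruby
class MyError < StandardError; end

# Hypothetical helper: parse strictly, and let an optional block decide
# what happens on bad input. Without a block it falls back to 0,
# matching what to_i returns for unparsable strings today.
def int_or_else(string)
  Integer(string, 10)
rescue ArgumentError, TypeError
  block_given? ? yield : 0
end

p int_or_else("42")            # 42
p int_or_else("dds")           # 0
p int_or_else("dds") { nil }   # nil, the block supplies the default

begin
  int_or_else("dds") { raise MyError, "What a numba??" }
rescue MyError => e
  puts "MyError: #{e.message}"
end
```

The block is only evaluated on failure, so a costly default or a custom exception is paid for only when the input is bad.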

Robert Klemme · Nov 27, 2007

Another point you did not mention (as far as I can see): optimizing
the performance of the /exceptional/ case is likely to yield only
minor benefits if at all.

Kind regards

robert

Trans · Nov 27, 2007

I suggest that to_i() and to_f() have an optional parameter added with
the default value of 0 (for backwards compatibility).

This would allow code like

if astring.to_f(nil)
# valid, so use it
else
# not a valid float, nil was returned, so handle error
end

if (num = astring.to_f) == 0
  # may or may not be valid
  begin
    num = Float(astring)
  rescue
    # not a valid float (Float raised), so handle error
  end
end

# valid num, so use it

You can wrap it in a "monkey patch" if you like.

T.

Mr Magpie · Nov 28, 2007

All of the functionality you mention can be had now, it's just that it
wouldn't be as fast. So most of the points are moot. Only 5 & 6
remain. Also, 7 isn't exactly true, since it would require an extra
compare operation in the back-end to see if a default was given and
return that, or else return 0. But that is probably negligible.

Wow, thanks for doing the numbers Jordan.

I know it can be done now, but such basic functionality is best done
fast and right, i.e. in C. There would be zillions of examples of tight
loops in frameworks, libraries and people's applications out there that
do string to number conversions, e.g. converting SQL results to a Fixnum.

Some have said that performance is less of an issue in the exceptional
case, but just how exceptional bad input is depends on the application,
and it shouldn't cause a 20x time difference. E.g. if 1 in 20 input values
are bad, the conversion takes twice as long.
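
A rough check of that estimate, using the to_i2 totals from Jordan's benchmark (2.66 s per million valid conversions, 40.88 s per million invalid ones). The 5% failure rate is the 1-in-20 figure above; the constant and method names are illustrative and everything else is arithmetic.

```ruby
VALID_COST   = 2.66  / 1_000_000   # seconds per valid to_i2 call (benchmark total)
INVALID_COST = 40.88 / 1_000_000   # seconds per invalid to_i2 call

# Average cost of a mixed workload, relative to an all-valid workload.
def slowdown(bad_fraction)
  mixed = (1 - bad_fraction) * VALID_COST + bad_fraction * INVALID_COST
  mixed / VALID_COST
end

printf("invalid vs valid: %.1fx\n", INVALID_COST / VALID_COST)
printf("1 bad in 20:      %.2fx\n", slowdown(0.05))
```

On those numbers the ratios come out at about 15x and 1.7x, the same ballpark as the 20x and 2x quoted.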

If I may make three counter-points against your suggestion:

1.) It is weird and completely unintuitive for to_i to return anything
*other than an integer*! Maybe it's just me, but that would be like
calling to_a and getting back a String. Holy return types Batman, what
gives?

I get this, but it would only do so because "you asked for it". This
kind of thing isn't uncommon in Ruby though.

2.) Would a non-zero default really be used enough (or in cases where
the speed of using something like the code I listed above with regexps
is not fast enough) to warrant inclusion? Do you have any real world
examples that are not just corner-cases?

If I was implementing Ruby I would lean towards nil as the default (0
would come a close second best in my mind). It would allow the 'or'
operators to be used for any defaults eg. (aString.to_i || 0) would
achieve a default of 0.

The most common example that comes to mind is when reading in
configuration where you are reading a value from a string source eg. xml
and if a value isn't provided you return a sensible default which isn't
normally 0.

3.) (Like Ara said...) If you're worried about the performance of
exceptions, how helpful is it to do something like:
"10a".to_i(nil) % 2? That's either going to terminate with a
NoMethodError, or you'll have to rescue it (eating just as many cycles).

In that example, you asked for a nil default, and that's what you got.

matz reminds us that to_i already takes a base argument. I guess the
default value would have to be the second default argument - not so
pretty.

Robert suggests a block handler. I don't know what the performance
implications are of blocks, but I guess it would work, and obviously
allow more advanced handling. Most of the time however I would just
return a value, not do any logic.

<Suggestion>

Because of the existing base argument on to_i, and the need to keep such
basic methods simple and fast, and the 7 points I listed previously, I
propose the following :

as_i(default=nil) and as_f(default=nil) methods added to Fixnum, Float
and String.
For Float#as_i, NaN, Infinity etc. would return the default.

If I'm outnumbered on the default argument, then as_i and as_f could
simply be equivalent to to_i and to_f, just with a nil default. I would
then use (aString.as_i || DEFAULT_VALUE).

If enough people would use an optional block and it's not a significant
performance drag, that could be added too.

</Suggestion>

Thanks again Jordan for the numbers,

magpie.
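
A pure-Ruby sketch of what the proposed as_i and as_f might look like. The proposal is for C; this only illustrates the intended behaviour. The AsNumber module name is hypothetical, Integer stands in for Fixnum (the two were unified in Ruby 2.4), and string validity is delegated to Kernel#Integer and Kernel#Float, which is an assumption about how strict the methods should be.

```ruby
# Hypothetical mix-in providing as_i / as_f with a nil default.
module AsNumber
  def as_i(default = nil)
    case self
    when Integer then self
    when Float   then finite? ? to_i : default   # NaN and Infinity give the default
    else              Integer(to_s, 10)
    end
  rescue ArgumentError
    default
  end

  def as_f(default = nil)
    is_a?(Numeric) ? to_f : Float(to_s)
  rescue ArgumentError
    default
  end
end

[String, Integer, Float].each { |klass| klass.include(AsNumber) }

DEFAULT_VALUE = 30   # illustrative application default

p "42".as_i                      # 42
p "dds".as_i                     # nil
p "dds".as_i || DEFAULT_VALUE    # 30
p (0.0 / 0.0).as_i(-1)           # -1, NaN falls back to the default
p "1.5".as_f                     # 1.5
```

A real "0" still comes back as 0, so the nil result identifies bad input unambiguously.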

MonkeeSage · Nov 30, 2007

NP. I was curious about the performance penalty myself. Might I
suggest, if you think it is truly worthy, that you write a small Ruby
extension in C to add as_i/_f to class String. You could get the
behavior and speed you desire, and still be compatible with MRI, and
if enough people found it useful it could find its way into the
standard lib.

Regards,
Jordan

Daniel Sheppard · Nov 30, 2007

6) would avoid expensive regular expressions

First, you'd have to conjure up some expensive regular expressions.
You'll find that regular expressions are much more efficient than you
might think.

Pointless micro-benchmark time. String input of 'ab', 1 million
iterations.

                                                       user     system      total        real
string.to_i                                        0.625000   0.000000   0.625000 (  0.657000)
Integer(string) rescue 57                         32.422000   0.782000  33.204000 ( 34.844000)
/^-?\d+$/===string ? string.to_i : 57              1.125000   0.000000   1.125000 (  1.218000)
string.to_f                                        0.718000   0.000000   0.718000 (  0.843000)
Float(string) rescue 57                           32.281000   0.765000  33.046000 ( 34.750000)
/^-?\d+(?=\.\d+)?$/===string ? string.to_f : 57    0.672000   0.000000   0.672000 (  0.734000)

The only advantage to your proposal is to optimise an exceptional case.
If it's not an exceptional case, regex validation gives you almost as
much speed as you'd get with raw C.

Once you've written an application with this functionality, benchmarked
it, and found that that validation of string data as numeric is your
problem, you can go off and write a C extension to do what you want.
Raising this discussion before that point is just wasting your time.

Dan.
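
Timings like these depend heavily on the Ruby version and machine, so they are worth re-measuring. The script below is a sketch in the spirit of Dan's comparison, not his original (which was not posted). The iteration count, method names and the \A...\z pattern are my own choices, and no output is shown because results will vary.

```ruby
require 'benchmark'

N        = 100_000   # fewer iterations than the 1 million above, to keep it quick
INPUT    = 'ab'      # invalid input, as in the figures above
FALLBACK = 57

# Strict conversion with the rescue modifier (pays for an exception on bad input).
def by_rescue(s)
  Integer(s) rescue FALLBACK
end

# Validate with a regexp first, then convert (no exception on bad input).
def by_regexp(s)
  s =~ /\A-?\d+\z/ ? s.to_i : FALLBACK
end

Benchmark.bm(8) do |x|
  x.report('to_i')   { N.times { INPUT.to_i } }
  x.report('rescue') { N.times { by_rescue(INPUT) } }
  x.report('regexp') { N.times { by_regexp(INPUT) } }
end
```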
