Inverse scanf: finding format specifiers of existing fields

Bil Kleb

Hi,

I have files full of numbers that I need to twiddle,
but the format of the numbers cannot change[1], e.g.,

'0.4577' -> '0.7728'

or

'-2.345e-02' -> ' 1.232e-03'

Using scanf for the output seems to be the solution to
the second half of the problem, but how does one derive
the format specifier string of the input fields, which vary?

Thanks,
 
Xavier Noria

Hi,

I have files full of numbers that I need to twiddle,
but the format of the numbers cannot change[1], e.g.,

'0.4577' -> '0.7728'

or

'-2.345e-02' -> ' 1.232e-03'

Are there many different formats?

-- fxn
 
Robert Klemme

I have files full of numbers that I need to twiddle,
but the format of the numbers cannot change[1], e.g.,

'0.4577' -> '0.7728'

or

'-2.345e-02' -> ' 1.232e-03'

Using scanf for the output seems to be the solution to
the second half of the problem, but how does one derive
the format specifier string of the input fields, which vary?

If there is a fixed number of formats you can probably use a cascade of
RX matches. Otherwise it probably becomes a bit more complex like
matching sequences of digits and measuring their lengths.
md = %r{^(\d+)\.(\d+)?$}.match('0.4577')
=> #<MatchData ...>
pa = "%#{md[0].size}.#{md[2].size}f"
=> "%6.4f"
pa % 0.4577111
=> "0.4577"

HTH

robert
 
David A. Black

Hi --

Hi,

I have files full of numbers that I need to twiddle,
but the format of the numbers cannot change[1], e.g.,

'0.4577' -> '0.7728'

or

'-2.345e-02' -> ' 1.232e-03'

Using scanf for the output seems to be the solution to
the second half of the problem, but how does one derive
the format specifier string of the input fields, which vary?

You could probably just do a gsub, like this:

require 'scanf'

re = /-?\d+\.\d+(e-\d+)?/

a = "'0.4577' -> '0.7728'"
b = "'-2.345e-02' -> ' 1.232e-03'"

as = a.gsub(re, "%f")
bs = b.gsub(re, "%f")

p a.scanf(as)
p b.scanf(bs)

Output:

[0.4577, 0.7728]
[-0.02345, 0.001232]


David

--
Upcoming Rails training by Ruby Power and Light:
Four-day Intro to Intermediate
May 8-11, 2007
Edison, NJ
http://www.rubypal.com/events/05082007
 
Bil Kleb

Xavier said:
Are there many different formats?

Yes, in that the field lengths are different.

No, in that there are really only three "types":
integers, vanilla floats, and exponentials.

Regards,
 
Bil Kleb

David said:
Hi --
Hi.

Output:

[0.4577, 0.7728]
[-0.02345, 0.001232]

The second output suggests that I failed to express
my predicament clearly: the numbers are no longer
in exponential format.

A brief re-cast:

The original file has numbers of the form

5 0.4577 -2.345e-02

Something reads the numbers and spits out new numbers,
but in exactly the same format as the original file, e.g.,

8 0.7728 1.232e-03

I.e., I can't write the last number out as 0.001232 --
it has to be in exponential format with the same field
lengths.

Regards,
 
Xavier Noria

Yes, in that the field lengths are different.

No, in that there are really only three "types":
integers, vanilla floats, and exponentials.

Then I think you could base the solution on String#index/regexps
depending on the existence of "e" and ".", since we can assume
numbers are well-formed. The idea would be:

if none
%d
elsif "e"
%e
else
%f with computed widths
end
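
A minimal sketch of the branching above, with the width taken from the length of the whole field. The names `guess_format` and `NUMBER` are illustrative assumptions rather than code posted in this thread, and the sketch assumes well-formed fields whose replacement values still fit the original width:

```ruby
# Hypothetical helper (not from the thread): choose %d, %e/%E, or %f from
# the presence of an exponent marker and a decimal point.
def guess_format(field)
  width = field.length # whole field, including any sign
  if (marker = field[/[eE]/])
    # digits between the decimal point and the exponent marker
    decimals = field[/\.(\d*)[eE]/, 1].to_s.length
    "%#{width}.#{decimals}#{marker}"
  elsif field.include?('.')
    decimals = field[/\.(\d*)/, 1].length
    "%#{width}.#{decimals}f"
  else
    "%#{width}d"
  end
end

# Assumed pattern for one numeric field: integer, plain float, or exponential.
NUMBER = /[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?/

# Replacement values keyed by the original field text (illustration only).
new_values = { '5' => 8, '0.4577' => 0.7728, '-2.345e-02' => 1.232e-03 }

line = '5 0.4577 -2.345e-02'
puts line.gsub(NUMBER) { |field| guess_format(field) % new_values[field] }
# prints "8 0.7728  1.232e-03"
```

Because the sign is counted in the width, a positive value written into a field that held a negative one gets a leading blank, so the column positions stay put.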

-- fxn
 
Bil Kleb

Robert said:
If there is a fixed number of formats you can probably use a cascade of
RX matches.

Unfortunately not.
Otherwise it probably becomes a bit more complex like
matching sequences of digits and measuring their lengths.
md = %r{^(\d+)\.(\d+)?$}.match('0.4577')
=> #<MatchData ...>
pa = "%#{md[0].size}.#{md[2].size}f"

Hmmm, this looks like a viable path.

I hadn't thought of using MatchData groups, but as you say,
it may get ugly fast... I'm thinking of edge cases like
dealing with the leading space if positive numbers become
negative, or accommodating the number of digits needed for
exponentials or integers if the new number exceeds the
capacity of the existing format.
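
One way to catch the overflow case is to compare the formatted result with the original field width and refuse anything wider. This is only a sketch: `format_checked` is a hypothetical name, and raising an error (rather than, say, switching formats) is an assumption about what the caller would want:

```ruby
# Hypothetical helper: format a value, but reject results that are wider
# than the original field instead of silently shifting the columns.
def format_checked(fmt, value, width)
  text = fmt % value
  if text.length > width
    raise ArgumentError, "#{text.inspect} does not fit in #{width} columns"
  end
  text
end

puts format_checked('%6.4f', 0.7728, 6) # fits: six characters

begin
  # "-0.7728" needs seven columns because of the sign
  format_checked('%6.4f', -0.7728, 6)
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```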

Thanks,
 
Bil Kleb

Xavier said:
Then I think you could base the solution on String#index/regexps
depending on the existence of "e" and ".", since we can assume numbers
are well-formed. The idea would be:

if none
%d
elsif "e"
%e
else
%f with computed widths
end

This, coupled with Robert's computed field lengths
is beginning to look tractable...

Thanks,
 
Robert Klemme

Robert said:
If there is a fixed number of formats you can probably use a cascade
of RX matches.

Unfortunately not.
Otherwise it probably becomes a bit more complex like matching
sequences of digits and measuring their lengths.
md = %r{^(\d+)\.(\d+)?$}.match('0.4577')
=> #<MatchData ...>
pa = "%#{md[0].size}.#{md[2].size}f"

Hmmm, this looks like a viable path.

I hadn't thought of using MatchData groups, but as you say,
it may get ugly fast... I'm thinking of edge cases like
dealing with the leading space if positive numbers become
negative, or accommodating the number of digits needed for
exponentials or integers if the new number exceeds the
capacity of the existing format.

For floating point numbers you might even get away with a single regexp
if that is crafted appropriately and group values are evaluated accordingly.
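
As a rough illustration of the single-regexp idea, the pattern below captures the fraction digits and the optional exponent marker, and the groups decide both the precision and the conversion letter. `FLOAT` and `float_format` are made-up names, and the sketch assumes the field contains a decimal point:

```ruby
# Assumed pattern: optional blanks and sign, optional integer digits, a
# decimal point, fraction digits, and an optional exponent.
FLOAT = /\A *[-+]?\d*\.(\d*)(?:([eE])[-+]?\d+)?\z/

# Hypothetical helper: returns a format string, or nil if the field is not
# a float in one of the two shapes above.
def float_format(field)
  m = FLOAT.match(field)
  return nil unless m
  conversion = m[2] || 'f' # "e" or "E" when an exponent is present
  "%#{field.length}.#{m[1].length}#{conversion}"
end

puts float_format('0.4577')    # %6.4f
puts float_format('4.480E+09') # %9.3E
puts float_format('-254.2')    # %6.1f
```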

Kind regards

robert
 
Rick DeNatale

Hi,

I have files full of numbers that I need to twiddle,
but the format of the numbers cannot change[1], e.g.,

'0.4577' -> '0.7728'

or

'-2.345e-02' -> ' 1.232e-03'

Using scanf for the output seems to be the solution to
the second half of the problem, but how does one derive
the format specifier string of the input fields, which vary?
Bill,

How's this for a start? I wrote it leaning towards clarity vs. conciseness.

rick@frodo:/public/rubyscripts$ cat number_format.rb
class String
  # Derive a sprintf-style format string from an already formatted number.
  def to_number_format
    # Split off leading blanks and the sign; the sign itself is not
    # counted in the computed field width.
    m = match(%r{^([ ]*)([+-]?)(.*)$})
    leading_blanks, sign, rest = m[1], m[2], m[3]
    plus_flag = sign == '+' ? sign : ''
    case rest
    when %r{^([\d]\.([\d]+)([eE])([+-][\d]+))(.*)$}
      # exponentiated float
      entirety, frac_part, e_or_E, exponent, suffix = $1, $2, $3, $4, $5
      entirety = leading_blanks << entirety
      "%#{entirety.length}.#{frac_part.length}#{e_or_E}#{suffix}"
    when %r{^([\d]+\.([\d]*))(.*)$}
      # simple float
      entirety, frac_part, suffix = $1, $2, $3
      zero = frac_part.match(/00$/) ? '0' : ''
      "%#{zero}#{entirety.length}.#{frac_part.length}f#{suffix}"
    when %r{^(0[\d]+)([^e.]*)$}
      # zero padded integer
      digits, suffix = $1, $2
      "#{leading_blanks}%#{plus_flag}0#{digits.length}d#{suffix}"
    when %r{^([\d]+)([^e.]*)$}
      # whitespace padded integer
      digits, suffix = $1, $2
      digits = leading_blanks << digits
      "%#{digits.length}d#{suffix}"
    else
      nil
    end
  end
end

x = '0.4577'
puts x
puts x.to_number_format
puts x.to_number_format % x.to_f
puts(x.to_number_format % 0.7728)
puts (x.to_number_format % x.to_f) == x
puts

x = '-2.345e-02'
puts x
puts x.to_number_format
puts(x.to_number_format % x.to_f)
puts(x.to_number_format % 1.232e-03)
puts (x.to_number_format % x.to_f) == x
puts


x = '12345'
puts x
puts x.to_number_format
puts(x.to_number_format % x.to_i)
puts(x.to_number_format % 765)
puts (x.to_number_format % x.to_f) == x
puts

x = ' 00012345'
puts x
puts x.to_number_format
puts(x.to_number_format % x.to_i)
puts(x.to_number_format % 765)
puts (x.to_number_format % x.to_i) == x
puts

x = '  12345'
puts x
puts x.to_number_format
puts(x.to_number_format % x.to_i)
puts(x.to_number_format % 765)
puts (x.to_number_format % x.to_i) == x

rick@frodo:/public/rubyscripts$ ruby number_format.rb
0.4577
%6.4f
0.4577
0.7728
true

-2.345e-02
%9.3e
-2.345e-02
1.232e-03
true

12345
%5d
12345
  765
true

 00012345
 %08d
 00012345
 00000765
true

  12345
%7d
  12345
    765
true
 
Bil Kleb

Rick said:
How's this for a start?

Excellent! Thanks.

All but my last test passed:

require 'test/unit'
require 'number_format'
class TestNumberFormat < Test::Unit::TestCase
  def test_some_floats
    assert_equal( '%3.1f', '8.3'.to_number_format )
    assert_equal( '%05.3f', '0.500'.to_number_format )
    assert_equal( '%8.7f', '.0001170'.to_number_format )
    assert_equal( '%7.1f', '14000.0'.to_number_format )
    assert_equal( '%9.3E', '4.480E+09'.to_number_format )
    assert_equal( '%6.1e', '3.2e-5'.to_number_format )
    assert_equal( '%6.1f', '-254.2'.to_number_format )
  end
end

1) Failure:
test_some_floats(TestNumberFormat) [-:11]:
<"%6.1f"> expected but was
<"%5.1f">.

Note: made the simple float leading digit match 0
or more to get the third test to pass.

Puzzling the minus sign part now...

Thanks again,
 
Rick DeNatale

Excellent! Thanks.

All but my last test passed:

require 'test/unit'
require 'number_format'
class TestNumberFormat < Test::Unit::TestCase
  def test_some_floats
    assert_equal( '%3.1f', '8.3'.to_number_format )
    assert_equal( '%05.3f', '0.500'.to_number_format )
    assert_equal( '%8.7f', '.0001170'.to_number_format )

Not sure how this one worked, it fails for me. As a matter of fact:
irb(main):001:0> '%8.7f' % 0.0001170
=> "0.0001170"

And I haven't been able to find an sprintf format string which
suppresses a leading zero on a float.
    assert_equal( '%7.1f', '14000.0'.to_number_format )
    assert_equal( '%9.3E', '4.480E+09'.to_number_format )
    assert_equal( '%6.1e', '3.2e-5'.to_number_format )
    assert_equal( '%6.1f', '-254.2'.to_number_format )
  end
end

1) Failure:
test_some_floats(TestNumberFormat) [-:11]:
<"%6.1f"> expected but was
<"%5.1f">.

Note: made the simple float leading digit match 0
or more to get the third test to pass.

Puzzling the minus sign part now...

I see that you figured this out.

Another thing to test is that the values actually round trip. Here's my test:

rick@frodo:/public/rubyscripts$ cat test_number_format.rb
require 'test/unit'
require 'number_format'
class TestNumberFormat < Test::Unit::TestCase
  def test_some_floats
    assert_equal( '%3.1f', '8.3'.to_number_format )
    assert_nf('8.3')
    assert_equal( '%05.3f', '0.500'.to_number_format )
    assert_nf('0.500')
    assert_equal( '%8.7f', '.0001170'.to_number_format )
    assert_nf('.0001170')
    assert_equal( '%7.1f', '14000.0'.to_number_format )
    assert_nf('14000.0')
    assert_equal( '%9.3E', '4.480E+09'.to_number_format )
    assert_nf('4.480E+09')
    assert_equal( '%6.1e', '3.2e-5'.to_number_format )
    assert_nf('3.2e-5')
    assert_equal( '%6.1f', '-254.2'.to_number_format )
    assert_nf('-254.2')
  end

  private

  # Round trip: formatting the parsed value must reproduce the string.
  # to_f is used because eval rejects literals such as '.0001170'.
  def assert_nf(str)
    assert_equal(str, str.to_number_format % str.to_f)
  end
end
 
Bil Kleb

Rick said:
Not sure how this one worked, it fails for me. As a matter of fact:
irb(main):001:0> '%8.7f' % 0.0001170
=> "0.0001170"

And I haven't been able to find an sprintf format string which
suppresses a leading zero on a float.

You're correct; as you wrote, I wasn't testing round-trip.
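
Since no format string drops that zero, a field such as '.0001170' would need post-processing after formatting. A minimal sketch, assuming a hypothetical helper `format_without_leading_zero` that is told the width and number of decimals explicitly:

```ruby
# Hypothetical helper: format with the requested number of decimals, strip
# the "0" that sprintf always puts before the decimal point, then pad the
# result on the left to the original field width.
def format_without_leading_zero(width, decimals, value)
  text = "%.#{decimals}f" % value
  text = text.sub(/\A(-?)0\./, '\1.') # "0.5" -> ".5", "-0.5" -> "-.5"
  text.rjust(width)
end

puts format_without_leading_zero(8, 7, 0.0001170) # .0001170
```

Values of magnitude 1 or greater have no leading zero to strip and pass through with only the padding applied.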

Thanks,
 
Bil Kleb

Rick said:
By the way Bill, seeing who you seem to work for, I'd like to dedicate
whatever help I've given to you to the memory of Wally Schirra!

You helped me learn more Ruby; always a pure joy. Thank you.

I've since decided that I'm going to require the users
specify the format instead of trying to back it out --
there are cases for which you just can't back out the
correct format. Besides, the need is infrequent, and
I have no sympathy for code that employs formatted reads...
Are you a turtle? <G>

You bet your sweet ass I am! ;)

Regards,
 
