Inverse scanf: finding format specifiers of existing fields

Discussion in 'Ruby' started by Bil Kleb, May 2, 2007.

  1. Bil Kleb

    Bil Kleb Guest

    Hi,

    I have files full of numbers that I need to twiddle,
    but the format of the numbers cannot change[1], e.g.,

    '0.4577' -> '0.7728'

    or

    '-2.345e-02' -> ' 1.232e-03'

    Using scanf for the output seems to be the solution to
    the second half of the problem, but how does one derive
    the format specifier string of the input fields, which vary?

    Thanks,
    --
    Bil Kleb
    http://fun3d.larc.nasa.gov

    [1] Legacy formatted-Fortran data files.
    Bil Kleb, May 2, 2007
    #1

  2. Xavier Noria

    Xavier Noria Guest

    On May 2, 2007, at 12:50 PM, Bil Kleb wrote:

    > Hi,
    >
    > I have files full of numbers that I need to twiddle,
    > but the format of the numbers cannot change[1], e.g.,
    >
    > '0.4577' -> '0.7728'
    >
    > or
    >
    > '-2.345e-02' -> ' 1.232e-03'


    Are there many different formats?

    -- fxn
    Xavier Noria, May 2, 2007
    #2

  3. On 02.05.2007 12:47, Bil Kleb wrote:
    > I have files full of numbers that I need to twiddle,
    > but the format of the numbers cannot change[1], e.g.,
    >
    > '0.4577' -> '0.7728'
    >
    > or
    >
    > '-2.345e-02' -> ' 1.232e-03'
    >
    > Using scanf for the output seems to be the solution to
    > the second half of the problem, but how does one derive
    > the format specifier string of the input fields, which vary?


    If there is a fixed number of formats you can probably use a cascade of
    RX matches. Otherwise it probably becomes a bit more complex like
    matching sequences of digits and measuring their lengths.

    >> md = %r{^(\d+)\.(\d+)?$}.match('0.4577')

    => #<MatchData:0x7ef61250>
    >> pa="%#{md[0].size}.#{md[2].size}f"

    => "%6.4f"
    >> pa % 0.4577111

    => "0.4577"
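
    A minimal sketch of the length-measuring idea above, assuming plain
    fixed-point fields only. The helper name fixed_format is hypothetical,
    not something from the thread. The total width is the size of the
    whole field; the precision is the size of the fraction group.

    ```ruby
    # Hypothetical helper: derive a "%W.Pf" specifier from a fixed-point field.
    # Returns nil for anything that is not a plain decimal number.
    def fixed_format(field)
      md = /\A\s*-?\d+\.(\d*)\z/.match(field)
      return nil unless md
      "%#{field.size}.#{md[1].size}f"
    end

    fmt = fixed_format('0.4577')
    puts fmt            # prints %6.4f
    puts fmt % 0.7728   # prints 0.7728
    ```

    Counting the whole field, sign and leading blanks included, keeps the
    column width stable when a positive number becomes negative.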

    HTH

    robert
    Robert Klemme, May 2, 2007
    #3
  4. Hi --

    On 5/2/07, Bil Kleb wrote:
    > Hi,
    >
    > I have files full of numbers that I need to twiddle,
    > but the format of the numbers cannot change[1], e.g.,
    >
    > '0.4577' -> '0.7728'
    >
    > or
    >
    > '-2.345e-02' -> ' 1.232e-03'
    >
    > Using scanf for the output seems to be the solution to
    > the second half of the problem, but how does one derive
    > the format specifier string of the input fields, which vary?


    You could probably just do a gsub, like this:

    require 'scanf'

    re = /-?\d+\.\d+(e-\d+)?/

    a = "'0.4577' -> '0.7728'"
    b = "'-2.345e-02' -> ' 1.232e-03'"

    as = a.gsub(re, "%f")
    bs = b.gsub(re, "%f")

    p a.scanf(as)
    p b.scanf(bs)

    Output:

    [0.4577, 0.7728]
    [-0.02345, 0.001232]


    David

    --
    Upcoming Rails training by Ruby Power and Light:
    Four-day Intro to Intermediate
    May 8-11, 2007
    Edison, NJ
    http://www.rubypal.com/events/05082007
    David A. Black, May 2, 2007
    #4
  5. Bil Kleb

    Bil Kleb Guest

    Xavier Noria wrote:
    >
    > Are there many different formats?


    Yes, in that the field lengths are different.

    No, in that there are really only three "types":
    integers, vanilla floats, and exponentials.

    Regards,
    --
    Bil Kleb
    http://fun3d.larc.nasa.gov
    Bil Kleb, May 2, 2007
    #5
  6. Bil Kleb

    Bil Kleb Guest

    David A. Black wrote:
    > Hi --


    Hi.

    > Output:
    >
    > [0.4577, 0.7728]
    > [-0.02345, 0.001232]


    The second output indicates that I failed to express
    my predicament clearly, as the numbers are no longer
    in exponential format?

    A brief re-cast:

    The original file has numbers of the form

    5 0.4577 -2.345e-02

    Something reads the numbers and spits out new numbers,
    but in exactly the same format as the original file, e.g.,

    8 0.7728 1.232e-03

    I.e., I can't write the last number out as 0.001232 --
    it has to be in exponential format with the same field
    lengths.
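
    The requirement above can be sketched roughly as follows, assuming
    whitespace-delimited fields and only the three kinds of number
    mentioned in this thread. The names field_format and rewrite_line are
    hypothetical. Each field keeps its leading blanks, so the derived
    width is the full column width.

    ```ruby
    # Hypothetical helper: derive a sprintf specifier whose width equals the
    # width of the field, leading blanks and sign included.
    def field_format(field)
      case field
      when /\A\s*[+-]?\d*\.(\d*)([eE])[+-]?\d+\z/ then "%#{field.size}.#{$1.size}#{$2}"
      when /\A\s*[+-]?\d*\.(\d*)\z/               then "%#{field.size}.#{$1.size}f"
      when /\A\s*[+-]?\d+\z/                      then "%#{field.size}d"
      end
    end

    # Rewrite each field of a line with a new value, reusing the old format.
    # Assumes every field is recognized by field_format.
    def rewrite_line(line, values)
      fields = line.scan(/\s*\S+/)
      fields.zip(values).map { |field, value| field_format(field) % value }.join
    end

    puts rewrite_line('5 0.4577 -2.345e-02', [8, 0.7728, 1.232e-03])
    # prints "8 0.7728  1.232e-03" (same total width as the input line)
    ```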

    Regards,
    --
    Bil Kleb
    http://fun3d.larc.nasa.gov
    Bil Kleb, May 2, 2007
    #6
  7. Xavier Noria

    Xavier Noria Guest

    On May 2, 2007, at 2:50 PM, Bil Kleb wrote:

    > Xavier Noria wrote:
    >> Are there many different formats?

    >
    > Yes, in that the field lengths are different.
    >
    > No, in that there are really only three "types":
    > integers, vanilla floats, and exponentials.


    Then I think you could base the solution on String#index/regexps
    depending on the existence of "e" and ".", since we can assume
    numbers are well-formed. The idea would be:

    if none
      %d
    elsif "e"
      %e
    else
      %f with computed widths
    end
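
    The branching above might look like this in Ruby. This is only a
    sketch that assumes well-formed numbers, and conversion_for is a
    hypothetical name. It checks for the exponent first because an
    exponential field also contains a ".".

    ```ruby
    # Hypothetical helper: choose a conversion character from what the field
    # contains, assuming the field is a well-formed number.
    def conversion_for(field)
      text = field.strip
      if text =~ /[eE]/
        'e'   # exponential notation
      elsif text.include?('.')
        'f'   # plain float
      else
        'd'   # integer
      end
    end

    %w[5 0.4577 -2.345e-02].each do |field|
      puts "#{field} -> %#{conversion_for(field)}"
    end
    ```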

    -- fxn
    Xavier Noria, May 2, 2007
    #7
  8. Bil Kleb

    Bil Kleb Guest

    Robert Klemme wrote:
    >
    > If there is a fixed number of formats you can probably use a cascade of
    > RX matches.


    Unfortunately not.

    > Otherwise it probably becomes a bit more complex like
    > matching sequences of digits and measuring their lengths.
    >
    > >> md = %r{^(\d+)\.(\d+)?$}.match('0.4577')

    > => #<MatchData:0x7ef61250>
    > >> pa="%#{md[0].size}.#{md[2].size}f"


    Hmmm, this looks like a viable path.

    I hadn't thought of using MatchData groups, but as you say,
    it may get ugly fast... I'm thinking of edge cases like
    dealing with the leading space if positive numbers become
    negative, or accommodating the number of digits needed for
    exponentials or integers if the new number exceeds the
    capacity of the existing format.
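
    One possible way to catch the overflow case is to format first and
    compare the result with the original field width. A sketch only;
    format_in_field is a hypothetical helper, and raising an error is
    just one of several reasonable policies.

    ```ruby
    # Hypothetical helper: format a value and refuse results that are wider
    # than the field they are meant to replace.
    def format_in_field(fmt, value, width)
      text = fmt % value
      raise ArgumentError, "#{text.inspect} exceeds #{width} columns" if text.size > width
      text
    end

    puts format_in_field('%6.4f', 0.7728, 6)   # fits: prints 0.7728

    begin
      format_in_field('%6.4f', -0.7728, 6)     # "-0.7728" needs 7 columns
    rescue ArgumentError => e
      puts e.message
    end
    ```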

    Thanks,
    --
    Bil Kleb
    http://fun3d.larc.nasa.gov
    Bil Kleb, May 2, 2007
    #8
  9. Bil Kleb

    Bil Kleb Guest

    Xavier Noria wrote:
    >
    > Then I think you could base the solution on String#index/regexps
    > depending on the existence of "e" and ".", since we can assume numbers
    > are well-formed. The idea would be:
    >
    > if none
    > %d
    > elsif "e"
    > %e
    > else
    > %f with computed widths
    > end


    This, coupled with Robert's computed field lengths
    is beginning to look tractable...

    Thanks,
    --
    Bil Kleb
    http://fun3d.larc.nasa.gov
    Bil Kleb, May 2, 2007
    #9
  10. On 02.05.2007 15:08, Bil Kleb wrote:
    > Robert Klemme wrote:
    >>
    >> If there is a fixed number of formats you can probably use a cascade
    >> of RX matches.

    >
    > Unfortunately not.
    >
    >> Otherwise it probably becomes a bit more complex like matching
    >> sequences of digits and measuring their lengths.
    >>
    >> >> md = %r{^(\d+)\.(\d+)?$}.match('0.4577')

    >> => #<MatchData:0x7ef61250>
    >> >> pa="%#{md[0].size}.#{md[2].size}f"

    >
    > Hmmm, this looks like a viable path.
    >
    > I hadn't thought of using MatchData groups, but as you say,
    > it may get ugly fast... I'm thinking of edge cases like
    > dealing with the leading space if positive numbers become
    > negative, or accommodating the number of digits needed for
    > exponentials or integers if the new number exceeds the
    > capacity of the existing format.


    For floating point numbers you might even get away with a single regexp
    if that is crafted appropriately and group values are evaluated accordingly.
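
    A sketch of what such a single regexp could look like, using named
    groups. The constant FLOAT_FIELD and the method float_format are
    hypothetical names, and the pattern assumes the field always contains
    a decimal point.

    ```ruby
    # One pattern for both plain and exponential floats. The exponent part is
    # optional, so md[:e] is nil for a plain float.
    FLOAT_FIELD = /\A(?<pad>\s*)(?<sign>[+-]?)(?<int>\d*)\.(?<frac>\d*)(?:(?<e>[eE])(?<exp>[+-]?\d+))?\z/

    # Hypothetical helper: build the specifier from the matched groups.
    def float_format(field)
      md = FLOAT_FIELD.match(field)
      return nil unless md
      conversion = md[:e] || 'f'
      "%#{field.size}.#{md[:frac].size}#{conversion}"
    end

    puts float_format('0.4577')       # prints %6.4f
    puts float_format('-2.345e-02')   # prints %10.3e
    puts float_format('4.480E+09')    # prints %9.3E
    ```

    Note that the number of exponent digits is not part of the specifier.
    Ruby's %e prints at least two exponent digits, so a field such as
    '3.2e-5' would not come back unchanged.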

    Kind regards

    robert
    Robert Klemme, May 2, 2007
    #10
  11. On 5/2/07, Bil Kleb wrote:
    > Hi,
    >
    > I have files full of numbers that I need to twiddle,
    > but the format of the numbers cannot change[1], e.g.,
    >
    > '0.4577' -> '0.7728'
    >
    > or
    >
    > '-2.345e-02' -> ' 1.232e-03'
    >
    > Using scanf for the output seems to be the solution to
    > the second half of the problem, but how does one derive
    > the format specifier string of the input fields, which vary?
    >

    Bil,

    How's this for a start? I wrote it leaning towards clarity vs. conciseness.

    rick@frodo:/public/rubyscripts$ cat number_format.rb
    class String
      def to_number_format
        m = match(%r{^([ ]*)([+-]?)(.*)$})
        leading_blanks, sign, rest = m[1], m[2], m[3]
        plus_flag = sign == '+' ? sign : ''
        case rest
        when %r{^([\d]\.([\d]+)([eE])([+-][\d]+))(.*)$}
          # exponentiated float ($4 is the exponent, $5 any trailing text)
          entirety, frac_part, e_or_E, exponent, suffix = $1, $2, $3, $4, $5
          entirety = leading_blanks << entirety
          "%#{entirety.length}.#{frac_part.length}#{e_or_E}#{suffix}"
        when %r{^([\d]+\.([\d]*))(.*)$}
          # simple float
          entirety, frac_part, suffix = $1, $2, $3
          zero = frac_part.match(/00$/) ? '0' : ''
          "%#{zero}#{entirety.length}.#{frac_part.length}f#{suffix}"
        when %r{^(0[\d]+)([^e.]*)$}
          # zero padded integer
          digits, suffix = $1, $2
          "#{leading_blanks}%#{plus_flag}0#{digits.length}d#{suffix}"
        when %r{^([\d]+)([^e.]*)$}
          # whitespace padded integer
          digits, suffix = $1, $2
          digits = leading_blanks << digits
          "%#{digits.length}d#{suffix}"
        else
          nil
        end
      end
    end

    x = '0.4577'
    puts x
    puts x.to_number_format
    puts x.to_number_format % x.to_f
    puts(x.to_number_format % 0.7728)
    puts (x.to_number_format % x.to_f) == x
    puts

    x = '-2.345e-02'
    puts x
    puts x.to_number_format
    puts(x.to_number_format % x.to_f)
    puts(x.to_number_format % 1.232e-03)
    puts (x.to_number_format % x.to_f) == x
    puts


    x = '12345'
    puts x
    puts x.to_number_format
    puts(x.to_number_format % x.to_i)
    puts(x.to_number_format % 765)
    puts (x.to_number_format % x.to_f) == x
    puts

    x = ' 00012345'
    puts x
    puts x.to_number_format
    puts(x.to_number_format % x.to_i)
    puts(x.to_number_format % 765)
    puts (x.to_number_format % x.to_i) == x
    puts

    x = '  12345'
    puts x
    puts x.to_number_format
    puts(x.to_number_format % x.to_i)
    puts(x.to_number_format % 765)
    puts (x.to_number_format % x.to_i) == x

    rick@frodo:/public/rubyscripts$ ruby number_format.rb
    0.4577
    %6.4f
    0.4577
    0.7728
    true

    -2.345e-02
    %9.3e
    -2.345e-02
    1.232e-03
    true

    12345
    %5d
    12345
    765
    true

    00012345
    %08d
    00012345
    00000765
    true

    12345
    %7d
    12345
    765
    true


    --
    Rick DeNatale

    My blog on Ruby
    http://talklikeaduck.denhaven2.com/
    Rick DeNatale, May 3, 2007
    #11
  12. Bil Kleb

    Bil Kleb Guest

    Rick DeNatale wrote:
    >
    > How's this for a start?


    Excellent! Thanks.

    All but my last test passed:

    require 'test/unit'
    require 'number_format'
    class TestNumberFormat < Test::Unit::TestCase
      def test_some_floats
        assert_equal( '%3.1f', '8.3'.to_number_format )
        assert_equal( '%05.3f', '0.500'.to_number_format )
        assert_equal( '%8.7f', '.0001170'.to_number_format )
        assert_equal( '%7.1f', '14000.0'.to_number_format )
        assert_equal( '%9.3E', '4.480E+09'.to_number_format )
        assert_equal( '%6.1e', '3.2e-5'.to_number_format )
        assert_equal( '%6.1f', '-254.2'.to_number_format )
      end
    end

    1) Failure:
    test_some_floats(TestNumberFormat) [-:11]:
    <"%6.1f"> expected but was
    <"%5.1f">.

    Note: made the simple float leading digit match 0
    or more to get the third test to pass.

    Puzzling the minus sign part now...

    Thanks again,
    --
    Bil Kleb
    http://fun3d.larc.nasa.gov
    Bil Kleb, May 4, 2007
    #12
  13. Bil Kleb

    Bil Kleb Guest

    Bil Kleb wrote:
    >
    > Puzzling the minus sign part now...


    "%#{zero}#{sign.length+entirety.length}.#{frac_part.length}f#{suffix}"
               ^^^^^^^^^^^^
    Later,
    --
    Bil Kleb
    http://fun3d.larc.nasa.gov
    Bil Kleb, May 4, 2007
    #13
  14. On 5/4/07, Bil Kleb wrote:
    > Rick DeNatale wrote:
    > >
    > > How's this for a start?

    >
    > Excellent! Thanks.
    >
    > All but my last test passed:
    >
    > require 'test/unit'
    > require 'number_format'
    > class TestNumberFormat < Test::Unit::TestCase
    > def test_some_floats
    > assert_equal( '%3.1f', '8.3'.to_number_format )
    > assert_equal( '%05.3f', '0.500'.to_number_format )
    > assert_equal( '%8.7f', '.0001170'.to_number_format )


    Not sure how this one worked, it fails for me. As a matter of fact:
    irb(main):001:0> '%8.7f' % 0.0001170
    => "0.0001170"

    And I haven't been able to find an sprintf format string which
    suppresses a leading zero on a float.
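
    A possible workaround, sketched here as an assumption rather than
    anything sprintf offers directly, is to format normally and then strip
    the zero before the decimal point. format_without_leading_zero is a
    hypothetical name.

    ```ruby
    # Hypothetical helper: format the value, then drop the "0" that sprintf
    # always emits before the decimal point. Blanks and a minus sign are kept.
    def format_without_leading_zero(fmt, value)
      (fmt % value).sub(/\A(\s*-?)0\./, '\1.')
    end

    puts '%8.7f' % 0.0001170                              # prints 0.0001170
    puts format_without_leading_zero('%8.7f', 0.0001170)  # prints .0001170
    ```

    The result is one character narrower than what sprintf produced, so
    the width in the specifier has to allow for the zero that gets removed.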

    > assert_equal( '%7.1f', '14000.0'.to_number_format )
    > assert_equal( '%9.3E', '4.480E+09'.to_number_format )
    > assert_equal( '%6.1e', '3.2e-5'.to_number_format )
    > assert_equal( '%6.1f', '-254.2'.to_number_format )
    > end
    > end
    >
    > 1) Failure:
    > test_some_floats(TestNumberFormat) [-:11]:
    > <"%6.1f"> expected but was
    > <"%5.1f">.
    >
    > Note: made the simple float leading digit match 0
    > or more to get the third test to pass.
    >
    > Puzzling the minus sign part now...


    I see that you figured this out.

    Another thing to test is that the values actually round trip. Here's my test:

    rick@frodo:/public/rubyscripts$ cat test_number_format.rb
    require 'test/unit'
    require 'number_format'
    class TestNumberFormat < Test::Unit::TestCase
      def test_some_floats
        assert_equal( '%3.1f', '8.3'.to_number_format )
        assert_nf('8.3')
        assert_equal( '%05.3f', '0.500'.to_number_format )
        assert_nf('0.500')
        assert_equal( '%8.7f', '.0001170'.to_number_format )
        assert_nf('.0001170')
        assert_equal( '%7.1f', '14000.0'.to_number_format )
        assert_nf('14000.0')
        assert_equal( '%9.3E', '4.480E+09'.to_number_format )
        assert_nf('4.480E+09')
        assert_equal( '%6.1e', '3.2e-5'.to_number_format )
        assert_nf('3.2e-5')
        assert_equal( '%6.1f', '-254.2'.to_number_format )
        assert_nf('-254.2')
      end

      private
      def assert_nf(str)
        assert_equal(str, str.to_number_format % eval(str))
      end
    end

    --
    Rick DeNatale

    My blog on Ruby
    http://talklikeaduck.denhaven2.com/
    Rick DeNatale, May 5, 2007
    #14
  15. On 5/5/07, Rick DeNatale wrote:
    > On 5/4/07, Bil Kleb wrote:
    > > Rick DeNatale wrote:
    > > >
    > > > How's this for a start?

    > >
    > > Excellent! Thanks.


    By the way Bil, seeing who you seem to work for, I'd like to dedicate
    whatever help I've given to you to the memory of Wally Schirra!

    Are you a turtle? <G>

    --
    Rick DeNatale

    Visit the Project Mercury Wiki Site
    http://www.mercuryspacecraft.com/

    My blog on Ruby
    http://talklikeaduck.denhaven2.com/
    Rick DeNatale, May 5, 2007
    #15
  16. Bil Kleb

    Bil Kleb Guest

    Rick DeNatale wrote:
    > On 5/4/07, Bil Kleb wrote:
    >> assert_equal( '%8.7f', '.0001170'.to_number_format )

    >
    > Not sure how this one worked, it fails for me. As a matter of fact:
    > irb(main):001:0> '%8.7f' % 0.0001170
    > => "0.0001170"
    >
    > And I haven't been able to find an sprintf format string which
    > suppresses a leading zero on a float.


    You're correct; as you wrote, I wasn't testing round-trip.

    Thanks,
    --
    Bil Kleb
    http://fun3d.larc.nasa.gov
    Bil Kleb, May 7, 2007
    #16
  17. Bil Kleb

    Bil Kleb Guest

    Rick DeNatale wrote:
    >
    > By the way Bil, seeing who you seem to work for, I'd like to dedicate
    > whatever help I've given to you to the memory of Wally Schirra!


    You helped me learn more Ruby; always a pure joy. Thank you.

    I've since decided that I'm going to require the users
    specify the format instead of trying to back it out --
    there are cases for which you just can't back out the
    correct format. Besides, the need is infrequent, and
    I have no sympathy for code that employs formatted reads...

    > Are you a turtle? <G>


    You bet your sweet ass I am! ;)

    Regards,
    --
    Bil Kleb
    http://fun3d.larc.nasa.gov
    Bil Kleb, May 7, 2007
    #17