Re: Sorting numeric strings

Discussion in 'Java' started by Daniel Pitts, May 1, 2012.

  1. Daniel Pitts

    Daniel Pitts Guest

    On 4/30/12 6:27 PM, Ben wrote:
    > Given the following data:
    >
    > Col1, Col2, Col3
    > 438.23, 991897664, ccc
    > 22.12, 991897631, bbb
    > 100.99, 881897631, aaa
    > 50.12, 991884803, ddd
    >
    > The class below will sort the data based on the column specified, except
    > Col1, which contains float values. If you set the SortCol variable below
    > to 0, sorting does not work. If you set it to 1 or 2, sorting does work.
    > How can I sort Col1 which is a column of numeric strings?
    >
    > import java.io.BufferedReader;
    > import java.io.FileReader;
    > import java.io.FileWriter;
    > import java.util.LinkedList;
    > import java.util.List;
    > import java.util.Map;
    > import java.util.TreeMap;
    >
    > public class SortColumn {
    >
    > public static void main(String[] args) throws Exception {
    >
    > BufferedReader reader = new BufferedReader(new FileReader("file.csv"));
    > //BufferedReader reader = new BufferedReader(new
    > FileReader("jtp-input2-test.csv"));
    > Map<String, List<String>> map = new TreeMap<String, List<String>>();
    > String line = reader.readLine(); //read header
    > while ((line = reader.readLine()) != null) {
    > String key = getField(line).toString();
    > List<String> l = map.get(key);
    > if (l == null) {
    > l = new LinkedList<String>();
    > map.put(key, l);
    > System.out.println(key);
    > }
    > l.add(line);
    >
    > }
    > reader.close();
    >
    > FileWriter writer = new FileWriter("sorted_numbers.txt");
    > writer.write("Col1, Col2, Col3\n");
    > // writer.write("billnumber, Copay, Discount, NonAllow, unknown\n");
    > for (List<String> list : map.values()) {
    > for (String val : list) {
    > writer.write(val);
    > writer.write("\n");
    > }
    > }
    > writer.close();
    > }
    >
    > private static String getField(String line) {
    > // Column you want to sort on (Zero based)
    > int SortCol = 0;
    > return line.split(",")[SortCol];
    > }
    > }


    In order to compare two strings as numbers, you need to pad zeros on
    both extremes away from any "dot".

    In other words, in order to compare "123" with "3.141", you'd need to
    "normalize" them to "123.000" and "003.141".

    I've actually recently written something that does this, and handles
    arbitrary "." designations. This was actually designed to work with
    revision numbering, which can have multiple ".".

    import org.apache.commons.lang.StringUtils;
    import java.util.Comparator;

    public class StringAsNumberComparator implements Comparator<String> {
    private int compare(String left, String right) {
    final String[] a = StringUtils.split(left, '.');
    final String[] b = StringUtils.split(right, '.');
    for (int i = 0; i < a.length; ++i) {
    if (i >= b.length) {
    return 1;
    }
    final int compare = compareMaybeNumeric(left, right);
    if (compare != 0) {
    return compare;
    }
    }
    return a.length - b.length;
    }

    private static int compareMaybeNumeric(String a, String b) {
    if (StringUtils.isNumeric(a) && StringUtils.isNumeric(b)) {
    final int length = Math.max(a.length(), b.length());
    return StringUtils.leftPad(a, length,
    '0').compareTo(StringUtils.leftPad(b, length, '0'));
    } else {
    return a.compareTo(b);
    }
    }
    }


    Although, now that I'm looking at this, I see a few optimizations I can
    make that don't involve padding. If two numbers aren't the same length,
    then the longer string is larger magnitude.

    Of course, this code doesn't consider negative values, but can be
    adjusted to do so.
     
    Daniel Pitts, May 1, 2012
    #1
    1. Advertisements

  2. Daniel Pitts

    Roedy Green Guest

    On Tue, 01 May 2012 10:53:35 -0700, Daniel Pitts
    <> wrote, quoted or indirectly
    quoted someone who said :

    >
    >In order to compare two strings as numbers, you need to pad zeros on
    >both extremes away from any "dot".


    the other way is with Double.parseDouble and compare the doubles.
    --
    Roedy Green Canadian Mind Products
    http://mindprod.com
    Programmers love to create simplified replacements for HTML.
    They forget that the simplest language is the one you
    already know. They also forget that their simple little
    markup language will bit by bit become even more convoluted
    and complicated than HTML because of the unplanned way it grows.
    ..
     
    Roedy Green, May 1, 2012
    #2
    1. Advertisements

  3. Daniel Pitts

    Roedy Green Guest

    On Tue, 01 May 2012 15:02:09 -0700, Patricia Shanahan <>
    wrote, quoted or indirectly quoted someone who said :

    >
    >Whether comparing doubles gives the correct result does depend on the
    >maximum number of significant digits. BigDecimal is always safe for
    >this, no matter how many significant digits.



    A float will do for about 7 significant digits. A double will do for
    16.

    A currency amount can be shown with a double to the penny up to 99
    trillion dollars
    99,000,000,000,000.00,
    --
    Roedy Green Canadian Mind Products
    http://mindprod.com
    Programmers love to create simplified replacements for HTML.
    They forget that the simplest language is the one you
    already know. They also forget that their simple little
    markup language will bit by bit become even more convoluted
    and complicated than HTML because of the unplanned way it grows.
    ..
     
    Roedy Green, May 2, 2012
    #3
  4. On Wed, 02 May 2012 14:36:29 -0700, Roedy Green
    <> wrote:

    >On Tue, 01 May 2012 15:02:09 -0700, Patricia Shanahan <>
    >wrote, quoted or indirectly quoted someone who said :


    >>Whether comparing doubles gives the correct result does depend on the
    >>maximum number of significant digits. BigDecimal is always safe for
    >>this, no matter how many significant digits.


    >A float will do for about 7 significant digits. A double will do for
    >16.
    >
    >A currency amount can be shown with a double to the penny up to 99
    >trillion dollars
    >99,000,000,000,000.00,


    If you are thinking of IEEE 754 64-bit floating point, well, it
    has ~15.95 decimal digits of precision so it is not quite good enough
    for 16 digits. 15 digits is fine though. (The 32-bit vesion is good
    for ~7.22 digits.)

    Here is a counterexample in JavaScript. Adding 8765432101234567
    and 1234567890123456 gives 9999999991358024 which is one too high. I
    get the same result in Java (except that it is in E-notation) with:

    ***** Start of Code *****
    class tmp
    {

    public static void main(String [] args)
    {
    double a=8765432101234567d;
    double b=1234567890123456d;

    System.out.println(a);
    System.out.println(b);
    System.out.println(a+b);
    }

    }
    ***** End of Code *****

    I researched in this area when I was figuring out how to handle
    fixed-point numbers using JavaScript's number object which uses IEEE
    754 64-bit floating point.

    Sincerely,

    Gene Wirchenko
     
    Gene Wirchenko, May 3, 2012
    #4
  5. In comp.lang.java.programmer message <3s93q719q1787gb4ihi4fugvn8cvgc8nu2
    @4ax.com>, Wed, 2 May 2012 14:36:29, Roedy Green <
    om.invalid> posted:

    >A currency amount can be shown with a double to the penny up to 99
    >trillion dollars
    >99,000,000,000,000.00,


    One then only has a factor of about seven to spare if handling the US
    annual GDP. It's worse for the Chinese, whose yuan is cheaper than the
    USD by a larger factor than their GDP is less, and contains 100 fen.

    --
    (c) John Stockton, nr London, UK. ?@merlyn.demon.co.uk Turnpike v6.05.
    Website <http://www.merlyn.demon.co.uk/> - w. FAQish topics, links, acronyms
    PAS EXE etc. : <http://www.merlyn.demon.co.uk/programs/> - see in 00index.htm
    Dates - miscdate.htm estrdate.htm js-dates.htm pas-time.htm critdate.htm etc.
     
    Dr J R Stockton, May 3, 2012
    #5
  6. Daniel Pitts

    Roedy Green Guest

    On Thu, 3 May 2012 19:41:12 +0100, Dr J R Stockton
    <> wrote, quoted or indirectly
    quoted someone who said :

    >One then only has a factor of about seven to spare if handling the US
    >annual GDP. It's worse for the Chinese, whose yuan is cheaper than the
    >USD by a larger factor than their GDP is less, and contains 100 fen.


    They won't fail catastrophically, the way longs fail. They will just
    start getting the low order digits inaccurately, which really does not
    matter except for aesthetics.

    I once did a program for a gas utility to figure out the optimal way
    to buy gas from many suppliers each with different contract rules and
    minimum take or pay rules. The amusing thing was most of the logic
    was to get all the pennies to add up precisely even though at those
    sort of expenditures they did not really matter.

    --
    Roedy Green Canadian Mind Products
    http://mindprod.com
    Programmers love to create simplified replacements for HTML.
    They forget that the simplest language is the one you
    already know. They also forget that their simple little
    markup language will bit by bit become even more convoluted
    and complicated than HTML because of the unplanned way it grows.
    ..
     
    Roedy Green, May 4, 2012
    #6
  7. Daniel Pitts

    Lew Guest

    Roedy Green wrote:
    > Dr J R Stockton wrote, quoted or indirectly quoted someone who said :
    >
    >> One then only has a factor of about seven to spare if handling the US
    >> annual GDP. It's worse for the Chinese, whose yuan is cheaper than the
    >> USD by a larger factor than their GDP is less, and contains 100 fen.

    >
    > They won't fail catastrophically, the way longs fail. They will just
    > start getting the low order digits inaccurately, which really does not
    > matter except for aesthetics.


    You have an unusual understanding of accounting.

    The low-order bits might matter, depending on the purpose. If you're calculating, you can get unusual results due to round-off error. If your purpose is approximate, the answers might be close enough, as you suggest. But so much of accounting requires perfect accuracy, irrespective of scale, so a catastrophic failure might be preferable to a graceful wrong answer.

    Also, a 64-bit double only has 53 bits of precision; a long has 63. So arguably a long is better to represent individual amounts unless you're doing funky calculations such as for interest. You could, for example, express theChinese GDP in fen.

    But then there's BigDecimal, with perfect accuracy. So there are choices.

    --
    Lew
     
    Lew, May 4, 2012
    #7
  8. On Thu, 03 May 2012 18:11:44 -0700, Lew wrote:

    > But then there's BigDecimal, with perfect accuracy. So there are
    > choices.
    >

    Yes. Using floating point for exact financial figures is almost
    invariably a wrong move. Use int or long, depending on the size of the
    values you must handle, to represent the value in the smallest subunits,
    i.e. store dollars, and euros as an integer representing cents, and use
    an external representation with a decimal point interpolated at the
    appropriate position[*] for the currency or, alternatively, use
    BigDecimal. Always used fixed point arithmetic.

    [*] different currencies have different subunits: I've seen currencies
    with zero, 2 or 3 digits after the decimal point.

    I think you'll find that using floating point for financial amounts was
    an abberation introduced by programmers on early 8 and 16 bit micros who
    wrote financial packages in BASIC.

    Early mainframes stored currency values as binary or BCD integers and all
    modern systems should do the same


    --
    martin@ | Martin Gregorie
    gregorie. | Essex, UK
    org |
     
    Martin Gregorie, May 4, 2012
    #8
  9. On Fri, 4 May 2012 20:01:50 +0000 (UTC), Martin Gregorie
    <> wrote:

    [snip]

    >I think you'll find that using floating point for financial amounts was
    >an abberation introduced by programmers on early 8 and 16 bit micros who
    >wrote financial packages in BASIC.


    I suspect that it raised its head on earlier systems that did not
    have COBOL or another language with decimal arithmetic.

    With JavaScript, short of writing a fixed-point library, you have
    to use floating point since JavaScript has only one numeric type,
    namely IEEE 754 64-bit floating point. If you restrict your
    arithmetic such that calculations result only in integer values in
    range, you are safe. I have done proof-of-concept on this, and it
    does work.

    >Early mainframes stored currency values as binary or BCD integers and all
    >modern systems should do the same


    The details are the killer.

    Sincerely,

    Gene Wirchenko
     
    Gene Wirchenko, May 4, 2012
    #9
  10. On Fri, 04 May 2012 14:19:51 -0700, Gene Wirchenko wrote:

    > On Fri, 4 May 2012 20:01:50 +0000 (UTC), Martin Gregorie
    > <> wrote:
    >
    > [snip]
    >
    >>I think you'll find that using floating point for financial amounts was
    >>an abberation introduced by programmers on early 8 and 16 bit micros who
    >>wrote financial packages in BASIC.

    >
    > I suspect that it raised its head on earlier systems that did not
    > have COBOL or another language with decimal arithmetic.
    >

    Wrong. I was developing and maintaining accounting systems that held
    financial amounts as binary cents/pence in assembler long before I learnt
    COBOL. All early mainframes up to and including the S/360 and AS/400 (aka
    Future Series, System/38) and ICL 1900/2900/3900 handled financial
    calculations this way.

    COBOL merely made this easier - PIC S9(7).99 COMP SYNC.
    matched with PIC Z,ZZZ,ZZ9.99DB BLANK WHEN ZERO.

    > With JavaScript, short of writing a fixed-point library, you have
    > to use floating point since JavaScript has only one numeric type, namely
    > IEEE 754 64-bit floating point.
    >

    You may use JavaScript for financial calculations. I would not.
    Item: it is normal for currency conversions to be defined as a mandatory
    algorithm defined as a set of integer calculations by the network or
    authority who control the domain that the currency conversion is made in,
    e.g S.W.I.F.T. These do not work if floating point arithmetic is used and
    so are non-compliant.

    > The details are the killer.
    >

    Quite.


    --
    martin@ | Martin Gregorie
    gregorie. | Essex, UK
    org |
     
    Martin Gregorie, May 5, 2012
    #10
  11. In comp.lang.java.programmer message <jo1p6k$ra8$
    >, Fri, 4 May 2012 23:36:52, Martin Gregorie <martin@address-in-

    sig.invalid> posted:

    >You may use JavaScript for financial calculations. I would not.
    >Item: it is normal for currency conversions to be defined as a mandatory
    >algorithm defined as a set of integer calculations by the network or
    >authority who control the domain that the currency conversion is made in,
    >e.g S.W.I.F.T. These do not work if floating point arithmetic is used and
    >so are non-compliant.


    JavaScript, using IEEE 754 Doubles, is accurate for addition,
    subtraction, and multiplication of integers if the operands and results
    are no greater than 2^53. Division is as accurate as possible under
    those conditions; one may want to use it in conjunction with Math.round
    Math.floor or Math.ceil.

    Assumes not using a really early Pentium CPU.

    If working in dollars, quarters are safe.

    --
    (c) John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v6.05 MIME.
    Web <http://www.merlyn.demon.co.uk/> - FAQish topics, acronyms, & links.
    Proper <= 4-line sig. separator as above, a line exactly "-- " (SonOfRFC1036)
    Do not Mail News to me. Before a reply, quote with ">" or "> " (SonOfRFC1036)
     
    Dr J R Stockton, May 6, 2012
    #11
  12. On Fri, 4 May 2012 23:36:52 +0000 (UTC), Martin Gregorie
    <> wrote:

    >On Fri, 04 May 2012 14:19:51 -0700, Gene Wirchenko wrote:
    >
    >> On Fri, 4 May 2012 20:01:50 +0000 (UTC), Martin Gregorie
    >> <> wrote:


    [snip]

    >> With JavaScript, short of writing a fixed-point library, you have
    >> to use floating point since JavaScript has only one numeric type, namely
    >> IEEE 754 64-bit floating point.
    >>

    >You may use JavaScript for financial calculations. I would not.
    >Item: it is normal for currency conversions to be defined as a mandatory
    >algorithm defined as a set of integer calculations by the network or
    >authority who control the domain that the currency conversion is made in,
    >e.g S.W.I.F.T. These do not work if floating point arithmetic is used and
    >so are non-compliant.


    Your last sentence is mistaken.

    Floating point can represent some integer values exactly. The
    IEEE 754 64-bit format has 53 bits of precision. This is not quite 16
    digits of precision. Stick with integer values in the range
    (-10^15,10^15) (an exclusive range), and they will all be represented
    exactly.


    >> The details are the killer.
    >>

    >Quite.


    It is possible to get around some of them.

    Sincerely,

    Gene Wirchenko
     
    Gene Wirchenko, May 7, 2012
    #12
  13. On 5/7/2012 12:34 PM, Gene Wirchenko wrote:
    > Floating point can represent some integer values exactly. The
    > IEEE 754 64-bit format has 53 bits of precision. This is not quite 16
    > digits of precision. Stick with integer values in the range
    > (-10^15,10^15) (an exclusive range), and they will all be represented
    > exactly.


    Or, another way to look at it, double arithmetic can represent exactly
    every value which would be a Java int, which means if you would normally
    use an int in the first place, you can use a double instead (with tweaks
    needed for division). Indeed, many JS JITs will do arithmetic as
    integers if the numbers are integral.

    --
    Beware of bugs in the above code; I have only proved it correct, not
    tried it. -- Donald E. Knuth
     
    Joshua Cranmer, May 7, 2012
    #13
  14. Gene Wirchenko <> wrote:

    (snip)
    > Floating point can represent some integer values exactly. The
    > IEEE 754 64-bit format has 53 bits of precision. This is not quite 16
    > digits of precision. Stick with integer values in the range
    > (-10^15,10^15) (an exclusive range), and they will all be represented
    > exactly.


    I believe that addition, subtraction, and multiplication should
    give the right answer as long as it stays within range.

    Divide is less obvious. Are you sure that there are no cases
    where divide will round incorrectly?

    -- glen
     
    glen herrmannsfeldt, May 7, 2012
    #14
  15. On Mon, 7 May 2012 17:48:55 +0000 (UTC), glen herrmannsfeldt
    <> wrote:

    >Gene Wirchenko <> wrote:
    >
    >(snip)
    >> Floating point can represent some integer values exactly. The
    >> IEEE 754 64-bit format has 53 bits of precision. This is not quite 16
    >> digits of precision. Stick with integer values in the range
    >> (-10^15,10^15) (an exclusive range), and they will all be represented
    >> exactly.

    >
    >I believe that addition, subtraction, and multiplication should
    >give the right answer as long as it stays within range.
    >
    >Divide is less obvious. Are you sure that there are no cases
    >where divide will round incorrectly?


    I have not dealt with division since I do not need it in general.

    I do need rounding though, and I use modulo to do that. Were I
    to need division, I would round the same way.

    Sincerely,

    Gene Wirchenko
     
    Gene Wirchenko, May 7, 2012
    #15
    1. Advertisements

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    5
    Views:
    1,099
    X-Centric
    Jun 30, 2005
  2. Joe Rattz

    DataGrid-DataView-XML-Sorting strings as numeric

    Joe Rattz, Nov 19, 2003, in forum: ASP .Net Datagrid Control
    Replies:
    2
    Views:
    293
    David Knipper
    Nov 20, 2003
  3. Arne Vajhøj

    Re: Sorting numeric strings

    Arne Vajhøj, May 1, 2012, in forum: Java
    Replies:
    0
    Views:
    434
    Arne Vajhøj
    May 1, 2012
  4. Gene Wirchenko

    Re: Sorting numeric strings

    Gene Wirchenko, May 1, 2012, in forum: Java
    Replies:
    0
    Views:
    427
    Gene Wirchenko
    May 1, 2012
  5. Roedy Green

    Re: Sorting numeric strings

    Roedy Green, May 1, 2012, in forum: Java
    Replies:
    0
    Views:
    487
    Roedy Green
    May 1, 2012
Loading...

Share This Page