How to sort strings containing numbers.

Discussion in 'Java' started by Claus, Oct 5, 2004.

  1. Claus

    Claus Guest

    I need some help with a sorting problem.

    I want to sort strings that might have numbers inside them.

    E.g.: "cbr1", "cbr2", "cbr10".

    If I sort the above strings, I get this result:
    "cbr1", "cbr10", "cbr2"
    I wanted this order:
    "cbr1", "cbr2", "cbr10".

    I plan to use RuleBasedCollator, but can't get the collation rules
    right for this problem.

    Can anybody help me?

    By the way: how does the "?" work in a rule? I can't seem to find any
    description of this "parameter".

    Kind regards
    C.Bro.
    Claus, Oct 5, 2004
    #1

  2. (Claus) writes:

    > I plan to use RuleBasedCollator, but can't get the collation rules
    > right for this problem.


    You need a sorting mechanism that can discover numeric substrings and
    treat them as numbers; RuleBasedCollator is not suited for the task.
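
    A minimal sketch of such a mechanism, assuming only ASCII digits count as numbers and that runs of digits are compared by value. The class name NumericAwareComparator is made up for illustration and is not part of the JDK:

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper (not a JDK class): compares digit runs by value
// and everything else character by character.
class NumericAwareComparator implements Comparator<String> {
    @Override
    public int compare(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            char ca = a.charAt(i), cb = b.charAt(j);
            if (isDigit(ca) && isDigit(cb)) {
                // Find the end of the digit run in each string.
                int ie = i, je = j;
                while (ie < a.length() && isDigit(a.charAt(ie))) ie++;
                while (je < b.length() && isDigit(b.charAt(je))) je++;
                int c = compareDigitRuns(a.substring(i, ie), b.substring(j, je));
                if (c != 0) return c;
                i = ie;
                j = je;
            } else {
                if (ca != cb) return ca - cb;
                i++;
                j++;
            }
        }
        // One string ran out first: the one with nothing left sorts first.
        return (a.length() - i) - (b.length() - j);
    }

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Compares two digit strings by numeric value without parsing them,
    // so arbitrarily long numbers cannot overflow.
    private static int compareDigitRuns(String x, String y) {
        x = stripLeadingZeros(x);
        y = stripLeadingZeros(y);
        if (x.length() != y.length()) return x.length() - y.length();
        return x.compareTo(y);
    }

    private static String stripLeadingZeros(String s) {
        int k = 0;
        while (k < s.length() - 1 && s.charAt(k) == '0') k++;
        return s.substring(k);
    }

    public static void main(String[] args) {
        String[] names = {"cbr10", "cbr2", "cbr1"};
        Arrays.sort(names, new NumericAwareComparator());
        System.out.println(Arrays.toString(names)); // [cbr1, cbr2, cbr10]
    }
}
```

    Note that this sketch treats "a007" and "a7" as equal; whether that is what you want depends on your data.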
    Tor Iver Wilhelmsen, Oct 5, 2004
    #2

  3. EdUarDo

    EdUarDo Guest

    > Ie: "cbr1", "cbr2", "cbr10".

    Use leading zeroes, so that every number has the same width:

    cbr01, cbr02, cbr10
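
    If the names are generated by your own code, the padding can be applied when each string is built. A small sketch, assuming a fixed width that is large enough for the biggest number; the helper name padded and the width of 2 are only illustrative:

```java
import java.util.Arrays;
import java.util.Locale;

class ZeroPadDemo {
    // Builds e.g. "cbr01" from ("cbr", 1, 2); the %0Nd conversion
    // left-pads the number with zeros to N digits.
    static String padded(String prefix, int number, int width) {
        return String.format(Locale.ROOT, "%s%0" + width + "d", prefix, number);
    }

    public static void main(String[] args) {
        String[] names = {padded("cbr", 10, 2), padded("cbr", 2, 2), padded("cbr", 1, 2)};
        // With equal-width numbers, plain string order is also numeric order.
        Arrays.sort(names);
        System.out.println(Arrays.toString(names)); // [cbr01, cbr02, cbr10]
    }
}
```

    This only works while every number fits in the chosen width; a later cbr100 would sort before cbr11 again.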
    EdUarDo, Oct 5, 2004
    #3
  4. VisionSet

    VisionSet Guest

    Write a Comparator (a class that implements Comparator) which extracts the
    numeric part, perhaps with a regex such as [0-9]+, and compares it as a number.
    Then sort with the Collections.sort method that takes a comparator.
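
    A rough sketch of that idea, assuming each string has at most one number that matters (the first digit run) and that it fits in a long. The class name NumericPartComparator and its helper methods are invented for this example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparator: orders by the text before the first digit run,
// then by the value of that digit run, then by the whole string.
class NumericPartComparator implements Comparator<String> {
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Text before the first digit run (the whole string if there are no digits).
    static String prefix(String s) {
        Matcher m = DIGITS.matcher(s);
        return m.find() ? s.substring(0, m.start()) : s;
    }

    // Value of the first digit run, or -1 if the string has no digits.
    // Assumption: the run is short enough for Long.parseLong.
    static long numericPart(String s) {
        Matcher m = DIGITS.matcher(s);
        return m.find() ? Long.parseLong(m.group()) : -1L;
    }

    @Override
    public int compare(String a, String b) {
        int c = prefix(a).compareTo(prefix(b));
        if (c != 0) return c;
        c = Long.compare(numericPart(a), numericPart(b));
        if (c != 0) return c;
        return a.compareTo(b); // tie-break on the full string
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("cbr10", "cbr2", "cbr1"));
        Collections.sort(names, new NumericPartComparator());
        System.out.println(names); // [cbr1, cbr2, cbr10]
    }
}
```

    Strings with several numbers (for example "a1b10" and "a1b2") need a comparator that walks through every digit run rather than just the first.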

    --
    Mike W
    VisionSet, Oct 5, 2004
    #4
  5. davidlg

    davidlg Guest

    "VisionSet" <> wrote in message
    news:z%w8d.2$...
    > Write a Comparator (implements Comparator) which extracts the numeric part
    > perhaps by RegEx: [0-9]*
    > Then sort with a Collections.sort method that uses the comparator.
    >
    > --
    > Mike W
    >

    A great suggestion, Mike. I like it the best, unless the OP can somehow use
    zeroes as someone suggested earlier. Either way, I like the Comparator
    solution. I use it a lot in my code.

    Just my $.02.

    -David
    davidlg, Oct 5, 2004
    #5
  6. Eric Sosman

    Eric Sosman Guest

    VisionSet wrote:
    > Write a Comparator (implements Comparator) which extracts the numeric part
    > perhaps by RegEx: [0-9]*
    > Then sort with a Collections.sort method that uses the comparator.


    I haven't tried the code myself, but there's a
    possibly useful link on this page:

    http://sourcefrog.net/projects/natsort/

    (Note: "frog," not "forge.")

    --
    Eric Sosman, Oct 5, 2004
    #6
  7. JP Martin

    JP Martin Guest

    Hi C.B.!

    > I need some help with a sorting problem.
    > I want to sort strings that might have number inside.
    > Ie: "cbr1", "cbr2", "cbr10".


    I don't know how to use a RuleBasedCollator for that, but what I
    suggest is that you parse these strings into arrays of either
    substrings or numbers, and then sort that based on a custom
    comparator. I include the code below.

    I'd be interested to hear if someone has a shorter or simpler
    solution.

    Cheers,
    JP

    import java.util.Arrays;
    import java.util.Comparator;

    /** Sorts strings, taking numbers into account,
     * so a10 gets sorted after a2
     * (different from what lexicographical order would do).
     * This code can handle strings with different formats,
     * for example abc1, abc1b, 25bcd.
     * JP Martin, Oct'04
     **/
    public class Test implements Comparator<String[]> {

        // Compares two token arrays element by element; if one array is a
        // prefix of the other, the shorter one sorts first.
        public int compare(String[] l, String[] r) {
            for (int i = 0; i < l.length; i++) {
                if (i >= r.length) return 1;
                int aux = compareStr(l[i], r[i]);
                if (aux != 0) return aux;
            }
            if (r.length > l.length) return -1;
            return 0;
        }

        // Compares numerically when both tokens parse as numbers,
        // otherwise falls back to plain string comparison.
        public int compareStr(String l, String r) {
            try {
                return Double.compare(Double.parseDouble(l), Double.parseDouble(r));
            } catch (NumberFormatException e) {
                return l.compareTo(r);
            }
        }

        // Joins the tokens back into the original string.
        public static String collapse(String[] x) {
            StringBuilder aux = new StringBuilder();
            for (int i = 0; i < x.length; i++)
                aux.append(x[i]);
            return aux.toString();
        }

        // Splits at every boundary between a digit and a non-digit,
        // e.g. "cbr1a" becomes {"cbr", "1", "a"}. Strings without digits
        // (or with only digits) come back as a single token.
        public static String[] splitNumbers(String x) {
            return x.split("(?<=\\d)(?=\\D)|(?<=\\D)(?=\\d)");
        }

        public static void main(String[] argv) {
            String[] str = {"cbr10", "cbr2", "cbr1a", "cbr3",
                            "cbr25", "cbr1", "cbr1b"};
            // after we sort this list we'll show:
            // cbr1 cbr1a cbr1b cbr2 cbr3 cbr10 cbr25

            String[][] split = new String[str.length][];

            for (int i = 0; i < str.length; i++)
                split[i] = splitNumbers(str[i]);

            Arrays.sort(split, new Test());

            for (int i = 0; i < split.length; i++) {
                System.out.print(collapse(split[i]) + " ");
            }
            System.out.println();
        }
    }
    JP Martin, Oct 5, 2004
    #7
  8. marcus

    marcus Guest

    Bro
    This is incredibly difficult to do correctly -- M$ is constantly
    tweaking their sort technology to meet human expectations, but the
    expectations themselves shift. The trouble is anticipating whether xm002d
    should come between xm002 and xm003, or between xm009b and xm002e.

    The natural sort package listed above looks promising, but I believe you
    need to thoroughly understand your (or your client's) expectations
    before launching into this type of task.

    BTW, this is not a "java" type issue, but a human interaction issue.

    -- clh

    marcus, Oct 5, 2004
    #8
  9. Alan Moore

    Alan Moore Guest

    On Tue, 05 Oct 2004 14:21:36 -0700, marcus <> wrote:

    >Bro
    >This is incredibly difficult to do correctly -- M$ is constantly
    >tweaking on their sort technology to meet human expectations, but the
    >expectations themselves shift. The trouble is anticipating if xm002d
    >should come between xm002 and xm003, or between xm009b and xm002e.


    Funny, when I was working on this problem a while back, I became
    convinced that MS introduced "intuitive" sorting just to make my life
    difficult. ;)

    I know what you mean about expectations, though. While I was working
    on it, I was surprised to learn that, if a filename has spaces in it,
    the number of spaces is significant. That is, if two filenames have
    the same prefix except that one has two spaces where the other has
    only one, the one with two spaces sorts first, no matter what comes
    after the spaces. I know that's the standard asciibetical sorting
    scheme, but I had just assumed they would treat spaces similarly to
    numbers. I mean, one space or ten, it just looks like emptiness, and
    how can one bit of nothing be more significant than another?

    That seems to be the approach Pool took, but it isn't how Windows
    Explorer works. Another difference is that, in Explorer, all
    punctuation characters sort before numbers (which sort before
    letters). In Paour's code, it looks like characters other than digits
    and spaces are simply compared asciibetically. So there's a big
    difference between Pool's "natural" sort order and MS's "intuitive"
    ordering. Of course, the OP never said he wanted to emulate Windows
    Explorer, so it probably doesn't matter.
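
    The point about spaces is easy to check with plain String comparison. The sketch below only illustrates asciibetical ordering and a simple "collapse runs of spaces" variant; it makes no claim to reproduce the exact rules of Windows Explorer or of the natsort code. The class name SpaceRunDemo and the sample names are invented:

```java
import java.util.Comparator;

class SpaceRunDemo {
    // Plain comparison: every space is an ordinary character (U+0020),
    // which sorts before letters and digits.
    static final Comparator<String> PLAIN = Comparator.naturalOrder();

    // Variant that treats any run of spaces as a single space before comparing.
    static final Comparator<String> COLLAPSED =
            Comparator.comparing((String s) -> s.replaceAll(" +", " "));

    public static void main(String[] args) {
        String twoSpaces = "x  z"; // two spaces after the shared prefix "x"
        String oneSpace = "x a";   // one space after the shared prefix "x"

        // The second space is compared against 'a' and wins, so the
        // two-space name sorts first regardless of the 'z' after it.
        System.out.println(PLAIN.compare(twoSpaces, oneSpace) < 0);     // true

        // After collapsing, "x z" is compared with "x a", so 'a' sorts first.
        System.out.println(COLLAPSED.compare(twoSpaces, oneSpace) < 0); // false
    }
}
```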
    Alan Moore, Oct 6, 2004
    #9
  10. marcus

    marcus Guest

    Well -- whenever I can prompt a newbie into thinking a little deeper
    about a project I feel I've done my duty. This one is a bottomless pit,
    though, and leads to bizarre innovations like little paper-clip men
    popping up and quipping "is this sort no-break-space sensitive?"

    marcus, Oct 6, 2004
    #10
