How to sort strings containing numbers.

Claus

I need some help with a sorting problem.

I want to sort strings that might have numbers inside.

E.g.: "cbr1", "cbr2", "cbr10".

If I sort the above strings, I get this result:
"cbr1", "cbr10", "cbr2"
I wanted this order:
"cbr1", "cbr2", "cbr10".

I plan to use RuleBasedCollator, but can't get the collation rules
right for this problem.

Can anybody help me?

By the way, how does the "?" work in a rule? I can't seem to find any
description of this "parameter".

Kind regards
C.Bro.
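For reference, a minimal sketch that reproduces the ordering described above, assuming Java (the question is about java.text.RuleBasedCollator). DefaultOrderDemo is a hypothetical class name. Both the plain String order and a locale Collator compare character by character, so "cbr10" lands before "cbr2" because '1' sorts before '2'. In the JDK, Collator.getInstance typically returns a RuleBasedCollator, though that is an implementation detail.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo class: shows that neither plain String order nor a
// locale Collator treats "10" as a number.
class DefaultOrderDemo {
    // Sorts a copy using String.compareTo (UTF-16 code unit order).
    static String[] sortPlain(String[] in) {
        String[] out = in.clone();
        Arrays.sort(out);
        return out;
    }

    // Sorts a copy using the English-locale Collator.
    static String[] sortCollated(String[] in) {
        String[] out = in.clone();
        Arrays.sort(out, Collator.getInstance(Locale.ENGLISH));
        return out;
    }

    public static void main(String[] args) {
        String[] names = {"cbr2", "cbr10", "cbr1"};
        System.out.println(Arrays.toString(sortPlain(names)));    // [cbr1, cbr10, cbr2]
        System.out.println(Arrays.toString(sortCollated(names))); // [cbr1, cbr10, cbr2]
    }
}
```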
 
Tor Iver Wilhelmsen

I plan to use RuleBasedCollator, but can't get the collation rules
right for this problem.

You need a sorting mechanism that can discover numeric substrings and
treat them as numbers; RuleBasedCollator is not suited for the task.
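A minimal sketch of the kind of mechanism described above, assuming ASCII digits only. NaturalOrder is a hypothetical name, not a JDK class. It walks both strings together and, whenever both sides are at a digit, compares the whole digit run by numeric value (first by length after stripping leading zeros, then digit by digit, so nothing is parsed and nothing can overflow); all other characters are compared one at a time.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical natural-order comparator: runs of ASCII digits are compared
// by numeric value, all other characters one char at a time.
final class NaturalOrder implements Comparator<String> {

    private static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // "007" -> "7", "000" -> "0"; always keeps at least one digit.
    private static String stripZeros(String digits) {
        int k = 0;
        while (k < digits.length() - 1 && digits.charAt(k) == '0') k++;
        return digits.substring(k);
    }

    @Override
    public int compare(String a, String b) {
        int i = 0, j = 0;
        while (i < a.length() && j < b.length()) {
            char ca = a.charAt(i), cb = b.charAt(j);
            if (isDigit(ca) && isDigit(cb)) {
                // Find the end of the digit run on each side.
                int ei = i, ej = j;
                while (ei < a.length() && isDigit(a.charAt(ei))) ei++;
                while (ej < b.length() && isDigit(b.charAt(ej))) ej++;
                String na = stripZeros(a.substring(i, ei));
                String nb = stripZeros(b.substring(j, ej));
                // More significant digits means a bigger number; equal
                // lengths compare digit by digit.
                if (na.length() != nb.length()) return Integer.compare(na.length(), nb.length());
                int c = na.compareTo(nb);
                if (c != 0) return c;
                i = ei;
                j = ej;
            } else {
                if (ca != cb) return Character.compare(ca, cb);
                i++;
                j++;
            }
        }
        // Whichever string has characters left over sorts later ("cbr1" < "cbr1a").
        int rest = Integer.compare(a.length() - i, b.length() - j);
        // Tie-break on plain order so "a007" and "a7" are not treated as equal.
        return rest != 0 ? rest : a.compareTo(b);
    }

    public static void main(String[] args) {
        String[] names = {"cbr10", "cbr2", "cbr1a", "cbr3", "cbr25", "cbr1", "cbr1b"};
        Arrays.sort(names, new NaturalOrder());
        System.out.println(Arrays.toString(names));
        // [cbr1, cbr1a, cbr1b, cbr2, cbr3, cbr10, cbr25]
    }
}
```

Comparing digit runs as strings rather than parsing them is a design choice: it keeps arbitrarily long numbers exact, at the cost of treating "007" and "7" as the same value (hence the final tie-break).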
 
VisionSet

Write a Comparator (implements Comparator) which extracts the numeric part,
perhaps by RegEx: [0-9]+
Then sort with the Collections.sort method that takes the comparator.
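A sketch of this suggestion, under the assumption that every string is some text followed by an optional trailing number, like "cbr10". TrailingNumberComparator is a hypothetical name. It uses [0-9]+ rather than [0-9]*, because the latter also matches an empty string. Strings such as "cbr1a", with text after the digits, get no numeric treatment here; they need a general chunk-by-chunk comparison.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical comparator: orders by (text prefix, trailing number).
// ASSUMES the only number of interest sits at the end of the string.
final class TrailingNumberComparator implements Comparator<String> {
    // Group 1: shortest text prefix; group 2: the trailing digits.
    private static final Pattern TRAILING = Pattern.compile("(.*?)([0-9]+)");

    @Override
    public int compare(String a, String b) {
        Matcher ma = TRAILING.matcher(a);
        Matcher mb = TRAILING.matcher(b);
        boolean ha = ma.matches(), hb = mb.matches();
        // Without a trailing number, the whole string is the prefix.
        String pa = ha ? ma.group(1) : a;
        String pb = hb ? mb.group(1) : b;
        int c = pa.compareTo(pb);
        if (c != 0) return c;
        if (!ha || !hb) return Boolean.compare(ha, hb); // no number sorts first
        // BigInteger avoids overflow on very long digit runs.
        return new BigInteger(ma.group(2)).compareTo(new BigInteger(mb.group(2)));
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(Arrays.asList("cbr10", "cbr2", "cbr1", "cbr"));
        Collections.sort(names, new TrailingNumberComparator());
        System.out.println(names); // [cbr, cbr1, cbr2, cbr10]
    }
}
```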
 
davidlg

A great suggestion, Mike. I like it best, unless the OP can somehow use
zeros as someone suggested earlier. Either way, I like the Comparator
solution. I use it a lot in my code.

Just my $.02.

-David
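For comparison, the zero-padding alternative mentioned above can be sketched like this. ZeroPadKey and padNumbers are hypothetical names, and the pad width of 10 is an arbitrary assumption: every digit run shorter than the width is left-padded with zeros, so a plain String comparison of the padded keys puts "cbr2" before "cbr10". Runs longer than the width are left unchanged and can still sort wrongly, which is the usual drawback of padding compared with a dedicated Comparator.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: builds a sort key by zero-padding digit runs.
final class ZeroPadKey {
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Left-pads every run of digits with zeros up to `width` characters.
    // Runs already longer than `width` are copied unchanged.
    static String padNumbers(String s, int width) {
        Matcher m = DIGITS.matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(s, last, m.start());          // text before the digits
            for (int k = m.end() - m.start(); k < width; k++) out.append('0');
            out.append(m.group());                    // the digits themselves
            last = m.end();
        }
        out.append(s, last, s.length());             // trailing text
        return out.toString();
    }

    public static void main(String[] args) {
        String[] names = {"cbr10", "cbr2", "cbr1"};
        // Sort by the padded key; the original strings are kept for display.
        Arrays.sort(names, Comparator.comparing((String n) -> padNumbers(n, 10)));
        System.out.println(Arrays.toString(names)); // [cbr1, cbr2, cbr10]
        System.out.println(padNumbers("cbr2", 10)); // cbr0000000002
    }
}
```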
 
Eric Sosman

I haven't tried the code myself, but there's a
possibly useful link on this page:

http://sourcefrog.net/projects/natsort/

(Note: "frog," not "forge.")
 
JP Martin

Hi C.B.!
I don't know how to use a RuleBasedCollator for that, but what I
suggest is that you parse these strings into arrays of alternating
substrings and numbers, and then sort those arrays with a custom
comparator. I include the code below.

I'd be interested to hear if someone has a shorter or simpler
solution.

Cheers,
JP

import java.util.*;

/** Sorts strings, taking numbers into account
 * so a10 gets sorted after a2
 * (different from what lexicographical order would do).
 * This code can handle strings with different formats,
 * for example abc1, abc1b, 25bcd.
 * JP Martin, Oct'04
 **/
public class Test implements Comparator<String[]> {

    // Compares two pre-split strings chunk by chunk; if one array is a
    // prefix of the other, the shorter one sorts first.
    public int compare(String[] l, String[] r) {
        for (int i = 0; i < l.length; i++) {
            if (i >= r.length) return 1;
            int aux = compareStr(l[i], r[i]);
            if (aux != 0) return aux;
        }
        if (r.length > l.length) return -1;
        return 0;
    }

    // Compares two chunks numerically if both parse as numbers,
    // otherwise lexicographically.
    public int compareStr(String l, String r) {
        try {
            return Double.compare(Double.parseDouble(l), Double.parseDouble(r));
        } catch (NumberFormatException e) {
            return l.compareTo(r);
        }
    }

    // Joins the chunks back into the original string.
    public static String collapse(String[] x) {
        StringBuilder aux = new StringBuilder();
        for (int i = 0; i < x.length; i++)
            aux.append(x[i]);
        return aux.toString();
    }

    // Splits e.g. "cbr1a" into {"cbr", "1", "a"}: alternating runs of
    // non-digits and digits, in their original order.
    public static String[] splitNumbers(String x) {
        // Strings with no digits, or only digits, are a single chunk
        // (the merge below would index past the end for them).
        if (!x.matches(".*\\d.*") || x.matches("\\d+")) return new String[] { x };
        String onlyNumbers = x.replaceAll("\\D+", ":");
        String onlyLetters = x.replaceAll("\\d+", ":");
        String[][] y = new String[2][];
        y[0] = onlyNumbers.split(":");
        y[1] = onlyLetters.split(":");
        int s = 0;
        String[] ret = new String[y[0].length + y[1].length - 1];
        if (onlyNumbers.startsWith(":")) s = 1;  // string starts with a non-digit
        int j = 0;
        // Interleave the two lists, starting with whichever kind comes first.
        for (int i = 0; i < y[s].length; i++) {
            ret[j++] = y[s][i];
            if (y[s ^ 1].length > i + 1) ret[j++] = y[s ^ 1][i + 1];
        }
        return ret;
    }

    public static void main(String[] argv) {
        String[] str = {"cbr10", "cbr2", "cbr1a", "cbr3",
                        "cbr25", "cbr1", "cbr1b"};
        // after we sort this list we'll show:
        // cbr1 cbr1a cbr1b cbr2 cbr3 cbr10 cbr25

        String[][] split = new String[str.length][];

        for (int i = 0; i < str.length; i++)
            split[i] = splitNumbers(str[i]);

        Arrays.sort(split, new Test());

        for (int i = 0; i < split.length; i++) {
            System.out.print(collapse(split[i]) + " ");
        }
        System.out.println();
    }
}
 
marcus

Bro
This is incredibly difficult to do correctly -- M$ is constantly
tweaking its sort technology to meet human expectations, but the
expectations themselves shift. The trouble is anticipating whether xm002d
should come between xm002 and xm003, or between xm009b and xm002e.

The natural sort package listed above looks promising, but I believe you
need to thoroughly understand your (or your client's) expectations
before launching into this type of task.

BTW, this is not a Java issue but a human-interaction issue.

-- clh
 
Alan Moore

Funny, when I was working on this problem a while back, I became
convinced that MS introduced "intuitive" sorting just to make my life
difficult. ;)

I know what you mean about expectations, though. While I was working
on it, I was surprised to learn that, if a filename has spaces in it,
the number of spaces is significant. That is, if two filenames have
the same prefix except that one has two spaces where the other has
only one, the one with two spaces sorts first, no matter what comes
after the spaces. I know that's the standard asciibetical sorting
scheme, but I had just assumed they would treat spaces similarly to
numbers. I mean, one space or ten, it just looks like emptiness, and
how can one bit of nothing be more significant than another?

That seems to be the approach Pool took, but it isn't how Windows
Explorer works. Another difference is that, in Explorer, all
punctuation characters sort before numbers (which sort before
letters). In Paour's code, it looks like characters other than digits
and spaces are simply compared asciibetically. So there's a big
difference between Pool's "natural" sort order and MS's "intuitive"
ordering. Of course, the OP never said he wanted to emulate Windows
Explorer, so it probably doesn't matter.
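To make the difference concrete, here is a sketch of a comparator that ranks character classes the way Explorer is described above (punctuation and whitespace before digits, digits before letters) and also treats any run of whitespace as a single space, which is the behaviour that had been expected. CharClassOrder is a hypothetical name; this illustrates those two rules only, it is not a reproduction of Explorer's actual algorithm, and digit runs here are compared character by character rather than as numbers.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical comparator illustrating two ordering rules:
// 1) character classes: whitespace/punctuation < digits < letters;
// 2) any run of whitespace counts as a single space.
final class CharClassOrder implements Comparator<String> {
    // 0 = whitespace, punctuation or other; 1 = ASCII digit; 2 = letter
    private static int rank(char c) {
        if (c >= '0' && c <= '9') return 1;
        if (Character.isLetter(c)) return 2;
        return 0;
    }

    // Collapses each run of whitespace to a single space.
    private static String collapseSpaces(String s) {
        return s.replaceAll("\\s+", " ");
    }

    @Override
    public int compare(String x, String y) {
        String a = collapseSpaces(x), b = collapseSpaces(y);
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            char ca = a.charAt(i), cb = b.charAt(i);
            if (ca == cb) continue;
            int byClass = Integer.compare(rank(ca), rank(cb));
            if (byClass != 0) return byClass;
            int byChar = Character.compare(Character.toLowerCase(ca), Character.toLowerCase(cb));
            if (byChar != 0) return byChar;
            return Character.compare(ca, cb); // same letter, different case
        }
        return Integer.compare(a.length(), b.length()); // shorter prefix first
    }

    public static void main(String[] args) {
        String[] names = {"file b", "file_1", "file  a", "file1", "fileA"};
        Arrays.sort(names, new CharClassOrder());
        System.out.println(Arrays.toString(names));
    }
}
```

With these rules "file  a" (two spaces) and "file a" compare as equal, and "file_1" sorts before "file1" because '_' outranks nothing in the digit class.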
 
marcus

Well -- whenever I can prompt a newbie into thinking a little deeper
about a project I feel I've done my duty. This one is a bottomless pit,
though, and leads to bizarre innovations like little paper-clip men
popping up and quipping "is this sort no-break-space sensitive?"
 
