sorting strings with embedder digits

T

tony

Currently I need to sort a list of filenames such as 'track 11.mp3',
'track 4.mp3'. My problem is that a simple sort will put track 11 first
because it's alphabetical. My approach seems tortuous - split around
any digits, then from there build an array holding each part in turn.
Once that is done, you can compare the array entries and where digits
are involved, compare them numerically.

Does anyone know of a neater way to do this?

Tony
 
D

Donkey Hottie

Currently I need to sort a list of filenames such as 'track 11.mp3',
'track 4.mp3'. My problem is that a simple sort will put track 11 first
because it's alphabetical. My approach seems tortuous - split around
any digits, then from there build an array holding each part in turn.
Once that is done, you can compare the array entries and where digits
are involved, compare them numerically.

Does anyone know of a neater way to do this?

Sure. A custom Comparator. Should be quite straightforward.
 
T

Tom Anderson

Sure. A custom Comparator. Should be quite straightforward.

It'll have to do the same work under the hood, though, so i don't see why
it would be any more straightforward than what Tony is currently doing.
Still, it's a way of packaging the logic, and it might be neater than
however he currently has it packaged.

I'd suggest implementing a Collator rather than a Comparator, so you can
make CollationKeys, and so avoid having to repeat the analysis of the
strings for each comparison. Having written your FilenameCollator, you can
do:

public void sortUsingCollator(List<String> strings, Collator collator) {
CollationKey[] keys = new CollationKey[strings.size()];
for (int i = 0; i < keys.length; ++i) {
keys = collator.getCollationKey(string);
}
Arrays.sort(keys);
for (int i = 0; i < keys.length; ++i) {
strings.set(i, keys.getSourceString());
}
}

It's a shame there isn't a method to do this in the standard library (is
there?), since it's absolutely generic.

tom
 
R

Roedy Green

Currently I need to sort a list of filenames such as 'track 11.mp3',
'track 4.mp3'. My problem is that a simple sort will put track 11 first
because it's alphabetical. My approach seems tortuous - split around
any digits, then from there build an array holding each part in turn.
Once that is done, you can compare the array entries and where digits
are involved, compare them numerically.

see the Comparator in http://mindprod.com/products1.html#INI
 
E

Eric Sosman

bugbear said:
Still fails; if the preceding text fields are different lengths,
the digits are in different positions and don't "play nice"

If the preceding text fields are of different lengths, the
two Strings disagree before the values of the digit strings need
be considered. "XYZ10" vs. "XY9" never needs to worry about
"10" vs. "9". It might worry about "Z" vs. "9" or about "Z"
vs 9, but that's all.

More directly to the O.P.'s question, see

http://sourcefrog.net/projects/natsort/

(yes, that's "frog" and not "forge"), where there's C code to
compare strings this way. There's also a link to a Java version

http://pierre-luc.paour.9online.fr/NaturalOrderComparator.java

.... but I haven't used it myself.
 
B

Bent C Dalager

Currently I need to sort a list of filenames such as 'track 11.mp3',
'track 4.mp3'. My problem is that a simple sort will put track 11 first
because it's alphabetical. My approach seems tortuous - split around
any digits, then from there build an array holding each part in turn.
Once that is done, you can compare the array entries and where digits
are involved, compare them numerically.

Does anyone know of a neater way to do this?

I wrote something like this once, I have put it here:
http://www.pvv.org/~bcd/StringNumberComparator.java

Cheers,
Bent D
 
T

tony

I wrote something like this once, I have put it here:
http://www.pvv.org/~bcd/StringNumberComparator.java

Cheers,
Bent D

I'd actually got as far as using a custom comparator. I didn't want to
assume that the initial text was the same in each case, so the approach
was:
1: If there were no digits, just use the String compareTo otherwise
2: split each string with split("[0-9]") and remove any empty entries.
3: rebuild the array inserting any digit sequences by finding their
position, extracting them and putting hem in the right place
4: Compare the matching array elements until a decision could be made.
The digit strings are identified and compared by parsing as integers.

There were a few hiccups but it seems to work. I've modified it slightly
to have one for files using filenames as well.

Tony
 
N

Nigel Wade

I wrote something like this once, I have put it here:
http://www.pvv.org/~bcd/StringNumberComparator.java

Cheers,
Bent D

I'd actually got as far as using a custom comparator. I didn't want to
assume that the initial text was the same in each case, so the approach
was:
1: If there were no digits, just use the String compareTo otherwise 2:
split each string with split("[0-9]") and remove any empty entries. 3:
rebuild the array inserting any digit sequences by finding their
position, extracting them and putting hem in the right place 4: Compare
the matching array elements until a decision could be made. The digit
strings are identified and compared by parsing as integers.

There were a few hiccups but it seems to work. I've modified it
slightly to have one for files using filenames as well.

Tony

I would have thought it would be much simpler and easier to use a regular
expression and a match group. Something like:

Pattern p = Pattern.compile(".*?(\\d+).*");
Matcher m = p.matcher(yourString);
if ( m.matches() ) {
// The Pattern matched yourString, and contains a set of digits.
// m.group(1) should contain the matched string of digits.
}

You can tweak the RE to get a more accurate match if it's not exactly as
required for your string contents.
 
E

Eric Sosman

Nigel said:
Currently I need to sort a list of filenames such as 'track 11.mp3',
'track 4.mp3'. My problem is that a simple sort will put track 11
first because it's alphabetical. My approach seems tortuous - split
around any digits, then from there build an array holding each part in
turn. Once that is done, you can compare the array entries and where
digits are involved, compare them numerically.

Does anyone know of a neater way to do this?
I wrote something like this once, I have put it here:
http://www.pvv.org/~bcd/StringNumberComparator.java

Cheers,
Bent D
I'd actually got as far as using a custom comparator. I didn't want to
assume that the initial text was the same in each case, so the approach
was:
1: If there were no digits, just use the String compareTo otherwise 2:
split each string with split("[0-9]") and remove any empty entries. 3:
rebuild the array inserting any digit sequences by finding their
position, extracting them and putting hem in the right place 4: Compare
the matching array elements until a decision could be made. The digit
strings are identified and compared by parsing as integers.

There were a few hiccups but it seems to work. I've modified it
slightly to have one for files using filenames as well.

Tony

I would have thought it would be much simpler and easier to use a regular
expression and a match group. Something like:

Pattern p = Pattern.compile(".*?(\\d+).*");
Matcher m = p.matcher(yourString);
if ( m.matches() ) {
// The Pattern matched yourString, and contains a set of digits.
// m.group(1) should contain the matched string of digits.
}

You can tweak the RE to get a more accurate match if it's not exactly as
required for your string contents.

"file12.ver3"
"file12.ver10"
 
R

Roedy Green


for alpha-numeric preprocess the keys into an alpha and numeric key.

/**
* Compare alphabetically with numeric tie breaker.
* Defines default the sort order for Item Objects.
* Compare this Item with another Item.
* Compares alphaKey then numericKey.
* Informally, returns (this-other) or +ve if this is more
positive than other.
*
* @param other other Item to compare with this one
*
* @return +ve if this&gt;other, 0 if this==other, -ve if
this&lt;other
*/
public final int compareTo( Item other )
{
int diff = this.alphaKey.compareToIgnoreCase(
other.alphaKey );
if ( diff != 0 )
{
return diff;
}
return Long.signum( this.numericKey - other.numericKey );
}
 
T

Tom Anderson

return Long.signum( this.numericKey - other.numericKey );

Never do that! Consider this.numericKey = Long.MIN_VALUE and
other.numericKey = 1 - the answer should be negative, because the former
is less than the latter, but it will be positive.

Or are the numericKeys ints, and you're thinking that putting the
subtraction inside the call to signum will coax them to long, thus
avoiding this problem? It doesn't. You could achieve this by casting one
of the operands to long, though.

However, i suggest doing:

return new Long(this.numericKey).compareTo(other.numericKey);

And trusting that the optimiser will flatten out the boxing and the method
call.

tom
 
L

Lew

Tom said:
return new Long(this.numericKey).compareTo(other.numericKey);

And trusting that the optimiser will flatten out the boxing and the
method call.

Or you can use Long.valueOf().
 
J

John B. Matthews

[QUOTE="Lew said:
return new Long(this.numericKey).compareTo(other.numericKey);

And trusting that the optimiser will flatten out the boxing and the
method call.

Or you can use Long.valueOf().[/QUOTE]

Reflecting the 1.5+ API, FindBugs elaborates: "Using [a constructor] is
guaranteed to always result in a new object, whereas valueOf() allows
caching of values to be done by the compiler, class library, or JVM.
Using cached values avoids object allocation and the code will be
faster. Values between -128 and 127 are guaranteed to have
corresponding cached instances, and using valueOf is approximately 3.5
times faster than using the constructor. For values outside the
constant range, the performance of both styles is the same. Unless the
class must be compatible with JVMs predating Java 1.5, use either
autoboxing or the valueOf() method when creating instances of Long,
Integer, Short, Character, and Byte."

To mitigate the Comparator overhead mentioned by Tom, I thought to wrap
the name in a class that implements Comparable:

<code>
import java.util.*;
import java.util.regex.Pattern;

/** @author John B. Matthews */
public class ComparableTest {

public static void main(String[] args) {
List<TrackName> list = new ArrayList<TrackName>(Arrays.asList(
new TrackName("track 10.mp3"),
new TrackName("title-1.ogg"),
new TrackName("name5.raw"),
new TrackName("file.mp3"),
new TrackName("dreck 9876543210.wav")
));
print(list, Sort.Alphabetic);
print(list, Sort.Numeric);
}

private static void print(List<TrackName> list, Sort sort) {
TrackName.sort = sort;
Collections.sort(list);
for (TrackName name : list) {
System.out.println(name);
}
}

private enum Sort { Alphabetic, Numeric; }

/** NB: this numeric ordering is inconsistent with equals. */
private static class TrackName implements Comparable<TrackName> {

private static final Pattern p = Pattern.compile("\\D+");
private static Sort sort = Sort.Numeric;
private final String name;
private Integer value = Integer.MIN_VALUE;

TrackName(String name) {
this.name = name;
Scanner s = new Scanner(trimExt(name.trim()));
try {
s.skip(p);
this.value = s.nextInt();
} catch (InputMismatchException e) {
System.err.println("Bad track: " + name);
value = Integer.MAX_VALUE;
} catch (NoSuchElementException e) {
System.err.println("No track: " + name);
}
}

private String trimExt(String s) {
int i = s.lastIndexOf('.');
return s.substring(0, i < 0 ? s.length() : i);
}

@Override
public int compareTo(TrackName o) {
if (TrackName.sort == Sort.Numeric) {
return this.value.compareTo(o.value);
} else {
return this.name.compareTo(o.name);
}
}

@Override
public String toString() {
return name;
}
}
}
</code
 
N

Nigel Wade

Nigel Wade wrote:

"file12.ver3"
"file12.ver10"

Which can be matched correctly by tweaking the RE, as I said. The RE I
provided was nothing more than an example. Given that I don't know what
the filename specification is, I can hardly provide a definitive RE, can
I?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,764
Messages
2,569,564
Members
45,040
Latest member
papereejit

Latest Threads

Top