Sorting strings with characters and numbers

Carsten Zerbst

Hello,

I need to sort some strings in the order produced by Tcl's lsort
command with the -dictionary option:

% lsort -dictionary {a1 a2 a3 a4 a10 a20 a30}
a1 a2 a3 a4 a10 a20 a30


In Java I get something like this:

bsh % print(l);
[a1, a2, a3, a10, a20, a30]
bsh % Collections.sort(l);
bsh % print(l);
[a1, a10, a2, a20, a3, a30]
bsh %

I looked at the RuleBasedCollator but found no way to achieve this
ordering. As this is a standard problem, there must be a solution
available somewhere?

Thanks, Carsten
 
Marko Lahma

bsh % print(l);
[a1, a2, a3, a10, a20, a30]
bsh % Collections.sort(l);
bsh % print(l);
[a1, a10, a2, a20, a3, a30]
bsh %

The brute-force way would be to write a java.util.Comparator for String
objects that sorts according to your custom needs (RuleBasedCollator
implements that interface). The example you gave would be easy if all
words just end with a numerical value.
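
A minimal sketch of such a Comparator, assuming every string is a non-digit prefix optionally followed by one trailing number. The class name TrailingNumberComparator is made up for illustration and is not part of the JDK.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of the JDK: orders strings such as "a10"
// by their prefix first and by their trailing number second.
class TrailingNumberComparator implements Comparator<String> {

    // Index where the trailing run of digits starts (s.length() if there is none).
    private static int digitStart(String s) {
        int i = s.length();
        while (i > 0 && Character.isDigit(s.charAt(i - 1))) {
            i--;
        }
        return i;
    }

    @Override
    public int compare(String a, String b) {
        int ia = digitStart(a);
        int ib = digitStart(b);
        int byPrefix = a.substring(0, ia).compareTo(b.substring(0, ib));
        if (byPrefix != 0) {
            return byPrefix;
        }
        String na = a.substring(ia);
        String nb = b.substring(ib);
        // A missing number sorts before any number.
        if (na.isEmpty() || nb.isEmpty()) {
            return Integer.compare(na.length(), nb.length());
        }
        int byNumber = new BigInteger(na).compareTo(new BigInteger(nb));
        // Tie-break on the raw text so that "a01" and "a1" are not reported equal.
        return byNumber != 0 ? byNumber : a.compareTo(b);
    }

    public static void main(String[] args) {
        List<String> l = new ArrayList<>(Arrays.asList("a30", "a1", "a20", "a2", "a10", "a3"));
        Collections.sort(l, new TrailingNumberComparator());
        System.out.println(l); // [a1, a2, a3, a10, a20, a30]
    }
}
```

Strings with digits in the middle (for example "a1b2") only have their last number compared numerically, so this is not a full replacement for lsort -dictionary.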

I don't think RuleBasedCollator would be the right solution anyway. Maybe
you could even port Tcl's lsort to Java and share it! ;)

-Marko
 
Roedy Green

I need to sort some strings in the order produced by Tcl's lsort
command with the -dictionary option:

I have no idea what Tcl's lsort is, but given your European name, I
will guess your problem is that you need to sort alphabetically,
putting the accented letters in a different place than Unicode would
naturally place them.


see http://mindprod.com/jgloss/sort.html

particularly the reference to java.text.Collator and
java.text.CollationKey
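
A rough sketch of how those two classes fit together, assuming German rules are what is wanted. The class and method names are made up for illustration. As far as I know the JDK Collator compares digits character by character, so this handles the accented letters but not the a2-before-a10 ordering.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo class: sorts words with the JDK's German collation rules.
class GermanCollationSketch {

    static String[] sortGerman(String[] words) {
        Collator collator = Collator.getInstance(Locale.GERMAN);

        // Build each CollationKey once, so the sort compares precomputed keys
        // instead of re-analysing the strings on every comparison.
        CollationKey[] keys = new CollationKey[words.length];
        for (int i = 0; i < words.length; i++) {
            keys[i] = collator.getCollationKey(words[i]);
        }
        Arrays.sort(keys); // CollationKey implements Comparable<CollationKey>

        String[] sorted = new String[words.length];
        for (int i = 0; i < keys.length; i++) {
            sorted[i] = keys[i].getSourceString();
        }
        return sorted;
    }

    public static void main(String[] args) {
        // "\u00c4pfel" is "Äpfel", written as an escape to avoid encoding trouble.
        String[] words = { "Zebra", "\u00c4pfel", "Birne" };

        String[] plain = words.clone();
        Arrays.sort(plain); // plain UTF-16 order puts \u00c4 after Z
        System.out.println(Arrays.toString(plain));

        System.out.println(Arrays.toString(sortGerman(words)));
    }
}
```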
 
Roedy Green

% lsort -dictionary {a1 a2 a3 a4 a10 a20 a30}
a1 a2 a3 a4 a10 a20 a30

Perhaps what you really want to do is split each field in two, and
sort alphabetically on the alpha part and numerically on the numeric
part. It would be fastest to do this split before the sort starts
rather than on every compare.
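
Something along these lines, as a sketch only. It assumes each string is an alpha part followed by nothing but digits, and the Key class and its field names are invented for the example.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class PreSplitSortSketch {

    // Hypothetical key type: one string split once into alpha part and number.
    static final class Key implements Comparable<Key> {
        final String original;
        final String alpha;
        final BigInteger number;

        Key(String s) {
            int i = 0;
            while (i < s.length() && !Character.isDigit(s.charAt(i))) {
                i++;
            }
            original = s;
            alpha = s.substring(0, i);
            // Assumption: the rest is all digits (BigInteger throws otherwise).
            // -1 marks "no number" and therefore sorts before every real number.
            number = i < s.length() ? new BigInteger(s.substring(i)) : BigInteger.valueOf(-1);
        }

        @Override
        public int compareTo(Key other) {
            int c = alpha.compareTo(other.alpha);
            if (c == 0) {
                c = number.compareTo(other.number);
            }
            // Raw text as the last tie-break, e.g. for "a01" versus "a1".
            return c != 0 ? c : original.compareTo(other.original);
        }
    }

    static List<String> sort(List<String> input) {
        List<Key> keys = new ArrayList<>();
        for (String s : input) {
            keys.add(new Key(s)); // the split happens once per string, before the sort
        }
        Collections.sort(keys);
        List<String> sorted = new ArrayList<>();
        for (Key k : keys) {
            sorted.add(k.original);
        }
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sort(Arrays.asList("a30", "a4", "a10", "a1", "a20", "a3", "a2")));
        // [a1, a2, a3, a4, a10, a20, a30]
    }
}
```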
 
Carsten Zerbst

Hello,

For the record, this is the Collator implementation I wrote.

Bye, Carsten
=================


public int compare( String source, String target ) {
    // In most code pages ß comes before ä, ü, ö, which gives the wrong
    // order. Replace it by ss for the comparison.
    source = source.replaceAll( "ß", "ss" );
    target = target.replaceAll( "ß", "ss" );

    // ä equals ae
    source = source.replaceAll( "ä", "ae" );
    source = source.replaceAll( "Ä", "Ae" );
    target = target.replaceAll( "ä", "ae" );
    target = target.replaceAll( "Ä", "Ae" );

    // ö equals oe
    source = source.replaceAll( "ö", "oe" );
    source = source.replaceAll( "Ö", "Oe" );
    target = target.replaceAll( "ö", "oe" );
    target = target.replaceAll( "Ö", "Oe" );

    // ü equals ue
    source = source.replaceAll( "\u00fc", "ue" );
    source = source.replaceAll( "\u00dc", "Ue" );
    target = target.replaceAll( "\u00fc", "ue" );
    target = target.replaceAll( "\u00dc", "Ue" );

    if ( source.equals( target ) ) {
        return 0;
    }

    // compare char by char until the first difference occurs
    int ls = source.length( );
    int lt = target.length( );
    int index = 0;

    while ( index < ls && index < lt && source.charAt( index ) == target.charAt( index ) ) {
        index++;
    }

    // reached end of one string ? the shorter one comes first
    if ( index == ls ) {
        return -10 - index;
    }

    if ( index == lt ) {
        return 10 + index;
    }

    // look at the remaining difference
    char sDiffChar = source.charAt( index );
    char tDiffChar = target.charAt( index );

    // both are letters, compare using unicode
    if ( Character.isLetter( sDiffChar ) && Character.isLetter( tDiffChar ) ) {
        return ( sDiffChar < tDiffChar ) ? ( -100 ) : 100;
    }

    // both are digits, compare the complete integers around the difference
    if ( Character.isDigit( sDiffChar ) && Character.isDigit( tDiffChar ) ) {
        // the digits before index are identical, step back to the start of the number
        int start = index;
        while ( start > 0 && Character.isDigit( source.charAt( start - 1 ) ) ) {
            start--;
        }

        int sEnd = index;
        while ( sEnd < ls && Character.isDigit( source.charAt( sEnd ) ) ) {
            sEnd++;
        }

        int tEnd = index;
        while ( tEnd < lt && Character.isDigit( target.charAt( tEnd ) ) ) {
            tEnd++;
        }

        // BigInteger, so that long digit runs cannot overflow
        java.math.BigInteger snumber = new java.math.BigInteger( source.substring( start, sEnd ) );
        java.math.BigInteger tnumber = new java.math.BigInteger( target.substring( start, tEnd ) );
        int byNumber = snumber.compareTo( tnumber );

        if ( byNumber != 0 ) {
            return ( byNumber < 0 ) ? ( -10000 ) : 10000;
        }
        // same value (e.g. 01 and 1), fall through to the unicode comparison
    }

    // one is digit, one is letter, digit first
    if ( Character.isLetterOrDigit( sDiffChar ) && Character.isLetterOrDigit( tDiffChar )
            && Character.isDigit( sDiffChar ) != Character.isDigit( tDiffChar ) ) {
        return Character.isDigit( sDiffChar ) ? ( -1000 ) : 1000;
    }

    // anything else, compare using unicode
    return ( sDiffChar < tDiffChar ) ? ( -10000 ) : 10000;
}
 
