Sorting alphanumeric

Bob

Sorting the following alphanumerics using myArray.sort():

04-273-0001
04-272-0001
04-272-0003
04-272-0001
04-273-0001

Results in:

04-272-0001
04-272-0001
04-273-0001
04-273-0001
04-272-0003 <--
^

I cannot assume a standard format, nor can I assume what alphanumeric
characters might be in the array. How do I sort this accurately when
these values could contain any alphanumeric character? Is this
possible without creating a crazy hierarchy of characters and exception
rules?

Thanks.
Bob - Cerner Corp.
 
Evertjan.

Bob wrote on 08 dec 2004 in comp.lang.javascript:
Sorting the following alphanumerics using myArray.sort():

04-273-0001
04-272-0001
04-272-0003
04-272-0001
04-273-0001

Results in:

04-272-0001
04-272-0001
04-273-0001
04-273-0001
04-272-0003 <--

That is numeric sorting:

4-272-1 = -269
4-273-1 = -270
4-272-3 = -271
 
Mick White

Bob said:
Sorting the following alphanumerics using myArray.sort():

04-273-0001
04-272-0001
04-272-0003
04-272-0001
04-273-0001

Results in:

04-272-0001
04-272-0001
04-273-0001
04-273-0001
04-272-0003 <--
^

I cannot assume a standard format, nor can I assume what alphanumeric
characters might be in the array. How do I sort this accurately when
these values could contain any alphanumeric character? Is this
possible without creating a crazy hierarchy of characters and exception
rules?

You'll have to roll your own sort function:
A= ["04-273-0001","04-272-0001","04-272-0003","04-272-0001","04-273-0001"]

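function bobSort(a,b){
// Compare the first dash-separated field as numbers
var c=Number(a.split("-")[0])
var d=Number(b.split("-")[0])
if (c==d){
// Tie on the first field: fall back to the second
c=Number(a.split("-")[1])
d=Number(b.split("-")[1])
}
if (c==d){
// Tie on the second field too: fall back to the third
c=Number(a.split("-")[2])
d=Number(b.split("-")[2])
}
return c-d
}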

alert(A.sort(bobSort))

The following format may be superior, though:

c=parseInt(a.split("-")[0],10)
d=parseInt(b.split("-")[0],10)

....

Mick
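
The field-by-field idea above can be sketched for any number of dash-separated fields. This is only an illustration, not Mick's code: compareDashedFields is a made-up name, and the sketch assumes every field contains nothing but decimal digits (a non-numeric field would make parseInt return NaN and the comparison meaningless).

```javascript
// Hypothetical helper: compares dash-separated strings field by field.
// Assumption: every field consists only of decimal digits.
function compareDashedFields(a, b) {
  var partsA = a.split("-");
  var partsB = b.split("-");
  var n = Math.min(partsA.length, partsB.length);
  for (var i = 0; i < n; i++) {
    // An explicit radix of 10 keeps leading zeros from being read as octal.
    var diff = parseInt(partsA[i], 10) - parseInt(partsB[i], 10);
    if (diff !== 0) {
      return diff;
    }
  }
  // All shared fields are equal: the value with fewer fields goes first.
  return partsA.length - partsB.length;
}

var dashedInput = ["04-273-0001", "04-272-0001", "04-272-0003", "04-272-0001", "04-273-0001"];
var dashedSorted = dashedInput.slice().sort(compareDashedFields);
console.log(dashedSorted.join("\n"));
```

The loop replaces the repeated if (c==d) blocks, so the same function copes with two, three or more fields.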
 
RobG

Mick White wrote:
[...]
You'll have to roll your own sort function:
[...]

A good start Mick that got me thinking. I can't believe JavaScript
doesn't have a generic sort that works on alpha-numeric strings. So I
had a hack at your code and came up with what's below. My contribution
to the world is to kick off a generic sort function.

Whether numbers sort ahead of non-numbers can be modified to suit by
making all comparisons using ASCII codes, or by changing the charCodeAt
lines slightly to add or subtract a constant, or multiply by -1.

My modification of your script handles any format string. To handle
numbers and non-numbers, I change non-numbers to their ASCII code and
compare that to single digits. Not great, but it does sort OK - caveat
below.

If the sort runs out of characters to compare, it should put the
shortest one ahead of the longest. Different browsers require a
different return value: Safari needs -1, Firefox needs 0. I don't know
how to discriminate using feature detection - or should I be returning
something else?

Also, this causes a difference in the sort order for different browsers
(arggghh).

I got it this far, over to the gurus. An improvement would be to have
two sorts: sortAsNum() and sortAsChar().

Test results (all on Mac):
Safari: fine
Camino: fine
Firefox: need to change return -1 to return 0
IE 5.2: fails
Netscape: need to change return -1 to return 0
Opera: Sometimes needs -1, sometimes 0 depending on whether some
entries start with alphas or not.

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<HTML>
<HEAD><title>Sort play</title>
<script type="text/javascript">
function shortestOf(a,b) {
return (a.length <= b.length)? a.length:b.length;
}
function bobSort(a,b){
var z = shortestOf(a,b);
for (var i=0; i<shortestOf(a,b); i++) {
var c = a.split('')[i];
var d = b.split('')[i];

if ( c != d ) {
c = (isNaN(c))? c.charCodeAt(0):c;
d = (isNaN(d))? d.charCodeAt(0):d;
var x = c-d;
return c - d;
}

}
return -1;
}


function saySort(inp) {
var p = inp.split('\n');
alert(p);
alert(p.sort(bobSort).join('\n'));
}
</script>

</HEAD>
<BODY>
<form action="">
<textarea cols="40" rows="20" name="stuff">04-273-0005
040-272-0001
040-272-0003
040-272-0001
04a-2y2-00c0
04-273-0001
04a-222-00a0
04a-222-00b0
04a-222-00b1
04z-2x2-00a0
04a-2x2-00a0
04a-2y2-00a0
04a-2y2-00c0
04a-2y2-00b0
04a
0ba-cc2&&8##y2-00b0
</textarea>
<input type="button" value="saySort()" onclick="
saySort(this.form.stuff.value);
"> <input type="reset">
</form>
</BODY>
</HTML>
 
Mick White

RobG said:
Mick White wrote:
[...]
You'll have to roll your own sort function:
[...]

[snip]


<script type="text/javascript">
function shortestOf(a,b) {
return (a.length <= b.length)? a.length:b.length;
}
function bobSort(a,b){
var z = shortestOf(a,b);
for (var i=0; i<shortestOf(a,b); i++) {
var c = a.split('')[i];
var d = b.split('')[i];

if ( c != d ) {
c = (isNaN(c))? c.charCodeAt(0):c;
d = (isNaN(d))? d.charCodeAt(0):d;
var x = c-d;
return c - d;
}

}
return -1;
}


....
I didn't read the OP's post carefully, in which he did mention alpha
characters. I'll run some tests on your code, but it seems as if I have
the same battery of browsers as you do. What I've done in the past is to
isolate alphas from numeric:
http://www.mickweb.com/football/aleague/profiles.html (Power Ratings)
I use sortNumerical2(), which assigns an arbitrary low number to the
variables text1 and text2 for non-numeric array entries (c & d in the
OP's case).

Mick
 
Dr John Stockton

JRS: In article <[email protected]>,
dated Wed, 8 Dec 2004 11:52:38, seen in Bob
Sorting the following alphanumerics using myArray.sort():

04-273-0001
04-272-0001
04-272-0003
04-272-0001
04-273-0001

Results in:

04-272-0001
04-272-0001
04-273-0001
04-273-0001
04-272-0003 <--
^

I cannot assume a standard format, nor can I assume what alphanumeric
characters might be in the array. How do I sort this accurately when
these values could contain any alphanumeric character? Is this
possible without creating a crazy hierarchy of characters and exception
rules?

You do not say whether your array has been loaded with strings or with
numerical expressions; that is most important. Neither do you say what
system(s) you are testing on.

As strings, they should sort lexically to :
04-272-0001,04-272-0001,04-272-0003,04-273-0001,04-273-0001
As expressions, by value to :
-269,-269,-270,-270,-271
The result you show should not, IMHO, be obtained.
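
The two cases can be checked with a short sketch. The variable names are made up for illustration, and the leading zeros are dropped from the numeric expressions so that they are not read as legacy octal literals:

```javascript
// The entries loaded as strings.
var asStrings = ["04-273-0001", "04-272-0001", "04-272-0003", "04-272-0001", "04-273-0001"];
// The same entries written as arithmetic expressions (leading zeros dropped).
var asNumbers = [4 - 273 - 1, 4 - 272 - 1, 4 - 272 - 3, 4 - 272 - 1, 4 - 273 - 1];

// With no comparator, sort() compares the string form of each element,
// even when the elements are numbers.
var sortedStrings = asStrings.slice().sort();
var sortedNumbers = asNumbers.slice().sort();
// A numeric comparator orders the same numbers by value instead.
var sortedByValue = asNumbers.slice().sort(function (x, y) { return x - y; });

console.log(sortedStrings.join(","));
console.log(sortedNumbers.join(","));
console.log(sortedByValue.join(","));
```

Neither default ordering puts 04-272-0003 last among strings, which supports the point that the posted result is unexpected.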
 
RobG

Mick said:
RobG said:
Mick White wrote:
[...]
You'll have to roll your own sort function:

[...]

[snip]



<script type="text/javascript">
function shortestOf(a,b) {
return (a.length <= b.length)? a.length:b.length;
}
function bobSort(a,b){
var z = shortestOf(a,b);
for (var i=0; i<shortestOf(a,b); i++) {
var c = a.split('')[i];
var d = b.split('')[i];

if ( c != d ) {
c = (isNaN(c))? c.charCodeAt(0):c;
d = (isNaN(d))? d.charCodeAt(0):d;
var x = c-d;
return c - d;
}

}
return -1;
}


....
I didn't read the OP's post carefully, in which he did mention alpha
characters. I'll run some tests on your code, but it seems as if I have
the same battery of browsers as you do. What I've done in the past is to
isolate alphas from numeric:
http://www.mickweb.com/football/aleague/profiles.html (Power Ratings)
I use sortNumerical2(), which assigns an arbitrary low number to the
variables text1 and text2 for non-numeric array entries (c & d in the
OP's case).


An interesting approach. BTW, if you make your DOB ISO 8601, they will
sort much better (e.g. 01/21/1980 becomes 1980-01-21, a fairly trivial
conversion I think). The dates will then sort properly as either chars
or numbers (noting that all single-digit numbers must be zero-padded to
two digits). You could use the conversion just for the sort, then put
them back as "US dates" if that makes your users more comfortable.
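
A rough sketch of that conversion, assuming the input is always month/day/four-digit-year separated by slashes (usDateToIso and pad2 are made-up names, and no validation is attempted):

```javascript
// Hypothetical helper: pads a one-character string to two with a leading zero.
function pad2(s) {
  return s.length < 2 ? "0" + s : s;
}

// Hypothetical helper: converts "MM/DD/YYYY" to "YYYY-MM-DD".
// Assumption: the input is well formed; nothing is validated here.
function usDateToIso(usDate) {
  var parts = usDate.split("/");
  return parts[2] + "-" + pad2(parts[0]) + "-" + pad2(parts[1]);
}

var usDates = ["01/21/1980", "12/05/1979", "3/7/1980"];
var isoDates = [];
for (var k = 0; k < usDates.length; k++) {
  isoDates.push(usDateToIso(usDates[k]));
}
// With the year first and every part zero-padded, the default string sort
// is also chronological order.
isoDates.sort();
console.log(isoDates.join("\n"));
```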

A small fix to my routine is to change:

return -1;

to

return (a.length-b.length);

The obvious error came to me whilst I was in the shower. It fixes the
"should I use -1 or zero" problem - I was using completely the wrong
logic. My only defence is that it was about midnight on a very long
day.
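
To illustrate why the constant return value misbehaved: a comparator should give opposite signs when its two arguments are swapped, and "return -1" cannot do that. The sketch below uses made-up function names and compares by character code only, to keep it short:

```javascript
// Hypothetical "before" version: falls through to a constant -1.
function comparePrefixBroken(a, b) {
  var n = Math.min(a.length, b.length);
  for (var i = 0; i < n; i++) {
    if (a.charAt(i) !== b.charAt(i)) {
      return a.charCodeAt(i) - b.charCodeAt(i);
    }
  }
  return -1; // claims "a before b" no matter which string is shorter
}

// Hypothetical "after" version: falls through to the length difference.
function comparePrefixFixed(a, b) {
  var n = Math.min(a.length, b.length);
  for (var i = 0; i < n; i++) {
    if (a.charAt(i) !== b.charAt(i)) {
      return a.charCodeAt(i) - b.charCodeAt(i);
    }
  }
  return a.length - b.length; // shorter first, 0 for identical strings
}

// Broken: both orders report "first argument sorts first".
console.log(comparePrefixBroken("04a", "04a-222"), comparePrefixBroken("04a-222", "04a"));
// Fixed: the sign flips when the arguments are swapped.
console.log(comparePrefixFixed("04a", "04a-222"), comparePrefixFixed("04a-222", "04a"));
```

An inconsistent comparator leaves the result up to each browser's sort implementation, which would fit the per-browser differences reported earlier.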

The lines:

var z = shortestOf(a,b);
....
var x = c-d;

can both be ditched, they are remnants of development & debug.

shortestOf is modified to return the shortest string, rather than its
length (that's more logical and useful I think).

Finally, the 'split character' can be passed to the function so that
the calling function can say what the delimiter is (comma, return, ...)

I have tested the new version in IE, Firefox and Netscape in Windows
and all is fine. So if the OP is still watching this thread, here is
a generic "I want to sort anything" routine:

<script type="text/javascript">
/* Returns the shorter of two strings */
function shortestOf(a,b) {
return (a.length <= b.length)? a:b;
}

function bobSort(a,b){
// Only iterate for the length of the shortest string
for (var i=0; i<shortestOf(a,b).length; i++) {
var c = a.split('')[i];
var d = b.split('')[i];
// When we get to the first non-identical character,
// sort on it
if ( c != d ) {
c = (isNaN(c))? c.charCodeAt(0):c;
d = (isNaN(d))? d.charCodeAt(0):d;
return c-d;
}
}
// If get to the end of the shortest string
// and all evaluated chars are the same...
return (a.length-b.length);
}

/* inp is a string of values separated by splitChar */
function saySort(inp,splitChar) {
// splitChar is the array delimiter
var p = inp.split(splitChar);
alert(p.sort(bobSort).join('\n'));
}
</script>
 
RobG

The saga continues...

I was a bit concerned over performance, 300 records takes about 4
seconds to sort on my machine in Firefox, so I did a bit of
optimisation and now 300 records take about 1 second.

Change bobSort to:

function bobSort(a,b){
// Split each string into characters once, outside the loop
var c = a.split('');
var d = b.split('');
// Only iterate for the length of the shortest string
for (var i=0; i<shortestOf(a,b).length; i++) {
// When we get to the first non-identical character,
// sort on it
if ( c[i] != d[i] ) {
var x = (isNaN(c[i]))? c[i].charCodeAt(0):c[i];
var y = (isNaN(d[i]))? d[i].charCodeAt(0):d[i];
return x-y;
}
}
// If get to the end of the shortest string
// and all evaluated chars are the same...
return (a.length-b.length);
}

Making all comparisons on charCode makes almost zero difference to the
time taken (I thought it would be much quicker), but it does make the
if statement really simple:

if ( c[i] != d[i] ) {
return c[i].charCodeAt(0) - d[i].charCodeAt(0);
}

Choose whatever suits. And I'm done. ;-)
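
For what it's worth, the all-charCode variant can be written without splitting the strings at all. The sketch below is an illustration rather than the code above (compareByCharCode is a made-up name); on the sample data from the test page it produces the same order as calling sort() with no comparator, since the default sort also compares strings by character code.

```javascript
// Hypothetical all-charCode comparator, using charCodeAt directly on the strings.
function compareByCharCode(a, b) {
  var n = Math.min(a.length, b.length);
  for (var i = 0; i < n; i++) {
    var diff = a.charCodeAt(i) - b.charCodeAt(i);
    if (diff !== 0) {
      return diff;
    }
  }
  // Every shared character matched: the shorter string goes first.
  return a.length - b.length;
}

// Sample values taken from the textarea in the test page.
var sample = [
  "04-273-0005", "040-272-0001", "040-272-0003", "040-272-0001",
  "04a-2y2-00c0", "04-273-0001", "04a-222-00a0", "04a-222-00b0",
  "04a-222-00b1", "04z-2x2-00a0", "04a-2x2-00a0", "04a-2y2-00a0",
  "04a-2y2-00c0", "04a-2y2-00b0", "04a", "0ba-cc2&&8##y2-00b0"
];

var byCharCode = sample.slice().sort(compareByCharCode);
var byDefault = sample.slice().sort();
console.log(byCharCode.join("\n"));
console.log(byCharCode.join() === byDefault.join());
```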
 
Michael Winter

I haven't been following this thread really - I've been kinda busy
elsewhere in this group - but I will contribute one thing...
I was a bit concerned over performance

[snip]

Well, one way to improve performance is to only call shortestOf once. At
the moment, it's called on *every* iteration of the loop.
for (var i=0; i<shortestOf(a,b).length; i++) {

for(var i = 0, n = shortestOf(a, b).length; i < n; ++i) {

or

for(var i = 0, n = Math.min(a.length, b.length); i < n; ++i) {

[snip]

Mike
 
RobG

Michael Winter wrote:
[...]
for(var i = 0, n = shortestOf(a, b).length; i < n; ++i) {

About halved the execution time. My original code was actually almost
identical but I didn't realise how much it affects performance.
for(var i = 0, n = Math.min(a.length, b.length); i < n; ++i) {

Shaved another 10%. Firefox takes about 2.8 seconds for 1,200 records,
IE about 1.8 seconds. The original takes 25 seconds (or so...).

Times are for comparative purposes only.

Thanks Mike.
 
Mick White

RobG wrote:

[snip]
Shaved another 10%. Firefox takes about 2.8 seconds for 1,200 records,
IE about 1.8 seconds. The original takes 25 seconds (or so...).

Times are for comparative purposes only.

But "040" sorts before "04" ...
Mick
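
A small sketch of why that happens, assuming the comparator treats a digit as its numeric value (0 to 9) and any other character as its character code, as in the routine discussed in this thread. digitValueCompare is a made-up name for a condensed version of that logic:

```javascript
// Hypothetical condensed form of the thread's comparator:
// digits compare as 0..9, all other characters by character code.
function digitValueCompare(a, b) {
  var n = Math.min(a.length, b.length);
  for (var i = 0; i < n; i++) {
    var c = a.charAt(i);
    var d = b.charAt(i);
    if (c !== d) {
      c = isNaN(c) ? c.charCodeAt(0) : c;
      d = isNaN(d) ? d.charCodeAt(0) : d;
      return c - d;
    }
  }
  return a.length - b.length;
}

var pair = ["04-273-0001", "040-272-0001"];
// At index 2 the digit "0" counts as 0, while "-" counts as 45,
// so the "040..." entry is placed first.
var withDigitValues = pair.slice().sort(digitValueCompare);
// The default string sort compares "0" (48) with "-" (45),
// so the "04-..." entry is placed first.
var withDefaultSort = pair.slice().sort();

console.log(withDigitValues.join(", "));
console.log(withDefaultSort.join(", "));
```

Which of the two orders is "right" depends on what the OP actually needs, which the thread never establishes.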
 
RobG

Dr John Stockton wrote:
[...]
Why are you people apparently assuming that the default string sort is
not what is needed?

The OP, having posted, didn't bother to hang around long enough to let
on. Given that a routine was required that just sorted stuff without
regard for any patterns or whether it was alpha, numeric or whatever,
it became a bit of fun to write a generic "sort anything" routine.

Without knowing what the sorted list will be used for, or understanding
any patterns within the data that should be respected by the sort,
assumptions are all we have to go on.

Post away.
 
