Search in sorting array

A

Asen Bozhilov

I've to search in sorting array. Possible input data for search is:

[0, 999]

function EGN(egn) {
this.value = egn.split('');

this._year = +egn.slice(0, 2);
this._month = +egn.slice(2, 4);
this._date = +egn.slice(4, 6);
this._order = +egn.slice(6, 9);
}

EGN.prototype = {
constructor : EGN,
regionIds : [-1, 43, 93, 139, 169, 183, 217, 233, 281, 301, 319,
341, 377, 395, 435, 501, 527, 555, 575, 601, 623, 721, 751, 789, 821,
843, 871, 903, 925, 999],

getRegion : function() {
var start = 0,
end = this.regionIds.length - 1,
middle,
lower_b,
upper_b;

while(start <= end)
{
middle = Math.floor((end + start) / 2);
lower_b = this.regionIds[middle];
upper_b = this.regionIds[middle + 1];
if (this._order > upper_b)
{
start = middle + 1;
}
else if (this._order < lower_b)
{
end = middle - 1;
}
else {
return middle;
}
}
return -1;
}
};


var my = new EGN('8509306247');
window.alert(my.regionIds[my.getRegion()]);

The string parameter of `EGN' constructor is identifier in my country.
Each person has identifier like that. Ten digit of identifier give
information about person.

1, 2 => Are year of born
3, 4 => Are month of born. If month > 20 the person is born before
1900 year. If menth > 40 the person is born after 2000 year.
5, 6 => Are date of born.
7, 8, 9 => Are order and region, in which person been born. If is an
even number, the person is man. Otherwise is woman. Each region has
range for people in that region. I've to get lower boundary of region,
because I need to print region name of the person and order for
region.
10 => Is only for controlling valid identifier, and calculate on other
9. Algorithm is specified by administration in my country.

At the moment I use binary search. `regionIds' contains ranges of
every region. I need to get lower boundary of region.



If there are any optimization of my code, I'll be glad to see. Any
suggestions and corrections are welcome.
Thanks.
 
J

Jorge

If there are any optimization of my code, I'll be glad to see. Any
suggestions and corrections are welcome.

middle = Math.floor((end + start) / 2);
--> middle= ((end + start) / 2) | 0;
 
R

Richard Cornford

On Mar 25, 10:06 am, Jorge wrote:
middle = Math.floor((end + start) / 2);
--> middle= ((end + start) / 2) | 0;

Then why not the simpler:-

middle= ((end + start) >>> 1);

-?

Richard.
 
A

Asen Bozhilov

Richard said:
Then why not the simpler:-

middle= ((end + start) >>> 1);

Yes, in my case it's possible to use that. Thanks. But that approach
is harmful for biggest arrays because `>>>` convert operands
`ToUint32`, which mean can truncate bits and return unexpected
result.

var len = Math.pow(2, 32) - 1;
window.alert((len + 1) >>> 1); //0
window.alert(Math.floor((len + 1) / 2)); //2147483648
 
A

Antony Scriven

I've to search in sorting array [regionIds]. Possible
input data for search is:

[0, 999]

EGN.prototype = {
constructor : EGN,
regionIds : [-1, 43, 93, 139, 169, 183, 217, 233, 281, 301, 319,
341, 377, 395, 435, 501, 527, 555, 575, 601, 623, 721, 751, 789, 821,
843, 871, 903, 925, 999],

[...]

};

[...]

At the moment I use binary search. `regionIds' contains
ranges of every region. I need to get lower boundary of
region.

If there are any optimization of my code, I'll be glad to
see. Any suggestions and corrections are welcome.
Thanks.

It depends what you mean by optimization. If you mean code
simplicity then I'd just do a simple linear scan of the
array. It would only be three or four lines of code.

If you mean speed, I'd also try a linear scan and do some
profiling. Assuming an even distribution and given the fact
that regionIds is not a large array, I wouldn't be surprised
if a linear scan were of a comparable speed to the binary
search on average. If some inputs occur more frequently than
others you could order the regionIds array appropriately.
Then I'd expect a linear scan to be significantly faster on
average.

Another option is to populate regionIds with an entry for
every possible input. Then you'd just have a simple lookup
operation. The trade-off is maintainability, though you
could make generating the array part of a build script. --Antony
 
R

Richard Cornford

Yes, in my case it's possible to use that. Thanks. But that
approach is harmful for biggest arrays because `>>>` convert
operands `ToUint32`, which mean can truncate bits and return
unexpected result.

var len = Math.pow(2, 32) - 1;
window.alert((len + 1) >>> 1); //0
window.alert(Math.floor((len + 1) / 2)); //2147483648

Bigger arrays are fine, by definition. You are working on array
indexes, which are constrained by: a property name P is an 'array
index' if and only if ToString(ToUint32(P)) is equal to P (ECMA 262
3rd Ed. Section 15.4). Thus if you initially set - start - to zero and
- end - to (array.length - 1), and then all values subsequently
assigned to - start - and - end - are integers within that range, as
they should be, then the ToUint32 operation implied by the >>>
operation is safe.

(The ToInt32 operation implied by - x|0 - is not so theoretically
safe, but should be considered in relation to how big, in memory
consumption terms, an array with two to the power of thirty one
elements would be (particularly on an OS with a 32 bit memory map).)

Richard.
 
A

Asen Bozhilov

Richard said:
Bigger arrays are fine, by definition. You are working on array
indexes, which are constrained by: a property name P is an 'array
index' if and only if ToString(ToUint32(P)) is equal to P (ECMA 262
3rd Ed. Section 15.4).

I agree with you and documentation in that point. But in binary search
`>>>` it can give unexpected result when you computing `middle' of the
part in which you looking. If you searching in interval:

(arr.length / 2, arr.length)

How can you compute next middle value, for array which length is equal
to (2 ^ 32) - 1?

var arr = new Array(Math.pow(2, 32) - 1),
end = arr.length - 1,
start = end >>> 1;

window.alert((end + start) >>> 1); //1073741822
window.alert(Math.floor((end + start) / 2)); //3221225470
 
J

Jorge

I agree with you and documentation in that point. But in binary search
`>>>` it can give unexpected result when you computing `middle' of the
part in which you looking. If you searching in interval:

(arr.length / 2, arr.length)

How can you compute next middle value, for array which length is equal
to (2 ^ 32) - 1?

var arr = new Array(Math.pow(2, 32) - 1),
    end = arr.length - 1,
    start = end >>> 1;

window.alert((end + start) >>> 1); //1073741822
window.alert(Math.floor((end + start) / 2)); //3221225470

((end + start) / 2) >>> 0
--> 3221225470
 
J

Jorge

I agree with you and documentation in that point. But in binary search
`>>>` it can give unexpected result when you computing `middle' of the
part in which you looking. If you searching in interval:

(arr.length / 2, arr.length)

How can you compute next middle value, for array which length is equal
to (2 ^ 32) - 1?

var arr = new Array(Math.pow(2, 32) - 1),
    end = arr.length - 1,
    start = end >>> 1;

window.alert((end + start) >>> 1); //1073741822
window.alert(Math.floor((end + start) / 2)); //3221225470

start= end= parseInt("ffffffff", 16);
--> 4294967295
((start + end) / 2) >>> 0
--> 4294967295 (toUint32, not truncated, 0xffffffff)

OTOH:

(start + end) >>> 1
--> 2147483647 (toUint32, truncated, 0x7fffffff)

((start + end) / 2) | 0
--> -1 (toInt32, not truncated, 0xffffffff)
 
S

Scott Sauyet

I've to search in sorting array [regionIds].
[ ... ]
At the moment I use binary search. `regionIds' contains
ranges of every region. I need to get lower boundary of
region.

If there are any optimization of my code, I'll be glad to
see. Any suggestions and corrections are welcome.

It depends what you mean by optimization. If you mean code
simplicity then I'd just do a simple linear scan of the
array. It would only be three or four lines of code.

If you mean speed, I'd also try a linear scan and do some
profiling. Assuming an even distribution and given the fact
that regionIds is not a large array, I wouldn't be surprised
if a linear scan were of a comparable speed to the binary
search on average. If some inputs occur more frequently than
others you could order the regionIds array appropriately.
Then I'd expect a linear scan to be significantly faster on
average.

Another option is to populate regionIds with an entry for
every possible input. Then you'd just have a simple lookup
operation. The trade-off is maintainability, though you
could make generating the array part of a build script. --Antony

Or if that region list is pretty well fixed, then you could also hard-
code the binary search:

var o = this._order
if (o < 501) {
if (o < 233) {
// ...
} else {
// ...
}
} else {
if (o < 751) {
// ...
} else {
// ...
}
}

You could even generate such a function fairly easily from your array
and "new Function()". This would presumably be at least somewhat more
efficient.

-- Scott
 
L

Lasse Reichstein Nielsen

Asen Bozhilov said:
Yes, in my case it's possible to use that. Thanks. But that approach
is harmful for biggest arrays because `>>>` convert operands
`ToUint32`, which mean can truncate bits and return unexpected
result.

Actually, using ToUint32 should be the correct thing to do for array
indices, since they are also uint32's.

The overflow is a problem, though [1].
You should do it as:
var middle = start + ((end - start) >>> 1);
This works correctly for all uint32 values of start and end (start <= end).

/L

[1] see http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html
Notice that Bloch's second version:
int mid = (low + high) >>> 1 ;
does not work in JavaScript, since it relies on low and high being signed
and positive, and JavaScript array indices are not.
 
D

Dr J R Stockton

In comp.lang.javascript message <c27c69fd-8799-44c7-8561-32d557a23877@f8
g2000yqn.googlegroups.com>, Thu, 25 Mar 2010 04:27:38, Richard Cornford
On Mar 25, 10:06 am, Jorge wrote:


Then why not the simpler:-

middle= ((end + start) >>> 1);

For even vaguely reasonable data, the >> operator should suffice. If
China were divided into regions with only one person in each, 2^31
regions would suffice.

The OP needs to consider the amount of data to be searched and the
amount of other work to be done per search. It seems very possible that
optimising the search will make very little difference overall. Note
that if this routine is to be useful there must be a means of inputting
data manually or from disc, and a means of outputting data, to screen,
printer, or eyeball - all of which are slow operations.

Another approach would be to convert the array to a string of constant-
width entries at initialisation : "-01,043,093,139,...,999" and then to
use indexOf. For a reasonably short list, that could be quicker.
RegNum = thatString.indexOf(PaddedValue) >> 2
Using a base-36 string would be faster, since entries would only need
two characters.

The fastest way has already been suggested : construct a direct lookup
table of 1000 entries.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,580
Members
45,054
Latest member
TrimKetoBoost

Latest Threads

Top