# Search in sorting array

Discussion in 'Javascript' started by Asen Bozhilov, Mar 25, 2010.

1. ### Asen BozhilovGuest

I've to search in sorting array. Possible input data for search is:

[0, 999]

function EGN(egn) {
this.value = egn.split('');

this._year = +egn.slice(0, 2);
this._month = +egn.slice(2, 4);
this._date = +egn.slice(4, 6);
this._order = +egn.slice(6, 9);
}

EGN.prototype = {
constructor : EGN,
regionIds : [-1, 43, 93, 139, 169, 183, 217, 233, 281, 301, 319,
341, 377, 395, 435, 501, 527, 555, 575, 601, 623, 721, 751, 789, 821,
843, 871, 903, 925, 999],

getRegion : function() {
var start = 0,
end = this.regionIds.length - 1,
middle,
lower_b,
upper_b;

while(start <= end)
{
middle = Math.floor((end + start) / 2);
lower_b = this.regionIds[middle];
upper_b = this.regionIds[middle + 1];
if (this._order > upper_b)
{
start = middle + 1;
}
else if (this._order < lower_b)
{
end = middle - 1;
}
else {
return middle;
}
}
return -1;
}
};

var my = new EGN('8509306247');

The string parameter of `EGN' constructor is identifier in my country.
Each person has identifier like that. Ten digit of identifier give

1, 2 => Are year of born
3, 4 => Are month of born. If month > 20 the person is born before
1900 year. If menth > 40 the person is born after 2000 year.
5, 6 => Are date of born.
7, 8, 9 => Are order and region, in which person been born. If is an
even number, the person is man. Otherwise is woman. Each region has
range for people in that region. I've to get lower boundary of region,
because I need to print region name of the person and order for
region.
10 => Is only for controlling valid identifier, and calculate on other
9. Algorithm is specified by administration in my country.

At the moment I use binary search. `regionIds' contains ranges of
every region. I need to get lower boundary of region.

If there are any optimization of my code, I'll be glad to see. Any
suggestions and corrections are welcome.
Thanks.

Asen Bozhilov, Mar 25, 2010

2. ### JorgeGuest

On Mar 25, 8:36 am, Asen Bozhilov <> wrote:
>
> If there are any optimization of my code, I'll be glad to see. Any
> suggestions and corrections are welcome.

middle = Math.floor((end + start) / 2);
--> middle= ((end + start) / 2) | 0;
--
Jorge.

Jorge, Mar 25, 2010

3. ### Richard CornfordGuest

On Mar 25, 10:06 am, Jorge wrote:
<snip>
> middle = Math.floor((end + start) / 2);
> --> middle= ((end + start) / 2) | 0;

Then why not the simpler:-

middle= ((end + start) >>> 1);

-?

Richard.

Richard Cornford, Mar 25, 2010
4. ### JorgeGuest

On Mar 25, 12:27 pm, Richard Cornford <>
wrote:
> On Mar 25, 10:06 am, Jorge wrote:
> <snip>
>
> > middle = Math.floor((end + start) / 2);
> > --> middle= ((end + start) / 2) | 0;

>
> Then why not the simpler:-
>
> middle= ((end + start) >>> 1);

Touché
--
Jorge.

Jorge, Mar 25, 2010
5. ### Asen BozhilovGuest

Richard Cornford wrote:

> Then why not the simpler:-
>
> middle= ((end + start) >>> 1);

Yes, in my case it's possible to use that. Thanks. But that approach
is harmful for biggest arrays because `>>>` convert operands
`ToUint32`, which mean can truncate bits and return unexpected
result.

var len = Math.pow(2, 32) - 1;
window.alert((len + 1) >>> 1); //0
window.alert(Math.floor((len + 1) / 2)); //2147483648

Asen Bozhilov, Mar 25, 2010
6. ### Antony ScrivenGuest

On Mar 25, 7:36am, Asen Bozhilov wrote:

> I've to search in sorting array [regionIds]. Possible
> input data for search is:
>
> [0, 999]
>
> EGN.prototype = {
> constructor : EGN,
> regionIds : [-1, 43, 93, 139, 169, 183, 217, 233, 281, 301, 319,
> 341, 377, 395, 435, 501, 527, 555, 575, 601, 623, 721, 751, 789,

821,
> 843, 871, 903, 925, 999],
>
> [...]
>
> };
>
> [...]
>
> At the moment I use binary search. `regionIds' contains
> ranges of every region. I need to get lower boundary of
> region.
>
> If there are any optimization of my code, I'll be glad to
> see. Any suggestions and corrections are welcome.
> Thanks.

It depends what you mean by optimization. If you mean code
simplicity then I'd just do a simple linear scan of the
array. It would only be three or four lines of code.

If you mean speed, I'd also try a linear scan and do some
profiling. Assuming an even distribution and given the fact
that regionIds is not a large array, I wouldn't be surprised
if a linear scan were of a comparable speed to the binary
search on average. If some inputs occur more frequently than
others you could order the regionIds array appropriately.
Then I'd expect a linear scan to be significantly faster on
average.

Another option is to populate regionIds with an entry for
every possible input. Then you'd just have a simple lookup
operation. The trade-off is maintainability, though you
could make generating the array part of a build script. --Antony

Antony Scriven, Mar 25, 2010
7. ### Richard CornfordGuest

On Mar 25, 12:37 pm, Asen Bozhilov wrote:
> Richard Cornford wrote:
>> Then why not the simpler:-

>
>> middle= ((end + start) >>> 1);

>
> Yes, in my case it's possible to use that. Thanks. But that
> approach is harmful for biggest arrays because `>>>` convert
> operands `ToUint32`, which mean can truncate bits and return
> unexpected result.
>
> var len = Math.pow(2, 32) - 1;
> window.alert((len + 1) >>> 1); //0
> window.alert(Math.floor((len + 1) / 2)); //2147483648

Bigger arrays are fine, by definition. You are working on array
indexes, which are constrained by: a property name P is an 'array
index' if and only if ToString(ToUint32(P)) is equal to P (ECMA 262
3rd Ed. Section 15.4). Thus if you initially set - start - to zero and
- end - to (array.length - 1), and then all values subsequently
assigned to - start - and - end - are integers within that range, as
they should be, then the ToUint32 operation implied by the >>>
operation is safe.

(The ToInt32 operation implied by - x|0 - is not so theoretically
safe, but should be considered in relation to how big, in memory
consumption terms, an array with two to the power of thirty one
elements would be (particularly on an OS with a 32 bit memory map).)

Richard.

Richard Cornford, Mar 25, 2010
8. ### Asen BozhilovGuest

Richard Cornford wrote:

> Bigger arrays are fine, by definition. You are working on array
> indexes, which are constrained by: a property name P is an 'array
> index' if and only if ToString(ToUint32(P)) is equal to P (ECMA 262
> 3rd Ed. Section 15.4).

I agree with you and documentation in that point. But in binary search
`>>>` it can give unexpected result when you computing `middle' of the
part in which you looking. If you searching in interval:

(arr.length / 2, arr.length)

How can you compute next middle value, for array which length is equal
to (2 ^ 32) - 1?

var arr = new Array(Math.pow(2, 32) - 1),
end = arr.length - 1,
start = end >>> 1;

window.alert((end + start) >>> 1); //1073741822
window.alert(Math.floor((end + start) / 2)); //3221225470

Asen Bozhilov, Mar 25, 2010
9. ### JorgeGuest

On Mar 25, 2:39 pm, Asen Bozhilov <> wrote:
> Richard Cornford wrote:
> > Bigger arrays are fine, by definition. You are working on array
> > indexes, which are constrained by: a property name P is an 'array
> > index' if and only if ToString(ToUint32(P)) is equal to P (ECMA 262
> > 3rd Ed. Section 15.4).

>
> I agree with you and documentation in that point. But in binary search
> `>>>` it can give unexpected result when you computing `middle' of the
> part in which you looking. If you searching in interval:
>
> (arr.length / 2, arr.length)
>
> How can you compute next middle value, for array which length is equal
> to (2 ^ 32) - 1?
>
> var arr = new Array(Math.pow(2, 32) - 1),
>     end = arr.length - 1,
>     start = end >>> 1;
>
> window.alert((end + start) >>> 1); //1073741822
> window.alert(Math.floor((end + start) / 2)); //3221225470

((end + start) / 2) >>> 0
--> 3221225470

--
Jorge.

Jorge, Mar 25, 2010
10. ### JorgeGuest

On Mar 25, 2:39 pm, Asen Bozhilov <> wrote:
> Richard Cornford wrote:
> > Bigger arrays are fine, by definition. You are working on array
> > indexes, which are constrained by: a property name P is an 'array
> > index' if and only if ToString(ToUint32(P)) is equal to P (ECMA 262
> > 3rd Ed. Section 15.4).

>
> I agree with you and documentation in that point. But in binary search
> `>>>` it can give unexpected result when you computing `middle' of the
> part in which you looking. If you searching in interval:
>
> (arr.length / 2, arr.length)
>
> How can you compute next middle value, for array which length is equal
> to (2 ^ 32) - 1?
>
> var arr = new Array(Math.pow(2, 32) - 1),
>     end = arr.length - 1,
>     start = end >>> 1;
>
> window.alert((end + start) >>> 1); //1073741822
> window.alert(Math.floor((end + start) / 2)); //3221225470

start= end= parseInt("ffffffff", 16);
--> 4294967295
((start + end) / 2) >>> 0
--> 4294967295 (toUint32, not truncated, 0xffffffff)

OTOH:

(start + end) >>> 1
--> 2147483647 (toUint32, truncated, 0x7fffffff)

((start + end) / 2) | 0
--> -1 (toInt32, not truncated, 0xffffffff)

--
Jorge.

Jorge, Mar 25, 2010
11. ### Scott SauyetGuest

On Mar 25, 9:03 am, Antony Scriven wrote:
> On Mar 25, 7:36am, Asen Bozhilov wrote:
>
>> I've to search in sorting array [regionIds].
>> [ ... ]
>> At the moment I use binary search. `regionIds' contains
>> ranges of every region. I need to get lower boundary of
>> region.
>>
>> If there are any optimization of my code, I'll be glad to
>> see. Any suggestions and corrections are welcome.

>
> It depends what you mean by optimization. If you mean code
> simplicity then I'd just do a simple linear scan of the
> array. It would only be three or four lines of code.
>
> If you mean speed, I'd also try a linear scan and do some
> profiling. Assuming an even distribution and given the fact
> that regionIds is not a large array, I wouldn't be surprised
> if a linear scan were of a comparable speed to the binary
> search on average. If some inputs occur more frequently than
> others you could order the regionIds array appropriately.
> Then I'd expect a linear scan to be significantly faster on
> average.
>
> Another option is to populate regionIds with an entry for
> every possible input. Then you'd just have a simple lookup
> operation. The trade-off is maintainability, though you
> could make generating the array part of a build script. --Antony

Or if that region list is pretty well fixed, then you could also hard-
code the binary search:

var o = this._order
if (o < 501) {
if (o < 233) {
// ...
} else {
// ...
}
} else {
if (o < 751) {
// ...
} else {
// ...
}
}

You could even generate such a function fairly easily from your array
and "new Function()". This would presumably be at least somewhat more
efficient.

-- Scott

Scott Sauyet, Mar 25, 2010
12. ### Lasse Reichstein NielsenGuest

Asen Bozhilov <> writes:

> Richard Cornford wrote:
>
>> Then why not the simpler:-
>>
>> middle= ((end + start) >>> 1);

>
> Yes, in my case it's possible to use that. Thanks. But that approach
> is harmful for biggest arrays because `>>>` convert operands
> `ToUint32`, which mean can truncate bits and return unexpected
> result.

Actually, using ToUint32 should be the correct thing to do for array
indices, since they are also uint32's.

The overflow is a problem, though [1].
You should do it as:
var middle = start + ((end - start) >>> 1);
This works correctly for all uint32 values of start and end (start <= end).

/L

Notice that Bloch's second version:
int mid = (low + high) >>> 1 ;
does not work in JavaScript, since it relies on low and high being signed
and positive, and JavaScript array indices are not.
--
Lasse Reichstein Holst Nielsen
'Javascript frameworks is a disruptive technology'

Lasse Reichstein Nielsen, Mar 25, 2010
13. ### Dr J R StocktonGuest

In comp.lang.javascript message <c27c69fd-8799-44c7-8561-32d557a23877@f8
g2000yqn.googlegroups.com>, Thu, 25 Mar 2010 04:27:38, Richard Cornford
<> posted:
>On Mar 25, 10:06 am, Jorge wrote:
><snip>
>> middle = Math.floor((end + start) / 2);
>> --> middle= ((end + start) / 2) | 0;

>
>Then why not the simpler:-
>
>middle= ((end + start) >>> 1);

For even vaguely reasonable data, the >> operator should suffice. If
China were divided into regions with only one person in each, 2^31
regions would suffice.

The OP needs to consider the amount of data to be searched and the
amount of other work to be done per search. It seems very possible that
optimising the search will make very little difference overall. Note
that if this routine is to be useful there must be a means of inputting
data manually or from disc, and a means of outputting data, to screen,
printer, or eyeball - all of which are slow operations.

Another approach would be to convert the array to a string of constant-
width entries at initialisation : "-01,043,093,139,...,999" and then to
use indexOf. For a reasonably short list, that could be quicker.
Using a base-36 string would be faster, since entries would only need
two characters.

The fastest way has already been suggested : construct a direct lookup
table of 1000 entries.

--
(c) John Stockton, nr London UK. ?@merlyn.demon.co.uk IE8 FF3 Op10 Sf4 Cr4
news:comp.lang.javascript FAQ <URL:http://www.jibbering.com/faq/index.html>.
<URL:http://www.merlyn.demon.co.uk/js-index.htm> jscr maths, dates, sources.