Search in sorting array

Discussion in 'Javascript' started by Asen Bozhilov, Mar 25, 2010.

  1. I've to search in sorting array. Possible input data for search is:

    [0, 999]

    function EGN(egn) {
    this.value = egn.split('');

    this._year = +egn.slice(0, 2);
    this._month = +egn.slice(2, 4);
    this._date = +egn.slice(4, 6);
    this._order = +egn.slice(6, 9);
    }

    EGN.prototype = {
    constructor : EGN,
    regionIds : [-1, 43, 93, 139, 169, 183, 217, 233, 281, 301, 319,
    341, 377, 395, 435, 501, 527, 555, 575, 601, 623, 721, 751, 789, 821,
    843, 871, 903, 925, 999],

    getRegion : function() {
    var start = 0,
    end = this.regionIds.length - 1,
    middle,
    lower_b,
    upper_b;

    while(start <= end)
    {
    middle = Math.floor((end + start) / 2);
    lower_b = this.regionIds[middle];
    upper_b = this.regionIds[middle + 1];
    if (this._order > upper_b)
    {
    start = middle + 1;
    }
    else if (this._order < lower_b)
    {
    end = middle - 1;
    }
    else {
    return middle;
    }
    }
    return -1;
    }
    };


    var my = new EGN('8509306247');
    window.alert(my.regionIds[my.getRegion()]);

    The string parameter of `EGN' constructor is identifier in my country.
    Each person has identifier like that. Ten digit of identifier give
    information about person.

    1, 2 => Are year of born
    3, 4 => Are month of born. If month > 20 the person is born before
    1900 year. If menth > 40 the person is born after 2000 year.
    5, 6 => Are date of born.
    7, 8, 9 => Are order and region, in which person been born. If is an
    even number, the person is man. Otherwise is woman. Each region has
    range for people in that region. I've to get lower boundary of region,
    because I need to print region name of the person and order for
    region.
    10 => Is only for controlling valid identifier, and calculate on other
    9. Algorithm is specified by administration in my country.

    At the moment I use binary search. `regionIds' contains ranges of
    every region. I need to get lower boundary of region.



    If there are any optimization of my code, I'll be glad to see. Any
    suggestions and corrections are welcome.
    Thanks.
    Asen Bozhilov, Mar 25, 2010
    #1
    1. Advertising

  2. Asen Bozhilov

    Jorge Guest

    On Mar 25, 8:36 am, Asen Bozhilov <> wrote:
    >
    > If there are any optimization of my code, I'll be glad to see. Any
    > suggestions and corrections are welcome.


    middle = Math.floor((end + start) / 2);
    --> middle= ((end + start) / 2) | 0;
    --
    Jorge.
    Jorge, Mar 25, 2010
    #2
    1. Advertising

  3. On Mar 25, 10:06 am, Jorge wrote:
    <snip>
    > middle = Math.floor((end + start) / 2);
    > --> middle= ((end + start) / 2) | 0;


    Then why not the simpler:-

    middle= ((end + start) >>> 1);

    -?

    Richard.
    Richard Cornford, Mar 25, 2010
    #3
  4. Asen Bozhilov

    Jorge Guest

    On Mar 25, 12:27 pm, Richard Cornford <>
    wrote:
    > On Mar 25, 10:06 am, Jorge wrote:
    > <snip>
    >
    > > middle = Math.floor((end + start) / 2);
    > > --> middle= ((end + start) / 2) | 0;

    >
    > Then why not the simpler:-
    >
    > middle= ((end + start) >>> 1);


    Touché :)
    --
    Jorge.
    Jorge, Mar 25, 2010
    #4
  5. Richard Cornford wrote:

    > Then why not the simpler:-
    >
    > middle= ((end + start) >>> 1);


    Yes, in my case it's possible to use that. Thanks. But that approach
    is harmful for biggest arrays because `>>>` convert operands
    `ToUint32`, which mean can truncate bits and return unexpected
    result.

    var len = Math.pow(2, 32) - 1;
    window.alert((len + 1) >>> 1); //0
    window.alert(Math.floor((len + 1) / 2)); //2147483648
    Asen Bozhilov, Mar 25, 2010
    #5
  6. On Mar 25, 7:36am, Asen Bozhilov wrote:

    > I've to search in sorting array [regionIds]. Possible
    > input data for search is:
    >
    > [0, 999]
    >
    > EGN.prototype = {
    > constructor : EGN,
    > regionIds : [-1, 43, 93, 139, 169, 183, 217, 233, 281, 301, 319,
    > 341, 377, 395, 435, 501, 527, 555, 575, 601, 623, 721, 751, 789,

    821,
    > 843, 871, 903, 925, 999],
    >
    > [...]
    >
    > };
    >
    > [...]
    >
    > At the moment I use binary search. `regionIds' contains
    > ranges of every region. I need to get lower boundary of
    > region.
    >
    > If there are any optimization of my code, I'll be glad to
    > see. Any suggestions and corrections are welcome.
    > Thanks.


    It depends what you mean by optimization. If you mean code
    simplicity then I'd just do a simple linear scan of the
    array. It would only be three or four lines of code.

    If you mean speed, I'd also try a linear scan and do some
    profiling. Assuming an even distribution and given the fact
    that regionIds is not a large array, I wouldn't be surprised
    if a linear scan were of a comparable speed to the binary
    search on average. If some inputs occur more frequently than
    others you could order the regionIds array appropriately.
    Then I'd expect a linear scan to be significantly faster on
    average.

    Another option is to populate regionIds with an entry for
    every possible input. Then you'd just have a simple lookup
    operation. The trade-off is maintainability, though you
    could make generating the array part of a build script. --Antony
    Antony Scriven, Mar 25, 2010
    #6
  7. On Mar 25, 12:37 pm, Asen Bozhilov wrote:
    > Richard Cornford wrote:
    >> Then why not the simpler:-

    >
    >> middle= ((end + start) >>> 1);

    >
    > Yes, in my case it's possible to use that. Thanks. But that
    > approach is harmful for biggest arrays because `>>>` convert
    > operands `ToUint32`, which mean can truncate bits and return
    > unexpected result.
    >
    > var len = Math.pow(2, 32) - 1;
    > window.alert((len + 1) >>> 1); //0
    > window.alert(Math.floor((len + 1) / 2)); //2147483648


    Bigger arrays are fine, by definition. You are working on array
    indexes, which are constrained by: a property name P is an 'array
    index' if and only if ToString(ToUint32(P)) is equal to P (ECMA 262
    3rd Ed. Section 15.4). Thus if you initially set - start - to zero and
    - end - to (array.length - 1), and then all values subsequently
    assigned to - start - and - end - are integers within that range, as
    they should be, then the ToUint32 operation implied by the >>>
    operation is safe.

    (The ToInt32 operation implied by - x|0 - is not so theoretically
    safe, but should be considered in relation to how big, in memory
    consumption terms, an array with two to the power of thirty one
    elements would be (particularly on an OS with a 32 bit memory map).)

    Richard.
    Richard Cornford, Mar 25, 2010
    #7
  8. Richard Cornford wrote:

    > Bigger arrays are fine, by definition. You are working on array
    > indexes, which are constrained by: a property name P is an 'array
    > index' if and only if ToString(ToUint32(P)) is equal to P (ECMA 262
    > 3rd Ed. Section 15.4).


    I agree with you and documentation in that point. But in binary search
    `>>>` it can give unexpected result when you computing `middle' of the
    part in which you looking. If you searching in interval:

    (arr.length / 2, arr.length)

    How can you compute next middle value, for array which length is equal
    to (2 ^ 32) - 1?

    var arr = new Array(Math.pow(2, 32) - 1),
    end = arr.length - 1,
    start = end >>> 1;

    window.alert((end + start) >>> 1); //1073741822
    window.alert(Math.floor((end + start) / 2)); //3221225470
    Asen Bozhilov, Mar 25, 2010
    #8
  9. Asen Bozhilov

    Jorge Guest

    On Mar 25, 2:39 pm, Asen Bozhilov <> wrote:
    > Richard Cornford wrote:
    > > Bigger arrays are fine, by definition. You are working on array
    > > indexes, which are constrained by: a property name P is an 'array
    > > index' if and only if ToString(ToUint32(P)) is equal to P (ECMA 262
    > > 3rd Ed. Section 15.4).

    >
    > I agree with you and documentation in that point. But in binary search
    > `>>>` it can give unexpected result when you computing `middle' of the
    > part in which you looking. If you searching in interval:
    >
    > (arr.length / 2, arr.length)
    >
    > How can you compute next middle value, for array which length is equal
    > to (2 ^ 32) - 1?
    >
    > var arr = new Array(Math.pow(2, 32) - 1),
    >     end = arr.length - 1,
    >     start = end >>> 1;
    >
    > window.alert((end + start) >>> 1); //1073741822
    > window.alert(Math.floor((end + start) / 2)); //3221225470


    ((end + start) / 2) >>> 0
    --> 3221225470

    --
    Jorge.
    Jorge, Mar 25, 2010
    #9
  10. Asen Bozhilov

    Jorge Guest

    On Mar 25, 2:39 pm, Asen Bozhilov <> wrote:
    > Richard Cornford wrote:
    > > Bigger arrays are fine, by definition. You are working on array
    > > indexes, which are constrained by: a property name P is an 'array
    > > index' if and only if ToString(ToUint32(P)) is equal to P (ECMA 262
    > > 3rd Ed. Section 15.4).

    >
    > I agree with you and documentation in that point. But in binary search
    > `>>>` it can give unexpected result when you computing `middle' of the
    > part in which you looking. If you searching in interval:
    >
    > (arr.length / 2, arr.length)
    >
    > How can you compute next middle value, for array which length is equal
    > to (2 ^ 32) - 1?
    >
    > var arr = new Array(Math.pow(2, 32) - 1),
    >     end = arr.length - 1,
    >     start = end >>> 1;
    >
    > window.alert((end + start) >>> 1); //1073741822
    > window.alert(Math.floor((end + start) / 2)); //3221225470


    start= end= parseInt("ffffffff", 16);
    --> 4294967295
    ((start + end) / 2) >>> 0
    --> 4294967295 (toUint32, not truncated, 0xffffffff)

    OTOH:

    (start + end) >>> 1
    --> 2147483647 (toUint32, truncated, 0x7fffffff)

    ((start + end) / 2) | 0
    --> -1 (toInt32, not truncated, 0xffffffff)

    --
    Jorge.
    Jorge, Mar 25, 2010
    #10
  11. Asen Bozhilov

    Scott Sauyet Guest

    On Mar 25, 9:03 am, Antony Scriven wrote:
    > On Mar 25, 7:36am, Asen Bozhilov wrote:
    >
    >> I've to search in sorting array [regionIds].
    >> [ ... ]
    >> At the moment I use binary search. `regionIds' contains
    >> ranges of every region. I need to get lower boundary of
    >> region.
    >>
    >> If there are any optimization of my code, I'll be glad to
    >> see. Any suggestions and corrections are welcome.

    >
    > It depends what you mean by optimization. If you mean code
    > simplicity then I'd just do a simple linear scan of the
    > array. It would only be three or four lines of code.
    >
    > If you mean speed, I'd also try a linear scan and do some
    > profiling. Assuming an even distribution and given the fact
    > that regionIds is not a large array, I wouldn't be surprised
    > if a linear scan were of a comparable speed to the binary
    > search on average. If some inputs occur more frequently than
    > others you could order the regionIds array appropriately.
    > Then I'd expect a linear scan to be significantly faster on
    > average.
    >
    > Another option is to populate regionIds with an entry for
    > every possible input. Then you'd just have a simple lookup
    > operation. The trade-off is maintainability, though you
    > could make generating the array part of a build script. --Antony


    Or if that region list is pretty well fixed, then you could also hard-
    code the binary search:

    var o = this._order
    if (o < 501) {
    if (o < 233) {
    // ...
    } else {
    // ...
    }
    } else {
    if (o < 751) {
    // ...
    } else {
    // ...
    }
    }

    You could even generate such a function fairly easily from your array
    and "new Function()". This would presumably be at least somewhat more
    efficient.

    -- Scott
    Scott Sauyet, Mar 25, 2010
    #11
  12. Asen Bozhilov <> writes:

    > Richard Cornford wrote:
    >
    >> Then why not the simpler:-
    >>
    >> middle= ((end + start) >>> 1);

    >
    > Yes, in my case it's possible to use that. Thanks. But that approach
    > is harmful for biggest arrays because `>>>` convert operands
    > `ToUint32`, which mean can truncate bits and return unexpected
    > result.


    Actually, using ToUint32 should be the correct thing to do for array
    indices, since they are also uint32's.

    The overflow is a problem, though [1].
    You should do it as:
    var middle = start + ((end - start) >>> 1);
    This works correctly for all uint32 values of start and end (start <= end).

    /L

    [1] see http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html
    Notice that Bloch's second version:
    int mid = (low + high) >>> 1 ;
    does not work in JavaScript, since it relies on low and high being signed
    and positive, and JavaScript array indices are not.
    --
    Lasse Reichstein Holst Nielsen
    'Javascript frameworks is a disruptive technology'
    Lasse Reichstein Nielsen, Mar 25, 2010
    #12
  13. In comp.lang.javascript message <c27c69fd-8799-44c7-8561-32d557a23877@f8
    g2000yqn.googlegroups.com>, Thu, 25 Mar 2010 04:27:38, Richard Cornford
    <> posted:
    >On Mar 25, 10:06 am, Jorge wrote:
    ><snip>
    >> middle = Math.floor((end + start) / 2);
    >> --> middle= ((end + start) / 2) | 0;

    >
    >Then why not the simpler:-
    >
    >middle= ((end + start) >>> 1);


    For even vaguely reasonable data, the >> operator should suffice. If
    China were divided into regions with only one person in each, 2^31
    regions would suffice.

    The OP needs to consider the amount of data to be searched and the
    amount of other work to be done per search. It seems very possible that
    optimising the search will make very little difference overall. Note
    that if this routine is to be useful there must be a means of inputting
    data manually or from disc, and a means of outputting data, to screen,
    printer, or eyeball - all of which are slow operations.

    Another approach would be to convert the array to a string of constant-
    width entries at initialisation : "-01,043,093,139,...,999" and then to
    use indexOf. For a reasonably short list, that could be quicker.
    RegNum = thatString.indexOf(PaddedValue) >> 2
    Using a base-36 string would be faster, since entries would only need
    two characters.

    The fastest way has already been suggested : construct a direct lookup
    table of 1000 entries.

    --
    (c) John Stockton, nr London UK. ?@merlyn.demon.co.uk IE8 FF3 Op10 Sf4 Cr4
    news:comp.lang.javascript FAQ <URL:http://www.jibbering.com/faq/index.html>.
    <URL:http://www.merlyn.demon.co.uk/js-index.htm> jscr maths, dates, sources.
    <URL:http://www.merlyn.demon.co.uk/> TP/BP/Delphi/jscr/&c, FAQ items, links.
    Dr J R Stockton, Mar 26, 2010
    #13
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. mike-yue
    Replies:
    15
    Views:
    972
    mike-yue
    May 5, 2008
  2. markspace
    Replies:
    1
    Views:
    377
    markspace
    Jun 25, 2009
  3. Roedy Green
    Replies:
    1
    Views:
    423
    Roedy Green
    Jun 25, 2009
  4. Abby Lee
    Replies:
    5
    Views:
    375
    Abby Lee
    Aug 2, 2004
  5. Craig Keightley
    Replies:
    10
    Views:
    255
    Craig Keightley
    Jun 29, 2005
Loading...

Share This Page