smarter zipcode search algorithm

Discussion in 'Perl Misc' started by premgrps@gmail.com, Jul 5, 2006.

  1. Guest

    Hi,
    I have a database with two tables
    a) A table of 2 million records with city, zip and associated
    information (say XYZ) and
    b) zipcode latitude, longitude table having >40,000 records/zip codes

    PROBLEM:
    I need to find the the XYZs within the the range of a certain zipcode.
    This zipcode and radial range in miles is entered by the user (web
    interface).

    The brute force way is to calculate the distance between the user
    zipcode and all the zipcodes in the database. Once the zipcode_range
    subroutine gives back the zipcodes within a certain radius, I need to
    find all the XYZs from the table #1.

    Another approach is to find the zipcodes with a square region (min/max
    of the user zipcode latitude/longitude position).

    Both the approaches are consuming too much time. Especially if the
    radial distance starts increasing.

    My questions:
    1. Is there any other smart way to do the above task.
    2. I am working on a 2.4ghz/512MB RAM machine. Any suggestions how to
    increase the performance. Right now each select command to the
    2Million record table takes about a minute.

    Thanks.
    , Jul 5, 2006
    #1
    1. Advertising

  2. Dr.Ruud Guest

    schreef:

    > I have a database with two tables
    > a) A table of 2 million records with city, zip and associated
    > information (say XYZ) and
    > b) zipcode latitude, longitude table having >40,000 records/zip codes
    >
    > PROBLEM:
    > I need to find the the XYZs within the the range of a certain zipcode.
    > This zipcode and radial range in miles is entered by the user (web
    > interface).
    >
    > The brute force way is to calculate the distance between the user
    > zipcode and all the zipcodes in the database. Once the zipcode_range
    > subroutine gives back the zipcodes within a certain radius, I need to
    > find all the XYZs from the table #1.
    >
    > Another approach is to find the zipcodes with a square region (min/max
    > of the user zipcode latitude/longitude position).
    >
    > Both the approaches are consuming too much time. Especially if the
    > radial distance starts increasing.
    >
    > My questions:
    > 1. Is there any other smart way to do the above task.
    > 2. I am working on a 2.4ghz/512MB RAM machine. Any suggestions how to
    > increase the performance. Right now each select command to the
    > 2Million record table takes about a minute.


    This is not a Perl question.

    Which RDBMS?

    1. Make sure there are indexes on the lattitude and longitude columns of
    the zipcode table. I assume that zipcode is the PK.

    2. Pre-calculate the (min_lattitude, max_lattitude, min_longitude,
    max_longitude) from the user input.

    3. Fill a temporary zipcode table (again with the same indexes) with a
    selection from the full zipcode table, with only the candidate zipcodes.
    Be sure to have the zipcode column left from the comparison:
    WHERE (zipcode.lattitude >= min_lattitude AND zipcode.lattitude <=
    max_lattitude AND zipcode.longtitude >= min_longtitude AND
    zipcode.longtitude <= max_longtitude).

    4. If necessary, refine that result (the temporary zipcode table) by
    deleting the records that aren't in the radial distance.

    That temporary zipcode table can also be a query, on which you run the
    second (radial distance) query.

    For the city table, have an index on the zipcode, and join that with the
    (refined) result.

    --
    Affijn, Ruud

    "Gewoon is een tijger."
    Dr.Ruud, Jul 6, 2006
    #2
    1. Advertising

  3. -berlin.de Guest

    <> wrote in comp.lang.perl.misc:
    > Hi,
    > I have a database with two tables
    > a) A table of 2 million records with city, zip and associated
    > information (say XYZ) and
    > b) zipcode latitude, longitude table having >40,000 records/zip codes
    >
    > PROBLEM:
    > I need to find the the XYZs within the the range of a certain zipcode.
    > This zipcode and radial range in miles is entered by the user (web
    > interface).


    [...]

    Your question has nothing to do with Perl (nor with zip codes, btw.).
    It is a problem in computational geometry, and that is where I would
    start looking for algorithms.

    Anno
    -berlin.de, Jul 6, 2006
    #3
  4. Guest

    i wrote this many years ago for a singles website i made ... you may
    find that just making a sql table with zipcode_to,zipcode_from,distance
    is where you want to be. making a table for three digits and for five
    digit zipcodes is a plus as well.

    also you can do the lookups now with google map api i just noticed, way
    to get banned with your 29000 zipcode entries :)

    good luck

    use strict;
    use Math::Trig;

    ##
    ## racine wisconsin
    ##
    my $a_lat = 42.13404;
    my $a_long = -87.934097;

    ##
    ## wheeling illinois
    ##
    my $b_lat = 42.716112;
    my $b_long = -87.823329;

    my $b_lat = 41.37372;
    my $b_long = -93.376719;

    ######################################################################

    print
    gc(
    $a_lat,
    $a_long,
    $b_lat,
    $b_long),
    "\n";

    ######################################################################

    sub gc
    {
    my ($a_lat, $a_long, $b_lat, $b_long) = @_;

    ##
    ## convert degrees to radians
    ##
    $a_long = deg2rad($a_long);
    $b_long = deg2rad($b_long);
    $a_lat = deg2rad($a_lat);
    $b_lat = deg2rad($b_lat);

    ##
    ## meters in a mile
    ##
    my $meterspermile = 1609.344;

    ##
    ## earth's radius
    ##
    my $earth_radius = 6367000;
    my $earth_radius = 6378.14 * 1000;

    my $d = acos(sin($a_lat) * sin($b_lat) + cos($a_lat) * cos($b_lat)
    * cos($a_long - $b_long));

    $d = ($earth_radius * $d) / $meterspermile;

    return($d);
    }




    -berlin.de wrote:
    > <> wrote in comp.lang.perl.misc:
    > > Hi,
    > > I have a database with two tables
    > > a) A table of 2 million records with city, zip and associated
    > > information (say XYZ) and
    > > b) zipcode latitude, longitude table having >40,000 records/zip codes
    > >
    > > PROBLEM:
    > > I need to find the the XYZs within the the range of a certain zipcode.
    > > This zipcode and radial range in miles is entered by the user (web
    > > interface).

    >
    > [...]
    >
    > Your question has nothing to do with Perl (nor with zip codes, btw.).
    > It is a problem in computational geometry, and that is where I would
    > start looking for algorithms.
    >
    > Anno
    , Jul 6, 2006
    #4
  5. Guest

    i wrote this many years ago for a singles website i made ... you may
    find that just making a sql table with zipcode_to,zipcode_from,distance
    is where you want to be. making a table for three digits and for five
    digit zipcodes is a plus as well.

    also you can do the lookups now with google map api i just noticed, way
    to get banned with your 29000 zipcode entries :)

    good luck

    use strict;
    use Math::Trig;

    ##
    ## racine wisconsin
    ##
    my $a_lat = 42.13404;
    my $a_long = -87.934097;

    ##
    ## wheeling illinois
    ##
    my $b_lat = 42.716112;
    my $b_long = -87.823329;

    my $b_lat = 41.37372;
    my $b_long = -93.376719;

    ######################################################################

    print
    gc(
    $a_lat,
    $a_long,
    $b_lat,
    $b_long),
    "\n";

    ######################################################################

    sub gc
    {
    my ($a_lat, $a_long, $b_lat, $b_long) = @_;

    ##
    ## convert degrees to radians
    ##
    $a_long = deg2rad($a_long);
    $b_long = deg2rad($b_long);
    $a_lat = deg2rad($a_lat);
    $b_lat = deg2rad($b_lat);

    ##
    ## meters in a mile
    ##
    my $meterspermile = 1609.344;

    ##
    ## earth's radius
    ##
    my $earth_radius = 6367000;
    my $earth_radius = 6378.14 * 1000;

    my $d = acos(sin($a_lat) * sin($b_lat) + cos($a_lat) * cos($b_lat)
    * cos($a_long - $b_long));

    $d = ($earth_radius * $d) / $meterspermile;

    return($d);
    }




    -berlin.de wrote:
    > <> wrote in comp.lang.perl.misc:
    > > Hi,
    > > I have a database with two tables
    > > a) A table of 2 million records with city, zip and associated
    > > information (say XYZ) and
    > > b) zipcode latitude, longitude table having >40,000 records/zip codes
    > >
    > > PROBLEM:
    > > I need to find the the XYZs within the the range of a certain zipcode.
    > > This zipcode and radial range in miles is entered by the user (web
    > > interface).

    >
    > [...]
    >
    > Your question has nothing to do with Perl (nor with zip codes, btw.).
    > It is a problem in computational geometry, and that is where I would
    > start looking for algorithms.
    >
    > Anno
    , Jul 6, 2006
    #5
  6. Guest

    wrote:
    > Hi,
    > I have a database with two tables
    > a) A table of 2 million records with city, zip and associated
    > information (say XYZ) and
    > b) zipcode latitude, longitude table having >40,000 records/zip codes


    40,001 and 10**89 are both greater than 40,000. I assuming you mean ~40,
    000 rather than >40,000, otherwise it is meaningless.

    Is that ~40,000 records where each record is a zipcode?
    Or is it ~40,000 records per zip code?


    > PROBLEM:
    > I need to find the the XYZs within the the range of a certain zipcode.
    > This zipcode and radial range in miles is entered by the user (web
    > interface).


    Are you assuming zip codes to be points?


    > The brute force way is to calculate the distance between the user
    > zipcode and all the zipcodes in the database. Once the zipcode_range
    > subroutine gives back the zipcodes within a certain radius, I need to
    > find all the XYZs from the table #1.


    OK. You could probably even build a table giving the distances between
    every pair of zip codes, or at least every pair within 300 miles or so
    of each other.

    > Another approach is to find the zipcodes with a square region (min/max
    > of the user zipcode latitude/longitude position).


    Yep, that is another approach. You could construct a square around the
    circle, use a data structure to get everything in the square, then brute
    force just those things in the square to see if they are also within the
    circle. (neglecting the curvature of the earth.)

    > Both the approaches are consuming too much time. Especially if the
    > radial distance starts increasing.


    Starts increasing? You make it sound like an organic creature. Isn't it
    a simple number provided by the end user?

    Anyway, what is taking a long time? Getting a list of zip codes within
    X miles of the reference zip code? Turning that list into a list of all
    the qualifying XYZ?

    >
    > My questions:
    > 1. Is there any other smart way to do the above task.


    We only have the vaguest idea of how you are doing it now. You can't make
    tuning decisions based on vague notions.

    > 2. I am working on a 2.4ghz/512MB RAM machine. Any suggestions how to
    > increase the performance. Right now each select command to the
    > 2Million record table takes about a minute.


    What database system are you using? What query are you issuing to it?
    What indices exist? What does this have to do with Perl?

    If you are using Perl as your database system, have you looked at Tree::R?

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
    , Jul 6, 2006
    #6
  7. PGPS Guest

    Thanks for all your answers. I finally got the problem figured out
    myself with a subquery. I'm new to databases though.

    Thanks.

    wrote:
    > wrote:
    > > Hi,
    > > I have a database with two tables
    > > a) A table of 2 million records with city, zip and associated
    > > information (say XYZ) and
    > > b) zipcode latitude, longitude table having >40,000 records/zip codes

    >
    > 40,001 and 10**89 are both greater than 40,000. I assuming you mean ~40,
    > 000 rather than >40,000, otherwise it is meaningless.
    >
    > Is that ~40,000 records where each record is a zipcode?
    > Or is it ~40,000 records per zip code?
    >
    >
    > > PROBLEM:
    > > I need to find the the XYZs within the the range of a certain zipcode.
    > > This zipcode and radial range in miles is entered by the user (web
    > > interface).

    >
    > Are you assuming zip codes to be points?
    >
    >
    > > The brute force way is to calculate the distance between the user
    > > zipcode and all the zipcodes in the database. Once the zipcode_range
    > > subroutine gives back the zipcodes within a certain radius, I need to
    > > find all the XYZs from the table #1.

    >
    > OK. You could probably even build a table giving the distances between
    > every pair of zip codes, or at least every pair within 300 miles or so
    > of each other.
    >
    > > Another approach is to find the zipcodes with a square region (min/max
    > > of the user zipcode latitude/longitude position).

    >
    > Yep, that is another approach. You could construct a square around the
    > circle, use a data structure to get everything in the square, then brute
    > force just those things in the square to see if they are also within the
    > circle. (neglecting the curvature of the earth.)
    >
    > > Both the approaches are consuming too much time. Especially if the
    > > radial distance starts increasing.

    >
    > Starts increasing? You make it sound like an organic creature. Isn't it
    > a simple number provided by the end user?
    >
    > Anyway, what is taking a long time? Getting a list of zip codes within
    > X miles of the reference zip code? Turning that list into a list of all
    > the qualifying XYZ?
    >
    > >
    > > My questions:
    > > 1. Is there any other smart way to do the above task.

    >
    > We only have the vaguest idea of how you are doing it now. You can't make
    > tuning decisions based on vague notions.
    >
    > > 2. I am working on a 2.4ghz/512MB RAM machine. Any suggestions how to
    > > increase the performance. Right now each select command to the
    > > 2Million record table takes about a minute.

    >
    > What database system are you using? What query are you issuing to it?
    > What indices exist? What does this have to do with Perl?
    >
    > If you are using Perl as your database system, have you looked at Tree::R?
    >
    > Xho
    >
    > --
    > -------------------- http://NewsReader.Com/ --------------------
    > Usenet Newsgroup Service $9.95/Month 30GB
    PGPS, Jul 11, 2006
    #7
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. DaveF

    ZIPCODE DB ACCESS TO SQL

    DaveF, Sep 29, 2004, in forum: ASP .Net
    Replies:
    6
    Views:
    418
    Scott M.
    Sep 30, 2004
  2. Jeff Thur

    Need to Format a zipcode into xxxxx-xxxx.

    Jeff Thur, Feb 18, 2005, in forum: ASP .Net Datagrid Control
    Replies:
    1
    Views:
    177
  3. Jeff Thur
    Replies:
    1
    Views:
    121
    Alvin Bruney [MVP]
    Feb 17, 2005
  4. Irishmaninusa

    Canada ZipCode Finder

    Irishmaninusa, Aug 18, 2003, in forum: ASP General
    Replies:
    1
    Views:
    112
    Aaron Bertrand - MVP
    Aug 18, 2003
  5. Mauricio Fernandez
    Replies:
    7
    Views:
    121
    Thomas Nitsche
    Nov 21, 2006
Loading...

Share This Page