smarter zipcode search algorithm

P

premgrps

Hi,
I have a database with two tables
a) A table of 2 million records with city, zip and associated
information (say XYZ) and
b) zipcode latitude, longitude table having >40,000 records/zip codes

PROBLEM:
I need to find the the XYZs within the the range of a certain zipcode.
This zipcode and radial range in miles is entered by the user (web
interface).

The brute force way is to calculate the distance between the user
zipcode and all the zipcodes in the database. Once the zipcode_range
subroutine gives back the zipcodes within a certain radius, I need to
find all the XYZs from the table #1.

Another approach is to find the zipcodes with a square region (min/max
of the user zipcode latitude/longitude position).

Both the approaches are consuming too much time. Especially if the
radial distance starts increasing.

My questions:
1. Is there any other smart way to do the above task.
2. I am working on a 2.4ghz/512MB RAM machine. Any suggestions how to
increase the performance. Right now each select command to the
2Million record table takes about a minute.

Thanks.
 
D

Dr.Ruud

(e-mail address removed) schreef:
I have a database with two tables
a) A table of 2 million records with city, zip and associated
information (say XYZ) and
b) zipcode latitude, longitude table having >40,000 records/zip codes

PROBLEM:
I need to find the the XYZs within the the range of a certain zipcode.
This zipcode and radial range in miles is entered by the user (web
interface).

The brute force way is to calculate the distance between the user
zipcode and all the zipcodes in the database. Once the zipcode_range
subroutine gives back the zipcodes within a certain radius, I need to
find all the XYZs from the table #1.

Another approach is to find the zipcodes with a square region (min/max
of the user zipcode latitude/longitude position).

Both the approaches are consuming too much time. Especially if the
radial distance starts increasing.

My questions:
1. Is there any other smart way to do the above task.
2. I am working on a 2.4ghz/512MB RAM machine. Any suggestions how to
increase the performance. Right now each select command to the
2Million record table takes about a minute.

This is not a Perl question.

Which RDBMS?

1. Make sure there are indexes on the lattitude and longitude columns of
the zipcode table. I assume that zipcode is the PK.

2. Pre-calculate the (min_lattitude, max_lattitude, min_longitude,
max_longitude) from the user input.

3. Fill a temporary zipcode table (again with the same indexes) with a
selection from the full zipcode table, with only the candidate zipcodes.
Be sure to have the zipcode column left from the comparison:
WHERE (zipcode.lattitude >= min_lattitude AND zipcode.lattitude <=
max_lattitude AND zipcode.longtitude >= min_longtitude AND
zipcode.longtitude <= max_longtitude).

4. If necessary, refine that result (the temporary zipcode table) by
deleting the records that aren't in the radial distance.

That temporary zipcode table can also be a query, on which you run the
second (radial distance) query.

For the city table, have an index on the zipcode, and join that with the
(refined) result.
 
A

anno4000

Hi,
I have a database with two tables
a) A table of 2 million records with city, zip and associated
information (say XYZ) and
b) zipcode latitude, longitude table having >40,000 records/zip codes

PROBLEM:
I need to find the the XYZs within the the range of a certain zipcode.
This zipcode and radial range in miles is entered by the user (web
interface).

[...]

Your question has nothing to do with Perl (nor with zip codes, btw.).
It is a problem in computational geometry, and that is where I would
start looking for algorithms.

Anno
 
G

gmillerd

i wrote this many years ago for a singles website i made ... you may
find that just making a sql table with zipcode_to,zipcode_from,distance
is where you want to be. making a table for three digits and for five
digit zipcodes is a plus as well.

also you can do the lookups now with google map api i just noticed, way
to get banned with your 29000 zipcode entries :)

good luck

use strict;
use Math::Trig;

##
## racine wisconsin
##
my $a_lat = 42.13404;
my $a_long = -87.934097;

##
## wheeling illinois
##
my $b_lat = 42.716112;
my $b_long = -87.823329;

my $b_lat = 41.37372;
my $b_long = -93.376719;

######################################################################

print
gc(
$a_lat,
$a_long,
$b_lat,
$b_long),
"\n";

######################################################################

sub gc
{
my ($a_lat, $a_long, $b_lat, $b_long) = @_;

##
## convert degrees to radians
##
$a_long = deg2rad($a_long);
$b_long = deg2rad($b_long);
$a_lat = deg2rad($a_lat);
$b_lat = deg2rad($b_lat);

##
## meters in a mile
##
my $meterspermile = 1609.344;

##
## earth's radius
##
my $earth_radius = 6367000;
my $earth_radius = 6378.14 * 1000;

my $d = acos(sin($a_lat) * sin($b_lat) + cos($a_lat) * cos($b_lat)
* cos($a_long - $b_long));

$d = ($earth_radius * $d) / $meterspermile;

return($d);
}




Hi,
I have a database with two tables
a) A table of 2 million records with city, zip and associated
information (say XYZ) and
b) zipcode latitude, longitude table having >40,000 records/zip codes

PROBLEM:
I need to find the the XYZs within the the range of a certain zipcode.
This zipcode and radial range in miles is entered by the user (web
interface).

[...]

Your question has nothing to do with Perl (nor with zip codes, btw.).
It is a problem in computational geometry, and that is where I would
start looking for algorithms.

Anno
 
G

gmillerd

i wrote this many years ago for a singles website i made ... you may
find that just making a sql table with zipcode_to,zipcode_from,distance
is where you want to be. making a table for three digits and for five
digit zipcodes is a plus as well.

also you can do the lookups now with google map api i just noticed, way
to get banned with your 29000 zipcode entries :)

good luck

use strict;
use Math::Trig;

##
## racine wisconsin
##
my $a_lat = 42.13404;
my $a_long = -87.934097;

##
## wheeling illinois
##
my $b_lat = 42.716112;
my $b_long = -87.823329;

my $b_lat = 41.37372;
my $b_long = -93.376719;

######################################################################

print
gc(
$a_lat,
$a_long,
$b_lat,
$b_long),
"\n";

######################################################################

sub gc
{
my ($a_lat, $a_long, $b_lat, $b_long) = @_;

##
## convert degrees to radians
##
$a_long = deg2rad($a_long);
$b_long = deg2rad($b_long);
$a_lat = deg2rad($a_lat);
$b_lat = deg2rad($b_lat);

##
## meters in a mile
##
my $meterspermile = 1609.344;

##
## earth's radius
##
my $earth_radius = 6367000;
my $earth_radius = 6378.14 * 1000;

my $d = acos(sin($a_lat) * sin($b_lat) + cos($a_lat) * cos($b_lat)
* cos($a_long - $b_long));

$d = ($earth_radius * $d) / $meterspermile;

return($d);
}




Hi,
I have a database with two tables
a) A table of 2 million records with city, zip and associated
information (say XYZ) and
b) zipcode latitude, longitude table having >40,000 records/zip codes

PROBLEM:
I need to find the the XYZs within the the range of a certain zipcode.
This zipcode and radial range in miles is entered by the user (web
interface).

[...]

Your question has nothing to do with Perl (nor with zip codes, btw.).
It is a problem in computational geometry, and that is where I would
start looking for algorithms.

Anno
 
X

xhoster

Hi,
I have a database with two tables
a) A table of 2 million records with city, zip and associated
information (say XYZ) and
b) zipcode latitude, longitude table having >40,000 records/zip codes

40,001 and 10**89 are both greater than 40,000. I assuming you mean ~40,
000 rather than >40,000, otherwise it is meaningless.

Is that ~40,000 records where each record is a zipcode?
Or is it ~40,000 records per zip code?

PROBLEM:
I need to find the the XYZs within the the range of a certain zipcode.
This zipcode and radial range in miles is entered by the user (web
interface).

Are you assuming zip codes to be points?

The brute force way is to calculate the distance between the user
zipcode and all the zipcodes in the database. Once the zipcode_range
subroutine gives back the zipcodes within a certain radius, I need to
find all the XYZs from the table #1.

OK. You could probably even build a table giving the distances between
every pair of zip codes, or at least every pair within 300 miles or so
of each other.
Another approach is to find the zipcodes with a square region (min/max
of the user zipcode latitude/longitude position).

Yep, that is another approach. You could construct a square around the
circle, use a data structure to get everything in the square, then brute
force just those things in the square to see if they are also within the
circle. (neglecting the curvature of the earth.)
Both the approaches are consuming too much time. Especially if the
radial distance starts increasing.

Starts increasing? You make it sound like an organic creature. Isn't it
a simple number provided by the end user?

Anyway, what is taking a long time? Getting a list of zip codes within
X miles of the reference zip code? Turning that list into a list of all
the qualifying XYZ?
My questions:
1. Is there any other smart way to do the above task.

We only have the vaguest idea of how you are doing it now. You can't make
tuning decisions based on vague notions.
2. I am working on a 2.4ghz/512MB RAM machine. Any suggestions how to
increase the performance. Right now each select command to the
2Million record table takes about a minute.

What database system are you using? What query are you issuing to it?
What indices exist? What does this have to do with Perl?

If you are using Perl as your database system, have you looked at Tree::R?

Xho
 
P

PGPS

Thanks for all your answers. I finally got the problem figured out
myself with a subquery. I'm new to databases though.

Thanks.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,744
Messages
2,569,484
Members
44,903
Latest member
orderPeak8CBDGummies

Latest Threads

Top