JThangiah
Hi all,
I need some help to find an efficient range or pair comparison
algorithm for integers.
My requirement is to link records in two different databases that hold
different sets of records. Both have a pair of values called
co-ordinates, which are similar and can be used to match the records.
My Solution:
I am going to pull a set of records from both databases and sort them
based on these co-ordinates. I will put them in two lists so that the
indexes for the records to be compared in both lists are the same.
I will then do a comparison of both lists to find the exact match.
If I find the match, I need to record the matched records elsewhere to
create an XML file with those results.
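To make the idea concrete, here is a rough sketch of the sort-and-compare step in Java. It is only an illustration under assumptions: the Rec class, its field names (id, x, y) and the match method are hypothetical, and the co-ordinates are assumed to be two plain int values. Because either side may hold excess records, the sketch walks both sorted arrays with two separate cursors rather than relying on identical indexes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class CoordinateMatcher {

    /** Hypothetical record shape: a unique id plus the co-ordinate pair. */
    static final class Rec {
        final long id;
        final int x;
        final int y;

        Rec(long id, int x, int y) {
            this.id = id;
            this.x = x;
            this.y = y;
        }
    }

    /** Orders records by first co-ordinate, then by second. */
    static final Comparator<Rec> BY_COORDS =
            Comparator.<Rec>comparingInt(r -> r.x).thenComparingInt(r -> r.y);

    /**
     * Sorts both arrays in place, then walks them together.
     * Returns {idFromA, idFromB} for every exact co-ordinate match.
     * Assumes co-ordinate pairs are unique within each array.
     */
    static List<long[]> match(Rec[] a, Rec[] b) {
        Arrays.sort(a, BY_COORDS);
        Arrays.sort(b, BY_COORDS);

        List<long[]> matches = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.length && j < b.length) {
            int c = BY_COORDS.compare(a[i], b[j]);
            if (c == 0) {
                // Same co-ordinates on both sides: record the pair of ids.
                matches.add(new long[] { a[i].id, b[j].id });
                i++;
                j++;
            } else if (c < 0) {
                i++; // a[i] has no counterpart in b; skip it
            } else {
                j++; // b[j] has no counterpart in a; skip it
            }
        }
        return matches;
    }
}
```

The int fields stay primitive here, so boxing would only come into play if the co-ordinates were stored as Integer objects inside a collection. The returned id pairs are what would later be written out to the XML file.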
Constraints:
Both databases are so large that I have to find the most efficient
algorithm and data structure to do the job.
The execution will be done in a batch job.
The records pulled from both databases to form the lists are unique
and have no duplicates, but each has a separate unique identifier.
Either database may contain excess records that have no counterpart in
the other.
Questions:
1. Is a list the best data structure for this, since speed is my main
concern? Or will the autoboxing of integers cause an overhead?
2. Are there any good range-comparison or pair-comparison algorithms
for integers that I can use?
3. Can I make use of concurrency algorithms for this?