Best way to index numerical data?

Jack

Hi, I have a lot of data in a TEXT file that consists of numbers. Does
anyone have a good suggestion for indexing TEXT numbers (zip codes,
other codes, dollar amounts, quantities, etc.)? I ask because Lucene and
other indexers are really optimized for alphabetic character indexing.
What approaches are typically taken in computer science to index text
numbers: hash maps, or something else?

Thanks,

Jack
 
Paddy

What do you want to search for in the file?
How big is the file?
What format is the data in the file?

- Paddy.
 
benwbrewster

I want to search for a whole number (an entire field value). If
possible, fuzzy search would be nice too, but it is not mandatory.
Here is a sample line from a .txt file:
1975|Y|35136|72|1927|||3|005503|003|19870301|19950301|14416887|151|20000301|100039292|N|84|F|50||10|A|100|Y|037|Y|89005|3042|M|S|P|

Thanks!
Jack
 
Liu Jin

Jack said:
> Hi I have a lot of data that is in a TEXT file which are numbers
> does anyone have a good suggestion for indexing TEXT numbers
> (zip codes, other codes, dollar amounts, quantities, etc). since
> Lucene and other indexers are really optimized for Alpha
> character indexing. What approaches are typically taken in
> computer science for example to index text numbers..hash maps or
> something else ??

Lucene is not optimized for alphabetic character indexing; it is
designed for natural language indexing. The assumption is that the
dictionary (the set of distinct terms) is relatively small (say, under
1M words for English) and doesn't grow linearly with the amount of text
being indexed. If your data fits this model, Lucene can index it
efficiently, no matter what the characters are.
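
As a rough illustration of the dictionary model described above, here is
a minimal Python sketch of an inverted index over pipe-delimited
records. It is not how Lucene is implemented; the function names
build_index and search are hypothetical, and the three records are
made-up lines in the same shape as the sample posted earlier. Each
distinct field value is treated as a term, exactly as a word would be.

```python
from collections import defaultdict

def build_index(lines):
    """Map each non-empty field value (term) to the set of record numbers containing it."""
    index = defaultdict(set)
    for record_no, line in enumerate(lines):
        for field in line.rstrip("\n").split("|"):
            if field:  # skip empty fields such as the gap in "||"
                index[field].add(record_no)
    return index

def search(index, term):
    """Exact whole-term lookup; returns the matching record numbers in order."""
    return sorted(index.get(term, ()))

# Hypothetical records, shortened from the shape of the sample line.
records = [
    "1975|Y|35136|72|1927|||3|005503|003|",
    "1975|N|35136|84|1930|||3|005504|003|",
    "1980|Y|89005|72|1927|||5|005503|010|",
]

index = build_index(records)
print(search(index, "35136"))          # records containing the term 35136
print(len(index), "distinct terms")    # size of the term dictionary
```

Codes that repeat across records (zip codes, flags, small quantities)
keep the dictionary small, which is the case that fits the model. A
field that is unique per record, such as an account number, adds one new
term for every record, so the dictionary would grow linearly with the
data.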

Regards,
Liu Jin
 
