Benz
Hi!
Is there a smart way of finding duplicates in a large file?
This is how the file will look:
Col1 Col2 Col3 Col4
1/02/2005 20:06:10.870^F^l0091nd^F^5591^F^793423423^R
1/02/2005 21:06:15.533^F^l0091f3^F^5591^F^793423324^R
1/02/2005 22:12:14.653^F^l0031d6^F^5591^F^793423324^R
The ^F^ is the file separator. The file could have up to 140,000 lines,
and I need to find out whether there are duplicates in Col4.
I am trying to do this the conventional way, and this is how far I got:
- reading the file using BufferedReader
- tokenizing each line and going to Col4
- taking the value in Col4
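Here is a rough sketch of how those steps might continue. It is only one possible approach, and it rests on several assumptions: that ^F^ and ^R appear as literal text in the file, that Col4 is always the fourth field, and that keeping every Col4 value in a HashSet is acceptable for about 140,000 lines. The class name Col4Duplicates and the method findDuplicates are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper: the names here are illustrative only.
class Col4Duplicates {
    // Assumption: the separator is the literal three characters ^F^.
    // Pattern.quote stops split() from treating '^' as a regex anchor.
    private static final String FIELD_SEP = Pattern.quote("^F^");
    // Assumption: each record ends with the literal two characters ^R.
    private static final String RECORD_END = "^R";

    // Returns the Col4 values that occur more than once, in the order
    // their first repeat was encountered.
    static Set<String> findDuplicates(BufferedReader in) throws IOException {
        Set<String> seen = new HashSet<String>();
        Set<String> duplicates = new LinkedHashSet<String>();
        String line;
        while ((line = in.readLine()) != null) {
            String[] fields = line.split(FIELD_SEP);
            if (fields.length < 4) {
                continue; // header line or malformed record: skip it
            }
            String col4 = fields[3].trim();
            if (col4.endsWith(RECORD_END)) {
                col4 = col4.substring(0, col4.length() - RECORD_END.length());
            }
            // Set.add returns false when the value was already present.
            if (!seen.add(col4)) {
                duplicates.add(col4);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Col4Duplicates <file>");
            return;
        }
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        try {
            for (String value : findDuplicates(in)) {
                System.out.println("duplicate Col4 value: " + value);
            }
        } finally {
            in.close();
        }
    }
}
```

The file is read once, line by line, and only the Col4 values are kept in memory, so the number of lines should not be a problem at this size.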
I would appreciate any pointers.
- TIA Ben