thellper
Hello,
I have a problem to solve and would appreciate some help.
My input is a large text file (up to 5 GB) that I need to filter
according to some regular expressions, writing the filtered
(hopefully smaller) output to another file.
The filtering is applied row by row: each row is split into several
pieces according to some rules, some of the pieces are checked against
regular expressions, and if a match is found the whole line is written
to the output.
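A minimal sketch of the row-by-row filter described above. The tab
separator, the field indexes and the patterns in %rules are assumptions
made up for illustration; in the real program they come from the syntax
and rule files. The patterns are compiled once with qr// so they are not
rebuilt for every line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical rule set: field index => pattern, compiled once with qr//.
my %rules = (
    1 => qr/^ERROR$/,
    3 => qr/timeout|refused/i,
);

# Return 1 if any configured field of the line matches its pattern.
sub line_matches {
    my ($line, $sep, $rules) = @_;
    chomp(my $copy = $line);
    # Limit -1 keeps trailing empty fields, so field indexes stay stable.
    my @fields = split $sep, $copy, -1;
    for my $idx (keys %$rules) {
        next unless defined $fields[$idx];
        return 1 if $fields[$idx] =~ $rules->{$idx};
    }
    return 0;
}

# Stream the input and write every matching line unchanged.
# Returns the number of lines written.
sub filter_stream {
    my ($in, $out, $sep, $rules) = @_;
    my $kept = 0;
    while (my $line = <$in>) {
        if (line_matches($line, $sep, $rules)) {
            print {$out} $line;
            $kept++;
        }
    }
    return $kept;
}

# Act as a filter only when run directly: perl filter.pl < in > out
filter_stream(\*STDIN, \*STDOUT, qr/\t/, \%rules) unless caller;
```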
The problem is that this solution is slow.
I currently read the whole file line by line and apply the regular
expressions to each line, but it is very slow.
I've noticed that reading and writing the file without doing anything
else takes very little time, so most of the time is being lost in the
regular expressions.
The whole program is actually more complicated: the files may have
different syntaxes, and I have syntax files that tell me how to split
each line into its fields. I then separately load files with the rules
(the regular expressions) used to filter them.
Anyway, my idea was to use the forks.pm module (see CPAN) to split the
file into chunks and let each thread work on one chunk of the file.
Can somebody tell me how to do this? Or is there a better way?
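To make the chunk idea concrete, here is a rough sketch of how it might
look. It uses Perl's built-in fork() rather than forks.pm, and the names
keep_line, filter_chunk and parallel_filter, the ".partN" temporary files
and the worker count of 4 are all assumptions for illustration. Each
worker handles the lines that start inside its byte range, and the parts
are joined in chunk order so the output keeps the input order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Placeholder test; the real program would apply the field rules here.
sub keep_line { my ($line) = @_; return $line =~ /ERROR/ ? 1 : 0 }

# Filter the lines that start inside [$start, $end) into $part_file.
sub filter_chunk {
    my ($path, $start, $end, $part_file) = @_;
    open my $in,  '<', $path      or die "open $path: $!";
    open my $out, '>', $part_file or die "open $part_file: $!";
    if ($start > 0) {
        # Step back one byte and discard up to the next newline, so the
        # chunk begins on a line boundary and no line is handled twice.
        seek($in, $start - 1, 0) or die "seek: $!";
        my $skipped = <$in>;
    }
    while (tell($in) < $end) {
        my $line = <$in>;
        last unless defined $line;
        print {$out} $line if keep_line($line);
    }
    close $out or die "close $part_file: $!";
    close $in;
}

# Fork one worker per chunk, then join the parts in chunk order.
sub parallel_filter {
    my ($path, $out_path, $workers) = @_;
    my $size  = (-s $path) || 0;
    my $chunk = int($size / $workers) + 1;
    my (@pids, @parts);
    for my $i (0 .. $workers - 1) {
        my $start = $i * $chunk;
        my $end   = $start + $chunk;
        $end = $size if $end > $size;
        my $part = "$out_path.part$i";
        push @parts, $part;
        defined(my $pid = fork()) or die "fork: $!";
        if ($pid == 0) {            # child: do one chunk and exit
            filter_chunk($path, $start, $end, $part);
            exit 0;
        }
        push @pids, $pid;           # parent: remember the child
    }
    waitpid($_, 0) for @pids;
    open my $out, '>', $out_path or die "open $out_path: $!";
    for my $part (@parts) {
        open my $in, '<', $part or die "open $part: $!";
        print {$out} $_ while <$in>;
        close $in;
        unlink $part;
    }
    close $out or die "close $out_path: $!";
}

# Usage when run directly: perl pfilter.pl input.txt output.txt
parallel_filter(@ARGV[0, 1], 4) if !caller && @ARGV == 2;
```

Whether this helps depends on how many cores are free; since plain
reading and writing is already fast, the regular expression work is the
part that the workers would share.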
Any help is really appreciated.
Best regards,
Davide