C++ based fast parser for delimited records

garth_rockett

We need to process a very large amount of delimited, variable-length
ASCII data in files as large as 3-4 GB. We need a high-performance
parser for this and, as always, we have no money to buy one. We are OK
with building one ourselves as long as that can be done quickly
enough, and I was wondering whether Boost has a panacea for us. Can
anyone help with their ideas / experience?

I am also very open to any suggestions outside Boost. Any outline of
how to build such a parser would be very welcome, and any comparative
performance figures would be of tremendous help. Any fast C++ library
would be of interest.

We develop a market analytics tool on HP-UX and Linux, on 32- and
64-bit platforms.

Cheers,
Andy
 
Ivan Vecerina

: We need to process a very large amount of delimited, variable-length
: ASCII data in files as large as 3-4 GB. We need a high-performance
: parser for this and, as always, we have no money to buy one. We are OK
: with building one ourselves as long as that can be done quickly
: enough, and I was wondering whether Boost has a panacea for us. Can
: anyone help with their ideas / experience?
:
: I am also very open to any suggestions outside Boost. Any outline of
: how to build such a parser would be very welcome, and any comparative
: performance figures would be of tremendous help. Any fast C++ library
: would be of interest.

The parser itself may not be the performance-limiting factor as much
as the technique you use for i/o.

In similar circumstances, I usually use memory mapping (mmap, or
MapViewOfFile on Windows) to bring the file (or large segments of it)
into memory. The OS page cache is typically much more efficient
than any file I/O API.
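
For illustration, here is a minimal POSIX sketch of that approach:
map the whole file read-only and scan it with plain pointers. The
file name and the newline-counting loop are only placeholders; on a
32-bit build you would map large windows of the file instead of the
whole thing, and on Windows you would use CreateFileMapping /
MapViewOfFile.

#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main()
{
    const char* path = "records.dat";           // placeholder file name
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    // Map the whole file read-only.  For 3-4 GB files on a 32-bit
    // build, map large windows and slide them forward instead.
    void* base = mmap(0, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    const char* p   = static_cast<const char*>(base);
    const char* end = p + st.st_size;

    // Trivial example of scanning the mapped region: count the
    // newline-delimited records.
    long records = 0;
    for (; p != end; ++p)
        if (*p == '\n') ++records;
    std::printf("%ld records\n", records);

    munmap(base, st.st_size);
    close(fd);
    return 0;
}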

For parsing, I tend to rely on the tried and true flex tool
(http://www.gnu.org/software/flex/). Flex-generated code is very
likely to be faster than boost::spirit (but I have no data).
A hand-coded parser might be fastest if the structure of the
records is simple enough.
Maybe you can just split the input into lines and use sscanf?
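
As a rough sketch of the hand-coded route (the '|' delimiter and the
sample records below are just assumptions; keeping pointer/length
pairs instead of std::string copies would be faster still, and you
could hand each line to sscanf instead of splitting it yourself):

#include <cstring>
#include <string>
#include <vector>

// Split one record [begin, end) on 'delim'.
std::vector<std::string> split_record(const char* begin, const char* end,
                                      char delim = '|')
{
    std::vector<std::string> fields;
    const char* field = begin;
    for (const char* p = begin; p != end; ++p) {
        if (*p == delim) {
            fields.push_back(std::string(field, p));
            field = p + 1;
        }
    }
    fields.push_back(std::string(field, end));   // last field
    return fields;
}

int main()
{
    // In practice this buffer would be the memory-mapped file.
    const char buffer[] = "IBM|2004-06-01|91.25\nHPQ|2004-06-01|21.10\n";
    const char* p   = buffer;
    const char* end = buffer + sizeof(buffer) - 1;

    while (p < end) {
        const char* eol =
            static_cast<const char*>(std::memchr(p, '\n', end - p));
        if (!eol) eol = end;
        std::vector<std::string> fields = split_record(p, eol);
        // ... convert / consume the fields here (strtod, sscanf, ...) ...
        p = (eol == end) ? end : eol + 1;
    }
    return 0;
}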

: We develop a market analytics tool on HP-UX and Linux, on 32- and
: 64-bit platforms.

Wishing you success - Ivan
 
Marc Mutz

We need to process a very large amount of delimited,
variable-length ASCII data in files as large as 3-4
GB. We need a high-performance parser for this and, as
always, we have no money to buy one. We are OK with
building one ourselves as long as that can be done
quickly enough, and I was wondering whether Boost has
a panacea for us. Can anyone help with their ideas /
experience?

Use Boost.Spirit:
http://www.boost.org/libs/spirit/index.html
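
For example, a minimal sketch of parsing one comma-separated record
of numbers with Spirit. This uses the Spirit.Qi interface of recent
Boost releases (the classic Spirit API in older releases differs
slightly), and the record format here is just an assumption:

#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>

int main()
{
    namespace qi    = boost::spirit::qi;
    namespace ascii = boost::spirit::ascii;

    std::string line = "91.25, 92.10, 90.80";
    std::vector<double> values;

    std::string::const_iterator first = line.begin();
    std::string::const_iterator last  = line.end();

    // Grammar: doubles separated by commas, skipping whitespace.
    bool ok = qi::phrase_parse(first, last,
                               qi::double_ % ',',
                               ascii::space,
                               values);

    if (ok && first == last)
        std::cout << "parsed " << values.size() << " fields\n";
    return 0;
}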

Marc
 
