Michael said:
I'm writing an application that decodes a file containing binary
records. Each record is a particular event type. Each record is
translated into ASCII and then written to a file. Each file contains
the same events. At the moment each record is processed one after the
other. It takes about 1m40s to process a large file containing 70,000
records. Would my application benefit from multiple threads and mmap?
The answer is a definite maybe. The threads question is highly
hardware-dependent. Multiple threads are most effective on machines
with multiple processors. Otherwise, simply increasing the number of
threads does not increase a machine's processing power to a like
degree. In fact, because switching between threads entails some
overhead, it is just as easy to wind up with too many threads as too
few when guessing blindly at the optimal number.
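One way to avoid guessing entirely blindly is to ask the runtime how many hardware threads the machine has and use that as an upper bound. The following is only a sketch, assuming a C++11 or later compiler; pick_worker_count is a hypothetical helper name, and the right number for this app still has to come from measurement.

```cpp
#include <algorithm>
#include <thread>

// Hypothetical helper: cap a requested worker count at the number of
// hardware threads the machine reports, and never return less than one.
unsigned pick_worker_count(unsigned requested)
{
    // hardware_concurrency() is only a hint and may return 0 when the
    // value cannot be determined; treat that case as a single processor.
    unsigned hw = std::thread::hardware_concurrency();
    if (hw == 0)
        hw = 1;
    return std::max(1u, std::min(requested, hw));
}
```

Calling pick_worker_count(20) on a four-processor machine would yield 4, which is a starting point for testing rather than a proven optimum.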
Since you have not provided a detailed description of the
application's current memory use and I/O characteristics, it is
impossible to say whether mmap would help or not. And the first order
of business in any case has to be to profile the current app and find
out how it is spending those 100 seconds. If 90% of that time is in
parsing code, then no, mmap will be unlikely to help. If, on the other
hand, a large portion of that time is spent in disk I/O operations (as
is often the case), then yes, a few large read and write operations
(instead of many little ones) will do more to improve performance than
almost any other type of optimization. But without knowing the extent
to which the current application has optimized its behavior, it's
futile to estimate how much further its performance could be optimized.
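To illustrate what "a few large read operations" might look like, here is a minimal sketch that pulls an entire input file into memory with one read request, so the parser can then walk the buffer without touching the disk again. It assumes the file fits comfortably in memory; read_whole_file is a hypothetical helper name, not something from the original application.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: read an entire binary file with a single large
// request. Returns an empty vector if the file is empty or unreadable.
std::vector<char> read_whole_file(const std::string& path)
{
    // Open positioned at the end so that tellg() reports the file size.
    std::ifstream in(path.c_str(), std::ios::binary | std::ios::ate);
    if (!in)
        return std::vector<char>();

    const std::streamsize size = static_cast<std::streamsize>(in.tellg());
    if (size <= 0)
        return std::vector<char>();

    std::vector<char> buffer(static_cast<std::size_t>(size));
    in.seekg(0, std::ios::beg);

    // One read call for the whole file instead of one per record.
    if (!in.read(buffer.data(), size))
        return std::vector<char>();
    return buffer;
}
```

Whether this actually helps depends on what the profile shows; if the time is going into parsing, the read strategy will make little difference.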
If so what is the best way to manage the multiple output files? For
example there are 20 event types. When parsing the file I identify the
event type and build 20 lists. Then have 20 threads each working with
each event file.
Unless the hardware has a lot of multiprocessing capability, 20 threads
sound like far too many. But only profiling and testing various
implementations will be able to find the optimal number of threads for
this app running on a particular hardware configuration.
As for the 20 event types, I would not do anything fancy. If the 20
possible types are fixed, then declaring an array of 20 file handles
and using an enum as an index into that array to find the
corresponding file handle should suffice. Just avoid "magic numbers"
like 20, and define named integral constants in their place.
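The array-plus-enum arrangement described above might be sketched as follows. The three event names and the file names are invented for illustration (the real application would list all of its event types), and the last enumerator doubles as the array size so that no literal count appears anywhere.

```cpp
#include <cstdio>

// Hypothetical event types; the real list would have one entry per type.
enum EventType {
    EVENT_LOGIN,
    EVENT_LOGOUT,
    EVENT_ERROR,
    EVENT_TYPE_COUNT   // not an event: the number of types listed above
};

// Hypothetical output file names, one per event type, in enum order.
const char* const kEventFileNames[EVENT_TYPE_COUNT] = {
    "login.txt", "logout.txt", "error.txt"
};

// Opens one output file per event type. On failure, closes whatever was
// already opened and returns false.
bool open_event_files(std::FILE* (&files)[EVENT_TYPE_COUNT])
{
    for (int i = 0; i < EVENT_TYPE_COUNT; ++i) {
        files[i] = std::fopen(kEventFileNames[i], "w");
        if (files[i] == nullptr) {
            while (i-- > 0) {
                std::fclose(files[i]);
                files[i] = nullptr;
            }
            return false;
        }
    }
    return true;
}

// Writes one translated record to the file for its event type.
void write_event(std::FILE* (&files)[EVENT_TYPE_COUNT],
                 EventType type, const char* text)
{
    std::fputs(text, files[type]);
    std::fputc('\n', files[type]);
}

// Closes every file that is still open.
void close_event_files(std::FILE* (&files)[EVENT_TYPE_COUNT])
{
    for (int i = 0; i < EVENT_TYPE_COUNT; ++i) {
        if (files[i] != nullptr) {
            std::fclose(files[i]);
            files[i] = nullptr;
        }
    }
}
```

Adding a new event type then means adding one enumerator and one file name; the loops and the array size follow automatically.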
How do I extract this into classes?
I'm not sure that a program that performs a linear processing task
benefits a great deal from classes. Classes (and a class hierarchy)
work best as a dynamic model - often one driven by an ever-changing
series of events (often generated by the user's interaction with the
application). A program that opens a file, parses its contents, closes
the file and declares itself done is really conducting a series of
predictable sequential operations. And the only reason for wanting to
use classes here would be for maintainability (because I can't see that
performance issues would ever mandate implementing classes).
So the question to ask is whether classes would actually make the code
more maintainable. A well-designed and well-implemented class model
should, but a class model designed for its own sake would probably be
harder to maintain, because a class hierarchy of any kind almost
always increases the total complexity of a program (in other words,
there is more code). But because code in a well-designed hierarchy
better encapsulates its complexity, a programmer is able to work on
the program's logic in smaller "pieces" (thereby reducing the
complexity that the programmer has to deal with at any one time).
Lastly, maintainability is a separate issue from performance, and one
that should be addressed first. It wouldn't make sense to fine-tune the
app's performance if its code is going to be thrown out and replaced
with an object-oriented implementation in the final, shipping version.
So to recap: first, decide whether a class hierarchy would improve the
maintainability of the source code enough to justify the additional
work, and implement it if so. Second, profile the app to obtain a
precise accounting of the 100 seconds it spends processing records.
Third, use that profile information to target bottlenecks and remedy
them using standard optimization techniques (such as issuing fewer I/O
requests by increasing the size of each request or, if parsing is the
bottleneck, using a table-driven parser for maximal speed). And lastly,
the most important point: it's simply never effective to try to speed
up a program without first learning why it is so slow.
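For what it's worth, a table-driven decoder of the kind mentioned in the recap could look something like the sketch below. The record layout (one type byte followed by one value byte) and the three decoder functions are assumptions made up for the example, since the actual record format was not described.

```cpp
#include <cstddef>
#include <string>

// Hypothetical record layout: byte 0 is the event type, byte 1 is a value.
typedef std::string (*Decoder)(unsigned char value);

std::string decode_login(unsigned char v)
{
    return "LOGIN user=" + std::to_string(static_cast<unsigned>(v));
}

std::string decode_logout(unsigned char v)
{
    return "LOGOUT user=" + std::to_string(static_cast<unsigned>(v));
}

std::string decode_error(unsigned char v)
{
    return "ERROR code=" + std::to_string(static_cast<unsigned>(v));
}

// The table: the event type byte is the index, the entry is the decoder.
const Decoder kDecoders[] = { decode_login, decode_logout, decode_error };
const std::size_t kDecoderCount = sizeof kDecoders / sizeof kDecoders[0];

// Translates one record to ASCII with a single table lookup instead of a
// chain of comparisons. Unknown event types yield an empty string.
std::string decode_record(const unsigned char* record)
{
    const unsigned char type = record[0];
    if (type >= kDecoderCount)
        return std::string();
    return kDecoders[type](record[1]);
}
```

The lookup cost stays the same no matter how many event types there are, but it is only worth doing if the profile shows that dispatching on the event type is where the time goes.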
Greg