alexandre_paterson
Hi everyone,
I spent some time optimizing an application that first parses a lot
of text files, which can be in various encodings, and then processes
them (having to parse lots of text files is neither under my control
nor my decision; that's the way it is and we have to do it).
The optimization went great, maybe a little too great, hence the
semi-weird question that comes after this explanation of what I did.
I noticed that wrapping a BufferedReader around a FileInputStream
was unbearably slow for our needs. So the first optimization step
was to read each text file into a byte array, then dispatch the
actual text decoding to several threads (wrapping BufferedReaders
around ByteArrayInputStreams). This is all done using Executors
with a producer/consumer pattern for reading the files: as soon as
one file has been read, a worker thread starts decoding it, so we're
never I/O bound (except for the very first file read).
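In case it helps, here's a stripped-down sketch of that pipeline. The directory name, the hardcoded UTF-8 charset, and the process() step are placeholders, not our actual code (in reality we detect each file's encoding rather than assume UTF-8):

```java
import java.io.*;
import java.nio.charset.Charset;
import java.util.*;
import java.util.concurrent.*;

public class ParserPipeline {
    // Decode entirely from memory: a BufferedReader over a ByteArrayInputStream,
    // so the CPU-heavy charset decoding never waits on the disk.
    static List<String> decode(byte[] bytes, Charset cs) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(bytes), cs))) {
            String line;
            while ((line = r.readLine()) != null) lines.add(line);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Read a whole file into a byte array (assumes files fit in an int-sized array).
    static byte[] readFully(File f) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
            byte[] buf = new byte[(int) f.length()];
            in.readFully(buf);
            return buf;
        }
    }

    static void process(String line) { /* heavy per-line crunching goes here */ }

    public static void main(String[] args) throws Exception {
        // One decoder thread per core; the main thread acts as the producer.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService decoders = Executors.newFixedThreadPool(cores);

        File[] files = new File("data").listFiles(); // "data" is a placeholder
        if (files == null) files = new File[0];
        for (File file : files) {
            // Producer: fast sequential read, then immediately hand the
            // in-memory bytes to a worker thread for decoding.
            byte[] bytes = readFully(file);
            decoders.submit(() -> {
                for (String line : decode(bytes, Charset.forName("UTF-8"))) {
                    process(line);
                }
            });
        }
        decoders.shutdown();
        decoders.awaitTermination(1, TimeUnit.HOURS);
    }
}
```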
Note that I fully understand why it was slow and why it's now an
order of magnitude faster; my question isn't about how to parse
text files faster. I managed to reach basically 100% CPU usage on
every core while never being I/O bound, and that's the best I can
do: given the heavy crunching we do on these files once they've
been decoded, we'll always be CPU bound.
The only "problem" is that my workstation, running Linux
(Debian Etch), becomes rather unresponsive when running the
optimized version of our parser, while everything was fine with
the old, unoptimized parser (except that the app was an order of
magnitude slower at parsing the text files, of course).
The same optimized app behaves much more nicely on Mac OS X:
I can see that both cores (on a Core 2 Duo Mac Mini) are fully
used, but I can still, say, play videos fine (that was just a test
to see how the system responded).
I realize it's probably the OS's fault here: if Mac OS X keeps
behaving correctly, then my Linux system probably should too.
So here comes my question: given that some OSes *will* become
laggy when one application consumes a lot of resources (we've
*all* seen that behavior at least once), and given that I've got no
control over how the OS runs the app (e.g. I can't ask the user to
"nice" the Linux app to lower its priority), should I artificially
limit the resources used by my app?
For example, I noticed that by forcing every worker thread to
sleep for a few milliseconds every 'x' files parsed, I could both
maintain a very high throughput and keep my workstation
responsive.
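Here's roughly what that throttle looks like in each worker. The threshold and sleep duration are just numbers I tuned by hand on my machine, not magic values:

```java
// Per-worker throttle: after every FILES_PER_PAUSE files, yield the CPU
// for a few milliseconds so the rest of the system stays responsive.
public class Throttle {
    static final int FILES_PER_PAUSE = 50; // 'x' files between pauses (hand-tuned)
    static final long PAUSE_MILLIS = 10;   // how long each pause lasts (hand-tuned)

    private int filesSincePause = 0;

    // Called by a worker thread after each file it finishes parsing.
    // Returns true when a pause was taken (handy for testing/logging).
    public boolean fileParsed() {
        if (++filesSincePause >= FILES_PER_PAUSE) {
            filesSincePause = 0;
            try {
                Thread.sleep(PAUSE_MILLIS); // give the scheduler room for other processes
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt status
            }
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Throttle t = new Throttle();
        for (int i = 1; i <= 120; i++) {
            // ... a worker would parse one file here ...
            if (t.fileParsed()) {
                // prints "paused after file 50" and "paused after file 100"
                System.out.println("paused after file " + i);
            }
        }
    }
}
```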
Stated another way: once I've optimized my application so that
it's no longer stupidly, unnecessarily bound to some resource,
should I start "giving some of that resource back" to make the OS
happy? In my case the app maxes out CPU usage (which is exactly
the behavior I want), so should I start "giving the CPU a
breather"? I can imagine other cases where you'd be hard-disk
bound once you've optimized the way your app works; would you
then "give the hard disk a breather"? Or the network?
Has anyone here ever run into this problem, and how do you
deal with it?
Now, suppose I accept that artificially limiting the CPU usage of
our app is a kludgy hack, but I still decide it's the only
acceptable way to solve our problem (the problem being that
several people's systems become too unresponsive when running our
"too optimized" app). What would be the correct way to lower the
CPU usage of our Java app from within the Java program itself
(since we can't ask the user to re-prioritize the app, nor force it
to run on only 'x' cores, etc.)?
Is inserting Thread.sleep(...) calls the way to go?
Alex