Semaphores or not?

Deniz Dogan

Hello.

I am currently working on a small hobby project, part of which is a
JTable with five columns (no need to know more about the actual columns
than the fact that there are five of them). It is supposed to show
information from two different text files, let's call them original.txt
and translation.txt, in separate columns. Both text files are formatted
like this:

//----------------------------

The text files are separated into paragraphs,
such as this one.

There may or may not be multiple rows right next to each other,
and there is no way of knowing how many rows there are before
a new empty line (the end of a paragraph).

Both of the files have this format.
Both files have the same number of paragraphs.

//----------------------------

What I need to do is capture each so-called paragraph of each file in a
separate String, replacing '\n' with '|', and add those Strings to the
table data in two separate columns.
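
The paragraph capture described above could look roughly like the sketch below. It is only an illustration: the class name ParagraphReader and the method joinParagraphs are made up, and it assumes the lines have already been read into a List (for example with Files.readAllLines) and that a line containing only whitespace counts as empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the original project.
final class ParagraphReader {

    /** Groups lines into paragraphs separated by blank lines and joins
     *  the lines of each paragraph with '|'. */
    static List<String> joinParagraphs(List<String> lines) {
        List<String> paragraphs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                // A blank line closes the current paragraph, if there is one.
                if (current.length() > 0) {
                    paragraphs.add(current.toString());
                    current.setLength(0);
                }
            } else {
                if (current.length() > 0) {
                    current.append('|');
                }
                current.append(line);
            }
        }
        // The last paragraph may not be followed by a blank line.
        if (current.length() > 0) {
            paragraphs.add(current.toString());
        }
        return paragraphs;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "first line", "second line", "", "next paragraph");
        System.out.println(joinParagraphs(sample));
    }
}
```

Consecutive blank lines are treated as a single separator here, so they do not produce empty paragraphs.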

Now to my question:
Is it better to create two threads, one per file, and use Semaphore-based
barrier synchronization so that each row of the table is set only once
(with the information from both original.txt and translation.txt)
...OR... is it better to read original.txt sequentially first and then
use table.setValueAt(Object, int, int) to set the corresponding values
from translation.txt?

Phew, I hope you understand what I'm trying to say.

/Deniz Dogan
 
Eric Sosman

Deniz said:
Hello.

I am currently working on a small project (as a hobby) and part of it is
having a JTable with five columns (no need to know more about the actual
columns than the fact that there are five of them). [... and that they
get filled with data from different input files ...]

Now to my question:
Is it better to create two threads which read information from each file
and use Semaphore based barrier synchronization to change values of a
row of the table only once (with both the information from original.txt
and translation.txt at once) ...OR... is it better to first sequentially
read original.txt and then use table.setValueAt(Object, int, int) to set
the correct values from translation.txt?

As far as I can see, the only advantage of deferring a bunch
of updates and doing them in one batch would be that you might
avoid firing some events. I doubt that will be a big savings,
but you should measure it if you need to be sure.
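
To get a feel for how many events are at stake, one could count them on a bare DefaultTableModel, as in the sketch below. Everything in it is illustrative: the class EventCountDemo, its method names, and the two-column layout are assumptions, and the model is not attached to a JTable, so it says nothing about repaint cost.

```java
import javax.swing.table.DefaultTableModel;

// Hypothetical measurement helper; counts TableModelEvents only.
final class EventCountDemo {

    /** Fills pre-allocated rows one cell at a time with setValueAt. */
    static int fillPerCell(String[] originals, String[] translations) {
        DefaultTableModel model = new DefaultTableModel(originals.length, 2);
        int[] events = {0};
        model.addTableModelListener(e -> events[0]++);
        for (int row = 0; row < originals.length; row++) {
            model.setValueAt(originals[row], row, 0);
            model.setValueAt(translations[row], row, 1);
        }
        return events[0];
    }

    /** Adds each row once, with both values already known. */
    static int fillPerRow(String[] originals, String[] translations) {
        DefaultTableModel model = new DefaultTableModel(0, 2);
        int[] events = {0};
        model.addTableModelListener(e -> events[0]++);
        for (int row = 0; row < originals.length; row++) {
            model.addRow(new Object[] {originals[row], translations[row]});
        }
        return events[0];
    }

    /** Hands the whole data set to the model in a single call. */
    static int fillAtOnce(String[] originals, String[] translations) {
        DefaultTableModel model = new DefaultTableModel(0, 2);
        int[] events = {0};
        model.addTableModelListener(e -> events[0]++);
        Object[][] data = new Object[originals.length][];
        for (int row = 0; row < originals.length; row++) {
            data[row] = new Object[] {originals[row], translations[row]};
        }
        model.setDataVector(data, new Object[] {"Original", "Translation"});
        return events[0];
    }

    public static void main(String[] args) {
        String[] o = {"a", "b", "c"};
        String[] t = {"x", "y", "z"};
        System.out.println("per cell: " + fillPerCell(o, t));
        System.out.println("per row:  " + fillPerRow(o, t));
        System.out.println("at once:  " + fillAtOnce(o, t));
    }
}
```

Counting events is not the same as timing the work, so an actual measurement would still need a timer around the real loading code.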
 
Patricia Shanahan

Deniz Dogan wrote:
....
Is it better to create two threads which read information from each file
and use Semaphore based barrier synchronization to change values of a
row of the table only once (with both the information from original.txt
and translation.txt at once) ...OR... is it better to first sequentially
read original.txt and then use table.setValueAt(Object, int, int) to set
the correct values from translation.txt?

Phew, I hope you understand what I'm trying to say.

I'm assuming the processing is reasonably simple.

If the files are not cached in memory at the time the job runs, it
should run at disk read speed. If both files are on the same disk,
reading one file at a time will tend to be faster because it reduces
disk head movement. If they are on different disks, reading them in
parallel may be faster than one at a time, because you get to make
effective use of both disk heads.

If the files are cached in memory, the job's CPU time becomes the
critical factor. On a dual processor, or higher, you may get some gain
from running two threads. However, there is a risk that the chunks of
parallel work may be too small, and the cost of synchronization too
high, for a net gain. Also, you may find data contention between the two
processors messes up caching, reducing performance.
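
For comparison, a two-thread variant does not have to be built on Semaphores. The sketch below swaps in CompletableFuture, whose thenCombine step acts as the "wait for both" barrier. The class ParallelLoad and the two Supplier parameters are hypothetical stand-ins for whatever code actually reads and parses each file.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Supplier;

// Hypothetical sketch of the two-thread option, using CompletableFuture
// in place of a hand-written Semaphore barrier.
final class ParallelLoad {

    /** Runs both loaders concurrently and returns {original, translation}
     *  only after both have finished. */
    static String[] loadBoth(Supplier<String> loadOriginal,
                             Supplier<String> loadTranslation) {
        CompletableFuture<String> original =
                CompletableFuture.supplyAsync(loadOriginal);
        CompletableFuture<String> translation =
                CompletableFuture.supplyAsync(loadTranslation);
        // thenCombine fires once both futures are complete; join blocks
        // the caller until the combined result is available.
        return original
                .thenCombine(translation, (o, t) -> new String[] {o, t})
                .join();
    }

    public static void main(String[] args) {
        String[] both = loadBoth(() -> "original text", () -> "translated text");
        System.out.println(both[0] + " / " + both[1]);
    }
}
```

Note that join blocks the calling thread, so in a Swing program this call would have to happen off the event dispatch thread.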

Even if you have separate disk drives or a dual processor, I would start
with the simple single thread implementation, and only consider going to
two threads if this turns out to be performance critical relative to the
whole program.
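
A minimal sketch of that simple single-thread version might look as follows. The names SequentialLoader, paragraphs, buildModel and load are invented for illustration, the model has only the two relevant columns rather than all five, and UTF-8 is an assumed file encoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.swing.table.DefaultTableModel;

// Hypothetical single-threaded loader; reads one file, then the other.
final class SequentialLoader {

    /** Splits text on blank lines and joins each paragraph's lines with '|'. */
    static String[] paragraphs(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        String[] parts = trimmed.split("\\R\\s*\\R");
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].replaceAll("\\R", "|");
        }
        return parts;
    }

    /** Puts paragraph i of each text into row i of a two-column model. */
    static DefaultTableModel buildModel(String originalText, String translationText) {
        String[] originals = paragraphs(originalText);
        String[] translations = paragraphs(translationText);
        if (originals.length != translations.length) {
            throw new IllegalArgumentException("paragraph counts differ: "
                    + originals.length + " vs " + translations.length);
        }
        DefaultTableModel model =
                new DefaultTableModel(new Object[] {"Original", "Translation"}, 0);
        for (int i = 0; i < originals.length; i++) {
            model.addRow(new Object[] {originals[i], translations[i]});
        }
        return model;
    }

    /** Reads both files completely on the calling thread, one after the other. */
    static DefaultTableModel load(Path original, Path translation) throws IOException {
        String a = new String(Files.readAllBytes(original), StandardCharsets.UTF_8);
        String b = new String(Files.readAllBytes(translation), StandardCharsets.UTF_8);
        return buildModel(a, b);
    }

    public static void main(String[] args) {
        DefaultTableModel m = buildModel("a\nb\n\nc\n", "x\n\ny\nz\n");
        System.out.println(m.getRowCount() + " rows, first cell: " + m.getValueAt(0, 0));
    }
}
```

Because each row is added with both values already in hand, no synchronization is needed, and the finished model can be handed to the JTable with setModel.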

Patricia
 
Deniz Dogan

Thank you for your response, Patricia! That was pretty much what I was
thinking as well, and the code actually gets prettier when I read them
sequentially. I'll stick with that approach for now!

/Deniz Dogan
 
