dual core and c

kerravon

Assuming I am running a C program that is doing some cpu-intensive
work such as zip -9, I can understand:

If I have 8 CPUs, then it will make no difference at all to the zip
program, it will only run on one of the CPUs, although this does
allow me to run 8 separate zips simultaneously, which would be cool
on a large site.

But what I don't understand is the concept of a "core", as in "dual
core". What implications does that have for a C program like zip?
Does it have the ability to look at the instructions ahead of time
and pipeline them or something? Pipelining is something that has
been around for a long time. Did someone just get the bright idea to
call it dual core instead or what?

Assume the zip in question is written in C89, no fancy parallelism -
at least not inherent in the language itself.

Thanks. Paul.
 
Walter Roberson

Assuming I am running a C program that is doing some cpu-intensive
work
such as zip -9, I can understand:
If I have 8 CPUs, then it will make no difference at all to the zip
program,
it will only run on one of the CPUs,

Quite possibly true that it would only run on one of the CPUs
(or cores), but you might be interested to know that several
standard compression algorithms turn out to be amenable to
"divide and conquer" strategies using multiple CPUs or multiple
cores. For some of the algorithms, there would be no loss of
compression ability at all (e.g., there are parallel sorting
algorithms for the block sort phase); for others of the algorithms,
(including standard LZW type of algorithms), there would be a minor
loss of compression efficiency but a fair reduction in CPU time.
 
Antoninus Twink

Assuming I am running a C program that is doing some cpu-intensive
work
such as zip -9, I can understand:

If I have 8 CPUs, then it will make no difference at all to the zip
program,
it will only run on one of the CPUs, although this does allow me to
run 8
separate zips simultaneously, which would be cool on a large site.

But what I don't understand is the concept of a "core", as in "dual
core".
What implications does that have for a C program like zip? Does it
have the ability to look at the instructions ahead of time and
pipeline
them or something? Pipelining is something that has been around for
a long time. Did someone just get the bright idea to call it dual
core
instead or what?

And HeathField's attack dogs are unleashed in 3... 2... 1...
 
Eric Sosman

kerravon said:
Assuming I am running a C program that is doing some cpu-intensive
work
such as zip -9, I can understand:

If I have 8 CPUs, then it will make no difference at all to the zip
program,
it will only run on one of the CPUs, although this does allow me to
run 8
separate zips simultaneously, which would be cool on a large site.

But what I don't understand is the concept of a "core", as in "dual
core".

<off-topic>

What's usually meant is that a "core" is most of a CPU,
so a "dual-core" processor has two CPUs on one chip, an
eight-core processor has eight, and so on. It's not quite
as hard-and-fast as that, because the chip usually has some
components that are shared by all the cores but that would
be per-CPU resources in a traditional design: Caches, memory
controllers, I/O ports, and so on. But to some fuzzy level
of approximation, a core is a CPU.

</off-topic>

What implications does that have for a C program like zip? Does it
have the ability to look at the instructions ahead of time and
pipeline
them or something? Pipelining is something that has been around for
a long time. Did someone just get the bright idea to call it dual
core
instead or what?

<off-topic>

This is a question for a computer architecture forum,
not a question about a programming language.

</off-topic>

Assume the zip in question is written in C89, no fancy parallelism -
at
least not inherent in the language itself.

<ON-topic>

The C language is defined in terms of an abstract machine
whose operations are almost entirely sequential. ("Almost"
because there are a few grudging nods toward asynchronous
activities -- signals, some uses of `volatile' -- and because
there's some leeway between "sequence points.") Thus, the C
language itself is almost entirely sequential; there's no
built-in way to express parallelism.

However, implementations of C are not required to model
the abstract machine in every detail. A principle called
the "as-if rule" allows an actual C implementation to play
all manner of optimization games, provided the ultimate outcome
is "as if" the abstract machine had performed the computation.
So if a compiler can detect opportunities for parallelism in a
C program, it's free to exploit those opportunities if it can.
But the C language itself gives you no way to control this --
indeed, by the way "as if" works, it gives you no way to
detect that something has been parallelized.
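
As a small illustration (a minimal sketch, not anything the standard
spells out): the loop below is written purely sequentially, yet under
the as-if rule a compiler may vectorize it or otherwise regroup the
additions, because the only observable result is the final value of
`sum'.

/* Hypothetical C89-style sketch: nothing here requests parallelism,
 * but the as-if rule permits the compiler to reorder or vectorize
 * the additions, since unsigned addition is associative and only
 * the final value of 'sum' is observable. */
unsigned long sum_array(const unsigned long *a, unsigned long n)
{
    unsigned long sum = 0;
    unsigned long i;

    for (i = 0; i < n; i++)
        sum += a[i];    /* may be computed in any grouping internally */
    return sum;
}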

</ON-topic>
 
Kenny McCormack

But what I don't understand is the concept of a "core", as in "dual
core". What implications does that have for a C program like zip?
Does it have the ability to look at the instructions ahead of time
and pipeline them or something? Pipelining is something that has
been around for a long time. Did someone just get the bright idea to
call it dual core instead or what?

And HeathField's attack dogs are unleashed in 3... 2... 1...

Yes. I was quite amazed to see that Roberson managed to post a response
in this thread - that wasn't the usual:

Off topic. Not portable. Can't discuss it here. Blah, blah, blah.

nonsense. He must be slipping. Clique membership possibly in doubt.
 
Nelu

<[email protected]> wrote: ...
Yes. I was quite amazed to see that Roberson managed to post a response
in this thread - that wasn't the usual:

Off topic. Not portable. Can't discuss it here. Blah, blah, blah.

nonsense. He must be slipping. Clique membership possibly in doubt.

I was expecting to see you all grown up, maybe posting something useful
and on topic every once in a while... Oh well, I see that, at least, you
have a friend to talk to.
 
Paul Hsieh

Assuming I am running a C program that is doing some cpu-intensive
work such as zip -9, I can understand:

If I have 8 CPUs, then it will make no difference at all to the zip
program, it will only run on one of the CPUs,

C, according to the standard, is not a language that is up to the task
of doing anything about parallel or multithreaded programming. There
has been some interesting research into compilers that perform
automatic conversion of some loops to multithreaded object code, but
these are silly research things that only apply to the most obvious
cases, which LZ compression is *not* an example of.

With pretty much any compiler environment you will get exactly one
core/CPU/thread usage here no matter what.
[...] although this does allow me to
run 8 separate zips simultaneously, which would be cool on a large
site.

If you want to run several instances of zip on something like a
webserver (one per connection), then yes that's fine.
But what I don't understand is the concept of a "core", as in "dual
core".

It is essentially the same as dual processor. The difference is more
pronounced on AMD CPUs because they use a shared core-centric memory
controller (this means that memory shared between two different cores
is much faster than between two different processors.) But this is
*way* off topic with respect to the C language specification.
What implications does that have for a C program like zip?

None.

[...] Does it have the ability to look at the instructions ahead of
time and pipeline them or something?  

That's not what dual core means. The "out of order execution"
capability of these CPUs is what allows them to do that (and all
modern CPUs do that on a per-CPU basis.)
[...] Pipelining is something that has been around for
a long time.

It has. It has nothing to do with dual processor or dual core
(except in that the concept is to offer more computational
resources.)
[...] Did someone just get the bright idea to call it dual
core instead or what?

Specifically, dual core refers to gluing two CPU cores into a single
microprocessor. It's a chip design thing.
From software's point of view, it should be viewed as if there are
multiple processors in the system.
Assume the zip in question is written in C89, no fancy parallelism -
at least not inherent in the language itself.

With such a program, just running one instance of it straight, on
pretty much any contemporary compiler, it means your 8-core machine
will run at one eighth capacity.

To get more performance, you will need 1) to parallelize the algorithm
(it's not obvious that can even be done for zip, which is LZ based
compression/decompression) and 2) use a programming language/compiler
that supports parallelism. Java supports parallelism out of the box,
but many C compilers also support extensions for parallelism.
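
To give one hedged example of such an extension (a minimal sketch, not
something zip itself does): many compilers accept OpenMP pragmas, which
graft parallelism onto otherwise ordinary C. Build with the compiler's
OpenMP switch (e.g. gcc -fopenmp); without it the pragma is ignored and
the loop simply runs sequentially.

#include <stdio.h>

int main(void)
{
    long data[1000];
    long sum = 0;
    long i;

    for (i = 0; i < 1000; i++)
        data[i] = i;

    /* The reduction clause lets the compiler split the loop across
       cores and combine the per-core partial sums at the end. */
#pragma omp parallel for reduction(+:sum)
    for (i = 0; i < 1000; i++)
        sum += data[i];

    printf("%ld\n", sum);
    return 0;
}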
 
Walter Roberson

Paul Hsieh said:
To get more performance, you will need 1) to parallelize the algorithm
(it's not obvious that can even be done for zip, which is LZ based
compression/decompression)

[OT]

It turns out that parallelizing LZ compression or decompression is
not difficult. The LZ algorithms reset the code dictionary from
time to time, in order to adapt to changes in the data. This is
done by emitting a "flush" token and flushing out all pending
encodings, starting the encoded data again on a byte boundary.
This is the same token that is used at end of file. (A consequence
of this is that you can simply concatenate together several LZ
encoded files and the decompression of the result would be the same
as if the original files had been appended together before compression.)
Different LZ implementations use different strategies for deciding
-when- to flush, but this flushing behaviour is a fairly consistent
property of the family of algorithms.

Thus, in order to parallelize LZ compression, all that has to be
done is to split the file up into pieces, do an LZ compression
independently on the pieces, and concatenate the encoded pieces
together to form the output. There may be a small loss of encoding
efficiency in doing so (corresponding to the extra output from
flushing at each boundary and rebuilding the data dictionary); on
the other hand, it could also happen that the result was smaller than
it would otherwise be, if it happened that the boundaries chosen
lay near places where the statistical properties changed noticeably
and the LZ algorithm would have otherwise delayed resetting the
dictionary until it was more sure that a dictionary reset was
warranted.

For parallel decompression, you have to find the reset tokens
in the encoded data, split the data streams there, and decode
each chunk, concatenating the results.
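
A hedged sketch of that split-and-compress structure, using POSIX
threads (compress_chunk() below is a hypothetical stand-in for whatever
LZ encoder is actually in use; buffer sizing and error handling are
omitted):

#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-chunk LZ encoder, assumed to end its output with a
   flush/reset so that the compressed pieces can simply be concatenated. */
size_t compress_chunk(const unsigned char *in, size_t in_len,
                      unsigned char *out);

struct job {
    const unsigned char *in;   /* this worker's slice of the input */
    size_t in_len;
    unsigned char *out;        /* private output buffer */
    size_t out_len;
};

static void *worker(void *arg)
{
    struct job *j = arg;
    j->out_len = compress_chunk(j->in, j->in_len, j->out);
    return NULL;
}

/* Split the input into NCHUNK pieces, compress each in its own thread,
   then write the pieces out back to back to form the final stream. */
#define NCHUNK 4

void parallel_compress(const unsigned char *in, size_t len,
                       unsigned char *outbuf[NCHUNK], size_t outlen[NCHUNK])
{
    pthread_t tid[NCHUNK];
    struct job jobs[NCHUNK];
    size_t piece = len / NCHUNK;
    int i;

    for (i = 0; i < NCHUNK; i++) {
        jobs[i].in = in + (size_t)i * piece;
        jobs[i].in_len = (i == NCHUNK - 1) ? len - (size_t)i * piece : piece;
        jobs[i].out = outbuf[i];
        pthread_create(&tid[i], NULL, worker, &jobs[i]);
    }
    for (i = 0; i < NCHUNK; i++) {
        pthread_join(tid[i], NULL);
        outlen[i] = jobs[i].out_len;
    }
}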
 
user923005

Assuming I am running a C program that is doing some cpu-intensive
work
such as zip -9, I can understand:

If I have 8 CPUs, then it will make no difference at all to the zip
program,
it will only run on one of the CPUs, although this does allow me to
run 8
separate zips simultaneously, which would be cool on a large site.

But what I don't understand is the concept of a "core", as in "dual
core".
What implications does that have for a C program like zip?  

In order for a C program to take advantage of multiple threads of
execution, it will require a non-portable call to create threads or
processes or to use an interface like MPI.
See, for instance:
http://www.cs.cf.ac.uk/Dave/C/node29.html
http://www.mhpcc.edu/training/workshop/mpi/MAIN.html
Does it
have the ability to look at the instructions ahead of time and
pipeline
them or something?  

Usually, the programmer has to take some kind of action, but some
compilers can spit out parallel tasks to some degree.
Pipelining is something that has been around for
a long time.  Did someone just get the bright idea to call it dual
core
instead or what?

Pipelining does not rely upon more than one CPU core. You can have
multiple pipeline stages executing simultaneously on a machine with a
single CPU that has only one core.
Assume the zip in question is written in C89, no fancy parallelism -
at
least not inherent in the language itself.

Yes. The Cilk language is a modification of C that tries to make
threading more transparent:
http://supertech.csail.mit.edu/cilk/

If you want to actually implement something that works in parallel,
probably pthreads is the closest thing to a ubiquitous interface in
C. It works on Windows and many Unix variants.
http://en.wikipedia.org/wiki/POSIX_Threads
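
A minimal sketch of the pthreads calls involved (create a thread, run a
function in it, wait for it to finish); compile with -pthread on most
Unix-like systems:

#include <pthread.h>
#include <stdio.h>

static void *work(void *arg)
{
    printf("hello from thread %d\n", *(int *)arg);
    return NULL;
}

int main(void)
{
    pthread_t tid;
    int id = 1;

    if (pthread_create(&tid, NULL, work, &id) != 0)
        return 1;                 /* could not start the thread */
    pthread_join(tid, NULL);      /* wait for the worker to finish */
    return 0;
}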

If you are programming in a Unix or POSIX environment, you might try
http://groups.google.com/group/comp.programming.threads/topics?lnk=sg
 
CBFalconer

Walter said:
Paul Hsieh said:
To get more performance, you will need 1) to parallelize the
algorithm (it's not obvious that can even be done for zip, which
is LZ based compression/decompression)

[OT]

It turns out that parallelizing LZ compression or decompression is
not difficult. The LZ algorithms reset the code dictionary from
time to time, in order to adapt to changes in the data. This is
done by emitting a "flush" token and flushing out all pending
encodings, starting the encoded data again on a byte boundary.
This is the same token that is used at end of file. (A consequence
of this is that you can simply concatenate together several LZ
encoded files and the decompression of the result would be the
same as if the original files had been appended together before
compression.) Different LZ implementations use different strategies
for deciding -when- to flush, but this flushing behaviour is a
fairly consistent property of the family of algorithms.

At least one of us is confused. IIRC LZ is the method used in zip,
and LZW is the earlier technique. LZW required periodic reset and
reformation of the compression table. LZ depends only on the
history over the past (limited size) block. Thus LZ continuously
adapts to the input data, and doesn't leave convenient
opportunities for breaks. I don't think LZW does either, because
the next break location will depend on performance during this
block.

I may well have the identifiers fouled, but not the basic
processes.
 
Robbie Hatley

kerravon said:
Assuming I am running a C program that is doing some cpu-
intensive work such as zip -9, I can understand:

If I have 8 CPUs, then it will make no difference at all
to the zip program, it will only run on one of the CPUs,
although this does allow me to run 8 separate zips
simultaneously, which would be cool on a large site.

That's only true if your compiler has no multi-threading
capability, or if it does but you don't use it. To get
multi-threading going, you have to have a compiler that
supports it, and you have to actually use its multi-threaded
capabilities by starting tasks in their own threads.
A single-threaded program is unlikely to run much faster
just because you run it on a multi-core CPU.
But what I don't understand is the concept of a "core",
as in "dual core".

As I understand it, a "core" is almost an entire CPU, with
its own ALU, registers, etc.; as for exactly what is shared
and what is separate, you'd have to RTFM* for your CPU.
What implications does that have for a C program like
zip? Does it have the ability to look at the instructions
ahead of time and pipeline them or something? Pipelining
is something that has been around for a long time.
Did someone just get the bright idea to call it dual core
instead or what?

No, generally a multi-threading compiler will run one thread
per core. Each thread is defined by an entry function, which
usually consists of a for loop which loops until either the
thread is no longer needed or the application exits. Any
functions called by the thread function will also be a part
of that thread and run on that thread's core.
Assume the zip in question is written in C89, no fancy
parallelism - at least not inherent in the language itself.

The C language itself neither supports nor thwarts parallel
processing. That's a function of your compiler. RTFM* for
your compiler.

My compiler at work (National Instruments LabWindows/CVI) *does*
support parallel processing. It comes with a multi-threading
library, which I make heavy use of, because the program I'm
working on needs to be doing multiple things in real-time, and
it is not acceptable for one task to have to stop and wait for
another to complete. So I put the different tasks in different
threads.

A multi-threaded compiler such as LabWindows/CVI will first look
for multiple cores to run threads in, and if it finds them, it
will attempt to distribute one thread per core.

If it runs out of cores (or if there is only one), it then shares
multiple threads per core by time-sharing (jumping execution
rapidly back and forth between two threads on the same core).

For example, here's how one program I'm working on launches
a couple of new threads:

// Start communications thread:
CmtScheduleThreadPoolFunction
(
    DEFAULT_THREAD_POOL_HANDLE,  // thread pool handle
    CommThread,                  // thread function
    (void*)0,                    // thread function data (none)
    &thrdCommunications          // thread id
);

// Start background thread:
CmtScheduleThreadPoolFunction
(
    DEFAULT_THREAD_POOL_HANDLE,  // thread pool handle
    BackgroundThread,            // thread function
    (void*)0,                    // thread function data (none)
    &thrdBackground              // thread id
);

If separate CPU cores are available, those threads will run on
separate cores; otherwise, they'll be time-shared on the same core.

Multi-threaded programming does take some getting used to, though.
You need to watch out for booboos such as thread A getting to a
certain point then waiting for thread B to complete, with
thread B also having a point where it stops and waits for
thread A to complete. Oops! Deadlock.

Also, you need to be aware that multiple threads can access
the same global variables. Say thread A stores "57" in
global variable "int argle" and leaves it there for a while
without changing it. Later, thread A comes back and reads
argle and finds that its value is now "87375629". What
happened??? I'll tell you what happened! Thread B wrote
to it, overwriting what thread A had written to it. Gotta
watch out for that. Only use global variables for things
which absolutely MUST be global, especially when doing
multi-threaded programming.
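
A hedged sketch of the usual safeguard, using POSIX threads rather than
the CVI library's own locking calls: wrap access to the shared global
in a mutex so that only one thread touches it at a time (which thread
is *supposed* to write it is still a design decision, of course).

#include <pthread.h>

static int argle = 0;   /* the shared global from the example above */
static pthread_mutex_t argle_lock = PTHREAD_MUTEX_INITIALIZER;

static void set_argle(int value)
{
    pthread_mutex_lock(&argle_lock);   /* one thread at a time in here */
    argle = value;
    pthread_mutex_unlock(&argle_lock);
}

static int get_argle(void)
{
    int value;

    pthread_mutex_lock(&argle_lock);
    value = argle;
    pthread_mutex_unlock(&argle_lock);
    return value;
}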


Footnotes:
*RTFM: "Read the Fine Manual". (Or substitute another word for
"fine" if that suits your fancy better.)
 
Herbert Rosenau

C, according to the standard, is not a language that is up to the task
of doing anything about parallel or multithreaded programming.

This is not true. The standard neither allows nor forbids a compiler
to generate code that uses multiple CPUs or multiple pipelines in
parallel. So it leaves the job of parallelizing code for multiple CPUs
entirely to the compiler.

There
has been some interesting research into compilers that perform
automatic conversion of some loops to multithreaded object code, but
these are silly research things that only apply to the most obvious
cases, which LZ compression is *not* an example of.

The standard gives an implementation complete freedom as to whether it
supports multithreading and/or any other kind of parallelism across
multiple CPUs. Only the implementation knows whether multiple CPUs are
available, whether those CPUs can work in parallel within a single
thread, and whether it can assign one or more CPUs to one or more
processes or threads in parallel. There is nothing in the standard
that either requires or forbids that.
With pretty much any compiler environment you will get exactly one
core/CPU/thread usage here no matter what.

Not true. Read the standard carefully and you'll see that it does
explicitly allow instructions to be parallelized. The only requirement
on statements is that their effects be complete at each sequence
point. It even allows multiple sequences of statements to be split off
in parallel.
[...] although this does allow me to
run 8 separate zips simultaneously, which would be cool on a large
site.

If you want to run several instances of zip on something like a
webserver (one per connection), then yes that's fine.

No, even a single instance of zip or something like a webserver (even
with a single connection) can use multiple CPUs in parallel. It is left
to the implementation to use these abilities when they exist.
It is essentially the same as dual processor. The difference is more
pronounced on AMD CPUs because they use a shared core-centric memory
controller (this means that memory shared between two different cores
is much faster than between two different processors.) But this is
*way* off topic with respect to the C language specification.


What implications does that have for a C program like zip?

None.

No, many. It is up to the implementation to coordinate the CPUs in
such a manner that at a sequence point a given statement is completely
done, or at least to guarantee that it behaves exactly as if it were,
even when it is not.
[...] Does it have the ability to look at the instructions ahead of
time and pipeline them or something?  

That's not what dual core means. The "out of order execution"
capability of these CPUs is what allows them to do that (and all
modern CPUs do that on a per-CPU basis.)

Not true.
Specifically, dual core refers to gluing two CPU cores into a single
microprocessor. It's a chip design thing.
From software's point of view, it should be viewed as if there are
multiple processors in the system.

Yes, exactly.
With such a program, just running one instance of it straight, on
pretty much any contemporary compiler, it means your 8-core machine
will run at one eighth capacity.

No, it depends only on the implementation. When the implementation can
split a single instruction sequence off to multiple cores in parallel,
it can do so as long as the visible result is no different from a
strictly sequential one.
To get more performance, you will need 1) to parallelize the algorithm
(it's not obvious that can even be done for zip, which is LZ based
compression/decompression) and 2) use a programming language/compiler
that supports parallelism. Java supports parallelism out of the box,
but many C compilers also support extensions for parallelism.


--
Tschau/Bye
Herbert

Visit http://www.ecomstation.de the home of german eComStation
eComStation 1.2R Deutsch ist da!
 
santosh

Herbert said:
This is not true. The standard neither allows nor forbids a compiler
to generate code that uses multiple CPUs or multiple pipelines in
parallel. So it leaves the job of parallelizing code for multiple CPUs
entirely to the compiler.

I think that Paul means that C does not have any built in constructs for
parallel programming. Neither does it have any standardised library
facilities for this. That's why he said "C, according to the
standard, ...".

Not true. Read the standard carefully and you'll see that it does
explicitly allow instructions to be parallelized. The only requirement
on statements is that their effects be complete at each sequence
point. It even allows multiple sequences of statements to be split off
in parallel.

He is talking about parallelism above the instruction level.

[ ... ]
No, even a single instance of zip or something like a webserver (even
with a single connection) can use multiple CPUs in parallel. It is left
to the implementation to use these abilities when they exist.

Yes, but this is rarely done without explicitly using implementation
specific constructs for threading.

<snip>
 
Chris Thomasson

kerravon said:
Assuming I am running a C program that is doing some cpu-intensive
work
such as zip -9, I can understand:

If I have 8 CPUs, then it will make no difference at all to the zip
program,
it will only run on one of the CPUs, although this does allow me to
run 8
separate zips simultaneously, which would be cool on a large site.

But what I don't understand is the concept of a "core", as in "dual
core".
What implications does that have for a C program like zip? Does it
have the ability to look at the instructions ahead of time and
pipeline
them or something? Pipelining is something that has been around for
a long time. Did someone just get the bright idea to call it dual
core
instead or what?

Assume the zip in question is written in C89, no fancy parallelism -
at
least not inherent in the language itself.

You're not going to experience any speedup with respect to concurrency
and scalability unless the thread synchronization scheme that you are
using is well engineered. For instance, you have to pad data structures
up to the L2 cache line size and then align them on L2 cache line
boundaries. You should minimize the use of membars and interlocked RMW
instructions (e.g., try to avoid the LOCK prefix on x86, and the membar
instruction on SPARC). Try to use cache-friendly algorithms in creative
ways (e.g., distributed ref-counting, RCU, etc.)...
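
For instance, a minimal padding sketch (assuming a 64-byte cache line,
which varies by CPU and is nothing the C standard knows about): keeping
each thread's counter on its own cache line avoids "false sharing",
where two cores keep stealing the same line from each other.

#define CACHE_LINE 64   /* assumed line size; check your CPU's manual */

struct padded_counter {
    volatile long count;
    char pad[CACHE_LINE - sizeof(long)];  /* fill out the rest of the line */
};

/* one slot per thread; with the array itself aligned to a line boundary
   (e.g. via a compiler-specific attribute such as GCC's
   __attribute__((aligned(64)))), each counter gets its own cache line */
static struct padded_counter counters[8];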

As for single-threaded applications, well, they should perform just as
well as they did before. Maybe even a little bit worse. Think of a chip
with very many cores running at a low clock speed, such that a
single-threaded program could end up running on a core that is most
likely slower than the one in a single-chip, single-core system...
 
Paul Hsieh

To get more performance, you will need 1) to parallelize the algorithm
(it's not obvious that can even be done for zip, which is LZ based
compression/decompression)

[OT]

It turns out that parallelizing LZ compression or decompression is
not difficult. The LZ algorithms reset the code dictionary from
time to time, in order to adapt to changes in the data. This is
done by emitting a "flush" token and flushing out all pending
encodings, starting the encoded data again on a byte boundary.

Oh. I had no idea. I only know it from the base theoretical ideas
behind it and just assumed that the whole thing was always self-
dependent on the entire state before any emit point. Learn a new
thing every day. Cheers.
 
