An unusual use of case/switch

R

Rosario1903

let's compare and contrast:

char foo[10]={/*whatever*/};
int index = 7;
index *= index; // KT approved, it's an int!
bar(foo[index]); // no regrets

this is wrong because foo has only ten char elements,
and index is 49 in foo[index]
against:

const u32 FIFO1_LWM_INT = 0xC0084001; // address of hw register
const u32 FIFO1_HWM_INT = 0xC0084002; // address of hw register
u32 whocareswhat= FIFO1_LWM_INT*FIFO1_HWM_INT;

what is wrong about the above is the multiplication: operations like
pointer * pointer
integer * pointer
are rare

the only pointer operations that are not rare are + and -:
pointer + or - unsigned
pointer - pointer
pointer % unsigned
(a sketch of these appears at the end of this post)

so it would be
against:
u32 foo = 0xC0084001;
u32 v=7;

v=v*v;
bar(foo[v]);

that means
bar( *(u32*) ((u8*)foo+v*sizeof(u32)) )

that has some chance of being right,
if I haven't written something wrong and have understood the post above...
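
(As a sketch of the pointer operations listed above; note that C defines
no % for pointers at all, so the last line below converts to an integer
type first:)

#include <stddef.h>
#include <stdint.h>

void pointer_ops(void) {
    char buf[16];
    char *p = buf + 4;               /* pointer + integer: defined */
    char *q = p - 2;                 /* pointer - integer: defined */
    ptrdiff_t d = p - q;             /* pointer - pointer, same array: defined */
    uintptr_t m = (uintptr_t)p % 8;  /* "pointer % unsigned" needs a conversion */
    (void)d; (void)m;                /* silence unused-variable warnings */
}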
 
R

Rosario1903

On 27 Sep 2013 00:25:12 +0300, Phil Carmody wrote:
so it would be
against:
u32 foo = 0xC0084001;
u32 v=7;

v=v*v;
bar(foo[v]);

that means
bar( *(u32*) ((u8*)foo+v*sizeof(u32)) )

wrong:
u8 *foo = 0xC0084001;
u32 v=7;

v=v*v;
bar(foo[v]);

that means
bar( * (foo+v*sizeof(u32)) )
 
R

Rosario1903

so it would be
against:
u32 foo = 0xC0084001;
u32 v=7;

v=v*v;
bar(foo[v]);

that means
bar( *(u32*) ((u8*)foo+v*sizeof(u32)) )

wrong:
u8 *foo = 0xC0084001;
u32 v=7;

v=v*v;
bar(foo[v]);

that means
bar( * (foo+v*sizeof(u32)) )

that last expansion is wrong; the u8* case actually means:

wrong:
u8 *foo = 0xC0084001;
u32 v=7;

v=v*v;
bar(foo[v]);

that means
bar( * ( foo + v ) )
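
(A sketch making the scaling concrete; the function and parameter names
are made up for illustration:)

#include <stdint.h>

void scaling_demo(uint32_t *wp, uint8_t *bp, uint32_t v) {
    /* wp[v] steps v * sizeof(uint32_t) bytes past wp... */
    uint32_t w = *(uint32_t *)((uint8_t *)wp + v * sizeof(uint32_t));
    /* ...while bp[v] steps exactly v bytes: the subscript scales
       by the size of the pointed-to type, which for u8 is 1 */
    uint8_t b = *(bp + v);
    (void)w; (void)b;
}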
 
M

Maxim Fomin

On 27.09.2013 12:07, Rosario1903 wrote:
let's compare and contrast:

char foo[10]={/*whatever*/};
int index = 7;
index *= index; // KT approved, it's an int!
bar(foo[index]); // no regrets

this is wrong because foo has only ten char elements,
and index is 49 in foo[index]
against:

const u32 FIFO1_LWM_INT = 0xC0084001; // address of hw register
const u32 FIFO1_HWM_INT = 0xC0084002; // address of hw register
u32 whocareswhat= FIFO1_LWM_INT*FIFO1_HWM_INT;

what is wrong about the above is the multiplication: operations like
pointer * pointer
integer * pointer
are rare

Thanks. Without your help nobody would have spotted the bug.
 
K

Keith Thompson

Phil Carmody said:
we're not.

did you miss the "in ..." clauses near the word "address" above?

I'll try to make my point again without being quite so dismissive.
See above. Addresses are addresses. They are defined by the hardware;
your silly C language pretends to model them closely, and often
succeeds. As bit-strings (not necessarily atomic), they are integers.

I'm a bit surprised that you have an embedded background given your
above "everything revolves around C" attitude.

The fact that something is represented as a bit-string does not imply
that it's an integer. In computer systems, bit-strings are used to
represent everything that can be represented. Yes, addresses are
represented as bit-strings; so are floating-point numbers.

There is a mapping between addresses and integers, exposed in C by the
semantics of pointer-to-integer and integer-to-pointer conversions.
On many systems, that mapping is trivial. On others it may not be.
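
(A minimal sketch of that mapping, assuming the implementation provides
the optional uintptr_t type:)

#include <stdint.h>
#include <stdio.h>

int main(void) {
    int x = 42;
    int *p = &x;
    uintptr_t n = (uintptr_t)p;  /* pointer -> integer */
    int *q = (int *)n;           /* integer -> pointer; compares equal to p */
    printf("%d\n", *q);          /* prints 42 */
    return 0;
}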

If you're talking about a particular low-level system, outside the
context of C, it may well be reasonable to treat addresses as integers.
That doesn't mean that addresses *are* integers, and it certainly
doesn't mean that addresses are ints (where "int" is a specific type
that exists in C).

A side point: it's quite common for addresses and "int" to have
different sizes.
You can add, multiply, divide two integers and get a meaningful
result. You can't do that with two addresses.

let's compare and contrast:

char foo[10]={/*whatever*/};
int index = 7;
index *= index; // KT approved, it's an int!

The result of the multiplication is well defined (it's 49, obviously).
If "index" were a pointer, that operation would be about as meaningful
as the square root of beige.
bar(foo[index]); // no regrets

I'm sure misusing an integer value as an out-of-bounds array index makes
some point, but I don't know what it is.

I can imagine cases where the result of an integer multiplication could
sensibly be used as an array index.
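
(For instance, a hypothetical sketch of the classic row-major index into
a flattened 2-D array:)

enum { ROWS = 3, COLS = 4 };

double element(const double m[ROWS * COLS], int row, int col) {
    return m[row * COLS + col];  /* integer multiplication, sensibly
                                    used as an array index */
}
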
against:

const u32 FIFO1_LWM_INT = 0xC0084001; // address of hw register
const u32 FIFO1_HWM_INT = 0xC0084002; // address of hw register
u32 whocareswhat= FIFO1_LWM_INT*FIFO1_HWM_INT;

which is the more bogus code?

I presume u32 is an unsigned 32-bit integer type. If so, the result of
the multiplication is well defined by C, but if the operands are integer
representations of memory addresses it's probably not meaningful.

In C terms, 0xC0084001 is not an address. It may well be the result of
converting an address to u32, and converting 0xC0084001 to some pointer
type would yield an address. And one can certainly informally talk
about, say, "the address 0xC0084001" -- assuming that makes sense for
the particular system being discussed.
I know you've approved the former, and think that the latter is inherently
evil, but I would ask you to reconsider.

I would ask you to reconsider putting words in my mouth. In particular,
I've said that multiplying two addresses is meaningless, and that
addresses are not integers (in C, of course); I don't know how an
example of multiplying two integers is relevant to that.

To summarize:

In C, addresses (i.e., values of pointer type) are distinct from
integers. They can be converted back and forth; the result of such a
conversion is "intended to be consistent with the addressing structure
of the execution environment", whatever that may be. Very commonly this
makes the conversion trivial, just reinterpreting the bits.

In contexts other than C, it may well make sense to treat addresses and
integers as the same thing, or as very similar things. An address might
logically be an integer index into an "array" that corresponds to the
machine's entire (virtual or physical) memory. Or it might be something
else altogether; it depends on the system.

Since this is a C newsgroup, it is my opinion that it's a good idea to
make the distinction between integers and addresses clear. Among other
things, it might prevent inexperienced C programmers from wondering why

void *ptr = 0xC0084001;

is rejected.
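
The usual repair is an explicit cast, with the caveat that the result of
the conversion is implementation-defined:

void *ptr = (void *)0xC0084001;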

I'm aware that a lot of this is stuff that you know as well as I do.
I'm restating some of it in the hope of figuring out just where we
disagree.

Do you disagree with any of that?
 
S

Seebs

See above. Addresses are addresses. They are defined by the hardware;
your silly C language pretends to model them closely, and often
succeeds. As bit-strings (not necessarily atomic), they are integers.

I am not convinced this is actually the case. Or rather, I am not convinced
it is absolutely, always, without exception, and universally the case.

Let's take, just as an example, a fairly obscure architecture used for
some chips a while back made by some little company based out of Portland
or thereabouts, which used to have "addressing modes", which had names like
"near" and "far" and such. These denoted states in which an address might
be, say, a pair of 16-bit values which were combined together in such a way
that multiple distinct sets of 32 bits might represent the same address.
Which is to say... I do not think it fair to say that they were "integers"
in the usual sense. Because you couldn't just do integer arithmetic on the
32-bit value, or on either 16-bit value, and expect sensible results in all
cases.
char foo[10]={/*whatever*/};
int index = 7;
index *= index; // KT approved, it's an int!
bar(foo[index]); // no regrets

const u32 FIFO1_LWM_INT = 0xC0084001; // address of hw register
const u32 FIFO1_HWM_INT = 0xC0084002; // address of hw register
u32 whocareswhat= FIFO1_LWM_INT*FIFO1_HWM_INT;
which is the more bogus code?

I know you've approved the former, and think that the latter is inherently
evil, but I would ask you to reconsider.

I think that's a bit unfair. Keith's claim wasn't "it is always and without
exception appropriate and sane to multiply an integer by another integer", but
merely "multiplication is a defined operation on integers".

A rebuttal would look something like:

#include <stdlib.h>  /* malloc */
#include <stdint.h>  /* intptr_t (optional in C99) */

void *v1 = malloc(1);
void *v2 = malloc(1);

intptr_t a1 = (intptr_t) v1;
intptr_t a2 = (intptr_t) v2;

intptr_t result = a1 * a2;

Is it remotely coherent to talk about multiplying or dividing addresses?
If not, they're not conceptually integers, even if they happen to map onto
memory in a way that makes "integer" a useful way to think about them in
some contexts.

-s
 
P

Phil Carmody

Seebs said:
I am not convinced this is actually the case. Or rather, I am not convinced
it is absolutely, always, without exception, and universally the case.

I apologise, to Keith too, for my perhaps over-the-top flippancy.
Let's take, just as an example, a fairly obscure architecture used for
some chips a while back made by some little company based out of Portland
or thereabouts, which used to have "addressing modes", which had names like
"near" and "far" and such.

That falls under the "not necessarily atomic" clause. The paragraphs
were integral, the offsets were integral. The idea that went through
my mind as I was writing the above was even less flat than the 8086
paragraph/offset model: it was that of paged memory, where there
wasn't even an address wide enough for the whole address; first you
had to page in the appropriate page (with an integral page number),
and then you could address just that page (with an integral offset).

I don't know anyone who works in chip design who doesn't think of
addresses as anything other than (not-necessarily-contiguous ranges
of) integers. C's job is to model that conveniently. Sometimes, in the
case of memory mapped I/O where accesses might not be performed
identically to how real memory is accessed (e.g. real RAM may have
different byte-endianness than the memory-mapped I/O regions), the C
programmer will wish to absolutely prevent the dereferencing of such
"pointers", and may chose to not use C pointers at all. Others, for
example in linux, may use a pointer, but have it annotated for the
compiler as non-dereferenceable. Others just call it a void* and
hope. There's no right way, there's just a collection of not-
dreadfully-wrong ways, and to me the existence of these
contradictory/incompatible approaches shows that C is not perfect in
modelling real world hardware. I therefore don't like to just say
"C has defined what a pointer is, that's the end of the story"

Phil
 
S

Seebs

That falls under the "not necessarily atomic" clause. The paragraphs
were integral, the offsets were integral. The idea that went through
my mind as I was writing the above was even less flat than the 8086
paragraph/offset model: it was that of paged memory, where there
wasn't even an address wide enough for the whole address; first you
had to page in the appropriate page (with an integral page number),
and then you could address just that page (with an integral offset).

Yeah. But... Even though it may be structurally like an integer, I find
it pretty useless to try to think of it as an integer.

Integers are numeric values. It makes sense to talk about multiplying or
dividing them. It makes sense to talk about the average of a set. None of
these make sense when talking about pointers in C.
I don't know anyone who works in chip design who doesn't think of
addresses as anything other than (not-necessarily-contiguous ranges
of) integers.

I don't know many people who work in chip design to begin with. However,
I'm not sure it's necessarily the case that a way of thinking about things
that's useful for chip design is a good approach when thinking about C.
There's no right way, there's just a collection of not-
dreadfully-wrong ways, and to me the existence of these
contradictory/incompatible approaches shows that C is not perfect in
modelling real world hardware.

I'd agree, and I'd furthermore state that part of the reason for this is
that real world hardware doesn't all have compatible conceptual designs.
I therefore don't like to just say
"C has defined what a pointer is, that's the end of the story"

What I've found is that I can write a whole heck of a lot of code thinking
about it that way and have no problems. And that I see a lot of buggy code
coming from people who are thinking too hard about what they "know" about
the underlying machine, and not enough about the slightly different rules C
imposes in order to let compilers make good optimization choices.

-s
 
K

Keith Thompson

Seebs said:
What I've found is that I can write a whole heck of a lot of code thinking
about it that way and have no problems. And that I see a lot of buggy code
coming from people who are thinking too hard about what they "know" about
the underlying machine, and not enough about the slightly different rules C
imposes in order to let compilers make good optimization choices.

Indeed. A concrete example: some years ago I saw some code that
was intended to compute the distance in bytes between the memory
locations pointed to by two pointers. The computation was something
like:

char *ptr0 = ...;
char *ptr1 = ...;
int difference = (uintptr_t)ptr1 - (uintptr_t)ptr0;

The person who wrote the code presumably was thinking of pointers
in a way that goes beyond what C says about them. And it would
not work correctly on at least one of the platforms on which the
code needed to run.
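
(For the record, when both pointers point into the same object, the
portable spelling is plain pointer subtraction; a sketch reusing the
names above:)

#include <stddef.h>

/* defined behaviour only while ptr0 and ptr1 point into the same array */
ptrdiff_t difference = ptr1 - ptr0;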
 
T

Tim Rentsch

Ian Collins said:
Tim said:
If you have a test case, I'll run the numbers.

Any simple test case will do, as for example (please excuse
typing mistakes):

void
simple_loop( char *p, char *q, int n ){
    while( n-- > 0 ) *p = *q++;  /* p is never advanced: cf. Duff's
                                    original, which wrote to a fixed
                                    memory-mapped register */
}

void
duff_loop( char *p, char *q, int n ){
    int k = n/8;
    switch( n%8 ) do {
        *p = *q++; case 7:
        *p = *q++; case 6:
        *p = *q++; case 5:
        *p = *q++; case 4:
        *p = *q++; case 3:
        *p = *q++; case 2:
        *p = *q++; case 1:
        *p = *q++; case 0: ;
    } while( k-- > 0 );
}

/* and separately compiled */

extern void simple_loop(), duff_loop();
char a[1234567], b[1234567];

int
main( int argc, char *argv[] ){
    void (*f)() = argc < 2 ? simple_loop : duff_loop;
    int loop_count = 10000; /* or suitable other */
    while( loop_count-- > 0 ) f( a, b, (int) sizeof b );
    return 0;
}

and time two invocations (with appropriate number of arguments to
get the two different behaviors).

I added an extra zero to the loop count.

The difference was dramatic:
c99 -fast a.c b.c
time ./a.out
real 0m4.237s

time ./a.out 2
real 0m53.272s

gcc -O3 a.c b.c
time ./a.out
real 1m8.108s

time ./a.out 2

real 0m34.600s

Don't these numbers strike you as suspicious, in particular the
discrepancy between the two functions under the sun compiler?
The difference is well over a factor of 10, and given the counts
involved suggests an operation rate (in the faster measurement)
of more than 25 billion instructions per second. Clearly there
is something unusual going on here - perhaps the result of
targeting a particular benchmark, or the compiler playing a
little bit fast and loose with the semantics, but in any case
surely more than just a better job of unrolling the loop.
Moreover the sun compiler was worse on the Duff's device function
than gcc, which shows the sun compiler to be significantly uneven
in the results it achieves. I don't think these results show
very much, except that there is something funny going on and other
variations need to be measured before any stronger conclusions
can be drawn.
 
P

Phil Carmody

Naive loop:
time ./a.out
real 0m4.237s

Duff's Device:
time ./a.out 2
real 0m53.272s
Don't these numbers strike you as suspicious, in particular the
discrepancy between the two functions under the sun compiler?
The difference is well over a factor of 10, and given the counts
involved suggests an operation rate (in the faster measurement)
of more than 25 billion instructions per second.

A simple 16-chars-at-a-time vectorisation of the naive loop
(which gcc *should* have been able to do too, they specifically
look for loops like these) could easily provide nearly 16x
speed-up on some architectures (with next-to-zero loop overhead).
Add in an uncooperative cache architecture, and the byte writes
could be even more expensive. I wish my Alpha was still running,
as that was notorious for (lack of) byte accesses.

Personally, I'm happy to see compilers remove the need for
uglifying micro-optimisations wherever possible.

I would be interested to see Ian's disassembly, though. (all 4)

Phil
 
J

jgh

Keith said:
With the simple do/while loop and an insufficiently clever compiler, it
has to perform the check (--count > 0) on each iteration. With the
Duff's device version, that check is done only once every 8 iterations;
it spends most of its time running straight-line code:

Ah ha! I remember now. It's the difference between:

loop: movb (r0)+,(r1)+
      sob r2,loop

and:
      mov r2,r3
      bic #-8,r3        ; r3 = r2 AND 7
      asl r3            ; two bytes per movb instruction
      add r3,pc         ; branch according to r2 AND 7 to go0-go7
loop: movb (r0)+,(r1)+
go7:  movb (r0)+,(r1)+
go6:  movb (r0)+,(r1)+
go5:  movb (r0)+,(r1)+
go4:  movb (r0)+,(r1)+
go3:  movb (r0)+,(r1)+
go2:  movb (r0)+,(r1)+
go1:  movb (r0)+,(r1)+
      sub #8,r2
      bne loop

(untested)

JGH
 
T

Tim Rentsch

Phil Carmody said:
Naive loop:
time ./a.out
real 0m4.237s

Duff's Device:
time ./a.out 2
real 0m53.272s

A simple 16-chars-at-a-time vectorisation of the naive loop
(which gcc *should* have been able to do too, they specifically
look for loops like these) could easily provide nearly 16x
speed-up on some architectures (with next-to-zero loop overhead).

I don't think this conclusion can be right, unless you're
talking about something different than what I think you mean.
A non-optimized version of the original loop would look like

load indirect
increment pointer
store indirect
decrement count
test and branch

which times 16 would be 80 "instructions". Now suppose the
optimizer magically eliminates all but the fetching of
source bytes. We still are left with 16 "instructions" for
one pass through the unrolled loop, which seems unlikely to
be 16 times faster than the original. [snip]

I admit the performance predictor of "instructions" that I'm
using here is simplified and crude, but even so I wouldn't
expect it to be off by the factor of three that would be
needed to reconcile the discrepancy. Did I misunderstand
your suggestion?
Personally, I'm happy to see compilers remove the need for
uglifying micro-optimisations wherever possible.

Me too, definitely. Unfortunately how well they do is
pretty uneven, so it still can't be taken for granted.
I'd like to see compilers do a better job of smoothing
out the worst cases, and put less emphasis on doing
fantastically well on contrived benchmark cases.
I would be interested to see Ian's disassembly, though. (all 4)

Ditto. I also would like to see some results of some
similar test cases (for example, *--p = *q++, and other
simple variations), to get a better sense of what's really
going on here.

Incidentally, in my own tests, there was a huge difference
between a loop doing *p++ = *q++ for its body, and another
loop doing *--p = *--q for its body.
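
(For concreteness, the two bodies compared, as a sketch; the backward
copy is the sort of loop a simple vectoriser tends to give up on:)

void fwd(char *p, const char *q, int n) {
    while (n-- > 0) *p++ = *q++;
}

void bwd(char *p, const char *q, int n) {
    p += n; q += n;
    while (n-- > 0) *--p = *--q;
}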
 
P

Phil Carmody

Tim Rentsch said:
Phil Carmody said:
Naive loop:
time ./a.out
real 0m4.237s

Duff's Device:
time ./a.out 2
real 0m53.272s

A simple 16-chars-at-a-time vectorisation of the naive loop
(which gcc *should* have been able to do too, they specifically
look for loops like these) could easily provide nearly 16x
speed-up on some architectures (with next-to-zero loop overhead).

I don't think this conclusion can be right, unless you're
talking about something different than what I think you mean.
A non-optimized version of the original loop would look like

load indirect
increment pointer
store indirect
decrement count
test and branch

which times 16 would be 80 "instructions". Now suppose the
optimizer magically eliminates all but the fetching of
source bytes. We still are left with 16 "instructions" for
one pass through the unrolled loop, which seems unlikely to
be 16 times faster than the original. [snip]

Nope, vectorisation a la SIMD, not mere unrolling.

Preamble
Load 16 bytes
Increment pointer by 16
Store 16 bytes
Decrement count by 16
Test and branch
Remnants

GCC specifically attempts to do this on some architectures/platforms,
but clearly fails on Ian's, as this *p++=*q++ case is presumably one
of the most basic examples.
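
(Roughly, in portable C, with memcpy on 16-byte chunks standing in for
real SIMD loads and stores; chunked_copy and the 16-byte figure are only
illustrative:)

#include <string.h>

void chunked_copy(char *p, const char *q, int n) {
    while (n >= 16) {         /* one test-and-branch per 16 bytes */
        memcpy(p, q, 16);     /* a compiler can emit one vector load
                                 and one vector store for this */
        p += 16; q += 16; n -= 16;
    }
    while (n-- > 0)           /* the remnants, byte by byte */
        *p++ = *q++;
}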

Still looking forward to an objdump... ;-)

Phil
 
T

Tim Rentsch

Phil Carmody said:
Tim Rentsch said:
Phil Carmody said:
Tim Rentsch wrote:
c99 -fast a.c b.c

Naive loop:

time ./a.out

real 0m4.237s

Duff's Device:

time ./a.out 2

real 0m53.272s

Don't these numbers strike you as suspicious, in particular the
discrepancy between the two functions under the sun compiler?
The difference is well over a factor of 10, and given the counts
involved suggests an operation rate (in the faster measurement)
of more than 25 billion instructions per second.

A simple 16-chars-at-a-time vectorisation of the naive loop
(which gcc *should* have been able to do too, they specifically
look for loops like these) could easily provide nearly 16x
speed-up on some architectures (with next-to-zero loop overhead).

I don't think this conclusion can be right, unless you're
talking about something different than what I think you mean.
A non-optimized version of the original loop would look like

load indirect
increment pointer
store indirect
decrement count
test and branch

which times 16 would be 80 "instructions". Now suppose the
optimizer magically eliminates all but the fetching of
source bytes. We still are left with 16 "instructions" for
one pass through the unrolled loop, which seems unlikely to
be 16 times faster than the original. [snip]

Nope, vectorisation a la SIMD, not mere unrolling.

Preamble
Load 16 bytes
Increment pointer by 16
Store 16 bytes
Decrement count by 16
Test and branch
Remnants
[snip aside on gcc]

Oh I see, you assume 16 bytes can be loaded in parallel
(and the number 16 was not just picked at random but
relates to memory architecture). And this may be how
the performance gain was achieved. At the same time,
it illustrates the need for trying other code sequences,
e.g., those which are not (easily) parallelizable in this
way.
Still looking forward to an objdump... ;-)

Re-ditto.
 
