What's more important to know: "Big-O" scientology, or Assembly Language?


joe

(I thought about the capitalization of the subject a little, "little"
being the key (No!!! Not you DBA, go back to sleep.) word).

What is more important? Oh? I presented it "wrong"? Well, you know what I
meant. So, which is more important? (Obviously I'm not soliciting a
definitive "answer".)
 

Phlip

joe said:
(I thought about the capitalization of the subject a little, "little"
being the key (No!!! Not you DBA, go back to sleep.) word).

What is more important? Oh? I presented it "wrong"? Well, you know what I
meant. So, which is more important? (Obviously I'm not soliciting a
definitive "answer".)

Please tell us you are just joking about complexity theory (big-O)
relating to $cientology.

However, you just asked "what tool do I use?" without explaining what
project you target.

If you want to shred (which is most of the reason to use C++), you
need both big-O and assembler. You need to be able to estimate (and
profile) your code's performance over data sets of various sizes. And
you need to understand which C++ statements map onto what general
kinds of Assembler opcodes (push, pop, branch, fetch, etc.).
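
(A minimal sketch of that estimate-and-profile habit, with illustrative
sizes: time one operation at n, 2n, 4n, ... and watch how the cost
grows. Roughly doubling per step suggests O(n); roughly quadrupling
suggests O(n^2).)

#include <chrono>
#include <cstdio>
#include <numeric>
#include <vector>

int main() {
    for (std::size_t n = 1u << 12; n <= 1u << 18; n <<= 1) {
        std::vector<long long> v(n, 1);
        auto t0 = std::chrono::steady_clock::now();
        long long sum = std::accumulate(v.begin(), v.end(), 0LL);  // O(n) work
        auto t1 = std::chrono::steady_clock::now();
        std::printf("n = %7zu  sum = %lld  %10.1f us\n", n, sum,
                    std::chrono::duration<double, std::micro>(t1 - t0).count());
    }
}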

So the answer is yes.
 

Juha Nieminen

Phlip said:
And you need to understand which C++ statements map onto what general
kinds of Assembler opcodes (push, pop, branch, fetch, etc.).

I don't think I would fully agree with that. Exactly what does it matter
what kind of assembler opcodes the compiler is generating from your code?

In fact, I would say the exact opposite: You should not worry which
opcodes the compiler is generating, mainly because the compiler will
often generate wildly different opcodes for the same C++ code depending
on the compiler itself, the target architecture and the amount of
optimizations. The exact opcodes being generated is rather irrelevant
from the point of view of programming in C++.

That's not to say that *everything* that is extremely low-level is
completely irrelevant. For example, there may be situations where memory
locality and cache optimization may affect the efficiency of the program
very significantly (e.g. your program could become an order of magnitude
faster if you do something more cache-optimally). However, this has nothing
to do with exactly what assembler opcodes the compiler is generating.
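
(A sketch of that effect: both loops below do the same O(n^2) work on a
4096x4096 matrix, but the first walks memory sequentially while the
second strides a full row per step; on typical hardware the second is
several times slower, and no assembler knowledge is involved.)

#include <cstdio>
#include <vector>

int main() {
    const std::size_t N = 4096;
    std::vector<int> m(N * N, 1);
    long long sum = 0;

    for (std::size_t row = 0; row < N; ++row)     // sequential: cache-friendly
        for (std::size_t col = 0; col < N; ++col)
            sum += m[row * N + col];

    for (std::size_t col = 0; col < N; ++col)     // strided: cache-hostile
        for (std::size_t row = 0; row < N; ++row)
            sum += m[row * N + col];

    std::printf("%lld\n", sum);
}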
 

Phlip

Juha Nieminen said:
I don't think I would fully agree with that. Exactly what does it matter
what kind of assembler opcodes the compiler is generating from your code?

C++ is a C-family language, which is a kind of "portable assembler".

The point of using C++ is writing high-level code that compiles very
tight and runs very fast. If you don't need small footprint and high
speed, then use a language such as Ruby, which is gentler to the
programmer, and harsher to the CPU.

Juha Nieminen said:
In fact, I would say the exact opposite: You should not worry which
opcodes the compiler is generating, mainly because the compiler will
often generate wildly different opcodes for the same C++ code depending
on the compiler itself, the target architecture and the amount of
optimizations. The exact opcodes being generated is rather irrelevant
from the point of view of programming in C++.

Most of the time you do not worry about the specific opcodes.

C++ is defined in terms of a virtual architecture. That's not poetry,
it's how the Standards work. C++ calls a function by pushing arguments
onto a stack or into registers, then pushing its own address onto the
stack, then jumping into another address. That address is expected to
pop the arguments off the stack and use them, then pop the return
address off and jump back to it.

You don't think of any of that when you code. And someone could port
C++ to an architecture, such as DNA or photonics, which does not have a
stack, or pushes, or pops, etc.

A syntax error at compile time represents an inability to create valid
opcodes, such as code that aligns data types and copies them around.
You disregard all that and just fix the syntax error.

And the language allows many dodges, from unions to reinterpret_cast,
to allow you to turn off that type checking, and enforce the behavior that
you want. For example, indexing off the end of a minor dimension of a
multidimensional array is well-defined, if you are still within the
major dimensions.
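
(A sketch of the dodge being described; note that James Kanze argues
downthread that this is undefined behavior per the standard, however
often it "works" on flat memory layouts.)

#include <cstdio>

int main() {
    int grid[3][4] = {};        // 3 rows of 4 ints, contiguous in memory
    int* flat = &grid[0][0];
    // The disputed indexing: element 5 of row 0 runs off the minor
    // dimension but stays inside the 3x4 block, landing on grid[1][1]
    // in a flat layout.
    flat[5] = 42;
    std::printf("%d\n", grid[1][1]);   // typically prints 42
}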

And when you make a mistake and have to diagnose a crash,
understanding all the ways you can stomp the CPU sure helps! C++ gets
as close as possible to the Von Neumann Architecture, of pointers,
registers, accumulators, opcodes, and stacks, while remaining
portable. This also explains the 1001 "undefined behaviors" it can
generate.

Juha Nieminen said:
That's not to say that *everything* that is extremely low-level is
completely irrelevant. For example, there may be situations where memory
locality and cache optimization may affect the efficiency of the program
very significantly (e.g. your program could become an order of magnitude
faster if you do something more cache-optimally). However, this has nothing
to do with exactly what assembler opcodes the compiler is generating.

The point of using C++ is writing fast, tight code via some awareness
of what the operations cost. Learning assembler (even for another
architecture) helps that, even when you only keep it in the back of
your mind.
 

Juha Nieminen

Phlip said:
C++ is a C-family language, which is a kind of "portable assembler".

The point of using C++ is writing high-level code that compiles very
tight and runs very fast. If you don't need small footprint and high
speed, then use a language such as Ruby, which is gentler to the
programmer, and harsher to the CPU.

I still don't see why you should worry which precise opcodes are being
generated from your C++ code by the compiler. Still feels completely
irrelevant.

Phlip said:
Most of the time you do not worry about the specific opcodes.

Most of the time? Can you give an example of a situation where knowing
the exact opcodes is relevant in any way?

Phlip said:
C++ is defined in terms of a virtual architecture. That's not poetry,
it's how the Standards work. C++ calls a function by pushing arguments
onto a stack or into registers, then pushing its own address onto the
stack, then jumping into another address. That address is expected to
pop the arguments off the stack and use them, then pop the return
address off and jump back to it.

It may be interesting to know that a C/C++ program basically divides
memory into two sections: The so-called "heap" and the so-called "stack",
and that when you call a function, the return address and the function
parameters are put on the stack, from where the function in question can
access them. (This may be interesting information because then you will
know why in some cases your program terminates due to running out of
stack space.)
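
(A deliberately crashing sketch of that failure mode; the 64 KB frame
size is an arbitrary choice to make the crash come quickly.)

#include <cstdio>

// Each call reserves a large buffer in its stack frame, so the
// unbounded recursion exhausts the stack long before the heap is
// involved; the program dies with a stack overflow.
void recurse(int depth) {
    char frame[64 * 1024];
    frame[0] = static_cast<char>(depth);
    std::printf("depth %d\n", depth);
    recurse(depth + 1);
}

int main() {
    recurse(0);
}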

However, it's still rather irrelevant which opcodes exactly the compiler
is using to do this.

Phlip said:
And the language allows many dodges, from unions to reinterpret_cast,
to allow you to turn off that type checking, and enforce the behavior that
you want. For example, indexing off the end of a minor dimension of a
multidimensional array is well-defined, if you are still within the
major dimensions.

I still don't see the relevance of knowing which opcodes are being
created from your C++ code.

Phlip said:
The point of using C++ is writing fast, tight code via some awareness
of what the operations cost. Learning assembler (even for another
architecture) helps that, even when you only keep it in the back of
your mind.

Knowing how the CPU architecture works doesn't require knowing the
exact opcodes which the CPU is executing. In the example above, you can
perfectly well understand how data caching works in modern CPUs without
knowing a single assembler opcode.
 

Öö Tiib

Juha Nieminen said:
I still don't see why you should worry which precise opcodes are being
generated from your C++ code by the compiler. Still feels completely
irrelevant.

Most of the time? Can you give an example of a situation where knowing
the exact opcodes is relevant in any way?

Probably not the exact opcodes. Just facts like that adding an int to
a pointer involves a multiplication. It is clear if you just think about
it. Some realize it (and other "surprising" things) for the first time
when eyeballing the opcodes. It is easier to see than to think. So
opcodes are one way of understanding things down to the very bottom of
the actual truth.
 

Phlip

Juha Nieminen said:
I still don't see why you should worry which precise opcodes are being
generated from your C++ code by the compiler. Still feels completely
irrelevant.

I said "general"... you say "precise". Violent agreement achieved.

(BTW, just to show off the tiny bit of assembler I know, a "precise
opcode" is impossible. Each mnemonic is really the name of a whole
category of similar machine-language instructions that each take
different argument types.)

Juha Nieminen said:
Most of the time? Can you give an example of a situation where knowing
the exact opcodes is relevant in any way?

I declared something volatile, and it has a bug, so I debug through the
opcodes, looking to see if it accidentally got aliased.
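
(A sketch of the kind of code in question, using a signal flag as the
example: without volatile the optimizer may cache `done` in a register
and never re-read it, turning the loop into an infinite one; exactly
the sort of thing one confirms by reading the generated opcodes.)

#include <csignal>
#include <cstdio>

volatile std::sig_atomic_t done = 0;   // volatile forces a real load per test

extern "C" void on_signal(int) { done = 1; }

int main() {
    std::signal(SIGINT, on_signal);
    while (!done) { }                  // spins until Ctrl-C delivers SIGINT
    std::printf("interrupted\n");
}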

Other than that, I said "some awareness", not exact opcodes, so you
got some straw going there.
 

Phlip

Öö Tiib said:
Probably not the exact opcodes. Just facts like that adding an int to
a pointer involves a multiplication. It is clear if you just think about
it. Some realize it (and other "surprising" things) for the first time
when eyeballing the opcodes. It is easier to see than to think. So
opcodes are one way of understanding things down to the very bottom of
the actual truth.

And multiplying numbers that are too high causes (well-defined?)
roll-over. Simply to allow the CPU to use its most primitive & fastest
multiplier as often as possible.
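
(For what it's worth, the answer to the parenthetical: the roll-over is
well defined only for unsigned types, which wrap modulo 2^N; signed
overflow is undefined behavior. A quick sketch:)

#include <cstdio>
#include <limits>

int main() {
    unsigned int u = std::numeric_limits<unsigned int>::max();
    ++u;                       // well defined: wraps around to 0
    std::printf("%u\n", u);    // prints 0

    // int s = std::numeric_limits<int>::max();
    // ++s;                    // undefined behavior: signed overflow
}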
 

James Kanze

Juha Nieminen said:
I don't think I would fully agree with that. Exactly what does
it matter what kind of assembler opcodes the compiler is
generating from your code?

More to the point, how can you know, since different hardware
has different sets of machine instructions. (I've used C on
a machine with no push or pop instructions.) And even on the
same machine, different compilers may map things differently,
and use different instructions.

[...]
That's not to say that *everything* that is extremely
low-level is completely irrelevant. For example, there may be
situations where memory locality and cache optimization may
affect the efficiency of the program very significantly (eg.
your program could become an order of magnitude faster if you
do something more cache-optimally). However, this has nothing
to do with exactly what assembler opcodes the compiler is
generating.

Yes, but even then, an O(n ln n) algorithm with poor locality
will outperform an O(n^2) algorithm with good locality. (And
such locality issues are often very machine dependent.)
 

James Kanze

Juha Nieminen said:
[...]
Most of the time? Can you give an example of a situation where
knowing the exact opcodes is relevant in any way?

When you're implementing a compiler. Or when the error is due
to a bug in the compiler (especially when it only occurs when
optimization is turned on).

[...]
It may be interesting to know that a C/C++ program basically
divides memory into two sections: The so-called "heap" and the
so-called "stack",

Actually, there are more parts than that: most good compilers
will use 5 or 6 distinct segments, if not more. (At a minimum:
one for code, one for the stack walkback information used when
handling an exception, one for variables with static lifetime,
a heap, a stack per thread, a place to store the exception
during stack walkback, etc., etc.)

and that when you call a function, the return address and the
function parameters are put on the stack,

Or in registers, or any other place where the called function
can find it.

Phlip said:
[...] For example, indexing off the end of a minor dimension of a
multidimensional array is well-defined, if you are still within the
major dimensions.

This is, of course, completely false. To begin with, there are
no multidimensional arrays in C++; the closest one can come is
an array of arrays. And out-of-bounds indexing is undefined
behavior, regardless of where it occurs.

The point of using C++ is that it is the best alternative you
have available for some particular project. Generally speaking,
C++ gives you the possibility of writing very well engineered
software, which is very important in large projects. The fact
that it usually produces fairly efficient code isn't always that
important; the fact that it supports large scale software
engineering, and has a wide choice of compilers readily
available is.
 

James Kanze

Öö Tiib said:
[...]
Probably not the exact opcodes. Just facts like that adding an int to
a pointer involves a multiplication.

Except that at the machine code level, it normally doesn't.

Öö Tiib said:
It is clear if you just think about it. Some realize it (and other
"surprising" things) for the first time when eyeballing the opcodes.
It is easier to see than to think. So opcodes are one way of
understanding things down to the very bottom of the actual truth.

I started as a hardware engineer and an assembler programmer.
I'm familiar with several machine instruction sets. About the
only time I ever think of them today is with regards to
threading issues (where current C++ doesn't offer ways of
generating some of the machine instructions you need, like
membar).
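
(An editorial aside: the then-upcoming C++0x, now C++11, filled exactly
this gap; std::atomic_thread_fence emits the machine's barrier
instruction (membar, mfence, dmb, ...) portably. A minimal sketch:)

#include <atomic>

int data = 0;
std::atomic<int> ready(0);

void publish() {
    data = 42;
    std::atomic_thread_fence(std::memory_order_release);  // the "membar"
    ready.store(1, std::memory_order_relaxed);
}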
 

Juha Nieminen

James Kanze said:
there are no multidimensional arrays in C++, the closest
one can come is an array of arrays.

If "int array[10][20]" is not a multidimensional array, then what is,
in your opinion, a multidimensional array?
 

Juha Nieminen

James Kanze said:
Yes, but even then, an O(n ln n) algorithm with poor locality
will outperform an O(n^2) algorithm with good locality. (And
such locality issues are often very machine dependent.)

It depends on the amount of data being handled. With large amounts of
data the former will, of course, always beat the latter. However, with
small amounts of data it may become less evident which is better.

(This may be relevant even if the program must be prepared to handle
any amount of data. It can make a choice of algorithm depending on the
amount of data given to it, and sometimes depending on the type of data.)
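
(A sketch of that kind of size-based choice, in the spirit of what real
introsort implementations do; the cutoff of 32 is an illustrative guess,
not a measured constant.)

#include <algorithm>
#include <utility>
#include <vector>

// Below some cutoff the O(n^2) insertion sort wins on locality and low
// overhead, so the algorithm is chosen from the size of the data.
void insertion_sort(std::vector<int>& v) {
    for (std::size_t i = 1; i < v.size(); ++i)
        for (std::size_t j = i; j > 0 && v[j] < v[j - 1]; --j)
            std::swap(v[j], v[j - 1]);
}

void sort_adaptively(std::vector<int>& v) {
    if (v.size() < 32)
        insertion_sort(v);              // small: quadratic but cache-friendly
    else
        std::sort(v.begin(), v.end());  // large: O(n log n)
}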
 

Öö Tiib

James Kanze said:
  Probably not the exact opcodes. Just facts like that adding an int to
  a pointer involves a multiplication.

Except that at the machine code level, it normally doesn't.

Depends on what 'normally' is. It does not when that int is a
compile-time constant; then the multiplication also happens at compile
time. However:

struct Blob { char data_[77]; };
Blob* test( Blob* pBlob, int i ) { return pBlob + i; }

That test() compiles to (putting the entry pushes and exit pops aside):

mov  eax, dword ptr [i]      ; load the index i
imul eax, eax, 4Dh           ; scale it by sizeof(Blob) == 77 (4Dh)
add  eax, dword ptr [pBlob]  ; add the base pointer

MSVC 2008 for Win32, so it is also ... quite 'normal'?

James Kanze said:
I started as a hardware engineer and an assembler programmer.
I'm familiar with several machine instruction sets.  About the
only time I ever think of them today is with regards to
threading issues (where current C++ doesn't offer ways of
generating some of the machine instructions you need, like
membar).

Then you had an unusual path. For most others ... seeing what the
compiler does might be quite educational, even if thinking about it
when not needed may lead to bad habits.
 

James Kanze

Öö Tiib said:
  James Kanze said:
  Except that at the machine code level, it normally doesn't.

Depends on what 'normally' is. It does not when that int is a
compile-time constant; then the multiplication also happens at compile
time.

struct Blob { char data_[77]; };
Blob* test( Blob* pBlob, int i ) { return pBlob + i; }

That test() compiles to (putting the entry pushes and exit pops aside):

mov  eax, dword ptr [i]
imul eax, eax, 4Dh
add  eax, dword ptr [pBlob]

MSVC 2008 for Win32, so it is also ... quite 'normal'?

If the int is a constant, the compiler will often convert it
into appropriate shifts, etc. For things like an array of
double, the Intel architecture has special addressing modes which
make the multiplication superfluous. (It would be interesting to
see what happens if you change the 77 to e.g. 128.)

Of course, if you turn up optimization, and actually use the
function, who knows what will happen. Don't expect to
systematically find a multiplication, however.

Öö Tiib said:
Then you had an unusual path. For most others ... seeing what the
compiler does might be quite educational, even if thinking about it
when not needed may lead to bad habits.

As part of your general culture, I think a software engineer
should have some knowledge of how a computer works, and
probably a bit of assembler. (But for what machine?) But the
issue concerned knowing what sort of machine instructions are
emitted for certain operations.
 

James Kanze

If "int array[10][20]" is not a multidimensional array, then
what is, in your opinion, a multidimensional array?

"int array[10][20]" is an array of 20 arrays of 10 int. It
behaves somewhat like a multidimensional array, but not
completely; something like:
int (*p)[10] = array;
wouldn't be legal if it were a multidimensional array.
 

Alf P. Steinbach /Usenet

* James Kanze, on 24.08.2010 17:10:
If "int array[10][20]" is not a multidimensional array, then
what is, in your opinion, a multidimensional array?

"int array[10][20]" is an array of 20 arrays of 10 int. It
behaves somewhat like a multidimensional array, but not
completely; something like:
int (*p)[10] = array;
wouldn't be legal if it were a multidimensional array.

No problem!

x.cpp:8: error: cannot convert 'int (*)[20]' to 'int (*)[10]' in initialization

He he. :)


Cheers,

- Alf

PS: Like you I have to figure out each time what that declaration really means.
But my memorization rule works: last index varies fastest. And the reason why I
can't remember it without that is that it *is* a multidimensional raw array
(not a jagged array) and such beasts are very rare in actual code. A C++
multidimensional raw array *is* an array of arrays. E.g., K&R included a whole
section called "Pointers vs. Multi-dimensional Arrays".
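
(A tiny sketch of both points: the array name decays to a pointer to
its first element, which is a whole row, and the last index varies
fastest in memory.)

#include <cstdio>

int main() {
    int a[2][3] = { {1, 2, 3}, {4, 5, 6} };

    int (*p)[3] = a;                  // legal: a decays to pointer-to-row
    std::printf("%d\n", p[1][2]);     // prints 6

    // Last index varies fastest: the rows sit back to back in memory.
    const int* flat = &a[0][0];
    for (int i = 0; i < 6; ++i)
        std::printf("%d ", flat[i]);  // prints 1 2 3 4 5 6
    std::printf("\n");
}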
 

Juha Nieminen

James Kanze said:
If "int array[10][20]" is not a multidimensional array, then
what is, in your opinion, a multidimensional array?

"int array[10][20]" is an array of 20 arrays of 10 int. It
behaves somewhat like a multidimensional array, but not
completely; something like:
int (*p)[10] = array;
wouldn't be legal if it were a multidimensional array.

Well, I didn't ask why "int array[10][20]" isn't a multidimensional
array (and, frankly, your argument wasn't so convincing, IMO). I asked
what *is* a multidimensional array in your opinion.
 

James Kanze

James Kanze said:
there are no multidimensional arrays in C++, the closest
one can come is an array of arrays.

Juha Nieminen said:
If "int array[10][20]" is not a multidimensional array, then
what is, in your opinion, a multidimensional array?

James Kanze said:
"int array[10][20]" is an array of 20 arrays of 10 int. It
behaves somewhat like a multidimensional array, but not
completely; something like:
int (*p)[10] = array;
wouldn't be legal if it were a multidimensional array.

Juha Nieminen said:
Well, I didn't ask why "int array[10][20]" isn't
a multidimensional array (and, frankly, your argument wasn't
so convincing, IMO).

It wasn't "my argument". It's what the standard and the
language specification say.

Juha Nieminen said:
I asked what *is* a multidimensional array in your opinion.

An array which requires more than one index to access?

In the end, it is largely a question of terminology. Barton and
Nackman refer to the [] operator as a projection operation, and
not indexation. Which is not the terminology of the standard,
but does allow considering things like array[10][20] a two
dimensional array. But this leads to other unexpected results:
C++ doesn't support indexing of C style arrays, only taking
projection on them. (Of course, the projection of a one
dimensional array will be a scalar.) The C++ standard uses
a particular terminology, which happens to work fairly well, so
I tend to stick to it.

Of course, at a higher level, you can effectively use arrays of
arrays as if they were two dimensional arrays, and in a lot of
contexts, the distinction isn't important.
 
