Writing Scalable Software in C++

Stephen Sprunk · Aug 31, 2007

Skybuck Flying said:
Lots of code will have to be 64 bit.

My guess is the performance impact will be noticeable !

Do some actual _measurements_ and find out, rather than guessing. Emulating
64-bit operations even when not required is almost always cheaper in both
programmer and CPU time than trying to detect and handle cases in which not
to use emulation.
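A minimal sketch of such a measurement, assuming a simple multiply-accumulate loop stands in for the real workload. The names mix, time_ms and report are hypothetical, and the timings depend entirely on the compiler, its flags, and the target machine, so only numbers taken on your own setup mean anything.

```cpp
#include <chrono>
#include <cstdint>
#include <cstdio>

// Hypothetical workload: a multiply-accumulate chain, written once for any
// unsigned integer type T.
template <typename T>
T mix(T seed, std::uint32_t rounds) {
    T acc = seed;
    for (std::uint32_t i = 0; i < rounds; ++i) {
        acc = static_cast<T>(acc * 31u + i);
    }
    return acc;
}

// Time one instantiation. The result goes through a volatile sink so the
// compiler cannot discard the loop.
template <typename T>
double time_ms(std::uint32_t rounds, volatile std::uint64_t& sink) {
    const auto start = std::chrono::steady_clock::now();
    sink = mix<T>(static_cast<T>(sink), rounds);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Print both timings; call this from main() with a large round count.
void report(std::uint32_t rounds) {
    volatile std::uint64_t sink = 1;
    std::printf("32-bit: %.2f ms\n", time_ms<std::uint32_t>(rounds, sink));
    std::printf("64-bit: %.2f ms\n", time_ms<std::uint64_t>(rounds, sink));
}
```

Building the same file once for a 32-bit target and once for a 64-bit target shows what emulation costs for this particular loop, which is the comparison being argued about.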

"Rules of Optimization:
Rule 1: Don't do it.
Rule 2 (for experts only): Don't do it yet."
- M.A. Jackson

"More computing sins are committed in the name of efficiency (without
necessarily achieving it) than for any other single reason - including blind
stupidity."
- W.A. Wulf

"We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil."
- Donald Knuth

S

Stephen Sprunk · Aug 31, 2007

Skybuck Flying said:
The world is not completely 64 bit. The world is not static; it fluctuates.

Sometimes the program only needs 32 bits, sometimes 64 bits.

Always choosing 64 bits would hurt performance LOL.

Not if you have a 64-bit machine; even if you're using a 32-bit machine,
emulating 64-bit operations will hurt performance less than trying to
detect the appropriate choice and then act on that information.

S

Stephen Sprunk · Aug 31, 2007

Skybuck Flying said:
Absolutely nonsense.

If I want I can write a computer program that runs 32 bit when possible
and 64 bit emulated when needed.

Yes, it's entirely possible to do that.

My computer program will outperform your "always 64 emulated" program WITH
EASE.

No, it won't. Post an actual test program using your method, and I'll
produce a program that does the same thing with my method, and we can
compare runtimes.

The only problem is that I have to write each code twice.

A 32 bit version and a 64 bit version.

No, you can write the code once and compile it twice.
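One way to read "write once, compile twice" is a build-time switch on the integer type. This is only a sketch: the macro WIDE_INT, the alias counter_t and the function triangular are made-up names for illustration.

```cpp
#include <cstdint>

// Hypothetical build switch: compile once with -DWIDE_INT=1 and once without.
#if defined(WIDE_INT) && WIDE_INT
using counter_t = std::uint64_t;
#else
using counter_t = std::uint32_t;
#endif

// Written once; each build gets the arithmetic for its own counter_t.
counter_t triangular(counter_t n) {
    counter_t total = 0;
    for (counter_t i = 0; i < n; ++i) {
        total += i + 1;
    }
    return total;
}
```

The source stays single; only the compiler invocation differs between the two binaries.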

I simply instantiate the necessary object and run it.

First of all, you must pay the cost of determining which type to use. Even
ignoring that, tracking down which code path to execute for that object at
runtime will be slower than simply using 64-bit operations (which may or may
not need to be emulated) all the time.
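To make the cost concrete, here is a rough sketch of what a per-operation choice looks like next to the "always 64-bit" version. AdaptiveInt, add_adaptive and add_plain are hypothetical names, not anything proposed in the thread.

```cpp
#include <cstdint>

// Hypothetical "adaptive" integer: every value carries a tag saying whether
// it currently needs more than 32 bits.
struct AdaptiveInt {
    std::uint64_t bits;
    bool wide;
};

// Each addition must inspect both tags, pick a path, and recompute the tag.
AdaptiveInt add_adaptive(AdaptiveInt a, AdaptiveInt b) {
    if (!a.wide && !b.wide) {
        const std::uint32_t lhs = static_cast<std::uint32_t>(a.bits);
        const std::uint32_t lo = lhs + static_cast<std::uint32_t>(b.bits);
        const bool carried = lo < lhs;  // unsigned wrap-around means overflow
        if (!carried) {
            return {lo, false};
        }
    }
    const std::uint64_t sum = a.bits + b.bits;
    return {sum, sum > UINT32_MAX};
}

// The "always 64-bit" alternative: one add (or an add plus add-with-carry
// pair where the target has to emulate it), no tests and no tags.
std::uint64_t add_plain(std::uint64_t a, std::uint64_t b) {
    return a + b;
}
```

The adaptive version executes at least two tests and a tag update around every add, which is the overhead described above.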

Absolutely no big deal.

The only undesirable property of this solution is two code bases.

Wrong. You only need one code base, but the poor performance of such a
solution will be a "big deal".

Your lack of programming language knowledge and experience is definitely
showing.

Are you talking to yourself? _Every single person_ commenting on this
thread is telling you you're wrong.

S

dave · Aug 31, 2007

In comp.arch Stephen Sprunk said:
Yes, it's entirely possible to do that.

No, it won't. Post an actual test program using your method, and I'll
produce a program that does the same thing with my method, and we can
compare runtimes.

No, you can write the code once and compile it twice.

First of all, you must pay the cost of determining which type to use. Even
ignoring that, tracking down which code path to execute for that object at
runtime will be slower than simply using 64-bit operations (which may or may
not need to be emulated) all the time.

Wrong. You only need one code base, but the poor performance of such a
solution will be a "big deal".

Are you talking to yourself? _Every single person_ commenting on this
thread is telling you you're wrong.

And at least one person (me) put him in a kill file after reading the first
3 of his posts.

S

--

Skybuck Flying · Aug 31, 2007

Stephen Sprunk said:
Not if you have a 64-bit machine; even if you're using a 32-bit machine,
emulating 64-bit operations will hurt performance less than trying to
detect the appropriate choice and then act on that information.

For addition and subtraction probably.

For multiplication and division some performance could be lost for 32 bits,
but it would still be faster than simulating it.

Whatever the case may be.

The point is that the detection is the overhead; if the CPU can do the
detection, that overhead might disappear !

Bye,
Skybuck.

Skybuck Flying · Aug 31, 2007

Actually what I wrote only applies to doing the check each time.

If the check only has to be done once for large parts of code, there will
definitely be performance gains achievable !

(I already wrote that elsewhere, but OK.)

Bye,
Skybuck.

Skybuck Flying · Aug 31, 2007

I don't agree with that.

Write large parts of code, do a check once.

Voila, only problem you will have two codes.

Bye,
Skybuck.

Skybuck Flying · Aug 31, 2007

Now I see what those persons were bitching about.

It's called the instruction prefix which is part of the instruction
encoding.

Which if I interpret the manual correctly means this instruction prefix is
added before each instruction.

That means it's hard coded into the instruction and this means only one mode
can be selected.

So it's not an efficient way to switch modes during runtime, since the
instruction prefixes would need to be changed, which requires changes in
many memory locations.

Finally it might be interesting for compilers that want to generate
multiple code paths, they might just need a few bit switches while
generating the instructions... but why bother... why not use some other
method to specify the operand size which might be more reliable.

From the manual:
"
The operand-size override prefix allows a program to switch between 16- and
32-bit operand sizes. Either size can be the default; use of the prefix
selects the non-default size. Use of 66H followed by 0FH is treated as a
mandatory prefix by some SSE/SSE2/SSE3 instructions. Other use of the 66H
prefix with MMX/SSE/SSE2/SSE3 instructions is reserved; such use may cause
unpredictable behavior.

The address-size override prefix (67H) allows programs to switch between 16-
and 32-bit addressing. Either size can be the default; the prefix selects the
non-default size. Using this prefix and/or other undefined opcodes when
operands for the instruction do not reside in memory is reserved; such use
may cause unpredictable behavior.
"

So far this was the 16/32 bit mode some people were bitching about

Now I go look for a 64 bit mode switch.

Bye,
Skybuck.

Skybuck Flying · Aug 31, 2007

However somebody else is still bitching about something else I think:

"
Intel introduced a bit
in the segment descriptor of the executable code, equivalent to your
BitMode variable, to specify whether the default code size is 16 bits
or 32 bits.
"

Where can I find more information about this ? (Me go search in manuals some
more !)

Bye,
Skybuck.

Frank Birbacher · Aug 31, 2007

Hi!

Skybuck said:
I wouldn't call that "Scalable Software"

It doesn't even scale properly at runtime.

Only one can be chosen at compile time.

So if you use the virtual function approach you can decide at runtime
which of the following you want to support:

- 16bit integer
- 32bit integer
- 32bit integer, emulated
- 64bit integer
- 64bit integer, emulated

combined with any of:

- i286 optimized
- i386 optimized
- i486 optimized
- i586 optimized
- i686 optimized
- 64bit integer instruction set

combined with any of:

- various flavours with different cache sizes

When you got all code paths in one single "über"-library I'll blame you:
it's too big for the 64kB RAM of my i286. On the other hand I'll blame
you if it won't run on:

- 128bit integer instruction set (yet to be invented)

And then we do the same for floating point numbers which come in 32bit,
64bit, 80bit, 128bit already on current CPUs. Which can be mixed with or
without the use of ix87, MMX, MMXext, SSE 1 to 3, special addon cards
for math calculation (e.g. physics board for hardcore gamers).

And then I want to use your library on SPARC, PPC, ARM, Alpha, ...

My statements:

- yes, you can do a single check at the start of your program to choose
whether to use 32bit native, 64bit native, or 64bit emulated
- no, you do not have to code three times. you can use the compiler to
generate the code for you by the use of templates
- yes, the code would work without virtual functions
- no, I won't use your library because I only want to pay for what I
need. And I don't need this sort of code bloat.
- no, this approach can not scale to infinity without recompilation of
your library (e.g. using a compiler which can generate 128bit instructions)
- yes, you could use virtual functions and have "plugins" for the
various architectures. the main library would detect the right "plugin",
load it, and use it
- yes, this would be essentially the same as providing distinct
libraries in the first place
- yes, in theory it is nice to think about the polymorphic behaviour
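The first three statements can be sketched roughly as follows: a template written once, one check at startup, and a plain function pointer instead of virtual functions. The names checksum, ChecksumFn and select_checksum are invented for this example, and the need_64_bits flag stands in for whatever test the program would really perform.

```cpp
#include <cstddef>
#include <cstdint>

// Written once; the compiler generates one copy per instantiated type.
template <typename T>
std::uint64_t checksum(const unsigned char* data, std::size_t len) {
    T acc = 0;
    for (std::size_t i = 0; i < len; ++i) {
        acc = static_cast<T>(acc * 33u + data[i]);
    }
    return acc;
}

using ChecksumFn = std::uint64_t (*)(const unsigned char*, std::size_t);

// Hypothetical one-time check: pick a code path once, then call through the
// returned pointer with no further tests.
ChecksumFn select_checksum(bool need_64_bits) {
    if (need_64_bits) {
        return &checksum<std::uint64_t>;
    }
    return &checksum<std::uint32_t>;
}
```

Both instantiations end up in the binary, which is exactly the code bloat the fourth statement objects to.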

Frank

Frank Birbacher · Aug 31, 2007

Hi!

Skybuck said:
Lol such big statements LOL.

I visited that newsgroup two times.

And I never plan to revisit it again unless I have a really really really
really really strange question.

..oO(OMG, we're stuck with him/her here.)

If you managed to get yourself a bad reputation by visiting the
comp.lang.c group just two times you should be thinking about your
behaviour. You seem to be too enthusiastic.

Most people here stick to a wisdom:
1st collect your thoughts
2nd order your thoughts
3rd communicate
These people don't respond to their own posts ten times.

Frank

Skybuck Flying · Aug 31, 2007

No,

That newsgroup full with retards, that's all LOL.

Bye,
Skybuck.

Miguel Guedes · Aug 31, 2007

Skybuck said:
No,

That newsgroup full with retards, that's all LOL.

Well, this particular one (clc++) isn't.

Bye,
Skybuck.

I wish you stuck to that and flew off in the wind.

MooseFET · Aug 31, 2007

For addition and subtraction probably.

For multiplication and division some performance could be lost for 32 bits,
but it would still be faster than simulating it.

Multiply doesn't take very long to do. The instructions for it can be
easily inlined. For a divide, finding 2^N/X and then multiplying is
sometimes quicker. There are tricks for doing 2^N/X quickly.

A 256 bit divided by a 64 bit yielding a 64 bit can be done this way in
about 1/3rd the time of the actual divide on an 8 bit machine.
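A small 32-bit illustration of the "find 2^N/X, then multiply" idea, with N = 32. This is only a sketch of the general trick, not the 256-bit routine mentioned above; Reciprocal, make_reciprocal and divide are hypothetical names, and the divisor is assumed to be non-zero.

```cpp
#include <cstdint>

// Precomputed reciprocal r = floor(2^32 / d). Assumes d != 0.
struct Reciprocal {
    std::uint32_t d;
    std::uint64_t r;  // needs 33 bits when d == 1
};

Reciprocal make_reciprocal(std::uint32_t d) {
    return {d, (std::uint64_t{1} << 32) / d};
}

std::uint32_t divide(std::uint32_t n, const Reciprocal& rec) {
    // n < 2^32 and r <= 2^32, so the product fits in 64 bits.
    std::uint32_t q = static_cast<std::uint32_t>((n * rec.r) >> 32);
    // Because r is rounded down, the estimate is never too large; here it
    // is low by at most one, and the loop corrects that.
    while (n - q * rec.d >= rec.d) {
        ++q;
    }
    return q;
}
```

The reciprocal costs one real division, so this pays off only when the same divisor is reused many times.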

Whatever the case may be.

The point is that the detection is the overhead; if the CPU can do the
detection, that overhead might disappear !

It wouldn't go away. You are adding parts and logic and choices to be
made to every instruction in the CPU. This uses up transistors and
time. Doing stuff takes time.

MooseFET · Aug 31, 2007

Do some actual _measurements_ and find out, rather than guessing. Emulating
64-bit operations even when not required is almost always cheaper in both
programmer and CPU time than trying to detect and handle cases in which not
to use emulation.

"Rules of Optimization:
Rule 1: Don't do it.
Rule 2 (for experts only): Don't do it yet."
- M.A. Jackson

"More computing sins are committed in the name of efficiency (without
necessarily achieving it) than for any other single reason - including blind
stupidity."
- W.A. Wulf

"We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil."
- Donald Knuth

I forgot who said these:

No amount of optimizing the implementation of the slow algorithm will
turn it into the fast one.

Optimize after there are zero bugs. There never are zero bugs.

MooseFET · Aug 31, 2007

On Aug 30 said:
No, you can write the code once and compile it twice.

Make that "compile it more than once". The truth may be that it only
gets compiled 1.9 times. If both flavors of the code are seen by the
compiler, at the same time, the compiler may not make twice as much
code as for the single version. It will shift the order of some
instructions to move common ones out of the sections that differ.

[...]

Are you talking to yourself? _Every single person_ commenting on this
thread is telling you you're wrong.

I think he knows he is lost. The insults are to cover the
embarrassment he feels.

hagman · Aug 31, 2007

Yes you missed the other threads, I shall explain again lol:

I want:

1. One code base which adapts at runtime:

2. Uses 32 bit instructions when possible.

3. Switches to 64 bit instructions when necessary (true or emulated).

4. No extra overhead.

As far as I can tell the cpu's for pc's are inflexible:

32 bit data types require 32 bit instructions.

64 bit data types require 64 bit instructions or alternatively:

64 bit data types require multiple 32 bit instructions.

This means it's necessary to code 3 code paths !

I do not want to write code 3 times !

I want to express my formulas and algorithms just one time !

I want the program/code base to adapt to the optimal instruction sequences
without actually having to code those three times !

I suggested a "feature extension" to processors: "Flexible Instruction Set".

The idea is to use a BitMode variable to specify to the cpu how it is
supposed to interpret the coded instructions sequences.

So that I can write simple one instruction sequence and only need to change
a single variable.

Many people started bitching that the current cpu's can already do this for
16/32/64.

I have seen no proof whatsoever.

Can you provide proof ?

Bye,
Skybuck.

OK, I guess what you want is in a way similar to the D flag in Intel
CPUs. Depending on whether it is set or not, the string instructions
behave differently:
LODSB = move byte from (esi) to al, then increment/decrement esi
depending on D
STOSB = move byte from al to (edi), then increment/decrement edi
depending on D
Because of this, the same code
1: LODSB
   STOSB
   LOOP 1b
will copy a string from the location pointed to by esi to the location
pointed to by edi either forward (pointers point to the beginning of the
string) or backward (pointers point to the end).
Similarly, the FPU has a rounding mode setting that selects, for all
subsequent FPU instructions, between
- round to nearest
- round to 0
- round down
- round up

Both these flags tend to cause more problems than advantages in many
situations. Some code in some library may either
a) hope that the flags are set in a way that makes its code work
correctly
or
b) save the flag somewhere on entry, set it to the desired value, and
restore the flag on exit.
In case a) you had better keep your hands off the flags (this is the usual
best practice for the direction flag).
In case b) you lose performance (this is why most compilers don't use
the otherwise efficient instruction to convert floats to ints: to work
properly they would have to add a lot of overhead just to avoid
interfering with the user's current rounding configuration).
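Case b) can be sketched in portable C++ with the <cfenv> rounding-mode functions rather than the x87 control word itself. The function name truncate_via_mode is made up for this example, and the volatile temporaries are an assumption about how to keep an optimizer from moving the rounding across the mode switches.

```cpp
#include <cfenv>
#include <cmath>

// Case b): save the caller's rounding mode, switch to the one this function
// needs, and restore it on the way out.
double truncate_via_mode(double x) {
    const int saved = std::fegetround();
    std::fesetround(FE_TOWARDZERO);
    // volatile only discourages the compiler from reordering or folding the
    // rounding operation; it is not part of the technique itself.
    volatile double in = x;
    volatile double out = std::nearbyint(in);
    std::fesetround(saved);
    return out;
}
```

The two fesetround calls around a single rounding operation are the save/restore overhead in question.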

I'm afraid that your suggestion might suffer from similar problems.
Of course you may say that you want to add new instructions,
e.g. ADDG (add generic) alongside ADDB (add byte), ADDW (add word), ADDL (add
long), where ADDG is in effect equivalent to one of the others depending on
some control flags.
Firstly, such code would have to produce different microcode depending
on the flag setting, and this might turn out somewhat problematic. Well, not
too much: you just have to dump the complete instruction cache when
the flags are changed.
The next problem is storage. When running through an array of generic
(but consistently so) integers, your step size must vary depending on the
flag setting.
If the code is really generic, you cannot know the size of your data
in the higher-level language (e.g. C++-like).
Note that I'm not talking about classes: a generic integer would be a
primitive type, since it is implemented in the CPU!
But still, sizeof(generic int) could not be a compile-time constant.

IMHO, the resulting code and programming technique would become
too awful for me to like it.
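To illustrate the storage problem: if the element width is only known at run time, the step size has to be carried around as data instead of coming from sizeof. This is a rough sketch with invented names (sum_generic, width_bytes), assuming little-endian element layout and widths of at most 8 bytes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical "generic integer" array: the element width is a run-time
// value, so the step size cannot be a compile-time constant.
std::uint64_t sum_generic(const unsigned char* bytes, std::size_t count,
                          std::size_t width_bytes) {
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        std::uint64_t value = 0;
        // Assemble one element byte by byte, least significant byte first.
        // Assumes width_bytes <= 8.
        for (std::size_t k = 0; k < width_bytes; ++k) {
            value |= static_cast<std::uint64_t>(bytes[i * width_bytes + k])
                     << (8 * k);
        }
        total += value;
    }
    return total;
}
```

Every index calculation now involves a multiplication by a variable, and the same buffer means different things depending on the width passed in.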

Skybuck Flying · Aug 31, 2007

I have investigated the segment selectors and descriptors.

Currently there seems to be no way to specify a default operand size of 64
bits ?!?!?

However some bits are reserved for future use and I think these might be
used to implement a default mode for operand size 64 bits.

CS.L = 1 and CS.D = 1 are reserved for future use.

I think this combination of bits could be used to implement default operand
size 64 bits !

However I think this would be related to 64 bit compatibility mode.

I am not sure how useful that would be.

It might be insanely useful or not at all ?!? I don't know

Here are some texts I found from the Intel Manual Volume 3A:

"
4.2.1 Code Segment Descriptor in 64-bit Mode

Code segments continue to exist in 64-bit mode even though, for address
calculations, the segment base is treated as zero. Some code-segment (CS)
descriptor content (the base address and limit fields) is ignored; the
remaining fields function normally (except for the readable bit in the type
field).

Code segment descriptors and selectors are needed in IA-32e mode to establish
the processor's operating mode and execution privilege-level. The usage is as
follows:

- IA-32e mode uses a previously unused bit in the CS descriptor. Bit 53 is
  defined as the 64-bit (L) flag and is used to select between 64-bit mode
  and compatibility mode when IA-32e mode is active (IA32_EFER.LMA = 1). See
  Figure 4-2.

  - If CS.L = 0 and IA-32e mode is active, the processor is running in
    compatibility mode. In this case, CS.D selects the default size for data
    and addresses. If CS.D = 0, the default data and address size is 16 bits.
    If CS.D = 1, the default data and address size is 32 bits.

  - If CS.L = 1 and IA-32e mode is active, the only valid setting is
    CS.D = 0. This setting indicates a default operand size of 32 bits and a
    default address size of 64 bits. The CS.L = 1 and CS.D = 1 bit
    combination is reserved for future use and a #GP fault will be generated
    on an attempt to use a code segment with these bits set in IA-32e mode.

- In IA-32e mode, the CS descriptor's DPL is used for execution privilege
  checks (as in legacy 32-bit mode).
"

Specifically:

"
If CS.L = 1 and IA-32e mode is active, the only valid setting is CS.D = 0.
This setting indicates a default operand size of 32 bits and a default
address size of 64 bits.
"

As you can see from the text above, the default operand size remains 32
bits.

There is no way to specify a default operand size of 64 bits.

This makes it impossible to use a segment descriptor to quickly change
operand size from 32 bits to 64 bits or vice versa AT RUNTIME !

Otherwise it might have been possible to change the operand size at runtime
by simply changing the segment descriptor ?

^^^ Only one place for a change to occur ^^^ <- Could be a real nice feature
to convert existing binary code from 32 bit to 64 bit or vice versa with a
single change !

preferably all at runtime ! <-- Nice idea.

Bye,
Skybuck.

Skybuck Flying · Aug 31, 2007

It's the other people that started the insults not me !

They think they know everything, well that's definitely not the case !

Bye,
Skybuck.

Skybuck Flying · Aug 31, 2007

Give me wings bitch ! =D

Bye,
Skybuck.

Miguel Guedes said:
Well, this particular one (clc++) isn't.

I wish you stuck to that and flew off in the wind.
