assembly in future C standard

Keith Thompson

Rod Pemberton said:
Keith Thompson said:
Rod Pemberton said:
Theoretically it seems possible to develop a subset of the assembly
languages used by modern CPUs and to include in a C standard a C
library which would allow the user to use the assembly subset. Since it is
a limited subset and its syntax is governed by C, it should not present
portability challenges, with the possible exception of older systems. Any
thoughts about this?

You should ignore any response Healthfield gives to your question.
[snip]

That's really bad advice.

How so? It appears that your response is incorrectly biased...

Here's what Richard wrote:

| Would that it were true - but it isn't. There is no such thing as "assembly
| language". There are, rather, a great many assembly languages. One
| particular assembly language may well be portable between two or even more
| OSs, and yet not be portable between two different assemblers on the same
| OS. One assembly language may be portable between two different assemblers
| on the same OS, and yet not be portable to some other OS.
|
| [...]
|
| It depends what you need. But the best solution to your immediate problem -
| that of performance - lies in choosing better, faster algorithms and
| implementing them well. You have a great many gains to realise from doing
| this; if you do it well, you may well decide that you have no need for any
| assembly language after all. Implementing your current algorithms in some
| assembly language or other is unlikely to result in significant performance
| improvements.

Is that what you're referring to?
Heathfield's response was totally incorrect and borders upon incompetence.

No, he was quite correct. If you disagree, by all means say so;
there's no need to make it personal.
My response was the best introduction to the minimal necessary interface to
C and various C libraries and other languages that he'll ever get. He won't
find anything even remotely close in hundreds, if not thousands, of
programming books.

That may or may not be true; I was commenting specifically on your
advice about what Richard wrote, not on the rest of your article.
 
Stephen Sprunk

fermineutron said:
Well, it seems to me that it is not so much the speed gain as a
functionality gain that can be realized from assembly language.

Ah, but _which_ assembly language? There are hundreds of them, and they
contain orthogonal feature sets. Any common subset, if there even is
one, would be constrained to what you can express in C itself, rendering
the concept useless.

That also ignores the fact that the C standard's use of the word
"translator" allows for interpreters, not just compilers, and an
interpreter may not have any underlying assembly language at all.
For example, a while back I wrote a simple C profiler, which parses a
C file and inserts RDTSC statements before and after each C statement,
hence determining the number of clock cycles it took to execute that
line of code. Now the only compiler that my profiler will work with is
LCC, because RDTSC is part of LCC's intrinsics library, but it is not
part of BC++ 5.02, for example. Had there been full support for
assembly code within C, I could have used inline assembly to do this
and not relied on LCC's intrinsics library.

How exactly is your program supposed to work on platforms that don't
have _any_ assembly instruction similar to RDTSC?

If you want to limit yourself to x86 platforms, you could have easily
created a macro that expands to compiler-specific assembly or
intrinsics. Your program would then add that macro definition to each
file it modified, and then insert calls to that macro instead of to
LCC's nonstandard intrinsic. That's about as "portable" as you're going
to get when you're looking for such implementation-dependent
functionality, and having a "standard" assembly language would merely do
the same thing (perhaps encapsulated in a <stdasm.h> header file to be
neat).
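A minimal sketch of that macro approach, assuming GCC or Clang (`__rdtsc()` from `<x86intrin.h>`) or MSVC (`__rdtsc()` from `<intrin.h>`) on x86, with standard `clock()` as the fallback everywhere else. The names `PROFILE_TICKS` and `profiled_sum` are hypothetical; LCC's own intrinsic is not shown because its exact spelling is not given in this thread.

```c
#include <stdint.h>
#include <time.h>

/* PROFILE_TICKS() is a hypothetical macro name. The profiler would emit
   calls to it instead of a compiler-specific intrinsic; only the branch
   selected below differs from one compiler to the next. */
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#  include <intrin.h>      /* MSVC: declares __rdtsc() */
#  define PROFILE_TICKS() ((uint64_t)__rdtsc())
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#  include <x86intrin.h>   /* GCC/Clang: declares __rdtsc() */
#  define PROFILE_TICKS() ((uint64_t)__rdtsc())
#else
/* No known cycle counter: fall back to standard C processor time.
   Far coarser than RDTSC, but available on every hosted implementation. */
#  define PROFILE_TICKS() ((uint64_t)clock())
#endif

/* Roughly what the profiler might emit around one instrumented loop:
   read the counter, run the original code, read the counter again. */
static uint64_t profiled_sum(const int *a, int n, int *out)
{
    uint64_t start = PROFILE_TICKS();
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += a[i];
    uint64_t stop = PROFILE_TICKS();
    *out = sum;
    return stop - start;   /* elapsed ticks, in the counter's own units */
}
```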

Another point is that once you instrument code, it necessarily changes
what the optimizer can do with it, and you're no longer measuring how
the original code performs but rather how the instrumented code
performs. It's quite easy to design a scenario where algorithm A runs
faster than algorithm B when instrumented, but B runs faster than A when
not instrumented. Physics has something similar, the observer effect
(often loosely attributed to the Heisenberg Uncertainty Principle): the
very act of measuring something changes the value you were trying to measure.
To the best of my knowledge most modern C compilers produce assembly
code which is as good as one could optimize it by hand, so clearly there
is no speed gain from asm for general-purpose code.

That is fundamentally false. A compiler will never produce code faster
than a competent human can, because the human can start with the
compiler's output and tune from there. Good asm programmers can improve
on that by a factor of two in many cases, and five in exceptional ones.
If you want the big gains, though, you need to change the algorithm
completely as RH noted, and you can typically do that while staying in
C.
Theoretically it seems possible to develop a subset of the assembly
languages used by modern CPUs and to include in a C standard a C
library which would allow the user to use the assembly subset. Since it is
a limited subset and its syntax is governed by C, it should not present
portability challenges, with the possible exception of older systems. Any
thoughts about this?

"with possible exception of older systems" is a large part of the point.
C runs on thousands of different implementations, and any changes to
Standard C must be available on all of them. There's a reason the
Standard is so vague on so many fundamental details, e.g. sizeof(int),
and that's so C can be available on the widest possible range of
platforms. When you start ignoring certain systems because they're
inconvenient, you're limiting the applicability of the standard.

That's fine in your own code, because you can make a judgement call
about how portable you want your code to be, but the ISO folks don't get
to make calls like that. They have to produce something that works on
everything from wristwatches to supercomputers to whatever we'll be
using 20 years from now. You don't get that by ignoring what's
inconvenient.

S
 
Richard Heathfield

Rod Pemberton said:

I'm sorry that I unintentionally misspelled your name once out of a thousand
or so times. You know that I haven't misspelled it elsewhere.

You know that, and I know that, but the Google archives disagree with us
both.
I guess it was a
convenient excuse for you to ignore the fact that your response was
totally incorrect and borders upon incompetence.

I stand by the accuracy and correctness of my response.
 
John Bode

fermineutron wrote:

[snip]
Theoretically it seems possible to develop a subset of the assembly
languages used by modern CPUs and to include in a C standard a C
library which would allow the user to use the assembly subset. Since it is
a limited subset and its syntax is governed by C, it should not present
portability challenges, with the possible exception of older systems. Any
thoughts about this?

Here's a proposed set:

ARG : Agree to Run Garbage
BDM : Branch and Destroy Memory
CMN : Convert to Mayan Numerals
DDS : Damage Disk and Stop
EMR : Emit Microwave Radiation
ETO : Emulate Toaster Oven
FSE : Fake Serious Error
GSI : Garble Subsequent Instructions
GQS : Go Quarter Speed
HEM : Hide Evidence of Malfunction
IDD : Inhale Dust and Die
IKI : Ignore Keyboard Input
IMU : Irradiate and Mutate User
JPF : Jam Paper Feed
JUM : Jeer at User's Mistake
KFP : Kindle Fire in Printer
LNM : Launch Nuclear Missiles
MAW : Make Aggravating Whine
NNI : Neglect Next Instruction
OBU : Overheat and Burn if Unattended
PNG : Pass Noxious Gas
QWF : Quit Working Forever
QVC : Question Valid Command
RWD : Read Wrong Device
SCE : Simulate Correct Execution
SDJ : Send Data to Japan
TTC : Tangle Tape and Crash
UBC : Use Bad Chip
VDP : Violate Design Parameters
VMB : Verify and Make Bad
WAF : Warn After Fact
XID : eXchange Instruction with Data
YII : Yield to Irresistible Impulse
ZAM : Zero All Memory
PI : Punch Invalid
POPI : Punch Operator Immediately
RASC : Read And Shred Card
RPM : Read Programmer's Mind
RSSC : Reduce Speed, Step Carefully (for improved accuracy)
RTAB : Rewind Tape and Break
RWDSK : ReWind DiSK
SPSW : Scramble Program Status Word
SRSD : Seek Record and Scar Disk
WBT : Water Binary Tree

Basically what you're asking for is an assembly language version of
Java. Is that *really* what you want?
 
Jordan Abel

2006-10-30 said:
fermineutron wrote:

[snip]
Theoretically it seems possible to develop a subset of the assembly
languages used by modern CPUs and to include in a C standard a C
library which would allow the user to use the assembly subset. Since it is
a limited subset and its syntax is governed by C, it should not present
portability challenges, with the possible exception of older systems. Any
thoughts about this?

Here's a proposed set:

(snip)

No HCF?

Where'd you get these anyway?
 
Chris Torek

In general _asm() will interact very badly with the optimizer,
especially the peephole optimizer of lcc-win32. Since
the optimizer is geared to code produced by the compiler, it will
get confused by your asm statements.

To avoid these problems, gcc has developed a language that
has always been a closed book for me, where you describe your
assembly statements to the compiler.

That language is a pure horror; I have never been able to understand
it even after some time spent (wasted?) in it.

Actually, it is a pretty straightforward system[%], using a constraint
model and gcc's internal "register transfer language" as the basic
building blocks. What you stick in a gcc "asm" statement is
essentially the same as what you stick in the machine description
file that gcc uses to produce machine code in the first place.
(There are some limitations and problems with this, that have
Special Hacks for them, more of which are accessible from the .md
files than from asm() constructs.)
Obviously that is a better solution than asm, but it
would be very difficult to standardize.

In particular, it ties you to RTL, and to those things that are
in the original .md file.

As an example, consider asm() constructs for x86 targets compared
to those for m68k targets. The "d" constraint on x86 refers to
the %edx register, but the "d" constraint on m68k refers to any
available data register (as opposed to any available address
register, which uses the letter "a" -- which on the x86 means the
%eax register).

Since the 680x0 has no %edx register, it makes no sense to tell
the asm() line that, e.g., the output of the instruction is in the
edx/eax register pair. Of course, the 680x0 has no "rdtsc"
instruction either.

[% Some of the corner cases get sticky: when do you use the "&"
constraint vs the "+" constraint, for instance? Use the wrong one
and the reload pass can get confused.]
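A sketch of the x86 case described above, assuming a GCC-compatible compiler (GCC or Clang): the `"=a"` and `"=d"` output constraints tell the compiler that RDTSC leaves its result in %eax and %edx. The function name `read_tsc` is hypothetical, and the `clock()` fallback stands in for targets such as the 680x0 that have no such instruction.

```c
#include <stdint.h>
#include <time.h>

/* Reads a timestamp. On x86 with a GCC-compatible compiler this uses
   extended asm; elsewhere it falls back to standard C. */
static inline uint64_t read_tsc(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t lo, hi;
    /* RDTSC puts the low 32 bits of the counter in %eax and the high 32
       bits in %edx. On x86, "=a" names %eax and "=d" names %edx; on m68k
       the same letters mean "any address/data register", so this
       constraint string is only meaningful for x86 targets.
       __volatile__ stops the optimizer from deleting or merging reads
       that have no inputs but return a different value each time. */
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    /* No RDTSC here (e.g. m68k): use processor time from the C library. */
    return (uint64_t)clock();
#endif
}
```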
 
Arthur J. O'Dwyer

(snip)

No HCF?

Where'd you get these anyway?

At first I thought that was a stupid question... but googling "tangle
tape and crash" turned up 64 hits. So, your question wasn't stupid;
there are at least 64 possible answers, and we have no way of knowing
which is correct until John responds!

BTW, googling "Punch Invalid" turns up a much longer list:
http://www.physics.ohio-state.edu/~bcd/humor/instruction.set.html
I do wonder if "Punch Invalid" had originally been a real error
message on a mainframe somewhere (relating to a broken or missing
cardpunch), and made it into these lists due to the double entendre. :)

-Arthur
 
Richard Harter

At first I thought that was a stupid question... but googling "tangle
tape and crash" turned up 64 hits. So, your question wasn't stupid;
there are at least 64 possible answers, and we have no way of knowing
which is correct until John responds!

BTW, googling "Punch Invalid" turns up a much longer list:
http://www.physics.ohio-state.edu/~bcd/humor/instruction.set.html
I do wonder if "Punch Invalid" had originally been a real error
message on a mainframe somewhere (relating to a broken or missing
cardpunch), and made it into these lists due to the double entendre. :)

My favorite real command was the rewind card reader command, which was
legal on the CDC 3400.
 
John Bode

Arthur said:
At first I thought that was a stupid question... but googling "tangle
tape and crash" turned up 64 hits. So, your question wasn't stupid;
there are at least 64 possible answers, and we have no way of knowing
which is correct until John responds!

I googled on "Inhale Dust and Die" which led me to

http://www.cs.inf.ethz.ch/37-023/fun/UAB.pdf

I first saw that list back in '87 or '88, and I think a couple of
entries are missing (eXecute OPerator, and eXecute OPerator Immediate:
20,000 volts to the console). I remembered IDD and figured I'd get
the list on the first hit. However, the first page came back with
stories of kids who inhaled dust and died as a result.

Google's a scary place sometimes.
BTW, googling "Punch Invalid" turns up a much longer list:
http://www.physics.ohio-state.edu/~bcd/humor/instruction.set.html

I think that's the list I remember. Of course, I'm old now, so what I
think I remember and what I really remember are often two different
things.
 
Gordon Burditt

No HCF?
At first I thought that was a stupid question... but googling "tangle
tape and crash" turned up 64 hits. So, your question wasn't stupid;
there are at least 64 possible answers, and we have no way of knowing
which is correct until John responds!

BTW, googling "Punch Invalid" turns up a much longer list:
http://www.physics.ohio-state.edu/~bcd/humor/instruction.set.html
I do wonder if "Punch Invalid" had originally been a real error
message on a mainframe somewhere (relating to a broken or missing
cardpunch), and made it into these lists due to the double entendre. :)

Real OS/360 card readers had this error called a "validity check".
They had to read 1 column (with positions for 12 punches) and
translate this into a byte (8 bits), unless the cards were being
read in "column binary" mode (12 bits per column), which was rarely
used. Describing this as "Punch Invalid" was reasonable, since it
read an invalid combination of punches.

Naturally, with 4096 possible codes and only 256 valid ones (the
maximum number of punches for a valid column was 6, as in 12-11-0-7-8-9),
a lot of combinations were invalid, particularly the infamous
"ventilator column" (all 12 positions punched out) often appearing
on the "ventilator card" (all 12 positions X 80 columns punched
out, which weakened the card so much it was likely to jam, in both
card readers and keypunches).

If the card reader stopped with a validity check, you retrieved the card
with the error, put it first in the hopper, and started reading again.
If it kept doing it, you took the offending deck out, handed it back
to the person submitting it, and told them to fix it.
 
Chris Dollin

John said:
I first saw that list back in '87 or '88, and I think a couple of
entries are missing (eXecute OPerator, and eXecute OPerator Immediate:
20,000 volts to the console).

Similar lists existed before 1979 (I encountered one before I started
Paid Work). Possibly my favourite pair was Switch Off (boring) and
Switch On (less boring).
 
Richard Bos

Rod Pemberton said:
You should ignore any response Healthfield gives to your question.

This problem was solved in the very first assembly language:
http://en.wikipedia.org/wiki/Autocode

"Very first" is debatable; there were assemblers for the ENIAC. You may
or may not count these, depending on how you count.

In any case, that does not solve the OP's problem. It doesn't allow you
to use all the dirty tricks that you can use in native assembly
language, such as controlling the pipelining or addressing specific
hardware directly. OTOH, if you extend it so that you can use these
tricks, the result will ipso facto not be portable to all systems,
simply because the processor and hardware on other systems may not allow
you to pull the tricks you wanted to pull on the first system, and where
they do, may have different constraints.
It has also been solved by many other languages, most effectively by
FORTH and C.

Ah. Yes. And there's the rub, isn't it? To the advocates of __asm, the
way it's been solved by C isn't effective enough. If it were, they'd use
C as it is, rather than asking for assembly language to be shoehorned
in. (Whether such advocates are correct in their desires is another
matter. IMO they're not.)
The above functionality of C can be represented by 16 "actions" and 20
arithmetic operations. That means that C can be implemented as an interpreter.
The highly portable QEMU emulator reduces target (guest) CPU instructions to
"micro-ops" for a virtual machine. Those "micro-ops" could be considered to
be a portable assembly. Research into the FORTH language shows that the
_entire_ functionality of the FORTH language (which is just as powerful as C)
reduces to 13 "primitives."

Amateurs.

<http://catseye.mine.nu:8080/projects/smetana/doc/smetana.html>. Two.
<http://catseye.mine.nu:8080/projects/thue/doc/thue.txt>. Three, if you
count Production, Input and Output, but unlike Smetana, known to be
Turing-complete. And if you only care about calculation and are willing
to write and read the data straight from the chip, you can reduce this
to a single primitive.
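Which single primitive is meant is not stated; one well-known choice is SUBLEQ ("subtract and branch if less than or equal to zero"). The interpreter below is a sketch of that swapped-in example, not of any machine named in the thread; the function name `subleq_run` and the halt-on-negative-address convention are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* One-instruction machine. Each instruction is three cells a, b, c:
       mem[b] = mem[b] - mem[a];
       if (mem[b] <= 0) jump to c; else fall through to the next triple.
   By the convention assumed here, a negative jump target halts. */
static void subleq_run(long *mem, size_t size, long pc)
{
    while (pc >= 0) {
        assert((size_t)pc + 2 < size);          /* whole triple in range */
        long a = mem[pc], b = mem[pc + 1], c = mem[pc + 2];
        assert(a >= 0 && (size_t)a < size);     /* operand addresses valid */
        assert(b >= 0 && (size_t)b < size);
        mem[b] -= mem[a];
        pc = (mem[b] <= 0) ? c : pc + 3;
    }
}
```

For example, with X at cell 9, Y at cell 10 and a zero scratch cell Z at cell 11, the program `{9,11,3, 11,10,6, 11,11,-1, 7,5,0}` computes Y = Y + X (here 5 + 7 = 12) using only subtraction: Z -= X, then Y -= Z, then Z -= Z, which leaves Z at zero and jumps to -1 to halt.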

Richard
 
Richard Bos


No Sign Extend, either. I want my Sign Extend! Not to mention a Halt and
Terminate Operator instruction, although there are several which could
be used for roughly the same purpose.
Where'd you get these anyway?

The back of a C-Flat manual, I'd guess.

Richard
 
Richard Bos

jacob navia said:
(e-mail address removed) wrote:
Since Mac OS, Windows, Linux and Solaris all run the gcc compiler
and since that compiler has an assembler, using that assembler
makes your code portable to any of those architectures.

Oh, right? So calling interrupt 20h will do the same thing under MS-DOS
that it does under MS-Windows? Int 13 can be used to read the disk
allocation tables under Linux? Or are you perhaps limiting yourself to a
severely restricted sub-set of that assembly code?
(Note also that you contradict yourself: although MacOS, MS Windows, and
Solaris may all sometimes use gcc, they typically run on different
processors; so according to your previous post, quoted above, assembly
would not be portable across those three, while here you claim that it
should be.)

Richard
 
fermineutron

And for applications that run on machines connected to the Internet we can
have additional instructions like

OP - Order Pizza
PSG - Port Scan Google

I also want an SFP instruction, so that the computer could Show Free
Porn. With today's graphics that should be possible.
 
Rod Pemberton

Richard Heathfield said:
Rod Pemberton said:



You know that, and I know that, but the Google archives disagree with us
both.


I stand by the accuracy and correctness of my response.

You weren't qualified to respond, but you did so anyway, provided a
ludicrously false response, and expected everyone here to accept it as
truth? As an example of falseness of your response, I provided facts,
statistics, and proof, which contradicted your claims. Where's yours (just
like Bill Reid, do you have none)? I even forgot to mention a couple of the
best cases against your argument. I guess you can stand anywhere you want.
But, your attempt to claim that there is any accuracy or correctness of your
response is a blatant lie.


Rod Pemberton
 
Rod Pemberton

Keith Thompson said:
Rod Pemberton said:
Keith Thompson said:
Theoretically it seems possible to develop a subset of the assembly
languages used by modern CPUs and to include in a C standard a C
library which would allow the user to use the assembly subset. Since it is
a limited subset and its syntax is governed by C, it should not present
portability challenges, with the possible exception of older systems. Any
thoughts about this?

You should ignore any response Healthfield gives to your question.
[snip]

That's really bad advice.

How so? It appears that your response is incorrectly biased...

Here's what Richard wrote:

| Would that it were true - but it isn't. There is no such thing as "assembly
Is that what you're referring to?

No. You're off by one level. Please, try to follow. I told "fermineutron"
to ignore any response to the question above that he posted. Heathfield's
response was posted prior to my statement. This is Heathfield's response to
the said question:

RH> Feel free to try. Don't forget to include CPUs manufactured by Cray, Unisys,
RH> the mainframe division of IBM, Analog, Motorola... and many many more
RH> besides. Once you see how long the list is, and how many different assembly
RH> languages with different syntaxes are out there, you'll realise why nobody
RH> is doing this
incompetence.

No, he was quite correct. If you disagree, by all means say so;
there's no need to make it personal.

No, he wasn't and still isn't. Reread the facts I posted to the contrary.


Rod Pemberton
..
 
Rod Pemberton

Richard Bos said:
"Very first" is debatable; there were assemblers for the ENIAC. You may
or may not count these, depending on how you count.

In any case, that does not solve the OP's problem. It doesn't allow you
to use all the dirty tricks that you can use in native assembly
language, such as controlling the pipelining or addressing specific
hardware directly. OTOH, if you extend it so that you can use these
tricks, the result will ipso facto not be portable to all systems,
simply because the processor and hardware on other systems may not allow
you to pull the tricks you wanted to pull on the first system, and where
they do, may have different constraints.

The only issue you mentioned that isn't supported by Autocode is pipelining.
Autocode even supports segmentation or multiple address spaces.

But, you're taking the issue of a "portable assembly" language and
attempting to turn it into a "comprehensive assembly" language. The two are
different. The full feature set of a CISC, and almost VLIW, CPU such as
the x86 isn't needed to make a portable assembly language. And, the OP was
wanting a portable assembly language based on the functionality of C. I
listed that functionality. The functionality of C can be represented by 16
"actions" and 20 arithmetic operations. Even if one assumes that I'm
completely incorrect, say by 600%, thereby giving C a minimum of about 200
required operations, C could still be implemented as an interpreter. Those
16/20 (or 200) operations are the portable assembly he seeks.
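To illustrate what an interpreter over a small, fixed set of operations looks like, here is a tiny stack-machine sketch. The opcode set (`OP_PUSH` through `OP_HALT`) and the names `struct insn` and `vm_run` are hypothetical; they are not the list of 16 actions and 20 arithmetic operations referred to above, which is not reproduced in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define STACK_MAX 64

/* Hypothetical opcodes for a minimal stack machine. */
enum opcode { OP_PUSH, OP_ADD, OP_SUB, OP_MUL, OP_DUP, OP_JZ, OP_JMP, OP_HALT };

struct insn {
    enum opcode op;
    long arg;           /* used by OP_PUSH (value) and jumps (target) */
};

/* Runs a program and returns the top of the stack at OP_HALT. */
static long vm_run(const struct insn *prog, size_t len)
{
    long stack[STACK_MAX];
    size_t sp = 0;      /* number of items currently on the stack */
    size_t pc = 0;      /* index of the next instruction */

    for (;;) {
        assert(pc < len);
        const struct insn in = prog[pc++];
        switch (in.op) {
        case OP_PUSH: assert(sp < STACK_MAX); stack[sp++] = in.arg; break;
        case OP_ADD:  assert(sp >= 2); sp--; stack[sp - 1] += stack[sp]; break;
        case OP_SUB:  assert(sp >= 2); sp--; stack[sp - 1] -= stack[sp]; break;
        case OP_MUL:  assert(sp >= 2); sp--; stack[sp - 1] *= stack[sp]; break;
        case OP_DUP:  assert(sp >= 1 && sp < STACK_MAX);
                      stack[sp] = stack[sp - 1]; sp++; break;
        case OP_JZ:   /* pop; jump if the popped value is zero */
                      assert(sp >= 1);
                      if (stack[--sp] == 0) pc = (size_t)in.arg;
                      break;
        case OP_JMP:  pc = (size_t)in.arg; break;
        case OP_HALT: assert(sp >= 1); return stack[sp - 1];
        }
    }
}
```

For example, the program PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT evaluates (2 + 3) * 4 and returns 20.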
Ah. Yes. And there's the rub, isn't it? To the advocates of __asm, the
way it's been solved by C isn't effective enough.

For the OP, I believe it has been solved, because he wanted a C based
portable assembly. But, for the "advocates of __asm", it hasn't. C
captures the abilities of RISC and most other early CPUs, but hasn't added
any additional complexity for CISC or VLIW CPUs, DSPs, etc. Without __asm,
you need a kludge or extension to C to support segmentation, CPU registers,
etc. (see TR 18037: Embedded C, http://www.open-std.org/jtc1/sc22/wg14/).
If it were, they'd use
C as it is, rather than asking for assembly language to be shoehorned
in. (Whether such advocates are correct in their desires is another
matter. IMO they're not.)

They ask for assembly to be "shoehorned" in because extensions like TR18037
are about two decades _late_... Almost everything it adds has been
needed and in use since the late '80s. I doubt the "original authors"
(knowing full well who they are) intended C to be a "dead" or static
language. So, now that everyone knows how to use __asm, instead of
extensions like TR18037, what is the point of extending C? (i.e., force
people to waste more time learning more C when they've already learned
assembly and it does the job?)
Amateurs.

<http://catseye.mine.nu:8080/projects/smetana/doc/smetana.html>. Two.
<http://catseye.mine.nu:8080/projects/thue/doc/thue.txt>. Three, if you
count Production, Input and Output, but unlike Smetana, known to be
Turing-complete. And if you only care about calculation and are willing
to write and read the data straight from the chip, you can reduce this
to a single primitive.

Both of those appear to be theoretical. I listed Frank Sargent's
3-instruction FORTH, which is an actual working FORTH using only 3 instructions
for embedded environments. In this case, the assembly instructions and
FORTH primitives were the same.


Rod Pemberton
 
Rod Pemberton

Richard Bos said:
Oh, right? So calling interrupt 20h will do the same thing under MS-DOS
that it does under MS-Windows? Int 13 can be used to read the disk
allocation tables under Linux? Or are you perhaps limiting yourself to a
severely restricted sub-set of that assembly code?
(Note also that you contradict yourself: although MacOS, MS Windows, and
Solaris may all sometimes use gcc, they typically run on different
processors; so according to your previous post, quoted above, assembly
would not be portable across those three, while here you claim that it
should be.)

You're responding to the wrong stuff. You snipped the important part of his
statement. So, let's back it up:
Since Mac OS, Windows, Linux and Solaris all run the gcc compiler
and since that compiler has an assembler, using that assembler
makes your code portable to any of those architectures. That is
why lcc-win32 uses AT&T syntax and NOT Intel's syntax.

He stated, albeit less than elegantly, that he uses the gcc assembler as a
back-end to lcc-win32 because the gcc assembler supports multiple targets.


Rod Pemberton
 
