Typography of programs

jacob navia · Jun 29, 2011

I am learning the Macintosh environment with the idea of porting the IDE
of lcc-win to the Mac.

What is nice on the Mac is how easily you can set up a text display
and integrate images and other data into the text. It supports RTF text
(as Windows does too, by the way), and I was again thinking about implementing
an old idea that I had already wanted to implement in wedit some years ago.

Actually, when C was conceived there weren't any bit-mapped graphics
or any of the other hardware that we now consider commonplace.

The typography of C, like that of all other computer languages, has remained
the same, obsessed with the character set of ASCII codes 32 to 127. So we
write != instead of the inequality sign, = instead of the assignment
arrow, "&&" instead of /\, and "||" instead of \/.

Programs would be clearer if we used today's hardware to show
the usual signs, instead of constructs adapted to the teletype
of the seventies.

Unicode now offers every sign we could want to display in our programs,
and it would be progress if C standardized some codes to be
used instead of the usual != and &&, etc.

We have in iso646.h
#define and &&
#define and_eq &=
#define bitand &
#define bitor |
#define compl ~

We could have in some isoXXX.h
#define ≠ !=
#define ⋀ and
#define ⋁ or
#define ≤ <=
#define ≥ >=

etc.

Using ← for assignment would avoid the common beginner's error of using
= instead of ==, and programs would look less horrible.

All this would be done first in output only, to avoid requiring a new C
keyboard, even though that could be done later. You would still type !=
but you would see ≠ in the output, in the same way that you first type the
accent, then the letter under the accent, and you obtain one character.

The standardization committee would be crucial in making this change
smooth but... I fear they won't be so enthusiastic...

Just some thoughts

jacob

Ian Collins · Jun 30, 2011

Another way to generate non-portable code! Not much fun for those of us
who use command line text processing tools in the C locale!

jacob navia · Jun 30, 2011

On 30/06/11 02:57, Robert Wessel wrote:

You'd have a hard time exchanging programs with
non-extended-character-set implementations (not limiting things to
Unicode implementations, on the assumption that some extended, but
non-Unicode implementations would be plausible).

Hard time?

It would just mean #including isoXXX.h. That would transform those codes
back into ASCII. A pass through the preprocessor and that would be
done!

I'm afraid you'd end up with more di/trigraphs. The horror of that
will probably send anyone running.

We have digraphs NOW!!! What else is != but a digraph for ≠?????
Those digraphs would stay of course.

FWIW, Java allows the use of Unicode (in fact it's specified), and you
can use Unicode characters in names (so you can assign the value 3.14
to a variable named U+03C0). But they didn't use any of the extended
characters for operators.

Does Java have big problems because of Unicode? I would bet not!

Nor is this a particularly new concept - APL (circa 1964 for the first
implementations, although the initial papers, with most of the
symbology, date to about 1961) required/could use an extended character
set. IBM (and others) manufactured terminals and printers with the
APL character set available for output and on the keyboard. IBM even
had Selectric typewriter balls with the APL set.

It was my first programming language. It is great still today.

jacob navia · Jun 30, 2011

On 30/06/11 06:43, China Blue Dolls wrote:

I've been doing that for a few years now. I have a macro processor that can
parse RTF and uses Tcl for its metalanguage. The macro names are C style
identifiers, but it also can have string mappings like
<= ≤<=> <= ¬ !>

It's set up to ignore Palatino font and only include Helvetica. That means I can
intersperse comments, images, and movies in the text.

I think that would be great. If ISO standardized the characters
and the operators to use, then we could write comments in another
font, etc., etc.

jacob navia · Jun 30, 2011

On 30/06/11 06:59, Ian Collins wrote:

Another way to generate non-portable code!

If isoXXX.h exists then it is 100% portable. You would just need to
#include that file or pass the code through a small preprocessor
and that's all.

Not much fun for those of us
who use command line text processing tools in the C locale!

It is not very difficult to adapt those tools to accept Unicode
anyway... We could use a small subset of Unicode.

Ian Collins · Jun 30, 2011

Well, it is difficult if the tools are built into the OS!

Ian Collins · Jun 30, 2011

Your C compiler is built into the OS?

Read my first post!

jacob navia · Jun 30, 2011

On 30/06/11 08:08, Robert Wessel wrote:

Nor is a simple character-by-character translation really plausible
(something you'd run before transferring the code). It would need to
tokenize at least as well as the C program.

????
Can you elaborate?

Why wouldn't a simple preprocessor that transforms a small set of
Unicode characters into their ASCII equivalents work?

Again: what I propose is that a small set of Unicode characters,
fixed by ISO, be accepted instead of the digraphs in use today.

This means that !=, &&, ||, and other digraphs are translated into ONE
character, that's all.

It is the way *all* implementations work - so by definition it
wouldn't. My point was that they still avoided the extended
characters in predefined operators, while allowing programmers to use
them.

Well, if ISO would do this, all C implementations would use the same
characters.

jacob navia · Jun 30, 2011

On 30/06/11 08:56, Robert Wessel wrote:

What happens when you translate:

(a) if (a \U+2260 b)
(b) strcpy(s, "a\U+2260b")

from Unicode to ASCII? With or without isoXXX.h defined? And
then back to Unicode?

You have completely misunderstood. You would never see
\U+2260
That is 7 chars.

You would see ONE Unicode character in the input!

Only THOSE characters would be translated into digraphs.

Conceptually, it is exactly like digraphs and trigraphs
revisited, but in the OTHER direction: starting from today's digraphs
(!= etc.) you would get SINGLE characters.

And what about <<= in ASCII? You had better not translate that to
U+003C, U+2264 in Unicode.

Yes but that would never happen, see above.

Except for the C implementations not using Unicode.

Just translating C source from ASCII to EBCDIC is more pain than most
of us want to go through already. This would be worse.

Why?

In environments where this is not desired, we could keep the digraphs.
A very long transition period of 25 years could be used to slowly
move to Unicode. You can still program in digraphs and trigraphs TODAY
if you wish. This would be the same.

Ike Naar · Jun 30, 2011

We could have in some isoXXX.h
#define ? !=
#define ? and
#define ? or
#define ? <=
#define ? >=

All the characters that you propose show up as question marks
in my newsreader ;-<

Joachim Schmitz · Jun 30, 2011

Ike said:
All the characters that you propose show up as question marks
in my newsreader ;-<

Of course they do, using
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Jacob used
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

And all but the ones for "and" and "or" show up fine in mine...

Angel · Jun 30, 2011

On 30/06/11 06:59, Ian Collins wrote:

It is not very difficult to adapt those tools to accept unicode
anyway... We could use a small subset of Unicode

We already do. It's called ASCII.

jacob navia · Jun 30, 2011

On 30/06/11 09:25, Robert Wessel wrote:

Since this is Usenet, and I can't type Unicode here, I used a common
representation of the Unicode character assuming that it would be
totally obvious to anyone what I had done to represent the single
character. *Sheesh*

But then why would you insert a SINGLE Unicode character into
a character string? You could insert \Uwhatever and avoid
any problems.

This is no different than the situation now where
strcpy(a,"??/");

will get translated into something completely different but in the
OTHER direction.

BTW, I meant digraphs in the C sense, not the meaning you made up.

The meaning I "made up" is THE SAME as the C sense: a digraph is a
combination of two characters that represents a single character!

Because the character sets are not consistent, and depending on which
EBCDIC code pages you're using you end up having to use trigraphs or
digraphs. And the collating sequence issues do bite on an awful lot
of real code.

Then do not use this feature until IBM gets its act together. If you
use a crippled environment, that is NOT a reason for everyone else
to be affected. You would go on using your programs as before. When
you import foreign programs, you pass them through the translator and
that's all.

The transition period *is* the problem. During that time if you have
code that uses the Unicode characters, you need to be able to move
that to/from ASCII implementations. There would appear to be
translation issues.

Maybe, but it is nothing that can't be done with a general-purpose CPU,
excuse me. IBM produces computers powerful enough to do that.

P.S. Note that the above trigraph is translated correctly into a single
char

`

jacob navia · Jun 30, 2011

On 30/06/11 09:48, Ike Naar wrote:

All the characters that you propose show up as question marks
in my newsreader ;-<

Hi Ike

Your newsreader, slrn, supports UTF-8, and apparently there are
commands to set it to use UTF-8. You were using:

Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

This can't work, as Joachim Schmitz pointed out.

slrn is a text-based newsreader used mainly on BSD.
I do not know if it is still being developed (the last version is
0.9.9p1, October 2008, i.e. almost 3 years ago), but any
windowed newsreader should do the trick.

jacob

Ian Collins · Jun 30, 2011

That argued that if a few people have to work on a crippled system, everyone
must as well.

Read it again: the locale is the choice of the user. I have no reason to
use a locale other than "C". Why should a programming language dictate
my locale?

jacob navia · Jun 30, 2011

Of course not. You can go on using digraphs (!=, ==, etc) and nothing
will change. It is only if you want to import a foreign program that
uses those characters, THEN you would need a simple translator to
convert them into digraphs.

jacob

Chris H · Jun 30, 2011

In message <[email protected]september.org> said:
Does this mean that if one person's newsreader cannot cope with UTF-8 that no
one else should be allowed to use it?

No not for that reason.

Simply because Usenet is an ASCII text system.
HTML, XML, Unicode, attachments, etc. should not be used.

Angel · Jun 30, 2011

Portability is about making a program work on different platforms with a
minimum of fuss, i.e. without making other programmers jump through
hoops just to read your source.

Seebs · Jun 30, 2011

Does this mean that if one person's newsreader cannot cope with UTF-8 that
no one else should be allowed to use it?

No, but it does mean that people who use it should be aware that many
things won't handle it, and that this limits the scope of their publications.

This isn't rocket science, and isn't anything new to C programmers. Some
decisions restrict portability. How much, and whether you care, varies.

-s

jacob navia · Jun 30, 2011

On 30/06/11 14:19, Angel wrote:

Portability is about making a program work on different platforms with a
minimum of fuss, i.e. without making other programmers jump through
hoops just to read your source.

Then do you use ??< instead of { ??????

Because "??<" is more portable than "{", since some 3270 terminal
models don't support "{".

WHY must the huge majority of users who develop on workstations be
constrained to develop as if we all used an old teletype?

And please do not bring up embedded systems development, since
development for all those systems is done with a cross compiler
using a PC or a Mac as the development environment.

And "jump through hoops" doesn't cut it either since
it would be VERY difficult to find a development environment
where there isn't any workstation (PC/Mac/Linux) and all
terminals are old 3270s.
