Typography of programs


Stephen Sprunk

We have not yet reached the point where it is reasonably safe to assume that
a web page which attempts to encode "smart quotes" will display correctly
on the top five or so popular browsers with their default settings.

Can you point out a web page which isn't displayed correctly in modern
browsers and has characters encoded consistently with its charset
declaration?

The problems I've noticed to date have _all_ been due to either
inconsistent encoding (eg. UTF-8 in a page declared as windows-1252 or
vice versa) or the complete lack of a charset declaration. In such
cases, GIGO applies; that is not the fault of the browser.
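The mismatch is easy to demonstrate. A minimal C sketch of my own (not any particular page): the three bytes below are one "smart quote" when read as UTF-8 and the three characters "â€œ" when read as windows-1252, so what appears on screen depends entirely on the declared charset.

#include <stdio.h>

int main(void)
{
    /* UTF-8 encoding of U+201C, a left "smart quote". */
    const unsigned char bytes[] = { 0xE2, 0x80, 0x9C, '\n' };

    /* A UTF-8 terminal shows one curly quote; a windows-1252
       terminal shows "â€œ".  Same bytes, different declared
       charset -- exactly the mismatch described above. */
    fwrite(bytes, 1, sizeof bytes, stdout);
    return 0;
}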

S
 

Dr Nick

Stephen Sprunk said:
Can you point out a web page which isn't displayed correctly in modern
browsers and has characters encoded consistently with its charset
declaration?

Although it might work for smart-quotes, there are plenty of pages that
try to display the far reaches of Unicode and fail, because not all -
sometimes barely any - fonts include the characters.

Here's an example. I haven't actually checked that the page doesn't
make any mistakes but I'd be surprised.

http://www.alanwood.net/unicode/egyptian-hieroglyphs.html

None of them display in my default set up.
 

Angel

On 03-Jul-11 23:42, Seebs wrote:

The problems I've noticed to date have _all_ been due to either
inconsistent encoding (eg. UTF-8 in a page declared as windows-1252 or
vice versa) or the complete lack of a charset declaration. In such
cases, GIGO applies; that is not the fault of the browser.

While I don't have an example at hand, I've seen many a site display
characters differently depending on whether it is viewed in IE or
Firefox, even on the same machine.

Same thing with M$ Word and OpenOffice, character sets differ even when
viewing the same document. Even in simple text editors that should be
utf8-capable things sometimes go horribly wrong.

I'm all for using an appropriate character set for documents, but I feel
that a program's source should not depend on such fickle things as
character sets and locales.


However, those that want to see program source displayed in utf8 can
write themselves an editor that keeps the usual && and == in the source
but translates them on screen as whatever characters they like. That should
keep both sides happy, and doesn't require a fundamental change in the
language that will likely cause far more problems than it solves.
 

Kleuskes & Moos

ASCII defines CR and LF; \r and \n are C-specific notations (which have
also been picked up by a number of other languages).

You're right, I used C notation in a newsgroup dedicated to C and
assumed everybody would get the point. I should not have made that
assumption.
 

James Kuyper

On 07/04/2011 02:03 AM, China Blue Dolls wrote:
....
English needs to mark stress and few extra letters like edh and thorn.

If it's actually "needed", then why is the overwhelming majority of
English text completely free of both stress marks and those extra
letters? Whatever purpose that it's "needed" to achieve is apparently
not being achieved - is it an important one?
 

Stephen Sprunk

English needs to mark stress

No, it does not. Words spelled the same but distinguished by stress are
differentiated by context, though sometimes that fails humorously.
and few extra letters like edh and thorn.

Those are distinct letters, not diacritical marks.

S
 

Stephen Sprunk

While I don't have an example at hand, I've seen many a site display
characters differently depending on whether it is viewed in IE or
Firefox, even on the same machine.

That's why I asked for examples. I've seen that, and similar problems,
numerous times, but in _every_ case I've examined to date, the issue was
that the page lacked a charset declaration; in such cases, browsers use
heuristics to determine the charset, but each browser is different, so
results vary.

Same thing with M$ Word and OpenOffice, character sets differ even when
viewing the same document.

I've never had that problem. Again, can you provide an example?

Even in simple text editors that should be utf8-capable things
sometimes go horribly wrong.

I haven't had a problem with Unicode in text editors since MS Notepad
started inserting a BOM at the start of files. Prior to that, I had
numerous instances where it would incorrectly load or save files as
UTF-16 (or maybe it was UCS-2 at the time) despite being explicitly told
to use UTF-8.
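For reference, that BOM is just the three bytes 0xEF 0xBB 0xBF at the start of the file. A small sketch of my own (the helper name is made up, and this is not any particular editor's logic) that checks for it:

#include <stdio.h>

/* Report whether a file begins with the UTF-8 byte-order mark
   (0xEF 0xBB 0xBF), which some editors, Notepad among them,
   prepend when saving as UTF-8. */
static int has_utf8_bom(const char *path)
{
    unsigned char buf[3];
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf, f);
    fclose(f);
    return n == 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF;
}

int main(int argc, char **argv)
{
    for (int i = 1; i < argc; i++)
        printf("%s: BOM %s\n", argv[i],
               has_utf8_bom(argv[i]) == 1 ? "present" : "absent or unreadable");
    return 0;
}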
I'm all for using an appropriate character set for documents, but I feel
that a program's source should not depend on such fickle things as
character sets and locales.

Even C has the concept of a character set for sources, presumably to
allow writing programs in ASCII, EBCDIC, etc. I've never tried, but I'm
pretty sure I wouldn't be able to read/edit/compile programs written in
EBCDIC without significant extra work--probably more than what would be
required to handle M. Navia's proposed change.
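In fact the standard already carries scars from exactly this problem: digraphs (and, before C23, trigraphs) exist so that source can be written on character sets lacking { } [ ] and #. A small sketch that a conforming compiler accepts:

#include <stdio.h>

%:define SIX 6   /* %: is the digraph for # */

int main(void)
<%               /* <% and %> stand in for { and } */
    int a<:SIX:> = <% 1, 2, 3, 4, 5, 6 %>;   /* <: :> stand in for [ ] */
    printf("%d\n", a<:3:>);                  /* prints 4 */
    return 0;
%>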

S
 

Stephen Sprunk

Although it might work for smart-quotes, there are plenty of pages that
try to display the far reaches of Unicode and fail, because not all -
sometimes barely any - fonts include the characters.

Still, if the browser is properly decoding a properly-formatted document
and the OS is simply unable to display it because it doesn't have
adequate fonts, that's an entirely different matter than what was being
discussed. I'm not aware of any modern system that doesn't have "smart
quotes" or the various operator glyphs M. Navia proposed. Certainly,
they're not a problem for Windows, Mac or Linux.
Here's an example. I haven't actually checked that the page doesn't
make any mistakes but I'd be surprised.

I don't see any glaring mistakes, but they used character entities
rather than directly encoding the characters. That sidesteps the most
common errors, at the expense of a few extra bytes.

Neither in mine, but it took less than a minute to download and install
a font with the appropriate glyphs, and now they display fine.

S
 

Keith Thompson

Kleuskes & Moos said:
You're right, I used C notation in a newsgroup dedicated to C and
assumed everybody would get the point. I should not have made that
assumption.

The C notation \n does not *necessarily* refer to the ASCII LF
character. It's not merely a different notation for the same thing.
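Which is easy to check; a one-line sketch (the values shown are what an ASCII-based implementation happens to give, not a guarantee of the standard):

#include <stdio.h>

int main(void)
{
    /* Typically prints 10 and 13 on ASCII/ISO 8859/UTF-8 systems,
       but the standard only promises some implementation-defined
       codes; an EBCDIC compiler may well print something else. */
    printf("'\\n' = %d, '\\r' = %d\n", '\n', '\r');
    return 0;
}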
 

Seebs

Can you point out a web page which isn't displayed correctly in modern
browsers and has characters encoded consistently with its charset
declaration?

Probably not.
The problems I've noticed to date have _all_ been due to either
inconsistent encoding (eg. UTF-8 in a page declared as windows-1252 or
vice versa) or the complete lack of a charset declaration. In such
cases, GIGO applies; that is not the fault of the browser.

Doubtless.

But the fact remains, if you use the normal tools people have lying around
to create pages, there's a pretty real risk of ending up with a page in
which apostrophes display only on Macs or only on Windows or something
similar.

-s
 

Seebs

All you're doing is complaining about the inadequacies of your software.

Well, yeah.

The point is, there's a whole lot of software which doesn't have completely
reliable support for unicode, so relying on it seems a bad idea.

-s
 

Seebs

Still, if the browser is properly decoding a properly-formatted document
and the OS is simply unable to display it because it doesn't have
adequate fonts, that's an entirely different matter than what was being
discussed. I'm not aware of any modern system that doesn't have "smart
quotes" or the various operator glyphs M. Navia proposed. Certainly,
they're not a problem for Windows, Mac or Linux.

Perhaps. Doesn't change the fact that, in general, there's a whole lot of
software that people may be using for whatever reasons which doesn't display
these glyphs, and I don't see a huge upside benefit here.

-s
 

jacob navia

On 05/07/11 22:21, Seebs wrote:
Probably not.


Doubtless.

But the fact remains, if you use the normal tools people have lying around
to create pages, there's a pretty real risk of ending up with a page in
which apostrophes display only on Macs or only on Windows or something
similar.

-s


That will be ALWAYS the case. There are still some terminals that do
not have the "{" character like the IBM 3270.

Happily for us some people in the eighties decided that we couldn't
afford to remain compatible with those terminals and introduced ASCII
terminals.

My proposal is the same. If there is some older hardware/software
combination that can't display ≠ (Alt"+" in my Mac keyboard)
they can always translate ≠ to "!="

Or you can run a simple filter and be done with it.
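Such a filter really is tiny. A sketch that handles just this one substitution (assuming UTF-8 input, where U+2260 is the byte sequence 0xE2 0x89 0xA0):

#include <stdio.h>

int main(void)
{
    int c;
    while ((c = getchar()) != EOF) {
        if (c == 0xE2) {                       /* possible 3-byte sequence */
            int c2 = getchar();
            int c3 = getchar();
            if (c2 == 0x89 && c3 == 0xA0) {    /* U+2260, the ≠ sign */
                fputs("!=", stdout);
                continue;
            }
            putchar(c);                        /* not ≠: pass bytes through */
            if (c2 != EOF) putchar(c2);
            if (c3 != EOF) putchar(c3);
        } else {
            putchar(c);
        }
    }
    return 0;
}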

But there will always be some naysayers who are against ANY
improvements, however minimal. This is the problem with this
group.
 

Willem

jacob navia wrote:
) That will be ALWAYS the case. There are still some terminals that do
) not have the "{" character like the IBM 3270.
)
) Happily for us some people in the eighties decided that we couldn't
) afford to remain compatible with those terminals and introduced ASCII
) terminals.
)
) My proposal is the same. If there is some older hardware/software
) combination that can't display ≠ (Alt"+" in my Mac keyboard)
) they can always translate ≠ to "!="
)
) Or you can run a simple filter and be done with it.

Or you do it the other way around: Have the IDE display the appropriate
glyphs for the different operators. That way you have the same benefits.
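The display direction can be sketched just as briefly: the file keeps "!=", and a trivial viewer shows it as ≠. A toy of my own, nothing like what a real IDE's renderer would be:

#include <stdio.h>

int main(void)
{
    int c;
    while ((c = getchar()) != EOF) {
        if (c == '!') {
            int c2 = getchar();
            if (c2 == '=') {
                fputs("\xE2\x89\xA0", stdout);  /* U+2260 in UTF-8 */
                continue;
            }
            putchar(c);
            if (c2 == EOF)
                break;
            putchar(c2);
        } else {
            putchar(c);
        }
    }
    return 0;
}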

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 

Squeamizh

jacob navia said:
That will be ALWAYS the case. There are still some terminals that do
not have the "{" character like the IBM 3270.

Happily for us some people in the eighties decided that we couldn't
afford to remain compatible with those terminals and introduced ASCII
terminals.

My proposal is the same. If there is some older hardware/software
combination that can't display ≠ (Alt"+" in my Mac keyboard)
they can always translate ≠ to "!="

Or you can run a simple filter and be done with it.

You have yet to explain why this is better than simply using an editor
which displays the extended characters without modifying the source.

Also, I would expect this to be a common feature among IDEs if there
were demand for it. Are there any IDEs which do this?

But there will always be some naysayers who are against ANY
improvements, however minimal. This is the problem with this
group.

I like some of your ideas, just not this one.
 

Dr Nick

jacob navia said:
That will be ALWAYS the case. There are still some terminals that do
not have the "{" character like the IBM 3270.

Happily for us some people in the eighties decided that we couldn't
afford to remain compatible with those terminals and introduced ASCII
terminals.

My proposal is the same. If there is some older hardware/software
combination that can't display ≠ (Alt"+" in my Mac keyboard)
they can always translate ≠ to "!="

Or you can run a simple filter and be done with it.

But there will always be some naysayers who are against ANY
improvements, however minimal. This is the problem with this
group.

Can you tell us why Java didn't do this? It was designed with Unicode
in mind, with no nasty history and legacy like C, yet it chose not to do
it. Until we know, I'm going to be sceptical in principle, as I can't
believe that the designers didn't think of it.
 

jacob navia

On 06/07/11 05:03, Squeamizh wrote:
You have yet to explain why this is better than simply using an editor
which displays the extended characters without modifying the source.

But that is exactly what I am proposing. However, if we are going to
keep those programs independent of ONE EDITOR, a standard should be set
up so that all editors know what to expect.

In an earlier post I thought that RTF could be that standard, but it
has some drawbacks, and I now think that HTML could be a better
solution.

Also, I would expect this to be a common feature among IDEs if there
were demand for it. Are there any IDEs which do this?

No, because if they did that, the program would be tied
to a single IDE forever. If we had a standard decision, THEN
IDEs could do it.
 

BartC

The typography of C remained the same as in all other computer languages,
obsessed with ASCII character codes 32 to 127. Thus we write != instead
of the inequality sign, = instead of the assignment arrow, "&&" instead
of /\, and "||" instead of \/.

That latter I agree with, those V's and up-side-down V's always confused
me...
Programs would be clearer if we used today's hardware to show
the usual signs, instead of constructs adapted to the teletype
typewriters of the seventies.

Unicode now offers all possible signs for display in our programs,
and it would be progress if C standardized some codes to be
used instead of the usual != and &&, etc.
We could have in some isoXXX.h
#define ≠ !=
#define ⋀ and
#define ⋁ or
#define ≤ <=
#define ≥ >=
Using ← for assignment would avoid the common beginner's error of using =
instead of == and programs would look less horrible.

This sounds great but I don't think it's the way to go at present. And where
would you stop with typography? Would you also use different fonts, styles
and colours to distinguish keywords, user identifiers, and library
functions, for example?

Or have comments in italics so that you don't need that ugly /* ... */
bracketing?

A proper square root symbol? Superscripts for raising to the power of?
Subscripts for array indexing? The proper signs for multiplication and
division (and even arranging a/b vertically)? Etc.

Besides my keyboard doesn't have any of those symbols.
All this would be done first in output only, to avoid requiring a new C
keyboard, even though that can be done later. You would still type !=
but you would obtain ≠ in output, in the same way that you type first the
accent, then the letter under the accent, and you obtain one character.

I think this is the only place for this sort of enhancement: restricted to
the displayed output of an editor or DTP program (and you have the problem
of modern word processors where you'd much rather work on the underlying
'markup' info, in an easy-to-change textual form, than having to directly
edit the final layout). Who wants to use Word 2011 to edit C?

(Having said all that, I admit I worked on a lexical analyzer recently that
does recognise some of those symbols! At least those with codes in the range
128 to 255 of either ANSI or Unicode (I forget which). Such as superscript 2
for the 'square' operator (a²). These look cool but I suspect they'll never
be used and later will be dropped...)
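Incidentally, the "isoXXX.h" quoted above has a real ancestor: since C95, <iso646.h> has provided word-like macro spellings for the operators, with no new characters required. A minimal sketch:

#include <iso646.h>
#include <stdio.h>

int main(void)
{
    int a = 1, b = 2;

    /* "not_eq" and "and" are <iso646.h> macros for != and && . */
    if (a not_eq b and a < b)
        puts("a differs from b and is smaller");
    return 0;
}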
 

BartC

China Blue Dolls said:
I use roman fonts for comments and gothic for code. I bold case reserved
words and italicise variables. I can't use background colours, but I
sometimes use foreground colours for different types of code.

I don't use superscripts because they aren't important in my code. I do
use subscripts. It helps reduce the visual clutter. I don't use
horizontal division lines because RTFs don't really support that. I
would have to use a table and that's more trouble than it's worth.

My keyboard has a bunch of symbols with the Option key. I can also bring
up a window with all unicode characters to insert them into the text.

I use RTFs so that I get the benefit of easier to read code.

Within the environment of an IDE, or a private development setup, then all
this and more is possible.

But it doesn't need changes to the language spec as is being proposed; the
end-result of the editing (or a conversion process as you mention) is
standard C-compatible ASCII.
 

Willem

jacob navia wrote:
) On 06/07/11 05:03, Squeamizh wrote:
)>
)> You have yet to explain why this is better than simply using an editor
)> which displays the extended characters without modifying the source.
)>
)
) But that is exactly what I am proposing. But if we are going to keep
) those programs independent of ONE EDITOR, a standard should be set up
) so that all editors know what to expect

You misunderstood. What he proposed was to have the source code contain the
old notation, !=, ->, || et cetera, but have the editor display them as the
modern glyphs for inequality, arrow, et cetera.

)> Also, I would expect this to be a common feature among IDEs if there
)> were demand for it. Are there any IDEs which do this?
)
) No, because if they would do that, the program would be tied
) to a single IDE forever. If we had a standard decision THEN
) IDEs could do it.

No.
The program would be in ASCII. The IDE would translate for display only.
See above.

SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
