What exactly is a text character in the computer?

thenightfly

Ok, I know all about how binary numbers translate into text characters.
My question is what exactly IS a text character? Is it a bitmap?
 
Martijn

My question is what exactly IS a text character? Is it a
bitmap?

This is so off-topic, you wouldn't believe it. Although I shouldn't do this
(because it may encourage others to post OT here as well):

<OT>
In the old days (and still today if you switch to CLI/DOS mode or run a
UNIX-like OS without a GUI like X) the characters were stored as bitmaps in a
special location in memory (by setting some pointers in the BIOS memory, you
could override these). The characters (or glyphs) used in GUIs like Windows
and X more often than not use a combination of curves (and sometimes hints)
to describe what they should look like.
</OT>
 
Eric Sosman

Martijn said:
This is so off-topic, you wouldn't believe it. Although I shouldn't do this
(because it may encourage others to post OT here as well):

<OT>
In the old days (and still today if you switch to CLI/DOS mode or run a
UNIX-like OS without a GUI like X) the characters were stored in memory in a
special location (setting some pointers in the BIOS memory, you could
override these) as bitmaps. The characters (or glyphs) used in GUI's like
Windows and X use more often than not a combination of curves (and sometimes
hints) to describe what they should look like.
</OT>

Nonsense. In the old days, characters were little metal
dies that pressed an inked ribbon against paper (various
mechanical arrangements were used). With the right sequences of
characters you could get some printers to emit sounds that had
pitch and could produce recognizable tunes, but that tended to
eat holes in the ribbon and earn you a stern talking-to from the
computer operators.
 
Rouben Rostamian

Nonsense. In the old days, characters were little metal
dies that pressed an inked ribbon against paper (various
mechanical arrangements were used). With the right sequences of
characters you could get some printers to emit sounds that had
pitch and could produce recognizable tunes, but that tended to
eat holes in the ribbon and earn you a stern talking-to from the
computer operators.

Nonsense. Everyone knows that in the old days, characters
were shapes chiseled in stone.
 
Emmanuel Delahaye

Ok, I know all about how binary numbers translate into text characters.
My question is what exactly IS a text character? Is it a bitmap?

A text file is just another binary file. What counts is the way it is
opened and read. Use fopen() with "r", so that the end-of-line markers
and the end-of-file marker (if any) will be correctly interpreted.

I recommend fgets() for line-oriented reading.

Similarly, when you create a text file, be sure to open it in text mode
("w" or "a") so that the '\n' character is properly encoded in the
file. (The actual encoding is implementation dependent.) Not to mention
that some systems (say MS-DOS) append a special character (say Ctrl-Z) to
mark the end of text files.
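
As a minimal sketch of the advice above, a line-oriented read in text mode
might look like the following. The helper name count_lines and the 256-byte
buffer size are assumptions made for illustration; neither comes from the
standard library or from this thread.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: count the lines in a text file.
 * Returns -1 if the file cannot be opened. */
long count_lines(const char *path)
{
    FILE *fp = fopen(path, "r");    /* "r" opens the stream in text mode */
    char buf[256];                  /* arbitrary size; long lines arrive in pieces */
    long lines = 0;
    int partial = 0;                /* set when the last chunk lacked a '\n' */

    if (fp == NULL)
        return -1;

    while (fgets(buf, sizeof buf, fp) != NULL) {
        if (strchr(buf, '\n') != NULL) {
            lines++;                /* this chunk completes a line */
            partial = 0;
        } else {
            partial = 1;            /* line continues, or file lacks a final '\n' */
        }
    }
    if (partial)
        lines++;                    /* count an unterminated last line */

    fclose(fp);
    return lines;
}
```

Calling count_lines("notes.txt") (the file name is only an example) returns
the number of lines. In text mode the library hands back '\n' for whatever
end-of-line convention the local platform uses.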

--
Emmanuel
The C-FAQ: http://www.eskimo.com/~scs/C-faq/faq.html
The C-library: http://www.dinkumware.com/refxc.html

..sig under repair
 
Malcolm

Martijn said:
This is so off-topic, you wouldn't believe it. Although I shouldn't do this
(because it may encourage others to post OT here as well):

The question is not actually off topic. It's a bit like "what does the add
sign mean?", or "when I multiply two negative numbers the program outputs a
positive, why?".

In pre-computer days a "text character" was defined by the topology of
lines. So a reasonably geometric circle with a hole in it is an "O", a
vertical triangle with a raised lower bar is an "A", and so on.

It is extremely difficult to get computers to use this system. So instead of
using writing pads, we usually use keyboards. Each key generates a character
code - for English you only need about a hundred codes to cover every
character. Internally, the computer uses these codes, almost always ASCII in
an English-speaking environment. That's why a char is usually an 8-bit
integer.
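
A small sketch of that idea in C: a character is just an integer code, and
you can print the code instead of the glyph. The helper names char_code and
dump_codes are made up for this example, and the values mentioned in the
note below assume an ASCII execution character set.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: the integer code this implementation uses for c. */
int char_code(char c)
{
    return (unsigned char)c;
}

/* Hypothetical helper: print each character of s next to its numeric code,
 * then the width of a char in bits. */
void dump_codes(const char *s)
{
    for (; *s != '\0'; s++)
        printf("'%c' = %d\n", *s, char_code(*s));
    printf("bits per char: %d\n", CHAR_BIT);
}
```

On an ASCII system dump_codes("Aa") reports 65 for 'A' and 97 for 'a'; on a
machine with a different character set the numbers would differ, which is
the point: the code, not the shape, is what the program stores.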

However, when it comes to output, humans don't want codes. They want to see
the glyphs. These could easily be stored as bitmaps (rasters giving the dot
pattern of the character) somewhere in the computer. Alternatively you could
hook the computer up to a teletype, in which case the metal type is carved
into the shape of the character. If the computer is being used by a blind
person, you could have a device that converts the ASCII code into a pattern
of raised bumps.

These days fonts tend to be rather sophisticated, with variable pitch,
anti-aliasing, kerning, and sometimes other features. So the usual answer is
that a fairly complicated program writes the characters to a raster display.

However if you want to implement printf() yourself, an easy way of doing it
is to define each character by an 8 by 8 block. This gives readable output.
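
A rough sketch of the 8 by 8 idea, assuming one byte per row with the most
significant bit as the leftmost pixel. The bit pattern for 'A' below is
invented for this example and is not taken from any real font; render_glyph
and print_glyph are hypothetical names, and the "display" is just '#' and
'.' characters.

```c
#include <stdio.h>

/* Made-up 8x8 bitmap for 'A': one byte per row, bit 7 is the leftmost pixel. */
const unsigned char glyph_A[8] = {
    0x18, 0x24, 0x42, 0x42, 0x7E, 0x42, 0x42, 0x00
};

/* Expand a glyph into 8 strings of 8 cells: '#' for a set pixel, '.' for clear. */
void render_glyph(const unsigned char rows[8], char out[8][9])
{
    int r, c;

    for (r = 0; r < 8; r++) {
        for (c = 0; c < 8; c++)
            out[r][c] = (rows[r] & (0x80 >> c)) ? '#' : '.';
        out[r][8] = '\0';
    }
}

/* Write the rendered glyph to stdout, one row per line. */
void print_glyph(const unsigned char rows[8])
{
    char text[8][9];
    int r;

    render_glyph(rows, text);
    for (r = 0; r < 8; r++)
        puts(text[r]);
}
```

A real implementation would hold a table of such bitmaps indexed by
character code and copy the bits into video memory rather than printing
them as text.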
 
Mark McIntyre

Nonsense. In the old days, characters were little metal
dies that pressed an inked ribbon against paper

Goodness me, what's with this ribbon thing? Characters were set into
great plates by highly paid typesetters, put onto a press, inked and
the paper rolled over them. This happened up until the 1980s in the
UK.
 
Walter Roberson

<OT>
In the old days (and still today if you switch to CLI/DOS mode or run a
UNIX-like OS without a GUI like X) the characters were stored in memory in a
special location (setting some pointers in the BIOS memory, you could
override these) as bitmaps.

Character bitmaps stored in BIOS-accessible memory is a new-fangled
innovation. Real video character bitmaps are pulled from ROM in a
character-generator circuit. Provided, that is, that you can
get a high enough yield on the ROMs -- otherwise you Do The Right Thing
and use AND and NOT chips and shift registers.
 
Skarmander

Malcolm said:
The question is not actually off topic. It's a bit like "what does the add
sign mean?", or "when I multiply two negative numbers the program outputs a
positive, why?".
<snip>

Riiiight. That's not far from claiming that just about any topic
involving programming is alright for c.l.c, since C is a programming
language.

It definitely has nothing to do with C specifically, since obviously,
the rest of what you posted applies to computers in general, whether
you're using C, Pascal, Ada, or a Turing machine. The question of how C
handles characters is another matter.

But if we're on that track, let's pull in the whole
character/glyph/encoding story, rather than stick to the ASCII world.
http://www.cs.tut.fi/~jkorpela/chars.html seems like a nice start.

Dealing with characters in terms of what they represent on an abstract
level tends to be more common than having to know how they're displayed,
but of course that's an important area of study as well. See
http://redsun.com/type/abriefhistoryoftype/ for a nice overview of the
history of typography, including early digital typography.

S.
 
Joe Wright

Emmanuel said:
A text file is just another binary file. What counts is the way it is
opened and read. Use fopen() with "r", so that the end of line markers
and end of file marker (if exist) will be correctly interpreted.

I recommend fgets() for a line-oriented reading

Similarly, when you create a text file, be sure to open it in text mode
("w" or "a") so that the '\n' character is properly encoded on the file.
(The actual value is compiler dependent). Not to mention that some
systems (say MS-DOS) append a special character (say Ctrl-Z) to mark the
end of text files.

Don't frighten the children. If you fopen() a text file with mode "r"
you will never see '\r' nor 0x1a (^Z). Trust me on this. :)
 
Simon Biber

Joe said:
Don't frighten the children. If you fopen() a text file with mode "r"
you will never see '\r' nor 0x1a (^Z). Trust me on this. :)

Unfortunately, if you try to open an MS-DOS, Windows or Mac OS text file
on a Unix-style system with fopen mode "r", you will see '\r'
characters. Tools like ftp can be told to transfer files as text or
binary, and when told to use text they handle the format conversion
properly. However, most recent methods of file transfer just copy the
files exactly, and don't convert text formats. It is left up to the
destination application to be robust to different formats.

My point is that portable C applications can no longer afford to just
use fopen in text mode and hope that the file is in the same format that
the local C library was expecting. It makes more sense to read files as
binary, and be robust to all the competing text file formats. Not just
different line endings (CR, CRLF and LF), but different character
encodings as well. I have text files in UTF-8, UTF-16, ISO-8859-1,
GB2312 and Big5 here. Wherever possible, I expect programs to not barf
on any of those formats.
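
One way to sketch the line-ending half of that approach: read the file with
fopen mode "rb", then fold CR, CRLF and LF into a single '\n' yourself. The
function name normalize_newlines is hypothetical, and this sketch assumes a
byte-oriented encoding such as ASCII, ISO-8859-1 or UTF-8; it does nothing
about UTF-16 or about detecting encodings.

```c
#include <stddef.h>

/* Hypothetical helper: rewrite buf in place so that CRLF and lone CR both
 * become '\n'. Returns the new length, which is never larger than len. */
size_t normalize_newlines(char *buf, size_t len)
{
    size_t in = 0, out = 0;

    while (in < len) {
        char c = buf[in++];

        if (c == '\r') {
            if (in < len && buf[in] == '\n')
                in++;           /* CRLF: consume the LF as well */
            c = '\n';           /* CR or CRLF becomes a single '\n' */
        }
        buf[out++] = c;
    }
    return out;
}
```

The buffer would typically be filled by fread() on a stream opened in
binary mode. A CRLF pair split across two reads needs extra care that this
sketch leaves out.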
 
John Bode

Ok, I know all about how binary numbers translate into text characters.
My question is what exactly IS a text character? Is it a bitmap?

That depends *entirely* on the output device. A bitmap works for a
raster display, but not so well for a vector display or a teletype.

This is way beyond the scope of the C programming language, btw.
 
Martijn

Eric said:
Nonsense. In the old days, characters were little metal
dies that pressed an inked ribbon against paper (various
mechanical arrangements were used). With the right sequences of
characters you could get some printers to emit sounds that had
pitch and could produce recognizable tunes, but that tended to
eat holes in the ribbon and earn you a stern talking-to from the
computer operators.

Characters? In the old days?

:p
 
Martijn

Walter said:
Character bitmaps stored in BIOS-accessible memory is a new-fangled
innovation. Real video character bitmaps are pulled from ROM in a
character-generator circuit.

You are right - I remember specifically liking the glyphs of one video card,
while disliking those of another (I think the latter was the Diamond Fire).
Either way I was saying that setting some pointers in the mapped BIOS memory
space - I am not sure which ones anymore - allowed you to specify one or two
alternate sets, not that they were actually stored in the BIOS. Hopefully I
am using proper nomenclature here, because I am not sure that I am.
 
Walter Roberson

You are right - I remember specifically liking the glyphs of one video card,
while disliking those of another (I think the latter was the Diamond Fire).
Either way I was saying that setting some pointers in the mapped BIOS memory
space - I am not sure which ones anymore - allowed you to specify one or two
alternate sets, not that they were actually stored in the BIOS. Hopefully I
am using proper nomenclature here, because I am not sure that I am.

BIOS?? You must be one of them Pro-gress-ives chaps. You probably
didn't even build your own terminal !! Back in my day, we had to worry
about re-using scan lines in the ROM because 256 byte ROMs were so
expensive.
 
Lew Pitcher

Walter said:
BIOS?? You must be one of them Pro-gress-ives chaps. You probably
didn't even build your own terminal !! Back in my day, we had to worry
about re-using scan lines in the ROM because 256 byte ROMs were so
expensive.

Well, ain't you fancy!!

I still have a printer that you can change the print head on to get different
characters for the same codepoint. The one I have downstairs is a
"daisy-wheel" printer that uses a print head shaped like a 100-petal daisy.
Each petal has a different glyph embossed in its surface, and the printer
rotates the print head to position each glyph behind a hammer.

My old Teletype ASR33 used a cylinder which had 60 or so glyphs embossed on
it. The hardware rotated the cylinder so that the proper glyph was positioned
behind the print hammer.

In both cases, the hammer whacked the print head, which drove the embossed
glyph into the print ribbon, and smacked the ribbon against the paper, leaving
an imprint of the glyph in ribbon ink on the paper. No fancy ROMs or BIOS for
those babies. Just good old typewriter and printing press technology.

:)


--
Lew Pitcher

Master Codewright & JOAT-in-training | GPG public key available on request
Registered Linux User #112576 (http://counter.li.org/)
Slackware - Because I know what I'm doing.
 
Mabden

Rouben Rostamian said:
Nonsense. Everyone knows that in the old days, characters
were shapes chiseled in stone.

Chisels? You were lucky.
In the old days, we had to lick shapes into the stones with our tongues.
Whole villages were forced into slavery for 5 years just to make one
"Keep Left" sign.
 
