I am finding this 'em' thing to be really confusing.
Try to find a good book on CSS. Usenet isn't really a substitute for
reading tutorials - it mostly just confuses you more unless you have
good background information. People will just make guesses, most of the
time.
It is supposed to be the 'font size' of the relevant font.
As a CSS concept, which is surely what you mean here (there is no such
concept in HTML), it is _defined_ to be exactly that and nothing else.
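
To make the definition concrete, here is a small sketch. The selectors
and pixel values are made up for illustration only, and it assumes the
h2 is a direct child of the div. Note the one special case: when em is
used in the font-size property itself, it refers to the font size of
the parent element, since the element's own font size is what is being
set.

```css
/* Hypothetical selectors and values, for illustration only. */
div.note {
  font-size: 20px;
  padding-left: 1em;     /* 1em = 20px, this element's own font size */
}
div.note > h2 {
  font-size: 1.5em;      /* in font-size, em means the parent's font size: 1.5 * 20px = 30px */
  margin-bottom: 0.5em;  /* elsewhere, em means the element's own font size: 0.5 * 30px = 15px */
}
```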
So 1em is supposed to be the same size as the relevant font.
Well, yes, it _is_, by definition.
The size of the relevant font.
The font's height or width?
You would not go very wrong in thinking that it is the height. It
definitely isn't the width of a font, if there _is_ such a thing (most
fonts are not monospace, i.e. the widths of characters vary), and it
isn't the width of any particular character except by coincidence.
And different letters will have different heights and widths, so
which height or width does the browser choose?
The size of the font.
Whether we say that the size of a font is the height of the font is a
matter of words. In any case, ultimately the size of a font is a design
concept: it defines the height of the space inside which glyphs are
designed, so that it contains all the descenders and ascenders, but in
fact the glyphs may slightly extend beyond the limits of that space,
especially if there are multiple diacritic marks.
Consequently, the font size is virtually always larger than the height
of any particular character. For example, the letter "A" has no
descender, and its top normally does not touch the invisible line that
delineates the font size, since there must be some room for possible
diacritics like an accent above it. (It's possible to design a font
where characters with a diacritic mark have the base letter smaller
than otherwise, but that's not common and not very pleasing
aesthetically.)
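
If you want to see this for yourself, something like the following
sketch draws a box that is exactly one font size high and wide. The
class name is hypothetical, and exactly where the glyph sits relative
to the box depends on the font's metrics, so treat it as a rough
visual aid rather than a measurement.

```css
/* Hypothetical class name; any inline-block box would do. */
span.emsquare {
  display: inline-block;
  width: 1em;           /* as wide as the font size */
  height: 1em;          /* as tall as the font size */
  line-height: 1;       /* keep the line height at 1em as well */
  background: yellow;   /* makes the 1em box visible */
}
```

Put a single "A" inside an element with that class; in most fonts the
letter should come out visibly shorter than the box.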