ligatures in Java 2D

Des Small

I've been trying to work out how to get Java to handle ligatures such
as "fi" correctly, but without much success. Various online documents
have suggested that I need to be intervening in the rendering process
and laying out my own GlyphVectors based on information associated
with Fonts, but I can't find out the details I need to actually do
that systematically.

I have discovered (<http://mindprod.com/jgloss/ligature.html>) that
'fi' has its own code point (\ufb01) in Unicode - should I be manually
checking whether my Font has specific glyphs for such ligatured
characters and manually substituting them into my char[] to feed to
layoutGlyphVector? Do Fonts come with lists of ligatures they
support, and if so how do I get at them?

And most of all, what is TFM that I am currently foolishly neglecting
to R?

Des
 
Roedy Green

I have discovered (<http://mindprod.com/jgloss/ligature.html>) that
'fi' has its own code point (\ufb01) in Unicode - should I be manually
checking whether my Font has specific glyphs for such ligatured
characters and manually substituting them into my char[] to feed to
layoutGlyphVector? Do Fonts come with lists of ligatures they
support, and if so how do I get at them?

It is up to you to use them if they exist. You have to check manually,
because the automatic method to check whether a char renders LIES. See
http://mindprod.com/jgloss/font.html
 
Thomas Weidenfeller

Des said:
I've been trying to work out how to get Java to handle ligatures such
as "fi" correctly, but without much success.

Correctly? Well, ligatures are evil :) They are different in different
languages (the sets used are different), and they are used for a number
of purposes (e.g. to get a more pleasing typographic view, but also as
semantic elements of languages).

Let's just assume you want to do it for aesthetic reasons. But that's
still difficult to automate, because it depends on language rules.
Various online documents
have suggested that I need to be intervening in the rendering process

I guess this assumes that you intend to emulate ligatures by using the
corresponding normal glyphs and move them closer together. This only
works in some fonts, but looks extremely ugly in others.

You can simply move glyphs closer together by placing one glyph at a
time, e.g. via Graphics.drawString() with single-character strings. This
is inefficient.
and laying out my own GlyphVectors based on information associated
with Fonts, but I can't find out the details I need to actually do
that systematically.

A loop, a font-specific table to look up how close you want to move two
glyphs, some variables to keep track of how much you must move the
remaining parts of a GlyphVector once you move one glyph, and
GlyphVector.setGlyphPosition().
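
The loop described above might look roughly like the following sketch. It assumes simple Latin text in which every char maps to exactly one glyph. The class name PairTightener and the pair table are hypothetical; the table values would have to be tuned by eye for each font.

```java
import java.awt.font.GlyphVector;
import java.awt.geom.Point2D;
import java.util.Map;

final class PairTightener {
    /**
     * For each glyph position (plus the end position), computes how far it
     * must move left. The table maps a two-character pair to the distance by
     * which the second glyph should be pulled towards the first.
     */
    static float[] cumulativeShifts(String text, Map<String, Float> table) {
        float[] shifts = new float[text.length() + 1];
        float running = 0f;
        for (int i = 1; i < text.length(); i++) {
            Float delta = table.get(text.substring(i - 1, i + 1));
            if (delta != null) {
                running += delta; // everything from glyph i onwards moves too
            }
            shifts[i] = running;
        }
        shifts[text.length()] = running; // position after the last glyph
        return shifts;
    }

    /** Applies the shifts to an existing GlyphVector (one glyph per char assumed). */
    static void apply(GlyphVector gv, float[] shifts) {
        int n = Math.min(gv.getNumGlyphs() + 1, shifts.length);
        for (int i = 0; i < n; i++) {
            Point2D p = gv.getGlyphPosition(i);
            gv.setGlyphPosition(i, new Point2D.Double(p.getX() - shifts[i], p.getY()));
        }
    }
}
```

The shift computation is kept separate from the GlyphVector so that the table logic can be checked without a display or any installed fonts.
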
I have discovered (<http://mindprod.com/jgloss/ligature.html>) that
'fi' has its own code point (\ufb01) in Unicode - should I be manually
checking whether my Font has specific glyphs for such ligatured
characters and manually substituting them into my char[] to feed to
layoutGlyphVector?

If your font provides glyphs for ligatures it is of course better to use
these than to emulate ligatures with single glyphs. And that indeed
means you have to get the corresponding Unicode code points into your char[].

However, the problem is how to automate this. E.g. in my mother tongue
the rules for using ligatures are partly related to hyphenation rules.
Any automatic replacement of character sequences with corresponding
ligatures would have to take these into account to get things right.
Do Fonts come with lists of ligatures they
support, and if so how do I get at them?

I am not aware that typical font file formats contain separate lists for
ligatures. However, you don't need this. Characters in Java are
Unicode-encoded. All you need to know are the Unicode code points for the
ligatures you want to support. Check the tables at www.unicode.org;
there aren't many. Then you need to check whether a particular font contains
matching glyphs. Font.canDisplay() and Font.canDisplayUpTo() are your
friends.
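
A minimal sketch of that substitution, under some assumptions: the class name LigatureSubstituter is made up for illustration, only the five f-ligatures are handled, and the "can this font display it?" check is passed in as a predicate so the logic does not depend on any particular font being installed.

```java
import java.util.function.IntPredicate;

final class LigatureSubstituter {
    // Longest sequences first, so that "ffi" wins over "ff" and "fi".
    private static final String[][] TABLE = {
        {"ffi", "\uFB03"},
        {"ffl", "\uFB04"},
        {"ff",  "\uFB00"},
        {"fi",  "\uFB01"},
        {"fl",  "\uFB02"},
    };

    /** Replaces plain letter sequences with ligature characters the font can show. */
    static String substitute(String text, IntPredicate canDisplay) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            boolean replaced = false;
            for (String[] entry : TABLE) {
                String plain = entry[0];
                char ligature = entry[1].charAt(0);
                if (text.startsWith(plain, i) && canDisplay.test(ligature)) {
                    out.append(ligature);
                    i += plain.length();
                    replaced = true;
                    break;
                }
            }
            if (!replaced) {
                out.append(text.charAt(i++)); // no usable ligature starts here
            }
        }
        return out.toString();
    }
}
```

With a real java.awt.Font the predicate could be written as cp -> font.canDisplay(cp). Whether canDisplay() can be trusted for a given font is a separate question, raised elsewhere in this thread.
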
And most of all, what is TFM that I am currently foolishly neglecting
to R?

/Thomas
 
Des Small

Thomas Weidenfeller said:
Correctly? Well, ligatures are evil :) They are different in
different languages (the sets used are different), and they are used
for a number of purposes (e.g. to get a more pleasing typographic
view, but also as semantic elements of languages).

Let's just assume you want to do it for aesthetic reasons. But that's
still difficult to automate, because it depends on language rules.

That's OK. I mean, it's fiddly, but it's OK.
I guess this assumes that you intend to emulate ligatures by using the
corresponding normal glyphs and move them closer together. This only
works in some fonts, but looks extremely ugly in others.

No; I'm specifically thinking of
<http://java.sun.com/j2se/1.3/docs/guide/2d/spec/j2d-fonts.fm5.html>
which confidently announces that "In Figure 4-14, the custom layout
algorithm replaces the fi substring with the ligature _fi_". Without
giving any hints as to how such a custom layout algorithm could be
implemented.

[...]
I have discovered (<http://mindprod.com/jgloss/ligature.html>) that
'fi' has its own code point (\ufb01) in Unicode - should I be manually
checking whether my Font has specific glyphs for such ligatured
characters and manually substituting them into my char[] to feed to
layoutGlyphVector?

If your font provides glyphs for ligatures it is of course better to
use these than to emulate ligatures with single glyphs. And that
indeed means you have to get the corresponding Unicode code points into
your char[].

OK. It seems slightly odd, though, that current Java handles the
complexities of Arabic and Hindi writing but neglects to provide any
hook into this for Latin. (The documentation intermittently boasts
that the fancy i18n stuff also allows convenient high-end typography
of Latin scripts, but I find, as you are suggesting, this not to be
the case.)
However, the problem is how to automate this. E.g. in my mother tongue
the rules for using ligatures are partly related to hyphenation
rules. Any automatic replacement of character sequences with
corresponding ligatures would have to take these into account to get
things right.

If you mean German and especially "ß", then that is certainly more
ligaturisation than I plan to automate. (In particular, I'm not
implementing hyphenation, so I would treat it as a separate character.)
I am not aware that typical font file formats contain separate lists
for ligatures. However, you don't need this. Characters in Java are
Unicode-encoded. All you need to know are the Unicode code points for
the ligatures you want to support. Check the tables at
www.unicode.org; there aren't many. Then you need to check whether a
particular font contains matching glyphs. Font.canDisplay() and
Font.canDisplayUpTo() are your friends.

Fair enough. If that's how it is, that's what I'll do.

Des
 
Chris Uppal

Des said:
No; I'm specifically thinking of
<http://java.sun.com/j2se/1.3/docs/guide/2d/spec/j2d-fonts.fm5.html>
which confidently announces that "In Figure 4-14, the custom layout
algorithm replaces the fi substring with the ligature _fi_". Without
giving any hints as to how such a custom layout algorithm could be
implemented.

I get the impression that this stuff is still all "work in progress", and the
level of pluggability that the architecture promises has not (as yet) been
exposed in the public API. As of now, the only route /I/ can see for using a
custom layout algorithm is to create your own subclass of GlyphVector which
allows you to fill in the details as desired. That would (as far as I can
see) be a /lot/ of work. Check the source for sun.font.StandardGlyphVector
(it's part of the platform source, but not present in src.zip) for an idea of
how much work. And then too, you'd have to side-step the convenience methods
for text handling, and do everything at the lowest level of the API. It seems
to me that it would be /much/ easier just to substitute the 7 defined ligature
Unicode characters into your text and let the normal processing handle it.
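
For reference, the seven Latin ligature characters sit at U+FB00 to U+FB06, and each has a compatibility decomposition back to plain letters. The sketch below lists them using java.text.Normalizer; the class name LatinLigatureBlock is purely illustrative.

```java
import java.text.Normalizer;
import java.util.LinkedHashMap;
import java.util.Map;

final class LatinLigatureBlock {
    /** Maps each character in U+FB00..U+FB06 to its NFKD (compatibility) decomposition. */
    static Map<Character, String> decompositions() {
        Map<Character, String> result = new LinkedHashMap<>();
        for (char c = '\uFB00'; c <= '\uFB06'; c++) {
            String plain = Normalizer.normalize(String.valueOf(c), Normalizer.Form.NFKD);
            result.put(c, plain);
        }
        return result;
    }
}
```

Inverting this map gives a starting table for substitution, and the same normalisation can be used to turn ligature characters back into plain letters before searching or spell-checking.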

-- chris
 
Thomas Weidenfeller

Des said:
If you mean German and especially "ß", then that is certainly more
ligaturisation than I plan to automate.

No, "ß" is yet another case. It once started as a ligature but is pretty
much treated as a single character these days. As opposed to ligatures,
there is no voluntary replacement of two single "s" with one "ß".

What I mean, in a simplified form (the real rules are even trickier), is
that one doesn't use a ligature if there could possibly be a hyphenation
between the original letters. There need not be an actual hyphenation,
just the possibility of one.
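
A rough sketch of that simplified rule, assuming the possible hyphenation points are supplied by some hyphenator that is not shown here. The class name, the breakOffsets parameter, and the restriction to "fi" and "fl" are all assumptions made for illustration.

```java
import java.util.Set;

final class HyphenationAwareLigatures {
    /**
     * breakOffsets holds every index i at which a hyphenation point may fall
     * between word.charAt(i - 1) and word.charAt(i). A ligature is only used
     * when no such point lies between its two letters.
     */
    static String ligate(String word, Set<Integer> breakOffsets) {
        StringBuilder out = new StringBuilder(word.length());
        int i = 0;
        while (i < word.length()) {
            boolean splittable = breakOffsets.contains(i + 1);
            if (word.startsWith("fi", i) && !splittable) {
                out.append('\uFB01');
                i += 2;
            } else if (word.startsWith("fl", i) && !splittable) {
                out.append('\uFB02');
                i += 2;
            } else {
                out.append(word.charAt(i++));
            }
        }
        return out.toString();
    }
}
```

For example, with a break allowed at offset 3 in "Auflage" (Auf-lage) the "fl" pair is left alone, whereas with no break points supplied it would be replaced.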

/Thomas
 
Thomas Weidenfeller

Forgot to answer that part:

Des said:

But that's pretty much what they suggest :) This magic custom layout
algorithm they are talking about would exactly have to be an algorithm
that moves otherwise independent glyphs (Font data) together, so they
look like a ligature (or draws glyphs from some other source which
represent ligatures).
which confidently announces that "In Figure 4-14, the custom layout
algorithm replaces the fi substring with the ligature _fi_". Without
giving any hints as to how such a custom layout algorithm could be
implemented.

Of course not - because it is difficult. I consider that paragraph in
the guide to be marketing blah-blah for showing off. You can summarise the
paragraph as follows:

/If you want some special handling you can't use the usual drawString()
methods (which you can notice in the first figure). Instead, you are
completely on your own. You somehow have to create a GlyphVector with
some magic algorithm./

The algorithms they are talking about are algorithms to do proper
typesetting. You have a stream of characters, font attributes, etc. as
input, and the algorithm has to properly size, color, position (and
whatever) the corresponding glyphs, the result of which is recorded in a
GlyphVector.

I know of one guy who has published such an algorithm: Knuth for his TeX
typesetting system, and I seem to remember that the Unix troff authors,
as well as the GNU groff authors published some information about their
typesetting algorithms, too.

I have my doubts whether it is worth the effort to implement these so you
can emulate ligatures.

If I needed to emulate ligatures, I would consider manipulating an already
generated GlyphVector, and try to reposition glyphs in the vector to
a very limited extent.

/Thomas
 
Des Small

Thomas Weidenfeller said:
Forgot to answer that part:



But that's pretty much what they suggest :) This magic custom layout
algorithm they are talking about would exactly have to be an algorithm
that moves otherwise independent glyphs (Font data) together, so they
look like a ligature (or draws glyphs from some other source which
represent ligatures).

I'm after the latter. High quality fonts typically come with a stock
of such glyphs. I want to use them on screen.
Of course not - because it is difficult. I consider that paragraph in
the guide to be marketing blah-blah for showing off. You can summarise the
paragraph as follows:

/If you want some special handling you can't use the usual
drawString() methods (which you can notice in the first
figure). Instead, you are completely on your own. You somehow have to
create a GlyphVector with some magic algorithm./

Yes; however the marketroids left the impression there would be hooks
into the code for this. In fact, the Font class has no protocol for
asking about ligature glyphs. There must be an internal API for this,
since it is (as the marketroids observe) formally the same problem as
rendering Arabic text, and Java can indeed render Arabic text. It is
just that the APIs aren't public. (Another poster suggested that this
stuff is work in progress, but the APIs don't seem to have changed
since 1999.)
The algorithms they are talking about are algorithms to do proper
typesetting. You have a stream of characters, font attributes, etc. as
input, and the algorithm has to properly size, color, position (and
whatever) the corresponding glyphs, the result of which is recorded in
a GlyphVector.

I know of one guy who has published such an algorithm: Knuth for his
TeX typesetting system, and I seem to remember that the Unix troff
authors, as well as the GNU groff authors published some information
about their typesetting algorithms, too.

I have my doubts whether it is worth the effort to implement these so you
can emulate ligatures.

I don't want to "emulate" ligatures; I want to just plain have and use
ligatures. I want a text-editor with decent rendering of text - when
writing I spend more time looking at text on the screen than anywhere
else, and it seems to me worth optimising for that. (I don't want
WYSIWYG; I don't care about WYSIWYG and I usually typeset with TeX.)

Under the circumstances the non-support from Java does indeed mean
that it isn't worth doing in Java. Accordingly, I am currently
investigating more hospitable platforms for this project.

Thanks very much for all your help, though.

Des
 
John C. Bollinger

Des said:
I'm after the latter. High quality fonts typically come with a stock
of such glyphs. I want to use them on screen.

And the problem with that is what?

[...]
Yes; however the marketroids left the impression there would be hooks
into the code for this. In fact, the Font class has no protocol for
asking about ligature glyphs. There must be an internal API for this,
since it is (as the marketroids observe) formally the same problem as
rendering Arabic text, and Java can indeed render Arabic text. It is
just that the APIs aren't public. (Another poster suggested that this
stuff is work in progress, but the APIs don't seem to have changed
since 1999.)
[...]

I don't want to "emulate" ligatures; I want to just plain have and use
ligatures. I want a text-editor with decent rendering of text - when
writing I spend more time looking at text on the screen than anywhere
else, and it seems to me worth optimising for that. (I don't want
WYSIWYG; I don't care about WYSIWYG and I usually typeset with TeX.)

Under the circumstances the non-support from Java does indeed mean
that it isn't worth doing in Java. Accordingly, I am currently
investigating more hospitable platforms for this project.

But you still haven't explained (that I have been able to follow) what
specific features you find lacking. Perhaps I'm missing it because my
native writing system doesn't use any ligatures, but you seem to have
started this thread with a view toward a very complicated way of
handling what may be a very simple problem. To wit: what's wrong with
using the appropriate Unicode characters for the ligatures you want, and
relying on the Font to render them correctly? (You do assert that
high-quality fonts will have the glyphs.) Why do you need to worry
about glyph vectors and layout details? It may be that the lack of docs
you complained about arises from there being nothing different about
rendering the Unicode character for a ligature than there is for
rendering any other arbitrary Unicode character.
 
Roedy Green

In fact, the Font class has no protocol for
asking about ligature glyphs.

Let's say you were to invent such an API. You would have to give the
Unicode slot where the ligature was, often in the private use area. You
would also have to specify what pair of characters it was intended to
replace.

Then you have things like the German ß which does its own thing.

Then consider Arabic. Arrgh as you run screaming from the room.

You then came back, and solved it, then you realised that somehow your
table would have to be embedded in every font, and you would have to
convince all the Font software makers to support your addition and all
Font designers to fill it in.

What you will have to do is get a copy of any new font, examine it
manually, and add it to your ligaturiser code.

Consider too that you might not WANT to use some ligatures. They may
look too archaic.

What perhaps you want instead is a tool for your user to examine a font
for ligatures, so the end user can decide if they want to use them,
and for what pair. It might report back to central on their findings.
You can build a consensus table to use as the defaults.

Surely you are not the first person to want to handle ligatures.
Perhaps if you study the OpenType format there is something in there.
You might examine the font file directly or use some platform specific
tool to get you more font trivia.

I am not optimistic. Font canDisplay LIES. It returns TRUE even if
you just get a blob back. If they can't even get that right, what hope
is there for ligatures?
 
Thomas Weidenfeller

John said:
And the problem with that is what?

To me, the OP's problem sounds as follows:

A font, as part of its data, provides information about mappings of
charsets to the glyphs contained in the font. Not every such mapping can
point to all glyphs in a font. E.g. an ISO Latin 1 mapping provided by
a font would of course only point to the glyphs in the font which are
part of ISO Latin 1. A Unicode mapping can only point to glyphs in the
font which have a place in Unicode.

Now, Unicode defines very few ligatures, far fewer than one would need
in serious typesetting. Fonts (apparently his fonts) can contain glyphs
for many more ligatures than Unicode knows about.

But he can't get to these additional (ligature) glyphs in the font,
because they have no Unicode code point, are therefore not in a font's
Unicode-to-glyph mapping, and are therefore not addressable by Java.
Because Java only deals with Unicode.

I just learned that (contrary to what I originally wrote) high quality
fonts even contain "glyph substitution" data: lists which provide a
mapping of character combinations to special glyphs for these
combinations, e.g. ligatures. Java provides no API to access such lists.
And even if it did, the resulting glyph has no Unicode code point, so it
could not be represented in a Java char.
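
One related hook is worth noting: Java SE 6 and later expose a TextAttribute.LIGATURES attribute that asks the layout engine to apply a font's optional ligatures, without the ligature glyph ever needing a char of its own. The sketch below only shows how the attribute is set. Whether any ligatures actually appear depends on the font and the JDK in use, so treat it as something to experiment with rather than a guaranteed fix; the class and method names are illustrative.

```java
import java.awt.Font;
import java.awt.font.TextAttribute;
import java.util.HashMap;
import java.util.Map;

final class LigatureAttribute {
    /** Attribute map requesting optional ligatures. */
    static Map<TextAttribute, Object> ligaturesOn() {
        Map<TextAttribute, Object> attrs = new HashMap<>();
        attrs.put(TextAttribute.LIGATURES, TextAttribute.LIGATURES_ON);
        return attrs;
    }

    /** Derives a copy of the given font with the ligature request attached. */
    static Font withLigatures(Font base) {
        return base.deriveFont(ligaturesOn());
    }
}
```

Text drawn with the derived font goes through the normal drawString() or TextLayout paths; the char data itself stays as plain "f" and "i".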

So, from a typesetting point of view, Java's Font API is already very poor.

He also can't simply circumvent the Font API and somehow create his own
GlyphVector. At least not without some huge effort. From what I see, doing
this would require (a) writing his own font file reader, capable of
getting all the necessary information out of a font file, (b) a layout
algorithm to place glyph data in a GlyphVector, which in turn requires
(c) writing his own font renderer to interpret the glyph data in a font and
convert it to Java glyph information.

Perhaps I'm missing it because my
native writing system doesn't use any ligatures,

I would be very surprised if you haven't seen your native language
(American English?) printed with ligatures. It is a matter of aesthetic
typesetting. High-quality newspaper, magazine or book printers for sure
use ligatures.

Do you happen to have the GoF book at hand? For example, the first
paragraph of the first page (xi) of the preface contains a ligature.
"This book assumes you are reasonably proficient ...". See the "fi" in
"proficient"? IMHO it is printed as a ligature.
started this thread with a view toward a very complicated way of
handling what may be a very simple problem. To wit: what's wrong with
using the appropriate Unicode characters for the ligatures you want,

There aren't any Unicode characters.

/Thomas
 
Des Small

John C. Bollinger said:
And the problem with that is what?

The problem with that is there is no protocol to ask fonts whether
they do or do not have such ligatures.

[...]
But you still haven't explained (that I have been able to follow) what
specific features you find lacking.

The correct model is to produce a GlyphVector, which includes all the
ligatured glyphs, from a sequence of logical characters, which is
entirely oblivious to ligatures. What I lack is hooks into that
process; what you are suggesting is to munge the characters instead,
which is presumably a workable hack for some cases, but see below.

I don't think it's unreasonable to want what I want, and it certainly
isn't unanticipated: the Java official documentation says: "In Figure
4-14, the custom layout algorithm replaces the fi substring with the
ligature fi."
<http://java.sun.com/j2se/1.3/docs/guide/2d/spec/j2d-fonts.fm5.html#67469>

That's what I want to do. Write such a "custom layout algorithm".
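
For illustration only, the character-to-glyph step of such an algorithm could be sketched as a greedy longest-match over a ligature table. Everything named here is an assumption: the class GlyphCodeLayout, the table of ligature glyph codes (which would have to come from reading the font file yourself), and the charToGlyph function for ordinary characters.

```java
import java.util.Map;
import java.util.function.IntUnaryOperator;

final class GlyphCodeLayout {
    /**
     * Turns logical characters into glyph codes. Sequences found in
     * ligatureGlyphs become a single glyph; everything else is mapped
     * one-to-one through charToGlyph.
     */
    static int[] toGlyphCodes(String text, Map<String, Integer> ligatureGlyphs,
                              IntUnaryOperator charToGlyph) {
        int longest = 0;
        for (String key : ligatureGlyphs.keySet()) {
            longest = Math.max(longest, key.length());
        }
        int[] codes = new int[text.length()];
        int count = 0;
        int i = 0;
        while (i < text.length()) {
            int matched = 0;
            // Try the longest candidate sequence first.
            for (int len = Math.min(longest, text.length() - i); len >= 2; len--) {
                Integer glyph = ligatureGlyphs.get(text.substring(i, i + len));
                if (glyph != null) {
                    codes[count++] = glyph;
                    matched = len;
                    break;
                }
            }
            if (matched == 0) {
                codes[count++] = charToGlyph.applyAsInt(text.charAt(i));
                matched = 1;
            }
            i += matched;
        }
        return java.util.Arrays.copyOf(codes, count);
    }
}
```

The resulting array could in principle be handed to Font.createGlyphVector(FontRenderContext, int[]), which accepts raw glyph codes. Finding out which glyph code a font uses for each ligature is exactly the part the public API does not help with.
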
Perhaps I'm missing it because my native writing system doesn't use
any ligatures, but you seem to have started this thread with a view
toward a very complicated way of handling what may be a very simple
problem. To wit: what's wrong with using the appropriate Unicode
characters for the ligatures you want, and relying on the Font to
render them correctly?

They are essentially deprecated:

"""
The existing ligatures exist basically for compatibility and
round-tripping with non-Unicode character sets. Their use is
discouraged. No more will be encoded in any circumstances.
"""
<http://www.unicode.org/faq/ligature_digraph.html>

It's the rendering engine's job to do this stuff.
The current Java rendering engine does not do this stuff.
The current Java rendering engine does not have hooks to allow me to
do this stuff. (Yes, I can feed it deprecated characters. No, I'm
not going to.)

Note that some fonts have a fancy 'st' ligature which has no Unicode
codepoint and which isn't going to get one.
(You do assert that high-quality fonts will have the glyphs.) Why
do you need to worry about glyph vectors and layout details? It may
be that the lack of docs you complained about arises from there
being nothing different about rendering the Unicode character for a
ligature than there is for rendering any other arbitrary Unicode
character.

There isn't just a lack of docs, there's a lack of hooks into the
character rendering pipeline.

Just to put the matter beyond reasonable doubt, though, I have decided
that the showpiece for my implementation will be the correct handling
of Fraktur writing, which has many ligatures not encoded in Unicode
code-points.

Des
 
Des Small

Roedy Green said:
Let's say you were to invent such an API. You would have to give the
Unicode slot where the ligature was, often in the private use area. You
would also have to specify what pair of characters it was intended to
replace.

Yes, of course.
Then you have things like the German ß which does its own thing.

German ß is usually considered a character in its own right;
iso-latins give it its own codepoint.
Then consider Arabic. Arrgh as you run screaming from the room.

Arabic isn't really that bad, if you abandon a naive belief in a 1-1
mapping of characters and glyphs. Since I have abandoned such a
belief, I am not very intimidated. What I am asking does indeed
amount to a recognition that proper typesetting of even English
requires comparable resources to typesetting Arabic. But that's
because it does.
You then came back, and solved it, then you realised that somehow your
table would have to be embedded in every font, and you would have to
convince all the Font software makers to support your addition and all
Font designers to fill it in.

They do. That's the point. They already do. Consider the following
fragment of the Adobe Font Metrics (afm) file for a passing Palatino:

"""
C 102 ; WX 333 ; N f ; B 23 -3 341 728 ; L i fi ; L l fl ;
"""

This is the letter 'f', and the Ls say how it combines with 'i' and
'l' to form ligatures. This is typical of Adobe's Type 1 fonts.
What you will have to do is get a copy of any new font, examine it
manually, and add it to your ligaturiser code.

No, it can be done programmatically from afm files (or equivalent data
included with, say, TrueType fonts).
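
As a sketch of what reading that data could involve, the following pulls the glyph name and the L entries out of a single character-metrics line like the Palatino one above. It is not a full afm parser, and the class and method names are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class AfmLigatures {
    /** Returns the glyph name from the "N name" field, or "" if there is none. */
    static String glyphName(String line) {
        for (String field : line.split(";")) {
            String[] tokens = field.trim().split("\\s+");
            if (tokens.length == 2 && tokens[0].equals("N")) {
                return tokens[1];
            }
        }
        return "";
    }

    /** Returns successor-name to ligature-name pairs from the "L succ lig" fields. */
    static Map<String, String> parseLigatures(String line) {
        Map<String, String> ligatures = new LinkedHashMap<>();
        for (String field : line.split(";")) {
            String[] tokens = field.trim().split("\\s+");
            if (tokens.length == 3 && tokens[0].equals("L")) {
                ligatures.put(tokens[1], tokens[2]);
            }
        }
        return ligatures;
    }
}
```

Run over every character-metrics line in a file, this yields a per-font table of the form (first glyph, next glyph) to ligature glyph name.
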
Consider too that you might not WANT to use some ligatures. They may
look too archaic.

That's a crossable bridge, if I come to it.
What perhaps you want instead is a tool for your user to examine a font
for ligatures, so the end user can decide if they want to use them,
and for what pair. It might report back to central on their findings.
You can build a consensus table to use as the defaults.

Maybe. But at the moment the user is me, and I want them all.
Surely you are not the first person to want to handle ligatures.

I can Google up other forlorn unsuccesses with Java. At the moment
I'm working on parsing Type 1 font afm files, and the most hospitable
framework seems to be the Gnome project's Pango, which specialises in
internationalised text and already has an Arabic shaper module.

Des
 
Thomas Weidenfeller

Des said:
I can Google up other forlorn unsuccesses with Java. At the moment
I'm working on parsing Type 1 font afm files, and the most hospitable
framework seems to be the Gnome project's Pango, which specialises in
internationalised text and already has an Arabic shaper module.

Deep inside somewhere in Apache FOP should also be some classes to read
TTF and Type 1 fonts.

/Thomas
 
Roedy Green

I just learned that (contrary to what I originally wrote) high quality
fonts even contain "glyph substitution" data: lists which provide a
mapping of character combinations to special glyphs for these
combinations, e.g. ligatures. Java provides no API to access such lists.
And even if it did, the resulting glyph has no Unicode code point, so it
could not be represented in a Java char.

Those could be done by mapping the ligatures into the unicode private
area. When I was reading up on ligatures a while back that is how they
were supposed to be handled. Unicode itself wanted to wash its hands
of them.
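
A small sketch of the bookkeeping side of that idea: handing out code points from the Private Use Area (U+E000 to U+F8FF) to ligature names that have no Unicode slot. The allocation order is arbitrary and the class name is hypothetical; it only helps if the font in use really maps those same code points to the ligature glyphs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PrivateUseAllocator {
    static final int PUA_START = 0xE000;
    static final int PUA_END = 0xF8FF;

    /** Assigns consecutive Private Use Area characters to the given ligature names. */
    static Map<String, Character> allocate(List<String> ligatureNames) {
        Map<String, Character> slots = new LinkedHashMap<>();
        int next = PUA_START;
        for (String name : ligatureNames) {
            if (slots.containsKey(name)) {
                continue; // already has a slot
            }
            if (next > PUA_END) {
                throw new IllegalStateException("Private Use Area exhausted");
            }
            slots.put(name, (char) next++);
        }
        return slots;
    }
}
```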

A curiosity itch that is getting more and more insistent is to find
out just what information is encoded in a font, and how the mapping
from Unicode to glyph gets handled. Are there 8 and 16 bit fonts, or
are 16 bit fonts faked with multiple 8-bit fonts and duct tape?
I have been looking at fonts on the net. The glyphs often seem to be placed in
quite arbitrary slots. How do you get them moved where you want them?

How can you tell what glyphs you really have available?
 
Roedy Green

The problem with that is there is no protocol to ask fonts whether
they do or do not have such ligatures.

It sounds like it MIGHT be encoded in some fonts. The problem is Java
is not handing the information to you on a plate. Presumably you can
go analyse the font yourself.
 
Roedy Green

I can Google up other forlorn unsuccesses with Java. At the moment
I'm working on parsing Type 1 font afm files, and the most hospitable
framework seems to be the Gnome project's Pango, which specialises in
internationalised text and already has an Arabic shaper module.

You have three basic font formats to deal with: TrueType, Adobe Type 1
and OpenType. I'm going to see if I can read up on the formats.

I have the White Book for Adobe Type 1, but it is very old.
 
