universal unicode font for reportlab

Laszlo Nagy

I need to create multilingual invoices with reportlab. I think it is
possible to use UTF-8 strings, but there is a problem with the font. I
could not find any free TTF font that can do Latin-1, Latin-2, Arabic,
Chinese and other languages at the same time. Is there a single font
that is able to handle these languages? (Most of our invoices will be
for EN, FR, DE, HU, SK, CZ, RO, but some of them need to be in Chinese.)

Thanks,

Laszlo
 
Laszlo Nagy

The GNU Unifont <URL:http://en.wikipedia.org/wiki/GNU_Unifont>
<URL:http://unifoundry.com/unifont.html> covers an impressive range of
the Unicode Basic Multilingual Plane.

Unifont is originally a bitmap font, but was recently made available
in TrueType format
<URL:http://www.lgm.cl/trabajos/unifont/index.en.html>.

Both are available in Debian 'lenny'; the 'unifont' and 'ttf-unifont'
packages, respectively.

I found out that dejavu is what I need. It covers the languages I need
and more:

http://dejavu.svn.sourceforge.net/viewvc/dejavu/tags/version_2_26/dejavu-fonts/langcover.txt


Thanks for your help!

L
 
Laszlo Nagy

I found out that dejavu is what I need. It covers the languages I need
and more:

http://dejavu.svn.sourceforge.net/viewvc/dejavu/tags/version_2_26/dejavu-fonts/langcover.txt

Sorry, this did not work either. Dejavu does support Cyrillic and Greek
characters, but I have to load a different TTF for that. They are not
unified. :-( The only one that worked so far was "unifont.ttf", but it is
very ugly above point size 10.

Can you tell me what kind of font Geany is using on my Ubuntu system?
The preferences say that it is "monospace", but when I load
VeraMono.ttf in reportlab, it will not even display Latin-2 characters.
In contrast, please look at this example that shows my test program in Geany:

http://www.shopzeus.com/geany.jpg

It is a real scalable truetype font, displaying latin 1, latin2,
chinese, russian and japanese characters. Is it the same font? Does this
mean that reportlab is buggy? If I could load the same font that geany
uses, it would probably solve my problem forever.

Thanks,

Laszlo
 
Laszlo Nagy

Iain said:
Why don't you want to use multiple typefaces? Many programs that deal
with multilingual strings use multiple fonts (cf. any Web browser and
Emacs).

You are right, but these PDF documents will show mixed strings. The end
user can enter arbitrary strings into the database, and they must be
presented. For example, the name of a product can be Arabic or German.
It might be possible to guess the language used from the Unicode string,
and then select a different font. But I don't want to go to that trouble.

It would be a great idea to use pango. Apparently pango is able to
change fonts on the fly and render the requested glyph. However, if I
use pango then I lose the much higher level of abstraction that comes
with reportlab and platypus: I need automatic page headers and footers,
I need to be able to repeat table headers on each page automatically
(when the table doesn't fit one page) etc. Developing my own "platypus"
like engine for pango and PDF rendering is a nightmare.

Better than that, I can develop my own flowable object for platypus: a
special paragraph that changes the used true type font on the fly.
(Split input string into parts, determine language for the parts and
display each part with its own font.) But of course this is a lot of
extra programming.
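
For what it's worth, the split-and-switch idea can be sketched in a few lines without a custom flowable, because platypus Paragraph markup accepts <font name="..."> tags. This is only a rough sketch under assumptions: the font names "DejaVuSans", "ArabicHebrewFont" and "CJKFont" are hypothetical placeholders that would each have to be registered as TrueType fonts first, and the code-point ranges are a small illustrative selection, not a complete script table. Choosing a font per run also does nothing for right-to-left layout or Arabic shaping.

```python
from itertools import groupby
from xml.sax.saxutils import escape

# Hypothetical font names: each one must be registered with reportlab
# before the markup below can be rendered.
DEFAULT_FONT = "DejaVuSans"
FONT_RANGES = [
    (0x0590, 0x06FF, "ArabicHebrewFont"),  # Hebrew and Arabic blocks
    (0x3040, 0x30FF, "CJKFont"),           # Hiragana and Katakana
    (0x4E00, 0x9FFF, "CJKFont"),           # CJK Unified Ideographs
]


def font_for(ch):
    """Return the font name assigned to a single character."""
    cp = ord(ch)
    for low, high, name in FONT_RANGES:
        if low <= cp <= high:
            return name
    return DEFAULT_FONT


def split_runs(text):
    """Split text into (font_name, substring) runs of adjacent characters."""
    return [(name, "".join(chars)) for name, chars in groupby(text, key=font_for)]


def to_markup(text):
    """Build Paragraph-style markup that switches font for every run."""
    return "".join(
        '<font name="%s">%s</font>' % (name, escape(part))
        for name, part in split_runs(text)
    )
```

The string returned by to_markup() could then be passed to a platypus Paragraph in place of the raw database value, so page headers and repeating table headers keep working as before.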

The simplest solution would be to use a font that is able to handle all
encodings that I need.

Thanks,

Laszlo
 
Terry Reedy

The simplest solution would be to use a font that is able to handle all
encodings that I need.

My OpenOffice on WinXP uses a unicode font, I believe Lucida Sans
Unicode, that seems to cover the entire BMP. I don't know whether it
was already installed or installed by OO or how one would get to it to
extract it.
 
Ross Ridge

Terry Reedy said:
My OpenOffice on WinXP uses a unicode font, I believe Lucida Sans
Unicode, that seems to cover the entire BMP.

Lucida Sans Unicode only covers a small subset of Unicode. It may seem
to cover a wider range because Windows (and possibly OpenOffice) will
automatically substitute characters from other fonts, if necessary.

I don't know whether it was already installed or installed by OO or
how one would get to it to extract it.

It's a standard Windows font.

Ross Ridge
 
Terry Reedy

Ross said:
Lucida Sans Unicode only covers a small subset of Unicode. It may seem
to cover a wider range because Windows (and possibly OpenOffice) will
automatically substitute characters from other fonts, if necessary.

Sorry, I posted the wrong name.
Arial Unicode MS is the one that seems pretty complete.

It's a standard Windows font.

From the MS, I would guess that is a Windows font too ;-).
 
Jeroen Ruigrok van der Werven

On [20080909 05:23], Terry Reedy said:
Arial Unicode MS is the one that seems pretty complete.

Not really. It misses a lot of characters.

Might I recommend using BabelMap
(http://www.babelstone.co.uk/Software/BabelMap.html) so you can investigate
your fonts?
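
A similar check can be scripted. The sketch below reports which characters of a sample string a given TTF file has no glyph for. It rests on one assumption worth verifying against your reportlab version: that TTFont exposes the parsed character map as face.charToGlyph, a dict keyed by code point. The name "CoverageProbe" is an arbitrary placeholder.

```python
def load_coverage(ttf_path):
    """Return the set of code points a TrueType file has glyphs for.

    Assumption: reportlab's TTFont keeps the parsed character map in
    face.charToGlyph (a dict keyed by code point).
    """
    from reportlab.pdfbase.ttfonts import TTFont  # imported only when needed

    font = TTFont("CoverageProbe", ttf_path)  # arbitrary placeholder name
    return set(font.face.charToGlyph)


def missing_characters(text, covered):
    """Return the distinct characters of text whose code point is not covered."""
    missing = []
    for ch in text:
        if ord(ch) not in covered and ch not in missing:
            missing.append(ch)
    return missing
```

For example, missing_characters(sample, load_coverage("VeraMono.ttf")) lists the characters in sample that the file cannot display, which is a quick way to test a candidate font against real invoice data before wiring it into a report.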

The only fonts I am aware of that cover a lot of Unicode are James
Kass' Code 200x fonts (http://www.code2000.net/).

In almost all cases you will need to gather a collection of fonts in order
to typeset your documents as it is hard to find font designers who know
enough about all languages to properly design the fonts. Not to mention such
fonts would quickly grow unwieldy.

--
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
A kiss is a lovely trick designed by nature to stop speech when words
become superfluous...
 
Ross Ridge

Terry Reedy said:
Sorry, I posted the wrong name.
Arial Unicode MS is the one that seems pretty complete. ....
From the MS, I would guess that is a Windows font too ;-).

It's made by Microsoft, but it's not a standard Windows font. I think
it comes with Microsoft Office.

Ross Ridge
 
Terry Reedy

Jeroen said:
On [20080909 05:23], Terry Reedy said:
Arial Unicode MS is the one that seems pretty complete.

Not really. It misses a lot of characters.

Well, it has Latin, Greek, Cyrillic, Hebrew, Arabic, several south
Asian, Tibetan, CJK, Japanese, Korean, and numerous symbols and special
forms. I don't know what it misses, but I think that covers what the OP
asked for.
 
Laszlo Nagy

Ross said:
It's made by Microsoft, but it's not a standard Windows font. I think
it comes with Microsoft Office.

I need to use HTML anyway. I realized that universal Unicode fonts are
above 5MB in size. The report would be a 10KB PDF, but I need to embed
the font before I can send it to anyone. Since some reports need to be
sent by email, I need to use something else. I cannot be sending 10MB
emails for "one page" reports.

I ended up implementing the reports in HTML. I'm assuming that the
user's browser is capable of displaying any characters needed. Now there
is another problem: how do I print an HTML page from a browser without
the page header/footer information? But that is a separate problem and
probably has nothing to do with Python.

Thanks for your help anyway.

Best,

Laszlo
 
Ross Ridge

Duncan Booth said:
I thought that usually when you embed a font in a PDF only the glyphs which
are actually used in the document get embedded. Unfortunately a quick test
with reportlab seems to show that it doesn't do that optimisation: it looks
as though it just embeds the entire font.

Yah, PDF files normally only contain an embedded subset of the fonts used.
It might be possible to use Ghostscript's ps2pdf command (which can take a
PDF file as input) to strip out the unused glyphs from the embedded fonts.

Ross Ridge
 
Tim Roberts

Duncan Booth said:
I thought that usually when you embed a font in a PDF only the glyphs which
are actually used in the document get embedded. Unfortunately a quick test
with reportlab seems to show that it doesn't do that optimisation: it looks
as though it just embeds the entire font.

No, it does subsetting. There was a debate a year or two ago on the
reportlab list about how the font subset should be named in the resulting
PDF file.

Is it possible you have an older release?
 
Tim Roberts

Duncan Booth said:
The not too scientific test I did was to copy the font embedding example
from the Reportlab documentation, modify it enough to make it actually
run, and then change the output to have only one glyph. The resulting
PDF is virtually identical. I'm not a reportlab expert though so I may
have made some blindingly obvious beginner's mistake (or maybe it only
subsets fonts over a certain size or glyphs outside the ascii range?).

---------- rlab.py ------------
import os

import reportlab
from reportlab.pdfbase import pdfmetrics

# Locate the Type 1 font files that ship with reportlab.
folder = os.path.join(os.path.dirname(reportlab.__file__), 'fonts')
afmFile = os.path.join(folder, 'LeERC___.AFM')
pfbFile = os.path.join(folder, 'LeERC___.PFB')

# Register the embedded Type 1 face, then build a font that uses it.
justFace = pdfmetrics.EmbeddedType1Face(afmFile, pfbFile)
faceName = 'LettErrorRobot-Chrome'  # pulled from AFM file
pdfmetrics.registerTypeFace(justFace)
justFont = pdfmetrics.Font('LettErrorRobot-Chrome', faceName, 'WinAnsiEncoding')

OK, look the other way while I backpedal furiously. The conversation on
the mailing list last year was focused on TrueType fonts. Those are subsetted.

EmbeddedType1Face, used for Type 1 fonts, does appear to embed the entire
font.
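
Anyone who wants to repeat the size test with a TrueType font could start from something like the sketch below. It assumes that reportlab is installed and that it can find "Vera.ttf" (the Bitstream Vera file it normally ships with) on its font search path. The helper name build_pdf is made up for this example. No output sizes are quoted here because they depend on the reportlab version and the font used.

```python
import io


def build_pdf(text, font_name="Vera", font_file="Vera.ttf"):
    """Draw one line of text in an embedded TrueType font; return the PDF bytes."""
    # Imported here so the module loads even where reportlab is absent.
    from reportlab.pdfbase import pdfmetrics
    from reportlab.pdfbase.ttfonts import TTFont
    from reportlab.pdfgen import canvas

    # Register the TrueType file under the name used by setFont().
    pdfmetrics.registerFont(TTFont(font_name, font_file))

    buf = io.BytesIO()
    page = canvas.Canvas(buf)
    page.setFont(font_name, 12)
    page.drawString(72, 720, text)
    page.showPage()
    page.save()
    return buf.getvalue()
```

Comparing len(build_pdf("A")) with the size of the TTF file on disk, and then with the length for a string that uses many different glyphs, should show whether the installed release subsets TrueType fonts.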
 
