Unicode Greek in English HTML

OccasionalFlyer

I have a document (it's actually an .aspx page, but it's mostly
HTML). I am seeking to embed a little Greek in Palatino Linotype. I
have done this successfully once, just by pasting the words in the
correct font into the document and making sure its encoding is UTF-8.
However, I tried to do this again elsewhere in the document and it is
not working. Is there something simple I can do to resolve this?
Although I have control over all the .aspx pages, I did not code them
and I am averse to making major changes that I might not be able to
resolve if something goes wrong. (I took over maintenance of my
organization's web site as a volunteer from the previous volunteer, and
while he knows .aspx somewhat, I'm a Java developer, and while I have
done some reading on .aspx coding, I don't know a lot yet.) Thanks.

Ken
 
Jukka K. Korpela

OccasionalFlyer said:
I have a document (it's actually an .aspx page, but it's mostly
HTML). I am seeking to embed a little Greek in Palatino Linotype.

Greek looks a bit odd in Palatino Linotype (some letters look slanted etc.),
but that's perhaps just me.
I have done this successfully once, just by pasting the words in the
correct font into the document and making sure its encoding is UTF-8.

That's a possible approach, but there are many risks. In particular, cut and
paste may carry formatting information that should be lost or, conversely,
it may lose information that you would like to preserve. I would copy and
paste as plain text, then perhaps add a style sheet rule suggesting a font -
though normally one should use the same font for copy text in Latin letters
and any quotations using some other script. This overall font should of
course be one that covers all the characters you'll use.
However, I tried to do this again elsewhere in the document and it is
not working.

We need the URL. And I mean URL, not a snippet of code. It is quite possible
that the _server_ sends information about encoding, and this information
isn't in the HTML document itself and will override any meta tags you might
use in the document.
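
As a minimal sketch of that precedence, assuming Python is at hand: the charset parameter of the HTTP Content-Type header can be parsed out and, when it is present, preferred over whatever a meta tag says. The function names and header values below are illustrative only, and the logic is simplified (it ignores byte order marks, for instance).

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset()


def effective_encoding(http_content_type, meta_charset=None):
    """Simplified precedence: the HTTP header charset first, then the meta tag."""
    from_header = charset_from_content_type(http_content_type)
    if from_header is not None:
        return from_header
    return meta_charset.lower() if meta_charset else None


# Hypothetical case: the server announces iso-8859-1, the page's meta tag says UTF-8.
print(effective_encoding("text/html; charset=ISO-8859-1", "UTF-8"))
# Hypothetical case: the server sends no charset, so the meta tag is used.
print(effective_encoding("text/html", "UTF-8"))
```

The real header for a page can be read with `urllib.request.urlopen(url).headers.get("Content-Type")` and passed to `charset_from_content_type`.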
Although I have control over all the .aspx pages, I did not code them
and I am averse to making major changes that I might not be able to
resolve if something goes wrong.

If the server actually announces the encoding as, say, iso-8859-1, then you
have two options: change the server settings, or represent the Greek
characters using character references or entity references, which work
irrespective of the encoding. Admittedly this will make the file a little bigger,
as you would have e.g. instead of the letter alpha the string &#945; or the
string &alpha;, but this isn't a serious efficiency issue if you have just
some short strings. It makes the source less readable to people who know
Greek, of course.

There are many utilities that can convert e.g. Greek text to character
references or entity references, such as the free Unicode-capable text
editor BabelPad.
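
For a few short strings, the conversion can also be scripted. This is a minimal sketch in Python, not a replacement for a proper editor; the function name is made up for illustration. It relies on the standard `xmlcharrefreplace` error handler, which turns every character outside ASCII into a decimal character reference.

```python
def to_character_references(text):
    """Replace every non-ASCII character with a decimal character reference."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


# Sample input: the Greek letters alpha, beta, gamma, written as escapes so
# that this script itself does not depend on the file's encoding.
sample = "abc \u03b1\u03b2\u03b3"
print(to_character_references(sample))  # prints: abc &#945;&#946;&#947;
```

The result is pure ASCII, so it survives whatever encoding the server announces.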
 
OccasionalFlyer

Thanks. Here's the URL:
http://www.ibr-bbr.org/IBRBulletin/IBR_BBR_ByYearList.aspx
The piece that worked for me is near the bottom:

Key Words: MT, LXX, Final Doxology, collocation, horn, translation,
judgment, deliverance,
Diaspora, קֶרֶן, κέÏας, רוּ×, ὑψόω

The piece that did not work for me is almost at the very bottom:
Key Words: hebdomadal system, stages of life, Paul, Timothy,
paidi,on , pai/j , meiravkion, neani,skoj , avnh,r , presbu,thj ,
ge,rwn


I will say right here that most of what is on this page I did not do.
I am responsible for the last few journal issues described on the page
(Vol 19), and even as I look at them now, I see a few errors I need to
correct. I don't know why everything is in italics because that's not
what I thought I did. I'm making no great claims to skill here but I
am trying, and not just trying to be stupid like, "What's Unicode?"
Thanks.

Ken
 
Andy Dingley

Thanks. Here's the URL: http://www.ibr-bbr.org/IBRBulletin/IBR_BBR_ByYearList.aspx
The piece that worked for me is near the bottom:

Looks like the page encoding is OK, but those few characters just
aren't Unicode. Smells more like an ASP problem than HTML - I think
your generation is breaking it, not the target of what you're trying
to generate.

Is the database content OK? Don't forget you'll need NVARCHAR under
SQL Server, not just VARCHAR.
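
A rough analogy in Python for why the column type matters, offered as a sketch rather than a description of what SQL Server does internally: text forced through a single Western code page cannot keep Greek letters. The function name and the choice of cp1252 are assumptions for illustration; the actual substitution in a VARCHAR column depends on its collation.

```python
def store_in_single_byte_column(text, code_page="cp1252"):
    """Mimic saving text in a column limited to one single-byte code page."""
    return text.encode(code_page, errors="replace").decode(code_page)


greek = "\u03ba\u03ad\u03c1\u03b1\u03c2"  # a five-letter Greek word
print(store_in_single_byte_column(greek))   # prints: ?????
print(store_in_single_byte_column("horn"))  # prints: horn
```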



On a side issue, that's ugly HTML. No useful markup in there (it needs
headers, let alone any other semantics) and this has led to a very
"flat" presentation that's difficult to read. For a page of that sheer
bulk, your readers need all the help they can get!

To be honest, you just shouldn't serve 1/2 MB pages - they're no use to
anyone. Since the only thing you can do with a page that big is to try
and split it or search it mechanically, you should be supporting ways
for readers to do this on your server, without needing to first
download that whole behemoth.
 
OccasionalFlyer

Looks like the page encoding is OK, but those few characters just
aren't Unicode. Smells more like an ASP problem than HTML - I think
your generation is breaking it, not the target of what you're trying
to generate.

So I guess I should ask in the ASP.NET group, yes?
Is the database content OK? Don't forget you'll need NVARCHAR under
SQL Server, not just VARCHAR.

So far as I can tell, it's fine but the content is not coming from SQL
Server, so far as I know.
On a side issue, that's ugly HTML. No useful markup in there (it needs
headers, let alone any other semantics) and this has led to a very
"flat" presentation that's difficult to read. For a page of that sheer
bulk, your readers need all the help they can get!

Most of the header/CSS stuff is in the "master" page that wraps around
all the other pages in an ASP environment. (Honestly, if I were better
at JavaScript, I'd convert the whole site to HTML, but I've no idea how
I'd implement login security, especially for blocking some, but not
all, resources. I don't even normally do that in Java, once my
servlet is sure the user is valid.) I'd love to move it to another ISP
because I have nothing but grief with the ISP. I'm also open to
suggestions. I was only trying to more or less continue what had been
started. All those blocks of years will take a user to a specific
journal volume. Yes, more pages would be nice, but I'm afraid that my
understanding of how to add more levels of navigation to ASP.NET is
not good. From what I've read, it would take a menu control, but the
site was not built that way. All of its links were simply hard-coded.

I'm not a page designer but a software developer, so I'm not sure
what would be best to do. Ideas?
To be honest, you just shouldn't serve 1/2 MB pages - they're no use to
anyone. Since the only thing you can do with a page that big is to try
and split it or search it mechanically, you should be supporting ways
for readers to do this on your server, without needing to first
download that whole behemoth.

I'll put this on my to-do list. Thanks.

Ken
 
Jukka K. Korpela

OccasionalFlyer said:
http://www.ibr-bbr.org/IBRBulletin/IBR_BBR_ByYearList.aspx […]
The piece that did not work for me is almost at the very bottom:
Key Words: hebdomadal system, stages of life, Paul, Timothy,
paidi,on , pai/j , meiravkion, neani,skoj , avnh,r , presbu,thj ,
ge,rwn

The page encoding is declared as UTF-8, and like Andy wrote, there are words
that obviously aren't in that encoding. This looks like a problem in copy
and paste. Where were the words copied from? Perhaps from a document (web
page or other) where "fontistic fantasies" are used to extend character
repertoire, i.e. text is written in ASCII but some font setting is used to
make the characters look like something completely different. Needless to say,
such tricks only work on defective software and generally break apart when
data is transferred to another program.
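
One quick way to tell such font-dependent text from real Greek is to look at the code points themselves. The following is a minimal sketch in Python with an illustrative function name; it keeps only the characters whose Unicode name marks them as Greek.

```python
import unicodedata


def greek_letters(text):
    """Return the characters of text whose Unicode name marks them as Greek."""
    return [ch for ch in text if "GREEK" in unicodedata.name(ch, "")]


# A string from the part of the page that fails: plain ASCII, no Greek at all.
print(greek_letters("paidi,on"))  # prints: []

# A real Greek word (written as escapes): every character is a Greek letter.
print(len(greek_letters("\u03ba\u03ad\u03c1\u03b1\u03c2")))  # prints: 5
```

An empty result for text that is supposed to be Greek suggests that the source relied on a special font rather than on Unicode characters, so the text has to be re-entered or converted, not just re-pasted.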

The page apparently contains parts that have come from Microsoft Office
software.
I don't know why everything is in italics because that's not
what I thought I did.

There seems to be a lot of <em> and <strong> markup on the page. To be
honest, it might be best to extract the content as plain text and then add
some simple markup, instead of trying to fix the mess. But maybe the quick
and dirty fix of adding

em { font-style: normal; }
strong { font-weight: normal; }

would remove some of the most striking problems in rendering.
 
