Unicode Greek in English HTML

Discussion in 'HTML' started by OccasionalFlyer, Aug 10, 2010.

  1. I have a document (it's actually an .aspx page, but it's mostly
    HTML). I am seeking to embed a little Greek in Palatino Linotype. I
    have done this successfully once, just by pasting the words in the
    correct font into the document and making sure its encoding is UTF-8.
    However, I tried to do this again elsewhere in the document and it is
    not working. Is there something simple I can do to resolve this?
    Although I have control over all the .aspx pages, I did not code them
    and I am averse to making major changes that I might not be able to
    undo if something goes wrong. (I took over maintenance of my
    organization's web site as a volunteer from the previous volunteer,
    and while he knows .aspx somewhat, I'm a Java developer, and while I
    have done some reading on aspx coding, I don't know a lot yet.) Thanks.

    Ken
     
    OccasionalFlyer, Aug 10, 2010
    #1

  2. OccasionalFlyer wrote:

    > I have a document (it's actually an .aspx page, but it's mostly
    > HTML). I am seeking to embed a little Greek in Palatino Linotype.


    Greek looks a bit odd in Palatino Linotype (some letters look slanted etc.),
    but that's perhaps just me.

    > I have done this successfully once, just by pasting the words in the
    > correct font into the document and making sure its encoding is UTF-8.


    That's a possible approach, but there are many risks. In particular, cut and
    paste may carry formatting information that should be lost or, conversely,
    it may lose information that you would like to preserve. I would copy and
    paste as plain text, then perhaps add a style sheet rule suggesting a font -
    though normally one should use the same font for copy text in Latin letters
    and any quotations using some other script. This overall font should of
    course be one that covers all the characters you'll use.

    > However, I tried to do this again elsewhere in the document and it is
    > not working.


    We need the URL. And I mean URL, not a snippet of code. It is quite possible
    that the _server_ sends information about encoding, and this information
    isn't in the HTML document itself and will override any meta tags you might
    use in the document.
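
    The precedence described above can be sketched in a few lines of Python.
    This is only an illustration, not a model of any particular browser: the
    function names are made up for the example, and it ignores other signals
    (such as a byte order mark) that real browsers also consider.

```python
from email.message import Message
from typing import Optional


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() returns the charset in lower case, or None.
    return msg.get_content_charset()


def effective_charset(http_content_type: Optional[str],
                      meta_charset: Optional[str]) -> Optional[str]:
    """Hypothetical helper: the HTTP header wins, the meta tag is a fallback."""
    if http_content_type:
        from_header = charset_from_content_type(http_content_type)
        if from_header:
            return from_header
    return meta_charset.lower() if meta_charset else None


# The server says iso-8859-1, so the meta tag's utf-8 is ignored.
print(effective_charset("text/html; charset=ISO-8859-1", "utf-8"))
# The server sends no charset, so the meta tag is used.
print(effective_charset("text/html", "UTF-8"))
```

    Checking the actual response headers of the page (for example with a
    browser's developer tools) shows which of the two cases applies.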

    > Although I have control over all the .aspx pages, I did not code them
    > and I am averse to doing major changes that I might not be able to
    > resolve if something goes wrong.


    If the server actually announces the encoding as, say, iso-8859-1, then you
    have two options: change the server settings, or represent the Greek
    characters using character references or entity references, which work
    irrespective of encoding. This will make the file a little bigger, since
    instead of the letter alpha you would have e.g. the string &#945; or the
    string &alpha;, but this isn't a serious efficiency issue if you have just
    some short strings. It makes the source less readable to people who know
    Greek, of course.

    There are many utilities that can convert e.g. Greek text to character
    references or entity references, such as the free Unicode-capable text
    editor BabelPad.
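
    If no such utility is at hand, the conversion is easy to script. The
    following is a minimal Python sketch, assuming the input is ordinary
    Unicode text; the function names are invented for the example. It uses
    the standard "xmlcharrefreplace" error handler for numeric references and
    the HTML 4 entity table in html.entities for named ones.

```python
from html.entities import codepoint2name


def to_numeric_refs(text: str) -> str:
    """Replace every non-ASCII character with a decimal character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def to_entity_refs(text: str) -> str:
    """Prefer named entity references, falling back to numeric ones."""
    parts = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            parts.append(ch)  # ASCII is left untouched
        elif code in codepoint2name:
            parts.append(f"&{codepoint2name[code]};")
        else:
            parts.append(f"&#{code};")  # no HTML 4 name for this character
    return "".join(parts)


print(to_numeric_refs("α"))  # &#945;
print(to_entity_refs("α"))   # &alpha;
```

    Accented Greek letters have no named entity in that table, so they come
    out as numeric references even from the second function.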

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
     
    Jukka K. Korpela, Aug 10, 2010
    #2

  3. On Aug 10, 12:24 pm, "Jukka K. Korpela" wrote:

    […]

    Thanks. Here's the URL:
    http://www.ibr-bbr.org/IBRBulletin/IBR_BBR_ByYearList.aspx
    The piece that worked for me is near the bottom:

    Key Words: MT, LXX, Final Doxology, collocation, horn, translation,
    judgment, deliverance,
    Diaspora, קֶרֶן, κέρας, רוּם, ὑψόω

    The piece that did not work for me almost at the very bottom:
    Key Words: hebdomadal system, stages of life, Paul, Timothy,
    paidi,on , pai/j , meiravkion, neani,skoj , avnh,r , presbu,thj ,
    ge,rwn


    I will say right here that most of what is on this page I did not do.
    I am responsible for the last few journal issues described on the page
    (Vol 19), and even as I look at them now, I see a few errors I need to
    correct. I don't know why everything is in italics because that's not
    what I thought I did. I'm making no great claims to skill here but I
    am trying, and not just trying to be stupid like, "What's Unicode?"
    Thanks.

    Ken
     
    OccasionalFlyer, Aug 11, 2010
    #3
  4. On 11 Aug, 01:54, OccasionalFlyer wrote:

    > Thanks. Here's the URL: http://www.ibr-bbr.org/IBRBulletin/IBR_BBR_ByYearList.aspx
    > The piece that worked for me is near the bottom:


    Looks like the page encoding is OK, but those few characters just
    aren't Unicode. Smells more like an ASP problem than HTML - I think
    your generation is breaking it, not the target of what you're trying
    to generate.

    Is the database content OK? Don't forget you'll need NVARCHAR under
    SQL Server, not just VARCHAR
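
    Why the column type matters can be illustrated without a database. The
    Python sketch below only simulates a round trip through a single 8-bit
    code page; cp1252 is an assumption here, and the real behaviour depends
    on the column's collation. The function name is made up for the example.

```python
def store_in_single_byte_column(text: str, code_page: str = "cp1252") -> str:
    """Simulate saving text in a column limited to one 8-bit code page."""
    # Characters the code page cannot represent are replaced with "?".
    stored = text.encode(code_page, errors="replace")
    return stored.decode(code_page)


print(store_in_single_byte_column("horn, κέρας"))  # horn, ?????
```

    The Latin text survives and the Greek does not, which is the kind of
    damage a Unicode column type such as NVARCHAR avoids.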



    On a side issue, that's ugly HTML. No useful markup in there (it needs
    headers, let alone any other semantics) and this has led to a very
    "flat" presentation that's difficult to read. For a page of that sheer
    bulk, your readers need all the help they can get!

    To be honest, you just shouldn't serve 1/2MB pages - they're no use to
    anyone. As the only thing you can do with a page that big is to try
    and split it or search it mechanically, you should be supporting ways
    that they can do this on your server, without needing to first
    download that whole behemoth.
     
    Andy Dingley, Aug 11, 2010
    #4
  5. On Aug 10, 7:43 pm, Andy Dingley wrote:

    > Looks like the page encoding is OK, but those few characters just
    > aren't Unicode. Smells more like an ASP problem than HTML - I think
    > your generation is breaking it, not the target of what you're trying
    > to generate.


    So I guess I should ask in the ASP.NET group, yes?

    > Is the database content OK?  Don't forget you'll need NVARCHAR under
    > SQL Server, not just VARCHAR


    So far as I can tell, it's fine but the content is not coming from SQL
    Server, so far as I know.

    > On a side issue, that's ugly HTML. No useful markup in there (it needs
    > headers, let alone any other semantics) and this has led to a very
    > "flat" presentation that's difficult to read. For a page of that sheer
    > bulk, your readers need all the help they can get!


    Most of the header/CSS stuff is in the "master" page that wraps around
    all the other pages in an ASP environment. (Honestly, if I were better
    at JavaScript, I'd convert the whole site to HTML, but I've no idea how
    I'd implement login security, especially for blocking some, but not
    all, resources. I don't even normally do that in Java, once my
    servlet is sure the user is valid.) I'd love to move it to another ISP
    because I have had nothing but grief with the current one. I'm also open
    to suggestions. I was only trying to more or less continue what had been
    started. All those blocks of years will take a user to a specific
    journal volume. Yes, more pages would be nice, but I'm afraid that my
    understanding of how to add more levels of navigation to ASP.NET is
    not good. From what I've read, it would take a menu control, but the
    site was not built that way. All of its links were simply hard-coded.

    I'm not a page designer but a software developer, so I'm not sure
    what would be best to do. Ideas?

    > To be honest, you just shouldn't serve 1/2MB pages - they're no use to
    > anyone. As the only thing you can do with a page that big is to try
    > and split it or search it mechanically, you should be supporting ways
    > that they can do this on your server, without needing to first
    > download that whole behemoth.


    I'll put this on my to-do list. Thanks.

    Ken
     
    OccasionalFlyer, Aug 11, 2010
    #5
  6. OccasionalFlyer wrote:

    > http://www.ibr-bbr.org/IBRBulletin/IBR_BBR_ByYearList.aspx

    […]
    > The piece that did not work for me almost at the very bottom:
    > Key Words: hebdomadal system, stages of life, Paul, Timothy,
    > paidi,on , pai/j , meiravkion, neani,skoj , avnh,r , presbu,thj ,
    > ge,rwn


    The page encoding is declared as UTF-8, and as Andy wrote, there are words
    that obviously aren't in that encoding. This looks like a problem in copy
    and paste. Where were the words copied from? Perhaps from a document (web
    page or other) where "fontistic fantasies" are used to extend the character
    repertoire, i.e. text is written in ASCII but some font setting is used to
    make the characters look like something completely different. Needless to
    say, such tricks only work on defective software and generally break apart
    when data is transferred to another program.
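
    A quick way to tell the two cases apart is to look at the code points
    instead of the rendering. The Python sketch below is just an illustration
    with made-up function names: it reports whether a word is plain ASCII
    (the font trick) or contains characters from the Unicode Greek and Greek
    Extended blocks.

```python
def has_greek(text: str) -> bool:
    """True if any character lies in the Greek or Greek Extended blocks."""
    return any(0x0370 <= ord(ch) <= 0x03FF or 0x1F00 <= ord(ch) <= 0x1FFF
               for ch in text)


def describe(word: str) -> str:
    """Classify a word that is supposed to be Greek."""
    if word.isascii():
        return "ASCII only"  # only looks Greek in a special font
    if has_greek(word):
        return "real Greek"
    return "non-ASCII, not Greek"


print(describe("paidi,on"))  # ASCII only
print(describe("παιδίον"))   # real Greek
```

    Words that come out as "ASCII only" need to be retyped or converted to
    real Greek characters; no encoding setting on the page will fix them.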

    The page apparently contains parts that have come from Microsoft Office
    software, as the markup <p class="MsoNormal"> reveals.

    > I don't know why everything is in italics because that's not
    > what I thought I did.


    There seems to be a lot of <em> and <strong> markup on the page. To be
    honest, it might be best to extract the content as plain text and then add
    some simple markup, instead of trying to fix the mess. But maybe the quick
    and dirty fix of adding

    em { font-style: normal; }
    strong { font-weight: normal; }

    would remove some of the most striking problems in rendering.

    --
    Yucca, http://www.cs.tut.fi/~jkorpela/
     
    Jukka K. Korpela, Aug 11, 2010
    #6
