Determining the font characteristics of any web page?

Jonathan N. Little

Peter said:
If it can't be done then browsers couldn't do it, browsers do it therefore it
can be done.

AH! But the browser is running on the client's machine, and I am
assuming your code will be running on a remote server. You cannot get
the browser's current settings from the server, apart from the very
limited info exposed through JavaScript (except maybe with ol' IE via
some nasty ActiveX). But it really is not clear what you are trying to
do here.
 
Peter Olcott

Jonathan N. Little said:
AH! But the browser is running on the client's machine, and I am assuming
your code will be running on a remote server.

Wrong assumption. This will be a software product running on the client machine.
 
Philip

Peter said:
If it would be easier to do, I could write some code that runs in the context
of the browser and then pass the results to the code that needs it that does
not run within the context of the browser.

I see two possibilities. First, you might be able to get this
information via Javascript. This page might point you in the right
direction:
http://www.quirksmode.org/dom/getstyles.html

You might also want to consider creating a document and using Javascript
to measure the size of the text in it.
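
To make the JavaScript route concrete, here is a minimal sketch in TypeScript. It assumes a browser that supports the standard `getComputedStyle` call; `readFontInfo`, `FontStyleLike` and `FontInfo` are names made up for this illustration, and the style reader is passed in as a parameter so the function can also be exercised outside a browser.

```typescript
// Minimal structural type covering the font-related properties that a
// CSSStyleDeclaration exposes. Declared here so the sketch needs no DOM typings.
interface FontStyleLike {
  fontFamily: string;
  fontSize: string;
  fontWeight: string;
  fontStyle: string;
}

// Hypothetical result shape for this illustration.
interface FontInfo {
  family: string;
  sizePx: number;
  weight: string;
  style: string;
}

// Reads the font properties of one element through the supplied style reader.
// In a browser the reader would wrap window.getComputedStyle (assumption:
// the target browser implements it).
function readFontInfo<E>(
  element: E,
  getStyle: (element: E) => FontStyleLike
): FontInfo {
  const computed = getStyle(element);
  return {
    family: computed.fontFamily,
    // Computed font sizes are reported as a pixel length such as "16px".
    sizePx: parseFloat(computed.fontSize),
    weight: computed.fontWeight,
    style: computed.fontStyle,
  };
}
```

In a page this would be called as `readFontInfo(el, (e) => window.getComputedStyle(e))`. The size comes back in CSS pixels as that browser resolved it, which may still differ from what the OS font API is finally asked for (zoom settings, for example).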

The second possibility is to write a browser plugin which means it is
time for you to start hunting through the Plugin Writer's Guides for the
browsers you intend to support. It is up to the individual browser
whether or not they expose the kind of information you're looking for.

Neither one of these methods is very portable from one browser to
another. The first one might be somewhat portable, but the second is not
at all.

HTH
 
Peter Olcott

Philip said:
I see two possibilities. First, you might be able to get this
information via Javascript. This page might point you in the right
direction:
http://www.quirksmode.org/dom/getstyles.html

You might also want to consider creating a document and using Javascript
to measure the size of the text in it.

The second possibility is to write a browser plugin which means it is
time for you to start hunting through the Plugin Writer's Guides for the
browsers you intend to support. It is up to the individual browser
whether or not they expose the kind of information you're looking for.

Neither one of these methods is very portable from one browser to
another. The first one might be somewhat portable, but the second is not
at all.

I need to know this on a web-page by web-page basis. Since each web-page is free
to determine how it will display itself, I need a very generic method. I guess
that I would need to start with an exhaustively comprehensive list of every
possible way that any possible web-page could specify any of this information.
 
Jonathan N. Little

Peter said:
Wrong assumption. This will be a software product running on the client machine.

Okay, well it is not clear what you are trying to do.

1) Discover all the settings of the user's browser?
2) Discover all fonts a user has on his system?
3) Discover all specified fonts on a specific webpage?
4) Discover all specified fonts on all webpages the user is viewing?

#3 is easy, view source HTML and stylesheet.

#1,2,4 are kind of creepy, #4 especially. Wouldn't sell me on your product.
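
For case 3, a naive sketch of pulling the specified font families out of stylesheet text, in TypeScript. `extractFontFamilies` is a hypothetical helper; it only looks at `font-family` declarations and ignores the `font` shorthand, comments, `@import` and the cascade, so it lists what a page asks for, not what the browser ends up using.

```typescript
// Naive scan for font-family declarations in raw CSS text.
// Returns each distinct family name once, in order of first appearance.
function extractFontFamilies(css: string): string[] {
  const families: string[] = [];
  // Capture everything after "font-family:" up to the next ";" or "}".
  const declaration = /font-family\s*:\s*([^;}]+)/gi;
  let match: RegExpExecArray | null;
  while ((match = declaration.exec(css)) !== null) {
    for (const raw of match[1].split(",")) {
      const name = raw
        .replace(/!important/i, "") // drop a priority flag if present
        .trim()
        .replace(/^["']|["']$/g, ""); // strip surrounding quotes
      if (name !== "" && families.indexOf(name) === -1) {
        families.push(name);
      }
    }
  }
  return families;
}
```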
 
Philip

Peter said:
I need to know this on a web-page by web-page basis. Since each web-page is
free to determine how it will display itself, I need a very generic method. I
guess that I would need to start with an exhaustively comprehensive list of
every possible way that any possible web-page could specify any of this
information.

Both Javascript and a plugin could work on a page-by-page basis.

Your last sentence makes it sound like you intend to parse each page's
HTML and CSS to determine what font(s) it is using. Is that the case?
Because you'll end up reimplementing (quirks and all) a hefty chunk of
the browser's parsing engine if that's the case.

If you are willing to be less mysterious about the larger problem you're
trying to solve, we might be able to provide better advice. As it is,
we're left with a lot to guess at.
 
Peter Olcott

Jonathan N. Little said:
Okay, well it is not clear what you are trying to do.

1) Discover all the settings of the user's browser?
2) Discover all fonts a user has on his system?
3) Discover all specified fonts on a specific webpage?
4) Discover all specified fonts on all webpages the user is viewing?

#3 is easy, view source HTML and stylesheet.

#1,2,4 are kind of creepy, #4 especially. Wouldn't sell me on your product.

I will only need this for the specific web pages that the user invokes my
program on, yet these can be any arbitrary web-page. I must have the exact
details of the FontInstance. If the HTML says small, I need to know the precise
point size. This must work for any web presentation language, not just HTML.
 
Peter Olcott

Philip said:
Both Javascript and a plugin could work on a page-by-page basis.

Your last sentence makes it sound like you intend to parse each page's
HTML and CSS to determine what font(s) it is using. Is that the case?
Because you'll end up reimplementing (quirks and all) a hefty chunk of
the browser's parsing engine if that's the case.

What I am hoping to be able to do is to hook into the parsing engine and
directly retrieve the results of the parse.

Philip said:
If you are willing to be less mysterious about the larger problem you're
trying to solve, we might be able to provide better advice. As it is,
we're left with a lot to guess at.

New product development requires secrecy because first to market could provide
the difference between success and failure.
 
Neredbojias

Peter said:
Then browsers couldn't do it, and since they do it, therefore it's not
impossible.

Actually, they don't; they _set_ the font-size for their application,
determining, perhaps, the OS default font size as necessary. If doing this
in points is really accurate (which I doubt), said browser must also find
out the current monitor size and resolution.
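
The dependence on resolution can be put in numbers. A point is 1/72 inch, so the pixel height for a given point size follows from the dots per inch that the system reports or that the monitor physically has. A small TypeScript sketch; `pointsToPixels` and `physicalDpi` are illustrative names, and the 96 dpi figure is only the common Windows default, assumed here.

```typescript
// One typographic point is 1/72 of an inch.
const POINTS_PER_INCH = 72;

// Pixel size of a font given in points, for a given dots-per-inch value.
function pointsToPixels(points: number, dpi: number): number {
  return (points * dpi) / POINTS_PER_INCH;
}

// Physical dpi derived from the screen resolution and its diagonal in inches.
function physicalDpi(
  widthPx: number,
  heightPx: number,
  diagonalInches: number
): number {
  const diagonalPx = Math.sqrt(widthPx * widthPx + heightPx * heightPx);
  return diagonalPx / diagonalInches;
}

// 12pt at an assumed logical setting of 96 dpi: 16 pixels.
const atLogicalDpi = pointsToPixels(12, 96);
// The same 12pt on a hypothetical 20-inch 1600x1200 monitor (100 dpi physical).
const atPhysicalDpi = pointsToPixels(12, physicalDpi(1600, 1200, 20));
console.log(atLogicalDpi, atPhysicalDpi);
```

The two results differ, which is the point: "12pt" alone does not pin down a pixel size until you know which dpi value the browser and OS actually use.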
 
Peter Olcott

Neredbojias said:
Actually, they don't; they _set_ the font-size for their application,
determining, perhaps, the OS default font size as necessary. If doing this
in points is really accurate (which I doubt), said browser must also find
out the current monitor size and resolution.

I need to know whatever the final result is that these browsers use to display
the text on the browser window. It does not matter if it is in point size, or
tmHeight, as long as it is 100% precise.
 
Jim Higson

Peter said:
What I am hoping to be able to do is to hook into the parsing engine and
directly retrieve the results of the parse.

You can do this to the degree that the renderers provide hooks for you to
use. I can't imagine why they would provide these hooks, and probably most
renderers don't.

To what degree do you control the environment this is run on? If you can
deploy an open source browser you could add your own hooks to the engine to
get whatever you want.

There might be a js hack where you use the DOM to get the current style.
However, the sizes in the style might be specified in percentages, in which
case things become more difficult.
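
To show why percentages complicate things: a relative size only means something once every ancestor's size is known. A rough TypeScript sketch of that chain resolution follows. `resolveFontSizePx` and the `SpecifiedSize` type are invented for illustration, the 16px starting value is the common browser default and is an assumption, and real engines also apply keywords, minimum font sizes and user overrides that are ignored here.

```typescript
// A font size as written in a stylesheet (illustrative subset of CSS units).
type SpecifiedSize =
  | { unit: "px"; value: number }
  | { unit: "%"; value: number }
  | { unit: "em"; value: number };

// Walks the specified sizes from the root element down to the target element
// and returns the resulting size in pixels.
function resolveFontSizePx(
  chain: SpecifiedSize[],
  rootDefaultPx: number = 16 // assumed browser default
): number {
  let current = rootDefaultPx;
  for (const size of chain) {
    switch (size.unit) {
      case "px":
        current = size.value; // absolute: replaces the inherited size
        break;
      case "%":
        current = (current * size.value) / 100; // relative to the parent
        break;
      case "em":
        current = current * size.value; // relative to the parent
        break;
    }
  }
  return current;
}
```

For example, an element at `150%` inside a parent set to `10px` resolves to 15 pixels, but the same `150%` under a different parent gives a different answer.
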

Peter said:
New product development requires secrecy because first to market could
provide the difference between success and failure.

Perhaps true. The product being impossible is probably a bigger difference
though :)
 
Jim Higson

Peter said:
I need to know whatever the final result is that these browsers use to
display the text on the browser window. It does not matter if it is in
point size, or tmHeight, as long as it is 100% precise.

Perhaps you could do it using an open source renderer like Gecko by
stripping out all the code you don't want (perhaps millions of lines) so
you basically just have a parsing and layout engine that doesn't display
anything. A very daunting task.

No idea what you could do with Trident (the IE engine) though. If you want
to copy the (weird) layout behaviour exactly the only thing I can think of
is clean room reverse engineering. Won't ever be exact though and really,
unless you have a huge team of experts (In which case, why would you post
here?) this seems not worth bothering with.

Another idea: how about you display the text, capture it and OCR it with
some kind of font recognition? Would that be OK? Seems hackish but might
work.
 
Peter Olcott

Jim Higson said:
Perhaps you could do it using an open source renderer like Gecko by
stripping out all the code you don't want (perhaps millions of lines) so
you basically just have a parsing and layout engine that doesn't display
anything. A very daunting task.

No idea what you could do with Trident (the IE engine) though. If you want
to copy the (weird) layout behaviour exactly the only thing I can think of
is clean room reverse engineering. Won't ever be exact though and really,
unless you have a huge team of experts (In which case, why would you post
here?) this seems not worth bothering with.

Another idea: how about you display the text, capture it and OCR it with
some kind of font recognition? Would that be OK? Seems hackish but might
work.

Maybe I could just hook all of the lower level windows font functions.
 
Jonathan N. Little

Peter said:
I will only need this for the specific web pages that the user invokes my
program on, yet these can be any arbitrary web-page. I must have the exact
details of the FontInstance. If the HTML says small, I need to know the precise
point size. This must work for any web presentation language, not just HTML.

Well, that is where your problem lies: font size is a complex relation of
what is defined on the website, what is set in the user settings of
the browser, what the browser default is, AND what is
available on the user's system. Another thing: are you looking for
the size as displayed on the screen or as printed on paper?
 
Peter Olcott

Jonathan N. Little said:
Well, that is where your problem lies: font size is a complex relation of what
is defined on the website, what is set in the user settings of the browser,
what the browser default is, AND what is available on the
user's system. Another thing: are you looking for the size as displayed on
the screen or as printed on paper?

I don't care about the paper sized font, only the screen sized font.
 
Jim Higson

Peter said:
Maybe I could just hook all of the lower level windows font functions.

Perhaps, if you can replace or wrap system libraries (DLLs on Windows, ELF
shared libraries on Linux, etc.). I'm not sure how you'd tell whether the font
functions are being called by the HTML renderer or to display the browser
menus etc.

Really, though, I'd call a non-OS level program that replaces important
system libraries a gross mistake. It'd make the system less stable,
possibly causing hard lock-ups, especially on certain versions of Windows
(XP included) where the graphics subsystem is in the kernel.
 
Jonathan N. Little

Peter said:
I don't care about the paper sized font, only the screen sized font.

Well, good luck, you'll need it. First of all, points are for printing and
don't mean doodle for screen display. Second, how large a font displays
varies for all the reasons listed in depth in this thread, and more. Open
the same page on different browsers on different systems, different
OS's, different hardware, different user settings, and the text is going
to vary.
 
