Is there a known algorithm for this?


Gerald Rosenberg

Have not been able to Google very well for an answer, since I haven't a
usable name for the algorithm/type of problem.

In sum, I need to determine the least common denominator for the spacing
of a one dimensional array of integers where the integers have a noise
component.

In practical terms, I have the Y-axis pixel locations of lines of text
on a page (which are approximations) and need to determine whether any
two adjacent text lines are single spaced, 1.5 spaced, or multiple
spaced.

Seems like there should be an analytic solution, but auto-correlation
doesn't seem right. Some kind of quantized best-fit?
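Strictly speaking, the quantity described here is closer to a greatest common divisor than a least common denominator. It is the largest unit d such that every gap between adjacent baselines is roughly a whole multiple of d. In numerical work this is sometimes called an approximate GCD. Below is a minimal brute-force sketch of that idea. It assumes the baselines are sorted top to bottom and that the caller supplies a search range and a noise tolerance in pixels. The class, the method names (`gaps`, `largestFittingStep`) and all parameter values are hypothetical and chosen only for illustration.

```java
import java.util.Arrays;

public class NoisyCommonDivisor {

    /** Differences between consecutive baseline positions (assumed sorted top to bottom). */
    static int[] gaps(int[] baselines) {
        int[] g = new int[baselines.length - 1];
        for (int i = 1; i < baselines.length; i++) {
            g[i - 1] = baselines[i] - baselines[i - 1];
        }
        return g;
    }

    /**
     * Scans the grid maxStep, maxStep - resolution, ... down to minStep and returns
     * the first (largest) step d for which every gap lies within `tolerance` pixels
     * of some multiple k*d with k >= 1. Returns Double.NaN if no grid value fits.
     */
    static double largestFittingStep(int[] gaps, double minStep, double maxStep,
                                     double resolution, double tolerance) {
        for (int i = 0; maxStep - i * resolution >= minStep; i++) {
            double d = maxStep - i * resolution; // computed from i to avoid floating-point drift
            boolean fits = true;
            for (int g : gaps) {
                long k = Math.max(1L, Math.round(g / d)); // nearest multiple, never zero
                if (Math.abs(g - k * d) > tolerance) {
                    fits = false;
                    break;
                }
            }
            if (fits) {
                return d;
            }
        }
        return Double.NaN;
    }

    public static void main(String[] args) {
        int[] baselines = {100, 124, 149, 185, 208, 232}; // made-up noisy positions
        int[] g = gaps(baselines);
        System.out.println("gaps = " + Arrays.toString(g));
        System.out.println("step = " + largestFittingStep(g, 6, 30, 0.5, 1.5));
    }
}
```

With these made-up baselines the gaps are [24, 25, 36, 23, 24] and the step found is 12.0. On that basis, single spacing is about two steps and the 36-pixel gap about three steps (1.5 spacing). Forcing k to be at least 1 means no step much larger than the smallest gap can fit. Scanning downward therefore returns the largest step that does fit. The answer depends on the tolerance: if it is set too loose, a wrong step can slip through.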

Rather than continuing to guess, does anyone know the name of the
algorithm for solving this type of problem? Is there a Java package
that can solve this kind of problem? I have looked at Colt, but it does
not provide a direct solution.

Thanks,
Gerald
 

Niels Ull Harremoës

Gerald Rosenberg said:
> Have not been able to Google very well for an answer, since I haven't a
> usable name for the algorithm/type of problem.
>
> In sum, I need to determine the least common denominator for the spacing
> of a one dimensional array of integers where the integers have a noise
> component.
>
> In practical terms, I have the Y-axis pixel locations of lines of text
> on a page (which are approximations) and need to determine whether any
> two adjacent text lines are single spaced, 1.5 spaced, or multiple
> spaced.
>
> Seems like there should be an analytic solution, but auto-correlation
> doesn't seem right. Some kind of quantized best-fit?

Try doing a one-dimensional Fourier transform and looking for the
low-frequency components?
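This suggestion can be tried without an FFT library. The data is only a handful of line positions, so one can evaluate the Fourier sum of those positions directly at each candidate period; this is a periodogram of the point set. The sketch below illustrates that idea and is not a method given in the thread. The class and method names, the period range, and the 0.25-pixel step are all assumptions.

```java
public class LineSpacingPeriodogram {

    /**
     * Normalized power of the line positions at a candidate period T:
     * |(1/n) * sum_k exp(-2*pi*i*y_k/T)|^2. It is 1.0 when every pair of
     * positions differs by an exact multiple of T and near 0 when phases cancel.
     */
    static double power(int[] positions, double period) {
        double re = 0, im = 0;
        for (int y : positions) {
            double phase = 2 * Math.PI * y / period;
            re += Math.cos(phase);
            im -= Math.sin(phase);
        }
        int n = positions.length;
        return (re * re + im * im) / ((double) n * n);
    }

    /** Scans periods minPeriod, minPeriod + step, ... <= maxPeriod and returns the strongest. */
    static double bestPeriod(int[] positions, double minPeriod, double maxPeriod, double step) {
        double best = minPeriod;
        double bestPower = -1;
        for (int i = 0; minPeriod + i * step <= maxPeriod; i++) {
            double t = minPeriod + i * step; // computed from i to avoid floating-point drift
            double p = power(positions, t);
            if (p > bestPower) {
                bestPower = p;
                best = t;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] baselines = {100, 124, 148, 184, 208, 232}; // made-up, noise-free positions
        System.out.println("best period = " + bestPeriod(baselines, 8, 30, 0.25));
    }
}
```

The example baselines have gaps of 24, 24, 36, 24 and 24 pixels, so every position is a whole number of 12-pixel periods from the first one. `bestPeriod` should therefore report 12.0, and the power at 24 is essentially zero because the 36-pixel gap shifts the phase by half a cycle. Every divisor of the true period (6, 4, ...) would also score 1.0, which is why `minPeriod` has to exclude them. In the terms used above, the wanted result is the lowest-frequency strong component.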
 

Thomas G. Marshall

Gerald Rosenberg coughed up:
> Have not been able to Google very well for an answer, since I haven't
> a usable name for the algorithm/type of problem.
>
> In sum, I need to determine the least common denominator for the
> spacing of a one dimensional array of integers where the integers
> have a noise component.
>
> In practical terms, I have the Y-axis pixel locations of lines of text
> on a page (which are approximations) and need to determine whether any
> two adjacent text lines are single spaced, 1.5 spaced, or multiple
> spaced.
>
> Seems like there should be an analytic solution, but auto-correlation
> doesn't seem right. Some kind of quantized best-fit?
>
> Rather than continuing to guess, does anyone know the name of the
> algorithm for solving this type of problem? Is there a Java package
> that can solve this kind of problem? I have looked at Colt, but it
> does not provide a direct solution.
>
> Thanks,
> Gerald

You should try posting this in comp.programming if you want algorithmic
help that doesn't depend on Java-specific experience. Many there are Java
guys and many are not, but they're there to help with algorithms.
 

Paul Lutus

Gerald said:
> Have not been able to Google very well for an answer, since I haven't a
> usable name for the algorithm/type of problem.
>
> In sum, I need to determine the least common denominator for the spacing
> of a one dimensional array of integers where the integers have a noise
> component.

Could you state the problem more clearly? Do you need a single LCD for a set
of integers? Do you need to find the most frequently occurring values in a
set?

> In practical terms, I have the Y-axis pixel locations of lines of text
> on a page (which are approximations) and need to determine whether any
> two adjacent text lines are single spaced, 1.5 spaced, or multiple
> spaced.

Create a histogram of all the values and examine them yourself for patterns,
then decide on an appropriate strategy to achieve what you are trying to
accomplish, which you don't bother to say.

Another poster has recommended a Fourier transform, but I think this is
overkill. A histogram approach will work for any case except many integers
with little in common with each other. I don't think this is what you face.
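The histogram suggestion can be sketched as follows. Here it is applied to the gaps between adjacent baselines rather than to the raw positions. That is an assumption, made because the gaps are what should repeat. The class, the method names, the 4-pixel bin width and the sample baselines are all made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

public class GapHistogram {

    /** Counts adjacent-baseline gaps in bins of width binWidth pixels; key = bin start. */
    static TreeMap<Integer, Integer> histogram(int[] baselines, int binWidth) {
        TreeMap<Integer, Integer> bins = new TreeMap<>();
        for (int i = 1; i < baselines.length; i++) {
            int gap = baselines[i] - baselines[i - 1];
            int binStart = Math.floorDiv(gap, binWidth) * binWidth;
            bins.merge(binStart, 1, Integer::sum); // increment this bin's count
        }
        return bins;
    }

    /** Start of the most populated bin (the smallest such bin on ties). */
    static int modalBin(TreeMap<Integer, Integer> bins) {
        int best = bins.firstKey();
        for (Map.Entry<Integer, Integer> e : bins.entrySet()) {
            if (e.getValue() > bins.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        final int width = 4;
        int[] baselines = {100, 124, 149, 185, 208, 232, 256, 280}; // made-up positions
        TreeMap<Integer, Integer> bins = histogram(baselines, width);
        // Print a simple text histogram, one row per non-empty bin.
        bins.forEach((start, count) ->
            System.out.println(start + "-" + (start + width - 1) + " px: " + "#".repeat(count)));
        System.out.println("most common gap bin starts at " + modalBin(bins));
    }
}
```

With these sample values, the 24-27 px bin holds five of the seven gaps. Its start (24) is therefore a rough estimate of single spacing. Fixed bin edges can split a cluster; here, for example, the 23 px gap lands in a bin of its own. In practice it may help to try a couple of bin widths or offsets. (`String.repeat` needs Java 11 or later.)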

> Seems like there should be an analytic solution, but auto-correlation
> doesn't seem right. Some kind of quantized best-fit?

Why not state the problem to be solved before hypothesizing about a
solution?

> Rather than continuing to guess, does anyone know the name of the
> algorithm for solving this type of problem?

What type of problem is that? You have only discussed one aspect of the data
set, and you haven't stated a problem to be solved at all.
 

Gerald Rosenberg

> Gerald Rosenberg coughed up:
>
> You should try posting this in comp.programming if you want algorithmic
> help that doesn't depend on Java-specific experience. Many there are Java
> guys and many are not, but they're there to help with algorithms.

Thanks, will repost there.
 

Gerald Rosenberg

> Could you state the problem more clearly? Do you need a single LCD for a set
> of integers? Do you need to find the most frequently occurring values in a
> set?
>
> Create a histogram of all the values and examine them yourself for patterns,

Interesting. Will look into that. Thanks.

> then decide on an appropriate strategy to achieve what you are trying to
> accomplish, which you don't bother to say.

Did "need to determine whether any two adjacent text lines are single
spaced, 1.5 spaced, or multiple spaced" not relate what I am trying to
accomplish?

> Another poster has recommended a Fourier transform, but I think this is
> overkill. A histogram approach will work for any case except many integers
> with little in common with each other. I don't think this is what you face.
>
> Why not state the problem to be solved before hypothesizing about a
> solution?

Sure: In practical terms, I have the Y-axis pixel locations of lines of
text on a page (which are approximations) and need to determine whether
any two adjacent text lines are single spaced, 1.5 spaced, or multiple
spaced.

> What type of problem is that? You have only discussed one aspect of the data
> set, and you haven't stated a problem to be solved at all.

OK. World peace through analysis of existing imaged document
collections. ;-) Documents are imaged, OCR'd, and PDF'd. The PDF is a
given. Now I need to figure out the document structure from an analysis
of the PDF command and data stream.

A big problem, much of it solved. Now I am just tackling a very
specific aspect where I "have the Y-axis pixel [baseline] locations of
lines of text on a page (which are approximations [i.e., contain a noise
component]) and need to determine whether any two adjacent text lines
are single spaced, 1.5 spaced, or multiple spaced."
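Once a single-spacing value has been estimated (for example, from the most common gap), the classification described here can be sketched in one step. Divide each gap by the single spacing and round the ratio to the nearest half step. The sketch assumes that "1.5 spaced" means a baseline-to-baseline distance of about 1.5 times single spacing. The enum, the method name and the 0.15 tolerance are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SpacingClassifier {

    enum Spacing { SINGLE, ONE_AND_A_HALF, MULTIPLE, UNCLEAR }

    /**
     * Labels each adjacent-baseline gap by rounding gap/singleSpacing to the
     * nearest half step. Gaps whose ratio is more than `tolerance` away from that
     * half step, or that round to less than single spacing, are labelled UNCLEAR.
     */
    static List<Spacing> classify(int[] baselines, double singleSpacing, double tolerance) {
        List<Spacing> out = new ArrayList<>();
        for (int i = 1; i < baselines.length; i++) {
            double ratio = (baselines[i] - baselines[i - 1]) / singleSpacing;
            double halfSteps = Math.round(ratio * 2) / 2.0; // nearest multiple of 0.5
            if (Math.abs(ratio - halfSteps) > tolerance || halfSteps < 1.0) {
                out.add(Spacing.UNCLEAR);
            } else if (halfSteps == 1.0) {
                out.add(Spacing.SINGLE);
            } else if (halfSteps == 1.5) {
                out.add(Spacing.ONE_AND_A_HALF);
            } else {
                out.add(Spacing.MULTIPLE); // double spacing or larger
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] baselines = {100, 124, 149, 185, 208, 232, 280}; // made-up noisy positions
        System.out.println(classify(baselines, 24.0, 0.15));
    }
}
```

With a single spacing of 24 pixels, the sample gaps (24, 25, 36, 23, 24, 48) come out as [SINGLE, SINGLE, ONE_AND_A_HALF, SINGLE, SINGLE, MULTIPLE]. Adjacent half steps are 0.25 apart in the rounding, so a tolerance below 0.25 leaves room for genuinely ambiguous gaps to be flagged instead of forced into a class.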

No doubt in the realm of mathematics (at least I expect) people have
investigated this class of problem and have proposed generalized
algorithms to solve it. Could not guess the name or a functional
description well enough to find it by Google. Thought that the good
folk here at cljp, in their acknowledged wide ranging knowledge of all
things algorithmic, might know a name for this class of problem, or
provide a pointer to suitable algorithms.
 

Paul Lutus

Gerald Rosenberg wrote:

/ ...

> Did "need to determine whether any two adjacent text lines are single
> spaced, 1.5 spaced, or multiple spaced" not relate what I am trying to
> accomplish?

No, that is a statement of a bit of data you need to solve the problem you
don't state.

> Sure: In practical terms, I have the Y-axis pixel locations of lines of
> text on a page (which are approximations) and need to determine whether
> any two adjacent text lines are single spaced, 1.5 spaced, or multiple
> spaced.

What problem is this a part of? What good thing are you slowly working
toward by categorizing these line spacings?
 
