OCR of ancient periodicals. What about XML outputs?

Daniel · Jan 26, 2009

Hi All,

I'm working on OCR of ancient periodicals. The issue is this: I can't access
the layout data encoded in the OCR PDF files and use it regardless of the
original format. There is an appropriate XML standard, ALTO, which matches
each text character with its corresponding graphic zone. But I don't know how
to generate ALTO output. Do you know of any software with such output? Any clue
about this?

Thanks a lot.

Daniel
Paris

Andy Dingley · Jan 27, 2009

Daniel said:
I'm working on OCR of ancient periodicals.

That's OK, you're making me dig through some pretty ancient memories!
AFAIR, ALTO was a layout-specific extension to the Metadata Encoding
and Transmission Standard (METS) work that came from the Library of
Congress (LoC) back in the last century. I only worked with METS, but
AFAIR there was a published XML Schema for ALTO and it was pretty
simple to generate - nothing weird about it.

Searching around METS & LoC ought to be useful.
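
To illustrate the "pretty simple to generate" point, here is a minimal sketch in Python that wraps OCR word boxes in an ALTO-style tree. It is not taken from any OCR product. The function name `words_to_alto`, the input tuple layout, and the choice of the ALTO v3 namespace are assumptions made for this example, and the output is not validated against the published schema. Note that the sketch records one `String` element per word, each with its position and size on the page.

```python
import xml.etree.ElementTree as ET

# Assumption: ALTO v3 namespace; adjust to the schema version you target.
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v3#"


def words_to_alto(lines, page_width, page_height):
    """Build an ALTO-style ElementTree from OCR results (hypothetical helper).

    `lines` is a list of text lines; each line is a list of
    (text, x, y, width, height) tuples measured in pixels.
    """
    ET.register_namespace("", ALTO_NS)

    def q(tag):
        # Qualify a tag name with the ALTO namespace.
        return "{%s}%s" % (ALTO_NS, tag)

    root = ET.Element(q("alto"))
    description = ET.SubElement(root, q("Description"))
    ET.SubElement(description, q("MeasurementUnit")).text = "pixel"

    layout = ET.SubElement(root, q("Layout"))
    page = ET.SubElement(layout, q("Page"), ID="P1",
                         WIDTH=str(page_width), HEIGHT=str(page_height))
    print_space = ET.SubElement(page, q("PrintSpace"))
    # For simplicity, every line goes into a single text block.
    block = ET.SubElement(print_space, q("TextBlock"), ID="B1")

    for i, words in enumerate(lines, start=1):
        if not words:
            continue  # skip empty lines, they have no bounding box
        # The line box is the union of its word boxes.
        x0 = min(x for _, x, _, _, _ in words)
        y0 = min(y for _, _, y, _, _ in words)
        x1 = max(x + w for _, x, _, w, _ in words)
        y1 = max(y + h for _, _, y, _, h in words)
        line_el = ET.SubElement(block, q("TextLine"), ID="L%d" % i,
                                HPOS=str(x0), VPOS=str(y0),
                                WIDTH=str(x1 - x0), HEIGHT=str(y1 - y0))
        for text, x, y, w, h in words:
            ET.SubElement(line_el, q("String"), CONTENT=text,
                          HPOS=str(x), VPOS=str(y),
                          WIDTH=str(w), HEIGHT=str(h))

    return ET.ElementTree(root)
```

Calling `words_to_alto(lines, 2000, 3000).write("page.xml", encoding="utf-8", xml_declaration=True)` would serialize the result; the file name is only a placeholder. Getting the word boxes out of the OCR engine or the PDF in the first place is the harder part and is not covered here.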

Daniel · Jan 28, 2009

You're right, nothing weird about this. But do you know of any OCR software
with ALTO output that is currently available?
Thanks

Daniel

Peter Flynn · Jan 29, 2009

Daniel said:
You're right, nothing weird about this. But do you know of any OCR software
with ALTO output that is currently available?

I think Optopus from Makrolog GmbH (Wiesbaden?) might have done this,
but it was a long time ago.

///Peter
