OCR of ancient periodicals. What about XML outputs?

Discussion in 'XML' started by Daniel, Jan 26, 2009.

  1. Daniel

    Hi All,

    I'm working on OCR of ancient periodicals. The issue is this: I can't access
    the layout data encoded in the OCR PDF files and use it independently of
    their original format. There is an appropriate XML standard, ALTO, which
    matches each text character to its corresponding graphic zone. But I don't
    know how to generate ALTO output. Do you know of any software with such
    output? Any clues about this?

    Thanks a lot.

    Daniel
    Paris
     
    Daniel, Jan 26, 2009
    #1
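To make the "text plus graphic zone" idea concrete, here is a minimal Python sketch (not from the thread) that reads an ALTO file with the standard library and lists each recognized word with its zone. It works at the word level, on `String` elements and their `HPOS`, `VPOS`, `WIDTH` and `HEIGHT` attributes. Assumptions: the ALTO v2 namespace URI, the made-up sample page and the helper name `word_zones` are illustrative only; real files may use another schema version or measurement unit.

```python
import xml.etree.ElementTree as ET

# Assumption: ALTO v2 namespace URI; files from other schema versions differ.
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v2#"

# Made-up one-line page, for illustration only.
SAMPLE = f"""<alto xmlns="{ALTO_NS}">
  <Layout>
    <Page ID="P1" PHYSICAL_IMG_NR="1" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="TB1" HPOS="100" VPOS="200" WIDTH="420" HEIGHT="60">
          <TextLine ID="TL1" HPOS="100" VPOS="200" WIDTH="420" HEIGHT="60">
            <String CONTENT="ancient" HPOS="100" VPOS="200" WIDTH="150" HEIGHT="60"/>
            <SP/>
            <String CONTENT="periodical" HPOS="270" VPOS="200" WIDTH="250" HEIGHT="60"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def word_zones(alto_xml):
    """Return (text, (hpos, vpos, width, height)) for each String element."""
    root = ET.fromstring(alto_xml)
    zones = []
    # Namespaced tags are matched as "{namespace-uri}LocalName".
    for string in root.iter(f"{{{ALTO_NS}}}String"):
        box = tuple(float(string.get(attr)) for attr in ("HPOS", "VPOS", "WIDTH", "HEIGHT"))
        zones.append((string.get("CONTENT"), box))
    return zones


for text, box in word_zones(SAMPLE):
    print(text, box)
```

Coordinates are returned as floats because ALTO position attributes are not necessarily integers; their unit depends on the file's `MeasurementUnit`.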

  2. Andy Dingley

    On 26 Jan, 18:49, "Daniel" <> wrote:

    > I'm working on OCR of ancient periodicals.


    That's OK, you're making me dig through some pretty ancient memories!
    AFAIR, ALTO was a layout-specific extension to the Metadata Encoding
    and Transmission Standard (METS) work that came from the Library of
    Congress (LoC) back in the last century. I only worked with METS, but
    AFAIR there was a published XML Schema for ALTO, and it was pretty
    simple to generate; nothing weird about it.

    Searching around METS & LoC ought to be useful.
     
    Andy Dingley, Jan 27, 2009
    #2
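The claim that ALTO is simple to generate can be illustrated with a minimal Python sketch (not from the thread) that builds a one-page ALTO tree from OCR word boxes using `xml.etree.ElementTree`. It has not been validated against the published schema. Assumptions: the ALTO v2 namespace URI, the input format (a list of text lines, each a list of `(text, x, y, width, height)` tuples in pixels) and the hypothetical helper names `q`, `bbox`, `zone` and `build_alto`.

```python
import xml.etree.ElementTree as ET

# Assumption: ALTO v2 namespace URI; check it against the schema you target.
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v2#"
ET.register_namespace("", ALTO_NS)  # serialize ALTO elements without a prefix


def q(tag):
    """Qualify a tag name with the ALTO namespace."""
    return f"{{{ALTO_NS}}}{tag}"


def bbox(words):
    """Smallest (x, y, width, height) box enclosing (text, x, y, w, h) tuples."""
    x0 = min(x for _, x, _, _, _ in words)
    y0 = min(y for _, _, y, _, _ in words)
    x1 = max(x + w for _, x, _, w, _ in words)
    y1 = max(y + h for _, _, y, _, h in words)
    return x0, y0, x1 - x0, y1 - y0


def zone(x, y, w, h):
    """ALTO position attributes for one graphic zone."""
    return {"HPOS": str(x), "VPOS": str(y), "WIDTH": str(w), "HEIGHT": str(h)}


def build_alto(page_w, page_h, lines):
    """Build a minimal one-page, one-block ALTO tree.

    `lines` is a hypothetical OCR result: a list of text lines, each a
    non-empty list of (text, x, y, width, height) word tuples in pixels.
    """
    alto = ET.Element(q("alto"))
    desc = ET.SubElement(alto, q("Description"))
    ET.SubElement(desc, q("MeasurementUnit")).text = "pixel"
    page = ET.SubElement(ET.SubElement(alto, q("Layout")), q("Page"),
                         ID="P1", PHYSICAL_IMG_NR="1",
                         WIDTH=str(page_w), HEIGHT=str(page_h))
    space = ET.SubElement(page, q("PrintSpace"))
    all_words = [word for line in lines for word in line]
    # The block and each line get the bounding box of the words they contain.
    block = ET.SubElement(space, q("TextBlock"), zone(*bbox(all_words)), ID="TB1")
    for i, words in enumerate(lines, start=1):
        line = ET.SubElement(block, q("TextLine"), zone(*bbox(words)), ID=f"TL{i}")
        for j, (text, x, y, w, h) in enumerate(words):
            if j:
                ET.SubElement(line, q("SP"))  # space between consecutive words
            ET.SubElement(line, q("String"), zone(x, y, w, h), CONTENT=text)
    return alto


alto_root = build_alto(2000, 3000, [[("ancient", 100, 200, 150, 60),
                                     ("periodical", 270, 200, 250, 60)]])
print(ET.tostring(alto_root, encoding="unicode"))
```

To save the result, `ET.ElementTree(alto_root).write(path, encoding="utf-8", xml_declaration=True)` is one option. Checking the output against the published ALTO XSD would need a schema-aware XML library, since the standard library does not validate.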

  3. Daniel

    You're right, nothing weird about this. But do you know of any OCR software
    currently available with ALTO output?
    Thanks

    Daniel
     
    Daniel, Jan 28, 2009
    #3
  4. Peter Flynn

    Daniel wrote:
    > You're right, nothing weird about this. But do you know of any OCR software
    > currently available with ALTO output?


    I think Optopus from Makrolog GmbH (Wiesbaden?) might have done this,
    but it was a long time ago.

    ///Peter
     
    Peter Flynn, Jan 29, 2009
    #4

Similar Threads
  1. Anjali Lourda
    Replies: 0, Views: 477
    Anjali Lourda, Feb 4, 2004
  2. Digital Puer

    Dr. Dobbs and other periodicals?

    Digital Puer, Jul 30, 2003, in forum: Java
    Replies: 5, Views: 373
    osmium, Sep 12, 2003
  3. Digital Puer

    Dr. Dobbs and other periodicals?

    Digital Puer, Jul 30, 2003, in forum: C++
    Replies: 5, Views: 320
    osmium, Sep 12, 2003
  4. Andrew
    Replies: 25, Views: 971
    BlueC, Jul 17, 2006
  5. happytoday

    compile BCC 4.52 ancient program

    happytoday, Jun 25, 2008, in forum: C Programming
    Replies: 1, Views: 293
    Raymond Martineau, Jun 29, 2008