Searching a structured text database

Discussion in 'Perl Misc' started by Michael Friendly, Apr 9, 2004.

  1. I have a LaTeX document on the history of data visualization,
    composed of historical items with structured fields
    (http://www.math.yorku.ca/SCS/Gallery/milestone/).

    I'd like to create a web-based facility for searching these
    items. As a first step, I've written a Perl script to translate the
    LaTeX source into various formats: tagged, CSV, HTML, XML. But I don't
    know how to choose a data format and appropriate software tools to
    accomplish this most easily.

    There's a bewildering array of Perl modules for databases, XML, etc.,
    but I'm not sure which would be most useful in this context. Can anyone
    help point me in useful directions? I'm doing this on a Debian Linux
    system, and a solution involving software other than Perl is possible.

    For example, the tagged format looks like this:

    KEY: Ptolemy150
    YEAR: c. 150
    WHAT: Map projections of a spherical earth and use of latitude and
    longitude to characterize position (first display of longitude)
    WHO: Claudius Ptolemy
    WHERE: Alexandria, Egypt
    TXT: http://portico.bl.uk/exhibitions/maps/ptolemy.html::Ptolemy's
    world map, description and high-res image::
    TXT:
    http://www-groups.dcs.st-and.ac.uk/~history/Mathematicians/Ptolemy.html::Ptolemy
    history::
    PIC: /SCS/Gallery/images/portraits/ptolemy.gif::ptolemy, portrait
    from ca. 1400 (90 x 109; 9K)::
    FIG: /SCS/Gallery/images/ptolemy-map.jpg::ptolemy's world map,
    republished in 1482 (640 x 496; 40K)::
    ADD: 11/22/00

    and the XML format like this (I have a basic DTD):

    <hdbitem key="Ptolemy150" added="11/22/00">
    <keywords>latitude,longitude,projection,map!projection</keywords>
    <description>Map projections of a spherical earth and use of latitude
    and longitude to characterize position (first display of longitude)
    </description>
    <authors>
    <who first="Claudius" last="Ptolemy" lived="c. 85--c. 165">Claudius
    Ptolemy</who>
    </authors>
    <date from="c. 150" to="c. 150">c. 150</date>
    <where>Alexandria, Egypt</where>
    <commentary url="http://portico.bl.uk/exhibitions/maps/ptolemy.html"
    text="Ptolemy's world map, description and high-res image" />
    <commentary
    url="http://www-groups.dcs.st-and.ac.uk/~history/Mathematicians/Ptolemy.html"
    text="Ptolemy history" />
    <figure type="portrait"
    url="/SCS/Gallery/images/portraits/ptolemy.gif" height="109" width="90"
    size="9K">
    <caption>Ptolemy, portrait from ca. 1400</caption>
    </figure>
    <figure type="figure" url="/SCS/Gallery/images/ptolemy-map.jpg"
    height="496" width="640" size="40K">
    <caption>Ptolemy's world map, republished in 1482</caption>
    </figure>
    </hdbitem>
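
    For readers who want to experiment with the tagged format above, here
    is a minimal sketch of a parser, written in Python since software
    other than Perl is acceptable here. It is not the script mentioned in
    this post. The names parse_record and split_link are hypothetical,
    and the sketch assumes that every tag is an all-uppercase word
    followed by a colon, that untagged lines continue the previous field,
    and that repeated tags such as TXT should accumulate into a list.

```python
import re

# Assumption: a field starts with an all-uppercase tag and a colon,
# e.g. "WHO: Claudius Ptolemy". Lowercase "http:" does not match.
TAG = re.compile(r"^([A-Z]+):\s*(.*)$")


def parse_record(text):
    """Parse one tagged record into a dict of {tag: [value, ...]}."""
    record = {}
    current = None  # list of values for the most recently seen tag
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TAG.match(line)
        if match:
            current = record.setdefault(match.group(1), [])
            current.append(match.group(2))
        elif current is not None:
            # Continuation line: join it to the most recent value.
            current[-1] = (current[-1] + " " + line).strip()
    return record


def split_link(value):
    """Split a 'url::caption::' value into a (url, caption) pair."""
    parts = value.split("::")
    return parts[0], parts[1] if len(parts) > 1 else ""


SAMPLE = """\
KEY: Ptolemy150
YEAR: c. 150
WHAT: Map projections of a spherical earth and use of latitude and
longitude to characterize position (first display of longitude)
WHO: Claudius Ptolemy
WHERE: Alexandria, Egypt
TXT:
http://www-groups.dcs.st-and.ac.uk/~history/Mathematicians/Ptolemy.html::Ptolemy
history::
ADD: 11/22/00
"""

rec = parse_record(SAMPLE)
print(rec["KEY"][0], "|", rec["WHO"][0])
print(split_link(rec["TXT"][0]))
```

    A continuation line that happened to begin with an uppercase word and
    a colon would be misread as a new field, so a real converter would
    probably want to check tags against a fixed list.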



    --
    Michael Friendly
    Professor, Psychology Dept.
    York University      Voice: 416 736-5115 x66249  Fax: 416 736-5814
    4700 Keele Street    http://www.math.yorku.ca/SCS/friendly.html
    Toronto, ONT  M3J 1P3 CANADA
    Michael Friendly, Apr 9, 2004
    #1

  2. On Fri, 09 Apr 2004 10:50:02 -0400, Michael Friendly wrote:

    > I have a LaTeX document composed of historical items with structured
    > fields (on the history of data visualization,
    > http://www.math.yorku.ca/SCS/Gallery/milestone/)
    >
    > I'd like to create a web-based facility to provide searching of these
    > items. As a first step, I've written a perl script to translate the
    > LaTeX stuff into various formats: tagged, CSV, HTML, XML. But I don't
    > know how to choose a data format and appropriate software tools to
    > accomplish this most easily.
    >
    > There's a bewildering array of perl modules for databases, XML, etc. but
    > I'm not sure what would be most useful in this context. Can anyone help
    > point me in useful directions? I'm doing this on a debian linux system,
    > and a solution involving software other than perl is possible.


    [ ... ]

    You might want to think about parsing the documents and putting them into
    a database instead of using the XML files *as* a database.
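
    As a rough illustration of that idea (not code from this thread), the
    sketch below loads one parsed item into SQLite using Python's
    standard sqlite3 module and runs a substring search. The table
    layout, the column names, and the search helper are all assumptions
    made for the example; the values come from the Ptolemy item in the
    original post.

```python
import sqlite3

# In-memory database for the sketch; a real site would use a file path.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE items (
           item_key TEXT PRIMARY KEY,  -- hypothetical column names
           year     TEXT,
           what     TEXT,
           who      TEXT,
           place    TEXT
       )"""
)

conn.execute(
    "INSERT INTO items VALUES (?, ?, ?, ?, ?)",
    (
        "Ptolemy150",
        "c. 150",
        "Map projections of a spherical earth and use of latitude and "
        "longitude to characterize position (first display of longitude)",
        "Claudius Ptolemy",
        "Alexandria, Egypt",
    ),
)
conn.commit()


def search(conn, term):
    """Return (item_key, year, who) rows whose text fields contain term."""
    pattern = f"%{term}%"
    cursor = conn.execute(
        "SELECT item_key, year, who FROM items "
        "WHERE what LIKE ? OR who LIKE ? OR place LIKE ? "
        "ORDER BY item_key",
        (pattern, pattern, pattern),
    )
    return cursor.fetchall()


print(search(conn, "longitude"))
print(search(conn, "Babylon"))
```

    Placeholders (the ? marks) keep the search term out of the SQL text,
    which matters once the term comes from a web form.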

    XML::Simple might work for you. However, I don't use XML that much (right
    now, at least :) ).
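
    For comparison, here is a small sketch of reading one <hdbitem>
    record into a plain data structure, which is roughly the style of
    access XML::Simple offers in Perl. It uses Python's standard
    xml.etree.ElementTree rather than XML::Simple, so it shows the
    approach and not that module's API. The helper name item_to_dict and
    the choice of fields are assumptions; the XML is an abridged copy of
    the example in the original post.

```python
import xml.etree.ElementTree as ET

XML = """\
<hdbitem key="Ptolemy150" added="11/22/00">
  <keywords>latitude,longitude,projection,map!projection</keywords>
  <authors>
    <who first="Claudius" last="Ptolemy" lived="c. 85--c. 165">Claudius
    Ptolemy</who>
  </authors>
  <date from="c. 150" to="c. 150">c. 150</date>
  <where>Alexandria, Egypt</where>
</hdbitem>
"""


def item_to_dict(elem):
    """Flatten one <hdbitem> element into a dict (hypothetical helper)."""
    return {
        "key": elem.get("key"),
        "keywords": elem.findtext("keywords", default="").split(","),
        # Collapse the line break inside the <who> text.
        "who": [" ".join((w.text or "").split()) for w in elem.iter("who")],
        "date": elem.findtext("date"),
        "where": elem.findtext("where"),
    }


item = item_to_dict(ET.fromstring(XML))
print(item["key"], item["keywords"])
print(item["who"], item["date"], item["where"])
```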

    You could try looking over the various XML modules and see which might fit
    the bill for you.

    http://search.cpan.org/

    HTH

    --
    Jim

    Copyright notice: all code written by the author in this post is
    released under the GPL. http://www.gnu.org/licenses/gpl.txt
    for more information.

    a fortune quote ...
    "It's a summons." "What's a summons?" "It means summon's in
    trouble." -- Rocky and Bullwinkle
    James Willmore, Apr 14, 2004
    #2

  3. Michael Friendly <> wrote:
    > I have a LaTeX document composed of historical items with structured
    > fields (on the history of data visualization,
    > http://www.math.yorku.ca/SCS/Gallery/milestone/)
    >
    > I'd like to create a web-based facility to provide searching of these
    > items. As a first step, I've written a perl script to translate the
    > LaTeX stuff into various formats: tagged, CSV, HTML, XML. But I don't
    > know how to choose a data format and appropriate software tools to
    > accomplish this most easily.


    I think that you are starting at the wrong end. What do you want the
    interface to look like and do, and what tools if any do you have in mind
    for making the interface? How many hits per minute do you want to support?
    I would make those decisions first, and then look into the data storage
    format secondarily.

    Xho

    , Apr 15, 2004
    #3
