[ANN] OOoExtract v0.1

Discussion in 'Ruby' started by Daniel Carrera, Oct 28, 2003.

  1. Greetings,

    I'd like to announce the immediate availability of "OOoExtract":

    http://www.math.umd.edu/~dcarrera/openoffice/tools/ooo_extract.html

    This is a command-line program, inspired by 'grep', to extract data from
    OpenOffice.org files according to certain regular expressions.

    This program is really cool and I'm very happy with it. It can make use of OOo's XML
    structure to make more intelligent and complex matches than a simple 'grep' could.

    OpenOffice.org has a concept of "styles". It has some predefined styles, and you
    can define your own. For example, if you have a list of poems, you can define a
    "Poem" style and a "PoemAuthor" style. You can then give each style a particular
    appearance. This allows you to give your document a logical structure.

    OOoExtract can make use of this information to match not only text content, but also
    styles. For example:

    $ ruby ooo_extract.rb --style="PoemAuthor" poems.sxw
    Robert Frost
    Ernest Hemingway
    Robert Frost

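    A minimal Ruby sketch of what a style match involves, assuming a simplified
    content.xml (the XML part stored inside the .sxw ZIP archive) in which each paragraph
    is a <text:p> element carrying a text:style-name attribute. The sample XML and the
    `paragraphs`/`by_style` helpers are hypothetical illustrations, not OOoExtract's code.
    The regex-based parsing is deliberately naive: a real tool should use an XML parser,
    because paragraphs can contain nested markup, and directly formatted text may refer
    to an automatic style (such as "P1") rather than to "PoemAuthor" itself.

    ```ruby
    # Hypothetical, heavily simplified fragment of a Writer content.xml.
    CONTENT_XML = <<~XML
      <office:body>
        <text:p text:style-name="Poem">First line of a poem</text:p>
        <text:p text:style-name="PoemAuthor">Robert Frost</text:p>
        <text:p text:style-name="Poem">Another line &amp; more</text:p>
        <text:p text:style-name="PoemAuthor">Ernest Hemingway</text:p>
      </office:body>
    XML

    # One match per <text:p> element: captures the style name and the body.
    # Assumes no nested markup inside a paragraph.
    PARA_RE = %r{<text:p text:style-name="([^"]*)">(.*?)</text:p>}m

    # Return [style, text] pairs; decodes only &lt; &gt; and &amp; (in that order,
    # so that "&amp;lt;" correctly becomes "&lt;").
    def paragraphs(xml)
      xml.scan(PARA_RE).map do |style, body|
        [style, body.gsub("&lt;", "<").gsub("&gt;", ">").gsub("&amp;", "&")]
      end
    end

    # Text of every paragraph whose style name equals `style` exactly.
    def by_style(xml, style)
      paragraphs(xml).select { |name, _| name == style }.map { |_, text| text }
    end

    puts by_style(CONTENT_XML, "PoemAuthor")
    ```

    Run as-is, this prints "Robert Frost" and "Ernest Hemingway", one per line.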

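    OOoExtract can also apply boolean operators to the search. By default, when both
    --style and --text are given, only text matching both conditions is printed (AND);
    --or and --xor combine the two conditions differently.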

    $ ruby ooo_extract.rb --style="PoemAuthor" --text="R" file.sxw
    Robert Frost
    Robert Frost
    $
    $ ruby ooo_extract.rb --style="PoemAuthor" --or --text="R" file.sxw
    Robert Frost
    Ernest Hemingway
    Robert Frost
    Richard M. Stallman
    $
    $ ruby ooo_extract.rb --style="PoemAuthor" --xor --text="R" file.sxw
    Ernest Hemingway
    Richard M. Stallman
    $
    $ ruby ooo_extract.rb --style="PoemAuthor" --xor --text="R" \
    --ignore-case file.sxw
    Richard M. Stallman
    $

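    The operators above can be read as combining two per-paragraph tests. Here is a
    minimal Ruby sketch of that reading, assuming that the default is AND (as the first
    example's output suggests), that --style and --text are both regular expressions, and
    that --ignore-case affects only the text test. The `extract` method and the
    `PARAGRAPHS` data are hypothetical, chosen so the calls reproduce the four outputs
    above; this is not OOoExtract's actual implementation.

    ```ruby
    # Hypothetical paragraphs as [style name, text] pairs in document order,
    # chosen to mirror the sample output above (not read from a real file).
    PARAGRAPHS = [
      ["PoemAuthor", "Robert Frost"],
      ["PoemAuthor", "Ernest Hemingway"],
      ["PoemAuthor", "Robert Frost"],
      ["Standard",   "Richard M. Stallman"]
    ].freeze

    # Keep the text of each paragraph for which the style test and the text
    # test, combined with `op` (:and, :or or :xor), come out true.
    def extract(paragraphs, style:, text:, op: :and, ignore_case: false)
      style_re = Regexp.new(style)
      text_re  = Regexp.new(text, ignore_case ? Regexp::IGNORECASE : 0)
      paragraphs.filter_map do |para_style, para_text|
        s = style_re.match?(para_style)
        t = text_re.match?(para_text)
        keep =
          case op
          when :and then s && t
          when :or  then s || t
          when :xor then s ^ t # true when exactly one test matches
          else raise ArgumentError, "unknown operator: #{op.inspect}"
          end
        para_text if keep
      end
    end

    puts extract(PARAGRAPHS, style: "PoemAuthor", text: "R", op: :xor)
    ```

    With :xor, "Robert Frost" is dropped because it passes both tests. Adding
    ignore_case: true also drops "Ernest Hemingway", whose lowercase "r" now matches.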

    This program should be considered beta. OpenOffice.org files are very complex, and I
    have only tested it in very simple scenarios. I have not tested it on files with
    tables or lists, or on anything but word processor documents (Writer).

    Let me know what you think.

    Cheers,
    --
    Daniel Carrera | OpenPGP KeyID: 9AF77A88
    PhD grad student. |
    Mathematics Dept. | "To understand recursion, you must first
    UMD, College Park | understand recursion".
    Daniel Carrera, Oct 28, 2003
    #1

  2. Daniel Carrera

    Harry Ohlsen Guest

    Harry Ohlsen, Oct 29, 2003
    #2

  3. On Wed, Oct 29, 2003 at 09:01:36AM +0900, Harry Ohlsen wrote:

    > >I'd like to announce the immediate availability of "OOoExtract" :
    > >
    > >http://www.math.umd.edu/~dcarrera/openoffice/tools/ooo_extract.html

    >
    > What's all that embedded "binary" at the end of the script?


    It's a tar archive. The script is a self-extracting archive. I made it with
    Erik's Tar2RubyScript:

    http://www.erikveen.dds.nl/tar2rubyscript/

    This program takes a directory with a Ruby program and any number of files and packs
    them all together into a single script. The idea is that this makes the program easier
    to distribute, because it is a single, self-contained file.

    If you download the tar.gz file under the "Download Source" link, extract it, and run
    tar2rubyscript.rb on the extracted directory, you will get the "binary" file under the
    "Download Program" link.

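    To make the idea concrete, here is a minimal Ruby sketch of a self-extracting script.
    It is not Tar2RubyScript's actual format, which embeds a tar archive: for brevity it
    swaps in Marshal plus base64 (Array#pack("m")) as the payload encoding, and `STUB`,
    `pack` and `pack_dir` are hypothetical names. The generated script keeps the payload
    after __END__, reads it back through DATA, unpacks the files into a temporary
    directory, and loads the entry script from there.

    ```ruby
    # Code that runs when the generated script is executed: decode the payload
    # stored after __END__, recreate the files in a temporary directory, and
    # load the entry script from there.
    STUB = <<~'RUBY'
      require "fileutils"
      require "tmpdir"

      main, files = Marshal.load(DATA.read.unpack1("m"))
      Dir.mktmpdir do |dir|
        files.each do |path, body|
          full = File.join(dir, path)
          FileUtils.mkdir_p(File.dirname(full))
          File.binwrite(full, body)
        end
        load File.join(dir, main)
      end
    RUBY

    # Build the text of one self-contained script from `files`
    # (relative path => contents) and the name of the entry script.
    def pack(files, main)
      payload = [Marshal.dump([main, files])].pack("m")
      "#{STUB}__END__\n#{payload}"
    end

    # Pack every regular file under `dir` into the script `out`.
    def pack_dir(dir, main, out)
      files = Dir.glob("**/*", base: dir)
                 .select { |rel| File.file?(File.join(dir, rel)) }
                 .to_h { |rel| [rel, File.binread(File.join(dir, rel))] }
      File.write(out, pack(files, main))
    end
    ```

    For example, pack_dir("myapp", "main.rb", "myapp_packed.rb") would produce one file
    that runs with "ruby myapp_packed.rb"; the directory and file names are placeholders.
    Marshal is used only because it ships with Ruby; it should never be used to load a
    payload from an untrusted source.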
    Cheers,
    Daniel Carrera, Oct 29, 2003
    #3
  4. Daniel Carrera

    Harry Ohlsen Guest

    Daniel Carrera wrote:

    >>What's all that embedded "binary" at the end of the script?

    >
    >
    > It's a tar archive. The script is a self-extracting archive. I made it with
    > Erik's Tar2RubyScript:
    >
    > http://www.erikveen.dds.nl/tar2rubyscript/
    >
    > This program takes a directory with a ruby program and any number of files and packs
    > them all together into one single script. The idea being, that this makes it easier
    > to distribute, because it is only one single, self-contained file.


    Brilliant!

    Cheers,

    H.
    Harry Ohlsen, Oct 29, 2003
    #4

Similar Threads
  1. [ANN]: NNTP Server slow downs.
     Mike Sampson [MSFT], Oct 7, 2003, in forum: ASP .Net. Replies: 0, Views: 425.
  2. [ANN]: NNTP Server slow downs.
     Mike Sampson [MSFT], Dec 6, 2003, in forum: ASP .Net. Replies: 0, Views: 511.
  3. ANN: Free .NET Workshops
     Richard Grimes [MVP], Jul 4, 2005, in forum: ASP .Net. Replies: 0, Views: 514.
  4. [ANN] Confluence 0.7.1 Released
     Tom Hawkins, Oct 23, 2003, in forum: VHDL. Replies: 0, Views: 507.
  5. Michael Livsey
     Replies: 3, Views: 447. Last post: Michael Livsey, May 27, 2004.