Daniel Carrera
Greetings,
I'd like to announce the immediate availability of "OOoExtract":
http://www.math.umd.edu/~dcarrera/openoffice/tools/ooo_extract.html
This is a command-line program, inspired by 'grep', to extract data from
OpenOffice.org files according to certain regular expressions.
This program is really cool and I'm very happy with it. It can make use of OOo's XML
structure to make more intelligent and complex matches than a simple 'grep' could.
OpenOffice.org has a concept of "styles". It has some pre-defined styles, and you
can define your own. For example, if you have a list of poems, you can define a
"Poem" style and a "PoemAuthor" style. You can then give each of them a particular
appearance. This allows you to give your document a logical structure.
OOoExtract can make use of this information to match not only text content, but also
styles. For example:
$ ruby ooo_extract.rb --style="PoemAuthor" poems.sxw
Robert Frost
Ernest Hemingway
Robert Frost
OOoExtract can also combine the style and text criteria with boolean operators. By
default both must match; --or and --xor change how they are combined:
$ ruby ooo_extract.rb --style="PoemAuthor" --text="R" file.sxw
Robert Frost
Robert Frost
$
$ ruby ooo_extract.rb --style="PoemAuthor" --or --text="R" file.sxw
Robert Frost
Ernest Hemingway
Robert Frost
Richard M. Stallman
$
$ ruby ooo_extract.rb --style="PoemAuthor" --xor --text="R" file.sxw
Ernest Hemingway
Richard M. Stallman
$
$ ruby ooo_extract.rb --style="PoemAuthor" --xor --text="R" \
--ignore-case file.sxw
Richard M. Stallman
$
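For anyone wondering how the three modes relate, here is a simplified Ruby sketch of
the matching logic. It is not taken from OOoExtract's source: the Paragraph struct,
the extract method and the sample document (including the "Standard" style on the
Stallman paragraph) are hypothetical, modeled on the output above. It assumes that
--text is an unanchored regular expression and that --ignore-case only affects the
text pattern.

```ruby
# Simplified sketch of OOoExtract-style matching (NOT the tool's real code).
# Hypothetical model: a document is a list of paragraphs, each with a style.
Paragraph = Struct.new(:style, :text)

# Hypothetical sample document mirroring the output above; the "Standard"
# style on the last paragraph is an assumption.
DOC = [
  Paragraph.new("PoemAuthor", "Robert Frost"),
  Paragraph.new("PoemAuthor", "Ernest Hemingway"),
  Paragraph.new("PoemAuthor", "Robert Frost"),
  Paragraph.new("Standard", "Richard M. Stallman")
].freeze

# Returns the text of every paragraph selected by the style and text criteria.
# op mirrors the default (AND), --or and --xor; ignore_case mirrors --ignore-case.
def extract(doc, style:, text: nil, op: :and, ignore_case: false)
  # No text pattern given: filter on style alone.
  return doc.select { |para| para.style == style }.map(&:text) if text.nil?

  # Assumption: the text pattern is an unanchored regular expression.
  pattern = Regexp.new(text, ignore_case ? Regexp::IGNORECASE : 0)
  doc.select do |para|
    by_style = para.style == style
    by_text = pattern.match?(para.text)
    case op
    when :and then by_style && by_text
    when :or then by_style || by_text
    when :xor then by_style ^ by_text
    else raise ArgumentError, "unknown operator: #{op}"
    end
  end.map(&:text)
end

puts extract(DOC, style: "PoemAuthor", text: "R", op: :xor, ignore_case: true)
# prints: Richard M. Stallman
```

XOR is the interesting case: with --ignore-case, the lowercase "r" in "Ernest" also
matches, so Hemingway satisfies both criteria and drops out of the result.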
This program should be considered beta. OpenOffice.org files are very complex, and I
have only tested it in very simple scenarios. I have not tested it on files with
tables or lists, or on anything other than word processor (Writer) documents.
Let me know what you think.
Cheers,