ishamid
[novice]
Hi,
Paul Lutus suggested that I give more detail about my problem. Ok,
here it is:
BACKGROUND:
Save a small document in OOo format, like a bibliographic entry with,
say, the article title in bold and the journal in italics
- under options - save -> disable xml size optimization
- save the file
- copy the file to a subdirectory
- run "unzip filename"
content.xml has the data we want to convert to TeX. A sample
content.xml is given at the end of this message, after the script.
RUBY: I have a script provided by a colleague that does a lot of the
work needed to convert this to a sane ConTeXt file. I am trying to
teach myself enough Ruby to edit this script as needed for academic
articles (I edit an academic journal in TeX). The script is reproduced
at the end of this message.
PROBLEMS: Yesterday I did learn about regexp and made progress, though
the script is still buggy:
i) In the script (l. 110--112) I have
===========
str.gsub!(/"(.*?)"/) do
'\quotation {' + $1 + '}'
end
===========
but on line 114 of content.xml the " pair is not converted, though
such pairs are converted elsewhere.
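One thing worth checking for (i), as a minimal sketch under assumptions: the quotation rule runs once per `text:span`, so a quote pair that opens in one span and closes in the next (as in the second entry of the sample content.xml) never appears inside a single string. `quote_to_context` is a made-up helper that only mirrors the rule above, and the sample strings are invented for illustration.

```ruby
# Hypothetical helper that mirrors the quotation rule quoted above.
def quote_to_context(str)
  str.gsub(/"(.*?)"/) { '\quotation {' + $1 + '}' }
end

# Both quotes sit in the same string: the pair is found and converted.
whole = quote_to_context('"A complete pair"')

# The pair is split over two strings, as happens when the opening and the
# closing quote live in different spans: each string has one quote only,
# so the pattern never matches and the text is returned unchanged.
first_half  = quote_to_context('"Opening quote only')
second_half = quote_to_context('closing quote only"')

puts whole
puts first_half
puts second_half
```

If this is what happens on line 114, the rule would have to run on the paragraph text after the spans have been resolved, not on each span separately.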
ii) (really weird) In the script (l. 45--47) I have
============
@data.gsub!(/\[<(text:sequence
text:ref-name="refAutoNr0").*?>.*?<\/text:sequence>/mois) do
'\startitemize' + '\head'
end
============
This apparently works fine. Now I want some linespace between
'\startitemize' & '\head', so I put a "\n\n" in between them. This
causes the xml tags to appear in the output file like this
============
<text:p text:style-name="ID">\startitemize
\head</text:p>
============
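A possible explanation for (ii), sketched under assumptions: in Ruby, `.` does not match a newline unless the regexp carries the `/m` option, and the paragraph rule later in the script has no `/m`. Once "\n\n" sits inside the paragraph, that rule can no longer match from the opening to the closing tag, so the tags are left in place. The sketch assumes the paragraph element is `text:p`; `strip_paragraph_tags` and the two sample strings are invented for illustration.

```ruby
# Hypothetical helper: replace a matched paragraph by its text (group 3).
def strip_paragraph_tags(str, pattern)
  str.gsub(pattern) { $3 }
end

# Same pattern twice, once without and once with the /m option.
without_m = /<text:p.*?text:style-name=(['"])(.*?)\1>(.*?)<\/text:p>/
with_m    = /<text:p.*?text:style-name=(['"])(.*?)\1>(.*?)<\/text:p>/m

one_line  = '<text:p text:style-name="ID">\startitemize\head</text:p>'
two_lines = '<text:p text:style-name="ID">\startitemize' + "\n\n" + '\head</text:p>'

# No newline inside the paragraph: the tags are stripped.
one_line_result = strip_paragraph_tags(one_line, without_m)

# Newlines inside the paragraph, no /m: no match, the tags survive.
two_lines_result = strip_paragraph_tags(two_lines, without_m)

# Newlines inside the paragraph, with /m: the tags are stripped again.
with_m_result = strip_paragraph_tags(two_lines, with_m)

puts one_line_result
puts two_lines_result
puts with_m_result
```

Adding `/m` to the paragraph rule, or inserting the blank line only after the paragraph tags have been removed, are two ways to test this guess.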
iii) any tips for improving this script are appreciated. I'm sure I'll
have more questions over the next couple of days as I work on this.
Thank you all in advance for any help or pointers for this novice
Best
Idris
================idris.rb==============
class OpenOffice
  # using an xml parser is overkill and we need to regexp anyway
  attr_reader :display, :inline, :translate
  attr_writer :display, :inline, :translate
  def initialize
    @data = nil
    @file = ''
    @display = Hash.new
    @inline = Hash.new
    @translate = Hash.new
  end
  # read the whole file into @data; on any failure reset to an empty state
  def load(filename)
    if not filename.empty? and FileTest.file?(filename) then
      begin
        @data, @file = IO.read(filename), filename
      rescue
        @data, @file = nil, ''
      end
    else
      @data, @file = nil, ''
    end
  end
  # write the converted text; the default name is clean-<input>.tex
  def save(filename='')
    if filename.empty? then
      filename = "clean-#{@file}.tex"
    end
    if f = open(filename,'w') then
      f.puts(@data)
      f.close
    end
  end
  def convert
    # turn the translate keys into regexps once
    @translations = Hash.new
    @translate.each do |k,v|
      @translations[/#{k}/] = v
    end
    if @data then
      # first numbered entry opens the itemize
      @data.gsub!(/\[<(text:sequence text:ref-name="refAutoNr0").*?>.*?<\/text:sequence>/mois) do
        '\startitemize' + "\n\n" + '\head' # + "\n\n"
      end
      # all other numbered entries become \head
      @data.gsub!(/\[<\/(text:span)><(text:sequence text:ref-name="refAutoNr[^0].*?").*?>.*?<\/text:sequence>/mois) do
        '\head'
      end
      @data.gsub!(/\[<(text:sequence text:ref-name="refAutoNr[^0].*?").*?>.*?<\/text:sequence>/mois) do
        '\head'
      end
      # keep only the body and wrap it in the ConTeXt preamble
      @data.gsub!(/.*?<(office:text).*?>(.*?)<\/\1>.*/mois) do
        '\enableregime[utf]' + "\n" + '\useencoding[cyr]' + "\n\n" +
        '\definetypeface [russian]' + "\n" +
        ' ' + '[rm] [serif] [computer-modern] [default] [encoding=t2a]' + "\n\n" +
        '\starttext' + "\n\n" + '\switchtobodyfont[russian]' + "\n" + $2 + "\n" +
        '\stopitemize' + "\n\n" + '\stoptext'
      end
      @data.gsub!(/<(office:font-face-decls|office:automatic-styles|text:sequence-decls).*?>.*?<\/\1>/mois) do
        # remove
      end
      # @data.gsub!(/<(text:span text:style-name="T10")>(.*?)<\/text:span>/mois) do
      #   '{' + '\bf ' + $2 + '}'
      # end
      # @data.gsub!(/<(text:span text:style-name="Style2")>(.*?)<\/text:span>/mois) do
      #   '{' + '\it ' + $2 + '}'
      # end
      # inline styles: wrap the span text in the pair registered for its style
      @data.gsub!(/<text:span.*?text:style-name=([\'\"])(.*?)\1>(.*?)<\/text:span>/) do
        tag, text = $2, $3
        if inline[tag] then
          (inline[tag][0]||'') + clean_display(text) + (inline[tag][1]||'')
        else
          clean_display(text)
        end
      end
      @data.gsub!(/<text:span.*?text:style-name=(".*?")>/) do
        # remove
      end
      @data.gsub!(/<\?.*?\?>/) do
        # remove
      end
      @data.gsub!(/<!--.*?-->/) do
        # remove
      end
      @data.gsub!(/<text:p[^>]*?\/>/) do
        # remove
      end
      # paragraph styles: wrap the paragraph text in the pair registered for its style
      @data.gsub!(/<text:p.*?text:style-name=([\'\"])(.*?)\1>(.*?)<\/text:p>/) do
        tag, text = $2, $3
        if display[tag] then
          "\n" + (display[tag][0]||'') + clean_inline(text) + (display[tag][1]||'') + "\n"
        else
          "\n" + clean_inline(text) + "\n"
        end
      end
      @data.gsub!(/<text:s[^>]*?\/>/) do
        # remove
      end
      @data.gsub!(/<text:bookmark[^>]*?\/>/) do
        # remove
      end
      @data.gsub!(/\t/,' ')
      @data.gsub!(/^ +$/,'')
      @data.gsub!(/\n\n+/moi,"\n\n")
    end
  end
  def clean_display(str)
    str.gsub!(/"(.*?)"/) do
      '\quotation {' + $1 + '}'
    end
    str.gsub!(/&/) do
      '\&'
    end
    str
  end
  def clean_inline(str)
    @translations.each do |k,v|
      str.gsub!(k,v)
    end
    str
  end
end
def convert(filename)
  doc = OpenOffice.new
  doc.display['P1'] = ['\chapter{','}']
  doc.display['P2'] = ['\start'+"\n","\n"+'\stop']
  doc.display['P3'] = doc.display['P2']
  # doc.display['ID'] = ['\relax']
  doc.inline['T1'] = ['','']
  doc.inline['T2'] = ['','']
  doc.inline['T3'] = ['{\bf ','}']
  doc.inline['T6'] = ['{\bf ','}']
  doc.inline['T8'] = ['{\bf ','}']
  doc.inline['T10'] = ['{\bf ','}']
  doc.inline['T11'] = ['{\bf ','}']
  doc.inline['Style2'] = ['{\it ','}']
  # doc.translate['¬'] = 'XX'
  doc.translate['&apos;'] = '`'
  doc.translate['&'] = '\&'
  doc.load(filename)
  doc.convert
  doc.save
end
filename = ARGV[0]
filename = 'content.xml' if not filename or filename.empty?
convert(filename)
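One detail in the script that may be worth testing, as a small sketch: `clean_inline` passes the translation value to `gsub!` as a replacement string, and in a replacement string `\&` stands for the whole match. The block form used in `clean_display` inserts the block's return value literally. The sample text is invented for illustration.

```ruby
text = 'Mombasa & Zanzibar'

# Replacement given as a string: '\&' is a backreference to the whole
# match, so the ampersand is replaced by itself.
as_string = text.gsub(/&/, '\&')

# Replacement given as a block: the returned string is inserted as-is,
# so the output contains a backslash followed by the ampersand.
as_block = text.gsub(/&/) { '\&' }

puts as_string
puts as_block
```

If the translate table is meant to produce a literal backslash, the values would need either the block form or doubled backslashes in the replacement string.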
===========content.xml============
<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
xmlns:number="urn:oasis:names:tc:opendocument:xmlns:datastyle:1.0"
xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0"
xmlns:chart="urn:oasis:names:tc:opendocument:xmlns:chart:1.0"
xmlns:dr3d="urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0"
xmlns:math="http://www.w3.org/1998/Math/MathML"
xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0"
xmlns:script="urn:oasis:names:tc:opendocument:xmlns:script:1.0"
xmlns:ooo="http://openoffice.org/2004/office"
xmlns:ooow="http://openoffice.org/2004/writer"
xmlns:oooc="http://openoffice.org/2004/calc"
xmlns:dom="http://www.w3.org/2001/xml-events"
xmlns:xforms="http://www.w3.org/2002/xforms"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
office:version="1.0">
<office:scripts/>
<office:font-face-decls>
<style:font-face style:name="Wingdings" svg:font-family="Wingdings"
style:font-pitch="variable" style:font-charset="x-symbol"/>
<style:font-face style:name="Symbol" svg:font-family="Symbol"
style:font-family-generic="roman" style:font-pitch="variable"
style:font-charset="x-symbol"/>
<style:font-face style:name="Tahoma2" svg:font-family="Tahoma"/>
<style:font-face style:name="Arial Unicode MS"
svg:font-family="'Arial Unicode MS'"
style:font-pitch="variable"/>
<style:font-face style:name="MS Mincho" svg:font-family="'MS
Mincho'" style:font-pitch="variable"/>
<style:font-face style:name="Tahoma1" svg:font-family="Tahoma"
style:font-pitch="variable"/>
<style:font-face style:name="Garamond" svg:font-family="Garamond"
style:font-family-generic="roman" style:font-pitch="variable"/>
<style:font-face style:name="Times New Roman"
svg:font-family="'Times New Roman'"
style:font-family-generic="roman" style:font-pitch="variable"/>
<style:font-face style:name="Arial" svg:font-family="Arial"
style:font-family-generic="swiss" style:font-pitch="variable"/>
<style:font-face style:name="Tahoma" svg:font-family="Tahoma"
style:font-family-generic="swiss" style:font-pitch="variable"/>
</office:font-face-decls>
<office:automatic-styles>
<style:style style:name="P1" style:family="paragraph"
style:parent-style-name="Standard"
style:master-page-name="First_20_Page">
<style:paragraph-properties fo:text-align="center"
style:justify-single-word="false"/>
</style:style>
<style:style style:name="P2" style:family="paragraph"
style:parent-style-name="Standard">
<style:paragraph-properties fo:text-align="center"
style:justify-single-word="false"/>
<style:text-properties fo:font-size="14pt" fo:font-weight="bold"
style:font-size-asian="14pt" style:font-weight-asian="bold"
style:font-size-complex="14pt" style:font-weight-complex="bold"/>
</style:style>
<style:style style:name="P3" style:family="paragraph"
style:parent-style-name="Standard">
<style:paragraph-properties fo:text-align="center"
style:justify-single-word="false"/>
<style:text-properties fo:font-size="18pt" fo:font-weight="bold"
style:font-size-asian="18pt" style:font-weight-asian="bold"
style:font-size-complex="18pt" style:font-weight-complex="bold"/>
</style:style>
<style:style style:name="P4" style:family="paragraph"
style:parent-style-name="Standard"
style:master-page-name="Convert_20_1"/>
<style:style style:name="P5" style:family="paragraph"
style:parent-style-name="Standard"
style:master-page-name="Convert_20_2"/>
<style:style style:name="P6" style:family="paragraph"
style:parent-style-name="Standard">
<style:text-properties style:font-name-asian="Wingdings"
style:font-size-complex="10pt"/>
</style:style>
<style:style style:name="P7" style:family="paragraph"
style:parent-style-name="reference">
<style:text-properties style:font-name-asian="Wingdings"/>
</style:style>
<style:style style:name="P8" style:family="paragraph"
style:parent-style-name="Standard">
<style:text-properties fo:language="fr" fo:country="FR"
style:font-name-asian="Wingdings" style:font-size-complex="10pt"/>
</style:style>
<style:style style:name="P9" style:family="paragraph"
style:parent-style-name="Standard">
<style:text-properties style:font-size-complex="10pt"/>
</style:style>
<style:style style:name="P10" style:family="paragraph"
style:parent-style-name="reference">
<style:text-properties fo:font-size="11pt"
style:font-size-asian="11pt" style:font-size-complex="9pt"/>
</style:style>
<style:style style:name="P11" style:family="paragraph"
style:parent-style-name="reference2">
<style:text-properties fo:font-size="11pt"
style:font-size-asian="11pt" style:font-size-complex="9pt"/>
</style:style>
<style:style style:name="T1" style:family="text">
<style:text-properties fo:font-size="21pt" fo:font-weight="bold"
style:font-size-asian="21pt" style:font-weight-asian="bold"
style:font-size-complex="21pt" style:font-weight-complex="bold"/>
</style:style>
<style:style style:name="T2" style:family="text">
<style:text-properties fo:font-size="21pt" fo:font-weight="bold"
style:font-size-asian="21pt" style:font-weight-asian="bold"
style:font-size-complex="22pt" style:font-weight-complex="bold"/>
</style:style>
<style:style style:name="T3" style:family="text">
<style:text-properties fo:font-weight="bold"
style:font-name-asian="Wingdings" style:font-weight-asian="bold"
style:font-weight-complex="bold"/>
</style:style>
<style:style style:name="T4" style:family="text">
<style:text-properties style:font-name-asian="Wingdings"/>
</style:style>
<style:style style:name="T5" style:family="text">
<style:text-properties fo:language="fr" fo:country="FR"/>
</style:style>
<style:style style:name="T6" style:family="text">
<style:text-properties fo:language="fr" fo:country="FR"
fo:font-weight="bold" style:font-name-asian="Wingdings"
style:font-weight-asian="bold" style:font-size-complex="10pt"
style:font-weight-complex="bold"/>
</style:style>
<style:style style:name="T7" style:family="text">
<style:text-properties fo:language="fr" fo:country="FR"
style:font-name-asian="Wingdings" style:font-size-complex="10pt"/>
</style:style>
<style:style style:name="T8" style:family="text">
<style:text-properties fo:font-weight="bold"
style:font-name-asian="Wingdings" style:font-weight-asian="bold"
style:font-size-complex="10pt" style:font-weight-complex="bold"/>
</style:style>
<style:style style:name="T9" style:family="text">
<style:text-properties style:font-name-asian="Wingdings"
style:font-size-complex="10pt"/>
</style:style>
<style:style style:name="T10" style:family="text">
<style:text-properties fo:font-weight="bold"
style:font-weight-asian="bold" style:font-weight-complex="bold"/>
</style:style>
<style:style style:name="T11" style:family="text">
<style:text-properties fo:font-weight="bold"
style:font-weight-asian="bold" style:font-size-complex="10pt"
style:font-weight-complex="bold"/>
</style:style>
<style:style style:name="T12" style:family="text">
<style:text-properties style:font-size-complex="10pt"/>
</style:style>
</office:automatic-styles>
<office:body>
<office:text>
<text:sequence-decls>
<text:sequence-decl text:display-outline-level="0"
text:name="Illustration"/>
<text:sequence-decl text:display-outline-level="0"
text:name="Table"/>
<text:sequence-decl text:display-outline-level="0"
text:name="Text"/>
<text:sequence-decl text:display-outline-level="0"
text:name="Drawing"/>
<text:sequence-decl text:display-outline-level="0"
text:name="AutoNr"/>
</text:sequence-decls>
<text:p text:style-name="P1"><text:span
text:style-name="T1">Isma</text:span><text:span
text:style-name="T2">'</text:span><text:span
text:style-name="T1">ilis: A Bibliography</text:span></text:p>
<text:p text:style-name="P2"/>
<text:p text:style-name="P2"/>
<text:p text:style-name="P2"/>
<text:p text:style-name="P3">Compiled by:</text:p>
<text:p text:style-name="P3">Ramin Khanbagi</text:p>
<text:p text:style-name="P4"/>
<text:p text:style-name="Standard"/>
<text:p text:style-name="Standard"/>
<text:p text:style-name="P5"/>
<text:p text:style-name="Standard"/>
<text:p text:style-name="Standard"/>
<text:p text:style-name="ID">[<text:sequence
text:ref-name="refAutoNr0" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">1</text:sequence></text:p>
<text:p text:style-name="P6">'Abd al-Râziq, Ahmad</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T3">Die al-Azhar-Moschee</text:span><text:span
text:style-name="T4">., in, </text:span><text:span
text:style-name="T3">"Schätze der Kalifen: Islamische Kunst zur
Fatimidenzeit."</text:span><text:span text:style-name="T4">,
Herausgegeben von W. Seipel, Vienna: Kunsthistorisches Museum Wien;
Milan: Skira, 1998, pp. 144-147</text:span></text:p>
<text:p text:style-name="P7"/>
<text:p text:style-name="P7"/>
<text:p text:style-name="ID"><text:span
text:style-name="T5">[</text:span><text:sequence
text:ref-name="refAutoNr1" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">2</text:sequence></text:p>
<text:p text:style-name="P8">'Abd al-Râziq, Ahmad</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T6">La mosquée al-Azhar</text:span><text:span
text:style-name="T7">., in, </text:span><text:span
text:style-name="T6">"Trésors fatimides du Caire. Exposition
présentée àl'Institut du Monde Arabe...
</text:span><text:span
text:style-name="T8">1998."</text:span><text:span
text:style-name="T9">, Paris: Institut du Monde Arabe, 1998, pp.
147-149</text:span></text:p>
<text:p text:style-name="P7"/>
<text:p text:style-name="P7"/>
<text:p text:style-name="ID">[<text:sequence
text:ref-name="refAutoNr2" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">3</text:sequence></text:p>
<text:p text:style-name="Standard"><text:s/>'Amri, Husay
'Abdallah</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T10">The Text of an Unpublished Fatwa of the Scholar
al-Maqbali (d. 1108/1728) Concerning the Legal Position of the
Batiniyyah (Isma'iliyyah) of the People of Hamdan</text:span>.,
Translated by A.B.D.R. Eagle, <text:span text:style-name="Style2">New
Arabian Studies</text:span>, 2 (1994), pp. 165-174.</text:p>
<text:p text:style-name="reference"/>
<text:p text:style-name="reference"/>
<text:p text:style-name="ID">[<text:sequence
text:ref-name="refAutoNr3" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">4</text:sequence></text:p>
<text:p text:style-name="Standard">Abarahamov, Binyamin</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T10">An Isma'ili Epistemology: The Case of
Al-Da'i al-Mutlaq 'Ali b. Muhammad b. al-Walid</text:span>.,
<text:span text:style-name="Style2">Journal of Semitic
Studies</text:span>, 41ii (1996), pp. 263-273.</text:p>
<text:p text:style-name="reference"/>
<text:p text:style-name="reference"/>
<text:p text:style-name="ID">[<text:sequence
text:ref-name="refAutoNr4" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">5</text:sequence></text:p>
<text:p text:style-name="Standard">Abel, A.</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T10">De historische betekenis van de Loutere Broeders
van Basra (Bassorah), een wijsgerig gezelschap in de Islam van de Xe
eeuw</text:span>., <text:span text:style-name="Style2">Orientalia
Gandensia</text:span>, 1 (1964), pp. 157-170.</text:p>
<text:p text:style-name="reference"/>
<text:p text:style-name="reference"/>
<text:p text:style-name="ID">[<text:sequence
text:ref-name="refAutoNr5" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">6</text:sequence></text:p>
<text:p text:style-name="P9">Abou Said, A.C.</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T11">Abbasid and Fatimid Political Relations during
the Buhawid Period</text:span><text:span text:style-name="T12">.,
University of Cambridge, 1967.</text:span></text:p>
<text:p text:style-name="reference2">[<text:span
text:style-name="Style2">Dissertation</text:span>]</text:p>
<text:p text:style-name="reference2"/>
<text:p text:style-name="reference2"/>
<text:p text:style-name="ID">[<text:sequence
text:ref-name="refAutoNr6" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">7</text:sequence></text:p>
<text:p text:style-name="P9">Abu Firas, Shihab al-Din
al-Maynaqi</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T11">Ash-Shafiya': An Isma'ili
Treatise</text:span><text:span text:style-name="T12">., Edited and
Translated with an Introduction and Commentary by Sami Nasib Makarim,
Beirut: American University of Beirut, 1966.</text:span></text:p>
<text:p text:style-name="reference"/>
<text:p text:style-name="reference"/>
<text:p text:style-name="ID">[<text:sequence
text:ref-name="refAutoNr7" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">8</text:sequence></text:p>
<text:p text:style-name="P9">Abu'l-Fida, al-Malik
al-Mu'ayyad 'Imad al-Din Ismai'l b. 'Ali</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T11">The Memoirs of a Syrian
Prince</text:span><text:span text:style-name="T12">., Translated by
Peter Malcom Holt, Wiesbaden: Franz Steiner Verlag, [Freiburger
Islamstudien], 1983.</text:span></text:p>
<text:p text:style-name="reference"/>
<text:p text:style-name="reference"/>
<text:p text:style-name="ID"><text:bookmark-start
text:name="a01"/>[<text:sequence text:ref-name="refAutoNr8"
text:name="AutoNr" text:formula="ooow:AutoNr+1"
style:num-format="1">9</text:sequence></text:p>
<text:p text:style-name="P9">Abu-Lughod, J.<text:bookmark-end
text:name="a01"/></text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T11">Cairo: 1001 Years of the City
Victorious</text:span><text:span text:style-name="T12">., Princeton:
Princeton University Press, 1971. </text:span></text:p>
<text:p text:style-name="reference"/>
<text:p text:style-name="reference"/>
<text:p text:style-name="ID"><text:bookmark-start
text:name="a02"/>[<text:sequence text:ref-name="refAutoNr9"
text:name="AutoNr" text:formula="ooow:AutoNr+1"
style:num-format="1">10</text:sequence></text:p>
<text:p text:style-name="P6">Adamji, Ebrahimji N. and Sorabji M.
Darookhanawala</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T8">Two Indian Travellers: East Africa, 1902-1905:
Being Accounts of Journeys Made by Ebrahimji N. Adamji, a Very Young
Bohra Merchant from Mombasa &amp; Sorabji M. Darookhanawala, a
Middle-Aged Parsi Engineer from Zanzibar</text:span><text:span
text:style-name="T9">., Edited by C. Salvadori and J. Aldrick, Mombasa:
Friends of Fort Jesus, 1997.</text:span></text:p>
<text:p text:style-name="P10"/>
<text:p text:style-name="ID">[<text:sequence
text:ref-name="refAutoNr1113" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">11</text:sequence></text:p>
<text:p text:style-name="P9">Úðûðýôðрþò, Ã¢Ã¾Ñ…øр
áðфðрñõúþòøч</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T11">àõûøóøþ÷ýðÑÂÑÂøтуðцøÑÂýðßðüøрõ(úÿрþñûõüõрõûøóøþ÷ýþóþÑÂøýúрõтø÷üð).
(Summary: The Religious Situation on the Pamirs (to the problem of
religious syncretism).)</text:span><text:span text:style-name="T12">.,
</text:span><text:span
text:style-name="Style2">ÒþÑÂтþú</text:span><text:span
text:style-name="T12">, 2000 vi, pp. 36-49;219</text:span></text:p>
<text:p text:style-name="reference2">[Ismailis in
Tajikistan.]</text:p>
<text:p text:style-name="reference2"/>
<text:p text:style-name="reference2"/>
<text:p text:style-name="ID">[<text:sequence
text:ref-name="refAutoNr1114" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">12</text:sequence></text:p>
<text:p text:style-name="P9">èþхуüþрþò, Ã¡Ã°Ã¸Ã´Ã°Ã½Ã²Ã°Ñ€</text:p>
<text:p text:style-name="reference"><text:span
text:style-name="T11">ØÑÂüðøûø÷ü:
трðôøцøøøÑÂþòрõüõýýþÑÂть</text:span><text:span
text:style-name="T12">., </text:span><text:span
text:style-name="Style2">æõýтрðûьýðÑÂÃÂ÷øÑÂøÃšðòúð÷</text:span><text:span
text:style-name="T12">, 2000ii/8, pp. 128-138</text:span></text:p>
<text:p text:style-name="P11">[Also online at <text:span
text:style-name="T10">www.ca-c.org/journal-table-rus.shtml</text:span>]</text:p>
</office:text>
</office:body>
</office:document-content>
text:style-name="T4">., in, </text:span><text:span
text:style-name="T3">"Schätze der Kalifen: Islamische Kunst zur
Fatimidenzeit."</text:span><text:span text:style-name="T4">,
Herausgegeben von W. Seipel, Vienna: Kunsthistorisches Museum Wien;
Milan: Skira, 1998, pp. 144-147</text:span></text
<text
<text
<text
text:style-name="T5">[</text:span><text:sequence
text:ref-name="refAutoNr1" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">2</text:sequence></text
<text
<text
text:style-name="T6">La mosquée al-Azhar</text:span><text:span
text:style-name="T7">., in, </text:span><text:span
text:style-name="T6">"Trésors fatimides du Caire. Exposition
présentée à l'Institut du Monde Arabe...
</text:span><text:span
text:style-name="T8">1998."</text:span><text:span
text:style-name="T9">, Paris: Institut du Monde Arabe, 1998, pp.
147-149</text:span></text
<text
<text
<text
text:ref-name="refAutoNr2" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">3</text:sequence></text
<text
'Abdallah</text
<text
text:style-name="T10">The Text of an Unpublished Fatwa of the Scholar
al-Maqbali (d. 1108/1728) Concerning the Legal Position of the
Batiniyyah (Isma'iliyyah) of the People of Hamdan</text:span>.,
Translated by A.B.D.R. Eagle, <text:span text:style-name="Style2">New
Arabian Studies</text:span>, 2 (1994), pp. 165-174.</text
<text
<text
<text
text:ref-name="refAutoNr3" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">4</text:sequence></text
<text
<text
text:style-name="T10">An Isma'ili Epistemology: The Case of
Al-Da'i al-Mutlaq 'Ali b. Muhammad b. al-Walid</text:span>.,
<text:span text:style-name="Style2">Journal of Semitic
Studies</text:span>, 41ii (1996), pp. 263-273.</text
<text
<text
<text
text:ref-name="refAutoNr4" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">5</text:sequence></text
<text
<text
text:style-name="T10">De historische betekenis van de Loutere Broeders
van Basra (Bassorah), een wijsgerig gezelschap in de Islam van de Xe
eeuw</text:span>., <text:span text:style-name="Style2">Orientalia
Gandensia</text:span>, 1 (1964), pp. 157-170.</text
<text
<text
<text
text:ref-name="refAutoNr5" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">6</text:sequence></text
<text
<text
text:style-name="T11">Abbasid and Fatimid Political Relations during
the Buhawid Period</text:span><text:span text:style-name="T12">.,
University of Cambridge, 1967.</text:span></text
<text
text:style-name="Style2">Dissertation</text:span>]</text
<text
<text
<text
text:ref-name="refAutoNr6" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">7</text:sequence></text
<text
al-Maynaqi</text
<text
text:style-name="T11">Ash-Shafiya': An Isma'ili
Treatise</text:span><text:span text:style-name="T12">., Edited and
Translated with an Introduction and Commentary by Sami Nasib Makarim,
Beirut: American University of Beirut, 1966.</text:span></text
<text
<text
<text
text:ref-name="refAutoNr7" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">8</text:sequence></text
<text
al-Mu'ayyad 'Imad al-Din Ismai'l b. 'Ali</text
<text
text:style-name="T11">The Memoirs of a Syrian
Prince</text:span><text:span text:style-name="T12">., Translated by
Peter Malcom Holt, Wiesbaden: Franz Steiner Verlag, [Freiburger
Islamstudien], 1983.</text:span></text
<text
<text
<text
text:name="a01"/>[<text:sequence text:ref-name="refAutoNr8"
text:name="AutoNr" text:formula="ooow:AutoNr+1"
style:num-format="1">9</text:sequence></text
<text
text:name="a01"/></text
<text
text:style-name="T11">Cairo: 1001 Years of the City
Victorious</text:span><text:span text:style-name="T12">., Princeton:
Princeton University Press, 1971. </text:span></text
<text
<text
<text
text:name="a02"/>[<text:sequence text:ref-name="refAutoNr9"
text:name="AutoNr" text:formula="ooow:AutoNr+1"
style:num-format="1">10</text:sequence></text
<text
Darookhanawala</text
<text
text:style-name="T8">Two Indian Travellers: East Africa, 1902-1905:
Being Accounts of Journeys Made by Ebrahimji N. Adamji, a Very Young
Bohra Merchant from Mombasa &amp; Sorabji M. Darookhanawala, a
Middle-Aged Parsi Engineer from Zanzibar</text:span><text:span
text:style-name="T9">., Edited by C. Salvadori and J. Aldrick, Mombasa:
Friends of Fort Jesus, 1997.</text:span></text
<text
<text
text:ref-name="refAutoNr1113" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">11</text:sequence></text
<text
text:style-name="P9">Каландаров, Тохир
Сафарбекович</text
<text
text:style-name="T11">Религиозная ситуация на Памире (к проблеме религиозного синкретизма).
(Summary: The Religious Situation on the Pamirs (to the problem of
religious syncretism).)</text:span><text:span text:style-name="T12">.,
</text:span><text:span
text:style-name="Style2">Восток</text:span><text:span
text:style-name="T12">, 2000 vi, pp. 36-49;219</text:span></text
<text
Tajikistan.]</text
<text
<text
<text
text:ref-name="refAutoNr1114" text:name="AutoNr"
text:formula="ooow:AutoNr+1"
style:num-format="1">12</text:sequence></text
<text
text:style-name="P9">Шохуморов, Саиданвар</text
<text
text:style-name="T11">Исмаилизм:
традиции и современность</text:span><text:span
text:style-name="T12">., </text:span><text:span
text:style-name="Style2">Центральная Азия и Кавказ</text:span><text:span
text:style-name="T12">, 2000ii/8, pp. 128-138</text:span></text
<text
text:style-name="T10">www.ca-c.org/journal-table-rus.shtml</text:span>]</text
</office:text>
</office:body>
</office:document-content>