ishamid
[Total novice]
This is a follow-up to my last email ("search and replace"). I am trying
to convert an OOo XML source (content.xml) to TeX. It's a bibliography,
so the markup is very regular and predictable. Each entry looks roughly
like this (simplified):
====================================
<text text:style-name="ID">[<text:sequence text:ref-name="refAutoNr3"
text:name="AutoNr" text:formula="ooow:AutoNr+1"
style:num-format="1">4</text:sequence></text>
<text text:style-name="Standard">Ben</text>
<text text:style-name="reference">
<text:span text:style-name="T10">Article</text:span>.,
<text:span text:style-name="Style2">Journal</text:span>,
volume, issue, year.
</text>
<text text:style-name="reference"/>
<text text:style-name="reference"/>
====================================
I. Line one was discussed in my last email. Basically, each line of this
type (the numbers vary) needs to be converted to
====
\head
====
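A minimal sketch of step I, assuming each numbered line is a `<text text:style-name="ID">` element closed by `</text>`, as in the simplified sample (real OOo files use `<text:p>`, so the tag may need adjusting). The `m` flag lets `.` cross the line breaks inside the element, and the non-greedy `.*?` stops at the first `</text>`; the nested `</text:sequence>` does not count, because the pattern needs `>` right after `text`. The constant and method names are hypothetical.

```ruby
# Hypothetical helper for step I: replace each numbered ID paragraph with \head.
# Assumes the simplified <text text:style-name="ID">...</text> markup.
ID_PARAGRAPH = %r{<text text:style-name="ID">.*?</text>}m

def convert_id_lines(xml)
  # Block form: the block's return value is inserted literally,
  # so the backslash in \head needs no extra escaping.
  xml.gsub(ID_PARAGRAPH) { '\head' }
end

sample = <<~XML
  <text text:style-name="ID">[<text:sequence text:ref-name="refAutoNr3"
  text:name="AutoNr" text:formula="ooow:AutoNr+1"
  style:num-format="1">4</text:sequence></text>
  <text text:style-name="Standard">Ben</text>
XML

puts convert_id_lines(sample)
# \head
# <text text:style-name="Standard">Ben</text>
```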
II.
====================================
<text text:style-name="P6">Jim</text>
<text text:style-name="P8">Michael</text>
<text text:style-name="Standard">Ben</text>
====================================
Replace each of these with just the name, followed by a line break:
====================================
Jim
Michael
Ben
====================================
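For step II, one possible approach (a sketch; the helper name is hypothetical) is a single pattern that accepts any style name, so P6, P8, Standard and others are all handled the same way. The `[^<]*` capture refuses nested tags, which keeps it away from the ID and reference paragraphs that still contain `<text:sequence>` or `<text:span>` elements.

```ruby
# Hypothetical helper for step II: keep only the text of simple paragraphs.
# [^<]* refuses nested tags, so elements containing <text:span> are skipped.
NAME_PARAGRAPH = %r{<text text:style-name="[^"]*">([^<]*)</text>}

def convert_name_lines(xml)
  # Emit the name followed by a line break, then collapse the doubled
  # line breaks this leaves behind when the source already had newlines.
  xml.gsub(NAME_PARAGRAPH) { "#{$1.strip}\n" }.squeeze("\n")
end

sample = <<~XML
  <text text:style-name="P6">Jim</text>
  <text text:style-name="P8">Michael</text>
  <text text:style-name="Standard">Ben</text>
XML

print convert_name_lines(sample)
# Jim
# Michael
# Ben
```

Note that `squeeze("\n")` collapses every run of line breaks in the string, so on a larger file it would also remove blank lines elsewhere.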
III. <text:span text:style-name="T10">Article</text:span>
If the style-name is "T10", the text should become, e.g., {\bf Article};
if the style-name is "Style2", it should become, e.g., {\it Journal}.
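Step III is essentially a lookup from span style to a TeX prefix/suffix pair. A minimal sketch, where the `SPAN_STYLES` table and `convert_spans` are hypothetical names and only T10 and Style2 come from the example above; spans with any other style keep their text unwrapped:

```ruby
# Hypothetical mapping for step III: span style name => [before, after].
# Only T10 and Style2 come from the sample; add other styles as needed.
SPAN_STYLES = {
  'T10'    => ['{\bf ', '}'],
  'Style2' => ['{\it ', '}']
}.freeze

SPAN = %r{<text:span text:style-name="([^"]*)">(.*?)</text:span>}m

def convert_spans(xml)
  xml.gsub(SPAN) do
    style, text = $1, $2
    # Unknown styles fall back to an empty prefix/suffix (plain text).
    before, after = SPAN_STYLES.fetch(style, ['', ''])
    before + text + after
  end
end

puts convert_spans('<text:span text:style-name="T10">Article</text:span>.,')
# {\bf Article}.,
puts convert_spans('<text:span text:style-name="Style2">Journal</text:span>,')
# {\it Journal},
```

In the script from the PS, the `inline` hash plays the same role, so adding `doc.inline['T10'] = ['{\bf ','}']` and `doc.inline['Style2'] = ['{\it ','}']` to its top-level `convert` function should have the same effect on spans that sit on a single line.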
IV. So the final output should be something like this:
====================================
\head Ben
{\bf Article}, {\it Journal}, volume, issue, year.
====================================
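Putting the pieces together, here is a sketch that turns the simplified sample entry into roughly the target output. It assumes the same simplified `<text text:style-name="...">` markup and uses hypothetical names throughout. Order matters: the ID paragraph has to become `\head ` before the generic paragraph pattern runs, otherwise that pattern would capture it; the generic pattern then handles both the name and the multi-line reference paragraph, collapsing its whitespace.

```ruby
# Hypothetical end-to-end sketch for one simplified entry (steps I-III).
# Assumes <text text:style-name="..."> markup, not real OOo <text:p>.
SPAN_STYLES = {
  'T10'    => ['{\bf ', '}'],
  'Style2' => ['{\it ', '}']
}.freeze

def bib_to_tex(xml)
  # Drop empty self-closing paragraphs such as <text ... />.
  out = xml.gsub(%r{<text [^>]*/>}, '')
  # Step I: the ID paragraph (and the whitespace after it) becomes "\head ".
  out = out.gsub(%r{<text text:style-name="ID">.*?</text>\s*}m) { '\head ' }
  # Step III: styled spans become {\bf ...} / {\it ...}.
  out = out.gsub(%r{<text:span text:style-name="([^"]*)">(.*?)</text:span>}m) do
    style, text = $1, $2
    before, after = SPAN_STYLES.fetch(style, ['', ''])
    before + text + after
  end
  # Step II (and the reference paragraph): text only, whitespace collapsed.
  out = out.gsub(%r{<text text:style-name="[^"]*">(.*?)</text>}m) do
    $1.split.join(' ') + "\n"
  end
  # Remove the blank lines left behind by the replacements.
  out.squeeze("\n")
end

entry = <<~XML
  <text text:style-name="ID">[<text:sequence text:ref-name="refAutoNr3"
  text:name="AutoNr" text:formula="ooow:AutoNr+1"
  style:num-format="1">4</text:sequence></text>
  <text text:style-name="Standard">Ben</text>
  <text text:style-name="reference">
  <text:span text:style-name="T10">Article</text:span>.,
  <text:span text:style-name="Style2">Journal</text:span>,
  volume, issue, year.
  </text>
  <text text:style-name="reference"/>
  <text text:style-name="reference"/>
XML

print bib_to_tex(entry)
# \head Ben
# {\bf Article}., {\it Journal}, volume, issue, year.
```

The output keeps the `.` that follows Article in the source (`{\bf Article}.,`), and `squeeze` removes all blank lines, so consecutive entries follow each other directly; both are easy to change with one more `gsub` if a different layout is wanted.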
I hope to get enough info here to finish this myself. I assume
finishing my script would only take one of you guys 15 or 20 minutes ;-)
If I can't get things working quickly (I'm trying to learn Ruby and do
my work at the same time), I will be happy to pay one of you for an hour
or so of work (I'm up against a deadline).
THANK YOU
Idris
PS For reference, here is the script I'm trying to modify for this OOo
bibliography:
=====================================
class OpenOffice
  # using an XML parser is overkill, and we need regexps anyway
  attr_accessor :display, :inline, :translate

  def initialize
    @data = nil
    @file = ''
    @display = {}
    @inline = {}
    @translate = {}
  end

  # read the whole file into @data (nil if it is missing or unreadable)
  def load(filename)
    if !filename.empty? && File.file?(filename)
      begin
        @data, @file = File.read(filename), filename
      rescue
        @data, @file = nil, ''
      end
    else
      @data, @file = nil, ''
    end
  end

  # write the result, by default to clean-<input name>
  def save(filename = '')
    filename = "clean-#{@file}" if filename.empty?
    File.open(filename, 'w') { |f| f.puts(@data) }
  end

  def convert
    # turn each @translate key into a regexp once
    @translations = {}
    @translate.each do |k, v|
      @translations[/#{k}/] = v
    end
    return unless @data
    # remove the XML declaration and comments
    @data.gsub!(/<\?.*?\?>/, '')
    @data.gsub!(/<!--.*?-->/, '')
    # keep only the document body; in Ruby /m (not Perl's /s) makes . match
    # newlines, while /s selects the Windows-31J encoding, so it is dropped
    @data.gsub!(/.*?<(office:text).*?>(.*?)<\/\1>.*/mi) do
      '\starttext' + "\n" + $2 + "\n" + '\stoptext'
    end
    @data.gsub!(/<(office:font-face-decls|office:automatic-styles|text:sequence-decls).*?>.*?<\/\1>/mi, '')
    # inline spans: wrap the text in the @inline prefix/suffix for its style
    @data.gsub!(/<text:span.*?text:style-name=(['"])(.*?)\1>(.*?)<\/text:span>/) do
      tag, text = $2, $3
      if inline[tag]
        (inline[tag][0] || '') + clean_display(text) + (inline[tag][1] || '')
      else
        clean_display(text)
      end
    end
    # self-closing <text...> elements, e.g. empty paragraphs
    @data.gsub!(/<text[^>]*?\/>/, '')
    # paragraphs: wrap the text in the @display prefix/suffix, on its own lines
    @data.gsub!(/<text.*?text:style-name=(['"])(.*?)\1>(.*?)<\/text>/) do
      tag, text = $2, $3
      if display[tag]
        "\n" + (display[tag][0] || '') + clean_inline(text) + (display[tag][1] || '') + "\n"
      else
        "\n" + clean_inline(text) + "\n"
      end
    end
    # tidy up whitespace
    @data.gsub!(/\t/, ' ')
    @data.gsub!(/^ +$/, '')
    @data.gsub!(/\n\n+/, "\n\n")
  end

  # "..." becomes \quotation {...}
  def clean_display(str)
    str.gsub!(/"(.*?)"/) do
      '\quotation {' + $1 + '}'
    end
    str
  end

  # apply the @translate substitutions
  def clean_inline(str)
    @translations.each do |k, v|
      str.gsub!(k, v)
    end
    str
  end
end
def convert(filename)
  doc = OpenOffice.new
  doc.display['P1'] = ['\chapter{', '}']
  doc.display['P2'] = ['\startparagraph' + "\n", "\n" + '\stopparagraph']
  doc.display['P3'] = doc.display['P2']
  doc.inline['T1'] = ['', '']
  doc.inline['T2'] = ['{\sl ', '}']
  doc.translate['¬'] = 'XX'
  # the key was written as ''' (not valid Ruby); read here as a single quote
  doc.translate["'"] = '`'
  doc.load(filename)
  doc.convert
  doc.save
end

filename = ARGV[0]
filename = 'content.xml' if filename.nil? || filename.empty?
convert(filename)
=====================================
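One thing worth checking before adapting the script above: in Ruby, `.` does not match a newline unless the regexp has the `m` flag. The script's span and paragraph patterns have no `m`, so if content.xml is pretty-printed like the sample entry, the multi-line reference paragraph will not be matched at all. Files written by OOo itself typically keep the body on very few lines, so this mainly matters for reformatted XML. A small demonstration:

```ruby
# Minimal demonstration of Ruby's /m flag (dot matches newline).
para = %(<text text:style-name="reference">\nvolume, issue, year.\n</text>)

# String#[](regexp, 1) returns capture group 1, or nil when there is no match.
without_m = para[/<text text:style-name="reference">(.*?)<\/text>/, 1]
with_m    = para[/<text text:style-name="reference">(.*?)<\/text>/m, 1]

p without_m  # => nil
p with_m     # => "\nvolume, issue, year.\n"
```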