more search and replace

Discussion in 'Ruby' started by ishamid, Dec 2, 2006.

  1. ishamid

    ishamid Guest

    [Total novice]

    A follow-up on my last email ("search and replace"). I am trying to
    convert an OOo xml source (content.xml) to TeX. It's a bibliography and
    thus very predictable/regular/simple etc. Each entry looks roughly like
    this (simplified):

    ====================================
    <text:p text:style-name="ID">[<text:sequence text:ref-name="refAutoNr3"
    text:name="AutoNr" text:formula="ooow:AutoNr+1"
    style:num-format="1">4</text:sequence></text:p>
    <text:p text:style-name="Standard">Ben</text:p>
    <text:p text:style-name="reference">
    <text:span text:style-name="T10">Article</text:span>.,
    <text:span text:style-name="Style2">Journal</text:span>,
    volume, issue, year.
    </text:p>
    <text:p text:style-name="reference"/>
    <text:p text:style-name="reference"/>
    ====================================

    I. Line one was discussed in my last email. Basically, each line of this
    type (the numbers vary) needs to be converted to

    ====
    \head
    ====
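
    A minimal sketch of step I, assuming the ID paragraph always carries
    text:style-name="ID" with double-quoted attributes, as in the sample
    above. The names ID_PARAGRAPH and replace_id_paragraphs are made up for
    illustration and are not part of the script further down.

```ruby
# Hypothetical helper: replace each numbered "ID" paragraph with \head.
# The /m flag lets .*? span the line breaks inside the paragraph.
ID_PARAGRAPH = /<text:p\s+text:style-name="ID">.*?<\/text:p>/m

def replace_id_paragraphs(xml)
  # The block form keeps the backslash in \head literal.
  xml.gsub(ID_PARAGRAPH) { '\head' }
end

sample = <<~XML
  <text:p text:style-name="ID">[<text:sequence text:ref-name="refAutoNr3"
  text:name="AutoNr" text:formula="ooow:AutoNr+1"
  style:num-format="1">4</text:sequence></text:p>
  <text:p text:style-name="Standard">Ben</text:p>
XML

puts replace_id_paragraphs(sample)
```

    Because the sequence number sits inside the matched paragraph, it does
    not matter that the numbers vary from entry to entry.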

    II.
    ====================================
    <text:p text:style-name="P6">Jim</text:p>
    <text:p text:style-name="P8">Michael</text:p>
    <text:p text:style-name="Standard">Ben</text:p>
    ====================================

    Replace each with the name followed by a blank line:

    ====================================
    Jim

    Michael

    Ben
    ====================================
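
    A minimal sketch of step II, assuming each name paragraph sits on one
    line and contains no nested tags. SIMPLE_PARAGRAPH and unwrap_paragraphs
    are hypothetical names used only for this example.

```ruby
# Matches a <text:p> with any style name whose content has no nested tags,
# plus the newline that follows it (if any).
SIMPLE_PARAGRAPH = /<text:p\s+text:style-name="[^"]*">([^<]*)<\/text:p>\n?/

def unwrap_paragraphs(xml)
  # Keep only the text and follow it with a blank line.
  xml.gsub(SIMPLE_PARAGRAPH) { "#{$1}\n\n" }
end

sample = <<~XML
  <text:p text:style-name="P6">Jim</text:p>
  <text:p text:style-name="P8">Michael</text:p>
  <text:p text:style-name="Standard">Ben</text:p>
XML

puts unwrap_paragraphs(sample)
```

    The style name is matched but ignored here, since P6, P8 and Standard
    are all treated the same way in this step.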

    III. <text:span text:style-name="T10">Article</text:span>

    If style-name="T10", the argument should be wrapped as, e.g., {\bf Article};
    if style-name="Style2", the argument should be wrapped as, e.g., {\it Journal}.
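
    A minimal sketch of step III: a lookup table from style name to the TeX
    wrapper. The two table entries come from the description above; the
    names INLINE_STYLES, SPAN and convert_spans are assumptions made for
    this example, and unknown styles are assumed to fall back to plain text.

```ruby
# Style name => [opening markup, closing markup].
INLINE_STYLES = {
  'T10'    => ['{\bf ', '}'],
  'Style2' => ['{\it ', '}']
}.freeze

SPAN = /<text:span\s+text:style-name="([^"]*)">(.*?)<\/text:span>/m

def convert_spans(xml)
  xml.gsub(SPAN) do
    style, text = $1, $2
    # Styles missing from the table are emitted without any wrapper.
    open_mark, close_mark = INLINE_STYLES.fetch(style, ['', ''])
    "#{open_mark}#{text}#{close_mark}"
  end
end

sample = '<text:span text:style-name="T10">Article</text:span>, ' \
         '<text:span text:style-name="Style2">Journal</text:span>,'
puts convert_spans(sample)
```

    Adding another style is then a one-line change to the table rather than
    another regular expression.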

    IV. So the final output should be something like

    ====================================
    \head Ben

    {\bf Article}, {\it Journal}, volume, issue, year.

    ====================================
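
    Putting steps I to III together, a rough sketch of the whole conversion
    for one entry might look like the following. It assumes the simplified
    entry layout shown at the top of this post; INLINE and convert_entry are
    hypothetical names, and this is a standalone illustration, not a patch
    to the script below.

```ruby
INLINE = { 'T10' => ['{\bf ', '}'], 'Style2' => ['{\it ', '}'] }.freeze

def convert_entry(xml)
  out = xml.dup
  # I. The numbered ID paragraph becomes \head, joined to the name after it.
  out.gsub!(/<text:p\s+text:style-name="ID">.*?<\/text:p>\s*/m) { '\head ' }
  # Drop empty paragraphs such as <text:p text:style-name="reference"/>.
  out.gsub!(/<text:p[^>]*\/>\s*/, '')
  # III. Styled spans get their TeX wrapper; unknown styles stay plain.
  out.gsub!(/<text:span\s+text:style-name="([^"]*)">(.*?)<\/text:span>/m) do
    style, text = $1, $2
    before, after = INLINE.fetch(style, ['', ''])
    "#{before}#{text}#{after}"
  end
  # II. Remaining paragraphs: keep the text, join wrapped lines,
  # and follow each paragraph with a blank line.
  out.gsub!(/<text:p[^>]*>(.*?)<\/text:p>\s*/m) do
    "#{$1.strip.gsub(/\s*\n\s*/, ' ')}\n\n"
  end
  out
end

entry = <<~XML
  <text:p text:style-name="ID">[<text:sequence text:ref-name="refAutoNr3"
  text:name="AutoNr" text:formula="ooow:AutoNr+1"
  style:num-format="1">4</text:sequence></text:p>
  <text:p text:style-name="Standard">Ben</text:p>
  <text:p text:style-name="reference">
  <text:span text:style-name="T10">Article</text:span>.,
  <text:span text:style-name="Style2">Journal</text:span>,
  volume, issue, year.
  </text:p>
  <text:p text:style-name="reference"/>
  <text:p text:style-name="reference"/>
XML

puts convert_entry(entry)
```

    Note that the "." after the Article span in the sample input is carried
    through unchanged, so the result differs from the desired output above
    by that one character unless the input is cleaned up first.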

    I hope to get enough info here to be able to finish this myself. I
    assume finishing my script would only take one of you guys 15 or 20
    minutes ;-) If I'm not able to get things working quickly (trying to
    learn Ruby and do my work at the same time) I will be happy to pay one
    of you for an hour or so of work (I'm up against a deadline).

    THANK YOU
    Idris

    PS For reference, here is the script I'm trying to modify for this OOo
    bibliography:

    =====================================
    class OpenOffice

      # using an xml parser is overkill and we need to regexp anyway

      attr_reader :display, :inline, :translate
      attr_writer :display, :inline, :translate

      def initialize
        @data = nil
        @file = ''
        @display = Hash.new
        @inline = Hash.new
        @translate = Hash.new
      end

      # read the whole file into @data; on failure leave @data as nil
      def load(filename)
        if not filename.empty? and FileTest.file?(filename) then
          begin
            @data, @file = IO.read(filename), filename
          rescue
            @data, @file = nil, ''
          end
        else
          @data, @file = nil, ''
        end
      end

      # write the converted text; defaults to "clean-<input name>"
      def save(filename='')
        if filename.empty? then
          filename = "clean-#{@file}"
        end
        if f = open(filename,'w') then
          f.puts(@data)
          f.close
        end
      end

      def convert
        @translations = Hash.new
        @translate.each do |k,v|
          @translations[/#{k}/] = v
        end
        if @data then
          # an empty block returns nil, so each match is replaced by ''
          @data.gsub!(/<\?.*?\?>/) do
            # remove
          end
          @data.gsub!(/<!--.*?-->/) do
            # remove
          end
          @data.gsub!(/.*?<(office:text).*?>(.*?)<\/\1>.*/mois) do
            '\starttext' + "\n" + $2 + "\n" + '\stoptext'
          end

          @data.gsub!(/<(office:font-face-decls|office:automatic-styles|text:sequence-decls).*?>.*?<\/\1>/mois) do
            # remove
          end

          # inline styles: wrap the span text in the configured markup
          @data.gsub!(/<text:span.*?text:style-name=([\'\"])(.*?)\1>(.*?)<\/text:span>/) do
            tag, text = $2, $3
            if inline[tag] then
              (inline[tag][0]||'') + clean_display(text) +
                (inline[tag][1]||'')
            else
              clean_display(text)
            end
          end
          @data.gsub!(/<text:p[^>]*?\/>/) do
            # remove
          end

          # paragraph styles: wrap the paragraph text in the configured markup
          @data.gsub!(/<text:p.*?text:style-name=([\'\"])(.*?)\1>(.*?)<\/text:p>/) do
            tag, text = $2, $3
            if display[tag] then
              "\n" + (display[tag][0]||'') + clean_inline(text) +
                (display[tag][1]||'') + "\n"
            else
              "\n" + clean_inline(text) + "\n"
            end
          end
          @data.gsub!(/\t/,' ')
          @data.gsub!(/^ +$/,'')
          @data.gsub!(/\n\n+/moi,"\n\n")
        end
      end

      def clean_display(str)
        str.gsub!(/&quot;(.*?)&quot;/) do
          '\quotation {' + $1 + '}'
        end
        str
      end

      def clean_inline(str)
        @translations.each do |k,v|
          str.gsub!(k,v)
        end
        str
      end

    end

    def convert(filename)

      doc = OpenOffice.new

      doc.display['P1'] = ['\chapter{','}']
      doc.display['P2'] = ['\startparagraph'+"\n","\n"+'\stopparagraph']
      doc.display['P3'] = doc.display['P2']

      doc.inline['T1'] = ['','']
      doc.inline['T2'] = ['{\sl ','}']

      doc.translate['¬'] = 'XX'
      doc.translate['&apos;'] = '`'

      doc.load(filename)

      doc.convert

      doc.save
    end

    filename = ARGV[0]

    filename = 'content.xml' if not filename or filename.empty?

    # pass the chosen file name rather than a hard-coded one
    convert(filename)
    =====================================
    ishamid, Dec 2, 2006
    #1

  2. Are you using OOo 2.0.4? I know it has a TeX/BibTeX export feature now...

    It's not Ruby, but it should work (unless you're using this with some
    sort of automated system). :)

    --Jeremy

    Jeremy McAnally, Dec 2, 2006
    #2

  3. ishamid

    ishamid Guest

    Hi Jeremy,

    On Dec 2, 11:57 am, "Jeremy McAnally" <>
    wrote:
    > Are you using OOo 2.0.4? I know it has a TeX/BibTeX export feature now...


    Wow, I did not know this, but...

    > It's not Ruby, but it should work (unless you're using this with some
    > sort of automated system). :)


    I use ConTeXt, not LaTeX, and the two are really different, so...

    I am sending a note to the ConTeXt developers list about this; maybe
    some of them can port the OOo LaTeX filters to ConTeXt. In the meantime
    I think it's best to finish that script...

    Thank you very much for letting me know about OOo and LaTeX!

    Best
    Idris
    ishamid, Dec 2, 2006
    #3
  4. ishamid

    ishamid Guest

    On Dec 2, 12:21 pm, "ishamid" <> wrote:
    > Hi Jeremy,
    >
    > On Dec 2, 11:57 am, "Jeremy McAnally" <>
    > wrote:
    >
    > > Are you using OOo 2.0.4? I know it has a TeX/BibTeX export feature now...
    >
    > Wow, I did not know this, but...
    >
    > > It's not Ruby, but it should work (unless you're using this with some
    > > sort of automated system). :)


    I checked it out; the exported LaTeX source is way too messy for my
    purposes. It will be much easier to convert the xml to ConTeXt than the
    LaTeX to ConTeXt.

    Thanks again
    Idris
    ishamid, Dec 2, 2006
    #4
