search and replace

Discussion in 'Ruby' started by ishamid, Dec 2, 2006.

  1. ishamid


    [total novice here]

    Hi,

    I have a series of expressions like this (shortened from verbose xml)
    =====================
    [<text:sequence text:ref-name="refAutoNr0">1</text:sequence>
    [<text:sequence text:ref-name="refAutoNr1">2</text:sequence>
    [<text:sequence text:ref-name="refAutoNr2">3</text:sequence>
    [<text:sequence text:ref-name="refAutoNr3">4</text:sequence>
    =====================

    I want to globally replace each such line with just

    ====================
    \head
    ====================

    followed by a line space so I get

    ====================
    \head

    \head

    \head

    \head
    ====================

    etc.
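A minimal sketch of the line-by-line substitution described above. It assumes each record sits on a single line, exactly as in the shortened sample. The real data posted later wraps the element across several lines, and this pattern would not match that. In Ruby, `^` and `$` always anchor at line boundaries. The line's own newline is left in place, so returning `"\\head\n"` from the block produces `\head` followed by a blank line.

```ruby
# Simplified lines from the question (a stand-in for the real XML text).
input = <<~'XML'
  [<text:sequence text:ref-name="refAutoNr0">1</text:sequence>
  [<text:sequence text:ref-name="refAutoNr1">2</text:sequence>
  [<text:sequence text:ref-name="refAutoNr2">3</text:sequence>
  [<text:sequence text:ref-name="refAutoNr3">4</text:sequence>
XML

# Match one whole record per line. The block form returns its string literally,
# so the backslash in \head is not treated as a gsub escape.
output = input.gsub(%r{^\[<text:sequence\b[^>]*>\d+</text:sequence>$}) { "\\head\n" }
puts output
```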

    I am modifying a script with lines like

    ====================
    data.gsub!(/.*?<(office:text).*?>(.*?)<\/\1>.*/mois) do
      '\starttext' + "\n" + $2 + "\n" + '\stoptext'
    end
    ====================
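Here is how the script's `gsub!` call works, run on a tiny hypothetical input. `.*?` skips everything before the tag. `(office:text)` is capture 1, and `.*?>` skips the tag's attributes. `(.*?)` is capture 2, the body. `<\/\1>` requires the matching close tag, and the trailing `.*` swallows the rest. Each match is replaced by the block's return value, where `$2` is the body. The sketch drops the `o` and `s` flags. In Ruby, `s` selects the Windows-31J encoding rather than Perl's "dot matches newline"; `m` already does that job. `o` (interpolate once) is harmless here.

```ruby
# Hypothetical input; the body spans two lines, so /m is needed for . to cross "\n".
data = %Q{<office:document-content><office:body><office:text lang="en">Hello\nworld</office:text></office:body></office:document-content>}

data.gsub!(/.*?<(office:text).*?>(.*?)<\/\1>.*/mi) do
  # $2 holds the text between <office:text ...> and </office:text>.
  # '\starttext' is single-quoted, so the backslash is kept literally.
  '\starttext' + "\n" + $2 + "\n" + '\stoptext'
end
puts data
```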

    and don't yet know enough to completely understand. Probably a few more
    hours/days of study will get me there but I need this urgently so...

    THNX in advance

    Best
    Idris
    ishamid, Dec 2, 2006
    #1

  2. David Vallner

    ishamid wrote:
    > [...]
    > Probably a few more hours/days of study will get me there but I need
    > this urgently so...
    >
    > THNX in advance

    Urght. *ducks*


    Regexps and XML always tend to blow up for me. The pattern you're
    searching for seems to be a complete element, so why not use <insert XML
    parser of choice> and XPath?

    With REXML, it should be something like:

    # `document` is assumed to be a parsed REXML::Document
    document.elements.each('//text:sequence') do |sequence|
      sequence.replace_with(REXML::Text.new("\\head\n", true))
    end

    Substitute the XPath expression with one of desired precision. I'm a
    little unsure how REXML treats namespaces in XPath and such, but
    if you know what prefix will be used in the document, that should work out.

    The script might also require a little more massaging if you're
    outputting to plaintext, but treating XML like, well, XML might get the
    heavy lifting of searching for patterns in it done faster if you use a
    pattern language operating on the DOM structure directly.
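A self-contained sketch of the REXML approach, with the namespace question handled explicitly. It passes a prefix-to-URI hash to `REXML::XPath.match`. The URI below is an assumption (the ODF "text" namespace); use whatever the real document declares for `xmlns:text`. The input is a hypothetical, heavily trimmed stand-in for the document. Like the snippet above, this replaces only the `text:sequence` element, so the literal `[` before it survives.

```ruby
require 'rexml/document'

# Assumed namespace URI; copy the xmlns:text value from the real document.
TEXT_NS = 'urn:oasis:names:tc:opendocument:xmlns:text:1.0'

# Hypothetical, trimmed stand-in for the real XML.
xml = <<~XML
  <office:text xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
               xmlns:text="#{TEXT_NS}">
    <text:p text:style-name="ID">[<text:sequence text:name="AutoNr">1</text:sequence></text:p>
    <text:p text:style-name="ID">[<text:sequence text:name="AutoNr">2</text:sequence></text:p>
  </office:text>
XML

doc = REXML::Document.new(xml)
# XPath.match returns an Array, so nodes can safely be replaced while looping over it.
REXML::XPath.match(doc, '//text:sequence', 'text' => TEXT_NS).each do |seq|
  seq.replace_with(REXML::Text.new('\head', true))
end
puts doc.to_s
```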

    David Vallner


    David Vallner, Dec 2, 2006
    #2

  3. ishamid


    Hi Paul,

    On Dec 2, 10:56 am, Paul Lutus wrote:

    > If you will post a short, complete data example, even just one record
    > as it appears in your database, so we don't have to try to read between
    > the lines, someone here will be happy to produce a way to filter the
    > data in the way you want.


    Ok, here are 4 bibliography entries. I just did a follow-up posting
    with more detail (including the full script I'm trying to modify) so
    you may prefer to respond to that one. Thank you very much for your
    help!

    ======================
    <text:p text:style-name="ID">[<text:sequence text:ref-name="refAutoNr0"
    text:name="AutoNr" text:formula="ooow:AutoNr+1"
    style:num-format="1">1</text:sequence></text:p>
    <text:p text:style-name="P6">&apos;Abd al-Râziq, Ahmad</text:p>
    <text:p text:style-name="reference"><text:span
    text:style-name="T3">Die al-Azhar-Moschee</text:span><text:span
    text:style-name="T4">., in, </text:span><text:span
    text:style-name="T3">&quot;Schätze der Kalifen: Islamische Kunst zur
    Fatimidenzeit.&quot;</text:span><text:span text:style-name="T4">,
    Herausgegeben von W. Seipel, Vienna: Kunsthistorisches Museum Wien;
    Milan: Skira, 1998, pp. 144-147</text:span></text:p>
    <text:p text:style-name="P7"/>
    <text:p text:style-name="P7"/>
    <text:p text:style-name="ID"><text:span
    text:style-name="T5">[</text:span><text:sequence
    text:ref-name="refAutoNr1" text:name="AutoNr"
    text:formula="ooow:AutoNr+1"
    style:num-format="1">2</text:sequence></text:p>
    <text:p text:style-name="P8">&apos;Abd al-Râziq, Ahmad</text:p>
    <text:p text:style-name="reference"><text:span
    text:style-name="T6">La mosquée al-Azhar</text:span><text:span
    text:style-name="T7">., in, </text:span><text:span
    text:style-name="T6">&quot;Trésors fatimides du Caire. Exposition
    présentée à l&apos;Institut du Monde Arabe ...
    </text:span><text:span
    text:style-name="T8">1998.&quot;</text:span><text:span
    text:style-name="T9">, Paris: Institut du Monde Arabe, 1998, pp.
    147-149</text:span></text:p>
    <text:p text:style-name="P7"/>
    <text:p text:style-name="P7"/>
    <text:p text:style-name="ID">[<text:sequence
    text:ref-name="refAutoNr2" text:name="AutoNr"
    text:formula="ooow:AutoNr+1"
    style:num-format="1">3</text:sequence></text:p>
    <text:p text:style-name="Standard"><text:s/>&apos;Amri, Husay
    &apos;Abdallah</text:p>
    <text:p text:style-name="reference"><text:span
    text:style-name="T10">The Text of an Unpublished Fatwa of the Scholar
    al-Maqbali (d. 1108/1728) Concerning the Legal Position of the
    Batiniyyah (Isma&apos;iliyyah) of the People of Hamdan</text:span>.,
    Translated by A.B.D.R. Eagle, <text:span text:style-name="Style2">New
    Arabian Studies</text:span>, 2 (1994), pp. 165-174.</text:p>
    <text:p text:style-name="reference"/>
    <text:p text:style-name="reference"/>
    <text:p text:style-name="ID">[<text:sequence
    text:ref-name="refAutoNr3" text:name="AutoNr"
    text:formula="ooow:AutoNr+1"
    style:num-format="1">4</text:sequence></text:p>
    <text:p text:style-name="Standard">Abarahamov, Binyamin</text:p>
    <text:p text:style-name="reference"><text:span
    text:style-name="T10">An Isma&apos;ili Epistemology: The Case of
    Al-Da&apos;i al-Mutlaq &apos;Ali b. Muhammad b. al-Walid</text:span>.,
    <text:span text:style-name="Style2">Journal of Semitic
    Studies</text:span>, 41ii (1996), pp. 263-273.</text:p>
    <text:p text:style-name="reference"/>
    <text:p text:style-name="reference"/>

    ======================
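In the real records above, the `text:sequence` element wraps across lines, and the `[` sometimes sits in a separate `text:span`. A line-based pattern therefore misses some records. A minimal sketch that instead replaces the whole numbered paragraph, assuming every label lives in a `<text:p text:style-name="ID">` paragraph that holds nothing else worth keeping. `/m` lets `.` cross newlines, and the lazy `.*?` stops at the first `</text:p>`. The sample is a two-record excerpt in the same shape as the data.

```ruby
# Excerpt shaped like the records above (two numbered labels and one author line).
data = <<~'XML'
  <text:p text:style-name="ID">[<text:sequence text:ref-name="refAutoNr0"
  text:name="AutoNr" text:formula="ooow:AutoNr+1"
  style:num-format="1">1</text:sequence></text:p>
  <text:p text:style-name="P6">&apos;Abd al-Raziq, Ahmad</text:p>
  <text:p text:style-name="ID"><text:span
  text:style-name="T5">[</text:span><text:sequence
  text:ref-name="refAutoNr1" text:name="AutoNr"
  text:formula="ooow:AutoNr+1"
  style:num-format="1">2</text:sequence></text:p>
XML

# One whole "ID" paragraph, across line breaks, plus its trailing newline if any.
ID_PARA = %r{<text:p text:style-name="ID">.*?</text:p>\n?}m

# Block form, so the replacement text is used literally: \head plus a blank line.
result = data.gsub(ID_PARA) { "\\head\n\n" }
puts result
```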
    ishamid, Dec 2, 2006
    #3
  4. ishamid


    Thank you, David, for your pointers. I'm still very much a novice (at
    the level of Chris Pine's Learn to Program) so I could not follow them
    all, but I do hope to learn quickly. I just sent a follow-up with
    more detail, including the script I'm trying to modify; I hope you have
    a chance to look at it...

    Thank you again
    Idris

    ishamid, Dec 2, 2006
    #4
  5. William James

    ishamid wrote:
    > Ok, here are 4 bibliography entries. [...]


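    # DATA reads whatever follows an __END__ line at the bottom of the script,
    # so the sample records above are assumed to be pasted there.
    puts DATA.read.gsub(%r{<(text:sequence)\s[^>]*>(.*?)</\1>}i,
                        "\\starttext\n\\2\n\\stoptext")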

    --- output -----

    <text:p text:style-name="ID">[\starttext
    1
    \stoptext</text:p>
    <text:p text:style-name="P6">&apos;Abd al-R\xC3\xA2ziq,
    Ahmad</text:p>
    <text:p text:style-name="reference"><text:span
    text:style-name="T3">Die al-Azhar-Moschee</text:span><text:span
    text:style-name="T4">., in, </text:span><text:span
    text:style-name="T3">&quot;Sch\xC3\xA4tze der Kalifen: Islamische Kunst
    zur
    Fatimidenzeit.&quot;</text:span><text:span text:style-name="T4">,
    Herausgegeben von W. Seipel, Vienna: Kunsthistorisches Museum Wien;
    Milan: Skira, 1998, pp. 144-147</text:span></text:p>
    <text:p text:style-name="P7"/>
    <text:p text:style-name="P7"/>
    <text:p text:style-name="ID"><text:span
    text:style-name="T5">[</text:span>\starttext
    2
    \stoptext</text:p>
    <text:p text:style-name="P8">&apos;Abd al-R\xC3\xA2ziq,
    Ahmad</text:p>
    <text:p text:style-name="reference"><text:span
    text:style-name="T6">La mosquée al-Azhar</text:span><text:span
    text:style-name="T7">., in, </text:span><text:span
    text:style-name="T6">&quot;Trésors fatimides du Caire. Exposition
    présentée à l&apos;Institut du Monde Arabe ...
    </text:span><text:span
    text:style-name="T8">1998.&quot;</text:span><text:span
    text:style-name="T9">, Paris: Institut du Monde Arabe, 1998, pp.
    147-149</text:span></text:p>
    <text:p text:style-name="P7"/>
    <text:p text:style-name="P7"/>
    <text:p text:style-name="ID">[\starttext
    3
    \stoptext</text:p>
    <text:p text:style-name="Standard"><text:s/>&apos;Amri, Husay
    &apos;Abdallah</text:p>
    <text:p text:style-name="reference"><text:span
    text:style-name="T10">The Text of an Unpublished Fatwa of the Scholar
    al-Maqbali (d. 1108/1728) Concerning the Legal Position of the
    Batiniyyah (Isma&apos;iliyyah) of the People of Hamdan</text:span>.,
    Translated by A.B.D.R. Eagle, <text:span text:style-name="Style2">New
    Arabian Studies</text:span>, 2 (1994), pp. 165-174.</text:p>
    <text:p text:style-name="reference"/>
    <text:p text:style-name="reference"/>
    <text:p text:style-name="ID">[\starttext
    4
    \stoptext</text:p>
    <text:p text:style-name="Standard">Abarahamov, Binyamin</text:p>
    <text:p text:style-name="reference"><text:span
    text:style-name="T10">An Isma&apos;ili Epistemology: The Case of
    Al-Da&apos;i al-Mutlaq &apos;Ali b. Muhammad b. al-Walid</text:span>.,
    <text:span text:style-name="Style2">Journal of Semitic
    Studies</text:span>, 41ii (1996), pp. 263-273.</text:p>
    <text:p text:style-name="reference"/>
    <text:p text:style-name="reference"/>
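The answer above depends on how `gsub` treats replacements, and this sketch shows the difference. In a replacement *string*, `"\\2"` is the two characters `\2`, which `gsub` expands to capture group 2. A *block*'s return value is used literally, so captures there come from `$1`, `$2`, and so on. `%r{...}` avoids escaping the `/` in `</\1>`. `[^>]*` happily crosses the line break inside the opening tag. The record is a trimmed, hypothetical copy of the first sample.

```ruby
# Trimmed copy of the first record; the opening tag wraps onto a second line.
record = %Q{[<text:sequence text:ref-name="refAutoNr0"\ntext:name="AutoNr">1</text:sequence>}
pattern = %r{<(text:sequence)\s[^>]*>(.*?)</\1>}i

# Replacement string: "\\2" becomes \2, which gsub expands to capture group 2.
numbered = record.gsub(pattern, "<\\2>")

# Block form: the returned string is inserted as-is, backslash included.
headed = record.gsub(pattern) { "\\head" }

puts numbered
puts headed
```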
    William James, Dec 3, 2006
    #5
