double-slashes in input causing trouble

Discussion in 'Ruby' started by Jeremy Wells, Nov 23, 2006.

  1. Jeremy Wells

    Jeremy Wells Guest

    I'm writing a Ruby program that reads a file, reads sections out of that
    file, writes a header to each section and then writes the whole file
    back to disk. The problem is that if the section contains "\\", which
    mine does in places, Ruby replaces these with a single "\" without my
    asking it to.

    Here are the basics of the program:
    body = ""
    File.open(input, 'r') do |file|
      body = file.read
    end

    if body =~ /^section\sheader(.*)section\sfooter/mi
      original_section = $1
      new_section = bit_at_top + original_section
      new_body = body.sub(original_section, new_section)

      File.open(input,'w') do |file|
        file.write new_body
      end
    end

    If original_section contains "\\", it gets replaced by "\".
    Can I stop this happening?

    Jeremy
     
    Jeremy Wells, Nov 23, 2006
    #1

  2. Jeremy Wells

    Hugh Sasse Guest

    On Fri, 24 Nov 2006, Jeremy Wells wrote:

    [...]
    > The problem is that if the section contains "\\" which mine does in places,
    > ruby replaces these with a single "\" without my asking it to.
    >
    > Here is the basics of the program:
    > body = ""
    > File.open(input, 'r') do |file|
    > body = file.read
    > end
    >
    > if body =~ /^section\sheader(.*)section\sfooter/mi
    > original_section = $1
    > new_section = bit_at_top + original_section
    > new_body = body.sub(original_section, new_section)

    new_body = body.sub(Regexp.new(Regexp.quote(original_section)),
    new_section)
    >
    > File.open(input,'w') do |file|
    > file.write new_body
    > end
    > end


    # Hugh
     
    Hugh Sasse, Nov 23, 2006
    #2

  3. Some more remarks:

    On 23.11.2006 17:53, Hugh Sasse wrote:
    > On Fri, 24 Nov 2006, Jeremy Wells wrote:
    >
    > [...]
    >> The problem is that if the section contains "\\" which mine does in places,
    >> ruby replaces these with a single "\" without my asking it to.
    >>
    >> Here is the basics of the program:
    >> body = ""


    Initializing body with an empty string is superfluous - nil is more
    efficient, but:

    >> File.open(input, 'r') do |file|
    >> body = file.read
    >> end


    You could as well replace those lines with

    body = File.read input

    >> if body =~ /^section\sheader(.*)section\sfooter/mi


    It is dangerous to use .*, which is greedy and will break if there is
    more than one section in a file!
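
To illustrate the greedy-match concern, here is a minimal sketch. The two-section sample string is made up for the example; only the regular expression comes from the thread. With the /m flag, `.` also matches newlines, so the greedy form runs on to the last footer.

```ruby
# Hypothetical input with two sections (not from the original file).
body = "section header A section footer\nsection header B section footer\n"

# Greedy: (.*) extends to the LAST "section footer" in the string.
greedy = body[/^section\sheader(.*)section\sfooter/mi, 1]

# Non-greedy: (.*?) stops at the FIRST "section footer".
lazy = body[/^section\sheader(.*?)section\sfooter/mi, 1]

p greedy
p lazy
```

The greedy capture swallows the first footer and the second header, which is why the non-greedy `.*?` is the safer choice when a file can hold several sections.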

    >> original_section = $1
    >> new_section = bit_at_top + original_section
    >> new_body = body.sub(original_section, new_section)

    > new_body = body.sub(Regexp.new(Regexp.quote(original_section)),
    > new_section)


    Now you do a replacement with sub, which might replace some completely
    different piece of text (especially if the text of original_section
    appears outside a section or otherwise in multiple places).
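
A small sketch of that failure mode, using an invented input in which the section text also appears earlier in the file. The names bit_at_top and original_section follow the thread; the sample data and the value "HEAD" are assumptions.

```ruby
# Hypothetical input: the text " x\n" occurs before the section as well.
body = "intro: x\nsection header x\nsection footer\n"
bit_at_top = "HEAD"

original_section = body[/^section\sheader(.*)section\sfooter/mi, 1]
new_section = bit_at_top + original_section

# Searching for the captured text again hits the FIRST occurrence,
# which here is in the intro line, not in the section.
wrong = body.sub(original_section) { new_section }

# Anchoring the substitution on the header itself avoids the second search.
right = body.sub(/^(section\sheader)/i) { $1 + bit_at_top }

p wrong
p right
```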

    >> File.open(input,'w') do |file|
    >> file.write new_body
    >> end
    >> end

    >
    > # Hugh
    >


    So, combining these you get:

    body = File.read input

    if body.gsub!( %r{^(section\sheader)(.*?)(?=section\sfooter)}mi,
                   '\\1your_head\\2')
      File.open(input, "w") {|io| io.write body}
    end
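
The grouping approach can be tried without touching the disk. The following is a sketch on an in-memory string; the sample text and the literal your_head marker are placeholders. Text pulled in through \1 and \2 is copied as-is, so double backslashes inside a section survive.

```ruby
# Hypothetical two-section input. The single-quoted heredoc keeps
# each "\\" as two literal backslash characters.
body = <<~'TEXT'
  section header
  a \\ b
  section footer
  section header
  c \\ d
  section footer
TEXT

pattern = %r{^(section\sheader)(.*?)(?=section\sfooter)}mi

# '\\1' in single quotes is the two characters \1 (a group reference).
result = body.gsub(pattern, '\\1 your_head\\2')

puts result
```

Only backslashes written directly in the replacement template are interpreted. Backslashes that arrive through a group reference are not processed again.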

    Kind regards

    robert
     
    Robert Klemme, Nov 23, 2006
    #3
  4. Jeremy Wells

    Edwin Fine Guest

    Hugh Sasse wrote:
    > On Fri, 24 Nov 2006, Jeremy Wells wrote:
    >
    > [...]
    >> original_section = $1
    >> new_section = bit_at_top + original_section
    >> new_body = body.sub(original_section, new_section)

    > new_body = body.sub(Regexp.new(Regexp.quote(original_section)),
    > new_section)
    >>
    >> File.open(input,'w') do |file|
    >> file.write new_body
    >> end
    >> end

    >
    > # Hugh


    Makes no difference. String#sub states that metacharacters in the
    pattern will not be interpreted if the pattern is a String and not a
    Regexp.

    Check it out:

    irb(main):061:0> x
    => "section header\nkjhKAJSHDKjashdkjASH\\\\\\\\\\\\\\\\KJahfdkasjhdfkajshdfjh\nsection footer"
    irb(main):062:0> y
    => "\nkjhKAJSHDKjashdkjASH\\\\\\\\\\\\\\\\KJahfdkasjhdfkajshdfjh\n"
    irb(main):063:0> z
    => "xyzzy \nkjhKAJSHDKjashdkjASH\\\\\\\\\\\\\\\\KJahfdkasjhdfkajshdfjh\n"
    irb(main):064:0> x.sub(y,z)
    => "section headerxyzzy \nkjhKAJSHDKjashdkjASH\\\\\\\\KJahfdkasjhdfkajshdfjh\nsection footer"
    irb(main):065:0> x.sub(Regexp.new(Regexp.quote(y)),z)
    => "section headerxyzzy \nkjhKAJSHDKjashdkjASH\\\\\\\\KJahfdkasjhdfkajshdfjh\nsection footer"

    Identical results.

    The problem is that the backslashes in the REPLACEMENT string are being
    interpreted.

    The way to overcome this is to use the block form of sub:

    new_body = body.sub(original_section) {|s| s = new_section}
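
A compact way to see the difference, as a sketch with made-up sample data (the value of bit_at_top and the section text are assumptions). The string form interprets backslash sequences in the replacement, so a pair of backslashes collapses to one. The block form inserts the block's return value verbatim.

```ruby
# The section contains two literal backslash characters in a row.
original_section = "\npath\\\\to\n"
body = "section header" + original_section + "section footer"
bit_at_top = "HEADER"                       # hypothetical header text
new_section = bit_at_top + original_section

# String replacement: "\\" in the replacement is read as an escaped
# backslash, so the pair becomes a single backslash.
collapsed = body.sub(original_section, new_section)

# Block replacement: the return value is used exactly as it is.
preserved = body.sub(original_section) { new_section }

p collapsed
p preserved
```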

    --
    Posted via http://www.ruby-forum.com/.
     
    Edwin Fine, Nov 23, 2006
    #4
  5. Jeremy Wells

    Hugh Sasse Guest

    On Fri, 24 Nov 2006, Edwin Fine wrote:

    > Hugh Sasse wrote:
    > > On Fri, 24 Nov 2006, Jeremy Wells wrote:
    > >> new_body = body.sub(original_section, new_section)

    > > new_body = body.sub(Regexp.new(Regexp.quote(original_section)),
    > > new_section)

    [...]
    > >> end

    > >
    > > # Hugh

    >
    > Makes no difference. String#sub states that metacharacters in the
    > pattern will not be interpreted if the pattern is a String and not a
    > Regexp.

    [...]
    > The problem is that the backslashes in the REPLACEMENT string are being
    > interpreted.


    Oops!
    Hugh
     
    Hugh Sasse, Nov 23, 2006
    #5
  6. Jeremy Wells

    Jeremy Wells Guest

    Edwin Fine wrote:
    > Makes no difference. String#sub states that metacharacters in the
    > pattern will not be interpreted if the pattern is a String and not a
    > Regexp.
    >
    > Check it out:
    >
    > irb(main):061:0> x
    > => "section
    > header\nkjhKAJSHDKjashdkjASH\\\\\\\\\\\\\\\\KJahfdkasjhdfkajshdfjh\nsection
    > footer"
    > irb(main):062:0> y
    > => "\nkjhKAJSHDKjashdkjASH\\\\\\\\\\\\\\\\KJahfdkasjhdfkajshdfjh\n"
    > irb(main):063:0> z
    > => "xyzzy
    > \nkjhKAJSHDKjashdkjASH\\\\\\\\\\\\\\\\KJahfdkasjhdfkajshdfjh\n"
    > irb(main):064:0> x.sub(y,z)
    > => "section headerxyzzy
    > \nkjhKAJSHDKjashdkjASH\\\\\\\\KJahfdkasjhdfkajshdfjh\nsection footer"
    > irb(main):065:0> x.sub(Regexp.new(Regexp.quote(y)),z)
    > => "section headerxyzzy
    > \nkjhKAJSHDKjashdkjASH\\\\\\\\KJahfdkasjhdfkajshdfjh\nsection footer"
    >
    > Identical results.
    >
    > The problem is that the backslashes in the REPLACEMENT string are being
    > interpreted.
    >
    > The way to overcome this is to use the block form of sub:
    >
    > new_body = body.sub(original_section) {|s| s = new_section}
    >

    Thanks, I might try that, it's better looking than my solution, which was:
    m = Regexp.new("(" + Regexp.escape(original_section) + ")").match(body)
    body[(m.begin(1)..m.end(1)-1)] = new_section
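
The index-based idea can also be applied to the match from the section regex itself, which avoids searching for the captured text a second time. This is a sketch on invented data; "TOP" stands in for bit_at_top, and the non-greedy pattern is borrowed from elsewhere in the thread.

```ruby
# .dup gives a mutable copy, in case string literals are frozen.
body = "section header\nx \\\\ y\nsection footer\n".dup

m = /^section\sheader(.*?)section\sfooter/mi.match(body)

# Element assignment splices the new text in by position, so no
# replacement-string escaping is involved at all.
body[m.begin(1)...m.end(1)] = "TOP" + m[1] if m

p body
```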
     
    Jeremy Wells, Nov 23, 2006
    #6
  7. Edwin Fine wrote:
    > new_body = body.sub(original_section) {|s| s = new_section}
    >


    Using only {new_section} for the block should suffice; I doubt assigning
    to a block parameter actually does anything outside the block.
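
A quick check of that point, as a sketch with throwaway values: both forms give the same result, because sub uses whatever the block returns, and an assignment expression evaluates to the assigned value.

```ruby
text = "a-b-c"

# The assignment only rebinds the block-local s; its value ("+")
# happens to be the last expression, so it becomes the replacement.
with_assignment = text.sub("-") { |s| s = "+" }

# Returning the value directly does the same thing.
without_assignment = text.sub("-") { "+" }

p with_assignment
p without_assignment
p text
```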

    David Vallner


     
    David Vallner, Nov 23, 2006
    #7
  8. On 23.11.2006 19:50, Jeremy Wells wrote:
    > Edwin Fine wrote:
    >> Makes no difference. String#sub states that metacharacters in the
    >> pattern will not be interpreted if the pattern is a String and not a
    >> Regexp.
    >>
    >> Check it out:
    >>
    >> irb(main):061:0> x
    >> => "section
    >> header\nkjhKAJSHDKjashdkjASH\\\\\\\\\\\\\\\\KJahfdkasjhdfkajshdfjh\nsection
    >> footer"
    >> irb(main):062:0> y
    >> => "\nkjhKAJSHDKjashdkjASH\\\\\\\\\\\\\\\\KJahfdkasjhdfkajshdfjh\n"
    >> irb(main):063:0> z
    >> => "xyzzy \nkjhKAJSHDKjashdkjASH\\\\\\\\\\\\\\\\KJahfdkasjhdfkajshdfjh\n"
    >> irb(main):064:0> x.sub(y,z)
    >> => "section headerxyzzy
    >> \nkjhKAJSHDKjashdkjASH\\\\\\\\KJahfdkasjhdfkajshdfjh\nsection footer"
    >> irb(main):065:0> x.sub(Regexp.new(Regexp.quote(y)),z)
    >> => "section headerxyzzy
    >> \nkjhKAJSHDKjashdkjASH\\\\\\\\KJahfdkasjhdfkajshdfjh\nsection footer"
    >>
    >> Identical results.
    >>
    >> The problem is that the backslashes in the REPLACEMENT string are
    >> being interpreted.
    >>
    >> The way to overcome this is to use the block form of sub:
    >>
    >> new_body = body.sub(original_section) {|s| s = new_section}
    >>

    > Thanks, I might try that, it's better looking than my solution, which was:
    > m = Regexp.new("(" + Regexp.escape(original_section) + ")").match(body)
    > body[(m.begin(1)..m.end(1)-1)] = new_section


    Frankly, I don't understand why everybody is trying to fix backslashes
    in replacement strings when there is gsub and grouping. It's easier and
    more robust if you use grouping and use those groups in the replacement.
    No problems with slashes in there (see my other posting).

    Cheers

    robert
     
    Robert Klemme, Nov 23, 2006
    #8
  9. Jeremy Wells

    Jan Svitok Guest

    or you can use references:

    old_section = $1
    new_body = body.sub(old_section, bit_at_top + '\&')

    \& refers to the whole of the matched text.

    If this were gsub instead of sub, it would be slower, as the
    replacement takes place on every occurrence. In this case, however,
    there is at most one occurrence.
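
A sketch of the \& approach on invented data. The matched text is copied in unchanged, so its backslashes are kept. One caveat, shown in the second call: backslashes in the prefix itself are still interpreted, because the prefix is part of the replacement string. The prefix values here are assumptions.

```ruby
old_section = "\nkeep \\\\ this\n"          # two literal backslashes
body = "section header" + old_section + "section footer"

# '\&' (backslash, ampersand) stands for the whole match.
new_body = body.sub(old_section, "TOP" + '\&')

# A prefix that itself contains "\\" loses one backslash.
risky = body.sub(old_section, "C:\\\\dir" + '\&')

p new_body
p risky
```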

    You can do as well:
    - body = ""
    - File.open(input, 'r') do |file|
    - body = file.read
    - end
    + body = File.read(input)

    and

    File.open(input,'w') do |file|
    file.write new_body
    - end
    + end unless new_body == body
     
    Jan Svitok, Nov 23, 2006
    #9
  10. Jeremy Wells

    Jeremy Wells Guest

    Jan Svitok wrote:
    > or you can use references:
    >
    > old_section = $1
    > new_body = body.sub(old_section, bit_at_top + '\&')
    >
    > \& = the last match.
    >
    > if there was gsub instead of sub, this would be slower as the
    > replacement takes place on every occurence. In this case, however,
    > there's max 1 occurence.
    >
    > You can do as well:
    > - body = ""
    > - File.open(input, 'r') do |file|
    > - body = file.read
    > - end
    > + body = File.read(input)
    >
    > and
    >
    > File.open(input,'w') do |file|
    > file.write new_body
    > - end
    > + end unless new_body == body
    >

    Thanks, that's useful to know for the future. This was something of a
    run-once-and-it's-done program, and I've, um, run it now, so it's done.
     
    Jeremy Wells, Nov 24, 2006
    #10
  11. Jeremy Wells

    Edwin Fine Guest

    David Vallner wrote:
    > Edwin Fine wrote:
    >> new_body = body.sub(original_section) {|s| s = new_section}
    >>

    >
    > Using only {new_section} for the block should suffice, I doubt assigning
    > to a block parameter actually does anything outside the block.
    >
    > David Vallner


    Yes, I see. It works with the assignment as it is simply because the
    result of the assignment expression becomes the return value of the
    block. Thanks for pointing that out. I'm still learning Ruby :). And
    loving it.

    --
    Posted via http://www.ruby-forum.com/.
     
    Edwin Fine, Nov 24, 2006
    #11
  12. Hi,

    At Fri, 24 Nov 2006 01:39:05 +0900,
    Jeremy Wells wrote in [ruby-talk:226339]:
    > if body =~ /^section\sheader(.*)section\sfooter/mi
    > original_section = $1
    > new_section = bit_at_top + original_section
    > new_body = body.sub(original_section, new_section)


    You can use the bang version to do the replacement and tell whether it was done, all at once.

    if body.sub!(/^(section\sheader)(.*section\sfooter)/mi) {$1+bit_at_top+$2}
    > File.open(input,'w') do |file|

    file.write body
    > end
    > end
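
The bang version can be exercised on in-memory strings, as in this sketch. The sample text and the value of bit_at_top are made up. sub! returns nil when nothing matched, so it can drive the if directly, and the block form leaves backslashes in both the header text and the section alone.

```ruby
pattern = /^(section\sheader)(.*section\sfooter)/mi
bit_at_top = "\n# header with \\\\ inside"   # hypothetical header text

# .dup gives mutable copies, in case string literals are frozen.
body = "section header\nx \\\\ y\nsection footer\n".dup
changed = body.sub!(pattern) { $1 + bit_at_top + $2 }

other = "nothing to see".dup
unchanged = other.sub!(pattern) { $1 + bit_at_top + $2 }

p body
p changed.nil?
p unchanged.nil?
```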


    --
    Nobu Nakada
     
    Nobuyoshi Nakada, Nov 24, 2006
    #12
