double-slashes in input causing trouble

Jeremy Wells

I'm writing a Ruby program that reads a file, reads sections out of that
file, writes a header to each section and then writes the whole file
back to disk. The problem is that if a section contains "\\", which
mine does in places, Ruby replaces it with a single "\" without my
asking it to.

Here are the basics of the program:
body = ""
File.open(input, 'r') do |file|
  body = file.read
end

if body =~ /^section\sheader(.*)section\sfooter/mi
  original_section = $1
  new_section = bit_at_top + original_section
  new_body = body.sub(original_section, new_section)

  File.open(input,'w') do |file|
    file.write new_body
  end
end

If original_section contains "\\" then this gets replaced by "\".
Can I stop this happening?

Jeremy
 
Hugh Sasse

On Fri, 24 Nov 2006, Jeremy Wells wrote:

[...]
The problem is that if the section contains "\\" which mine does in places,
ruby replaces these with a single "\" without my asking it to.

Here is the basics of the program:
body = ""
File.open(input, 'r') do |file|
  body = file.read
end

if body =~ /^section\sheader(.*)section\sfooter/mi
  original_section = $1
  new_section = bit_at_top + original_section
  new_body = body.sub(original_section, new_section)
  # Suggested change to the line above: quote the pattern.
  new_body = body.sub(Regexp.new(Regexp.quote(original_section)),
                      new_section)
  File.open(input,'w') do |file|
    file.write new_body
  end
end

# Hugh
 
Robert Klemme

Some more remarks:

On Fri, 24 Nov 2006, Jeremy Wells wrote:

[...]
The problem is that if the section contains "\\" which mine does in places,
ruby replaces these with a single "\" without my asking it to.

Here is the basics of the program:
body = ""

Initializing body with an empty string is superfluous - nil is more
efficient, but:

You could as well replace those lines with

body = File.read input

It is dangerous to use .* in /^section\sheader(.*)section\sfooter/mi: it is
greedy, so the match runs on to the last footer and will break if there
is more than one section in the file!
new_body = body.sub(Regexp.new(Regexp.quote(original_section)),
new_section)

Now you do a replacement with sub, which might replace some completely
different piece of text (especially if the text of original_section
appears outside a section or otherwise in multiple places).

So, combining these you get:

body = File.read input

if body.gsub!( %r{^(section\sheader)(.*?)(?=section\sfooter)}mi,
               '\\1your_head\\2')
  File.open(input, "w") {|io| io.write body}
end
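
To make the greedy versus non-greedy point concrete, here is a minimal sketch. The two-section input and the value of bit_at_top are made up for illustration, and the block form of gsub is used in place of the '\\1...\\2' replacement string so that backslashes in the header text are left alone as well.

```ruby
# Hypothetical input with two sections.
body = "section header\nfirst\nsection footer\n" \
       "section header\nsecond\nsection footer\n"

# Greedy .* runs on to the LAST footer; lazy .*? stops at the first one.
greedy = body[/^section\sheader(.*)section\sfooter/mi, 1]
lazy   = body[/^section\sheader(.*?)section\sfooter/mi, 1]

p greedy  # "\nfirst\nsection footer\nsection header\nsecond\n"
p lazy    # "\nfirst\n"

# Hypothetical header text containing two literal backslashes.
bit_at_top = "\n# generated \\\\ header"

# Insert the header into every section; the lookahead leaves the footer
# out of the match, and the block's return value is inserted verbatim.
result = body.gsub(/^(section\sheader)(.*?)(?=section\sfooter)/mi) do
  $1 + bit_at_top + $2
end

puts result
```

Both sections receive the header, and the two backslashes in bit_at_top survive because a block's result is not scanned for escape sequences.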

Kind regards

robert
 
Edwin Fine

Hugh said:
On Fri, 24 Nov 2006, Jeremy Wells wrote:

[...]
original_section = $1
new_section = bit_at_top + original_section
new_body = body.sub(original_section, new_section)
new_body = body.sub(Regexp.new(Regexp.quote(original_section)),
new_section)
File.open(input,'w') do |file|
file.write new_body
end
end

# Hugh

Makes no difference. String#sub states that metacharacters in the
pattern will not be interpreted if the pattern is a String and not a
Regexp.

Check it out:

irb(main):061:0> x
=> "section header\nkjhKAJSHDKjashdkjASH\\\\\\\\\\\\\\\\KJahfdkasjhdfkajshdfjh\nsection footer"
irb(main):062:0> y
=> "\nkjhKAJSHDKjashdkjASH\\\\\\\\\\\\\\\\KJahfdkasjhdfkajshdfjh\n"
irb(main):063:0> z
=> "xyzzy \nkjhKAJSHDKjashdkjASH\\\\\\\\\\\\\\\\KJahfdkasjhdfkajshdfjh\n"
irb(main):064:0> x.sub(y,z)
=> "section headerxyzzy \nkjhKAJSHDKjashdkjASH\\\\\\\\KJahfdkasjhdfkajshdfjh\nsection footer"
irb(main):065:0> x.sub(Regexp.new(Regexp.quote(y)),z)
=> "section headerxyzzy \nkjhKAJSHDKjashdkjASH\\\\\\\\KJahfdkasjhdfkajshdfjh\nsection footer"

Identical results.

The problem is that the backslashes in the REPLACEMENT string are being
interpreted.

The way to overcome this is to use the block form of sub:

new_body = body.sub(original_section) {|s| s = new_section}
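
A smaller reproduction may make the difference easier to see. This is only a sketch: the sample strings are invented, and the variable names simply mirror the ones used in the thread.

```ruby
# Single-quoted literals: '\\\\' is two literal backslashes.
original_section = 'a\\\\b'                 # a, backslash, backslash, b
bit_at_top       = 'HEAD:'                  # hypothetical header text
new_section      = bit_at_top + original_section
body             = 'section header' + original_section + 'section footer'

# String replacement: sub scans new_section for escapes such as \1 or \\,
# so each pair of backslashes collapses into one.
with_string = body.sub(original_section, new_section)

# Block form: whatever the block returns is inserted verbatim.
with_block = body.sub(original_section) { |_match| new_section }

puts with_string  # section headerHEAD:a\bsection footer
puts with_block   # section headerHEAD:a\\bsection footer
```

The pattern is the same plain String in both calls, so the only thing that differs is how the replacement is handled.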
 
Hugh Sasse

Hugh said:
new_body = body.sub(original_section, new_section)
new_body = body.sub(Regexp.new(Regexp.quote(original_section)),
new_section) [...]

# Hugh

Makes no difference. String#sub states that metacharacters in the
pattern will not be interpreted if the pattern is a String and not a
Regexp. [...]
The problem is that the backslashes in the REPLACEMENT string are being
interpreted.

Oops!
Hugh
 
Jeremy Wells

Edwin said:
[...]
new_body = body.sub(original_section) {|s| s = new_section}
Thanks, I might try that. It's better looking than my solution, which was:
m = Regexp.new("(" + Regexp.escape(original_section) + ")").match(body)
body[(m.begin(1)..m.end(1)-1)] = new_section
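
For comparison, the same splice-by-position idea can probably be written without building a Regexp at all. The sketch below uses String#index and slice assignment; the sample strings are invented for illustration.

```ruby
# Single-quoted literals: '\\\\' is two literal backslashes.
original_section = 'a\\\\b'
new_section      = 'HEAD:' + original_section
body             = 'section header' + original_section + 'section footer'

# Find the section by position and overwrite that slice in place.
# No replacement-string processing is involved, so backslashes survive.
start = body.index(original_section)
body[start, original_section.length] = new_section if start

puts body  # section headerHEAD:a\\bsection footer
```

Like sub, this only touches the first occurrence of original_section in body.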
 
David Vallner

Edwin said:
new_body = body.sub(original_section) {|s| s = new_section}

Using only {new_section} for the block should suffice; I doubt assigning
to a block parameter actually does anything outside the block.
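
A quick way to check this is to run both forms side by side. The strings here are invented for the purpose; this is a sketch, not a statement about how sub is implemented.

```ruby
body        = 'one two three'
new_section = 'TWO\\\\'   # ends with two literal backslashes

# The assignment only rebinds the block-local s; its value happens to be
# the last expression in the block, so it becomes the block's result.
a = body.sub('two') { |s| s = new_section }

# Returning new_section directly does the same job.
b = body.sub('two') { new_section }

p a == b   # true
p b        # "one TWO\\\\ three"
p body     # "one two three"
```

The original string is untouched in both cases, since sub returns a new string.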

David Vallner


 
Robert Klemme

Edwin said:
[...]
new_body = body.sub(original_section) {|s| s = new_section}
Thanks, I might try that, it's better looking than my solution, which was:
m = Regexp.new("(" + Regexp.escape(original_section) + ")").match(body)
body[(m.begin(1)..m.end(1)-1)] = new_section

Frankly, I don't understand why everybody is trying to fix backslashes
in replacement strings when there is gsub and grouping. It's easier and
more robust to use grouping and refer to those groups in the replacement.
No problems with backslashes in there (see my other posting).

Cheers

robert
 
Jan Svitok

or you can use references:

old_section = $1
new_body = body.sub(old_section, bit_at_top + '\&')

\& = the whole of the last match.

If this were gsub instead of sub, it would be slower, as the
replacement takes place on every occurrence. In this case, however,
there's at most one occurrence.
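
A small sketch of the \& approach, with made-up sample strings. One caveat it tries to show: the text inserted for \& is taken verbatim, but the rest of the replacement string (here bit_at_top) is still scanned for escapes, so backslashes in the header would need doubling first.

```ruby
# Single-quoted literals: '\\\\' is two literal backslashes.
original_section = 'a\\\\b'
body             = 'section header' + original_section + 'section footer'

# Header without backslashes: \& inserts the matched text untouched.
plain = body.sub(original_section, 'HEAD:' + '\&')
puts plain    # section headerHEAD:a\\bsection footer

# Hypothetical header that itself contains two backslashes.
bit_at_top = 'C:\\\\dir:'

# Used as is, the header's backslash pair collapses into one.
collapsed = body.sub(original_section, bit_at_top + '\&')
puts collapsed  # section headerC:\dir:a\\bsection footer

# Doubling every backslash in the header first keeps it intact.
escaped_top = bit_at_top.gsub('\\') { '\\\\' }
intact = body.sub(original_section, escaped_top + '\&')
puts intact     # section headerC:\\dir:a\\bsection footer
```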

You can also do:
- body = ""
- File.open(input, 'r') do |file|
- body = file.read
- end
+ body = File.read(input)

and

File.open(input,'w') do |file|
file.write new_body
- end
+ end unless new_body == body
 
Jeremy Wells

Jan said:
or you can use references:

[...]
Thanks, that's useful to know for the future. This was something of a
run-once-and-it's-done program, and I've, um, run it now, so it's done.
 
Edwin Fine

David said:
Using only {new_section} for the block should suffice, I doubt assigning
to a block parameter actually does anything outside the block.

David Vallner

Yes, I see. It works with the assignment as it is simply because the
result of the assignment expression becomes the return value of the
block. Thanks for pointing that out. I'm still learning Ruby :). And
loving it.
 
Nobuyoshi Nakada

Hi,

At Fri, 24 Nov 2006 01:39:05 +0900,
Jeremy Wells wrote in [ruby-talk:226339]:
if body =~ /^section\sheader(.*)section\sfooter/mi
original_section = $1
new_section = bit_at_top + original_section
new_body = body.sub(original_section, new_section)

You can use the bang version to do the replacement and find out whether
it happened, all at once.

if body.sub!(/^(section\sheader)(.*section\sfooter)/mi) {$1+bit_at_top+$2}
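
As a rough illustration of why the bang version works as a condition: sub! returns the string when a substitution was made and nil otherwise. The inputs and the bit_at_top value below are invented; .dup is there only so the literals can be modified in place.

```ruby
pattern    = /^(section\sheader)(.*section\sfooter)/mi
bit_at_top = "\nHEAD \\\\"   # hypothetical header ending in two backslashes

body = "section header\nold text\nsection footer\n".dup
if body.sub!(pattern) { $1 + bit_at_top + $2 }
  puts "changed:"
  puts body
end

other  = "nothing to see here\n".dup
result = other.sub!(pattern) { $1 + bit_at_top + $2 }
p result   # nil, so an `if` around it would skip the file write
```

Because the replacement comes from a block, the backslashes in bit_at_top and in the section text are both left as they are.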
 
