Stopping String Escaping.

Phil Cooper-king · Jan 7, 2010

Hi,

I'm trying to parse code snippets on a website that are submitted by
users. The problem is that when a user tries to shop escaping in their
code, the escaping actually happens.

For instance, if you submit \\, Ruby treats it as a single \. Is there
any way to stop this? I still require all the other \'s, such as \n etc.

Thanks
Phil.

Brian Candler · Jan 7, 2010

How are you parsing them?

If you are using File.read() then no unescaping is done.

If you are parsing them using eval(), then you are inviting your machine
to be 0wned. See
http://www.ruby-doc.org/docs/ProgrammingRuby/html/taint.html

If you are parsing them some other way, then please explain it. Please
also explain what "shop escaping" is.

Regards,

Brian.

Phil Cooper-king · Jan 7, 2010

Brian said:
How are you parsing them?

If you are using File.read() then no unescaping is done.

I am using Rails and RedCloth. I have the plain text in the database,
and the text gets parsed when the view is rendered (at the moment).

I am using Uv (Ultraviolet) for the syntax highlighting; I pull the code
out before sending the text to RedCloth.

Code:

def snatch_code(text)
  snippets = text.scan(/#>code\((\S+)\)(.+?)#>code/m)

  snippets.each do |snip|
    code = Uv.parse(snip[1], 'xhtml', snip[0], false, 'twilight')
    code.insert(0, "<notextile>")
    code.insert(code.length, "</notextile>")
    text.sub!(/#>code\((\S+)\)(.+?)#>code/m, code)
  end

  text
end

Then RedCloth parses it.

If you are parsing them using eval(), then you are inviting your machine
to be 0wned. See
http://www.ruby-doc.org/docs/ProgrammingRuby/html/taint.html

Ouch. And thanks.

If you are parsing them some other way, then please explain it. Please
also explain what "shop escaping" is.

dyslexia rules! KO!

I want to stop the escaping that's not dealing with whitespace, tab, new
line etc.

Phil.

Brian Candler · Jan 7, 2010

OK, then what I suggest is you make a standalone test case, outside of
Rails.

source = <<'EOS'
Put your sample source code here
EOS
# Print it to be sure it hasn't already been escaped by Ruby
# Now process it with Uv
# Show the intermediate state
# Now process it with Redcloth
# Show the final state

Then you can see whether the problem is with Uv, or with Redcloth.

Then the question becomes much more focussed - for example, it might be
"how do I stop Redcloth turning \\ into \ inside a <notextile> section?"

Phil Cooper-king · Jan 8, 2010

Brian said:
OK, then what I suggest is you make a standalone test case, outside of
Rails.

source = <<'EOS'
Put your sample source code here
EOS

Yeah, I did this as well.

Code:

require 'rubygems'
require 'uv'

un_parsed =<<ENDOF
\\
ENDOF

parsed = Uv.parse(un_parsed, "xhtml", "c++", false, "twilight")
=> \

puts un_parsed
=> \

In both cases the backslash gets lost. I expect the \ to be lost in puts,
though. Using dump I see the double backslash is still there.

Brian Candler · Jan 8, 2010

Phil said:
un_parsed =<<ENDOF
\\
ENDOF

Unfortunately, here the \\ is being turned into a single backslash by
Ruby, the same as inside a quoted string. In other words, the same as
this:

irb(main):001:0> "\\".size
=> 1
irb(main):002:0> '\\'.size
=> 1

The simplest way of preventing this is to read unparsed from a file, or
you can have an inline dataset at the end of your source code, like
this:

unparsed = DATA.read
... rest of your code goes here

__END__
\\
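A small sketch of the cases discussed so far, assuming a modern Ruby with Tempfile.create. The temporary file is only a stand-in for DATA / __END__ or a database column (DATA exists only in the main script file): data read at runtime never goes through Ruby's literal escape rules. Note the difference between an unquoted heredoc (double-quoted rules) and a single-quoted one like the <<'EOS' suggested earlier, whose body is taken verbatim.

```ruby
require 'tempfile'

# Unquoted heredoc: double-quoted rules apply, so \\ collapses to one backslash.
collapsed = <<ENDOF
\\
ENDOF

# Single-quoted heredoc: the body is taken verbatim, so both backslashes survive.
verbatim = <<'ENDOF'
\\
ENDOF

# Data read at runtime (a temp file here) is never run through Ruby's
# escape rules: two backslashes written are two backslashes read back.
from_file = Tempfile.create('snippet') do |f|
  f.write(92.chr * 2) # two backslash characters
  f.rewind
  f.read
end

puts collapsed.chomp.size # 1
puts verbatim.chomp.size  # 2
puts from_file.size       # 2
```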

I expect the \ to be lost in puts, though.

No, puts *never* converts two backslashes into one. If your string
contains two backslashes, puts will show two backslashes.

Using dump I see the double backslash is still there.

No, this is the opposite. String#inspect turns a raw string into a
quoted string for display purposes, and as part of this quoting a single
backslash is displayed as two backslashes.

Look at this:

irb(main):001:0> s = 92.chr
=> "\\"
irb(main):002:0> s.size
=> 1
irb(main):003:0> puts s
\
=> nil
irb(main):004:0> s2 = s + s
=> "\\\\"
irb(main):005:0> s2.size
=> 2
irb(main):006:0> puts s2
\\
=> nil

Hopefully it's clear from the above that string s has one character (a
single backslash), and s2 has two backslashes. But these are displayed
in quoted form in irb as

"\\"
"\\\\"

respectively. puts displays them correctly.

Similarly, a single newline character is displayed as backslash-n when
inspect gives the quoted form; whereas puts actually prints a newline.

irb(main):009:0> nl = 10.chr
=> "\n"
irb(main):010:0> nl.size
=> 1
irb(main):011:0> puts nl

=> nil
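The same distinction in a plain script rather than IRB, as a quick sketch: size counts the real characters, inspect (and dump, which Phil was using) gives the quoted literal form that IRB prints after =>, and puts writes the raw characters.

```ruby
s  = 92.chr   # one backslash
s2 = s + s    # two backslashes
nl = 10.chr   # one newline

[s, s2, nl].each do |str|
  # size: real characters; inspect/dump: Ruby literal form (what IRB shows after =>)
  puts "size=#{str.size} inspect=#{str.inspect} dump=#{str.dump}"
end

# puts writes the raw characters: one backslash, then two.
puts s
puts s2
```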

So try your test case again:
(1) Use the DATA.read / __END__ to get the test source in
(2) Use 'puts' and not 'dump' to see clearly what you have

Phil Cooper-king · Jan 8, 2010

Brian said:
Hopefully it's clear from the above

Yes, thank you.

So try your test case again:
(1) Use the DATA.read / __END__ to get the test source in
(2) Use 'puts' and not 'dump' to see clearly what you have

Code:

require 'rubygems'
require 'redcloth'

data_read = DATA.read
string = "\\"

puts RedCloth.new(string).to_html
puts RedCloth.new(data_read).to_html

__END__
\\

yields
\
\\

Although I have no idea how to treat a string as a file.
Is all this to do with encoding? (Sorry if that was a dense question.)

ERB results are similar, which I would have thought was happening in
Rails anyway:
=> "_erbout = ''; _erbout.concat \"\\\\\"; _erbout"

Brian Candler · Jan 8, 2010

Phil said:
Hopefully it's clear from the above

Yes, thank you.

So try your test case again:
(1) Use the DATA.read / __END__ to get the test source in
(2) Use 'puts' and not 'dump' to see clearly what you have

Code:

...
data_read = DATA.read
string = "\\"
...
__END__
\\

So in this program, 'data_read' contains two backslash characters; and
'string' contains a single backslash character.

yields
\
\\

That looks correct to me - HTML doesn't need a backslash to be escaped.
So now add Uv into your test to see if that is munging the backslashes.
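To check the point about HTML, here is a quick sketch using the standard library's CGI.escapeHTML as a stand-in for HTML escaping in general (an assumption: RedCloth and Uv have their own code paths). It rewrites &, <, > and double quotes, but leaves backslashes alone.

```ruby
require 'cgi'

snippet = '<b>"a\\b" & c</b>'   # single-quoted: \\ is one backslash
escaped = CGI.escapeHTML(snippet)

puts snippet
puts escaped

# Count the backslash characters before and after escaping.
before = snippet.chars.count { |c| c == '\\' }
after  = escaped.chars.count { |c| c == '\\' }
puts "backslashes: #{before} -> #{after}"
```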

Although I have no idea how to treat a string as a file.

A string is just a string. In Ruby 1.8 it's a sequence of bytes; in Ruby
1.9 it's a sequence of characters. But that doesn't matter here; a
backslash is a backslash, and is both a single character and a single
byte in either ASCII or UTF-8.

However if you enter a string *literal* in a ruby program (or in IRB),
then it is parsed with backslash escaping rules to turn it into an
actual String object. For example:

a = "abc\ndef"
b = 'abc\ndef'

string 'a' contains 7 characters (a,b,c,newline,d,e,f), whereas string b
contains 8 characters (a,b,c,backslash,n,d,e,f). This is because there
are different escaping rules for double-quoted and single-quoted
strings.

In a single-quoted string literal, \' is a single quote, and \\ is a
backslash, and everything else is treated literally, so \n is two
characters \ and n.

In a double-quoted string literal, \" is a double quote, \n is a
newline, \\ is a backslash, and there's a whole load of other expansion
including #{...} for expression interpolation and #@... for instance
variable substitution.
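A minimal sketch of these rules, reusing the a/b example above; the sq, dq and @name values are made up for illustration.

```ruby
# The example from above: same letters, different quoting.
a = "abc\ndef"   # \n is a newline here
b = 'abc\ndef'   # \n is a backslash followed by n here
puts a.size      # 7
puts b.size      # 8

# In single quotes only \' and \\ are escapes; everything else is literal.
sq = 'it\'s \\ and \n'
p sq.chars

# Double quotes expand escapes plus #{...} and #@ivar interpolation.
@name = "Phil"
dq = "#@name said \"#{1 + 1}\"\n"
p dq

# A backslash is one character and one byte (92) in ASCII and UTF-8 alike.
bs = "\\"
p [bs.size, bs.bytesize, bs.bytes]   # [1, 1, [92]]
```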

ERB results are similar, which I would have thought was happening in
Rails anyway:
=> "_erbout = ''; _erbout.concat \"\\\\\"; _erbout"

Now you're just scaring yourself with backslash escaping.

Firstly, note that you passed a single backslash character to ERB.
That's what the string literal "\\" creates.

ERB compiled it to the following Ruby code:

_erbout = ''; _erbout.concat "\\"; _erbout

which just appends a single backslash to _erbout, which is what you
expect.
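A minimal sketch of that in a script, assuming the erb library that ships with Ruby. The exact generated source printed by src differs between ERB versions, so only the evaluated result is relied on here.

```ruby
require 'erb'

template = 92.chr            # one backslash, same as the literal "\\"
erb = ERB.new(template)

# The generated Ruby source holds the text as a quoted literal,
# so the backslash appears doubled there.
puts erb.src

# Evaluating it gives back exactly one backslash.
out = erb.result
puts out.size   # 1
puts out
```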

However, IRB displays the returned string from ERB.new using
String#inspect, so it is turned into a double-quoted string. This means:
1. A " is added to the start and end of the string
2. Any " within the string is displayed as \"
3. Any \ within the string is displayed as \\

In other words, String#inspect turns a string into a Ruby string literal
(something that you could paste directly into IRB). Try it: paste that
quoted literal into IRB as the value of a variable str, then do puts str.

That will show you the actual contents of str, which is the Ruby code I
pasted above.
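A short sketch of that round trip on a string of our own. Evaluating a literal you just generated yourself is harmless; doing the same with user-submitted text is exactly the eval risk mentioned at the top of the thread.

```ruby
# A string containing a double quote and a single backslash.
original = 'say "hi" \\'
shown = original.inspect     # the form IRB prints after =>

puts original   # prints: say "hi" followed by one backslash
puts shown      # "say \"hi\" \\"

# inspect produced a valid Ruby literal, so evaluating it rebuilds the string.
rebuilt = eval(shown)
puts rebuilt == original   # true
```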

HTH,

Brian.

Brian Candler · Jan 8, 2010

Here's the kind of standalone test I was thinking of.

----- 8< -------------------------------------------------
require 'rubygems'
require 'uv'
require 'redcloth'

snip = DATA.read
code = Uv.parse(snip, 'xhtml', 'ruby', false, 'twilight')
code.insert(0, "<notextile>")
code.insert(code.length, "</notextile>")
puts RedCloth.new(code).to_html

__END__
puts "Hello world!\n"
puts "Hello\\one backslash"
----- 8< -------------------------------------------------

And for me the output it gives is:

<pre class="twilight">puts "Hello world!\n"
puts "Hello\\one backslash"
</pre>

This looks correct to me. So can you provide an example where it fails?
Otherwise you need to look elsewhere in your application to see if
you're providing the wrong input into Uv, or you're handling the output
wrongly.

Or maybe you have an old gem with a bug which has since been fixed. I'm
using:

ultraviolet (0.10.2)
RedCloth (4.2.2)

Phil Cooper-king · Jan 8, 2010

Thanks again.

Yep, I have the same gems and get the same result running your code.

I went nuts with puts all over the place:

fromdb: "##code(ruby)\r\n'\\\\'\r\n##code\r\n"

before: "##code(ruby)\n'\\\\'\n##code\n"

before parse: "\n'\\\\'\n"

after parse: "<pre class=\"twilight\">\n'\\\\'\n</pre>"

after insert: "<notextile><pre class=\"twilight\">\n'\\\\'\n</pre></notextile>"

after sub: "<notextile><pre class=\"twilight\">\n'\\'\n</pre></notextile>\n"

So after the sub, I lose two of the backslashes.

Code:

text.sub!(/##code\((\S+)\)(.+?)##code/m, code)

Brian Candler · Jan 8, 2010

Phil said:
So after the sub, I lose two of the backslashes.

Code:

text.sub!(/##code\((\S+)\)(.+?)##code/m, code)

Ah yes, backslashes have a special interpretation in the
string-replacement part of a (g)sub too: \1 means the first capture, \2
means the second capture etc, so \\ means a single backslash.

Note that the replacement string here is two backslashes:
a\c
=> nil

The easy solution is to use the block form of sub instead.
a\\c
=> nil
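The IRB input lines behind the two results above did not survive; here is a sketch of the kind of comparison they show, using a made-up subject string "abc".

```ruby
two = "\\\\"                    # a replacement string holding two backslashes
puts two.size                   # 2

# String replacement: sub reads \\ as "one literal backslash" (\1, \2 and
# friends are back-references), so one of the two backslashes disappears.
via_string = "abc".sub(/b/, two)
puts via_string                 # a\c

# Block form: the block's return value is inserted as-is.
via_block = "abc".sub(/b/) { two }
puts via_block                  # a\\c

# Back-references are why the string form treats backslashes specially.
puts "abc".sub(/(b)/, '<\1>')   # a<b>c
```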

You could simplify your code if you rewrote to use the block form of
gsub anyway.

text.gsub!(/#>code\((\S+)\)(.+?)#>code/m) do |snip|
  # ... make a string containing the marked-up code
end

Brian Candler · Jan 8, 2010

You could simplify your code if you rewrote to use the block form of
gsub anyway.

Try this:

----- 8< -------------------------------------------------
require 'rubygems'
require 'uv'
require 'redcloth'

text = DATA.read
text.gsub!(/#>code\((\S+)\)(.+?)#>code/m) do
  "<notextile>" +
    Uv.parse($2, 'xhtml', $1, false, 'twilight') +
    "</notextile>"
end
puts RedCloth.new(text).to_html

__END__
h1. Some code

#>code(ruby)
puts "Hello world!\n"
puts "Hello\\one backslash"
#>code

h1. The end
----- 8< -------------------------------------------------

Output:

<h1>Some code</h1>
<pre class="twilight">
puts "Hello world!\n"
puts "Hello\\one backslash"
</pre><h1>The end</h1>
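For anyone following along without the gems installed, here is a minimal sketch of the same pipeline. highlight is a hypothetical stand-in for Uv.parse (it only HTML-escapes the code and wraps it in <pre>), RedCloth is left out, and the text comes from a single-quoted heredoc. It contrasts the block form with the string-replacement form used in the original snatch_code.

```ruby
require 'cgi'

# Hypothetical stand-in for Uv.parse($2, 'xhtml', $1, false, 'twilight'):
# it only HTML-escapes the code and wraps it in <pre>, so this runs without
# the ultraviolet gem.
def highlight(code, lang)
  %(<pre class="#{lang}">#{CGI.escapeHTML(code)}</pre>)
end

CODE_BLOCK = /#>code\((\S+)\)(.+?)#>code/m

# Single-quoted heredoc, so Ruby leaves the backslashes alone
# (standing in for DATA.read or a database column).
text = <<'EOS'
h1. Some code

#>code(ruby)
puts "Hello\\one backslash"
#>code
EOS

# Block form: whatever the block returns is inserted verbatim.
via_block = text.gsub(CODE_BLOCK) do
  "<notextile>" + highlight($2, $1) + "</notextile>"
end

# String form, as in the original snatch_code: the replacement is scanned
# for \1, \\ and friends, so the doubled backslash collapses to one.
m = text.match(CODE_BLOCK)
replacement = "<notextile>" + highlight(m[2], m[1]) + "</notextile>"
via_string = text.sub(CODE_BLOCK, replacement)

puts via_block
puts via_string
```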

Phil Cooper-king · Jan 8, 2010

I was just reading up on them; well, I won't forget this mistake quickly.

The easy solution is to use the block form of sub instead.

a\\c

Yep, worked like a gem.

You could simplify your code if you rewrote to use the block form of
gsub anyway.

I have to loop through the code blocks in order to parse the syntax
with Uv anyway, so I thought that while I was in the loop I might as well
replace each code block as it's parsed.

Thanks for your effort; you've been a great help.

Phil Cooper-king · Jan 8, 2010

Thanks again, it worked like a treat, in half the lines.

Similar Threads

escaping/stripping all user HTML input	1	Jun 28, 2007
Weird behaviour escaping special characters in a string	7	Feb 21, 2007
strings without escaping	3	Feb 3, 2009
Google sheets	0	Sep 14, 2022
Why is this WordPress comments form not submitting?	1	Jan 12, 2020
Stopping a kqueue	3	Jan 25, 2007
Filter table rows based on multiple checkboxes value	2	Jan 13, 2023
PHP cURL for large content and single HTTP request	1	Feb 23, 2023
