Stopping String Escaping.

  • Thread starter Phil Cooper-king

Phil Cooper-king

Hi,

I'm trying to parse code snippets that users submit to a website. The problem is that when a user tries to shop escaping in their code, the escaping actually happens.

For instance, if you submit \\, Ruby treats it as a single \. Is there any way to stop this? I still need all the other \ escapes, such as \n, etc.

Thanks
Phil.
 

Brian Candler

How are you parsing them?

If you are using File.read() then no unescaping is done.

If you are parsing them using eval(), then you are inviting your machine
to be 0wned. See
http://www.ruby-doc.org/docs/ProgrammingRuby/html/taint.html

If you are parsing them some other way, then please explain it. Please
also explain what "shop escaping" is.

Regards,

Brian.
 

Phil Cooper-king

How are you parsing them?
If you are using File.read() then no unescaping is done.
I am using Rails and RedCloth. I have the plain text in the database, and at the moment the text gets parsed when the view is rendered.

I am using Uv for the syntax highlighting; I pull the code snippets out before sending the text to RedCloth.
Code:
def snatch_code(text)
  snippets = text.scan(/#>code\((\S+)\)(.+?)#>code/m)

  snippets.each do |snip|
    code = Uv.parse(snip[1], 'xhtml', snip[0], false, 'twilight')
    code.insert(0, "<notextile>")
    code.insert(code.length, "</notextile>")
    text.sub!(/#>code\((\S+)\)(.+?)#>code/m, code)
  end

  text
end
Then RedCloth parses it.

If you are parsing them using eval(), then you are inviting your machine
to be 0wned. See
http://www.ruby-doc.org/docs/ProgrammingRuby/html/taint.html
Ouch, and thanks.
If you are parsing them some other way, then please explain it. Please
also explain what "shop escaping" is.

dyslexia rules! KO!

I want to stop the escaping that isn't dealing with whitespace (tabs, newlines, etc.).

Phil.
 

Brian Candler

OK, then what I suggest is you make a standalone test case, outside of
Rails.

source = <<'EOS'
Put your sample source code here
EOS
# Print it to be sure it hasn't already been escaped by Ruby
# Now process it with Uv
# Show the intermediate state
# Now process it with Redcloth
# Show the final state

Then you can see whether the problem is with Uv, or with Redcloth.

Then the question becomes much more focussed - for example, it might be
"how do I stop Redcloth turning \\ into \ inside a <notextile> section?"
 

Phil Cooper-king

OK, then what I suggest is you make a standalone test case, outside of
Rails.

source = <<'EOS'
Put your sample source code here
EOS

Yeah, I did this as well.

Code:
require 'rubygems'
require 'uv'

un_parsed =<<ENDOF
\\
ENDOF

parsed = Uv.parse(un_parsed, "xhtml", "c++", false, "twilight")
=> \

puts un_parsed
=> \

In both cases the backslash gets lost. I expect the \ to be lost in puts, though. Using dump, I see the double backslash is still there.
 

Brian Candler

Phil said:
un_parsed =<<ENDOF
\\
ENDOF

Unfortunately, here the \\ is being turned into a single backslash by
ruby, the same as inside a quoted string. In other words, the same as
this:

irb(main):001:0> "\\".size
=> 1
irb(main):002:0> '\\'.size
=> 1

The simplest way of preventing this is to read unparsed from a file, or
you can have an inline dataset at the end of your source code, like
this:

unparsed = DATA.read
... rest of your code goes here

__END__
\\
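For a self-contained test there is one more option besides a file or DATA: quote the heredoc identifier. The skeleton earlier in the thread uses <<'EOS', and with the quotes Ruby takes the heredoc body completely raw, so \\ stays as two characters; the unquoted <<ENDOF form follows double-quoted rules instead. A minimal sketch (variable names are illustrative):

```ruby
# Sketch: the same two typed characters, \\, in three kinds of Ruby literal.

# Unquoted heredoc identifier: double-quoted rules, so \\ becomes one backslash.
double_heredoc = <<ENDOF
\\
ENDOF

# Single-quoted heredoc identifier: no escape processing at all.
raw_heredoc = <<'EOS'
\\
EOS

# Single-quoted string literal: \\ is still an escape for one backslash.
single_quoted = '\\'

puts double_heredoc.chomp.size   # 1
puts raw_heredoc.chomp.size      # 2
puts single_quoted.size          # 1
```

One practical difference: DATA is only defined for the script being run directly, whereas a quoted heredoc works in any file.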
I expect the \ to be lost in puts, though.

No, puts *never* converts two backslashes into one. If your string
contains two backslashes, puts will show two backslashes.
Using dump, I see the double backslash is still there.

No, this is the opposite. String#inspect turns a raw string into a
quoted string for display purposes, and as part of this quoting a single
backslash is displayed as two backslashes.

Look at this:

irb(main):001:0> s = 92.chr
=> "\\"
irb(main):002:0> s.size
=> 1
irb(main):003:0> puts s
\
=> nil
irb(main):004:0> s2 = s + s
=> "\\\\"
irb(main):005:0> s2.size
=> 2
irb(main):006:0> puts s2
\\
=> nil

Hopefully it's clear from the above that string s has one character (a
single backslash), and s2 has two backslashes. But these are displayed
in quoted form in irb as

"\\"
"\\\\"

respectively. puts displays them correctly.

Similarly, a single newline character is displayed as backslash-n when
inspect gives the quoted form; whereas puts actually prints a newline.

irb(main):009:0> nl = 10.chr
=> "\n"
irb(main):010:0> nl.size
=> 1
irb(main):011:0> puts nl

=> nil
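The same comparison as a small script rather than an irb session: p shows String#inspect (as irb does), while puts writes the characters themselves. A sketch using the same 92.chr and 10.chr values:

```ruby
# Sketch: compare the real contents of a string with its inspect form.
backslash = 92.chr
newline   = 10.chr

[backslash, newline].each do |s|
  # size counts real characters; inspect is the quoted form that irb and p display
  puts "size=#{s.size} inspect=#{s.inspect} inspect.size=#{s.inspect.size}"
end

p backslash     # displays the quoted literal "\\"
puts backslash  # displays the single character \
```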

So try your test case again:
(1) Use the DATA.read / __END__ to get the test source in
(2) Use 'puts' and not 'dump' to see clearly what you have
 

Phil Cooper-king

Hopefully it's clear from the above
Yes, thank you.
So try your test case again:
(1) Use the DATA.read / __END__ to get the test source in
(2) Use 'puts' and not 'dump' to see clearly what you have

Code:
require 'rubygems'
require 'redcloth'

data_read = DATA.read
string = "\\"

puts RedCloth.new(string).to_html
puts RedCloth.new(data_read).to_html

__END__
\\

yields
<p>\</p>
<p>\\</p>

Although I have no idea how to treat a string as a file.
Is all this to do with encoding? (Sorry if that's a dense question.)

ERB results are similar, which I would have thought is what happens in Rails anyway:
=> "_erbout = ''; _erbout.concat \"\\\\\"; _erbout"
 

Brian Candler

Phil said:
Hopefully it's clear from the above
Yes, thank you.
So try your test case again:
(1) Use the DATA.read / __END__ to get the test source in
(2) Use 'puts' and not 'dump' to see clearly what you have
Code:
...
data_read = DATA.read
string = "\\" ...
__END__
\\

So in this program, 'data_read' contains two backslash characters; and
'string' contains a single backslash character.
yields
<p>\</p>
<p>\\</p>

That looks correct to me - HTML doesn't need a backslash to be escaped.
So now add Uv into your test to see if that is munging the backslashes.
Although I have no idea how to treat a string as a file.

A string is just a string. In ruby 1.8 it's a sequence of bytes; in ruby
1.9 it's a sequence of characters. But that doesn't matter here; a
backslash is a backslash, and is both a single character and a single
byte in either ASCII or UTF-8.
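If you want to check the one-byte claim yourself, here is a sketch assuming a modern Ruby (2.0 or later, where strings carry an encoding and String#bytes returns an array):

```ruby
# Sketch: a backslash is one character and one byte (92) in ASCII and UTF-8.
backslash = "\\"

%w[US-ASCII UTF-8].each do |name|
  s = backslash.encode(name)
  puts "#{s.encoding}: #{s.size} character(s), #{s.bytesize} byte(s), bytes=#{s.bytes.inspect}"
end
```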

However if you enter a string *literal* in a ruby program (or in IRB),
then it is parsed with backslash escaping rules to turn it into an
actual String object. For example:

a = "abc\ndef"
b = 'abc\ndef'

string 'a' contains 7 characters (a,b,c,newline,d,e,f), whereas string b
contains 8 characters (a,b,c,backslash,n,d,e,f). This is because there
are different escaping rules for double-quoted and single-quoted
strings.

In a single-quoted string literal, \' is a single quote, and \\ is a
backslash, and everything else is treated literally, so \n is two
characters \ and n.

In a double-quoted string literal, \" is a double quote, \n is a
newline, \\ is a backslash, and there's a whole load of other expansion
including #{...} for expression interpolation and #@... for instance
variable substitution.
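A sketch of those rules side by side; the instance variable @who is hypothetical and exists only to show the #@ form:

```ruby
# Sketch: the same-looking text under single- and double-quoted rules.
a = "abc\ndef"     # \n is a newline: 7 characters
b = 'abc\ndef'     # \n is backslash + n: 8 characters

sq_quote     = 'it\'s'       # \' is a single quote
sq_backslash = '\\'          # \\ is one backslash
dq_quote     = "say \"hi\""  # \" is a double quote

@who = "Phil"                # hypothetical instance variable for the #@ form
interpolated = "1 + 1 = #{1 + 1}, hello #@who"

puts a.size, b.size
puts sq_quote, sq_backslash.size, dq_quote
puts interpolated
```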
ERB results are similar, which I would have thought is what happens in Rails anyway:
=> "_erbout = ''; _erbout.concat \"\\\\\"; _erbout"

Now you're just scaring yourself with backslash escaping :)

Firstly, note that you passed a single backslash character to ERB.
That's what the string literal "\\" creates.

ERB compiled it to the following Ruby code:

_erbout = ''; _erbout.concat "\\"; _erbout

which just appends a single backslash to _erbout, which is what you
expect.

However, IRB displays the returned string from ERB.new using
String#inspect, so it is turned into a double-quoted string. This means:
1. A " is added to the start and end of the string
2. Any " within the string is displayed as \"
3. Any \ within the string is displayed as \\

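In other words, String#inspect turns a string into a Ruby string literal, something that you could paste directly into IRB. Try it: print the string with puts instead of letting IRB inspect it (for example, str = ERB.new("\\").src followed by puts str).

That will show you the actual contents of str, which is the Ruby code I
pasted above.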
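The same experiment as a script, as a sketch: ERB#src returns the Ruby code a template compiles to, and ERB#result runs it. The exact generated code varies between Ruby versions, so treat the printed src as illustrative, but the backslash handling is the same:

```ruby
require 'erb'

# Sketch: a template consisting of one backslash character.
erb = ERB.new("\\")

src = erb.src   # the Ruby code ERB compiled the template into
puts src        # raw text: the backslash appears escaped, as "\\", inside the code
p src           # inspect form: each of those backslashes is doubled again

out = erb.result   # run the compiled code
puts out.size      # one backslash, as expected
```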

HTH,

Brian.
 

Brian Candler

Here's the kind of standalone test I was thinking of.

----- 8< -------------------------------------------------
require 'rubygems'
require 'uv'
require 'redcloth'

snip = DATA.read
code = Uv.parse(snip, 'xhtml', 'ruby', false, 'twilight')
code.insert(0, "<notextile>")
code.insert(code.length, "</notextile>")
puts RedCloth.new(code).to_html

__END__
puts "Hello world!\n"
puts "Hello\\one backslash"
----- 8< -------------------------------------------------

And for me the output it gives is:

<pre class="twilight">puts <span class="String"><span
class="String">&quot;</span>Hello world!<span
class="StringConstant">\n</span><span
class="String">&quot;</span></span>
puts <span class="String"><span class="String">&quot;</span>Hello<span
class="StringConstant">\\</span>one backslash<span
class="String">&quot;</span></span>
</pre>

This looks correct to me. So can you provide an example where it fails?
Otherwise you need to look elsewhere in your application to see if
you're providing the wrong input into Uv, or you're handling the output
wrongly.

Or maybe you have an old gem with a bug which has since been fixed. I'm
using:

ultraviolet (0.10.2)
RedCloth (4.2.2)
 

Phil Cooper-king

Thanks again.

Yep, I have the same gems and get the same result running your code.

I went nuts with puts all over the place:

fromdb: "##code(ruby)\r\n'\\\\'\r\n##code\r\n"

before: "##code(ruby)\n'\\\\'\n##code\n"

before parse: "\n'\\\\'\n"

after parse: "<pre class=\"twilight\">\n<span class=\"String\"><span class=\"String\">'</span><span class=\"StringConstant\">\\\\</span><span class=\"String\">'</span></span>\n</pre>"

after insert: "<notextile><pre class=\"twilight\">\n<span class=\"String\"><span class=\"String\">'</span><span class=\"StringConstant\">\\\\</span><span class=\"String\">'</span></span>\n</pre></notextile>"

after sub: "<notextile><pre class=\"twilight\">\n<span class=\"String\"><span class=\"String\">'</span><span class=\"StringConstant\">\\</span><span class=\"String\">'</span></span>\n</pre></notextile>\n"

So after the sub I lose two of the backslashes.
Code:
text.sub!(/##code\((\S+)\)(.+?)##code/m, code)
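The loss can be reproduced without Uv or RedCloth. A minimal sketch, using the "before" value from the debug output and a hard-coded, hypothetical stand-in for the highlighted HTML:

```ruby
# Sketch: reproduce the lost backslash with plain strings (no Uv, no RedCloth).
text = "##code(ruby)\n'\\\\'\n##code\n"   # same value as the "before" line
re   = /##code\((\S+)\)(.+?)##code/m

# Hypothetical stand-in for the highlighted HTML: it contains two backslashes.
code = "<notextile><pre>'\\\\'</pre></notextile>"

result = text.sub(re, code)

puts code.chars.count("\\")     # 2 backslashes going in
puts result.chars.count("\\")   # 1 backslash coming out of sub
```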
 

Brian Candler

Phil said:
So after the sub I lose two of the backslashes.
Code:
text.sub!(/##code\((\S+)\)(.+?)##code/m, code)

Ah yes, backslashes have a special interpretation in the
string-replacement part of a (g)sub too: \1 means the first capture, \2
means the second capture etc, so \\ means a single backslash.

Note that the replacement string here is two backslashes:

irb(main):001:0> puts "abc".sub("b", "\\\\")
a\c
=> nil

The easy solution is to use the block form of sub instead.

irb(main):002:0> puts "abc".sub("b") { "\\\\" }
a\\c
=> nil
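A slightly fuller sketch of the replacement-string rules, with made-up sample text: \1 and \2 pick up captures, \\ collapses to a single backslash, and a block's return value is used verbatim. Doubling every backslash in the replacement beforehand is shown as an alternative to the block form:

```ruby
# Sketch: backslashes in a sub replacement string versus a block.
text = "before ##code(ruby)'x'##code after"
re   = /##code\((\S+)\)(.+?)##code/m

# \1 and \2 in a replacement string refer to the capture groups.
swapped = text.sub(re, '[\2 in \1]')

# Two backslash characters in a replacement string come out as one.
html = '<span>\\\\</span>'   # single-quoted literal: html holds 2 backslashes
via_string = text.sub(re, html)

# The block's return value is inserted verbatim: both backslashes survive.
via_block = text.sub(re) { html }

# Alternative: double every backslash first, so sub's collapsing undoes it.
escaped     = html.gsub("\\") { "\\\\" }
via_escaped = text.sub(re, escaped)

puts swapped
puts via_string
puts via_block
puts via_escaped
```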

You could simplify your code if you rewrote to use the block form of
gsub anyway.

text.gsub!(/#>code\((\S+)\)(.+?)#>code/m) do |snip|
  ... make a string containing the marked-up code
end
 

Brian Candler

You could simplify your code if you rewrote to use the block form of
gsub anyway.

Try this:

----- 8< -------------------------------------------------
require 'rubygems'
require 'uv'
require 'redcloth'

text = DATA.read
text.gsub!(/#>code\((\S+)\)(.+?)#>code/m) do
  "<notextile>" +
    Uv.parse($2, 'xhtml', $1, false, 'twilight') +
    "</notextile>"
end
puts RedCloth.new(text).to_html

__END__
h1. Some code

#>code(ruby)
puts "Hello world!\n"
puts "Hello\\one backslash"
#>code

h1. The end
----- 8< -------------------------------------------------

Output:

<h1>Some code</h1>
<pre class="twilight">
puts <span class="String"><span class="String">&quot;</span>Hello
world!<span class="StringConstant">\n</span><span
class="String">&quot;</span></span>
puts <span class="String"><span class="String">&quot;</span>Hello<span
class="StringConstant">\\</span>one backslash<span
class="String">&quot;</span></span>
</pre><h1>The end</h1>
 

Phil Cooper-king

:) I was just reading about them. Well, I won't forget this mistake quickly.

The easy solution is to use the block form of sub instead.

Yep, worked like a gem :D

You could simplify your code if you rewrote to use the block form of
gsub anyway.

I have to loop through the code blocks to parse the syntax with Uv anyway, so I thought that while I was in the loop I may as well replace each code block as it's parsed.
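Keeping the loop works too, as long as each replacement goes through the block form of sub!. A sketch along those lines; highlight is a hypothetical stand-in for the Uv.parse call so the example runs without gems, and the ##code marker follows the debug output above:

```ruby
# Hypothetical stand-in for Uv.parse(source, 'xhtml', lang, false, 'twilight').
def highlight(source, lang)
  "<pre class=\"#{lang}\">#{source}</pre>"
end

CODE_RE = /##code\((\S+)\)(.+?)##code/m

def snatch_code(text)
  text.scan(CODE_RE).each do |lang, source|
    code = "<notextile>" + highlight(source, lang) + "</notextile>"
    # Block form: the block's value is inserted verbatim, so backslashes survive.
    # Each sub! call replaces the first snippet that has not been replaced yet.
    text.sub!(CODE_RE) { code }
  end
  text
end

sample = "##code(ruby)\n'\\\\'\n##code\n"
puts snatch_code(sample.dup)
```

The gsub version avoids pairing scan with sub! at all, since every match is replaced in a single pass.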


Thanks for your effort; you've been a great help.
 

Phil Cooper-king

Thanks again, it worked like a treat, in half the lines.
 
