[SUMMARY] Whiteout (#34)

Ruby Quiz · Jun 9, 2005

Does this library have any practical value? Probably not. It's been suggested
in the Perl community that hacks like this are a good minor deterrent to those
trying to read source code you would rather keep hidden, but it must be stressed
that this is no form of serious security. Regardless, it's a fun little toy to
play with.

It was mentioned in the discussion that Perl, where ACME::Bleach comes from,
includes a framework for source filtering. It can be used to make modules that
modify source code much as we are doing in this quiz. Perl's Switch.pm is a
good example of this, but ironically ACME::Bleach is not.

That naturally leads to the question: can you build source filters in Ruby?
Clearly we can build ACME::Bleach, but not all source filters are as simple, I'm
afraid. Consider this:

#!/usr/local/bin/ruby -w

require "fix_my_broken_syntax"

invalid++

Now the thought here is that fix_my_broken_syntax.rb will read my source, change
it so that it does something valid, eval() it, and exit() before the invalid
code is an issue. Here's a trivial example of fix_my_broken_syntax.rb:

#!/usr/local/bin/ruby -w

puts "Fixed!"
exit

Does that work? Unfortunately, no:

$ ruby invalid.rb
invalid.rb:5: syntax error
invalid++
^

Ruby never gets to loading the library, because it's not happy with the syntax
of the first file. That makes writing a source filter for anything that isn't
valid Ruby syntax complicated, and if it is valid Ruby syntax, you can probably
just code it up in Ruby to begin with.

Except for whiteout.rb, our version of ACME::Bleach.

You can't build Ruby constructs out of whitespace alone, so some form of source
filtering is required. Luckily, we can get away with the approach described
above for this source filter, because a bunch of whitespace (with no code) is
valid Ruby syntax. It just doesn't do anything. Ruby will skip right over our
whitespace and load the library that restores and runs the code.
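
As a rough illustration of that difference, the sketch below uses eval() as a stand-in for Ruby parsing a whole file. The helper name parses? is made up for this example and is not part of any solution.

```ruby
# Returns true if Ruby accepts the given source string, false on a syntax error.
# SyntaxError is not a StandardError, so it has to be rescued by name.
def parses?(source)
  eval(source)
  true
rescue SyntaxError
  false
end

puts parses?(" \t \n\t  ")  # whitespace only: nothing to run, nothing to reject
puts parses?("invalid++")   # dangling operator: rejected before anything runs
```

The whitespace-only string is accepted, while the invalid++ string is refused at parse time, which is why a required library never gets a chance to repair it.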

Most people took this approach. Let's examine one such example by Robin
Stocker:

#!/usr/bin/ruby

#
# This is my solution for Ruby Quiz #34, Whiteout.
# Author:: Robin Stocker
#

#
# The Whiteout module includes all functionality like:
# - whiten
# - run
# - encode
# - decode
#
module Whiteout

  @@bit_to_code = { '0' => " ", '1' => "\t" }
  @@code_to_bit = @@bit_to_code.invert
  @@chars_to_ignore = [ "\n", "\r" ]

  #
  # Whitens the content of a file specified by _filename_.
  # It leaves the shebang intact, if there is one.
  # At the beginning of the file it inserts the require 'whiteout'.
  # See #encode for details about how the whitening works.
  #
  def Whiteout.whiten( filename )
    code = ''
    File.open( filename, 'r' ) do |file|
      file.each_line do |line|
        if code.empty?
          # Add shebang if there is one.
          code << line if line =~ /#!\s*.+/
          code << "#{$/}require 'whiteout'#{$/}"
        else
          code << encode( line )
        end
      end
    end
    File.open( filename, 'w' ) do |file|
      file.write( code )
    end
  end

  # ...

First, we can see that the module defines some module variables, which are
really used as constants here. Their contents hint at the encoding algorithm
we'll see later.

Then we have a method for managing the transformation of the source into
whitespace. It starts by opening the passed file and reading the code
line-by-line. If the first line is a shebang line, it's saved in the variable
code. Next, a "require 'whiteout'" line is added to code. Finally, all other
lines from the file are appended to code after being passed through an encode()
method we'll examine shortly. (Note that, as written, a first line that isn't a
shebang is neither copied nor encoded, so it is dropped.) With the contents
read and transformed, the method then reopens the source for writing and dumps
the modifications into it.
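
The header handling can be sketched on a plain string, without touching any files. This is only an illustration: split_header is a hypothetical helper, not Robin's code, and unlike the listing above it sends a non-shebang first line to the body so that it would be encoded too.

```ruby
# Hypothetical helper: builds the readable header of a whitened file and
# returns it together with the source text that still needs encoding.
def split_header(source)
  lines = source.lines
  header = +""
  # Keep the shebang readable, if the file starts with one.
  header << lines.shift if lines.first&.start_with?("#!")
  # The require line is what later triggers decoding.
  header << "\nrequire 'whiteout'\n"
  [header, lines.join]
end

header, body = split_header("#!/usr/bin/ruby\nputs 1\n")
print header  # shebang, blank line, require line
print body    # the part that would be passed through encode()
```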

The next method is the reverse process:

# ...

  #
  # Reads the file _filename_, decodes and runs it through eval.
  #
  def Whiteout.run( filename )
    text = ''
    File.open( filename, 'r' ) do |file|
      decode = false
      file.each_line do |line|
        if not decode
          # We don't want to decode the "require 'whiteout'",
          # so start decoding not before we passed it.
          decode = true if line =~ /require 'whiteout'/
        else
          text << decode( line )
        end
      end
    end
    # Run the code!
    eval text
  end

  # ...

This method again reads the passed file. It skips over the "require 'whiteout'"
line, then copies the rest of the file into the variable text, after passing it
through decode() line-by-line. The final line of the method calls eval() on
text, which should now contain the restored program.
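
The "skip everything up to the require line" step can be shown in isolation. The following is a minimal sketch working on a string rather than a file; hidden_payload is a name invented for this example.

```ruby
# Hypothetical helper: returns everything after the first line that
# mentions require 'whiteout', or an empty string if no such line exists.
def hidden_payload(source)
  lines = source.lines
  marker = lines.index { |line| line =~ /require 'whiteout'/ }
  marker ? lines.drop(marker + 1).join : ""
end

whitened = "#!/usr/bin/ruby\n\nrequire 'whiteout'\n \t\t \n\t  \t\n"
p hidden_payload(whitened)  # only the whitespace lines remain
```

In the real method, each of those remaining lines would go through decode() before the result is handed to eval().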

On to encode() and decode():

  #
  # Encodes text to "whitecode". It works like this:
  # - Chars in @@chars_to_ignore are ignored
  # - Each byte is converted to its bit representation,
  #   so that we have something like 01100001
  # - Then, it is converted to whitespace according to @@bit_to_code
  #   - 0 results in a " " (space)
  #   - 1 results in a "\t" (tab)
  #
  def Whiteout.encode( text )
    white = ''
    text.scan(/./m) do |char|
      if @@chars_to_ignore.include?( char )
        white << char
      else
        char.unpack('B8').first.scan(/./) do |bit|
          code = @@bit_to_code[bit]
          white << code
        end
      end
    end
    return white
  end

  #
  # Does the inverse of #encode: it takes "white"
  # and returns the decoded text.
  #
  def Whiteout.decode( white )
    text = ''
    char = ''
    white.scan(/./m) do |code|
      if @@chars_to_ignore.include?( code )
        text << code
      else
        char << @@code_to_bit[code]
        if char.length == 8
          text << [char].pack("B8")
          char = ''
        end
      end
    end
    return text
  end

end

# ...

The comments in there detail the exact process we're looking at here, so I'm not
going to repeat them.
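
For readers who want to try the bit-to-whitespace idea outside the module, here is a small standalone round trip. It is a sketch in the spirit of the code above, not Robin's implementation: the constant names are made up, and it works on whole characters with unpack1('B*') (available in newer Rubies) so that multi-byte characters survive as well.

```ruby
BIT_TO_CODE  = { '0' => " ", '1' => "\t" }.freeze
CODE_TO_BIT  = BIT_TO_CODE.invert.freeze
LINE_ENDINGS = ["\n", "\r"].freeze

# Turns every character except line endings into spaces (0) and tabs (1).
def encode(text)
  text.each_char.map do |char|
    next char if LINE_ENDINGS.include?(char)
    char.unpack1('B*').gsub(/[01]/, BIT_TO_CODE)
  end.join
end

# Collects eight whitespace "bits" at a time and packs them back into a byte.
def decode(white)
  out  = String.new  # binary buffer for the restored bytes
  bits = String.new
  white.each_char do |code|
    if LINE_ENDINGS.include?(code)
      out << code
    else
      bits << CODE_TO_BIT.fetch(code)
      if bits.length == 8
        out << [bits].pack('B8')
        bits.clear
      end
    end
  end
  out.force_encoding(Encoding::UTF_8)
end

white = encode("puts 'hi'\n")
p white          # nothing but spaces, tabs and the original newline
p decode(white)  # the original source again
```

The letter "a" is byte 01100001, so it comes out as space, tab, tab, four spaces, tab.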

Note that @@chars_to_ignore contains "\n" and "\r" so they are not translated.
The effect of that is that line-endings are untouched by this conversion. Some
solutions used such characters in their encoding algorithm. The gotcha there is
that any line-ending translation done to the modified source (say, FTP in
ASCII mode) will break the hidden code. Robin's solution doesn't have that
problem.

Here's the code that ties all those methods into a solution:

# ...

#
# And here's the logic part of whiteout.
# If it was run directly, whites out the files in ARGV.
# And if it was required, decodes the whitecode and runs it.
#
if __FILE__ == $0
  ARGV.each do |filename|
    Whiteout.whiten( filename )
  end
else
  Whiteout.run( $0 )
end

Again, the comment saves me some explaining.

That was Robin's first solution to a Ruby Quiz, but I never would have known
that from looking at the code. Thanks for sharing, Robin!

Obviously, a conversion of this type grossly inflates the size of the source.
Each encoded byte becomes eight whitespace characters, so the result is around
eight times the original size. A couple of solutions used zlib to control the
expansion, which I thought was clever. By compressing the source before
encoding it (and using a base three conversion), Dominik Bathom got the
inflation down to around three times instead.
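
To make the compress-then-encode idea concrete, here is one way it could look. This is a guess at the general technique, not Dominik's code: the three-symbol alphabet (space, tab, newline), the 0x01 marker byte and both method names are assumptions made for the example.

```ruby
require 'zlib'

# Assumed alphabet: one whitespace character per base three digit.
TRITS = [" ", "\t", "\n"].freeze

# Deflates the source, reads the bytes as one big integer and writes
# that integer out in base three using whitespace digits.
def to_whitespace(source)
  packed = Zlib::Deflate.deflate(source)
  # A leading 0x01 byte keeps any leading zero bytes from being lost.
  number = ("\x01".b + packed).unpack1('H*').to_i(16)
  number.digits(3).reverse.map { |digit| TRITS[digit] }.join
end

# Reverses to_whitespace: base three digits -> integer -> bytes -> inflate.
def from_whitespace(white)
  number = white.each_char.reduce(0) { |acc, ch| acc * 3 + TRITS.index(ch) }
  hex = number.to_s(16)
  hex = "0#{hex}" if hex.length.odd?
  bytes = [hex].pack('H*')
  Zlib::Inflate.inflate(bytes[1..-1]).force_encoding(Encoding::UTF_8)
end

source = "puts 'Hello, whiteout!'\n" * 20
white  = to_whitespace(source)
puts "#{source.length} characters of source, #{white.length} of whitespace"
```

In base three each compressed byte costs about 5.05 whitespace characters (the base three logarithm of 256) instead of 8, so the final size depends mostly on how well zlib compresses the input. Using newline as a digit also gives up the line-ending safety discussed earlier.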

Ara.T.Howard took a different approach, using whiteout.rb as a database to store
the trimmed files.  That was a very interesting process, demonstrated well in
the submission email.  The advantages to this approach would be no inflation
penalty and the code stays readable (just not in the original location).  The
disadvantage I see is that it requires the exact same library to be present both
at encoding and decoding, which probably makes sharing the altered code
impractical.

As always, my thanks to all who gave this little diversion an attempt. I'm sure
we'll see tons of whitespace-only code on RubyForge in the future, thanks to our
efforts.

Tomorrow begins part one of our first two-part Ruby Quiz.  Stay tuned...

Florian Groß · Jun 9, 2005

Ruby Quiz said:
That naturally leads to the question, can you build source filters in Ruby?
Clearly we can build ACME::Bleach, but not all source filters are as simple I'm
afraid. Consider this:

#!/usr/local/bin/ruby -w

require "fix_my_broken_syntax"

invalid++
[...]
Ruby never gets to loading the library, because it's not happy with the syntax
of the first file. That makes writing a source filter for anything that isn't
valid Ruby syntax complicated and if it is valid Ruby syntax, you can probably
just code it up in Ruby to begin with.

But note that if you do #!/usr/local/bin/ruby -w -r fix_my_broken_syntax
you will be able to make it work.

Ara.T.Howard · Jun 9, 2005

Ruby Quiz said:
[...]
Ruby never gets to loading the library, because it's not happy with the syntax
of the first file. That makes writing a source filter for anything that isn't
valid Ruby syntax complicated and if it is valid Ruby syntax, you can probably
just code it up in Ruby to begin with.

a hack but:

harp:~ > cat fix_my_broken_syntax.rb
src = open($0).read
src.gsub! %r/([_a-z][_a-zA-Z]*)\+\+/, '((\1+=1;\1 - 1))'
eval src
exit

harp:~ > cat a.rb
#!/usr/local/bin/ruby -r./fix_my_broken_syntax.rb
n = 41
p n++
p n

harp:~ > ./a.rb
41
42

cheers.

-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| My religion is very simple. My religion is kindness.
| --Tenzin Gyatso
===============================================================================

Brian Schröder · Jun 9, 2005

[Snip]

Obviously, a conversion of this type grossly inflates the size of the source.
Around eight times the size, to be exact. A couple of solutions used zlib to
control the expansion, which I thought was clever. By compressing the source
and then encoding() (and using a base three conversion) Dominik Bathom got
results around three times the inflation instead.

Using a base eight encoding plus zipping, you can even shrink the result
below the length of the original source. See
http://ruby.brian-schroeder.de/quiz/whiteout/

regards and thanks for the summary,

Brian

--
http://ruby.brian-schroeder.de/

Stringed instrument chords: http://chordlist.brian-schroeder.de/

Logan Capaldo · Jun 10, 2005

On 6/9/05, Ara.T.Howard said:
a hack but:

harp:~ > cat fix_my_broken_syntax.rb
src = open($0).read
src.gsub! %r/([_a-z][_a-zA-Z]*)\+\+/, '((\1+=1;\1 - 1))'
eval src
exit

harp:~ > cat a.rb
#!/usr/local/bin/ruby -r./fix_my_broken_syntax.rb
n = 41
p n++
p n

harp:~ > ./a.rb
41
42

cheers.

[snip]

I have some suggestions for alternate methods. I haven't actually
tried any of these yet, so take this with a grain of salt.

The more interesting one, I think, would be to use ParseTree, assuming
it allows (or eventually will allow) you to insert a modified
parse tree back into the interpreter. You could then traverse the tree
and look for items semantically instead of by regexps. There are
disadvantages to this, of course. You couldn't add new operators and
such, for instance, although I would imagine it would be good for
things like AOP. (It also probably would be impossible to implement
whiteout using this method.) A related option is to write a parser in
Ruby for Ruby that emits ParseTree sexps that can once again be
inserted into the interpreter. You could then modify this parser to
add whatever syntax constructs you like (new operators etc.) as long
as they could be mapped onto existing Ruby syntax. Since this is
usually the point of source filters, I see no problem with that
limitation; any more complicated and it's just another language written
in Ruby.

The other option to consider is a filter using pipes. Have two files,
one with the filterable source (i.e. written in Latin or whitespace or
whatever) and another with the regexp-based transformer, and wrap it
up in a script, e.g.:

$ cat illegible.rb
#@#@#@# -- ? : 2
dfsdasdasd
$ cat filter.rb
#!/usr/bin/env ruby
class LineNoise
  def transform(line)
    ....
  end
end

x = LineNoise.new

# Open the pipe for writing so the transformed source can be fed to ruby.
IO.popen("ruby", "w") do |rb|
  File.open("illegible.rb") do |ill|
    ill.each do |line|
      rb.print x.transform(line)
    end
  end
end
$

This gets rid of the eval nastiness but adds its own nastiness (like,
where do I find illegible.rb? etc.).

Just some ideas. Of course, we could all write our own languages that
are just Ruby with some syntax differences.

Klaus Stein · Jun 10, 2005

Ruby Quiz said:
#!/usr/local/bin/ruby -w

require "fix_my_broken_syntax"

invalid++

[ Fix it ]

Does that work? Unfortunately, no:

$ ruby invalid.rb
invalid.rb:5: syntax error
invalid++
^

Ruby never gets to loading the library, because it's not happy with the
syntax of the first file.

What about using __END__ for this?

Klaus
