Convert to binary and convert back to strings

Steven D'Aprano

I would xor each char in it with 'U' as a mild form of obfuscation...

I've often wished this would work.

>>> 'a' ^ 'U'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: unsupported operand type(s) for ^: 'str' and 'str'

instead of the more verbose

>>> chr(ord('a') ^ ord('U'))
'4'


Look at the array module to get things you can xor, or use ord() on
each byte, and chr()

Arrays don't support XOR any more than strings do. What's the advantage to
using the array module if you still have to jump through hoops to get it
to work?

''.join([chr(ord(c) ^ 85) for c in text])

is probably about as simple as you can get.
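For readers on Python 3, here is a minimal sketch of the same idea applied to the original question (write the string to a file in an unreadable form and read it back). It assumes Python 3, where iterating over bytes yields integers, so ord() and chr() are not needed. The helper name flip, the file name secret.bin, and the use of a temporary directory are illustrative choices, not anything from the thread.

```python
import os
import tempfile

MASK = ord("U")  # 85, the mask suggested in the thread

def flip(data: bytes) -> bytes:
    # XOR every byte with the mask; applying flip twice restores the input.
    return bytes(b ^ MASK for b in data)

original = "supercalifragilisticexpialidocius"

# Hypothetical location: a scratch file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "secret.bin")

with open(path, "wb") as f:
    f.write(flip(original.encode("utf-8")))

with open(path, "rb") as f:
    restored = flip(f.read()).decode("utf-8")

print(restored == original)
```

This is obfuscation only: anyone who guesses the single-byte mask can read the file.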
 
Paul Rubin

Steven D'Aprano said:
Arrays don't support XOR any more than strings do. What's the advantage to
using the array module if you still have to jump through hoops to get it
to work?

It's a lot faster. Sometimes that matters.
 
Steven D'Aprano

It's a lot faster. Sometimes that matters.

But is it?

>>> import array, timeit
>>> def flip1(text):
...     # XOR each character with the mask via ord()/chr()
...     mask = ord('U')
...     return ''.join([chr(ord(c) ^ mask) for c in text])
...
>>> def flip2(text):
...     # same XOR, applied to the integer items of a signed-byte array
...     mask = ord('U')
...     text = array.array('b', [b ^ mask for b in array.array('b',text)])
...     return text.tostring()
...
>>> text = "Python"
>>> setup = 'from __main__ import flip1, flip2, text'
>>> timeit.Timer('flip1(text)', setup).repeat()
[25.757978916168213, 23.174431085586548, 23.188597917556763]
>>> timeit.Timer('flip2(text)', setup).repeat()
[25.736327886581421, 25.76999306678772, 26.135013818740845]


For a short string like "Python", using an array is a tiny bit slower,
and the code is more complex.

What about for a long string?
>>> text = 'Python'*1000
>>> timeit.Timer('flip1(text)', setup).repeat(3, 2000)
[24.483185052871704, 26.007826089859009, 24.498424053192139]
>>> timeit.Timer('flip2(text)', setup).repeat(3, 2000)
[12.18204402923584, 12.342558860778809, 12.16040301322937]

Well, that answers that question -- if you're converting a long string,
using array is faster. If it is a short string, it doesn't make much
difference.
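To repeat this comparison on a current interpreter, something like the following sketch might do. It assumes Python 3 and bytes input (tostring() is gone there, so tobytes() is used instead), and the helper best_time is a name made up for this example. No timings are quoted because they depend on the machine and interpreter; the numbers above come from the original poster's setup.

```python
import array
import timeit

MASK = ord("U")

def flip1(data):
    # List comprehension over the bytes, one XOR per byte.
    return bytes([b ^ MASK for b in data])

def flip2(data):
    # Same XOR, routed through an unsigned-byte array.
    arr = array.array("B", [b ^ MASK for b in array.array("B", data)])
    return arr.tobytes()

def best_time(func, data, number):
    # Best of three runs, each calling func(data) `number` times.
    return min(timeit.repeat(lambda: func(data), repeat=3, number=number))

for label, data, number in [("short", b"Python", 2000),
                            ("long", b"Python" * 1000, 20)]:
    print(label, best_time(flip1, data, number), best_time(flip2, data, number))
```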
 
Paul Rubin

Steven D'Aprano said:
For a short string like "Python", using an array is a tiny bit slower, at
the cost of more complex code.... if you're converting a long string,
using array is faster. If it is a short string, it doesn't make much
difference.

I modified your array version slightly:

from array import array

def flip3(text):
    n = len(text)
    mask = ord('U')
    text = array('b', text)
    for i in xrange(n):
        text[i] ^= mask    # XOR each item in place
    return text.tostring()

and I got flip3("Python") a little faster than the listcomp version,
but yeah, I was concerned mostly about long strings.

For fixed-sized short strings, using array('l') and unrolling the loop
makes a big difference:

text = "Pythonic"
mask = array('l','UUUU')[0]    # assumes a 4-byte 'l' (32-bit C long)

def flip4(text):
    text = array('l', text)    # 8 chars become two 4-byte items
    text[0] ^= mask
    text[1] ^= mask
    return text.tostring()

timeit.Timer('flip1(text)', setup).repeat() # your version
[35.932021141052246, 36.262560844421387, 40.019834041595459]
timeit.Timer('flip3(text)', setup).repeat() # flip3 above
[33.44039511680603, 31.375681161880493, 31.374078035354614]
timeit.Timer('flip4(text)', setup).repeat() # flip4 above
[15.349261045455933, 15.526498794555664, 15.351589202880859]

See http://www.nightsong.com/phr/crypto/p3.py for an encryption
routine written this way.
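The point of flip4 is that one XOR can cover several bytes at once. A related trick in Python 3, which is a different technique from the array('l') code above and not something posted in the thread, is to treat the whole byte string as one big integer. The name flip_wide is made up for this sketch, and no claim is made here about how its speed compares.

```python
def flip_wide(data: bytes, mask: int = 0x55) -> bytes:
    # Build an integer whose every byte equals the mask (0x55 is ord('U')),
    # then XOR the whole input against it in a single operation.
    n = len(data)
    wide_mask = int.from_bytes(bytes([mask]) * n, "big")
    return (int.from_bytes(data, "big") ^ wide_mask).to_bytes(n, "big")

flipped = flip_wide(b"Pythonic")
print(flip_wide(flipped))
```

Unlike flip4, this does not depend on the platform's C long size or on the input length being a multiple of it.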
 
Hendrik van Rooyen

I've often wished this would work.

>>> 'a' ^ 'U'
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: unsupported operand type(s) for ^: 'str' and 'str'

instead of the more verbose

>>> chr(ord('a') ^ ord('U'))
'4'

You are not alone in this - doing something simple, like calculating a BCC
on a string or a checksum like the one at the end of a line in an Intel hex
file, is a bit of a pain in Python.

Arrays don't support XOR any more than strings do. What's the advantage to
using the array module if you still have to jump through hoops to get it
to work?

I think you will have fewer function calls, but I may be wrong:

import array

s = 'some string that needs a bcc appended'
ar = array.array('B',s)
bcc = 0
for x in ar[:]:
    bcc ^= x
ar.append(bcc)
s=ar.tostring()

''.join([chr(ord(c) ^ 85) for c in text])

is probably about as simple as you can get.

This is nice and compact.

It would be very nice if you could just use a single char string like
an int and apply the operators to it - the Python way seems so
left-handed - make it an int, do the work, make it back into a string -
and all the time we are working on essentially a one byte value...

- Hendrik
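In Python 3 much of this pain goes away, because iterating over a bytes object yields integers directly. A minimal sketch of both calculations mentioned above, assuming Python 3: the function names are made up for the example, and the Intel HEX checksum is taken to be the two's complement of the low byte of the sum of the record's bytes.

```python
def bcc(data: bytes) -> int:
    # XOR of all bytes; each item of a bytes object is already an int.
    result = 0
    for b in data:
        result ^= b
    return result

def intel_hex_checksum(record: bytes) -> int:
    # Two's complement of the least significant byte of the byte sum.
    return (-sum(record)) & 0xFF

message = b"some string that needs a bcc appended"
framed = message + bytes([bcc(message)])

# Bytes of a sample record (count, address, type, data) without its checksum.
record = bytes.fromhex("0300300002337A")
print(hex(intel_hex_checksum(record)))
```

A receiver can check the framed message by XORing every byte, including the appended BCC: the result is zero when nothing was corrupted.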
 
Paul Rubin

Hendrik van Rooyen said:
s = 'some string that needs a bcc appended'
ar = array.array('B',s)
bcc = 0
for x in ar[:]:
    bcc ^= x
ar.append(bcc)
s=ar.tostring()

Untested:

import array, operator
s = 'some string that needs a bcc appended'
ar = array.array('B',s)
s += chr(reduce(operator.xor, ar))
 
Mikael Olofsson

Neil said:
Woah! You better quadruple it instead.
How about Double Pig Latin?
No, wait! Use the feared UDPLUD code.
You go Ubbi Dubbi to Pig Latin, and then Ubbi Dubbi again.
Let's see here... Ubububythubububonubpubay
That's what I call ubububeautubububifubububulbubay.

That looks slightly like the toy language that Swedish kids refer to as
the robber's language: You double each consonant, and place an o between
the two copies. So, applying that to Python, you would get
popytothohonon. Actually, one of the largest Swedish telecom companies
has used this toy language in one of their commercials.
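The rule described above is short enough to sketch in a few lines. This assumes Python 3 and treats y as a vowel, which is what the popytothohonon example implies; the function name robber and the consonant set are illustrative choices.

```python
CONSONANTS = set("bcdfghjklmnpqrstvwxz")  # y is treated as a vowel here

def robber(text: str) -> str:
    # Each consonant becomes consonant + "o" + consonant; everything else is kept.
    return "".join(
        c + "o" + c.lower() if c.lower() in CONSONANTS else c
        for c in text
    )

print(robber("python"))
```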

/MiO
 
Eric Pederson

Harlin said:
Hi...

I would like to take a string like 'supercalifragilisticexpialidocius'
and write it to a file in binary forms -- this way a user cannot read
the string in case they were try to open in something like ascii text
editor. I'd also like to be able to read the binary formed data back
into string format so that it shows the original value. Is there any
way to do this in Python?

Thanks!

Harlin

To my mind, the more sensible job you do at programming this the worse
off you are, unless you use strong encryption. There are nearly
infinite approaches, so the random approach you use will be part of the
"security" of the obfuscation.

OK, I am not really taking this so seriously, but it is a fun question
(Python makes these minor things fun). Is there any way to do this in
Python? You bet, so many ways... here's another:

s="""I would like to take a string like 'supercalifragilisticexpialidocius'
and write it to a file in binary forms -- this way a user cannot read
the string in case they were try to open in something like ascii text
editor. I'd also like to be able to read the binary formed data back
into string format so that it shows the original value. Is there any
way to do this in Python?"""

s0=s+"$"
s2="0 ".join([str(ord(c)) for c in s0])
s1="".join([chr(int(i[:-1])) for i in s2.split(" ")[:-1]])+chr(int(s2[-1]))[:-1]

def codeMe(s):
    # append a "$" sentinel, then write each character code followed by "0"
    s0=s+"$"
    return "0 ".join([str(ord(c)) for c in s0])

def uncodeMe(s):
    # drop the sentinel token and strip the trailing "0" from every other one
    return "".join([chr(int(i[:-1])) for i in s.split(" ")[:-1]])+chr(int(s[-1]))[:-1]

def testit(s):
    s2=codeMe(s)
    s1=uncodeMe(s2)
    strings={"original":s, "obfuscated":s2, "decoded":s1}
    for k in strings.keys():
        print k,": ","\n",strings[k], "\n\n"

testit(s)

-------------
The obfuscated text looks like this:

730 320 1190 1110 1170 1080 1000 320 1080 1050 1070 1010 320 1160 1110
320 1160 970 1070 1010 320 970 320 1150 1160 1140 1050 1100 1030 320
1080 1050 1070 1010 320 390 1150 1170 1120 1010 1140 990 970 1080 1050
1020 1140 970 1030 1050 1080 1050 1150 1160 1050 990 1010 1200 1120 1050
970 1080 1050 1000 1110 990 1050 1170 1150 390 100 970 1100 1000 320
1190 1140 1050 1160 1010 320 1050 1160 320 1160 1110 320 970 320 1020
1050 1080 1010 320 1050 1100 320 980 1050 1100 970 1140 1210 320 1020
1110 1140 1090 1150 320 450 450 320 1160 1040 1050 1150 320 1190 970
1210 320 970 320 1170 1150 1010 1140 320 990 970 1100 1100 1110 1160 320
1140 1010 970 1000 100 1160 1040 1010 320 1150 1160 1140 1050 1100 1030
320 1050 1100 320 990 970 1150 1010 320 1160 1040 1010 1210 320 1190
1010 1140 1010 320 1160 1140 1210 320 1160 1110 320 1110 1120 1010 1100
320 1050 1100 320 1150 1110 1090 1010 1160 1040 1050 1100 1030 320 1080
1050 1070 1010 320 970 1150 990 1050 1050 320 1160 1010 1200 1160 100
1010 1000 1050 1160 1110 1140 460 320 730 390 1000 320 970 1080 1150
1110 320 1080 1050 1070 1010 320 1160 1110 320 980 1010 320 970 980 1080
1010 320 1160 1110 320 1140 1010 970 1000 320 1160 1040 1010 320 980
1050 1100 970 1140 1210 320 1020 1110 1140 1090 1010 1000 320 1000 970
1160 970 320 980 970 990 1070 100 1050 1100 1160 1110 320 1150 1160 1140
1050 1100 1030 320 1020 1110 1140 1090 970 1160 320 1150 1110 320 1160
1040 970 1160 320 1050 1160 320 1150 1040 1110 1190 1150 320 1160 1040
1010 320 1110 1140 1050 1030 1050 1100 970 1080 320 1180 970 1080 1170
1010 460 320 730 1150 320 1160 1040 1010 1140 1010 320 970 1100 1210 100
1190 970 1210 320 1160 1110 320 1000 1110 320 1160 1040 1050 1150 320
1050 1100 320 800 1210 1160 1040 1110 1100 630 36

Of course some overly curious application user may note the pattern of
"0" endings, strip those off, concatenate the numbers, and try several
conversions on them, at which point they may figure this out and
contact you to gloat that they have hacked the file.

That's when you recruit them onto your development team and give them
some real work. :)
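For illustration, here is roughly what that curious user might end up writing. It is a sketch assuming Python 3 and the encoding shown above (each character code followed by "0", tokens separated by spaces, with an unpadded sentinel as the last token); code_me and crack are names invented for this example.

```python
def code_me(text: str) -> str:
    # Same scheme as above: add a "$" sentinel, join the codes with "0 ".
    return "0 ".join(str(ord(c)) for c in text + "$")

def crack(obfuscated: str) -> str:
    tokens = obfuscated.split(" ")
    body = tokens[:-1]  # the last token is the unpadded sentinel
    # Strip the trailing "0" from each token and turn the number into a character.
    return "".join(chr(int(tok[:-1])) for tok in body)

secret = code_me("Is there any\nway to do this in Python?")
print(crack(secret))
```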

Have fun


EP
 
Hendrik van Rooyen

Paul Rubin said:
Hendrik van Rooyen said:
s = 'some string that needs a bcc appended'
ar = array.array('B',s)
bcc = 0
for x in ar[:]:
    bcc ^= x
ar.append(bcc)
s=ar.tostring()

Untested:

import operator
s = 'some string that needs a bcc appended'
ar = array.array('B',s)
s += chr(reduce(operator.xor, ar))

Yikes! - someday soon I am going to read the docs on
what reduce does...

Won't this be slow because of the double function call on each char?

- Hendrik
 
Paul Rubin

Hendrik van Rooyen said:
Yikes! - someday soon I am going to read the docs on what reduce does...

Reduce just intersperses an operator over a sequence. For example,

reduce(operator.add, (a,b,c,d,e))

is a+b+c+d+e.

Won't this be slow because of the double function call on each char?

I think there's the same number of func calls, one xor per char.
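A small sketch may make the comparison concrete. It assumes Python 3, where reduce lives in functools rather than being a builtin, and where a bytes object can be reduced directly because its items are integers. The variable names are illustrative.

```python
from functools import reduce
import operator

data = b"some string that needs a bcc appended"

# reduce(op, (a, b, c, d, e)) evaluates op(op(op(op(a, b), c), d), e).
total = reduce(operator.add, (1, 2, 3, 4, 5))

# BCC via reduce: one operator.xor call per byte, starting from 0.
bcc_reduce = reduce(operator.xor, data, 0)

# The explicit loop performs the same number of XORs.
bcc_loop = 0
for b in data:
    bcc_loop ^= b

print(total, bcc_reduce == bcc_loop)
```

Passing 0 as the initial value also makes the reduce version work on an empty input, where it would otherwise raise TypeError.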
 
