lightweight encryption of text file

  • Thread starter Daniel Fetchinson
Daniel Fetchinson

I have a plain text file which I would like to protect in a very
simple minded, yet for my purposes sufficient, way. I'd like to
encrypt/convert it into a binary file in such a way that possession of
a password allows anyone to convert it back into the original text
file while not possessing the password one would only see the
following with the standard linux utility 'file':

[fetchinson@fetch ~]$ file encrypted.data
encrypted.data: data

and the effort required to convert the file back to the original text
file without the password would be equivalent to guessing the
password.

I'm fully aware of the security implications of this loose
specification, but for my purposes this would be a good solution.

What would be the simplest way to achieve this using preferably stock
python without 3rd party modules? If a not too complex 3rd party
module made it really simple that would be acceptable too.

Cheers,
Daniel
 
Steven D'Aprano

I have a plain text file which I would like to protect in a very simple
minded, yet for my purposes sufficient, way. I'd like to encrypt/convert
it into a binary file in such a way that possession of a password allows
anyone to convert it back into the original text file while not
possessing the password one would only see the following with the
standard linux utility 'file':

[fetchinson@fetch ~]$ file encrypted.data
encrypted.data: data

If that is your sole requirement, then the addition of a single non-text
byte (say, a null) anywhere in the file would be sufficient to have file
identify it as data. You say "encrypt/convert" -- does this mean that you
don't care if people can read the text in a hex editor, so long as file
identifies it as data?

Would something like a binary Vigenere Cipher be sufficient?


# Untested
def encrypt(plaintext, password):
    cipher = []
    for i, c in enumerate(plaintext):
        shift = password[i % len(password)]
        shift = ord(shift)
        cipher.append((ord(c) + shift) % 256)
    return ''.join([chr(n) for n in cipher])

def decrypt(ciphertext, password):
    plain = []
    for i, c in enumerate(ciphertext):
        shift = password[i % len(password)]
        shift = ord(shift)
        plain.append((256 + ord(c) - shift) % 256)
    return ''.join([chr(n) for n in plain])

(How times have changed... once upon a time, the Vigenere Cipher was
considered the gold-standard unbreakable encryption technology. Now it
merely qualifies as obfuscation.)

Is it acceptable if there is a chance (small, possibly vanishingly small)
that file will identify it as text? As far as I know, even the
heavyweight "serious" encryption algorithms don't *guarantee* that the
cipher text will include non-text bytes.

and the effort required to convert the file back to the original text
file without the password would be equivalent to guessing the password.

If you seriously mean that, then "lightweight encryption" won't do the
job, because it is vulnerable to frequency analysis, which is easier than
guessing the password. You need proper, heavy-weight encryption.
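To make the point concrete, here is a minimal Python 3 sketch (the code in this thread is Python 2) of how little effort a repeating-key cipher demands from an attacker. It assumes the attacker can guess a stretch of plaintext at least as long as the password and knows the password length; `recover_key` and the sample strings are hypothetical, made up for illustration. Frequency analysis reaches the same result without known plaintext, just with more work.

```python
def encrypt(plaintext: bytes, password: bytes) -> bytes:
    """Byte-wise Vigenere: add the repeating password modulo 256."""
    return bytes((p + password[i % len(password)]) % 256
                 for i, p in enumerate(plaintext))


def decrypt(ciphertext: bytes, password: bytes) -> bytes:
    """Inverse of encrypt: subtract the repeating password modulo 256."""
    return bytes((c - password[i % len(password)]) % 256
                 for i, c in enumerate(ciphertext))


def recover_key(ciphertext: bytes, known_prefix: bytes, key_len: int) -> bytes:
    """Recover the password from a known plaintext prefix.

    Assumes len(known_prefix) >= key_len and that key_len is known
    (in practice it can be found by trying small lengths).
    """
    return bytes((ciphertext[i] - known_prefix[i]) % 256
                 for i in range(key_len))


secret = b"Dear Bob, the launch code is 0000."
password = b"hunter2"
ciphertext = encrypt(secret, password)

# The attacker guesses that the text starts with "Dear Bob" (8 bytes >= 7).
guessed = recover_key(ciphertext, b"Dear Bob", len(password))
print(guessed)                       # the password falls out directly
print(decrypt(ciphertext, guessed))  # and so does the whole message
```

No password guessing is involved at any point: one subtraction per key byte is enough.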

I'm fully aware of the security implications of this loose
specification, but for my purposes this would be a good solution.

Can you explain what your objection to real encryption is? Are you
concerned about the complex API? The memory requirements and processing
power required? (Neither of which are particularly high for small text
files on a modern PC, but perhaps you have to encrypt tens of millions of
huge files on an underpowered device...)

What would be the simplest way to achieve this using preferably stock
python without 3rd party modules? If a not too complex 3rd party module
made it really simple that would be acceptable too.

The problem is that, as I see it, you've assumed a solution rather than
stating what your requirements are. I'm *guessing* that you are more
concerned about having to learn a complex API than about actually
*requiring* a lightweight encryption algorithm. If that's the case, then
something serious like blowfish or similar would be perfectly acceptable
to you, so long as the API was simple.

(On the other hand, perhaps vulnerability to frequency analysis is a
feature, not a bug, in your use-case. If you forget the password, you
have a chance of recovering the text.)
 
Arnaud Delobelle

Daniel Fetchinson said:
I have a plain text file which I would like to protect in a very
simple minded, yet for my purposes sufficient, way. I'd like to
encrypt/convert it into a binary file in such a way that possession of
a password allows anyone to convert it back into the original text
file while not possessing the password one would only see the
following with the standard linux utility 'file':

[fetchinson@fetch ~]$ file encrypted.data
encrypted.data: data
[...]

This is probably not what you want, but it is very simple and doesn't
import any module:) I am not qualified to say how easy it is to discover
the message without the password.

def str2int(txt):
    return reduce(lambda n, c: n*255 + ord(c), txt, 0)

def int2str(n):
    chars = []
    while n:
        n, o = divmod(n, 255)
        chars.append(chr(o))
    return ''.join(reversed(chars))

def encrypt(txt, pwd):
    return int2str(str2int(txt)*str2int(pwd))

def decrypt(txt, pwd):
    return int2str(str2int(txt)/str2int(pwd))

def test(txt, pwd):
    encrypted_txt = encrypt(txt, pwd)
    decrypted_txt = decrypt(encrypted_txt, pwd)
    print "text:%r" % txt
    print "encrypted:%r" % encrypted_txt
    print "decrypted:%r" % decrypted_txt

text:'This encryption scheme is definitely unbreakable.'
encrypted:'&2\xa5\xd4\x17i+E\x01k\xfa\x94\xf80\xa8\x8f\xea.w\x128\xf1\xd9\x0f9\xf2t\xc9\r`\x90%\xd6\xf3~\x1f\x00%u&\x8a\xe4\xe0\xa7\xb8\xb0ec)S>\xcb\xf2>\xec'
decrypted:'This encryption scheme is definitely unbreakable.'
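On the question of how easy it is to recover the message without the password, one weakness can be sketched in a few lines of Python 3. This is not the code above: it uses base 256 via `int.from_bytes` instead of base 255, the helper names are hypothetical, and it assumes the same password was used for two messages. Because the ciphertext is simply a product, the greatest common divisor of two ciphertexts is a multiple of the password number.

```python
from math import gcd


def to_int(data: bytes) -> int:
    """Interpret bytes as one big-endian integer."""
    return int.from_bytes(data, "big")


def to_bytes(n: int) -> bytes:
    """Inverse of to_int (leading zero bytes are not preserved)."""
    return n.to_bytes((n.bit_length() + 7) // 8, "big")


def encrypt(text: bytes, pwd: bytes) -> bytes:
    """Multiply the message number by the password number."""
    return to_bytes(to_int(text) * to_int(pwd))


def decrypt(data: bytes, pwd: bytes) -> bytes:
    """Divide the ciphertext number by the password number."""
    return to_bytes(to_int(data) // to_int(pwd))


pwd = b"sesame"
c1 = encrypt(b"first message", pwd)
c2 = encrypt(b"second message", pwd)

# gcd(t1 * p, t2 * p) == p * gcd(t1, t2), so the password number
# divides the gcd of any two ciphertexts made with the same password.
shared = gcd(to_int(c1), to_int(c2))
print(shared % to_int(pwd) == 0)
```

Even with a single ciphertext, the attacker only has to find a factor of one known integer, which is a very different problem from guessing the password.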
 
Daniel Fetchinson

Thanks, this looks pretty simple too, I will go with either Steven's
or with your solution.

Cheers,
Daniel
 
Paul Rubin

Daniel Fetchinson said:
I have a plain text file which I would like to protect in a very
simple minded, yet for my purposes sufficient, way.

For encrypting strings, use this module:

http://nightsong.com/phr/crypto/p3.py

Obviously this is limited to strings that fit in memory, which
might be a problem with large files. Some day I might get
around to adding a streaming interface to it.

The "file" command will not recognize the ciphertext as encrypted
data. It will just say "data".

If you want to be more serious, use pgp or gpg with the -c option
(password-based encryption). I think "file" does recognize the pgp
file format as encrypted data (RFC 2440).
 
Nobody

I have a plain text file which I would like to protect in a very
simple minded, yet for my purposes sufficient, way. I'd like to
encrypt/convert it into a binary file in such a way that possession of
a password allows anyone to convert it back into the original text
file while not possessing the password one would only see the
following with the standard linux utility 'file':
What would be the simplest way to achieve this using preferably stock
python without 3rd party modules? If a not too complex 3rd party
module made it really simple that would be acceptable too.

RC4 (aka ArcFour) is quite trivial to implement, and better than inventing
your own cipher or using a Vigenere:

import itertools

class arcfour:
    def __init__(self, key):
        self.s = range(256)
        self.schedule(map(ord, key))
        self.pad = self.prng()

    def swap(self, i, j):
        self.s[i], self.s[j] = self.s[j], self.s[i]

    def schedule(self, key):
        j = 0
        for i, c in zip(xrange(256), itertools.cycle(key)):
            j = (j + self.s[i] + c) % 256
            self.swap(i, j)

    def prng(self):
        i = j = 0
        while True:
            i = (i + 1) % 256
            j = (j + self.s[i]) % 256
            self.swap(i, j)
            yield self.s[(self.s[i] + self.s[j]) % 256]

    def crypt(self, string):
        chars = (chr(c ^ r) for c, r in zip(map(ord, string), self.pad))
        return ''.join(chars)

I suggest that you don't use the password itself as the key, unless you're
sure that a low-entropy string won't be used. Instead, create an SHA hash
(see the sha and hashlib modules) of the password and use that.
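A minimal Python 3 sketch of that suggestion, using only `hashlib` from the standard library. The function name `derive_key` and the choice of SHA-256 are assumptions for illustration; the post only says "an SHA hash".

```python
import hashlib


def derive_key(password: str) -> bytes:
    """Turn a password of any length into a 32-byte key via SHA-256."""
    return hashlib.sha256(password.encode("utf-8")).digest()


key = derive_key("correct horse battery staple")
print(len(key))   # 32
```

Hashing spreads the password over every key byte, but it cannot make a weak password strong: an attacker can hash guesses just as quickly. The standard library's `hashlib.pbkdf2_hmac` is a deliberately slow alternative for that reason.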
 
Daniel Fetchinson

Thanks, this looks very simple too, but where is the decryption code?
Wikipedia seems to suggest that encryption and decryption are both the
same but running crypt on the output of crypt doesn't give back the
original string. So probably I'm misunderstanding something.

Cheers,
Daniel
 
Steven D'Aprano

Thanks, this looks very simple too, but where is the decryption code?
Wikipedia seems to suggest that encryption and decryption are both the
same but running crypt on the output of crypt doesn't give back the
original string. So probably I'm misunderstanding something.

Yes, the nature of a stream cipher :)

What you're probably doing is what I did, before I had my Aha!!! moment:

attack at dawn
6371736C6E7C3F495C185629210B
Traceback (most recent call last):
'\x16\xf7\xf1\xcc\xda\xb5\xe0\xbf\x0b\x13 bF\x8f'


So what's going on? Consider:

Because Arcfour uses xor for the encryption step, decryption is exactly
the same. So you only need one method to do both.

But because Arcfour stores state, calling it twice in a row doesn't give
the same result:
'03405412CA'


Arcfour is a stream cipher. When you call it twice on two different
strings, that is logically equivalent to calling it once on a single long
string made up of concatenating the two strings together. Each time you
encrypt a single character, the internal state ("self.s") changes. To
undo the change, you need the same state. The easiest way to do this is
by creating a new instance:
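A minimal Python 3 sketch of the idea, assuming a bytes-based port of the arcfour class posted earlier in the thread. The class name `ArcFour`, the key `b"Secret"` and the message are illustrative and not taken from the thread.

```python
import itertools


class ArcFour:
    """RC4 keystream generator; a Python 3 sketch, not for serious use."""

    def __init__(self, key: bytes):
        self.s = list(range(256))
        # Key schedule: mix the key bytes into the permutation.
        j = 0
        for i, k in zip(range(256), itertools.cycle(key)):
            j = (j + self.s[i] + k) % 256
            self.s[i], self.s[j] = self.s[j], self.s[i]
        self.pad = self._prng()

    def _prng(self):
        """Yield keystream bytes forever, updating self.s as it goes."""
        i = j = 0
        while True:
            i = (i + 1) % 256
            j = (j + self.s[i]) % 256
            self.s[i], self.s[j] = self.s[j], self.s[i]
            yield self.s[(self.s[i] + self.s[j]) % 256]

    def crypt(self, data: bytes) -> bytes:
        """XOR data with the next len(data) keystream bytes."""
        return bytes(c ^ r for c, r in zip(data, self.pad))


enc = ArcFour(b"Secret")
ciphertext = enc.crypt(b"Attack at dawn")

# Reusing `enc` continues the keystream, so this does NOT decrypt.
garbage = enc.crypt(ciphertext)

# A fresh instance with the same key starts from the same state.
dec = ArcFour(b"Secret")
plaintext = dec.crypt(ciphertext)
print(plaintext)   # b'Attack at dawn'
```

The second call on `enc` uses keystream bytes 14 to 27, while the ciphertext was made with bytes 0 to 13, which is why the first attempt returns garbage.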



So long as the two instances stay in lock-step (every time you use up a
byte from the keystream in one, you do the same in the other) you can use
one to decrypt the output of the other. It doesn't even really matter
which one you use:
'Nobody expects the Spanish Inquisition!!!'


In summary: use the arcfour class to create a stream. If you are just
encrypting, or just decrypting, you can use one stream, or as many
streams as you like, using different keys. But to do both, you need two
streams, initiated with the same key, and kept in lockstep.

The advantage of a stream cipher is that you can encrypt a text without
needing all the text at once, and then decrypt it the same way:
output = []
output.append(encrypt.crypt("abcdefghi"))
output.append(encrypt.crypt("jklmno"))
output.append(encrypt.crypt("p"))
output.append(encrypt.crypt("qrstuvwxyz"))
output = ''.join(output)

plain = []
plain.append(decrypt.crypt(output[0:20]))
plain.append(decrypt.crypt(output[20:24]))
plain.append(decrypt.crypt(output[24:]))
''.join(plain)
'abcdefghijklmnopqrstuvwxyz'
 
Nobody

Thanks, this looks very simple too, but where is the decryption code?
Wikipedia seems to suggest that encryption and decryption are both the
same but running crypt on the output of crypt doesn't give back the
original string. So probably I'm misunderstanding something.

The encryption is stateful (it wouldn't be very good if it wasn't), so you
need to create and initialise a new arcfour instance for decryption; you
can't re-use the existing instance.
 
Daniel Fetchinson

Thanks Steven, this was very helpful!

Cheers,
Daniel
 
Paul Rubin

Nobody said:
RC4 (aka ArcFour) is quite trivial to implement, and better than inventing
your own cipher or using a Vigenere: ...

That's a cute implementation, but it has no authentication and doesn't
include any randomness, which means if you use the same key for two
inputs, there is a security failure (xor'ing the two ciphertexts reveals
the xor of the plaintexts). It also looks rather slow. I don't make
any guarantees about p3.py, but it has been reviewed by several experts
and appears to be reasonably sound for the type of casual use being
discussed here, and it is tuned for speed (given the implementation
constraints). For more demanding purposes, you should use a more
serious library like one of the OpenSSL wrappers.
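The key-reuse failure can be sketched in Python 3 without any cipher at all. Random bytes from `os.urandom` stand in for the keystream that a stream cipher would derive from one key; the messages and the helper `xor` are made up for illustration.

```python
import os


def xor(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


# Stand-in for the keystream a stream cipher derives from one key.
keystream = os.urandom(32)

p1 = b"meet me at the usual place, 9pm"
p2 = b"the password for the safe is 42"
c1 = xor(p1, keystream)
c2 = xor(p2, keystream)   # same key, hence the same keystream

# The keystream cancels out: the attacker learns p1 XOR p2.
leak = xor(c1, c2)

# Anyone who knows (or guesses) p1 can now read p2 without the key.
recovered = xor(leak, p1)
print(recovered)
```

The attacker never sees the keystream, yet the second message is fully exposed once the first one is known.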
 
Nobody

That's a cute implementation, but it has no authentication and doesn't
include any randomness, which means if you use the same key for two
inputs, there is a security failure (xor'ing the two ciphertexts reveals
the xor of the plaintexts).

Right. RC4 is a cipher, not a cryptosystem.

But, yeah, the OP needs to be aware of the difference (and probably isn't,
yet). So to take that a step further ...

The key passed to arcfour.schedule() shouldn't be re-used. If you need to
encrypt multiple files, use a different key for each. If you want to
encrypt multiple files with the same "password", generate a unique key by
hashing a combination of the password and a random salt (e.g. from
/dev/random), and prepend the salt to the beginning of the stream. To
decrypt, extract the salt from the stream to generate the key.
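A rough Python 3 sketch of that salting scheme. The 16-byte salt length, the use of SHA-256 and `os.urandom`, and all function names are assumptions chosen for illustration; the compact `rc4` function is a stand-in for the arcfour class.

```python
import hashlib
import os

SALT_LEN = 16  # assumed salt size in bytes


def rc4(key: bytes, data: bytes) -> bytes:
    """Compact RC4: XOR data with the keystream generated from key."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def file_key(password: str, salt: bytes) -> bytes:
    """Per-file key: hash of the salt and the password together."""
    return hashlib.sha256(salt + password.encode("utf-8")).digest()


def encrypt(password: str, plaintext: bytes) -> bytes:
    """Pick a fresh salt and prepend it to the ciphertext."""
    salt = os.urandom(SALT_LEN)
    return salt + rc4(file_key(password, salt), plaintext)


def decrypt(password: str, blob: bytes) -> bytes:
    """Read the salt back from the front of the stream."""
    salt, body = blob[:SALT_LEN], blob[SALT_LEN:]
    return rc4(file_key(password, salt), body)


a = encrypt("pw", b"same text")
b = encrypt("pw", b"same text")
print(a != b)   # different salts give different keys and ciphertexts
```

The salt is not secret. Its only job is to make sure the same password never produces the same keystream twice.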

If you need to verify the data, append a hash of the ciphertext (a hash
of the plaintext would allow an attacker to confirm a guessed plaintext
or to confirm that two files contain the same plaintext). Stream ciphers
are vulnerable to replacement attacks:

(p1 xor r) xor (p1 xor p2) == (p2 xor r)

So if you can guess any part of the plaintext p1, you can replace it with
alternative plaintext p2 without needing to decrypt/encrypt or knowing
anything about the pad r.
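The identity above can be checked with a short Python 3 sketch. Random bytes stand in for the pad r, and the two payment messages are invented for illustration.

```python
import os


def xor(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings of equal length."""
    return bytes(x ^ y for x, y in zip(a, b))


p1 = b"PAY ALICE $0000100"
r = os.urandom(len(p1))          # stand-in for the secret pad
ciphertext = xor(p1, r)          # what the attacker intercepts

# The attacker guesses p1 and picks a same-length replacement p2.
p2 = b"PAY MALLORY $99999"
forged = xor(ciphertext, xor(p1, p2))   # no knowledge of r needed

# The legitimate receiver decrypts with r and gets p2.
received = xor(forged, r)
print(received)
```

This is why a stream cipher on its own gives no assurance that the decrypted text is what the sender wrote.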

Also, if this is for something important, I'd be concerned about how to
protect the key. That's hard enough to do in C, let alone in Python.

It also looks rather slow.

Any kind of bulk binary data processing in pure Python is slow. The code
was written mainly for simplicity, e.g. using generators means that you
don't have to deal with buffer sizes. Replacing " % 256" with " & 255"
might be worthwhile.

I don't make
any guarantees about p3.py, but it has been reviewed by several experts
and appears to be reasonably sound for the type of casual use being
discussed here, and it is tuned for speed (given the implementation
constraints). For more demanding purposes, you should use a more
serious library like one of the OpenSSL wrappers.

The OP specifically wanted to avoid third-party libraries.
 
Paul Rubin

Nobody said:
But, yeah, the OP needs to be aware of the difference (and probably isn't,
yet). So to take that a step further ...
The key passed to arcfour.schedule() shouldn't be re-used....
If you need to verify the data, append a hash of the ciphertext ...
If you want to encrypt multiple files with the same "password",
generate a unique key by hashing a combination of the password and a
random salt (e.g. from /dev/random)...

Yep, a whole lot of stuff that's easy to get wrong if it's left to the
user (for example, you mean "a MAC of the ciphertext", not "a hash of
the ciphertext"), and a lot of extra work even if the user gets it
right. It's simplest for the library to provide a single interface that
does all of it.
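To show the difference between a MAC and a plain hash, here is a minimal Python 3 sketch using the standard `hmac` module. The names `seal` and `open_sealed`, the use of HMAC-SHA256, and a MAC key kept separate from the encryption key are all assumptions for illustration.

```python
import hashlib
import hmac

TAG_LEN = 32  # size of an HMAC-SHA256 tag in bytes


def seal(mac_key: bytes, ciphertext: bytes) -> bytes:
    """Append a keyed tag computed over the ciphertext."""
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def open_sealed(mac_key: bytes, blob: bytes) -> bytes:
    """Check the tag and return the ciphertext, or raise ValueError."""
    if len(blob) < TAG_LEN:
        raise ValueError("input too short")
    ciphertext, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    # compare_digest avoids leaking where the first mismatch occurs.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return ciphertext


mac_key = b"a separate secret key for the MAC"
blob = seal(mac_key, b"\x01\x02\x03 pretend this is ciphertext")
print(open_sealed(mac_key, blob))
```

An unkeyed hash would not help here: someone who alters the ciphertext can simply recompute the hash, whereas forging the tag requires `mac_key`.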

Any kind of bulk binary data processing in pure Python is slow.

Well, slow is a relative term, but p3.py is about 5x faster than the
fastest pure-Python rc4 implementation that I compared it to. Its
heavy lifting is done by the SHA and array modules, which are written in C.
 
Carl Banks

I have a plain text file which I would like to protect in a very
simple minded, yet for my purposes sufficient, way. I'd like to
encrypt/convert it into a binary file in such a way that possession of
a password allows anyone to convert it back into the original text
file while not possessing the password one would only see the
following with the standard linux utility 'file':

[fetchinson@fetch ~]$ file encrypted.data
encrypted.data: data

and the effort required to convert the file back to the original text
file without the password would be equivalent to guessing the
password.


gpg -o simpletextfile.gpg -c simpletextfile.txt

But I guess you can't depend on users to have gpg installed so you
have to roll out some unvetted Python tool.


Carl Banks
 
Steve Holden

Carl said:
But I guess you can't depend on users to have gpg installed so you
have to roll out some unvetted Python tool.

The OP's statement of requirements would be pretty much satisfied by the
"crypt" utility. He can even run it under Cygwin on Windows if
necessary. Cryptographic sophistication (or even cryptographic security)
was not requested (and would not be provided anyway by most of the
suggested solutions).

If any real protection is required then an ad-hoc program is definitely
not the way to provide it.

regards
Steve
 
Aahz

Well, that's sort of true about learning a complex API :) But it's
also true that I'm not storing anything really valuable in the file
but still wouldn't want to leave it lying around in plain text. In
case I lose the laptop with the file I seriously doubt anybody who
finds it will go through each and every file and try to find what's in
it, even though they look like data files and there is no hint whatsoever
that any one of them contains encrypted info. If they see a text
file, well, that can give them ideas, so let's encrypt a little bit.

One reason I like OSX is that it's dirt-simple to encrypt my home dir.
 
