yEnc implementation in Python, bit slow

Freddie

Hi,

I posted a while ago for some help with my word finder program, which is now
quite a lot faster than I could manage. Thanks to all who helped :)

This time, I've written a basic batch binary usenet poster in Python, but
encoding the data into yEnc format is fairly slow. Is it possible to improve
the routine any, WITHOUT using non-standard libraries? I don't want to have
to rely on something strange ;)

yEncode1 tends to be slightly faster here for me on my K6/2 500:

$ python2.3 testyenc.py
yEncode1 401563 1.82
yEncode1 401563 1.83
yEncode2 401562 1.83
yEncode2 401562 1.83

Any help would be greatly appreciated :)

Freddie


import struct
import time
from zlib import crc32

def timing(f, n, a):
    print f.__name__,
    r = range(n)
    t1 = time.clock()
    for i in r:
        #f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a); f(a)
        f(a)
    t2 = time.clock()
    print round(t2-t1, 3)

def yEncSetup():
    global YENC
    YENC = [''] * 256

    for I in range(256):
        O = (I + 42) % 256
        if O in (0, 10, 13, 61):
            # Supposed to modulo 256, but err, why bother?
            O += 64
            # Store per input byte; assigning to YENC itself would replace the table
            YENC[I] = '=%c' % O
        else:
            YENC[I] = '%c' % O

def yEncode1(data):
    global YENC
    yenc = YENC

    encoded = []
    datalen = len(data)
    n = 0
    while n < datalen:
        chunk = data[n:n+256]
        n += len(chunk)
        encoded.extend([yenc[ord(c)] for c in chunk])
        encoded.append('\n')

    print len(''.join(encoded)),

def yEncode2(data):
    global YENC
    yenc = YENC

    lines = []
    datalen = len(data)

    # Split into whole 256-byte parts plus a remainder. Slicing at an explicit
    # offset avoids data[:-0], which is empty when datalen is a multiple of 256.
    bits = divmod(datalen, 256)
    full = bits[0] * 256
    format = '256s' * bits[0]
    parts = struct.unpack(format, data[:full])
    for part in parts:
        lines.append(''.join([yenc[ord(c)] for c in part]))

    if bits[1]:
        lines.append(''.join([yenc[ord(c)] for c in data[full:]]))
    print len('\n'.join(lines) + '\n'),


yEncSetup()

teststr1 = 'a' * 400000
teststr2 = 'b' * 400000

for meth in (yEncode1, yEncode2):
    timing(meth, 1, teststr1)
    timing(meth, 1, teststr2)
 
Oren Tirosh

> Hi,
>
> I posted a while ago for some help with my word finder program, which is now
> quite a lot faster than I could manage. Thanks to all who helped :)
>
> This time, I've written a basic batch binary usenet poster in Python, but
> encoding the data into yEnc format is fairly slow. Is it possible to improve
> the routine any, WITHOUT using non-standard libraries? I don't want to have
> to rely on something strange ;)

Python is pretty quick as long as you avoid loops that operate character
by character. Try to use functions that operate on longer strings.

Suggestions:

For the (x+42)%256 build a translation table and use str.translate.
To encode characters as escape sequences use str.replace or re.sub.

Oren
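
A minimal sketch of the two suggestions above, written for Python 3 (an assumption; the thread itself uses Python 2.3, where plain str objects hold the raw bytes). The names YENC_TABLE and yenc_translate are made up for illustration, and line splitting is left out.

```python
# Python 3 sketch: bytes objects stand in for the Python 2 strings in the thread.

# 256-entry table mapping each byte value x to (x + 42) % 256.
YENC_TABLE = bytes((i + 42) % 256 for i in range(256))


def yenc_translate(data: bytes) -> bytes:
    """Apply the +42 offset, then escape NUL, LF, CR and '=' (hypothetical helper)."""
    out = data.translate(YENC_TABLE)
    # Escape '=' first, so the '=' prefixes added for the other three
    # bytes are not escaped a second time.
    for critical in (0x3D, 0x00, 0x0A, 0x0D):
        out = out.replace(bytes([critical]), b"=" + bytes([critical + 64]))
    return out
```

Both translate and replace run their loops in C, so the per-character work never touches the interpreter. The order of the replaces matters: doing '=' last would double-escape the output of the earlier ones.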
 
Freddie

> Suggestions:
>
> For the (x+42)%256 build a translation table and use str.translate.
> To encode characters as escape sequences use str.replace or re.sub.
>
> Oren

Aahh. I couldn't work out how to use translate() at 4am this morning, but I've
worked it out now :) This version is a whoooole lot faster, and actually
meets the yEnc line splitting spec. Bonus!

$ python2.3 testyenc.py
yEncode1 407682 1.98
yEncode2 407707 0.18

I'm not sure how to use re.sub to escape the characters; I assume it would
also take 4 separate replaces? Also, the test needs a more varied input
string than 'a' * 400000, so here we go.


test = []
for i in xrange(256):
    test.append(chr(i))
teststr = ''.join(test*1562)


def yEncode2(data):
    trans = ''
    for i in range(256):
        trans += chr((i+42)%256)

    translated = data.translate(trans)

    # escape =, NUL, LF, CR ('=' must go first so the added '=' are not re-escaped)
    for i in (61, 0, 10, 13):
        j = '=%c' % (i + 64)
        translated = translated.replace(chr(i), j)


    encoded = []
    n = 0
    for i in range(0, len(translated), 256):
        chunk = translated[n+i:n+i+256]
        if chunk[-1] == '=':
            # NOTE: this index is off by one; see the correction in the follow-up post
            chunk += translated[n+i+256+1]
            n += 1
        encoded.append(chunk)
        encoded.append('\n')

    result = ''.join(encoded)

    print len(result),
    return result
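
On the re.sub question: one pattern with a character class and a replacement function can handle all four bytes in a single pass, so four separate substitutions are not required. A minimal sketch in Python 3 (an assumption; the code above is Python 2), with CRITICAL, _escape and escape_critical as made-up names. Whether it beats four replace() calls is something to time rather than assume.

```python
import re

# Matches any of the four bytes that need escaping after the +42 offset:
# NUL, LF, CR and '='.
CRITICAL = re.compile(rb"[\x00\n\r=]")


def _escape(match):
    # Emit '=' followed by the matched byte value plus 64.
    return b"=" + bytes([match.group(0)[0] + 64])


def escape_critical(translated: bytes) -> bytes:
    """Escape every critical byte in one pass (hypothetical helper)."""
    return CRITICAL.sub(_escape, translated)
```

Because each input byte is examined exactly once, the ordering concern from the replace() loop (escaping '=' before the others) does not come up here.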
 
Freddie

Arr. There's an error here: the [n+i+256+1] shouldn't have the +1. I always get
that wrong :) The posted files actually decode now, and the yEncode()
overhead is a lot lower.

encoded = []
n = 0
for i in range(0, len(translated), 256):
    chunk = translated[n+i:n+i+256]
    if chunk[-1] == '=':
        chunk += translated[n+i+256]  # <<< this line
        n += 1
    encoded.append(chunk)
    encoded.append('\n')
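
The loop above can still hit an empty chunk near the end of the data once n has grown (chunk[-1] then raises IndexError). A minimal sketch of the same idea that tracks a single position instead, in Python 3 (an assumption; the thread uses Python 2). split_lines and its width parameter are made-up names, and the input is assumed to be already translated and escaped.

```python
def split_lines(escaped: bytes, width: int = 256) -> bytes:
    """Break escaped data into lines without splitting an '=' escape pair."""
    lines = []
    pos = 0
    while pos < len(escaped):
        end = pos + width
        # In escaped data every '=' starts a two-byte pair, so if the line
        # would end on '=', take one more byte to keep the pair together.
        if escaped[end - 1:end] == b"=":
            end += 1
        lines.append(escaped[pos:end])
        pos = end
    if not lines:
        return b""
    return b"\n".join(lines) + b"\n"
```

Using slices rather than indexing means running past the end of the data just yields a shorter final line, never an exception.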
 
