Integer to byte string - Speed improvements

P

Phrogz

I'm writing code that needs to store an integer as a sequence of bytes.
Array#pack works for 1-, 2-, and 4-byte integers, but I want to be able
to support Bignums as well.

I have working code below. I'm happy with it. However, I thought I'd
pose the question to the list: how much faster can you make the below
code run?

# The natural log of 2 (for base-2 logarithms)
Math::LOG2 = Math.log( 2 )

class Integer
# Converts the integer into a string, where the value of each
character
# is a byte in the bit representation of the integer.
# Low-order bytes appear first in the string.
#
# If the number of characters is not specified, the fewest number of
# characters that can represent the integer is used.
#
#
def to_bytestring( num_chars=nil )
unless num_chars
bits_needed = Math.log( self ) / Math::LOG2
num_chars = ( bits_needed / 8.0 ).ceil
end
if pack_code = { 1=>'C', 2=>'S', 4=>'L' }[ num_chars ]
[ self ].pack( pack_code )
else
(0...(num_chars)).map{ |i|
(( self >> i*8 ) & 0xFF ).chr
}.join
end
end

# Creates an integer from a byte string.
# See Integer#to_bytestring for more information.
def self.from_bytestring( str )
val = 0
index = 0
str.each_byte{ |b|
val += b << 8 * index
index += 1
}
val
end

end

[
0x48,
0x6948,
0x646c726f57206f6c6c6548,
0x217962755220666f207463657073612074656577732061207369206d756e676942
].each{ |i| puts i, i.to_bytestring, '' }

#=> 72
#=> H

#=> 26952
#=> Hi

#=> 121404708493354166158910792
#=> Hello World
#=>
3876042760240808297111079005855559646422331404814505579794973210389374306838850
#=> Bignum is a sweet aspect of Ruby!
 
P

Phrogz

I forgot to add - I'd also be interested in seeing a golf tournament on
the two methods. Of course whitespace and variable names could be
changed, but I'm interested in more terse, alternative methods of
calculating the same information.
 
N

Noah Easterly

So, a couple bugs:
# Low-order bytes appear first in the string. Not necessarily
if pack_code = { 1=>'C', 2=>'S', 4=>'L' }[ num_chars ]
in Array.pack, 'S', and 'L', use system-endian order.
To use little endian order use 'v' and 'V', respectively.

Personally, I think it's cleaner to just leave out the clause
altogether.
[...]
unless num_chars
bits_needed = Math.log( self ) / Math::LOG2
num_chars = ( bits_needed / 8.0 ).ceil
end
You're ignoring a bunch of edge cases here.
0 and 1, for example.

For 0, Math.log(0) == -Infinity, which will throw an error when 'ceil'
is called.
For 1, Math.log(1) == 0, so num_chars == 0, which isn't quite right
either.

You'll run into the same issue at other powers of 256, as well.

What you should actually do, instead of taking the ceiling, is round
down and add one.

You're also ignoring negative integers. Math.log doesn't like negative
numbers either, so you'll need to take the absolute value. This
doesn't solve the problem when you're reading back in, of
differentiating between positive and negative integers that have the
same byte code.

Here's my code. I ignore the positive/negative reading back in
problem.

#!/usr/bin/env ruby -w
# The natural log of 256 (for base-256 logarithms)
Math::LOG256 = Math.log( 256 )

class Integer
# Converts the integer into a string, where the value of each
# character is a byte in the bit representation of the integer,
# low order bytes first.
#
# If the number of characters is not specified, the fewest number of
# characters that can represent the integer is used.
def to_bytestring( num_octets=nil )

# find out how many octets are required to encode it
num_octets ||=
(self == 0) ? 1 : ( Math.log(self.abs) / Math::LOG256 ).to_i + 1

# encode the low num_octets bytes of the integer.
shifted = (self << 8)
Array.new(num_octets).map do
((shifted >>= 8) & 0xFF).chr
end.join
end

# Creates an integer from a byte string.
# See Integer#to_bytestring for more information.
# NOTE: reads in negative integers as positive
def self.from_bytestring( str )
str.reverse.split(//).inject(0) do |val, char|
(val << 8) | char[0]
end
end
end

[
-1,
0x0,
0x1,
255,
256,
-255,
-256,
0x48,
26952,
0x6948,
0x646c726f57206f6c6c6548,
0x217962755220666f207463657073612074656577732061207369206d756e676942
].each{ |i|
begin
p [i, i.to_bytestring, Integer.from_bytestring(i.to_bytestring)]
rescue StandardError => err
p err
end
}
 
R

Rick DeNatale

I'm writing code that needs to store an integer as a sequence of bytes.
Array#pack works for 1-, 2-, and 4-byte integers, but I want to be able
to support Bignums as well.

I have working code below. I'm happy with it. However, I thought I'd
pose the question to the list: how much faster can you make the below
code run?

# The natural log of 2 (for base-2 logarithms)
Math::LOG2 = Math.log( 2 )

No need for these, I'm on a campaign to let integers be integers.
class Integer
# Converts the integer into a string, where the value of each
character
# is a byte in the bit representation of the integer.
# Low-order bytes appear first in the string.
#
# If the number of characters is not specified, the fewest number of
# characters that can represent the integer is used.
#
#
def to_bytestring( num_chars=nil )

replace these four lines:
unless num_chars
bits_needed = Math.log( self ) / Math::LOG2

By the way, this won't work for negative integers. Dealing with
negative numbers provides some interesting questions which I'll ignore
for now.
num_chars = ( bits_needed / 8.0 ).ceil
end

with

raise RangeError.new("Can't convert negative integer to bytestring")
num_chars ||= self.size
if pack_code = { 1=>'C', 2=>'S', 4=>'L' }[ num_chars ]
[ self ].pack( pack_code )
else
I might try getting rid of the multiplication
(0...(num_chars)).map{ |i|
(( self >> i*8 ) & 0xFF ).chr
}.join
end

With something like:

result = []
i = self
until i == 0
result << (i & 0xFF).chr
i >>= 8
end

So following those ideas:
rick@frodo:/public/rubyscripts$ cat tobytes.rb
class Integer
def to_bytestring(num_chars=nil)
raise RangeError.new("Can't convert negative number to
bytestring") if self < 0
num_chars ||= self.size
mask = 0xFF << (8 * num_chars - 1)
while (self & mask) == 0
mask >>= 8
num_chars -= 1
end

result = []
bits_left = self
until bits_left == 0
result << (bits_left & 0xFF).chr
bits_left >>= 8
end
result.join
end

# Creates an integer from a byte string.
# See Integer#to_bytestring for more information.
def self.from_bytestring( str )
val = 0
index = 0
str.each_byte{ |b|
val += b << 8 * index
index += 1
}
val
end

end

[
0x48,
0x6948,
0x646c726f57206f6c6c6548,
0x217962755220666f207463657073612074656577732061207369206d756e676942
].each{ |i| puts i, i.to_bytestring, '' }


rick@frodo:/public/rubyscripts$ ruby tobytes.rb
72
H

26952
Hi

121404708493354166158910792
Hello World

3876042760240808297111079005855559646422331404814505579794973210389374306838850
Bignum is a sweet aspect of Ruby!
 
S

Simon Kröger

Phrogz wrote
I forgot to add - I'd also be interested in seeing a golf tournament on
the two methods. Of course whitespace and variable names could be
changed, but I'm interested in more terse, alternative methods of
calculating the same information.

v = 0x217962755220666f207463657073612074656577732061207369206d756e676942
puts [v.to_s(16)].pack('H*').reverse

terse enough? :)

cheers

Simon
 
P

Phrogz

Simon said:
v = 0x217962755220666f207463657073612074656577732061207369206d756e676942
puts [v.to_s(16)].pack('H*').reverse

terse enough? :)

Wow, that's hot. Thanks :)

Got any golf for the reverse? (Creating the number from the string.)
 
P

Phrogz

Noah said:
To use little endian order use 'v' and 'V', respectively.

Ah, thanks.
You're ignoring a bunch of edge cases here.
0 and 1, for example. [...]
What you should actually do, instead of taking the ceiling, is round
down and add one.

Thanks for pointing that out.

You're also ignoring negative integers.

On purpose, actually. I happen to know in my domain that it would be an
error to hit that case. I prefer to keep my code lean and leave out
various validity checks that also kill the program (albeit in a
slightly cleaner way).

# The natural log of 256 (for base-256 logarithms)
Math::LOG256 = Math.log( 256 )

Ah, tricky! I'm unused to thinking of crazy-high bases. That's a nice
shortcut, thanks.

def self.from_bytestring( str )
str.reverse.split(//).inject(0) do |val, char|
(val << 8) | char[0]
end

No solution is complete without an inject. ;)
I wonder (haven't tested yet): is splitting on "" any faster than
splitting on //?
Introducing a string.reverse also feels to me like it would slow things
down. Will need to run some benchmarks.


Thanks for the ideas and code!
 
S

Simon Kröger

Phrogz said:
Simon said:
v = 0x217962755220666f207463657073612074656577732061207369206d756e676942
puts [v.to_s(16)].pack('H*').reverse

terse enough? :)

Wow, that's hot. Thanks :)

Got any golf for the reverse? (Creating the number from the string.)

Well, just do the reverse:

s = 'Bignum is a sweet aspect of Ruby!'
puts s.reverse.unpack('H*').first.to_i(16)

cheers

Simon
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,764
Messages
2,569,567
Members
45,042
Latest member
icassiem

Latest Threads

Top