Efficient processing of binary data streams in Ruby?

theosib

I'm writing a Ruby program that has to process binary data from files
and sockets. Data items are in bytes, 16-bit words, or 32-bit words,
and I cannot predict in advance whether the data will be msb-first or
lsb-first, so I end up writing things like this:

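def unpack_16(x)
  b0, b1 = x.getbyte(0), x.getbyte(1)
  @msb_first ? ((b0 << 8) | b1) : ((b1 << 8) | b0)
end

def pack_16(x)
  y = "\0\0".b   # two-byte binary buffer
  if @msb_first
    y.setbyte(0, x >> 8)
    y.setbyte(1, x & 255)
  else
    y.setbyte(0, x & 255)
    y.setbyte(1, x >> 8)
  end
  y   # return the packed string, not the value of the last assignment
end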

I expect, however, that this will be painfully slow, and I can't
imagine that this hasn't been thought of before. Is there a better way
to do this that will result in much better performance?

Thanks!
 
Tim Pease

def unpack_16( str )
  # unpack returns an Array; take its single element
  @msb_first ? str.unpack('n').first : str.unpack('S').first
end

def pack_16( num )
  @msb_first ? [num].pack('n') : [num].pack('S')
end


That will work on little-endian processors (Intel) but not on
big-endian processors (PowerPC, SPARC), because 'S' uses the host's
native byte order. For these methods to work on the latter you'll have
to do something like this ...

def unpack_16( str )
  str = str.reverse unless @msb_first
  str.unpack('n').first
end

def pack_16( num )
  str = [num].pack('n')
  # a bare `str.reverse unless @msb_first` would return nil for MSB-first
  @msb_first ? str : str.reverse
end


Just define the desired methods based on the processor type, which
can be figured out like this ...

LITTLE_ENDIAN = [42].pack('I').getbyte(0) == 42   # getbyte returns an Integer on Ruby 1.9+

if LITTLE_ENDIAN
# define little endian methods here
else
# define big endian methods here
end
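As an alternative that needs no host check at all, here is a sketch, assuming Ruby 1.9.3+ (for the '>'/'<' endian modifiers) and Ruby 2.4+ (for String#unpack1). The 'n'/'N' directives always mean big-endian (MSB first) and 'v'/'V' always mean little-endian (LSB first) unsigned 16/32-bit words, whatever the CPU. Those four are unsigned only, so signed words use 's'/'l' with a modifier. The WordCodec class and its method names are hypothetical, not part of any library.

```ruby
# Hypothetical helper: picks pack/unpack directives once from the data's
# byte order, so no host check or String#reverse is needed per call.
class WordCodec
  def initialize(msb_first)
    @u16 = msb_first ? 'n'  : 'v'   # unsigned 16-bit
    @u32 = msb_first ? 'N'  : 'V'   # unsigned 32-bit
    @s16 = msb_first ? 's>' : 's<'  # signed 16-bit (Ruby >= 1.9.3 modifier)
    @s32 = msb_first ? 'l>' : 'l<'  # signed 32-bit (Ruby >= 1.9.3 modifier)
  end

  def unpack_16(str)
    str.unpack1(@u16)
  end

  def pack_16(num)
    [num].pack(@u16)
  end

  def unpack_32(str)
    str.unpack1(@u32)
  end

  def pack_32(num)
    [num].pack(@u32)
  end

  def unpack_s16(str)
    str.unpack1(@s16)
  end

  def unpack_s32(str)
    str.unpack1(@s32)
  end
end

msb = WordCodec.new(true)
lsb = WordCodec.new(false)
p msb.unpack_16("\x12\x34")      # => 4660  (0x1234)
p lsb.unpack_16("\x12\x34")      # => 13330 (0x3412)
p msb.unpack_s16("\xFF\xFE")     # => -2
p lsb.pack_32(0x01020304).bytes  # => [4, 3, 2, 1]
```

The format strings are resolved once in the constructor instead of branching on @msb_first in every call; the per-item cost is then a single pack or unpack.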

Hope that helps

Blessings,
TwP
 
ara.t.howard

This will be __extremely__ fast, even for huge buffers of data (it uses the NArray extension):


harp:~ > ruby a.rb
huge(100000) LSB(8) in 0.00117683410644531s
huge(100000) LSB(16) in 0.00181722640991211s
huge(100000) LSB(32) in 0.00884389877319336s
huge(100000) MSB(8) in 0.00245118141174316s
huge(100000) MSB(16) in 0.0045168399810791s
huge(100000) MSB(32) in 0.0078279972076416s


harp:~ > cat a.rb
require 'rubygems'
require 'narray'

module Intification
  LSB = :LSB
  MSB = :MSB
  # byte order of the machine running this script
  HOST = [42].pack('i').unpack('c').first == 42 ? LSB : MSB

  # view the string's bytes as an NArray of 8-, 16- or 32-bit ints,
  # swapping bytes only when the data's order differs from the host's
  def ints bits = 8, order = LSB
    words = bits / 8

    type =
      case bits.to_i
      when 8
        NArray::BYTE
      when 16
        NArray::SINT
      when 32
        NArray::INT
      else
        raise ArgumentError, bits.inspect
      end

    na = NArray.to_na to_s, type, bytesize/words
    order == HOST ? na : na.swap_byte
  end
end

class String
  include Intification
end

def bm label
  a = Time.now
  yield
  b = Time.now
  puts "#{ label } in #{ b.to_f - a.to_f }s"
end

n = 100_000

huge = { :LSB => {}, :MSB => {} }

# 'v*'/'V*' always pack LSB-first and 'n*'/'N*' always MSB-first, on any host
huge[:LSB][8] = [39,40,41,42].pack('c*') * n
huge[:LSB][16] = [39,40,41,42].pack('v*') * n
huge[:LSB][32] = [39,40,41,42].pack('V*') * n

huge[:MSB][8] = [39,40,41,42].pack('c*') * n
huge[:MSB][16] = [39,40,41,42].pack('n*') * n
huge[:MSB][32] = [39,40,41,42].pack('N*') * n

[:LSB, :MSB].each do |order|
  [8,16,32].each do |bits|
    bm "huge(#{ n }) #{ order.to_s}(#{ bits })" do
      string = huge[order][bits]
      ints = string.ints(bits, order)
      last = ints[-4..-1]
      # compare with ==; a bare = would assign and never raise
      raise unless last[0] == 39
      raise unless last[1] == 40
      raise unless last[2] == 41
      raise unless last[3] == 42
    end
  end
end
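If NArray isn't available, a pure-Ruby sketch can still decode a whole buffer in one call: String#unpack with a repeated directive ('C*', 'n*', 'v*', 'N*', 'V*') returns every word as an Integer in an Array, and Array#pack goes the other way. The words_from/words_to helper names are hypothetical. Because it builds an Array of Ruby Integers, it is likely to be slower and use more memory than NArray on very large buffers.

```ruby
# Hypothetical helpers: bulk conversion between a binary buffer and an
# Array of unsigned words, using fixed-order pack/unpack directives.
BULK_FORMATS = {
  [8,  :MSB] => 'C*', [8,  :LSB] => 'C*',  # bytes have no byte order
  [16, :MSB] => 'n*', [16, :LSB] => 'v*',
  [32, :MSB] => 'N*', [32, :LSB] => 'V*'
}.freeze

def words_from(buffer, bits, order)
  buffer.unpack(BULK_FORMATS.fetch([bits, order]))
end

def words_to(ints, bits, order)
  ints.pack(BULK_FORMATS.fetch([bits, order]))
end

data  = [39, 40, 41, 42].pack('n*') * 3   # 24 bytes = 12 MSB-first 16-bit words
words = words_from(data, 16, :MSB)
p words.last(4)   # => [39, 40, 41, 42]
p words.size      # => 12
```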


regards.

If you're on Windows, I have an NArray install.

-a