Bit Manipulation for a Pure Ruby SHA1 implementation

J

Jason Larsen

I'm trying to write a pure Ruby implementation of SHA1. In order to do
so, I need to be able to directly manipulate bits. I know about the
bitwise operators in Ruby, but I'm having issues with the way that
Ruby wraps things up in classes. Basically I've run into two problems.

PROBLEM 1: I need to pad the message by appending a 1-bit, and then a
certain number of 0s. When appending numbers (e.g. 0b1) to the end of
by string, Ruby wraps each number in its Fixnum class making it 1 byte
long: So basically adding 0b1 appends 0b0000_0001, which is 7 zeros
that shouldn't be there. The fix I've found is I to tack 0x80 on the
end, which concats 0b1000_0000. The downside is that I have to keep
track of my zeros in batches of bytes (which at this point my
implementation handles alright), but I'd like to know how to do it bit
by bit if possible.

I've seen the String.unpack and Array.pack used for binary data.
However, I can only get the bits out in one glob str.unpack("B*"), or
str.unpack("B8"*string.length), which still just gives me a bunch of
stringified 8bit binary values. Seems like there should be a better
way.

PROBLEM 2: After adding a one-bit and k zeros, I need to append a 64-
bit long integer representing the length of the message being hashed.
It seem like Fixnum is 8 bytes long on my machine, so the question is
how do I add a 64bit binary representation of this number to the end
of a string?

Any help would be greatly appreciated. Thanks in advance.
 
J

Josh Cheek

[Note: parts of this message were removed to make it a legal post.]

I'm trying to write a pure Ruby implementation of SHA1. In order to do
so, I need to be able to directly manipulate bits. I know about the
bitwise operators in Ruby, but I'm having issues with the way that
Ruby wraps things up in classes. Basically I've run into two problems.

PROBLEM 1: I need to pad the message by appending a 1-bit, and then a
certain number of 0s. When appending numbers (e.g. 0b1) to the end of
by string, Ruby wraps each number in its Fixnum class making it 1 byte
long: So basically adding 0b1 appends 0b0000_0001, which is 7 zeros
that shouldn't be there. The fix I've found is I to tack 0x80 on the
end, which concats 0b1000_0000. The downside is that I have to keep
track of my zeros in batches of bytes (which at this point my
implementation handles alright), but I'd like to know how to do it bit
by bit if possible.

I've seen the String.unpack and Array.pack used for binary data.
However, I can only get the bits out in one glob str.unpack("B*"), or
str.unpack("B8"*string.length), which still just gives me a bunch of
stringified 8bit binary values. Seems like there should be a better
way.

PROBLEM 2: After adding a one-bit and k zeros, I need to append a 64-
bit long integer representing the length of the message being hashed.
It seem like Fixnum is 8 bytes long on my machine, so the question is
how do I add a 64bit binary representation of this number to the end
of a string?

Any help would be greatly appreciated. Thanks in advance.
Hi, if you just need a portable SHA1 implementation, there is already one in
the standard library.
http://ruby-doc.org/stdlib/libdoc/digest/rdoc/index.html

As for your problems, I'm finding myself really confused by your
terminology. For example, when you say "adding a one-bit and k zeros" does
that mean num + 2**k, Or is num <<= 1 ; num += 1 ; num <<= k or something
else even?

It might be more helpful if you show "this is what I have, this is what I
want, this is how I tried to get there"
 
J

Jarsen

[Note:  parts of this message were removed to make it a legal post.]

I'm trying to write a pure Ruby implementation of SHA1. In order to do
so, I need to be able to directly manipulate bits. I know about the
bitwise operators in Ruby, but I'm having issues with the way that
Ruby wraps things up in classes. Basically I've run into two problems.
PROBLEM 1: I need to pad the message by appending a 1-bit, and then a
certain number of 0s. When appending numbers (e.g. 0b1) to the end of
by string, Ruby wraps each number in its Fixnum class making it 1 byte
long: So basically adding 0b1 appends 0b0000_0001, which is 7 zeros
that shouldn't be there. The fix I've found is I to tack 0x80 on the
end, which concats 0b1000_0000. The downside is that I have to keep
track of my zeros in batches of bytes (which at this point my
implementation handles alright), but I'd like to know how to do it bit
by bit if possible.
I've seen the String.unpack and Array.pack used for binary data.
However, I can only get the bits out in one glob str.unpack("B*"), or
str.unpack("B8"*string.length), which still just gives me a bunch of
stringified 8bit binary values. Seems like there should be a better
way.
PROBLEM 2: After adding a one-bit and k zeros, I need to append a 64-
bit long integer representing the length of the message being hashed.
It seem like Fixnum is 8 bytes long on my machine, so the question is
how do I add a 64bit binary representation of this number to the end
of a string?
Any help would be greatly appreciated. Thanks in advance.

Hi, if you just need a portable SHA1 implementation, there is already onein
the standard library.http://ruby-doc.org/stdlib/libdoc/digest/rdoc/index.html

As for your problems, I'm finding myself really confused by your
terminology. For example, when you say "adding a one-bit and k zeros" does
that mean num + 2**k, Or is num <<= 1 ; num += 1 ; num <<= k or something
else even?

It might be more helpful if you show "this is what I have, this is what I
want, this is how I tried to get there"

Sorry, looking back on that I even confused myself. I am aware of the
SHA1 implementation in the standard library, I'm just trying to write
one myself using pure Ruby.

This is what I have so far for padding:

def SHA1 message
k = 448-((message.bitsize % block_bitsize) + 1)
m = message.unpack("B*")[0].to_i(2) << (k+65)
padding = 1
padding <<= k+64
padding |= message.bitsize # then add last 64 bit block (message
length) which is the len
m |= padding # add the padding to the message
#... split into 512-bit blocks
#... do SHA1 operations
end

Where bitsize is defined as:
class String; def bitsize; bytesize * 8; end; end


The method so far seems to be giving me the correct value for m.
However, my problem is with the leading zeros. For example, if the
message is "abc, message.unpack("B*")[0] returns
"011000010110001001100011". When I call to_i(2) on that number, the
leading 0 bit will disappear - which makes sense because to 01 an 1
are both the same number.

However, when I begin to perform sha1 operations on those blocks I
will need that leading 0 back; in the example where message = "abc",
m.to_s(2) or will only give me a 511 digit long output, when I need a
512 bit block. So I need to figure out how to keep those leading zeros
in place so I have the correct block size. Hopefully this makes more
sense.
 
J

Jarsen

[Note:  parts of this message were removed to make it a legal post.]
Hi, if you just need a portable SHA1 implementation, there is already one in
the standard library.http://ruby-doc.org/stdlib/libdoc/digest/rdoc/index.html
As for your problems, I'm finding myself really confused by your
terminology. For example, when you say "adding a one-bit and k zeros" does
that mean num + 2**k, Or is num <<= 1 ; num += 1 ; num <<= k or something
else even?
It might be more helpful if you show "this is what I have, this is whatI
want, this is how I tried to get there"

Sorry, looking back on that I even confused myself. I am aware of the
SHA1 implementation in the standard library, I'm just trying to write
one myself using pure Ruby.

This is what I have so far for padding:

def SHA1 message
    k = 448-((message.bitsize % block_bitsize) + 1)

# EDIT
# I think I need this here actually for certain cases
k+=512 if k < 0
    m = message.unpack("B*")[0].to_i(2) << (k+65)
    padding = 1
    padding <<= k+64
    padding |= message.bitsize # then add last 64 bit block (message
length) which is the len
    m |= padding # add the padding to the message
    #... split into 512-bit blocks
    #... do SHA1 operations
end

Where bitsize is defined as:
    class String; def bitsize; bytesize * 8; end; end

The method so far seems to be giving me the correct value for m.
However, my problem is with the leading zeros. For example, if the
message is "abc, message.unpack("B*")[0] returns
"011000010110001001100011". When I call to_i(2) on that number, the
leading 0 bit will disappear - which makes sense because to 01 an 1
are both the same number.

However, when I begin to perform sha1 operations on those blocks I
will need that leading 0 back; in the example where message = "abc",
m.to_s(2) or will only give me a 511 digit long output, when I need a
512 bit block. So I need to figure out how to keep those leading zeros
in place so I have the correct block size. Hopefully this makes more
sense.
 
B

Brian Candler

Jason Larsen wrote in post #949355:
PROBLEM 1: I need to pad the message by appending a 1-bit, and then a
certain number of 0s. When appending numbers (e.g. 0b1) to the end of
by string, Ruby wraps each number in its Fixnum class making it 1 byte
long: So basically adding 0b1 appends 0b0000_0001, which is 7 zeros
that shouldn't be there. The fix I've found is I to tack 0x80 on the
end, which concats 0b1000_0000. The downside is that I have to keep
track of my zeros in batches of bytes (which at this point my
implementation handles alright), but I'd like to know how to do it bit
by bit if possible.

A String in ruby is a collection of bytes, not bits.

I think you'll find that as long as your input to your sha1 algorithm is
also a String, and thus a multiple of 8 bits, everything will work fine.

The way the sha1 algorithm is defined it can be implemented for an input
which is not a multiple of 8 bits, but as you've found, it's not easy to
represent that anyway :) Therefore, if you want your implementation to
work with strange numbers of bits, you'll first need to choose an input
representation.

If I were you, I'd be happy to limit my implementation to inputs which
are whole bytes. (After all, even a file on the filesystem is a whole
number of bytes)
PROBLEM 2: After adding a one-bit and k zeros, I need to append a 64-
bit long integer representing the length of the message being hashed.
It seem like Fixnum is 8 bytes long on my machine, so the question is
how do I add a 64bit binary representation of this number to the end
of a string?

Don't worry about Fixnum versus Bignum, that's internal to the
implementation and mostly hidden. Even on a 32-bit machine you can make
Integers which are 64-bits and above.

Unfortunately, Array#pack("Q") works differently on little-endian and
big-endian machines:

irb(main):001:0> [255].pack("Q")
=> "\377\000\000\000\000\000\000\000"

So for maximum portability I'd do two instances of "N", i.e.
n = 0x0123456789abcdef => 81985529216486895
[n >> 32, n].pack("NN")
=> "\001#Eg\211\253\315\357"

which you can check is correct like this:
[n >> 32, n].pack("NN").unpack("H*")
=> ["0123456789abcdef"]

Regards,

Brian.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,769
Messages
2,569,582
Members
45,070
Latest member
BiogenixGummies

Latest Threads

Top