# Re: rsa implementation question

Discussion in 'Python' started by Ajay Brar, Aug 11, 2004.

1. ### Ajay BrarGuest

thanks
i am using RSa for signing documents and hence decrypting and then
encrypting to verify?
what i was rather trying to get at was what if the plaintext is too
large? if the plaintext is greater than (log2 n)/8, would you just throw
an error or would you break the plaintext in parts.
the reason i am asking this is because i am timing the sign operations
using the pcrypto package. the time is constant relative to the size of
the plaintext (as you would expect) but at a certain point there is an
increase in the time taken to sign. this corresponds to a very large
plaintext.
that is why i was wondering about the block thing.

Heiko Wundram wrote:

>Am Mittwoch, 11. August 2004 07:08 schrieb Ajay:
>
>
>>just like to know how the RSA implementation in the pcrypto package works.
>>Does it operate in blocks, if yes, what is the size of the blocks?
>>
>>

>
>RSA (Rivest-Shamir-Adleman encryption) never works in "blocks", as there is no
>notion of blocks in public key cryptography, there's only the notion of doing
>an operation on plaintext modulo a large prime n.
>
>Now, you could call (log2 n)/8 the block size in bytes of RSA for a certain
>encryption prime, but I'd never talk of block sizes with RSA, as normally you
>don't use RSA (or public-key cryptography in general) to encrypt plaintext a
>block at a time, but rather to encrypt a random string of bytes [len(s) <=
>(log2 n)/8 for the modulo prime of the algorithm], which is used as the key
>for a normal symmetric encryption algorithm, to which you feed the blocks.
>
>Thus, the receiver (and also the sender) only has to make one expensive
>calculation (retrieving the key from the encrypted RSA packet), whereas
>decrypting the cyphertext (or encrypting it) is done using a symmetric
>cypher, which has the advantage of being much faster to compute than a
>public-key cypher.
>
>Anyway, if you're interested in how symmetric and/or public-key cryptography
>work, read up on:
>
>http://www.cacr.math.uwaterloo.ca/hac/
>
>especially chapters 7 and 8.
>
>Heiko.
>
>

--
Ajay Brar
CS Honours 2004
Smart Internet Technology Research Group

http://www.it.usyd.edu.au/~abrar1

Ajay Brar, Aug 11, 2004

2. ### Sam HoldenGuest

On Wed, 11 Aug 2004 17:17:15 +1000, Ajay Brar <> wrote:
> thanks
> i am using RSa for signing documents and hence decrypting and then
> encrypting to verify?
> what i was rather trying to get at was what if the plaintext is too
> large? if the plaintext is greater than (log2 n)/8, would you just throw
> an error or would you break the plaintext in parts.
> the reason i am asking this is because i am timing the sign operations
> using the pcrypto package. the time is constant relative to the size of
> the plaintext (as you would expect) but at a certain point there is an
> increase in the time taken to sign. this corresponds to a very large
> plaintext.
> that is why i was wondering about the block thing.

If there is a sudden increase then in time then chances are you just
hit a cache barrier - the size before the increase fit into a
cache of some form (be it L1, L2, main memory, disk, etc) but the size
after the increase doesn't and hence you hit L2, main memory, disk,
network, etc.

Of course that's just a general observation I know nothing about the
workings of RSA (well I did implement it once, but that was *long* ago).

Whenever I've done signing with say GnuPG of largish items (such
as tarballs of code) I generate an MD5 sum of the item and then
sign that, giving something like:

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

9272bee983085f56aa830ae909f9b8dc pywily.tgz
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFA9j+tH/cbVr+0UoYRAn3OAJ0bMW3HbbXsuFGZ7H9DK5iMLpBDtwCgkSvB
Dj03m5eKTODAA6OKQc34HG4=
=52FO
-----END PGP SIGNATURE-----

I've never actually cared enough to investigate whether that
significantly reduces the security. A smarter person would include the
filesize in the signed portion especially since you can append to tgz
files to generate the MD5 sum you want for your replacement
pywily.tgz...

[snip a full-quote... is that acceptable around these parts?]

--
Sam Holden

Sam Holden, Aug 11, 2004

3. ### Bryan OlsonGuest

> i am using RSa for signing documents and hence decrypting and then
> encrypting to verify?

Unfortunately yes, that seems to be what pycrypto is doing.
The method is now discredited.

> what i was rather trying to get at was what if the plaintext is too
> large?

Always hash and pad, for any size message. I suggest the SHA-1,
hash function, which is in the Python standard library as "sha".

Next you need a padding scheme that formats the message into a
block suitable for the RSA private key operation. The signing
method of PKCS#1 version 1.5 is the most popular RSA signature
scheme, and when the payload is a hash digest it has no known
serious weaknesses.

The function encode_block_from_message, below, will hash a given
message, then build and return a EMSA-PKCS1-v1_5 "Encoded
Message" (EM) from it. The returned EM is suitable for signing
with the pycrypto RSA sign function.

I agree with about half of Heiko Wundram's response.

# sha1_header_tuple is the prefix of the DER encoding of a:
# sequene(sequence(oid, NULL), octet_string)
# where the octet string has length 20, and completes the encoding.
#
sha1_header_tuple = (0x30, 0x21, 0x30, 0x9, 0x6, 0x5, 0x2b, 0xe,
0x3, 0x2, 0x1a, 0x5, 0x0, 0x4, 0x14)

def sha1_hash_and_encode(message):
return sha1_header + sha.new(message).digest()

def encode_block_from_message(message, intended_length):
"""Algorithm EMSA_PKCS1-v1_5 from PKCS 1 version 2
intended_length should be one octet less that modulus length
"""
der_encoding = sha1_hash_and_encode(message)
assert intended_length >= len(der_encoding) + 10
pad_string = chr(0xFF) * (intended_length - len(der_encoding) - 2)
result = chr(1) + pad_string + chr(0) + der_encoding
return result

--
--Bryan

Bryan Olson, Aug 11, 2004
4. ### AjayGuest

> > i am using RSa for signing documents and hence decrypting and then
> > encrypting to verify?

>
> Unfortunately yes, that seems to be what pycrypto is doing.
> The method is now discredited.

could you elaborate on that? i thought signing by decrypting is the way to
do it.

thanks

cheers

>
> > what i was rather trying to get at was what if the plaintext is too
> > large?

>
> Always hash and pad, for any size message. I suggest the SHA-1,
> hash function, which is in the Python standard library as "sha".
>
> Next you need a padding scheme that formats the message into a
> block suitable for the RSA private key operation. The signing
> method of PKCS#1 version 1.5 is the most popular RSA signature
> scheme, and when the payload is a hash digest it has no known
> serious weaknesses.
>
> The function encode_block_from_message, below, will hash a given
> message, then build and return a EMSA-PKCS1-v1_5 "Encoded
> Message" (EM) from it. The returned EM is suitable for signing
> with the pycrypto RSA sign function.
>
> I agree with about half of Heiko Wundram's response.
>
>
> # sha1_header_tuple is the prefix of the DER encoding of a:
> # sequene(sequence(oid, NULL), octet_string)
> # where the octet string has length 20, and completes the encoding.
> #
> sha1_header_tuple = (0x30, 0x21, 0x30, 0x9, 0x6, 0x5, 0x2b, 0xe,
> 0x3, 0x2, 0x1a, 0x5, 0x0, 0x4, 0x14)
>
>
>
> def sha1_hash_and_encode(message):
> return sha1_header + sha.new(message).digest()
>
>
> def encode_block_from_message(message, intended_length):
> """Algorithm EMSA_PKCS1-v1_5 from PKCS 1 version 2
> intended_length should be one octet less that modulus length
> """
> der_encoding = sha1_hash_and_encode(message)
> assert intended_length >= len(der_encoding) + 10
> pad_string = chr(0xFF) * (intended_length - len(der_encoding) - 2)
> result = chr(1) + pad_string + chr(0) + der_encoding
> return result
>
>
> --
> --Bryan
> --
> http://mail.python.org/mailman/listinfo/python-list
>

----------------------------------------------------------------
This message was sent using IMP, the Internet Messaging Program.

Ajay, Aug 11, 2004
5. ### Heiko WundramGuest

Am Mittwoch, 11. August 2004 10:21 schrieb Bryan Olson:
> I agree with about half of Heiko Wundram's response.

Well, with what don't you agree?

Anyway, I've not read anywhere that for signing a message it is discredited to
use RSA decrypt with private key, encrypt with public key.

Basically, what I always implemented is something like (pseudocode):

retv = [data]
strlen = len(data)
nlen = log2(n)/8
while strlen < nlen-3:
data.append(<randomchars>)
nlen -= len(<randomchars>)
data.append(chr(algo))
data.append(struct.pack("!H",strlen))
return "".join(data)

strlen = struct.unpack("!H",data[-2:])
nlen = log2(n)/8
if strlen > nlen-3:
raise ValueError, "Invalid size of data in packet."
return data[:strlen], ord(data[-3])

def sign(key,data):
# 0 is for sha algorithm.

def verify(key,data,sign):
netsgn, algo = unpad_after_rsa_decrypt(key.encrypt(sign),key)
if algo <> 0:
raise ValueError, "Invalid digest used in packet."
datasgn = sha.new(data).digest()
return datasgn == netsgn

Or something of the like... Anyway, what the deal about this algorithm is that
the number of digits of the data used for encryption/decryption is not known
in advance with high probability, only the last few digits might be known
(length of plaintext encrypted and algorithm used), whereas if you use normal
padding (with zeros), the problem domain is limited because only a certain
number of positions of the plaintext (e.g. 160 bits when using a digest)
actually contain data.

This does not make some public-key algorithms weaker (ElGamal), but RSA has to
cope with the fact that it doesn't do inherent randomization (so for equal
data to sign and equal key, you'll get equal signatures, which is bad!)
That's why I would advise you to go use ElGamal, which is much better in this
respect, and is patent-free too (well RSA is too, but anyway, the ElGamal
family of public-key ciphers always was).

And, if you were using it to encrypt/decrypt a symmetric encryption key, you
could also pad the algorithm used for encryption into the string, so that
only the proper receiving end could get this last bit of info on the
encryption method used (security by obscurity, but anyway).

So much for what I always did. I really don't know whether this is some form
of secure way to go, but at least no cryptography book I read has ever
discouraged the use of random padding while encrypting data which is much
shorter than the "block size" of a public-key crypto algorithm (esp. for
RSA).

Heiko.

Heiko Wundram, Aug 11, 2004
6. ### Heiko WundramGuest

Am Mittwoch, 11. August 2004 19:20 schrieb Heiko Wundram:
> retv = [data]
> strlen = len(data)
> nlen = log2(n)/8
> while strlen < nlen-3:
> data.append(<randomchars>)
> nlen -= len(<randomchars>)
> data.append(chr(algo))
> data.append(struct.pack("!H",strlen))
> return "".join(data)

Whoops, replace data.append by retv.append in that whole block...

Heiko.

Heiko Wundram, Aug 11, 2004
7. ### Heiko WundramGuest

Am Mittwoch, 11. August 2004 11:47 schrieb Ajay:
> > > i am using RSa for signing documents and hence decrypting and then
> > > encrypting to verify?

Oh, btw., if you're not necessarily bound to PyCrypto, you might give
Cryptopia a try:

http://www.heim-d.de/~heikowu/Crypto

It's a wrapper for GMP (PyGMP) combined with an ElGamal and RSA
encryption/decryption engine, and it's fast, faster than PyCrypto for large
key sizes (because it uses GMP) and simpler to use (because everything that
is returned from decrypt/encrypt is returned in classes which can easily be
stored to string/loaded from string). I've only built it on Linux and
Solaris, can't say if it'll build/work on Windows, although there is a port
of GMP to Windows out there...

Look at the examples if you need more info on how to use it.

Heiko.

Heiko Wundram, Aug 11, 2004
8. ### Bryan OlsonGuest

Heiko Wundram wrote:
> Am Mittwoch, 11. August 2004 10:21 schrieb Bryan Olson:
> > I agree with about half of Heiko Wundram's response.

>
> Well, with what don't you agree?

Well, since you asked:

|> RSA (Rivest-Shamir-Adleman encryption) never works in
|> "blocks", as there is no notion of blocks in public key
|> cryptography, there's only the notion of doing an operation
|> on plaintext modulo a large prime n.

There is a notion of blocks in many public-key ciphers,
including RSA. The modulus n in RSA is composite, not prime.
The "only the notion" statement implies that integer modular
arithmetic is the only base for public-key cryptography, which
is not true.

> Anyway, I've not read anywhere that for signing a message it is
> discredited to use RSA decrypt with private key, encrypt with
> public key.

Try the book you cited, section 11.2.3, Note 11.10, Example
11.11, and Remark 11.12.

http://www.cacr.math.uwaterloo.ca/hac/

Subsequent to the publishing of HAC, the 'redundancy function'
they describe in Section 11.3.5 'ISO/IEC 9796 formatting', fell
to a series of attacks, first by J. S. Coron, D. Naccache and J.
Stern, then improved and extended by D. Coppersmith, S. Halevi
and C. Jutla, and several following works.

The other redundancy function they describe is the one for which
I presented Python code (for the SHA-1 usage). Don't use it
without the hash function.

There are now more trustworthy padding methods for RSA signing
and encryption, based on the "Exact Security" and "OAEP" papers
of Bellare and Rogaway, with arguably interesting follow-ups by
Victor Shoup.

> Basically, what I always implemented is something like (pseudocode):

Don't do that, even for encryption. See Bleichenbacher's
attacks on RSA encrpytion:

http://www.bell-labs.com/user/bleichen/bib.html

[...]

> So much for what I always did. I really don't know whether this is some form
> of secure way to go, but at least no cryptography book I read has ever
> discouraged the use of random padding while encrypting data which is much
> shorter than the "block size" of a public-key crypto algorithm (esp. for
> RSA).

Then I'm guessing I won't see you at Crypto 04 next week

--
--Bryan

Bryan Olson, Aug 12, 2004
9. ### Bryan OlsonGuest

Ajay wrote:
> could you elaborate on that? i thought signing by decrypting is
> the way to do it.

That is how Rivest, Shamir and Adleman originally presented it,
and they did win the ACM's Turing Award for the work, but the
use of 'raw' RSA is full of subtle problems. To understand the
basics, see chapter 11 of the book that Heiko Wundram cited:
(available free on-line)

http://www.cacr.math.uwaterloo.ca/hac/

In you really want to understand the math, I cited some of the
major papers in my response to Wundram. That's beyond the scope
of this group.

If you're implementing, just use a current standard that
cryptologists respect. For basic RSA (en/de)crypt and
sign/verify, PKCS#1 is a fine way to go. As I write this the
current version is 2.1, which is also published as RFC 3447.

--
--Bryan

Bryan Olson, Aug 12, 2004
10. ### Heiko WundramGuest

Am Donnerstag, 12. August 2004 02:36 schrieb Bryan Olson:
> There is a notion of blocks in many public-key ciphers,
> including RSA. The modulus n in RSA is composite, not prime.
> The "only the notion" statement implies that integer modular
> arithmetic is the only base for public-key cryptography, which
> is not true.

Pardon me, yeah, the modulus is composite, I should've rather said that most
public key ciphers work in Z{n}. And I know that it's pretty easy to
generalize ElGamal to work in other rings than Z, or take bucket public key
cryptography for example.

I know of others, but: have you ever seen an implementation or use case of
some algorithm not working in Z{n}(*)? I have not, at least not when working
with computers.

> Try the book you cited, section 11.2.3, Note 11.10, Example
> 11.11, and Remark 11.12.
>
> http://www.cacr.math.uwaterloo.ca/hac/

Yup, I read those, and yeah, I know what they are talking about, and yeah,
that's not what I said. It's not about decrypting to sign, encrypting to
verify, it's about the redundancy function. In case you use a proper hash
function (I'm talking about identity with SHAing before signing), this attack
blatantly fails, as it would mean that you'd have to find hash collisions.
Even when I can easily find a m~ for which I have a valid signature m,
finding a plaintext for m~ which is meaningful should be impossible, due to
the hashing, for "long enough plaintext."

And the example code I gave was not about leaving out hashing before putting
it throught the identity function as "redundancy", it was exactly about
reducing redundancy, which probably I was to stupid to explain correctly,
but I'll try now.

> Don't do that, even for encryption. See Bleichenbacher's
> attacks on RSA encrpytion:
>
> http://www.bell-labs.com/user/bleichen/bib.html

Link doesn't work... But anyway, look at the same book on page 288, (ii):

"A pseudorandomly generated bitstring should be appended to the plaintext
before encryption (this is also called salting)."

That's exactly what my little example was trying to do: append a salt to the
data which was being encrypted. First, this gives an internal structure to
the data which is being encrypted/decrypted (and thus makes it easy to
discard invalid values), as you just have to check that the two lowest bytes
of the number coming out are <= keysize//8, and also assures that you don't
lose information when for example you are encrypting "what is this?" with a
768 bit key, now when this number comes out of decryption, the string will
have been "prepadded" with zeros, but as you know the actual length of the
string (from the two lowest bytes), you can extract the string properly.

Second, what this also achieves, is that you may encrypt the same plaintext
for the same recipient multiple times without an eavesdropper seeing that the
plaintext is identical (because of the salt), and

Thirdly, the attack stated on the given page is also not successful (when
encrypting to multiple recipients with the same encryption exponent, but this
is certainly not true for most common situations I need cryptography in).

So, pre/postpadding with salt is certainly not the wrong way to go for
encryption.

Now I'd really like to hear from you why this same argument doesn't go for
signing. Because:

1) I generate a signature for a string "some string" with SHA.
2) I decrypt this hash, which is salted by the same standard salting algorithm
I noted above with my private key to get the signature
3) I encrypt the signature to get the salted hash with my public key, which
can easily be unsalted, and then checked for validity.

Forging a valid signature would mean the following steps:

1) Find a pair of numbers (n,E(n)) for which E(n) has the lowest two bytes
<= keysize//8. It's pretty easy to insert additional constraints here: for
example, the string of randomized bits in E(n) has to have parity one to be
accepted, this would make this step harder (but not all of the positions may
be one, except when the bitstring is of length one, or something).
2) Now, find a pair from one of the pairs found in (1) which has a value which
has correct length (sha digests are always 160 bits long),
3) Now, for this pair, find a message which corresponds to the hash
value.

If you complete these three steps (well, you'd only have to find one pair in
step one and two, which should be hard enough), you'll have a faked
signature. But not before this.

I recon that step three is computationally infesible, and I also assume that
finding a pair (n,E(n)) which first of all has the properties that I give to
it (stated above), and also turns out to be a valid hash value for some
string I want it to be is computationally infesible. But, as I stated above,
even if step 1 is dead easy (by using the identify function here), what is
the use if I can easily find a a proper signature for a certain hash value, if
I have no plaintext for the hash value for which I have the signature? The
problem being that I can't construct the signature from the hash value, but
rather even when using identidy redundancy, I have to have the signature
first, before constructing the hash value.

At least that's what my professor taught me at univ, and with what I can
agree.

> Then I'm guessing I won't see you at Crypto 04 next week

No, you won't, I'm more or less a hobby cryptographer... And I guess I'm
certainly not the smartest at this, but... I'd love to hear why you think the
above argumentation is wrong, or direct me to a paper which explains why
salting a signature with an equally simple algorithm I proposed to salt a
string to encrypt won't make things "more secure" for signing also...

Anyway, if you're up to some more explaining, feel free to contact me
off-list... I'd be glad to learn...

Heiko.

Heiko Wundram, Aug 12, 2004
11. ### Bryan OlsonGuest

Hi Heiko. I didn't mean any offense by "I agree with about half
of Heiko Wundram's response." This is what I do for a living,
so I tend to sweat details and push points.

Heiko Wundram wrote:
> Am Donnerstag, 12. August 2004 02:36 schrieb Bryan Olson:
>
>>There is a notion of blocks in many public-key ciphers,
>>including RSA. The modulus n in RSA is composite, not prime.
>>The "only the notion" statement implies that integer modular
>>arithmetic is the only base for public-key cryptography, which
>>is not true.

>
> Pardon me, yeah, the modulus is composite, I should've rather said

that most
> public key ciphers work in Z{n}. And I know that it's pretty easy to
> generalize ElGamal to work in other rings than Z, or take bucket

public key
> cryptography for example.

I don't know what "bucket public key" cryptography is. Google
finds no matches.

Incidentally, ElGamal generalizes to groups; they don't have to be
rings. (Generalizes in the sense that it can verify what it
signs and decrypt what it encrypts; it's insecure in many
groups/rings).

> I know of others, but: have you ever seen an implementation or use

case of
> some algorithm not working in Z{n}(*)? I have not, at least not when

working
> with computers.

Sure. I worked for several years for Certicom, the leader in
elliptic-curve cryptography (ECC). ECC is efficient, deployed,
and now part of the federal standard for digital signatures,
FIPS 186-2 (and the draft of FIPS 186-3).

>>Try the book you cited, section 11.2.3, Note 11.10, Example
>>11.11, and Remark 11.12.
>>
>> http://www.cacr.math.uwaterloo.ca/hac/

>
> Yup, I read those, and yeah, I know what they are talking about, and

yeah,
> that's not what I said. It's not about decrypting to sign, encrypting to
> verify, it's about the redundancy function.

Which decrypt-to-sign doesn't have. In particular, look at your
code. HAC says, "Selection of an appropriate [redundancy
function] is critical to the security of the system." That's a
key place where your code falls down.

> In case you use a proper hash
> function (I'm talking about identity with SHAing before signing),
> this attack blatantly fails,

Actually, I was the one who wrote "Always hash and pad, for any
size message," and included code that uses SHA-1. Your follow-
up code didn't hash at all. Since the attack fails for the
method I use, but works against yours, I don't see what you are
arguing.

> as it would mean that you'd have to find hash collisions.
> Even when I can easily find a m~ for which I have a valid signature m,
> finding a plaintext for m~ which is meaningful should be impossible,

due to
> the hashing, for "long enough plaintext."

This one is subtle. If I read your pseudo-code correctly, and
in the place of the message we assume a SHA-1 hash, then there
are 160 bits of hash digest, 16-bits of length, and the rest of
the bits are random. Assuming the currently-most-popular RSA
key size, 1024 bits, the attacker has to control just 176 out of
1024 bits. The verifier can check those 176 bits, but 847 or
848 bits can be anything.

I'm not sure off-hand how weak it is, but none of the currently-
respected signature methods allow a single hash digest to have
anywhere near 2^847 valid signatures in a 2^1024 signature
space.

> And the example code I gave was not about leaving out hashing before

putting
> it throught the identity function as "redundancy", it was exactly about
> reducing redundancy, which probably I was to stupid to explain

correctly,
> but I'll try now.

I can't read your mind. I only knew what your post said your
code was about: "what I always implemented is something like
[...]".

The state-of-the art signature methods agree with you that some
randomness is a good thing. The need for randomness in
signatures is not yet well-understood; it's an assumption of the
(relative) security proofs, but no one has yet proved it is
necessary.

>>Don't do that, even for encryption. See Bleichenbacher's
>>attacks on RSA encryption:
>>
>> http://www.bell-labs.com/user/bleichen/bib.html

>
>
> Link doesn't work...

Hmm... works for me. Odd. Googling "Bleichenbacher" + "RSA"
should find it.

> But anyway, look at the same book on page 288, (ii):
>
> "A pseudorandomly generated bitstring should be appended to the

plaintext
> before encryption (this is also called salting)."
>
> That's exactly what my little example was trying to do: append a salt

to the
> data which was being encrypted.

And that's why I didn't fault your example for lack of
randomness. (Though technically the message and length could
fill the block, and thus not allow any randomness. That would
be a big mistake for encryption, and arguably a mistake for
signatures.)

If you have questions about these things, I suggest asking them
on sci.crypt. My advice here is to use cryptosystems that
experts have already vetted; do not roll your own variants.

--
--Bryan

Bryan Olson, Aug 12, 2004
12. ### Mel WilsonGuest

In article <>,
Heiko Wundram <> wrote:
>Am Mittwoch, 11. August 2004 10:21 schrieb Bryan Olson:
>> I agree with about half of Heiko Wundram's response.

>
>Well, with what don't you agree?
>
>Anyway, I've not read anywhere that for signing a message it is discredited to
>use RSA decrypt with private key, encrypt with public key.

I believe there's a supplied-plaintext attack. Your enemy can use
your public key to compute a special text, and ask you to sign it.
The effect of the computed text is that your enemy can then compute
your secret key from the signature. Hashing beforehand makes is almost
impossible to fall into this trap.

Regards. Mel.

Mel Wilson, Aug 16, 2004