Hashing long strings...

WTH

Hi everybody! (Hi Dr. Nick!)

I've got an applet that sends long strings (usually between 30 and 512
characters - but potentially up to 2048) via HTTP that represent
resource requests to my server (not the web server in this instance, a
TCP/IP server I've written to handle these resource requests) which is
running on the same machine as the web server. These resource
requests boil down to an individual file whose URL needs to be sent
back to the applet so it can download it.

The server (written in C++) needs to identify that request and
associate it with a previous request which has been fulfilled for
another client applet most likely in the past. If it has never had a
request exactly like this one, the server creates the required
resource and stores it on the web server and sends back the URL to the
client applet who then downloads the resource from the web server.
This approach works great for me and is scalable in the manner which I
currently require.

Because the incoming string can be really long and filled with all
kinds of 'filename unfriendly' characters, I simply hash the incoming
string and use that hash to identify if the resource already exists or
needs to be created (If a file with the same name as the hash already
exists, I've handled this exact request before and I send back the URL
to the file - otherwise, create the resource, name the file with the
hash, and send back the URL to the file.)
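The lookup described above can be sketched roughly as follows. This is only an illustration in Java, not the actual C++ server code: the class name `HashedResourceCache`, the `baseUrl` prefix and the `createResource` placeholder are all hypothetical, and the hash is rendered as hex purely for convenience.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.math.BigInteger;
import java.security.MessageDigest;

// Hypothetical sketch: a request string is hashed, and the hash doubles as
// the file name that tells us whether the resource already exists.
final class HashedResourceCache {
    private final File dir;        // directory the web server serves (assumed)
    private final String baseUrl;  // public URL prefix for that directory (assumed)

    HashedResourceCache(File dir, String baseUrl) {
        this.dir = dir;
        this.baseUrl = baseUrl;
    }

    // SHA-256 of the request's UTF-8 bytes, written as a hex number.
    // Leading zeros are dropped, so the name is at most 64 characters.
    static String fileNameFor(String request) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(request.getBytes("UTF-8"));
            return new BigInteger(1, digest).toString(16);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Returns the URL for the request, creating the file only on first sight.
    String urlFor(String request) {
        String name = fileNameFor(request);
        File target = new File(dir, name);
        if (!target.exists()) {
            createResource(request, target);
        }
        return baseUrl + name;
    }

    // Placeholder for the real resource generation: it just stores the request.
    private void createResource(String request, File target) {
        try {
            FileOutputStream out = new FileOutputStream(target);
            try {
                out.write(request.getBytes("UTF-8"));
            } finally {
                out.close();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Calling `urlFor` twice with the same request string returns the same URL and creates the file only once, which is the behaviour the post describes.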

My initial implementation was under severe time pressure so the applet
didn't do the smart thing and figure out what the hash was and look on
the server without contacting the TCP/IP server - it just always
contacted the TCP/IP server and asked for the resource.

Now, I'd like to move the hashing into the applet to avoid that
potentially unnecessary communication with the server.

The server's implementation uses an RSA/AES cryptographic provider
(supplied by the operating system) with the 256-bit SHA hashing
algorithm.

So, after all the introductions, we get to the meat of the issue:

I need to implement this in Java 1.1 (yes, we have to support that far
back - 1.1.8 to be exact) and the MSJVM - we are an equal opportunity
group ;).

Anyone know of books/articles/samples that show how to hash long
strings to something comparable to 256-bit hash? It needs to be
usable as a filename, perfectly fine to be a one-way hash, and handle
2048 character strings (worst case) without collisions (or at least
very very very remote chances of collision - in the tens of millions.)

Am I missing some simpler method like CRC32 or something that I
discarded out of hand as being too prone to collisions without
understanding the math, et cetera...?
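For a rough feel of the math, the usual birthday approximation can be evaluated directly. The sketch below assumes an ideal hash with uniformly distributed output (CRC32 is not designed to be one, so its figure is a best case) and reads "tens of millions" as ten million distinct request strings; both are assumptions, and `BirthdayBound` is just an illustrative name. `Math.expm1` needs a newer JDK than 1.1; it is used here only to do the arithmetic.

```java
// Birthday approximation for an ideal b-bit hash:
//   p = 1 - exp(-n * (n - 1) / 2^(b + 1))
// where n is the number of distinct inputs hashed.
final class BirthdayBound {
    static double collisionProbability(double n, int bits) {
        double exponent = n * (n - 1) / Math.pow(2.0, bits + 1);
        // -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
        return -Math.expm1(-exponent);
    }

    public static void main(String[] args) {
        double n = 1.0e7; // ten million distinct requests (assumed)
        System.out.println("32-bit  (CRC32 size):   " + collisionProbability(n, 32));
        System.out.println("160-bit (SHA-1 size):   " + collisionProbability(n, 160));
        System.out.println("256-bit (SHA-256 size): " + collisionProbability(n, 256));
    }
}
```

Under these assumptions a 32-bit value is all but guaranteed to collide somewhere among ten million inputs, while the 160-bit and 256-bit figures are vanishingly small.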

Very much appreciated,

WTH
 
WTH

With all that babble, I forgot to mention the important part.

Hashing the string isn't hard, but my concern is that my C/C++ code
will hash differently than my Java code, so I'm really looking for a
self-implemented hashing algorithm I could put on both rather than
hope that SHA-256 will result in the same hash from both sides (across
multiple operating systems) even though it should. Is this a stupid
concern?

WTH
 
Tom Anderson

WTH said:
Hashing the string isn't hard, but my concern is that my C/C++ code will
hash differently than my Java code, so I'm really looking for a
self-implemented hashing algorithm I could put on both rather than hope
that SHA-256 will result in the same hash from both sides (across
multiple operating systems) even though it should. Is this a stupid
concern?

Yes. SHA-n, for any particular n, is SHA-n. Interoperability comes as
standard - it would be pretty useless for cryptographic purposes if it
didn't, as Alice and Bob might compute different hashes of the same data,
and shenanigans would ensue. If your implementation is correct - and i bet
you a pint that you can find a correct java implementation in under 30
minutes of googling - you have nothing to fear.
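One way to check that two implementations agree is to run a published test vector through both. The sketch below does this on the Java side with the standard `MessageDigest` API; the class name `Sha256Check` is made up for illustration. Two assumptions worth flagging: both sides must hash the same bytes (UTF-8 is used here, and the C++ side would have to match), and a runtime as old as 1.1.8 or the MSJVM may not ship a "SHA-256" provider, in which case a pure-Java implementation would have to take its place.

```java
import java.security.MessageDigest;

// Computes SHA-256 over the UTF-8 bytes of a string and returns lowercase hex.
final class Sha256Check {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    static String sha256Hex(String text) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                                         .digest(text.getBytes("UTF-8"));
            StringBuffer hex = new StringBuffer(digest.length * 2);
            for (int i = 0; i < digest.length; i++) {
                int v = digest[i] & 0xff;       // treat the byte as unsigned
                hex.append(HEX[v >>> 4]);       // high nibble
                hex.append(HEX[v & 0x0f]);      // low nibble
            }
            return hex.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Standard SHA-256 test vector for the three-byte message "abc":
        // ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
        System.out.println(sha256Hex("abc"));
    }
}
```

If the C++ server prints the same 64 hex digits for "abc", any remaining mismatch is most likely in how the request string is turned into bytes rather than in the hash itself.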

tom
 
David

WTH said:
Hi everybody! (Hi Dr. Nick!)

I've got an applet that sends long strings (usually between 30 and 512
characters - but potentially up to 2048) via HTTP that represent
resource requests to my server (not the web server in this instance, a
etc

If all you're doing is grabbing resources from a web server, have you
considered putting in a solution like Hessian or the like that serves
up POJOs via HTTP? Failing that, RESTful frameworks will also let you
do the same thing, serving back data via JSON/HTML/XML or whatever
you want. You could keep your applets much simpler this way.

David A.
 
Lew

Roedy said:
look at "digests". You have to convert them to base 36 to get strings
suitable as filenames.

Depends on the filesystem. Even with the English 26 some file systems will
let you use base 62. Furthermore, on my Ubuntu box I just did:

$ mount
/dev/sda1 on / type ext4
$ touch æsir.txt
$ ls
æsir.txt

so you can convert to bases much higher than 62 if you know you're on one of
those FSes.
 
Roedy Green

Lew said:
Depends on the filesystem. Even with the English 26 some file systems will
let you use base 62. Furthermore, on my Ubuntu box I just did:

I was thinking in terms of portable code. Just upper case letters and
digits should be safe. Even for a single platform you might want to
restrict the set to avoid punctuation, spaces, etc. in order to make
the generated names easier to read.
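
A minimal sketch of the base 36 idea, using only upper case letters and digits. The class name `Base36Name` is made up for illustration; `java.math.BigInteger` does the radix conversion.

```java
import java.math.BigInteger;
import java.util.Locale;

// Turns raw digest bytes into a name built only from 0-9 and A-Z.
final class Base36Name {
    static String toFileName(byte[] digest) {
        // Signum 1 makes BigInteger read the bytes as a non-negative number,
        // so the result never starts with a minus sign.
        String base36 = new BigInteger(1, digest).toString(36);
        // A fixed locale keeps the upper-casing independent of the
        // machine's default locale.
        return base36.toUpperCase(Locale.ENGLISH);
    }
}
```

For a 32-byte (256-bit) digest the result is at most 50 characters. Leading zero bytes are dropped by the conversion, so names can vary slightly in length, but distinct digests of the same length still give distinct names.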
 
