Hashing function gives different values on different OSes?

Lawrence

Hi all, I use a simple function to create a hash of a file using SHA
for a utility I'm writing.

Here is the function:
public static String digest(File file)
        throws FileNotFoundException, IOException, NoSuchAlgorithmException {
    MessageDigest sha;
    sha = MessageDigest.getInstance("sha");
    DigestInputStream din = new DigestInputStream(
            new BufferedInputStream(new FileInputStream(file)), sha);

    while (din.read() != -1) {}
    din.close();

    return sha.digest().toString();
}

I send a file over a network (LAN) between a Mac and a Windows
computer, both using my application.
I sent zip files, MP3s, JPEGs, BMPs, TXT, TIFF, GIF, and videos and they
all worked perfectly, but the
resulting hash is different for the same file.
How weird is that? Maybe the name of the file matters? It shouldn't.
 
Luc The Perverse

Lawrence said:
[SNIP]
I sent zip files, MP3s, JPEGs, BMPs, TXT, TIFF, GIF, and videos and they
all worked perfectly, but the
resulting hash is different for the same file.

If you weren't using Java I'd say it could be an endian problem.

Test with a small file, dump the hex to the screen on both machines, and compare.

I'm intrigued ;)
 
Richter~9.6

Lawrence said:
[SNIP]
I sent zip files, MP3s, JPEGs, BMPs, TXT, TIFF, GIF, and videos and they
all worked perfectly, but the
resulting hash is different for the same file.

Have you tried zipping up the contents before moving it and unzipping
it on the target machine?

Regards,
Richard
 
Alex Hunsley

Lawrence said:
[SNIP]
I send a file over a network (LAN) between a Mac and a Windows
computer, both using my application.

Like Luc, I was suspecting endian problems for a moment, but Java's
standard streams assume network byte order (big endian), so Java
operating at both ends should match up ok.
Could it be something to do with how MessageDigest may be doing any seeding?
Alex
 
Eric Sosman

Lawrence said:
[SNIP]
I send a file over a network (LAN) between a Mac and a Windows
computer, both using my application.
[SNIP]
the resulting hash is different for the same file.

Have you examined the way you "send the file" over the
network? Note that Mac and Windows use different conventions
to mark the ends of lines in text files, so "the same" text
will be represented by different byte sequences on the two
machines. Transport mechanisms like FTP make the conversion
automatically, so you may not have noticed it happening.
 
Paul Tomblin

In a previous article, Eric Sosman said:
[SNIP]
Note that Mac and Windows use different conventions
to mark the ends of lines in text files, so "the same" text
will be represented by different byte sequences on the two
machines. Transport mechanisms like FTP make the conversion
automatically, so you may not have noticed it happening.

Just to expand on that a bit, if you transfer using ftp and tell it that
the file is ascii, it will convert the ends of lines, and if you tell it
that it's binary it won't. Some ftp clients auto-detect what you're
sending and set the binary/ascii flag correctly, but many don't, and if
you send a binary file without telling it that it's binary, it will end up
badly corrupted.
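
To illustrate why a line-ending conversion matters for a hash, here is a minimal sketch (not code from the program under discussion). It hashes the same two lines of text once with "\n" and once with "\r\n" line endings. The class and method names are made up for the example, and "SHA-1" is assumed as the algorithm.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class LineEndingDigestDemo {
    // SHA-1 over the ASCII bytes of the text ("SHA-1" is an assumption for this sketch).
    static byte[] sha1(String text) {
        try {
            return MessageDigest.getInstance("SHA-1")
                    .digest(text.getBytes(StandardCharsets.US_ASCII));
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] unix = sha1("line one\nline two\n");
        byte[] windows = sha1("line one\r\nline two\r\n");
        // Same text to a human reader, different bytes, so different digests.
        System.out.println(MessageDigest.isEqual(unix, windows)); // false
    }
}
```

A transfer that rewrites line endings therefore changes the digest even though the text looks identical in an editor; a byte-for-byte (binary) transfer does not.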
 
Lawrence

To answer your question, let me explain.
I transfer the file using my own Java program; I use simple chunks of
bytes and I save them to new files.
Since both client and server are in Java and written by me, I believe
there shouldn't be
any endian problem of any sort.
In the end the program is pretty simple: I make a hash code, I send
the hash code with some other info
such as file name and file size, then the client connects back and
requests the file by sending the hash, I look the file up in
a HashMap, and I send it in chunks of bytes.
I do check that, if the chunk is not filled by the InputStream, I write
only the data that was read, on both client and server.
When the transfer is completed, the client checks that the file
received has the same hash that the server initially stated.
This is always false.
For any file type.
But I tried many types, including DMG disk images, RAR files,
JPEGs, videos, and zip files, and they all work afterwards.
I'm going to send a very small file and check the hex
prints on both sides.
Will let you know...
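
The chunked copy described above might look roughly like the following sketch. It is not the actual code of the program; ChunkCopy, copy and chunkSize are hypothetical names, and in-memory streams stand in for the socket and the file. The detail that matters is writing only the n bytes that read() actually returned, since a chunk is often only partly filled.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

class ChunkCopy {
    // Copies everything from in to out; chunkSize is assumed to be > 0.
    static long copy(InputStream in, OutputStream out, int chunkSize) throws IOException {
        byte[] chunk = new byte[chunkSize];
        long total = 0;
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n); // only the bytes actually read, not the whole buffer
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {0x43, 0x49, 0x41, 0x4F, 0x21};
        ByteArrayOutputStream received = new ByteArrayOutputStream();
        long n = copy(new ByteArrayInputStream(data), received, 2);
        System.out.println(n + " bytes copied, identical: "
                + Arrays.equals(data, received.toByteArray())); // 5 bytes copied, identical: true
    }
}
```

Writing the whole buffer instead of the first n bytes would append stale data to the file, which is one way a transfer could corrupt content.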
 
Lawrence

To answer your question let me explain.
I transfer the file using my own java program, I use simple chunks of

Sorry for the bad quoting before.
I just used a hex editor to open a transferred file on both sides,
and the two copies are equal.
So the problem is in the function.
For a file that contains the 4 characters "CIAO" (hex [ 43 49 41
4F ]),
on the Mac the hash is [B@425743
For the same file, on the Windows machine it is [B@472d48

Done again on the Mac it is [B@238016
Done again on the Windows machine it is [B@3ae941

I don't understand... how is this possible?

Maybe there is something wrong with converting an array of bytes to a String?
That is the return statement in the method I posted.

Thanks folks
 
Lothar Kimmeringer

Lawrence said:
return sha.digest().toString();

byte[].toString doesn't work the way you think.
You have to do something like this:

byte[] digest = sha.digest();
StringBuffer sb = new StringBuffer();
for (int i = 0; i < digest.length; i++) {
    // Mask to 0..255 so that negative bytes are not sign-extended.
    if ((digest[i] & 0xff) < 16) {
        sb.append("0"); // pad single-digit values to two hex digits
    }
    sb.append(Integer.toHexString(digest[i] & 0xff));
    sb.append(" ");
}
return sb.toString();

I wrote this by hand without checking for errors, so the
correct result might be different.
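
For reference, a self-contained variant of this idea could look like the sketch below. It is an illustration, not the poster's code: FileDigest and toHex are names invented here, "SHA-1" is assumed to be the algorithm the original "sha" name resolves to, the file is read in blocks rather than byte by byte, and the hex string is written without separating spaces. It needs Java 7 or later.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class FileDigest {
    // Hex-encodes a byte array, two lowercase digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            int v = b & 0xff;          // 0..255, never negative
            if (v < 16) {
                sb.append('0');        // keep two digits per byte
            }
            sb.append(Integer.toHexString(v));
        }
        return sb.toString();
    }

    // Digest of the file's bytes as a hex string ("SHA-1" is an assumption).
    static String digest(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest sha = MessageDigest.getInstance("SHA-1");
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                sha.update(buf, 0, n); // feed only the bytes actually read
            }
        }
        return toHex(sha.digest());
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("ciao", ".txt");
        Files.write(tmp, new byte[] {0x43, 0x49, 0x41, 0x4F}); // "CIAO"
        System.out.println(digest(tmp)); // same 40 hex digits on every machine and every run
        Files.delete(tmp);
    }
}
```

Because the returned string depends only on the file's bytes, two machines holding identical copies should print the same value.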

BTW: When reading or writing binary data, don't use streams or
Readers/Writers that convert the data, such as PrintStream
or InputStreamReader/OutputStreamWriter.


Regards, Lothar
--
Lothar Kimmeringer E-Mail: (e-mail address removed)
PGP-encrypted mails preferred (Key-ID: 0x8BC3CD81)

Always remember: The answer is forty-two, there can only be wrong
questions!
 
Lawrence

Lothar Kimmeringer said:
byte[].toString doesn't work the way you think.
[SNIP]


Cool, I thought that converting an array to a String would always return the same
value, but
I forgot that arrays are objects, so toString gives other things such as
references.
I will test your code (but I need to have a look back at the shift
operator and bitwise AND) :p
 
Mike Schilling

Lawrence said:
[SNIP]
For a file that contains the 4 characters "CIAO" (hex [ 43 49 41
4F ]),
on the Mac the hash is [B@425743
For the same file, on the Windows machine it is [B@472d48
[SNIP]
I don't understand... how is this possible?

"[B@425743" means "This is a byte array, and 425743 is the (hexadecimal) identity
hash code of that particular array object in the JVM". It doesn't say anything about
the contents of the byte array. I
presume it comes from code like

byte[] barr = sha.digest();
System.out.println(barr.toString());

Try something like

for (int i = 0; i < barr.length; i++)
{
    System.out.print(Integer.toHexString(barr[i] & 0xFF));
    System.out.print(", ");
}

to see what the byte array contains.
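
A small sketch of the difference between the identity-based toString() of an array and its contents. ArrayToStringDemo and contents are names invented for this illustration; the calls used are standard java.util.Arrays methods.

```java
import java.util.Arrays;

class ArrayToStringDemo {
    // Content-based text form of a byte array, e.g. "[67, 73, 65, 79]".
    static String contents(byte[] bytes) {
        return Arrays.toString(bytes);
    }

    public static void main(String[] args) {
        byte[] a = {0x43, 0x49, 0x41, 0x4F}; // "CIAO"
        byte[] b = a.clone();               // same contents, different object

        System.out.println(a.toString());       // "[B@" plus a hex number that varies between runs
        System.out.println(contents(a));        // [67, 73, 65, 79]
        System.out.println(Arrays.equals(a, b)); // true: compares contents
        System.out.println(a.equals(b));         // false: compares object identity
    }
}
```

Only the content-based forms are stable across runs and machines, which is why they are the ones to use when comparing hashes.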
 
Lew

Lothar said:
sb.append(Integer.toHexString(digest[i] & 0xff));

Lawrence said:
I need to have a look back at the shift operator and bitwise AND :p

This use of the operator & is called "masking", and the int operand 0xff in
this example is a "mask".

Only the bits in the other operand that match position with the 1s in the mask
will make it through to the result. The rest are masked out, as with a resist
in circuit-board etching.

In the given example, the lowest byte of digest[i] (promoted to int) will show up
in the lowest byte of the argument to toHexString(), masked in by the 0xff; the
upper bytes of the argument will all be zeroed. This has the effect of ensuring a
non-negative argument to toHexString().
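
As a rough illustration of masking, the sketch below applies a few masks to an int and prints the result in hex. MaskDemo and mask are hypothetical names chosen for the example.

```java
class MaskDemo {
    // Keeps only the bits of value that line up with 1 bits in mask.
    static int mask(int value, int mask) {
        return value & mask;
    }

    public static void main(String[] args) {
        int value = 0x12345678;
        System.out.println(Integer.toBinaryString(0xff));              // 11111111
        System.out.println(Integer.toHexString(mask(value, 0xff)));    // 78
        System.out.println(Integer.toHexString(mask(value, 0xff00)));  // 5600
        // -1 has all 32 bits set; the mask lets only the lowest 8 through.
        System.out.println(mask(-1, 0xff));                            // 255
    }
}
```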

- Lew
 
Lawrence

[SNIP]
the upper bytes
of the argument will all be zeroed. This has an effect of ensuring a positive
argument to toHexString(). [SNIP]
- Lew


Wait.
I thought something different.
Hex represents at most 16 different combinations per digit, so two hex
digits represent 256 combinations:
8 bits, 1 byte.
Then it does some kind of implicit conversion, applying a bitwise AND
with
0xFF, which is like a bit string of eight 1s.
The result should be a number (what, hex or int or even a byte?)
that, if it is smaller than 16, will have only one digit;
therefore
a 0 is added in front of the hex digit.

Am I wrong?
 
Lothar Kimmeringer

Lawrence said:
Hex represents at most 16 different combinations per digit, so two hex
digits represent 256 combinations

Hex represents a value in base 16. One "digit" can
therefore represent numbers from 0 to 15. How many "combinations"
can be represented depends on the bit length. An int (used here)
holds 32 bits, so a hex number can be up to 8 hex digits
(aka nibbles) long.

Lawrence said:
8 bits, 1 byte.
Then it does some kind of implicit conversion, applying a bitwise AND
with
0xFF, which is like a bit string of eight 1s.

The mask is there to convert the signed byte
to an unsigned int value. Alternatively you can do
digest[i] + (digest[i] < 0 ? 256 : 0);
But that is much harder to read, and it is less obvious what
is intended.

If you don't do this kind of thing and you just do
Integer.toHexString((int) digest[i]);
a byte that was set to e.g. 255 will lead to the hex value
ffffffff being returned. Why? If you store 255 (0xff)
in a byte, which is signed, the value will be -1 after
that (that's what the bit pattern 0xff represents in a byte). If you just cast
it to an int, the value is still -1; there are just
more bits set (0xffffffff).

The construct (digest[i] & 0xff) tells the VM to
promote digest[i] to int (0xffffffff) and do a bitwise
AND with the value 0xff. The result is 0x000000ff,
which is the same value as was set originally.

Lawrence said:
The result should be a number (what, hex or int or even a byte)
that, if it is smaller than 16, will have only one digit;
therefore
a 0 is added in front of the hex digit.

That's the first check. Alternatively the if-statement
can be if (digest[i] >= 0 && digest[i] < 16), but again
this is harder to read and understand two weeks later.

In C you would just use "unsigned byte" (I know byte
doesn't exist in C, but I don't want to confuse
things by starting to use char here). In Java you always
have to do this kind of thing when handling unsigned
data with signed types.
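
The sign-extension effect described above can be checked with a short sketch. UnsignedByteDemo, viaMask and viaAdd are names made up for this example; Byte.toUnsignedInt is a standard method, but only from Java 8 on, so it would not have been available to every reader of this thread.

```java
class UnsignedByteDemo {
    // Unsigned value of a byte via masking.
    static int viaMask(byte b) {
        return b & 0xff;
    }

    // The same value via the arithmetic alternative.
    static int viaAdd(byte b) {
        return b + (b < 0 ? 256 : 0);
    }

    public static void main(String[] args) {
        byte b = (byte) 255;                                // bit pattern 0xff
        System.out.println(b);                              // -1
        System.out.println(Integer.toHexString(b));         // ffffffff (sign-extended)
        System.out.println(Integer.toHexString(viaMask(b))); // ff
        System.out.println(viaAdd(b));                      // 255
        System.out.println(Byte.toUnsignedInt(b));          // 255 (Java 8+)
    }
}
```

All three conversions agree for every possible byte value; the mask is simply the shortest to write.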


Regards, Lothar
 
Lew

Don't confuse a numeric value with the String representation of that value.

0xff is a number, equal to 255. It is 32 bits long, not 8. The top 24 bits are 0.

In this case, an int. 0xff is an int, digest[i] is no wider than an int, so
the result of & is an int.

Digits only apply to the String form. The int form is always four bytes long.

In the String representation only.

You need to study types and numeric operations in Java.

- Lew
 
Lawrence

[SNIP]
You need to study types and numeric operations in Java.

- Lew

I do.
Thank you a lot, all of you.
At least now I understand how it does it, I hate when I don't.

Lawrence
 
Lew

I do.
Thank you a lot, all of you.
At least now I understand how it does it, I hate when I don't.

I apologize. I should have phrased that advice, "The reasons for this behavior
are in the definitions of (numeric) types and numeric operations in Java."

In a nutshell, binary numeric operations perform unary and binary operand
promotion at various points. Literals like '0xff' have the virtue of
representing positive int values while looking an awful lot like unsigned byte
values. This makes them ideal to mask (signed) narrow values into positive
wider ones.
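
One way to see these promotions at work is to let overload resolution report the static type of an expression. This is only a sketch; PromotionDemo and typeOf are hypothetical names for the example.

```java
class PromotionDemo {
    // The compiler picks the overload that matches the static type of the argument.
    static String typeOf(byte x) { return "byte"; }
    static String typeOf(int x)  { return "int"; }

    public static void main(String[] args) {
        byte b = (byte) 0xf0;
        System.out.println(typeOf(b));        // byte
        System.out.println(typeOf(b & 0xff)); // int (binary numeric promotion)
        System.out.println(typeOf(b & b));    // int, even with two byte operands
        System.out.println(typeOf(-b));       // int (unary numeric promotion)
        System.out.println(b);                // -16
        System.out.println(b & 0xff);         // 240
    }
}
```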

Some view Java's snubbing of unsigned bytes as a flaw. That's as may be, but
it is a reality for good or ill.

In the world of implicit conversions, be very, very aware.

Gird your loins and venture into the world of unadorned truth in the Java
Language Specification (JLS).

<http://java.sun.com/docs/books/jls/third_edition/html/j3TOC.html>

Integer literals:
<http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#3.10.1>

The integer bitwise operators:
<http://java.sun.com/docs/books/jls/third_edition/html/expressions.html#5233>

Numeric promotions:
<http://java.sun.com/docs/books/jls/third_edition/html/conversions.html#5.6>

- Lew
 
