Hashing function gives different values on different OS?

Discussion in 'Java' started by Lawrence, Feb 17, 2007.

  1. Lawrence

    Lawrence Guest

    Hi all, I use a simple function to create a SHA hash of a file
    for
    a utility I'm writing.

    The function is here :
    public static String digest(File file) throws
    FileNotFoundException, IOException, NoSuchAlgorithmException {
    MessageDigest sha;
    sha = MessageDigest.getInstance("sha");
    DigestInputStream din = new DigestInputStream(new
    BufferedInputStream(new FileInputStream(file)),sha);


    while (din.read() != -1){}
    din.close();

    return sha.digest().toString();

    }

    I send a file over a network (LAN) between a Mac and a Windows
    computer, both using my application.
    I sent zip files, mp3s, jpegs, bmps, txt, tiff, gif, and videos and
    they all arrived in working order, but the
    resulting hash is different for the same file.
    How weird is that? Maybe the name of the file matters? It shouldn't.
     
    Lawrence, Feb 17, 2007
    #1

  2. "Lawrence" <> wrote in message
    news:...
    > Hi all, I use a simple function to create a hash of a file using sha
    > for
    > an utility i'm writing.
    >
    > The function is here :
    > public static String digest(File file) throws
    > FileNotFoundException, IOException, NoSuchAlgorithmException {
    > MessageDigest sha;
    > sha = MessageDigest.getInstance("sha");
    > DigestInputStream din = new DigestInputStream(new
    > BufferedInputStream(new FileInputStream(file)),sha);
    >
    >
    > while (din.read() != -1){}
    > din.close();
    >
    > return sha.digest().toString();
    >
    > }
    >
    > I send a file over a network (LAN) between a mac and a windows
    > computer, both using my application.
    > I sent zip files, mp3s, jpegs, bmps, txt, tiff, gif, and videos and it
    > all worked perfectly, but the
    > outcoming hash is different for the same file.
    > How weird is that ?Maybe the name of the file matters ?It shouldn't.


    If you weren't using Java, I'd say it could be an endian problem.

    Test with a small file, dump the hex to the screen on both machines, and compare.

    I'm intrigued ;)

    --
    LTP

    :)
     
    Luc The Perverse, Feb 17, 2007
    #2

  3. Lawrence

    Richter~9.6 Guest

    On Feb 17, 5:21 am, "Lawrence" <> wrote:
    > Hi all, I use a simple function to create a hash of a file using sha
    > for
    > an utility i'm writing.
    >
    > The function is here :
    > public static String digest(File file) throws
    > FileNotFoundException, IOException, NoSuchAlgorithmException {
    > MessageDigest sha;
    > sha = MessageDigest.getInstance("sha");
    > DigestInputStream din = new DigestInputStream(new
    > BufferedInputStream(new FileInputStream(file)),sha);
    >
    > while (din.read() != -1){}
    > din.close();
    >
    > return sha.digest().toString();
    >
    > }
    >
    > I send a file over a network (LAN) between a mac and a windows
    > computer, both using my application.
    > I sent zip files, mp3s, jpegs, bmps, txt, tiff, gif, and videos and it
    > all worked perfectly, but the
    > outcoming hash is different for the same file.
    > How weird is that ?Maybe the name of the file matters ?It shouldn't.


    Have you tried zipping up the contents before moving it and unzipping
    it on the target machine?

    Regards,
    Richard
     
    Richter~9.6, Feb 17, 2007
    #3
  4. Lawrence

    Alex Hunsley Guest

    Lawrence wrote:
    > Hi all, I use a simple function to create a hash of a file using sha
    > for
    > an utility i'm writing.
    >
    > The function is here :
    > public static String digest(File file) throws
    > FileNotFoundException, IOException, NoSuchAlgorithmException {
    > MessageDigest sha;
    > sha = MessageDigest.getInstance("sha");
    > DigestInputStream din = new DigestInputStream(new
    > BufferedInputStream(new FileInputStream(file)),sha);
    >
    >
    > while (din.read() != -1){}
    > din.close();
    >
    > return sha.digest().toString();
    >
    > }
    >
    > I send a file over a network (LAN) between a mac and a windows
    > computer, both using my application.


    Like Luc, I was suspecting endian problems for a moment, but Java's
    standard streams assume network byte order (big endian), so Java
    operating at both ends should match up ok.
    Could it be something to do with how MessageDigest may be doing any seeding?
    lex


    > I sent zip files, mp3s, jpegs, bmps, txt, tiff, gif, and videos and it
    > all worked perfectly, but the
    > outcoming hash is different for the same file.
    > How weird is that ?Maybe the name of the file matters ?It shouldn't.
    >
     
    Alex Hunsley, Feb 17, 2007
    #4
  5. Lawrence

    Eric Sosman Guest

    Lawrence wrote:
    > Hi all, I use a simple function to create a hash of a file using sha
    > for
    > an utility i'm writing.
    >
    > The function is here :
    > public static String digest(File file) throws
    > FileNotFoundException, IOException, NoSuchAlgorithmException {
    > MessageDigest sha;
    > sha = MessageDigest.getInstance("sha");
    > DigestInputStream din = new DigestInputStream(new
    > BufferedInputStream(new FileInputStream(file)),sha);
    >
    >
    > while (din.read() != -1){}
    > din.close();
    >
    > return sha.digest().toString();
    >
    > }
    >
    > I send a file over a network (LAN) between a mac and a windows
    > computer, both using my application.
    > I sent zip files, mp3s, jpegs, bmps, txt, tiff, gif, and videos and it
    > all worked perfectly, but the
    > outcoming hash is different for the same file.
    > How weird is that ?Maybe the name of the file matters ?It shouldn't.


    Have you examined the way you "send the file" over the
    network? Note that Mac and Windows use different conventions
    to mark the ends of lines in text files, so "the same" text
    will be represented by different byte sequences on the two
    machines. Transport mechanisms like FTP make the conversion
    automatically, so you may not have noticed it happening.
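
To make the line-ending point concrete, here is a minimal sketch showing that "the same" text with CRLF and with LF line endings produces different SHA-1 digests. The class name `LineEndingDigestDemo` and its helper methods are made up for illustration; only `MessageDigest` and `Arrays.equals` come from the JDK.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class LineEndingDigestDemo {

    // SHA-1 over raw bytes. SHA-1 is a required algorithm on every Java
    // platform, so a missing provider is treated as a programming error.
    static byte[] sha1(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Compare digests by content, not by array identity.
    static boolean sameDigest(byte[] a, byte[] b) {
        return Arrays.equals(sha1(a), sha1(b));
    }

    public static void main(String[] args) {
        byte[] crlf = "line one\r\nline two\r\n".getBytes(StandardCharsets.US_ASCII);
        byte[] lf = "line one\nline two\n".getBytes(StandardCharsets.US_ASCII);
        // Same text to a human, different bytes to the hash function.
        System.out.println("same digest: " + sameDigest(crlf, lf));
    }
}
```

A text-mode transfer that rewrites line endings would therefore change the hash of text files, while a byte-for-byte transfer would not.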

    --
    Eric Sosman
     
    Eric Sosman, Feb 17, 2007
    #5
  6. Lawrence

    Paul Tomblin Guest

    In a previous article, Eric Sosman <> said:
    >network? Note that Mac and Windows use different conventions
    >to mark the ends of lines in text files, so "the same" text
    >will be represented by different byte sequences on the two
    >machines. Transport mechanisms like FTP make the conversion
    >automatically, so you may not have noticed it happening.


    Just to expand on that a bit, if you transfer using ftp and tell it that
    the file is ascii, it will convert the ends of lines, and if you tell it
    that it's binary it won't. Some ftp clients auto-detect what you're
    sending and set the binary/ascii flag correctly, but many don't, and if
    you send a binary file without telling it that it's binary, it will end up
    badly corrupted.


    --
    Paul Tomblin <> http://blog.xcski.com/
    The way NT mounts filesystems is something I'd expect to find in a
    barnyard or on a stock-breeding farm.
    -- Mike Andrews
     
    Paul Tomblin, Feb 17, 2007
    #6
  7. Lawrence

    Lawrence Guest

    To answer your question, let me explain.
    I transfer the file using my own Java program: I read simple chunks of
    bytes and save them to new files.
    Since both client and server are in Java and written by me, I believe
    there shouldn't be
    any endian problem of any sort.
    In the end the program is pretty simple. I compute a hash code and send
    it with some other info
    such as file name and file size. The client then connects back and
    requests the file by sending the hash; I look the file up in
    a HashMap and send it in chunks of bytes.
    If the InputStream does not fill a chunk, I write
    only the bytes actually read, on both client and server.
    When the transfer is complete, the client checks that the received
    file has the same hash that the server initially stated.
    This check always fails,
    for any file type.
    But I tried many types, including dmg disk images, rar files,
    jpegs, videos and zips, and they all work afterwards.
    I'm going to send a very small file and compare the hex
    dumps on both sides.
    Will let you know ..
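
The chunked transfer described above could look roughly like the following sketch. It is not Lawrence's actual code, which was never posted; the class name `ChunkCopy` and the 4096-byte buffer size are assumptions. The key detail is writing only the `n` bytes that `read` actually returned, never the whole buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class ChunkCopy {

    // Copies everything from in to out and returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        int n;
        // read() may return fewer bytes than the buffer holds, so only
        // the first n bytes of buf are valid on each pass.
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] source = new byte[10000];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(source), sink);
        System.out.println("copied " + copied + " bytes");
    }
}
```

The same loop works unchanged with socket and file streams, since it only relies on the `InputStream` and `OutputStream` contracts.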

    On Feb 17, 2:54 pm, (Paul Tomblin) wrote:
    > In a previous article, Eric Sosman <> said:
    >
    > >network? Note that Mac and Windows use different conventions
    > >to mark the ends of lines in text files, so "the same" text
    > >will be represented by different byte sequences on the two
    > >machines. Transport mechanisms like FTP make the conversion
    > >automatically, so you may not have noticed it happening.

    >
    > Just to expand on that a bit, if you transfer using ftp and tell it that
    > the file is ascii, it will convert the ends of lines, and if you tell it
    > that it's binary it won't. Some ftp clients auto-detect what you're
    > sending and set the binary/ascii flag correctly, but many don't, and if
    > you send a binary file without telling it that it's binary, it will end up
    > badly corrupted.
     
    Lawrence, Feb 17, 2007
    #7
  8. Lawrence

    Lawrence Guest

    On Feb 17, 5:27 pm, "Lawrence" <> wrote:
    > To answer your question let me explain.
    > I transfer the file using my own java program, I use simple chunks of


    Sorry for the bad quoting before.
    I just opened a transferred file on both sides with a hex editor,
    and the copies are equal.
    So the problem is in the function.
    For a file that contains the 4 characters "CIAO", hex [ 43 49 41
    4F ],
    on the Mac the hash is [B@425743
    For the same file, on a Windows machine it is [B@472d48

    Done again on the Mac it is [B@238016
    Done again on the Windows machine it is [B@3ae941

    I don't understand .. how is this possible?

    Maybe there is something wrong with converting an array of bytes to a string?
    That is, the return statement in the method I posted.

    Thanks folks
     
    Lawrence, Feb 17, 2007
    #8
  9. Lawrence wrote:

    > return sha.digest().toString();


    byte[].toString doesn't work the way you think.
    You have to do something like this:

    byte[] digest = sha.digest();
    StringBuffer sb = new StringBuffer();
    for (int i = 0; i < digest.length; i++){
      if ((digest[i] & 0xff) < 16){
        sb.append("0");
      }
      sb.append(Integer.toHexString(digest[i] & 0xff));
      sb.append(" ");
    }
    return sb.toString();

    I wrote this by hand without checking for errors, so the
    correct result might be different.
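
For reference, a complete version of the method along these lines might look like the sketch below. It is an illustration under assumptions, not the poster's final code: the class name `FileDigest` is made up, the algorithm is requested by its standard name "SHA-1" rather than "sha", the hex digits are emitted without separating spaces, and the file is read in chunks instead of byte by byte.

```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class FileDigest {

    // Encode each byte as exactly two lowercase hex digits.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (int i = 0; i < bytes.length; i++) {
            int unsigned = bytes[i] & 0xff; // 0..255, never negative
            if (unsigned < 16) {
                sb.append('0');             // pad single-digit values
            }
            sb.append(Integer.toHexString(unsigned));
        }
        return sb.toString();
    }

    // Hash the file's bytes and return the digest as a hex string.
    static String digest(File file) throws IOException, NoSuchAlgorithmException {
        MessageDigest sha = MessageDigest.getInstance("SHA-1");
        InputStream in = new BufferedInputStream(new FileInputStream(file));
        try {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                sha.update(buf, 0, n);      // feed only the bytes actually read
            }
        } finally {
            in.close();
        }
        return toHex(sha.digest());
    }

    public static void main(String[] args) throws Exception {
        // Recreate the four-byte "CIAO" test file from the thread.
        File tmp = File.createTempFile("ciao", ".txt");
        FileOutputStream out = new FileOutputStream(tmp);
        try {
            out.write(new byte[] {0x43, 0x49, 0x41, 0x4F});
        } finally {
            out.close();
        }
        System.out.println(digest(tmp));
        tmp.delete();
    }
}
```

Because the returned string depends only on the file's bytes, it should come out the same on the Mac and on the Windows machine.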

    BTW: When reading or writing binary data, don't use Streams or
    Readers/Writers that convert data, like PrintStream
    or InputStreamReader/OutputStreamWriter.


    Regards, Lothar
    --
    Lothar Kimmeringer E-Mail:
    PGP-encrypted mails preferred (Key-ID: 0x8BC3CD81)

    Always remember: The answer is forty-two, there can only be wrong
    questions!
     
    Lothar Kimmeringer, Feb 17, 2007
    #9
  10. Lawrence

    Lawrence Guest

    On Feb 17, 6:08 pm, Lothar Kimmeringer <>
    wrote:
    > Lawrence wrote:
    > > return sha.digest().toString();

    >
    > byte[].toString doesn't work the way you think.
    > You have to do something like this:
    >
    > byte[] digest = sha.digest();
    > StringBuffer sb = new StringBuffer();
    > for (int i = 0; i < digest.length; i++){
    > if ((digest[i] & 0xff) < 16){
    > sb.append("0");
    > }
    > sb.append(Integer.toHexString(digest[i] & 0xff));
    > sb.append(" ");}
    >
    > return sb.toString();
    >
    > I wrote this by hand without checking for errors, so the
    > correct result might be different.


    Cool, I thought that converting an array to a string would always
    return the same value, but
    I forgot that arrays are objects, so toString involves other things
    such as references.
    I will test your code (but I need to look back at the shift
    operator and bitwise AND) :p
     
    Lawrence, Feb 17, 2007
    #10
  11. "Lawrence" <> wrote in message
    news:...
    > On Feb 17, 5:27 pm, "Lawrence" <> wrote:
    >> To answer your question let me explain.
    >> I transfer the file using my own java program, I use simple chunks of

    >
    > Sorry for the bad quoting before.
    > I just tried with a hex editor to open a file send on both sides,
    > and they are equal.
    > So the problem is in the function.
    > For a file that has inside the 4 characters "CIAO" hex [ 43 49 41
    > 4F ]
    > on MAC the hash is [B@425743
    > For the same file, on a Windows machine is [B@472d48
    >
    > Done again on a mac is [B@238016.
    > Done again on the windows machine is [B@3ae941
    >
    > I don't understand .. how is this possible ?


    "[B@425763" means "This is a byte array, and it's object number 425763 in
    the JVM". More precisely, the part after the @ is the array's identity
    hash code in hex, so it differs from one array object to the next. It
    doesn't say anything about the contents of the byte array. I
    presume it comes from code like

    byte[] barr = sha.digest();
    System.out.println(barr.toString());

    Try something like

    for (int i = 0; i < barr.length; i++)
    {
      System.out.print(Integer.toHexString(barr[i] & 0xFF));
      System.out.print(", ");
    }

    to see what the byte array contains.
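
A small sketch of the difference between an array's `toString()` and its contents follows. The class and method names are made up for illustration; the only facts relied on are that arrays inherit `Object.toString()` and that `Arrays.toString` prints the elements.

```java
import java.util.Arrays;

final class ArrayToStringDemo {

    // Rebuilds what Object.toString() returns for an object that does not
    // override toString() or hashCode(), which includes every array.
    static String identityString(Object o) {
        return o.getClass().getName() + "@"
                + Integer.toHexString(System.identityHashCode(o));
    }

    // Content-based text form of a byte array.
    static String contentString(byte[] arr) {
        return Arrays.toString(arr);
    }

    public static void main(String[] args) {
        byte[] a = {0x43, 0x49, 0x41, 0x4F};
        byte[] b = {0x43, 0x49, 0x41, 0x4F};
        System.out.println(a.toString());       // "[B@" plus a hex number that varies
        System.out.println(b.toString());       // same form, unrelated to the contents
        System.out.println(contentString(a));   // the four byte values
        System.out.println(Arrays.equals(a, b)); // content comparison
    }
}
```

Two arrays with identical contents are still two different objects, which is why the strings in the earlier post changed on every run and on every machine.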
     
    Mike Schilling, Feb 17, 2007
    #11
  12. Lawrence

    Lew Guest

    Lothar Kimmeringer wrote:
    >> sb.append(Integer.toHexString(digest[i] & 0xff));


    Lawrence wrote:
    > I need to have a look back to shift operator and bit wise and) :p


    This use of the operator & is called "masking", and the int operand 0xff in
    this example is a "mask".

    Only the bits in the other operand that match position with the 1s in the mask
    will make it through to the result. The rest are masked out, as with a resist
    in a circuit-board etching.

    In the given example, the lowest byte of digest[i] will show up in the lowest
    byte of the argument to toHexString(), masked in by the 0xff; the upper bytes
    of the argument will all be zeroed. This has the effect of ensuring a non-negative
    argument to toHexString().
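
As a quick illustration of masking in general, the sketch below applies a few masks to int values. The class name `MaskDemo` and the sample values are arbitrary choices, not from the thread.

```java
final class MaskDemo {

    // Keeps only the bits of value that line up with 1-bits in mask.
    static int mask(int value, int mask) {
        return value & mask;
    }

    public static void main(String[] args) {
        // Lowest byte only: 0x12345678 & 0xff keeps 0x78.
        System.out.println(Integer.toHexString(mask(0x12345678, 0xff)));
        // Second-lowest byte only: the other 24 bits are zeroed.
        System.out.println(Integer.toHexString(mask(0x12345678, 0xff00)));
        // All 32 bits set (-1) masked down to the low 8 bits gives 255.
        System.out.println(mask(-1, 0xff));
    }
}
```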

    - Lew
     
    Lew, Feb 17, 2007
    #12
  13. Lawrence

    Lawrence Guest

    [SNIP]
    >the upper bytes
    > of the argument will all be zeroed. This has an effect of ensuring a positive
    > argument to toHexString().

    [SNIP]
    > - Lew



    Wait.
    I thought something different.
    Hex represents at most 16 different combinations per digit, so two hex
    digits represent 256 combinations,
    8 bits, 1 byte.
    Then it does some kind of implicit conversion, applying a bitwise AND
    operation with
    0xFF, which is like a bit string of 8 1s.
    The result should be a number (what, hex or int or even a byte?)
    that, if it is smaller than 16, will have only one digit;
    therefore
    a 0 is added in front of the hex digit.

    Am I wrong?
     
    Lawrence, Feb 18, 2007
    #13
  14. Lawrence wrote:

    > Hex rappresent at most 16 different combinations per digit, so two hex
    > digit rappresent 256 combination


    Hex represents a value with the base of 16. One "digit" can
    therefore represent numbers from 0 to 15. How many "combinations"
    can be represented depends on the bitlength. Integer (used here)
    can hold 32 Bits, so a Hex-number can be up to 8 Hex-digits
    (aka Nibbles) long.

    > , 8 bits, 1 byte.
    > Then it does some kind of implicit conversion applying and bit wise
    > operation between
    > 0xFF which is like a bit string of 8 1s.


    The mask is there to convert the signed byte
    to an unsigned int value. Alternatively you can do
    digest[i] + (digest[i] < 0 ? 256 : 0);
    but that makes it much harder to read and understand what
    is intended to happen.

    If you don't do this kind of thing and just do
    Integer.toHexString((int) digest[i]);
    a stored value of e.g. 255 will lead to the hex string
    ffffffff being returned. Why? If you store 255 (0xff)
    in a byte, which is signed, the value will be -1 after
    that (that's what the bit pattern 0xff means in a byte). If you just cast
    it to an int, the value is still -1, there are just
    more bits set (0xffffffff).

    The construct (digest[i] & 0xff) tells the VM to
    widen digest[i] to int (0xffffffff) and do a bitwise
    AND with the value 0xff. The result is 0x000000ff,
    which is the same value that was stored originally.
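
The three variants discussed here can be compared side by side in a short sketch. The class and method names are hypothetical; the behaviour shown follows from Java's signed `byte` type and `Integer.toHexString`.

```java
final class SignExtensionDemo {

    // Plain widening cast: the sign bit is copied into the upper 24 bits.
    static String hexWithoutMask(byte b) {
        return Integer.toHexString((int) b);
    }

    // Widening followed by a mask: the upper 24 bits are cleared again.
    static String hexWithMask(byte b) {
        return Integer.toHexString(b & 0xff);
    }

    // The arithmetic alternative mentioned above: add 256 to negative values.
    static int unsignedByArithmetic(byte b) {
        return b + (b < 0 ? 256 : 0);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xff;                        // stored as -1
        System.out.println(hexWithoutMask(b));       // eight hex digits
        System.out.println(hexWithMask(b));          // two hex digits
        System.out.println(unsignedByArithmetic(b)); // the unsigned value
    }
}
```

For bytes in the range 0 to 127 all three approaches agree; they only differ once the top bit of the byte is set.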

    > The result should be an number (what, hex or int or even a byte)
    > that if is smaller than 16 means it will be of only one digit,
    > therefore
    > a 0 is added in front of the hex digit.


    That's the first check. Alternatively the if-statement
    can be if(digest[i] >= 0 && digest[i] < 16) but again
    this is harder to read and understand two weeks later.

    In C you would just use "unsigned byte" (I know byte
    doesn't exist in C but I don't want to start confusing
    things by starting to use char here). In Java you always
    have to do this kind of thing when handling unsigned
    data with signed types.


    Regards, Lothar
    --
    Lothar Kimmeringer E-Mail:
    PGP-encrypted mails preferred (Key-ID: 0x8BC3CD81)

    Always remember: The answer is forty-two, there can only be wrong
    questions!
     
    Lothar Kimmeringer, Feb 18, 2007
    #14
  15. Lawrence

    Lew Guest

    Lawrence wrote:
    >> Hex rappresent at most 16 different combinations per digit, so two hex
    >> digit rappresent 256 combination


    Don't confuse a numeric value with the String representation of that value.

    >> Then it does some kind of implicit conversion applying and bit wise
    >> operation between
    >> 0xFF which is like a bit string of 8 1s.


    0xff is a number, equal to 255. It is 32 bits long, not 8. The top 24 bits are 0.

    >> The result should be an number (what, hex or int or even a byte)


    In this case, an int. 0xff is an int, digest[i] is no wider than an int, so
    the result of & is an int.

    >> that if is smaller than 16 means it will be of only one digit,
    >> therefore


    Digits only apply to the String form. The int form is always four bytes long.

    >> a 0 is added in front of the hex digit.


    In the String representation only.
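
To separate the number from its text form, here is a small sketch. The helper `twoDigitHex` is a made-up name; it uses `String.format` with `%02x` as an alternative to the manual zero padding shown earlier in the thread.

```java
final class HexStringDemo {

    // Formats the low 8 bits of value as exactly two lowercase hex digits.
    static String twoDigitHex(int value) {
        return String.format("%02x", value & 0xff);
    }

    public static void main(String[] args) {
        int ten = 10;
        // The int itself has no digits; only its String forms do.
        System.out.println(Integer.toHexString(ten)); // one hex digit
        System.out.println(twoDigitHex(ten));         // padded to two digits
        // Both strings parse back to the same int value.
        System.out.println(Integer.parseInt("a", 16) == Integer.parseInt("0a", 16));
    }
}
```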

    You need to study types and numeric operations in Java.

    - Lew
     
    Lew, Feb 18, 2007
    #15
  16. Lawrence

    Lawrence Guest

    [SNIP]
    > You need to study types and numeric operations in Java.
    >
    > - Lew


    I do.
    Thank you a lot, all of you.
    At least now I understand how it does it, I hate when I don't.

    Lawrence
     
    Lawrence, Feb 18, 2007
    #16
  17. Lawrence

    Lew Guest

    Lew wrote:
    >> You need to study types and numeric operations in Java.


    Lawrence wrote:
    > I do.
    > Thank you a lot, all of you.
    > At least now I understand how it does it, I hate when I don't.


    I apologize. I should have phrased that advice, "The reasons for this behavior
    are in the definitions of (numeric) types and numeric operations in Java."

    In a nutshell, numeric operators apply unary or binary numeric promotion
    to their operands at various points. Literals like '0xff' have the virtue of
    representing positive int values while looking an awful lot like unsigned byte
    values. This makes them ideal to mask (signed) narrow values into positive
    wider ones.
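
One way to observe these promotions is to box the result of an expression and look at the wrapper type, as in the sketch below. The class name `PromotionDemo` and the `typeOf` helper are illustrative assumptions, not anything from the JLS.

```java
final class PromotionDemo {

    // Reports the wrapper type that the argument was boxed into.
    static String typeOf(Object boxed) {
        return boxed.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        byte b = (byte) 0x80;                   // -128 as a signed byte
        System.out.println(typeOf(b));          // the byte itself boxes to Byte
        System.out.println(typeOf(b & 0xff));   // binary promotion: result is an int
        System.out.println(typeOf(-b));         // unary promotion: result is an int
        System.out.println(b & 0xff);           // the unsigned value of the byte
    }
}
```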

    Some view Java's snubbing of unsigned bytes as a flaw. That's as may be, but
    it is a reality for good or ill.

    In the world of implicit conversions, be very, very aware.

    Gird your loins and venture into the world of unadorned truth in the Java
    Language Specification (JLS).

    <http://java.sun.com/docs/books/jls/third_edition/html/j3TOC.html>

    Integer literals:
    <http://java.sun.com/docs/books/jls/third_edition/html/lexical.html#3.10.1>

    The integer bitwise operators:
    <http://java.sun.com/docs/books/jls/third_edition/html/expressions.html#5233>

    Numeric promotions:
    <http://java.sun.com/docs/books/jls/third_edition/html/conversions.html#5.6>

    - Lew
     
    Lew, Feb 18, 2007
    #17