Reverse engineering CRC?

Discussion in 'Python' started by Gregory Ewing, Mar 8, 2010.

  1. Given some known data/crc pairs, how feasible is it to
    figure out the polynomial being used to generate the crc?

    In the case I'm looking at, it appears that the crc
    size may be at least 24 bits, so just trying all possible
    polynomials probably isn't doable.

    An article I found hints at the possibility of using
    GCDs to make the search more efficient, but doesn't go
    into any details. Anyone know of any literature about
    this?

    If it helps, I have the ability to generate test cases
    with known message contents to some extent, although
    I don't have complete control over the contents. Also
    it's a manual process, so generating large numbers of
    them automatically isn't an option.

    --
    Greg
     
    Gregory Ewing, Mar 8, 2010
    #1
    1. Advertising

  2. On Mon, 08 Mar 2010 16:09:12 +1300, Gregory Ewing wrote:

    > Given some known data/crc pairs, how feasible is it to figure out the
    > polynomial being used to generate the crc?


    Can you just ask the application developer what CRC is being used? Or
    look at the source code? Disassemble the binary?


    > In the case I'm looking at, it appears that the crc size may be at least
    > 24 bits, so just trying all possible polynomials probably isn't doable.


    "At least"? Can't you tell by looking at them?

    A good place to start is here:

    http://en.wikipedia.org/wiki/Cyclic_redundancy_check
    http://en.wikipedia.org/wiki/List_of_checksum_algorithms

    You can at least eliminate known CRCs. There doesn't appear to be any 24-
    bit CRCs in common use, and very few other 24-bit checksums either, so
    you're probably looking at a custom CRC.


    > An article I found hints at the possibility of using GCDs to make the
    > search more efficient, but doesn't go into any details. Anyone know of
    > any literature about this?


    If you're reduced to trying to determine the polynomial from a sample of
    checksums, that's a curve fitting problem. There are various ways to fit
    polynomials through a set of known points, but as far as I know none of
    them are designed for reverse-engineering CRCs.

    You may be able to find something about curve-fitting using discrete
    maths (i.e. where all values are limited to integers) or with constraints.

    You should probably take this to somewhere like sci.crypt. Here's a
    thread from a few years back which might give you some hints:

    http://www.derkeiler.com/Newsgroups/sci.crypt/2008-07/msg00035.html

    Or here:

    http://stackoverflow.com/questions/401231/determining-crc-algorithm-from-
    data-crc-embedded-application


    --
    Steven
     
    Steven D'Aprano, Mar 8, 2010
    #2
    1. Advertising

  3. On Mon, 08 Mar 2010 16:09:12 +1300, Gregory Ewing wrote:

    > Given some known data/crc pairs, how feasible is it to figure out the
    > polynomial being used to generate the crc?


    Google is your friend:

    http://www.woodmann.com/fravia/crctut1.htm



    --
    Steven
     
    Steven D'Aprano, Mar 8, 2010
    #3
  4. Gregory Ewing

    Dave Angel Guest

    Steven D'Aprano wrote:
    > On Mon, 08 Mar 2010 16:09:12 +1300, Gregory Ewing wrote:
    >
    >
    >> Given some known data/crc pairs, how feasible is it to figure out the
    >> polynomial being used to generate the crc?
    >>

    >
    > Google is your friend:
    >
    > http://www.woodmann.com/fravia/crctut1.htm
    >
    >

    That page was interesting to read, especially since I've implemented the
    three algorithms - CRC16, CRC32, and the reversed version of CRC16, all
    in the long-distant past.

    However, if there's anything in there about how to derive the polynomial
    algorithm from (a few) samples I missed it entirely. Instead, what it
    calls reverse engineering is figuring out how to modify a message to
    force it to have a desired CRC value (when the CRC polynomial is already
    known).

    DaveA
     
    Dave Angel, Mar 8, 2010
    #4
  5. Steven D'Aprano wrote:

    > Can you just ask the application developer what CRC is being used? Or
    > look at the source code? Disassemble the binary?


    There's no source, and the binary is enormous. I could ask,
    but I wouldn't hold out much hope of them being willing to
    tell me.

    >>it appears that the crc size may be at least
    >>24 bits, so just trying all possible polynomials probably isn't doable.

    >
    > "At least"? Can't you tell by looking at them?


    It's not entirely clear exactly which bytes are part of the
    CRC. There are 3 adjacent bytes in the header of the file
    that change when I modify the contents, which led me to
    think it was a 24-bit CRC. But I now believe that one of
    them is not part of the CRC, and it's actually 16 bits.

    Using pycrc, I've now tried all possible 16-bit polynomials,
    with various combinations of bit and byte reversal, but I
    haven't found one that works consistently, so I'm wondering
    whether it's using some non-standard algorithm.

    --
    Greg
     
    Gregory Ewing, Mar 8, 2010
    #5
  6. Gregory Ewing

    Dave Angel Guest

    Gregory Ewing wrote:
    > Steven D'Aprano wrote:
    >
    >> Can you just ask the application developer what CRC is being used? Or
    >> look at the source code? Disassemble the binary?

    >
    > There's no source, and the binary is enormous. I could ask,
    > but I wouldn't hold out much hope of them being willing to
    > tell me.
    >
    >>> it appears that the crc size may be at least
    >>> 24 bits, so just trying all possible polynomials probably isn't doable.

    >>
    >> "At least"? Can't you tell by looking at them?

    >
    > It's not entirely clear exactly which bytes are part of the
    > CRC. There are 3 adjacent bytes in the header of the file
    > that change when I modify the contents, which led me to
    > think it was a 24-bit CRC. But I now believe that one of
    > them is not part of the CRC, and it's actually 16 bits.
    >
    > Using pycrc, I've now tried all possible 16-bit polynomials,
    > with various combinations of bit and byte reversal, but I
    > haven't found one that works consistently, so I'm wondering
    > whether it's using some non-standard algorithm.
    >

    Or even some other standard algorithm. If you know so little about the
    value, how do you even know it's a CRC ? Could it be a ones-complement
    sum, such as used in Ethernet?

    Is the problem really worth it? The possibilities are practically
    unbounded. And if the developer is really determined to make it
    difficult, they could be doing multiple passes over the data, in which
    case probably disassembly (or subtle debug tracing) may be your best bet.

    DaveA

    Dave
     
    Dave Angel, Mar 8, 2010
    #6
  7. Dave Angel wrote:
    > If you know so little about the
    > value, how do you even know it's a CRC ? Could it be a ones-complement
    > sum, such as used in Ethernet?


    I'm going by the fact that the application reports a
    "CRC mismatch" when it's wrong. I can't be sure that what
    it calls a "CRC" is really a true CRC, but it's more than
    a simple sum, because changing one bit in the file results
    in a completely different value.

    > Is the problem really worth it?


    Probably not -- it looks like fixing the problem I'm trying
    to fix by hand will be faster in the long run. I just thought
    it might turn out to be a solved problem and someone could
    point me to an algorithm for it, but it seems not.

    --
    Greg
     
    Gregory Ewing, Mar 8, 2010
    #7
  8. Gregory Ewing

    Dave Angel Guest

    Gregory Ewing wrote:
    > <div class="moz-text-flowed" style="font-family: -moz-fixed">Dave
    > Angel wrote:
    >> If you know so little about the value, how do you even know it's a
    >> CRC ? Could it be a ones-complement sum, such as used in Ethernet?

    >
    > I'm going by the fact that the application reports a
    > "CRC mismatch" when it's wrong. I can't be sure that what
    > it calls a "CRC" is really a true CRC, but it's more than
    > a simple sum, because changing one bit in the file results
    > in a completely different value.
    >
    >> Is the problem really worth it?

    >
    > Probably not -- it looks like fixing the problem I'm trying
    > to fix by hand will be faster in the long run. I just thought
    > it might turn out to be a solved problem and someone could
    > point me to an algorithm for it, but it seems not.
    >

    If you assume it's done in a single pass, and you know which byte is the
    end of the buffer, I'd think you could learn a lot by just tweaking that
    last byte. But I still think you'd want to automate your testing,
    presumably by some keystroke-stuffer.

    DaveA
     
    Dave Angel, Mar 8, 2010
    #8
  9. Dave Angel wrote:

    > If you assume it's done in a single pass, and you know which byte is the
    > end of the buffer, I'd think you could learn a lot by just tweaking that
    > last byte.


    I'm sure I would, but unfortunately I can't control the
    last byte. The bytes that I can influence are some distance
    back from the end of the data.

    --
    Greg
     
    Gregory Ewing, Mar 10, 2010
    #9
  10. Gregory Ewing

    Steve Howell Guest

    On Mar 7, 7:09 pm, Gregory Ewing <> wrote:
    > Given some known data/crc pairs, how feasible is it to
    > figure out the polynomial being used to generate the crc?
    >
    > In the case I'm looking at, it appears that the crc
    > size may be at least 24 bits, so just trying all possible
    > polynomials probably isn't doable.
    >
    > An article I found hints at the possibility of using
    > GCDs to make the search more efficient, but doesn't go
    > into any details. Anyone know of any literature about
    > this?
    >
    > If it helps, I have the ability to generate test cases
    > with known message contents to some extent, although
    > I don't have complete control over the contents. Also
    > it's a manual process, so generating large numbers of
    > them automatically isn't an option.
    >



    Hi Greg. I would at least flip one bit at a time on the first byte of
    your data to see if the transformation is bitwise.
     
    Steve Howell, Mar 10, 2010
    #10
  11. Steve Howell wrote:

    > Hi Greg. I would at least flip one bit at a time on the first byte of
    > your data to see if the transformation is bitwise.


    I'm actually making good progress on this -- it turns out
    there *is* a way of deducing the polynomial by looking at
    the effect of single-bit flips. It's actually quite simple,
    with no brute-force searching needed at all.

    Things get a bit tricky when you don't quite know all
    of the data that goes into the CRC, though, which seems
    to be the case here...

    I'm writing up an essay on my experiences. I'll post a
    link when it's finished.

    --
    Greg
     
    Gregory Ewing, Mar 11, 2010
    #11
  12. In message <>, Dave Angel
    wrote:

    > However, if there's anything in there about how to derive the polynomial
    > algorithm from (a few) samples I missed it entirely.


    Given that CRC is all just a sequence of xor operations, what happens if you
    xor various pairs of CRCs together, wouldn’t that cancel out at least parts
    of the operations?
     
    Lawrence D'Oliveiro, Mar 12, 2010
    #12
  13. In message <>, Gregory Ewing wrote:

    > I'm going by the fact that the application reports a
    > "CRC mismatch" when it's wrong. I can't be sure that what
    > it calls a "CRC" is really a true CRC, but it's more than
    > a simple sum, because changing one bit in the file results
    > in a completely different value.


    They could be using a strong cryptographic hash and truncating it to 16 bits
    or something.

    In which case you’ve got your work cut out for you...
     
    Lawrence D'Oliveiro, Mar 12, 2010
    #13
  14. Lawrence D'Oliveiro wrote:

    > They could be using a strong cryptographic hash and truncating it to 16 bits
    > or something.
    >
    > In which case you’ve got your work cut out for you...


    Nope, I've determined that it's actually a pretty standard
    CRC, and it's even using one of the standard polynomials,
    0x8005. I'll explain the details of how I figured that
    out in my essay.

    What confused me initially is that it seems to be adding
    a few extra bytes to the checked data that aren't present
    in the file. Figuring out what they're supposed to contain
    is proving to be quite a headache...

    --
    Greg
     
    Gregory Ewing, Mar 12, 2010
    #14
  15. On 3/12/2010 3:24 AM Gregory Ewing said...
    > What confused me initially is that it seems to be adding
    > a few extra bytes to the checked data that aren't present
    > in the file. Figuring out what they're supposed to contain
    > is proving to be quite a headache...


    Length?

    Emile
     
    Emile van Sebille, Mar 12, 2010
    #15
  16. In article <>,
    Gregory Ewing <> wrote:
    >Given some known data/crc pairs, how feasible is it to
    >figure out the polynomial being used to generate the crc?
    >
    >In the case I'm looking at, it appears that the crc
    >size may be at least 24 bits, so just trying all possible
    >polynomials probably isn't doable.
    >
    >An article I found hints at the possibility of using
    >GCDs to make the search more efficient, but doesn't go
    >into any details. Anyone know of any literature about
    >this?
    >
    >If it helps, I have the ability to generate test cases
    >with known message contents to some extent, although
    >I don't have complete control over the contents. Also
    >it's a manual process, so generating large numbers of
    >them automatically isn't an option.


    If it is really a CRC, it is doable.

    You can have an indication, if the intention is to detect
    machine errors (transmission or disk errors) or they want
    you to prevent tampering with the file.
    In the latter case it may be a one-way hash. Then it is near
    impossible, as this is the design criterion for a one-way hash.

    >--
    >Greg


    Groetjes Albert

    --
    --
    Albert van der Horst, UTRECHT,THE NETHERLANDS
    Economic growth -- being exponential -- ultimately falters.
    albert@spe&ar&c.xs4all.nl &=n http://home.hccnet.nl/a.w.m.van.der.horst
     
    Albert van der Horst, Mar 13, 2010
    #16
  17. I've solved the problem now.

    It turned out to be a very standard CRC algorithm, complicated
    by the presence of a few extra bytes of data being checked that
    didn't appear explicitly in the file anywhere.

    In the process I developed some very general techniques for
    solving this kind of problem, which I've written about here
    if anyone's interested:

    http://www.cosc.canterbury.ac.nz/greg.ewing/essays/CRC-Reverse-Engineering.html

    Thanks for everyone's help,
    Greg
     
    Gregory Ewing, Mar 15, 2010
    #17
  18. Gregory Ewing

    jkn Guest

    Hi Greg
    Just to say thanks for taking the time to write up your work on
    this interesting topic.

    Cheers
    J^n
     
    jkn, Mar 15, 2010
    #18
  19. On Mon, Mar 15, 2010 at 6:29 AM, Gregory Ewing
    <> wrote:
    > I've solved the problem now.
    >
    > It turned out to be a very standard CRC algorithm, complicated
    > by the presence of a few extra bytes of data being checked that
    > didn't appear explicitly in the file anywhere.
    >
    > In the process I developed some very general techniques for
    > solving this kind of problem, which I've written about here
    > if anyone's interested:
    >
    > http://www.cosc.canterbury.ac.nz/greg.ewing/essays/CRC-Reverse-Engineering.html
    >
    > Thanks for everyone's help,
    > Greg


    Nice writeup, thanks.

    Geremy Condra
     
    geremy condra, Mar 15, 2010
    #19
  20. En Mon, 15 Mar 2010 07:29:51 -0300, Gregory Ewing
    <> escribió:

    > I've solved the problem now.
    >
    > It turned out to be a very standard CRC algorithm, complicated
    > by the presence of a few extra bytes of data being checked that
    > didn't appear explicitly in the file anywhere.
    >
    > In the process I developed some very general techniques for
    > solving this kind of problem, which I've written about here
    > if anyone's interested:
    >
    > http://www.cosc.canterbury.ac.nz/greg.ewing/essays/CRC-Reverse-Engineering.html


    A good solution to an interesting problem - and very nicely explained too!

    --
    Gabriel Genellina
     
    Gabriel Genellina, Mar 16, 2010
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    0
    Views:
    869
  2. Replies:
    0
    Views:
    627
  3. Mamut

    crc-8 and crc-16 code...

    Mamut, Feb 21, 2007, in forum: C++
    Replies:
    5
    Views:
    4,061
    Victor Bazarov
    Feb 22, 2007
  4. `Zidane Tribal
    Replies:
    1
    Views:
    2,521
    Joe Smith
    Jul 28, 2007
  5. `Zidane Tribal
    Replies:
    3
    Views:
    258
    Sisyphus
    Jul 27, 2007
Loading...

Share This Page