Finding two strings with the same crc32

Discussion in 'Perl Misc' started by Jonas Nilsson, Nov 9, 2004.

  1. Just for fun I would like to find two strings that have the same crc32. The
    probably of doing this for a set of for example one million strings is more
    than 99.9999999999999999999999999999999999999999999999997%.

    To calculate the chance of $n strings leading to all different crc32 is done
    by this code: (assuming randomly distributed crc32 values)

    use strict;
    my $tmp=1;

    my $n=1000000;
    my $ant=2**32;
    for (0..$n) {
    $tmp*=($ant-$_)/$ant;
    print "$_\t$tmp\n" unless ($_%10000);
    }

    If I try to find two string that give the same crc32 however I don't
    succeed. I use the code below to spot strings that have the same first 28
    bits (to save memory I don't store all 32 bits). The code don't find any
    however even after checking 6 million strings!

    Why?

    Sorry for being a bit off topic.

    use strict;
    use Digest::CRC qw(crc32);
    my $string='aaaaa';
    $|=1;
    my $v;

    for (0..11881375) {
    unless ($_%100000) {
    print "$_\t",length($v),"\n";
    }

    my $tmp=crc32($string)>>4;
    if (vec($v,$tmp,1)) {
    print "$string\n";
    }
    vec($v,$tmp,1)=1;
    $string++;
    }
    /jN
    Jonas Nilsson, Nov 9, 2004
    #1
    1. Advertising

  2. Jonas Nilsson

    Mark Shelor Guest

    Jonas Nilsson wrote:

    > If I try to find two string that give the same crc32 however I don't
    > succeed. I use the code below to spot strings that have the same first 28
    > bits (to save memory I don't store all 32 bits). The code don't find any
    > however even after checking 6 million strings!
    >
    > Why?



    It's possible to find crc32 collisions without too much effort. Since
    there are 2**32 possible CRC values, on average you'll need to search
    only O(2**16) strings to find a collision.

    I wrote a tiny Perl program that found the following collision in 91903
    trails:

    $string1 = "aaaaa0.462031558722291"
    $string2 = "aaaaa0.0585754039730588"

    These strings share a common CRC value of 1938556049.

    If you're interested, here's the Perl program I used:

    use strict;
    use warnings;
    use Digest::CRC qw(crc32);

    my $c;
    my $n;
    my $r;
    my %h;
    srand;
    for ($n = 0;; $n++) {
    $r = "aaaaa" . rand();
    $c = crc32($r);
    if ($h{$c}) {
    print "collision: $h{$c} and $r\n";
    print "$n trials\n";
    last;
    }
    $h{$c} = $r;
    }

    Regards, Mark
    Mark Shelor, Nov 9, 2004
    #2
    1. Advertising

  3. "Jonas Nilsson" <> wrote in message
    news:cmq4na$5qm$...
    > Just for fun I would like to find two strings that have the same crc32.

    The
    > probably of doing this for a set of for example one million strings is

    more
    > than 99.9999999999999999999999999999999999999999999999997%.


    I found out that the differences can't be close (within the size of the
    checksum that is ).


    > If I try to find two string that give the same crc32 however I don't
    > succeed. I use the code below to spot strings that have the same first 28
    > bits (to save memory I don't store all 32 bits). The code don't find any
    > however even after checking 6 million strings!
    >


    Modifying my program a bit gave me some other matching strings:

    examples

    Maria has nine red beds.
    Steven has fifteen white tables.
    Both have the crc32 "248210933"

    Joe has fourteen magenta things.
    Lars has thirteen black balls.
    Both have the crc32 "93832682"

    /jN
    Jonas Nilsson, Nov 9, 2004
    #3
  4. "Mark Shelor" <> wrote in message
    news:...
    > Jonas Nilsson wrote:
    >
    > > If I try to find two string that give the same crc32 however I don't
    > > succeed. I use the code below to spot strings that have the same first

    28
    > > bits (to save memory I don't store all 32 bits). The code don't find any
    > > however even after checking 6 million strings!
    > >
    > > Why?

    >
    >
    > It's possible to find crc32 collisions without too much effort. Since
    > there are 2**32 possible CRC values, on average you'll need to search
    > only O(2**16) strings to find a collision.
    >
    > I wrote a tiny Perl program that found the following collision in 91903
    > trails:
    >
    > $string1 = "aaaaa0.462031558722291"
    > $string2 = "aaaaa0.0585754039730588"
    >


    By random searching instead of systematic I quickly found a lot of matches
    (with crc32 below):
    /jN

    oxueekz
    pyqptgs
    1122772949

    nktteme
    qjpatal
    1867444077

    nuktjcj
    bimxkml
    2973957111

    atimtna
    nfxqcvx
    4014160497

    tgnvsia
    kfjcbeh
    550598113

    kmswcnn
    hcdguxq
    4090013392

    vswkuqk
    yafwbir
    2261340929

    yezrovl
    vwknxnu
    2954574600

    cujnfpg
    phhwvrh
    3289759799

    xucsdhf
    gtgfudo
    4164393361

    gqplbmq
    hcapuuh
    656642059

    igeougy
    ytpfssi
    4118505601

    driaytu
    hnomxzs
    2212230278

    iketcbk
    jerdutt
    2479104683

    rpppuko
    mqtedgf
    1986672624

    kvbesjh
    twfpbfa
    2892019439

    evwzaos
    zwsopcz
    3029719102

    ugtegak
    jfppvmb
    3715719613

    iknybbk
    jeyittt
    398238032

    itqfybq
    jzfvotn
    3806847855

    ywouxys
    ukiyywu
    3626703900

    zqlzlzj
    vmjvmtl
    1949335477

    aitjoui
    qzaciay
    3794947975

    vxhmxnl
    yjyqovu
    1882453027

    zpshpoa
    ubbtgwx
    311908063

    exhzdhp
    jjyfspi
    3165437067

    xsdmynd
    knftilk
    2149740729

    tohzrgd
    xsnvsib
    3862892194

    sgkldej
    lfoyuic
    809259157

    izzndbw
    fhkrszn
    2534674167

    kjkkgzg
    xwirwxh
    4108936481

    yvwjyuh
    jkusiwg
    1821358308

    fxtdmbs
    vkamkvc
    4104486534

    ftqxzdh
    ezfhlrw
    2484994613

    hvaledh
    xetecpx
    554199908

    yeissqw
    jxkjcsx
    4294947840

    sdpoqcw
    cwefwwg
    2627502466

    nepvour
    rjcshod
    110262302

    cvivwhc
    pkkogjl
    136192574

    bygmhsd
    qdetxqk
    1195331540

    obwpuxb
    smdurbt
    2466482257

    iwuhnto
    eksdozi
    2396159347

    cjegpcp
    ovckqmv
    861582261

    dpozeev
    hlivdkp
    430397453

    tianznw
    gtcwjlx
    3110265897

    xcsjggh
    hpfcasx
    36665361

    xzxuysm
    kgzliqb
    287371322

    chihdwk
    otodeym
    1558575868

    tnkxzms
    koomkaz
    3085259689

    kceezbm
    tbapknd
    4198087903

    wlwfvnq
    kcdcqtg
    304773531

    oewnsln
    cyqbrbh
    492142473

    tobzbve
    knfoszl
    2490869493

    zxytuar
    yvndcwm
    2991426191

    ecvoads
    zbrzphz
    1591549120

    zgffxfz
    yiqvnpe
    288303136

    twuzdyi
    xksvewo
    596722119

    rfsgnet
    bufnhqd
    756968307

    euuhciw
    jgdttqn
    3764572437

    whprspf
    kgcwtjp
    3048824742

    ymlbinm
    flhwxbd
    308304037

    dupnrrz
    tfegtfj
    1644000365

    axmicjj
    reopshe
    146039524

    djqgnyv
    tydnhmf
    745237997

    nwvmesa
    rxehbiw
    2678562110

    gdsfrzk
    hvbzebr
    3101837067

    zehmtce
    edlxeol
    3451105439

    bzqdqlf
    atftgzy
    1671489439

    silhbdt
    czyadpd
    3505195570

    bzrxqyr
    atehgom
    329271041

    rngqazj
    mocdpvc
    492217567

    gdnftyq
    xejseux
    4065378867

    flodegc
    ymkqtkj
    352398593

    ejidvrz
    jxxxajc
    810447580

    dexsuqr
    tvmzseb
    3499196161

    vyzlbpd
    fjoeddt
    3537376586

    yrhzlkq
    unnvmew
    4017118138

    lxmoaip
    syizpey
    2579702319

    zpcpays
    eqgepuz
    354399221

    azizsuh
    rgkccwg
    3441578396

    qzpwdlz
    bgrntnu
    1152814553

    uwbrthp
    zesncpi
    3104178606

    anhjdzw
    rsjstxx
    736480241

    hhfncss
    gzwrtkj
    2341875164

    jdjqdpe
    vkytcjs
    2096604753

    mceivsy
    qlvlqio
    2468755540

    mykpeip
    rxoetey
    1960566923

    wqjcbhb
    hpnvsdk
    3406243371

    wnflwjz
    kauippl
    2678321277

    jxeiyfg
    yegpidh
    4185286856

    phwijrf
    cuupzpi
    1692899291

    uzhiqpm
    fgjparb
    803662884
    Jonas Nilsson, Nov 9, 2004
    #4
  5. Jonas Nilsson

    Mark Shelor Guest

    Jonas Nilsson wrote:

    > By random searching instead of systematic I quickly found a lot of matches
    > (with crc32 below):
    > /jN
    >
    > oxueekz
    > pyqptgs
    > 1122772949



    A good illustration of the "birthday paradox".

    But, can you come up with an efficient Perl program that finds an input
    string whose CRC value is 1756683253?

    Hint: I'd recommend writing it in C! ;)
    Mark Shelor, Nov 9, 2004
    #5
  6. "Mark Shelor" <> wrote :
    >
    > But, can you come up with an efficient Perl program that finds an input
    > string whose CRC value is 1756683253?
    >
    > Hint: I'd recommend writing it in C! ;)


    Shall I write a Perl-program in C?

    I'm currently running this:

    use strict;
    use Digest::CRC qw(crc32 crc16);
    my $string='aaaaaaa';
    $|=1;

    while (1) {
    if (crc32($string)==1756683253) {
    print $string;
    exit;
    }
    $string++;
    }
    Which finishes after a maximum time of 44 hours. Just wait and see.
    (Or should I start from 'zzzzzzz' working down?)
    /jN
    Jonas Nilsson, Nov 9, 2004
    #6
  7. On Tue, 9 Nov 2004 13:27:43 +0100, "Jonas Nilsson"
    <> wrote:

    >> Hint: I'd recommend writing it in C! ;)

    >
    >Shall I write a Perl-program in C?


    No. But if "shall" (allegedly wittily) means "can", then the answer
    is: yes!


    Michele
    --
    {$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
    (($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^<R<Y]*YB='
    ..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER<Z`S(G.DZZ9OX0Z')=~/./g)x2,$_,
    256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,
    Michele Dondi, Nov 9, 2004
    #7
  8. "Jonas Nilsson" <> wrote in message
    news:cmqd3k$7v6$...
    > "Mark Shelor" <> wrote :
    > >
    > > But, can you come up with an efficient Perl program that finds an input
    > > string whose CRC value is 1756683253?
    > >
    > > Hint: I'd recommend writing it in C! ;)

    >
    > Shall I write a Perl-program in C?
    >
    > I'm currently running this:
    >

    <snip code>

    I've run through 2.6 billion lowercase letter combination without a hit. Is
    it so that using only a subset of bytes (like 'a'-'z' == 97 - 122) I will
    only have access to a subset of the CRC32 values? I thougt not. If there
    still is no hit after the weekend I will start to think otherwise...

    /jN

    <Revised code - able to restart from last position>

    use strict;
    use integer;
    use Digest::CRC qw(crc32);
    my $string='a';
    $|=1;
    if (open(FILE,"<crc1756683253.log")) {
    $string=<FILE>;
    chomp($string);
    close(FILE);
    }

    my $t0=time();
    while (1) {
    for (1..500000) {
    if (crc32($string)==1756683253) {
    my $timediff=(time()-$t0);
    print "After $timediff seconds I found this: $string\n";
    if (open(FILE,">>crc1756683253")) {
    print FILE "After $timediff seconds I found this: $string\n";
    close(FILE);
    }
    }
    $string++;
    }
    open(FILE,">crc1756683253.log") or die;
    print FILE "$string\n";
    my $number;
    my $mult=1;
    my $ex=26;
    no integer;
    for (reverse split //,$string) {
    $number+=(ord($_)-96)*$mult;
    $mult*=$ex;
    }
    $number--;
    print FILE "Now I checked $number strings!\n";
    close FILE;
    use integer;
    }
    Jonas Nilsson, Nov 11, 2004
    #8
  9. Jonas Nilsson

    David Filmer Guest

    What about MD5?

    This is interesting. If I may follow up (though I realize it's getting OT; I
    beg your indulgence)...

    What about MD5? I am of the opinion that it is infeaseable to find two
    strings that share the same MD5 digest (without taking a hundred million
    (billion? trillion?) years or so to do it). Would anyone in the group
    disagree?
    David Filmer, Nov 11, 2004
    #9
  10. Jonas Nilsson

    Sam Holden Guest

    Re: What about MD5?

    On Thu, 11 Nov 2004 00:00:53 -0800,
    David Filmer <> wrote:
    > This is interesting. If I may follow up (though I realize it's getting OT; I
    > beg your indulgence)...
    >
    > What about MD5? I am of the opinion that it is infeaseable to find two
    > strings that share the same MD5 digest (without taking a hundred million
    > (billion? trillion?) years or so to do it). Would anyone in the group
    > disagree?


    Yes.

    http://www.tcs.hut.fi/~mjos/md5/

    --
    Sam Holden
    Sam Holden, Nov 11, 2004
    #10
  11. I've got a hit...

    C:\Documents and Settings\jonni>perl -MDigest::CRC
    print Digest::CRC::crc32('mddxzib');
    ^D
    1756683253

    Excited? Naa...
    /jN

    "Jonas Nilsson" <> wrote in message
    news:cmv3rg$phv$...
    > "Jonas Nilsson" <> wrote in message
    > news:cmqd3k$7v6$...
    > > "Mark Shelor" <> wrote :
    > > >
    > > > But, can you come up with an efficient Perl program that finds an

    input
    > > > string whose CRC value is 1756683253?
    > > >
    > > > Hint: I'd recommend writing it in C! ;)

    > >
    > > Shall I write a Perl-program in C?
    > >
    > > I'm currently running this:
    > >

    > <snip code>
    >
    > I've run through 2.6 billion lowercase letter combination without a hit.

    Is
    > it so that using only a subset of bytes (like 'a'-'z' == 97 - 122) I will
    > only have access to a subset of the CRC32 values? I thougt not. If there
    > still is no hit after the weekend I will start to think otherwise...
    >
    > /jN
    >
    > <Revised code - able to restart from last position>
    >
    > use strict;
    > use integer;
    > use Digest::CRC qw(crc32);
    > my $string='a';
    > $|=1;
    > if (open(FILE,"<crc1756683253.log")) {
    > $string=<FILE>;
    > chomp($string);
    > close(FILE);
    > }
    >
    > my $t0=time();
    > while (1) {
    > for (1..500000) {
    > if (crc32($string)==1756683253) {
    > my $timediff=(time()-$t0);
    > print "After $timediff seconds I found this: $string\n";
    > if (open(FILE,">>crc1756683253")) {
    > print FILE "After $timediff seconds I found this: $string\n";
    > close(FILE);
    > }
    > }
    > $string++;
    > }
    > open(FILE,">crc1756683253.log") or die;
    > print FILE "$string\n";
    > my $number;
    > my $mult=1;
    > my $ex=26;
    > no integer;
    > for (reverse split //,$string) {
    > $number+=(ord($_)-96)*$mult;
    > $mult*=$ex;
    > }
    > $number--;
    > print FILE "Now I checked $number strings!\n";
    > close FILE;
    > use integer;
    > }
    >
    >
    Jonas Nilsson, Nov 15, 2004
    #11
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. anupam

    vhdl code for crc32 checksum

    anupam, Sep 3, 2004, in forum: VHDL
    Replies:
    4
    Views:
    7,876
    Kai Harrekilde-Petersen
    Sep 6, 2004
  2. Replies:
    9
    Views:
    5,067
    TomoP
    Apr 14, 2009
  3. Mohun Biswas

    java.util.zip.CRC32 uses long??

    Mohun Biswas, Apr 1, 2004, in forum: Java
    Replies:
    1
    Views:
    2,393
    Marco Schmidt
    Apr 1, 2004
  4. Mohun Biswas

    CRC32 during straming gzip

    Mohun Biswas, May 6, 2004, in forum: Java
    Replies:
    6
    Views:
    723
    Roedy Green
    May 7, 2004
  5. palmis

    CRC32

    palmis, Dec 21, 2005, in forum: Java
    Replies:
    16
    Views:
    3,066
    Luc The Perverse
    Dec 22, 2005
Loading...

Share This Page