Finding two strings with the same crc32

J

Jonas Nilsson

Just for fun I would like to find two strings that have the same crc32. The
probably of doing this for a set of for example one million strings is more
than 99.9999999999999999999999999999999999999999999999997%.

To calculate the chance of $n strings leading to all different crc32 is done
by this code: (assuming randomly distributed crc32 values)

use strict;
my $tmp=1;

my $n=1000000;
my $ant=2**32;
for (0..$n) {
$tmp*=($ant-$_)/$ant;
print "$_\t$tmp\n" unless ($_%10000);
}

If I try to find two string that give the same crc32 however I don't
succeed. I use the code below to spot strings that have the same first 28
bits (to save memory I don't store all 32 bits). The code don't find any
however even after checking 6 million strings!

Why?

Sorry for being a bit off topic.

use strict;
use Digest::CRC qw(crc32);
my $string='aaaaa';
$|=1;
my $v;

for (0..11881375) {
unless ($_%100000) {
print "$_\t",length($v),"\n";
}

my $tmp=crc32($string)>>4;
if (vec($v,$tmp,1)) {
print "$string\n";
}
vec($v,$tmp,1)=1;
$string++;
}
/jN
 
M

Mark Shelor

Jonas said:
If I try to find two string that give the same crc32 however I don't
succeed. I use the code below to spot strings that have the same first 28
bits (to save memory I don't store all 32 bits). The code don't find any
however even after checking 6 million strings!

Why?


It's possible to find crc32 collisions without too much effort. Since
there are 2**32 possible CRC values, on average you'll need to search
only O(2**16) strings to find a collision.

I wrote a tiny Perl program that found the following collision in 91903
trails:

$string1 = "aaaaa0.462031558722291"
$string2 = "aaaaa0.0585754039730588"

These strings share a common CRC value of 1938556049.

If you're interested, here's the Perl program I used:

use strict;
use warnings;
use Digest::CRC qw(crc32);

my $c;
my $n;
my $r;
my %h;
srand;
for ($n = 0;; $n++) {
$r = "aaaaa" . rand();
$c = crc32($r);
if ($h{$c}) {
print "collision: $h{$c} and $r\n";
print "$n trials\n";
last;
}
$h{$c} = $r;
}

Regards, Mark
 
J

Jonas Nilsson

Jonas Nilsson said:
Just for fun I would like to find two strings that have the same crc32. The
probably of doing this for a set of for example one million strings is more
than 99.9999999999999999999999999999999999999999999999997%.

I found out that the differences can't be close (within the size of the
checksum that is ).

If I try to find two string that give the same crc32 however I don't
succeed. I use the code below to spot strings that have the same first 28
bits (to save memory I don't store all 32 bits). The code don't find any
however even after checking 6 million strings!

Modifying my program a bit gave me some other matching strings:

examples

Maria has nine red beds.
Steven has fifteen white tables.
Both have the crc32 "248210933"

Joe has fourteen magenta things.
Lars has thirteen black balls.
Both have the crc32 "93832682"

/jN
 
J

Jonas Nilsson

Mark Shelor said:
It's possible to find crc32 collisions without too much effort. Since
there are 2**32 possible CRC values, on average you'll need to search
only O(2**16) strings to find a collision.

I wrote a tiny Perl program that found the following collision in 91903
trails:

$string1 = "aaaaa0.462031558722291"
$string2 = "aaaaa0.0585754039730588"

By random searching instead of systematic I quickly found a lot of matches
(with crc32 below):
/jN

oxueekz
pyqptgs
1122772949

nktteme
qjpatal
1867444077

nuktjcj
bimxkml
2973957111

atimtna
nfxqcvx
4014160497

tgnvsia
kfjcbeh
550598113

kmswcnn
hcdguxq
4090013392

vswkuqk
yafwbir
2261340929

yezrovl
vwknxnu
2954574600

cujnfpg
phhwvrh
3289759799

xucsdhf
gtgfudo
4164393361

gqplbmq
hcapuuh
656642059

igeougy
ytpfssi
4118505601

driaytu
hnomxzs
2212230278

iketcbk
jerdutt
2479104683

rpppuko
mqtedgf
1986672624

kvbesjh
twfpbfa
2892019439

evwzaos
zwsopcz
3029719102

ugtegak
jfppvmb
3715719613

iknybbk
jeyittt
398238032

itqfybq
jzfvotn
3806847855

ywouxys
ukiyywu
3626703900

zqlzlzj
vmjvmtl
1949335477

aitjoui
qzaciay
3794947975

vxhmxnl
yjyqovu
1882453027

zpshpoa
ubbtgwx
311908063

exhzdhp
jjyfspi
3165437067

xsdmynd
knftilk
2149740729

tohzrgd
xsnvsib
3862892194

sgkldej
lfoyuic
809259157

izzndbw
fhkrszn
2534674167

kjkkgzg
xwirwxh
4108936481

yvwjyuh
jkusiwg
1821358308

fxtdmbs
vkamkvc
4104486534

ftqxzdh
ezfhlrw
2484994613

hvaledh
xetecpx
554199908

yeissqw
jxkjcsx
4294947840

sdpoqcw
cwefwwg
2627502466

nepvour
rjcshod
110262302

cvivwhc
pkkogjl
136192574

bygmhsd
qdetxqk
1195331540

obwpuxb
smdurbt
2466482257

iwuhnto
eksdozi
2396159347

cjegpcp
ovckqmv
861582261

dpozeev
hlivdkp
430397453

tianznw
gtcwjlx
3110265897

xcsjggh
hpfcasx
36665361

xzxuysm
kgzliqb
287371322

chihdwk
otodeym
1558575868

tnkxzms
koomkaz
3085259689

kceezbm
tbapknd
4198087903

wlwfvnq
kcdcqtg
304773531

oewnsln
cyqbrbh
492142473

tobzbve
knfoszl
2490869493

zxytuar
yvndcwm
2991426191

ecvoads
zbrzphz
1591549120

zgffxfz
yiqvnpe
288303136

twuzdyi
xksvewo
596722119

rfsgnet
bufnhqd
756968307

euuhciw
jgdttqn
3764572437

whprspf
kgcwtjp
3048824742

ymlbinm
flhwxbd
308304037

dupnrrz
tfegtfj
1644000365

axmicjj
reopshe
146039524

djqgnyv
tydnhmf
745237997

nwvmesa
rxehbiw
2678562110

gdsfrzk
hvbzebr
3101837067

zehmtce
edlxeol
3451105439

bzqdqlf
atftgzy
1671489439

silhbdt
czyadpd
3505195570

bzrxqyr
atehgom
329271041

rngqazj
mocdpvc
492217567

gdnftyq
xejseux
4065378867

flodegc
ymkqtkj
352398593

ejidvrz
jxxxajc
810447580

dexsuqr
tvmzseb
3499196161

vyzlbpd
fjoeddt
3537376586

yrhzlkq
unnvmew
4017118138

lxmoaip
syizpey
2579702319

zpcpays
eqgepuz
354399221

azizsuh
rgkccwg
3441578396

qzpwdlz
bgrntnu
1152814553

uwbrthp
zesncpi
3104178606

anhjdzw
rsjstxx
736480241

hhfncss
gzwrtkj
2341875164

jdjqdpe
vkytcjs
2096604753

mceivsy
qlvlqio
2468755540

mykpeip
rxoetey
1960566923

wqjcbhb
hpnvsdk
3406243371

wnflwjz
kauippl
2678321277

jxeiyfg
yegpidh
4185286856

phwijrf
cuupzpi
1692899291

uzhiqpm
fgjparb
803662884
 
M

Mark Shelor

Jonas said:
By random searching instead of systematic I quickly found a lot of matches
(with crc32 below):
/jN

oxueekz
pyqptgs
1122772949


A good illustration of the "birthday paradox".

But, can you come up with an efficient Perl program that finds an input
string whose CRC value is 1756683253?

Hint: I'd recommend writing it in C! ;)
 
J

Jonas Nilsson

Mark Shelor said:
But, can you come up with an efficient Perl program that finds an input
string whose CRC value is 1756683253?

Hint: I'd recommend writing it in C! ;)

Shall I write a Perl-program in C?

I'm currently running this:

use strict;
use Digest::CRC qw(crc32 crc16);
my $string='aaaaaaa';
$|=1;

while (1) {
if (crc32($string)==1756683253) {
print $string;
exit;
}
$string++;
}
Which finishes after a maximum time of 44 hours. Just wait and see.
(Or should I start from 'zzzzzzz' working down?)
/jN
 
J

Jonas Nilsson

Jonas Nilsson said:
Shall I write a Perl-program in C?

I'm currently running this:
<snip code>

I've run through 2.6 billion lowercase letter combination without a hit. Is
it so that using only a subset of bytes (like 'a'-'z' == 97 - 122) I will
only have access to a subset of the CRC32 values? I thougt not. If there
still is no hit after the weekend I will start to think otherwise...

/jN

<Revised code - able to restart from last position>

use strict;
use integer;
use Digest::CRC qw(crc32);
my $string='a';
$|=1;
if (open(FILE,"<crc1756683253.log")) {
$string=<FILE>;
chomp($string);
close(FILE);
}

my $t0=time();
while (1) {
for (1..500000) {
if (crc32($string)==1756683253) {
my $timediff=(time()-$t0);
print "After $timediff seconds I found this: $string\n";
if (open(FILE,">>crc1756683253")) {
print FILE "After $timediff seconds I found this: $string\n";
close(FILE);
}
}
$string++;
}
open(FILE,">crc1756683253.log") or die;
print FILE "$string\n";
my $number;
my $mult=1;
my $ex=26;
no integer;
for (reverse split //,$string) {
$number+=(ord($_)-96)*$mult;
$mult*=$ex;
}
$number--;
print FILE "Now I checked $number strings!\n";
close FILE;
use integer;
}
 
D

David Filmer

This is interesting. If I may follow up (though I realize it's getting OT; I
beg your indulgence)...

What about MD5? I am of the opinion that it is infeaseable to find two
strings that share the same MD5 digest (without taking a hundred million
(billion? trillion?) years or so to do it). Would anyone in the group
disagree?
 
S

Sam Holden

This is interesting. If I may follow up (though I realize it's getting OT; I
beg your indulgence)...

What about MD5? I am of the opinion that it is infeaseable to find two
strings that share the same MD5 digest (without taking a hundred million
(billion? trillion?) years or so to do it). Would anyone in the group
disagree?

Yes.

http://www.tcs.hut.fi/~mjos/md5/
 
J

Jonas Nilsson

I've got a hit...

C:\Documents and Settings\jonni>perl -MDigest::CRC
print Digest::CRC::crc32('mddxzib');
^D
1756683253

Excited? Naa...
/jN
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,901
Latest member
Noble71S45

Latest Threads

Top