Perl implementation of rand() and srand()

Simon

Hi everyone,

I'm trying to implement a Java version of the Perl-based "razor" client.
Razor is a spam filter using a client-server system where users can "vote" on
whether a message is spam or not (it is commercially known under the name
"spamnet").

The client chooses random positions in an e-mail message and
uses those parts of the message to compute an identifier (hash).

The positions are chosen according to the following system:

srand(<server specified seed-number>);

rand(<length of message>); called several times to choose portions of the text.

All clients and all servers have to use the same positions in order to
generate a comparable identifier for the message. This is my problem: if I
want to implement a Java client for this system, I have to be able to
generate the same "random" number sequence in Java. I need the source code
of the Perl implementation of srand() and rand() to be able to do this. Can
anyone point me in the right direction, please?

thx
Simon
 
P

Paul Lalli

> generate a comparable identifier for the message. This is my problem: if I
> want to implement a Java client for this system, I have to be able to
> generate the same "random" number sequence in Java. I need the source code
> of the Perl implementation of srand() and rand() to be able to do this. Can
> anyone point me in the right direction, please?

The right direction for the source code to perl? Here:
http://www.cpan.org/src/

Paul Lalli
 
S

Simon

> The right direction for the source code to perl? Here:
> http://www.cpan.org/src/

I've managed to find that link, too, but wasn't able to find anything about
the implementation of srand() or rand() inside it. If it's in there, can
someone tell me where to look for it?

Simon
 
B

Ben Morrow

Simon said:
> I've managed to find that link, too, but wasn't able to find anything about
> the implementation of srand() or rand() inside it. If it's in there, can
> someone tell me where to look for it?

In pp.c, the functions PP(pp_rand) and PP(pp_srand). Basically, they
just call whatever C-library implementation Configure found: I would
have thought that the usual Java random number function would do just
fine.
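A later development is worth noting here: since Perl 5.20, perl no longer calls the C library's generator but uses its own internal drand48() implementation on every platform. Below is a minimal Java sketch of such a generator, assuming perl seeds it the way POSIX specifies for srand48() (the 32-bit seed in the high bits of a 48-bit state, low 16 bits set to 0x330E), steps it with multiplier 0x5DEECE66D and addend 0xB modulo 2^48, and makes rand($max) return $max times the state divided by 2^48. The class name `PerlRand` is hypothetical, and the output can only match perls that actually use drand48.

```java
// Hypothetical sketch of a drand48-style generator, assuming the seeding and
// scaling perl 5.20+ uses for srand()/rand(). Older perls may be built on a
// different C library generator, so results only match drand48-based builds.
final class PerlRand {
    private static final long MULT = 0x5DEECE66DL;   // drand48 multiplier
    private static final long ADD = 0xBL;            // drand48 addend
    private static final long MASK = (1L << 48) - 1; // keep 48 bits of state

    private long state;

    PerlRand(long seed) {
        srand(seed);
    }

    /** Like srand(seed); assumes only the low 32 bits of the seed are used. */
    void srand(long seed) {
        state = ((seed & 0xFFFFFFFFL) << 16) + 0x330EL;
    }

    /** One drand48 step: advance the LCG, then scale the state into [0, 1). */
    double drand48() {
        // long overflow is harmless: only the low 48 bits are kept
        state = (state * MULT + ADD) & MASK;
        return state / 281474976710656.0; // divide by 2^48
    }

    /** Like rand($max): max * drand48(), with rand(0) treated as rand(1). */
    double rand(double max) {
        if (max == 0.0) {
            max = 1.0;
        }
        return max * drand48();
    }

    public static void main(String[] args) {
        PerlRand r = new PerlRand(42);
        for (int i = 0; i < 3; i++) {
            System.out.println(r.rand(100)); // compare with srand(42); rand(100)
        }
    }
}
```

To compare, `perl -e 'srand(42); print rand(100), "\n" for 1..3'` on a 5.20 or later perl should print the same values if the assumptions above hold, keeping in mind that perl prints only 15 significant digits by default while Java prints the shortest exact representation.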

Ben
 
M

Martien Verbruggen

> I'm trying to implement a Java version of the Perl-based "razor" client.
[snip]

> The client chooses random positions in an e-mail message and
> uses those parts of the message to compute an identifier (hash).

> The positions are chosen according to the following system:

> srand(<server specified seed-number>);

> rand(<length of message>); called several times to choose portions of the text.

> All clients and all servers have to use the same positions in order to
> generate a comparable identifier for the message.

Perl's rand() just calls whatever rand() function the system it runs
on provides, and those are notoriously non-identical. In later
versions of Perl, whoever compiles Perl can even choose which
pseudo-random generator to use.

In other words, rand() in Perl is not guaranteed to produce the same
sequence for the same seed on every installation.

Are you sure you interpreted the razor code correctly? I couldn't
actually find a document that describes the server-client exchange.

Martien
 
S

Simon

> In pp.c, the functions PP(pp_rand) and PP(pp_srand). Basically, they
> just call whatever C-library implementation Configure found:

I'm currently following this path to find out whether it gives me what I need
to reimplement the Perl random number generator. But thanks for the hint!

> I would have thought that the usual Java random number function would do
> just fine.

The Java RNG is fine, but it produces different random numbers than the
one provided by Perl. And since I have to reproduce the same number
sequences, I can't use Java's RNG.
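One detail that may help: java.util.Random is specified as a 48-bit linear congruential generator with the same constants as drand48 (multiplier 0x5DEECE66D, addend 0xB, modulus 2^48). The two differ only in how the seed becomes the initial state (Random XORs its seed with the multiplier) and in how output is extracted (nextDouble() combines two steps into 53 bits, while drand48 returns the whole 48-bit state divided by 2^48). The sketch below, with hypothetical helper names, checks that a Random can be placed into a drand48 state; it still assumes the perl in question is drand48-based, seeded as POSIX srand48() does.

```java
import java.util.Random;

// Sketch: java.util.Random and drand48 step the same 48-bit LCG.
// Helper names are hypothetical; the seeding assumes srand48-style setup.
final class SameLcg {
    static final long MULT = 0x5DEECE66DL;
    static final long ADD = 0xBL;
    static final long MASK = (1L << 48) - 1;

    /** drand48-style initial state for a 32-bit seed (assumed perl seeding). */
    static long drand48State(long seed) {
        return ((seed & 0xFFFFFFFFL) << 16) + 0x330EL;
    }

    /** A Random whose internal 48-bit state equals state48.
     *  Random(seed) stores (seed ^ MULT) & MASK, so XOR in advance. */
    static Random randomAt(long state48) {
        return new Random(state48 ^ MULT);
    }

    /** Advance a drand48 state by one LCG step. */
    static long step(long state48) {
        return (state48 * MULT + ADD) & MASK;
    }

    public static void main(String[] args) {
        long state = drand48State(42);
        Random javaRng = randomAt(state);
        for (int i = 0; i < 5; i++) {
            state = step(state);
            // nextInt() returns the top 32 of the 48 state bits after one step
            boolean same = (int) (state >>> 16) == javaRng.nextInt();
            double drand48 = state / 281474976710656.0; // state / 2^48
            System.out.println(same + " " + drand48);
        }
    }
}
```

So Java's RNG is not a different algorithm, only a different seeding and output convention. Reproducing perl's floating-point values would still need the hand-written step, because nextDouble() consumes two LCG steps per value and uses a different scaling.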

Simon
 
S

Simon

> Perl's rand() just calls whatever rand() function the system it runs
> on provides, and those are notoriously non-identical. In later
> versions of Perl, whoever compiles Perl can even choose which
> pseudo-random generator to use.
>
> In other words, rand() in Perl is not guaranteed to produce the same
> sequence for the same seed on every installation.

This is exactly my problem, since I believe I need to produce the same random
numbers as the original razor client (see below).

> Are you sure you interpreted the razor code correctly? I couldn't
> actually find a document that describes the server-client exchange.

Here is a snippet from the razor source code, where different positions inside
a mail message are "randomly" chosen for computing an identifier (hash):

<snip>
srand($$self{seed});

my @content = split /$$self{separator}/, $content;

my $lines = scalar @content;

# Randomly choose relative locations and section sizes (in percent)
my $sections = 6;
my $ssize = 100/$sections;
my @rel_lineno = map { rand($ssize) + ($_*$ssize) } 0 .. ($sections-1);
my @lineno = map { int(($_ * $lines)/100) } @rel_lineno;

my @rel_offset1 = map { rand(50) + ($_*50) } qw(0 1);
my @rel_offset2 = map { rand(50) + ($_*50) } qw(0 1);
</snip>

These positions are then used to compute a hash, which is then compared
against stored hashes on the server. I assume that if I choose other
positions in the message, I will never end up with a hash that represents the
message correctly when it is compared against the database on the server
side. Don't you agree?
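Under the same assumption (a drand48-based perl), the position selection in the razor excerpt above could be ported roughly as follows. This is a hypothetical sketch: `RazorPositions` and its methods are made-up names, splitting on `$$self{separator}` and the hashing are left out, and it relies on the rand() calls happening in the same order as in the Perl code (six for `@rel_lineno`, then two for `@rel_offset1`, then two for `@rel_offset2`).

```java
import java.util.Arrays;

// Hypothetical port of the razor position-selection excerpt, assuming
// perl's rand() is drand48-based. Splitting and hashing are not included.
final class RazorPositions {
    private long state; // 48-bit drand48 state

    /** Like srand($$self{seed}); assumes the seed fits in 32 bits. */
    RazorPositions(long seed) {
        state = ((seed & 0xFFFFFFFFL) << 16) + 0x330EL;
    }

    /** Like perl's rand($max): max * drand48(), with rand(0) treated as rand(1). */
    private double rand(double max) {
        if (max == 0.0) {
            max = 1.0;
        }
        state = (state * 0x5DEECE66DL + 0xBL) & ((1L << 48) - 1);
        return max * (state / 281474976710656.0); // state / 2^48
    }

    /** Mirrors @rel_lineno and @lineno; `lines` is scalar @content. */
    int[] lineNumbers(int lines) {
        int sections = 6;
        double ssize = 100.0 / sections;
        int[] lineno = new int[sections];
        for (int i = 0; i < sections; i++) {
            double rel = rand(ssize) + i * ssize;    // rand($ssize) + ($_*$ssize)
            lineno[i] = (int) ((rel * lines) / 100); // int(($_ * $lines)/100)
        }
        return lineno;
    }

    /** Mirrors one of @rel_offset1 / @rel_offset2 (map over qw(0 1)). */
    double[] relOffsets() {
        double first = rand(50);        // rand(50) + (0*50)
        double second = rand(50) + 50;  // rand(50) + (1*50)
        return new double[] { first, second };
    }

    public static void main(String[] args) {
        RazorPositions p = new RazorPositions(12345);
        int[] lineno = p.lineNumbers(200);
        double[] relOffset1 = p.relOffsets();
        double[] relOffset2 = p.relOffsets();
        System.out.println(Arrays.toString(lineno));
        System.out.println(Arrays.toString(relOffset1));
        System.out.println(Arrays.toString(relOffset2));
    }
}
```

The port only matches the Perl code if every rand() call is made in the same order; computing `@lineno` inside the same loop is safe because it does not consume any random numbers.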

Simon
 
M

Martien Verbruggen

> This is exactly my problem, since I believe I need to produce the same random
> numbers as the original razor client (see below).

But what I'm saying is that, since the razor clients are implemented
in Perl, the current set of installations out there already can't rely on
rand() always returning the same sequence.

So the current implementations of the razor client and protocol can't
require rand() to always return the same pseudo-random number sequence,
and your Java implementation should not need to care either.

> Here is a snippet from the razor source code, where different positions inside
> a mail message are "randomly" chosen for computing an identifier (hash):
[snip]

> These positions are then used to compute a hash, which is then compared
> against stored hashes on the server. I assume that if I choose other
> positions in the message, I will never end up with a hash that represents the
> message correctly when it is compared against the database on the server
> side. Don't you agree?

I don't know. Like I said, I couldn't find any documentation on how
the protocol works, or what sort of signature is generated.

On first reading, your argument sounds convincing, but when you think
about it, it can't be right, since every Perl installation could, in
principle, use a different rand() (not all of them will, but there certainly
will be differences). There must be some other trick in the algorithm
that avoids the need for a specific offset sequence.


Have you tried contacting the author of the razor modules? They might
have some documentation that describes how the whole thing works. It's
often easier to work from documentation like that than to try to
reverse engineer an algorithm from code that implements it.

Martien
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,754
Messages
2,569,522
Members
44,995
Latest member
PinupduzSap

Latest Threads

Top