my small hashlib - using pythons hash-functions

  • Thread starter Mathias Panzenboeck
  • Start date
M

Mathias Panzenboeck

Hi.

I wrote a small hashlib for C. Because I'm new to hashes I looked at pythons implementation and
reused *some* of the code... or more the mathematical "hash-function", not really the code.

In particular I looked at pythons hash and lookup functions, so I came up with this (see the code
underneath).

So, can this code be considered as derived and do I have to put my code under the GPL? I'd like to
publish it under something less restrictive, like a BSD style license. But if GPL is the only way,
then GPL it is. :)

-panzi


#define PERTURB_SHIFT 5

static ENTRY_T * lookup(CONTAINER_T * table, hash_const_str_t key,
int (*loop_cond)(ENTRY_T * ptr, hash_const_str_t key)) {
register ENTRY_T * entries = table->entries;
register hash_size_t size = table->size;
register hash_size_t pos = hash(key) % size;
register hash_size_t perturb = pos;

register ENTRY_T * ptr = entries + pos;

while(loop_cond(ptr,key)) {
pos = ((pos << 2) + pos + perturb + 1) % size;
perturb >>= PERTURB_SHIFT;

ptr = entries + pos;

#ifdef HASH_COUNT_COLLISIONS
++ table->collisions;
#endif
}

return ptr;
}

hash_t hash(register hash_const_str_t str) {
/* python 2.5's hash function: */
register hash_size_t len;
register hash_t h;

assert(str);

len = strlen(str);
h = *str << 7;

while (*str)
h = (1000003 * h) ^ *str ++;
h ^= len;

return h;
}
 
P

Paul Rubin

Mathias Panzenboeck said:
So, can this code be considered as derived and do I have to put my
code under the GPL? I'd like to publish it under something less
restrictive, like a BSD style license. But if GPL is the only way,
then GPL it is. :)

Python is not GPL'd but has its own license. You can make the derived
work GPL if you want, but it is not required. You'll have to check
its terms to see whether you can specifically use the BSD license.
Simplest might be to just stay with the Python license.
 
M

Mathias Panzenboeck

Paul said:
Python is not GPL'd but has its own license. You can make the derived
work GPL if you want, but it is not required. You'll have to check
its terms to see whether you can specifically use the BSD license.
Simplest might be to just stay with the Python license.

Yes, right. It's PSF.

But the question is: *IS* this derived work? I mean, it's not copied code.
It's the same hashing-logic, which I learned by watching pythons code.

-panzi
 
F

Fredrik Lundh

Mathias said:
But the question is: *IS* this derived work? I mean, it's not copied code.
It's the same hashing-logic, which I learned by watching pythons code.

given that it's only a few lines of code, and there's hardly any other
way to write those lines if you want to implement the same algorithm,
I'd say it falls under "fair use". crediting the Python developers in
the source code should be good enough.

</F>
 
M

Mathias Panzenboeck

Fredrik said:
given that it's only a few lines of code, and there's hardly any other
way to write those lines if you want to implement the same algorithm,
I'd say it falls under "fair use". crediting the Python developers in
the source code should be good enough.

</F>

ic. I'll write a mail to the python developers then.

The code in python looks like this (see code underneath), just if someone what's to know.
I only used the logic of the parts I understand. ;)

Objects/stringobject.c:

static long
string_hash(PyStringObject *a)
{
register Py_ssize_t len;
register unsigned char *p;
register long x;

if (a->ob_shash != -1)
return a->ob_shash;
len = a->ob_size;
p = (unsigned char *) a->ob_sval;
x = *p << 7;
while (--len >= 0)
x = (1000003*x) ^ *p++;
x ^= a->ob_size;
if (x == -1)
x = -2;
a->ob_shash = x;
return x;
}

Objects/dictobject.c:

static dictentry *
lookdict_string(dictobject *mp, PyObject *key, register long hash)
{
register size_t i;
register size_t perturb;
register dictentry *freeslot;
register size_t mask = (size_t)mp->ma_mask;
dictentry *ep0 = mp->ma_table;
register dictentry *ep;

/* Make sure this function doesn't have to handle non-string keys,
including subclasses of str; e.g., one reason to subclass
strings is to override __eq__, and for speed we don't cater to
that here. */
if (!PyString_CheckExact(key)) {
#ifdef SHOW_CONVERSION_COUNTS
++converted;
#endif
mp->ma_lookup = lookdict;
return lookdict(mp, key, hash);
}
i = hash & mask;
ep = &ep0;
if (ep->me_key == NULL || ep->me_key == key)
return ep;
if (ep->me_key == dummy)
freeslot = ep;
else {
if (ep->me_hash == hash && _PyString_Eq(ep->me_key, key))
return ep;
freeslot = NULL;
}

/* In the loop, me_key == dummy is by far (factor of 100s) the
least likely outcome, so test for that last. */
for (perturb = hash; ; perturb >>= PERTURB_SHIFT) {
i = (i << 2) + i + perturb + 1;
ep = &ep0[i & mask];
if (ep->me_key == NULL)
return freeslot == NULL ? ep : freeslot;
if (ep->me_key == key
|| (ep->me_hash == hash
&& ep->me_key != dummy
&& _PyString_Eq(ep->me_key, key)))
return ep;
if (ep->me_key == dummy && freeslot == NULL)
freeslot = ep;
}
}
 
R

Robert Kern

Fredrik said:
given that it's only a few lines of code, and there's hardly any other
way to write those lines if you want to implement the same algorithm,
I'd say it falls under "fair use". crediting the Python developers in
the source code should be good enough.

If you'll forgive my pedantry, that's not "fair use," but simply the use of
material that cannot be copyrighted by itself (in many jursidictions, at least)
because it is so short, obvious, non-creative, etc.

A good overview of fair use is here:

http://fairuse.stanford.edu/Copyright_and_Fair_Use_Overview/chapter9/index.html

IANAL. TINLA.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,744
Messages
2,569,483
Members
44,902
Latest member
Elena68X5

Latest Threads

Top