Converting a string to a number by using INT (no hash method)

Ferrous Cranus

I just need a way to CONVERT a string (absolute path) to a 4-digit unique number with INT!!! That's all I want!! But I cannot make it work :(

And the best part is that "that" number must be able to turn back into a path.

This way I DON'T EVEN HAVE TO STORE THE ACTUAL HTML PAGE'S ABSOLUTE PATH!!!!

1. User requests a specific html page (.htaccess gives my script the absolute path for that .html page)
2. I turn the path into a 4-digit number
3. I store that number in the database. I DON'T EVEN HAVE TO STORE THE PATH IN THE DATABASE ANYMORE!!! This is just great!
 
Leonard, Arah

I just need a way to CONVERT a string (absolute path) to a 4-digit unique number with INT!!! That's all I want!! But I cannot make it work :(
And the best part is that "that" number must be able to turn back into a path.

This way I DON'T EVEN HAVE TO STORE THE ACTUAL HTML PAGE'S ABSOLUTE PATH!!!!

1. User requests a specific html page (.htaccess gives my script the absolute path for that .html page)
2. I turn the path into a 4-digit number
3. I store that number in the database. I DON'T EVEN HAVE TO STORE THE PATH IN THE DATABASE ANYMORE!!! This is just great!

Without involving some kind of lookup table/map service to store the paths (which would entirely defeat the purpose) what you are ranting about is technically impossible. If you tried really really hard you *might* be able to convert a string that long into some kind of 4-digit integer checksum, but you would *never* be able to convert that back into a file path. Nor would it be guaranteed to be unique.
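
For illustration only, the lookup-table route mentioned above might look something like the sketch below. The class name `PathRegistry` and its methods are made up for this example, and in practice the mapping would live in a database table rather than in memory. The point is that the number can only be turned back into a path because the path itself is stored somewhere.

```python
class PathRegistry:
    """Hypothetical in-memory lookup table: path <-> small integer id."""

    def __init__(self, limit=10000):
        self.limit = limit           # 4 decimal digits allow ids 0..9999
        self._id_by_path = {}        # path -> id
        self._path_by_id = []        # id -> path (index is the id)

    def pin_for(self, path):
        """Return the id for a path, assigning the next free one if it is new."""
        if path in self._id_by_path:
            return self._id_by_path[path]
        if len(self._path_by_id) >= self.limit:
            raise OverflowError("no 4-digit ids left")
        pin = len(self._path_by_id)
        self._id_by_path[path] = pin
        self._path_by_id.append(path)
        return pin

    def path_for(self, pin):
        """Reverse lookup; works only because the path was stored."""
        return self._path_by_id[pin]


registry = PathRegistry()
pin = registry.pin_for("/home/user/public_html/index.html")
print(pin, registry.path_for(pin))
```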
 
Mark Lawrence

I just need a way to CONVERT a string (absolute path) to a 4-digit unique number with INT!!! That's all I want!! But I cannot make it work :(

And the best part is that "that" number must be able to turn back into a path.

This way I DON'T EVEN HAVE TO STORE THE ACTUAL HTML PAGE'S ABSOLUTE PATH!!!!

1. User requests a specific html page (.htaccess gives my script the absolute path for that .html page)
2. I turn the path into a 4-digit number
3. I store that number in the database. I DON'T EVEN HAVE TO STORE THE PATH IN THE DATABASE ANYMORE!!! This is just great!

Hi Iron Skull,

I hereby nominate you for Troll of the Millennium.
 
Dave Angel

I just need a way to CONVERT a string (absolute path) to a 4-digit unique number with INT!!! That's all I want!! But I cannot make it work :(

And the best part is that "that" number must be able to turn back into a path.

This way I DON'T EVEN HAVE TO STORE THE ACTUAL HTML PAGE'S ABSOLUTE PATH!!!!

1. User requests a specific html page (.htaccess gives my script the absolute path for that .html page)
2. I turn the path into a 4-digit number
3. I store that number in the database. I DON'T EVEN HAVE TO STORE THE PATH IN THE DATABASE ANYMORE!!! This is just great!

I had prepared a detailed response, showing what your choices are with
this new constraint. But I can see from this post here that there's no
point, so I've thrown it out.

Either you're trolling, or you have a very limited knowledge of
mathematics. This isn't a programming problem, it's a simple problem
of information theory.

Unless you constrain your users to very restrictive filenames, what you
ask here simply cannot be done.

Perpetual motion machine, anyone? Or a compression algorithm which can
be applied repeatedly to a chunk of data until it shrinks down to one
byte? No way to do it without cheating, and the literature is full of
examples of people cheating.
 
Ferrous Cranus

On Tuesday, 22 January 2013 at 6:27:32 PM UTC+2, Leonard, Arah wrote:
Without involving some kind of lookup table/map service to store the paths (which would entirely defeat the purpose) what you are ranting about is technically impossible. If you tried really really hard you *might* be able to convert a string that long into some kind of 4-digit integer checksum, but you would *never* be able to convert that back into a file path. Nor would it be guaranteed to be unique.

Now that I am thinking about it more and more, I don't have to turn the number back into a path.

So, what I want is a function foo() that does this:

foo( "some long string" ) --> 1234

=====================
1. User requests a specific html page (.htaccess gives my script the absolute path for that .html page)
2. Turn the 'path' into a 4-digit number and save it as 'pin' (how?)
3. I store that number in the database. I DON'T EVEN HAVE TO STORE THE HTML PAGE'S PATH IN THE DATABASE ANYMORE!!! This is just great!

At some later time I want to check the weblog of that .html page:

1. Request the page as: http://mydomain.gr/index.html?show=log
2. .htaccess gives my script the absolute path of the requested .html file
3. Turn the 'path' into a 4-digit number and save it as 'pin' (this is what I am asking)
4. Select all log records for that specific .html page (based on the 'pin' column)

Since I have the requested 'path', which has been converted to the 4-digit number stored in the database, I know which page I am requesting detailed data for, so I look up the 'pin' column in the database and thus I know which records I want to select.

No need to turn the number back into a path anymore, just the path into a number, to identify the specific .html page.

Can this be done?
 
D'Arcy J.M. Cain

Why bother? Just wish for a zillion dollars and then you never have to
program again. At least that would be theoretically possible.

If you tried really really hard you *might* be able to convert a string
that long into some kind of 4-digit integer checksum, but you would
*never* be able to convert that back into a file path. Nor would it be
guaranteed to be unique.

In fact, if you have 10,001 files it is absolutely guaranteed to have at
least one duplicate entry.
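
A quick way to see the pigeonhole argument in action is sketched below. The particular function does not matter; `pin` here is just one arbitrary way of squeezing a string into the range 0..9999, and the generated file names are invented for the demonstration.

```python
def pin(path):
    # One arbitrary mapping from a string to 0..9999 (Python 3 spelling).
    return int(path.encode("utf-8").hex(), 16) % 10000


# 10,001 distinct (made-up) paths, but only 10,000 possible pins.
paths = ["/site/page%d.html" % n for n in range(10001)]
pins = [pin(p) for p in paths]

print(len(set(paths)), "distinct paths ->", len(set(pins)), "distinct pins")
```

Whatever function is substituted for `pin`, the second count can never exceed 10,000, so at least two paths must share a number.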
 
Leonard, Arah

No need, to turn the number back to a path anymore, just the path to a number, to identify the specific .html page
Can this be done?

Guaranteed to be unique? Not even remotely possible. Even with a lookup table approach (which defeats your purpose of not storing the path) with 4 digits you're looking at a maximum 10000 unique file paths before your system duplicates numbers. And that's the best-case scenario. Anything else would be worse.

Not guaranteed to be unique? Easy. Just take the previously given example of pin = int( htmlpage.encode("hex"), 16 ) and mod it to your limit, to make:
pin = int( htmlpage.encode("hex"), 16 ) % 10000
It'll give you your number, but there are no guarantees of uniqueness. You're looking at more blind random luck using that.
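
Note that `str.encode("hex")` is a Python 2 codec; on Python 3 that call fails. A rough Python 3 equivalent, broken into its three steps, might look like the sketch below. The function name `path_to_pin`, the `limit` parameter, and the choice of UTF-8 are assumptions made for this example.

```python
def path_to_pin(htmlpage, limit=10000):
    """Sketch of the hex-then-modulo idea for Python 3 (not unique!)."""
    # Step 1: bytes of the path, written out as a string of hex digits.
    hex_digits = htmlpage.encode("utf-8").hex()
    # Step 2: parse that hex string as one very large integer.
    # An empty path has no digits, so fall back to "0".
    big_number = int(hex_digits or "0", 16)
    # Step 3: keep only the remainder, which lies in 0..limit-1.
    return big_number % limit


print(path_to_pin("hello"))
```

As with the one-liner, different paths can and will end up with the same result.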
 
Ferrous Cranus

On Tuesday, 22 January 2013 at 7:24:26 PM UTC+2, Leonard, Arah wrote:
Guaranteed to be unique? Not even remotely possible. Even with a lookup table approach (which defeats your purpose of not storing the path) with 4 digits you're looking at a maximum 10000 unique file paths before your system duplicates numbers. And that's the best-case scenario. Anything else would be worse.

Not guaranteed to be unique? Easy. Just take the previously given example of pin = int( htmlpage.encode("hex"), 16 ) and mod it to your limit, to make:

pin = int( htmlpage.encode("hex"), 16 ) % 10000

It'll give you your number, but there are no guarantees of uniqueness. You're looking at more blind random luck using that.

Finally!!!!!! THANK YOU VERY MUCH!!! THIS IS WHAT I WAS LOOKING FOR!!!

NOW, if you please, explain it to me from the innermost parenthesis outward, because I do want to understand it!!!

And since I am sure it works, and I just used it on http://superhost.gr,
please view my domain and help me understand why it's producing errors for me.
Your 1-line code surely works but something is not letting my webpage load normally.

Please take a look....
 
John Gordon

Ferrous Cranus said:
You're looking at more blind random luck using that.
Finally!!!!!! THANK YOU VERY MUCH!!! THIS IS WHAT I WAS LOOKING FOR!!!

No it isn't; you said you wanted a unique 4-digit number. This method
can return the same 4-digit number for lots of different file paths.

NOW, if you please explain it to me from the innermost parenthesis please,
because i do want to understand it!!!

1. Transform the html path string into a (long) string of hexadecimal
digits using the encode() method.

2. Parse that hexadecimal string into an integer using the int()
function with base 16.

3. Shrink the integer into the range 0-9999 by using the % operator.
 
Ferrous Cranus

On Tuesday, 22 January 2013 at 7:24:26 PM UTC+2, Leonard, Arah wrote:
Guaranteed to be unique? Not even remotely possible. Even with a lookup table approach (which defeats your purpose of not storing the path) with 4 digits you're looking at a maximum 10000 unique file paths before your system duplicates numbers. And that's the best-case scenario. Anything else would be worse.

Not guaranteed to be unique? Easy. Just take the previously given example of pin = int( htmlpage.encode("hex"), 16 ) and mod it to your limit, to make:

pin = int( htmlpage.encode("hex"), 16 ) % 10000

It'll give you your number, but there are no guarantees of uniqueness. You're looking at more blind random luck using that.

==============================================
pin = int( htmlpage.encode("hex"), 16 ) % 10000
==============================================

Can you please explain the difference between what you posted and this Perl code?

==============================================
foreach my $ltr(@ltrs){
$hash = ( $hash + ord($ltr)) %10000;
==============================================

I want to understand this and see it implemented in Python.
 
Michael Torrie

==============================================
pin = int( htmlpage.encode("hex"), 16 ) % 10000
==============================================

Can you please explain the differences to what you have posted
opposed to this perl coding?

==============================================
foreach my $ltr(@ltrs){
$hash = ( $hash + ord($ltr)) %10000;
==============================================

I want to understand this and see it implemented in Python.

It isn't quite the same thing. The perl code is merely a checksum of the
ASCII values of the characters in the file name, which is then reduced
to a number < 10000. The Python code takes the ASCII value of
each character in the file name, converts it to a pair of hexadecimal
digits, strings them all together into one long string, then converts that
to a number using the hexadecimal number parser. This results in a
*very* large number, 8 bits per letter in the original file name, which
is then reduced modulo 10000. Technically neither method is a hash and
neither will generate unique numbers.

Here's the python algorithm used on a short word:
'hello' => '68656c6c6f' (h=0x68, e=0x65, l=0x6c, o=0x6f)
=> 0x68656c6c6f => 448378203247
mod that with 10000 and you get 3247

If you would simply run the python interpreter and try these things out
you could see how and why they work or not work. What is stopping you
from doing this?
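
For anyone who wants to try both side by side, here is a rough Python 3 sketch of the two approaches. The function names are invented for this example, and the Perl loop is assumed to start from a hash of zero and to iterate over the characters of the file name.

```python
def perl_style_checksum(name, limit=10000):
    """Running sum of character codes, reduced modulo limit at each step."""
    total = 0
    for ch in name:
        # ord() gives the code point; for plain ASCII names that is the byte value.
        total = (total + ord(ch)) % limit
    return total


def hex_mod_pin(name, limit=10000):
    """Bytes -> hex digits -> one large integer -> remainder modulo limit."""
    return int(name.encode("utf-8").hex(), 16) % limit


for name in ("hello", "abc.html", "bca.html"):
    print(name, perl_style_checksum(name), hex_mod_pin(name))
```

The checksum only adds the characters up, so it ignores their order; the hex version is sensitive to position, although it still collides for plenty of other inputs.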
 
Dave Angel

==============================================
pin = int( htmlpage.encode("hex"), 16 ) % 10000
==============================================

Can you please explain the differences to what you have posted opposed to this perl coding?

==============================================
foreach my $ltr(@ltrs){
$hash = ( $hash + ord($ltr)) %10000;
==============================================

I want to understand this and see it implemented in Python.

The perl code will produce the same hash for "abc.html" as for
"bca.html". That's probably one reason Leonard didn't try to
transliterate the buggy code.

In any case, the likelihood of a hash collision for any non-trivial
website is substantial. As I said elsewhere, if you hash 100 files you
have about a 40% chance of a collision.

If you hash 220 files, the likelihood is about 90%
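
Those figures can be checked with the usual birthday-problem calculation. The sketch below assumes every file lands on one of the 10000 pins uniformly at random, which is a simplification; the function name is made up for the example.

```python
def collision_probability(n_files, n_pins=10000):
    """Chance that at least two of n_files share a pin, assuming uniform pins."""
    p_all_distinct = 1.0
    for i in range(n_files):
        # The (i+1)-th file must avoid the i pins already taken.
        p_all_distinct *= (n_pins - i) / n_pins
    return 1.0 - p_all_distinct


for n in (100, 220):
    print(n, "files:", round(collision_probability(n), 3))
```

A real mapping that is less evenly spread than this idealized one would collide even sooner.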
 
Ferrous Cranus

On Tuesday, 22 January 2013 at 9:02:48 PM UTC+2, Michael Torrie wrote:
It isn't quite the thing. The perl code is merely a checksum of the
ascii value of the characters in the file name, that is then chopped
down to a number < 10000. The Python code is taking the ascii value of
each character in the file name, converting it to a hexadecimal pair of
digits, stringing them all out into a long string, then converting that
to a number using the hexadecimal number parser. This results in a
*very* large number, 8-bits per letter in the original file name, and
then chops that down to 10000. Technically neither method is a hash and
neither will generate unique numbers.

Here's the python algorithm used on a short word:
'hello' => '68656c6c6f' (h=0x68, e=0x65, l=0x6c, o=0x6f)
=> 0x68656c6c6f => 448378203247
mod that with 10000 and you get 3247

If you would simply run the python interpreter and try these things out
you could see how and why they work or not work. What is stopping you
from doing this?


May I send you my code by mail so you can see what's wrong and why http://superhost.gr produces an error?

1. This is not a script that I am being paid for.
2. This is not a class assignment.

I just want to get this to work using that method.
 
Alan Spence

On Tuesday, 22 January 2013 at 9:02:48 PM UTC+2, Michael Torrie wrote:

May I send you my code by mail so you can see what's wrong and why http://superhost.gr produces an error?

1. This is not a script that I am being paid for.
2. This is not a class assignment.

I just want to get this to work using that method.

All pages, strings and objects map to:

http://redwing.hutman.net/~mreed/warriorshtm/ferouscranus.htm

Alan
 
Leonard, Arah

The perl code will produce the same hash for "abc.html" as for "bca.html". That's probably one reason Leonard didn't try to transliterate the buggy code.

Actually, to give credit where it's due, it wasn't me. I just modified someone else's interesting solution in this thread and added the silly limit of 10000 to it.

In any case, the likelihood of a hash collision for any non-trivial website is substantial.

Exactly. Four digits is hardly enough range for it to be even remotely safe. And even then range isn't really the issue as technically it just improves your odds.

The results of a modulus operator are still non-unique no matter how many digits are there to work with ... within reason. Statistically anyone who buys a ticket could potentially win the lottery no matter how bad the odds are. ;)

And now back to the OP, I'm still confused about this four-digit limitation. Why isn't the limitation at least adhering to a byte length like byte/short/long? Is this database storing a string of characters instead of an actual number? (And if so, then why not just block out 255 characters instead of 4 to store a whole path? Or at the very least treat 4 characters as 4 bytes to greatly increase the numeric range?)
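
To put rough numbers on that last suggestion, the arithmetic can be checked as follows. The variable names are just for illustration.

```python
decimal_pins = 10 ** 4       # four decimal digits: 0000 .. 9999
four_byte_ids = 256 ** 4     # four raw bytes: an unsigned 32-bit integer

print(decimal_pins)                      # number of 4-digit values
print(four_byte_ids)                     # number of 4-byte values
print(four_byte_ids // decimal_pins)     # how many times larger the range is

# The largest value that fits in four bytes:
print(int.from_bytes(b"\xff\xff\xff\xff", "big"))
```

A bigger range only makes collisions less likely; it does not make the numbers unique.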
 
John Gordon

Ferrous Cranus said:
May I send you my code by mail so you can see what's wrong and why
http://superhost.gr produces an error?

I tried going to that address and got some error output. I noticed this
in the error dump:

    if cursor.rowcount == 0:
        cursor.execute( '''INSERT INTO visitors(pin, host, hits, useros, browser, date) VALUES(%s, %s, %s, %s, %s)''', (pin, host, 1, useros, browser, date) )

The INSERT statement gives six column names but only five placeholders (%s)
in the VALUES clause.

Perhaps that's the problem?
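
One way to keep the two counts from drifting apart is to generate the placeholders from the column list. The helper below is hypothetical (the name `build_insert` is made up) and assumes a DB-API driver that uses `%s` placeholders, as the dump suggests; the table and column names must be hard-coded, trusted values since they are pasted into the SQL text.

```python
def build_insert(table, columns):
    """Hypothetical helper: build an INSERT with one %s per column name."""
    placeholders = ", ".join(["%s"] * len(columns))
    return "INSERT INTO %s (%s) VALUES (%s)" % (
        table, ", ".join(columns), placeholders)


columns = ["pin", "host", "hits", "useros", "browser", "date"]
sql = build_insert("visitors", columns)
print(sql)

# With the cursor from the error dump, this would then be called roughly as:
# cursor.execute(sql, (pin, host, 1, useros, browser, date))
```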
 
