Using filepath method to identify an .html page

Discussion in 'Python' started by Ferrous Cranus, Jan 22, 2013.

  1. Hello, I decided to switch from embedding an ID string in each .html file to using the file's path to identify it:

    # =================================
    # open current html template and get the page ID number
    # =================================
    import re

    # page: path of the .html template (defined elsewhere in the script)
    f = open( page )

    # read first line of the file
    firstline = f.readline()

    # find the ID of the file and store it
    pin = re.match( r'<!-- (\d+) -->', firstline ).group(1)
    # =================================

    This is what i used to have.

    Now, can you please help me write the switch to a filepath identifier?
    I'm having trouble writing it.
     
    Ferrous Cranus, Jan 22, 2013
    #1

  2. On Tue, 22 Jan 2013 02:07:54 -0800, Ferrous Cranus wrote:

    > Hello, i decided to switch from embedding string into .html to actually
    > grab the filepath in order to identify it:


    What do you think "the filepath" means, and how do you think you would
    grab it?

    I can only guess you mean the full path to the file, like:

    /home/steve/mypage.html

    C:\My Documents\mypage.html


    Is that what you mean?


    > # open current html template and get the page ID number
    > f = open( page )
    > # read first line of the file
    > firstline = f.readline()
    > # find the ID of the file and store it
    > pin = re.match( r'<!-- (\d+) -->', firstline ).group(1)
    >
    > This is what i used to have.
    >
    > Now, can you please help me write the switch to a filepath identifier? I'm
    > having trouble writing it.


    I don't understand the question.


    --
    Steven
     
    Steven D'Aprano, Jan 22, 2013
    #2

  3. # ==================================================
    # produce a hash based on the html page's filepath and convert it to an integer that will be used to identify the page itself.
    # ==================================================

    pin = int( hashlib.md5( htmlpage ) )


    I just tried that but it produced an error.
    What am i doing wrong?
     
    Ferrous Cranus, Jan 22, 2013
    #3
  4. # ====================================================================================================================================
    # produce a hash string based on html page's filepath and convert it to an integer, that will then be used to identify the page itself
    # ====================================================================================================================================

    pin = int( hashlib.md5( htmlpage ) )

    This fails. Why?

    htmlpage = a string representing the absolute path of the requested .html file
    hashlib.md5( htmlpage ) = conversion of the above string to a hashed string
    int( hashlib.md5( htmlpage ) ) = conversion of the above hashed string to a number

    Why does this fail?
     
    Ferrous Cranus, Jan 22, 2013
    #4
  5. Ferrous Cranus writes:

    > pin = int( hashlib.md5( htmlpage ) )
    >
    > This fails. why?
    >
    > htmlpage = a string representing the absolute path of the requested .html file
    > hashlib.md5( htmlpage ) = conversion of the above string to a hashed string


    No, that statement does not "convert" a string into another, but rather
    returns a "md5 HASH object":

    >>> import hashlib
    >>> hashlib.md5('foo')
    <md5 HASH object @ 0xb76dbcf0>

    Consulting the hashlib documentation[1], you could learn about that
    object's methods.

    > int( hashlib.md5( htmlpage ) ) = conversion of the above hashed string to a number
    >
    > Why this fails?


    Because in general you can't "convert" an arbitrary object instance (in
    your case a hashlib.HASH instance) to an integer value. Why do you need
    an integer? Isn't hexdigest() what you want?

    >>> print _.hexdigest()
    acbd18db4cc2f85cedef654fccc4a4d8
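
A minimal sketch of the same check as a script, assuming Python 3 (the session above is Python 2.7). In Python 3, `hashlib.md5` takes bytes rather than str, so the literal is written `b"foo"`. The variable names are only illustrative.

```python
import hashlib

# Python 3 requires bytes here, hence b"foo" instead of "foo".
hash_obj = hashlib.md5(b"foo")

# The hash object is neither a string nor a number, so int() rejects it.
try:
    int(hash_obj)
    int_failed = False
except TypeError:
    int_failed = True

# hexdigest() returns the digest as a string of 32 hexadecimal characters.
digest = hash_obj.hexdigest()

print(int_failed)  # True
print(digest)      # acbd18db4cc2f85cedef654fccc4a4d8
```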

    Do yourself a favor and learn to use the interpreter to test your
    snippets line by line; most problems will find an easy answer :)

    ciao, lele.

    [1] http://docs.python.org/2.7/library/hashlib.html#module-hashlib
    --
    nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
    real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
    | -- Fortunato Depero, 1929.
     
    Lele Gaifax, Jan 22, 2013
    #5
  6. On Tue, Jan 22, 2013 at 10:53 PM, Ferrous Cranus <> wrote:
    > # ==================================================
    > # produce a hash based on html page's filepath and convert it to an integer, that will be used to identify the page itself.
    > # ==================================================
    >
    > pin = int( hashlib.md5( htmlpage ) )
    >
    >
    > I just tried that but it produced an error.
    > What am i doing wrong?


    First and foremost, here's what you're doing wrong: You're saying "it
    produced an error". Python is one of those extremely helpful languages
    that tells you, to the best of its ability, exactly WHAT went wrong,
    WHERE it went wrong, and - often - WHY it failed. For comparison, I've
    just tonight been trying to fix up a legacy accounting app that was
    written in Visual BASIC back when that wouldn't get scorn heaped on
    you from the whole world. When we fire up one particular module, it
    bombs with a little message box saying "File not found". That's all.
    Just one little message, and the application terminates (uncleanly, at
    that). What file? How was it trying to open it? I do know that it
    isn't one of its BTrieve data files, because when one of THEM isn't
    found, the crash looks different (but it's still a crash). My current
    guess is that it's probably a Windows DLL file or something, but it's
    really not easy to tell...

    ChrisA
     
    Chris Angelico, Jan 22, 2013
    #6
  7. On 01/22/2013 07:02 AM, Ferrous Cranus wrote:
    > # ====================================================================================================================================
    > # produce a hash string based on html page's filepath and convert it to an integer, that will then be used to identify the page itself
    > # ====================================================================================================================================
    >
    > pin = int( hashlib.md5( htmlpage ) )
    >
    > This fails. why?
    >
    > htmlpage = a string representing the absolute path of the requested .html file
    > hashlib.md5( htmlpage ) = conversion of the above string to a hashed string
    > int( hashlib.md5( htmlpage ) ) = conversion of the above hashed string to a number
    >
    > Why this fails?
    >


    Is your copy/paste broken? It could be useful to actually show in what
    way it "fails."

    The md5 method produces a "HASH object", not a string. So int() cannot
    process that.

    To produce a digest string from the hash object, you want to call its
    hexdigest() method. The result of that is a string of hex digits, so you
    cannot just call int() on it, since that defaults to decimal.

    To convert a hex string to an int, you need the extra parameter of int:

    int(mystring, 16)

    Now, see if you can piece it together.
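
One way the pieces above might fit together, as a hedged sketch only. It assumes Python 3, where the path has to be encoded to bytes before hashing. The function name `path_to_int` and the sample path are hypothetical.

```python
import hashlib


def path_to_int(path):
    """Return the MD5 digest of a path string as an integer."""
    # Assumption: Python 3, so the str path is encoded to bytes first.
    hex_digest = hashlib.md5(path.encode("utf-8")).hexdigest()
    # Base 16 tells int() to parse the digits as hexadecimal.
    return int(hex_digest, 16)


pin = path_to_int("/home/steve/mypage.html")
print(pin)
```

The result is a number of up to 128 bits, not a short ID, and two different paths can in principle produce the same value.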


    --
    DaveA
     
    Dave Angel, Jan 22, 2013
    #7
  8. On Tuesday, 22 January 2013 2:29:21 PM UTC+2, Dave Angel wrote:
    > On 01/22/2013 07:02 AM, Ferrous Cranus wrote:
    > > # ====================================================================================================================================
    > > # produce a hash string based on html page's filepath and convert it to an integer, that will then be used to identify the page itself
    > > # ====================================================================================================================================
    > >
    > > pin = int( hashlib.md5( htmlpage ) )
    > >
    > > This fails. why?
    > >
    > > htmlpage = a string representing the absolute path of the requested .html file
    > > hashlib.md5( htmlpage ) = conversion of the above string to a hashed string
    > > int( hashlib.md5( htmlpage ) ) = conversion of the above hashed string to a number
    > >
    > > Why this fails?
    >
    > Is your copy/paste broken? It could be useful to actually show in what
    > way it "fails."
    >
    > The md5 method produces a "HASH object", not a string. So int() cannot
    > process that.
    >
    > To produce a digest string from the hash object, you want to call
    > hexdigest() method. The result of that is a hex literal string. So you
    > cannot just call int() on it, since that defaults to decimal.
    >
    > To convert a hex string to an int, you need the extra parameter of int:
    >
    > int(mystring, 16)
    >
    > Now, see if you can piece it together.


    htmlpage = a string representing the absolute path of the requested .html file


    What I want to do is associate a number with an html page's absolute path, so that I can use that number for my database relations instead of the BIG absolute path string.

    So to get an integer out of a string I would just have to type:

    pin = int( htmlpage )

    But would that be unique?
     
    Ferrous Cranus, Jan 22, 2013
    #8
  10. On Tuesday, 22 January 2013 2:47:16 PM UTC+2, Ferrous Cranus wrote:
    > On Tuesday, 22 January 2013 2:29:21 PM UTC+2, Dave Angel wrote:
    > > On 01/22/2013 07:02 AM, Ferrous Cranus wrote:
    > > > # ====================================================================================================================================
    > > > # produce a hash string based on html page's filepath and convert it to an integer, that will then be used to identify the page itself
    > > > # ====================================================================================================================================
    > > >
    > > > pin = int( hashlib.md5( htmlpage ) )
    > > >
    > > > This fails. why?
    > > >
    > > > htmlpage = a string representing the absolute path of the requested .html file
    > > > hashlib.md5( htmlpage ) = conversion of the above string to a hashed string
    > > > int( hashlib.md5( htmlpage ) ) = conversion of the above hashed string to a number
    > > >
    > > > Why this fails?
    > >
    > > Is your copy/paste broken? It could be useful to actually show in what
    > > way it "fails."
    > >
    > > The md5 method produces a "HASH object", not a string. So int() cannot
    > > process that.
    > >
    > > To produce a digest string from the hash object, you want to call
    > > hexdigest() method. The result of that is a hex literal string. So you
    > > cannot just call int() on it, since that defaults to decimal.
    > >
    > > To convert a hex string to an int, you need the extra parameter of int:
    > >
    > > int(mystring, 16)
    > >
    > > Now, see if you can piece it together.
    >
    > htmlpage = a string representing the absolute path of the requested .html file
    >
    > What i want to do, is to associate a number to an html page's absolute path for to be able to use that number for my database relations instead of the BIG absolute path string.
    >
    > so to get an integer out of a string i would just have to type:
    >
    > pin = int( htmlpage )
    >
    > But would that be unique?

    Another error, even without hashing anything. Please visit http://superhost.gr to view it.
     
    Ferrous Cranus, Jan 22, 2013
    #10
  12. On Tue, Jan 22, 2013 at 11:47 PM, Ferrous Cranus <> wrote:
    > What i want to do, is to associate a number to an html page's absolute path for to be able to use that number for my database relations instead of the BIG absolute path string.
    >
    > so to get an integer out of a string i would just have to type:
    >
    > pin = int( htmlpage )
    >
    > But would that be unique?


    The absolute path probably isn't that big. Just use it. Any form of
    hashing will give you a chance of a collision.

    ChrisA
     
    Chris Angelico, Jan 22, 2013
    #12
  13. On Tue, 22 Jan 2013 04:47:16 -0800, Ferrous Cranus wrote:

    > htmlpage = a string representing the absolute path of the requested
    > .html file



    That is a very misleading name for a variable. The contents of the
    variable are not an HTML page, but a file name.

    htmlpage = "/home/steve/my-web-page.html" # Bad variable name.

    filename = "/home/steve/my-web-page.html" # Better variable name.



    > What i want to do, is to associate a number to an html page's absolute
    > path for to be able to use that number for my database relations instead
    > of the BIG absolute path string.


    Firstly, don't bother. What you consider "BIG", your database will
    consider trivially small. What is it, 100 characters long? 200? Unlikely
    to be 300, since I think many file systems don't support paths that long.
    But let's say it is 300 characters long.

    That's likely to be 600 bytes, or a bit more than half a kilobyte. Your
    database won't even notice that.


    > so to get an integer out of a string i would just have to type:
    >
    > pin = int( htmlpage )


    No, that doesn't work. int() does not convert arbitrary strings into
    numbers. What made you think that this could possibly work?

    What do you expect int("my-web-page.html") to return? Should it return 23
    or 794 or 109432985462940911485 or 42?
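
As a small illustration of that point: `int()` only parses strings that are written as a number in the given base, and raises `ValueError` for anything else. The helper name `to_int_or_none` is hypothetical and exists only for this sketch.

```python
def to_int_or_none(text, base=10):
    """Return int(text, base), or None if text is not a number in that base."""
    try:
        return int(text, base)
    except ValueError:
        # Raised for strings that are not valid digits in the given base.
        return None


print(to_int_or_none("42"))                # 42
print(to_int_or_none("ff", 16))            # 255
print(to_int_or_none("my-web-page.html"))  # None
```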

    > But would that be unique?


    Wrong question.


    Just tell your database to make the file name an indexed field, and it
    will handle giving every path a unique number for you. You can then
    forget all about that unique number, because it is completely irrelevant
    to you, and safely use the path while the database treats it in the
    fastest and most efficient fashion necessary.
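
A rough sketch of that idea, using Python's built-in `sqlite3` module as a stand-in for whatever database is really in use (an assumption; the SQL would differ slightly elsewhere). The table name `pages`, the column names, and the helper `get_pin` are hypothetical.

```python
import sqlite3


def get_pin(conn, path):
    """Return the integer id the database assigned to path, adding it if new."""
    # INSERT OR IGNORE is SQLite syntax; the UNIQUE constraint blocks duplicates.
    conn.execute("INSERT OR IGNORE INTO pages (page) VALUES (?)", (path,))
    row = conn.execute("SELECT pin FROM pages WHERE page = ?", (path,)).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages ("
    " pin INTEGER PRIMARY KEY,"    # assigned automatically by SQLite
    " page TEXT NOT NULL UNIQUE)"  # UNIQUE also creates an index on page
)

first = get_pin(conn, "/home/steve/my-web-page.html")
second = get_pin(conn, "/home/steve/other-page.html")
again = get_pin(conn, "/home/steve/my-web-page.html")
print(first, second, again)
```

Here the database hands out the numbers, so the same path always maps to the same id and no hashing is needed.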


    --
    Steven
     
    Steven D'Aprano, Jan 22, 2013
    #13
  14. On Tuesday, 22 January 2013 3:04:41 PM UTC+2, Steven D'Aprano wrote:

    > What do you expect int("my-web-page.html") to return? Should it return 23
    > or 794 or 109432985462940911485 or 42?


    I expected a unique number to be produced from the given string so I could have a (number <=> string) relation. What does int( somestring ) really return? I don't have IDLE to test.


    > Just tell your database to make the file name an indexed field, and it
    > will handle giving every path a unique number for you. You can then
    > forget all about that unique number, because it is completely irrelevant
    > to you, and safely use the path while the database treats it in the
    > fastest and most efficient fashion necessary.


    This counter.py will run in a shared hosting environment, so absolute paths are BIG and are expected to look like this:

    /home/nikos/public_html/varsa.gr/articles/html/files/index.html

    In addition to that my counter.py script maintains details in a database table that stores information for each and every webpage requested.

    My 'visitors' database has 2 tables:

    pin --- page ---- hits (that's to store general information for all html pages)

    pin <-refers to-> page

    pin ---- host ---- hits ---- useros ---- browser ---- date (that's to store detailed information for all html pages)

    (thousands of records to hold every page's information)


    'pin' has to be a number because if I used the column 'page' instead, just imagine the size of the database holding detailed information for each and every .html page requested by visitors!!!

    So I really, really need to associate a (4-digit integer <=> html page's absolute path).

    Maybe it can be done by creating a MySQL association between the two columns, but I don't know how such a thing can be done (if it can).

    So that's why I need to get a "unique" number out of a string. Please help.
     
    Ferrous Cranus, Jan 22, 2013
    #14
  15. On Wed, Jan 23, 2013 at 12:57 AM, Ferrous Cranus <> wrote:
    > On Tuesday, 22 January 2013 3:04:41 PM UTC+2, Steven D'Aprano wrote:
    >
    >> What do you expect int("my-web-page.html") to return? Should it return 23
    >> or 794 or 109432985462940911485 or 42?
    >
    > I expected a unique number from the given string to be produced so i could have a (number <=> string) relation. What does int( somestring ) is returning really? i don't have IDLE to test.


    Just run python without any args, and you'll get interactive mode. You
    can try things out there.

    > This counter.py will work on a shared hosting environment, so absolutes paths are BIG and expected like this:
    >
    > /home/nikos/public_html/varsa.gr/articles/html/files/index.html


    That's not big. Trust me, modern databases work just fine with unique
    indexes like that. The most common way to organize the index is with a
    binary tree, so the database has to look through log(N) entries.
    That's like figuring out if the two numbers 142857 and 857142 are the
    same; you don't need to look through 1,000,000 possibilities, you just
    need to look through the six digits each number has.

    > 'pin' has to be a number because if i used the column 'page' instead, just imagine the database's capacity withholding detailed information for each and every .html requested by visitors!!!


    Not that bad actually. I've happily used keys easily that long, and
    expected the database to ensure uniqueness without costing
    performance.

    > So i really - really need to associate a (4-digit integer <=> htmlpage's absolute path)


    Is there any chance that you'll have more than 10,000 pages? If so, a
    four-digit number is *guaranteed* to have duplicates. And if you
    research the Birthday Paradox, you'll find that any sort of hashing
    algorithm is likely to produce collisions a lot sooner than that.
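
To put rough numbers on the Birthday Paradox point, here is a small sketch. It assumes an idealized hash that spreads pages uniformly over the 10,000 possible four-digit values; the function name `collision_probability` is hypothetical.

```python
def collision_probability(n_items, n_slots):
    """Chance that at least two of n_items share a slot, assuming uniform hashing."""
    if n_items > n_slots:
        return 1.0  # pigeonhole principle: a duplicate is certain
    p_all_distinct = 1.0
    for i in range(n_items):
        # The (i+1)-th item must avoid the i slots that are already taken.
        p_all_distinct *= (n_slots - i) / n_slots
    return 1.0 - p_all_distinct


for pages in (50, 200, 10001):
    print(pages, round(collision_probability(pages, 10000), 3))
```

Under that assumption the chance of at least one duplicate is already above 10% at 50 pages and above 80% at 200 pages, long before the 10,000-page limit.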

    > Maybe it can be done by creating a MySQL association between the two columns, but i dont know how such a thing can be done(if it can).
    >
    > So, that why i need to get a "unique" number out of a string. please help.


    Ultimately, that unique number would end up being a foreign key into a
    table of URLs and IDs. So just skip that table and use the URLs
    directly - much easier. In this instance, there's no value in
    normalizing.

    ChrisA
     
    Chris Angelico, Jan 22, 2013
    #15
  16. On Tuesday, 22 January 2013 4:33:03 PM UTC+2, Chris Angelico wrote:
    > On Wed, Jan 23, 2013 at 12:57 AM, Ferrous Cranus <> wrote:
    > > On Tuesday, 22 January 2013 3:04:41 PM UTC+2, Steven D'Aprano wrote:
    > >> What do you expect int("my-web-page.html") to return? Should it return 23
    > >> or 794 or 109432985462940911485 or 42?
    > >
    > > I expected a unique number from the given string to be produced so i could have a (number <=> string) relation. What does int( somestring ) is returning really? i don't have IDLE to test.
    >
    > Just run python without any args, and you'll get interactive mode. You
    > can try things out there.
    >
    > > This counter.py will work on a shared hosting environment, so absolutes paths are BIG and expected like this:
    > >
    > > /home/nikos/public_html/varsa.gr/articles/html/files/index.html
    >
    > That's not big. Trust me, modern databases work just fine with unique
    > indexes like that. The most common way to organize the index is with a
    > binary tree, so the database has to look through log(N) entries.
    > That's like figuring out if the two numbers 142857 and 857142 are the
    > same; you don't need to look through 1,000,000 possibilities, you just
    > need to look through the six digits each number has.
    >
    > > 'pin' has to be a number because if i used the column 'page' instead, just imagine the database's capacity withholding detailed information for each and every .html requested by visitors!!!
    >
    > Not that bad actually. I've happily used keys easily that long, and
    > expected the database to ensure uniqueness without costing
    > performance.
    >
    > > So i really - really need to associate a (4-digit integer <=> htmlpage's absolute path)
    >
    > Is there any chance that you'll have more than 10,000 pages? If so, a
    > four-digit number is *guaranteed* to have duplicates. And if you
    > research the Birthday Paradox, you'll find that any sort of hashing
    > algorithm is likely to produce collisions a lot sooner than that.
    >
    > > Maybe it can be done by creating a MySQL association between the two columns, but i don't know how such a thing can be done(if it can).
    > >
    > > So, that why i need to get a "unique" number out of a string. please help.
    >
    > Ultimately, that unique number would end up being a foreign key into a
    > table of URLs and IDs. So just skip that table and use the URLs
    > directly - much easier. In this instance, there's no value in
    > normalizing.
    >
    > ChrisA

    I insist, perhaps compelled, on using a key to associate a number with a filename.
    Would you help, please?

    I don't know how this is supposed to be written. I just know I need this:

    number = function_that_returns_a_number_out_of_a_string( absolute_path_of_a_html_file )

    Would someone help me write that in Python? We are talking about 1 line of code here....
     
    Ferrous Cranus, Jan 22, 2013
    #16
  18. Ferrous Cranus

    Dave Angel Guest

    On 01/22/2013 09:55 AM, Ferrous Cranus wrote:
    > On Tuesday, 22 January 2013 at 4:33:03 PM UTC+2, Chris Angelico wrote:
    >> On Wed, Jan 23, 2013 at 12:57 AM, Ferrous Cranus <> wrote:
    >>> On Tuesday, 22 January 2013 at 3:04:41 PM UTC+2, Steven D'Aprano wrote:
    >>>> What do you expect int("my-web-page.html") to return? Should it return 23
    >>>> or 794 or 109432985462940911485 or 42?
    >>>
    >>> I expected a unique number from the given string to be produced so i could have a (number <=> string) relation. What is int( somestring ) really returning? i don't have IDLE to test.
    >>
    >> Just run python without any args, and you'll get interactive mode. You
    >> can try things out there.
    >>
    >>> This counter.py will work on a shared hosting environment, so absolute paths are BIG and expected like this:
    >>>
    >>> /home/nikos/public_html/varsa.gr/articles/html/files/index.html
    >>
    >> That's not big. Trust me, modern databases work just fine with unique
    >> indexes like that. The most common way to organize the index is with a
    >> binary tree, so the database has to look through log(N) entries.
    >> That's like figuring out if the two numbers 142857 and 857142 are the
    >> same; you don't need to look through 1,000,000 possibilities, you just
    >> need to look through the six digits each number has.
    >>
    >>> 'pin' has to be a number because if i used the column 'page' instead, just imagine the database's capacity withholding detailed information for each and every .html requested by visitors!!!
    >>
    >> Not that bad actually. I've happily used keys easily that long, and
    >> expected the database to ensure uniqueness without costing
    >> performance.
    >>
    >>> So i really - really need to associate a (4-digit integer <=> htmlpage's absolute path)
    >>
    >> Is there any chance that you'll have more than 10,000 pages? If so, a
    >> four-digit number is *guaranteed* to have duplicates. And if you
    >> research the Birthday Paradox, you'll find that any sort of hashing
    >> algorithm is likely to produce collisions a lot sooner than that.
    >>
    >>> Maybe it can be done by creating a MySQL association between the two columns, but i don't know how such a thing can be done (if it can).
    >>>
    >>> So, that's why i need to get a "unique" number out of a string. please help.
    >>
    >> Ultimately, that unique number would end up being a foreign key into a
    >> table of URLs and IDs. So just skip that table and use the URLs
    >> directly - much easier. In this instance, there's no value in
    >> normalizing.
    >>
    >> ChrisA
    >
    > I insist, or perhaps I am compelled, on using a key to associate a number with a filename.
    > Would you help please?
    >
    > I don't know how this is supposed to be written. I just know I need this:
    >
    > number = function_that_returns_a_number_out_of_a_string( absolute_path_of_a_html_file)
    >
    > Would someone help me write that in Python? We are talking about 1 line of code here....
    >


    I gave you every piece of that code in my last response. So you're not
    willing to compose the line from the clues?


    --
    DaveA
     
    Dave Angel, Jan 22, 2013
    #18
  19. On Wed, Jan 23, 2013 at 1:55 AM, Ferrous Cranus <> wrote:
    > I insist, or perhaps I am compelled, on using a key to associate a number with a filename.
    > Would you help please?
    >
    > I don't know how this is supposed to be written. I just know I need this:
    >
    > number = function_that_returns_a_number_out_of_a_string( absolute_path_of_a_html_file)
    >
    > Would someone help me write that in Python? We are talking about 1 line of code here....


    def function_that_returns_a_number_out_of_a_string(string, cache=[]):
        # The mutable default list is deliberate here: it persists across calls.
        return (cache.index(string) if string in cache
                else (cache.append(string) or len(cache) - 1))

    That will work perfectly, as long as you don't care how long the
    numbers end up, and as long as you have a single Python script doing
    the work, and as long as you make sure you save and load that cache
    any time you shut down the script, and so on.
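
A minimal sketch of the "save and load that cache" caveat, assuming the cache is a plain list kept in a JSON file. The filename and the helper names (load_cache, save_cache, page_number) are hypothetical, not anything from the thread, and concurrent access by several scripts is not handled.

```python
import json
import os

CACHE_FILE = "page_ids.json"  # hypothetical filename, chosen for illustration


def load_cache(path=CACHE_FILE):
    """Return the saved list of page paths, or an empty list if no file exists."""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    return []


def save_cache(cache, path=CACHE_FILE):
    """Write the list of page paths back to disk."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(cache, f)


def page_number(page_path, cache):
    """Return the position of page_path in cache, appending it first if new."""
    if page_path not in cache:
        cache.append(page_path)
    return cache.index(page_path)
```

A script would call load_cache() at startup, page_number() per request, and save_cache() before exiting. The numbers stay stable only as long as the file is never lost or reordered, which is part of why this is decried as a bad idea.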

    It will also, and rightly, be decried as a bad idea. But hey, you did
    specify that it be one line of code. For your real job, USE A DATABASE
    COLUMN.
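
One possible reading of the database advice, sketched with Python's built-in sqlite3 module instead of MySQL (an assumption made only so the example runs without a server). The path goes in a column with a unique constraint and the database hands out the integer. The table name pages and the helper get_page_id are hypothetical, and INSERT OR IGNORE is SQLite syntax; MySQL spells it differently.

```python
import sqlite3


def get_page_id(conn, page_path):
    """Return the integer id for page_path, inserting a row if it is new."""
    # The UNIQUE constraint on path makes a repeated insert a no-op.
    conn.execute("INSERT OR IGNORE INTO pages (path) VALUES (?)", (page_path,))
    row = conn.execute(
        "SELECT id FROM pages WHERE path = ?", (page_path,)
    ).fetchone()
    return row[0]


# In-memory database for illustration; a real counter would use a file or server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages ("
    " id INTEGER PRIMARY KEY,"
    " path TEXT NOT NULL UNIQUE)"
)

first = get_page_id(conn, "/home/nikos/public_html/index.html")
again = get_page_id(conn, "/home/nikos/public_html/index.html")
other = get_page_id(
    conn, "/home/nikos/public_html/varsa.gr/articles/html/files/index.html"
)
print(first, again, other)
```

The same path always maps to the same id, and uniqueness is enforced by the database rather than hoped for from a hash.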

    ChrisA
     
    Chris Angelico, Jan 22, 2013
    #19
  20. On Tuesday, 22 January 2013 at 5:05:49 PM UTC+2, Dave Angel wrote:
    > I gave you every piece of that code in my last response. So you're not
    > willing to compose the line from the clues?


    I cannot.
    I don't even know yet whether hashing is what I need.

    The only things I know are:

    a) I only need to get a number out of a string (an absolute path).
    b) That number needs to be unique, because it identifies the actual html file.

    Would you help me write this in Python?
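
For readers following along, a minimal sketch of what requirements a) and b) run into when combined with the four-digit limit discussed earlier in the thread. It assumes an MD5 digest reduced modulo 10000, which is an illustrative choice and not something any poster prescribed. The helper names and the generated paths are made up.

```python
import hashlib


def four_digit_pin(page_path):
    """Map a path to a number in 0..9999 via an MD5 digest (not unique)."""
    digest = hashlib.md5(page_path.encode("utf-8")).hexdigest()
    return int(digest, 16) % 10000


def first_collision(n_pages):
    """Return (path_a, path_b, pin) for the first two generated paths
    that share a pin, or None if no two of them collide."""
    seen = {}
    for i in range(n_pages):
        path = "/home/nikos/public_html/page%d.html" % i  # made-up paths
        pin = four_digit_pin(path)
        if pin in seen:
            return seen[pin], path, pin
        seen[pin] = path
    return None


print(first_collision(10001))
```

With 10,001 distinct paths and only 10,000 possible pins a collision is certain, and the birthday bound suggests a better-than-even chance of one at roughly 120 pages. The function satisfies a) but cannot satisfy b).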

    Why the hell does

    pin = int ( '/home/nikos/public_html/index.html' )

    fail? Because it has slashes in it?
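
For readers who want to check this at the interactive prompt: int() only parses text that already spells a whole number (an optional sign and digits, with surrounding whitespace allowed). The path fails because it is not a numeral, and a name without any slashes fails the same way. A small sketch, where the helper try_int is hypothetical:

```python
def try_int(text):
    """Return int(text), or the error message if text is not a numeral."""
    try:
        return int(text)
    except ValueError as exc:
        return "ValueError: %s" % exc


print(try_int("1234"))
print(try_int("  42  "))
print(try_int("/home/nikos/public_html/index.html"))
print(try_int("index.html"))
```

The first two calls return integers; the last two report a ValueError, with or without slashes.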
     
    Ferrous Cranus, Jan 22, 2013
    #20