Uniquely identifying each & every html template

Discussion in 'Python' started by Ferrous Cranus, Jan 18, 2013.

  1. I use this .htaccess file to rewrite every .html request to counter.py

    # =================================================================================================================
    RewriteEngine On
    RewriteCond %{REQUEST_FILENAME} -f
    RewriteRule ^/?(.+\.html) /cgi-bin/counter.py?htmlpage=$1 [L,PT,QSA]
    # =================================================================================================================



    counter.py script is created for creating, storing, increasing, displaying a counter for each webpage for every website i have.
    It's supposed to identify each webpage by a <!-- Number --> and then do it's database stuff from there

    # =================================================================================================================
    # open current html template and get the page ID number
    # =================================================================================================================
    f = open( '/home/nikos/public_html/' + page )

    # read first line of the file
    firstline = f.readline()

    # find the ID of the file and store it
    pin = re.match( r'<!-- (\d+) -->', firstline ).group(1)
    # =================================================================================================================

    It works as expected and you can see it works normally by viewing: http//superhost.gr (bottom down its the counter)

    What is the problem you ask?!
    Problem is that i have to insert at the very first line of every .html template of mine, a unique string containing a number like:

    index.html <!-- 1 -->
    somefile.html <!-- 2-->
    other.html <!-- 3 -->
    nikos.html <!-- 4 -->
    cool.html <!-- 5 -->

    to HELP counter.py identify each webpage at a unique way.

    Well.... its about 1000 .html files inside my DocumentRoot and i cannot edit ALL of them of course!
    Some of them created by Notepad++, some with the use of Dreamweaver and some others with Joomla CMS
    Even if i could embed a number to every html page, it would have been a very tedious task, and what if a change was in order? Edit them ALL back again? Of course not.

    My question is HOW am i suppose to identify each and every html webpage i have, without the need of editing and embedding a string containing a number for them. In other words by not altering their contents.

    or perhaps by modifying them a bit..... but in an automatic way....?

    Thank you ALL in advance.
     
    Ferrous Cranus, Jan 18, 2013
    #1
    1. Advertising

  2. Ferrous Cranus

    John Gordon Guest

    In <> Ferrous Cranus <> writes:

    > Problem is that i have to insert at the very first line of every .html template of mine, a unique string containing a number like:


    > index.html <!-- 1 -->
    > somefile.html <!-- 2-->
    > other.html <!-- 3 -->
    > nikos.html <!-- 4 -->
    > cool.html <!-- 5 -->


    > to HELP counter.py identify each webpage at a unique way.


    Instead of inserting unique content in every page, can you use the
    document path itself as the identifier?

    --
    John Gordon A is for Amy, who fell down the stairs
    B is for Basil, assaulted by bears
    -- Edward Gorey, "The Gashlycrumb Tinies"
     
    John Gordon, Jan 18, 2013
    #2
    1. Advertising

  3. Ferrous Cranus

    Dave Angel Guest

    On 01/18/2013 03:48 PM, Ferrous Cranus wrote:
    > I use this .htaccess file to rewrite every .html request to counter.py
    >
    > # =================================================================================================================
    > RewriteEngine On
    > RewriteCond %{REQUEST_FILENAME} -f
    > RewriteRule ^/?(.+\.html) /cgi-bin/counter.py?htmlpage=$1 [L,PT,QSA]
    > # =================================================================================================================
    >
    >
    >
    > counter.py script is created for creating, storing, increasing, displaying a counter for each webpage for every website i have.
    > It's supposed to identify each webpage by a <!-- Number --> and then do it's database stuff from there
    >
    > # =================================================================================================================
    > # open current html template and get the page ID number
    > # =================================================================================================================
    > f = open( '/home/nikos/public_html/' + page )
    >
    > # read first line of the file
    > firstline = f.readline()
    >
    > # find the ID of the file and store it
    > pin = re.match( r'<!-- (\d+) -->', firstline ).group(1)
    > # =================================================================================================================
    >
    > It works as expected and you can see it works normally by viewing: http//superhost.gr (bottom down its the counter)
    >
    > What is the problem you ask?!
    > Problem is that i have to insert at the very first line of every .html template of mine, a unique string containing a number like:
    >
    > index.html <!-- 1 -->
    > somefile.html <!-- 2-->
    > other.html <!-- 3 -->
    > nikos.html <!-- 4 -->
    > cool.html <!-- 5 -->
    >
    > to HELP counter.py identify each webpage at a unique way.
    >
    > Well.... its about 1000 .html files inside my DocumentRoot and i cannot edit ALL of them of course!
    > Some of them created by Notepad++, some with the use of Dreamweaver and some others with Joomla CMS
    > Even if i could embed a number to every html page, it would have been a very tedious task, and what if a change was in order? Edit them ALL back again? Of course not.
    >
    > My question is HOW am i suppose to identify each and every html webpage i have, without the need of editing and embedding a string containing a number for them. In other words by not altering their contents.
    >
    > or perhaps by modifying them a bit..... but in an automatic way....?
    >
    > Thank you ALL in advance.
    >
    >


    I don't understand the problem. A trivial Python script could scan
    through all the files in the directory, checking which ones are missing
    the identifier, and rewriting the file with the identifier added.

    So, since you didn't come to that conclusion, there must be some other
    reason you don't want to edit the files. Is it that the real sources
    are elsewhere (e.g. Dreamweaver), and whenever one recompiles those
    sources, these files get replaced (without identifiers)?

    If that's the case, then I figure you have about 3 choices:

    1) use the file path as your key, instead of requiring a number
    2) use a hash of the page (eg. md5) as your key. of course this could
    mean that you get a new value whenever the page is updated. That's good
    in many situations, but you don't give enough information to know if
    that's desirable for you or not.
    3) Keep an external list of filenames, and their associated id numbers.
    The database would be a good place to store such a list, in a separate
    table.

    --
    DaveA
     
    Dave Angel, Jan 18, 2013
    #3
  4. Τη ΠαÏασκευή, 18 ΙανουαÏίου 2013 10:59:17 μ.μ. UTC+2, ο χÏήστης John Gordon έγÏαψε:

    > Instead of inserting unique content in every page, can't you use the
    > document path itself as the identifier?


    No, i cannot, becaue it would mess things at later time when i for example:

    1. mv name.html othername.html (document's filename altered)
    2. mv name.html /subfolder/name.html (document's path altered)

    Hence, new database counters will be created for each of the above cases.
     
    Ferrous Cranus, Jan 18, 2013
    #4
  5. Τη Σάββατο, 19 ΙανουαÏίου 2013 12:09:28 Ï€.μ. UTC+2, ο χÏήστης Dave Angel έγÏαψε:

    > I don't understand the problem. A trivial Python script could scan
    >
    > through all the files in the directory, checking which ones are missing
    >
    > the identifier, and rewriting the file with the identifier added.


    >
    > So, since you didn't come to that conclusion, there must be some other
    >
    > reason you don't want to edit the files. Is it that the real sources
    >
    > are elsewhere (e.g. Dreamweaver), and whenever one recompiles those
    >
    > sources, these files get replaced (without identifiers)?


    Exactly. Files get modified/updates thus the embedded identifier will be missing each time. So, relying on embedding code to html template content is not practical.


    > If that's the case, then I figure you have about 3 choices:
    > 1) use the file path as your key, instead of requiring a number


    No, i cannot, because it would mess things at a later time on when i for example:

    1. mv name.html othername.html (document's filename altered)
    2. mv name.html /subfolder/name.html (document's filepath altered)

    Hence, new database counters will be created for each of the above actions,therefore i will be having 2 counters for the same file, and the latter one will start from a zero value.

    Pros: If the file's contents gets updated, that won't affect the counter.
    Cons: If filepath is altered, then duplicity will happen.


    > 2) use a hash of the page (eg. md5) as your key. of course this could
    > mean that you get a new value whenever the page is updated. That's good
    > in many situations, but you don't give enough information to know if
    > that's desirable for you or not.


    That sounds nice! A hash is a mathematical algorithm that produce a unique number after analyzing each file's contents? But then again what if the html templated gets updated? That update action will create a new hash for thefile, hence another counter will be created for the same file, same end result as (1) solution.

    Pros: If filepath is altered, that won't affect the counter.
    Cons: If file's contents gets updated the, then duplicity will happen.


    > 3) Keep an external list of filenames, and their associated id numbers.
    > The database would be a good place to store such a list, in a separate table.


    I did not understand that solution.


    We need to find a way so even IF:

    (filepath gets modified && file content's gets modified) simultaneously thecounter will STILL retains it's value.
     
    Ferrous Cranus, Jan 19, 2013
    #5
  6. Τη Σάββατο, 19 ΙανουαÏίου 2013 12:09:28 Ï€.μ. UTC+2, ο χÏήστης Dave Angel έγÏαψε:

    > I don't understand the problem. A trivial Python script could scan
    >
    > through all the files in the directory, checking which ones are missing
    >
    > the identifier, and rewriting the file with the identifier added.


    >
    > So, since you didn't come to that conclusion, there must be some other
    >
    > reason you don't want to edit the files. Is it that the real sources
    >
    > are elsewhere (e.g. Dreamweaver), and whenever one recompiles those
    >
    > sources, these files get replaced (without identifiers)?


    Exactly. Files get modified/updates thus the embedded identifier will be missing each time. So, relying on embedding code to html template content is not practical.


    > If that's the case, then I figure you have about 3 choices:
    > 1) use the file path as your key, instead of requiring a number


    No, i cannot, because it would mess things at a later time on when i for example:

    1. mv name.html othername.html (document's filename altered)
    2. mv name.html /subfolder/name.html (document's filepath altered)

    Hence, new database counters will be created for each of the above actions,therefore i will be having 2 counters for the same file, and the latter one will start from a zero value.

    Pros: If the file's contents gets updated, that won't affect the counter.
    Cons: If filepath is altered, then duplicity will happen.


    > 2) use a hash of the page (eg. md5) as your key. of course this could
    > mean that you get a new value whenever the page is updated. That's good
    > in many situations, but you don't give enough information to know if
    > that's desirable for you or not.


    That sounds nice! A hash is a mathematical algorithm that produce a unique number after analyzing each file's contents? But then again what if the html templated gets updated? That update action will create a new hash for thefile, hence another counter will be created for the same file, same end result as (1) solution.

    Pros: If filepath is altered, that won't affect the counter.
    Cons: If file's contents gets updated the, then duplicity will happen.


    > 3) Keep an external list of filenames, and their associated id numbers.
    > The database would be a good place to store such a list, in a separate table.


    I did not understand that solution.


    We need to find a way so even IF:

    (filepath gets modified && file content's gets modified) simultaneously thecounter will STILL retains it's value.
     
    Ferrous Cranus, Jan 19, 2013
    #6
  7. Ferrous Cranus

    Dave Angel Guest

    On 01/19/2013 03:39 AM, Ferrous Cranus wrote:
    > Τη Σάββατο, 19 ΙανουαÏίου 2013 12:09:28 Ï€.μ. UTC+2, ο χÏήστης Dave Angel έγÏαψε:
    >
    >> I don't understand the problem. A trivial Python script could scan
    >>
    >> through all the files in the directory, checking which ones are missing
    >>
    >> the identifier, and rewriting the file with the identifier added.

    >
    >>
    >> So, since you didn't come to that conclusion, there must be some other
    >>
    >> reason you don't want to edit the files. Is it that the real sources
    >>
    >> are elsewhere (e.g. Dreamweaver), and whenever one recompiles those
    >>
    >> sources, these files get replaced (without identifiers)?

    >
    > Exactly. Files get modified/updates thus the embedded identifier will be missing each time. So, relying on embedding code to html template content is not practical.
    >
    >
    >> If that's the case, then I figure you have about 3 choices:
    >> 1) use the file path as your key, instead of requiring a number

    >
    > No, i cannot, because it would mess things at a later time on when i for example:
    >
    > 1. mv name.html othername.html (document's filename altered)
    > 2. mv name.html /subfolder/name.html (document's filepath altered)
    >
    > Hence, new database counters will be created for each of the above actions, therefore i will be having 2 counters for the same file, and the latter one will start from a zero value.
    >
    > Pros: If the file's contents gets updated, that won't affect the counter.
    > Cons: If filepath is altered, then duplicity will happen.
    >
    >
    >> 2) use a hash of the page (eg. md5) as your key. of course this could
    >> mean that you get a new value whenever the page is updated. That's good
    >> in many situations, but you don't give enough information to know if
    >> that's desirable for you or not.

    >
    > That sounds nice! A hash is a mathematical algorithm that produce a unique number after analyzing each file's contents? But then again what if the html templated gets updated? That update action will create a new hash for the file, hence another counter will be created for the same file, same end result as (1) solution.
    >
    > Pros: If filepath is altered, that won't affect the counter.
    > Cons: If file's contents gets updated the, then duplicity will happen.
    >
    >
    >> 3) Keep an external list of filenames, and their associated id numbers.
    >> The database would be a good place to store such a list, in a separate table.

    >
    > I did not understand that solution.
    >
    >
    > We need to find a way so even IF:
    >
    > (filepath gets modified && file content's gets modified) simultaneously the counter will STILL retains it's value.
    >


    You don't yet have a programming problem, you have a specification
    problem. Somehow, you want a file to be considered "the same" even when
    it's moved, renamed and/or modified. So all files are the same, and you
    only need one id.

    Don't pick a mechanism until you have an self-consistent spec.

    --
    DaveA
     
    Dave Angel, Jan 19, 2013
    #7
  8. On Sat, 19 Jan 2013 00:39:44 -0800 (PST), Ferrous Cranus
    <> declaimed the following in
    gmane.comp.python.general:
    > We need to find a way so even IF:
    >
    > (filepath gets modified && file content's gets modified) simultaneously the counter will STILL retains it's value.


    The only viable solution the /I/ can visualize is one in which any
    operation ON the template file MUST ALSO operate on the counter... That
    is; you only operate on the templates using a front-end application that
    automatically links the counter information every time...
    --
    Wulfraed Dennis Lee Bieber AF6VN
    HTTP://wlfraed.home.netcom.com/
     
    Dennis Lee Bieber, Jan 19, 2013
    #8
  9. Τη Σάββατο, 19 ΙανουαÏίου 2013 11:32:41 μ.μ. UTC+2, ο χÏήστης Dennis Lee Bieber έγÏαψε:
    > On Sat, 19 Jan 2013 00:39:44 -0800 (PST), Ferrous Cranus
    >
    > <> declaimed the following in
    >
    > gmane.comp.python.general:
    >
    > > We need to find a way so even IF:

    >
    > >

    >
    > > (filepath gets modified && file content's gets modified) simultaneouslythe counter will STILL retains it's value.

    >
    >
    >
    > The only viable solution the /I/ can visualize is one in which any
    >
    > operation ON the template file MUST ALSO operate on the counter... That
    >
    > is; you only operate on the templates using a front-end application that
    >
    > automatically links the counter information every time...
    >
    > --
    >
    > Wulfraed Dennis Lee Bieber AF6VN
    >
    > HTTP://wlfraed.home.netcom.com/


    CANNOT BE DONE because every html templates has been written by different methods.

    dramweaver
    joomla
    notepad++
     
    Ferrous Cranus, Jan 21, 2013
    #9
  10. Τη Σάββατο, 19 ΙανουαÏίου 2013 11:32:41 μ.μ. UTC+2, ο χÏήστης Dennis Lee Bieber έγÏαψε:
    > On Sat, 19 Jan 2013 00:39:44 -0800 (PST), Ferrous Cranus
    >
    > <> declaimed the following in
    >
    > gmane.comp.python.general:
    >
    > > We need to find a way so even IF:

    >
    > >

    >
    > > (filepath gets modified && file content's gets modified) simultaneouslythe counter will STILL retains it's value.

    >
    >
    >
    > The only viable solution the /I/ can visualize is one in which any
    >
    > operation ON the template file MUST ALSO operate on the counter... That
    >
    > is; you only operate on the templates using a front-end application that
    >
    > automatically links the counter information every time...
    >
    > --
    >
    > Wulfraed Dennis Lee Bieber AF6VN
    >
    > HTTP://wlfraed.home.netcom.com/


    CANNOT BE DONE because every html templates has been written by different methods.

    dramweaver
    joomla
    notepad++
     
    Ferrous Cranus, Jan 21, 2013
    #10
  11. Τη Σάββατο, 19 ΙανουαÏίου 2013 11:00:15 Ï€.μ. UTC+2, ο χÏήστης Dave Angel έγÏαψε:
    > On 01/19/2013 03:39 AM, Ferrous Cranus wrote:
    >
    > > Τη Σάββατο, 19 ΙανουαÏίου 2013 12:09:28 Ï€.μ. UTC+2, ο χÏήστης Dave AngelέγÏαψε:

    >
    > >

    >
    > >> I don't understand the problem. A trivial Python script could scan

    >
    > >>

    >
    > >> through all the files in the directory, checking which ones are missing

    >
    > >>

    >
    > >> the identifier, and rewriting the file with the identifier added.

    >
    > >

    >
    > >>

    >
    > >> So, since you didn't come to that conclusion, there must be some other

    >
    > >>

    >
    > >> reason you don't want to edit the files. Is it that the real sources

    >
    > >>

    >
    > >> are elsewhere (e.g. Dreamweaver), and whenever one recompiles those

    >
    > >>

    >
    > >> sources, these files get replaced (without identifiers)?

    >
    > >

    >
    > > Exactly. Files get modified/updates thus the embedded identifier will be missing each time. So, relying on embedding code to html template contentis not practical.

    >
    > >

    >
    > >

    >
    > >> If that's the case, then I figure you have about 3 choices:

    >
    > >> 1) use the file path as your key, instead of requiring a number

    >
    > >

    >
    > > No, i cannot, because it would mess things at a later time on when i for example:

    >
    > >

    >
    > > 1. mv name.html othername.html (document's filename altered)

    >
    > > 2. mv name.html /subfolder/name.html (document's filepath altered)

    >
    > >

    >
    > > Hence, new database counters will be created for each of the above actions, therefore i will be having 2 counters for the same file, and the latter one will start from a zero value.

    >
    > >

    >
    > > Pros: If the file's contents gets updated, that won't affect the counter.

    >
    > > Cons: If filepath is altered, then duplicity will happen.

    >
    > >

    >
    > >

    >
    > >> 2) use a hash of the page (eg. md5) as your key. of course this could

    >
    > >> mean that you get a new value whenever the page is updated. That's good

    >
    > >> in many situations, but you don't give enough information to know if

    >
    > >> that's desirable for you or not.

    >
    > >

    >
    > > That sounds nice! A hash is a mathematical algorithm that produce a unique number after analyzing each file's contents? But then again what if thehtml templated gets updated? That update action will create a new hash forthe file, hence another counter will be created for the same file, same end result as (1) solution.

    >
    > >

    >
    > > Pros: If filepath is altered, that won't affect the counter.

    >
    > > Cons: If file's contents gets updated the, then duplicity will happen.

    >
    > >

    >
    > >

    >
    > >> 3) Keep an external list of filenames, and their associated id numbers..

    >
    > >> The database would be a good place to store such a list, in a separatetable.

    >
    > >

    >
    > > I did not understand that solution.

    >
    > >

    >
    > >

    >
    > > We need to find a way so even IF:

    >
    > >

    >
    > > (filepath gets modified && file content's gets modified) simultaneouslythe counter will STILL retains it's value.

    >
    > >

    >
    >
    >
    > You don't yet have a programming problem, you have a specification
    >
    > problem. Somehow, you want a file to be considered "the same" even when
    >
    > it's moved, renamed and/or modified. So all files are the same, and you
    >
    > only need one id.
    >
    > Don't pick a mechanism until you have an self-consistent spec.



    I do have the specification.

    An .html page must retain its database counter value even if its:

    (renamed && moved && contents altered)


    [original attributes of the file]:

    filename: index.html
    filepath: /home/nikos/public_html/
    contents: <html> Hello </html>

    [get modified to]:

    filename: index2.html
    filepath: /home/nikos/public_html/folder/subfolder/
    contents: <html> Hello, people </html>


    The file is still the same, even though its attributes got modified.
    We want counter.py script to still be able to "identify" the .html page, hence its counter value in order to get increased properly.
     
    Ferrous Cranus, Jan 21, 2013
    #11
  12. Τη Σάββατο, 19 ΙανουαÏίου 2013 11:00:15 Ï€.μ. UTC+2, ο χÏήστης Dave Angel έγÏαψε:
    > On 01/19/2013 03:39 AM, Ferrous Cranus wrote:
    >
    > > Τη Σάββατο, 19 ΙανουαÏίου 2013 12:09:28 Ï€.μ. UTC+2, ο χÏήστης Dave AngelέγÏαψε:

    >
    > >

    >
    > >> I don't understand the problem. A trivial Python script could scan

    >
    > >>

    >
    > >> through all the files in the directory, checking which ones are missing

    >
    > >>

    >
    > >> the identifier, and rewriting the file with the identifier added.

    >
    > >

    >
    > >>

    >
    > >> So, since you didn't come to that conclusion, there must be some other

    >
    > >>

    >
    > >> reason you don't want to edit the files. Is it that the real sources

    >
    > >>

    >
    > >> are elsewhere (e.g. Dreamweaver), and whenever one recompiles those

    >
    > >>

    >
    > >> sources, these files get replaced (without identifiers)?

    >
    > >

    >
    > > Exactly. Files get modified/updates thus the embedded identifier will be missing each time. So, relying on embedding code to html template contentis not practical.

    >
    > >

    >
    > >

    >
    > >> If that's the case, then I figure you have about 3 choices:

    >
    > >> 1) use the file path as your key, instead of requiring a number

    >
    > >

    >
    > > No, i cannot, because it would mess things at a later time on when i for example:

    >
    > >

    >
    > > 1. mv name.html othername.html (document's filename altered)

    >
    > > 2. mv name.html /subfolder/name.html (document's filepath altered)

    >
    > >

    >
    > > Hence, new database counters will be created for each of the above actions, therefore i will be having 2 counters for the same file, and the latter one will start from a zero value.

    >
    > >

    >
    > > Pros: If the file's contents gets updated, that won't affect the counter.

    >
    > > Cons: If filepath is altered, then duplicity will happen.

    >
    > >

    >
    > >

    >
    > >> 2) use a hash of the page (eg. md5) as your key. of course this could

    >
    > >> mean that you get a new value whenever the page is updated. That's good

    >
    > >> in many situations, but you don't give enough information to know if

    >
    > >> that's desirable for you or not.

    >
    > >

    >
    > > That sounds nice! A hash is a mathematical algorithm that produce a unique number after analyzing each file's contents? But then again what if thehtml templated gets updated? That update action will create a new hash forthe file, hence another counter will be created for the same file, same end result as (1) solution.

    >
    > >

    >
    > > Pros: If filepath is altered, that won't affect the counter.

    >
    > > Cons: If file's contents gets updated the, then duplicity will happen.

    >
    > >

    >
    > >

    >
    > >> 3) Keep an external list of filenames, and their associated id numbers..

    >
    > >> The database would be a good place to store such a list, in a separatetable.

    >
    > >

    >
    > > I did not understand that solution.

    >
    > >

    >
    > >

    >
    > > We need to find a way so even IF:

    >
    > >

    >
    > > (filepath gets modified && file content's gets modified) simultaneouslythe counter will STILL retains it's value.

    >
    > >

    >
    >
    >
    > You don't yet have a programming problem, you have a specification
    >
    > problem. Somehow, you want a file to be considered "the same" even when
    >
    > it's moved, renamed and/or modified. So all files are the same, and you
    >
    > only need one id.
    >
    > Don't pick a mechanism until you have an self-consistent spec.



    I do have the specification.

    An .html page must retain its database counter value even if its:

    (renamed && moved && contents altered)


    [original attributes of the file]:

    filename: index.html
    filepath: /home/nikos/public_html/
    contents: <html> Hello </html>

    [get modified to]:

    filename: index2.html
    filepath: /home/nikos/public_html/folder/subfolder/
    contents: <html> Hello, people </html>


    The file is still the same, even though its attributes got modified.
    We want counter.py script to still be able to "identify" the .html page, hence its counter value in order to get increased properly.
     
    Ferrous Cranus, Jan 21, 2013
    #12
  13. On Mon, Jan 21, 2013 at 6:08 PM, Ferrous Cranus <> wrote:
    > An .html page must retain its database counter value even if its:
    >
    > (renamed && moved && contents altered)


    Then you either need to tag them in some external way, or have some
    kind of tracking operation - for instance, if you require that all
    renames/moves be done through a script, that script can update its
    pointer. Otherwise, you need magic, and lots of it.

    ChrisA
     
    Chris Angelico, Jan 21, 2013
    #13
  14. Τη ΔευτέÏα, 21 ΙανουαÏίου 2013 9:20:15 Ï€.μ. UTC+2, ο χÏήστης Chris Angelico έγÏαψε:
    > On Mon, Jan 21, 2013 at 6:08 PM, Ferrous Cranus <> wrote:
    >
    > > An .html page must retain its database counter value even if its:

    >
    > >

    >
    > > (renamed && moved && contents altered)

    >
    >
    >
    > Then you either need to tag them in some external way, or have some
    >
    > kind of tracking operation - for instance, if you require that all
    >
    > renames/moves be done through a script, that script can update its
    >
    > pointer. Otherwise, you need magic, and lots of it.
    >
    >
    >
    > ChrisA


    This python script acts upon websites other people use and
    every html templates has been written by different methods(notepad++, dreamweaver, joomla).

    Renames and moves are performed, either by shell access or either by cPanel access by website owners.

    That being said i have no control on HOW and WHEN users alter their html pages.
     
    Ferrous Cranus, Jan 21, 2013
    #14
  15. Τη ΔευτέÏα, 21 ΙανουαÏίου 2013 9:20:15 Ï€.μ. UTC+2, ο χÏήστης Chris Angelico έγÏαψε:
    > On Mon, Jan 21, 2013 at 6:08 PM, Ferrous Cranus <> wrote:
    >
    > > An .html page must retain its database counter value even if its:

    >
    > >

    >
    > > (renamed && moved && contents altered)

    >
    >
    >
    > Then you either need to tag them in some external way, or have some
    >
    > kind of tracking operation - for instance, if you require that all
    >
    > renames/moves be done through a script, that script can update its
    >
    > pointer. Otherwise, you need magic, and lots of it.
    >
    >
    >
    > ChrisA


    This python script acts upon websites other people use and
    every html templates has been written by different methods(notepad++, dreamweaver, joomla).

    Renames and moves are performed, either by shell access or either by cPanel access by website owners.

    That being said i have no control on HOW and WHEN users alter their html pages.
     
    Ferrous Cranus, Jan 21, 2013
    #15
  16. On Mon, Jan 21, 2013 at 8:19 PM, Ferrous Cranus <> wrote:
    > This python script acts upon websites other people use and
    > every html templates has been written by different methods(notepad++, dreamweaver, joomla).
    >
    > Renames and moves are performed, either by shell access or either by cPanel access by website owners.
    >
    > That being said i have no control on HOW and WHEN users alter their html pages.


    Then I recommend investing in some magic. There's an old-established
    business JW Wells & Co, Family Sorcerers. They've a first-rate
    assortment of magic, and for raising a posthumous shade with effects
    that are comic, or tragic, there's no cheaper house in the trade! If
    anyone anything lacks, he'll find it all ready in stacks, if he'll
    only look in on the resident Djinn, number seventy, Simmery Axe!

    Seriously, you're asking for something that's beyond the power of
    humans or computers. You want to identify that something's the same
    file, without tracking the change or having any identifiable tag.
    That's a fundamentally impossible task.

    ChrisA
     
    Chris Angelico, Jan 21, 2013
    #16
  17. Τη ΔευτέÏα, 21 ΙανουαÏίου 2013 11:31:24 Ï€.μ. UTC+2, ο χÏήστης Chris Angelico έγÏαψε:
    > On Mon, Jan 21, 2013 at 8:19 PM, Ferrous Cranus <> wrote:
    >
    > > This python script acts upon websites other people use and

    >
    > > every html templates has been written by different methods(notepad++, dreamweaver, joomla).

    >
    > >

    >
    > > Renames and moves are performed, either by shell access or either by cPanel access by website owners.

    >
    > >

    >
    > > That being said i have no control on HOW and WHEN users alter their html pages.

    >
    >
    >
    > Then I recommend investing in some magic. There's an old-established
    >
    > business JW Wells & Co, Family Sorcerers. They've a first-rate
    >
    > assortment of magic, and for raising a posthumous shade with effects
    >
    > that are comic, or tragic, there's no cheaper house in the trade! If
    >
    > anyone anything lacks, he'll find it all ready in stacks, if he'll
    >
    > only look in on the resident Djinn, number seventy, Simmery Axe!
    >
    >
    >
    > Seriously, you're asking for something that's beyond the power of
    >
    > humans or computers. You want to identify that something's the same
    >
    > file, without tracking the change or having any identifiable tag.
    >
    > That's a fundamentally impossible task.


    No, it is difficult but not impossible.
    It just cannot be done by tagging the file by:

    1. filename
    2. filepath
    3. hash (math algorithm producing a string based on the file's contents)

    We need another way to identify the file WITHOUT using the above attributes..
     
    Ferrous Cranus, Jan 21, 2013
    #17
  18. Τη ΔευτέÏα, 21 ΙανουαÏίου 2013 11:31:24 Ï€.μ. UTC+2, ο χÏήστης Chris Angelico έγÏαψε:
    > On Mon, Jan 21, 2013 at 8:19 PM, Ferrous Cranus <> wrote:
    >
    > > This python script acts upon websites other people use and

    >
    > > every html templates has been written by different methods(notepad++, dreamweaver, joomla).

    >
    > >

    >
    > > Renames and moves are performed, either by shell access or either by cPanel access by website owners.

    >
    > >

    >
    > > That being said i have no control on HOW and WHEN users alter their html pages.

    >
    >
    >
    > Then I recommend investing in some magic. There's an old-established
    >
    > business JW Wells & Co, Family Sorcerers. They've a first-rate
    >
    > assortment of magic, and for raising a posthumous shade with effects
    >
    > that are comic, or tragic, there's no cheaper house in the trade! If
    >
    > anyone anything lacks, he'll find it all ready in stacks, if he'll
    >
    > only look in on the resident Djinn, number seventy, Simmery Axe!
    >
    >
    >
    > Seriously, you're asking for something that's beyond the power of
    >
    > humans or computers. You want to identify that something's the same
    >
    > file, without tracking the change or having any identifiable tag.
    >
    > That's a fundamentally impossible task.


    No, it is difficult but not impossible.
    It just cannot be done by tagging the file by:

    1. filename
    2. filepath
    3. hash (math algorithm producing a string based on the file's contents)

    We need another way to identify the file WITHOUT using the above attributes..
     
    Ferrous Cranus, Jan 21, 2013
    #18
  19. On 21 January 2013 12:06, Ferrous Cranus <> wrote:
    > Ôç ÄåõôÝñá, 21 Éáíïõáñßïõ 2013 11:31:24 ð.ì. UTC+2, ï ÷ñÞóôçò Chris Angelico Ýãñáøå:
    >>
    >> Seriously, you're asking for something that's beyond the power of
    >> humans or computers. You want to identify that something's the same
    >> file, without tracking the change or having any identifiable tag.
    >>
    >> That's a fundamentally impossible task.

    >
    > No, it is difficult but not impossible.
    > It just cannot be done by tagging the file by:
    >
    > 1. filename
    > 2. filepath
    > 3. hash (math algorithm producing a string based on the file's contents)
    >
    > We need another way to identify the file WITHOUT using the above attributes.


    This is a very old problem (still unsolved I believe):
    http://en.wikipedia.org/wiki/Ship_of_Theseus


    Oscar
     
    Oscar Benjamin, Jan 21, 2013
    #19
  20. This is trolling Ferrous. you are a troll. Go away


    On Mon, Jan 21, 2013 at 7:39 AM, Oscar Benjamin
    <>wrote:

    > On 21 January 2013 12:06, Ferrous Cranus <> wrote:
    > > Τη ΔευτέÏα, 21 ΙανουαÏίου 2013 11:31:24 Ï€.μ. UTC+2, ο χÏήστης Chris

    > Angelico έγÏαψε:
    > >>
    > >> Seriously, you're asking for something that's beyond the power of
    > >> humans or computers. You want to identify that something's the same
    > >> file, without tracking the change or having any identifiable tag.
    > >>
    > >> That's a fundamentally impossible task.

    > >
    > > No, it is difficult but not impossible.
    > > It just cannot be done by tagging the file by:
    > >
    > > 1. filename
    > > 2. filepath
    > > 3. hash (math algorithm producing a string based on the file's contents)
    > >
    > > We need another way to identify the file WITHOUT using the above

    > attributes.
    >
    > This is a very old problem (still unsolved I believe):
    > http://en.wikipedia.org/wiki/Ship_of_Theseus
    >
    >
    > Oscar
    > --
    > http://mail.python.org/mailman/listinfo/python-list
    >




    --
    Joel Goldstick
    http://joelgoldstick.com
     
    Joel Goldstick, Jan 21, 2013
    #20
    1. Advertising

Want to reply to this thread or ask your own question?

It takes just 2 minutes to sign up (and it's free!). Just click the sign up button to choose a username and then you can ask your own questions on the forum.
Similar Threads
  1. Replies:
    2
    Views:
    630
    Curt_C [MVP]
    Jul 27, 2004
  2. Alan Silver

    How do I uniquely identify a control?

    Alan Silver, Feb 24, 2005, in forum: ASP .Net
    Replies:
    6
    Views:
    469
    Alan Silver
    Feb 24, 2005
  3. bazzer
    Replies:
    0
    Views:
    548
    bazzer
    Apr 10, 2006
  4. bazzer
    Replies:
    8
    Views:
    1,613
    kruch
    Mar 23, 2007
  5. Veli-Pekka Tätilä
    Replies:
    6
    Views:
    108
    Anno Siegel
    Aug 23, 2005
Loading...

Share This Page