Uniquely identifying each & every html template


alex23

Renames and moves are performed by website owners, either through shell access or through cPanel.

These website owners, are you charging them for this "service" you
provide?

You seriously need to read up on some fundamentals of how the web +
Apache + Python work. As it stands, you're asking us to do your job
for you, and it's getting TEDIOUS with you TELLING us how WRONG we are.
 
Ferrous Cranus

On Monday, 21 January 2013 2:47:54 PM UTC+2, Joel Goldstick wrote:
This is trolling, Ferrous. You are a troll. Go away.

Just because you cannot answer my question, that doesn't make me a troll, you know.
 
Ferrous Cranus

On Monday, 21 January 2013 2:56:24 PM UTC+2, alex23 wrote:
These website owners, are you charging them for this "service" you
provide?

You seriously need to read up on some fundamentals of how the web +
Apache + Python work. As it stands, you're asking us to do your job
for you, and it's getting TEDIOUS with you TELLING us how WRONG we are.

Dude, I host 4 sites for friends of mine who want the same type of counter that I use on my website.

ALL I am asking for is a way to make this work.
 
Ferrous Cranus

On Monday, 21 January 2013 9:20:15 AM UTC+2, Chris Angelico wrote:
Then you either need to tag them in some external way, or have some
kind of tracking operation - for instance, if you require that all
renames/moves be done through a script, that script can update its
pointer. Otherwise, you need magic, and lots of it.

ChrisA

Perhaps we should look into how the OS handles the file to get an idea of how it's done?
 
Piet van Oostrum

Ferrous Cranus said:
This Python script acts upon websites other people use, and every HTML
template has been written with a different tool (Notepad++,
Dreamweaver, Joomla).

Renames and moves are performed by website owners, either through shell
access or through cPanel.

That being said, I have no control over HOW and WHEN users alter their HTML pages.

Under these circumstances the only way to solve it is to put an
identification *inside* the file and make sure it will not be changed.
It could for example be some invisible piece of HTML, or an attribute to
some tag. If that can't be done, the problem cannot be solved, and it
makes no sense to keep asking the same question over and over again.
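
A minimal sketch of the embedded-identifier idea, assuming the site owner permits one extra HTML comment per page. The `page-id` marker format and both function names are hypothetical, chosen only for illustration; nothing in the thread prescribes them.

```python
import re
import uuid

# Hypothetical marker: an HTML comment holding a UUID. Browsers ignore it,
# and it travels with the file through any rename or move.
MARKER = re.compile(r"<!--\s*page-id:\s*([0-9a-f-]{36})\s*-->")


def read_page_id(html):
    """Return the embedded id, or None if the page carries no marker."""
    match = MARKER.search(html)
    return match.group(1) if match else None


def ensure_page_id(html):
    """Return (html, page_id), appending a marker comment if none is present."""
    page_id = read_page_id(html)
    if page_id is None:
        page_id = str(uuid.uuid4())
        html = "%s\n<!-- page-id: %s -->\n" % (html, page_id)
    return html, page_id
```

The id survives renames, moves, and ordinary edits, but only as long as nobody deletes the comment or regenerates the page from scratch.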
 
Dave Angel

No, it is difficult but not impossible.
It just cannot be done by tagging the file by:

1. filename
2. filepath
3. hash (math algorithm producing a string based on the file's contents)

We need another way to identify the file WITHOUT using the above attributes.

Repeating the same impossible scenario won't solve it. You need to find
some other way to recognize the file. If you can't count on either
name, location, or content, you can't do it.

Try solving the problem by hand. If you examine the files, and a
particular one has both changed names and content, how are you going to
decide that it's the "same" one? Define "same" in a way that you could
do it by hand, and you're halfway towards a programming solution.

Maybe it'd be obvious from an analogy. Suppose you're HR for a company
with 100 employees, and a strange policy of putting paychecks under the
wipers of the employees' windshields. All the employee cars are kept
totally clean of personal belongings, with no registration or license
plates. The lot has no reserved parking places, so every car has a
random location.

For a while, you just memorize the make/model/color of each car, and
everything's fine. But one day several of the employees buy new cars.
How do you then associate each car with each employee?

I've got it - you require each one to keep a numbered parking sticker,
and they move the sticker when they get a new car.

Or, you give everyone a marked, reserved parking place.

Or you require each employee to report any car exchanges to you, so you
can update your records.

If you can solve this one, you can probably solve the other one. Until
then, we have no spec.
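
The "report every exchange" option, which is the same idea as routing all renames and moves through a script, could look roughly like the sketch below. It assumes the owners agree to use `tracked_move` instead of a plain `mv`; the JSON registry file and all function names are hypothetical.

```python
import json
import os
import shutil
import uuid


def load_registry(registry_path):
    """Load the id -> path mapping, or start empty if the file is missing."""
    if not os.path.exists(registry_path):
        return {}
    with open(registry_path) as fh:
        return json.load(fh)


def save_registry(registry_path, registry):
    with open(registry_path, "w") as fh:
        json.dump(registry, fh, indent=2)


def register(registry_path, page_path):
    """Give a page a permanent id the first time it is seen."""
    registry = load_registry(registry_path)
    page_path = os.path.abspath(page_path)
    for page_id, known_path in registry.items():
        if known_path == page_path:
            return page_id
    page_id = str(uuid.uuid4())
    registry[page_id] = page_path
    save_registry(registry_path, registry)
    return page_id


def tracked_move(registry_path, src, dst):
    """Move the file and update the registry so the id follows it."""
    src, dst = os.path.abspath(src), os.path.abspath(dst)
    registry = load_registry(registry_path)
    shutil.move(src, dst)
    for page_id, known_path in registry.items():
        if known_path == src:
            registry[page_id] = dst
    save_registry(registry_path, registry)
```

Any move made outside the script (a raw `mv`, a cPanel rename) leaves the registry stale, which is exactly the external-knowledge problem described above.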
 
alex23

ALL I am asking for is a way to make this work.

No, ALL you are asking is for us to take an _impossible_ situation and
make it magically work for you, without your having to improve your
understanding of the problem or modifying your requirements in any
way. You don't see *your ignorance* as the problem, preferring instead
to blame others and Python itself for your failings. None of the
solutions proposed satisfy you because they seem like too much work,
and you're convinced that this can just happen.

It can't, and you desperately need to educate yourself on some vital
aspects of _how the web works_ (and Python, and file systems, and *NIX
environments etc etc).
 
alex23

Perhaps we should look into how the OS handles the file to get an idea of how it's done?

Who is this "we" you speak of? You mean "you", right?

You do that and get back to us when you believe you've found something
that helps.
 
Oscar Benjamin

That wiki article gives a hint to a possible solution: use a timestamp to
determine which key is valid when.

In the Ship of Theseus, it is only argued that it is the same ship
because people were aware of the incremental changes that took place
along the way. The same applies here: if you don't track the
incremental changes and the two files have nothing concrete in common,
what does it mean to say that a file is "the same file" as some older
file?

That being said, I've always been impressed with the way that git can
understand when I think that a file is the same as some older file
(though it does sometimes go wrong):

~/tmp$ git init
Initialized empty Git repository in /home/oscar/tmp/.git/
~/tmp$ vim old.py
~/tmp$ cat old.py
#!/usr/bin/env python

print('This is a fairly useless script.')
print("Maybe I'll improve it later...")
~/tmp$ git add old.py
~/tmp$ git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
# (use "git rm --cached <file>..." to unstage)
#
# new file: old.py
#
~/tmp$ git commit
[master (root-commit) 8e91665] First commit
1 file changed, 4 insertions(+)
create mode 100644 old.py
~/tmp$ ls
old.py
~/tmp$ cat old.py > new.py
~/tmp$ rm old.py
~/tmp$ vim new.py
~/tmp$ cat new.py
#!/usr/bin/env python

print('This is a fairly useless script.')
print("Maybe I'll improve it later...")

print("Although, I've edited it somewhat, it's still useless")
~/tmp$ git status
# On branch master
# Changes not staged for commit:
# (use "git add/rm <file>..." to update what will be committed)
# (use "git checkout -- <file>..." to discard changes in working directory)
#
# deleted: old.py
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# new.py
no changes added to commit (use "git add" and/or "git commit -a")
~/tmp$ git add -A .
~/tmp$ git status
# On branch master
# Changes to be committed:
# (use "git reset HEAD <file>..." to unstage)
#
# renamed: old.py -> new.py
#

So it *is* Theseus' ship!


Oscar
 
Chris Angelico

In the Ship of Theseus, it is only argued that it is the same ship
because people were aware of the incremental changes that took place
along the way. The same applies here: if you don't track the
incremental changes and the two files have nothing concrete in common,
what does it mean to say that a file is "the same file" as some older
file?

That being said, I've always been impressed with the way that git can
understand when I think that a file is the same as some older file
(though it does sometimes go wrong):

Yeah, git's awesome like that :) It looks at file similarity, though,
so if you completely rewrite a file and simultaneously rename/move it,
git will lose track of it. And as you say, sometimes it gets things
wrong - if you merge a large file into a small one, git will report it
as a deletion and rename. (Of course, it doesn't make any difference.
It's just a matter of reporting.) Mercurial, if I understand
correctly, actually _tracks_ moves (and copies), but git just records
a deletion and a creation.
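
A rough illustration of similarity-based rename guessing, using Python's `difflib` rather than git's own algorithm. The function names and the 0.5 threshold are assumptions made for this sketch.

```python
import difflib


def similarity(old_text, new_text):
    """Return a 0.0-1.0 similarity score between two file contents."""
    matcher = difflib.SequenceMatcher(None, old_text, new_text, autojunk=False)
    return matcher.ratio()


def guess_renames(deleted, added, threshold=0.5):
    """Pair each deleted file with the most similar added file.

    `deleted` and `added` map path -> content. Returns {old_path: new_path}
    for pairs whose similarity reaches the threshold.
    """
    renames = {}
    for old_path, old_text in deleted.items():
        best_path, best_score = None, threshold
        for new_path, new_text in added.items():
            if new_path in renames.values():
                continue  # already claimed by another deleted file
            score = similarity(old_text, new_text)
            if score >= best_score:
                best_path, best_score = new_path, score
        if best_path is not None:
            renames[old_path] = best_path
    return renames
```

As with git, a page that is renamed and completely rewritten at the same time falls below any sensible threshold and is reported as one deletion plus one unrelated new file.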

My family in fact has a literal "grandfather's axe" (except that I
don't think either of my grandfathers actually owned it, but it's my
Dad's old axe) that has had many new handles and a couple of new
heads. Bringing it back to computers, we have on our network two
computers "Stanley" and "Ollie" that have been there ever since we
first set up that network. Back then, it was coax cable, 10base2, no
routers/switches/etc, and the computers were I think early Pentiums.
We installed the database on one of them, and set the other in Dad's
office. Today, we have a modern Ethernet setup with modern hardware
and cat-5 cable; we still have Stanley with the database and Ollie in
the office. The name/identity of the computer is mostly associated
with its roles; but those roles can shift too (there was a time when
Ollie was the internet gateway, but that's no longer the case).
Identity is its own attribute.

The problem isn't that identity can't exist. It's that it can't be
discovered. That takes external knowledge. Dave's analogy is accurate.

ChrisA
 
rusi

+1 internets for referencing my most favourite thought experiment
ever :)

+2 Oscar for giving me this name.

A more apposite (to computers) experience:

I've a computer whose OS I wanted to upgrade without disturbing the
existing setup. Decided to fit a new hard disk with a new OS.
Installed the OS on a new hard disk, fitted the new hard disk into the
old computer and rebooted.

The messages that started coming were: New Hardware detected: monitor,
mouse, network card etc etc. but not new disk!

Strange! The only one thing new is not seen as new but all the old
things are seen as new.


So…
Ask a layman what a computer is and he'll point to the box and call it
'CPU'.
Ask a more computer literate person and he'll point to the chip inside
the box and say 'CPU'
Ask the computer itself and it says 'Disk'.

Moral:
Object identity is at best hard -- usually unsolvable
 
rusi

Perhaps we should look into how the OS handles the file to get an idea of how it's done?

Yes…
Perhaps the most useful suggestion for you that I've seen in this thread is
to look at git.
If you do, you will find that
a. git has to do a great deal more work than you expect to factor
out content-tracking from file-tracking
b. Yet it can get it wrong

Look at
snapshotting file systems http://en.wikipedia.org/wiki/Snapshot_(computer_storage)#File_systems
like winfs (cancelled) and btrfs
Slightly more practical may be timevault http://www.dedoimedo.com/computers/timevault.html
 
Chris Angelico

I've a computer whose OS I wanted to upgrade without disturbing the
existing setup. Decided to fit a new hard disk with a new OS.
Installed the OS on a new hard disk, fitted the new hard disk into the
old computer and rebooted.

The messages that started coming were: New Hardware detected: monitor,
mouse, network card etc etc. but not new disk!

Strange! The only one thing new is not seen as new but all the old
things are seen as new.

That's because you asked the OS to look at the computer, and the OS
was on the disk. So in that sense, you did give it a whole lot of new
hardware but not a new disk. However, Windows Product Activation would
probably have called that a new computer, meaning that Microsoft deems
it to be new. (I've no idea about other non-free systems. Free systems
don't care about new computer vs same computer, of course.)

ChrisA
 
Ferrous Cranus

On Monday, 21 January 2013 10:48:11 PM UTC+2, Piet van Oostrum wrote:
Under these circumstances the only way to solve it is to put an
identification *inside* the file and make sure it will not be changed.
It could for example be some invisible piece of HTML, or an attribute to
some tag. If that can't be done, the problem cannot be solved, and it
makes no sense to keep asking the same question over and over again.

The solution you propose is what I already use for my own website.
Since it's my website, I can edit all the .html files I want, embedding a unique number in each and every one of them, as I showed in my initial post.

The problem is that I am not allowed to do the same with the other websites I host.
And apart from that, even if I were allowed to, an HTML page could be rewritten, and the identifier would then be lost.
 
Ferrous Cranus

On Tuesday, 22 January 2013 6:04:09 AM UTC+2, Tim Roberts wrote:
Right, and that makes it impossible to solve this problem.

Think about some scenarios. Let's say I have a web site with two pages:
~/web/page1.html
~/web/page2.html

Now let's say I use some editor to make a copy of page1 called page1a.html.
~/web/page1.html
~/web/page1a.html
~/web/page2.html

Should page1a.html be considered the same page as page1.html? What if I
subsequently delete page1.html? What if I don't? How long will you wait
before deciding they are the same?
--
Tim Roberts, (e-mail address removed)
Providenza & Boekelheide, Inc.

You are right, it cannot be done.

So I have 2 options:

identify an .html file either by its "filepath" or by its "hash".

Which method do you advise me to use?
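
To make the trade-off between those two options concrete, here is a small sketch; `path_key` and `hash_key` are hypothetical names, and SHA-256 is just one reasonable choice of hash. A path key survives edits but not renames or moves, while a content hash survives renames and moves but not edits.

```python
import hashlib
import os


def path_key(path):
    """Identify a page by where it lives; breaks on any rename or move."""
    return os.path.abspath(path)


def hash_key(path):
    """Identify a page by its bytes; breaks on any edit, however small."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()
```

Neither key survives a page that is both moved and edited, which is the case the rest of the thread keeps coming back to.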
 
John Gordon

No, I cannot, because it would break things later when I, for
example:
1. mv name.html othername.html (document's filename altered)
2. mv name.html /subfolder/name.html (document's filepath altered)

Will the file always reside on the same device? If so, perhaps you could
use the file inode number as the key.

(That seems fairly brittle though. For example if the disk crashes and is
restored from a backup, the inodes could easily be different.)
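
A minimal sketch of the inode idea, assuming a POSIX-style filesystem and files that stay on one device. `file_key` and `find_by_key` are hypothetical helper names; `os.stat` and its `st_dev`/`st_ino` fields are standard.

```python
import os


def file_key(path):
    """Return (device, inode), which stays the same across a rename or a
    move within one filesystem."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


def find_by_key(root, key):
    """Walk `root` and return the current path of the file with this key,
    or None if it is gone."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if file_key(path) == key:
                    return path
            except OSError:
                continue  # e.g. a broken symlink; skip it
    return None
```

Besides the backup problem mentioned above, some editors and upload tools save by writing a new file and renaming it over the old one, which also produces a new inode, so this key is fragile in practice.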
 
