Adrienne said:
Right, what we really need is something that would explicitly say "Do
not follow this link because it essentially goes to the same place,
with the same content",
Do we? It would still be a command-like "element". And why would we, as
authors, give instructions rather than (meta)information to search engines?
We cannot know whether search engines actually want to crawl different
copies of a document, perhaps for a good reason.
In HTML 4.01, there is a sloppily written semi-quasi-normative and
pseudo-descriptive list of rel attribute values. Among them, the following
is interesting here:
Alternate
Designates substitute versions for the document in which the link occurs.
When used together with the lang attribute, it implies a translated version
of the document. When used together with the media attribute, it implies a
version designed for a different medium (or media).
Now _that_ would be descriptive. We're not saying what should be _done_ with
the link. We simply _describe_ the relationship between the linking and the
linked resource.
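
To illustrate what "descriptive" means in practice, here is a minimal sketch of how some consuming program might read such links as plain metadata and decide for itself what to do with them. The class name AlternateLinkCollector, the helper find_alternates and the sample file names are made up for this example; the sample uses the lang attribute only because the quoted text does.

```python
from html.parser import HTMLParser


class AlternateLinkCollector(HTMLParser):
    """Collects <link rel="alternate"> elements as plain descriptions (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel holds a space-separated list of link types; compare case-insensitively.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "alternate" in rel_values:
            # Record the relationship only; nothing here says what to do with it.
            self.alternates.append({
                "href": attributes.get("href"),
                "lang": attributes.get("lang"),
                "media": attributes.get("media"),
            })


# Made-up sample markup for illustration.
SAMPLE = """
<head>
<link rel="alternate" lang="fr" href="page.fr.html">
<link rel="alternate" media="print" href="page.pdf">
<link rel="stylesheet" href="style.css">
</head>
"""


def find_alternates(html_text):
    """Return the alternate-version descriptions found in html_text."""
    collector = AlternateLinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.alternates


if __name__ == "__main__":
    for alternate in find_alternates(SAMPLE):
        print(alternate)
```

The point of the sketch is that the markup only states a relationship; whether to follow, skip or offer the link is left to whoever reads it.
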
Yet, it says "substitute versions", not "copies". In its vagueness, a <link
rel="alternate" ...> element is almost useless, except perhaps on advanced
browsers that give the user optional access to alternate versions via the
browser's interface. But such <link>ing is rather pointless, because most
users would not be able to make use of it, so explicit <a href> links
would be needed anyway.
What _could_ make sense is rel="copy" if it were defined and used as meaning
that the linked resource is a copy of the linking resource, as regards
content, with possible differences in presentation style and format (e.g.,
Word format vs. HTML format, in cases where such format difference implies no
difference in content). This would raise the inconvenient question of whether
rel="copy" is to be taken as a commitment of some kind to _keep_ the
resources identical in content.
On the other hand, such markup would be fairly useless, since search engines
need to investigate, and they do investigate, the actual content of pages to
detect copies.
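
As a rough illustration of why declared markup adds little, here is a naive sketch of content-based copy detection: strip the markup, normalize whitespace and case, and compare hashes of what remains. This is only an assumption about the general idea, not a description of how any actual search engine works; real ones presumably use more tolerant near-duplicate techniques. The names TextExtractor, content_fingerprint and is_copy are invented for the example.

```python
import hashlib
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Gathers the character data of a page, ignoring all markup (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def content_fingerprint(html_text):
    """Return a SHA-256 hex digest of the page text with markup removed."""
    extractor = TextExtractor()
    extractor.feed(html_text)
    extractor.close()
    text = " ".join(extractor.chunks)
    # Collapse whitespace runs and ignore case so that presentation
    # differences alone do not change the fingerprint.
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def is_copy(page_a, page_b):
    """True if both pages carry the same text content under this naive measure."""
    return content_fingerprint(page_a) == content_fingerprint(page_b)


if __name__ == "__main__":
    first = "<html><body><h1>Title</h1><p>Same  content</p></body></html>"
    second = "<div><b>Title</b>\n<span>Same content</span></div>"
    print(is_copy(first, second))
```

Since a check of this kind works from the content itself, it does not depend on the author having declared anything, and it cannot be misled by a wrong or outdated declaration.
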