email address obfuscation

jojo

John said:
And, as I've explained, the techniques to obfuscate e-mail addresses
proposed in this thread run contrary to the spirit of Internet
specifications.

That's because spambots are against the spirit of the internet, too. If
the "dark side" does not follow the rules we don't have to follow them
either.
 
John Dunlop

jojo:
That's because spambots are against the spirit of the internet, too.

Under discussion was not the spirit of the Internet but the spirit of
Internet specifications. What harvesters do does not run contrary to
the word or the spirit of the two specifications I mentioned. I would
maintain that what you proposed - replacing US-ASCII characters with
character references in HTML, and percent-encoding octets in URLs that
would otherwise be treated as data - does.
If the "dark side" does not follow the rules we don't have to follow them
either.

Come on. Internet specifications are a boon! If you fail to grasp the
advantages they bring - if you fail to imagine a WWW without them - why
wait until the "dark side" supposedly deviates from them before you
ignore them yourself?

Besides, in this war, there are more effective and less harmful
strategies than obfuscation.
 
Nikita the Spider

John Dunlop said:
dorayme:

[re overcoming e-mail address obfuscation]
The point is this though: robbers tend to go for the low lying
fruit first and there is plenty enough of that to go around. Do
you understand what I am saying? No need to crash through even
slightly heavier security.

Yes, but I am merely pointing out that obfuscating e-mail addresses is
inferior to real security; I am not claiming to know what harvesters
actually do!

Myself, I'm pretty impressed by the fact that the entity-encoded address
received only two spams while its unprotected counterpart has received
over 700. If this method is inferior, I'd like to know to what! If there
are other methods that are equally easy to implement and don't
inconvenience users, I can't say I've heard of them.
Mind that old axiom 'security by obscurity gives a false sense of
security'?

I'd argue that we're not talking about security here so much as
annoyance reduction. I don't mean to nitpick about your words; I
honestly think the difference is important. Security prohibits access to
a resource and there are clear negative consequences when it fails (my
account is cracked, for example). By contrast, my inbox lost its spam
virginity a long time ago. All I can do now with the resources I have
available is to limit further, ahem, penetrations.
And, as I've explained, the techniques to obfuscate e-mail addresses
proposed in this thread run contrary to the spirit of Internet
specifications. That a construct is included in a specification is
hardly license to exploit it.

I see your point, but the spec isn't strongly worded. As you pointed
out, the relevant section is here:
http://www.w3.org/TR/html401/charset.html#h-5.3

"A given character encoding may not be able to express all characters of
the document character set. For such encodings, or when hardware or
software configurations do not allow users to input some document
characters directly, authors may use SGML character references."

But it also says this:
"Character references are a character encoding-independent mechanism for
entering any character from the document character set."

Using entities to encode email addresses fits perfectly well within this
provision, IMO.

Cheers
 
Chris F.A. Johnson

dorayme:

[re overcoming e-mail address obfuscation]

And, as I've explained, the techniques to obfuscate e-mail addresses
proposed in this thread run contrary to the spirit of Internet
specifications. That a construct is included in a specification is
hardly license to exploit it.

As NtS pointed out, that's not true (or, at least, debatable).
Deal with spam at your end; don't pass the buck.

That's what obfuscated e-mail addresses do. Letting spam be
generated any more than necessary is passing the buck. The
important thing is to prevent it (as much as possible) in the first
place.
 
John Dunlop

Chris F.A. Johnson:
[John Dunlop:]
Deal with spam at your end; don't pass the buck.

That's what obfuscated e-mail addresses do.

?? No. E-mail address obfuscation tries to deal with the problem at
the user's end. Its aim is to remove all trouble for the e-mail
address owner, no matter the cost to anyone else. If obfuscation dealt
with the problem at your end, it wouldn't be obfuscation since there
would be nothing to obfuscate.
Letting spam be generated any more than necessary is passing the buck.

It is not your job to prevent spam being generated, unless you are
actively fighting against it, in which case there are better, more
effective approaches than e-mail address obfuscation.
 
John Dunlop

Nikita the Spider:
Myself, I'm pretty impressed by the fact that the entity-encoded address
received only two spams while its unprotected counterpart has received
over 700. If this method is inferior, I'd like to know to what!

To what has been mentioned now more than once in this thread: normal counter-spam
measures. That means junk mail filters both at the server and at the
MUA.

[re e-mail address obfuscation running contrary to the spirit of
Internet specs]
I see your point, but the spec isn't strongly worded.

Well, every clause in the spec is vague enough to be open to, however
absurd, interpretation.

I specifically talked not about the spec's wording but about its
spirit. To learn about the spirit of HTML you have to trace its
history: follow the past discussions, study the earlier drafts and
specifications, find out why the constructs were introduced in the
first place.
As you pointed out, the relevant section is here:

http://www.w3.org/TR/html401/charset.html#h-5.3

I quoted from there but did not mean that as the 'relevant section' to
learn why character references came about. You will find that not in
the HTML4.0 spec but in ISO8879 (my copy's at work and I haven't yet
memorised it all, so much to my consternation I can't give you chapter
and verse.)
But it also says this:
"Character references are a character encoding-independent mechanism for
entering any character from the document character set."

Using entities to encode email addresses fits perfectly well within this
provision, IMO.

That's not even half the story.
 
dorayme

John Dunlop said:
dorayme:

[re overcoming e-mail address obfuscation]
If it is so little effort, what is your theory about why it is so
effective (if it is as recent indications suggest)? Perhaps I can
help you:

No help needed, dorayme, thank you. Someone in this thread has already
advanced a plausible theory: laziness. Even the slightest extra
effort is too much because unobfuscated e-mail addresses are plentiful,
easy pickings even. No need to stretch.

I am in a picky mood, just excuse and ignore it: the lazy theory
is inadequate, not so plausible. You do need help. Go and study
the robber analogy of mine, the robber is not lazy. He can get
what he wants from unsecured houses. He is rationalising his
resources.
Yes, but I am merely pointing out that obfuscating e-mail addresses is
inferior to real security; I am not claiming to know what harvesters
actually do!

You were giving a different impression to me at least. I was
getting a message from your words that it was ineffective, that
it would not deter. You did not make things so utterly clear. You
did not say out loud, yes, it will reduce spam but these are the
downsides... You gave the impression of conflating these issues.

Mind that old axiom 'security by obscurity gives a false sense of
security'?

<g> I have a car protection system I made myself that is a sort
of inverse of this! It consists of a "key" and "switch" that is
not hidden from view, it is just not obvious to anyone's mind. It
gives me a great sense of security and has worked on a number of
occasions, both on my car and my daughter's and a neighbour's...
Deal with spam at your end; don't pass the buck.

It is not my spam. Tell that to my client. But, Jock, be careful,
he is 6 foot 8 inches and built like a brick shit-house, has red
hair and is not delicate, if you know what I mean. I think I will
use an encoding just on this occasion...
 
Nikita the Spider

John Dunlop said:
Nikita the Spider:


mentioned now more than once in this thread: normal counter-spam
measures. That means junk mail filters both at the server and at the
MUA.

Hmmm, I guess we'll have to disagree on the criteria we use to measure
"inferior". Even the best mail filters can generate false positives,
which is something that an entity-encoded address won't do. And it'd
have to be a pretty darn effective filter (or set of filters) to achieve
what the entity encoding has done in this test. Furthermore, entity
encoding is something that any Web page author can do; the same can't be
said for setting up and tuning server-side filters. Last but not least,
entity encoding *prevents spam from being generated*. Mail filtering
doesn't do this. And if I just rely on my ISP's filters to handle my
spam for me, isn't that "passing the buck"?

[re e-mail address obfuscation running contrary to the spirit of
Internet specs]
I see your point, but the spec isn't strongly worded.

Well, every clause in the spec is vague enough to be open to, however
absurd, interpretation.

If you say so.
I specifically talked not about the spec's wording but about its
spirit. To learn about the spirit of HTML you have to trace its
history: follow the past discussions, study the earlier drafts and
specifications, find out why the constructs were introduced in the
first place.


I quoted from there but did not mean that as the 'relevant section' to
learn why character references came about. You will find that not in
the HTML4.0 spec but in ISO8879 (my copy's at work and I haven't yet
memorised it all, so much to my consternation I can't give you chapter
and verse.)

I haven't read ISO8879. I'll grant that my opinion might change after
doing so. But of all of the abuses to which HTML has been and is
subjected (sending XHTML as text/html comes to mind), I find it hard to
believe that entity encoding email addresses would be in the top one
hundred of many people's lists, if at all.
 
Dan

dorayme said:
Anyone here using methods to make it more difficult for spammers
to garner email addresses from web pages? Mostly interested to
hear from anyone using specific methods (rather than anything
else like further reviews, analyses of the ultimate effectiveness
etc, having things like "removeThis" inside the email address
that is in the "mailto:").

I personally find it aesthetically distasteful to do any sort of
obfuscation of addresses; it just seems to go against the grain of
Internet standards that have always been designed to keep things as
open as possible, not intentionally obscure. Some of the
character-encoding stuff I can more-or-less tolerate because you have
to view the source code to see that it's whacked out, but other things
like spelling out "address at something dot net", or putting in
signature notes like "remove 'x' from my address", or embedding an
address as a graphic, just rub my nose in the fact that it's being
intentionally made more difficult to use. That's the sort of thing up
with which I won't put.
 
dorayme

Dan said:
I personally find it aesthetically distasteful to do any sort of
obfuscation of addresses; it just seems to go against the grain of
Internet standards that have always been designed to keep things as
open as possible, not intentionally obscure. Some of the
character-encoding stuff I can more-or-less tolerate because you have
to view the source code to see that it's whacked out, but other things
like spelling out "address at something dot net", or putting in
signature notes like "remove 'x' from my address", or embedding an
address as a graphic, just rub my nose in the fact that it's being
intentionally made more difficult to use. That's the sort of thing up
with which I won't put.

That is a fine speech. See my reference to Burning Mississippi. :)


But I agree that the seen email address should be normal
looking. There is a way around this, to not put any at all, just
a link, the words being, "email us" or whatever.

I would be interested to hear from anyone who has an idea of the
chances of email harvesting happening from the expressed text on
the page as distinct from the source. Without some idea of this
knowledge, one is less equipped to inform the good-guy dirty
tricks department. (If Spider's impressive figures are anything
to go on, it looks like these evil bots garner from the source
mainly)
 
Nico Schuyt

Nikita said:
I've set up several spamtrap addresses to study this. Eventually I'll
write a short article about my findings, but in the meantime I'll
summarize here. I have three email addresses all on the same page. One
is naked (i.e. just (e-mail address removed)), one is entity encoded (i.e.
&#102;&#111;&#111; etc.) and one is added to the page by Javascript.
The number of spams each has gotten to date is as follows:
naked - 715
entities - 2
javascript - 1
In short, the entities look pretty effective to me. They're nice
because they don't disturb one's visitors at all and you don't have
to mess around with any Javascript.
But another way of looking at it is to say that Javascript protection
is twice as effective as entity protection. =) (Thanks to Huff's "How
to Lie with Statistics")

Both are unreliable. Even *I* can make a script that extracts email addresses
from JS or entity coded text :)
Use a mail form.
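
The claim that entity-coded addresses are easy to extract can be illustrated with a minimal Python sketch. It is not a real harvester: it assumes the page source is already in hand, the address pattern is deliberately loose, and the names `ADDRESS` and `find_addresses` and the address `user@example.com` are made up for the example. The only extra step over scanning for plain addresses is decoding the character references first, here with the standard library's `html.unescape`.

```python
import html
import re

# Deliberately loose pattern for illustration; real address syntax is more involved.
ADDRESS = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_addresses(source):
    """Decode character references in page source, then search for addresses."""
    return ADDRESS.findall(html.unescape(source))


# Hypothetical page fragment: the local part and "@" are written as character references.
page = '<a href="mailto:&#117;&#115;&#101;&#114;&#64;example.com">email us</a>'
print(find_addresses(page))
```

Whether harvesters in the wild bother with this step is a separate question, which is the point the spamtrap figures speak to.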
 
dorayme

Nico Schuyt said:
Both are unreliable. Even *I* can make a script that extracts email addresses
from JS or entity coded text :)
Use a mail form.

Would you, Mr Korpela and Jock - you see, Nico, what good company
you are in... :) - please not ignore the fact that it works to
actually stop spam. If you don't think it actually does, say so
loud and clear. The issue of it "being easy" to overcome is quite
irrelevant in a world where almost no bots do this. This is the
world you earthlings and I live in for the moment. What world are
you talking about? One in which Spider's stats are not true? In
this world it looks to me to be very reliable for now.
 
Nico Schuyt

Would you, Mr Korpela and Jock - you see, Nico, what good company
you are in... :) - please not ignore the fact that it works to
actually stop spam. If you don't think it actually does, say so
loud and clear.

Working *now* is no guarantee whatsoever of being effective in the near
future.
The issue of it "being easy" to overcome is quite
irrelevant in a world where almost no bots do this. This is the
world you earthlings and I live in for the moment. What world are
you talking about? One in which Spider's stats are not true?

Stats are never true :)
In this world it looks to me to be very reliable for now.

The place is right; it's the time that might be a problem.
Tomorrow I'll launch my new evil bot.
 
John Dunlop

dorayme:
[John Dunlop:]
I am in a picky mood, just excuse and ignore it: the lazy theory
is inadequate, not so plausible. You do need help. Go and study
the robber analogy of mine, the robber is not lazy. He can get
what he wants from unsecured houses. He is rationalising his
resources.

Oops! Ok, so 'lazy' might not be the /mot juste/, as they say in the
Gorbals, but 'rationalising one's resources' seems to be more or less a
rehashing of the same theory, no? Anyway, it's one I'll have to
remember next time I'm asked to go to the gym.
You were giving a different impression to me at least. I was
getting a message from your words that it was ineffective, that
it would not deter. You did not make things so utterly clear. You
did not say out loud, yes, it will reduce spam but these are the
downsides...

'I should emphasize that I'm not saying that attempts at obfuscation
will universally fail, only that it takes little effort to overcome
them.'

Does it reduce spam? It would seem to reduce the amount of spam that
that e-mail address owner receives, yes, but whether it makes an impact
on spam in the grand scheme of things, I don't know. Wouldn't a
harvester simply pick other addresses?
You gave the impression of conflating these issues.

Ok. Let me list some options.

1. Obfuscate the address on the page:
a. munging
b. character references
c. percent-encodings
d. human-only addresses (e.g., 'user (at) host')
e. address written in javascript
2. Implement junk mail filters:
a. server filters
b. MUA filters
3. Remove all trace of the address.

Now my position regarding 1(b,c). Character references are the lesser
of the two evils, because while percent-encodings actually change the
URL for some degrees of equivalency, upsetting the user-interface,
character references don't.
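
The difference between 1(b) and 1(c) can be shown with a small Python sketch. It is only an illustration: `user@example.com` is a placeholder address, and the standard library's `html.unescape` and `urllib.parse.unquote` stand in for what an HTML parser and a URL decoder would do. Character references disappear at the HTML layer, so the URL handed on is the ordinary one; percent-encoding rewrites the URL string itself.

```python
import html
from urllib.parse import unquote

address = "user@example.com"  # placeholder address, not taken from the thread

# 1(b): every character written as a decimal character reference (HTML layer only).
as_references = "mailto:" + "".join(f"&#{ord(ch)};" for ch in address)

# 1(c): every character percent-encoded (this changes the URL string itself).
as_percent = "mailto:" + "".join(f"%{ord(ch):02X}" for ch in address)

# What the HTML parser passes on as the URL once the references are decoded.
url_from_references = html.unescape(as_references)
print(url_from_references)

# The percent-encoded URL stays encoded until something decodes the URL.
print(as_percent)
print(unquote(as_percent))
```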

But character references were 'intended to be used when you could not
otherwise enter a character conveniently in the text' (/The SGML
Handbook/ p. 356). I would be surprised if it inconvenienced you to
enter most US-ASCII characters directly.
<g> I have a car protection system I made myself that is a sort
of inverse of this! It consists of a "key" and "switch" that is
not hidden from view, it is just not obvious to anyone's mind. It
gives me a great sense of security and has worked on a number of
occasions, both on my car and my daughter's and a neighbour's...

I could find other analogies such as hiding the backdoor key to your
house under a stone, or hiding the key to your car under a wheel arch,
but I'm not sure what you're getting at here. The sense of security
can be real but false.
I think I will use an encoding just on this occasion...

If you feel the practical benefits of e-mail address obfuscation
outweigh the practical downsides - e.g., the impression of
unprofessionalism, the mangling of the user-interface by
percent-encoding - and the theoretical downsides, who am I to stand in
your way.

I suppose any persuasiveness I enjoyed must yield to Friday the 13th.
 
Nikita the Spider

A mail form != an email address hyperlink. The former is less convenient
for the user. Yes, email forms limit spam but so does putting one's
email address in an image instead of text, or writing "foo (at) example
dot com". As John Dunlop rightly said, that's "passing the buck" -- you
inconvenience the user. I want to avoid that if possible.
Working *now* is no guarantee whatsoever of being effective in the near
future.

The same could be said for all spam blocking methods (my Bayesian
filters used to work a lot better, for example). So should we
abandon all attempts to block spam because none of them are guaranteed?
Hmmmm, OK. But you go first. ;)
 
Harlan Messinger

Nikita said:
A mail form != an email address hyperlink. The former is less convenient
for the user.

Not necessarily, and altogether false for users not using an e-mail
client on their local machine, e.g., all users of web-based mail
services, many users of computers at their work place, and all users of
computers at libraries, Internet cafes, etc.

Further, if you are interested in, or think you may ever be interested
in, capturing information from the user besides the message itself (how
did you hear about us? is this a bug report, a help request, or a new
feature suggestion?), then the form is the way to go.
 
Nico Schuyt

Nikita said:
Nico Schuyt said:
dorayme said:
"Nico Schuyt" wrote:
Nikita the Spider wrote:
I've set up several spamtrap addresses to study this.
[JS versus entity encoding]
Both are unreliable. Even *I* can make a script that extracts email
addresses from JS or entity coded text :)
Use a mail form.
A mail form != an email address hyperlink. The former is less
convenient for the user.

Maybe it's more inconvenient for the user if he tries to contact you in an
internet cafe (no mail client).
Yes, email forms limit spam but so does
putting one's email address in an image instead of text, or writing
"foo (at) example dot com".

Not so friendly for the visitor either.

But I just applied your entity-encoding trick in a site where I needed an
e-mail address and didn't have time to install a form :)
Thanks for the tip!

BTW, to encode a string ($str) holding the e-mail address as HTML
character references I used:
<?php
// Print every character of the address as a decimal character reference.
$str = "<e-mail address>";
for ($i = 0; $i < strlen($str); $i++) {
    printf('&#%03d;', ord($str[$i]));
}
?>
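
For readers without PHP to hand, the same encoding can be sketched in Python. This is a rough equivalent rather than Nico's code: `entity_encode` is a name invented here, `user@example.com` is a placeholder address, and the round trip through the standard library's `html.unescape` is only a check that a parser recovers the original text.

```python
import html


def entity_encode(text):
    """Return text with every character written as a zero-padded decimal character reference."""
    return "".join(f"&#{ord(ch):03d};" for ch in text)


# Placeholder address, not one from the thread.
encoded = entity_encode("user@example.com")
print(encoded)

# An HTML parser turns the references back into the original characters.
print(html.unescape(encoded))
```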
 
Nikita the Spider

Harlan Messinger said:
Not necessarily, and altogether false for users not using an e-mail
client on their local machine, e.g., all users of web-based mail
services, many users of computers at their work place, and all users of
computers at libraries, Internet cafes, etc.

Fair enough, I hadn't thought of those scenarios. But Web mail users
*do* have an email client on their local machine -- the browser.
Further, if you are interested in, or think you may ever be interested
in, capturing information from the user besides the message itself (how
did you hear about us? is this a bug report, a help request, or a new
feature suggestion?), then the form is the way to go.

Yes, some of these are good candidates for forms.
 
dorayme

Would you... please not ignore the fact that it works to
actually stop spam. If you don't think it actually does, say so
loud and clear.

Working *now* is no guarantee whatsoever of being effective in the near
future.

This is simply not true. If you had left it at the "no guarantee"
and not added the "whatsoever" you would have had a fighting
chance, old chap.
Stats are never true :)

Come now Nico, you can't believe this.
The place is right; it's the time that might be a problem.
Tomorrow I'll launch my new evil bot.

Ah... this is the kind of talk I like, evil talk. If you have
plans... all this is different...
 
dorayme

John Dunlop said:
If you feel the practical benefits of e-mail address obfuscation
outweigh the practical downsides - e.g., the impression of
unprofessionalism, the mangling of the user-interface by
percent-encoding - and the theoretical downsides, who am I to stand in
your way.

Well, this is what I would like to know more about. When I do
bad, I prefer not to be low class about it. I want to know what
evil I commit. What mangling are you talking about? I have
attempted a few times to raise the question about how bots work,
on the source code or the expressed page text (visible and
audible etc as normal text to humans). It seems it is mainly the
former. If so, who besides alt.html types will it seem so
unprofessional to?
 
