email address obfuscation

dorayme

Nikita the Spider said:
The same could be said for all spam blocking methods (my Bayesian
filters used to work a lot better, for example). So should we
abandon all attempts to block spam because none of them are guaranteed?
Hmmmm, OK. But you go first. ;)

Actually, Spider, I was just saying to a friend this morning, my
Mac Mail.app filters based on this type of mathematics have been
failing me lately... a bit alarming actually. I am wondering whether
the junk algorithms are not learning any more (they used to be good)
or whether the spammers are just on to these algorithms big time now.
Never mind, clients, websites... I may need to actually buy a better
spam setup for me... I suppose this is OT! But I was interested to
hear your remark about Bayesian filters. Doubtless, there are all
kinds of these...
 
Nico Schuyt

This is simply not true. If you had left it at the "no guarantee"
and not added the "whatsoever" you would have had a fighting
chance old chap.


Ah well, let's not argue. Like I mentioned in a later posting, I fully admit
the technique of Nikita is a good alternative for the JS-solution.
Come now Nico, you can't believe this.

But I really do :) It's not the statistics, that's pure mathematics; it's
the uncertainty in the methods by which the data are collected.
Ah... this is the kind of talk I like, evil talk. If you have
plans... all this is different...

Don't worry, no plans :)
 
Harlan Messinger

Nikita said:
Fair enough, I hadn't thought of those scenarios. But Web mail users
*do* have an email client on their local machine -- the browser.

Well, of course I said it was false that an e-mail link is more
convenient, not false that it can be used! But it can't be used
directly. Clicking a link won't open the browser to a Compose Mail page
on the user's e-mail service. Instead, it may cause an error. Or, if an
e-mail client *is* installed, but configured for someone else, it could
open a new message window, letting the user cluelessly send an e-mail
under someone else's account.
 
dorayme

Come now Nico, you can't believe this.

But I really do :) It's not the statistics, that's pure mathematics; it's
the uncertainty in the methods by which the data are collected.

Ah, I see what you are saying, I think. Yes, I would like to see
more data on these experiments. Spider has mentioned he will one
day do this. Perhaps time for a little experiment or two
ourselves to confirm... :)
 
Joe (GKF)

?? No. E-mail address obfuscation tries to deal with the problem at
the user's end. Its aim is to remove all trouble for the e-mail
address owner, no matter the cost to anyone else.
What exactly is the "cost to anyone else" of my choosing to use hash
entities to hide my email addy from bots while leaving it perfectly
clear to humans? Just curious.

....
It is not your job to prevent spam being generated, unless you are
actively fighting against it, in which case there are better, more
effective approaches than e-mail address obfuscation.
Whose job do you suppose it is then?
 
Joe (GKF)

....
But I agree that the seen email address should be normal
looking. There is a way around this, to not put any at all, just
a link, the words being, "email us" or whatever.

The problem with that, as I see it, is that people who might want to
email you but are not able to at that time (because they are in an
Internet Cafe, Library, or someplace else they can't send email from)
can't just write the addy on the back of an envelope and take it with
them. Anyway, there's something that inspires trust about an address
you can actually see - and if the 'bots have trouble, so much the
better.
I would be interested to hear from anyone who has an idea of the
chances of email harvesting happening from the expressed text on
the page as distinct from the source.

but ... nah, you probably feel like a dill already.
 
John Dunlop

Joe (GKF):
What exactly is the "cost to anyone else" of my choosing to use hash
entities to hide my email addy from bots while leaving it perfectly
clear to humans? Just curious.

Probably nothing. What has that to do with the price of fish?
Whose job do you suppose it is then?

What do I care?

I don't get spam. I don't obfuscate my address.
 
dorayme

Joe (GKF) said:
...

The problem with that, as I see it, is that people who might want to
email you but are not able to at that time (because they are in an
Internet Cafe, Library, or someplace else they can't send email from)
can't just write the addy on the back of an envelope and take it with
them. Anyway, there's something that inspires trust about an address
you can actually see - and if the 'bots have trouble, so much the
better.

Yes. You are right. And if the bots have trouble, so much the
better.
but ... nah, you probably feel like a dill already.

Not really (but that's the mark of a dill, you see).

I have this idea that the source is searched for addresses but that
the expressed text could be too ...

The simple fact is that I do not know how these bots work. Do
they look for strings starting with "mailto:" or, even simpler, any
"well-formed" ASCII email string?
 
dorayme

John Dunlop said:
I don't get spam.

OK Jock, time to spill the beans. No one simply just does not get
spam. There is a story behind how you do not get spam. What is
the story? Every single little secret please. Don't be shy now.
 
John Dunlop

dorayme:
OK Jock, time to spill the beans. No one simply just does not get
spam. There is a story behind how you do not get spam. What is
the story? Every single little secret please. Don't be shy now.

If I told you, I would have to kill you.
 
John Dunlop

dorayme:
What mangling are you talking about?

Mangling URLs by percent-encoding octets that could have remained as
raw data.

| For consistency, percent-encoded octets in the ranges
| of ALPHA (%41-%5A and %61-%7A), DIGIT (%30-%39),
| hyphen (%2D), period (%2E), underscore (%5F), or tilde
| (%7E) should not be created by URI producers and, when
| found in a URI, should be decoded to their corresponding
| unreserved characters by URI normalizers.

(RFC3986 : 2.3)

Two conditions:

1. Unreserved characters should not be percent-encoded.
2. If found, they should be decoded.

E-mail address obfuscation that percent-encodes the octets of
unreserved characters runs afoul of (1). And if 'global transcription'
is a consideration, anyone who obfuscates their address in this way
relies on (2).
If so, who besides alt.html types will it seem so unprofessional to?

Anyone, I'd imagine, faced with a URL chock full of %xx. If it hasn't
yet been decoded.
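
As a rough illustration of the two conditions, here is a minimal Python sketch of what a URI normalizer might do with an address obfuscated this way. The function name `decode_unreserved` and the sample address are made up for the example. It decodes only octets in the unreserved set quoted from RFC 3986 above and leaves everything else, including the encoded "@", alone.

```python
import re

# Unreserved characters as listed in RFC 3986 section 2.3.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(uri: str) -> str:
    """Decode %xx only where the octet is an unreserved character."""
    def repl(match):
        ch = chr(int(match.group(1), 16))
        # Anything outside the unreserved set keeps its %xx form.
        return ch if ch in UNRESERVED else match.group(0)
    return PCT.sub(repl, uri)


# Hypothetical obfuscated link target, for illustration only.
obfuscated = "mailto:%6A%6F%65%40%65%78%61%6D%70%6C%65%2E%63%6F%6D"
print(decode_unreserved(obfuscated))
```

With this input the sketch prints mailto:joe%40example.com, so anything that applies condition (2) sees nearly all of the address in the clear.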
 
jojo

John said:
Besides, in this war, there are more effective and less harmful
strategies than obfuscation.

Yes, I know them. They are called "spam-filters"... It wasn't my idea to
obfuscate the email address, I just pointed out a way to do it if
you want to. And AFAIK there is no way of obfuscating that doesn't run
against the spirit of the internet specifications.

jojo
 
Nikita the Spider

dorayme said:

Actually, Spider, I was just saying to a friend this morning, my
Mac Mail.app filters based on this type of mathematics have been
failing me lately... a bit alarming actually. I am wondering whether
the junk algorithms are not learning any more (they used to be good)
or whether the spammers are just on to these algorithms big time now.
Never mind, clients, websites... I may need to actually buy a better
spam setup for me... I suppose this is OT! But I was interested to
hear your remark about Bayesian filters. Doubtless, there are all
kinds of these...

I'm also using Mail.app. Many spams include random bits of prose
(non-spammy words) to offset the weight of the spammy content of the
email. This is a pretty effective technique against a lot of statistical
weighting filters, which is what I think Mail.app and lots of other
programs use.
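
To see why the padding trick works, here is a toy Python sketch of the naive-Bayes style combination that statistical filters are commonly described as using. Whether Mail.app actually works this way is an assumption, and the per-word probabilities below are invented purely for illustration.

```python
from math import prod


def combined_spam_probability(word_probs):
    """Combine per-word spam probabilities, naive-Bayes style."""
    spam = prod(word_probs)
    ham = prod(1.0 - p for p in word_probs)
    return spam / (spam + ham)


# Hypothetical per-word spam probabilities (not from any real filter).
spammy = [0.95, 0.90]          # words the filter has learned to distrust
padding = [0.20, 0.20, 0.20]   # innocuous prose added by the spammer

without_padding = combined_spam_probability(spammy)
with_padding = combined_spam_probability(spammy + padding)

print(round(without_padding, 3))
print(round(with_padding, 3))
```

The second score comes out noticeably lower than the first, which is the whole point of the random prose: a few harmless words can pull a message back under the filter's threshold.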
 
Nikita the Spider

The simple fact is that I do not know how these bots work. Do
they look for strings starting with "mailto:" or, even simpler, any
"well-formed" ASCII email string?

I'm sure there's a variety of them out there. I've gotten hits on URLs
before that are only expressed in HTML comments, which tells me that
some bots are not properly parsing the HTML but probably just scanning
the source for "<a" or "http://" and using that as their flag for a
link. I would think that some do the same for "mailto:" as you
suggested, or maybe just "@".
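
A minimal sketch of the kind of scanner described above, in Python. This is a guess at how a crude harvester might behave, not a description of any real bot; the regular expression, the function names and the sample page are all hypothetical.

```python
import html
import re

# Deliberately loose pattern for "something@something.tld".
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def harvest_raw(source):
    """Scan the raw source text without parsing the HTML at all."""
    return EMAIL.findall(source)


def harvest_decoded(source):
    """Same scan, but after decoding HTML character references."""
    return EMAIL.findall(html.unescape(source))


# Hypothetical page: one address in a comment, one in a mailto: link,
# one partly written as numeric character references.
page = (
    '<!-- old contact: hidden@example.org -->\n'
    '<a href="mailto:plain@example.com">email us</a>\n'
    '<p>&#106;&#111;&#101;&#64;example.net</p>\n'
)

print(harvest_raw(page))
print(harvest_decoded(page))
```

The raw scan picks up the address inside the HTML comment as well as the mailto: link, which matches the comment-only URL hits mentioned above. Only the version that decodes character references finds the third address.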
 
Nikita the Spider

Nico Schuyt said:
But I really do :) It's not the statistics, that's pure mathematics; it's
the uncertainty in the methods by which the data are collected.

Nico, my methods are perfect! Trust me! =)

Seriously, you're right, I already referred once in this thread to one
of my favorite books, How To Lie with Statistics. There are lots of ways
to mismeasure things and to misrepresent the measurements. I guess I
will write up my methods sooner rather than later since the topic is
fresh on my mind now. That way you can judge whether my findings have
any merit.

As dorayme suggested, I'd love to see others try the same test to see if
the results hold up. It could be that my corner of the Internet is just
populated by stupid bots.
 
Joe (GKF)

I have this idea that the source is searched for addresses but that
the expressed text could be too ...

The simple fact is that I do not know how these bots work. Do
they look for strings starting with "mailto:" or, even simpler, any
"well-formed" ASCII email string?
I don't know the answer to that either, which is why I &# the "mailto:"
stuff and the text that is to appear on the page as well.
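
For anyone wanting to try the same thing, here is a small Python sketch that turns every character into a decimal numeric character reference (the "&#" form). The helper name `to_numeric_entities` and the address are hypothetical, and nothing here says how well this fools any particular bot.

```python
import html


def to_numeric_entities(text):
    """Encode every character as a decimal reference such as &#64;."""
    return "".join(f"&#{ord(ch)};" for ch in text)


address = "joe@example.com"  # hypothetical address, for illustration

# Encode both the link target and the text shown on the page.
href = to_numeric_entities("mailto:" + address)
label = to_numeric_entities(address)
link = f'<a href="{href}">{label}</a>'
print(link)

# An HTML parser decodes the references back to the plain address.
print(html.unescape(label))
```

A browser shows the decoded address to the reader, while a scanner that only greps the raw source for "@" or "mailto:" finds neither string.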
 
dorayme

Joe (GKF) said:
I don't know the answer to that either, which is why I &# the "mailto:"
stuff and the text that is to appear on the page as well.

Me too.

OK Joe, I will confess something to you... you know how you said
I might be feeling like a dill... well a couple of things about
this:

(1) That's the sweetest thing you have ever said to me... you
little cucumber yourself...

but

(2) I was imagining different types of spam robots:

a. The sort that live in a little robot-house somewhere. They
have a little sleep and get up and have a little oil and turn on
their html source only browsers. Their job is to get the email
addresses by looking at source only. Some use javascript and
entity type decoders, some don't.

b. The sort that don't have some of their members using little
monitors. Their job is to get the email addresses by looking at
what is on their little screens. They are made a bit different to
the other robots.

Now, I reckon this is more than dillish, I boast that it is
outright idiocy. Never overestimate me. In my case, it is a
special martian trait, less is more.
 
MG

jojo said:
You can improve that: use HTML-Entities for "mailto:" and hex-entities
(%41 for A) for the email address itself.

How do I use hex-entities in HTML? I tried using %41 but it displays as %41,
not as A.

MG
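
A likely explanation, sketched in Python: %41 is URL percent-encoding, which is only decoded where a URL is parsed (for example the href value of a link), whereas the text of the page needs an HTML character reference such as &#x41; or &#65;. The snippet uses the standard library's `html.unescape` and `urllib.parse.unquote` to mimic the two decoding steps. It is an illustration, not a browser.

```python
import html
from urllib.parse import unquote

text = "%41"

as_html_text = html.unescape(text)        # HTML decoding leaves %41 untouched
as_url = unquote(text)                    # URL decoding turns %41 into A
hex_reference = html.unescape("&#x41;")   # hex character reference becomes A
dec_reference = html.unescape("&#65;")    # decimal character reference becomes A

print(as_html_text, as_url, hex_reference, dec_reference)
```

So %41 can go inside the href after "mailto:", while the visible text of the link would use the &#...; form instead.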
 
