Random image text generation?

skip

Is there a module out there that will generate an image with a random text
string such as the confirmation images you see on various websites? I'm
thinking I'm going to have to add that to the forms on the Mojam websites.
Over the past couple weeks we've begun to get lots of spam submission crap.

Thx,

Skip
 
Mitja Trampus

skip said:
Is there a module out there that will generate an image with a random text
string such as the confirmation images you see on various websites?

They're called captcha images or captchas for short.
Googling for "python captcha" returns several hits; see what you like...
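
For readers who want to see the general shape of such a generator, here is a minimal sketch. It assumes Pillow (the maintained fork of PIL) is installed. The names `ALPHABET`, `random_text` and `render_captcha` are made up for illustration and are not taken from the ASPN recipe or any other hit. It draws undistorted text with a few noise lines, so treat it as a starting point rather than a hardened captcha.

```python
import secrets
import string

# Characters that are easy to confuse (0/O, 1/I/L) are left out.
ALPHABET = "".join(c for c in string.ascii_uppercase + string.digits
                   if c not in "0O1IL")


def random_text(length=6):
    """Return a random challenge string drawn from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def render_captcha(text, path, size=(160, 60)):
    """Draw `text` plus a few noise lines and save the image to `path`."""
    # Imported lazily so random_text() still works without Pillow.
    from PIL import Image, ImageDraw, ImageFont

    img = Image.new("RGB", size, "white")
    draw = ImageDraw.Draw(img)
    # Built-in default font; ImageFont.truetype() would need FreeType support.
    font = ImageFont.load_default()
    draw.text((20, 20), text, fill="black", font=font)
    for _ in range(5):  # crude distortion: five random gray lines
        start = (secrets.randbelow(size[0]), secrets.randbelow(size[1]))
        end = (secrets.randbelow(size[0]), secrets.randbelow(size[1]))
        draw.line([start, end], fill="gray", width=1)
    img.save(path)
```

Typical use would be `text = random_text()` followed by `render_captcha(text, "captcha.png")`, with `text` kept on the server (for example in the session) and compared against what the user types.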
 
Jorge Godoy

skip

Mitja> They're called captcha images or captchas for short. Googling
Mitja> for "python captcha" returns several hits; see what you like...

Thanks. I'd never heard that term before. Assuming I can get PIL installed
with freetype support on my Mac the ASPN recipe looks like it will do the
trick.

Skip
 
Steven D'Aprano

skip said:
Mitja> They're called captcha images or captchas for short. Googling
Mitja> for "python captcha" returns several hits; see what you like...

Thanks. I'd never heard that term before. Assuming I can get PIL installed
with freetype support on my Mac the ASPN recipe looks like it will do the
trick.

Keep in mind two serious problems with captchas:

- they're impossible for the blind or people using text-only browsers to
see -- even mere colour blindness can make some captchas impossible to
solve;

- sometimes they're too difficult for even those with perfect vision to
decipher.


Two alternatives:

Instead of displaying an obfuscated image of a nonsense word, display six
randomly chosen photos, where five are of the same thing but not the same
image. E.g. you might show five different kittens and a horse. The user
has to click on the image that is not the same as the others.
State-of-the-art horse-recognition software is not yet in widespread use
by spammers *wink*

For a text only solution, consider putting up a natural language question
such as:

What is the third letter of 'national'?
What is four plus two?
How many eggs in a dozen?
Fill in the blank: Mary had a little ____ its fleece was white as snow.
Cat, Dog, Apple, Bird. One of those words is a fruit. Which one?

Beware of making the questions too difficult or too specific:

In the third season of Babylon Five, what did Mr Morden ask Lando?
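
A rough sketch of how the text-only question check might look, assuming a small hand-written question bank kept on the server. `QUESTIONS`, `pick_question` and `check_answer` are hypothetical names, and the accepted answers are my own reading of the example questions above.

```python
import random

# Hypothetical question bank: (question, set of accepted answers in lower case).
QUESTIONS = [
    ("What is the third letter of 'national'?", {"t"}),
    ("What is four plus two?", {"6", "six"}),
    ("How many eggs in a dozen?", {"12", "twelve"}),
    ("Cat, Dog, Apple, Bird. One of those words is a fruit. Which one?",
     {"apple"}),
]


def pick_question(rng=random):
    """Return (index, question text); keep the index server-side."""
    index = rng.randrange(len(QUESTIONS))
    return index, QUESTIONS[index][0]


def check_answer(index, reply):
    """Compare a reply to the accepted answers, ignoring case and padding."""
    normalized = reply.strip().lower().rstrip(".!")
    return normalized in QUESTIONS[index][1]
```

Accepting several spellings of each answer ("6" and "six") keeps the test forgiving for humans, which matters more here than strictness.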

Also, keep in mind that all captchas are vulnerable to the old
distributed hybrid human-machine network trick: "We'll give you a free
account on our porn site if you spend fifteen minutes a day solving
captchas for our bot network." The only solution to that, I fear, is open
season on spammers and anyone who buys from a spammer.
 
Leif K-Brooks

Steven said:
For a text only solution, consider putting up a natural language question
such as:

What is the third letter of 'national'?
What is four plus two?
How many eggs in a dozen?
Fill in the blank: Mary had a little ____ its fleece was white as snow.
Cat, Dog, Apple, Bird. One of those words is a fruit. Which one?

That wouldn't work as a true CAPTCHA (Completely Automated *Public*
Turing test to tell Computers and Humans Apart), since making the list
of questions and answers public would defeat its purpose.
 
skip

Steven> Keep in mind two serious problems with captchas:

Steven> - they're impossible for the blind or people using text-only
Steven> browsers to see -- even mere colour blindness can make some
Steven> captchas impossible to solve;

Steven> - sometimes they're too difficult for even those with perfect
Steven> vision to decipher.

Sure, but in the immediate term I'm more concerned with stopping people from
submitting random crap advertising cheap drugs to my concert database.
(I've been running the Musi-Cal concert database for nearly 12 years. This
is the first time I've ever had to consider resorting to something like
this, and it really pisses me off that I have to.) I can deal with visual
impairment in other ways (like asking users who can't respond to the captcha
to simply email their concert data directly to me).

Steven> Instead of displaying an obfuscated image of a nonsense word,
...

Thanks for the suggestions. I'll keep them in mind.

Another possibility is to run the submissions through SpamBayes and silently
direct any which score as "unsure" or "spam" to me for review. Users
wouldn't even need to know their submissions were being scrutinized.
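
The routing idea can be sketched roughly as follows. This does not use the real SpamBayes API: `score_fn` is a stand-in for whatever classifier returns a spam probability between 0 and 1, `toy_score` is a deliberately naive placeholder, and the 0.20 cutoff is an assumed value.

```python
def route_submission(text, score_fn, ham_cutoff=0.20):
    """Publish clear ham; hold anything scored "unsure" or "spam" for review."""
    score = score_fn(text)
    if score < ham_cutoff:
        return "publish"
    return "review"


def toy_score(text):
    """Hypothetical stand-in scorer, NOT SpamBayes: counts a few spammy words."""
    spammy = {"viagra", "cheap", "pills"}
    words = text.lower().split()
    if not words:
        return 0.5  # nothing to judge, so treat as unsure
    hits = sum(word in spammy for word in words)
    return min(1.0, 5 * hits / len(words))


if __name__ == "__main__":
    print(route_submission("Jazz trio at the Blue Note on Friday", toy_score))
    print(route_submission("cheap pills cheap viagra", toy_score))
```

Because both "unsure" and "spam" land in the same review queue, a single cutoff is enough here; the submitter sees the same response either way.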

Skip
 
Ben Finney

Leif K-Brooks said:
That wouldn't work as a true CAPTCHA (Completely Automated *Public*
Turing test to tell Computers and Humans Apart), since making the
list of questions and answers public would defeat its purpose.

The "Public" part of a CAPTCHA is the algorithm. The data consumed and
produced by the algorithm don't need to be publicly correlated -- and
indeed shouldn't be, for exactly the reason you state.
 
Leif K-Brooks

Ben said:
The "Public" part of a CAPTCHA is the algorithm. The data consumed and
produced by the algorithm don't need to be publicly correlated -- and
indeed shouldn't be, for exactly the reason you state.

When the CAPTCHA is based entirely on a fixed list of questions and
answers, I think it's reasonable to treat that list as part of the
algorithm, since the CAPTCHA couldn't function without it. Similarly, I
think most people would consider an image-based CAPTCHA for which the
algorithm but not the fonts were available to be non-public.
 
Hendrik van Rooyen

Leif K-Brooks said:
That wouldn't work as a true CAPTCHA (Completely Automated *Public*
Turing test to tell Computers and Humans Apart), since making the list
of questions and answers public would defeat its purpose.

You could consider keeping these answers secret - the spammers would then be
stymied...
 
Paul Rubin

Steven D'Aprano said:
Instead of displaying an obfuscated image of a nonsense word, display six
randomly chosen photos, where five are of the same thing but not the same
image. E.g. you might show five different kittens and a horse. The user
has to click on the image that is not the same as the others.
State-of-the-art horse-recognition software is not yet in widespread use
by spammers *wink*

No need to recognize the horse. Just pick one of the pictures at
random and you'll get the right one 1/6th of the time. Repeat ad
infinitum--they're spammers and like to repeat stuff after all.
That's why those conventional captcha images make you recognize a
multi-character string: so the guessing chance is low.
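
To put rough numbers on that: if a single random guess succeeds with probability p, then at least one of n independent guesses succeeds with probability 1 - (1 - p)^n. The sketch below compares the six-photo test with a string captcha, assuming (my assumption, for illustration) six characters drawn from a 36-symbol alphabet.

```python
def guess_success(p_single, attempts):
    """Chance that at least one of `attempts` independent guesses succeeds."""
    return 1 - (1 - p_single) ** attempts


p_image = 1 / 6          # pick one of six photos at random
p_string = 1 / 36 ** 6   # assumed: 6 characters, 36 possible symbols each

if __name__ == "__main__":
    for n in (1, 10, 100):
        print(n, guess_success(p_image, n), guess_success(p_string, n))
```

With these assumptions a bot guessing photos at random gets through more than 80% of the time within ten tries, while the string captcha stays below one success in a million even after a hundred tries.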
 
skip

Paul> No need to recognize the horse. Just pick one of the pictures at
Paul> random and you'll get the right one 1/6th of the time. Repeat ad
Paul> infinitum--they're spammers and like to repeat stuff after all.
Paul> That's why those conventional captcha images make you recognize a
Paul> multi-character string: so the guessing chance is low.

Actually, the ones I saw that used a set of "one of these things is not like
the other" images gave you a pop-up menu of maybe 100-200 words. The user
needed to choose the name of the different object from that list. That
makes it a bit harder to guess. Of course, these sorts of tests suffer from
the same shortcoming as the randomly generated string. Visually impaired
people have trouble with it.

I finally settled on just reusing the SpamBayes engine to detect/reject spam
submissions.

Skip
 
