Uncle_Fester
I want to test for "things that look more or less like real English
words" from parsed hypertext.
I know that
while ($text =~ /([A-Za-z0-9_\'\-]+)/g )
will catch most of what I want most of the time.
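For reference, here is that loop as a small runnable Perl script. The sample $text is invented for illustration; in practice it would hold the text pulled out of the parsed hypertext. Inside a character class the apostrophe needs no backslash, and a '-' placed last is taken literally, so [A-Za-z0-9_'-] matches the same set of characters as the original class.

```perl
use strict;
use warnings;

# Made-up sample input standing in for text extracted from parsed hypertext.
my $text = "Don't re-use the well-known foo_bar token, 42 times.";

my @words;
# Same character set as the original class: letters, digits, underscore,
# apostrophe and hyphen. Punctuation such as ',' and '.' ends a token.
while ($text =~ /([A-Za-z0-9_'-]+)/g) {
    push @words, $1;
}

print join('|', @words), "\n";    # prints "Don't|re-use|the|well-known|foo_bar|token|42|times"
```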
The tricky bit is this:
How might I allow 'oo' and 'ee' but not 'ff' or '--'?
How might I exclude patterns like '_________' or '010101010101'?
Any thoughts?
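One possible approach, sketched under assumptions rather than offered as a definitive answer: tokenize as before, then filter each token through a few heuristic rules. The helper name looks_like_word and the three rules are hypothetical choices made to fit the two questions above. A backreference (\1) catches any character that appears twice in a row, a negated class exempts 'o' and 'e', and a "must contain a letter" rule throws out runs of underscores or digits.

```perl
use strict;
use warnings;

# Hypothetical helper: a rough heuristic for "looks like an English word".
# The rules below are assumptions chosen to match the question, not a standard.
sub looks_like_word {
    my ($tok) = @_;
    my $lc = lc $tok;    # compare case-insensitively, so 'Off' counts as 'off'

    # Rule 1: needs at least one letter; rejects '_________', '010101010101', '--'.
    return 0 unless $lc =~ /[a-z]/;

    # Rule 2: any character other than 'o' or 'e' appearing twice in a row
    # ('ff', '--', '__') is rejected; 'oo' and 'ee' are allowed.
    # To allow more doubled letters, add them to the class, e.g. [^oelst].
    return 0 if $lc =~ /([^oe])\1/;

    # Rule 3: reject a 1-3 character unit repeated three or more times
    # ('abcabcabc', 'eee').
    return 0 if $lc =~ /^(.{1,3})\1{2,}$/;

    return 1;
}

my $text = "Coffee? Off! See the book: ________ 010101010101 -- abcabcabc don't re-use";

# Tokenize with the same character class as before, then filter.
my @kept = grep { looks_like_word($_) } ($text =~ /([A-Za-z0-9_'-]+)/g);

print join('|', @kept), "\n";    # prints "See|the|book|don't|re-use"
```

Note that with only 'oo' and 'ee' allowed, ordinary words such as 'coffee', 'off' and 'well' are rejected too, since English has plenty of legitimate doubled letters. Widening the class in rule 2 is the main trade-off to tune.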