Script for finding words of any size that do NOT contain vowels with acute diacritic marks?

nwaits

I'm very impressed with Python's wordlist script for plain text. Is there a script for finding words that do NOT have certain diacritic marks, such as acute or grave accents (UTF-8), over the vowels?
Thank you.
 
Dave Angel

I'm very impressed with python's wordlist script for plain text. Is there a script for finding words that do NOT have certain diacritic marks, like acute or grave accents (utf-8), over the vowels?
Thank you.

If you can construct a list of "illegal" characters, then you can simply
check each character of the word against the list, and if it succeeds
for all of the characters, it's a winner.

If that's not fast enough, you can build a translation table from the
list of illegal characters, and use translate on each word. Then it
becomes a question of checking if the translated word is all zeroes.
More setup time, but much faster looping for each word.
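
A minimal sketch of both ideas, for Python 3. The `ILLEGAL` set and the function names are made up for illustration, and the translate variant differs from the description above: it deletes the illegal characters and checks whether the word changed, rather than checking for zeroes. Words are normalized to NFC first, on the assumption that accented vowels may arrive either precomposed or as base letter plus combining mark.

```python
import unicodedata

# Hypothetical "illegal" characters: vowels with acute or grave accents.
# Normalizing to NFC guarantees one code point per accented vowel.
ILLEGAL = set(unicodedata.normalize("NFC", "áéíóúÁÉÍÓÚàèìòùÀÈÌÒÙ"))

def is_clean(word):
    """Return True if no character of word is in the illegal set."""
    word = unicodedata.normalize("NFC", word)
    return not any(ch in ILLEGAL for ch in word)

# Translation-table variant: map every illegal character to None (delete it).
# If deleting changes nothing, the word had no illegal characters.
DELETE_ILLEGAL = str.maketrans("", "", "".join(ILLEGAL))

def is_clean_translate(word):
    """Same test as is_clean, using str.translate instead of a loop."""
    word = unicodedata.normalize("NFC", word)
    return word.translate(DELETE_ILLEGAL) == word

words = ["éléphant", "elephant", "voilà", "python"]
clean = [w for w in words if is_clean(w)]
print(clean)
```

Whether the translate version is actually faster depends on the word list, so it is worth timing both before committing to one.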
 
wxjmfauth

On Wednesday, 17 October 2012 at 17:00:46 UTC+2, Dave Angel wrote:
if you can construct a list of "illegal" characters, then you can simply
check each character of the word against the list, and if it succeeds
for all of the characters, it's a winner.

If that's not fast enough, you can build a translation table from the
list of illegal characters, and use translate on each word. Then it
becomes a question of checking if the translated word is all zeroes.
More setup time, but much faster looping for each word.

--
DaveA

Lazy way.
Py3.2
>>> import unicodedata
>>> def HasDiacritics(w):
...     w_decomposed = unicodedata.normalize('NFKD', w)
...     return 'no' if len(w) == len(w_decomposed) else 'yes'
...
Should be ok for the CombiningDiacriticalMarks unicode range
(common diacritics)
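
Since the question asks about acute and grave accents specifically, here is a narrower sketch. It assumes canonical decomposition (NFD) is enough and looks only for U+0301 (combining acute) and U+0300 (combining grave); the function name is hypothetical. One caveat with the length comparison: NFKD also expands compatibility characters such as the ligature U+FB01 to "fi", which changes the length without any diacritic being involved.

```python
import unicodedata

COMBINING_GRAVE = "\u0300"
COMBINING_ACUTE = "\u0301"

def has_acute_or_grave(word):
    """Return True if word contains an acute or grave accent.

    NFD splits precomposed letters into base letter + combining marks,
    so the marks can be searched for directly.
    """
    decomposed = unicodedata.normalize("NFD", word)
    return COMBINING_ACUTE in decomposed or COMBINING_GRAVE in decomposed

for w in ["éléphant", "elephant", "voilà", "naïve"]:
    print(w, has_acute_or_grave(w))
```

Other marks, such as the diaeresis in "naïve", are ignored by this check, so such words count as having no acute or grave accent.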

jmf
 
Ian Kelly

... w_decomposed = unicodedata.normalize('NFKD', w)
... return 'no' if len(w) == len(w_decomposed) else 'yes'
...
'no'

Is there something wrong with True and False that you had to replace
them with strings?

"return len(w) != len(w_decomposed)" is all you need.
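
Spelled out as a complete function, that might look like the sketch below. The name `has_diacritics` and the sample words are only for illustration, and the length test assumes the input is in precomposed form: a word that is already decomposed has the same length before and after NFKD.

```python
import unicodedata

def has_diacritics(word):
    """Return True if NFKD normalization changes the length of word."""
    return len(word) != len(unicodedata.normalize("NFKD", word))

words = ["éléphant", "elephant", "café", "cafe"]

# A boolean result plugs straight into a filter, with no string comparison.
without_diacritics = [w for w in words if not has_diacritics(w)]
print(without_diacritics)
```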
 
wxjmfauth

On Wednesday, 17 October 2012 at 19:07:43 UTC+2, Ian wrote:
Is there something wrong with True and False that you had to replace
them with strings?

"return len(w) != len(w_decomposed)" is all you need.

Not at all, I knew this. In this case I decided to program it like
this.

Do you get it? Yes/No or True/False

jmf
 
Chris Angelico

Not at all, I knew this. In this case I decided to program it like
this.

Do you get it? Yes/No or True/False

Yes, but why? When you're returning a boolean concept, why not return a
boolean value? You don't even use values with one that
compares-as-true and the other that compares-as-false (for instance,
you could write the function so that it returns just the
diacritic-containing characters, meaning it'll return "" if there
aren't any). To what benefit?
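
For instance, a rough sketch of that variant (the function name is made up, and it assumes "diacritic-containing" means "decomposes to include a combining mark"):

```python
import unicodedata

def diacritic_chars(word):
    """Return the characters of word that carry a combining mark.

    The result is "" (falsy) when there are none, so it can still be
    used directly in an if statement.
    """
    # Compose first so each accented letter is a single character.
    word = unicodedata.normalize("NFC", word)
    return "".join(
        ch for ch in word
        # combining() returns 0 for characters that are not combining marks.
        if any(unicodedata.combining(c)
               for c in unicodedata.normalize("NFD", ch))
    )

if diacritic_chars("éléphant"):
    print("has diacritics")
if not diacritic_chars("elephant"):
    print("plain")
```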

Puzzled.

ChrisA
 
Ian Kelly

Not at all, I knew this. In this case I decided to program it like
this.

Do you get it? Yes/No or True/False

It's just bad style, because both 'yes' and 'no' evaluate true.

if HasDiacritics('éléphant'):
    print('Correct!')

if HasDiacritics('elephant'):
    print('Error!')

Prints:

Correct!
Error!

You could replace the test with "if HasDiacritics('elephant') ==
'yes':", but why force the caller to write that out when the former
test is more natural and less prone to error (e.g. typoing 'yes')?
 
wxjmfauth

On Wednesday, 17 October 2012 at 20:28:21 UTC+2, Ian wrote:
It's just bad style, because both 'yes' and 'no' evaluate true.

if HasDiacritics('éléphant'):
    print('Correct!')

if HasDiacritics('elephant'):
    print('Error!')

Prints:

Correct!
Error!

You could replace the test with "if HasDiacritics('elephant') ==
'yes':", but why force the caller to write that out when the former
test is more natural and less prone to error (e.g. typoing 'yes')?

I *know* all this. In my previous message, the goal was to emphasize the
use of unicodedata.normalize().

jmf
 