Script for finding words of any size that do NOT contain vowels with acute diacritic marks?

nwaits · Oct 17, 2012

I'm very impressed with Python's wordlist script for plain text. Is there a script for finding words that do NOT have certain diacritic marks, like acute or grave accents (UTF-8), over the vowels?
Thank you.

Dave Angel · Oct 17, 2012

If you can construct a list of "illegal" characters, then you can simply
check each character of the word against the list, and if it succeeds
for all of the characters, it's a winner.
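
A minimal sketch of the per-character check described above. The set `ILLEGAL`, the function name `is_clean`, and the sample words are assumptions made for illustration; the set covers only precomposed vowels with an acute or grave accent and would need extending for other characters.

```python
# Hypothetical set of "illegal" characters: precomposed vowels carrying
# an acute or grave accent (illustrative, not exhaustive).
ILLEGAL = set("áéíóúÁÉÍÓÚàèìòùÀÈÌÒÙ")

def is_clean(word):
    """Return True if no character of word is in ILLEGAL."""
    return all(ch not in ILLEGAL for ch in word)

# Sample words, assumed for the example.
words = ["elephant", "éléphant", "voilà", "naïve"]
clean = [w for w in words if is_clean(w)]
print(clean)
```

Using a set rather than a list keeps each membership test cheap, which matters when the wordlist is large.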

If that's not fast enough, you can build a translation table from the
list of illegal characters, and use translate on each word. Then it
becomes a question of checking if the translated word is all zeroes.
More setup time, but much faster looping for each word.
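
One possible way to sketch the translation-table idea in Python 3. This is a variant rather than exactly what is described above: instead of mapping characters to zeroes, the table deletes the illegal characters, and a word is accepted if nothing was deleted. `ILLEGAL`, `DELETE_ILLEGAL` and `is_clean` are names assumed for the example.

```python
# Hypothetical string of characters to reject (illustrative only).
ILLEGAL = "áéíóúÁÉÍÓÚàèìòùÀÈÌÒÙ"

# One-time setup: the third argument of str.maketrans() lists characters
# that are mapped to None, so str.translate() deletes them.
DELETE_ILLEGAL = str.maketrans("", "", ILLEGAL)

def is_clean(word):
    """Return True if translate() removed nothing from word."""
    return len(word.translate(DELETE_ILLEGAL)) == len(word)

print(is_clean("elephant"), is_clean("éléphant"))
```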

wxjmfauth · Oct 17, 2012

Lazy way (Python 3.2):

>>> import unicodedata
>>> def HasDiacritics(w):
...     w_decomposed = unicodedata.normalize('NFKD', w)
...     return 'no' if len(w) == len(w_decomposed) else 'yes'
...

This should be OK for the Combining Diacritical Marks Unicode block
(common diacritics).
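
Since the original question asks about specific marks (acute or grave), here is a sketch that decomposes each word and looks for those two combining characters only. The names `UNWANTED_MARKS` and `has_unwanted_mark` are assumptions for the example. It uses NFD rather than NFKD, because compatibility decomposition also expands characters such as ligatures, which are not diacritics. Note that it does not restrict the check to vowels: an accented consonant would be flagged too.

```python
import unicodedata

# Combining marks to reject: U+0300 (grave accent), U+0301 (acute accent).
UNWANTED_MARKS = {"\u0300", "\u0301"}

def has_unwanted_mark(word):
    """Return True if word contains an acute or grave accent."""
    decomposed = unicodedata.normalize("NFD", word)
    return any(ch in UNWANTED_MARKS for ch in decomposed)

# Sample words, assumed for the example.
words = ["elephant", "\u00e9l\u00e9phant", "voil\u00e0", "na\u00efve"]
kept = [w for w in words if not has_unwanted_mark(w)]
print(kept)
```

Because the test looks at the decomposed form, it gives the same answer whether the input uses precomposed characters or a base letter followed by a combining mark.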

jmf

Ian Kelly · Oct 17, 2012

...     w_decomposed = unicodedata.normalize('NFKD', w)
...     return 'no' if len(w) == len(w_decomposed) else 'yes'
...
'no'

Is there something wrong with True and False that you had to replace
them with strings?

"return len(w) != len(w_decomposed)" is all you need.
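
Spelled out as a complete function, the boolean version might look like the sketch below. The name `has_diacritics` is an assumption for the example. The length comparison is kept as posted, so it inherits its limits: any character that NFKD expands counts as a hit, and input that is already decomposed is not detected.

```python
import unicodedata

def has_diacritics(word):
    """Return True if NFKD decomposition makes word longer."""
    return len(word) != len(unicodedata.normalize("NFKD", word))

for w in ("\u00e9l\u00e9phant", "elephant"):
    if has_diacritics(w):
        print(w, "contains a character that decomposes")
```

Returning a real boolean lets the caller write `if has_diacritics(w):` directly.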

wxjmfauth · Oct 17, 2012

Not at all; I knew this. In this case I decided to write it
this way.

Do you get it? Yes/No or True/False

jmf

Chris Angelico · Oct 17, 2012

Yes, but why? When you're returning a boolean concept, why not return a
boolean value? You don't even use a pair of values where one
compares as true and the other as false (for instance,
you could write the function so that it returns just the
diacritic-containing characters, meaning it'll return "" if there
aren't any). To what benefit?
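
A rough sketch of that alternative, under the assumption that "diacritic-containing" means "is, or decomposes to include, a combining mark". The function name `diacritic_chars` is hypothetical.

```python
import unicodedata

def diacritic_chars(word):
    """Return the characters of word that are, or decompose to include,
    a combining mark; return "" if there are none."""
    return "".join(
        ch for ch in word
        # unicodedata.combining() returns 0 for non-combining characters.
        if any(unicodedata.combining(c)
               for c in unicodedata.normalize("NFD", ch))
    )

print(repr(diacritic_chars("\u00e9l\u00e9phant")))
print(repr(diacritic_chars("elephant")))
```

The result is truthy exactly when a diacritic was found, so it works in an `if` test while also telling the caller which characters matched.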

Puzzled.

ChrisA

Ian Kelly · Oct 17, 2012

It's just bad style, because both 'yes' and 'no' evaluate true.

if HasDiacritics('éléphant'):
    print('Correct!')

if HasDiacritics('elephant'):
    print('Error!')

Prints:

Correct!
Error!

You could replace the test with "if HasDiacritics('elephant') ==
'yes':", but why force the caller to write that out when the former
test is more natural and less prone to error (e.g. typoing 'yes')?

wxjmfauth · Oct 17, 2012

I *know* all this. In my previous message, the goal was to emphasize the
use of *unicodedata.normalize()*.

jmf
