Script for finding words of any size that do NOT contain vowels with acute diacritic marks?

Discussion in 'Python' started by nwaits, Oct 17, 2012.

  1. nwaits

    nwaits Guest

    I'm very impressed with python's wordlist script for plain text. Is there a script for finding words that do NOT have certain diacritic marks, like acute or grave accents (utf-8), over the vowels?
    Thank you.
     
    nwaits, Oct 17, 2012
    #1

  2. nwaits

    Dave Angel Guest

    Re: Script for finding words of any size that do NOT contain vowels with acute diacritic marks?

    On 10/17/2012 10:31 AM, nwaits wrote:
    > I'm very impressed with python's wordlist script for plain text. Is there a script for finding words that do NOT have certain diacritic marks, like acute or grave accents (utf-8), over the vowels?
    > Thank you.


    If you can construct a list of "illegal" characters, then you can simply
    check each character of the word against the list, and if it succeeds
    for all of the characters, it's a winner.
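A minimal sketch of this per-character check, assuming the "illegal" list is the precomposed vowels (lowercase and uppercase) with acute or grave accents. `ILLEGAL`, `is_winner`, and the sample words are hypothetical names chosen for illustration. The word is normalized to NFC first, so a word stored as a base letter plus a combining accent is still caught.

```python
import unicodedata

# Hypothetical "illegal" list: vowels carrying an acute or grave accent.
ILLEGAL = set("áéíóúÁÉÍÓÚàèìòùÀÈÌÒÙ")

def is_winner(word):
    """Return True if no character of `word` is in ILLEGAL."""
    # NFC composes 'e' + U+0301 into the single character 'é',
    # so decomposed input is checked against the same set.
    word = unicodedata.normalize("NFC", word)
    return all(ch not in ILLEGAL for ch in word)

words = ["elephant", "éléphant", "cafe\u0301", "pere", "père"]
print([w for w in words if is_winner(w)])  # ['elephant', 'pere']
```

Diaeresis or tilde accents (e.g. "naïve") pass this check, because only the characters listed in `ILLEGAL` are rejected.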

    If that's not fast enough, you can build a translation table from the
    list of illegal characters, and use translate on each word. Then it
    becomes a question of checking if the translated word is all zeroes.
    More setup time, but much faster looping for each word.
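In Python 3, `str.translate` takes a mapping rather than a 256-entry byte table, so the "all zeroes" test does not carry over directly. One hedged variant of the same idea builds the table once with `str.maketrans`, deletes the illegal characters, and checks whether anything was removed. The names are again hypothetical, and no particular speed-up is claimed here.

```python
import unicodedata

# Hypothetical "illegal" list: vowels carrying an acute or grave accent.
ILLEGAL = "áéíóúÁÉÍÓÚàèìòùÀÈÌÒÙ"

# Built once: the third argument of str.maketrans lists characters
# that str.translate should delete (they map to None).
DELETE_ILLEGAL = str.maketrans("", "", ILLEGAL)

def is_winner(word):
    """Return True if translating `word` deletes nothing."""
    # NFC so that 'e' + U+0300 becomes 'è' and can be matched.
    word = unicodedata.normalize("NFC", word)
    return word.translate(DELETE_ILLEGAL) == word

words = ["elephant", "éléphant", "pere", "père"]
print([w for w in words if is_winner(w)])  # ['elephant', 'pere']
```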

    --

    DaveA
     
    Dave Angel, Oct 17, 2012
    #2

  3. nwaits

    Guest

    Re: Script for finding words of any size that do NOT contain vowels with acute diacritic marks?

    On Wednesday, October 17, 2012 17:00:46 UTC+2, Dave Angel wrote:
    > On 10/17/2012 10:31 AM, nwaits wrote:
    > > I'm very impressed with python's wordlist script for plain text. Is there a script for finding words that do NOT have certain diacritic marks, like acute or grave accents (utf-8), over the vowels?
    > > Thank you.
    >
    > if you can construct a list of "illegal" characters, then you can simply
    > check each character of the word against the list, and if it succeeds
    > for all of the characters, it's a winner.
    >
    > If that's not fast enough, you can build a translation table from the
    > list of illegal characters, and use translate on each word. Then it
    > becomes a question of checking if the translated word is all zeroes.
    > More setup time, but much faster looping for each word.
    >
    > --
    > DaveA


    Lazy way.
    Py3.2

    >>> import unicodedata
    >>> def HasDiacritics(w):
    ...     w_decomposed = unicodedata.normalize('NFKD', w)
    ...     return 'no' if len(w) == len(w_decomposed) else 'yes'
    ...
    >>> HasDiacritics('éléphant')
    'yes'
    >>> HasDiacritics('elephant')
    'no'
    >>> HasDiacritics('\N{LATIN CAPITAL LETTER U WITH DIAERESIS AND MACRON}')
    'yes'
    >>> HasDiacritics('U')
    'no'
    >>>


    Should be OK for the Combining Diacritical Marks Unicode range
    (the common diacritics).
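The length comparison reports *any* decomposition, not only the acute or grave accents over vowels that the original question asks about. Also, a word that is already decomposed (e.g. 'e' followed by U+0301) keeps its length under NFKD, so it would come back 'no'. A narrower sketch, assuming the vowels of interest are plain a, e, i, o, u, decomposes with NFD and looks for U+0301 (combining acute) or U+0300 (combining grave) attached to a vowel. `has_accented_vowel` is a hypothetical name.

```python
import unicodedata

ACCENTS = {"\u0301", "\u0300"}   # combining acute, combining grave
VOWELS = set("aeiouAEIOU")       # assumption: plain Latin vowels only

def has_accented_vowel(word):
    """True if some vowel in `word` carries an acute or grave accent."""
    base = ""
    # NFD splits 'é' into 'e' + U+0301, so the accent becomes its own character.
    for ch in unicodedata.normalize("NFD", word):
        if unicodedata.combining(ch) == 0:
            base = ch              # a new base character starts here
        elif ch in ACCENTS and base in VOWELS:
            return True            # accent attached to a vowel
    return False

words = ["éléphant", "elephant", "naïve", "niño", "e\u0301te\u0301"]
print([w for w in words if not has_accented_vowel(w)])  # ['elephant', 'naïve', 'niño']
```

Tracking the last base character (rather than only the previous code point) means a vowel with stacked marks, such as 'ǘ' (u + diaeresis + acute), is still detected.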

    jmf
     
    , Oct 17, 2012
    #3
  5. nwaits

    Ian Kelly Guest

    Re: Script for finding words of any size that do NOT contain vowels with acute diacritic marks?

    On Wed, Oct 17, 2012 at 9:32 AM, <> wrote:
    >>>> import unicodedata
    >>>> def HasDiacritics(w):
    > ...     w_decomposed = unicodedata.normalize('NFKD', w)
    > ...     return 'no' if len(w) == len(w_decomposed) else 'yes'
    > ...
    >>>> HasDiacritics('éléphant')
    > 'yes'
    >>>> HasDiacritics('elephant')
    > 'no'
    >>>> HasDiacritics('\N{LATIN CAPITAL LETTER U WITH DIAERESIS AND MACRON}')
    > 'yes'
    >>>> HasDiacritics('U')
    > 'no'


    Is there something wrong with True and False that you had to replace
    them with strings?

    "return len(w) != len(w_decomposed)" is all you need.
     
    Ian Kelly, Oct 17, 2012
    #5
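Ian's suggested one-liner, written out as a complete function, returns a real `bool`, so it works directly in `if` tests and as a filter over a word list. The lowercase name `has_diacritics` and the sample list are illustrative choices, not from the thread.

```python
import unicodedata

def has_diacritics(w):
    """True if NFKD normalization changes the length of `w`."""
    w_decomposed = unicodedata.normalize("NFKD", w)
    return len(w) != len(w_decomposed)

words = ["éléphant", "elephant", "U",
         "\N{LATIN CAPITAL LETTER U WITH DIAERESIS AND MACRON}"]
print([w for w in words if not has_diacritics(w)])  # ['elephant', 'U']
```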
  6. nwaits

    Guest

    Re: Script for finding words of any size that do NOT contain vowels with acute diacritic marks?

    On Wednesday, October 17, 2012 19:07:43 UTC+2, Ian wrote:
    > On Wed, Oct 17, 2012 at 9:32 AM, <> wrote:
    > >>>> import unicodedata
    > >>>> def HasDiacritics(w):
    > > ...     w_decomposed = unicodedata.normalize('NFKD', w)
    > > ...     return 'no' if len(w) == len(w_decomposed) else 'yes'
    > > ...
    > >>>> HasDiacritics('éléphant')
    > > 'yes'
    > >>>> HasDiacritics('elephant')
    > > 'no'
    > >>>> HasDiacritics('\N{LATIN CAPITAL LETTER U WITH DIAERESIS AND MACRON}')
    > > 'yes'
    > >>>> HasDiacritics('U')
    > > 'no'
    >
    > Is there something wrong with True and False that you had to replace
    > them with strings?
    >
    > "return len(w) != len(w_decomposed)" is all you need.


    Not at all; I knew this. In this case, I chose to write it this
    way.

    Do you get it? Yes/No or True/False

    jmf
     
    , Oct 17, 2012
    #6
  8. Re: Script for finding words of any size that do NOT contain vowels with acute diacritic marks?

    On Thu, Oct 18, 2012 at 5:17 AM, <> wrote:
    > Not at all, I knew this. In this I decided to program like
    > this.
    >
    > Do you get it? Yes/No or True/False


    Yes, but why? When you're returning a boolean concept, why not return a
    boolean value? You aren't even using values where one compares as true
    and the other as false (for instance, you could write the function so
    that it returns just the diacritic-containing characters, meaning it
    would return "" if there aren't any). To what benefit?
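A sketch of the alternative Chris describes, assuming "diacritic-containing" means "NFKD splits this character into more than one code point" (the same test as jmf's, applied per character). `diacritic_chars` is a hypothetical name. The result is "" (false) when there is nothing to report and a non-empty string (true) otherwise, so callers can still write a plain `if`.

```python
import unicodedata

def diacritic_chars(w):
    """Return the characters of `w` that NFKD decomposes, or "" if none."""
    # NFC first, so a decomposed 'e' + U+0301 is treated as one character 'é'.
    w = unicodedata.normalize("NFC", w)
    return "".join(ch for ch in w if len(unicodedata.normalize("NFKD", ch)) > 1)

print(repr(diacritic_chars("éléphant")))  # 'éé'
print(repr(diacritic_chars("elephant")))  # ''

if not diacritic_chars("elephant"):
    print("no diacritics")
```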

    Puzzled.

    ChrisA
     
    Chris Angelico, Oct 17, 2012
    #8
  9. nwaits

    Ian Kelly Guest

    Re: Script for finding words of any size that do NOT contain vowels with acute diacritic marks?

    On Wed, Oct 17, 2012 at 12:17 PM, <> wrote:
    > Not at all, I knew this. In this I decided to program like
    > this.
    >
    > Do you get it? Yes/No or True/False


    It's just bad style, because both 'yes' and 'no' evaluate true.

    if HasDiacritics('éléphant'):
        print('Correct!')

    if HasDiacritics('elephant'):
        print('Error!')

    Prints:

    Correct!
    Error!

    You could replace the test with "if HasDiacritics('elephant') ==
    'yes':", but why force the caller to write that out when the former
    test is more natural and less prone to error (e.g. typoing 'yes')?
     
    Ian Kelly, Oct 17, 2012
    #9
  10. nwaits

    Guest

    Re: Script for finding words of any size that do NOT contain vowels with acute diacritic marks?

    On Wednesday, October 17, 2012 20:28:21 UTC+2, Ian wrote:
    > On Wed, Oct 17, 2012 at 12:17 PM, <> wrote:
    > > Not at all, I knew this. In this I decided to program like
    > > this.
    > >
    > > Do you get it? Yes/No or True/False
    >
    > It's just bad style, because both 'yes' and 'no' evaluate true.
    >
    > if HasDiacritics('éléphant'):
    >     print('Correct!')
    >
    > if HasDiacritics('elephant'):
    >     print('Error!')
    >
    > Prints:
    >
    > Correct!
    > Error!
    >
    > You could replace the test with "if HasDiacritics('elephant') ==
    > 'yes':", but why force the caller to write that out when the former
    > test is more natural and less prone to error (e.g. typoing 'yes')?


    I *know* all this. In my previous message, the goal was to emphasize
    the use of *unicodedata.normalize()*.
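Since the point of the example was `unicodedata.normalize()`, a quick look at how its four forms change string length helps explain both the length trick and its limits. The three samples are chosen here for illustration: a precomposed 'é', the same letter written as 'e' plus U+0301, and the compatibility ligature U+FB01 ('ﬁ').

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

samples = {
    "precomposed é": "\u00e9",
    "decomposed é": "e\u0301",
    "fi ligature": "\ufb01",
}

# Print the original length and the length after each normalization form.
for label, s in samples.items():
    lengths = {form: len(unicodedata.normalize(form, s)) for form in FORMS}
    print(label, len(s), lengths)
```

Comparing `len(s)` with the NFKD length therefore says 'yes' for the precomposed letter (1 vs 2), 'no' for the already-decomposed one (2 vs 2), and 'yes' for the ligature (1 vs 2) even though it carries no accent.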

    jmf
     
    , Oct 17, 2012
    #10

Similar Threads
  1. Phil Slater
    Replies:
    8
    Views:
    1,099
    Howard
    May 17, 2004
  2. ChrisN
    Replies:
    0
    Views:
    848
    ChrisN
    Sep 19, 2006
  3. Matt
    Replies:
    7
    Views:
    1,594
    Oliver Wong
    Jan 2, 2007
  4. ivalki
    Replies:
    5
    Views:
    1,068
    Chris ( Val )
    Jan 16, 2007
  5. Replies:
    3
    Views:
    95