Programming games in historical linguistics with Python

Dax Bloom · Nov 30, 2010

Hello,

Following a discussion that began 3 weeks ago I would like to ask a
question regarding substitution of letters according to grammatical
rules in historical linguistics. I would like to automate the
transformation of words according to complex rules of phonology and
integrate that script in a visual environment.
Here follows the previous thread:
http://groups.google.com/group/comp...t&q=evolutionary+linguistics#fe7c2c82ecf0dbf5

Is there a way to refer to vowels and consonants as a subcategory of
text? Is there a function to remove all vowels? How should one create
and order the dictionary file for the rules? How can several
transformations be chained automatically from multiple rules? Finally,
can anyone show me which existing Python program or phonological
software can do this?

What function could tag syllables, the word nucleus and the codas? How
easy is it to bridge this with a more visual environment where
interlinear, aligned text can be displayed with Greek notations and
braces as usual in the phonology textbooks?

Best regards,

Dax Bloom

Vlastimil Brom · Dec 1, 2010

Hi,
as far as I know, there is no predefined function or library for
distinguishing vowels or consonants, but these can be simply
implemented individually according to the exact needs.
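As a rough illustration of what "implemented individually" could mean, here is a minimal sketch that maps a word to its consonant/vowel skeleton. The vowel inventory and the name cv_skeleton are assumptions made up for this example; a real project would define the classes per language.

```python
# Assumed vowel inventory; adjust it to the language being modelled.
VOWELS = set("aeiou")

def cv_skeleton(word):
    """Replace each vowel with 'V' and each other letter with 'C'."""
    out = []
    for ch in word.lower():
        if ch in VOWELS:
            out.append("V")
        elif ch.isalpha():
            out.append("C")
        else:
            out.append(ch)  # keep hyphens, spaces, etc. unchanged
    return "".join(out)
```

With these assumptions, cv_skeleton("strength") gives "CCCVCCCC".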

For example, regular expressions can be used here. To remove vowels, the
code could be (example from the command prompt):

>>> import re
>>> re.sub(r"(?i)[aeiouy]", "", "This is a SAMPLE TEXT")
'Ths s  SMPL TXT'

See http://docs.python.org/library/re.html
or
http://www.regular-expressions.info/
for the regexp features.

You may also want to try the new development version of the regex
module, which adds many interesting new features and removes some
limitations:
http://bugs.python.org/issue2636

In some cases regular expressions aren't really appropriate or may
become too complicated.
Sometimes a parsing library like pyparsing may be a more adequate tool:
http://pyparsing.wikispaces.com/

If the rules are simple enough that they can be formulated for single
characters or character clusters with a regular expression, you can
model the phonological changes as a series of replacements, each with
a matching pattern and the respective replacement pattern.
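A minimal sketch of such a series of replacements might look like the following. The three rules are invented toy rules, not a description of any real sound change, and the names RULES and apply_rules are hypothetical. A list of tuples is used so that the order of application is explicit.

```python
import re

# Hypothetical toy rules, applied top to bottom: (label, pattern, replacement).
RULES = [
    ("palatalisation", r"k([ei])", r"c\1"),   # k > c before e or i
    ("final vowel loss", r"[aeiou]$", ""),    # drop a word-final vowel
    ("degemination", r"(.)\1", r"\1"),        # collapse doubled segments
]

def apply_rules(word, rules=RULES):
    """Apply each rule in order and return every intermediate stage."""
    stages = [word]
    for _label, pattern, repl in rules:
        word = re.sub(pattern, repl, word)
        stages.append(word)
    return stages
```

For instance, apply_rules("matta") returns ['matta', 'matta', 'matt', 'mat']: the final vowel is lost first, and only then can the doubled consonant be simplified.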

For character-wise matching and replacing, regular expressions are
very effective; using lookarounds
http://www.regular-expressions.info/lookaround.html
even some combinatorics for conditional changes can be expressed.
However, I would find more complex conditions, suprasegmentals,
morpheme boundaries etc. rather difficult to formalise this way...
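To illustrate the lookaround idea, here is a small sketch of one conditional change: s becomes z only between two vowels. The vowel set and the function name are assumptions for the example, and the rule itself is only a stand-in for whatever change you need.

```python
import re

VOWELS = "aeiou"  # assumed vowel inventory

# The lookbehind and lookahead test the neighbouring characters without
# consuming them, so a vowel can serve as context for two matches in a row.
INTERVOCALIC_S = re.compile(r"(?<=[%s])s(?=[%s])" % (VOWELS, VOWELS))

def voice_intervocalic_s(word):
    """Replace s with z when it stands between two vowels."""
    return INTERVOCALIC_S.sub("z", word)
```

With this, "rosa" becomes "roza", "casas" becomes "cazas" (the final s has no following vowel), and "stella" is left unchanged.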

hth,
vbr

Gnarlodious · Dec 1, 2010

Have you considered entering all this data into an SQLite database?
You could do fast searches based on any features you deem relevant to
the phoneme. Using an SQLite editor application you can get started
building a database right away. You can add columns as you get the
inspiration, along with any tags you want. Putting it all in database
tables can really make chaotic linguistic data seem manageable.
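A minimal sketch of what I mean, using the sqlite3 module from the standard library and an in-memory database. The table layout, the column names and the helper symbols_where are all made up for illustration; the real columns would depend on the features you care about.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for a persistent database
conn.execute(
    "CREATE TABLE phoneme ("
    "symbol TEXT PRIMARY KEY, kind TEXT, voiced INTEGER, place TEXT)"
)
# Hypothetical sample rows; place is left NULL for vowels.
rows = [
    ("p", "consonant", 0, "bilabial"),
    ("b", "consonant", 1, "bilabial"),
    ("t", "consonant", 0, "alveolar"),
    ("d", "consonant", 1, "alveolar"),
    ("a", "vowel", 1, None),
    ("i", "vowel", 1, None),
]
conn.executemany("INSERT INTO phoneme VALUES (?, ?, ?, ?)", rows)

def symbols_where(kind, voiced):
    """Return the symbols with the given kind and voicing, sorted."""
    cur = conn.execute(
        "SELECT symbol FROM phoneme WHERE kind = ? AND voiced = ? "
        "ORDER BY symbol",
        (kind, voiced),
    )
    return [row[0] for row in cur]
```

Here symbols_where("consonant", 1) returns ['b', 'd'], and new feature columns can be added later with ALTER TABLE.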

My own linguistics project uses mostly SQLite and a number of
OrderedDicts based on .plist files. It is all working very nicely,
although I haven't tried to deal with any phonetics (yet).

-- Gnarlie
http://Sectrum.com

Steven D'Aprano · Dec 1, 2010

Dax Bloom said:
Is there a way to refer to vowels and consonants as a subcategory of
text? Is there a function to remove all vowels? How should one create
and order the dictionary file for the rules? How can several
transformations be chained automatically from multiple rules? Finally,
can anyone show me which existing Python program or phonological
software can do this?

Have you looked at NLTK?

http://www.nltk.org/

The questions you are asking are awfully specific for a general
programming forum like this. You might have better luck asking on an NLTK
forum, or possibly if you could pose your questions in ways that don't
assume extensive familiarity with linguistics ("word nucleus"? "codas"?).

Is there a function to remove vowels -- not specifically, but provided
you can tell us what characters you wish to treat as vowels, and provided
you are satisfied with a fairly simple search-and-replace, this will do
it:

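# For Python 2.x (string.maketrans and string.translate are Python 2 only)
import string

def disemvowel(s):
    """Quick and dirty disemvoweller."""
    tbl = string.maketrans('', '')  # identity table: no characters remapped
    return string.translate(s, tbl, "aeiouAEIOU")  # third argument lists characters to delete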
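The function above relies on Python 2's string module. On Python 3 a roughly equivalent sketch would use the str.maketrans and str.translate methods instead; the module-level names VOWELS and _DELETE_VOWELS are just choices made for this example.

```python
# For Python 3.x
VOWELS = "aeiouAEIOU"  # same simple vowel set as the Python 2 version

# The third argument to str.maketrans lists characters to delete.
_DELETE_VOWELS = str.maketrans("", "", VOWELS)

def disemvowel(s):
    """Quick and dirty disemvoweller for Python 3 strings."""
    return s.translate(_DELETE_VOWELS)
```

For example, disemvowel("Python") gives "Pythn", since Y is not in the vowel set here.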

If you want a function that can distinguish between Y being used as a
vowel from Y being used as a consonant, you'll need something much more
sophisticated -- try the NLTK.
