Using Python for a demonstration in historical linguistics

Dax Bloom

Hello,

In the framework of a project on evolutionary linguistics I wish to
have a program to process words and simulate the effect of sound
shift, for instance following the Rask's-Grimm's rule. I look to have
python take a dictionary file or a string input and replace the
consonants in it with the Grimm rule equivalent. For example:
bʰ → b → p → f
dʰ → d → t → θ
gʰ → g → k → x
gʷʰ → gʷ → kʷ → xʷ
If the dictionary file has the word "Abe" I want the program to
replace the letter b with f forming the word "Afe" and write the
result in a tabular file. How easy is it to find the python functions
to do that?

Best regards,

Dax Bloom
 
Chris Rebert

Tabular files:
http://docs.python.org/library/csv.html

Character substitution:
(a) http://docs.python.org/library/string.html#string.maketrans and
http://docs.python.org/library/stdtypes.html#str.translate
(b) http://docs.python.org/library/stdtypes.html#str.replace
In either case, learn about dicts:
http://docs.python.org/library/stdtypes.html#dict
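A minimal Python 3 sketch of how the csv and str.translate pieces above fit together. It assumes the word list is already in memory, collapses each chain to its final step (b → f, d → θ, g → x) to match the "Abe" → "Afe" example, and uses hypothetical names (SHIFT, shift_word, write_table). str.translate maps single characters only, so multi-character sounds such as "bʰ" would need the re module instead.

```python
import csv
import io

# Hypothetical mapping: each chain from the post collapsed to its final step,
# matching the "Abe" -> "Afe" example. str.maketrans accepts a dict whose
# keys are single characters.
SHIFT = str.maketrans({"b": "f", "d": "θ", "g": "x"})


def shift_word(word):
    """Return the word with every mapped consonant replaced."""
    return word.translate(SHIFT)


def write_table(words, out):
    """Write a tab-separated table with columns: original form, shifted form."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["original", "shifted"])
    for word in words:
        writer.writerow([word, shift_word(word)])


buf = io.StringIO()
write_table(["Abe", "dog"], buf)
# Prints three tab-separated rows: original/shifted, Abe/Afe, dog/θox
print(buf.getvalue(), end="")
```

To write a real tabular file, pass a file opened with open("shifted.tsv", "w", newline="", encoding="utf-8") instead of the StringIO buffer; the file name is only an example.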

Cheers,
Chris
 
MRAB

Dax Bloom said:
....
How easy is it to find the python functions
to do that?

Very. :)

I'd build a dict of each rule:

bʰ → b
b → p

etc., and then use the re module to perform the replacements in one
pass, looking up the new sound for each match.
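A minimal Python 3 sketch of this one-pass idea, assuming a single rule set made of the first step of each chain in the original post (RULES and apply_rules are hypothetical names). Keys are sorted longest-first so the regex tries "gʷʰ" before "gʷ" and "g"; since every match is looked up in the dict and the replaced text is not rescanned, a "b" produced from "bʰ" is not shifted again to "p" in the same pass.

```python
import re

# Hypothetical rule set: the first step of each chain in the post (old -> new).
RULES = {
    "bʰ": "b", "dʰ": "d", "gʰ": "g", "gʷʰ": "gʷ",
    "b": "p", "d": "t", "g": "k", "gʷ": "kʷ",
}


def apply_rules(text, rules):
    """Replace every rule key in one left-to-right pass over the text."""
    # Longest keys first, so that "gʷʰ" wins over "gʷ" and "g".
    alternatives = sorted(rules, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(a) for a in alternatives))
    # Each match is looked up in the dict; replaced text is never rescanned.
    return pattern.sub(lambda m: rules[m.group()], text)


print(apply_rules("bʰeb", RULES))   # bep
print(apply_rules("gʷʰen", RULES))  # gʷen
```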
 
Peter Otten

>>> s = """\
... In the framework of a project on evolutionary linguistics I wish to
... have a program to process words and simulate the effect of sound
... shift, for instance following the Rask's-Grimm's rule. I look to have
... python take a dictionary file or a string input and replace the
... consonants in it with the Grimm rule equivalent. For example:
... """
>>> rules = ["bpf", ("d", "t", "th"), "gkx"]
>>> for rule in rules:
...     rule = rule[::-1] # go back in time
...     for i in range(len(rule)-1):
...         s = s.replace(rule[i], rule[i+1])
...
>>> print s
In de brameworg ob a brojecd on evoludionary linguisdics I wish do
have a brogram do brocess words and simulade de ebbecd ob sound
shibd, bor insdance bollowing de Rasg's-Grimm's rule. I loog do have
bydon dage a dicdionary bile or a sdring inbud and reblace de
consonands in id wid de Grimm rule equivalend. For egamble:

;)

If you are using non-ASCII characters like θ you should use unicode instead
of str. Basically this means writing string constants as u"..." instead of
"..." and opening your files with

import codecs
f = codecs.open(filename, encoding="utf-8")

instead of

f = open(filename)

Peter
 
Steven D'Aprano

Peter Otten said:
If you are using nonascii characters like θ you should use unicode
instead of str. Basically this means writing string constants as u"..."
instead of "..."


Or using Python 3.1 instead of 2.x.
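In Python 3 every str is Unicode, so the u"..." prefix is unnecessary and codecs.open() can be replaced by the built-in open() with an explicit encoding. A minimal sketch, assuming a UTF-8 word list with one word per line; read_words is a hypothetical helper and the temporary file merely stands in for a real dictionary file:

```python
import os
import tempfile


def read_words(path):
    """Return the non-empty lines of a UTF-8 text file as a list of str."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


# Demonstration with a throwaway file containing non-ASCII characters.
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt",
                                 delete=False) as tmp:
    tmp.write("bʰrāter\nθ\n")
    path = tmp.name

try:
    print(read_words(path))  # ['bʰrāter', 'θ']
finally:
    os.remove(path)
```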
 
garabik-news-2005-05

Dax Bloom said:
I look to have
python take a dictionary file or a string input and replace the
consonants in it with the Grimm rule equivalent. ....
How easy is it to find the python functions
to do that?

http://code.activestate.com/recipes/81330-single-pass-multiple-replace/

--
Radovan Garabík http://kassiopeia.juls.savba.sk/~garabik/
 
Peter Otten

Peter said:
... In the framework of a project on evolutionary linguistics I wish to
... have a program to process words and simulate the effect of sound
... shift, for instance following the Rask's-Grimm's rule. I look to have
... python take a dictionary file or a string input and replace the
... consonants in it with the Grimm rule equivalent. For example:
... """
>>> rules = ["bpf", ("d", "t", "th"), "gkx"]
>>> for rule in rules:
...     rule = rule[::-1] # go back in time
...     for i in range(len(rule)-1):
...         s = s.replace(rule[i], rule[i+1])
...


Warning: this simple-minded approach somewhat limits the possible rules.
E.g. it fails for the pair of rules

a --> b
b --> a

applied to "abba": the chained replacements give

'aaaa'

while unicode.translate() can deal with it:
u'baab'

Or, if you are using Python 3.x as Steven suggested:
'baab'
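The problem can be reproduced in Python 3 with a short sketch (chained_replace is a hypothetical helper): applying the rules one after another lets the second rule undo the first, whereas str.translate looks up each character of the original string exactly once.

```python
def chained_replace(text, rules):
    """Apply (old, new) rules one after another, each on the previous result."""
    for old, new in rules:
        text = text.replace(old, new)
    return text


swap = [("a", "b"), ("b", "a")]
print(chained_replace("abba", swap))                # aaaa
print("abba".translate(str.maketrans("ab", "ba")))  # baab
```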

Peter
 
Vlastimil Brom

Hi,
I guess the most difficult part would be selecting appropriate words
to apply the simple rules to (in order not to run into "problems"
with Verner's Law or other special rules).
You also normally wouldn't want to chain the changes like the above,
but to keep them separate:
bʰ → b; p → f (i.e. *bʰrāter- > ... brother and not *p-... (at least
without the High German consonant shift)).
Of course, there are also vowel changes to be dealt with and many more
peculiarities ...
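The difference between chaining and keeping the changes separate can be made concrete with a small Python 3 sketch. It assumes a hypothetical rule list covering only the labial series, and the helper names chained and separated are illustrative: applied one after another, bʰ → b, b → p, p → f drag an aspirate all the way to f, while a single pass that looks each sound up once keeps the outputs of different rules apart.

```python
import re

# Hypothetical labial rules only, written as separate (old, new) steps.
LABIALS = [("bʰ", "b"), ("b", "p"), ("p", "f")]


def chained(text, rules):
    """Each rule is applied to the output of the previous one."""
    for old, new in rules:
        text = text.replace(old, new)
    return text


def separated(text, rules):
    """All rules applied at once: every sound is looked up exactly once."""
    table = dict(rules)
    keys = sorted(table, key=len, reverse=True)  # "bʰ" must win over "b"
    pattern = re.compile("|".join(map(re.escape, keys)))
    return pattern.sub(lambda m: table[m.group()], text)


print(chained("bʰrāter", LABIALS))    # frāter
print(separated("bʰrāter", LABIALS))  # brāter
```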

As for implementation, I guess the simplest way might be to use
regular expression replacements (re.sub(...)) with a replacement function
looking up the appropriate results in a dictionary.
Maybe something along these lines:

########################################

# -*- coding: utf-8 -*-
import re

Rask_Grimm_re = ur"[bdgptk]ʰ?"
Rask_Grimm_dct = {u"b": u"p", u"bʰ": u"b", u"t": u"þ", } # ...

def repl_fn(m):
    # look up the matched sound; leave it unchanged if there is no rule for it
    return Rask_Grimm_dct.get(m.group(), m.group())

ie_txt = u" bʰrāter ... "
almost_germ_txt = re.sub(Rask_Grimm_re, repl_fn, ie_txt)
print u"%s >> %s" % (ie_txt, almost_germ_txt) # vowel changes etc. TBD

########################################

 bʰrāter ...  >>  brāþer ...


hth,
vbr
 
Dax Bloom

Hello,

Thanks to every one of you for the prompt responses. Resuming the thread
of November 5 on evolutionary linguistics: is there a way to refer to
a sub-category of text, like vowels or consonants? If not, is there a
way to optimize the code by creating these sub-categories?
I would need to arrange substitution rules into groups, because there
might be many more than the ones I mentioned in the example of the
Rask-Grimm rule; I would like each substitution to produce a new entry,
rather than all substitutions resulting in a single entry. I want to do
things in two steps (or ‘passes’) and apply the rules of group 2 to the
results of group 1.
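Python has no built-in notion of vowels or consonants, but a sub-category can be defined once as a string of characters and dropped into a regular-expression character class. A minimal Python 3 sketch, assuming a hypothetical vowel inventory and a purely illustrative rule (t becomes d between vowels) that is not meant as a claim about any actual sound law:

```python
import re

# Hypothetical inventory; extend it with whatever symbols your data uses.
VOWELS = "aeiouāēīōū"
V = "[" + VOWELS + "]"  # character class matching any one vowel

# Illustrative rule only: t -> d when a vowel precedes and follows it.
# The lookbehind/lookahead check the context without consuming it.
intervocalic_t = re.compile("(?<=" + V + ")t(?=" + V + ")")

print(intervocalic_t.sub("d", "pater tata"))  # pader tada
```

A consonant class can be built the same way, e.g. "[^" + VOWELS + "]", if every non-vowel symbol in the data should count as a consonant.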

I understand that it could be particularly useful for the study of
phonology to have a dynamic analysis system with adjustable rules; in
this branch of linguistics, parts of a word like the nucleus or the
codas are tagged with abbreviatory notations explaining ‘phonological
processes’ with schemas. Such historical changes as metathesis,
prothesis, anaptyxis or fusional assimilation could be included among
the rules that we mentioned for the substitution. It might require
replacing certain letters with Greek notation when applying phonological
processes. What function could tag syllables, the word nucleus and the
codas? How easy is it to bridge this with a more visual environment
where schematic analysis can be displayed with highlights and notations,
as in phonology textbooks?

To outline the goals of the program:
1) Arranging rules for substitution into groups of rules
2) Applying substitutions to string input in the logic of “Multiple pass
multiple replace”
3) Returning a string for each substitution
4) Making the program environment visual
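Goals 1 to 3 could be approached with plain lists and dicts. A minimal Python 3 sketch under one reading of the request: each rule in a group yields its own new entry, and group 2 is applied to every result of group 1. The group contents and the names GROUPS, apply_group and run_passes are hypothetical.

```python
# Hypothetical rule groups; each rule is one (old, new) substitution.
GROUPS = [
    [("bʰ", "b"), ("dʰ", "d")],  # group 1 (first pass)
    [("b", "p"), ("d", "t")],    # group 2 (second pass)
]


def apply_group(forms, group):
    """Apply each rule of one group separately to each form.

    Every rule that changes a form yields its own new entry; a form that no
    rule in the group touches is carried over unchanged.
    """
    results = []
    for form in forms:
        changed = [form.replace(old, new) for old, new in group if old in form]
        results.extend(changed or [form])
    return results


def run_passes(word, groups):
    """Run the groups in order; each pass works on all results of the last."""
    forms = [word]
    for group in groups:
        forms = apply_group(forms, group)
    return forms


print(run_passes("bʰad", GROUPS))  # ['pad', 'bat']
```

Plain str.replace ignores context (a rule d → t would also hit the d of an unshifted dʰ), so a real rule set would probably need the regular-expression approach discussed earlier in the thread.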

When quoting parts of code, can you please specify where to insert them
in the code and what the variables mean?

Best wishes,

Dax Bloom
 
Dax Bloom

Peter said:
... In the framework of a project on evolutionary linguistics I wish to
... have a program to process words and simulate the effect of sound
... shift, for instance following the Rask's-Grimm's rule. I look to have
... python take a dictionary file or a string input and replace the
... consonants in it with the Grimm rule equivalent. For example:
... """
>>> rules = ["bpf", ("d", "t", "th"), "gkx"]
>>> for rule in rules:
...     rule = rule[::-1] # go back in time
...     for i in range(len(rule)-1):
...         s = s.replace(rule[i], rule[i+1])
...


Warning: this simple-minded approach somewhat limits the possible rules.
E.g. it fails for the pair of rules

a --> b
b --> a

applied to "abba": the chained replacements give

'aaaa'

while unicode.translate() can deal with it:

u'baab'

Or, if you are using Python 3.x as Steven suggested:

'baab'

Peter


Hi Peter,

I read your interesting replies 20 days ago and, after several exams
and a university semester, I would now like to address your answers to
my post more fully. However, could you please clarify some of the code
inputs that you suggested and in what order to insert them in the
script?

Best regards,

Dax Bloom
 
Vlastimil Brom

2010/11/27 Dax Bloom said:
2010/11/6 Dax Bloom:
...
Rask_Grimm_re = ur"[bdgptk]ʰ?"
Rask_Grimm_dct = {u"b":u"p", u"bʰ": u"b", u"t": u"þ", } # ....

def repl_fn(m):
    return Rask_Grimm_dct.get(m.group(), m.group())

ie_txt = u" bʰrāter ... "
almost_germ_txt = re.sub(Rask_Grimm_re, repl_fn, ie_txt)
print u"%s >> %s" % (ie_txt, almost_germ_txt) # vowel changes etc. TBD

########################################

 bʰrāter ...  >>  brāþer ...

hth,
  vbr
...
Hello Vlastimil,

Could you please explain what the variables %s and % mean and how to
implement this part of the code in a working Python program? I can't
fully appreciate Peter's quote on rules.


Best regards,

Dax Bloom
Hi, the mentioned part is called string interpolation;
the last line
print u"%s >> %s" % (ie_txt, almost_germ_txt) # vowel changes etc. TBD
is equivalent to the simple string concatenation:
print ie_txt + u" >> " + almost_germ_txt
see:
http://docs.python.org/library/stdtypes.html#string-formatting-operations

The values of the tuple (or possibly a dict or another mapping) given
after the modulo operator % are inserted at the respective positions
(here %s) of the preceding string (or unicode);
some more advanced adjustments or conversions are also possible here,
which aren't needed in this simple case.
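In Python 3 the same %-interpolation still works; print is a function there and str is already Unicode. A small sketch with made-up values standing in for ie_txt and almost_germ_txt, also showing the mapping form mentioned above:

```python
ie_txt = "bʰrāter"
almost_germ_txt = "brāþer"

# Each %s is replaced, in order, by str() of the matching tuple item.
interpolated = "%s >> %s" % (ie_txt, almost_germ_txt)
concatenated = ie_txt + " >> " + almost_germ_txt
print(interpolated == concatenated)  # True

# With a dict after %, the placeholders name their keys instead.
print("%(ie)s >> %(germ)s" % {"ie": ie_txt, "germ": almost_germ_txt})
```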

(There is also another string formatting mechanism in the newer
versions of Python
http://docs.python.org/library/string.html#formatstrings
which may be more suitable for more complex tasks.)
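For comparison, a sketch of the same output line with the str.format() mechanism linked above, plus the f-string shorthand that Python 3.6 added later; the values are made up and only stand in for ie_txt and almost_germ_txt:

```python
ie_txt = "bʰrāter"
almost_germ_txt = "brāþer"

# {} fields are filled positionally; {0}/{1} or names can also be used.
line = "{} >> {}".format(ie_txt, almost_germ_txt)
# Format specs such as a minimum width go after a colon inside the braces.
padded = "{:<10}>> {}".format(ie_txt, almost_germ_txt)
# Python 3.6+: the same line written as an f-string.
same = f"{ie_txt} >> {almost_germ_txt}"

print(line == same)  # True
print(padded)
```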

The implementation depends on the rest of your program and on the
input/output of the data you wish to have (to be able to print the
output with rather non-trivial characters, you will need a
Unicode-enabled console; IDLE is a basic one available with Python).
Otherwise the sample is self-contained and should be runnable as is;
you can add other needed items to Rask_Grimm_dct, and all substrings
matching Rask_Grimm_re will be replaced in one pass.
You can also add a series of such replacements (an re pattern and a dict
of IE: Germanic pairs), of course only for context-free changes.
On the other hand, I have no simple idea how to deal with Verner's Law
and the like (even if you passed the accents in the PIE forms), apart
from a lexicographic approach, where you would have to identify the
word stems to decide which changes to apply.
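The "series of such replacements" can be kept as a list of (pattern, dict) pairs applied one after another. A minimal Python 3 sketch, assuming illustrative stage contents: stage 1 reuses the dict from the example above, and stage 2 is a single vowel rule (ā → ō) added only for the demonstration. STAGES and run_stages are hypothetical names, and every stage is context-free, as cautioned above.

```python
import re

# Each stage: (regex matching the sounds it handles, dict of old -> new).
# Stage contents are illustrative only.
STAGES = [
    (re.compile("[bdgptk]ʰ?"), {"b": "p", "bʰ": "b", "t": "þ"}),
    (re.compile("ā"), {"ā": "ō"}),
]


def run_stages(text, stages):
    """Apply each stage in one pass; the next stage sees the previous output."""
    for pattern, table in stages:
        # Sounds without an entry in the table are left unchanged.
        text = pattern.sub(lambda m: table.get(m.group(), m.group()), text)
    return text


print(run_stages("bʰrāter", STAGES))  # brōþer
```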

hth,
vbr
 
