tony
I wonder if anyone has a script that could do the following?
-Read a file containing ordinary text (e.g. a news story);
-Count word type repetitions for each pair of lines, disregarding
numbers and ignoring case. A valid repetition is a word type that
occurs in both lines of the pair. A word that occurs in only one line
of the pair, however many times, must be disregarded;
-Print out the repetition count for each pair of lines in the
formats shown below.
For instance:
input:
cat: cat is sitting on 2 mats.
Dog: dog is sitting.
dog, CAT and 2 mATs.
output format 1:
#format:
#[line][line][repetitions][words repeated]
[1][2][2][is,sitting]
[1][3][2][cat,mats]
[2][3][1][dog]
output format 2:
#matrix format:
#[line 1]:[1 & 1][1 & 2][1 & 3]
#[line 2]:[2 & 1][2 & 2][2 & 3]
#[line 3]:[3 & 1][3 & 2][3 & 3]
[1]:[0][2][2]
[2]:[2][0][1]
[3]:[2][1][0]
thanks very much indeed
tony berber
Catholic University of Sao Paulo, Brazil
Applied Linguistics Postgraduate Program
tony4 at uol.com.br
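
One possible way to approach the request above, as a minimal sketch in Python rather than a definitive script. The function names (word_types, pair_repetitions, format_list, format_matrix, report) are made up for illustration. It assumes that a "word" is any run of letters (so "don't" would split into two types), that every line of the file counts as one unit, that repeated words are listed alphabetically, and that pairs with zero repetitions are still printed in format 1; the post does not settle any of these points.

```python
import re
from itertools import combinations


def word_types(line):
    """Return the set of lowercased word types in a line.

    Assumption: a word is a run of letters, so digits and punctuation
    are dropped ("2" and "mats." become nothing and "mats").
    """
    return set(re.findall(r"[^\W\d_]+", line.lower()))


def pair_repetitions(lines):
    """Map each pair of 1-based line numbers (i, j), i < j, to the sorted
    list of word types that occur in both lines."""
    types = [word_types(line) for line in lines]
    reps = {}
    for i, j in combinations(range(len(types)), 2):
        reps[(i + 1, j + 1)] = sorted(types[i] & types[j])
    return reps


def format_list(reps):
    """Output format 1: [line][line][repetitions][words repeated]."""
    return [
        "[%d][%d][%d][%s]" % (i, j, len(words), ",".join(words))
        for (i, j), words in sorted(reps.items())
    ]


def format_matrix(reps, n):
    """Output format 2: one row per line, with 0 on the diagonal."""
    rows = []
    for i in range(1, n + 1):
        cells = []
        for j in range(1, n + 1):
            count = 0 if i == j else len(reps[(min(i, j), max(i, j))])
            cells.append("[%d]" % count)
        rows.append("[%d]:%s" % (i, "".join(cells)))
    return rows


def report(path):
    """Read a text file and print both formats (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        lines = handle.read().splitlines()
    reps = pair_repetitions(lines)
    print("\n".join(format_list(reps)))
    print("\n".join(format_matrix(reps, len(lines))))
```

Calling report("story.txt") on a file holding the three example lines should reproduce the two outputs shown in the post. Because each line is reduced to a set before comparison, a word repeated within a single line is counted once and never on its own, which matches the rule that repetitions inside one line only are disregarded.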