count word repetitions for each pair of lines

tony

I wonder if anyone has a script that could do the following?

- Read a file containing ordinary text (e.g. a news story);
- Count word type repetitions for each pair of lines, disregarding
numbers and ignoring case. A valid repetition is a word type that
occurs in both lines of the pair. Words that occur many times in only
one line of the pair must be disregarded;
- Print out the word repetition count for each line pair in
the formats shown below.

For instance:

input:
cat: cat is sitting on 2 mats.
Dog: dog is sitting.
dog, CAT and 2 mATs.

output format 1:
#format:
#[line][line][repetitions][words repeated]
[1][2][2][is,sitting]
[1][3][2][cat,mats]
[2][3][1][dog]

output format 2:
#matrix format:
#[line 1]:[1 & 1][1 & 2][1 & 3]
#[line 2]:[2 & 1][2 & 2][2 & 3]
#[line 3]:[3 & 1][3 & 2][3 & 3]
[1]:[0][2][2]
[2]:[2][0][1]
[3]:[2][1][0]

thanks very much indeed

tony berber

Catholic University of Sao Paulo, Brazil
Applied Linguistics Postgraduate Program
tony4 at uol.com.br
 
Jürgen Exner

tony said:
I wonder if anyone has a script that could do the following?

- Read a file containing ordinary text (e.g. a news story);
- Count word type repetitions for each pair of lines, disregarding
numbers and ignoring case. A valid repetition is a word type that
occurs in both lines of the pair. Words that occur many times in only
one line of the pair must be disregarded;
- Print out the word repetition count for each line pair in
the formats shown below.

Pretty simple:
- split() both lines into arrays of words, filter out numbers and similar
unwanted stuff
- then apply the solution from the FAQ
"How do I compute the difference of two arrays? How do I compute
the intersection of two arrays?"
- and then just print the result
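
The steps above can be sketched roughly as follows. This is a minimal illustration in Python rather than Perl, and it uses set intersection in place of the hash-based approach from the FAQ. The function names (word_types, shared_words, format_list, format_matrix) are made up for this example, and two things are assumptions: that "disregarding numbers" means dropping any token containing no letters, and that repeated words should be listed alphabetically.

```python
import re
from itertools import combinations

# Runs of letters only, so digits such as "2" never count as words
# (an assumption about what "disregarding numbers" means).
WORD = re.compile(r"[^\W\d_]+")

def word_types(line):
    """Return the set of lower-cased word types found in one line."""
    return {w.lower() for w in WORD.findall(line)}

def shared_words(lines):
    """Map each 1-based line pair (i, j), i < j, to its sorted shared word types."""
    types = [word_types(line) for line in lines]
    return {
        (i + 1, j + 1): sorted(types[i] & types[j])
        for i, j in combinations(range(len(types)), 2)
    }

def format_list(shared):
    """Output format 1: [line][line][repetitions][words repeated]."""
    return [
        f"[{i}][{j}][{len(words)}][{','.join(words)}]"
        for (i, j), words in shared.items()
    ]

def format_matrix(shared, n):
    """Output format 2: one row per line, with 0 on the diagonal."""
    rows = []
    for i in range(1, n + 1):
        cells = "".join(
            f"[{0 if i == j else len(shared[min(i, j), max(i, j)])}]"
            for j in range(1, n + 1)
        )
        rows.append(f"[{i}]:{cells}")
    return rows

if __name__ == "__main__":
    # Sample input from the question; a real script would instead read
    # the lines from a file, e.g. open(path).read().splitlines().
    sample = [
        "cat: cat is sitting on 2 mats.",
        "Dog: dog is sitting.",
        "dog, CAT and 2 mATs.",
    ]
    shared = shared_words(sample)
    print("\n".join(format_list(shared)))
    print("\n".join(format_matrix(shared, len(sample))))
```

Because each line is reduced to a set first, a word that occurs several times in only one line of a pair contributes nothing to that pair's count. Pairs with no shared words are still printed with a count of 0 in this sketch; filtering them out would be a one-line change in format_list.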

Where's the problem?

jue
 
