C
C3
I'm looking for, or willing to write, a program that will take a list of
files as command-line arguments, and then build up a frequency table of
n-grams (individual bytes, or strings of 2 or more bytes) for all these
files.
For example,
    ngram 4 file1.txt file2.txt
would report the most frequently occurring 4-byte sequences across the two
files.
I am willing to go quick'n'dirty for this. I understand I need to build up a
table of all the n-grams that exist in each file. Can someone help me get
started on this?
cheers,