n-order markov text script?

F

Fabian Pilkowski

I'm looking for a markov text script that will generate markov texts of
order n. The best I can find on the net
(http://www.beetleinabox.com/markov.html) will only do order 2.

Does anyone know where I can find source code for what I require?

Not exactly since I'm unfamiliar with the Markov Algorithm. But a search
on CPAN for that gives some results. Have a look at it.

http://search.cpan.org/search?query=markov

Perhaps the module Algorithm::MarkovChain could do what you want.

regards,
fabian
 
P

Peter A. Krupa

Lincoln D. Stein has a web page with a travesty generator and the
source: http://stein.cshl.org/~lstein/mangler.cgi

But I think it generates the travesty word-by-word. Here's an execrable
nth order Markov text generator I whipped out. Be aware that if you're
going to generate a high order travesty you need a lot of text to get
good results.

* * *

$text = "When in the Course of human events, it becomes necessary for
one people to dissolve the political bands which have connected them
with another, and to assume among the powers of the earth, the separate
and equal station to which the Laws of Nature and of Nature's God
entitle them, a decent respect to the opinions of mankind requires that
they should declare the causes which impel them to the separation.";

$order = 5;

# Break the text into chunks and store them in a hash table.
for ( $i = 0; $i < length ( $text ) - $order; $i++ )
{
$h{ substr ( $text, $i, $order ) }++;
}

# Create a list of the text chunks, weighted by occurrence.
foreach $key ( sort ( keys ( %h ) ) )
{
for ( $i = 0; $i < $h{ $key }; $i++ )
{
push ( @a, $key );
}
}

$str = substr ( $text, 0, $order - 1 );

print ( $str );

for ( $i = 0; $i < 1000; $i++ )
{
for ( $j = 0; $j <= $#a; $j++ )
{
if ( $a[ $j ] =~ /$str./ )
{
$first = $j;

last;
}
}

$last = $first;

for ( $j = $first + 1; $j <= $#a; $j++ )
{
if ( $a[ $j ] !~ /$str./ )
{
$last = $j - 1;

last;
}
}

$r = int ( rand ( $last - $first + 1 ) );

print ( substr ( $a[ $first + $r ], $order - 1 ) );

$str = substr ( $a[ $first + $r ], 1 );
}

* * *

H:\>perl travesty.pl
When in the opinions of human events, it becomes necessary for one
people the opinions of human events, it becomes necessary for one people
the Course of Nature's God entitle the earth, them to which the
political bands which impel the separate and to the political bands
which them to dissolve they should decent respect to dissolve the
opinions of the separation to dissolve them to which have connected the
Course of mankind requires that them to the Course of mankind requires
that them with another, and of mankind requires that the opinions of the
political bands which the opinions of Nature's God entitle to them, a
declare them, a decent respect to the separate and to assume among them
to which impel the opinions of Nature's God entitle to dissolve the
causes which they should decent respect to them, a declare the Course of
mankind requires that the earth, them to the separation to they should
declare them with another, and equal station to assume among them to
which have connected the ea
 
I

Ilmari Karonen

I'm looking for a markov text script that will generate markov texts of
order n. The best I can find on the net
(http://www.beetleinabox.com/markov.html) will only do order 2.

The code you refer to (http://cm.bell-labs.com/cm/cs/tpop/markov.pl)
can easily be modified to use any prefix length. For example, below
is a straightforward modification for a 3-word prefix. I've made no
other changes, so you can easily compare the two versions.

# Copyright (C) 1999 Lucent Technologies
# Excerpted from 'The Practice of Programming'
# by Brian W. Kernighan and Rob Pike
# Modified by Ilmari Karonen

# markov.pl: markov chain algorithm for 3-word prefixes

$MAXGEN = 10000;
$NONWORD = "\n";
$w1 = $w2 = $w3 = $NONWORD; # initial state
while (<>) { # read each line of input
foreach (split) {
push(@{$statetab{$w1}{$w2}{$w3}}, $_);
($w1, $w2, $w3) = ($w2, $w3, $_); # multiple assignment
}
}
push(@{$statetab{$w1}{$w2}{$w3}}, $NONWORD); # add tail

$w1 = $w2 = $w3 = $NONWORD;
for ($i = 0; $i < $MAXGEN; $i++) {
$suf = $statetab{$w1}{$w2}{$w3};# array reference
$r = int(rand @$suf); # @$suf is number of elems
exit if (($t = $suf->[$r]) eq $NONWORD);
print "$t\n";
($w1, $w2, $w3) = ($w2, $w3, $t); # advance chain
}

In fact, it's quite easy to make it use an arbitrary prefix length
given on the command line. Unlike above, for this version I've
modernized the code a bit.

#!/usr/bin/perl -w
use strict;

my $length = int shift; # length from command line
die "Invalid prefix length $length.\n" if $length < 1;

use constant MAXGEN => 10000;
use constant NONWORD => "";

my %statetab;

my @w = (NONWORD) x $length; # initial state
while (<>) { # read each line of input
foreach (split) {
push @{$statetab{"@w"}}, $_;
shift @w; push @w, $_;
}
}
push @{$statetab{"@w"}}, NONWORD; # add tail

@w = (NONWORD) x $length;
foreach (1 .. MAXGEN) {
my $suf = $statetab{"@w"}; # array reference
my $t = $suf->[rand @$suf]; # @$suf is number of elems
last if $t eq NONWORD;
print "$t\n";
shift @w; push @w, $t; # advance chain
}
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,769
Messages
2,569,579
Members
45,053
Latest member
BrodieSola

Latest Threads

Top