Here’s a simple Markov chain implementation in PHP, loosely adapted from this excellent write-up about implementing Markov chains in JavaScript:
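class Link {
    // Maps each following word to the number of times it was seen.
    private $nexts = array();

    // Record one more occurrence of $word as a successor.
    public function addNextWord($word) {
        if (!is_string($word)) {
            throw new InvalidArgumentException('addNextWord() in class Link must be called with a string');
        }
        if (!isset($this->nexts[$word])) {
            $this->nexts[$word] = 0;
        }
        $this->nexts[$word]++;
    }

    // Pick a successor at random, weighted by count; null if there are none.
    public function getNextWord() {
        $total = array_sum($this->nexts);
        if ($total === 0) {
            return null;
        }
        $randomIndex = rand(1, $total);
        $total = 0;
        foreach ($this->nexts as $word => $count) {
            $total += $count;
            if ($total >= $randomIndex) {
                // Cast because PHP turns numeric string keys into integers.
                return (string) $word;
            }
        }
        return null;
    }
}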
class Chain {
    // Maps each distinct word to the Link holding its successor counts.
    private $words = array();

    public function __construct($words) {
        if (!is_array($words)) {
            throw new InvalidArgumentException('Chain must be instantiated with an array of words');
        }
        $words = array_values($words);
        $count = count($words);
        for ($i = 0; $i < $count; $i++) {
            $word = (string) $words[$i];
            if (!isset($this->words[$word])) {
                $this->words[$word] = new Link();
            }
            if (isset($words[$i + 1])) {
                $this->words[$word]->addNextWord((string) $words[$i + 1]);
            }
        }
    }

    // Return $word followed by up to $i generated words, joined by spaces.
    public function getChainOfLength($word, $i) {
        if (!is_string($word)) {
            throw new InvalidArgumentException('getChainOfLength() in class Chain must be called with a string start word');
        }
        if (!is_int($i)) {
            throw new InvalidArgumentException('getChainOfLength() in class Chain must be called with an integer length');
        }
        if (!isset($this->words[$word])) {
            return '';
        }
        $chain = array($word);
        for ($j = 0; $j < $i; $j++) {
            $next = $this->words[$word]->getNextWord();
            // Stop early at a dead end, e.g. the last word of the source text.
            if ($next === null) {
                break;
            }
            $word = $next;
            $chain[] = $word;
        }
        return implode(' ', $chain);
    }
}
And here is an example of usage:
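// Split a file's contents into words on runs of whitespace.
function get_all_words_in_file($file) {
    return preg_split('/\s+/', file_get_contents($file), -1, PREG_SPLIT_NO_EMPTY);
}

$file = 'testtext2.txt';
$words = get_all_words_in_file($file);
$chain = new Chain($words);
// Start the chain at 'The' and follow it for 200 steps.
$newSentence = $chain->getChainOfLength('The', 200);
echo wordwrap($newSentence, 80, "\n");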
Conceptually, a Markov chain captures the likelihood of moving from one state to the next, where the next state depends only on the current one. You can build this data from a block of text by passing through it once and counting, for each word, how often each other word follows it. You can then generate new text by starting from a word and repeatedly picking a successor at random, weighted by those counts.
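To make the counting step concrete, here is a minimal sketch that builds the raw follower counts for a short sentence and converts one word's counts into probabilities. The helper names build_transition_counts and to_probabilities are made up for this illustration and are not part of the classes above; the sample sentence is also just an assumption.

```php
<?php
// Hypothetical helper (not part of the Link/Chain classes): count how
// often each word is immediately followed by each other word.
function build_transition_counts(array $words): array
{
    $counts = [];
    $last = count($words) - 1;
    for ($i = 0; $i < $last; $i++) {
        $current = (string) $words[$i];
        $next = (string) $words[$i + 1];
        if (!isset($counts[$current][$next])) {
            $counts[$current][$next] = 0;
        }
        $counts[$current][$next]++;
    }
    return $counts;
}

// Hypothetical helper: turn one word's follower counts into probabilities.
function to_probabilities(array $followers): array
{
    $total = array_sum($followers);
    $probabilities = [];
    foreach ($followers as $word => $count) {
        $probabilities[$word] = $count / $total;
    }
    return $probabilities;
}

// Illustrative input only.
$words = explode(' ', 'the cat sat on the mat and the cat ran');
$counts = build_transition_counts($words);

print_r($counts['the']);                    // follower counts for "the"
print_r(to_probabilities($counts['the']));  // the same data as probabilities
```

In this sample, "the" is followed by "cat" twice and by "mat" once, so a chain standing at "the" should move to "cat" about two thirds of the time. Storing the raw counts, rather than the probabilities, keeps it cheap to add more text later.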