Here’s a simple Markov chain implementation in PHP, loosely adapted from this excellent write up about implementing Markov chains in javascript:
class Link { private $nexts = array(); public function addNextWord($word) { if (!is_string($word)) { throw new Exception('addNextWord method in Link class is run with an string parameter'); } if (!isset($this->nexts[$word])) { $this->nexts[$word] = 0; } $this->nexts[$word]++; } public function getNextWord() { $total = 0; foreach($this->nexts as $word => $count) { $total += $count; } $randomIndex = rand(1, $total); $total = 0; foreach($this->nexts as $word => $count) { $total += $count; if ($total >= $randomIndex) { return $word; } } } } class Chain { private $words = array(); function __construct($words) { if (!is_array($words)) { throw new Exception('Chain class is instantiated with an array'); } for($i = 0; $i < count($words); $i++) { $word = (string) $words[$i]; if (!isset($this->words[$word])) { $this->words[$word] = new Link(); } if (isset($words[$i + 1])) { $this->words[$word]->addNextWord($words[$i + 1]); } } } public function getChainOfLength($word, $i) { if (!is_string($word)) { throw new Exception('getChainOfLength method in Chain class is run with an string parameter'); } if (!is_integer($i)) { throw new Exception('getChainOfLength method should be called with an integer'); } if (!isset($this->words[$word])) { return ''; } else { $chain = array($word); for ($j = 0; $j < $i; $j++) { $word = $this->words[$word]->getNextWord(); $chain[] = $word; } return implode(' ', $chain); } } }
And here is an example of usage:
function get_all_words_in_file($file) { return preg_split('/s+/ ', file_get_contents($file)); } $file = 'testtext2.txt'; $words = get_all_words_in_file($file); $chain = new Chain($words); $newSentence = $chain->getChainOfLength('The', 200); echo wordwrap($newSentence, 80, "n");
Conceptually, a Markov chain captures the idea of likelihood of traversing from state to state. You can populate this data for a block of text by passing through a block of text and counting the number of occurrences of words that follow a given word. You can then use this data to generate new blocks of text.