Higher order Markov chains in Ruby - Ruby Quiz #74

April 11, 2006

Here’s my second solution for the quiz No 74. It generates text with a first or higher order Markov chains. See also the first order version.

A first order chain, for example the following one

@mc3 = MarkovChain.new()
@mc3.add(1,2) 
@mc3.add(3,4)

is stored as the following hash of hashes

1: ABS: {[2,1]} - REL: {[2,1.0]}
3: ABS: {[4,1]} - REL: {[4,1.0]}

For every key 1 and 3 the absolute and relative probabilities are stored.

A higher order chain, for example the following

@mc7 = MarkovChain.new(order = 2)
@mc7.add_elems([20,21,20,22,21,22,20,23])   

is also stored as a hash of hashes, but the keys are arrays

[20,21]: ABS: {[20,1]} - REL: {[20,1.0]}
[21,20]: ABS: {[22,1]} - REL: {[22,1.0]}
[20,22]: ABS: {[21,1]} - REL: {[21,1.0]}
[22,21]: ABS: {[22,1]} - REL: {[22,1.0]}
[21,22]: ABS: {[20,1]} - REL: {[20,1.0]}
[22,20]: ABS: {[23,1]} - REL: {[23,1.0]}

Because I am new to Ruby, I had to learn a few things about Arrays, Objects and dup(). I still do not understand this fully (perhaps it is to late here 00:46 CET), but in the method MarkovChain.add() i need to dup the arrays, otherwise some keys will get overwritten.

# Add an edge from node a to node b.

#

def add(a, b)
  # this dup is needed, otherwise elements will be overwritten

  a = a.dup() if a.instance_of?(Array)
  @absolute[a] ||= {} 
  @absolute[a][b] ||= 0
  @absolute[a][b] += 1
  @sum_edges[a] ||= 0
  @sum_edges[a] += 1
  @dirty[a] = true
  end

By the way, below is an example text I generated for order 3 after I fed 4 books of Chesterton into the chain. Beginning with order 3 I think the text looks like a patchwork or collage of some phrases and sentences of the original texts.

. Up to a certain point it was a wordless clamour startlingly close, and 
loud enough to be distinct if each word had not killed the other. No, 
replied Fisher. The Renaissance nobles of the Tudor time were like that. 
It is awful, I think it goes far enough! said Flambeau; but my 
information is fragmentary, and only twelve members of the Central 
Anarchist Council. The gentleman who has for some time past played, with 
propriety and general applause, the difficult part of Thursday, has died 
quite suddenly. Consequently, he was willing to go into Parliament as a 
yeoman or a gentleman or a Jacobite or an Ancient Briton, I should say 
that, Crane went on. This also tempted him, that in a more partial and 
also a more premature fashion, for his voice was colourless and sad. 
What I did next does matter: I gave him a rather wild stare, March had 
yet another unexpected emotion, for his guide hailed the man as Hoggs 
and introduced him as Sir Howard Horne, then introducing his so-called 
Socialist budget, and prepared to expound it in an interview with so
promising a penman. Harold March was to have the gold of Glengyle. So 
far the crime seemed clear enough; and while Boyle was looking into it 
he heard a thud behind him, and then a sudden stillness, as of a stone 
statue walking. He might have been firebrands. The mutinies simmered 
down; the men who must have known that particular house to be so 
accurately inaccurate. But what makes you think it a specimen. Put the 
same feather with a ribbon and an artificial flower and everyone will 
think it a wildcat, though it may have been a clever fellow, answered 
the priest. As they raced along to the gate out of which poured and 
rolled, not Roman, but very late, and that this, my successful
masquerade, might be of considerable value to the public, placed it here 
with his own pale blue ones, and said, Do you want me to tell you all 
you want to guess what he doing, keep behind him. About his life and 
fortune on the table, and went below the surface in a way that at once 
puzzled and reassured him. On the oranges was the equally clear and 
exact description, Finest Brazil nuts, 4d. a lb. M. Brun had a dark, 
handsome lady, of no little majesty, and rather like a mosaic palace, 
rent with earthquakes; or like a Dutch tulip garden blown to the stars. 
The other did not answer; he was free in free London, and drinking

MarkovChain

require 'pretty-print'

# 

# A Markov Chain contains a graph which edges are labeled with probabilities.

# The graph is stored as two hashes, one for the absolute probabilites, one for

# the relative probabilities. The nodes of the graph correspond to the keys of the hashes.

#

# The rand() method is worst case O(n), for small lists this is ok, but for

# large lists binary search will be better O(lg n)

#


class MarkovChain 

  attr_reader :order
  
  # Initializes the hashes.

  #

  def initialize(order = 1)
    @order = order
    @absolute = {}
    @relative = {}
    @sum_edges = {}
    @dirty = {}
  end
  
  # The nodes of the graph correspond to the keys of the hashes.

  #

  def nodes
    @absolute.keys()
  end
  
  # Add an edge from node a to node b.

  #

  def add(a, b)
    # this dup is needed, otherwise elements will be overwritten

    a = a.dup() if a.instance_of?(Array)
    @absolute[a] ||= {} 
    @absolute[a][b] ||= 0
    @absolute[a][b] += 1
    @sum_edges[a] ||= 0
    @sum_edges[a] += 1
    @dirty[a] = true
  end

  # Adds a list of elements pairwise.

  #

  def add_elems(elems)
    if ( 1 == @order )
      a = nil
      elems.each() do |b|
        add(a, b) if ( a )
        a = b
      end
    else
      a = []
      elems.each() do |b|
        if ( a.length() == @order )
          add(a, b)
          a.shift()
        end
        a.push(b)
      end
    end
  end

  # Calculates all the relative cumulative probabilities.

  #

  def recalc_all()
    @relative = @absolute.dup()
    @relative.each_pair do | a, hash |
      recalc(a) if @dirty[a]
    end
  end
  
  # Calculates the relative cumulative probabilities.

  #

  def recalc(a)
    cum = 0.0
    @relative[a] = @absolute[a].dup()
    sum = @sum_edges[a] * 1.0
    @relative[a].each_pair do | b, value|
      cum = cum + ( value / sum )
      @relative[a][b] = cum
    end
    @dirty.delete(a)
    @relative[a]
  end
  
  # Generates a random successor of node a according to the probabilities learned 

  # from the example texts.

  #

  def rand(a)
    recalc(a) if @dirty[a]
    if a.nil? || @relative[a].nil? || @relative[a].length() == 0
      return nil
    elsif @relative[a].length() == 1
      return @relative[a].keys[0]
    else
      # this is a linear search now with worst case O(n), TODO log(n) binary search

      value = Kernel.rand()
      candidates = @relative[a].select { |b, prob| prob > value }
      return candidates[0][0]
    end
  end

  def to_s()
    result = []
    result << "MarkovChain"
    @absolute.each_pair do | key, hash |
      result << "#{key.pretty_to_s()}: ABS: #{@absolute[key].pretty_to_s()} - REL: #{@relative[key].pretty_to_s()}"
    end
    result.join("\n")
  end

end

See the rest of the code in the code repository on github.