Visualizing Trees and ASTs with Graphviz

Visualization is a great opportunity to see some complex structures from another side. Let’s assume we want to inspect how trees actually look. We will use Breadth-first search (BFS) to traverse graph (or tree in our case) and show connections between nodes.

def bfs(node)
  queue = [node]
  discovered = []
  while queue.length > 0 do
    current = queue.shift
    discovered << current
    current.children.each do |child|
      queue << child if child
    end
  end
  discovered
end

The awesome fact about BFS is that it traverse nodes from left to right, so, we will traverse tree level-by-level. Let’s assume we already have tree class. We start from filling a tree:

tree = Tree.new
[8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15].each do |value|
  tree.insert(value)
end

And now we will traverse all tree nodes and render it with left and right child:

([["Node", "Left", "Right"]] + bfs(tree.root).map do |node|
  [node.value, node.left&.value || "NULL", node.right&.value || "NULL"]
end).each { |n| puts n.join(" ") }

This code produces the following table (additionally I piped it through column -t):

Node  Left  Right
8     4     12
4     2     6
12    10    14
2     1     3
6     5     7
10    9     11
14    13    15
1     NULL  NULL
3     NULL  NULL
5     NULL  NULL
7     NULL  NULL
9     NULL  NULL
11    NULL  NULL
13    NULL  NULL
15    NULL  NULL

It’s time to draw a tree! We will use graphviz for rendering.

According to Graphviz documentation we need to simply describe edges in proper format and graphviz will render the graph. Sounds easy. We will generate .dot file and then render it. To reduce complexity of manipulating with files we will write output to stdout and then pipe it into the graphviz tool. It will look like:

ruby tree.rb | dot | neato -n -Tpng -o tree.png

Additionally, to decrease feedback loop (in case if you are using iTerm), you could use build-in command imgcat, so it will look like:

ruby tree.rb | dot | neato -n -Tpng | imgcat

This code will generate our first node graph:

def render_tree(tree)
  result = []
  result << "digraph graphname {"  # graph type
  result << "node [shape=circle];" # we want pretty circles as nodes

  bfs(tree).each do |node|
    node.children.each do |child|
      next if child.nil?
      result << [node.value, child.value].join(" -> ")
    end
  end

  result << "}"
  result.join("\n")
end

First render

It will generate Graphviz file with the following content:

digraph graphname {
node [shape=circle];
8 -> 4
8 -> 12
4 -> 2
4 -> 6
12 -> 10
12 -> 14
2 -> 1
2 -> 3
6 -> 5
6 -> 7
10 -> 9
10 -> 11
14 -> 13
14 -> 15
}

But there are some problems. Let’s assume we have more than one node with the same value ([5, 3, 7, 3, 7, 5]):

Something goes wrong

Hmm…, its not what we expect. We need more unique identificator and Ruby provides object_id method, which we can use to get unique identifier for each node.

Using object_ids

Ok, it’s better, but still not so good, we solved one problem but there is another one. Graphviz allows you to define how to render nodes. In order to do this, you need to append a line with label’s contents.

New code:

def render_tree(tree)
  result = []
  result << "digraph graphname {"
  result << "node [shape=circle];"

  bfs(tree).each do |node|
    result << "#{node.object_id} [label=#{node.value}]"
    node.children.each do |child|
      next if child.nil?
      result << [node.object_id, child.object_id].join(" -> ")
    end
  end

  result << "}"
  result.join("\n")
end

This is file generated by our method:

digraph graphname {
node [shape=circle];
70120168878200 [label=5]
70120168878200 -> 70120168878140
70120168878200 -> 70120168878120
70120168878140 [label=3]
70120168878140 -> 70120168878100
70120168878120 [label=7]
70120168878120 -> 70120168878000
70120168878120 -> 70120168878020
70120168878100 [label=3]
70120168878000 [label=5]
70120168878020 [label=7]
}

Result is:

First acceptable result

Ok, nice, but it’s not actually a binary tree. There is a nice answer on StackOverlow which gives you idea how to do render this tree as binary tree: https://stackoverflow.com/a/10905657.

First actual binary tree

Yes! So nice, actually what we want. But…

Binary tree with problems

Hey! Node with value 6 is on the right of node with value 7 — is something wrong with the tree insertion code?

Node  Left  Right
5     3     7
3     NULL  4
7     6     NULL
4     NULL  NULL
6     NULL  NULL

Seems like not. We need to try something else and there is question on Graphviz site which should help us: http://www.graphviz.org/content/FaqBalanceTree. According to that we need to add invisible nodes to our tree. Additionally we need to add invisible nodes for nodes with just one leaf. To implement it we introduce new class InvisibleNode which we inherit from Node class.

class TreeRenderer
  InvisibleNode = Class.new(Node)

  def initialize(tree)
    @tree = tree
  end

  def call
    result = []
    result << "digraph graphname {"
    result << "node [shape=circle];"

    bfs(tree).each do |node|
      result << ["#{node.object_id}", %{[label=#{node.value} fixedsize=true]}].join(" ")
      fix_missing_children(node).each do |child|
        case child
          when InvisibleNode
            result << [child.object_id, %{[label="" width=.1 style=invis]}].join(" ")
            result << [[node.object_id, child.object_id].join(" -> "), '[style=invis]'].join(" ")
          when Node
            result << [[node.object_id, child.object_id].join(" -> ")].join
        end
      end
    end

    result << "}"
    result.join("\n")
  end

  private
  attr_reader :tree

  def fix_missing_children(node)
    node
      .children
      .map { |child| child || generate_fake_child(node) }
      .insert(1, generate_fake_child(node))
  end

  def generate_fake_child(node)
    InvisibleNode.new.tap { |child| child.parent = node }
  end
end

It’s not so cool now, because, actually, it’s complex. We could delegate nodes rendering to the Node class, but I think it’s not perfect idea to change actual “business-logic” code for rendering.

Right binary tree

Additionally it will be nice to have ability to show NULL values, for debug purposes.

class TreeRenderer
  InvisibleNode = Class.new(Node)
  NilNode = Class.new(Node)

  def initialize(tree, with_nil)
    @tree = tree
    @with_nil = with_nil
  end

  def call
    result = []
    result << "digraph graphname {"
    result << "node [shape=circle];"

    bfs(tree).each do |node|
      result << ["#{node.object_id}", %{[label=#{node.value} fixedsize=true]}].join(" ")
      fix_missing_children(node).each do |child|
        case child
          when InvisibleNode
            result << [child.object_id, %{[label="" width=.1 style=invis]}].join(" ")
            result << [[node.object_id, child.object_id].join(" -> "), '[style=invis]'].join(" ")
          when NilNode
            result << [child.object_id, %{[label="NULL" shape="box" style="#{with_nil ? '' : 'invis'}"]}].join(" ")
            result << [[node.object_id, child.object_id].join(" -> "), %{[style="#{with_nil ? '' : 'invis'}"]}].join(" ")
          when Node
            result << [[node.object_id, child.object_id].join(" -> ")].join
        end
      end
    end

    result << "}"
    result.join("\n")
  end

  private
  attr_reader :tree, :with_nil

  def fix_missing_children(node)
    node
      .children
      .map { |child| child || generate_nil_child(node) }
      .insert(1, generate_fake_child(node))
  end

  def generate_nil_child(node)
    NilNode.new.tap { |child| child.parent = node }
  end

  def generate_fake_child(node)
    InvisibleNode.new.tap { |child| child.parent = node }
  end
end

And the result is:

Binary tree with NULLs

How can you use it? With minimal changes you can render any binary tree. For example here is RB-tree rendered this way.

RB tree with NULLs

I used this code to debug my RB-tree implementation. There are dotted edges which represents child-parent connection. So, I think you can easily find what’s wrong here.

RB tree with debug

The same renderer works on any tree structure. Let’s apply it to something more practical.

Parsing Ruby ASTs

An AST is just a tree, so the renderer above works with minor adjustments. The Ruby parser gem exposes the AST for any source file. The canonical approach would be the Visitor pattern, but we can keep it simpler:

require 'parser/current'

class ASTRenderer
  def initialize(tree)
    @tree = tree
  end

  def call
    result = []
    result << "digraph graphname {"

    bfs(@tree).each do |node|
      result << ["#{node.object_id}", %Q{[label="#{present(node)}"]}].join(" ")
      node.children.each do |child|
        next unless child.is_a?(Parser::AST::Node)
        result << [[node.object_id, child.object_id].join(" -> ")].join
      end
    end

    result << "}"
    result.join("\n")
  end

  private

  def present(node)
    return node.to_s unless node.is_a?(Parser::AST::Node)
    case node.type
    when :begin  then  "(begin)"
    when :if     then  "(if)"
    when :array  then  "(array)"
    when :send   then  "(send :#{node.to_a[1]})"
    when :lvar   then  "(lvar :#{node.to_a[0]})"
    when :def    then  "(def :#{node.to_a[0]})"
    when :str    then  "(str :#{node.to_a[0]})"
    when :args   then  "(args)"
    when :return then  "(return)"
    else node.to_s
    end
  end

  def bfs(node)
    queue = [node]
    discovered = []
    while queue.length > 0 do
      current = queue.shift
      next unless current.is_a?(Parser::AST::Node)
      discovered << current
      current.children.each do |child|
        queue << child if child
      end
    end
    discovered
  end
end

source_code = <<RUBY
  def factorial(n)
    return 1 if [1, 2].includes?(n)
    factorial(n - 1) + factorial(n - 2)
  end
RUBY
ast = Parser::CurrentRuby.parse(source_code)
puts ASTRenderer.new(ast).call

Result:

AST tree

Result with comments:

AST tree with comments

The changes from the binary tree renderer are minimal: no binary tree layout code, non-Parser::AST::Node children are skipped, and node labels come from the AST node type. Even on this small example the full AST is hard to read — there’s too much detail. But we can filter it down.

Let’s look at what a class definition node looks like:

class Foo < Bar
  def body; end
end
s(:class,
  s(:const, nil, :Foo),
  s(:const, nil, :Bar),
  s(:def, :body,
    s(:args), nil))

s is s-expression notation for nested lists. The class node has three arguments: the class name constant, the parent class constant, and the body.

Building a Class Dependency Diagram

Let’s catch class definitions and store them:

class KlassDefinitionsProcessor < Parser::AST::Processor
  attr_reader :klass_definitions

  def initialize(*)
    super
    @klass_definitions = []
  end

  def on_class(node)
    klass_konst, _parrent, _body = *node
    _nested_konst, konst_name = *klass_konst
    @klass_definitions << konst_name
    super
  end
end
source_code = <<RUBY
class Foo; end
class Bar; end
RUBY

ast = Parser::CurrentRuby.parse(source_code)
processor = KlassDefinitionsProcessor.new
processor.process(ast)
p processor.klass_definitions
# => [:Foo, :Bar]

Good start. But namespaced constants like Foo::Bar::Baz get truncated. Fix: use klass_konst.loc.expression.source instead of destructuring:

  def on_class(node)
    klass_konst, _parrent, _body = *node
    @klass_definitions << klass_konst.loc.expression.source
    super
  end
class Foo::Bar::Baz; end
class Oof::Zab; end
# => ["Foo::Bar::Baz", "Oof::Zab"]

Now for the dependency edges. We need to track which constants are referenced inside each class. The problem: standard AST processing is top-down, but we need to know the enclosing class for any given constant. The solution is a nesting stack — push on entry, pop on exit:

  def process(node)
    @nesting.push(node)
    super
    @nesting.pop
  end

Now we can easily access the current node’s parent:

  def parent
    @nesting.last(2).first
  end

and find in which class we are processing current node:

  def context_klass
    @nesting.reverse.select { |x| x.type == :class }.first
  end

Which gives us useful helpers:

  def klass_name_for(node)
    klass_konst, _, _ = *node
    source_for(klass_konst)
  end

  def source_for(node)
    node.loc.expression.source
  end

Wiring it together — collect class definitions and constant references:

  def on_class(node)
    @klass_definitions << klass_name_for(node)
    super
  end

  def on_const(node)
    konst_name = source_for(node)
    current_klass = klass_name_for(context_klass)
    (@klass_dependencies[current_klass] ||= []) << konst_name
    super
  end
class Foo::Bar::Baz; end
class Oof::Zab; def call; Foo::Bar::Baz.new; end; end

# processor.klass_dependencies
# => {
# "Foo::Bar::Baz"=>["Foo::Bar::Baz", "Foo::Bar", "Foo"],
# "Oof::Zab"=>["Oof::Zab", "Oof", "Foo::Bar::Baz", "Foo::Bar", "Foo"]
# }

Too noisy — nested constants trigger on_const for each segment, and the class name itself is included. Skip those cases:

  def on_const(node)
    unless %i[class const].include?(parent.type)
      konst_name = source_for(node)
      if context_klass
        current_klass = klass_name_for(context_klass)
        (@klass_dependencies[current_klass] ||= []) << konst_name
      end
    end
    super
  end
class Foo::Bar::Baz; end
class Oof::Zab; def call; Foo::Bar::Baz.new; end; end
# {"Oof::Zab"=>["Foo::Bar::Baz"]}

Now render it with Graphviz:

class GraphRenderer
  def initialize(relations, klasses)
    @relations = relations
    @klasses = klasses
  end

  def call
    [
      begin_header,
      klasses_info,
      relations_info,
      end_header
    ].flatten.join("\n")
  end

  private

  def begin_header
    [
      %(digraph graphname {),
      %(rankdir="LR")
    ]
  end

  def klasses_info
    @klasses.map do |klass|
      [wrap(klass), %Q{[label="#{klass}"]}].join(" ")
    end
  end

  def relations_info
    @relations.map do |klass, dependents|
      dependents.map do |child|
        [wrap(klass), wrap(child)].join(' -> ')
      end
    end
  end

  def end_header
    [%(})]
  end

  def wrap(text)
    '"' + text + '"'
  end
end

Relations graph

One remaining limitation: Ruby’s namespace resolution means B::C inside module A and A::B::C from outside resolve to the same class, but our processor treats them as different constants. Fully resolving this requires implementing Ruby’s constant lookup rules — left as a future exercise.