Lucas Luitjes

Freelance dev/devops/security.

Using XPath to rewrite Ruby code with ease

26 Nov 2017

Tools that rewrite Ruby code, such as rubocop, do so by using the excellent parser gem. The parser gem allows you to convert your Ruby code into an AST (abstract syntax tree). For a primer on this topic, see the introduction to the parser gem.

While building Textractor, we often found ourselves writing code to query and filter ASTs to find the exact node to modify. For example, to programmatically turn <%= f.text_field :name, placeholder: "Your name" %> into <%= f.text_field :name, placeholder: t('.your_name') %>, we need to find the node holding the value of the placeholder key, in a hash that happens to be an argument to a text_field call.

It turns out there is already an excellent query language for searching tree structures: XPath! All we have to do is turn an AST into an XML tree, run the XPath query, and find the original AST node belonging to the matches.

TL;DR: This post shows you how to turn this:

if node.respond_to?(:type) && node.type == :send
  node.children.find do |c|
    c.respond_to?(:type) && c.type == :hash
  end&.children.find do |c|
    c.respond_to?(:type) && c.type == :pair && c.children.any? do |gc|
      gc.respond_to?(:type) && gc.type == :sym && gc.children.any? do |ggc|
        ggc == :placeholder
      end
    end
  end&.children.find do |c|
    c.respond_to?(:type) && c.type == :str
  end
end

Into this:

*/send/hash/pair[sym[symbol-val/@value="placeholder"]]/str

All right, let’s get started!

So what does the AST look like for the Ruby inside such a tag? Here it is for f.text_field :attr, placeholder: "placeholder text", the snippet used in the rest of this post:

s(:send,
  s(:send, nil, :f), :text_field,
  s(:sym, :attr),
  s(:hash,
    s(:pair,
      s(:sym, :placeholder),
      s(:str, "placeholder text"))))

We need to recursively convert that data structure into XML. Here’s a short class that does exactly that:

class XMLAST
  include REXML
  attr_reader :doc

  def initialize sexp
    @doc = Document.new "<root></root>"
    @sexp = sexp
    root = @doc.root
    populate_tree(root, sexp)
  end

  def populate_tree xml, sexp
    if sexp.is_a?(String) ||
       sexp.is_a?(Symbol) ||
       sexp.is_a?(Numeric) ||
       sexp.is_a?(NilClass)
      el = Element.new(sexp.class.to_s.downcase + "-val")
      xml.add_element el
    else
      el = Element.new(sexp.type.to_s)
      sexp.children.each{ |n| populate_tree(el, n) }
      xml.add_element el
    end
  end
end

We use REXML because it ships with Ruby (since Ruby 3.0 as a bundled gem, so a Bundler-managed project needs rexml in its Gemfile). So far performance has been good, but if XML/XPath processing becomes your bottleneck, it’s easy enough to swap in Nokogiri.

Let’s see it in action:

irb> require 'parser/current'
irb> require 'rexml/document'
irb> snippet = 'f.text_field :attr, placeholder: "placeholder text"'
irb> exp = Parser::CurrentRuby.parse(snippet)
irb> xml = XMLAST.new(exp)
irb> xml.doc.write(STDOUT, 2)
<root>
  <send>
    <send>
      <nilclass-val />
      <symbol-val />
    </send>
    <symbol-val />
    <sym>
      <symbol-val />
    </sym>
    <hash>
      <pair>
        <sym>
          <symbol-val />
        </sym>
        <str>
          <string-val />
        </str>
      </pair>
    </hash>
  </send>
</root>

However, if we want to be able to query on the values of literals, we’ll also need to add a value attribute:

def populate_tree xml, sexp
  if sexp.is_a?(String) ||
      sexp.is_a?(Symbol) ||
      sexp.is_a?(Numeric) ||
      sexp.is_a?(NilClass)
    el = Element.new(sexp.class.to_s.downcase + "-val")

    # Add value attribute
    el.add_attribute 'value', sexp.to_s

    xml.add_element el
  else
    el = Element.new(sexp.type.to_s)
    sexp.children.each{ |n| populate_tree(el, n) }
    xml.add_element el
  end
end

Now our XML looks like this:

<root>
  <send>
    <send>
      <nilclass-val value=''/>
      <symbol-val value='f'/>
    </send>
    <symbol-val value='text_field'/>
    <sym>
      <symbol-val value='attr'/>
    </sym>
    <hash>
      <pair>
        <sym>
          <symbol-val value='placeholder'/>
        </sym>
        <str>
          <string-val value='placeholder text'/>
        </str>
      </pair>
    </hash>
  </send>
</root>

Time to try out some XPath. First, we add a convenience method to our XMLAST class:

def xpath path
  XPath.match(doc, path)
end

Let’s try it:

irb> nodes = xml.xpath('*/send/hash/pair[sym[symbol-val/@value="placeholder"]]/str')
=> [<str> ... </>]
irb> nodes.first.to_s
=> "<str><string-val value='placeholder text'/></str>"
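
The same tree supports looser or more specific queries. The sketch below runs a few XPath variations against the XML shown earlier, pasted in as a string so that it only needs REXML. The variable names are purely illustrative.

```ruby
require 'rexml/document'

# XML copied from the output shown earlier in this post.
xml_source = <<~DOC
  <root>
    <send>
      <send>
        <nilclass-val value=''/>
        <symbol-val value='f'/>
      </send>
      <symbol-val value='text_field'/>
      <sym>
        <symbol-val value='attr'/>
      </sym>
      <hash>
        <pair>
          <sym>
            <symbol-val value='placeholder'/>
          </sym>
          <str>
            <string-val value='placeholder text'/>
          </str>
        </pair>
      </hash>
    </send>
  </root>
DOC

doc = REXML::Document.new(xml_source)

# Descendant axis: find the pair at any depth instead of spelling out the path.
anywhere = REXML::XPath.match(doc, '//pair[sym/symbol-val/@value="placeholder"]/str')

# Attribute nodes can be selected directly, e.g. every symbol value in the tree.
symbols = REXML::XPath.match(doc, '//symbol-val/@value').map(&:value)

# Predicate on a direct child: only the call whose method name is text_field.
calls = REXML::XPath.match(doc, '//send[symbol-val/@value="text_field"]')
```

The // form is handy when you do not care how deeply a call is nested, at the cost of possibly matching more nodes than the fully spelled-out path.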

Pretty neat! But we’re not quite there yet. If we want to do anything useful with the results, we’ll need the original Ruby objects representing AST nodes. We could cheat and convert the resulting XML into a new AST, but nodes rebuilt that way carry no source location information, so this would almost certainly break the rewriter library built into the parser gem. Not to mention being horribly inefficient.

So instead we will add a bit of metadata to our XML tree, specifically the Ruby object IDs of the original nodes. Fortunately this is as easy as node.object_id:

def populate_tree xml, sexp
  if sexp.is_a?(String) ||
      sexp.is_a?(Symbol) ||
      sexp.is_a?(Numeric) ||
      sexp.is_a?(NilClass)
    el = Element.new(sexp.class.to_s.downcase + "-val")
    el.add_attribute 'value', sexp.to_s
    xml.add_element el
  else
    el = Element.new(sexp.type.to_s)

    # Add the ruby object id
    el.add_attribute('id', sexp.object_id)

    sexp.children.each{ |n| populate_tree(el, n) }
    xml.add_element el
  end
end

Which results in the following XML:

<root>
  <send id='47036300347960'>
    <send id='47036300353160'>
      <nilclass-val value=''/>
      <symbol-val value='f'/>
    </send>
    <symbol-val value='text_field'/>
    <sym id='47036300352660'>
      <symbol-val value='attr'/>
    </sym>
    <hash id='47036300348380'>
      <pair id='47036300348760'>
        <sym id='47036300348840'>
          <symbol-val value='placeholder'/>
        </sym>
        <str id='47036300349540'>
          <string-val value='placeholder text'/>
        </str>
      </pair>
    </hash>
  </send>
</root>

Now that we have the original object IDs in our XML output, we can walk the AST to find the original nodes. The implementation below is not very efficient, but it is very short. Optimizing the performance of a recursive tree walk is left as an exercise for the reader.

First, we need a way to recursively add all nodes to an array:

def treewalk sexp=@sexp
  return sexp unless sexp&.respond_to?(:children)
  [sexp, sexp.children.map {|n| treewalk(n) }].flatten
end

Then, we can use this to find our matching object ID:

def xpath path
  results = XPath.match(doc, path)
  results.map do |n|
    if n.respond_to?(:attributes) && n.attributes['id']
      treewalk.find do |m| 
        m.object_id.to_s == n.attributes['id']
      end
    else
      n
    end
  end
end

And here we are, a very quick and expressive way to juggle your ASTs:

irb> xml.xpath('*/send/hash/pair[sym[symbol-val/@value="placeholder"]]/str')
=> [s(:str, "placeholder text")]
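
The xpath method above re-walks the whole AST for every match. One possible optimization, sketched here under assumptions, is to record each node in a hash keyed by its object ID while the XML is being built. IndexedXMLAST and FakeNode are hypothetical names. FakeNode stands in for Parser::AST::Node so the sketch runs without the parser gem.

```ruby
require 'rexml/document'

# Hypothetical stand-in for Parser::AST::Node; real nodes expose the same
# #type and #children methods.
FakeNode = Struct.new(:type, :children)

class IndexedXMLAST
  attr_reader :doc

  def initialize(sexp)
    @doc = REXML::Document.new('<root></root>')
    @nodes_by_id = {} # object ID (as a string) => original node
    populate_tree(@doc.root, sexp)
  end

  def xpath(path)
    REXML::XPath.match(doc, path).map do |n|
      id = n.respond_to?(:attributes) && n.attributes['id']
      # Elements with an id map back to a node; anything else is returned as-is.
      id ? @nodes_by_id.fetch(id) : n
    end
  end

  private

  def populate_tree(xml, sexp)
    if sexp.respond_to?(:children)
      el = REXML::Element.new(sexp.type.to_s)
      id = sexp.object_id.to_s
      el.add_attribute('id', id)
      @nodes_by_id[id] = sexp # remember the node while we are visiting it anyway
      sexp.children.each { |child| populate_tree(el, child) }
    else
      el = REXML::Element.new(sexp.class.to_s.downcase + '-val')
      el.add_attribute('value', sexp.to_s)
    end
    xml.add_element(el)
  end
end

str  = FakeNode.new(:str, ['placeholder text'])
pair = FakeNode.new(:pair, [FakeNode.new(:sym, [:placeholder]), str])
tree = FakeNode.new(:send, [FakeNode.new(:send, [nil, :f]), :text_field,
                            FakeNode.new(:sym, [:attr]), FakeNode.new(:hash, [pair])])

matches = IndexedXMLAST.new(tree).xpath('*/send/hash/pair[sym[symbol-val/@value="placeholder"]]/str')
```

Here matches.first is the very same object as str, found with a single hash lookup instead of a tree walk. The trade-off is one extra hash entry per AST node.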

See the complete source at the bottom of this post. If you want to further shorten your XPaths, you could add more metadata to your XML tree. For example, in Textractor, if we encounter a send node (a method call) we automatically add message="method_name" to the XML element. This allows us to write XPath such as send[@message="form_for"].

We are currently developing multiple products using this library. Once the XML format stabilizes, we plan to extract the library from our product and release a gem. If you are interested in using these techniques in your project, we’d love to help! Send us an email at info@snootysoftware.com.
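
As a rough illustration of the message attribute mentioned above, populate_tree could be extended along these lines. This is a sketch, not Textractor's actual code. StubNode is a hypothetical stand-in for Parser::AST::Node, and the sketch relies on send nodes storing the method name as their second child, as in the AST shown earlier.

```ruby
require 'rexml/document'

# Hypothetical stand-in for Parser::AST::Node (same #type / #children interface).
StubNode = Struct.new(:type, :children)

def populate_tree(xml, sexp)
  if sexp.respond_to?(:children)
    el = REXML::Element.new(sexp.type.to_s)
    # For :send nodes the children are [receiver, method_name, *args],
    # so the method name is the second child.
    el.add_attribute('message', sexp.children[1].to_s) if sexp.type == :send
    sexp.children.each { |child| populate_tree(el, child) }
  else
    el = REXML::Element.new(sexp.class.to_s.downcase + '-val')
    el.add_attribute('value', sexp.to_s)
  end
  xml.add_element(el)
end

# Roughly the shape of `form_for @user`.
call = StubNode.new(:send, [nil, :form_for, StubNode.new(:ivar, [:@user])])
doc = REXML::Document.new('<root></root>')
populate_tree(doc.root, call)

found = REXML::XPath.match(doc, '//send[@message="form_for"]')
```

The query now reads as "any call to form_for" without a predicate on the symbol-val child.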


At Snooty Software, we develop tools that programmatically modify code. Our first product, Textractor, takes an existing Rails project and prepares your ERB views for translation by replacing string literals with t() calls.

Complete source:

require 'parser/current'
require 'rexml/document'

class XMLAST
  include REXML
  attr_reader :doc

  def initialize sexp
    @doc = Document.new "<root></root>"
    @sexp = sexp
    root = @doc.root
    populate_tree(root, sexp)
  end

  def populate_tree xml, sexp
    if sexp.is_a?(String) ||
        sexp.is_a?(Symbol) ||
        sexp.is_a?(Numeric) ||
        sexp.is_a?(NilClass)
      el = Element.new(sexp.class.to_s.downcase + "-val")
      el.add_attribute 'value', sexp.to_s
      xml.add_element el
    else
      el = Element.new(sexp.type.to_s)
      el.add_attribute('id', sexp.object_id)

      sexp.children.each{ |n| populate_tree(el, n) }
      xml.add_element el
    end
  end

  def treewalk sexp=@sexp
    return sexp unless sexp&.respond_to?(:children)
    [sexp, sexp.children.map {|n| treewalk(n) }].flatten
  end

  def xpath path
    results = XPath.match(doc, path)
    results.map do |n|
      if n.respond_to?(:attributes) && n.attributes['id']
        treewalk.find do |m| 
          m.object_id.to_s == n.attributes['id']
        end
      else
        n
      end
    end
  end
end

# snippet = 'f.text_field :attr, placeholder: "placeholder text"'
# exp = Parser::CurrentRuby.parse(snippet)
# xml = XMLAST.new(exp)
# xml.xpath('*/send/hash/pair[sym[symbol-val/@value="placeholder"]]/str')
# => [s(:str, "placeholder text")]