Tools that rewrite Ruby code, such as rubocop, do so by using the excellent parser gem. The parser gem allows you to convert your Ruby code into an AST (abstract syntax tree). For a primer on this topic, see the introduction to the parser gem.
While building textractor we often found ourselves writing code to query and filter ASTs to find the exact node to modify. For example to programmatically turn <%= f.text_field :name, placeholder: "Your name" %>
into <%= f.text_field :name, placeholder: t('.your_name') %>
we need to find the node of the value for the placeholder
key, in a hash that happens to be an argument for a text_field
call.
It turns out there is already an excellent query language for searching tree structures: XPath! All we have to do is turn an AST into an XML tree, run the XPath query, and find the original AST node belonging to the matches.
TL;DR: This post shows you how to turn this:
if node.respond_to?(:type) && node.type == :send
node.children.find do |c|
c.respond_to?(:type) && c.type == :hash
end&.children.find do |c|
c.respond_to?(:type) && c.type == :pair && c.children.any? do |gc|
gc.respond_to?(:type) && gc.type == :sym && gc.children.any? do |ggc|
ggc == :placeholder
end
end
end&.children.find do |c|
c.respond_to?(:type) && c.type == :str
end
end
Into this:
*/send/hash/pair[sym[symbol-val/@value="placeholder"]]/str
All right, let’s get started!
So what does the AST for our example input <%= f.text_field :name, placeholder: "Your name" %>
look like?
(:send,
s(:send, nil, :f), :text_field,
s(:sym, :attr),
s(:hash,
s(:pair,
s(:sym, :placeholder),
s(:str, "placeholder text"))))
We need to recursively convert that data structure into XML. Here’s a short class that does exactly that:
class XMLAST
include REXML
attr_reader :doc
def initialize sexp
@doc = Document.new "<root></root>"
@sexp = sexp
root = @doc.root
populate_tree(root, sexp)
end
def populate_tree xml, sexp
if sexp.is_a?(String) ||
sexp.is_a?(Symbol) ||
sexp.is_a?(Numeric) ||
sexp.is_a?(NilClass)
el = Element.new(sexp.class.to_s.downcase + "-val")
xml.add_element el
else
el = Element.new(sexp.type.to_s)
sexp.children.each{ |n| populate_tree(el, n) }
xml.add_element el
end
end
end
We use REXML because it comes with the Ruby standard library. So far performance has been good, but if XML/XPath processing becomes your bottleneck, it’s easy enough to replace with nokogiri.
Let’s see it in action:
irb> require 'parser/current'
irb> require 'rexml/document'
irb> snippet = 'f.text_field :attr, placeholder: "placeholder text"'
irb> exp = Parser::CurrentRuby.parse(snippet)
irb> xml = XMLAST.new(exp)
irb> xml.doc.write(STDOUT, 2)
<root>
<send>
<send>
<nilclass-val />
<symbol-val />
</send>
<symbol-val />
<sym>
<symbol-val />
</sym>
<hash>
<pair>
<sym>
<symbol-val />
</sym>
<str>
<string-val />
</str>
</pair>
</hash>
</send>
</root>
However, if we want to be able to query on the values of literals, we’ll also need to add a value attribute:
def populate_tree xml, sexp
if sexp.is_a?(String) ||
sexp.is_a?(Symbol) ||
sexp.is_a?(Numeric) ||
sexp.is_a?(NilClass)
el = Element.new(sexp.class.to_s.downcase + "-val")
# Add value attribute
el.add_attribute 'value', sexp.to_s
xml.add_element el
else
el = Element.new(sexp.type.to_s)
sexp.children.each{ |n| populate_tree(el, n) }
xml.add_element el
end
end
Now our XML looks like this:
<root>
<send>
<send>
<nilclass-val value=''/>
<symbol-val value='f'/>
</send>
<symbol-val value='text_field'/>
<sym>
<symbol-val value='attr'/>
</sym>
<hash>
<pair>
<sym>
<symbol-val value='placeholder'/>
</sym>
<str>
<string-val value='placeholder text'/>
</str>
</pair>
</hash>
</send>
</root>
Time to try out some XPath. First, we add a convenience method to our XMLAST class:
def xpath path
XPath.match(doc, path)
end
Let’s try it:
irb> nodes = xml.xpath('*/send/hash/pair[sym[symbol-val/@value="placeholder"]]/str')
=> "[<str> ... </>]"
irb> nodes.first.to_s
=> "<str><string-val value='placeholder text'/></str>"
Pretty neat! But we’re not quite there yet. If we want to do anything useful with the results, we’ll need the original Ruby objects representing AST nodes. We could cheat and convert the results XML into a new AST, but that would almost certainly break the rewriter library built into the parser gem. Not to mention being horribly inefficient.
So instead we will add a bit of metadata to our XML tree, specifically the Ruby object IDs of the original nodes. Fortunately this is as easy as node.object_id
:
def populate_tree xml, sexp
if sexp.is_a?(String) ||
sexp.is_a?(Symbol) ||
sexp.is_a?(Numeric) ||
sexp.is_a?(NilClass)
el = Element.new(sexp.class.to_s.downcase + "-val")
el.add_attribute 'value', sexp.to_s
xml.add_element el
else
el = Element.new(sexp.type.to_s)
# Add the ruby object id
el.add_attribute('id', sexp.object_id)
sexp.children.each{ |n| populate_tree(el, n) }
xml.add_element el
end
end
Which results in the following XML:
<root>
<send id='47036300347960'>
<send id='47036300353160'>
<nilclass-val value=''/>
<symbol-val value='f'/>
</send>
<symbol-val value='text_field'/>
<sym id='47036300352660'>
<symbol-val value='attr'/>
</sym>
<hash id='47036300348380'>
<pair id='47036300348760'>
<sym id='47036300348840'>
<symbol-val value='placeholder'/>
</sym>
<str id='47036300349540'>
<string-val value='placeholder text'/>
</str>
</pair>
</hash>
</send>
</root>
Now that we have the original object IDs in our XML output, we can walk the tree to find the original nodes. The implementation below is not very efficient, but it is very short. Optimizing the performance of a recursive tree walk is left as an exercise to the reader.
First, we need a way to recursively add all nodes to an array:
def treewalk sexp=@sexp
return sexp unless sexp&.respond_to?(:children)
[sexp, sexp.children.map {|n| treewalk(n) }].flatten
end
Then, we can use this to find our matching object ID:
def xpath path
results = XPath.match(doc, path)
results.map do |n|
if n.respond_to?(:attributes) && n.attributes['id']
treewalk.find do |m|
m.object_id.to_s == n.attributes['id']
end
else
n
end
end
end
And here we are, a very quick and expressive way to juggle your ASTs:
irb> xml.xpath('*/send/hash/pair[sym[symbol-val/@value="placeholder"]]/str')
=> [s(:str, "placeholder text")]
See the complete source at the bottom of this post.
If you want to further shorten your XPaths you could add more metadata to your XML tree. For example in textractor, if we encounter a send
node (a method call) we automatically add message=”method_name”
to the XML element. This allows us to write XPath such as send[@message="form_for"]
.
We are currently developing multiple products using this library. Once the XML format stabilizes, we plan to extract the library from our product and release a gem. If you are interested in using these techniques in your project, we’d love to help! Send us an email at info@snootysoftware.com.
At Snooty Software, we develop tools that programmatically modify code. Our first product, Textractor, takes an existing Rails project and prepares your ERB views for translation by replacing string literals with t()
calls.
Complete source:
require 'parser/current'
require 'rexml/document'
class XMLAST
include REXML
attr_reader :doc
def initialize sexp
@doc = Document.new "<root></root>"
@sexp = sexp
root = @doc.root
populate_tree(root, sexp)
end
def populate_tree xml, sexp
if sexp.is_a?(String) ||
sexp.is_a?(Symbol) ||
sexp.is_a?(Numeric) ||
sexp.is_a?(NilClass)
el = Element.new(sexp.class.to_s.downcase + "-val")
el.add_attribute 'value', sexp.to_s
xml.add_element el
else
el = Element.new(sexp.type.to_s)
el.add_attribute('id', sexp.object_id)
sexp.children.each{ |n| populate_tree(el, n) }
xml.add_element el
end
end
def treewalk sexp=@sexp
return sexp unless sexp&.respond_to?(:children)
[sexp, sexp.children.map {|n| treewalk(n) }].flatten
end
def xpath path
results = XPath.match(doc, path)
results.map do |n|
if n.respond_to?(:attributes) && n.attributes['id']
treewalk.find do |m|
m.object_id.to_s == n.attributes['id']
end
else
n
end
end
end
end
# snippet = 'f.text_field :attr, placeholder: "placeholder text"'
# exp = Parser::CurrentRuby.parse(snippet)
# xml = XMLAST.new(exp)
# xml.xpath('*/send/hash/pair[sym[symbol-val/@value="placeholder"]]/str')
# => [s(:str, "placeholder text")]