Lucas Luitjes

Freelance dev/devops/security. Click here to learn more about my background and the services I offer.

Monocle: bidirectional code generation

09 Apr 2022

Update: Hacker News thread

Monocle is a bidirectional code generation library. It lets you write all sorts of development tools that don’t exist yet. The basic idea:

Monocle example diagram

Monocle lets you write templates that can generate code, but also parse that code back into the initial variables. Depending on how flexible your templates are, it can even parse code that has been modified by hand after generation. Monocle is written for Ruby, but the same concept can easily be applied to other languages.

You can use it to build visual programming tools, programmatic code editing tools, automated version upgrades, and more. For example, here is a video of a prototype for a visual Rails editor. There’s a list of other potential use cases further below.

In this blog post I’m going to show you how to use it, what you could do with it, and give a quick tour under the hood.

How to use monocle

Let’s look at a simple example. Assuming that you have an ERB file like this:

class <%= data["model_name"] %> < ApplicationRecord
end

You invoke it like so:

template = File.read("model.erb")
data = { "model_name" => "Book" }
puts ERB.new(template).result_with_hash(data: data)
#=> 
# class Book < ApplicationRecord
# end

Now let’s add some custom code to the generated output:

class Book < ApplicationRecord
  belongs_to :publisher
end

If you now want to change the name of the model, you’ll have to do it by hand. Or, if you run the ERB template again, you will lose the method we added.

What if there was a way to reverse the code generation? What if you could do this:

template = File.read("model.erb")
input = <<-EOF
  class Book < ApplicationRecord
    belongs_to :publisher
  end
EOF

ReverseERB.new(template).result(input)
#=> { "model_name" => "Book", "custom_code" => ["belongs_to :publisher"] }

Monocle lets you do exactly that. It works by specifying a little metadata for the interpolations. Let’s see the previous example expressed with a monocle (used to be called mutator, you will see some references to that term in the code).

# first we have a bit of boilerplate code
class ModelMonocle
  include Monocle::BaseMutator

  # parsing will start at the root snippet, but you can define multiple snippets and refer to 
  # them from other snippets. Since they're secretly just classes, you can also put them in 
  # modules and re-use them.
  snippet(:root) do
    match_on "result"

    # here's the actual template, but instead of interpolations, we use the prefix placeholder_
    sample_code <<-RUBY
      class Placeholder_model_name < ApplicationRecord
        placeholder_custom_code
      end
    RUBY
    
    # here we define how to interpolate the model name
    placeholder :model_name, type: :const, data_path: "result.model_name"

    # the final two statements define how to interpolate the custom code, which is slightly
    # more complex because that's not a built-in placeholder type
    define_replacer :custom_code do
      query_dst('result.custom_code') || ""     
    end

    define_matcher :custom_code do |ast|
      update_dst('result.custom_code', ast.loc.expression.source)
    end
  end
end

Let’s invoke it with the same input data as we did our ERB template:

data = { "model_name" => "Book"}
monocle = ModelMonocle.new
puts monocle.dst2code(data)
#=> 
#      class Book < ApplicationRecord
#        
#      end

Now lets do the reverse!

input = <<-EOF
  class Book < ApplicationRecord
  end
EOF
puts monocle.code2dst(input)
#=> {"model_name"=>"Book"}

Okay, so what about changing the name of the model? Let’s run the monocle on the model with our custom method added:

input = <<-EOF
  class Book < ApplicationRecord
    belongs_to :customer
  end
EOF
puts monocle.code2dst(input)
#=> {"model_name"=>"Book", "custom_code" => "belongs_to :publisher"}

Now we can change the name of the model in the input data, and run the monocle again:

input = <<-EOF
  class Book < ApplicationRecord
    belongs_to :customer
  end
EOF
data = monocle.code2dst(input)
data["model_name"] = "Product"
puts monocle.dst2code(data)
#=> 
# class Product < ApplicationRecord
#   belongs_to :customer
# end

And there we are! We just programmatically changed the name of a model while preserving code modifications made by hand!

Common questions

A few questions tend to come up a lot. Let’s take those on before we dive deeper into the code.

This was a very simple example, but Ruby has a lot of syntactic sugar. Won’t you need to write custom code for every possible variation?

Surprisingly, no! First off, we use AST nodes for comparison. Functionally equivalent variations tend to lead to the same AST node. For example, a normal if statement, a ternary statement (a ? nil : b) and postfix conditionals (b unless a) all become the same s(:if, ...) AST node. So if your template contains a ? nil : b, it will match b unless a just fine.

Second, we handle most edge cases in the parsing system, so you don’t have to deal with them in your snippets.

For example, a reference to a variable that is a block argument becomes a s(:lvar, ...) AST node. But a normal variable reference becomes a s(:send,...) AST node. That’s a problem if you’re matching block contents in a child snippet, because the Ruby parser does not know if the child snippet code is inside of a block or not. This is solved in the on_send method in the extractor by explicitly considering those two node types the same.

Does that mean the extractor and generator will need to be constantly updated for every new Ruby feature?

Rarely! The default is to compare AST nodes by node type. When comparing the arguments of a node, we also extract placeholder values from symbols. These two strategies combined take care of the majority of code.

We only had to override this behavior in a handful of situations so far (constants, instance variables, block arguments, method calls, method names, and matching the contents of a class definition or block against multiple snippets). In fact, the generator and extractor taken together are less than 300 lines of code.

What happens if I run into an edge case that you haven’t handled yet?

When working with tools like this, silent failure can be incredibly time-consuming. Therefore monocle tries to fail early and loudly. If you run into a very uncommon edge case, you can almost always deal with it with a custom matcher in your snippet. Alternatively, you can monkey-patch the extractor and generator pretty easily.

It would be cool if you could configure and override extractor behavior per snippet. That’s currently not possible, but it would be pretty easy to add.

The formatting of generated code looks weird. Is this tool going to screw up my carefully formatted codebase?

Currently, somewhat. But monocle comes with a SourceUpdater that makes sure to only update the smallest part of the code that actually changed. So small changes are unlikely to result in formatting changes.

For our use case it was fine to just run rubocop -a after every change, so it wasn’t a priority.

But it would be better if the generator was aware of the current indentation level, and the snippet’s relative indentation. Then it could indent properly. It could also check if a placeholder value is actually empty, and if so it shouldn’t insert an unnecessary newline like it currently does. With those two tweaks, most formatting issues would likely be resolved.

What about comments? Won’t those get overwritten?

No! The SourceUpdater goes to great lengths to prevent that. If the SourceUpdater changes or removes code that has comments in or near it, the comments will be left in place. In fact, the SourceUpdater will add a comment of its own, explaining that the related code was changed programmatically but this comment was preserved. That way you can easily use git-blame to find out exactly what changes were made.

Tools like this can never cover 100 percent of all cases. So what’s the point?

That’s true, if this library could literally understand all code, it would be general AI :-) We feel that by covering 80 to 90 percent of common cases, tools based on this library can save a lot of time. For the other 10 to 20 percent, we try to fail early so that a human being can step in.

With that covered, let’s get back to the code.

Built-in placeholders

Because you can define your own matchers and replacers in Ruby, you have the flexibility to parse and generate code as complex as you want. However, for common use cases, monocle comes with a number of built-in placeholder types.

Single node placeholders

These match a single node in the AST. You can give them a data_path, or lambda with limited custom logic.

  • :const matches any constant like SomeClass
  • :symbol matches any symbol like :something
  • :ivar matches any instance variable like @something
  • :send matches any method call (like puts 'something' or SomeClass.something)
  • :string matches strings, so if your snippet contains puts placeholder_my_string, it could match puts "something"
  • :method_name matches the name of a method definition, so in def something ; end it will match the something part
  • :block_arg matches a block argument, so in a.each { |something| some_logic } it will match the something part

Multi-node placeholders

These can match multiple nodes. They’re a little more complicated, but can save you a lot of time.

  • :list_of_symbols matches multiple symbols, so before_action placeholder_actions will match before_action :authenticate, :set_book, :handle_upload
  • :list_of_methods matches multiple method definitions, so placeholder_private_methods will match any number of methods that match any of the supplied child snippets
  • :child_snippets same as above, but for any type of AST node, not just method definitions

ERB

Since we were building a visual editor for Rails applications, we needed a solution for views as well. This is where the erb2builder library comes in. By converting both the input and the snippet to builder, we can use the exact same processing pipeline. Here’s an example of a monocle that generates and parses ERB templates:

snippet(:title) do
  sample_code_erb <<-ERB
    <h1 class="title"><%= placeholder_title %></h1>
  ERB

  match_on "title"
  placeholder_string :title, type: :string, data_path: "title", strip: true
end

This can convert:

<h1 class="title">
  Hello, world!
</h1>

Back into:

{ "title" => "Hello, world!" }

More examples and use cases

Check out the Rails monocles here. Please note that a lot of those were written using numbered placeholders rather than named placeholders. They also tend to define custom matchers using methods rather than the define_replacer and define_matcher block syntax. This is purely for legacy reasons (we didn’t add support for named placeholders until later), these examples could (and should) all be rewritten using named placeholders.

Some ideas of things you could build with this:

  • A no-code platform that builds and edits Rails apps under the hood (full demo video of our attempt at this)
  • A much more powerful version of scaffolds, that you can use throughout the lifetime of an app instead of just in the beginning
  • Graphical and programmatic editing tools for any code that mostly sticks with a predefined structure, like ActiveAdmin, test suites, web scrapers, etc.
  • Automated Rails app (and other frameworks) version upgrades, by writing monocles for the old and new version
  • Linters for more complicated code smells, for example by integrating with rubocop
  • Vulnerability scanning

Under the hood

I recommend reading the lib/monocle source, it’s not actually all that much code. The magic happens in dst_extractor.rb, code_generator.rb and source_updater.rb. The rest are utilities, dsl code, and built-in placeholders.

But here’s a brief explanation of what goes on under the hood.

Earlier I mentioned that it works by specifying a little metadata for each interpolation. Actually, the placeholder, define_matcher, and define_replacer methods generate instance methods under the hood. For a placeholder named placeholder_name it generates the methods match_placeholder_name? and replace_placeholder_name.

For simple cases you use the placeholder method, for complex cases you define your own logic using the define_matcher, and define_replacer methods.

replace_placeholder_name is used during generation to retrieve the value that should be interpolated. As code generation is fairly straightforward, we won’t go into detail about it.

match_placeholder_name? is used during parsing. A little background on parsing: all the snippets defined in a monocle are parsed into ASTs. The input is also parsed into an AST. Then we pass the root node of the input, and the root node of the root snippet, to a recursive function.

This function compares the two nodes, and calls itself with the children of each node if successful. If the snippet node happens to have a value that starts with placeholder_, we call the match_placeholder? method for that placeholder. We pass it the input AST node currently being processed.

In our match_placeholder? method we check if the input node is a valid value for that particular interpolation. For example, for our model name, we can check if it’s a string with a capital first letter. If applicable we convert the value into the right form, and store it in our result object. Afterwards, we return true if the value was valid.

If all of the placeholder methods return true, we have successfully converted the code back into initial values!

Note that the match_placeholder? method can contain any arbitrary logic. We could even pass the node and its children to another set of monocles, trying each one until we find one that returns true. That’s the basis for the list_of_methods and child_snippets placeholders.

That about wraps up the introduction. We’re not actively working on this, so this is all the documentation there currently is. But if you have questions, feel free to contact us.

Our long-running side project has been on hold for more than a year due to time constraints. We are publishing some of the core technology now. The code is unpolished, but (to our knowledge) innovative. If you have a use for it, and want to spar with us, feel free to email us at info@snootysoftware.com.