A well-designed domain-specific language makes code expressive and easy to understand. But designing a nice DSL was always a challenging task. In this article I present a simple framework for constructing regular DSLs in Scala.

Most internal domain specific languages I’ve seen so far belong to a class of regular languages. For explanation of the further parts, let’s use the following definition from Wikipedia:

Regular language is the language accepted by a deterministic finite automaton.

A deterministic finite automaton (DFA) consists of:

  • a finite set of states;
  • a finite set of input symbols (in given case – a finite set of DSL keywords);
  • a set of transitions between the states;
  • an initial state;
  • a set of accept (terminal) states.

Consider a simple language which describes delegation of tasks by managers to workers. It contains only five keywords: lets, delegate, and, to and supervisedBy. The language rules restrict a set of valid sentences to only three possible formats listed below:

  1. lets delegate Task1 [ and Task2 … ] to Worker1 supervisedBy Supervisor1
  2. lets delegate Task3 [ and Task4 … ] supervisedBy Supervisor2 to Worker2
  3. lets delegate Task5 [ and Task6 … ] to Worker3

It’s easy to build a DFA which uniquely defines this language:

DFA

The DFA has:

  • 5 states;
  • a start state S1 which is reached via lets keyword;
  • 2 final states S3 and S5;
  • 6 transitions triggered by keywords delegate, and, to, supervisedBy.

The following is a step-by-step manual on how to transform this and other DFAs to Scala code.

The General Overview

Once the language is designed and the correspondent DFA is built, it’s time to reflect the DFA in code. It’s easy to model the automaton when each state is mapped to a unique type. All the transitions can then be associated with correspondent states via ad-hoc polymorphism. All accept states can be defined either as a set of implicit proof values or as a set of conversions to a single result type.

Step I: Mapping States to Unique Types

There is obviously more than one way of mapping the states to custom types. The approach I find the most powerful it to create a parametrized type and express all other types in terms of this pivotal type. For instance, all states can be expressed via a combination of the pivotal type DFA[T <: State] and 5 subtypes of type StateState1 , State2State5.

Provided you have some insight into the domain, it’s more practical to express these specifics not only in the DSL semantics but also in the implementation of the DSL. Future code supporters will be grateful for that choice.

As for the sample DSL in this article, the good pivot may outline three qualitative distinctions between different states of given automaton: whether at least one task is specified; whether a task executor is specified; whether a supervisor is specified.

The five states are then expressed the following way:

  • S1 -> DFA[No, No, No]
  • S2 -> DFA[Yes, No, No]
  • S3 -> DFA[Yes, Yes, No]
  • S4 -> DFA[Yes, No, Yes]
  • S5 -> DFA[Yes, Yes, Yes]

Step II: Adding Transitions

The transitions are defined separately for each state. As we already have the mapping from each state to a unique type, it’s convenient to define an implicit extension class with the appropriate transitions for each type. Bear in mind that naming can be a tough problem here though.

Please note that each transition takes one state as an input (constructor parameter), and returns the same or a different state (each method’s return type).

To enter the DFA, we also have to define at least one transition to the start state:

At this stage we can already compose compilable sentences e.g.:

The only issue is that there’s no compile time validation yet so it is possible to construct an illegal method which will compile successfully.

Step III: Defining Accept States

Depending on how many accept states are needed and on the concrete DSL syntax, there are at least three approaches to enable compile-time state verification.

Explicit specification of the expected type

This is the simplest approach which can be applied only if you don’t mind leaking the types used in DSL implementation to the client code and making the client responsible for verification. If we take the sample DSL sentence from the previous paragraph, the proof that we end up in an accept state will looks like this:

Conversions to a result type

This approach is not suitable for making libraries but other than that it’s very powerful. One of many possible implementations for a given DSL is shown below:

A set of implicit proofs

The most difficult and powerful approach which affects the DSL as well. This approach can be found in Finagle RequestBuilder used for compile-time request config validation.

In a nutshell, each sentence of the DSL have to end up with a special last word (e.g. build() , send() , compile() , please() etc). This method takes an implicit proof parameter of the same type as the current state type. A set of proof values is predefined by the library, so if there’s no proof for a given state’s type – the code will not compile.

Let’s add a submit() word to our DSL. To enable the whole feature we only need to modify class DFA and its companion object:

Now a valid sentence which compiles fine has the following form:

But if the sentence ends up not in an accept state of the DFA, error message is shown:

And that’s it. Although the article appears bigger then I expected, all steps are fairly simple and once mastered, creating DSLs will be a piece of cake.

Did you like the post? Please spread the word 🙂