Saturday, January 10, 2009

C# code generation using MGrammar

The Microsoft Oslo CTP has some very interesting tools, especially the MGrammar part of the M language, which allows defining textual Domain Specific Languages (DSLs). This is a refreshing change from Microsoft’s past obsession with graphical tools. Most published examples of this tool focus on DSLs for defining data structures, which is probably great for declarative-style DSLs. However, I believe it could also be useful for defining application behavior, in imperative-style languages. I don’t know yet if imperative-style languages are really part of MGrammar’s intended use, but I did some experimentation to find out what work would be required to achieve this.

I also had previous experience with SableCC and theoretical knowledge of internal DSLs defined with Boo, and wanted to compare these tools. Note that I may be biased in doing things in a SableCCish way, and there may be simpler/better ways to do what I’m trying to achieve with MGrammar, but for now I’ll try to find what I can or can’t do with this new tool/toy and evaluate its level of complexity.

A great example that helped me learn M was Torkel Ödegaard’s WatiN DSL using MGrammar. Here I’ll try to go one step further than his examples, and generate executable C# code from the generated AST (Abstract Syntax Tree).

Why C# instead of simply interpreting?

  • Objects used by the DSL can be created with plain old C# code, using the same techniques and design principles that would be used in a standard C# project
  • Increase developer productivity by providing a better debugging experience:
    • Set breakpoints in the DSL script’s source code and step into the DSL script (not the generated C# code, and not the interpreter code) using Visual Studio’s debugger

Sample DSL script running in Visual Studio's debugger

  • If the DSL code throws an exception, the stack trace references a line from the DSL script’s source code. In the case of an interpreter, the stack trace would instead reference the interpreter’s source code.

This C# code generation will be done in the following steps:

  1. Parse the DSL script to an MGraph representation
  2. Convert the MGraph representation to XAML
  3. Convert the XAML representation to strongly typed C# objects (giving an Abstract Syntax Tree).
  4. Visit each of the AST’s nodes
  5. Match each AST node against one of the Visitor’s methods. Each of these methods contains transformation code to generate C# code from a node’s properties.

CSharpGenerationSteps

Steps 2 and 3 are done using the MGraphXamlReader code sample, while steps 4 and 5 are additional work, in which I reproduced SableCC’s patterns (a variation of the GoF Visitor pattern).

First, I need a sample DSL. I took inspiration from the Authorization Rules DSL in Ayende Rahien’s Building Domain Specific Languages in Boo book, and adapted it with rules specific to a project I’ve been working on. I used a C#-like syntax, but MGrammar also allows defining more esoteric syntaxes.

Action Edit (user, incident, comment) {
    if (User is "PlantSupervisor") {
        Allow("Plant supervisors can edit any comment at any time")
    }
    else {
        if (User IsAuthorOf comment) {
            if (DateTime.Now < incident.EndTime + 12 hours)
                Allow("Comments can be edited up to 12 hours after the end of an incident.")
            else
                Deny("The incident has ended more than 12 hours ago, its comments can't be edited anymore.")
        }
        else {
            Deny("User can't edit another user's comment")
        }
    }
}

This DSL makes it possible to customize which UI elements are enabled or disabled depending on various rules. In the above example, the script defines rules to determine if a user is allowed to Edit a Comment attached to an Incident. Some users may be allowed to Edit only some of the comments on one screen. Buttons for actions that the user can’t perform are disabled, and they have a tooltip explaining why the action can’t be performed.

Example user interface, with different rules applied to each action button

I then wrote an MGrammar definition for this DSL (using Intellipad, whose live MGrammar Preview Mode greatly helped my learning process):

module Nootaikok
{
import Language;
import Microsoft.Languages;

export Authorization;

language Authorization
{
syntax Main
= action:Action*
=> action;

syntax Action
= TAction actionName:Identifier parameters:ActionParameters? rules:CodeBlock
=> Action { Name { actionName }, Parameters { parameters }, Rules { rules } } ;

syntax ActionParameters
= '(' parameterList:ParameterList? ')'
=> parameterList;

syntax ParameterList =
parameter:Identifier "," parameterList:ParameterList => [ parameter, valuesof(parameterList) ]
| parameter:Identifier => [ parameter ]; // last parameter

syntax ParameterValues
= '(' parameterValueList:ParameterValueList? ')'
=> parameterValueList;

syntax ParameterValueList =
v:ParameterValue "," l:ParameterValueList => [ v, valuesof(l) ]
| v:ParameterValue => [ v ]; // last parameter

syntax ParameterValue
= e:Expression => e;

syntax CodeBlock
= '{' statements:Statement* '}' => statements
| statement:Statement => statement;

syntax Statement
= s:IfThenStatement => s
| s:IfThenElseStatement => s
| s:MethodCallStatement => s;

syntax IfThenStatement
= 'if' '(' condition:Expression ')' then:CodeBlock
=> IfThenElseStatement { Condition{condition}, ThenBranch{then} } ;

syntax IfThenElseStatement
= 'if' '(' condition:Expression ')' then:CodeBlock 'else' @else:CodeBlock
=> IfThenElseStatement { Condition{condition}, ThenBranch{then}, ElseBranch{@else} } ;

syntax MethodCallStatement
= name:Identifier parameters:ParameterValues
=> MethodCallStatement { Name{name}, Parameters{parameters} } ;

syntax Expression
= stringLiteral:StringLiteral => StringLiteralExpression { Value{stringLiteral} }
| precedence 1: TUser TIs roleName:StringLiteral => UserIsInRoleExpression { Role{roleName} }
| precedence 1: TUser TIsAuthorOf TComment => UserIsAuthorOfExpression { AuthorOf{"Comment"} }
| precedence 1: TUser TWasWorkingIn range:Range => UserWasWorkingInExpression { DateTimeRange{range} }
| precedence 1: @left:Expression '<' @right:Expression => LessThanExpression { Left{@left}, Right{@right} }
| precedence 1: @left:Expression '>' @right:Expression => GreaterThanExpression { Left{@left}, Right{@right} }
| precedence 1: @left:Expression '<=' @right:Expression => LessThanOrEqualExpression { Left{@left}, Right{@right} }
| precedence 1: @left:Expression '>=' @right:Expression => GreaterThanOrEqualExpression { Left{@left}, Right{@right} }
| precedence 2: @left:Expression '+' @right:Expression => AddExpression { Left{@left}, Right{@right} }
| precedence 2: @left:Expression '-' @right:Expression => SubtractExpression { Left{@left}, Right{@right} }
| precedence 3: timespan:TimeSpan => TimeSpanExpression { valuesof(timespan) }
| precedence 4: name:Identifier => VariableReferenceExpression { Name{name} }
| precedence 4: propertyName:QualifiedIdentifier => PropertyReadExpression { ObjectAndPropertyName{propertyName} } ;

syntax Range
= '[' rangeStart:Expression '..' rangeEnd:Expression ']'
=> Range { Start{rangeStart}, End{rangeEnd} } ;

syntax TimeSpan
= days:Integer TDays => TimeSpan { Days{days} }
| hours:Integer THours => TimeSpan { Hours{hours} }
| minutes:Integer TMinutes => TimeSpan { Minutes{minutes} }
| seconds:Integer TSeconds => TimeSpan { Seconds{seconds} } ;

token IdentifierBegin = '_' | Letter;
token IdentifierCharacter = IdentifierBegin | '$' | DecimalDigit;
identifier token Identifier = IdentifierBegin IdentifierCharacter*;
token QualifiedIdentifier = Identifier ('.' Identifier)+;

@{Classification["Keyword"]} token TAction = 'Action';
@{Classification["Keyword"]} token TUser = 'User';
@{Classification["Keyword"]} token TIs = 'is';
@{Classification["Keyword"]} token TDays = 'days';
@{Classification["Keyword"]} token THours = 'hours';
@{Classification["Keyword"]} token TMinutes = 'minutes';
@{Classification["Keyword"]} token TSeconds = 'seconds';
@{Classification["Keyword"]} token TIsAuthorOf = 'IsAuthorOf';
@{Classification["Keyword"]} token TWasWorkingIn = 'WasWorkingIn';
@{Classification["Keyword"]} token TComment = 'comment';
@{Classification["Keyword"]} token TIncident = 'incident';

token Letter = 'a'..'z' | 'A'..'Z';
token DecimalDigit = '0'..'9';
token Integer = DecimalDigit+;

interleave Skippable
= Base.Whitespace+
| Language.Grammar.Comment;

syntax StringLiteral
= val:Language.Grammar.TextLiteral => val;

}
}

This grammar can be used to generate a parser, which converts the DSL script source to a set of MGraph nodes. These nodes are generic objects, which would be complex to manipulate in C# code. This is where MGraphXamlReader helps by generating a XAML representation of the MGraph, and by then converting that XAML representation to C# object instances.

To use this conversion from MGraph to XAML to C# objects, we first need to manually define a C# class for each node in the object graph. For example, an if/then/else is defined as:

using System.Collections.Generic;
using MAuthorizationDSL.CodeGenerator.Ast.AstVisitor;
using MAuthorizationDSL.CodeGenerator.Ast.Expressions;

namespace MAuthorizationDSL.CodeGenerator.Ast.Statements
{
    public class IfThenElseStatement : AbstractAstNode, IStatement
    {
        public IfThenElseStatement()
        {
            ThenBranch = new List<IStatement>();
            ElseBranch = new List<IStatement>();
        }

        public IExpression Condition { get; set; }
        public IList<IStatement> ThenBranch { get; protected set; }
        public IList<IStatement> ElseBranch { get; protected set; }
    }
}

Once we have defined classes for all AST nodes, the AST can be generated from the DSL script source.

ExampleAST

The AST can also be represented as XAML, again using MGraphXamlReader:

<n1:Action Name="Edit">
<n1:Action.Parameters>
<n0:String>user</n0:String>
<n0:String>incident</n0:String>
<n0:String>comment</n0:String>
</n1:Action.Parameters>
<n1:Action.Rules>
<n2:IfThenElseStatement>
<n2:IfThenElseStatement.Condition>
<n3:UserIsInRoleExpression Role="&quot;PlantSupervisor&quot;" />
</n2:IfThenElseStatement.Condition>
<n2:IfThenElseStatement.ThenBranch>
<n2:MethodCallStatement Name="Allow">
<n2:MethodCallStatement.Parameters>
<n3:StringLiteralExpression Value="&quot;Plant supervisors can edit any comment at any time&quot;" />
</n2:MethodCallStatement.Parameters>
</n2:MethodCallStatement>
</n2:IfThenElseStatement.ThenBranch>
<n2:IfThenElseStatement.ElseBranch>
<n2:IfThenElseStatement>
<n2:IfThenElseStatement.Condition>...</n2:IfThenElseStatement.Condition>
<n2:IfThenElseStatement.ThenBranch>...</n2:IfThenElseStatement.ThenBranch>
<n2:IfThenElseStatement.ElseBranch>...</n2:IfThenElseStatement.ElseBranch>
</n2:IfThenElseStatement>
</n2:IfThenElseStatement.ElseBranch>
</n2:IfThenElseStatement>
</n1:Action.Rules>
</n1:Action>

This XAML is an intermediate representation before the C# objects are created. It’s very verbose, but it can be helpful for debugging failures that occur while the objects are generated. For example, we see that MGraphXamlReader expects to assign the value “Edit” to the “Name” property of the “Action” instance. If that property is not defined (or has a different name), the instantiation of AST classes will fail with a non-obvious error. Looking at the XAML can help investigate the mismatch between the MGraph and the strongly typed AST classes, and apply the necessary fixes to either the MGrammar or the AST classes.

Once we have the object representation (the AST), we need to traverse the tree by visiting each node. When a node is visited, we can then map from that node’s properties to C# code, and we then continue going deeper in the tree by visiting the node’s child nodes.

For example, the following XAML node:

<n3:UserIsInRoleExpression  Role="&quot;PlantSupervisor&quot;" />

will be mapped to:

this.UserIsInRole(user, "PlantSupervisor")
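As a minimal sketch of how such a mapping might be written, the standalone snippet below turns two expression nodes into C# text. It is not the code from the download: the simplified node classes, the ExpressionWriter name, the int-typed time span properties and the assumption that the Role value arrives with its quotes included (as in the XAML above) are all illustrative guesses.

```csharp
using System;
using System.Text;

// Hypothetical, simplified AST nodes; property names follow the grammar's projections.
public class UserIsInRoleExpression
{
    public string Role { get; set; }   // assumed to be still quoted, e.g. "\"PlantSupervisor\""
}

public class TimeSpanExpression
{
    public int Days { get; set; }
    public int Hours { get; set; }
    public int Minutes { get; set; }
    public int Seconds { get; set; }
}

// Hypothetical writer that appends C# source text for each node it is given.
public class ExpressionWriter
{
    private readonly StringBuilder output = new StringBuilder();

    public void CaseUserIsInRoleExpression(UserIsInRoleExpression node)
    {
        // The literal keeps its quotes, so it can be emitted verbatim.
        output.Append("this.UserIsInRole(user, ").Append(node.Role).Append(")");
    }

    public void CaseTimeSpanExpression(TimeSpanExpression node)
    {
        // The grammar sets exactly one unit per TimeSpan; emit the first non-zero one.
        if (node.Days != 0) output.AppendFormat("TimeSpan.FromDays({0})", node.Days);
        else if (node.Hours != 0) output.AppendFormat("TimeSpan.FromHours({0})", node.Hours);
        else if (node.Minutes != 0) output.AppendFormat("TimeSpan.FromMinutes({0})", node.Minutes);
        else output.AppendFormat("TimeSpan.FromSeconds({0})", node.Seconds);
    }

    public override string ToString() { return output.ToString(); }
}
```

In the real visitor these methods would write to the shared generator instead of a private StringBuilder, but the node-to-text mapping would look much the same.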

The SableCCish way to traverse the AST is a variation of the GoF Visitor pattern, and I’m going to use a similar pattern here. First, a base visitor class needs to be created. This base class has a method for each possible node in the AST, in which it calls the Visit method on each of its child nodes. This class is tightly coupled to all the AST nodes (it needs to know the structure of each one of them). Therefore, depending on the complexity of the DSL, this class can be painful to write and maintain. In the case of SableCC, developers are freed from this burden by having the Visitor class generated automatically, but it has to be written manually with M (although it could probably be generated with M as well).

public class AstVisitor : IAstVisitor
{
    public virtual void CaseIfThenElseStatement(IfThenElseStatement node)
    {
        if (node.Condition != null)
            node.Condition.Visit(this);

        if (node.ThenBranch != null)
        {
            foreach (var statement in node.ThenBranch)
                statement.Visit(this);
        }

        if (node.ElseBranch != null)
        {
            foreach (var statement in node.ElseBranch)
                statement.Visit(this);
        }
    }
    ...
}

Each AST node also needs to implement the IAstVisitable interface, so for the previously shown IfThenElseStatement example, we need to add:

public override void Visit(IAstVisitor visitor)
{
    visitor.CaseIfThenElseStatement(this);
}
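The supporting types (IAstVisitor, IAstVisitable, AbstractAstNode) are not shown in this post. The sketch below is one plausible shape for them, reduced to two node types so that it stays self-contained. The node classes are deliberately simplified versions of the real ones, and MethodNameCollector is a hypothetical visitor added only to show the double dispatch at work.

```csharp
using System;
using System.Collections.Generic;

public interface IAstVisitor
{
    void CaseIfThenElseStatement(IfThenElseStatement node);
    void CaseMethodCallStatement(MethodCallStatement node);
}

public interface IAstVisitable
{
    void Visit(IAstVisitor visitor);
}

public abstract class AbstractAstNode : IAstVisitable
{
    // Each concrete node calls back the visitor method that matches its own type.
    public abstract void Visit(IAstVisitor visitor);
}

// Simplified node: only the name is kept.
public class MethodCallStatement : AbstractAstNode
{
    public string Name { get; set; }

    public override void Visit(IAstVisitor visitor)
    {
        visitor.CaseMethodCallStatement(this);
    }
}

// Simplified node: only the "then" branch is kept.
public class IfThenElseStatement : AbstractAstNode
{
    public IfThenElseStatement()
    {
        ThenBranch = new List<AbstractAstNode>();
    }

    public IList<AbstractAstNode> ThenBranch { get; private set; }

    public override void Visit(IAstVisitor visitor)
    {
        visitor.CaseIfThenElseStatement(this);
    }
}

// Hypothetical visitor that records the name of every method call it encounters.
public class MethodNameCollector : IAstVisitor
{
    public readonly List<string> Names = new List<string>();

    public void CaseIfThenElseStatement(IfThenElseStatement node)
    {
        foreach (var statement in node.ThenBranch)
            statement.Visit(this);
    }

    public void CaseMethodCallStatement(MethodCallStatement node)
    {
        Names.Add(node.Name);
    }
}
```

Calling Visit on the root node with a MethodNameCollector walks the tree without any type checks or casts: the node picks the visitor method, and the visitor decides what to do with it.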

The AstVisitor class is a base class which defines how to traverse the tree. The transformations to C# code are applied in a class inheriting from AstVisitor, which overrides the visitor methods for nodes where a transformation is needed. (A helpful analogy is a set of XSLT templates which match and transform XML DOM nodes.) For example, CaseIfThenElseStatement is overridden in CodeGeneratingAstVisitor as:

public override void CaseIfThenElseStatement(IfThenElseStatement node)
{
    generator.SetCurrentSourceLine(node.FileName, node.Span.Start.Line);
    generator.WriteIndent();
    generator.Write("if (");
    node.Condition.Visit(this); // go further down in the AST by visiting the Condition node
    generator.Write(")");
    generator.WriteLine();
    generator.WriteIndentedLine("{");
    generator.IndentLevel++;
    if (node.ThenBranch != null)
    {
        // go further down in the AST by visiting the ThenBranch nodes
        foreach (var statement in node.ThenBranch)
            statement.Visit(this);
    }
    generator.IndentLevel--;
    generator.WriteIndentedLine("}");
    if (node.ElseBranch != null && node.ElseBranch.Count > 0)
    {
        generator.WriteIndentedLine("else");
        generator.WriteIndentedLine("{");
        generator.IndentLevel++;
        // go further down in the AST by visiting the ElseBranch nodes
        foreach (var statement in node.ElseBranch)
            statement.Visit(this);
        generator.IndentLevel--;
        generator.WriteIndentedLine("}");
    }
}
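The generator object used above is not listed in this post. A minimal sketch of such a writer could look like the following. The CSharpWriter class name is hypothetical, and emitting a C# #line directive from SetCurrentSourceLine is my assumption about how the DSL script's line numbers end up in the debug information; #line itself is a standard C# preprocessor directive that makes the compiler report the given file and line for the code that follows.

```csharp
using System;
using System.Text;

// Hypothetical code writer; member names mirror the calls made by the visitor above.
public class CSharpWriter
{
    private readonly StringBuilder output = new StringBuilder();

    // Number of indentation levels applied by WriteIndent (4 spaces each).
    public int IndentLevel { get; set; }

    // Emits a #line directive so that breakpoints and stack traces refer
    // to the DSL script instead of the generated C# file.
    public void SetCurrentSourceLine(string fileName, int line)
    {
        output.AppendFormat("#line {0} \"{1}\"", line, fileName).AppendLine();
    }

    public void WriteIndent()
    {
        output.Append(new string(' ', IndentLevel * 4));
    }

    public void Write(string text)
    {
        output.Append(text);
    }

    public void WriteLine()
    {
        output.AppendLine();
    }

    public void WriteIndentedLine(string text)
    {
        WriteIndent();
        Write(text);
        WriteLine();
    }

    public override string ToString()
    {
        return output.ToString();
    }
}
```

This sketch assumes SetCurrentSourceLine is always called at the start of an output line, which holds for the visitor method shown above.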

Finally, this produces the following C# code:

using System;
using MAuthorizationDSL.Core;
public class IncidentReport_Comments_Edit_AuthRules : AbstractAuthorizationRule
{
    public void Evaluate(string user,Incident incident,Comment comment)
    {
        if (UserIsInRole(user, "PlantSupervisor"))
        {
            Allow("Plant supervisors can edit any comment at any time");
        }
        else
        {
            if (UserIsAuthorOf(user, comment))
            {
                if (DateTime.Now < incident.EndTime + TimeSpan.FromHours(12) )
                {
                    Allow("Comments can be edited up to 12 hours after the end of an incident.");
                }
                else
                {
                    Deny("The incident has ended more than 12 hours ago, its comments can't be edited anymore.");
                }
            }
            else
            {
                Deny("User can't edit another user's comment");
            }
        }
    }
}

The generated IncidentReport_Comments_Edit_AuthRules class inherits from the AbstractAuthorizationRule class. This AbstractAuthorizationRule is another class we’ll need to write, in which we define the methods invoked by the DSL scripts:

  • UserIsInRole
  • UserIsAuthorOf
  • Allow
  • Deny

This follows the “Anonymous Base Class” DSL pattern. I’m not showing the class’s code here, because it’s a simple proof-of-concept implementation that returns hard-coded results, but it could be modified to really do the appropriate checks in a database or Active Directory.
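To give an idea of the shape of such a base class, here is a minimal sketch. It is not the class from the download: the IsAllowed and Reason properties, the hard-coded role check, the reduced Comment class and the SampleRule subclass are all assumptions made for illustration.

```csharp
using System;

// Reduced stand-in for the real Comment class.
public class Comment
{
    public string Author { get; set; }
}

public abstract class AbstractAuthorizationRule
{
    // Outcome recorded by the last call to Allow or Deny (hypothetical members).
    public bool IsAllowed { get; private set; }
    public string Reason { get; private set; }

    protected void Allow(string reason)
    {
        IsAllowed = true;
        Reason = reason;
    }

    protected void Deny(string reason)
    {
        IsAllowed = false;
        Reason = reason;
    }

    // Hard-coded proof-of-concept check; a real version might query a database or a directory.
    protected virtual bool UserIsInRole(string user, string role)
    {
        return user == "Supervisor1" && role == "PlantSupervisor";
    }

    protected virtual bool UserIsAuthorOf(string user, Comment comment)
    {
        return comment != null && comment.Author == user;
    }
}

// Hand-written stand-in for a generated rule class.
public class SampleRule : AbstractAuthorizationRule
{
    public void Evaluate(string user, Comment comment)
    {
        if (UserIsAuthorOf(user, comment))
            Allow("Authors can edit their own comment");
        else
            Deny("User can't edit another user's comment");
    }
}
```

Because the generated class only inherits from the base class and calls its protected methods, the DSL vocabulary can grow by adding methods here without touching the grammar's method call rule.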

All that’s left is to compile that C# code, load the generated assembly (ideally in a separate AppDomain), create an instance of the IncidentReport_Comments_Edit_AuthRules class using reflection (for each rule, a separate C# class is generated) and execute its Evaluate method.
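The compile-and-load step can be sketched with the CodeDOM classes of the .NET Framework (CSharpCodeProvider and CompilerParameters). The RuleCompiler helper and its method names are hypothetical, and this sketch loads the assembly into the current AppDomain rather than a separate one. It assumes the .NET Framework, where the CodeDOM C# compiler is available.

```csharp
using System;
using System.CodeDom.Compiler;
using System.Reflection;
using Microsoft.CSharp;

// Hypothetical helper: compiles generated C# source and instantiates a class from it.
public static class RuleCompiler
{
    public static object CompileAndCreate(string source, string className, params string[] referencedAssemblies)
    {
        using (var provider = new CSharpCodeProvider())
        {
            var options = new CompilerParameters
            {
                GenerateInMemory = true,
                IncludeDebugInformation = true   // needed for breakpoints and line numbers
            };
            options.ReferencedAssemblies.Add("System.dll");
            options.ReferencedAssemblies.AddRange(referencedAssemblies);

            CompilerResults results = provider.CompileAssemblyFromSource(options, source);
            if (results.Errors.HasErrors)
                throw new InvalidOperationException("Compilation failed: " + results.Errors[0].ErrorText);

            // Create the instance by name, since the type is unknown at compile time.
            return results.CompiledAssembly.CreateInstance(className, false);
        }
    }

    // Invokes a public instance method by name using reflection.
    public static object Invoke(object instance, string methodName, params object[] args)
    {
        MethodInfo method = instance.GetType().GetMethod(methodName);
        return method.Invoke(instance, args);
    }
}
```

For the generated rule above, the call would pass the generated source, the class name IncidentReport_Comments_Edit_AuthRules and the path of the assembly containing AbstractAuthorizationRule, then invoke Evaluate with the user, incident and comment.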

In my current code, I simply call the C# compiler and load the generated assembly in the current AppDomain. I also neglect several “infrastructure” considerations, since this is still proof-of-concept code:

  • caching
  • instance management
  • batch compilation
  • recompile and reload modified scripts at runtime

These considerations are better described in Chapter 7 of Ayende Rahien’s Building Domain Specific Languages in Boo book. This chapter explains these requirements and how the Rhino DSL library fulfills them. This library can be used for Boo DSLs, and a similar library would need to be written for M-to-C# DSLs before using such a DSL in a production application.
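As a rough illustration of the caching and reload considerations only (this is not how Rhino DSL is implemented), a cache could recompile a script when its last-write timestamp changes. The CompiledScriptCache class and its delegate parameters are hypothetical, and the sketch is not thread-safe.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cache: compiles a script on first use and again whenever its timestamp changes.
public class CompiledScriptCache<T>
{
    private class Entry
    {
        public DateTime Stamp;
        public T Value;
    }

    private readonly Dictionary<string, Entry> entries = new Dictionary<string, Entry>();
    private readonly Func<string, DateTime> getLastWriteTime;
    private readonly Func<string, T> compile;

    public CompiledScriptCache(Func<string, DateTime> getLastWriteTime, Func<string, T> compile)
    {
        this.getLastWriteTime = getLastWriteTime;
        this.compile = compile;
    }

    public T Get(string scriptPath)
    {
        DateTime stamp = getLastWriteTime(scriptPath);
        Entry entry;
        if (!entries.TryGetValue(scriptPath, out entry) || entry.Stamp != stamp)
        {
            // First request, or the script was modified since it was compiled.
            entry = new Entry { Stamp = stamp, Value = compile(scriptPath) };
            entries[scriptPath] = entry;
        }
        return entry.Value;
    }
}
```

In an application, the first delegate could be System.IO.File.GetLastWriteTimeUtc and the second a function that parses, generates and compiles the script. Passing both in as delegates keeps the cache testable without touching the file system.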

The rules can then be consumed in the C# UI code. For example, the following test shows how the "Edit" action on the "IncidentReport_Comments" functionality was denied to "Operator1" for a Comment that was created by "Operator2". The action was denied by a rule defined in the DSL. The AuthorizationRules class is responsible for loading and executing the DSL scripts for the requested functionality.

[Test]
public void OperatorCannotEditOtherOperatorComments()
{
    var comment = new Comment() { Author = "Operator2", CommentText = "test" };
    var incident = new Incident()
    {
        StartTime = DateTime.Now.AddDays(-5), EndTime = DateTime.Now.AddDays(-4),
        Description = "test",
        Comments = new List<Comment>() { comment }
    };

    var rules = AuthorizationRules.GetInstance();
    Assert.That(
        rules.WhyAllowedOrDenied("IncidentReport_Comments", "Edit", "Operator1", incident, comment),
        Is.EqualTo("User can't edit another user's comment")
    );
    Assert.That(!rules.IsAllowed("IncidentReport_Comments", "Edit", "Operator1", incident, comment));
}

Conclusion

That was fairly complex work, involving lots of steps. As I’ve said before, I was biased by knowledge of SableCC, and there may be simpler ways to achieve the same results with M (or simpler ways may be introduced in future versions of M).

One boring (repetitive) step was manually creating strongly-typed AST and Visitor classes for the DSL. A solution for that may be to automatically generate these classes from an MGrammar definition. Alternatively, it may be possible to work directly with the MGraph nodes in a dynamic language on the .NET DLR, or with C# 4.0’s dynamic typing. This would completely avoid the creation of the AST and Visitor classes (however, I don’t know if it’s currently possible or even if this is an intended feature for a future version).

I’ve also overlooked several considerations that would need to be solved in a production DSL. A good inspiration to solve these considerations can be found in the Rhino DSL library.

Even though M is a very interesting tool, it may not be the best choice for all cases. My example defines an “external DSL”, in which I have great flexibility over the syntax. Another approach would be an “internal DSL”, which is hosted inside another language. Internal DSLs usually give less flexibility on the syntax: the DSL scripts will usually have some similarity to the host language’s syntax. For example, an internal DSL defined in Boo will have a Python-like syntax, and it would be hard to give it a C#-like syntax instead. However, I believe this is a non-issue in most cases, as these languages still give lots of flexibility (for example by providing metaprogramming facilities or by allowing manipulation of the parsed AST). An external DSL’s extreme syntax flexibility can be useful when the syntax is already defined (if we want to integrate with a “legacy DSL”), but otherwise an internal DSL is probably a simpler solution. An internal DSL uses the host language’s compiler to generate MSIL code, so we don’t have to worry about creating an AST, visiting its nodes and generating C# code. An internal DSL also gives us the benefit of being integrated with that language’s tools (refactoring tools, IntelliSense, debugger…) “by default”.

M may also not be the best choice for imperative-style external DSLs. Other tools, such as SableCC, automatically generate code that we have to write manually with M to achieve the same results. M DSLs can still be written very productively using Intellipad’s almost real-time feedback. Therefore, I would say M is appropriate for simple DSLs, where writing a few AST classes and Visitor methods manually is not an issue, but I believe SableCC would be more appropriate for more complex languages because it generates these classes automatically. (For example, the C# or Java languages could be defined using SableCC grammars.)

Also, we can see from M’s published examples that there is an intense focus on data. This may indicate that imperative-style DSLs are not really an intended use case of that tool, but it can still be very useful for declarative-style DSLs.


The full source code for this example can be downloaded here: MAuthorizationDSL.zip.

1 comment:

  1. I've been working on something very similar. Check this out: http://www.justnbusiness.com/post/2009/03/11/MetaSharp-code-generation-success!.aspx

    I may have to incorporate some of your ideas here.
