The Surprising Complexity of Alloy parsing

peter.kriens · March 18, 2025, 10:37am

Once you understand that a value in Alloy is always a relation (a set of tuples of atoms) the Alloy syntax is surprisingly easy to write; the syntax flows very well. This is, however, a bit of a mirage. The underlying model of Alloy is a lot more subtle and complex than that I originally gave it credit for.

When I initially designed the grammar it felt pretty simple. Alloy looked like a simple expression language with the footnote that there are value expressions and formulas. A value represents a relation like (A+B) and a formula is a boolean expression like some a and one b. It felt simple, until I hit the references to values: sigs, fields, functions, and predicates. Once you get to a name, most bets are off. I guess I knew this but never really realized that Alloy has such extensive overloading and basically only one namespace. It does, fortunately, allows you to use concise names.

Signatures, functions, predicates, macros, and fields share the same namespace.
Functions and field names can be overloaded any number of times
Function arguments and box joins use the same syntax: []
‘Object oriented’ calls (like foo.bar[]) and joins use the same dot operator: '.'

The most complicated expression can therefore lead to many different concrete expressions based on how n is defined.

v.n[p]

fun n[] – (join p (join v (call n) )
fun n[p] – (join v (call n p)) or (join p (call n v))
fun n[t,p] – (call n v p)

It should be obvious that if names are overloaded, the number of possible AST trees grows exponentially. And common names are, eh, common. For example, every time the util/ordering module is used for a signature, it adds the same field names for its functions, predicates and fields to the shared namespace.

The reason that it can works is the very elegant Alloy type system. Every possible expression is verified against the type system. If the types do not match, the expression is rejected. If at the end there are multiple expressions it means there is an error. For example:

sig A,B {}
fun foo : B -> A { B -> A }
fun foo : A -> B { A -> B }
run { some  foo.foo.foo.foo.foo.B }

The first reference to foo has two valid choices. Each choice is then further explored until the B at the end makes only starting with foo:B->A valid.

Alloy’s elegant type system is not your grandfather’s type system, it is based on set membership. In Alloy, every value is a relation and a relation is a set of tuples of attributes. Each attribute is typed by the signatures of which it can be a member of. Most of the time this is a signature but it can be the union of multiple signatures and implies the rules for inheritance. This technique is surprisingly effective. It is actually quite hard to come up with examples where there are a lot of parallel paths through the expressions. If you know a complex use cases then they are highly appreciated.

However, the type system is only useful after we’ve parsed all the signatures, functions, macros, and predicates. The parsing has therefore two-phases. In the first phase builder objects are used to create a mutable representation. Then when we no all typed objects, we build an immutable AST of the code. The builders make sure to explore all possible permutations when there are multiple choices.

Back to the grammar. I started initially with a basic expression grammar like (note that the order defines the priority in ANTLR4):

value           : ('~'|'^'|'*') value                                               # unaryOpValue
                | value '\''                                                        # primeValue
                | value '.' value                                                   # joinValue
                | value '[' value (',' value)* ']'                                  # boxJoinValue
                | value ('<:'|':>') value                                           # restrictionValue
                | value multiplicity? '->' multiplicity? value                      # arrowValue
                | value '&' value                                                   # intersectionValue
                | value '++' value                                                  # relationOverrideValue
                | '#' value                                                         # cardinalityValue
                | value ('+'|'-') value                                             # unionDifferenceValue
                | '{' decl ( ',' decl )* ( block | bar ) '}'                        # comprehensionValue
                | 'sum' decl ( ',' decl )* '|' value                                # sumValue        
                | value qname value                                                 # primitiveValue
                | '(' value ')'                                                     # parenthesisValue                
                | '{' value '}'                                                     # parenthesisValue                
                | '@' name                                                          # atnameValue
                | qname '$'                                                         # metaValue 
                | number                                                            # numberValue
                ;

This grammar, however, collapses the v.n[p] case where the '.' and '[]' are overloaded. Initially I tried to solve this in the AST builder phase but this was extremely hairy (and started to resemble the existing code). After a lot of trials I came up with the grammar like:

value           : value '.' qname ('[' value (',' value)* ']')?                     # boxValue
                | qname '[' value (',' value)* ']'                                  # boxValue
                | qname                                                             # qnameValue
                | ('~'|'^'|'*') value                                               # unaryOpValue
                ... same as previous
                ;

By giving the special v.n[p] case priority, the actual expression priority can be postponed until the static type information is available.

In summary, while Alloy expressions appear straightforward, the underlying set‐based type system and heavy overloading mean that parsing must defer certain decisions until type analysis. The grammar approach shown—giving special* v.n[p] *forms priority and pushing final resolution to the type system—allows Alloy to remain flexible without requiring an explosion of ad hoc syntax. If you have more complex examples that stress‐test the grammar (especially ones with repeated or heavily overloaded names), please share them—your feedback will help refine the parser further.

Parsing on this level is not my area of expertise. If there are esteemed readers here that want to contribute ideas, let me know.

Topic		Replies	Views
New Parser using ANTLR Experimental	5	87	March 22, 2025
Alloy API ... again Experimental	7	605	November 22, 2022
Relational Operators and Quantified Expressions	1	277	August 26, 2020
References and Containment Experimental	4	447	March 12, 2021
Alloy 6 tutorial updates Alloy 6	10	285	February 13, 2024

The Surprising Complexity of Alloy parsing

Related topics