New Parser using ANTLR

The current code base of Alloy started in the '90’s and has been worked on by a lot of very smart graduate students. Some of the code is absolutely brilliant but the age of the code base is clearly showing. Since Alloy started 30 years ago, a lot has changed. Java went from a bit more than a toy language to a half way decent one (really!). Today it has many concepts out of the box that had to be developed from scratch in Alloy. The code also shows that many generations of students worked at it for a short period; it is often surprisingly hard to find where something is happening. Worst of all, everything seems to be connected to anything.

Although I originally wanted to retire, I don’t use a lot of Alloy modeling anymore, Matthew Di Ferrante was willing to sponsor my work to do further improvements. One the first things we decided to do is to work out the actual grammar and create a new ANTLR parser based on modern technology. (The grammer can also be used in Python, JS, etc.)

For this sponsoring, i will take advantage of the work I did a long time ago: develop an (interface based) API on Alloy. A well defined API will make it easier to understand the Alloy code for educational purposes and simplifies building extensions. Most of the extensions over the last years have been retrofitted in with enormous effort. The Electrum port into Alloy 6 was an underappreciated herculean effort.

API based programming is something I worked on for most my life. It was the reason I fell in love with OSGi and it was the reason I fell in love with Alloy. For the OSGi I wrote thousands of pages of specifications and in retrospect I wish I had had something like Alloy. I am a sucker for concise clear crisp specifications and Alloy could make them verifiable. It was a bit of a surprise to me how the formal design community seems to highly focused on the proof aspect. For me a good Alloy model is worth a thousand pictures.

The reason that I am diving a bit deeper in the API topic is that I feel that Alloy itself could become an example of software engineering with formal methods. My good intention for this year is to continue the work with an Alloy model of Alloy. Anybody that is interested in this angle of Alloy should speak out so we can collaborate on it. Maybe universities that do software engineering and formal methods?

So how will this work go? The strategy of the current work is to not change the existing code base but insert an API between the parts. A Parser API, an AST API, a Solver API, a Visualizer API, a runtime API etc. For each API, a classic provider will created that maps the API to the classic code base. The classic code base might implement some interfaces but remains unchanged and compatible with today’s. However, once an API is in place it will become easier to plugin new implementations. However, all along the way we should remain 100% backward compatible although newer features might not work for older parts.

However, I think our direction should become language server oriented with VS Code as primary front end. This implies that likely little work will be done on the Java UI.

I plan to use this forum as a blog of my progress. I am already itching to tell you about all the nooks and crannies I found in Alloy’s sometimes magical syntax!

Peter Kriens

5 Likes

Thanks for all your efforts. I note that there’s a VS Code initiative for TLA+ too (Eclipse for the new generation…)

1 Like

Hey, @peter.kriens, I don’t remeber where I found this, and you’ve probably seen this, but just in case:

If I understand correctly, it’s a reimplementation of a subset of the Alloy language in Racket.

I think there is still a lot of value in the Java eco system. This is partly a bias from my side due to experience but it also has the advantage we can make one deliverable that runs anywhere.

Last but not least, I do not believe in big bangs, they tend to never end. It is better to do incremental changes that make the current Alloy better and better than betting the farm on some new environment. New environments often make a specific part easier but they require a lot of work on the other parts.

Although I can get upset by many of its implementation details, it is a solid piece of work that is remarkably rich in features.

1 Like

Peter - This is a terrific idea! One that would be very beneficial to efforts to integrate new solvers and extensions with the Alloy code base.

In case it is helpful, here is an ANTLR grammar that a former MMath student of mine wrote to analysis Alloy 5 models: static-profiling-of-alloy-models/src/main/antlr/com/alloyprofiling/ALLOY.g4 at main · WatForm/static-profiling-of-alloy-models · GitHub

Three challenges that we have had when integrating with Alloy are

  1. the contextual meaning of some syntax operators such as “some”, e.g. “some” is internally recognized as “SOME” or “SOMEOF” depending on its context;

  2. there are parts of the internal Alloy AST that are not visible syntactically - we recently ran into an operator called “EXACTLY…” (sorry I can’t remember the “exact” name of it right now :slight_smile: ).

  3. Not enough of Alloy’s AST classes are public and modifiable.

Thanks!

I do have a grammar by now that seems to handle the language. ANTLR 4 is surprisingly powerful.

The hard part is the overloading of the dot operator and the box operator: v.qname[p] has many different interpretations based on what qname refers to after parsing all the modules.If qname refers to a function or predicate, the v can be the this/first parameter of the function/predicate (or not) and the box operator ('[]') can hold the parameters or not. So the expansion v.qname[p] can be:

  • relation(join p (join v qname))
  • fun/0(join p (join v (call qname)))
  • fun/1(join p (call qname v)) or (join v (call qname p))
  • fun/2(call qname v p)

This means that some important operator priority decisions can only be taken when the module is completely parsed. This was a bit challenging. In the current Alloy this delegated to the ChoiceExpr and it hugely complicates the type system and the AST. This design choice forced the Type to be able to have different cardinalities. I think I found a more elegant solution but still need to prove that it is identical.

The new AST will not be modifiable. I am going through great lengths to make it immutable. If you want to build a new AST you need to construct a completely new one with the visitor or listener pattern but the advantage is that you can freely share branches because they’re immutable.

So far I’ve separated the build process of the AST and the final immutable AST. The final AST nodes are completely public Java records and leverage the new Java pattern matching techniques with sealed types. Sealed types make it possible to switch on Java types. Every AST node is a record so there are no records for specific operation, each operation is a record type. The AST node/records have no operations.

There is a ModuleBuilder class that can be used to build Alloy modules from code. It contains methods that are called from the parser and it allows you to build a complete module.

Most important, there is a strict separation between the API (mostly Java interfaces) and the implementations.

My unfinished & messy working branch is here: org.alloytools.alloy/org.alloytools.alloy.parser/src/main/antlr/Alloy.g4 at pkriens/api · pkriens/org.alloytools.alloy · GitHub

Hi Peter -

Could you update us on the status of this project? You’ve mentioned above that there is at least one issue that cannot be resolved within grammar rules. Are there any other issues like this? To what extent does your new rendering of the Alloy grammar eliminate some of the functionality in the current “CompModule.resolveAll” function? It would be very helpful for someone to post an explanation of what happens within the Alloy resolve phase (besides typechecking) - i.e., how are the Alloy data structures changes in that phase?

Is the API far enough along for someone to use it?

thanks, Nancy

Hi Nancy,

It is work in progress. I was plodding along and got sidetracked with the certification issue for the Mac App and still in progress the same for the Windows app. I got quite far along the new parser and AST but then just found out I was trying to change too much at once. This work absolutely needed the API work to be in there before I can think of changing that part. So I resurrected my API work and am currently staring at a large pile of lose parts I need to integrate again. Since Matthew stopped the funding it also became harder to find the time because there are also several other projects pulling as well as the summer in the south of France … Actually, having this kind of interest is also good since it is more motivating than plodding alone and wondering who actually wants to use it.

I am currently deep in a C++ project for 3D printing (OpenSCAD) but once I got my objects in there I will start spending more time again. Despite the lack of funding, I want to at least get the API and a new parsers/AST in there.

Thanks for the update Peter!

Thank you for making the ANTLR grammar available. We are going to try building on it for some of our Dash+ tools.

Huge problem with the grammar is that you can only build an AST when you have the context available. Without the context (that assigns types to names), the grammar is extremely ambiguous and often you can only decide what the semantics are at the end of a long expression, creating many parallel paths.

Its tricky. I hope to find time soon to show the work I did here because it would help if we had the syntax translated into a nice AST

For me, there would be a question of whether you want to include this disambiguation within the parsing process or as a separate pass over a data structure that simply represents the parsed form. There is a separate “resolveAlloy” function and I’ve never understood what is done in parsing vs resolving.

In there is an ast node that holds all possible undecided paths through the graph. Thus is the resolved when the types are known in a second pass.

In my parser I create builders during the first pass. Then in a second phase I create the AST by letting the builders give me a set of possible paths (might be empty) and tracking all paths. At the end I have either 1 path or an error. But then I can still look back what all the different possible paths were or in the case of no paths indicate that it doesn’t type check.

I’m fanatically trying to make the AST immutable.

Peter