Alloy 6 vs. TLA+

Only a month late! Here’s my usual spiel:

Alloy comes from the Z line of “systems are basically giant relational algebra problems”, while TLA+ comes from the LTL line of “systems are basically giant temporal logic problems”, which leads to a lot of fundamental differences between them. They also had different design philosophies: Jackson wanted a language that was easy to build tooling for, Lamport wanted a language that was easy to express complex logic in.

One consequence of this is that Alloy is a much smaller language: you can fit the entire syntax on a single sheet. It’s also much easier in TLA+ to write a spec that you can’t model-check, while with Alloy you have to be actively trying. It’s also impossible to write an unbounded model in Alloy, so you’re guaranteed that every spec is over a finite state space. That gets rid of the common TLA+ runtime problem of “is this taking a long time because it’s a big state space or because it’s an infinite state space?”
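Concretely, every Alloy command carries an explicit scope, so the search space is finite by construction (a minimal sketch; `Node` is an illustrative signature, not from any particular spec):

```alloy
sig Node { edge: set Node }

// The analyzer only ever searches instances with at most 4 Node atoms;
// there is no way to accidentally ask for an unbounded state space.
run { some edge } for 4 Node
```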

Re tooling: Alloy converts to SAT, which makes model checking much, much faster. Specs that would take minutes to check in TLA+ take less than a second in Alloy. Being able to model-find as well as model-check is really nice, as is the ability to generate multiple counterexamples. The visualizer is really useful, too, especially for showing counterexamples to nontech folk. But the thing I honestly miss most when working in TLA+ is the REPL. It’s not the best REPL, but it’s a hell of a lot better than no REPL.

Okay, so drawbacks: Alloy 6 added a notion of “time”. This expands the space of problems Alloy can handle, but the syntax has footguns in it. If you forget to update a variable in TLA+, you get an error; if you forget to update it in Alloy, it’ll just assign whatever value it wants to that variable. I asked one of the developers and apparently it’s not something they can easily address in the current semantics. Also, temporal specs take a lot longer to check than static specs. The more “timelike” the problem gets, the worse Alloy becomes at handling it.
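A minimal sketch of the footgun (with made-up `Msg`/`Pending`/`Delivered` names): any mutable element you don’t mention is simply unconstrained in the next state.

```alloy
sig Msg {}
var sig Pending in Msg {}
var sig Delivered in Msg {}

pred deliver[m: Msg] {
  m in Pending
  Delivered' = Delivered + m
  // Footgun: Pending is never mentioned, so nothing stops the solver
  // from assigning Pending ANY subset of Msg in the next state.
  // The frame condition has to be written out by hand:
  // Pending' = Pending - m
}
```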

Alloy also doesn’t have composable collection types. You can model “a queue of messages”, but you can’t easily model “a queue of messages for each server”. It’s possible but you have to be clever, whereas in TLA+ it’s straightforward.
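To make that contrast concrete (a hedged sketch; `Server` and `Msg` are illustrative names): in TLA+, “a queue of messages for each server” is a single function-valued variable.

```tla
VARIABLE queues

\* queues maps each server to a sequence (queue) of messages
TypeOK == queues \in [Server -> Seq(Msg)]

\* Appending to one server's queue leaves all the others untouched
Send(s, m) == queues' = [queues EXCEPT ![s] = Append(@, m)]
```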

Less fundamental but still a big problem I keep having: no good iteration or recursion. Transitive closure is pretty nice but it’s not powerful enough. For example:

     sig Node { edge: set Node }

Is N1 connected to N2? Easy, just do N1 in N2.^edge. How many nodes are on the shortest path? That’s much harder.
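Spelled out as a self-contained fragment (same illustrative `Node` signature):

```alloy
sig Node { edge: set Node }

// ^edge is the transitive closure of edge, so reachability is one expression
pred connected[n1, n2: Node] { n1 in n2.^edge }

// There is no comparable built-in for the LENGTH of the shortest path;
// you would have to model paths explicitly (e.g. as sequences of nodes).
run { some disj n1, n2: Node | connected[n1, n2] } for 4
```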

Which one is more appropriate to use really depends on your problem. As a very, very, VERY rough rule of thumb: TLA+ is best when you’re most interested in the dynamics of a system: the algorithms, the concurrency, etc. Alloy is best when you’re most interested in the statics of a system: the requirements, the data relationships, the ontology, etc. My standard demos for showcasing TLA+ are concurrent transfers and trading platforms, while my standard demos for Alloy are access policies and itemized billing. You can use each in the other’s niche, too (see Zave’s work on Chord), so it’s not a strict division.


I’ll make getting the actual comparison post done a priority now


The (symbolic) nuXmv backend of Alloy 6 is surprisingly good at checking invariants (with unbounded model checking). For example, recently I was specifying the original Echo distributed spanning tree protocol from Chang, and Alloy 6 was way faster at checking the key correctness property for ALL connected networks with up to 5 nodes than TLA+/TLC was at checking ONE single configuration with a complete network with 5 nodes (even with symmetry on). I agree that in general TLA+/TLC is faster for distributed and concurrent algorithms, but that is not always the case (I think that when we have a lot of non-determinism the symbolic model checking of Alloy 6 tends to be better, but that is just a wild guess).

With the seq “multiplicity” that particular example is also fairly easy to model in Alloy 6 (though the syntax for the operations is not pretty, nor is it efficient), but I agree that the larger collection of types in TLA+ makes it easier for most users to specify the data structures in their systems, since they more closely resemble the kinds of types available in programming languages. The “everything is a relation” motto of Alloy forces you into a very abstract modelling mindset, which might be very puzzling at first.
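A sketch of what the seq encoding can look like (illustrative names, and as noted the operation syntax is clunky):

```alloy
sig Msg {}
sig Server { var queue: seq Msg }

pred send[s: Server, m: Msg] {
  // seq operations go through library functions rather than nice operators
  s.queue' = s.queue.add[m]
  // frame condition: every other server's queue stays the same
  all t: Server - s | t.queue' = t.queue
}
```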



Oh that’s interesting! Would you be willing to share both specs?

Sure, I’ll send you the specs by email.

Oww awesome, could I have them as well? I’m the maintainer of a Clojure library that wraps TLC (Recife), but I would like to have Alloy 6 available as well (as Alloy is built with Java, like TLC). Maybe having a TLA+ operator calling Alloy, who knows lol

Sure, I’ll email you.


I’m new to formal methods, but I’m on fire for the notion and benefits of “thinking above the code”. So far I have played with TLA+ but only really used Alloy, and my impression so far (still very new and ignorant) is that TLA+ is like the “expert mode”: I could probably write a spec for just about anything with it, but it has a higher learning curve. Also, there seem to be many more learning resources available for TLA+ (probably just due to maturity, I’m guessing). PlusCal looks like an amazingly helpful tool to help a noob like me ease into this.

Alloy’s approachable syntax and the model explorer seem absolutely awesome! Some parts of the Alloy docs I’ve been rereading because I’m so new to this higher level of abstraction. The temporal logic seems to make sense to me pretty quickly, it’s the relational logic that confused me a little bit at first. I really like Alloy a lot. (Please forgive me if I’m misusing some of this new vocabulary)

I’m curious, does anyone here have any experience with P? The whole idea of representing your system as communicating state machines seems interesting.

In any case, I just took a new job working for a utility provider. I work with a bunch of electrical engineers and power systems engineers, so luckily they are pretty supportive when it comes to improving quality, verifiability, and modeling. I’m extremely confident that Alloy 6 and/or TLA+ would be extremely helpful to our team. I’m very new to power systems as well. We’re trying to make the power grid “smarter”: the various devices involved in a microgrid send data about the system/circuit state through a message broker, our Rust code subscribes to the topics and uses a genetic algorithm to “solve” the circuit for VVO/CVR (Variable Voltage Optimization/Conservation Voltage Reduction), and it then issues commands to the hardware to adjust the voltage so we stay within the allotted range and don’t fry people’s stuff.

Given that this is a set of complex systems with a lot of concurrent behavior, extremely expensive equipment, and operators could get hurt or people could lose power, I’ve been shocked at the lack of documentation and tests of the code base my little team is inheriting.
This is an excellent example of verifiable correctness being kinda important, so I was pretty surprised when I asked around and the other software folk were not only unaware of formal methods, but were confused and almost hostile at the idea (perhaps I did not communicate effectively).

It has taken so long for my team to gain background context on the problem domain and to become familiar with the massive and complex code base we’ve inherited.

It never ceases to amaze me how ubiquitous it is in the industry to just cut corners (little/no or low-quality tests, low/no diversity of testing strategies, poor/non-existent documentation, etc.). Many businesses just want to get to market first; they say they value quality, but any suggested practices/processes to improve quality/verifiability are met with impatience or annoyance.

Side Note: the previous team of contractors (and original authors of the code base) are being let go because they blew up a battery/capacitor bank that probably cost a couple years of my salary.

I’ve been hungry to learn some new tactics and tools to help the software teams I work on level up the quality of our deliverables and I think Alloy 6 and/or TLA+ is exactly what I’ve been looking for. I’ve got some minimal buy-in and I’m going to create a specification for our test grid, the devices on it, and ideally the voltage optimization algorithm and get the SMEs to help me ensure it’s correct and demonstrate to the software teams the power and benefits of this practice!

I think I fully agree with your assessment of our industry and its practices. I’m on a lifelong quest to find ways to write better software better, and puzzled by how such an important industry can sometimes be so shoddy.

That said, it is hard. I fell in love with Alloy about 5 years ago, and I really think it has the potential to improve the design process. However, I do not think it is ready for the kind of industrial applications you talk about. It might get there, and I’ve been looking for customers to fund the development of the missing pieces, for I do think there is an impedance mismatch between the formal tools and the industry tools.

I think the industry mostly needs a strong specification language that tightly integrates with the continuous integration pipeline. Academia seems mostly interested in how clever the tool is in verifying assertions.

The most brilliant part of Alloy is that it can create instances of models, but because of that, numbers are severely crippled and model size is sometimes a problem.

I do believe that the impedance mismatch can be bridged but it does require a lot of work and I see nobody really working on this bridge.


@jbrubru and I supervised a master’s internship last year, assessing P (essentially an actor language) and comparing it to some work we had done in Alloy several years ago. As far as we were able to see, verification in P was difficult to set up, and since P is asynchronous, you need to implement synchronization yourself when you need it (it’s easier the other way round).


Thank you so much for y’all’s feedback. It’s helping me get my bearings, and helping me correct my understanding of what tools are suited for which scenarios a bit too.

My understanding so far is that (and please correct me if I’m wrong) TLA+ is probably the best (and maybe only) of these tools that is viable for industrial (not just small systems or academic demonstrations) use. I’ve heard about some of the white papers or descriptions about TLA+ being used at AWS and Microsoft for large and important systems or algorithms.

Of course, it’s difficult to provide y’all that know more than me with enough context to ask this, and I don’t want to impose on your time too much, but I’m getting the notion that of the existing tools I’m aware of (and a bit familiar with), TLA+ would be the most appropriate tool to try to use at my job. Would anyone here say this sounds correct? Or would anyone here disagree strongly?

I’ve been given a chance to do a spike and demonstrate what might be possible. I have one teammate who has fully bought into the idea and wants to learn this skill set too. I’m considering either of the following 2:

  1. I could use Alloy for the ontology of the devices on the microgrid (and it would help everyone to visualize this, because not all of us can read circuit/electrical diagrams) but the genetic algorithm and any progression of state transitions of the circuit would best be handled with TLA+
  2. Use TLA+ all the way

Please forgive me for advice-seeking here. Does anyone have any feedback or suggestions regarding my alternatives? Or is there a more appropriate place to request such feedback?

Thank you for your reply. I actually believe strongly in contributing to projects I believe in (even if it’s just $10/month or something). I know that the Linux Foundation just created the TLA+ foundation to help fuel improvement and adoption. I’ll check Open Collective as well, but I’m curious if there are any established ways to donate a little bit each month towards the development of Alloy or TLA+?

We do not have an official way to donate and I am not sure how we could spend that kind of money.

What imho is really needed are research projects on actual industrial problems. We’re missing the bridges, and they can only be built under real-world pressure.

Understood sir, and that makes sense. One of our newest power systems engineers got her PhD (I’m not sure how long ago), but I know she still has active connections in academia. I can inquire with the company I’m contracting for about whether they have any sort of internship/study program.

This newest engineer seems the most passionate about sharing understanding, increasing quality, and following suggested industry practices (obviously especially on the power systems side, as state and federal regulations must be adhered to), and I’ve talked to her a couple times about trying to extend some of that rigor to the software side.

I’d be absolutely willing to request a case study (our company has published some before, just not usually as software focused I think). Would something like that be more helpful?

One thing that kind of bummed me out was that, on the page, with the exception of Google Scholar searches, I didn’t find anything with Alloy after the year 2015 or so. I would love to see some industrial uses that included Alloy 6.

Will this be posted to

Coming back to this thread after doing what turned out to be a massive project in Alloy 6, I think I need to share some thoughts.

The project I worked on was specifically a series of workflows that enable a factory and customer service technicians to get credentials for a new or serviced machine by using a unique hardware identifier.

Alloy 6 had some very charming features, but also some very nasty drawbacks that make it jarring to work with.

Let’s open up with the nice features.

The big one is being able to effectively roll the dice and see many different structural combinations that are allowed in a given model. This highlighted about 60% of the problems we were going to run into.

Second, being able to use set theory to describe the various states of the machines, the subcomponent that stores the unique hardware identifier, and the kinds of mappings of connections to our backend found the next 30% of problems.

Third, it can kinda model time with actions. This found the last 10% of problems that needed to be addressed.

There are some other nice features that are a little weird, like being able to describe “all the other machines” as machine - m where m is the machine the current function is applied to.

The pain points:

Specificity. Unlike TLA+, Alloy can’t tell you when a variable or a mapping is being implicitly left free to change. In a massive system that spans multiple machines, subcomponents, production facilities, and end users’ locations (gyms), you need to painstakingly specify each step. With large numbers of variables and mappings, that means lots of unchanged or stuttering sets, in a frankly poorly scalable way. I also expected that stuttering a machine would stutter its fields as well, but that’s not the case at all.

Lists. Without a language-level list type, this only exacerbates the previous issue: so far, 40% of my specification is just stutter and unchanged lines, of various flavors, to lock down what’s modified.

Sub-sets. Things get real funky when describing what happens with subsets. I would expect that a subset of machine (e.g. powered-on machines) would also be stuttered if I stuttered its superset.

Now, would I be able to implement the same specification in TLA+? I don’t know; this project was more interested in the implicit structures that could be formed across machines and services, and that would’ve been very difficult in TLA+. However, modeling actions was frankly like pulling teeth, and I would really hate to extend my model further.

In summary: I still think Alloy is good for structural studies, but I think it’s inappropriate for adoption by industry as a whole, especially for temporal studies or verification.


I don’t understand some of your points. Without seeing a representative example of your model, we can’t tell whether there would have been a solution to the issues you encountered. We can’t assess whether Alloy and TLA+ are the best approaches in this context. We can’t compare Alloy and TLA+ solutions to the same problem. We don’t know which temporal properties you want to check.

A few remarks:

  • Alloy does have lists in the shape of sequences.
  • Regarding the specification of steps, TLA+ asks that you specify unchanged variables (the advantage is that UNCHANGED can take a tuple as parameter).
  • Regarding “machines”, which do not exist in Alloy per se, you could perhaps specify that some variables don’t change when one (representing a machine) doesn’t.
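For reference, the UNCHANGED idiom from the second bullet looks like this (a generic sketch with made-up variable names):

```tla
\* One UNCHANGED conjunct frames several variables at once
Deliver(m) == /\ m \in pending
              /\ delivered' = delivered \cup {m}
              /\ pending' = pending \ {m}
              /\ UNCHANGED <<servers, leader>>
```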

Here is a minimal case necessary to start showing pain points.

let unchanged[c] { ( c = (c)' ) }

var sig microProc{}

var sig machine {
    var uP : lone microProc,
    var cached_uP: lone microProc
}

var sig powered in machine{}
var sig online in powered{}

one sig backend{
	var entries : machine -> lone microProc
}

pred stutter_machines[m :set machine]{
	// Preserve that devices don't switch membership spontaneously
	all i : m | unchanged[i] and unchanged[i.uP] and unchanged[i.cached_uP]
	// Maintain membership in these groups
}

pred backendDelete[m:machine, c:microProc]{
	some backend.entries[m]
	backend.entries' = backend.entries - (m -> c)
}

This becomes onerous at scale. What I would expect is something like:

pred backendDelete[m:machine, c:microProc]{
	some backend.entries[m]
	backend.entries' = backend.entries - (m -> c)
	unchanged[machine]	// wished-for: this alone would also frame machine's fields and subsets
}

Effectively locking the superset machine such that items within machine do not change their membership, nor do their fields change membership. This would already alleviate the problem I was facing with lists.

Even at the lowest level, the following would be a big improvement in QoL:

pred stutter_machines[m :set machine]{
	// Preserve that devices don't switch membership spontaneously
	all i : m | unchanged[i, i.uP, i.cached_uP]
	// Maintain membership in these groups
	unchanged[m, m.uP, m.cached_uP]
}

Thanks, I can’t really comment right now but will come back to this in late August.


I don’t agree that should be the semantics in general. I can think of examples where I might want to keep a superset unchanged but change a subset. Like @grayswandyr mentioned, if you want that particular semantics in your example you can just add a fact stating precisely that, for example something like:

fact machine_stuttering {
	always (unchanged[machine] implies unchanged[uP] and unchanged[cached_uP] and unchanged[powered] and unchanged[online])
}

Now if you want to keep all machines and machine subsets unchanged in a particular event you only need to state unchanged[machine], for example:

pred backendDelete[m:machine, c:microProc]{
	some backend.entries[m]
	backend.entries' = backend.entries - (m -> c)
	unchanged[machine]
}

Btw, in your predicate stutter_machines I think you have some redundant constraints and that it could be simplified to:

pred stutter_machines[m :set machine]{
	unchanged[m <: uP]
	unchanged[m <: cached_uP]
}

