The use of Alloy and LLMs

I’ve been heavily engaged with GPT-4 recently. It is fascinating how much it knows, but using it in a real setting (controlling the Eclipse IDE) is a nightmare: it does some things surprisingly well but then makes the most basic mistakes. Thank god there is git to restore the workspace’s previous state.

Interestingly, it is quite good at correcting a mistake after a proper diagnosis or a counterexample. It often feels like an open-loop system where the human is supposed to be the feedback provider. Sadly, this requires extensive knowledge on the part of the ‘Clever Hans’ (as this mechanism is called).

I wonder if we could use Alloy as the ‘Clever Hans’? If the LLM generates an Alloy model, we could generate counterexamples and feed these back.
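To make that loop concrete, here is a minimal sketch (the model and assertion are invented purely for illustration): the LLM proposes a model and a claim, the Analyzer checks the claim, and the counterexample instance goes back into the prompt.

```
-- Hypothetical model an LLM might propose for a tiny file system.
abstract sig Object {}
sig File extends Object {}
sig Dir extends Object { entries: set Object }
one sig Root extends Dir {}

-- The LLM's over-optimistic claim: everything is reachable from Root.
assert AllReachable { Object = Root.*entries }

-- The Analyzer finds a counterexample (e.g. an orphaned File);
-- that instance is the feedback we return to the LLM.
check AllReachable for 4
```

The counterexample is exactly the kind of grounded, machine-checked feedback that a human reviewer would otherwise have to supply by hand.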

GPT-4 can generate Alloy models, but they are usually not very clever or good. Perhaps that could improve with a thorough description of the Alloy language in the prompt, or by extensively training an LLM on Alloy?

Clearly the syntax does not have to be Alloy; the idea is to generate counterexamples so that the LLM is confronted with the consequences of its proposals.

Is anybody else thinking along these lines?

Hi! This is exactly what I’m going to be working on starting in a week – I’ve enrolled at The Recurse Center for a 6-week (March 25 – May 3) professional sabbatical. I have lots of ideas, but generally I’m playing in this space of formal models integrated into day-to-day dev settings, bridged by LLMs. I generally work in Rails applications, and have good vibes with Alloy, so I’m starting with Alloy+LLM+Rails.

Pre-gaming with a bit of research: this is very aligned with the work Sullivan is doing at SCOPE Lab, and I’ve run into other Rails+Alloy projects too, such as Near’s earlier work on Rubicon, a (defunct) Ruby RSpec DSL that integrates Alloy into working test suites.

I’m imagining taking a working codebase (say a Rails app), extracting one or more Alloy models with the LLM, maybe also taking some direction on what we want to explore / prove / disprove, and then generating back unit tests (say RSpec tests) that pass/fail as expected. In Rails this might focus on the models, API interactions, existing test functionality, etc. – so I’m hoping there is enough consistent structure there for the cleverness of Hans to turn into something practical. Maybe something that can get into the CI/CD pipeline. Dunno.
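As a toy illustration of the extraction step (the domain, Rails models, and invariant here are all made up), an LLM might turn a has_many / belongs_to pair into something like the model below, and each counterexample the Analyzer finds could then be rendered back as a failing RSpec case:

```
-- Invented sketch of what an LLM might extract from
--   class User < ApplicationRecord; has_many :posts; end
--   class Post < ApplicationRecord; belongs_to :author, class_name: "User"; end
sig User { posts: set Post }
sig Post { author: one User }

-- Candidate invariant to explore: the two directions of the
-- association agree. Nothing in the model enforces this yet.
assert Consistent { author = ~posts }

-- The Analyzer finds a counterexample (a Post whose author does not
-- list it) -- raw material for a generated failing RSpec test.
check Consistent for 4
```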

One of my co-workers suggested focusing on CanCan access control as an even more focused subset, which makes sense, and there are likely other areas. But I want to swing for the stars.
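To sketch why the access-control subset is appealing (everything below is invented, not real CanCan semantics): ability rules translate naturally into a relation, and policy claims become assertions the Analyzer can try to break.

```
-- Invented sketch of a CanCan-style ability model.
abstract sig Role {}
one sig Guest, Member, Admin extends Role {}
sig User { role: one Role }
sig Article { owner: one User }

-- can?(:edit, article), mirrored as a derived relation:
-- admins can edit anything, owners can edit their own articles.
fun canEdit: User -> Article {
  { u: User, a: Article | u.role = Admin or a.owner = u }
}

-- Policy claim worth trying to break: guests can never edit.
assert GuestsReadOnly {
  all u: User, a: Article | u.role = Guest implies u->a not in canEdit
}
-- Counterexample: a Guest who owns an article can still edit it,
-- because the ownership rule ignores roles.
check GuestsReadOnly for 4
```

That kind of loophole is exactly what I’d want surfaced as a failing test.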

I welcome ideas!!! Since I’ll be working on this more or less full time, I’m also up for direct collaboration or brainstorming on Zoom and whatnot. I’ll be in Brooklyn for the duration and would be happy to buy someone an IRL coffee if they wanna talk Alloy+LLM+Rails or some overlap therein 🙂

This sounds incredibly interesting!

You’re moving in a direction that, I think, I also tried to pursue in Java. Some of my thoughts are in this video: https://www.youtube.com/watch?v=_j3SdIGqYG8

This post was, however, more in the direction of letting the LLM make a proposal/plan in Alloy, then creating an instance and feeding it back.
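In that flavor of the loop you ask the Analyzer for an instance rather than a counterexample; a minimal invented sketch:

```
-- Invented sketch: the LLM's "plan" is a predicate, and `run`
-- asks the Analyzer for a concrete instance to feed back.
sig Task { deps: set Task }
pred AcyclicPlan { no t: Task | t in t.^deps }
run AcyclicPlan for 4
```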

Anyway, keep us posted, or keep me posted (peter.kriens@aqute.biz); I’m very interested in these efforts.

What a delightful video @peter.kriens! Yes, this is definitely the same area that I’m exploring here. I’m working out which interactions can be genuinely practical, and at which layer (CLI, embedded in the test suite, CI/CD).

Once I’ve got a few ideas I’d love to chat live in the next few weeks – I’ll email to coordinate, but I’ll also post a public Zoom link here that anyone interested is welcome to join, as an ad-hoc demo and discussion group.

Anyone else working in this space?
