Performance issue with a relatively small application domain

Hi all!

In my company we are experimenting with Alloy 6 for the design of one of our products. We are experiencing some issues with the performance and the module system of the Alloy that I wanted to share with you… All the models can be found here [1].

Performance

Our model is still relatively small compared to what we need to implement, but we already have some performance issues while generating CNF or when we run simulations.

In the first instance we got a “Capacity exceeded” warning when generating CNF. Even if we solved it with a refactoring, it is still a red flag as all is not yet implemented. Do you think that it is surprising or expected given the size of our models?

A second issue is the time required to solve a simulation that involves multiple modules/concepts. When working module by module everything is fine, the problem arises when we merge/reuse them in the application file. A workaround is to compose 2 or 3 modules/concepts at a time, but is not really maintainable especially with the themes that cannot be modularized. Same question here, do you think it is expected? Or the models are poorly implemented maybe?

Module System

As you can see I use generic modules a lot, but I don’t really know if it is better to use this feature or to implement our module with an “open” type and latter to restrict the type to the wanted set, for example:

module:

module a

sig A {
   attr: one univ
}

app:

module app
open a

sig B, C {}

fact {
   A.attr in B + C
}

I feel that the generic modules are easier to use, but I cannot find a way to import a generic module using the union of type, e.g.; open a[B + C]. Why do we have such a limitation?

Note on application domain:

In the Github repository I sent you, the padme.md file is the application, the other files are concepts of the application. There is a “fwk” folder which contains some “framework” functions and the “themes” folder which contain themes to visualize each concept as well as the application. You can see the Padme application as the Microsoft Teams application with the Team, Channel, and Application that you can attach to Channel. In Padme, you have the Brief that contains Lead and you can attach some Item to the Lead. Items could be any thing such as task, evaluation document, email, attachement,… The Brief is a kind of objectif and a Lead a kind of review process of an asset regarding the Brief. The all application is a partnering platform for the biopharma sector.

Thanks for your help,
Best Regards,

Clement Delgrange

[1] https://github.com/cdelg/padme

1 Like

Hi Clement,

I haven’t had a chance to look at your model in detail, but a few quick reactions:

  1. Using univ extensively is going to kill performance, because it prevents Alloy’s type checker from limiting which atoms can be related in a relation. Even with just 3 signatures, let’s suppose you give them each a scope of 5, and you have a scope of 10 for Time. Then you have a total universe of 25 atoms, which means that a binary relation has 2^(25 x 25 x 10) possible values, when it might have had only 2^(5 x 5 x 10). That’s a very big increase in the state space! Instead, make the generic modules parametric in some sig, declaring module foo[T] and then instantiating T when you use it. You can’t instantiate with a union A+B, but you can instantiate with a super signature (ie, introduce C that A and B extend).

  2. Check that you’re using the one quantifier carefully, and that you really mean one and not some. Remember that one implies at most one, and that introduces an additional complex formula.

  3. What does ordered_value do? Does it really need to be parameterized by 3 separate types?

  4. The structure of preds like addEvalToLeadEquivAddRelatedTaskToLead is very unusual in the way that it uses a macro that is defined to be a comprehension. I don’t understand this idiom that you’re using at first sight, but maybe you can explain it and we can see if there’s a simpler way to express it?

Daniel

1 Like

Hi,
that’s a big model indeed!

AFAIK, this is due to the Kodkod/Pardinus representation of fields (= relations) as specific matrices. The CapacityExceeded exception occurs when there’s a relation with an arity a (notice that a var relation has a hidden “time” column counting +1) and a domain (= the set of all possible atoms, incl. integers taken from the Int sig) of cardinality d such that d^a > 2,147,483,647 (= MAX_INT). You model has a domain of size 96 (!). So with a relation of arity 5, or a var relation of arity 4, you’re already above MAX_INT… Perhaps this is what you had in your model before refactoring?

I don’t know if this threshold can be changed: is it a specific limitation that could be addressed by simply switching from ints to longs in Kodkod source code?

Adding to Daniel’s answer: I fear this is to be expected due to the combination of a number of parameters: the number of atoms and relations, the number of var fields and sigs, the under-specification of goals (e.g. an eventually goal is less constrained than a more fixed scenario)… The main way to address this issue will, I guess, to decrease the size of the state space, by making your models as small and deterministic as possible: to do so, I suppose you should split the model in several smaller ones, depending on abstractions of other complex parts and try to reduce non-determinism.

Regarding univ, notice also that, in Alloy 6, it is the set of all atoms “existing in the current state”, not all atoms at all states. So I’m not sure all expressions you wrote using univ always mean what you think.

Like Daniel, I’m a bit surprised by the idiom you’re using (with primes occurring sometimes at non-usual palces, as well as events referring to the past). This isn’t necessarily wrong but a bit hard to follow. We have some guidelines here and there that you may find useful.

Hope this helps.

1 Like

Hi Daniel, thank you for your time and help!

Using univ extensively is going to kill performance, because it prevents Alloy’s type checker from limiting which atoms can be related in a relation. Even with just 3 signatures, let’s suppose you give them each a scope of 5, and you have a scope of 10 for Time. Then you have a total universe of 25 atoms, which means that a binary relation has 2^(25 x 25 x 10) possible values, when it might have had only 2^(5 x 5 x 10). That’s a very big increase in the state space!

Indeed, removing univ improved the performance by 2. It’s not a big difference in my case, but it’s still noticeable.

Instead, make the generic modules parametric in some sig, declaring module foo[T] and then instantiating T when you use it. You can’t instantiate with a union A+B , but you can instantiate with a super signature (ie, introduce C that A and B extend).

I used this technique but the drawback is a loss of decoupling between my modules. If the modules that define A and B are reusable concept libraries, they will need to have a dependency on C which is defined in the more specific context where A and B’s modules are used together. I didn’t find a way to tell that C is in A or in B using the type system without A or B “knowing” it. Maybe it is unusual, but currently my modules do not need any external knowledge other than the identity of the atoms in the parametric type. Do you think this requirement/expectation is too restrictive? Do you have another suggestion without using extends?

Check that you’re using the one quantifier carefully, and that you really mean one and not some.

Indeed, maybe it is my OOP mindset, but I have a lot of exclusive relations between my objects.

What does ordered_value do? Does it really need to be parameterized by 3 separate types?

Hum, to be honest I’m not proud of this one. It is a tentative to define a generic ordered “value object” instantiated only with enum constant, e.g.; Date.

The structure of preds like addEvalToLeadEquivAddRelatedTaskToLead is very unusual in the way that it uses a macro that is defined to be a comprehension. I don’t understand this idiom that you’re using at first sight, but maybe you can explain it and we can see if there’s a simpler way to express it?

These predicates are used to constrain/automate the action defined in the modules. The macros are just alias for the comprehensions (I could replace this with functions, I don’t know if it makes a difference). The comprehensions are used to “collect” the atoms affected by an action in order to apply another action on them.

The specific addEvalToLeadEquivAddRelatedTaskToLead predicate means:

  1. When I add an Evaluation Document to a Lead, then I create a Task and add it to the Lead too.
  2. When I add a Task to a Lead, then If the task is linked to an Evaluation, this one must be added to the Lead too.

Thanks again for your help, I hope I was clear enough in my answers.

Best Regards,
Clément.

Hi,

AFAIK, this is due to the Kodkod/Pardinus representation of fields (= relations) as specific matrices. The CapacityExceeded exception occurs when there’s a relation with an arity a (notice that a var relation has a hidden “time” column counting +1) and a domain (= the set of all possible atoms, incl. integers taken from the Int sig) of cardinality d such that d^a > 2,147,483,647 (= MAX_INT) . You model has a domain of size 96 (!). So with a relation of arity 5, or a var relation of arity 4, you’re already above MAX_INT… Perhaps this is what you had in your model before refactoring?

This was exactly the case, thanks for the explanation.

Adding to Daniel’s answer: I fear this is to be expected due to the combination of a number of parameters: the number of atoms and relations, the number of var fields and sigs, the under-specification of goals (e.g. an eventually goal is less constrained than a more fixed scenario)… The main way to address this issue will, I guess, to decrease the size of the state space, by making your models as small and deterministic as possible: to do so, I suppose you should split the model in several smaller ones, depending on abstractions of other complex parts and try to reduce non-determinism.

I was afraid of this. I’m not working on a small tricky domain but on a simple and large one. Many aspects need to be added like access control and or other small modules, I was hoping to address them all at once.

Regarding univ , notice also that, in Alloy 6, it is the set of all atoms “existing in the current state”, not all atoms at all states. So I’m not sure all expressions you wrote using univ always mean what you think.

Thanks for mentioning this, I will double check.

Like Daniel, I’m a bit surprised by the idiom you’re using (with primes occurring sometimes at non-usual palces, as well as events referring to the past). This isn’t necessarily wrong but a bit hard to follow. We have some guidelines here and there that you may find useful.

Thanks for the links, I will definitely check them out.

Clément.

1 Like

Hi!

With all your suggestions and comments I managed to improve the performance considerably (repository updated), thanks.

In order to avoid using univ with my generic modules I did something unusual I think. Let say we have a Trash module that have a parametric type TrashItem. Now, I want to reuse this module in a context where I can trash Files and Directories and another context where I can trash Video for example. Instead of using a base type I declare the module like that:

module trash[TrashItem1, TrashItem2]

private let TrashItem = TrashItem1 + TrashItem2

// Then I only use TrashItem macro

and use the module as follow:

module app1

open trash[File, Directory]

sig File, Directory {}
module app2

open trash[Video, Video]

sig Video{}

It is not really elegant, but it works well. Is there a reason to forbid union of types for parametric types.

1 Like

Hi,

I believe you could still have a single parameter and compute the union using a subset signature. For example, in module app1 you could have:

open trash[Object]
sig Object in univ {}
fact {Object = File+Directory}

Probably, this would be less efficient than your trick, but of course, in this particular example, you could also have File and Directory as extensions of Object and pass Object as parameter.

Best,
Alcino

1 Like

Hi,

I believe you could still have a single parameter and compute the union using a subset signature.

I tried this but there is a big impact on the performance, more than 10 times slower than using multiple parametric types.

Probably, this would be less efficient than your trick, but of course, in this particular example, you could also have File and Directory as extensions of Object and pass Object as parameter.

The problem with the second method is when File or Directory are defined in another module, they cannot extends the Object type.

Best Regards,
Clément

I suppose instantiating a generic module with a parameter is performing a pure syntactic operation, namely doing as if you had a non-generic module where the formal parameter has been replaced with the actual parameter. In that case, you can only allow actual parameters that work in all contexts. Which is not the case if instantiating with a union; as the following exemple shows, you may write:

module M[X]
sig S extends X {}

but this is forbidden:

sig S extends A+B {}

That being said, Alloy also allows to write something like this:

module trash[X]
...

and then:

module app1
open trash[Obj]
sig File, Directory {}
sig Obj = File + Directory {}

I don’t know if the generated Kodkod is of the same style as in Alcino’s solution, or if it’s more efficient; you may check.

1 Like

Hi,

I didn’t know this construct! Your method does not impact the performance, thanks!

Yh, I think it’s not documented, or very barely.

I just checked and Obj doesn’t exist at all in the Kodkod encoding (which is why it doesn’t impact performance in your example). It is litteraly replaced by File + Directory, hence sig S extends X {} will be also be forbiden in trash, as it comes to the same situation as above.

2 Likes

This is a fascinating topic! Thanks for taking the time to dive into @cdelg’s real life model. That syntax with sig Obj = File + Directory {} is pretty cool.