Simplified DB migration - process, not structure

I am by no means experienced in formal methods. You can thank Hillel Wayne’s published work for anything I managed to learn, and me for anything I didn’t.

Sorry, this is a long one. I have more difficulties than questions, but I hope people here can offer some pointers for the former besides answers to the latter.

I’m trying to model what, from an enterprise, industry perspective, looks like a simple example that still has some relevance to real life. I hope I’m neither too wordy nor missing too many crucial details below:

  • There is some kind of aggregate in the DDD sense - a business concept that has a unique identifier, might contain other things with unique identifiers and is a “transaction boundary”.

    • That’s not a good explanation of what a DDD aggregate is, but I think it’s what’ll be the most important for writing the model.
    • Any update we make that only changes something within the aggregate is atomic.
  • I want to change the data representation in the DB.

  • The update will be gradual, not all-at-once. We don’t want to stop the normal use of the application.

  • An application instance (there’s more than one) can write data in the Old format, the New format or an Intermediate format. This determines which versions of the aggregate it can read.

    Writes        Can read
    Old           Old, Intermediate
    New           New, Intermediate
    Intermediate  Old, New, Intermediate
    • In this model I’m not interested in modeling properties like the equivalence of the before/after representations. Assume that’s done separately.
  • Application instances change what versions they write one at a time.

  • Normal client requests still update the database through the application instances throughout the whole lifetime of the process.

    • These write out whatever representation the application instance is putting out.
  • The overall idea is to:

    1. Update all application instances from writing Old to Intermediate
    2. Make sure all the aggregate instances get updated by the running application instances.
    3. Update all application instances from writing Intermediate to New.
    4. Make sure all the aggregate instances get updated by the running application instances (the assumption is that overwriting Intermediate with New removes the redundancies present in the old representation).
  • These are baby steps. I don’t even try to think about cases like “through pathological scheduling the DB commits a very old write way after the entire migration process seems to have completed successfully”.

    • It might be an interesting future extension, or not. I’ve not given this sort of thing thought yet, as I’m struggling to get the current model doing sensible things.
    • Lower-hanging fruit I’ve also left intentionally unaddressed: additions of new aggregate instances and deletions of existing ones.
    • For an RDBMS (unless everything’s stuck in a JSON column in PostgreSQL) I’d have to have schema changes in the process as well. At this-or-close-to-this level of detail it does not seem like there’d be anything interesting the model could say about them.
module migration

check { spec implies eventually always Stutter = PossibleStep }

// "clean" model signatures

sig AppInstance {
	, var writes : one Schema
}

sig Aggregate {
	, var schema : one Schema
}

one sig Old extends Schema {}
one sig Intermediate extends Schema {}
one sig New extends Schema {}

abstract sig Schema {}

pred canRead[i: one AppInstance, a: one Aggregate] {
	a.schema in writerCanReadSchema[i.writes]
}

fun writerCanReadSchema[ws: one Schema]: set Schema {
	{ s: Schema |
		{
			ws = Old implies not s = New
			ws = New implies not s = Old
		}
	}
}

// safety

pred qos {
	all i: AppInstance | all a: Aggregate | canRead[i, a]
}

// liveness

pred completes {
	storedForeverAs[New] and writtenForeverAs[New]
}

pred neverWrittenAgain[s: Schema] { eventually always no (AppInstance.writes & s) }

pred writtenForeverAs[s: Schema] { eventually always AppInstance.writes = s }

pred neverStoredAsAgain[s: Schema] { eventually always no (Aggregate.schema & s) }

pred storedForeverAs[s: Schema] { eventually always Aggregate.schema = s }

pred progressesToNo[s: Schema] {
	eventually always no (Aggregate.schema & s)
	eventually always no (AppInstance.writes & s)
}

pred progressesTo[s: Schema] {
	eventually always Aggregate.schema = s
	eventually always AppInstance.writes = s
}

// fairness support signatures

var sig PossibleStep in Step {}

one sig Workload extends Step {}
one sig Migration extends Step {}
one sig Stutter extends Step {}

abstract sig Step {}

// behaviour

pred spec {
	init and step
}

pred init {
	#Aggregate > 1 and #Aggregate < 4
	#AppInstance > 2 and #AppInstance < 4
	
	all a: Aggregate | a.schema = Old
	all i: AppInstance | i.writes = Old

	PossibleStep = Migration
}

pred step {
	always {
		Migration = PossibleStep or Stutter = PossibleStep

		some AppInstance.writes - New implies Migration in PossibleStep

		Migration = PossibleStep implies migrationStep
		Stutter = PossibleStep implies stutter
	}
}

pred workloadStep {
	// TODO: implement
	stutter
}

pred migrationStep {
	AppInstance.writes = New implies Migration not in PossibleStep'
	some i: AppInstance | i.writes = Old implies updateInstance[i, Intermediate]
	some i: AppInstance | {
		i.writes = Intermediate
		no (AppInstance.writes & Old)
	} implies updateInstance[i, New]
}

pred updateInstance[i: AppInstance, nw: Intermediate + New] {
	all a: Aggregate | a.schema' = a.schema
	
	one i: AppInstance {
		all o: AppInstance - i | o.writes' = o.writes

		i.writes' = nw
	}
}

pred stutter {
	all i: AppInstance | i.writes' = i.writes
	all a: Aggregate | a.schema' = a.schema
}
  • I’ve read some people on here reporting pain points when there’s a lot of variables being updated at the same time.
    • I’m having similar difficulties even though there’s only 2 time-varying relations in the model.
    • I am sure some of my pain is self-inflicted. I’ve probably tried multiple complex things that someone more experienced could express more simply.
    • Some of it might be that I’m trying bits and pieces from specs I can see on here without being able to find understandable (to me) documentation for them, e.g. let unchanged[s, r] = s.(r)' = s.r.
    • By now I think that the most likely way for “clear nonsense” to happen is when I’ve not specified how all variables change from each and every reachable/possible state.
      • Some indication in the visualizer that this is happening (dashed arrow on the behaviour trace? IDK) might go a long way for someone like me.
      • It’s not obvious why the spec is triggering Alloy’s arbitrary-change-when-unspecified behavior. I feel like knowing how to structure an action predicate (might be the wrong term?) to avoid that would go a long way towards moving from “mostly fixing up the basic encoding of the model” to “mostly working on the problem”.
      • Sometimes I’m surprised when some variables do undergo arbitrary change and yet the safety property I’ve defined (qos) holds. This seems too good to be true given all the uncertainty, confusion and pain I’m experiencing otherwise.
    • I’ve tried adding some fairness (? don’t grasp the terminology here well) to ensure things proceed up to the point where nothing can change anymore. This is clearly not a trivial thing to figure out how to implement from scratch.
    • Would this kind of model be better served by TLA+?
  • I tried splitting this off into multiple modules so I had the spec in one and run/check commands in others.
    • I wanted to do this to work with different commands for the same spec more easily.
    • Unfortunately, the table view on an instance seems to hide all the signatures (?) from the imported spec module.
    • I hope I’m just missing something in the UI:
      • either to show signatures from other modules in the table view with these
      • or to pick which command to run from multiple present in a module.
  • How do I highlight the syntax as Alloy in Disqus?
    • Neither alloy nor alloy6 after the opening triple-backtick seems to work (if I use java, at least some keywords get highlighted in the preview).
  • I’m sure I’m forgetting something else I wanted to ask/mention.
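On the unchanged idiom quoted above: as far as I can tell it is a top-level let macro, which Alloy expands by substitution, so the prime inside the body lands where you’d expect. One caveat (my reading, not authoritative): s.(r)' = s.r only fixes the image of s under r as a set. For this model’s one-valued fields, comparing the whole relations is both simpler and pointwise:

```alloy
// Sketch only: relation equality fixes every tuple, so this is
// equivalent to the quantified frame conditions in stutter.
pred stutter {
	writes' = writes   // each AppInstance keeps its writes value
	schema' = schema   // each Aggregate keeps its schema value
}
```

By contrast, AppInstance.(writes)' = AppInstance.writes would only say that the set of schemas being written is unchanged, not that each instance keeps its own.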

Very long message indeed, too long for me :slight_smile:
But, regarding your questions about difficulties, your step predicate and events seem non-standard. You may find this post helpful (stop before the Event Reification section, it’s confusing for a newcomer). This tutorial/lecture notes may also be useful.

Not particularly, the standard way to model such a spec in Alloy (and detailed in the links pointed to above) is similar to TLA+.


I did fail to mention the model is incomplete - due to the difficulties I’m having.
Thanks, I’ll be sure to study that in detail when I have a bit more time.

Having a template for “how to structure something like this in general” is bound to be helpful. Pedagogically, as a resource, that reminds me of the How to Design Programs book.

You recommend to skip event reification and I will do my best to do that, but “seeing what event just occurred” seems like such a QoL improvement it’ll be hard not to sneak a peek.

To ask a question that’s probably more “on my level”. In the template in that thread you use the some quantifier.

fact behaviour {
	// initial conditions
	no inbox
	...
	// execution
	always (skip or some n : Node | send[n] or compute[n])
}

When I read some, the intuition is that more than one node could act at a time. If that is prevented by the frame conditions it means… that the kind of decomposition/locality of reasoning I’m used to from programming no longer applies? Am I on the right track with that?

I think I can live with that (more like: have to if I want to get anywhere with Alloy), but it is surprising / a stumbling block.

From what little I understand of TLA+ it seems it’d be less likely to cause this sort of confusion in a beginner. Would be glad to be corrected if I misunderstand that.

OK, just rephrasing things to follow that pattern closely is surprisingly effective (modified spec below). I don’t know what was wrong with the original and/or how deeply confused I was about things while trying to write it. Figuring that out would be nice – but it’s a goal for later.

The comments below do contain some doubts, but they feel minor compared to getting out of the initial tar pit. Being a beginner I obviously can’t tell if they are minor.

The other thing that I’ve managed not to stumble upon initially that’s a huge QoL difference (and addresses one of the difficulties from the first post) is Execute > Execute all and the entries for running individual commands under the Execute menu.

Just being able to see that the model can complete is a huge difference.

The way the state transitions are encoded feels very abstract vs a typical step-by-step program, but right now that’s plenty good enough.

module migration

check qos { qos }

run canStopWritingOld { neverWrittenAgain[Old] }

// This passes if we don't handle any requests.
check stopsWritingOld { (eventually always stutter) or neverWrittenAgain[Old] }

run canComplete { completes }

// "clean" model signatures

sig AppInstance {
	, var writes : one Schema
}

sig Aggregate {
	, var schema : one Schema
}

one sig Old, Intermediate, New extends Schema {}

abstract sig Schema {}

// safety

pred qos {
	all i: AppInstance | all a: Aggregate | canRead[i, a]
}

pred canRead[i: one AppInstance, a: one Aggregate] {
	a.schema in writerCanReadSchema[i.writes]
}

fun writerCanReadSchema[ws: one Schema]: set Schema {
	{ s: Schema |
		{
			ws = Old implies not s = New
			ws = New implies not s = Old
		}
	}
}

// liveness

pred completes {
	storedForeverAs[New] and writtenForeverAs[New]
}

pred neverWrittenAgain[s: Schema] { eventually always no (AppInstance.writes & s) }

pred writtenForeverAs[s: Schema] { eventually always AppInstance.writes = s }

pred neverStoredAsAgain[s: Schema] { eventually always no (Aggregate.schema & s) }

pred storedForeverAs[s: Schema] { eventually always Aggregate.schema = s }

pred progressesToNo[s: Schema] {
	eventually always no (Aggregate.schema & s)
	eventually always no (AppInstance.writes & s)
}

pred progressesTo[s: Schema] {
	eventually always Aggregate.schema = s
	eventually always AppInstance.writes = s
}

// behaviour

fact spec { init and step }

pred init {
	#Aggregate > 1 and #Aggregate < 4
	#AppInstance > 2 and #AppInstance < 4
	
	all a: Aggregate | a.schema = Old
	all i: AppInstance | i.writes = Old
}

pred step {
	always {
		stutter or (
		some i: AppInstance, s: Schema | restart[i, s] ) or (
		some i: AppInstance, a: Aggregate | handleRequest[i, a] )
	}
}

pred restart[i: one AppInstance, s: one Schema] {
	// guard
	not s = Old
	not i.writes = New
	not { i.writes = Intermediate and s = Old }
	all a: Aggregate | s in writerCanReadSchema[a.schema]

	// A restart with the same value is (or should be) equivalent to a stutter anyway.
	// Q: Does guarding against it matter? Not sure.
	not s = i.writes

	// effect
	i.writes' = s

	// frame conditions
	all o: AppInstance - i | o.writes' = o.writes
	all a: Aggregate | a.schema' = a.schema
}

pred handleRequest[i: one AppInstance, a: one Aggregate] {
	// guard

	// A write that does not change the schema looks like a stutter anyway.
	// Q: Does guarding against it matter? Not sure.
	// Maybe it'd be better to make the aggregates have some state?
	// That way it'd be possible to distinguish "normal updates" from "just rewrite using the new schema".
	not a.schema = i.writes

	// On one hand - a request to update a will fail if i cannot read it.
	// On the other - aren't we preventing the model from detecting that something is wrong?
	// The qos check already handles the same concern in a different way...
	// But I get an inkling there's some modelling tradeoff in putting canRead in the guard.
	canRead[i, a]

	// effect
	a.schema' = i.writes'

	// frame conditions
	all i: AppInstance | i.writes' = i.writes
	all o: Aggregate - a | o.schema' = o.schema
}

pred stutter {
	all i: AppInstance | i.writes' = i.writes
	all a: Aggregate | a.schema' = a.schema
}

My most pressing issue (“how to get out of a pit”) is solved. Would it be OK to continue posting updated versions of the model in here over time as I try to learn more through doing (and reading up!)?

EDIT: Aaand there was an error in qos - the predicate was checking the initial state only. Ensuring safety required an additional guard condition in restart.

...

pred qos {
	// 'always' was missing before.
	always all i: AppInstance | all a: Aggregate | canRead[i, a]
}

...


pred restart[i: one AppInstance, s: one Schema] {
	// guard
	not s = Old
	not i.writes = New
	s = New implies no { o: AppInstance - i | o.writes = Old } // This bit's new.
	not { i.writes = Intermediate and s = Old }
	all a: Aggregate | s in writerCanReadSchema[a.schema]

	// A restart with the same value is (or should be) equivalent to a stutter anyway.
	// Q: Does guarding against it matter? Not sure.
	not s = i.writes

	// effect
	i.writes' = s

	// frame conditions
	all o: AppInstance - i | o.writes' = o.writes
	all a: Aggregate | a.schema' = a.schema
}

...

If you feel at ease with reification, then do not hesitate to use it. What’s important IMO is that reification should just observe the model in the sense that the set of traces with reified events should be in bijection with the set of traces without them.
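In case it helps later readers, here is a minimal sketch of reification-as-observation for the model above (the Event names are made up; the point is that events is a derived function, so it adds no constraints and the set of traces is unchanged):

```alloy
// Hypothetical sketch: name which transition predicate(s) hold
// between the current state and the next, purely for observation.
enum Event { EvStutter, EvRestart, EvRequest }

fun events : set Event {
	{ e: EvStutter | stutter }
	+ { e: EvRestart | some i: AppInstance, s: Schema | restart[i, s] }
	+ { e: EvRequest | some i: AppInstance, a: Aggregate | handleRequest[i, a] }
}
```

Because events is a fun rather than a fact, the visualizer/evaluator can display it per state (“what event just occurred”) without changing what the model allows.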

I’m not completely sure I understand what you mean, so here is how I view the situation. First, indeed, some allows several events at the same time but, indeed too, due to frame conditions, it’s likely that only one event can happen in any given state. More precisely, many events may happen at the same time, but their effect will usually be the same, which means that they are indistinguishable events w.r.t. their effect.

Second, when programming, you’re designing a machine (under a different name: program, function, state machine, automaton…). The semantics is usually that the only part of the system that changes is what the machine modifies and the rest is unchanged. What you call locality. Like TLA+, but contrary to Event-B for instance, Alloy does not come with a notion of machine. Rather, we only have logic. We’re just saying what is true of a system compliant with our spec. If you don’t say anything, then anything may happen. If you want something not to happen, you must enforce it.

I don’t see a difference w.r.t. TLA+ on this front. In fact, the “standard” style I advocated for above comes directly from TLA+. In the latter, the standard approach is also to model frame conditions (UNCHANGED in TLA+), steps as a some under an always ([Next] in TLA+) and stuttering (_vars in TLA+).
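To make the correspondence concrete, here is the step predicate from the model above annotated with TLA+ counterparts (the comments are my own mapping, not from the original post):

```alloy
pred step {
	always {   // Spec == Init /\ [][Next]_vars
		// stuttering disjunct of [Next]_vars: UNCHANGED vars
		stutter
		// \E i \in AppInstance, s \in Schema : Restart(i, s)
		or ( some i: AppInstance, s: Schema | restart[i, s] )
		// \E i \in AppInstance, a \in Aggregate : HandleRequest(i, a)
		or ( some i: AppInstance, a: Aggregate | handleRequest[i, a] )
	}
}
```

The explicit frame conditions inside restart and handleRequest then play the role of UNCHANGED for the parts each action does not touch.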


A lot (most?) of my confusion probably stems from murky recollections and shoddy analogies in my head. Though I’m certain I have ways to go – your comments do help get me on the right track. Thanks!
