How would you make this git objects model more idiomatic?

After trial and error, I’ve ended up with the following model of git object:

sig Object {}

sig Tree {
  children: some (Object + Tree)
}

sig Message {}

sig Commit {
  tree: one Tree,
  parent: lone Commit,
  message: lone Message
}

fact {
  -- Trees
  -- … can't contain themselves
  no t: Tree | t in t.^children
  -- … can't have the same children as another tree. If they do in real life,
  -- it's just the same tree (the storage is content-addressable)
  no disj t, u: Tree | t.children = u.children

  -------------------------------------

  -- Commits
  -- … can't form a cycle
  no c: Commit | c in c.^parent
  -- … can't be exactly the same as another commit. Same reason trees can't
  -- be identical.
  no disj c, d: Commit | c.tree = d.tree && c.parent = d.parent && c.message = d.message

  -------------------------------------

  -- Messages
  -- … are used by at least one commit. (Messages are really just a way to
  -- model that you can have a commit with or without a message.)
  all m: Message | some c: Commit | c.message = m
}

pred Default {}

run Default for 3

It produces helpful instances! But now that I’ve got this basically working, I want to stop and ask: what would an expert improve? I’m sure I’m going about things in less-than-great ways (e.g. maybe Object and Tree and Commit should all inherit from something so as to model that they’re all stored together in .git/objects on a filesystem?) and I’d love to hear y’all’s feedback!

(Next step for me: try using time in Electrum to model some simple operations and check that my model and the real git binary give the same result!)

1 Like

Hi Brian, This seems like a really nice start. I think this is more than idiomatic enough to move on and start modeling more. I would ask yourself: what design aspects of Git do you have questions or uncertainties about? Or which design aspects are essential to Git working? For example, maybe you want to model how blobs are hashed and what properties the hashes should have.

One small comment about the model. This constraint

all m: Message | some c: Commit | c.message = m

seems to me to belong not the model proper but to a run command. There’s no reason to say that there can’t be messages that haven’t been assigned to commits. To make such messages disappear in the visualizer, you’d just select “hide unconnected nodes.”

1 Like

ah, thank you! I didn’t know about that option. I’ll definitely have to use that in the future :+1: