Baserock Moving Parts

The main parts of Baserock are:

  • Morph - a command line tool for workflow
  • Lorry - a tool for collecting upstream source and converting it into Git if required
  • Trove - a Baserock appliance to host git repos collected by Lorry, and/or created by Baserock users
  • git.baserock.org - our 'master' instance of a Trove, containing all the upstreams we are integrating
  • the Baserock Definitions format - a YAML-based language for describing build and integration instructions
  • Reference system definitions - a git repo defining various Baserock reference systems
  • Trebuchet - a tool for updating Baserock systems, based on BTRFS snapshots
  • tbdiff - a tool used by Trebuchet to create binary diffs between directory trees

We also deal with the following:

  • chunk: an upstream project when built as part of Baserock
  • stratum: a collection of chunks
  • system: a collection of strata for a specific device
  • morphology: a description of how a chunk, stratum or system should be built
  • morph file: a structured text file describing a chunk, stratum or system
  • artifact: the result of building a chunk, stratum or system
  • cluster: a set of systems that are deployed together

WIP clarification of terms used in Baserock

Morphologies

A Morphology object is close to the raw data loaded from the definitions repository, but defaults are added and the objects carry some helper methods, so there is not necessarily a one-to-one mapping between a Morphology object and a file in the definitions repository.
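As a rough illustration of "raw data plus defaults plus helpers", here is a minimal sketch. The class, the default values and the helper name are assumptions made for this example, not Morph's actual code.

```python
import copy

# Hypothetical defaults for a chunk; the real default set in Morph may differ.
CHUNK_DEFAULTS = {
    "build-system": "manual",
    "configure-commands": [],
    "build-commands": [],
    "install-commands": [],
}


class Morphology:
    """Illustrative stand-in for a loaded morphology, not Morph's own class."""

    def __init__(self, raw):
        # Start from a copy of the defaults, then overlay what the file said.
        self._data = copy.deepcopy(CHUNK_DEFAULTS)
        self._data.update(raw)

    def __getitem__(self, key):
        return self._data[key]

    def get_commands(self, step):
        # Helper method: something the raw file data does not offer by itself.
        return list(self._data.get(step + "-commands", []))


raw = {"name": "hello", "kind": "chunk", "build-commands": ["make"]}
morph = Morphology(raw)
```

The object answers questions the file never stated (for example morph["build-system"]), which is one reason the object and the file are not interchangeable.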

Sources

Sources are a more processed form of Morphologies: they include the repository URLs and commit IDs of the source code needed to build the morphology, plus the splitting rules derived from the Morphology.

Each Source keeps a reference to the morphology, since we need to use it to find things like the name, kind and build commands.

We could move this information into the Source object, but there's at least one place where it's useful to know that two Sources came from the same Morphology.
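The relationship described above can be sketched as follows. The field names and the same_morphology helper are hypothetical and only illustrate why keeping the reference is useful.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """Illustrative only; the field names are assumptions, not Morph's API."""
    repo_url: str
    sha1: str
    morphology: dict  # reference to the shared morphology data
    split_rules: dict = field(default_factory=dict)

    @property
    def name(self):
        # Name and kind are looked up through the morphology, not copied.
        return self.morphology["name"]

    @property
    def kind(self):
        return self.morphology["kind"]


def same_morphology(a, b):
    # Identity, not equality: did both Sources come from one morphology object?
    return a.morphology is b.morphology


# Made-up repository alias and commit IDs, for illustration only.
gcc_morph = {"name": "gcc", "kind": "chunk", "build-commands": ["make"]}
first = Source("upstream:gcc", "1111", gcc_morph)
second = Source("upstream:gcc", "2222", gcc_morph)
```

If the name, kind and build commands were copied into each Source instead, the same_morphology check would have nothing to compare.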

Artifacts

Artifacts are created from the splitting rules. When you build a Source, the Artifact objects are used to determine which Source produces which output.

Before my patch, the Artifact included pretty much everything, including the cache key and dependencies. After my patch, it holds little more than the name, a reference to its Source, references to the Sources that depend on this Artifact, and the methods for determining where to put things in the ArtifactCache.
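A minimal sketch of the slimmed-down shape described above. The class layout, the cache_basename method and its naming scheme are assumptions for illustration; the real ArtifactCache naming may differ.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Source:
    name: str
    cache_key: str  # in this sketch the cache key lives on the Source
    artifacts: dict = field(default_factory=dict)


@dataclass(eq=False)
class Artifact:
    name: str
    source: Source  # the Source that produces this Artifact
    dependents: list = field(default_factory=list)  # Sources that depend on it

    def cache_basename(self):
        # Hypothetical naming scheme for where the artifact is stored.
        return f"{self.source.cache_key}.{self.name}"


def create_artifacts(source, artifact_names):
    # One Artifact per name produced by the (assumed) splitting rules.
    for artifact_name in artifact_names:
        source.artifacts[artifact_name] = Artifact(artifact_name, source)
    return source.artifacts


source = Source("hello", "0123abcd")
artifacts = create_artifacts(source, ["hello-bins", "hello-devel"])
```

Because each Artifact only points back at its Source, anything shared between the artifacts of one Source is stored once.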

Build Graphing

After all the Sources and Artifacts are constructed, the ArtifactResolver is used to fill in all the build dependencies, and a couple of other fields.

After this, the CacheKeyComputer class is used to fill in the cache keys, since all the required information is available at this point.
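To show why cache keys can only be filled in once the dependencies are known, here is a sketch of one common approach: hash a source's own inputs together with the keys of its build dependencies. This is an assumption about the general technique, not a description of what CacheKeyComputer actually hashes; the class name and data layout are made up.

```python
import hashlib
import json


class CacheKeySketch:
    """Hypothetical helper: hashes a source's inputs plus its dependencies' keys."""

    def __init__(self, sources, build_depends):
        self._sources = sources              # name -> dict of build inputs
        self._build_depends = build_depends  # name -> list of dependency names
        self._keys = {}                      # memoised results

    def compute_key(self, name):
        if name not in self._keys:
            deps = sorted(self._build_depends.get(name, []))
            identity = {
                "source": self._sources[name],
                # Recursing means a change anywhere below alters this key too.
                "deps": [self.compute_key(dep) for dep in deps],
            }
            payload = json.dumps(identity, sort_keys=True).encode("utf-8")
            self._keys[name] = hashlib.sha256(payload).hexdigest()
        return self._keys[name]


# Made-up repository aliases and commit IDs.
sources = {
    "glibc": {"repo": "upstream:glibc", "sha1": "aaa", "build-commands": ["make"]},
    "bash": {"repo": "upstream:bash", "sha1": "bbb", "build-commands": ["make"]},
}
build_depends = {"bash": ["glibc"]}
bash_key = CacheKeySketch(sources, build_depends).compute_key("bash")
```

In this sketch, changing the commit ID of glibc changes the key for bash as well, so a rebuilt dependency is never mistaken for a cached one.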

Previously the dependencies were filled in per-artifact, which added a lot of redundant data and meant we had to build per-artifact, resulting in the huge number of build steps listed.

These are all distinct operations, but they are currently a bit jumbled together: no step has clear objects going in and clear objects coming out, because a lot of state is mutated and the Sources and Artifacts are created while we are still parsing the morphologies.
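For contrast, a sketch of what "clear objects in, clear objects out" could look like if each step were a function that returns new data instead of mutating shared state. The function names and data shapes are hypothetical and much simpler than the real code.

```python
def load_morphologies(raw_files):
    # Step 1: parsed file data in, morphology dicts out (copies, inputs untouched).
    return [dict(raw) for raw in raw_files]


def create_sources(morphologies, commits):
    # Step 2: morphologies in, one source record per morphology out.
    return [{"morphology": m, "sha1": commits[m["name"]]} for m in morphologies]


def create_artifacts(sources, split_rules):
    # Step 3: sources in, (source name, artifact name) pairs out.
    return [
        (s["morphology"]["name"], artifact_name)
        for s in sources
        for artifact_name in split_rules[s["morphology"]["name"]]
    ]


raw_files = [{"name": "hello", "kind": "chunk"}]
morphologies = load_morphologies(raw_files)
sources = create_sources(morphologies, {"hello": "abc123"})
artifacts = create_artifacts(sources, {"hello": ["hello-bins", "hello-devel"]})
```

Each step here can only start once the previous one has finished, which is the separation the current code lacks.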