Remote Build Execution - An Overview

One of the big benefits of using build systems like Bazel is that it supports remote-execution and caching of the build out of the box. Each action in the build graph can be executed on a remote machine, cached and then downloaded onto the local machine. When there are multiple developers on the team (or you are re-building your project from scratch on a different machine), and if the build has already been run by a developer in the team, then the other developers can simply reuse the results of the previous build thereby reducing the build time and load on the their local machines.

How is a remotely executed build faster?

Faster end-to-end build time is achieved in two ways:

1. High Parallelism

In a locally executed build, the build parallelism is usually capped to the number of CPU cores / RAM available on the machine (for example -j 16 of -j 96 if you are on one of those more expensive machines). With a remotely executed build you can far exceed the local parallelism and can go upto -j 500 or even -j 1000 depending on how good and light-weight the client implementation is. If portions of the build graph are wider than the -j value of the local machine, then those portions will heavily benefit by high parallelism.

1. Cached Action Results

While remote-execution of an individual action is generally expected to be slower than a locally executed action (remote-execution has overhead like shipping local input files to remote server, running the action on remote-server and downloading remotely executed results), fetching the cached result of a previously remotely executed action is usually faster than locally executing a build action. In a large organization (like Google :D) or a large repository, > 90% of the overall actions generally result in cache-hits. The larger the number of repetitive builds you do, the higher your cache-hit rate will be.

Remote Execution API

Action Specification

Build systems like Bazel use the open source Remote Execution API with a corresponding gRPC server that implements a remote-execution API. The remote-execution API describes a mechanism for executing an arbitrary local command remotely. In order to execute a command on a remote machine you would need the following basic things:

1. The command line invocation
2. Input files / directories
3. Environment variables
4. Output files / directories
5. Platform configuration (like Mac / Linux / Windows)

For example, lets say you want to run a compilation action like the following on a remote Linux machine

PWD=/proc/self/cwd clang++ -c test.cpp -o test.o


Here’s what the various requirements mentioned above would map to:

1. The command line invocation = clang++ -c test.cpp -o test.o.
2. Input files / directories This would include the following files:
1. test.cpp
2. test.h
3. ... other header files included by test.h / test.cpp transitively ...

1. Environment variables = PWD=/proc/self/cwd
2. Output files / directories = test.o
3. Platform configuration = OS=Linux, toolchain=clang-10 Note that the platform configuration you specify depends on what platform configurations the remote-execution server respects. In the example above, the remote-execution server would have to know to make clang-10 binaries available during command execution and must execute the command on a Linux machine. As you can imagine, more fancier platform specifications are possible.

An encapsulation of the various things described above is what we call an Action. An Action captures the various information needed to run the command on a remote server.

We looked at what goes into an action but haven’t described how the action is presented to the remote-execution server by the client. In the remote-execution API specification, content-addressable storage (CAS) is used to present the action and also the dependencies specified by the action. The CAS is a common key-value store that both the remote-execution client and remote-execution server utilize for exchanging data. The general series of steps in presenting an a piece of data (an action spec / an input file etc) to a remote-execution server using content-addressable storage is:

1. Compute the digest of the what you want to store.

a. In the case of a regular file, it will be a digest of the file.

b. In the case of a directory, it will be a digest of the Merkle tree root of the directory.

c. In the case of an action specification, it will be a digest of the wire format of the action spec.

2. Store the digest along with the data in CAS. The digest will be the key and the data itself will be the value.

3. Present the digest to the remote-execution server which can in-turn utilize the CAS service to fetch the data.

Steps involved in running an action

At this point we have all the basic building blocks we need to see how we can run an action remotely. Lets take a sample C++ compile command we saw above, add more details to it and see how it can be remotely executed with the remote-execution API.

test.cpp:

#include "test.h"
int main() {
std::cout << "Hello world!\n";
return 0;
}


test.h

#ifndef TEST_H
#define TEST_H

#include <iostream>

#endif

1. Determine the inputs of above action. In Bazel, the dependent header files are described in BUILD file. In scenarios where that is not the case, the test.cpp file will be preprocessed to find out the dependent header files.
2. Upload each of the header files along to CAS keyed with their digest.
3. Determine the output to be produced by the action. In this case, the output is going to be test.o file.
4. Determine the toolchain to be used for this compilation. Lets assume that we are going to use clang-10 and assume that the remote-execution server will make it available when remotely running the action.
5. Construct the action spec. It would look something like (in textproto):
input_root_digest: <input-dir-digest>
output_files: "test.o"
platform: {
key: "toolchain"
value: "clang-10"
}
platform: {
key: "OS"
value: "Linux"
}

1. Serialize the action spec to binary format and compute the digest of the serialized action spec.
2. Upload the action spec to CAS keyed by its digest.
3. Call Execute() RPC on the remote-execution server.
4. Download the output file from the CAS using the digest present in the result of the Execute() call.