How Code Generation Works
This page gives a brief description of how this code generator works. It is not intended to be the final treatise on how to write any code generator. It is meant to be a reference for those who wish to contribute to this effort, or to use it as a reference implementation.
There are two steps: a parse step which essentially involves reorganizing data to make it more friendly to templates, and a translation step which sends information about the API to templates, which ultimately write the library.
The protoc contract
This code generator is written as a protoc plugin, which operates on
a defined contract. The contract is straightforward: a plugin must
accept a CodeGeneratorRequest (essentially a sequence of FileDescriptor
objects) and output a CodeGeneratorResponse.
If you are unfamiliar with protoc plugins, welcome! That last
paragraph likely sounded less straightforward than claimed. It may be useful
to read plugin.proto and descriptor.proto before continuing. The
former describes the contract with plugins (such as this one) and is relatively
easy to digest; the latter describes protocol buffer files themselves and is
rather dense. The key point to grasp is that each .proto file compiles
into one of these proto messages (called descriptors), and this plugin’s
job is to parse those descriptors.
That said, you should not need to know the ins and outs of the protoc
contract model to be able to follow what this library is doing.
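For readers who do want to see the contract in concrete terms, the following is a minimal sketch of a plugin, not this generator's actual code. It assumes the protobuf Python package is installed, where the descriptors arrive as FileDescriptorProto messages. The function names handle_request and run, and the _summary.txt output naming, are hypothetical and chosen only for illustration.

```python
from google.protobuf.compiler import plugin_pb2


def handle_request(
    request: plugin_pb2.CodeGeneratorRequest,
) -> plugin_pb2.CodeGeneratorResponse:
    """Hypothetical handler: emit one text file per requested proto file."""
    response = plugin_pb2.CodeGeneratorResponse()
    # proto_file carries descriptors for the requested files and their
    # dependencies; file_to_generate names only the explicitly requested ones.
    targets = set(request.file_to_generate)
    for file_pb in request.proto_file:
        if file_pb.name not in targets:
            continue
        out = response.file.add()
        # Output naming is an assumption made for this sketch.
        out.name = file_pb.name.replace(".proto", "_summary.txt")
        out.content = "\n".join(m.name for m in file_pb.message_type) + "\n"
    return response


def run(raw_request: bytes) -> bytes:
    """Parse the serialized request and return the serialized response."""
    request = plugin_pb2.CodeGeneratorRequest.FromString(raw_request)
    return handle_request(request).SerializeToString()
```

A real plugin script would pass the bytes read from standard input to run and write the returned bytes to standard output, which is how protoc exchanges these two messages with a plugin.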
Entry Point
The entry point to this tool is gapic/cli/generate.py. The function
in this module is responsible for accepting CLI input, building the internal
API schema, and then rendering templates and using them to build a response
object.
Parse
As mentioned, this plugin is divided into two steps. The first step is
parsing. The guts of this is handled by the API object,
which is this plugin’s internal representation of the full API client.
In particular, this class has a build() method which
accepts a sequence of FileDescriptor objects (remember, this is protoc’s
internal representation of each proto file). That method iterates over each
file and creates a Proto object for each one.
Note
An API object will not only be given the descriptors
for the files you specify, but also all of their dependencies.
protoc is smart enough to de-duplicate and send everything in the
correct order.
The API object’s primary purpose is to make sure all
the information from the proto files is in one place, and reasonably
accessible by Jinja templates (which by design are not allowed to call
arbitrary Python code). Mostly, it tries to avoid creating an entirely
duplicate structure, and simply wraps the descriptor representations.
However, some data needs to be moved around to get it into a structure
useful for templates (in particular, descriptors have an unfriendly approach
to sorting protobuf comments, and this parsing step places these back
alongside their referent objects).
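To make the comment problem concrete: in descriptor.proto, comments live in source_code_info.location entries keyed by a numeric path rather than on the objects they describe. The sketch below assumes the protobuf Python package is installed and shows one way to pair top-level messages with their leading comments. The helper name message_docs is hypothetical, and this is not necessarily how this plugin performs the step.

```python
from google.protobuf import descriptor_pb2

# Field number of `message_type` within FileDescriptorProto (descriptor.proto).
MESSAGE_TYPE_FIELD = 4


def message_docs(file_pb: descriptor_pb2.FileDescriptorProto) -> dict:
    """Hypothetical helper: map each top-level message name to its comment."""
    # Each location's path identifies an element by field number and index,
    # e.g. [4, 1] is the second top-level message in the file.
    by_path = {
        tuple(loc.path): loc.leading_comments
        for loc in file_pb.source_code_info.location
    }
    return {
        msg.name: by_path.get((MESSAGE_TYPE_FIELD, index), "").strip()
        for index, msg in enumerate(file_pb.message_type)
    }
```

Once the comments are keyed by the object they belong to, a wrapper can carry each comment along with its descriptor so that templates never need to handle paths.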
The internal data model does use wrapper classes around most of the
descriptors, such as Service and MessageType. These consistently contain their
original descriptor (which is always spelled with a _pb suffix, e.g.
the Service wrapper class has a service_pb instance variable).
These exist to handle bringing along additional relevant data (such as the
protobuf comments as mentioned above) and handling resolution of references
(for example, allowing a Method to reference its
input and output types, rather than just the strings).
These wrapper classes follow a consistent structure:
They define a __getattr__ method that defaults to the wrapped descriptor unless the wrapper itself provides something, making the wrappers themselves transparent to templates.
They provide a meta attribute with metadata (package information and documentation). That means templates can consistently access the name for the module where an object can be found, or an object’s documentation, in predictable and consistent places (thing.meta.doc, for example, prints the comments for thing).
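The wrapper structure described above can be sketched with the standard library alone. The classes ServiceSketch and Metadata and the client_name property are simplified, hypothetical stand-ins rather than this plugin's real classes, and a SimpleNamespace plays the role of the wrapped descriptor.

```python
import dataclasses
from types import SimpleNamespace


@dataclasses.dataclass(frozen=True)
class Metadata:
    """Hypothetical metadata holder: documentation and module name."""
    doc: str = ""
    module: str = ""


@dataclasses.dataclass(frozen=True)
class ServiceSketch:
    """Hypothetical wrapper around a service descriptor."""
    service_pb: object
    meta: Metadata = dataclasses.field(default_factory=Metadata)

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal lookup fails, so
        # anything the wrapper defines itself takes precedence.
        return getattr(self.service_pb, name)

    @property
    def client_name(self):
        # `name` is not defined on the wrapper, so it comes from service_pb.
        return self.name + "Client"


descriptor = SimpleNamespace(name="Library")
service = ServiceSketch(descriptor, Metadata(doc="Manages books."))
print(service.name, service.client_name, service.meta.doc)
```

Because the fallback happens in __getattr__, a template can write service.name or service.meta.doc without knowing which layer supplies the value.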
Translation
The translation step follows a straightforward process to write the contents of client library files.
This works by reading in and rendering Jinja templates into a string. The file path of the Jinja template is used to determine the filename in the resulting client library.
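The path-to-filename idea can be sketched as follows. To stay runnable with only the standard library, string.Template stands in for Jinja here. The templates root directory, the .j2 suffix, and the function names output_name and render_all are all assumptions made for illustration, not details confirmed by this page.

```python
import posixpath
from string import Template


def output_name(template_path, template_root="templates", suffix=".j2"):
    """Derive the generated file's name from the template's path."""
    relative = posixpath.relpath(template_path, template_root)
    # Drop the template suffix (assumed to be ".j2") if it is present.
    if relative.endswith(suffix):
        relative = relative[: -len(suffix)]
    return relative


def render_all(templates, context):
    """Render each template source and key the result by its output name.

    `templates` maps template paths to template source text.
    """
    return {
        output_name(path): Template(source).substitute(context)
        for path, source in templates.items()
    }


files = render_all(
    {"templates/docs/index.rst.j2": "$title\n"},
    {"title": "Library API"},
)
print(files)
```

The result is a mapping from output filename to rendered string, which is the shape needed to assemble the response described in the next section.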
More details on authoring templates are discussed on the Templates page.
Exit Point
Once the individual strings corresponding to each file to be generated
are collected into memory, these are pieced together into a
CodeGeneratorResponse object, which is serialized
and written to stdout.