As our systems move toward more concurrent and distributed execution patterns, the tools and abstractions we use to monitor, schedule, and enforce system behavior become progressively less adequate. Inevitably, we need the ability to correlate context at one point in the system with events that are meaningful in other parts of the system.
A vast body of existing work addresses this observation, directly or indirectly, by maintaining a notion of context that follows the execution of applications through events, queues, thread pools, and messages between distributed components. However, despite the proliferation of popular end-to-end tracing frameworks (Zipkin, Dapper, X-Trace, and others), today’s distributed systems do not support the notion of an end-to-end request context in a well-defined or coherent manner. This has led to a fragmented landscape of poorly supported, siloed tracing frameworks that conflate necessarily separate design concerns such as data formats, semantics, and propagation logic.
In this work, we propose a principled layered design for end-to-end request context in distributed systems. Our design enables tracing systems to share common underlying layers: a system-level instrumentation layer for capturing request flows at the application level, and a data layer for propagating arbitrary data alongside a request’s execution. Using our Baggage abstraction, it is possible to develop new tracing applications and deploy them on existing infrastructure without additional instrumentation or redeployment.
The Tracing Plane is a layered design for context propagation in distributed systems. The Tracing Plane enables interoperability between systems and tracing applications. It is designed to provide a simple “narrow waist” for tracing, much as TCP/IP provides a narrow waist for the internet.
We recently (January 2017) released our open-source prototype of the Tracing Plane on GitHub. Documentation, tutorials, and an academic paper are coming in 2017!
Baggage is our name for general-purpose request context in distributed systems, and Baggage is implemented by the Tracing Plane. Many systems already have request contexts (e.g., Go’s context package; span contexts in Zipkin, OpenTracing, and Dapper; request tags in Census), but none of them is general purpose. This means that if I instrument my distributed system to pass around Zipkin span contexts, then later wish to use Census, I must reinstrument everything in order to pass around Census tags. That sucks.
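To make the reinstrumentation problem concrete, here is a minimal Python sketch of what a general-purpose context could look like. It is not the Tracing Plane's API: the `GenericContext` class, the `send_request` and `receive_request` helpers, the namespace names, and the JSON encoding are all assumptions made for illustration. The point is only that the system forwards opaque bytes, so a second tracing application can be added without touching the instrumentation.

```python
import json
from typing import Dict


class GenericContext:
    """Hypothetical general-purpose request context (illustrative only).

    Each tracing application owns one namespace. The carrier never
    interprets the values stored under a namespace.
    """

    def __init__(self) -> None:
        self._entries: Dict[str, Dict[str, str]] = {}

    def put(self, namespace: str, key: str, value: str) -> None:
        self._entries.setdefault(namespace, {})[key] = value

    def get(self, namespace: str, key: str, default: str = "") -> str:
        return self._entries.get(namespace, {}).get(key, default)

    def serialize(self) -> bytes:
        # JSON is an arbitrary encoding chosen for this sketch.
        return json.dumps(self._entries, sort_keys=True).encode("utf-8")

    @classmethod
    def deserialize(cls, data: bytes) -> "GenericContext":
        ctx = cls()
        if data:
            ctx._entries = json.loads(data.decode("utf-8"))
        return ctx


def send_request(ctx: GenericContext) -> bytes:
    """System instrumentation, written once: attach opaque bytes to a message."""
    return ctx.serialize()


def receive_request(wire: bytes) -> GenericContext:
    """System instrumentation, written once: restore the context on arrival."""
    return GenericContext.deserialize(wire)


if __name__ == "__main__":
    ctx = GenericContext()
    ctx.put("span_tracer", "trace_id", "abc123")  # first tracing application
    ctx.put("tag_tracer", "tenant", "team-a")     # added later, same instrumentation
    remote = receive_request(send_request(ctx))
    print(remote.get("span_tracer", "trace_id"), remote.get("tag_tracer", "tenant"))
```

Neither `send_request` nor `receive_request` knows which tracing applications exist, which is the property that span-specific or tag-specific contexts lack.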
The Tracing Plane has four layers, illustrated in green in the figure below. Depending on who you are, your entry point to the Tracing Plane differs.
System developers use the Transit Layer APIs to instrument their system to pass baggage around.
Tracing application developers use the Baggage Buffers IDL to generate contexts and APIs for their tracing application.
In the middle, the Atom and Baggage layers provide generic interfaces that together enable a multitude of different kinds of tracing applications to coexist.
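As a rough illustration of the system developer's side of this split, the following Python sketch shows one way to carry an opaque context across a work queue and a thread boundary. It does not use the Transit Layer APIs; `set_context`, `get_context`, and `InstrumentedQueue` are hypothetical names, and the context is treated as plain bytes that the system never inspects.

```python
import contextvars
import queue
import threading
from typing import Any, Callable, List

# Opaque per-request context bytes for the current execution (hypothetical).
_current = contextvars.ContextVar("request_context", default=b"")


def set_context(data: bytes) -> None:
    _current.set(data)


def get_context() -> bytes:
    return _current.get()


class InstrumentedQueue:
    """Work queue that carries the submitter's context along with each task."""

    def __init__(self) -> None:
        self._q: "queue.Queue" = queue.Queue()

    def submit(self, task: Callable[[], Any]) -> None:
        # Capture the context at the point where the task is enqueued.
        self._q.put((get_context(), task))

    def run_next(self) -> Any:
        data, task = self._q.get()
        token = _current.set(data)  # restore the context before running the task
        try:
            return task()
        finally:
            _current.reset(token)   # do not leak it into unrelated work


def demo() -> bytes:
    work = InstrumentedQueue()
    seen: List[bytes] = []
    set_context(b"request-42")
    work.submit(lambda: seen.append(get_context()))
    worker = threading.Thread(target=work.run_next)
    worker.start()
    worker.join()
    return seen[0]


if __name__ == "__main__":
    print(demo())
```

The queue is instrumented once and stays unaware of what the bytes mean; interpreting them is left to whatever tracing application produced them.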
For more information, see the documentation on our GitHub repo. This is an active research project and we will be publishing an academic paper in 2017.