Jürgen Cito — April 15 2015

Learning-based Program Analysis for Software Configurations (Jürgen Cito, MIT CSAIL)

Configuration errors have become an increasingly common root cause of
software failures as modern software systems grow larger and more
complex. Just recently, a misguided software
configuration change led to major outages at Facebook
and Instagram [1]. In this talk, I want to give an overview of
recent work that employs learning techniques to enable program
analysis for configurations: (1) Learning a static analyzer for
continuous integration configurations, and (2) Synthesizing container
configurations from interactions and state changes.

(1) Incorrect configuration settings lead to build failures in
continuous integration (CI) environments, which can take hours to run,
significantly delaying feedback loops and wasting valuable developer
time. We make use of cascading decision trees to learn constraints about
CI configurations that identify failing builds and their root causes.
To more accurately identify root causes, we train a neural network that
filters out constraints that are less likely to be connected to the root
cause of a build failure.
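
As a rough illustration of learning a constraint from build outcomes, the sketch below picks the single configuration setting that best separates failing from passing builds by information gain. It is one decision-tree split (a stump), not the cascading decision trees or the neural filter described above, and the configuration keys, values, and build outcomes are made-up toy data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of boolean build outcomes."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def best_constraint(builds):
    """Return the (key, value) predicate with the highest information gain.

    builds is a list of (config_dict, failed) pairs. The predicate
    "config[key] == value" plays the role of a learned constraint.
    """
    labels = [failed for _, failed in builds]
    base = entropy(labels)
    candidates = {(k, v) for config, _ in builds for k, v in config.items()}
    best, best_gain = None, 0.0
    for key, value in sorted(candidates):
        match = [f for c, f in builds if c.get(key) == value]
        rest = [f for c, f in builds if c.get(key) != value]
        remainder = (len(match) * entropy(match)
                     + len(rest) * entropy(rest)) / len(labels)
        gain = base - remainder
        if gain > best_gain:
            best, best_gain = (key, value), gain
    return best, best_gain


# Hypothetical toy data: (CI configuration, build failed?)
BUILDS = [
    ({"language": "python", "sudo": "false"}, False),
    ({"language": "python", "sudo": "required"}, True),
    ({"language": "node_js", "sudo": "required"}, True),
    ({"language": "node_js", "sudo": "false"}, False),
    ({"language": "ruby"}, False),
]

constraint, gain = best_constraint(BUILDS)
print(constraint, round(gain, 3))
```

In this toy data the failing builds are exactly those matching the selected predicate, so the split can be read as a candidate root cause. A real analyzer would need many more builds and a way to rank competing constraints.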

(2) Writing container configurations (Dockerfiles) is a tedious and
error-prone process. We automate this task by recording developer
interactions and observing state changes in containers. The challenge is
to distinguish between experimental interactions and essential
interactions that eventually lead to the desired final state of the
infrastructure. We show different techniques to achieve
this goal, along with a repair technique, based on probabilistic
models learned from all container configurations on GitHub, that
transforms instructions from the interaction language to conform
to (statistical) best practices in Dockerfiles.
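
To make the distinction between experimental and essential interactions concrete, here is a minimal sketch of one possible heuristic: keep a recorded command only if some state change it caused is still present in the final container state. This is an assumption made for illustration, not the technique presented in the talk. The session format, the helper names, and the base image are hypothetical.

```python
def essential_commands(session, final_state):
    """Keep commands whose effects survive into the final state.

    session is a list of (command, state_after) pairs, where each state
    maps a path to a marker for its content. Only additions and
    modifications are tracked, so a deletion that matters for the final
    state would be missed by this simple heuristic.
    """
    kept = []
    previous = {}
    for command, state in session:
        # Paths this command created or modified.
        changed = {p: v for p, v in state.items() if previous.get(p) != v}
        if any(final_state.get(p) == v for p, v in changed.items()):
            kept.append(command)
        previous = state
    return kept


def to_dockerfile(base_image, commands):
    """Render the kept commands as RUN instructions after a FROM line."""
    return "\n".join([f"FROM {base_image}"] + [f"RUN {c}" for c in commands])


# Hypothetical recorded session: (command, container state afterwards)
SESSION = [
    ("mkdir -p /app", {"/app": "dir"}),
    ("echo test > /tmp/scratch", {"/app": "dir", "/tmp/scratch": "test"}),
    ("rm /tmp/scratch", {"/app": "dir"}),
    ("echo port=8080 > /app/app.conf",
     {"/app": "dir", "/app/app.conf": "port=8080"}),
]
FINAL_STATE = SESSION[-1][1]

KEPT = essential_commands(SESSION, FINAL_STATE)
DOCKERFILE = to_dockerfile("ubuntu:22.04", KEPT)
print(DOCKERFILE)
```

Here the scratch-file commands are dropped because nothing they changed remains in the final state. The repair step toward statistical best practices is not modeled in this sketch.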