  • Speaker: Raja Sambasivan (CMU)
  • Date: April 6th, 2012 (Friday)
  • Room: CIT 368
  • Title: "Diagnosing performance changes by comparing request flows"
  • Abstract:
Diagnosing performance problems in distributed systems is time-consuming and difficult. The root cause could lie in any one of the system's numerous components or subcomponents or, worse, could result from interactions among them. Clearly, new problem diagnosis techniques are needed. In this talk, I describe request-flow comparison, a new technique for automatically localizing the sources of performance changes in distributed systems and a logical first step toward the larger goal of completely automated diagnosis and healing. It uses the key insight that such changes often manifest as mutations in the path requests take through the distributed system (e.g., the components they visit and the functions they access) or in their timing. Exposing these mutations and showing how they differ from previous behaviour localizes the source of the problem and significantly guides developer effort. I also present case studies of using request-flow comparison to diagnose real, previously undiagnosed problems in a prototype distributed storage service and in certain Google services.