• Speaker: Tyson Condie (UCLA)
• Date: November 8, 2013 (Friday)
• Room: TBA
• Title: "Big Learning Systems"
• Abstract:
A new wave of systems is emerging in the space of Big Data Analytics, opening the door to programming models beyond Hadoop MapReduce (HMR). It is well understood that HMR is not ideal for applications in the domains of machine learning and graph processing. This realization is fueling a number of new Big Data system efforts: Berkeley Spark, Google Pregel, GraphLab (CMU), and Hyracks (UC Irvine), to name a few. Each of these adds unique capabilities, but together they form islands around key functionalities: fault-tolerance, resource allocation, and data caching. In this talk, I will provide an overview of Big Data systems, starting with Google's MapReduce, which defined the foundational architecture for processing large data sets. I will then identify a key limitation of this architecture, namely its inability to efficiently support iterative workflows, and describe real-world examples of systems that aim to fill this computational void. I will conclude with a description of my own work on a layering that unifies the key runtime functionalities (fault-tolerance, resource allocation, data caching, and more) for workflows, both iterative and acyclic, that process large data sets.

Tyson Condie is a principal scientist with the Cloud and Information Services Lab at Microsoft and an Assistant Professor at UCLA. He received his Ph.D. from Berkeley. His research focuses on data analytics, distributed systems, Internet-scale query processing and optimization, and declarative language design and implementation. His current work involves building a system software stack for large-scale data processing tasks on resource managers like Apache YARN, Berkeley Mesos, Google Omega, and Facebook Corona.