Seeing Things in the Clouds: Browsing Semi-Structured Data with Tag Clouds over Concept Lattices

Search is one of the most common operations on the internet, and one of the cornerstones of information retrieval. However, users must know exactly what they want and formulate explicit queries to search for it. In many applications, this is difficult because the underlying data archives are semi-structured, i.e., they consist of a mixture of formatted (e.g., author or date) and free-text (e.g., synopsis or description) fields, and the information remains implicit.

Software development archives such as revision control and issue tracking systems are one instance of this general problem. Users are interested in high-level questions (e.g., "How have the active developers changed over time?", "Which topics has this developer been working on?", or "Which methods are often changed together?") but the archives do not directly provide the answers. Dedicated mining tools can automate the search for specific aspects, but remain inflexible. We instead propose an interactive browsing approach to analyze such semi-structured data archives. The main novelty of this approach is the combination of an intuitive tag cloud interface with an underlying concept lattice that provides a formal structure for navigation. This lets users serendipitously explore the data without predefined goals and along different navigation paths, and thereby allows them to discover unexpected information.

We implemented our approach in the ConceptCloud browser. It uses tag clouds to provide a unified view of the selected items, but the browser interface also supports concurrent navigation in multiple interlinked tag clouds that can each be customized individually, which allows multifaceted archive explorations. We sketch the mathematical foundations and show how we used ConceptCloud in different domains, including software archives, citation data, and wine reviews.
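
As a rough illustration of the idea, and not ConceptCloud's actual implementation, the following Python sketch shows how selecting tags can narrow a set of objects and how a tag cloud can be derived from the remaining objects. The toy context, the tag names, and the functions `extent`, `intent`, and `tag_cloud` are all hypothetical; the two derivation operators follow the standard definitions from formal concept analysis.

```python
from collections import Counter

# Hypothetical toy context: each object (e.g., a commit) maps to its set of tags.
CONTEXT = {
    "c1": {"author:alice", "file:parser.c", "word:fix"},
    "c2": {"author:alice", "file:lexer.c", "word:fix"},
    "c3": {"author:bob", "file:parser.c", "word:refactor"},
}


def extent(tags):
    """Return the objects that carry every tag in `tags`."""
    return {obj for obj, obj_tags in CONTEXT.items() if set(tags) <= obj_tags}


def intent(objects):
    """Return the tags shared by every object in `objects`."""
    tag_sets = [CONTEXT[obj] for obj in objects]
    if not tag_sets:
        # By convention, the empty object set shares all tags.
        return set().union(*CONTEXT.values())
    return set.intersection(*tag_sets)


def tag_cloud(selected_tags):
    """Count how often each tag occurs among the objects matching the selection.

    The counts could be used to scale the font size of each tag in a cloud.
    """
    matching = extent(selected_tags)
    return Counter(tag for obj in matching for tag in CONTEXT[obj])


if __name__ == "__main__":
    selection = {"author:alice"}
    objects = extent(selection)
    # (objects, intent(objects)) is a formal concept of the toy context.
    print(sorted(objects), sorted(intent(objects)))
    print(tag_cloud(selection))
```

In this sketch, each selection corresponds to one node of the concept lattice (the pair of matching objects and their shared tags), and adding or removing a tag moves to a neighbouring node, which is the kind of navigation the abstract describes.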

Bio: Bernd Fischer is a professor in the Division of Computer Science at Stellenbosch University (South Africa) and the current Head of Division. His research area is automated software engineering, in particular logic-based (in the broadest sense) techniques. He has worked on a wide range of topics, such as specification-based component reuse, program synthesis from high-level models, and software model checking.

Fischer studied Computer Science at the TU Braunschweig (Germany) and obtained his PhD from the University of Passau (Germany) in 2001. From 1998 to 2006, he was a Research Scientist with USRA/RIACS at the NASA Ames Research Center (USA). From 2006 to 2013, he held a position as Senior Lecturer at the University of Southampton (UK), and in 2013, he moved to Stellenbosch University (South Africa), in an attempt to improve his access to both sunshine and wine after seven years in England.