On-demand curation (ODC) tools like Paygo, KATARA, and Mimir allow users to defer expensive curation effort until it is necessary. In contrast to classical databases that do not even permit potentially erroneous data to be queried, ODC systems instead answer with guesses or approximations. The quality and scope of these guesses may vary and it is critical that an ODC system be able to communicate this information to an end-user. The central contribution of this paper is a preliminary user study evaluating the cognitive burden and expressiveness of four representations of attribute-level uncertainty. The study shows (1) insignificant differences in time taken for users to interpret the four types of uncertainty tested, and (2) that different presentations of uncertainty change the way people interpret and react to data. Ultimately, we show that a set of UI design guidelines and best practices for conveying uncertainty will be necessary for ODC tools to be effective. This paper represents the first step towards establishing such guidelines.
This section consists of papers on uncertainty representation from the SIGCHI conference.
Sample-Oriented Task-Driven Visualizations: Allowing Users to Make Better, More Confident Decisions
We often use datasets that reflect samples, but many visualization tools treat data as full populations. Uncertain visualizations are good at representing data distributions emerging from samples, but are more limited in allowing users to carry out decision tasks. This is because tasks that are simple on a traditional chart (e.g. “compare two bars”) become a complex probabilistic task on a chart with uncertainty. We present guidelines for creating visual annotations for solving tasks with uncertainty, and an implementation that addresses five core tasks on a bar chart. A preliminary user study shows promising results: that users have a justified confidence in their answers with our system.
Visualization of Uncertainty and Reasoning
This article gathers and consolidates the issues involved in uncertainty relating to reasoning and analyzes how uncertainty visualizations can support cognitive and meta-cognitive processes. Uncertainty in data is paralleled by uncertainty in reasoning processes, and while uncertainty in data is starting to get some of the visualization research attention it deserves, the uncertainty in the reasoning process is thus far often overlooked. While concurring with the importance of incorporating data uncertainty visualizations, we suggest also developing closely integrated visualizations that provide support for uncertainty in reasoning.
Trust Me, I’m Partially Right: Incremental Visualization Lets Analysts Explore Large Datasets Faster
Queries over large-scale (petabyte) databases often mean waiting overnight for a result to come back. Scale costs time. Such time also means that potential avenues of exploration are ignored because the costs are perceived to be too high to run or even propose them. With sampleAction we have explored whether interaction techniques to present query results running over only incremental samples can be presented as sufficiently trustworthy for analysts both to make closer to real time decisions about their queries and to be more exploratory in their questions of the data. Our work with three teams of analysts suggests that we can indeed accelerate and open up the query process with such incremental visualizations.
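The idea of answering a query from incremental samples can be illustrated with a running estimate whose confidence interval narrows as more rows arrive. The sketch below is not the sampleAction implementation; the function name, the batch size, the synthetic data, and the normal-approximation interval (z = 1.96) are all assumptions made for illustration.

```python
import math


def incremental_estimates(values, batch_size, z=1.96):
    """Yield (rows_seen, running_mean, half_width) after each full batch.

    Uses Welford's online update so the stream is read only once.
    The half-width is a normal-approximation confidence interval,
    which is an assumption of this sketch.
    """
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for n, x in enumerate(values, start=1):
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1 and n % batch_size == 0:
            std_err = math.sqrt(m2 / (n - 1) / n)
            yield n, mean, z * std_err


# Synthetic stand-in for rows streaming back from a large query.
rows = [i % 10 for i in range(1000)]
estimates = list(incremental_estimates(rows, batch_size=100))
for rows_seen, mean, half_width in estimates:
    print(f"{rows_seen:5d} rows: {mean:.2f} +/- {half_width:.2f}")
```

An interface built on such estimates could redraw a bar and its error band after every batch, letting the analyst stop the query once the interval is narrow enough for the decision at hand.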
Enterprise Data Analysis and Visualization: An Interview Study
Organizations rely on data analysts to model customer engagement, streamline operations, improve production, inform business decisions, and combat fraud. Though numerous analysis and visualization tools have been built to improve the scale and efficiency at which analysts can work, there has been little research on how analysis takes place within the social and organizational context of companies. To better understand the enterprise analysts’ ecosystem, we conducted semi-structured interviews with 35 data analysts from 25 organizations across a variety of sectors, including healthcare, retail, marketing and finance. Based on our interview data, we characterize the process of industrial data analysis and document how organizational features of an enterprise impact it. We describe recurring pain points, outstanding challenges, and barriers to adoption for visual analytic tools. Finally, we discuss design implications and opportunities for visual analysis research.
Evaluating Sketchiness as a Visual Variable for the Depiction of Qualitative Uncertainty
We report on results of a series of user studies on the perception of four visual variables that are commonly used in the literature to depict uncertainty. To the best of our knowledge, we provide the first formal evaluation of the use of these variables to facilitate an easier reading of uncertainty in visualizations that rely on line graphical primitives. In addition to blur, dashing and grayscale, we investigate the use of ‘sketchiness’ as a visual variable because it conveys visual impreciseness that may be associated with data quality. Inspired by work in non-photorealistic rendering and by the features of hand-drawn lines, we generate line trajectories that resemble hand-drawn strokes of various levels of proficiency—ranging from child to adult strokes where the amount of perturbations in the line corresponds to the level of uncertainty in the data. Our results show that sketchiness is a viable alternative for the visualization of uncertainty in lines and is as intuitive as blur; although people subjectively prefer dashing style over blur, grayscale and sketchiness. We discuss advantages and limitations of each technique and conclude with design considerations on how to deploy these visual variables to effectively depict various levels of uncertainty for line marks.
A User Study to Compare Four Uncertainty Visualization Methods for 1D and 2D Datasets
Many techniques have been proposed to show uncertainty in data visualizations. However, very little is known about their effectiveness in conveying meaningful information. In this paper, we present a user study that evaluates the perception of uncertainty amongst four of the most commonly used techniques for visualizing uncertainty in one-dimensional and two-dimensional data. The techniques evaluated are traditional errorbars, scaled size of glyphs, color-mapping on glyphs, and color-mapping of uncertainty on the data surface. The study uses generated data that was designed to represent the systematic and random uncertainty components. Twenty-seven users performed two types of search tasks and two types of counting tasks on 1D and 2D datasets. The search tasks involved finding data points that were least or most uncertain. The counting tasks involved counting data features or uncertainty features. A 4×4 full-factorial ANOVA indicated a significant interaction between the techniques used and the type of tasks assigned for both datasets, indicating that differences in performance between the four techniques depended on the type of task performed. Several one-way ANOVAs were computed to explore the simple main effects. Bonferroni's correction was used to control for the family-wise error rate for alpha-inflation. Although we did not find a consistent order among the four techniques for all the tasks, there are several findings from the study that we think are useful for uncertainty visualization design. We found a significant difference in user performance between searching for locations of high and searching for locations of low uncertainty. Errorbars consistently underperformed throughout the experiment. Scaling the size of glyphs and color-mapping of the surface performed reasonably well. The efficiency of most of these techniques was highly dependent on the tasks performed. We believe that these findings can be used in future uncertainty visualization design.
In addition, the framework developed in this user study presents a structured approach to evaluate uncertainty visualization techniques, as well as provides a basis for future research in uncertainty visualization.
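The Bonferroni correction mentioned in this abstract divides the significance level by the number of tests in the family, so that the chance of any false positive across the family stays at or below the original level. A minimal sketch follows; the function name and the p-values are hypothetical and are not results from the study.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the per-test threshold and a significance flag per p-value."""
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one p-value")
    adjusted_alpha = alpha / m  # each test must clear alpha / m
    return adjusted_alpha, [p <= adjusted_alpha for p in p_values]


# Hypothetical p-values from four follow-up one-way ANOVAs.
example_p = [0.001, 0.020, 0.030, 0.200]
threshold, significant = bonferroni(example_p)
print(threshold, significant)
```

With four tests the per-test threshold drops from 0.05 to 0.0125, so only the first of these hypothetical p-values would still count as significant.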
Evaluating the Effects of Displaying Uncertainty in Context-Aware Applications
Many context aware systems assume that the context information they use is highly accurate. In reality, however, perfect and reliable context information is hard if not impossible to obtain. Several researchers have therefore argued that proper feedback such as monitor and control mechanisms have to be employed in order to make context aware systems applicable and usable in scenarios of realistic complexity. As of today, those feedback mechanisms are difficult to compare since they are too rarely evaluated. In this paper we propose and evaluate a simple but effective feedback mechanism for context aware systems. The idea is to explicitly display the uncertainty inherent in the context information and to leverage from the human ability to deal well with uncertain information. In order to evaluate the effectiveness of this feedback mechanism the paper describes two user studies which mimic a ubiquitous memory aid. By changing the quality, respectively the uncertainty of context recognition, the experiments show that human performance in a memory task is increased by explicitly displaying uncertainty information. Finally, we discuss implications of these experiments for today’s context-aware systems.
How Good is 85%? A Survey Tool to Connect Classifier Evaluation to Acceptability of Accuracy
Many HCI and ubiquitous computing systems are characterized by two important properties: their output is uncertain (it has an associated accuracy that researchers attempt to optimize) and this uncertainty is user-facing (it directly affects the quality of the user experience). Novel classifiers are typically evaluated using measures like the F1 score, but given an F-score of (e.g.) 0.85, how do we know whether this performance is good enough? Is this level of uncertainty actually tolerable to users of the intended application, and do people weight precision and recall equally? We set out to develop a survey instrument that can systematically answer such questions. We introduce a new measure, acceptability of accuracy, and show how to predict it based on measures of classifier accuracy. Our tool allows us to systematically select an objective function to optimize during classifier evaluation, but can also offer new insights into how to design feedback for user-facing classification systems (e.g., by combining a seemingly-low-performing classifier with appropriate feedback to make a highly usable system). It also reveals potential issues with the ubiquitous F1-measure as applied to user-facing systems.
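The question of whether people weight precision and recall equally can be made concrete with the standard F-beta score, of which F1 is the special case beta = 1. The sketch below is not the paper's acceptability-of-accuracy measure; it only shows, with made-up precision and recall values, how two classifiers with identical F1 can differ once recall is weighted more heavily.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily, beta < 1 weights precision
    more heavily, and beta == 1 gives the usual F1 score.
    """
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Two hypothetical classifiers with mirrored precision and recall.
precise_f1 = f_beta(0.95, 0.75)
thorough_f1 = f_beta(0.75, 0.95)
precise_f2 = f_beta(0.95, 0.75, beta=2.0)
thorough_f2 = f_beta(0.75, 0.95, beta=2.0)
print(precise_f1, thorough_f1, precise_f2, thorough_f2)
```

Both classifiers score the same F1, yet the high-recall one scores higher under beta = 2, which is one way an F1-only evaluation can hide a difference that users care about.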
Large databases with uncertain information are becoming more common in many applications including data integration, location tracking, and Web search. In these applications, ranking records with uncertain attributes needs to handle new problems that are fundamentally different from conventional ranking. Specifically, uncertainty in records’ scores induces a partial order over records, as opposed to the total order that is assumed in the conventional ranking settings. In this paper, we present a new probabilistic model, based on partial orders, to encapsulate the space of possible rankings originating from score uncertainty. Under this model, we formulate several ranking query types with different semantics. We describe and analyze a set of efficient query evaluation algorithms. We show that our techniques can be used to solve the problem of rank aggregation in partial orders. In addition, we design novel sampling techniques to compute approximate query answers. Our experimental evaluation uses both real and synthetic data. The experimental study demonstrates the efficiency and effectiveness of our techniques in different settings.
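To illustrate how score uncertainty induces a partial order, and how sampling can approximate a ranking query, the sketch below represents each record's score as an interval. It is not the paper's model or algorithm: the interval representation, the uniform distribution within each interval, the function names, and the example records are all assumptions made for illustration.

```python
import random


def dominates(a, b):
    """a certainly outranks b if a's lowest score is at least b's highest."""
    return a[0] >= b[1]


def partial_order(records):
    """Return the set of (x, y) pairs where x certainly outranks y."""
    return {
        (x, y)
        for x, a in records.items()
        for y, b in records.items()
        if x != y and dominates(a, b)
    }


def top1_probabilities(records, n_samples=10000, seed=0):
    """Monte Carlo estimate of each record's chance of ranking first.

    Assumes scores are independent and uniform within their intervals.
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in records}
    for _ in range(n_samples):
        drawn = {name: rng.uniform(lo, hi) for name, (lo, hi) in records.items()}
        wins[max(drawn, key=drawn.get)] += 1
    return {name: count / n_samples for name, count in wins.items()}


# Hypothetical records with (lowest, highest) possible scores.
records = {"r1": (6.0, 9.0), "r2": (5.0, 8.0), "r3": (1.0, 4.0)}
order = partial_order(records)
probs = top1_probabilities(records)
print(sorted(order))
print(probs)
```

Here r1 and r2 both certainly outrank r3, but their intervals overlap, so neither dominates the other and the order is only partial. Sampling then assigns each of them a probability of being the top record instead of a single fixed rank.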
Expressive Query Construction through Direct Manipulation of Nested Relational Results
Despite extensive research on visual query systems, the standard way to interact with relational databases remains to be through SQL queries and tailored form interfaces. We consider three requirements to be essential to a successful alternative: (1) query specification through direct manipulation of results, (2) the ability to view and modify any part of the current query without departing from the direct manipulation interface, and (3) SQL-like expressiveness. This paper presents the first visual query system to meet all three requirements in a single design. By directly manipulating nested relational results, and using spreadsheet idioms such as formulas and filters, the user can express a relationally complete set of query operators plus calculation, aggregation, outer joins, sorting, and nesting, while always remaining able to track and modify the state of the complete query. Our prototype gives the user an experience of responsive, incremental query building while pushing all actual query processing to the database layer. We evaluate our system with formative and controlled user studies on 28 spreadsheet users; the controlled study shows our system significantly outperforming Microsoft Access on the System Usability Scale.