Statistics for HCI. Alan Dix
is ‘really’ a 50:50 phenomenon.
formalist This is a pragmatic position: it doesn’t matter what probability ‘really’ is, so long as it satisfies the right mathematical rules. In particular, Bayesian statistics encodes beliefs as values between 0 and 1 that obey the rules of probability (such belief-based probability is sometimes called plausibility or reasonable expectation [14, 40, 41]).
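To make the formalist reading concrete, here is a minimal illustrative sketch in Python (the double-headed-coin scenario and all the numbers are invented for illustration; they are not from the book’s own examples). It treats a belief as a number between 0 and 1 and updates it with Bayes’ rule, so the belief keeps obeying the rules of probability:

```python
def update_belief(prior, likelihood_if_true, likelihood_if_false):
    """One step of Bayes' rule: returns the posterior belief,
    which remains a value between 0 and 1."""
    numerator = prior * likelihood_if_true
    evidence = numerator + (1 - prior) * likelihood_if_false
    return numerator / evidence

# Belief that a trick coin is double-headed, starting at 0.1.
belief = 0.1
for _ in range(5):  # observe five heads in a row
    # Heads is certain if double-headed, 50:50 if the coin is fair.
    belief = update_belief(belief, 1.0, 0.5)
    print(round(belief, 3))
```

Each head observed shifts the belief upward (after five heads it reaches 32/41 ≈ 0.78), yet it never leaves the 0–1 range: the formalist point is that the arithmetic, not the metaphysics, is what matters.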
Often frequentist is used to refer to more traditional forms of statistics such as hypothesis testing (see Chapter 6), in contrast to Bayesian statistics (see Chapter 7), because the latter usually adopts the formalist approach, treating probability as belief. However, this is a misnomer as one can have frequentist interpretations of Bayesian methods and one can certainly apply formalism to traditional statistics. Personally, I tend to use frequentist language to explain phenomena, and formalism to do actual calculations … but deep down I am an idealist!
We will explore and experiment further with randomness in the next chapter, but let us focus for the moment on the goal of working back from measurements to the real world. When the measurements include random effects, it is evident that answering questions about the real world requires a combination of probability and common sense—and that is precisely the job of statistics.
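As a small taste of that experimentation, here is an illustrative Python sketch (my own, not from the book) of the frequentist view of a 50:50 phenomenon: simulate repeated fair-coin tosses and watch the running proportion of heads settle towards 0.5 as the number of tosses grows:

```python
import random

random.seed(1)  # fixed seed so the run is reproducible

def running_proportion(n_tosses):
    """Toss a fair coin n_tosses times, returning the proportion
    of heads observed after each successive toss."""
    heads = 0
    proportions = []
    for i in range(1, n_tosses + 1):
        heads += random.random() < 0.5  # True counts as 1
        proportions.append(heads / i)
    return proportions

props = running_proportion(10_000)
# Early estimates wobble noticeably; later ones hug 0.5.
print(props[9], props[99], props[9999])
</mark>```

The early proportions can stray well away from a half, which is exactly the random effect that makes working back from a small sample of measurements to the real world so delicate.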
1.4 WHY ARE YOU DOING IT?
Are you doing empirical work because you are an academic addressing a research question, or a practitioner trying to design a better system? Is your work intended to test an existing hypothesis (validation) or to find out what you should be looking for (exploration)? Is it a one-off study, or part of a process (e.g., ‘5 users’ for iterative development)?
These seem like obvious questions, but, in the midst of performing and analysing your study, it is surprisingly easy to lose track of your initial reasons for doing it. Indeed, it is common to read a research paper where the authors have performed evaluations that are more appropriate for user interface development, reporting issues such as wording on menus rather than addressing the principles that prompted their study.
This is partly because there are similarities between academic research and UX practice: parallels both in the empirical methods used and in the stages each passes through. Furthermore, your goals may shift—you might be in the midst of work to verify a prior research hypothesis, and then notice an anomaly in the data, which suggests a new phenomenon to study or a potential idea for a product.
We’ll start out by looking at the processes of research and software development separately, and then explore the parallels. Being aware of the key stages of each helps you keep track of why you are doing a study and how you should approach your work. In each we find stages where different techniques are more or less appropriate: some need no statistics at all, and instead qualitative methods such as ethnography are best; for some a ‘gut’ feeling for the numbers is sufficient, but no more; and some require formal statistical analysis.
Figure 1.4: Research—different goals for empirical studies.
1.4.1 EMPIRICAL RESEARCH
There are three main uses of empirical work during research, which often relate to the stages or goals of a research project (Fig. 1.4).
exploration This is principally about identifying the questions you want to ask. Techniques for exploration are often open-ended. They may be qualitative: ethnography, in-depth interviews, or detailed observation of behaviour, whether in the lab or in the wild. However, this is also a stage that might involve (relatively) big data, for example, if you have deployed software with logging, or have conducted a large-scale, but open-ended, survey. Data analysis may then be used to uncover patterns, which may suggest research questions. Note that you may not need this as a stage of research if you began with an existing hypothesis, perhaps from previous phases of your own research, from questions arising in other published work, or from your own experience.
validation This is predominantly about answering questions or verifying hypotheses. It is often the stage that involves the most quantitative work, including experiments and large-scale surveys. It is also the stage whose results are most often published, especially statistical results, but that does not mean it is the most important. In order to validate, you must first establish what you want to study (exploration) and what it means (explanation).
explanation While the validation phase confirms that an observation is true, or a behaviour is prevalent, this stage is about working out why it is true, and how it happens in detail. Work at this stage often returns to more qualitative or observational methods, but with a tighter focus. However, it may also be more theory based, using existing models, or developing new ones in order to explain a phenomenon. Crucially it is about establishing mechanism, uncovering detailed step-by-step behaviours … a topic we shall return to later.
Figure 1.5: Iterative development process.
Of course these stages may often overlap, and data gathered for one purpose may turn out to be useful for another. For example, work intended for validation or explanation may reveal anomalous behaviours that lead to fresh questions and new hypotheses. However, it is important to know which goal you were intending to address, and, if you change, how and why you are looking at the data differently … and whether this matters.
1.4.2 SOFTWARE DEVELOPMENT
Figure 1.5 shows a typical iterative software development or user experience design cycle. Initial design activity leads to the making of some sort of demonstrable artefact. In the early stages this might be storyboards, or sketches, later wireframes or hi-res prototypes, or in the case of agile development an actual running system. This is then subjected to some form of testing or evaluation.
During this process we are used to two different kinds of evaluation point.
formative evaluation This is about making the system better. It is performed on the design artefacts (sketch, prototype, or experimental system) during the cycles of design–build–test. The form of this varies from expert evaluation to a large-scale user test. The primary purpose of formative evaluation is to uncover usability or experience problems for the next cycle.
summative evaluation This is about checking that the system works and is good enough. It is performed at the end of the software development process on a pre-release product. It may be related to contractual obligations: “95% of users will be able to use the product for purpose X after 20 minutes’ training;” or may be comparative: “the new software outperforms competitor Y on both performance and user satisfaction.” In less formal situations, it may simply be an assessment that enough work has been done based on the cumulative evidence from the formative stages.
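A contractual criterion like the 95% one above is ultimately a statistical claim, so checking it against a finite user test involves exactly the kind of calculation covered later in the book. As a hedged sketch (the figures are invented, and only the Python standard library is used), here is how one might ask: if 17 of 20 tested users succeeded, how surprising is that when the true success rate really is 95%?

```python
from math import comb

def binom_tail(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p): the chance of seeing
    k or fewer successes if the true success rate is p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# If 95% of users 'really' succeed, how likely is a test where
# at most 17 of 20 sampled users manage the task?
print(binom_tail(17, 20, 0.95))
```

The answer is roughly 7.5%: uncommon, but not so rare as to settle the contractual question on its own—a reminder that small summative tests leave substantial room for chance.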
Figure 1.6: Parallels between academic research and iterative development.
Figure 1.7: Parallel: exploration—formative evaluation.
In web applications, the boundaries can become a little less clear as changes and testing may happen on the live system as part of perpetual-beta releases or A–B testing.
1.4.3 PARALLELS
Although research and software development have different overall goals, we can see some obvious parallels between the two (Fig. 1.6). There are clear links between explorative research and formative evaluations, and between validation and summative evaluations. However, it is perhaps less immediately clear how explanatory research connects with