Business Experiments with R. B. D. McCullough

but instead were balanced, were developed by William Gosset, writing under the pseudonym “Student” (who also invented Student's t‐distribution), in the early twentieth century. Fisher later popularized randomized experiments in the 1920s. Much of the early work in the design of experiments focused on agriculture, and it was extremely fruitful. Moses and Mosteller (1997, p. 217) noted,

      The development of greatly increased agricultural productivity in the twentieth century has rested largely on field experiments in which new varieties of crops (and new agricultural practices) are compared to standard ones. So important is this empirical testing to agricultural progress that a large part of modern statistical design of experiments actually grew up in the context of agricultural experimentation.

[Figure: US agricultural production (output and input) in the twentieth century, 1910 to 2000.]

      The next field to be revolutionized by the design of experiments was manufacturing. Chemical production greatly increased as a result of the design of experiments and this spread to other process industries. Variability in successive batches of output was decreased, allowing for a more uniform, higher quality product and a concomitant decrease in waste. In the 1950s, the statistical pioneer W. Edwards Deming taught statistical methods to Japanese manufacturers at a time when “made in Japan” was synonymous with “low quality.” Deming taught them to greatly increase quality and output, especially in the automotive and electronics industries.

[Figure: Growth of American imports of Japanese cars from the 1960s through the 1980s.]

      1.7.1 Define the following terms:
            • independent variable / dependent variable
            • predictor / response
            • sample / population
            • treatment / response
            • random sample
            • observational data / experimental data

      1.7.2 What are the four key steps in any experiment?

      1.7.3 Ice cream sales are positively correlated with shark attacks in the Eastern United States. What is the lurking variable?

      1.7.4 What is necessary to show causation?

      In this chapter we have twice quoted Stefan Thomke's book, Experimentation Works: The Surprising Power of Business Experiments, and we highly recommend it as an introduction to how experimental methods are used in business. It describes a great many business experiments and explains how to build an experimentation culture in a business. Thomke, a professor at Harvard Business School, has been conducting research in this area for decades. His previous book, published in 2003, was entitled Experimentation Matters.

      Section 1.1 “Life Expectancy and Newspapers”

      • The life expectancy example is based on Zaman (2010). The associated data are from the World Bank Indicator Tables for the year 2010, including only observations that have no missing values.

      • The smoother used by R in Figure 1.1 is called “lowess,” which stands for LOcally WEighted Scatterplot Smoother. It's used to detect nonlinearities in a scatterplot.

      • Looking only at correlations (or lack thereof) can make it impossible to uncover causal relations. Suppose a car travels at a constant speed over hilly roads. The driver will have to accelerate on the inclines and brake on the declines to maintain a constant speed. A person who knows nothing of automobiles might observe these data and conclude that depressing the accelerator or the brake has nothing to do with what speed the car travels.
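
      The car example can be made concrete with a small simulation. The Python sketch below is illustrative only. The linear speed model, its coefficients (a base speed of 60, a pedal effect of 2, a slope effect of 2), the sample size, and the helper names speed and pearson are all assumptions introduced here, not taken from the book. In the "observational" run the driver sets the pedal to cancel the hill exactly, so pedal and speed appear unrelated. In the "experimental" run the pedal is set at random, independently of the hill, and its effect on speed becomes visible.

```python
import random
from math import sqrt


def pearson(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def speed(pedal, slope, noise):
    """Hypothetical model: pedal > 0 is the accelerator, pedal < 0 is the brake.

    Pressing the pedal raises speed; an uphill slope (slope > 0) lowers it.
    The coefficients are made up for illustration.
    """
    return 60.0 + 2.0 * pedal - 2.0 * slope + noise


rng = random.Random(1)  # fixed seed so the run is reproducible
n = 5000
slopes = [rng.uniform(-5, 5) for _ in range(n)]  # hilly road
noises = [rng.gauss(0, 1) for _ in range(n)]     # small disturbances

# Observational data: the driver chooses the pedal to offset each hill,
# so the pedal is perfectly tied to the slope and speed stays near 60.
obs_pedal = list(slopes)
obs_speed = [speed(p, s, e) for p, s, e in zip(obs_pedal, slopes, noises)]

# Experimental data: the pedal is assigned at random, ignoring the hill.
exp_pedal = [rng.uniform(-5, 5) for _ in range(n)]
exp_speed = [speed(p, s, e) for p, s, e in zip(exp_pedal, slopes, noises)]

r_obs = pearson(obs_pedal, obs_speed)
r_exp = pearson(exp_pedal, exp_speed)

print(f"observational correlation(pedal, speed): {r_obs:.3f}")
print(f"experimental  correlation(pedal, speed): {r_exp:.3f}")
```

      In this sketch the observational correlation should be close to zero even though the pedal has a built‐in effect of 2 on speed, while the randomized run recovers a clearly positive correlation. The pedal's effect is hidden in the first case only because the driver's choices are tied to the slope.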

      Section 1.2 “Case: Credit Card Defaults”

      • The credit card data set is the “default of credit card clients Data Set” from https://archive.ics.uci.edu/ml/index.html. For the education variable, values 4, 5, and 6 were recoded as “other”; values 0 and 4 of the marriage variable were treated the same way.

      • It is not a good idea to run a linear regression with the variable default on the left‐hand side because default is binary (takes on only the values zero and one) and linear regression is for continuous dependent variables. There is a special method for binary dependent variables called “logistic regression,” but that's something for an advanced statistics course.

      • The idea of the garden of forking paths is discussed clearly and nontechnically in Gelman and Loken (2014), an article that was included in The Best Writing on Mathematics 2015. Although nontechnical, it's an excellent read for the statistically inclined person, too.

      • In general, lurking variables affect observational data and confounding variables affect designed experiments. A lurking variable connects two otherwise unconnected variables, creating the appearance of a causal relation between

