Business Experiments with R. B. D. McCullough

analyze your own business experiments. I hope you enjoy learning from Bruce as much as I did.

      Elea McDonnell Feit

      Philadelphia, PA

      This book is accompanied by a companion website:

       www.wiley.com/go/mccullough/businessexperimentswithr

      The website includes datasets.

      After reading this chapter, students should:

       Distinguish between observational and experimental data.

       Understand that observational data analysis identifies correlation, but cannot prove causality.

       Know why it is difficult to establish causality with observational data.

       Understand that an experiment is a systematic effort to collect exactly the data you need to inform a decision.

       Explain the four key steps in any experiment.

       State the “Big Three” criteria for causality.

       Identify the conditions that make experiments feasible and cost effective.

       Give examples of how experiments can be used to inform specific business decisions.

       Understand the difference between a tactical experiment designed to inform a business decision and an experiment designed to test a scientific theory.

Scatter plots depicting the average life expectancy for several countries versus the number of newspapers per 1000 persons in each country.

      Software Details

      Reproduce the above graphs using the data file WorldBankData.csv.

      Below is code for the first graph. To create the next graph, you will have to create a new variable, the natural logarithm of newspapersper1000.

      df <- read.csv("WorldBankData.csv")  # "df" is the data frame.

      # First graph: life expectancy versus newspapers per 1000.
      plot(df$newspapersper1000, df$lifeexp, xlab="Newspapers per1000",
           ylab="Life Expectancy", pch=19, cex.axis=1.5, cex.lab=1.15)
      abline(lm(lifeexp ~ newspapersper1000, data=df), lty=2)  # dashed least-squares line
      lines(lowess(df$newspapersper1000, df$lifeexp))          # solid lowess smooth

      # Second graph: life expectancy versus log(newspapers per 1000).
      plot(log(df$newspapersper1000), df$lifeexp, xlab="log(Newspapers per1000)",
           ylab="Life Expectancy", pch=19, cex.axis=1.5, cex.lab=1.15)
      abline(lm(lifeexp ~ log(newspapersper1000), data=df), lty=2)
      lines(lowess(log(df$newspapersper1000), df$lifeexp))

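      (1.1) [Equation not reproduced: estimated regression of life expectancy on LN, the natural logarithm of newspapers per 1000 people]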

      where standard errors are in parentheses, so both coefficients have very high t-statistics and are statistically significant. This means that there is a relationship between life expectancy and the number of newspapers per 1000 people. But does this show that a country having more newspapers leads to longer lives for its citizens? Common sense says probably not. The natural logarithm of the number of newspapers (LN) is probably a proxy for other variables that drive life expectancy; countries that can afford newspapers can probably also afford better food, housing, and medical services. What we are observing is most likely a mere correlation, and this sort of observational analysis should not be interpreted as causal.
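      The proxy argument can be made concrete with a small simulation. The sketch below uses made-up data, not the World Bank file: a hypothetical variable wealth drives both newspaper circulation and life expectancy, while newspapers have no effect of their own. All variable names and numeric values here are assumptions chosen purely for illustration.

```r
# Synthetic illustration only: every number below is invented.
set.seed(1)
n <- 200
wealth   <- rnorm(n)                            # hypothetical common driver
log_news <- 0.8 * wealth + rnorm(n, sd = 0.5)   # newspapers depend on wealth
lifeexp  <- 70 + 5 * wealth + rnorm(n, sd = 2)  # life expectancy depends on wealth only

# Regression that omits the common driver: log_news acts as a proxy for wealth.
naive <- lm(lifeexp ~ log_news)

# Regression that includes the common driver alongside log_news.
adjusted <- lm(lifeexp ~ log_news + wealth)

# Coefficient tables: estimate, standard error, t value, p-value.
print(coef(summary(naive)))
print(coef(summary(adjusted)))
```

      In the first table the slope on log_news should be large and significant even though newspapers do nothing in this simulated world. In the second, where the true driver is observed and included, the slope on log_news should fall close to zero. Real data are rarely this tidy, because the true drivers are often unmeasured or measured poorly.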

      Try it!

      Run the above simple regression. You should get the same coefficients and standard errors.
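      One way to run this regression is sketched below. It assumes WorldBankData.csv is in the working directory and reuses the column names lifeexp and newspapersper1000 from the plotting code; the helper name fit_simple is hypothetical, introduced here only for convenience.

```r
# Regress life expectancy on the natural log of newspapers per 1000.
# fit_simple is a hypothetical helper name, not something from the book.
fit_simple <- function(df) {
  lm(lifeexp ~ log(newspapersper1000), data = df)
}

# Only runs when the data file is actually present in the working directory.
if (file.exists("WorldBankData.csv")) {
  df <- read.csv("WorldBankData.csv")
  # Coefficient table: estimates, standard errors, t values, p-values.
  print(coef(summary(fit_simple(df))))
}
```

      The "Std. Error" column of the printed table holds the values that appear in parentheses under the coefficients in the text.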

      A better analysis would add more variables to the regression to “control” for other factors. So, let's try adding other variables that we expect to drive life expectancy: LHB (natural logarithm of the number of hospital beds per 1000 people in the country), LP (natural logarithm of the number of physicians per 1000 people in the country), IS (an index of improvements in sanitation), and IW (an index of improvements in water supply). Since we don't believe that newspapers cause longer life expectancy, we would expect that once we include these variables in the regression, the coefficient on LN will shrink toward zero. The results are

      (1.2) [Equation not reproduced: estimated regression of life expectancy on LN, LHB, LP, IS, and IW]

      The coefficient on LN has not gone to zero; in fact, it hasn't changed much. The coefficients on all but one of the other variables that we know affect life expectancy are insignificant. What are we to make of this?

      Try it!

      Run the above multiple regression. You should get the same coefficients and standard errors. Be sure you understand why the variables LN and LHB are “significant” while the others are not.
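      A minimal sketch of the multiple regression follows. It assumes the data frame already has columns named LN, LHB, LP, IS, and IW, matching the variable names in the text. The text confirms only lifeexp and newspapersper1000 as column names in WorldBankData.csv, so the other columns may need to be created or renamed first; fit_multiple is a hypothetical helper name.

```r
# ASSUMPTION: df contains columns lifeexp, LN, LHB, LP, IS, and IW.
# Only lifeexp and newspapersper1000 are confirmed column names in the
# data file; the rest must be built or renamed to match before calling this.
fit_multiple <- function(df) {
  lm(lifeexp ~ LN + LHB + LP + IS + IW, data = df)
}

# Hypothetical usage once those columns exist:
#   df <- read.csv("WorldBankData.csv")
#   df$LN <- log(df$newspapersper1000)
#   print(coef(summary(fit_multiple(df))))
```

      Comparing the LN row of this coefficient table with the one from the simple regression shows how much the coefficient moves once the other variables are included.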

