Business Experiments with R. B. D. McCullough

Читать онлайн книгу.

Business Experiments with R - B. D. McCullough


Скачать книгу
soon as you get the results, and so these types of tests have a major impact on how websites are managed.

Test A treatment B treatment Response measures Result
Email sign‐up No incentive $10 incentive # of sign‐ups 300% lift
Skirt images Head‐to‐toe Cropped Skirt sales ($/session) 7% lift
Location search Zip search GPS search Sign‐ups 40% lift
Rentals 23% lift
Video icon No icon Icon % to product detail No significant
Sales ($/session) Difference

      Note: Winning treatment shown in boldface.

      Table 1.6 reports the test results in terms of a percentage lift. For example, in the Dell Landing Page test, the lift was 36%, which means that the hero image produced 36% more submissions. Table 1.6 only reports the lift numbers for test results that were found to be significant. Significance tests are used to determine whether there is enough data to say that there really is a difference between the two treatments. Imagine, for example, that we had test data on only five users: two who saw version A and looked at product details and three who saw version B and did not look at product details. Is this enough data to say that A is better than B? Your intuition probably tells you that it isn't, which is true, but when samples are a bit bigger, we can't rely on intuition to determine whether there is enough data to draw a conclusion. Testing for significance is one of the tools we use in analyzing A/B tests, and Chapter 2 will show you how to do it. As we will explain in the next few sections, we need more than just the lift numbers to perform the significance test.

      Most website testing managers will tell you that more than half of the website tests that they run are not significant, meaning that they cannot conclude that one version is better than the other. For example, in the video icon test in Figure 1.7, there were no significant differences in the % of users who viewed the product detail pages or the average sales per session. If we looked at the raw data, there were probably some small differences, but that difference was not great enough to rise to the level of significance. The analyst has wisely chosen not to report the lift numbers, and instead simply said, “there was no significant difference.” While the manager who came up with this video icon idea might not be too happy to find that it doesn't work, it is important to know that it doesn't work so that attention can be shifted to more promising improvements to the website. Smart testing managers realize that it is important to run many tests to find the features of the website that really do change user behavior.

      Exercises

      1 1.5.1 Visit a retail website and identify five opportunities for A/B tests on the website. For each test, clearly define the A and B treatments that you would test and identify a response variable to measure performance.

      2 1.5.2 Find an article that reports the results of a medical experiment, a business experiment, or a psychological experiment. How are the results reported? Do they use a graph to display the data? Does the article indicate whether the difference between treatments was significant?

      3 1.5.3s. Visit a retail store and identify five opportunities for A/B tests. For each test, clearly define the A and B treatments that you would test and identify a response variable to measure performance.

      Experiments are as old as the bible. From The Book of Daniel (1, 11‐16),

      Daniel then said to the guard whom the chief official had appointed over Daniel, Hananiah, Mishael, and Azariah, ”Please test your servants for ten days: Give us nothing but vegetables to eat and water to drink. Then compare our appearance with that of the young men who eat the royal food, and treat your servants in accordance with what you see.” So he agreed to this and tested them for ten days. At the end of the ten days they looked healthier and better nourished than any of the young men who ate the royal food. So the guard took away their choice food and the wine they were to drink and gave them vegetables instead.

      The first clinical trial was conducted in 1747 by the Scottish physician James Lind, who was trying to find a cure for scurvy. Scurvy was a serious problem, since it killed more British sailors than the French and the Spanish combined. After two months at sea, when the men were afflicted with scurvy, Lind divided 12 sick sailors into six groups of 2. Each day the groups were administered cider, 25 drops of sulfuric acid, vinegar, a cup of seawater, and barley water, and the final group received two oranges and one lemon. After six days the fruit ran out, but one sailor was completely recovered and the other was almost recovered.

      Randomization was introduced into experimental design in the nineteenth century by Peirce and Jastrow (1885) (many people incorrectly attribute this to R. A. Fisher in the twentieth


Скачать книгу