A Primer on A/B Testing (Yummy Candy!)
I think I know how it feels to be a nagging dentist. I spend lots of time helping startup founders figure out how to increase the number of people using their product. Sometimes, founders think that because a few silly folks labelled me with the (soon-to-be-cliched) title of growth hacker, I am “magical” like an Apple product. With one quick suggestion from me, they can get to a million users! Unfortunately, it doesn’t usually work that way. Instead, I tell them, they need to (among other things) rigorously A/B test a dozen interface changes on their three or four most important pages.
And then I get that “what-do-you-mean-I-need-to-floss-every-single-night?” kind of look. I’ve run hundreds of A/B tests over the years, and in the process I’ve learned a lot about what messages people respond to. After seeing the results of those tests, I present a shocking hypothesis: “you should try this yummy candy!” will be more effective than “you really need to start flossing every day.” So… I need to tell you why A/B testing is like yummy candy. Fortunately, I can make that argument without being misleading: running A/B tests can be really fun and addictive (like Skittles!). You’ve probably experienced an eager expectation that something new would immediately improve your world in a significant way.
Maybe as part of a website — a new, beautiful signup flow will mean a super engaged user base — or, in your personal life: a new hairstyle will encourage people to respond to you in a better way. A/B testing can provide that spark of hope on a very frequent basis: at Circle of Moms we’d have dozens of tests running at any given time, each serving as a quantitatively sound way to understand our usage and improve our product. Pushing out new tests multiple times a week, getting rapid feedback on each, is like regularly handing out chocolate to your team. Each test is a yummy morsel of hope: it has the potential to bring users in, excite and engage existing users, and make money.
Frequent testing is like frequent chocolate consumption. Yum! Frequent chocolate consumption has risks, and so does frequent A/B testing. With A/B testing, it’s important to be holistic and patient about collecting data. But a product development strategy involving A/B testing is generally both more fun and more effective than the alternative “change and pray” approach.

Now that we’ve established that A/B testing is fun, we get to the real questions. Why does it actually matter to your business? What should you be testing? When does it make sense to do it? (Brief answer: not always.) And how, technically, should you do it? Let’s tackle each of those.
WHY

The reason to A/B test is simple: newer doesn’t always mean better, and everyone I’ve met is mediocre at predicting how effective a new experience will be. There’s often an implicit assumption that ______ in my product isn’t very good, and by spending time on it, we can only make it better. In extreme cases (say, the current version is a 404 page-not-found error), that’s very likely to be true. But in more common cases (say, the signup flow is a little bit ugly and awkward), product changes don’t always mean progress.

We saw this time and again at Circle of Moms. We had a new homepage that looked cleaner and more usable… and users who saw it stopped contributing to conversations.
We had a signup flow that seemed much simpler and more professional… but fewer people got through it, and those who did didn’t invite their friends to join our site. Surely asking people to share their answers on Facebook would be good, right? Turns out no: very few moms actually shared their activity, while many others were scared off by the thought of us making content too public (this only applied to some content types).

Okay, you say, that’s all well and good, but how about just making a change and seeing how it affects overall metrics for the product? There is a case where this is a good approach, and I’ll walk through it in the “When” section.
But most of the time, it’s the wrong way to go. To work, serial “testing” requires three things: the rest of the world staying steady, large changes, and a close eye on metrics. Let’s say you’re looking at how a new homepage design affects activity, and all of a sudden your sending email IP is blacklisted by Yahoo. Your numbers will almost certainly go down, regardless of the effectiveness of your new homepage. New signup flow, and all of a sudden you get a surge of search traffic that broadens your audience but decreases the quality? Same type of issue. Major site downtime or technical issues can have the same impact.
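To make the blacklisting scenario concrete, here is a minimal sketch with made-up numbers. The baseline rate, the size of the homepage's true effect, and the size of the email hit are all hypothetical, as is the assumption that the two effects simply multiply. It compares a serial before/after measurement with a concurrent A/B split in which both groups are hit by the same outside event.

```python
def observed_rate(base_rate, design_lift, external_factor):
    """Activity rate a cohort shows, given the design's true relative lift
    and a multiplier for whatever the outside world is doing."""
    return base_rate * (1 + design_lift) * external_factor


# All three numbers are hypothetical, chosen only for illustration.
BASE_RATE = 0.10         # activity rate on the old homepage in a normal week
DESIGN_LIFT = 0.10       # the new homepage is truly 10% better
BLACKLIST_FACTOR = 0.70  # the email blacklisting cuts activity by 30%

# Serial "test": old homepage measured before the blacklisting, new one after.
serial_before = observed_rate(BASE_RATE, 0.0, 1.0)
serial_after = observed_rate(BASE_RATE, DESIGN_LIFT, BLACKLIST_FACTOR)
serial_change = serial_after / serial_before - 1

# Concurrent A/B test: both versions run during the blacklisting.
control = observed_rate(BASE_RATE, 0.0, BLACKLIST_FACTOR)
variant = observed_rate(BASE_RATE, DESIGN_LIFT, BLACKLIST_FACTOR)
ab_change = variant / control - 1

print(f"serial before/after: {serial_change:+.0%}")  # -23%
print(f"concurrent A/B:      {ab_change:+.0%}")      # +10%
```

Under these assumptions the serial comparison makes a better homepage look like a 23% drop, while the concurrent split still recovers the 10% lift because the outside shock hits both groups equally.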
If you have a huge increase or decrease, and you know that the outside world is more or less the same over the test period, and you measure different cohorts properly, and of course you only measure one thing at a time… serial testing can work. If you really think all of those can happen consistently, you’re a lot more optimistic than I am.

WHAT

There are two reasons to A/B test something:

1) You have a product enhancement that might improve your metrics at a level material to your business, and want to try it.
2) You have a radically revamped piece of your product, and want to verify that it’s at least as effective as the current version.
Generally, #1 is about iteration and optimization, while #2 is about design and vision. The thought processes for the two are very different. Optimization is only useful on products close enough to “good” to be optimized. Overused but apropos cliche: A/B testing something that’s badly broken is akin to rearranging the deck chairs on the Titanic. Here are a couple of cases where you may or may not want to use optimization:

Viral signup flows. If your current signup flow features 1000 signups inviting 3000 people, 900 of whom register for your product, you’re very close to being viral (K=0.9).
A/B testing would be a good use of time. If your current flow features 1000 signups inviting 600 people, 80 of whom join (K=0.08), then you aren’t in the ballpark: optimizing button text is likely a waste of time. Go bigger.

Email content. Subject lines and link text can have a huge impact on email clickthrough rates. One typical example: an email with the subject “5 Embarrassing Kid Moments” gets 2.5 times as many clicks as one with the subject “The Craziest Thing My Child Has Done.” But again, being close to “good” is key: if that 2.5x is the difference between 50 clicks a week and 125 clicks a week, does it matter? If it doesn’t matter (and good estimation is key), there’s no point spending time A/B testing it.
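The K values in the viral signup examples above can be reproduced with a few lines of arithmetic. This is a rough sketch: the function names are made up for illustration, and the "eventual signups" figure assumes a simplified model in which K stays constant and every generation of new users invites at the same rate, which real products rarely manage.

```python
def k_factor(signups, invites_sent, invitees_registered):
    """New registrations generated per signup:
    (invites per signup) * (share of invitees who register)."""
    invites_per_signup = invites_sent / signups
    invite_conversion = invitees_registered / invites_sent
    return invites_per_signup * invite_conversion


def eventual_signups(seed_signups, k):
    """Total signups a seed group eventually produces when k < 1, assuming
    k never changes: the geometric series seed * (1 + k + k^2 + ...)."""
    if k >= 1:
        raise ValueError("with k >= 1 the series does not converge")
    return seed_signups / (1 - k)


near_viral = k_factor(1000, 3000, 900)
far_from_viral = k_factor(1000, 600, 80)

print(f"K = {near_viral:.2f}")      # K = 0.90
print(f"K = {far_from_viral:.2f}")  # K = 0.08

print(f"{eventual_signups(1000, near_viral):.0f}")      # 10000
print(f"{eventual_signups(1000, far_from_viral):.0f}")  # 1087
```

Under that simplified model, a flow at K=0.9 turns 1000 seed signups into roughly 10,000, so a small optimization win is heavily amplified. At K=0.08 the same seed yields fewer than 1,100, and tweaking button text barely moves the total.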