Purchase funnel. Much of Team Rankings’ revenue comes from subscriptions, and some purchase funnels can be much more effective than others. Last year, for instance, our March Madness product, BracketBrains, generated 30% more sales when we prompted people to “Get 2011 Picks” than to “Get BracketBrains”. The same caveats apply here, though: test if and only if the differences are likely to matter for your business. The thought process for a new design is very different. At Circle of Moms, I’d explain to my team that we tested a new home page not because we wanted it to improve our metrics but because we didn’t want to tank them.
I wrote an entire post on why A/B testing vs. holistic design is a false dichotomy; the TL;DR is “test entire designs, see how each does on a variety of metrics, then make an informed judgment call on how to go forward.”

WHEN

If you thought flossing was exciting, wait until he starts talking about statistical significance! The most important type of significance in assessing when to A/B test isn’t statistical significance; it’s business significance. Can a new version of this page make a real difference to our business? Size of user base, development team, and revenues inform what’s useful for different companies.
Facebook can move the needle with hundreds of 0.1% improvements. A small startup with no revenue and 100 users doesn’t care about 0.1%, nor will they be able to detect it. In a small product with no usage, serial testing is fine: there’s no chance that the business will be built around the existing product, so rapid change is more important than scientific understanding. Statistical significance is the second most important variable in assessing when to A/B test. Making a decision between two options without enough data can undermine the entire point of A/B testing.
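To see why a 0.1% lift is undetectable at small scale, here is a back-of-the-envelope sample-size calculation. This is a rough normal-approximation sketch, not a rigorous power analysis; the function name and the default z-values (about 95% confidence, 80% power) are my own choices, not anything from the post:

```python
from math import sqrt

def sample_size_per_arm(p_base, p_new, z_alpha=1.96, z_beta=0.84):
    """Rough users-per-arm needed to detect a lift from p_base to p_new
    at ~95% confidence and ~80% power (normal approximation)."""
    p_bar = (p_base + p_new) / 2
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p_base * (1 - p_base) + p_new * (1 - p_new))) ** 2
    return n / (p_new - p_base) ** 2

# A 0.1-point lift on a 10% baseline needs over a million users per arm;
# a 100-user startup has no hope of detecting it.
print(round(sample_size_per_arm(0.10, 0.101)))
```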
Doing statistical significance properly can be difficult, but the 80-20 solution is pretty simple. Just use an online split test calculator: plug in estimates of your traffic and conversion rates to see whether you are likely to attain statistical significance for a single output variable. A big caveat: a little common sense regarding statistical significance can go a long way. If you’ve been running a test for a while and don’t have a clear winner, but you have other ideas that might move the needle a lot more, you might be well-served by resolving the test now and trying something new.
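If you’d rather not lean on an online calculator, the math behind one is small enough to sketch yourself. This is a standard pooled two-proportion z-test; the function name and interface are hypothetical:

```python
from math import erf, sqrt

def split_test_p_value(conversions_a, visitors_a, conversions_b, visitors_b):
    """Two-sided p-value for the difference between two conversion rates,
    using a pooled two-proportion z-test (normal approximation)."""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b))
    z = (p_b - p_a) / se
    normal_cdf = lambda x: 0.5 * (1 + erf(x / sqrt(2)))
    return 2 * (1 - normal_cdf(abs(z)))

# 10% vs. 13% conversion on 1,000 visitors each: p is roughly 0.036,
# comfortably under the conventional 0.05 threshold.
print(split_test_p_value(100, 1000, 130, 1000))
```

The usual caveat applies: peeking at this number repeatedly and stopping the moment it dips below 0.05 inflates your false-positive rate.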
You’re running a company, not trying to publish in an academic journal. At Team Rankings, we regularly do this with BracketBrains. Most of our sales happen over a four-day period, so we have a limited window in which to test things. The “cost” of resolving slightly sub-optimally (say, choosing the 5% option rather than the 5.1% option) is likely to be lower than the opportunity cost of not running an additional test. And since resolving a test after 99% of sales have occurred does us no good, we’re more aggressive than traditional statistical tests would dictate.
If, on the other hand, you’re early in a product’s lifetime, conservative decision-making might be more appropriate.

HOW

There’s one way to A/B test properly: build your own system. Lots of people probably don’t want to hear that, but products like Optimizely are too simplistic and optimization-focused to be broadly useful. Outsourcing your A/B testing is like outsourcing your relationship with your users: you need to understand how people are using your product, and the A/B testing services currently available don’t cut it. I wish I could recommend an open source A/B testing framework to avoid re-inventing the wheel; ping me if you know of a good one or are creating one (if so, I’d be happy to help).
A/Bingo is the closest. The good news is that it’s pretty simple to get your own very basic A/B testing system up and running, and it’s easy to build up functionality over time. Here’s the bare minimum:
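As one illustration of how little the bare minimum requires, here is how the assignment piece might look. This is a hypothetical sketch, not the system described in the post; it hashes user and test name so a user always sees the same variant without a database lookup at decision time:

```python
import hashlib

def assign_variant(user_id, test_name, variants):
    """Deterministically map a user to a variant. Hashing user + test
    means the same user always gets the same option with no storage
    lookup at decision time; you still record the assignment once
    (e.g. in a USER_AB_TEST_OPTIONS table) for reporting."""
    digest = hashlib.md5(f"{test_name}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# Same user and test always yield the same variant:
print(assign_variant(42, "cta_wording", ["Get BracketBrains", "Get 2011 Picks"]))
```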
Reporting (SQL)

I’m assuming you have a USER_ACTIVITY table that records different types of activity, with user, time, and activity type. A table like that makes A/B test reporting a whole lot easier.
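To make that concrete, here is a toy report against the two tables the post names. The column names and sample data are my assumptions, and SQLite stands in for whatever database you actually use:

```python
import sqlite3

# Hypothetical schemas: the table names come from the post, the columns are guesses.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE USER_AB_TEST_OPTIONS (user_id INT, test_name TEXT, variant TEXT);
    CREATE TABLE USER_ACTIVITY (user_id INT, activity_type TEXT, activity_time TEXT);
""")
conn.executemany("INSERT INTO USER_AB_TEST_OPTIONS VALUES (?,?,?)",
                 [(1, "cta", "A"), (2, "cta", "A"), (3, "cta", "B"), (4, "cta", "B")])
conn.executemany("INSERT INTO USER_ACTIVITY VALUES (?,?,?)",
                 [(1, "purchase", "2011-03-14"), (3, "purchase", "2011-03-14"),
                  (4, "purchase", "2011-03-15")])

# Users and converters per variant: join assignments to activity.
rows = conn.execute("""
    SELECT o.variant,
           COUNT(DISTINCT o.user_id) AS users,
           COUNT(DISTINCT a.user_id) AS converters
    FROM USER_AB_TEST_OPTIONS o
    LEFT JOIN USER_ACTIVITY a
      ON a.user_id = o.user_id AND a.activity_type = 'purchase'
    WHERE o.test_name = 'cta'
    GROUP BY o.variant
""").fetchall()
print(rows)
```

One query like this per test, with the output variable swapped in, covers most day-one reporting needs.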
Scaling

If your site is or becomes massive, scaling the framework will entail some additional work. USER_AB_TEST_OPTIONS may require a large number of writes, and that little query joining USER_ACTIVITY and USER_AB_TEST_OPTIONS might take a while. Writing to the table in batch, using a separate tracking database, and/or using non-SQL options may all help to scale everything. At Circle of Moms, we built out a system to automatically report lots of stats for every test. This was awesome, but it takes some work to scale, and I would never recommend it as a first step.

So…

As I said, A/B testing is like candy: fun and sometimes addictive.
Done correctly, it can be part of the best form of mature and thoughtful product development. It builds a culture of testing and measuring. It lets you understand what works and what doesn’t. It forces you to get smarter about what actually moves metrics. Most important, it fosters an environment where data trumps opinions… anyone want to volunteer to try to take that to D.C.? Mike Greenfield founded Circle of Moms and Team Rankings, led LinkedIn's analytics team from 2004-2007, and built much of PayPal's early fraud detection technology. Ping him at [first_name] at mikegreenfield.com.