A/B Testing on iOS and Android (and others)

A/B testing (a.k.a. split testing) has been around on the Web for a long time. The reason to do it is simple: much of how you design your user interface and your user experience is based on assumptions, and no matter how good you think your assumptions are, you don’t know what works better in reality. And let’s be honest, most assumptions aren’t thought through carefully in the first place. After all, nothing can tell you better than real data from real users.

You’re already measuring what users are doing? That’s not quite enough yet. The reason why it’s “A/B testing” and not just “A testing plus analytics” is simple: numbers alone don’t always say much (e.g. vanity metrics). Comparing two numbers, on the other hand, is easy and reliable, and even if neither of the two is really good, at least you know which one is the lesser of two evils and the one you should continue with (read: continue with and iterate again, fast).
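
To make the “comparing two numbers” idea a bit more concrete, here’s a minimal sketch in Python of how such a comparison could look. It assumes you count conversions (say, purchases) per variant and uses a standard two-proportion z-test to estimate whether the difference is more than noise. The function name and the example numbers are made up for illustration; they don’t come from any particular tool.

```python
import math


def compare_variants(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B with a two-proportion z-test.

    conv_a / conv_b: number of users who converted in each variant
    n_a / n_b:       number of users who saw each variant
    Returns (rate_a, rate_b, z, p_value), where p_value is two-sided.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b

    # Pooled conversion rate under the assumption that A and B perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))

    if std_err == 0:
        # Nobody (or everybody) converted in both variants: no measurable difference.
        return rate_a, rate_b, 0.0, 1.0

    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


if __name__ == "__main__":
    # Hypothetical numbers: 1000 users per variant.
    rate_a, rate_b, z, p = compare_variants(100, 1000, 150, 1000)
    print(f"A: {rate_a:.1%}  B: {rate_b:.1%}  z={z:.2f}  p={p:.4f}")

    # Same rates, but only 10 users per variant: the difference is far less convincing.
    rate_a, rate_b, z, p = compare_variants(1, 10, 2, 10)
    print(f"A: {rate_a:.1%}  B: {rate_b:.1%}  z={z:.2f}  p={p:.4f}")
```

A small p-value (commonly below 0.05) suggests the gap between A and B is unlikely to be pure chance. With only a handful of users, the same difference in rates usually isn’t enough to tell.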

One presentation that caught my interest early on was Marissa Mayer’s keynote at Google I/O in 2008. She gives insights on how far Google actually goes to fine-tune and optimize their user interfaces using A/B testing. That way, Google found proof for details that would usually be decided upon by a design or UX specialist, even down to the “right” whitespace between different parts of the page on google.com:

(…) you can test interfaces and be able to tell in a mathematical way which one works better for your users.

Fast forward to today. A/B testing is old news for websites, but it’s still only just beginning for native mobile apps. It’s easy to come up with reasons why:

  • Native apps get bundled as binaries. No code runs on your own servers where you could easily control it, so implementing different behaviors of the same feature is arguably more difficult.
  • Release cycles are different. New versions still get “shipped” (just like software in the old days) and you don’t want to send your users through this every other day or so.
  • And even if you wanted to, on iOS it’s Apple who prevents you from doing so with an approval time of 1 to 2 weeks.
  • It’s just relatively new: there isn’t much written about it, there aren’t many tools and frameworks, and there isn’t much of an established best practice or mindset around it.
  • And more…
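
To illustrate the first point: since both variants have to ship inside the same binary, the app itself has to decide which one a user gets. One common approach is to hash a stable user ID into a bucket, so the same user always lands in the same variant. Here’s a rough sketch in Python (the idea ports directly to Swift or Kotlin); `assign_variant`, the experiment name and the user IDs are hypothetical and not taken from any of the tools mentioned in this post.

```python
import hashlib


def assign_variant(user_id, experiment, variants=("A", "B")):
    """Deterministically map a user to one variant of an experiment.

    Hashing "<experiment>:<user_id>" means the same user always gets the
    same variant for a given experiment, while different experiments
    split the user base independently of each other.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Interpret the first 8 bytes as an integer and map it onto the variants.
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


if __name__ == "__main__":
    # Hypothetical experiment and user IDs.
    for user_id in ("user-1", "user-2", "user-3"):
        print(user_id, assign_variant(user_id, "checkout-button-color"))
```

The app would then branch on the returned variant (e.g. show the red or the green button) and report it along with its analytics events.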

The bright side: it’s changing now.

Tools and frameworks are appearing and making this whole story a whole lot easier. Some of the ones that look great are clutch.io, arise.io and Pathmapp. And even Amazon has joined in with their own A/B testing service.

I’m excited to give these tools a shot and see what they can do. We’ve recently been accepted into Pathmapp’s beta program and I’m thrilled to see how well it works. I’m going to follow up with my experiences in later posts. In the meantime I’d be more than happy to hear your experiences and recommendations.

  • http://arise.io Guillaume Charhon

    Thank you for talking about us!

    Happy new year & Happy A/B testing
    The Arise.IO team

    • hendrikbeck

      You’re welcome, Guillaume! Hope we get our hands on your tool soon so I can write more about it. Happy new year to you, too!

  • J

    Hi mate,

    I’m making changes to and improving an existing e-commerce app, and I’m trying to figure out the best way to do A/B testing on the UI changes. I stumbled across your blog, and it seems like you’ve been playing around with different A/B testing platforms. I was just wondering if you could share some insights that you’ve gained so far? Does there appear to be a clear winner? There seem to be a couple of different companies working on it at the moment.

    On a side note, how’s life in Vietnam been treating you so far? ;) I’m from Vietnam too. I’m not living there at the moment, but I spent the first 20 years of my life there.

    Joey

    • http://hendrikbeck.wordpress.com hendrikbeck

      Hey Joey,

      thanks for your reply and your questions. I ended up only getting hands-on experience with Optimizely.

      To me, it definitely seemed good for A/B testing variations of your UI. The typical example use cases like “Does a red or a green button work better?” are definitely doable. Often, though, we ran into situations where things were a bit too complex to be done easily with Optimizely. Especially if you want to test slightly different backend behavior, it gets a lot more difficult all of a sudden.

      Also, I’ve been trying to use it in early-stage startup scenarios. But there, you often deal with small numbers of users, so focusing too much on statistical hints at what works better doesn’t get you very far. That’s just my opinion, at least. I know Lean Startup et al. motivate you to A/B test early on, but I’m not sure if something like Optimizely is the right tool for that.

      Lastly, one more comment about mobile apps vs. web: since you can’t easily roll out changes there, it seemed so logical to me to integrate A/B testing, basically to reduce your average cycle time and learn faster. But here as well, I didn’t end up using it long or intensively enough to really have a sound opinion.

      I’d be thrilled to hear your opinion and learn about your experiences with it.

      Cheers and hope to hear from you again. And good luck with your project!
      Hendrik