Eating your own dog food

Early feedback is important. The earlier in the life cycle of development feedback comes in, the faster you can iterate, figure out what is working and what is not working, improve, and iterate again. You should release early and release often.

Releasing early and often usually aims at release cycles of something like 2 weeks. Depending on your kind of system, this can be shorter, but especially for native apps, much shorter release cycles aren’t really feasible. An even quicker way to get feedback is to put your software into the hands of your own colleagues and selected testers, constantly. Within your own organization, nobody prevents you from releasing continuously, as often as multiple times a day, without the overhead of an official release. You can then use the feedback from your own peers to iterate even faster. In modern tech slang this has become known as “eating your own dog food”.

Here at Klamr we try to get ongoing development into the hands of all our colleagues as fast as possible. The key to doing this is Continuous Integration; that’s where everything ties in. Here’s how we do it:

  1. Jenkins: We’re using Jenkins as our continuous integration server and use it to automate most of our tasks. For each project there is a Jenkins job that pulls the latest code regularly, builds it, tests it, and then distributes it. Jenkins is amazingly easy to set up and configure, yet incredibly flexible and powerful. Ever since we started using it, it has grown with us into dozens of very different jobs for pretty much every project we’re working on.
  2. Git branching strategy: while working on new features we need to decide when exactly changes should be made available internally. The general requirements are never to break builds altogether and not to break core functionality. We don’t pull every single change that is made anywhere in the project. Instead, we hook our Jenkins job into our Git branching strategy, which leaves the decision about which change is ready to our engineers. They control it by pulling changes into certain branches when they are ready.
  3. Schedule your distribution: depending on the project we distribute either immediately on every new change, or nightly. This is configured in Jenkins. My personal rule of thumb is: the more transparent new versions are for your (internal) users, the quicker and easier distributions/deployments are, and the less frequent commits to your distribution branch are, the better it is to distribute changes immediately. When starting a new project, I generally start with this. Once problems appear that can be solved by slowing down, go to nightly distributions. Everything running server-side, like a web app, is completely transparent for users (just as it is in your production environment), so new versions don’t disrupt anybody. That’s a good candidate for very frequent distributions. An iOS application, on the other hand, needs to be installed manually, so pushing out 20 new versions every day tends to be disruptive for everybody. The last thing we want to do is make our co-workers feel disrupted and annoyed; that just leads to less and worse feedback.
  4. Distribute: the actual deliveries are all automated, but differ quite a bit depending on the type of software. Some examples of what we do:
    • Backend application: this gets built and deployed to internal servers. This is the most complex deployment process we’ve got; things like database migrations in particular make it far from trivial.
    • Web application: our klamr.to web application is deployed on every new change to an internal, protected web server. It is then connected to our live database, so everybody in the company can use this web application instead of our live production web application. Changes here have sometimes been finished only minutes before they become available.
    • Android: our Android app is distributed in two ways: new APK files are sent out directly via email (Android makes installing new APKs directly from email attachments so much easier than iOS) and via the service Appaloosa Store. The latter has some nice advantages like providing a custom store app and push notifications for new versions.
    • iOS: our iOS app is distributed via Testflight. There are a few catches for iOS, for example that you need to build on a machine running Mac OS. That’s why we have a separate Jenkins instance only for building the iOS app. Most other Jenkins jobs are running on one Linux-based instance hosted on Amazon EC2. Also, devices must be explicitly registered in your ad-hoc provisioning and Apple restricts the number of internal devices to 100. No rocket science once it’s all set up, but a few extra hoops to jump through.
  5. Real data: It’s important to allow internal users to use these early builds against their real Production data. Our web application, for example, runs on an internal URL, but is configured against our Production servers and database. This allows us to test drive new features early on with our real accounts. This leads to much better feedback than asking people to test features on isolated servers with fake data and helped a lot with internal acceptance.
  6. Automate: the key to all of this is automation. If it’s not automated, regular distribution either doesn’t happen, or it wastes valuable engineering time. And as mentioned already above, this all ties into continuous integration. Much of the process and infrastructure described above should be in place anyways to continuously build and test your software in an automated way.
  7. Release notes: for us it proved incredibly helpful to automate release notes for each internal distribution. Remember that one of the main reasons to do all this in the first place is to get early feedback. Without release notes, nobody can know what has changed and which part of your apps to pay attention to. We’re not doing this everywhere, but where we do, we’re using Git commit messages. They aren’t suitable for end users, but they are more than good enough for internal users.
  8. Respect: although these builds are only internal, we treat them with a lot of respect. This means we try never to break them (see above), we try to make using and updating them as easy as possible for our co-workers, and our engineers react quickly to any kind of feedback that comes in.
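As a rough illustration of the release notes idea in item 7, here is a minimal Python sketch that turns Git commit subjects into a plain-text note. It is not the setup described above: the function names are made up, and it assumes your CI job knows which revision the previous internal build was made from (for example via a tag that it moves after each build).

```python
import subprocess


def commit_subjects(old_rev: str, new_rev: str, repo_dir: str = ".") -> list[str]:
    """Return the subject lines of all non-merge commits in old_rev..new_rev."""
    result = subprocess.run(
        ["git", "log", "--no-merges", "--pretty=format:%s", f"{old_rev}..{new_rev}"],
        cwd=repo_dir,
        check=True,           # raise if git fails, e.g. unknown revision
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def format_release_notes(build_label: str, subjects: list[str]) -> str:
    """Format commit subjects as a short note for internal users."""
    if not subjects:
        return f"{build_label}\n(no changes since the last internal build)"
    bullets = "\n".join(f"- {subject}" for subject in subjects)
    return f"{build_label}\nChanges since the last internal build:\n{bullets}"
```

A build job could then call something like format_release_notes("Internal build", commit_subjects("last-internal-build", "HEAD")) and attach the result to the email or upload that distributes the build. The tag name "last-internal-build" is hypothetical.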

Regular internal distribution helps us to keep the feedback cycle as short as possible, sometimes even down to minutes. Automation of all the tasks involved helps us to keep moving fast, even as the number of systems and their complexity grows. I would highly recommend trying to automate as much as possible right from the start.

Are you eating your own dog food? What is your experience with this? Are you using different techniques and tools? Leave a comment, I’m very interested to hear what you’re doing.

A/B Testing on iOS and Android (and others)

A/B testing (aka. split testing) has been around on the Web for a long time. The reason to do it is simple: much of how you design your user interface and your user experience is based on assumptions and no matter how good you think your assumptions are, you don’t know what works better in reality. And let’s be honest, most assumptions aren’t thought through carefully in the first place. After all, nothing can tell you better than real data from real users.

You’re already measuring what users are doing? That’s not quite enough yet. The reason why it’s “A/B testing” and not just “A testing plus analytics” is simple: numbers alone don’t always say much (e.g. vanity metrics). Comparing two numbers, on the other hand, is easy and reliable, and even if neither of the two is really good, at least you know which one is the lesser of two evils and the one you should continue with (read: continue with and iterate again, fast).
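To make the comparison concrete, here is a minimal Python sketch. It assumes that for each variant you know how many users saw it and how many of them completed the action you care about; the function names and the sample numbers are made up for illustration. The two-proportion z-test used here is one common way to check that a difference is more than noise. It is an addition for this example, not something prescribed by the talk or the tools mentioned in this post.

```python
import math


def conversion_rate(conversions: int, users: int) -> float:
    """Share of users who completed the action."""
    if users <= 0:
        raise ValueError("users must be positive")
    return conversions / users


def two_proportion_z(conv_a: int, users_a: int, conv_b: int, users_b: int) -> float:
    """z-score for the difference between B's and A's conversion rates."""
    rate_a = conversion_rate(conv_a, users_a)
    rate_b = conversion_rate(conv_b, users_b)
    # Pooled rate under the assumption that A and B perform the same.
    pooled = (conv_a + conv_b) / (users_a + users_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    if std_err == 0:
        return 0.0  # nobody (or everybody) converted in both variants
    return (rate_b - rate_a) / std_err


# Made-up numbers: variant A converts 120 of 2400 users, variant B 156 of 2400.
z = two_proportion_z(120, 2400, 156, 2400)
# An absolute z-score above roughly 1.96 corresponds to the usual 5% significance level.
b_is_better = z > 1.96
```

With these sample numbers the z-score comes out a little above 2, so B would count as the better variant at that threshold. With far fewer users the same difference in rates would not.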

One presentation that caught my interest early on was Marissa Mayer’s keynote at Google I/O in 2008. She gives insights on how far Google actually goes to fine-tune and optimize their user interfaces using A/B testing. That way, Google found proof for details that would usually be decided upon by a design or UX specialist, even down to the “right” whitespace between different parts of the page on google.com:

(…) you can test interfaces and be able to tell in a mathematical way which one works better for your users.

Fast forward to today. A/B testing is old news for websites, but still only just beginning for native mobile apps. It’s easy to come up with reasons why, like:

  • Native apps get bundled as binaries and no code is running on your own servers where you can easily control it, so implementing different behaviors of the same feature is arguably more difficult.
  • Release cycles are different. New versions still get “shipped” (just like software in the old days) and you don’t want to send your users through this every other day or so.
  • And even if you wanted to, on iOS it’s Apple who prevents you from doing so with an approval time of 1 to 2 weeks.
  • It’s just relatively new: there isn’t much written about it, there aren’t many tools and frameworks, and there isn’t much of an established best practice or mindset around it yet.
  • And more…
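As an illustration of the first point, here is a minimal Python sketch of how an app could decide locally which variant of a feature a user sees, so that the decision stays stable across launches without a server round-trip. It assumes a stable user or install ID is available; the function name, the experiment name and the variant labels are hypothetical, and this is not how any of the tools mentioned in this post are known to work.

```python
import hashlib


def assign_variant(user_id: str, experiment: str,
                   variants: tuple[str, ...] = ("A", "B")) -> str:
    """Map a user to a variant; the same inputs always yield the same variant."""
    # Hash experiment and user together so that a user can land in
    # different buckets for different experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[bucket]


# Hypothetical usage: pick the behavior of one feature for one user.
variant = assign_variant("user-1234", "signup_button")
show_new_button = variant == "B"
```

Hashing instead of drawing a random number means nothing has to be stored on the device, and the analytics side can recompute the assignment from the same ID.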

The bright side: it’s changing now.

Tools and frameworks are appearing and making this whole story a whole lot easier. Some of the ones that look great are clutch.io, arise.io or Pathmapp. And even Amazon has joined in with their own A/B testing service.

I’m excited to give these tools a shot and see what they can do. We’ve recently been accepted into Pathmapp’s beta program and I’m thrilled to see how well it works. I’m going to follow up with experiences in later posts. In the meantime I’d be more than happy to hear your experiences and recommendations.

3 steps I should have taken to get started with Web and App analytics

In the last few weeks I wrapped my head around client-side analytics here at Klamr. Looking back, probably the biggest Gotcha! was the realization that ignoring all the cool out-of-the-box stats and colorful graphs would have been better. It’s all too easy to get paralyzed by the sheer amount of information each tool provides you. The answer you’re looking for isn’t between visits and pageviews and referrers and user flows and mobile data and custom segments, so don’t spend your time looking for it there (just yet). Eric Ries calls them “Vanity Metrics” and tells you why they are dangerous. I should have listened earlier.

It actually starts with making sure you know what your app is supposed to do and what user actions are valuable for you. That’s all that really matters in the beginning. Whether you’ve got 50 or 5000 sessions and whether your median session length is 34 or 58 seconds, that doesn’t tell you much yet. So, better start with what matters and ask yourself: what are the few important actions I want my users to do with my app?

Hence, if I were to start over again, I would do the following:

  1. Make a list with the 3, 4, 5 most valuable user actions in your app, on a very high level. For a shopping app, a completed purchase should obviously be on this list. For a photo sharing app, a shared photo should be on it. A blogging platform would include creating a new blog and publishing a post. Note that this list should contain the successful final action at the end of activities, not the actual activity leading up to it. Analytics tools often call them “events”. A purchase, for example, usually consists of a relatively long flow, but for now only the successful completion matters.
  2. Determine where and how these events are happening in your app, i.e. the lines of code that actually perform them. Find out whether you wanna instrument your client application (web or native) or whether you’d better do it on your server backend. Instrumenting your client often tends to be easier tool-wise, but sometimes it’s just better done on the server. But keep in mind that you are trying to find out how to improve your client. Getting the number of purchases, for example, isn’t the actual goal here. What you actually want to do in later iterations is find out where you can improve user flows, UI and UX. That’s why server-side instrumentation doesn’t work out well later.
  3. Instrument your system with the tool of your choice, e.g. murm.io, Google Analytics or Flurry. Then release it and wait for data. Instead of tons of numbers you should now see your few major numbers showing up in the tool.
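The three steps above can be sketched in a few lines of Python. This is only an illustration: the event names and the EventTracker class are hypothetical stand-ins for whatever SDK call your analytics tool provides, and the counts are kept in memory instead of being sent anywhere.

```python
from collections import Counter

# Step 1: the few high-level events that matter (hypothetical names).
KEY_EVENTS = {"purchase_completed", "photo_shared", "post_published"}


class EventTracker:
    """Counts a fixed set of key events; a stand-in for a real analytics SDK."""

    def __init__(self, allowed_events):
        self.allowed_events = set(allowed_events)
        self.counts = Counter()

    def track(self, event: str) -> None:
        # Rejecting unknown names keeps the list of events short and deliberate.
        if event not in self.allowed_events:
            raise ValueError(f"unknown event: {event}")
        self.counts[event] += 1


def complete_purchase(tracker: EventTracker, payment_succeeded: bool) -> bool:
    """Steps 2 and 3: record the event only where the action actually succeeds."""
    if payment_succeeded:
        tracker.track("purchase_completed")
    return payment_succeeded


tracker = EventTracker(KEY_EVENTS)
complete_purchase(tracker, payment_succeeded=True)
complete_purchase(tracker, payment_succeeded=False)  # failed flow, nothing tracked
```

Tracking only at the point of success is what keeps the resulting numbers few and meaningful. The steps leading up to it can be added later as refined events.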

See what this gives you and then iterate. You can add more (refined) events, categorize them like a Pirate, add funnels and goals, and go deeper. And in one of your next iterations you can look at what you can do to improve.