DIY dark-launching feature toggle in 16 lines of Ruby

Dark launching and soft launching functionality is an important ingredient for continuous shipping. To me by now, even 2 days of code changes piling up feels like more than necessary. Often enough, this happens because pending code changes aren’t complete enough yet to get them into production, let alone show them to anyone.

As a solution to this there are approaches like dark-launches and soft roll-outs, but often enough they require code and tool changes that are being delayed until very late. That’s quite a bummer since dark-launching is so helpful in moving fast, from both a technical and a product management point of view.

  • Developers are happy when they can get code into Production (“nice, it works, check!”)
  • Product Managers are happy when they can get early feedback, at least from a few select customers (and without having to sacrifice anything by publicly releasing something unfinished)
  • (The right group of) Users are happy if they get early access to features

Sounds like reason enough to stop NOT doing it right away huh? ;)

I was in this very situation on a relatively new project, with a million things to do, and I decided to hack a DIY solution together and see where that gets me. Turns out it only takes a few lines of actual code and is already much, much better than nothing. Here’s the premise:

  • I just wanted to be able to hide access to a feature, i.e. hide the link in the nav bar that leads users to it (yeah, I know, but it’s ok in this case, and I bet in a lot of cases it is)
  • We’re on Heroku so I thought Heroku config vars would be a great way to control it (no code deploy necessary, but also no overhead with databases and backend access etc., Heroku already provides everything)
  • Toggling only on a per-user basis (we were THAT small, yes), no other fanciness like user groups, geographic distribution, load balancing or whatever (yet)

And here’s the code that made it work:

Quick run-down:

  • Called the class DarkLaunch, more because I liked the sound of it than because of its correctness ;)
  • It has this one feature toggle method that can be used to surround links etc. a la “if DarkLaunch.feature_visible(…)”. It returns true whenever a particular user should see the feature at this moment, and false otherwise
  • It always returns false if there’s no current_user (since we’re toggling on a per-user basis)
  • It always returns true for Development and Test (which leads to other problems but for the moment I liked it to have everything visible on Dev)
  • Each feature becomes an identifier (like UPLOAD_PHOTOS) that is used when calling feature_visible()
  • It expects a Heroku config var named FEATURE_UPLOAD_PHOTOS
  • This config var is expected to contain a comma-separated list of the user IDs that should have access to the feature
  • feature_visible returns true if the ID of the given user is in that list
  • Or once we’re ready to make a feature public to everybody, we can just set the variable to “PUBLIC”. In that case it always returns true, without checking user IDs anymore
  • And otherwise it returns false, blocking the feature for everybody else
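
The original listing isn’t reproduced here, so the following is only a minimal sketch reconstructed from the run-down above, not the exact code. The `env:` and `environment:` keyword arguments are assumptions (added so the method can be tried without Heroku or Rails), as is reading the environment name from `RAILS_ENV`.

```ruby
# Sketch of a per-user feature toggle driven by config vars.
# Heroku config vars show up in ENV; `env:` and `environment:` are
# assumptions that make the method easy to try outside of Rails.
class DarkLaunch
  PUBLIC = 'PUBLIC'.freeze

  # feature:     identifier such as :upload_photos or 'UPLOAD_PHOTOS'
  # user:        any object responding to #id, or nil if nobody is signed in
  # env:         hash-like source of config vars (defaults to ENV)
  # environment: name of the current environment (assumed: RAILS_ENV)
  def self.feature_visible(feature, user, env: ENV, environment: ENV['RAILS_ENV'])
    return false if user.nil?
    return true if %w[development test].include?(environment.to_s)

    setting = env["FEATURE_#{feature.to_s.upcase}"].to_s.strip
    return true if setting == PUBLIC

    # Otherwise the var holds a comma-separated list of user IDs.
    setting.split(',').map(&:strip).include?(user.id.to_s)
  end
end

# Example with a plain hash standing in for the Heroku config vars.
User = Struct.new(:id)
config = { 'FEATURE_UPLOAD_PHOTOS' => '3,17' }
puts DarkLaunch.feature_visible(:upload_photos, User.new(17), env: config, environment: 'production')
puts DarkLaunch.feature_visible(:upload_photos, User.new(4), env: config, environment: 'production')
```

On Heroku the list would then be changed with something like `heroku config:set FEATURE_UPLOAD_PHOTOS=3,17`, without a code deploy.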

Usage is then dead simple, as long as “launching” is as simple as showing something on the UI or hiding it:
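
As an illustration of that kind of usage, here is a small self-contained sketch that wraps a nav bar link in the toggle check. The template, the link targets and the hard-coded stand-in for the toggle class are all assumptions made so the snippet runs on its own with plain ERB; in a Rails view only the `if … end` part would be needed.

```ruby
require 'erb'

# Stand-in for the toggle class so this snippet runs on its own
# (hypothetical: it only knows one hard-coded user ID).
class DarkLaunch
  def self.feature_visible(feature, user)
    !user.nil? && feature == :upload_photos && user.id == 17
  end
end

User = Struct.new(:id)

# The link to the unfinished feature is only rendered for selected users.
NAV_TEMPLATE = <<~HTML
  <a href="/">Home</a>
  <% if DarkLaunch.feature_visible(:upload_photos, current_user) %>
  <a href="/photos/new">Upload photos</a>
  <% end %>
HTML

# Renders the nav bar for the given user (nil when nobody is signed in).
def render_nav(current_user)
  ERB.new(NAV_TEMPLATE).result(binding)
end

puts render_nav(User.new(17)) # includes the "Upload photos" link
puts render_nav(User.new(4))  # only the "Home" link
```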

It’s a bit of a quick hack, of course, and far from a complete or well-done (or flexible, or …) solution in so many ways. But it was great to see that a few lines of code added so much value to rolling out a feature. Feel free to let me know what you think or if you’re interested in using more of this. And who knows, maybe it becomes a little Gem… :-)

 

Go Lean: SysOps, DevOps, AppOps, NoOps

I’ve always been a big fan of tools and services that take complexity away from engineering teams and allow them to be lean and focus on what they do best: building their product and focusing on their own rocket science.

Back when I started to work with the Web, I remember us ordering an actual physical server, setting it up and loading it into our car to carry it to the data center. Fast forward a few years and you can do all of this within a few minutes without moving away from your desk. Especially working with infrastructure has changed a lot since AWS, Heroku, AppEngine, Parse and all the others appeared.

In one of my previous startups we thought “DevOps” would be the way to go, bridging the (cultural, among others) gap between developers and system administrators. Back then we built it around AWS, which still required work and expertise around the infrastructure.

Fast forwarding a bit again, this has become even simpler. Especially for early-stage startups, where agility and the ability to move fast with limited resources and a laser-sharp focus are more important than building properly scalable software, you can build stacks on tools that do virtually everything for you. Deploy your stuff to Heroku (or EngineYard, or …), make it “Continuous” with Codeship, CDN-ify it with Cloudflare, get your basic reporting done with Chartio, and so on. By building such a stack of tools you can almost entirely avoid the administration overhead you used to have when building software. In this post, Adron Hall of New Relic (who arguably knows a lot more about it than I do) talks about it and calls it “AppOps” and “NoOps”. The former basically lets you do your Ops work with applications that sit on top of your infrastructure and hide the nastiness, whereas the latter eliminates it almost completely.

There will be a time when you have to migrate away from these out-of-the-box tool stacks, be it for scalability reasons, or functionality, or cost, or other factors. But if you just need to get started and move fast early on, having “No Ops” sounds wonderful to me, and the possibilities and tools are just getting better every day.

The Web / Mobile Feedback Loop

Backlogs for Web and mobile products don’t exclusively contain new features. One eye should always be on what has been done and how that is working out. A proper feedback loop gives valuable input that helps to determine what should be done next.

On one hand, of course, there are high-level goals and a vision that define new features and the larger chunks of upcoming work (which just reminds me of this great article about how Spotify has done Prioritization in their early days). But then there’s more. For example, there are bugs, there are A/B testing results, there’s the Google Analytics account that somebody should actually have a look at, and so on. Most people know most of these, but mostly they aren’t managed really well all together. So I thought a good start would be to list all those sources of input on the feedback loop that may (or may not) affect our priorities:

  1. The product vision (this is what your management and product managers want to do, the longer term goals, this isn’t actually on the feedback loop, I just wanted to have it on the list)
  2. Business figures (e.g. your sales numbers; I dare say this input is usually indistinguishable from #1 (because it comes from the same people?), but I’d argue that it’s “feedback”, unlike #1)
  3. Analytics (the likes of Google Analytics)
  4. Feedback that is built into your product (without being explicit feedback, it’s basically extracted from normal usage of the app)
  5. A/B Testing (e.g. Optimizely or a variety of other ways to do them)
  6. Explicit customer feedback (lots of sources here incl. all the feedback your customer support and sales teams gather, but there’s also tools you can use that allow your customers to give feedback online, e.g. murm.io (for specific feedback on your existing features) or tools a la Uservoice and ZenDesk)
  7. Crash reporting tools (Crashlytics, Crittercism, …)
  8. Dogfooding (your own company using your product, often this is a much smaller feedback loop since it allows you to get feedback on unfinished work that wasn’t even released yet)
  9. External ratings (e.g. what your users say about your app on Google Play and iTunes)
  10. Customer opinions out on the web (blogs, social media, very similar to the point before but wide-spread on the Internet)
  11. Beta testers and special user groups (there’s a bunch of tools that help you, e.g. Testflight)

This was just a first shot and I’m merely thinking out loud.

It’d also be interesting to see how all of these can be managed more effectively than having different people “keep an eye on it” or having 13 different tools at our disposal to log in and check regularly. I’d greatly welcome less overhead to collect them, a better way to manage and follow up and make them a part of the development process, and create a lot more transparency for teams and stakeholders around them.

I’d be interested to hear what others think or whether there’s anything missing on the list above.

Rails Continuous Integration in the Cloud with CircleCI

Every once in a while I come across tools and services that convince me right from the start. The last one was Nitrous.io, for example. This time it’s CircleCI.

Here at SQUAR we’re doing all our server-side development with Rails. Rails itself does a great job making test-driven development actually work, but without continuous integration (CI), test automation is half the fun (and even less the value, for what it’s worth).

I’ve been a big fan of Jenkins since forever and it never occurred to me that I might some day abandon it. Here’s how CircleCI made me do it anyways:

  • Go to circleci.com, sign up with GitHub and a few clicks later your project is all set up and gets CI’ed already
  • CircleCI practically figures out how to build and test our project on its own, so from this point onwards every push to GitHub (master or branch) is being tested. And I receive an email if my tests didn’t run through.
  • It integrates nicely not only with, but also on GitHub. For example it tells me on the pull requests page if my tests for this branch passed or failed.
  • CircleCI infers permissions and collaborators from GitHub, so there’s no accounts and permissions to set up on CircleCI. None at all.
  • It has a hook built in already that can pick up artifacts from a special folder and publish them on the Web on every build page. That makes it a breeze to e.g. integrate test coverage reports by just dropping the HTML-rendered results into this folder and letting CircleCI do the rest.
  • Another good one that I haven’t gotten around to setting up yet is the Heroku integration. Setting up a continuous deployment chain that pushes changes from particular branches to Heroku after tests passed is reduced to very few convenient steps.

With Jenkins this would have been quite a bit more of a hassle to set up. Not to mention that CircleCI is running on the Web and is completely on demand (where are you with that, Jenkins?), with plans that will get you relatively far starting as low as 19 USD/month.

Ok, the above was the short and over-excited run-down of my first experience. I actually need to make a few more remarks to put it into the right perspective:

  • It wasn’t working as out-of-the-box as I described above. I had to make one customization to our build to make it actually work. But this change was well documented and all I needed to do is drop a circle.yml file into my repository with a few lines of code.
  • I also had to jump through a few hoops to make CircleCI pick up our coverage reports. But the support was super helpful and I was actually happy to experience the great customer service instead of being upset it didn’t work right away.
  • We’re only using it for Rails at the moment and it’s great at that. Given that we’re doing quite a bit more (e.g. Android and iOS apps), we’ll have to figure out how much of that we can set up with CircleCI as well. I have a feeling we’ll still end up using Jenkins for whatever jobs CircleCI can’t get done. Because no matter what I said about Jenkins above, it’s still by far the most versatile of ‘em all.
  • There’s also Travis. It’s very established for CI’ing open-source projects and has recently started to become available for private repositories as well. From all I know it’s actually better and more mature and even offers CI for iOS apps (which is rare and awesome). But it does set you back a whopping 129 USD/month for the cheapest plan. I don’t doubt that’s money well invested once you’re at a certain scale, but it felt a bit too much for us as a startup that’s hardly 3 months old. Other than that, I believe Travis would have impressed me as an old Jenkins veteran just as much…

If you’re building your tool stack in the cloud then you should give CircleCI a try. I believe you’ll be stunned, and thankful, and you’ll never want to go back.

Coding Rails apps in the cloud with Nitrous.io in less than 5 minutes

After my friend Minh from TechInAsia told me about Nitrous.io and their 1 million dollar seed funding around a month ago, I’ve been eagerly waiting to get access and try it out. One of the promises was to get working development stacks running in a heartbeat. Today I finally got to try it and I have to say I was impressed.

Timer started, here’s what I did:

  1. I opened www.nitrous.io and signed up for an account. That was basically instant.
  2. Clicked on “New Box”, chose the platform (right now they offer Rails, Node.js, Django and Go), gave it a name, selected a region and clicked on “Create Box”. Provisioning and starting up the box took around a minute. It opened with a window in the browser that contained a file explorer on the left, a text editor in the middle, and a console at the bottom.
  3. A “git clone” in the console pulled in the sources from GitHub in a few seconds.
  4. A “bundle install” pulled in all the dependencies, this took around a minute.
  5. A “rake db:migrate” set up the local database.
  6. “rails s” started the local server.
  7. And finally, a click on the “Preview Port 3000” menu item opened a new tab in my browser, pointing to the right URL and showing my existing Rails app.

All this took less than 5 minutes and I was ready to go. A few further edits, running rails commands, migrating databases again, previewing changes, all that worked like a charm. And signing out on one computer and quickly signing in on the other preserved all the local changes and I could continue immediately.

Again, I’m impressed.

The Last Responsible Moment

We’re often tempted to plan as much as possible in advance and make decisions way before they are actually due. It gives us a sense of security and risk mitigation and preparedness. I urge you to stop doing that.

To understand why, follow this chain of reasoning:

  1. When making decisions, more uncertainty leads to higher risk.
  2. The more knowledge we have, the less uncertainty remains.
  3. (No matter how much you know today, it’s safe to say that) Tomorrow you will know more than today.
  4. Hence every decision you will make tomorrow (next week, next month), will be more informed than the ones you make today.
  5. That’s why making decisions as late as possible lowers risk.

The knowledge you’re accumulating can be pretty much everything, e.g. about:

  • Yourself
  • Your team
  • Your project
  • Your product
  • Your business
  • Your competition
  • The market you’re in
  • The legislative environment you’re dealing with
  • And so on…

Agile software development and Scrum have this way of making late decisions built into the process. What matters most is at the top of the backlog or is worked on during a sprint. Everything else isn’t important for now and most decisions related to everything else don’t need to be made yet.

Keep this in mind. Keep questioning yourself whether decisions need to be made already. And if not, don’t make them now. And maybe don’t even bother to think about them yet.

Where are the Java User Groups in Vietnam and in South-East Asia?

I was wondering about this nearly three years ago already, looking for Java Communities in Ho Chi Minh City and Vietnam. Now I’ve picked up an interest in this question again. While there have been tons of cool developments in other communities (Agile Vietnam, for example), there seems to be nothing new about good ol’ Java.

Especially looking at the World Map of (registered) Java User Groups at http://www.java.net/jug-profile-map makes me wonder where all the JUGs are in South-East Asia.

Even in more “mature” locations like Singapore, there doesn’t seem to be much around as I recently learned here.

Java, anyone? Watch out, I’m looking for you. And oh, if anybody is interested in getting together and bringing Java forward in Vietnam, just drop a comment below or send me a message.

Fighting Scope Creep with the Techcrunch Test

No doubt scope creep is one of the biggest dangers to any software project. The possibilities to build everything are just too tempting and too often we think perfectionism is a virtue. Before we realize it, we’ve lost focus, got off track, and blown up the project far enough to be in trouble.

The Techcrunch Test (as I call it) originated for me during my work for a social local mobile app. Part of the job was to prioritize features properly and – at least equally important – find the right sizing of each feature. Thanks to iterative and incremental approaches it’s not necessary to be complete and perfect on a feature the first time around. However, cutting down features enough on this first time is often easier said than done. More often than not there were heated discussions about what needs to be done and what can safely be cut out and de-prioritized.

Techcrunch, one of the world’s most popular technology blogs, was one of the publications we were waiting to appear on. We knew we would have achieved something if Techcrunch had started to write about us. (If you’re working in a different space, you should replace Techcrunch with a publication or authority that matters for you.)

The Techcrunch Test helps bring you back down to earth whenever you’re tempted to build too much or set the wrong priorities. For a feature X – or a part of a feature – ask yourself and your team the following question:

If 3 months from now we have failed and Techcrunch were to write about our failure, would Techcrunch say the following:

“If only they had introduced feature X they would have become successful!”

?

I promise you, in most cases the answer to this question will be No. And especially if you’re doubting the importance of a feature anyways already, the answer will almost always be No.

It’s that simple. Once you’re at the point of using the Test, pretty much nothing you’re putting into this question will appear important enough to be built afterwards. And there you go: don’t build it. Instead, focus on what really matters, focus on what sets you apart, focus on what is at the core of what you’re trying to achieve, focus on what brings you forward.

Focus on what Techcrunch would praise you for, some day.

Building the Right Product with Hypothesis-Driven Development

In my previous post about Making Continuous Delivery work with Scrum and Sprints I wrote about how to shorten release cycles significantly by changing your process and adding in the obvious amount of test and release automation.

A comment challenged that by basically saying “Well, this might help you build your product right (and in shorter cycles), but building the right product is a whole different question. And maybe the more important one.” Hard to disagree.

I wanted to dig deeper. These days you can’t go wrong by starting in the vicinity of Lean Startup if you’re looking for how to build the right product efficiently. As an engineer I’m familiar with a lot of X-driven development techniques, but then I came across one I hadn’t heard about before: Hypothesis-Driven Development.

The basic idea is simple:

  • Instead of requirements, you formulate assumptions, or hypotheses
  • At the same time you define a measurable signal that will tell you whether you were right or wrong in a reasonably short amount of time
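
To make the two bullets a bit more tangible, a hypothesis and its signal could be written down as data. This is only a sketch; the struct, its field names, the example statement and the numbers are all made up for illustration.

```ruby
require 'date'

# A hypothesis paired with the measurable signal that confirms or refutes it.
# All names and numbers here are illustrative assumptions.
Hypothesis = Struct.new(:statement, :metric, :target, :review_on) do
  # Returns :pending before the review date, then :confirmed or :refuted
  # depending on whether the measured value reached the target.
  def verdict(measured, today: Date.today)
    return :pending if today < review_on

    measured >= target ? :confirmed : :refuted
  end
end

h = Hypothesis.new(
  'Users who can upload photos come back more often',
  'share of new users returning within 7 days',
  0.30,
  Date.new(2014, 3, 1)
)

puts h.verdict(0.34, today: Date.new(2014, 2, 20)) # too early to tell
puts h.verdict(0.34, today: Date.new(2014, 3, 2))  # target reached
```

The review date is what keeps the “reasonably short amount of time” part honest: until then there is no verdict, afterwards there has to be one.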

This sounds like a great start to get to a structured approach to factor the question of the right product into your development.

But of course building the right product and building the product right aren’t mutually exclusive. Nor would I say one is more important than the other. They both are. Where hypothesis-driven development guides you to make sure you’re being intentional about your assumptions and the need to test them, good old-fashioned engineering techniques like test-driven development and test automation make sure you’re implementing your hypotheses right. Without being able to successfully (bug-free and all) deliver an increment of your software that aims at testing an assumption, you’re not going to get the right answers either.

The article I stumbled upon was http://agile.dzone.com/articles/hypothesis-driven-development which also links to a great presentation about Replacing Requirements with Hypotheses.

Making Continuous Delivery work with Scrum and Sprints

Scrum is promoting fixed-length sprints of 1, 2 or 4 weeks. We do 2. That means we plan for 2 weeks, the team works for 2 weeks, and then we’re ending the sprint with regression tests, release preparations, sprint review, a final sign-off, and the release. All engineering activities are set up around this. Now we want to release more often – continuously.

There’s a lot of good reasons to deliver continuously. Robert Johnson of Facebook gave one of my favorite pep talks about this a while ago with “Facebook: Moving Fast At Scale”.

My requirement here is to stick to sprints of 2 weeks. I worked through a few alternatives, e.g. reducing the sprint length to one week instead. This would give us twice as many releases but that’s still nowhere near Facebook and others, and also wouldn’t solve the problem I wanted to solve since nothing would need to change. Another option that has been coming back regularly was Kanban, mostly because work flows continuously, without being boxed into sprints.

In the following I’m walking through what needs to change in order to make continuous delivery work with Scrum and fixed-length sprints.

Sprints vs. releases or: what are sprints for?

The main reason for doing short sprints instead of long-term planning is the ability to respond to change, along with simply accepting the fact that long-term plans don’t work out anyways.

The reason why we ended up releasing after every sprint is that a) releases always create a certain overhead and it seems to make sense to batch up work and go through the overhead only once. And b) because it fits traditional project management thinking: once the planned work is done, it’s being signed-off and released.

But in the same way as we broke down work from a 12 months project into one sprint at a time during our transition to agile, it sounds reasonable to break things down further from batching up a release every 2 weeks into very small continuous releases.

Sprint review and final sign off

If releases are done every 2 weeks after a sprint is finished, it’s easy to combine sprint review and final sign-off. Again, a way of batching up things. But it doesn’t actually save a lot of time to batch up a sign-off, so we could as well do a quick sign-off after every story is completed. This has advantages anyways because a story is only then truly done after it’s signed-off and released. So the change that needs to happen is to de-couple sign-off of each story from the sprint review at the end of the sprint.

What remains is that the sprint review is an opportunity for the team to brag about the work they’ve done, to get stakeholders involved and updated and to make sure that the actual progress becomes visible and agreed upon. It also has advantages to show the current progress in a live environment because by the time of the sprint review, each story that is done is already running in production.

From a process point of view, the team would now be able to release continuously throughout the sprint. Now let’s get back to that “overhead” I talked about. The technical challenges need to be addressed, otherwise we’ll spend more time on the actual release than on development.

No junk in the trunk and code freeze

When releases are happening every 2 weeks, there’s always a bit of a touch down period at the end. Unfinished code is being finished, the last tickets are verified and time is spent to make sure that the Master is clean and ready to go. This often allows for a certain degree of sloppiness with the Master throughout the sprint. It’s not very critical if unfinished work ends up on the Master because there’s always enough time to fix it and clean it up. This needs to change if releases should happen “whenever we feel like it”.

First of all: No junk in the trunk! Master must always contain work that is finished and built to production quality. There are different branching strategies out there, and they are well documented. In this context they mostly come down to adding two additional areas in the repository:

  1. Where work in progress happens. I recommend a separate feature branch for each story. Work remains solely on this branch until very high confidence is reached that a feature works.
  2. Where work gets integrated (but not yet pushed to Master). This is where finished stories are integrated with the latest Master – but outside of Master. Here, the remaining issues are caught and regression tests on related existing features are done. This should increase the confidence that a new story works and doesn’t break anything else to 100%. Then it goes into Master.

In an ideal world, this would allow us to get rid of good ol’ Code Freeze altogether. The different branches, the quality gates on each branch, and the integration down towards Master do exactly what a sprint-end code freeze does: make sure Master is clean and ready for a release.

Automate your testing

If each story should end up on a Master that is ready to release, then regression tests must be done for each ticket. Otherwise it’s hard to ensure that Master is really ready and recently added work doesn’t break anything. That’s where lack of test automation really does start to hurt.

There’s zillions of articles and books out there about this, so I’ll keep it short. The essence is: do it from the very beginning if you can. If it’s too late for that, invest some effort and get your regression tests automated, as close to 100% as you can.

This will decrease the overhead related to regression tests to as close to 0 as it gets.

Above I mentioned “quality gates”. These are all the different checks and tests a revision of your software must go through before it’s Done. Depending on your system they may consist of building your software, running unit tests, running static code analysis checks, running regression, UI and load tests and maybe some – hopefully not too many – manual steps. With many CI servers like Jenkins they can be automated and arranged in build pipelines. Such a pipeline runs them consecutively on a certain revision and only if it runs through until the end without failing in between, you’ve got a green light on this revision. I recommend using build pipelines.
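
The “runs them consecutively and only goes green if nothing fails” part can be sketched in a few lines. This is not how Jenkins implements pipelines, just an illustration of the idea; the class, the gate names and the stubbed checks are assumptions.

```ruby
# A build pipeline as an ordered list of quality gates. Each gate is a name
# plus a block that returns true on success. Gate names are illustrative.
class Pipeline
  Gate = Struct.new(:name, :check)

  def initialize
    @gates = []
  end

  # Registers a gate; the block should return a truthy value when it passes.
  def gate(name, &check)
    @gates << Gate.new(name, check)
    self
  end

  # Runs the gates in order on one revision and stops at the first failure.
  # Returns [green, names_of_the_gates_that_ran].
  def run(revision)
    ran = []
    @gates.each do |g|
      ran << g.name
      return [false, ran] unless g.check.call(revision)
    end
    [true, ran]
  end
end

pipeline = Pipeline.new
                   .gate('build')           { |_rev| true }
                   .gate('unit tests')      { |_rev| true }
                   .gate('static analysis') { |_rev| false } # simulated failure
                   .gate('ui tests')        { |_rev| true }

green, ran = pipeline.run('abc123')
puts "green: #{green}, ran: #{ran.join(' -> ')}"
```

In a real setup each block would shell out to the actual build or test command instead of returning a fixed value; the later (and usually slower) gates never run on a revision that already failed an earlier one.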

Automate your deployments

Regression tests and deployments have a lot in common: if you only do them rarely, manual steps usually don’t hurt enough to automate them to near 100%. Now that we’re about to release very, very often, this starts to hurt (= creates overhead). Releases must be as lightweight, fast, and robust as it gets. If even 1 out of 10 releases causes the slightest trouble or fails, it’s hard to get the team confident enough to release frequently.

I recommend using a CI server and reducing deployments to a few clicks. Add a suite of tests against your live servers to the deployment script, so that it checks the current deployment right after it’s done and fails or succeeds immediately.
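
Such a post-deployment check doesn’t need to be fancy. Here is a minimal sketch, assuming a hypothetical `smoke_test` helper and made-up paths; the `fetch:` parameter is an assumption that lets the example run against a stub instead of a real server.

```ruby
require 'net/http'
require 'uri'

# Default fetcher: returns the HTTP status code of a GET request as an Integer.
HTTP_STATUS = ->(url) { Net::HTTP.get_response(URI(url)).code.to_i }

# Runs each check against the freshly deployed server and returns the list
# of failures (an empty list means the deployment looks healthy).
# checks: hash of path => expected status code
def smoke_test(base_url, checks, fetch: HTTP_STATUS)
  checks.each_with_object([]) do |(path, expected), failures|
    actual = fetch.call("#{base_url}#{path}")
    failures << "#{path}: expected #{expected}, got #{actual}" unless actual == expected
  end
end

# Stubbed fetcher so the example runs without a network connection:
# it pretends that the login page is broken after the deployment.
fake = ->(url) { url.end_with?('/login') ? 500 : 200 }
failures = smoke_test('https://example.test', { '/' => 200, '/login' => 200 }, fetch: fake)
puts failures.empty? ? 'deploy OK' : failures
```

At the end of a deployment script, a non-empty list would be the signal to abort with a non-zero exit code so that the CI server marks the deployment as failed right away. Connection errors aren’t handled in this sketch; a real script would want to treat them as failures too.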

I also recommend putting deployments as much as possible into the hands of the engineers who are writing the code, or at least bringing a member of the operations team into the Scrum team. This removes additional hand-overs and the team being blocked by others. Depending on the environment this is often difficult to achieve, but automating deployments down to a few clicks that never fail certainly helps a lot.

Changing your definition of done

If all of the above works and releases indeed start to happen regularly and even after every story is done, it’s time to change the Definition of Done. Both the final sign-off and the release should be in there.

Up your skills!

A transformation to continuous delivery is a big step forward for a team. It requires a variety of skills to cope with all the technical and non-technical challenges. Being intentional about improving the team’s skills and being willing to spend time and money on this will definitely help. You can also learn along the way and learn from mistakes, but focusing on better skills pro-actively will help in making fewer mistakes, moving faster and gaining confidence within the team and outside.

Dealing with release problems

Obviously, a live user environment is much more likely to break during a release than at any other time. Hence a very common concern is that more frequent releases will also introduce more frequent problems and actually increase the total amount of work necessary to deal with and fix all the problems.

But there’s also advantages: continuous releases consist of much smaller change sets. Hence it’s a whole lot easier to regression test them and to release them. And if something breaks, it’s also a whole lot faster to spot, understand and fix the problem and release a hotfix. Not to mention that continuous releases often happen within a day or so after the work is finished, so it’s likely engineers haven’t forgotten all about the details of a change.

There’s more ways of helping out on this, e.g. the ability to release only to a small sub-set of users, observe and then roll-out to all users after. Improvements like this should be evaluated.
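
One common way to pick such a small sub-set is a percentage rollout. The sketch below is an assumption about how that could look, not something from our setup: it hashes the feature name and user ID into a stable bucket, so the same user keeps getting the same answer while the percentage is being raised.

```ruby
require 'zlib'

# Decides whether a user falls into the first `percent` percent of users for
# a feature. Hashing the feature name plus the user ID gives every user a
# stable bucket from 0 to 99, so the answer doesn't change between requests.
def in_rollout?(feature, user_id, percent)
  bucket = Zlib.crc32("#{feature}:#{user_id}") % 100
  bucket < percent
end

# Roll out to roughly 10% first, observe, then raise the number.
exposed = (1..1000).count { |id| in_rollout?('upload_photos', id, 10) }
puts "#{exposed} of 1000 users see the feature at 10%"
```

Because the bucket only depends on the feature and the user, everybody who saw the feature at 10% still sees it at 50%, and including the feature name in the hash means different features don’t always hit the same users first.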

Conclusion

Here’s a summary of what I’ve been working through on making continuous releases work with Scrum and fixed-length sprints:

  1. Decide to de-couple sprints from releases. Sprints are for planning, releases are just one more piece of getting work “done”.
  2. Move your final sign-off out of the sprint review (if that has been the case) and move it to the end of each story. Add this to your Definition of Done.
  3. Choose and implement the right branching strategy. Introduce feature branches and an integration branch and make sure work is being properly tested when it leaves a branch. Move only down to Master when things are really working and keep junk out of the trunk.
  4. Automate your testing as much as possible. Besides unit test coverage, implement as many automated tests as you need in order to gain the team’s and the product owner’s confidence, e.g. UI tests, load tests or even automated static code analysis. Use a CI server like Jenkins to tie all your automation together and make use of build pipelines.
  5. Automate your deployments down to a few clicks. Get the ability to execute the release into the Scrum team to avoid hand-overs and the team being blocked on the release.
  6. Release whenever you feel like it, be happy, and add value for your live users quicker than ever.

What is your opinion about this? Are you doing the same? Did you face the same issues? Did you solve them in a similar way? How are you releasing your software when using Scrum and fixed-length sprints? I would be happy if you take a minute and leave a comment below.