Bookshelf Apps

What we’re doing about quality

August 10th, 2010

I know many users have been frustrated with the stability of the early 3.0 releases of SongBook DS. As is usual with these things, there were not many bugs, but they were highly visible for some users. That, together with the delays inherent in the App Store release process, made for a poor situation. It’s not one we want to repeat, and it has served as a strong sign that we need to change the way we manage releases.

In the past we had an easier time with releases, as we had a smaller user base and a smaller set of supported devices. During the time of the 3.0.x releases we also had a few updates to Linn’s Cara firmware to deal with. In most cases we don’t get early access to Linn firmware updates, so we sometimes run into compatibility issues. We also had one major bug get through both our testing and Apple’s testing.

That’s the bad news. The good news is that with the 3.0.6 release (out a couple of weeks ago) we have eliminated most of the crashing bugs. We’ve still got further to go to reach our stability goals. On our worst day we had 1.7 megabytes of crash reports on our server. The current daily report is 100 KB of crash data (52 crashes), so not perfect yet, but things are moving in the right direction.

 

Complexity

SongBook is evolving into quite a complex product. From a single source code base we now generate 12 different products. Some of these are not released yet, but every product means changes to the code base that need testing. We now have thousands of users, all with a slightly different network setup. Our release test plan (more on this below) now covers 10 media renderer configurations, 4 OS/hardware platforms, 5 media servers, and 27 functional tests. A complete test run involves 226 individual tests. Including setup time, this takes many hours of work for each release.
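As a rough illustration of the scale involved, here is a minimal sketch that counts how many tests a full cross product of those dimensions would need. The configuration names are placeholders, and this is not the actual test plan; only the four counts come from the figures above.

```python
from itertools import product

# Dimension sizes taken from the test plan figures above.
# The names themselves are placeholders, not real configurations.
renderers = [f"renderer-{i}" for i in range(1, 11)]        # 10 renderer configurations
platforms = [f"platform-{i}" for i in range(1, 5)]         # 4 OS/hardware platforms
servers = [f"server-{i}" for i in range(1, 6)]             # 5 media servers
functional_tests = [f"test-{i}" for i in range(1, 28)]     # 27 functional tests

# Every combination of renderer, platform, server and functional test.
full_matrix = list(product(renderers, platforms, servers, functional_tests))

print(len(full_matrix))  # 5400
```

Running every combination would mean 5,400 tests, which suggests the 226 tests in a complete run are a chosen subset of combinations rather than the whole matrix.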

 

What we are doing

We’ve been working hard putting new systems in place to ensure upcoming releases are much more stable right from the start. This is a huge task and we are only part of the way along implementing these processes, but this is what we have done so far:

1. Automated build and release process

Some of the issues we have had in the past have been because of mistakes in the build or release process, or due to testing a different build to what was actually released. We’ve implemented a completely automated build and release process to eliminate this problem. Builds are triggered by check-ins to our configuration management system and run completely unattended on a dedicated build server. We’ve also made lots of small changes in the code designed to increase reliability.

2. Issue tracking

We have actually had this for quite a while now, but our internal issue tracking system has played a core role in improving our internal processes. Issues in this system are tracked right through from configuration management to builds and releases. I think it’s one of the parts of the system that has worked out best. We’re using Jira for issue tracking and release management. We found it much better to separate our issue tracking system from our support system, as they really serve different purposes.

3. Regression Testing

We have a manual test plan in place for all releases. We’ve always had this, but we have increased the number of tests substantially. We’re working on adding to these tests and on automating what we can. Also, because we are tracking source code changes more formally, we have a better idea of which subsystems to concentrate the testing on.

4. Crash Reporting

We now automatically collect crash logs from our iOS applications. This gives us visibility of what is really happening, not just what is reported. It’s shown us that there are a lot of bugs that do not get reported. We analyse this data to find the most common crashes and work on eliminating them. This is not always easy as some of the bugs are caused by memory management issues that are not at all clear from the crash reports. The first few versions of the crash reports would also leave out critical information in some cases, so we have refined this process over time.
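Finding the most common crashes comes down to grouping reports by some kind of signature and counting them. The following is a minimal sketch of that idea only. The record fields (`exception`, `top_frame`) and the sample data are assumptions made for illustration, not the format of the real crash logs or the tooling actually used.

```python
from collections import Counter

# Hypothetical crash records. The field names and values are assumptions
# for illustration, not the format of the real crash logs.
reports = [
    {"exception": "EXC_BAD_ACCESS", "top_frame": "frame_a"},
    {"exception": "SIGABRT", "top_frame": "frame_b"},
    {"exception": "EXC_BAD_ACCESS", "top_frame": "frame_a"},
]

def crash_signature(report):
    """Build a grouping key so that similar crashes are counted together."""
    return (report["exception"], report["top_frame"])

def most_common_crashes(reports, limit=3):
    """Return up to `limit` (signature, count) pairs, most frequent first."""
    return Counter(crash_signature(r) for r in reports).most_common(limit)

for signature, count in most_common_crashes(reports):
    print(count, signature)
```

A signature this simple will not separate crashes that share a top frame but have different root causes, which is one reason memory management bugs remain hard to pin down from reports alone.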

 

We’ve got a lot of work to do to finish this process, but it is already paying off and should only improve over time. This is really a natural progression from a one-person, one-application development process to a larger business. As we continue to increase our product range and supported devices it will be essential to have these processes in place.

One of our fears with this change is that it would slow the pace of development down. There is a convincing argument, though, that it will allow us to increase our development speed, as we will spend less time resolving issues and fixing problems. Only time will tell if this is the case, but at least each release should be more reliable now.

One Response

Bbrip
August 11th, 2010

Sounds great. Good luck. I’ll mail some bugs I found in 3.0.6
