Product development · March 22, 2026 · 3 min read

The bug that only happened when the phone was locked

How playback that died 54 seconds after the screen locked turned out to be an artifact of our development build, and what it taught us about trusting test conditions.

We were a few weeks from putting Podshot in front of real people when a tester sent a message that made my stomach drop: "Audio stops about a minute after I lock my phone."

A podcast app that can't play with the screen off is not a podcast app. It is a webpage. So I dropped everything and went looking.

The first wrong guess

My first instinct was the audio session. On iOS, if you don't tell the operating system that you intend to keep playing in the background, it will stop your audio as soon as the app leaves the foreground. We had set that up, or so I thought. I read the configuration three times, found nothing to change, and the problem persisted.
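Rereading a config by eye is exactly the kind of check a few lines of code can do instead. The sketch below assumes an Expo-style app config object, where iOS Info.plist keys sit under `ios.infoPlist`; iOS requires `UIBackgroundModes` to include `"audio"` before it will let playback continue in the background. `AppConfigLike` and `declaresBackgroundAudio` are hypothetical names for illustration. The check covers only that declaration, not the runtime audio session category, which also has to allow background playback.

```typescript
// Minimal shape of the parts of an app config this check reads.
// Assumption: an Expo-style config where Info.plist keys live under ios.infoPlist.
interface AppConfigLike {
  ios?: {
    infoPlist?: Record<string, unknown>;
  };
}

// Hypothetical helper: true only if the iOS Info.plist declares the "audio"
// background mode. It does not inspect the runtime audio session.
function declaresBackgroundAudio(config: AppConfigLike): boolean {
  const modes = config.ios?.infoPlist?.["UIBackgroundModes"];
  return Array.isArray(modes) && modes.some((mode) => mode === "audio");
}

// Usage with made-up configs.
const withAudio: AppConfigLike = {
  ios: { infoPlist: { UIBackgroundModes: ["audio"] } },
};
const withoutAudio: AppConfigLike = { ios: { infoPlist: {} } };

console.log(declaresBackgroundAudio(withAudio)); // true
console.log(declaresBackgroundAudio(withoutAudio)); // false
```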

Then I timed it. Roughly 54 seconds after locking, every single time. A number that consistent is a clue. Real background-audio failures are messy and depend on what else the phone is doing. A clean 54 seconds smells like a timer somewhere, not a system policy.
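The "consistent number means a timer" reasoning can be written down as a quick check over repeated runs: if every stop lands inside a narrow band, something deterministic is probably counting down. This is a sketch only; `looksLikeATimer`, the two-second tolerance, and the sample readings are all assumptions made up for illustration, not data from our testing.

```typescript
// Hypothetical helper: decides whether repeated failure times are tight enough
// to suggest a fixed timer rather than load-dependent system behavior.
// toleranceSeconds is an assumed threshold; tune it to your own setup.
function looksLikeATimer(stopTimesSeconds: number[], toleranceSeconds = 2): boolean {
  if (stopTimesSeconds.length < 3) {
    // Too few runs to call anything consistent.
    return false;
  }
  const min = Math.min(...stopTimesSeconds);
  const max = Math.max(...stopTimesSeconds);
  // Spread between the earliest and latest stop across all runs.
  return max - min <= toleranceSeconds;
}

// Made-up readings, for illustration only.
const steady = [53.8, 54.1, 54.0, 53.9, 54.2];
const messy = [31, 88, 54, 140, 12];

console.log(looksLikeATimer(steady)); // true
console.log(looksLikeATimer(messy)); // false
```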

The thing I should have checked first

Here is the part I am a little embarrassed about. I was testing on a development client: the build you run during day-to-day work, with the live-reload connection and the debugger attached. It keeps a socket open to your laptop. When the phone locks and the network winds down, that socket eventually gives up, and the dev client does not handle losing it gracefully.

I built a preview version, the kind that behaves like the real shipped app, locked the screen, and walked away for ten minutes. The episode kept playing the whole time.

So the bug was never in the product. It was in the conditions I was testing under. The fix was to stop trusting the dev client for anything involving background behavior.

The real work that came out of it

Chasing a fake bug still left us better off, because while I was in that part of the code I found a genuine problem with the lock-screen controls.

The skip-forward and skip-back buttons on the lock screen were hard-coded to 30 and 15 seconds, respectively. But a user can change those intervals in settings, and we were ignoring their choice once the screen was off. So someone who set a 45-second skip would get 30 on the lock screen and 45 inside the app. Small, but the kind of small thing that makes an app feel unfinished.

We wired the lock-screen controls to read the same interval the user had picked, and to update live whenever they changed it.

While we were there, we had to make a real trade-off on iOS. The system gives you a limited set of remote commands, and we wanted both custom skip intervals and the next/previous behavior that headphone gestures expect. You cannot have everything. On iOS we chose skip intervals for the lock screen; on Android, where there was room for both, we kept next/previous alongside them. The cost is that a double-tap on the headphones no longer jumps to the next episode on iOS. We decided that was the lesser loss.
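Both decisions can be sketched as plain logic: one settings store that the in-app player and the lock screen both read, a subscription so the lock screen updates live, and a single function that encodes the per-platform trade-off. `SkipSettings`, `LockScreenCommand`, `commandsFor`, and `bindLockScreen` are hypothetical names; on a device, the `apply` callback would hand the list to whatever native remote-command layer the app uses, which this sketch does not model.

```typescript
type Platform = "ios" | "android";

// Description of one lock-screen control; not tied to any real native API.
type LockScreenCommand =
  | { kind: "skipForward"; seconds: number }
  | { kind: "skipBackward"; seconds: number }
  | { kind: "next" }
  | { kind: "previous" };

interface SkipIntervals {
  forwardSeconds: number;
  backwardSeconds: number;
}

// Hypothetical store: the single source of truth for skip intervals.
class SkipSettings {
  private listeners: Array<(value: SkipIntervals) => void> = [];

  constructor(private current: SkipIntervals) {}

  get(): SkipIntervals {
    return this.current;
  }

  set(next: SkipIntervals): void {
    this.current = next;
    // Notify every subscriber so the lock screen updates without a restart.
    this.listeners.forEach((listener) => listener(next));
  }

  subscribe(listener: (value: SkipIntervals) => void): () => void {
    this.listeners.push(listener);
    // Return an unsubscribe function that removes this listener.
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }
}

// The trade-off: iOS gets skip intervals only; Android also keeps next/previous.
function commandsFor(platform: Platform, intervals: SkipIntervals): LockScreenCommand[] {
  const skips: LockScreenCommand[] = [
    { kind: "skipBackward", seconds: intervals.backwardSeconds },
    { kind: "skipForward", seconds: intervals.forwardSeconds },
  ];
  if (platform === "ios") {
    return skips;
  }
  return [{ kind: "previous" }, ...skips, { kind: "next" }];
}

// Apply the current commands now, then re-apply on every settings change.
function bindLockScreen(
  platform: Platform,
  settings: SkipSettings,
  apply: (commands: LockScreenCommand[]) => void,
): () => void {
  apply(commandsFor(platform, settings.get()));
  return settings.subscribe((value) => apply(commandsFor(platform, value)));
}

// Usage: a user changes skip-forward from 30 to 45 seconds.
const settings = new SkipSettings({ forwardSeconds: 30, backwardSeconds: 15 });
let applied: LockScreenCommand[] = [];
const unbind = bindLockScreen("ios", settings, (commands) => {
  applied = commands;
});
settings.set({ forwardSeconds: 45, backwardSeconds: 15 });
console.log(JSON.stringify(applied));
// [{"kind":"skipBackward","seconds":15},{"kind":"skipForward","seconds":45}]
unbind();
```

Keeping the platform split in one pure function means the iOS trade-off lives in a single place, so revisiting it later touches one branch rather than every caller.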

What I keep from this

Two things stuck with me.

First, when a number is suspiciously round or steady, stop debugging the feature and start debugging the test setup. The conditions you measure under are part of the experiment, and a development build is not the same animal as the thing your users install.

Second, a wasted afternoon is rarely fully wasted if you spend it reading the right code. I went in to fix a problem that did not exist and came out having fixed one that did.

The episode plays with the screen off now. It always did. I just had to look at it the right way.