If you listen to (or make) podcasts, you are almost certainly aware of the recent industry-wide shift toward video. Or perhaps more accurately, the shift away from audio. The whole podcasting world is doing a pivot to video, and it’s apparent every place you look, and many places that you listen, that those two places are becoming the same place.
The mid-2010s “pivot to video” was one of the greatest media follies of the internet age. Extremely abridged version: In the midst of a relatively stable and profitable time for digital media, the biggest outlets on the web shifted their publishing focus away from written articles and toward videos. They were lured by social media companies’ promises of massive audience growth, but the promises were lies, and the pivot led straight off a cliff. Now, almost exactly a decade later, a similar shift is happening in the world of audio. More and more podcast companies, lured again by promises of massive audience growth, are pivoting to video.
Last week, The Guardian published an article by Fiona Sturges about the downsides of video-first podcasts. Sturges detailed how shows that lean too heavily on visuals can leave their audio listeners behind, with poor audio quality and visual references going un- or under-explained. Podcaster and audio critic Miranda Sawyer gave the quote of the article:
“The absence of images from radio and podcasting isn’t some failure of technology. These audio mediums have grown from a deep love of sound and its imaginative possibilities. When I hear people say the future of audio is essentially television, it makes me feel they never knew what was exciting about sound in the first place.”
That sentiment echoes Defector’s Alex Sujong Laughlin, who back in August published a terrific essay titled “The Future of Podcasting Is Here, And It Sucks.” Sujong Laughlin, who among other things is one of the minds behind Defector’s podcast sensation Normal Gossip, focused less on video podcasts’ degraded audio experience and more on the industry and its audience’s move away from high-touch, scripted storytelling (e.g. Radiolab, This American Life) and toward lower-difficulty, celebrity-infused chat shows (e.g. Armchair Expert, Good Hang).
After enumerating the economic trends pushing things in that direction, and the many jobs lost and midsize publishers acquired and destroyed along the way, Sujong Laughlin added her own take:
The chat shows are entertaining. They fill up my bathroom with agreeable noise when I’m showering, but often I wonder if I’ve turned myself into a grown-up iPad toddler, unable to cope with the silence of being alone, or to train my attention span on something that requires several hours’ commitment. A celebrity talk show has kept me company on walks, but it has never shocked me out of hazy malaise and made me feel more curious about the world, less alone, or in awe. I have never listened to one and been surprised in the ways that I was upon encountering This American Life, or Serial, or The Daily.
As harsh as that self-analysis may feel, I have to admit to having had similar thoughts. They’re not categorical or absolute for me, as I sense they aren’t for Sujong Laughlin; I don’t feel incapable of coping with silence, and well-made chat podcasts are often much more to me than agreeable noise. Furthermore, I can easily list a few chat shows that have recently made me feel more curious and less alone—for a good time, listen to McMansion Hell’s Kate Wagner expounding on the complex intersections between architecture and politics on a recent episode of Know Your Enemy. But for me, even the best chat show can’t hold a candle to the kind of stuff painstakingly made by veterans working at places like Serial, Higher Ground, Search Engine, Twenty Thousand Hertz, or 99% Invisible.
Of course, the media world doesn’t really care what I think, or what others in my little niche of audio aficionados may want. Spotify and YouTube are relentlessly pushing video onto podcast creators, Netflix is securing Ringer video podcasts for their own walled garden, and some of my favorite newer podcasts are incorporating more and more visuals into each episode they make. Modern production studios—including the one I just helped design and build!—are all “video ready,” with their rounded wood desks, tasteful backdrops, and low-profile microphone boom arms. It’s a video world, whether we wanted it or not.

A single episode of Strong Songs contains dozens of tracks and hundreds of edits, with nary a video file in sight.
The move toward video isn’t all-encompassing. Strong Songs, a high-effort, audio-only product if ever there was one, has found a big enough audience for me to make a living, and I don’t have any shareholders pushing me to embrace video in order to make the numbers go up. But I can’t entirely ignore the siren call of video, partly just because it offers new expressive possibilities for my creative work. I can’t feel too superior to the rest of the podcasting world, given that I’m planning to make more videos than ever in 2026, both for the Strong Songs Patreon and for YouTube. I think video is a vital part of online music education, and it’s also a fun new production challenge. But with each camera I install, each light I set up, each YouTube Short or Instagram Reel I record, I can feel it gently pulling at me. Come on, buddy, make more videos. Think of how many more people you could reach. It’s not an unpleasant feeling, which of course is the issue.
The pull is even more noticeable when it comes to my other podcast, Triple Click. That makes sense; Triple Click is a chat show, and so already a more natural fit for video than a scripted show like Strong Songs. And seemingly everywhere I look, I see snappy and engaging clips of my peers’ podcasts on social media, three or four people laughing into webcams as the edits ping-pong between them. Has Triple Click been leaving an untapped audience on the table by stubbornly remaining audio-only? Almost certainly.
After talking it through with my cohosts, I spent much of December investigating our options. Could I produce a video version of Triple Click without compromising the quality of the audio version? I tried out a few of the most popular podcast-specific web recording apps—Descript and Riverside chief among them—and spent far too long exploring harder-core video editing options in DaVinci Resolve. (Ask me about syncing Fusion animations to MIDI pitch data extracted from voiceover audio. Just kidding, don’t ask me about that!)
After many hours of experimentation, here’s what I found:
No, it is not possible for me to make a video version of Triple Click that meets the standards I’ve set for the audio version of the show.
Descript and Riverside are interesting apps. Riverside in particular seems like a much more effective option for remote podcast recording than more general-purpose chat apps like Zoom, which I’ve been using for the past couple of years.
Editing in both is (imo) notably slow and frustrating, particularly compared with pro audio applications like Logic or Pro Tools. It’s not even close.
Riverside and Descript are both larded to the gills with AI bloat: “producer” chatbots that routinely make mistakes, video and thumbnail image generators that puke out horrors, and so on.
The two apps offer decent transcription tools, and each episode is given a passable transcript as soon as it’s done processing. I would never publish the transcript, as it spells basically every name wrong and makes a lot of other mistakes. But it’s helpful for navigation while editing.
More usefully, Riverside generates a collection of pre-cut social clips like the ones you see in your social feeds. These are actually usable with very little additional editing, and they’re pretty slick.
After dispensing with the notion of producing full-length videos of Triple Click, I realized that we could have our cake and eat it too: make the audio version of the show as usual, and have Riverside clip out video excerpts to share on social media. And so we found a way to add some of the benefits (and fun) of video to Triple Click without compromising the audio product we’ve spent the past decade making.
I would imagine most of my podcasting contemporaries feel much as I do these days, like we have one foot in each world. I sit in the studio, making my audio-only music podcast while setting up cameras to record the process, conjuring my little soundscapes while adjusting the key light above my desk. I’m trying to be as thoughtful about the balance as possible, and not to let video’s undeniable charms tempt me into losing sight of what makes audio so powerful on its own.

Jad Abumrad, a man who gets it
In November, Radiolab founder Jad Abumrad guested on Strong Songs to talk about his new podcast series Fela Kuti: Fear No Man. In the course of our conversation, I asked what he, as one of podcasting’s great audio storytellers, thought of the industry’s recent swing toward video. His answer, lightly edited here for clarity, was a story unto itself:
As much as I am, like, a capital-A audio nerd, I don’t have a political feeling about audio versus video. It just happened to be the case that two and a half, three years ago when we started [Fela Kuti: Fear No Man], it was just, I mean, podcasting was audio. And there were only a few podcasts, the Joe Rogans and such, that were doing video.
And then in just three years, to see how quickly that’s changed, and how people are redefining the category! I can’t tell you how many people have asked me, “Oh, congratulations on the new Fela series. Where can I watch that?” I get that question every five minutes. I’m like, “You can’t watch it. It’s just, listen to it.”
(Me: “You listen; it’s just for your ears.”)
Yeah, it’s just for your ears.
You know, there’s a part of me that gets very sad because… maybe I do have political feelings about this. I mean, I do feel like the way that you listen, when you are just listening, it’s very special, you know? The way a voice gets in your imagination. Somehow, because of the lack of images, you fill it in with your own paintbrush, and it lives so much more warmly and more vividly in your mind. I really love that. I mean, even with chat shows. I love the way a voice can fill your consciousness, you know?
Video is multifaceted and exciting, and a good video is a wonderful thing. But music, voices, stories: those things exist on their own terms. They’ll bloom if we let them, alive in the spaces we make in our minds.
