It’s been over a year since ChatGPT and DALL-E 2 debuted, amazing the world. That’s all very good for the text and still-image people, but can AI help video editors? Yes, it can, in many concrete ways, which we list in this article. For now, we will concentrate on video editing; generative AI functions in animation, special effects, and video distribution will be covered in future pieces.
Of course, the term AI has a certain plasticity to it. Many of the following functions have been around for years and, unlike the most famous AI tools, did not necessarily require machine learning on existing data to be created. However, given their almost magical capacity to save time and headache, we are including them in the same bucket. The tool names (organized under the categories of desktop, web, and mobile apps) have links to more information on how to achieve these effects in the different applications.
In the old days, if you wanted to use a transcript to edit an interview, you would need to send the footage to a transcription service. Many days later it would come back and you would highlight what you wanted to include in the paper version. Finally, you would have an editing assistant come in and put together a rough cut. The whole process might take a week or two.
No more. With transcript-based editing, you can generate a transcript automatically, or upload one if you prefer. Then you can just highlight the quotes you want to include, and the software will give you an instant first cut. Veteran L.A.-based documentary editor Robert McFalls finds this tool useful: “I find that having the words in front of me helps focus on what’s being said, so I can craft a clearer statement,” but he cautions: “These edits don’t always work. The tone of the reading before the edit might not match what comes after.”
This feature has been around for years and is now fairly mature. It is available both within editing software and as part of standalone apps.
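To make the idea concrete, here is a minimal sketch of how highlighted quotes could be turned into a cut list. It assumes the transcript comes with a start and end time for each word; the `Word` class and the `cuts_from_highlights` function are hypothetical names made up for this illustration, not the API of any editing application.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds from the start of the clip
    end: float


def cuts_from_highlights(words, highlights):
    """Turn highlighted word ranges into (start, end) cuts, in highlight order.

    `highlights` is a list of (first_index, last_index) pairs, both inclusive,
    listed in the order the quotes should appear in the rough cut.
    """
    return [(words[first].start, words[last].end) for first, last in highlights]


# Hypothetical transcript with word-level timings.
words = [
    Word("We", 0.0, 0.2), Word("started", 0.2, 0.7),
    Word("in", 0.7, 0.8), Word("2010.", 0.8, 1.5),
    Word("It", 2.0, 2.1), Word("was", 2.1, 2.3), Word("hard.", 2.3, 2.9),
]

# The editor highlights the second sentence first, then the first one.
rough_cut = cuts_from_highlights(words, [(4, 6), (0, 3)])
print(rough_cut)
```

The result is only a list of in and out points. Whether the tone matches across each cut, as McFalls cautions, is still a judgment call for the editor.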
Remove Filler Words
Do your, ummm, video subjects like… insert verbal fillers into their… uhhh… conversation? And does it annoy you? Filler-word removers are the perfect way to delete meaningless filler sounds like ‘er’ and ‘umm’. All of a sudden, your interview subjects will seem a lot more articulate and concise.
In the same way that you can automatically remove filler words, you can automatically remove silences too. Run your podcast or audio VO through these applications and you will hear an immediate improvement in the pace. If, however, you have video footage, you will be left with some jump cuts where the silences were. Luckily, the next feature can make those jump cuts seem invisible.
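Under the hood, the logic can be quite simple once a transcript with word timings exists. The sketch below is an assumption about how such a tool might work, not a description of any particular product: the `FILLERS` list, `normalize`, and `keep_segments` are invented for this example.

```python
# Hypothetical filler list; a real tool would likely use a longer, tunable one.
FILLERS = {"um", "umm", "ummm", "uh", "uhh", "uhhh", "er", "erm"}


def normalize(token):
    """Lowercase a transcript token and strip surrounding punctuation."""
    return token.lower().strip(".,!?…\"'")


def keep_segments(words, fillers=FILLERS):
    """Return the (start, end) time ranges to keep after dropping fillers.

    `words` is a list of (text, start, end) tuples in playback order.
    Consecutive kept words are merged into a single segment.
    """
    segments = []
    previous_kept = False
    for text, start, end in words:
        if normalize(text) in fillers:
            previous_kept = False  # a filler ends the current segment
            continue
        if previous_kept:
            segments[-1] = (segments[-1][0], end)  # extend the open segment
        else:
            segments.append((start, end))  # open a new segment
        previous_kept = True
    return segments


words = [
    ("Do", 0.0, 0.2), ("your,", 0.2, 0.5), ("ummm,", 0.6, 1.4),
    ("video", 1.5, 1.9), ("subjects", 1.9, 2.5),
]
print(keep_segments(words))
```

Each gap between kept segments becomes a cut in the audio, and in the picture too if there is video.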
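Silence removal can be sketched the same way. The example below assumes the audio has already been measured as one loudness value per 100 ms window, in decibels relative to full scale; the function name, the -40 dB threshold, and the half-second minimum are all illustrative assumptions rather than settings taken from a real application.

```python
def find_silences(levels_db, window_ms=100, threshold_db=-40.0, min_silence_ms=500):
    """Find silent stretches in a list of per-window loudness readings.

    Returns (start_ms, end_ms) pairs for every run of windows quieter than
    `threshold_db` that lasts at least `min_silence_ms`.
    """
    silences = []
    run_start = None  # index of the first quiet window in the current run

    def close_run(end_index):
        if (end_index - run_start) * window_ms >= min_silence_ms:
            silences.append((run_start * window_ms, end_index * window_ms))

    for i, level in enumerate(levels_db):
        if level < threshold_db:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            close_run(i)
            run_start = None

    if run_start is not None:  # the recording ends on a quiet run
        close_run(len(levels_db))
    return silences


# Hypothetical readings: a 600 ms pause, then a 200 ms breath that is too short to cut.
levels = [-20, -22, -60, -58, -61, -59, -62, -63, -21, -19, -55, -57, -20]
print(find_silences(levels))
```

The minimum-length setting matters: cutting every short breath tends to make speech sound unnatural, so only longer pauses are flagged here.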
If you are conducting an interview and delete silences, filler words, or other unwanted material, what you will be left with are jump cuts: you will see the interview subject move abruptly from one position to another at the point of the cut. You can actually use this as a stylistic element. (Remember Ask a Ninja?) But if you would prefer not to, you can use a morph cut. What this feature does is shift the pixels in the image gradually, typically over 6 or 8 frames, hiding the jump.
In some circles, use of this transition is seen as disreputable. McFalls points out that “by making the cut seamless you’re saying, rather unequivocally, that the second phrase immediately follows the first. Which is not true.” But on balance, he doesn’t see an ethical problem: “Is it more truthful to add a cutaway to someone listening so you can make the compression? Personally, I don’t think so.”
Speaking of ethical quandaries, there was a big to-do recently when a documentary about the deceased Anthony Bourdain took some of his writings and voiced them using an AI-generated version of his voice. The director, Morgan Neville, went to an outside firm to create the voice he used, but now an even greater capability is available to you at low cost. With Descript, you can literally change the words that your on-camera subject is saying with a feature they call Overdub. Both the cloned voice and the lipsync for it will be created. Please don’t do anything deeply immoral with this feature.
Web only: Descript.
Many times when you have shot 4K footage and want to push in for a closer frame, you need to reframe the crop because the subject is moving. With auto-reframe, that process is automated. Just set the initial frame and this feature will adjust to maintain good composition on what it judges is the most important part of the shot. Of course, you have the capability to tweak the framing once it is done.
This feature is particularly helpful when you are trying to create different formats for pushing out to social media. With auto-reframe, you can take 16:9 footage and pull 9:16 and 4:5 clips from it far more quickly and easily.
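The geometry behind a reframe is simple arithmetic. Here is a rough sketch, assuming something else (a tracker or a person) has already supplied the subject’s position in the frame; `crop_window` is a hypothetical helper, and real auto-reframe tools also smooth the crop over time so it does not jitter.

```python
def crop_window(src_w, src_h, aspect_w, aspect_h, center_x, center_y):
    """Largest crop with the target aspect ratio, centered on the subject.

    Returns (left, top, width, height) in pixels. The crop is pushed back
    inside the frame if the subject is too close to an edge.
    """
    if src_w * aspect_h > src_h * aspect_w:
        # Source is wider than the target shape: use the full height.
        crop_h = src_h
        crop_w = src_h * aspect_w // aspect_h
    else:
        # Source is narrower than (or equal to) the target shape: use the full width.
        crop_w = src_w
        crop_h = src_w * aspect_h // aspect_w
    left = min(max(center_x - crop_w // 2, 0), src_w - crop_w)
    top = min(max(center_y - crop_h // 2, 0), src_h - crop_h)
    return left, top, crop_w, crop_h


# 4K (3840x2160) 16:9 source, subject near the right edge of the frame.
vertical = crop_window(3840, 2160, 9, 16, center_x=3500, center_y=1080)
# Same source, subject in the middle, cropped for a 4:5 feed post.
portrait = crop_window(3840, 2160, 4, 5, center_x=1920, center_y=1080)
print(vertical, portrait)
```

Running this for every frame, with the subject position updated as it moves, gives the moving crop that the feature automates.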
Given the litigiousness of our media environment, if you are creating a product video or producing documentary footage that might have sensitive connotations, you will need to blur the faces of any people from whom you have not procured a signed release. Sometimes you will need to blur other things, like a brand name or a visible ID.
There are two ways to do this. You can motion-track a face or object, and then apply a blur. This is the more traditional way. But some apps are also able to recognize the faces and blur them automatically.
Happily, this feature does not require either automobiles or waterfowl. Ducking simply means that when the voice track comes on, the music track is dipped – typically over a couple of frames – so that it doesn’t fight with the voice. Then when the narration goes silent again, the music comes back to the fore. To do this manually over a long piece can become annoying, but auto-ducking will do it automatically. However, if you’re fussy, it might not give you exactly what you want. Says McFalls: “I always end up changing the keyframes to my liking.”
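For the curious, ducking boils down to a volume envelope on the music track. The sketch below assumes we already know, frame by frame, whether the voice is active; `duck_gains` and its default values are invented for illustration. It reacts only once the voice has started, whereas a real tool may begin the dip slightly ahead of the first word.

```python
def duck_gains(voice_active, ducked_gain=0.25, ramp_frames=2):
    """Compute a music gain (1.0 = full volume) for each frame.

    `voice_active` holds one boolean per frame. The gain moves toward
    `ducked_gain` while the voice is on and back toward 1.0 when it is off,
    taking `ramp_frames` frames to travel the whole distance.
    """
    step = (1.0 - ducked_gain) / ramp_frames
    gains = []
    gain = 1.0
    for active in voice_active:
        target = ducked_gain if active else 1.0
        if gain > target:
            gain = max(target, gain - step)  # dip the music
        elif gain < target:
            gain = min(target, gain + step)  # bring the music back
        gains.append(gain)
    return gains


# Narration covers frames 1 to 3; music dips to half volume over two frames.
voice = [False, True, True, True, False, False, False]
print(duck_gains(voice, ducked_gain=0.5, ramp_frames=2))
```

Each value in the output corresponds to a keyframe on the music track, which is exactly what an editor like McFalls would then adjust by hand.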
The task of color grading has always been a little bit of a science and a little bit of an art. You will still need the artist’s eye for fine adjustment, but AI-assisted color grading can help a lot. Runway ML, a web-based suite of AI tools, has introduced a new feature that lets you describe a color grade in words and have it applied to your footage. Color.io goes a step further: you can upload an image sample of a color grade, and the software will pull a grade from it. Both these apps let you download a LUT file, which you can then upload to your editor for application and fine adjustments.
Descript has also developed an impressive sound clean-up function which they call Studio Sound.
Noise reduction, either by using frequency cut-offs or using a noise gate, has been around for years. But what Studio Sound can do that is truly remarkable is remove the reverb, echo, and lo-fi transmission effects from badly recorded sound and make it sound… well, not perfect, but substantially better.
It used to be that if you shot standard definition, you lusted after 1080p. And if you had 1080p, you wanted 4K. It was always about playing with the toys that gave you higher resolution, because once the video was shot you were stuck with it for good.
But what if you could take old, low-resolution footage and make it look like it was shot in higher resolution? Or blow up a portion of 4K footage beyond what you technically should? Today we have a number of remarkable web-based tools that let you up-res images by synthesizing plausible detail that was never there to begin with.
Not only can you upscale spatially, you can also upscale temporally. Even if you have shot something at a standard frame rate of 24 fps, modern software can interpolate between the frames you’ve shot, adding in-betweens that provide smooth slow motion without having shot at a high frame rate.
AI tools can do amazing things, and they will continue to get better. At IdeaRocket, we believe that humans will always be useful in translating a business need into a video solution. We use these and more traditional tools to make explainer videos for technology, healthcare, and other verticals. We also make some pretty good TV commercials for streaming, broadcast, or cable distribution. We work in 2D, 3D, whiteboard, motion graphics, and mixed media techniques, as well as live action. Contact us to learn more.