Separating Audio from Video with WebAssembly
Introduction
A few months ago we were building an audio transcription feature for one of my clients. We found that quite a lot of users wanted to upload video files as well as audio files directly from our upload form. Our transcription tool supported video, so we simply allowed more file types to be uploaded. During testing we then received mixed feedback from our users.
"My 3.5 GB video file took ages to upload."
"I tried uploading a large video, but my unstable internet connection forced me to start over three times."
This got us thinking.
Why are we even uploading the whole video file, when we only need the audio for the transcription?
By just uploading the audio, we could:
- increase speed and stability of the upload
- reduce storage costs
- speed up the transcription process
So how do we extract the audio in a performant way?
Enter WebAssembly
I have in the past often thought about the best use-cases for WebAssembly (WASM) and where its performance gains justify the higher implementation complexity. Computationally expensive algorithms as found in 3D, machine learning or games (e.g. chess) usually came to mind; and while I dabbled around with Rust and tried a few fun things, nothing was ever really worth it.
For those who haven’t kept up with WASM, it is a portable binary format that allows an application to run at near-native speed across different platforms. WebAssembly enables high-performance execution of code on web browsers by providing a compact, fast, and efficient binary instruction format that can be executed in the browser’s JavaScript engine.
WebAssembly is designed to be a compilation target for languages like C, C++, and Rust, allowing developers to write code in these languages and compile it into WASM. This binary format can then be loaded and executed by modern web browsers, making it possible to run computationally intensive applications directly in the browser without the need for plugins or external software.
By integrating with JavaScript, WASM enables seamless interaction between the binary-compiled code and the rest of the web application, allowing for powerful, high-performance web applications that were previously not feasible with traditional JavaScript alone.
FFmpeg and ffmpeg.wasm
If you ever need to convert audio or video, FFmpeg is your friend. It is a free, open-source command-line application and extremely versatile. To extract the audio from a video file, you just run the following:
ffmpeg -i input.mp4 -map a -q:a 0 output.mp3
The -map a option selects just the audio stream. With -q:a we define the quality of the audio; 0 is the highest setting.
This works just as expected locally. But how can we make use of it in the browser?
Luckily a pure WebAssembly/JavaScript port of FFmpeg exists with ffmpeg.wasm.
Front-End Implementation
Let’s use the WASM port to extract the audio directly in the browser. For this example we will be using Angular; the approach works similarly in any other front-end framework.
First we need to install the npm packages in our project:
npm i @ffmpeg/ffmpeg @ffmpeg/util
In our Angular component upload.component.ts we import the modules.
import { FFmpeg } from "@ffmpeg/ffmpeg";
import { fetchFile, toBlobURL } from "@ffmpeg/util";
Then we need to load and enable the three ffmpeg.wasm modules:
- ffmpeg-core.js for IO with the WASM module
- ffmpeg-core.wasm, the WASM module responsible for transcoding
- ffmpeg-core.worker.js, the JavaScript worker that enables running in the background
We set the base URL, initialize the loaded and ffmpeg variables, and define an async load() function that fetches the three modules as blobs and flags when loading is complete. We call this function only once we know the user wants to upload something, in order to prevent unnecessary traffic.
const baseURL = "https://unpkg.com/@ffmpeg/core-mt@0.12.6/dist/esm";

export class UploadComponent implements OnInit {
  loaded = false;
  ffmpeg = new FFmpeg();
  file?: File;
  // for preview purposes only
  audioUrl?: string;

  async load() {
    await this.ffmpeg.load({
      coreURL: await toBlobURL(`${baseURL}/ffmpeg-core.js`, "text/javascript"),
      wasmURL: await toBlobURL(
        `${baseURL}/ffmpeg-core.wasm`,
        "application/wasm"
      ),
      workerURL: await toBlobURL(
        `${baseURL}/ffmpeg-core.worker.js`,
        "text/javascript"
      ),
    });
    this.loaded = true;
  }

  async transcode() {
    // stub for later
  }
}
Within load() we can also register an event listener, which is triggered for every line of output FFmpeg prints. You can use that for debugging or, depending on the log level you set, to update a progress bar, show the remaining time in your UI, and so on.
this.ffmpeg.on("log", ({ message }) => {
console.info(message);
});
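Besides "log", the FFmpeg class in @ffmpeg/ffmpeg also emits a "progress" event whose payload contains a progress ratio between 0 and 1. A minimal sketch of turning that into a UI label follows; progressLabel is a hypothetical component property, and clamping is an assumption since reported ratios can occasionally fall slightly outside the 0–1 range.

```typescript
// Hypothetical helper: turn ffmpeg.wasm's progress ratio (0..1)
// into a percentage label for the UI.
export function formatProgress(progress: number): string {
  // Clamp defensively in case the reported ratio drifts outside 0..1.
  const clamped = Math.min(1, Math.max(0, progress));
  return `${Math.round(clamped * 100)}%`;
}

// Wire-up inside load(), following the @ffmpeg/ffmpeg event shape:
// this.ffmpeg.on("progress", ({ progress }) => {
//   this.progressLabel = formatProgress(progress);
// });
```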
Now we can define what we actually want to do with FFmpeg. We also wrap this in an async transcode() function. We assume the file that is supposed to be transcoded is already assigned to the file property. We load the file into the FFmpeg WASM environment using fetchFile() and writeFile(). The file will simply be named video with the actual file extension, e.g. video.mp4.
Once the file is available, we can run the same ffmpeg command as we would using the console application. During the execution the log event will be triggered to inform us about the status and once finished, we can load the output.mp3 file and pass it to our fictitious uploadFile() method.
async transcode() {
  if (!this.file) return;
  const fileName = `video.${this.file.type.split("/")[1]}`;
  // get and write the file to the wasm environment
  await this.ffmpeg.writeFile(fileName, await fetchFile(this.file));
  // execute the ffmpeg command with args
  const status = await this.ffmpeg.exec([
    "-i",
    fileName,
    "-q:a",
    "0",
    "-map",
    "a",
    "output.mp3",
  ]);
  console.log(status);
  // get and upload the file
  const fileData = await this.ffmpeg.readFile("output.mp3");
  this.uploadFile(fileData);
  // create an object URL for previewing the resulting audio
  const data = new Uint8Array(fileData as ArrayBuffer);
  this.audioUrl = URL.createObjectURL(
    new Blob([data.buffer], { type: "audio/mp3" })
  );
}
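One caveat in the code above: deriving the extension by splitting the MIME type works for video/mp4 or video/webm, but not for types like video/quicktime, where the usual extension is .mov. A small lookup with a fallback is safer; the mapping below is an illustrative sketch, not an exhaustive list, and extensionForMime is a hypothetical helper name.

```typescript
// Illustrative mapping from common video MIME types to file extensions
// for the virtual filename passed to ffmpeg.wasm.
const EXTENSION_BY_MIME: Record<string, string> = {
  "video/mp4": "mp4",
  "video/webm": "webm",
  "video/quicktime": "mov",
  "video/x-matroska": "mkv",
};

export function extensionForMime(mime: string): string {
  // Fall back to the MIME subtype; FFmpeg usually probes the
  // container format from the content anyway.
  return EXTENSION_BY_MIME[mime] ?? mime.split("/")[1];
}

// Usage in transcode():
// const fileName = `video.${extensionForMime(this.file.type)}`;
```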
Integrating it into the User Flow
- We need to call load() to set up the WASM environment. It makes sense to do this before the user selects a file to upload, or, if you only want to fetch the roughly 30 MB of modules when you are sure they are needed, once you know what type of file you are dealing with.
- We call transcode() and can follow along with what happens in the log.
- Once the file is successfully transcoded, we can upload it and preview it using the audioUrl in an <audio> tag.
<audio controls>
<source [src]="audioUrl" type="audio/mp3" />
</audio>
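Putting the flow together, the upload handler only needs to run the WASM transcode for video files; plain audio uploads can be sent as-is. A sketch under that assumption, with needsAudioExtraction and onFileSelected as hypothetical names:

```typescript
// Hypothetical guard: only extract audio when the upload is a video;
// audio files are already in the form we want to upload.
export function needsAudioExtraction(mime: string): boolean {
  return mime.startsWith("video/");
}

// Sketch of a file-input change handler in the component:
// async onFileSelected(event: Event) {
//   const input = event.target as HTMLInputElement;
//   this.file = input.files?.[0];
//   if (!this.file) return;
//   if (needsAudioExtraction(this.file.type)) {
//     if (!this.loaded) await this.load(); // fetch the WASM modules lazily
//     await this.transcode();
//   } else {
//     this.uploadFile(await fetchFile(this.file)); // audio: upload directly
//   }
// }
```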
Conclusion
If you only need audio, you can easily and efficiently extract it from a video file using ffmpeg.wasm. This saves a significant amount of time and resources depending on the size of your video content.