Documentation: https://github.com/openai/whisper
Overview
Whisper is simple to use but a little more demanding to set up. Its dependencies are shown below:
Whisper
│
├── PyTorch
│ └── Visual C++ Runtime
│
└── FFmpeg
Installation
Visual C++ Runtime
List the installed Visual C++ packages from PowerShell:
Get-CimInstance -ClassName Win32_Product | Where-Object {$_.Name -like "*Visual C++*"} | Select-Object Name
Confirm that you have the following installed:
- Microsoft Visual C++ 2022 X64 Minimum Runtime
- Microsoft Visual C++ 2022 X64 Additional Runtime
If not:
- Download and install the latest supported Visual C++ Redistributable (X64) from https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170
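If you prefer to script this check, the sketch below runs an equivalent PowerShell query from Python and reports which of the two runtimes are missing. It is a minimal sketch, assuming Windows with `powershell` available on PATH; the helper names `missing_runtimes` and `list_vcpp_products` are hypothetical, and the name matching is a simple case-insensitive substring test.

```python
import subprocess
from typing import Iterable, List

# Runtime names listed in this guide; matched as case-insensitive substrings.
REQUIRED_RUNTIMES = (
    "Microsoft Visual C++ 2022 X64 Minimum Runtime",
    "Microsoft Visual C++ 2022 X64 Additional Runtime",
)


def missing_runtimes(installed_names: Iterable[str]) -> List[str]:
    """Return the required runtimes that do not appear in installed_names."""
    lowered = [name.lower() for name in installed_names]
    return [
        required
        for required in REQUIRED_RUNTIMES
        if not any(required.lower() in name for name in lowered)
    ]


def list_vcpp_products() -> List[str]:
    """Windows only: ask PowerShell for installed products named like 'Visual C++'."""
    command = (
        "Get-CimInstance -ClassName Win32_Product | "
        "Where-Object { $_.Name -like '*Visual C++*' } | "
        "Select-Object -ExpandProperty Name"
    )
    completed = subprocess.run(
        ["powershell", "-NoProfile", "-Command", command],
        capture_output=True,
        text=True,
        check=True,
    )
    # One product name per output line; drop blank lines.
    return [line.strip() for line in completed.stdout.splitlines() if line.strip()]
```

On Windows, `print(missing_runtimes(list_vcpp_products()))` prints an empty list when both runtimes are found.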
PyTorch
To avoid conflicts with other Python packages, it is recommended to create a dedicated virtual environment for this project.
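A virtual environment can be created with the standard-library `venv` module. The sketch below does the same job as running `python -m venv .venv`; the `.venv` folder name and the helper `create_project_env` are illustrative choices, not requirements.

```python
import venv
from pathlib import Path


def create_project_env(env_dir: str = ".venv", with_pip: bool = True) -> Path:
    """Create a virtual environment using the standard-library venv module."""
    env_path = Path(env_dir)
    # EnvBuilder.create() writes pyvenv.cfg, the interpreter and site-packages;
    # with_pip=True also bootstraps pip into the new environment.
    venv.EnvBuilder(with_pip=with_pip).create(str(env_path))
    return env_path
```

On Windows, activate it with `.venv\Scripts\activate` before running the `pip` commands so the packages are installed into it.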
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
Confirm the installation:
python -c "import torch; print(torch.__version__)"
The expected output will be something like:
2.x.x+cpu
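The same check can be done without importing PyTorch by reading the installed package metadata. This is a minimal sketch using `importlib.metadata`; it assumes, as in the expected output above, that the CPU wheel's version string carries a `+cpu` label. The helper names are hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def is_cpu_build(torch_version: str) -> bool:
    """Assumes the CPU-only wheel reports a '+cpu' local version label."""
    return torch_version.endswith("+cpu")
```

`installed_version("torch")` returns `None` when PyTorch is missing, and the same helper works for `openai-whisper` once it is installed.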
FFmpeg
Whisper uses FFmpeg to decode and read audio files. Check whether FFmpeg is already installed:
ffmpeg -version
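From Python, `shutil.which` answers the same question: it returns the full path of `ffmpeg` if the executable is found on PATH, or `None` otherwise. A minimal sketch; `ffmpeg_version_line` is a hypothetical helper name.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version_line(executable: str = "ffmpeg") -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if it is not on PATH."""
    path = shutil.which(executable)
    if path is None:
        return None
    completed = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=True
    )
    lines = completed.stdout.splitlines()
    return lines[0] if lines else ""
```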
If the command is not recognized, you will need to install it:
Download ffmpeg-git-essentials.7z from https://www.gyan.dev/ffmpeg/builds/ and extract it (for example with 7-Zip).
Create a folder named ffmpeg in C:/ and move the contents of the extracted folder into it:
C:/
└── ffmpeg/
    ├── bin/
    │   ├── ffmpeg.exe
    │   ├── ffplay.exe
    │   └── ffprobe.exe
    ├── doc/
    └── LICENSE
Add FFmpeg to PATH:
Edit environment variables for your account > User variables > Path > Edit > New > C:\ffmpeg\bin
Then open a new terminal and run ffmpeg -version again to confirm.
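A mistyped PATH entry is easy to miss, so a quick Python check can confirm the directory is actually listed. This is a minimal sketch, assuming Windows-style `;`-separated PATH values; `dir_on_path` is a hypothetical helper that compares entries case-insensitively after normalizing slashes and trailing separators. Note that `os.environ` reflects the environment the Python process started with, so run it from a terminal opened after the change.

```python
import ntpath
import os
from typing import Optional


def dir_on_path(directory: str, path_value: Optional[str] = None, sep: str = ";") -> bool:
    """Check whether a Windows directory appears as an entry in a PATH string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")

    def norm(p: str) -> str:
        # ntpath handles Windows-style paths even when this runs on another OS:
        # normpath turns '/' into '\' and drops a trailing '\', normcase lowercases.
        return ntpath.normcase(ntpath.normpath(p.strip().strip('"')))

    target = norm(directory)
    return any(norm(entry) == target for entry in path_value.split(sep) if entry.strip())
```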
Whisper
Installation
pip install openai-whisper
Usage
whisper audio.mp3 --model base --language English
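When transcription is triggered from another Python program, the same command can be assembled and run with `subprocess`. A minimal sketch; `whisper_cli_args` and `run_whisper` are hypothetical helpers, and `--output_dir` is the Whisper CLI option that controls where the transcript files are written.

```python
import subprocess
from typing import List, Optional


def whisper_cli_args(
    audio_path: str,
    model: str = "base",
    language: Optional[str] = "English",
    output_dir: str = ".",
) -> List[str]:
    """Build the argument list for the whisper command-line tool."""
    args = ["whisper", audio_path, "--model", model]
    if language is not None:
        # Omitting --language lets Whisper detect the spoken language itself.
        args += ["--language", language]
    args += ["--output_dir", output_dir]
    return args


def run_whisper(audio_path: str, **kwargs) -> None:
    """Run the CLI; raises CalledProcessError if whisper exits with an error."""
    subprocess.run(whisper_cli_args(audio_path, **kwargs), check=True)
```

Passing `language=None` leaves out `--language`, so Whisper falls back to automatic detection.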
Available models
Whisper provides multiple models, allowing you to balance speed and accuracy:
tiny, base, small, medium, large, turbo
Larger models generally offer better accuracy at the cost of higher processing requirements.
I have tried it on several audio files with the base model, and it did a good job with English.
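Whisper can also be used as a Python library instead of through the CLI. The sketch below follows the `load_model`/`transcribe` usage shown in the Whisper README; the helper names and the restriction to the six model names listed here are illustrative choices (`whisper.available_models()` lists every name the installed version accepts, including variants). `fp16=False` is passed because the CPU-only PyTorch build runs in FP32 anyway.

```python
from typing import Optional

# The six model names listed in this guide (a deliberate, hypothetical restriction).
MODEL_SIZES = ("tiny", "base", "small", "medium", "large", "turbo")


def check_model_name(name: str) -> str:
    """Reject names outside the list above before triggering a model download."""
    if name not in MODEL_SIZES:
        raise ValueError(f"unknown model {name!r}; expected one of {MODEL_SIZES}")
    return name


def transcribe(audio_path: str, model_name: str = "base",
               language: Optional[str] = "en") -> str:
    """Transcribe one file with the Python API and return the plain text."""
    import whisper  # lazy import: the helper above works without Whisper installed

    model = whisper.load_model(check_model_name(model_name))
    # language=None lets Whisper detect the language; fp16=False suits CPU-only PyTorch.
    result = model.transcribe(audio_path, language=language, fp16=False)
    return result["text"]
```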
Language support
The same flexibility applies to languages. Whisper supports a wide range of languages, which can be explicitly specified or automatically detected, making it suitable for multilingual transcription workflows.
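Language detection is also exposed in the Python API. This sketch follows the detection example in the Whisper README: it looks at the first 30 seconds of audio and returns the most probable language code. The helper names are hypothetical, and `model_name` defaults to `base` only because that is the model used in this guide.

```python
from typing import Dict


def most_likely_language(probs: Dict[str, float]) -> str:
    """Pick the language code with the highest probability."""
    return max(probs, key=probs.get)


def detect_language(audio_path: str, model_name: str = "base") -> str:
    import whisper  # lazy import: needs openai-whisper (and FFmpeg) at call time

    model = whisper.load_model(model_name)
    # Detection only uses a 30-second window, so pad or trim the audio to that length.
    audio = whisper.pad_or_trim(whisper.load_audio(audio_path))
    mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)
    _, probs = model.detect_language(mel)
    return most_likely_language(probs)
```

When `model.transcribe` is called with `language=None`, the detected code is also returned as `result["language"]`.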
When used well, this tool can be a real game changer: hours of meetings, interviews, or conversations can be transcribed automatically and later summarised with the help of AI.
Versions used: Django 5.2, openai-whisper==20250625