Audio to Text with Python

Documentation: https://github.com/openai/whisper?utm_source=chatgpt.com

 

Overview

With the rise of AI, it has become increasingly difficult to understand what truly qualifies as artificial intelligence. Sometimes, it feels like a simple program with a few if statements and a for loop is being marketed as AI.

In the case of Whisper, however, it is safe to say that it genuinely uses artificial intelligence. Whisper relies on neural networks to recognise speech patterns and convert spoken audio into text.

 


Installation & Usage
 

pip install openai-whisper
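Whisper decodes audio through the ffmpeg command-line tool, so ffmpeg has to be installed separately from the pip package. A minimal sketch of a pre-flight check follows; the helper name check_whisper_setup is hypothetical and only uses the Python standard library.

```python
import importlib.util
import shutil


def check_whisper_setup():
    """Report whether the Whisper package and the ffmpeg binary can be found."""
    return {
        # True when the "whisper" module (installed by openai-whisper) is importable.
        "whisper": importlib.util.find_spec("whisper") is not None,
        # Whisper relies on the ffmpeg command-line tool to decode audio files.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


for name, found in check_whisper_setup().items():
    print(f"{name}: {'found' if found else 'missing'}")
```

If either entry is reported as missing, transcription will most likely fail before it starts.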



Used well, this tool can save a lot of time.
Long meetings, interviews, or conversations can be transcribed automatically and later summarised, all with the help of AI.

 

whisper audio.mp3 --model base --language English
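The same transcription can also be run from Python. The sketch below assumes a file named audio.mp3 in the current directory; the wrapper function transcribe_file is a hypothetical name, while load_model and transcribe come from the Whisper package.

```python
from pathlib import Path


def transcribe_file(audio_path, model_name="base", language=None):
    """Transcribe one audio file and return the recognised text."""
    path = Path(audio_path)
    if not path.is_file():
        raise FileNotFoundError(f"Audio file not found: {path}")

    # Imported here so the function can be defined even before Whisper is installed.
    import whisper

    # The model weights are downloaded the first time a model is loaded.
    model = whisper.load_model(model_name)
    # language=None leaves language detection to Whisper.
    result = model.transcribe(str(path), language=language)
    return result["text"].strip()


# Example call (assumes audio.mp3 exists):
# print(transcribe_file("audio.mp3", model_name="base", language="en"))
```

Keeping the call in a function makes it easy to reuse for a whole folder of recordings.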

 

 

Available models

Whisper provides multiple models, allowing you to balance speed and accuracy:

 

  • tiny
  • base
  • small
  • medium
  • ...

Larger models generally offer better accuracy at the cost of higher processing requirements. 
I tried it on several audio files with the base model, and it did a good job with English.
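One way to judge the speed and accuracy trade-off on your own recordings is to run the same file through more than one model and compare the results. This is a rough sketch, not a benchmark: the function name time_models is hypothetical, and the timings depend entirely on your hardware.

```python
import time


def time_models(audio_path, model_names=("tiny", "base")):
    """Transcribe one file with each model; return {name: (seconds, text)}."""
    results = {}
    if not model_names:
        return results

    # Imported here so the function can be defined even before Whisper is installed.
    import whisper

    for name in model_names:
        model = whisper.load_model(name)
        start = time.perf_counter()
        text = model.transcribe(audio_path)["text"]
        # Only the transcription itself is timed, not the model loading.
        results[name] = (time.perf_counter() - start, text.strip())
    return results


# Example call (assumes audio.mp3 exists):
# for name, (seconds, text) in time_models("audio.mp3").items():
#     print(f"{name}: {seconds:.1f}s  {text[:60]}")
```

Reading the transcripts side by side is usually the quickest way to decide whether a larger model is worth the extra processing time.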

 

Language support

The same flexibility applies to languages. Whisper supports a wide range of languages, which can be explicitly specified or automatically detected, making it suitable for multilingual transcription workflows.
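Automatic detection can also be run on its own, before transcribing. The sketch below follows the approach shown in the Whisper README: it looks at the first 30 seconds of the audio and picks the most probable language. The helper names most_likely_language and detect_language are hypothetical.

```python
def most_likely_language(probs):
    """Return the language code with the highest probability."""
    return max(probs, key=probs.get)


def detect_language(audio_path, model_name="base"):
    """Guess the spoken language from the first 30 seconds of a file."""
    # Imported here so the helpers can be defined even before Whisper is installed.
    import whisper

    model = whisper.load_model(model_name)
    # pad_or_trim cuts or pads the audio to the 30-second window the model expects.
    audio = whisper.pad_or_trim(whisper.load_audio(audio_path))
    mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)
    # detect_language returns a mapping of language codes to probabilities.
    _, probs = model.detect_language(mel)
    return most_likely_language(probs)


# Example call (assumes audio.mp3 exists):
# print(detect_language("audio.mp3"))
```

The returned value is a short language code, which can then be passed as the language when transcribing.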


Django 5.2
openai-whisper==20250625


17 Dec. 2025 | Last Updated: 18 Dec. 2025 | jaimedcsilva
