I have used OpenAI Whisper and faster-whisper to generate subtitles.
OpenAI Whisper is easy to install and set up locally, and I have tried various models.
Recently I tried to subtitle the French series En Therapie (In Therapy), which has 35 short episodes.
https://www.arte.tv/fr/videos/RC-020578/en-therapie/
Each episode is only 20 minutes long, so I thought OpenAI Whisper would be great for translating from French to English.
However, it failed dismally: constant regurgitation of repeated sentences. Throughout entire episodes it used “him” instead of “her”, and there were many misspellings. It would also fail whenever music was playing in the background.
I extracted the audio from the videos into small .wav and .mp3 files, but both formats failed.
I spent over a week trying to create suitable subtitles to no avail.
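For anyone repeating the audio extraction step described above, here is a minimal sketch of how it could be scripted. It assumes ffmpeg is installed and on the PATH; the function names and the file name in the usage line are hypothetical. Whisper already resamples its input to 16 kHz mono internally, so this mainly keeps the files small and is not expected to fix the background-music problem by itself.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(video_path: str, wav_path: str) -> list[str]:
    """Build an ffmpeg command that extracts 16 kHz mono 16-bit PCM audio."""
    return [
        "ffmpeg",
        "-y",                  # overwrite the output file if it already exists
        "-i", video_path,      # input video file
        "-vn",                 # drop the video stream
        "-ac", "1",            # downmix to a single (mono) channel
        "-ar", "16000",        # resample to 16 kHz
        "-c:a", "pcm_s16le",   # encode as 16-bit PCM (plain WAV)
        wav_path,
    ]


def extract_audio(video_path: str) -> Path:
    """Run ffmpeg and return the path of the extracted .wav file."""
    wav_path = Path(video_path).with_suffix(".wav")
    subprocess.run(build_ffmpeg_command(video_path, str(wav_path)), check=True)
    return wav_path


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    print(extract_audio("en_therapie_ep01.mp4"))
```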
Check out Voxtral Mini or Small if you have the GPU for it. It works really well on English, and it comes from a French company, so I would be surprised if it didn't handle French well too.
Thank you, Domi.
Whisper struggles with non-English languages and background noise. You might get better results using the larger models (medium/large) with a lower temperature setting to reduce the hallucinations and repetitions you're experiencing.
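As a rough sketch of the suggestion above, this is how a larger model and a low temperature could be set with the openai-whisper Python package. The helper names and the file name are hypothetical. Turning off `condition_on_previous_text` is an extra assumption that goes beyond the advice above: it is often tried against repetition loops, at the cost of less consistent text between audio windows. Passing a single temperature value also means there is no fallback to higher temperatures.

```python
def build_transcribe_options(source_language: str = "fr") -> dict:
    """Options aimed at reducing hallucinated and repeated output."""
    return {
        "task": "translate",                  # translate the speech into English
        "language": source_language,          # skip language auto-detection
        "temperature": 0.0,                   # greedy decoding, no higher-temperature fallback
        "condition_on_previous_text": False,  # assumption: may help with repetition loops
    }


def translate_to_english(audio_path: str, model_name: str = "medium") -> list[dict]:
    """Return Whisper segments (start, end, text) translated into English."""
    import whisper  # imported lazily; needs `pip install openai-whisper`

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, **build_transcribe_options())
    return result["segments"]


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    for segment in translate_to_english("en_therapie_ep01.wav"):
        print(f'{segment["start"]:.1f} -> {segment["end"]:.1f}: {segment["text"]}')
```

Whether this actually improves a given episode would need checking by ear; music-heavy scenes may still come out badly.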
Hey MysteriousSophon21,
I did use the medium and large models with both OpenAI Whisper and faster-whisper.
I did not consider lowering the temperature setting, though.
Thank you
I think Whisper is one of the best uses of machine learning.
Recognition in general is what machine learning is most powerful for: speech-to-text, OCR, etc.