We host the whisper timestamped application so that it can be run on our online workstations, either directly or through Wine.
Quick description of whisper timestamped:
Multilingual automatic speech recognition with word-level timestamps and confidence scores. Whisper is a set of multilingual, robust speech recognition models trained by OpenAI that achieve state-of-the-art results in many languages. Whisper models were trained to predict approximate timestamps on speech segments (most of the time with 1-second accuracy), but they cannot natively predict word timestamps. This repository proposes an implementation that predicts word timestamps and provides a more accurate estimation of speech segments when transcribing with Whisper models. In addition, a confidence score is assigned to each word and each segment.
Features:
- More accurate start/end estimation of speech segments
- Documentation available
- Confidence scores are assigned to each word
- If possible (without beam search), no additional inference steps are required to predict word timestamps (word alignment is done on the fly after each speech segment is decoded)
- Special care has been taken regarding memory usage
- Light installation for CPU
- Plot of word alignment
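As a rough illustration of how the word-level timestamps and confidence scores might be used, the sketch below flags words whose confidence falls under a threshold. It assumes the transcription result is a dictionary with a "segments" list, where each segment carries a "words" list of entries with "text", "start", "end" and "confidence" keys; check this layout against the project's own documentation before relying on it. The helper names, the threshold, and the sample data are hypothetical and hand-written, not real model output.

```python
from typing import Any, List, Tuple


def transcribe_with_word_timestamps(audio_path: str, model_name: str = "tiny") -> dict:
    """Transcribe a file with whisper-timestamped (the package must be installed)."""
    # Imported lazily so the helper below also works without the package.
    import whisper_timestamped as whisper

    audio = whisper.load_audio(audio_path)
    model = whisper.load_model(model_name, device="cpu")
    return whisper.transcribe(model, audio)


def low_confidence_words(result: dict, threshold: float = 0.5) -> List[Tuple[Any, Any, Any]]:
    """Return (text, start, end) for every word scored below the threshold."""
    flagged = []
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            # Words without a confidence value are treated as fully confident.
            if word.get("confidence", 1.0) < threshold:
                flagged.append((word["text"], word["start"], word["end"]))
    return flagged


# Hand-written sample in the ASSUMED result layout; not real model output.
sample_result = {
    "segments": [
        {
            "start": 0.0,
            "end": 1.6,
            "words": [
                {"text": "Hello", "start": 0.0, "end": 0.4, "confidence": 0.93},
                {"text": "word", "start": 0.5, "end": 0.8, "confidence": 0.41},
                {"text": "timestamps", "start": 0.9, "end": 1.6, "confidence": 0.88},
            ],
        }
    ]
}

if __name__ == "__main__":
    for text, start, end in low_confidence_words(sample_result):
        print(f"{start:.2f}s to {end:.2f}s: {text}")
```

With a real recording, the output of transcribe_with_word_timestamps("AUDIO.wav") could be passed to low_confidence_words in place of the sample; the file name here is only a placeholder.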
Programming Language: Python.
©2024. Winfy. All Rights Reserved.
By OD Group OU – Registry code: 1609791 – VAT number: EE102345621.