We host whisper timestamped so that you can run it in our online workstations, either with Wine or directly.


Quick description of whisper timestamped:

Multilingual automatic speech recognition with word-level timestamps and confidence scores. Whisper is a set of multilingual, robust speech recognition models trained by OpenAI that achieve state-of-the-art results in many languages. Whisper models were trained to predict approximate timestamps for speech segments (most of the time with 1-second accuracy), but they cannot natively predict word timestamps. This repository proposes an implementation that predicts word timestamps and provides a more accurate estimate of speech segment boundaries when transcribing with Whisper models. In addition, a confidence score is assigned to each word and each segment.

Features:
  • Start/end estimation of speech segments is more accurate than with the original Whisper models
  • Documentation available
  • Confidence scores are assigned to each word
  • When possible (that is, without beam search), no additional inference steps are required to predict word timestamps: word alignment is done on the fly after each speech segment is decoded
  • Special care has been taken regarding memory usage
  • Light installation for CPU
  • Plot of word alignment
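
To illustrate how the per-word timestamps and confidence scores might be used, here is a minimal sketch in Python. It assumes the transcription result is a dictionary with a "segments" list, where each segment holds a "words" list whose entries carry "text", "start", "end" and "confidence" keys, as the project's README describes. The helper name low_confidence_words, the threshold value and the sample data are hypothetical and only serve the illustration.

```python
def low_confidence_words(result, threshold=0.5):
    """Collect (text, start, end, confidence) for words scored below threshold.

    `result` is assumed to follow the whisper timestamped output layout:
    {"segments": [{"words": [{"text", "start", "end", "confidence"}, ...]}, ...]}
    """
    flagged = []
    for segment in result.get("segments", []):
        for word in segment.get("words", []):
            if word["confidence"] < threshold:
                flagged.append(
                    (word["text"], word["start"], word["end"], word["confidence"])
                )
    return flagged


# Hand-written sample standing in for a real transcription result.
# With the library installed, a real result would typically come from
# (per the project's README; the audio file name is a placeholder):
#   import whisper_timestamped as whisper
#   audio = whisper.load_audio("AUDIO.wav")
#   model = whisper.load_model("tiny", device="cpu")
#   result = whisper.transcribe(model, audio)
sample = {
    "segments": [
        {
            "text": " hello world",
            "start": 0.0,
            "end": 1.1,
            "confidence": 0.6,
            "words": [
                {"text": "hello", "start": 0.0, "end": 0.5, "confidence": 0.92},
                {"text": "world", "start": 0.62, "end": 1.1, "confidence": 0.31},
            ],
        }
    ]
}

for text, start, end, confidence in low_confidence_words(sample, threshold=0.5):
    print(f"{start:.2f}-{end:.2f}s  {text!r}  confidence={confidence:.2f}")
```

Flagging words this way could help a reviewer jump straight to the parts of a transcript that most likely need manual correction; the right threshold depends on the audio and the model size.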


Programming Language: Python.
Categories:
Machine Learning, LLM Inference

©2024. Winfy. All Rights Reserved.

By OD Group OU – Registry code: 1609791 – VAT number: EE102345621.