We host HiFi-GAN so that you can run it on our online workstations, either with Wine or directly.
Quick description of HiFi-GAN:
HiFi-GAN is a GAN-based neural vocoder designed to generate high-fidelity speech waveforms from mel spectrograms with exceptional efficiency. It introduces a generator architecture tailored to model the periodic structure of speech and a set of discriminators that focus on different scales and periods of the waveform to better capture naturalness. The model targets a sweet spot between sample quality and generation speed, outperforming many previous GAN vocoders while being far faster than typical autoregressive models. In experiments on LJSpeech, HiFi-GAN was shown to achieve mean opinion scores close to human recordings while synthesizing 22.05 kHz audio up to ~168× faster than real time on an NVIDIA V100 GPU. A smaller configuration trades a bit of quality for even higher speed and can run more than 13× faster than real time on CPU, making it suitable for deployment scenarios without powerful GPUs.
Features:
- High-fidelity neural vocoder that converts mel spectrograms to waveforms using a GAN architecture
- Multi-period and multi-scale discriminators to better capture periodicity and overall speech realism
- Very fast inference, running far faster than real time on modern GPUs and even on optimized CPU setups
- Multiple generator configurations (v1, v2, v3) to balance quality, speed, and model size
- Compatible with many TTS front ends such as Tacotron2 and Glow-TTS for end-to-end systems
- Open-source implementation with pretrained models and scripts for training, evaluation, and inference
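To make the "faster than real time" figures above concrete, here is a minimal Python sketch of the arithmetic behind them. It does not call HiFi-GAN itself; the function names are hypothetical helpers introduced only for illustration, and the 168× and 13× values are simply the approximate figures quoted in the description, which depend on hardware and model configuration.

```python
SAMPLE_RATE_HZ = 22_050  # sampling rate quoted in the description (22.05 kHz)


def real_time_factor(audio_seconds: float, synthesis_seconds: float) -> float:
    """Return how many times faster than real time a synthesis run was."""
    if synthesis_seconds <= 0:
        raise ValueError("synthesis_seconds must be positive")
    return audio_seconds / synthesis_seconds


def samples_per_second(speedup: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Convert a speed-up factor into audio samples generated per wall-clock second."""
    return speedup * sample_rate_hz


def synthesis_time(audio_seconds: float, speedup: float) -> float:
    """Estimate the wall-clock time needed to synthesize a clip at a given speed-up."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


if __name__ == "__main__":
    # Approximate speed-ups taken from the description (assumed, not measured here).
    for label, speedup in (("GPU (V100)", 168), ("CPU (small config)", 13)):
        rate = samples_per_second(speedup)
        seconds = synthesis_time(10.0, speedup)
        print(f"{label}: {rate:,.0f} samples/s, 10 s clip in {seconds:.3f} s")
```

At 168× real time and 22.05 kHz, this works out to 3,704,400 samples per second, so a 10-second clip would take roughly 0.06 seconds to synthesize; at 13×, the same clip would take roughly 0.77 seconds.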
Programming Language: Python.
©2024. Winfy. All Rights Reserved.
By OD Group OU – Registry code: 1609791 -VAT number: EE102345621.