Transformer Engine

We host Transformer Engine so that it can be run on our online workstations, either directly or through Wine.

Quick description of Transformer Engine:

Transformer Engine (TE) is a library for accelerating Transformer models on NVIDIA GPUs. It supports 8-bit floating point (FP8) precision on Hopper and Ada GPUs, which gives better performance and lower memory utilization in both training and inference. TE provides a collection of highly optimized building blocks for popular Transformer architectures, together with an automatic-mixed-precision-like API that can be used alongside your framework-specific code. It also includes a framework-agnostic C++ API that other deep learning libraries can integrate to enable FP8 support for Transformers. As the number of parameters in Transformer models continues to grow, training and inference for architectures such as BERT, GPT, and T5 become very memory- and compute-intensive. Most deep learning frameworks train in FP32 by default, but many deep learning models can reach full accuracy without it.
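To make the memory argument concrete, here is a rough back-of-the-envelope sketch in plain Python. It only counts the bytes needed to store model weights at each precision named in this description. The parameter count is a hypothetical figure chosen for illustration, and the helper name `weight_memory_gib` is not part of TE. Activations, gradients, and optimizer state are ignored, so real savings will differ.

```python
# Storage size, in bytes, of one value at each precision.
BYTES_PER_PARAM = {"FP32": 4, "FP16": 2, "BF16": 2, "FP8": 1}


def weight_memory_gib(num_params, precision):
    """Return the memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[precision] / 2**30


# Hypothetical model size, chosen only for illustration.
num_params = 1_000_000_000

for precision in BYTES_PER_PARAM:
    gib = weight_memory_gib(num_params, precision)
    print(f"{precision}: {gib:.2f} GiB")
```

Under these assumptions, FP8 weights take a quarter of the space of FP32 weights and half the space of FP16 or BF16 weights.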

Features:

Easy-to-use modules for building Transformer layers with FP8 support
Optimizations (e.g. fused kernels) for Transformer models
Support for FP8 on NVIDIA Hopper and NVIDIA Ada GPUs
Support for optimizations across all precisions (FP16, BF16) on NVIDIA Ampere GPU architecture generations and later
Documentation available
Examples included
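
As a sketch of how the FP8 modules and the autocast-style API fit together, the following follows the pattern of the project's published PyTorch quickstart (`te.Linear`, `recipe.DelayedScaling`, `te.fp8_autocast`). These names may differ between TE versions, so treat this as an illustration rather than a reference. The wrapper function `fp8_linear_forward`, the layer sizes, and the compute-capability check are assumptions added here so that the snippet degrades gracefully when TE or a suitable GPU is unavailable.

```python
def fp8_linear_forward():
    """Run one forward/backward pass through te.Linear.

    Returns the output shape, or None if the example cannot run here.
    """
    try:
        import torch
        import transformer_engine.pytorch as te
        from transformer_engine.common import recipe
    except Exception:
        # PyTorch or Transformer Engine is not installed or failed to load.
        return None
    if not torch.cuda.is_available():
        return None

    # Illustrative sizes only.
    in_features, out_features, batch = 768, 3072, 2048

    model = te.Linear(in_features, out_features, bias=True)
    inp = torch.randn(batch, in_features, device="cuda")

    # Assumption: compute capability 8.9 (Ada) or higher is used as a
    # simple proxy for FP8 support; older GPUs fall back to non-FP8 execution.
    use_fp8 = torch.cuda.get_device_capability() >= (8, 9)

    # FP8 recipe with delayed scaling, using the E4M3 format.
    fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)

    # Inside this context, supported TE modules run their matmuls in FP8.
    with te.fp8_autocast(enabled=use_fp8, fp8_recipe=fp8_recipe):
        out = model(inp)

    out.sum().backward()
    return tuple(out.shape)


shape = fp8_linear_forward()
print(shape)
```

The surrounding model code stays ordinary PyTorch. Only the layer type and the context manager change, which is what makes the API resemble automatic mixed precision.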

Programming Language: Python.
Categories:

Machine Learning, LLM Inference

By OD Group OU – Registry code: 1609791 – VAT number: EE102345621.