We have hosted the application KubeAI so that you can run it in our online workstations with Wine or directly.


Quick description of KubeAI:

Get inference running on Kubernetes: LLMs, Embeddings, Speech-to-Text. KubeAI serves an OpenAI-compatible HTTP API. Admins configure ML models using the Model Kubernetes Custom Resource. KubeAI can be thought of as a Model Operator (see Operator Pattern) that manages vLLM and Ollama servers.
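As a rough illustration of the Custom Resource approach, a Model manifest might look like the sketch below. The field names and values here are assumptions based on the operator pattern described above; check the CRD installed in your cluster before using them.

```yaml
# Hypothetical Model custom resource sketch -- verify field names
# against the KubeAI CRD actually installed in your cluster.
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: llama-3.1-8b-instruct
spec:
  features: [TextGeneration]
  url: ollama://llama3.1:8b   # KubeAI manages the Ollama server for this model
  engine: OLlama
  minReplicas: 0              # scale from zero
  maxReplicas: 3              # autoscale based on load
```

Applying such a manifest with `kubectl apply -f model.yaml` would hand the model's lifecycle (server pods, scaling) to the operator.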

Features:
  • Drop-in replacement for OpenAI with API compatibility
  • Serve top OSS models (LLMs, Whisper, etc.)
  • Multi-platform: CPU-only, GPU, coming soon: TPU
  • Scale from zero, autoscale based on load
  • Zero dependencies (does not depend on Istio, Knative, etc.)
  • Chat UI included (OpenWebUI)
  • Operates OSS model servers (vLLM, Ollama, FasterWhisper, Infinity)
  • Stream/batch inference via messaging integrations (Kafka, PubSub, etc.)
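Because the API is OpenAI-compatible, any OpenAI client can be pointed at the in-cluster KubeAI service. A minimal sketch in Go that builds a chat-completion request; the service hostname, URL path prefix, and model name are assumptions for illustration, not taken from this page:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// chatRequest mirrors the minimal OpenAI chat-completions payload.
type chatRequest struct {
	Model    string    `json:"model"`
	Messages []message `json:"messages"`
}

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// buildRequest assembles a POST to the OpenAI-compatible endpoint.
// The "/openai/v1" prefix is an assumption; check your KubeAI deployment.
func buildRequest(base, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []message{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost,
		base+"/openai/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Hypothetical in-cluster service address and model name.
	req, err := buildRequest("http://kubeai.kubeai.svc.cluster.local",
		"llama-3.1-8b-instruct", "Hello!")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path) // POST /openai/v1/chat/completions
}
```

Sending the request with `http.DefaultClient.Do(req)` returns the familiar OpenAI-style JSON response, so existing SDKs and tooling work unchanged.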


Programming Language: Go.
Categories:
Large Language Models (LLM), LLM Inference

©2024. Winfy. All Rights Reserved.

By OD Group OU – Registry code: 1609791 – VAT number: EE102345621.