We have hosted the application KubeAI so that you can run it in our online workstations with Wine or directly.


Quick description of KubeAI:

Get inference running on Kubernetes: LLMs, Embeddings, Speech-to-Text. KubeAI serves an OpenAI-compatible HTTP API. Admins configure ML models using the Model Kubernetes Custom Resource. KubeAI can be thought of as a Model Operator (see Operator Pattern) that manages vLLM and Ollama servers.
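As a rough illustration of the Custom Resource approach, a Model manifest might look like the sketch below. The field names and values here are assumptions based on the operator pattern described above; check the CRD installed in your cluster before using them.

```yaml
# Hypothetical Model custom resource sketch -- verify field names
# against the KubeAI CRD actually installed in your cluster.
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: llama-3.1-8b-instruct
spec:
  features: [TextGeneration]
  url: ollama://llama3.1:8b   # KubeAI manages the Ollama server for this model
  engine: OLlama
  minReplicas: 0              # scale from zero
  maxReplicas: 3              # autoscale based on load
```

Applying such a manifest with `kubectl apply -f model.yaml` would hand the model's lifecycle (server pods, scaling) to the operator.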

Features:
  • Drop-in replacement for OpenAI with API compatibility
  • Serve top OSS models (LLMs, Whisper, etc.)
  • Multi-platform: CPU-only, GPU, coming soon: TPU
  • Scale from zero, autoscale based on load
  • Zero dependencies (does not depend on Istio, Knative, etc.)
  • Chat UI included (OpenWebUI)
  • Operates OSS model servers (vLLM, Ollama, FasterWhisper, Infinity)
  • Stream/batch inference via messaging integrations (Kafka, PubSub, etc.)
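Because the API is OpenAI-compatible, any OpenAI client can be pointed at the in-cluster KubeAI service. A minimal sketch in Go that builds a chat-completion request; the service hostname, URL path prefix, and model name are assumptions for illustration, not taken from this page:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// chatRequest mirrors the minimal OpenAI chat-completions payload.
type chatRequest struct {
	Model    string    `json:"model"`
	Messages []message `json:"messages"`
}

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// buildRequest assembles a POST to the OpenAI-compatible endpoint.
// The "/openai/v1" prefix is an assumption; check your KubeAI deployment.
func buildRequest(base, model, prompt string) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []message{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost,
		base+"/openai/v1/chat/completions", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Hypothetical in-cluster service address and model name.
	req, err := buildRequest("http://kubeai.kubeai.svc.cluster.local",
		"llama-3.1-8b-instruct", "Hello!")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.Path) // POST /openai/v1/chat/completions
}
```

Sending the request with `http.DefaultClient.Do(req)` returns the familiar OpenAI-style JSON response, so existing SDKs and tooling work unchanged.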


Programming Language: Go.
Categories:
Large Language Models (LLM), LLM Inference

©2024. Winfy. All Rights Reserved.

By OD Group OU – Registry code: 1609791 – VAT number: EE102345621.