We have hosted the application CogVLM so that you can run it on our online workstations, either with Wine or directly.
Quick description of CogVLM:
CogVLM is an open-source visual-language model suite (together with its GUI-oriented sibling CogAgent) aimed at image understanding, visual grounding, and multi-turn dialogue, with optional agent actions on real UI screenshots. The flagship CogVLM-17B combines ~10B visual parameters with ~7B language parameters and supports 490x490 inputs; CogAgent-18B extends this to 1120x1120 and adds plan/next-action outputs plus grounded operation coordinates for GUI tasks. The repo provides multiple ways to run the models (CLI, web demo, and OpenAI-Vision-style APIs), along with quantization options that reduce VRAM needs (e.g., 4-bit); a quantized-loading sketch follows below. It includes checkpoints for chat, base, and grounding variants, plus recipes for model-parallel inference and LoRA fine-tuning. The documentation covers task prompts for general dialogue, visual grounding (box-to-caption, caption-to-box, caption plus boxes), and GUI agent workflows that produce structured actions with bounding boxes.
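As an illustration of the quantization path, here is a minimal sketch of loading the chat checkpoint in 4-bit through Hugging Face transformers and bitsandbytes. The checkpoint and tokenizer names (THUDM/cogvlm-chat-hf, lmsys/vicuna-7b-v1.5) are assumptions taken from the publicly documented usage and may differ across revisions; check the repo's own demo scripts for the exact invocation.

```python
# Minimal sketch: load CogVLM chat weights in 4-bit to reduce VRAM needs.
# Checkpoint/tokenizer names below are assumptions; verify them against the
# repository's demo scripts before relying on this.
import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer, BitsAndBytesConfig

tokenizer = LlamaTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")

model = AutoModelForCausalLM.from_pretrained(
    "THUDM/cogvlm-chat-hf",                                      # assumed HF chat checkpoint
    torch_dtype=torch.bfloat16,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),   # 4-bit weights via bitsandbytes
    trust_remote_code=True,                                      # CogVLM ships custom modeling code
    low_cpu_mem_usage=True,
).eval()
```

Prompt construction and generation then follow the model card or the repo's CLI/web demos.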
Features:
- Pretrained VLMs (CogVLM-17B) and GUI-capable CogAgent-18B
- Multi-turn image dialogue, visual grounding, and GUI action planning
- Ready-to-use demos: CLI (SAT/HF), Gradio web UI, and OpenAI-Vision-style API (see the API sketch after this list)
- Checkpoints for chat/base/grounding variants with prompt templates
- Quantization support (4-bit/8-bit) and model-parallel inference on multi-GPU
- Fine-tuning examples (e.g., CAPTCHA) and evaluation scripts
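Because the repo exposes an OpenAI-Vision-style API demo, a request can be sketched with the standard openai Python client. The base URL, API key, and model id below are placeholders, not values taken from the repository; use whatever the local demo server reports when it starts.

```python
# Sketch of querying a locally hosted OpenAI-Vision-style endpoint.
# base_url, api_key, and model id are placeholders/assumptions.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="not-needed")

with open("screenshot.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="cogvlm-chat-17b",  # placeholder model id
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image and box the main object."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
    max_tokens=512,
)
print(response.choices[0].message.content)
```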
Programming Languages: Python, Unix Shell.