We host the application CogVLM so that it can be run on our online workstations, either through Wine or directly.


Quick description of CogVLM:

CogVLM is an open-source visual-language model suite, together with its GUI-oriented sibling CogAgent, aimed at image understanding, grounding, and multi-turn dialogue, with optional agent actions on real UI screenshots. The flagship CogVLM-17B combines ~10B visual parameters with ~7B language parameters and supports 490×490 inputs; CogAgent-18B extends this to 1120×1120 and adds plan/next-action outputs plus grounded operation coordinates for GUI tasks. The repo provides multiple ways to run models (CLI, web demo, and OpenAI-Vision-style APIs), along with quantization options that reduce VRAM needs (e.g., 4-bit). It includes checkpoints for chat, base, and grounding variants, plus recipes for model-parallel inference and LoRA fine-tuning. The documentation covers task prompts for general dialogue, visual grounding (box→caption, caption→box, caption+boxes), and GUI agent workflows that produce structured actions with bounding boxes.
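As a sketch of how an OpenAI-Vision-style API is typically called, the snippet below builds a request payload in the OpenAI chat-completions format with a base64-encoded image. The model name and field layout here are assumptions based on the general OpenAI schema, not taken from the CogVLM repo; the actual demo server may expect slightly different fields.

```python
import base64

def build_vision_request(image_path: str, question: str,
                         model: str = "cogvlm-chat-17b") -> dict:
    """Build an OpenAI-Vision-style chat payload with a base64-encoded image.

    NOTE: the model name and payload layout follow the generic OpenAI
    chat-completions schema; check the repo's API docs for the exact format
    its server expects.
    """
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
        "max_tokens": 256,
    }
```

The resulting dict can then be POSTed as JSON to whatever endpoint the web demo exposes.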

Features:
  • Pretrained VLMs (CogVLM-17B) and GUI-capable CogAgent-18B
  • Multi-turn image dialogue, visual grounding, and GUI action planning
  • Ready-to-use demos: CLI (SAT/HF), Gradio web UI, and OpenAI-Vision-style API
  • Checkpoints for chat/base/grounding variants with prompt templates
  • Quantization support (4-bit/8-bit) and model-parallel inference on multi-GPU
  • Fine-tuning examples (e.g., CAPTCHA) and evaluation scripts


Programming Language: Python, Unix Shell.
Categories:
Large Language Models (LLM), AI Models

©2024. Winfy. All Rights Reserved.

By OD Group OU – Registry code: 1609791 – VAT number: EE102345621.