4M

We host 4M so that you can run it on our online workstations, either with Wine or directly.

Quick description of 4M:

4M is a training framework for "any-to-any" vision foundation models that uses tokenization and masking to scale across many modalities and tasks. The same model family can classify, segment, detect, caption, and even generate images, with a single interface for both discriminative and generative use. The repository releases code and models for multiple variants (e.g., 4M-7 and 4M-21), emphasizing transfer to unseen tasks and modalities. Its training and inference configs, together with the issue tracker, cover topics such as depth tokenizers, input masks for generation, and CUDA build questions, which signals active research iteration. The design leans into flexibility and steerability, so prompts and masks can shape behavior without bespoke heads per task. In short, 4M provides a unified recipe to pretrain large multimodal models that generalize broadly while remaining practical to fine-tune.
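To make the "tokenization and masking" idea concrete, here is a minimal sketch of how one training example could be split into visible input tokens and hidden target tokens across several modalities. It is an illustration only and does not use the 4M codebase: the function name sample_input_target_masks, the modality names, the token counts, and the uniform sampling are all assumptions made for this example, not the repository's actual API or sampling scheme.

```python
import random


def sample_input_target_masks(token_counts, num_input, num_target, seed=0):
    """Pick disjoint input and target token positions across modalities.

    token_counts maps a (hypothetical) modality name to how many discrete
    tokens that modality contributes after tokenization.
    """
    rng = random.Random(seed)

    # Every token is identified by a (modality, index) pair.
    positions = [
        (modality, index)
        for modality, count in token_counts.items()
        for index in range(count)
    ]
    if num_input + num_target > len(positions):
        raise ValueError("token budget exceeds the number of available tokens")

    # Uniform shuffling is an assumption made for simplicity.
    rng.shuffle(positions)
    inputs = sorted(positions[:num_input])
    targets = sorted(positions[num_input:num_input + num_target])
    return inputs, targets


# Hypothetical token counts for three modalities.
token_counts = {"rgb": 16, "depth": 16, "caption": 8}
inputs, targets = sample_input_target_masks(token_counts, num_input=12, num_target=6)

print("visible input tokens:", inputs)
print("tokens to predict:", targets)
```

In this sketch the model would see only the input positions and be trained to predict the target positions. Because either set can fall in any modality, the same objective covers many input-to-output combinations.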

Features:

Any-to-any modeling across diverse vision tasks
Masked modeling with unified tokenization for multiple modalities
Released model families (e.g., 4M-7, 4M-21) with training/eval code
Promptable and steerable behavior without task-specific heads
Transfer to unseen tasks and modalities from a single backbone
Research-grade configs and examples for reproduction

Programming Language: Python.
Categories:

AI Models

By OD Group OU – Registry code: 1609791 - VAT number: EE102345621.