Model Gallery

Discover and install AI models from our curated collection

4 models available

1 repositories

Documentation

Find Your Perfect Model

Filter by Model Type

Browse by Tags

qwen3-vl-30b-a3b-instruct

Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date. This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities. Available in Dense and MoE architectures that scale from edge to cloud, with Instruct and reasoning‑enhanced Thinking editions for flexible, on-demand deployment. #### Key Enhancements: * **Visual Agent**: Operates PC/mobile GUIs—recognizes elements, understands functions, invokes tools, completes tasks. * **Visual Coding Boost**: Generates Draw.io/HTML/CSS/JS from images/videos. * **Advanced Spatial Perception**: Judges object positions, viewpoints, and occlusions; provides stronger 2D grounding and enables 3D grounding for spatial reasoning and embodied AI. * **Long Context & Video Understanding**: Native 256K context, expandable to 1M; handles books and hours-long video with full recall and second-level indexing. * **Enhanced Multimodal Reasoning**: Excels in STEM/Math—causal analysis and logical, evidence-based answers. * **Upgraded Visual Recognition**: Broader, higher-quality pretraining is able to “recognize everything”—celebrities, anime, products, landmarks, flora/fauna, etc. * **Expanded OCR**: Supports 32 languages (up from 19); robust in low light, blur, and tilt; better with rare/ancient characters and jargon; improved long-document structure parsing. * **Text Understanding on par with pure LLMs**: Seamless text–vision fusion for lossless, unified comprehension. #### Model Architecture Updates: 1. **Interleaved-MRoPE**: Full‑frequency allocation over time, width, and height via robust positional embeddings, enhancing long‑horizon video reasoning. 2. **DeepStack**: Fuses multi‑level ViT features to capture fine-grained details and sharpen image–text alignment. 3. **Text–Timestamp Alignment:** Moves beyond T‑RoPE to precise, timestamp‑grounded event localization for stronger video temporal modeling. This is the weight repository for Qwen3-VL-30B-A3B-Instruct.

Repository: localaiLicense: apache-2.0

gemma-3-27b-it

Google/gemma-3-27b-it is an open-source, state-of-the-art vision-language model built from the same research and technology used to create the Gemini models. It is multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 models have a large, 128K context window, multilingual support in over 140 languages, and are available in more sizes than previous versions. They are well-suited for a variety of text generation and image understanding tasks, including question answering, summarization, and reasoning. Their relatively small size makes it possible to deploy them in environments with limited resources such as laptops, desktops or your own cloud infrastructure, democratizing access to state of the art AI models and helping foster innovation for everyone.

Repository: localaiLicense: gemma

qwen3-vlto-32b-instruct-i1

**Model Name:** Qwen3-VL-32B-Instruct (Text-Only Variant: Qwen3-VLTO-32B-Instruct) **Base Model:** Qwen/Qwen3-VL-32B-Instruct **Repository:** [mradermacher/Qwen3-VLTO-32B-Instruct-i1-GGUF](https://huggingface.co/mradermacher/Qwen3-VLTO-32B-Instruct-i1-GGUF) **Type:** Large Language Model (LLM) – Text-Only (Vision-Language model stripped of vision components) **Architecture:** Qwen3-VL, adapted for pure text generation **Size:** 32 billion parameters **License:** Apache 2.0 **Framework:** Hugging Face Transformers --- ### 🔍 **Description** This is a **text-only variant** of the powerful **Qwen3-VL-32B-Instruct** multimodal model, stripped of its vision components to function as a high-performance pure language model. The model retains the full text understanding and generation capabilities of its parent — including strong reasoning, long-context handling (up to 32K+ tokens), and advanced multimodal training-derived coherence — while being optimized for text-only tasks. It was created by loading the weights from the full Qwen3-VL-32B-Instruct model into a text-only Qwen3 architecture, preserving all linguistic and reasoning strengths without the need for image input. Perfect for applications requiring deep reasoning, long-form content generation, code synthesis, and dialogue — with all the benefits of the Qwen3 series, now in a lightweight, text-focused form. --- ### 📌 Key Features - ✅ **High-Performance Text Generation** – Built on top of the state-of-the-art Qwen3-VL architecture - ✅ **Extended Context Length** – Supports up to 32,768 tokens (ideal for long documents and complex tasks) - ✅ **Strong Reasoning & Planning** – Excels at logic, math, coding, and multi-step reasoning - ✅ **Optimized for GGUF Format** – Available in multiple quantized versions (IQ3_M, Q2_K, etc.) for efficient inference on consumer hardware - ✅ **Free to Use & Modify** – Apache 2.0 license --- ### 📦 Use Case Suggestions - Long-form writing, summarization, and editing - Code generation and debugging - AI agents and task automation - High-quality chat and dialogue systems - Research and experimentation with large-scale LLMs on local devices --- ### 📚 References - Original Model: [Qwen/Qwen3-VL-32B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-32B-Instruct) - Technical Report: [Qwen3 Technical Report (arXiv)](https://arxiv.org/abs/2505.09388) - Quantization by: [mradermacher](https://huggingface.co/mradermacher) > ✅ **Note**: The model shown here is **not the original vision-language model** — it's a **text-only conversion** of the Qwen3-VL-32B-Instruct model, ideal for pure language tasks.

Repository: localaiLicense: apache-2.0

qwen3-vlto-32b-thinking

**Model Name:** Qwen3-VLTO-32B-Thinking **Model Type:** Large Language Model (Text-Only) **Base Model:** Qwen/Qwen3-VL-32B-Thinking (vanilla Qwen3-VL-32B with vision components removed) **Architecture:** Transformer-based, 32-billion parameter model optimized for reasoning and complex text generation. ### Description: Qwen3-VLTO-32B-Thinking is a pure text-only variant of the Qwen3-VL-32B-Thinking model, stripped of its vision capabilities while preserving the full reasoning and language understanding power. It is derived by transferring the weights from the vision-language model into a text-only transformer architecture, maintaining the same high-quality behavior for tasks such as logical reasoning, code generation, and dialogue. This model is ideal for applications requiring deep linguistic reasoning and long-context understanding without image input. It supports advanced multimodal reasoning capabilities *in text form*—perfect for research, chatbots, and content generation. ### Key Features: - ✅ 32B parameters, high reasoning capability - ✅ No vision components — fully text-only - ✅ Trained for complex thinking and step-by-step reasoning - ✅ Compatible with Hugging Face Transformers and GGUF inference tools - ✅ Available in multiple quantization levels (Q2_K to Q8_0) for efficient deployment ### Use Case: Ideal for advanced text generation, logical inference, coding, and conversational AI where vision is not needed. > 🔗 **Base Model**: [Qwen/Qwen3-VL-32B-Thinking](https://huggingface.co/Qwen/Qwen3-VL-32B-Thinking) > 📦 **Quantized Versions**: Available via [mradermacher/Qwen3-VLTO-32B-Thinking-GGUF](https://huggingface.co/mradermacher/Qwen3-VLTO-32B-Thinking-GGUF) --- *Note: The original model was created by Alibaba’s Qwen team. This variant was adapted by qingy2024 and quantized by mradermacher.*

Repository: localaiLicense: apache-2.0