Model Gallery

Discover and install AI models from our curated collection

1109 models available
1 repository

qwen3-vl-30b-a3b-instruct
Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date. This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities. Available in Dense and MoE architectures that scale from edge to cloud, with Instruct and reasoning-enhanced Thinking editions for flexible, on-demand deployment.

#### Key Enhancements:

* **Visual Agent**: Operates PC/mobile GUIs—recognizes elements, understands functions, invokes tools, completes tasks.
* **Visual Coding Boost**: Generates Draw.io/HTML/CSS/JS from images/videos.
* **Advanced Spatial Perception**: Judges object positions, viewpoints, and occlusions; provides stronger 2D grounding and enables 3D grounding for spatial reasoning and embodied AI.
* **Long Context & Video Understanding**: Native 256K context, expandable to 1M; handles books and hours-long video with full recall and second-level indexing.
* **Enhanced Multimodal Reasoning**: Excels in STEM/Math—causal analysis and logical, evidence-based answers.
* **Upgraded Visual Recognition**: Broader, higher-quality pretraining lets it "recognize everything"—celebrities, anime, products, landmarks, flora/fauna, etc.
* **Expanded OCR**: Supports 32 languages (up from 19); robust in low light, blur, and tilt; better with rare/ancient characters and jargon; improved long-document structure parsing.
* **Text Understanding on par with pure LLMs**: Seamless text–vision fusion for lossless, unified comprehension.

#### Model Architecture Updates:

1. **Interleaved-MRoPE**: Full-frequency allocation over time, width, and height via robust positional embeddings, enhancing long-horizon video reasoning.
2. **DeepStack**: Fuses multi-level ViT features to capture fine-grained details and sharpen image–text alignment.
3. **Text–Timestamp Alignment**: Moves beyond T-RoPE to precise, timestamp-grounded event localization for stronger video temporal modeling.

This is the weight repository for Qwen3-VL-30B-A3B-Instruct.

Repository: localai · License: apache-2.0
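Once installed, the Instruct edition can be called through an OpenAI-compatible chat completions API. The sketch below is illustrative only: the base URL, the image URL, and the assumption that the model is served under its gallery name are not part of this entry and will vary with your deployment (e.g. a local LocalAI instance).

```python
# Minimal sketch: send an image plus a text prompt to a vision-language model
# through an OpenAI-compatible chat completions endpoint.
# Assumptions: a local server listens on http://localhost:8080 and the model
# was installed from this gallery as "qwen3-vl-30b-a3b-instruct".
import requests

BASE_URL = "http://localhost:8080/v1"  # assumed endpoint; adjust for your setup

payload = {
    "model": "qwen3-vl-30b-a3b-instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this chart and extract its key numbers."},
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
    "max_tokens": 512,
}

response = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=300)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```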

qwen3-vl-30b-a3b-thinking
Qwen3-VL-30B-A3B-Thinking is the reasoning-enhanced Thinking edition of the 30B-parameter MoE (A3B) model in the Qwen3-VL series.

Repository: localai · License: apache-2.0

qwen3-vl-4b-instruct
Qwen3-VL-4B-Instruct is the 4B-parameter model of the Qwen3-VL series.

Repository: localai · License: apache-2.0

qwen3-vl-32b-instruct
Qwen3-VL-32B-Instruct is the 32B-parameter model of the Qwen3-VL series.

Repository: localai · License: apache-2.0

qwen3-vl-4b-thinking
Qwen3-VL-4B-Thinking is the 4B-parameter model of the Qwen3-VL series in its reasoning-enhanced Thinking edition.

Repository: localai · License: apache-2.0

qwen3-vl-2b-thinking
Qwen3-VL-2B-Thinking is the 2B-parameter model of the Qwen3-VL series in its reasoning-enhanced Thinking edition.

Repository: localai · License: apache-2.0

qwen3-vl-2b-instruct
Qwen3-VL-2B-Instruct is the 2B-parameter model of the Qwen3-VL series.

Repository: localai · License: apache-2.0

huihui-qwen3-vl-30b-a3b-instruct-abliterated
These are quantizations of the model Huihui-Qwen3-VL-30B-A3B-Instruct-abliterated-GGUF

Repository: localai · License: apache-2.0

lfm2-vl-450m
LFM2-VL is Liquid AI's first series of multimodal models, designed to process text and images with variable resolutions. Built on the LFM2 backbone, it is optimized for low-latency and edge AI applications. We're releasing the weights of two post-trained checkpoints with 450M (for highly constrained devices) and 1.6B (more capable yet still lightweight) parameters.

* 2× faster inference speed on GPUs compared to existing VLMs while maintaining competitive accuracy
* Flexible architecture with user-tunable speed-quality tradeoffs at inference time
* Native resolution processing up to 512×512 with intelligent patch-based handling for larger images, avoiding upscaling and distortion

Repository: localai · License: lfm1.0

lfm2-vl-1.6b
LFM2-VL is Liquid AI's first series of multimodal models, designed to process text and images with variable resolutions. Built on the LFM2 backbone, it is optimized for low-latency and edge AI applications. We're releasing the weights of two post-trained checkpoints with 450M (for highly constrained devices) and 1.6B (more capable yet still lightweight) parameters.

* 2× faster inference speed on GPUs compared to existing VLMs while maintaining competitive accuracy
* Flexible architecture with user-tunable speed-quality tradeoffs at inference time
* Native resolution processing up to 512×512 with intelligent patch-based handling for larger images, avoiding upscaling and distortion

Repository: localai · License: lfm1.0

lfm2-1.2b
LFM2 is Liquid AI's generation of hybrid models designed for edge AI and on-device deployment, emphasizing quality, speed, and memory efficiency. LFM2-1.2B is the 1.2B-parameter text model of the series and the base for the task-specific LFM2-1.2B variants (Extract, RAG, Tool) listed below.

Repository: localai · License: lfm1.0

liquidai_lfm2-350m-extract
Based on LFM2-350M, LFM2-350M-Extract is designed to extract important information from a wide variety of unstructured documents (such as articles, transcripts, or reports) into structured outputs like JSON, XML, or YAML.

Use cases:

* Extracting invoice details from emails into structured JSON.
* Converting regulatory filings into XML for compliance systems.
* Transforming customer support tickets into YAML for analytics pipelines.
* Populating knowledge graphs with entities and attributes from unstructured reports.

You can find more information about other task-specific models in this blog post.

Repository: localai · License: lfm1.0

liquidai_lfm2-1.2b-extract
Based on LFM2-1.2B, LFM2-1.2B-Extract is designed to extract important information from a wide variety of unstructured documents (such as articles, transcripts, or reports) into structured outputs like JSON, XML, or YAML.

Use cases:

* Extracting invoice details from emails into structured JSON.
* Converting regulatory filings into XML for compliance systems.
* Transforming customer support tickets into YAML for analytics pipelines.
* Populating knowledge graphs with entities and attributes from unstructured reports.

Repository: localai · License: lfm1.0
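The extraction workflow maps naturally onto a chat completion whose response is parsed as structured data. The sketch below is a minimal illustration, assuming an OpenAI-compatible local server and the gallery model name; the prompting convention and the invoice email are invented for the example, so consult the model card for the recommended extraction format.

```python
# Minimal sketch: ask an extraction-tuned model to turn an unstructured email
# into JSON. The server URL and prompt layout are assumptions for illustration.
import json
import requests

BASE_URL = "http://localhost:8080/v1"  # assumed OpenAI-compatible endpoint

email_body = """Hi, please find attached invoice INV-2024-0042 from Acme GmbH,
total amount 1,250.00 EUR, due on 2024-11-30. Thanks!"""

payload = {
    "model": "liquidai_lfm2-1.2b-extract",
    "messages": [
        {
            "role": "user",
            "content": (
                "Extract the invoice number, vendor, total amount, currency, and "
                "due date from the following email as a JSON object:\n\n" + email_body
            ),
        }
    ],
    "temperature": 0,
}

response = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=120)
response.raise_for_status()
raw = response.json()["choices"][0]["message"]["content"]
invoice = json.loads(raw)  # assumes the model returns bare JSON; add fallback parsing if needed
print(invoice)
```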

liquidai_lfm2-1.2b-rag
Based on LFM2-1.2B, LFM2-1.2B-RAG is specialized in answering questions based on provided contextual documents, for use in RAG (Retrieval-Augmented Generation) systems.

Use cases:

* Chatbot to ask questions about the documentation of a particular product.
* Customer support with an internal knowledge base to provide grounded answers.
* Academic research assistant with multi-turn conversations about research papers and course materials.

Repository: localai · License: lfm1.0
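A minimal sketch of the RAG pattern this model targets: retrieved passages are placed in the context and the model answers only from them. Retrieval itself (vector search, chunking) is out of scope here, the documents are hard-coded, and the server URL and prompt layout are assumptions rather than part of this gallery entry.

```python
# Minimal RAG-style request: context documents go into the system prompt and
# the model is asked to answer strictly from them.
import requests

BASE_URL = "http://localhost:8080/v1"  # assumed OpenAI-compatible endpoint

documents = [
    "Doc 1: The widget ships with a 2-year limited warranty covering manufacturing defects.",
    "Doc 2: Warranty claims must be filed through the support portal within 30 days of failure.",
]
question = "How long is the warranty and how do I file a claim?"

payload = {
    "model": "liquidai_lfm2-1.2b-rag",
    "messages": [
        {
            "role": "system",
            "content": "Answer using only the provided documents. If the answer is not "
                       "in them, say so.\n\n" + "\n\n".join(documents),
        },
        {"role": "user", "content": question},
    ],
}

response = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=120)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```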

liquidai_lfm2-1.2b-tool
Based on LFM2-1.2B, LFM2-1.2B-Tool is designed for concise and precise tool calling. The key challenge was designing a non-thinking model that outperforms similarly sized thinking models for tool use.

Use cases:

* Mobile and edge devices requiring instant API calls, database queries, or system integrations without cloud dependency.
* Real-time assistants in cars, IoT devices, or customer support, where response latency is critical.
* Resource-constrained environments like embedded systems or battery-powered devices needing efficient tool execution.

Repository: localai · License: lfm1.0
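Tool calling is typically exercised through the OpenAI-style tools schema. The sketch below is a hypothetical illustration: the weather tool, server URL, and model-name mapping are assumptions, and the model card should be checked for its supported tool-call format.

```python
# Minimal sketch of OpenAI-style tool calling with a tool-use-tuned model.
import json
import requests

BASE_URL = "http://localhost:8080/v1"  # assumed OpenAI-compatible endpoint

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool, defined only for this example
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

payload = {
    "model": "liquidai_lfm2-1.2b-tool",
    "messages": [{"role": "user", "content": "What's the weather in Berlin right now?"}],
    "tools": tools,
}

response = requests.post(f"{BASE_URL}/chat/completions", json=payload, timeout=120)
response.raise_for_status()
message = response.json()["choices"][0]["message"]
for call in message.get("tool_calls", []):
    print(call["function"]["name"], json.loads(call["function"]["arguments"]))
```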

liquidai_lfm2-350m-math
Based on LFM2-350M, LFM2-350M-Math is a tiny reasoning model designed for tackling tricky math problems.

Repository: localai · License: lfm1.0

liquidai_lfm2-8b-a1b
LFM2 is a new generation of hybrid models developed by Liquid AI, specifically designed for edge AI and on-device deployment. It sets a new standard in terms of quality, speed, and memory efficiency. We're releasing the weights of our first MoE based on LFM2, with 8.3B total parameters and 1.5B active parameters. LFM2-8B-A1B is the best on-device MoE in terms of both quality (comparable to 3-4B dense models) and speed (faster than Qwen3-1.7B). Code and knowledge capabilities are significantly improved compared to LFM2-2.6B. Quantized variants fit comfortably on high-end phones, tablets, and laptops.

Repository: localai · License: lfm1.0

kokoro
Kokoro is an open-weight TTS model with 82 million parameters. Despite its lightweight architecture, it delivers comparable quality to larger models while being significantly faster and more cost-efficient. With Apache-licensed weights, Kokoro can be deployed anywhere from production environments to personal projects.

Repository: localai · License: apache-2.0
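A minimal usage sketch, assuming the local server exposes an OpenAI-compatible audio speech endpoint for installed TTS models; the base URL, voice name, and output format are assumptions and depend on your deployment.

```python
# Minimal sketch: synthesize speech through an assumed OpenAI-compatible
# /v1/audio/speech endpoint.
import requests

BASE_URL = "http://localhost:8080/v1"  # assumed endpoint

payload = {
    "model": "kokoro",
    "input": "Hello from a locally hosted text-to-speech model.",
    "voice": "af_heart",  # assumed voice id; list your model's available voices first
}

response = requests.post(f"{BASE_URL}/audio/speech", json=payload, timeout=120)
response.raise_for_status()
with open("speech.wav", "wb") as f:  # actual audio format depends on server configuration
    f.write(response.content)
```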

kitten-tts
Kitten TTS is an open-source realistic text-to-speech model with just 15 million parameters, designed for lightweight deployment and high-quality voice synthesis.

Repository: localai · License: apache-2.0

qwen-image
We are thrilled to release Qwen-Image, an image generation foundation model in the Qwen series that achieves significant advances in complex text rendering and precise image editing. Experiments show strong general capabilities in both image generation and editing, with exceptional performance in text rendering, especially for Chinese.

Repository: localai · License: apache-2.0
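Image generation models are commonly driven through an OpenAI-compatible images endpoint. The sketch below is illustrative only: the base URL, image size, and base64 response format are assumptions about a local deployment, not part of this gallery entry.

```python
# Minimal sketch: generate an image through an assumed OpenAI-compatible
# /v1/images/generations endpoint and save it to disk.
import base64
import requests

BASE_URL = "http://localhost:8080/v1"  # assumed endpoint

payload = {
    "model": "qwen-image",
    "prompt": "A storefront sign that reads 'Fresh Coffee' in both English and Chinese",
    "size": "1024x1024",
    "response_format": "b64_json",  # assumes the server supports base64 responses
}

response = requests.post(f"{BASE_URL}/images/generations", json=payload, timeout=600)
response.raise_for_status()
image_b64 = response.json()["data"][0]["b64_json"]
with open("qwen-image.png", "wb") as f:
    f.write(base64.b64decode(image_b64))
```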

qwen-image-edit
Qwen-Image-Edit is a model for image editing, which is based on Qwen-Image.

Repository: localai · License: apache-2.0
