Model Gallery

Discover and install AI models from our curated collection

11 models available
1 repository

gpt-oss-20b
Welcome to the gpt-oss series, OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. We're releasing two flavors of the open models:

- **gpt-oss-120b**: for production, general-purpose, high-reasoning use cases that fit into a single H100 GPU (117B parameters with 5.1B active parameters)
- **gpt-oss-20b**: for lower-latency, local, or specialized use cases (21B parameters with 3.6B active parameters)

Both models were trained on our harmony response format and should only be used with that format; they will not work correctly otherwise. This model card is dedicated to the smaller gpt-oss-20b model. Check out gpt-oss-120b for the larger model.

**Highlights**

- **Permissive Apache 2.0 license:** Build freely without copyleft restrictions or patent risk; ideal for experimentation, customization, and commercial deployment.
- **Configurable reasoning effort:** Easily adjust the reasoning effort (low, medium, high) to match your use case and latency needs (a sketch follows below).
- **Full chain-of-thought:** Gain complete access to the model's reasoning process, facilitating easier debugging and increased trust in outputs. It is not intended to be shown to end users.
- **Fine-tunable:** Fully customize the models to your specific use case through parameter fine-tuning.
- **Agentic capabilities:** Use the models' native capabilities for function calling, web browsing, Python code execution, and Structured Outputs.
- **Native MXFP4 quantization:** The models are trained with native MXFP4 precision for the MoE layer, letting gpt-oss-120b run on a single H100 GPU and gpt-oss-20b run within 16 GB of memory.
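As a concrete illustration of the configurable reasoning effort, here is a minimal sketch using the `transformers` pipeline. It assumes the published openai/gpt-oss-20b checkpoint, enough GPU memory for the MXFP4 weights, and that the reasoning level is selected through the system message as the harmony format describes.

```python
from transformers import pipeline

# Load the checkpoint with automatic dtype and device placement.
pipe = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    # Assumption: reasoning effort (low/medium/high) is set via the
    # system message, per the harmony response format.
    {"role": "system", "content": "Reasoning: high"},
    {"role": "user", "content": "Explain MXFP4 quantization in two sentences."},
]

outputs = pipe(messages, max_new_tokens=256)
print(outputs[0]["generated_text"][-1])
```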

Repository: localai · License: apache-2.0

gpt-oss-120b
Welcome to the gpt-oss series, OpenAI's open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. We're releasing two flavors of the open models:

- **gpt-oss-120b**: for production, general-purpose, high-reasoning use cases that fit into a single H100 GPU (117B parameters with 5.1B active parameters)
- **gpt-oss-20b**: for lower-latency, local, or specialized use cases (21B parameters with 3.6B active parameters)

Both models were trained on our harmony response format and should only be used with that format; they will not work correctly otherwise. This model card is dedicated to the larger gpt-oss-120b model. Check out gpt-oss-20b for the smaller model.

**Highlights**

- **Permissive Apache 2.0 license:** Build freely without copyleft restrictions or patent risk; ideal for experimentation, customization, and commercial deployment.
- **Configurable reasoning effort:** Easily adjust the reasoning effort (low, medium, high) to match your use case and latency needs.
- **Full chain-of-thought:** Gain complete access to the model's reasoning process, facilitating easier debugging and increased trust in outputs. It is not intended to be shown to end users.
- **Fine-tunable:** Fully customize the models to your specific use case through parameter fine-tuning.
- **Agentic capabilities:** Use the models' native capabilities for function calling, web browsing, Python code execution, and Structured Outputs.
- **Native MXFP4 quantization:** The models are trained with native MXFP4 precision for the MoE layer, letting gpt-oss-120b run on a single H100 GPU and gpt-oss-20b run within 16 GB of memory.

Repository: localai · License: apache-2.0

openai_gpt-oss-20b-neo
These are NEO Imatrix GGUFs, using the NEO dataset by DavidAU. The NEO dataset improves overall performance and is suited to all use cases. Example output below (creative), using the settings below. The model also passed a "hard" coding test (6 experts) with no issues (IQ4_NL): the model was forced to create code with no dependencies and limits on coding shortcuts, with multiple loops, in real time, with no blocking, in a language that does not normally support it. Due to quanting issues with this model (which result in oddball quant sizes / mixtures), only TESTED quants will be uploaded at the moment.

Repository: localai · License: apache-2.0

huihui-ai_huihui-gpt-oss-20b-bf16-abliterated
This is an uncensored version of unsloth/gpt-oss-20b-BF16 created with abliteration (see remove-refusals-with-transformers to learn more about it).

Repository: localai · License: apache-2.0

openai-gpt-oss-20b-abliterated-uncensored-neo-imatrix
These are NEO Imatrix GGUFs, using the NEO dataset by DavidAU. The NEO dataset improves overall performance and is suited to all use cases. This model uses Huihui-gpt-oss-20b-BF16-abliterated as a base, which DE-CENSORS the model and removes refusals. Example output below (creative; IQ4_NL), using the settings below. This model can be a little rough around the edges (due to abliteration); make sure you follow the settings below for best operation. It can also be creative, off-the-shelf crazy, and rational too. Enjoy!

Repository: localai · License: apache-2.0

meta-llama-3.1-8b-instruct:grammar-functioncall
This is the standard Llama 3.1 8B Instruct model with grammar-based function calling enabled. When grammars are enabled in LocalAI, the LLM's output is constrained by BNF grammars, forcing it to emit valid tool calls. This helps ensure that model outputs are well-formed and can be used in a production environment. For more information on how to use grammars in LocalAI, see https://localai.io/features/openai-functions/#advanced and https://localai.io/features/constrained_grammars/.
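To make the mechanics concrete, here is a minimal sketch of a function-call request against LocalAI's OpenAI-compatible API. The base URL assumes a default local install, and the `get_weather` tool is a hypothetical definition used only for illustration.

```python
from openai import OpenAI

# Assumption: LocalAI running locally on its default port.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# Hypothetical tool schema; the grammar constrains the model's output
# to a valid call against it.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct:grammar-functioncall",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)
```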

Repository: localai · License: llama3.1

meta-llama-3.1-8b-instruct:Q8_grammar-functioncall
This is the standard Llama 3.1 8B Instruct model (Q8 quant) with grammar-based function calling enabled. When grammars are enabled in LocalAI, the LLM's output is constrained by BNF grammars, forcing it to emit valid tool calls. This helps ensure that model outputs are well-formed and can be used in a production environment. For more information on how to use grammars in LocalAI, see https://localai.io/features/openai-functions/#advanced and https://localai.io/features/constrained_grammars/.
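The same constrained decoding can be used for JSON-mode style structured output. The sketch below assumes LocalAI honors the OpenAI `response_format` parameter (see the constrained_grammars documentation linked above) and reuses the illustrative local endpoint.

```python
import json

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# Assumption: with grammars enabled, requesting a JSON object constrains
# the model to emit well-formed JSON.
resp = client.chat.completions.create(
    model="meta-llama-3.1-8b-instruct:Q8_grammar-functioncall",
    messages=[{
        "role": "user",
        "content": "List three EU capitals as JSON with a 'capitals' array.",
    }],
    response_format={"type": "json_object"},
)

# With the grammar active, this parse should not fail on malformed output.
data = json.loads(resp.choices[0].message.content)
print(data)
```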

Repository: localai · License: llama3.1

gpt-oss-20b-esper3.1-i1
**Model Name:** gpt-oss-20b-Esper3.1
**Repository:** [ValiantLabs/gpt-oss-20b-Esper3.1](https://huggingface.co/ValiantLabs/gpt-oss-20b-Esper3.1)
**Base Model:** openai/gpt-oss-20b
**Type:** Instruction-tuned, reasoning-focused language model
**Size:** 20 billion parameters
**License:** Apache 2.0

---

### Overview

gpt-oss-20b-Esper3.1 is a specialized, instruction-tuned variant of the 20B open-source GPT model, developed by **Valiant Labs**. It excels in **advanced coding, software architecture, and DevOps reasoning**, making it ideal for technical problem-solving and AI-driven engineering tasks.

### Key Features

- **Expert in DevOps & Cloud Systems:** Trained on high-difficulty datasets (e.g., Titanium3, Tachibana3, Mitakihara), it delivers precise, actionable guidance for AWS, Kubernetes, Terraform, Ansible, Docker, Jenkins, and more.
- **Strong Code Reasoning:** Optimized for complex programming tasks, including full-stack development, scripting, and debugging.
- **High-Quality Inference:** Uses `bf16` precision for full-precision performance; quantized versions (e.g., GGUF) are available for efficient local inference.
- **Open-Source & Free to Use:** Fully open-access, built on the public gpt-oss-20b foundation and trained with community datasets.

### Use Cases

- Designing scalable cloud architectures
- Writing and optimizing infrastructure-as-code
- Debugging complex DevOps pipelines
- AI-assisted software development and documentation
- Real-time technical troubleshooting

### Getting Started

Use the standard `text-generation` pipeline with the `transformers` library. It supports role-based prompting (e.g., `user`, `assistant`) and performs best with high-reasoning prompts.

```python
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="ValiantLabs/gpt-oss-20b-Esper3.1",
    torch_dtype="auto",
    device_map="auto",
)
messages = [{"role": "user", "content": "Design a Kubernetes cluster for a high-traffic web app with CI/CD via GitHub Actions."}]
outputs = pipe(messages, max_new_tokens=2000)
print(outputs[0]["generated_text"][-1])
```

---

> **Model Gallery Entry**: *gpt-oss-20b-Esper3.1: a powerful, open-source 20B model tuned for expert-level DevOps, coding, and system architecture. Built by Valiant Labs using high-quality technical datasets. Perfect for engineers, architects, and AI developers.*

Repository: localai · License: apache-2.0

gpt-oss-20b-claude-4-distill-i1
**Model Name:** GPT-OSS 20B
**Base Model:** openai/gpt-oss-20b
**License:** Apache 2.0 (fully open for commercial and research use)
**Architecture:** 21B-parameter Mixture-of-Experts (MoE) language model

**Key Features:**

- Designed for powerful reasoning, agentic tasks, and developer applications.
- Supports configurable reasoning levels (Low, Medium, High) for balancing speed and depth.
- Native support for tool use: web browsing, code execution, function calling, and structured outputs.
- Trained on OpenAI's **harmony response format**; requires this format for proper inference.
- Optimized for efficient inference with native **MXFP4 quantization** (supports 16 GB VRAM deployment).
- Fully fine-tunable and compatible with major frameworks: Transformers, vLLM, Ollama, LM Studio, and more.

**Use Cases:** Ideal for research, local deployment, agent development, code generation, complex reasoning, and interactive applications.

**Original Model:** [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b)

*Note: This repository contains quantized versions (GGUF) by mradermacher, based on the original fine-tuned model from armand0e, which was derived from unsloth/gpt-oss-20b-unsloth-bnb-4bit.*
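Since this repository ships GGUF quantizations, one common way to run them locally is `llama-cpp-python`. A minimal sketch, assuming a downloaded quant file (the file name below is a hypothetical placeholder):

```python
from llama_cpp import Llama

# Hypothetical file name; substitute the quant you actually downloaded.
llm = Llama(
    model_path="./gpt-oss-20b-claude-4-distill.Q4_K_M.gguf",
    n_ctx=8192,       # context window; raise it if memory allows
    n_gpu_layers=-1,  # offload all layers to the GPU when available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize the MoE architecture in one paragraph."}],
)
print(out["choices"][0]["message"]["content"])
```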

Repository: localai · License: apache-2.0

financial-gpt-oss-20b-q8-i1
### Financial GPT-OSS 20B (Base Model)

**Model Type:** Causal Language Model (fine-tuned for financial analysis)
**Architecture:** Mixture of Experts (MoE), 20B parameters, 32 experts (4 active per token)
**Base Model:** `unsloth/gpt-oss-20b-unsloth-bnb-4bit`
**Fine-tuned With:** LoRA (Low-Rank Adaptation) on financial conversation data
**Training Data:** 22,250 financial dialogue pairs covering stocks (AAPL, NVDA, TSLA, etc.), technical analysis, risk assessment, and trading signals
**Context Length:** 131,072 tokens
**Quantization:** Q8_0 GGUF (for efficient inference)
**License:** Apache 2.0

**Key Features:**

- Specialized in financial market analysis: technical indicators (RSI, MACD), risk assessments, trading signals, and price forecasts
- Handles complex financial queries with structured, actionable insights
- Designed for real-time use with low-latency inference (GGUF format)
- Supports S&P 500 stocks and major asset classes across tech, healthcare, energy, and finance sectors

**Use Case:** Ideal for traders, analysts, and developers building financial AI tools. Use with caution; **not financial advice**.

**Citation:**

```bibtex
@misc{financial-gpt-oss-20b-q8,
  title={Financial GPT-OSS 20B Q8: Fine-tuned Financial Analysis Model},
  author={beenyb},
  year={2025},
  publisher={Hugging Face Hub},
  url={https://huggingface.co/beenyb/financial-gpt-oss-20b-q8}
}
```
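For the real-time, low-latency use the card highlights, token streaming is the usual pattern. This sketch assumes the model is served behind an OpenAI-compatible endpoint (for example, LocalAI) under the gallery name shown; adjust both to your setup.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# Stream tokens as they are generated to keep perceived latency low.
stream = client.chat.completions.create(
    model="financial-gpt-oss-20b-q8-i1",
    messages=[{
        "role": "user",
        "content": "Give an RSI/MACD read on a stock trading below its 50-day moving average.",
    }],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()  # Model output is analysis, not financial advice.
```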

Repository: localai · License: apache-2.0

metatune-gpt20b-r1.1-i1
**Model Name:** MetaTune-GPT20B-R1.1
**Base Model:** unsloth/gpt-oss-20b-unsloth-bnb-4bit
**Repository:** [EpistemeAI/metatune-gpt20b-R1.1](https://huggingface.co/EpistemeAI/metatune-gpt20b-R1.1)
**License:** Apache 2.0

**Description:** MetaTune-GPT20B-R1.1 is a large language model fine-tuned for recursive self-improvement, making it one of the first publicly released models capable of autonomously generating training data, evaluating its own performance, and adjusting its hyperparameters to improve over time. Built upon the open-weight GPT-OSS 20B architecture and trained with Unsloth's optimized 4-bit quantization, this model excels in complex reasoning, agentic tasks, and function calling. It supports tools like web browsing and structured output generation, and is particularly effective in high-reasoning use cases such as scientific problem-solving and math reasoning.

**Performance Highlights (Zero-shot):**

- **GPQA Diamond:** 93.3% exact match
- **GSM8K (Chain-of-Thought):** 100% exact match

**Recommended Use:**

- Advanced reasoning & planning
- Autonomous agent workflows
- Research, education, and technical problem-solving

**Safety Note:** Use with caution. For safety-critical applications, pair with a safety guardrail model such as [openai/gpt-oss-safeguard-20b](https://huggingface.co/openai/gpt-oss-safeguard-20b).

**Fine-Tuned From:** unsloth/gpt-oss-20b-unsloth-bnb-4bit
**Training Method:** Recursive Self-Improvement on the [Recursive Self-Improvement Dataset](https://huggingface.co/datasets/EpistemeAI/recursive_self_improvement_dataset)
**Framework:** Hugging Face TRL + Unsloth for fast, efficient training
**Inference Tip:** Set the reasoning level to "high" for best results and to reduce prompt injection risks.

[View on Hugging Face](https://huggingface.co/EpistemeAI/metatune-gpt20b-R1.1) | [GitHub: Recursive Self-Improvement](https://github.com/openai/harmony)

Repository: localai · License: apache-2.0