// model catalog

Validated Models.

No fixed catalog. Operators upload or register the model; sharing is per-project. Every model is served by vLLM behind an OpenAI-compatible /v1.

models
27
families
14
api
/v1, OpenAI-compatible

Xerotier has no fixed model catalog. The models below were uploaded or registered by their project owners and shared publicly. All run on vLLM behind an OpenAI-compatible API.

Gemma Models

Google's lightweight open language models

Models
gemma-3-4b-it
EA5D2652-3D78-4D47-A576-FCC3EA9879F6
XIM Only
Gemma 3 model card Model Page: Gemma Resources and Technical Documentation: [Gemma 3 Technical Report][g3-tech-report] [Responsible Generative AI Toolkit][rai-toolkit] [Gemma on Kaggle][kaggle-gemma] [Gemma on Vertex Model Garden][vertex-mg-gemma3] Terms of Use: [Terms][terms] Authors: Google DeepMind Model Information Summary description and brief definition of inputs and outputs. Description Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 has a large, 128K context window, multilingual support in over 140...
Parameters 4.3B
Context 15K
License Unknown
Architecture gemma3
gemma-3-12b-it
C92FB30F-F677-4F29-92D3-6026BD715F2D
XIM Only Not Deployed
Gemma 3 model card Model Page: Gemma Resources and Technical Documentation: [Gemma 3 Technical Report][g3-tech-report] [Responsible Generative AI Toolkit][rai-toolkit] [Gemma on Kaggle][kaggle-gemma] [Gemma on Vertex Model Garden][vertex-mg-gemma3] Terms of Use: [Terms][terms] Authors: Google DeepMind Model Information Summary description and brief definition of inputs and outputs. Description Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 has a large, 128K context window, multilingual support in over 140...
Parameters 12.2B
Context 32K
License Unknown
Architecture gemma3
gemma-3-27b-it
0422A634-819E-4F77-9899-3598D694BFE9
XIM Only Not Deployed
Gemma 3 model card Model Page: Gemma Resources and Technical Documentation: [Gemma 3 Technical Report][g3-tech-report] [Responsible Generative AI Toolkit][rai-toolkit] [Gemma on Kaggle][kaggle-gemma] [Gemma on Vertex Model Garden][vertex-mg-gemma3] Terms of Use: [Terms][terms] Authors: Google DeepMind Model Information Summary description and brief definition of inputs and outputs. Description Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 has a large, 128K context window, multilingual support in over 140...
Parameters 27.4B
Context Unknown
License Unknown
Architecture gemma3

GLM Models

Zhipu AI's general language models

Models
GLM-4.7-Flash
B0545E60-0048-421A-8EE8-093B126F7340
XIM Only Not Deployed
GLM-4.7-Flash πŸ‘‹ Join our Discord community. πŸ“– Check out the GLM-4.7 technical blog , technical report(GLM-4.5) . πŸ“ Use GLM-4.7-Flash API services on Z.ai API Platform. πŸ‘‰ One click to GLM-4.7 . Introduction GLM-4.7-Flash is a 30B-A3B MoE model. As the strongest model in the 30B class, GLM-4.7-Flash offers a new option for lightweight deployment that balances performance and efficiency. Performances on Benchmarks | Benchmark | GLM-4.7-Flash | Qwen3-30B-A3B-Thinking-2507 | GPT-OSS-20B | |--------------------|---------------|-----------------------------|-------------| | AIME 25 | 91.6 | 85.0 | 91.7 | | GPQA | 75.2 | 73.4 | 71.5 | | LCB v6 | 64.0 | 66.0 | 61.0 | | HLE | 14.4 | 9.8 | 10.9 | | SWE-bench Verified | 59.2 | 22.0 | 34.0 | | τ²-Bench | 79.5 | 49.0 | 47.7 | | BrowseComp | 42.8 |...
Parameters 3.8B
Context 202K
License MIT
Architecture glm4_moe_lite

Granite Models

IBM's enterprise-focused language models

Models
granite-4.1-8b
E304C47E-5880-45ED-830A-5F17F9F8619F
Shared
mof-class3-qualified Granite-4.1-8B Model Summary: Granite-4.1-8B is a 8B parameter long-context instruct model finetuned from Granite-4.1-8B-Base* using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. Granite 4.1 models have gone through an improved post-training pipeline, including supervised finetuning and reinforcement learning alignment, resulting in enhanced tool calling, instruction following, and chat capabilities. Developers: Granite Team, IBM HF Collection: Granite 4.1 Language Models HF Collection Technical Blog: Granite-4.1 Blog GitHub Repository: ibm-granite/granite-4.1-language-models Website: Granite Docs Release Date: April 29th, 2026 License: Apache 2.0 Supported Languages: English, German, Spanish,...
Parameters 11.6B
Context 49K
License Unknown
Architecture granite

Granite Models

IBM's enterprise-focused language models

Models
granite-4.0-h-small
F879E17F-5521-4CCD-A6B6-C47E0036834E
XIM Only Not Deployed
mof-class3-qualified Granite-4.0-H-Small πŸ“£ Update [10-07-2025]: Added a default system prompt* to the chat template to guide the model towards more professional, accurate, and safe* responses. Model Summary: Granite-4.0-H-Small is a 32B parameter long-context instruct model finetuned from Granite-4.0-H-Small-Base* using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction following (IF)* and tool-calling* capabilities, making them more effective in enterprise applications....
Parameters 11.6B
Context 131K
License Unknown
Architecture granitemoehybrid
granite-4.0-h-tiny
C29307CD-71DD-43D4-B3AC-A271BC80BB9B
XIM Only
mof-class3-qualified Granite-4.0-H-Tiny πŸ“£ Update [10-07-2025]: Added a default system prompt* to the chat template to guide the model towards more professional, accurate, and safe* responses. Model Summary: Granite-4.0-H-Tiny is a 7B parameter long-context instruct model finetuned from Granite-4.0-H-Tiny-Base* using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction following (IF)* and tool-calling* capabilities, making them more effective in enterprise applications. Developers:...
Parameters 1.8B
Context 131K
License Unknown
Architecture granitemoehybrid

Mixture of Experts

Mixture-of-experts architecture models

Models
LFM2-8B-A1B
7E32DB9D-FA97-40B4-A277-6F2D41E2C760
XIM Only
Try LFM β€’ Docs β€’ LEAP β€’ Discord LFM2-8B-A1B LFM2 is a new generation of hybrid models developed by Liquid AI, specifically designed for edge AI and on-device deployment. It sets a new standard in terms of quality, speed, and memory efficiency. We're releasing the weights of our first MoE based on LFM2, with 8.3B total parameters and 1.5B active parameters. LFM2-8B-A1B is the best on-device MoE in terms of both quality (comparable to 3-4B dense models) and speed (faster than Qwen3-1.7B). Code and knowledge capabilities are significantly improved compared to LFM2-2.6B. Quantized variants fit comfortably on high-end phones, tablets, and laptops. Find more information about LFM2-8B-A1B in our blog post. πŸ“„ Model details Due to their small size, we recommend fine-tuning LFM2 models on narrow...
Parameters 1.9B
Context 68K
License LFM Open License v1.0
Architecture lfm2_moe

Llama Models

Meta's open-weight large language models

Models
Llama-4-Scout-17B-16E-Instruct-quantized.w4a16
4103842E-F281-41AF-AB47-7409DCE49B01
XIM Only Not Deployed
Llama-4-Scout-17B-16E-Instruct-quantized.w4a16 Model Overview Model Architecture: Llama4ForConditionalGeneration Input: Text / Image Output: Text Model Optimizations: Activation quantization: None Weight quantization: INT4 Release Date: 04/25/2025 Version: 1.0 Validated on: RHOAI 2.20, RHAIIS 3.0, RHELAI 1.5 Model Developers: Red Hat (Neural Magic) Model Optimizations This model was obtained by quantizing weights of Llama-4-Scout-17B-16E-Instruct to INT4 data type. This optimization reduces the number of bits used to represent weights from 16 to 4, reducing GPU memory requirements by approximately 75%. Weight quantization also reduces disk size requirements by approximately 75%. The llm-compressor library is used for quantization. Deployment This model can be deployed efficiently on...
Parameters 22.2B
Context 10485K
License Unknown
Architecture llama4

Mistral Models

Mistral AI's efficient language models

Models
Ministral-3-14B-Reasoning-2512
85AC319C-D50B-45B2-B9C5-B3A0093A9885
XIM Only Popular
Ministral 3 14B Reasoning 2512 The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. A powerful and efficient language model with vision capabilities. This model is the reasoning post-trained version, trained for reasoning tasks, making it ideal for math, coding and stem related use cases. The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 14B can even be deployed locally, capable of fitting in 32GB of VRAM in BF16, and less than 24GB of RAM/VRAM when quantized. Learn more in our blog post and paper. Key Features Ministral 3 14B consists of two main architectural components: 13.5B Language Model 0.4B Vision...
Parameters 18.1B
Context 262K
License Unknown
Architecture mistral3
Ministral-3-3B-Reasoning-2512
973063DC-E3A7-42F8-A0F6-4F3043473E7B
XIM Only Popular
Ministral 3 3B Reasoning 2512 The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities. This model is the reasoning post-trained version, trained for reasoning tasks, making it ideal for math, coding and stem related use cases. The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 3B can even be deployed locally, fitting in 16GB of VRAM in BF16, and less than 8GB of RAM/VRAM when quantized. Learn more in our blog post and paper. Key Features Ministral 3 3B consists of two main architectural components: 3.4B Language Model 0.4B Vision Encoder The Ministral 3 3B Reasoning model offers the following capabilities: Vision: Enables the model to analyze images...
Parameters 4.7B
Context 99K
License Unknown
Architecture mistral3
Devstral-Small-2-24B-Instruct-2512
F7E5ED05-1378-4DD8-9D41-C62CD37404CB
XIM Only
Devstral Small 2 24B Instruct 2512 Devstral is an agentic LLM for software engineering tasks. Devstral Small 2 excels at using tools to explore codebases, editing multiple files and power software engineering agents. The model achieves remarkable performance on SWE-bench. This model is an Instruct model in FP8, fine-tuned to follow instructions, making it ideal for chat, agentic and instruction based tasks for SWE use cases. For enterprises requiring specialized capabilities (increased context, domain-specific knowledge, etc.), we invite companies to reach out to us. Key Features The Devstral Small 2 Instruct model offers the following capabilities: Agentic Coding: Devstral is designed to excel at agentic coding tasks, making it a great choice for software engineering agents....
Parameters 18.1B
Context 82K
License Unknown
Architecture mistral3

Community Models

User-shared models from the Xerotier community

Models
granite-embedding-311m-multilingual-r2
A223E2E0-A790-4776-B103-4FCE643525B5
Shared
Granite-Embedding-311M-Multilingual-R2 Model Summary: Granite-Embedding-311M-Multilingual-R2 is a 311M parameter dense embedding model from the Granite Embeddings collection for high-quality multilingual text embeddings. It produces 768-dimensional vectors with a context length of up to 32,768 tokens. The model supports 200+ languages (based on the multilingual pretraining corpus of the underlying encoder), with enhanced support for 52 languages and programming code that receive explicit retrieval-pair and cross-lingual training. All training data uses permissive, enterprise-friendly licenses, plus IBM-collected and IBM-generated datasets. Granite Embedding 311M Multilingual R2 shows strong performance across multilingual information retrieval benchmarks, code retrieval, long-document...
Parameters 610M
Context 8K
License Unknown
Architecture modernbert
granite-embedding-reranker-english-r2
3B871E57-ED1D-4845-868F-3D538F06B2D5
Shared
granite-embedding-reranker-english-r2 Model Summary: granite-embedding-reranker-english-r2_ is a 149M parameter dense cross-encoder model from the Granite Embeddings collection that can be used to generate high quality text embeddings. This model produces embedding vectors of size 768 based on context length of upto 8192 tokens. Compared to most other open-source models, this model was only trained using open-source relevance-pair datasets with permissive, enterprise-friendly license, plus IBM collected and generated datasets. The granite-embedding-reranker-english-r2_ model uses a cross-encoder architecture to compute high-quality relevance scores between queries and documents by jointly encoding their text, enabling precise reranking based on contextual alignment. The model is trained...
Parameters 285M
Context 8K
License Unknown
Architecture modernbert
granite-embedding-english-r2
C9776EBA-C4BF-4171-A376-7B43F0874EDE
XIM Only
Granite-Embedding-English-R2 Model Summary: Granite-embedding-english-r2 is a 149M parameter dense biencoder embedding model from the Granite Embeddings collection that can be used to generate high quality text embeddings. This model produces embedding vectors of size 768 based on context length of upto 8192 tokens. Compared to most other open-source models, this model was only trained using open-source relevance-pair datasets with permissive, enterprise-friendly license, plus IBM collected and generated datasets. The r2 models show strong performance across standard and IBM-built information retrieval benchmarks (BEIR, ClapNQ), code retrieval (COIR), long-document search benchmarks (MLDR, LongEmbed), conversational multi-turn (MTRAG), table retrieval (NQTables, OTT-QA, AIT-QA,...
Parameters 285M
Context 8K
License Unknown
Architecture modernbert

Community Models

User-shared models from the Xerotier community

Models
nomic-embed-text-v1.5
CB042730-1149-40C8-BDB1-7574E33DDC30
XIM Only
nomic-embed-text-v1.5: Resizable Production Embeddings with Matryoshka Representation Learning Blog | Technical Report | AWS SageMaker | Nomic Platform Exciting Update!: nomic-embed-text-v1.5 is now multimodal! nomic-embed-vision-v1.5 is aligned to the embedding space of nomic-embed-text-v1.5, meaning any text embedding is multimodal! Usage Important: the text prompt must* include a task instruction prefix, instructing the model which task is being performed. For example, if you are implementing a RAG application, you embed your documents as searchdocument: and embed your user queries as searchquery: . Notice: From transformers v5.5.0 and sentence transformers v5.3.0, trustremotecode=True will no longer be necessary. This will only be possible with the text-only series as of now. Task...
Parameters 160M
Context 2K
License Unknown
Architecture nomic_bert

Community Models

User-shared models from the Xerotier community

Models
gpt-oss-20b
A2CDFE3C-3F89-44C7-AD8D-D8AB6986E90D
XIM Only Not Deployed
Try gpt-oss Β· Guides Β· Model card Β· OpenAI blog Welcome to the gpt-oss series, OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. We’re releasing two flavors of these open models: gpt-oss-120b β€” for production, general purpose, high reasoning use cases that fit into a single 80GB GPU (like NVIDIA H100 or AMD MI300X) (117B parameters with 5.1B active parameters) gpt-oss-20b β€” for lower latency, and local or specialized use cases (21B parameters with 3.6B active parameters) Both models were trained on our harmony response format and should only be used with the harmony format as it will not work correctly otherwise. [!NOTE] This model card is dedicated to the smaller gpt-oss-20b model. Check out gpt-oss-120b for the larger model....
Parameters 4.3B
Context Unknown
License Apache-2.0
Architecture Unknown

Phi Models

Microsoft's compact high-performance models

Models
phi-4
96EFB1B3-84A1-4660-AF02-B4E25EDB2A5D
XIM Only Not Deployed
Phi-4 Model Card Phi-4 Technical Report Model Summary | | | |-------------------------|-------------------------------------------------------------------------------| | Developers | Microsoft Research | | Description | phi-4 is a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced reasoning. phi-4 underwent a rigorous enhancement and alignment process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures | | Architecture | 14B parameters, dense decoder-only Transformer...
Parameters 17.8B
Context 16K
License MIT
Architecture phi3
Phi-4-mini-instruct
A8F06DE3-8E0F-4684-804E-20BE19BBD37A
XIM Only Not Deployed
πŸŽ‰Phi-4: [mini-reasoning | reasoning] | [multimodal-instruct | onnx]; [mini-instruct | onnx] Model Summary Phi-4-mini-instruct is a lightweight open model built upon synthetic data and filtered publicly available websites - with a focus on high-quality, reasoning dense data. The model belongs to the Phi-4 model family and supports 128K token context length. The model underwent an enhancement process, incorporating both supervised fine-tuning and direct preference optimization to support precise instruction adherence and robust safety measures. πŸ“° Phi-4-mini Microsoft Blog πŸ“– Phi-4-mini Technical Report πŸ‘©β€πŸ³ Phi Cookbook 🏑 Phi Portal πŸ–₯️ Try It Azure, Huggingface πŸš€ Model paper Intended Uses Primary Use Cases The model is intended for broad multilingual commercial and research use. The model...
Parameters 6.1B
Context 131K
License MIT
Architecture phi3

Qwen Models

Alibaba Cloud's multilingual language models

Models
Qwen3-0.6B
BCEF18DA-1F3B-4543-ACD4-B00598CCBD0F
XIM Only
Qwen3-0.6B Qwen3 Highlights Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features: Uniquely support of seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within single model, ensuring optimal performance across various scenarios. Significantly enhancement in its reasoning capabilities, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code...
Parameters 781M
Context 40K
License Apache-2.0
Architecture qwen3
Qwen3-14B-AWQ
AC410C0A-F022-44CA-AE56-6CFED22E8E35
XIM Only Not Deployed
Qwen3-14B-AWQ Qwen3 Highlights Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features: Uniquely support of seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within single model, ensuring optimal performance across various scenarios. Significantly enhancement in its reasoning capabilities, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code...
Parameters 18.3B
Context 40K
License Apache-2.0
Architecture qwen3
Qwen3-8B
B59EBB93-175E-4B1B-9802-4A9AC213B795
XIM Only Not Deployed
Qwen3-8B Qwen3 Highlights Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features: Uniquely support of seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within single model, ensuring optimal performance across various scenarios. Significantly enhancement in its reasoning capabilities, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code...
Parameters 10.9B
Context 40K
License Apache-2.0
Architecture qwen3

Qwen Models

Alibaba Cloud's multilingual language models

Models
Qwen3.5-0.8B
714E40DE-1EC2-4F89-9AA0-A1E15E535C9A
XIM Only
Qwen3.5-0.8B Qwen Chat [!Note] This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. In light of its parameter scale, the intended use cases are prototyping, task-specific fine-tuning, and other research or development purposes. Over recent months, we have intensified our focus on developing foundation models that deliver exceptional utility and performance. Qwen3.5 represents a significant leap forward, integrating breakthroughs in multimodal learning, architectural efficiency, reinforcement learning scale, and global accessibility to empower developers and enterprises with unprecedented capability and efficiency....
Parameters 911M
Context 23K
License Apache-2.0
Architecture qwen3_5
Qwen3.6-27B
95B3468A-B3FD-4FB5-939A-A9BB693FB8F4
XIM Only
Qwen3.6-27B Qwen Chat [!Note] This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. Following the February release of the Qwen3.5 series, we're pleased to share the first open-weight variant of Qwen3.6. Built on direct feedback from the community, Qwen3.6 prioritizes stability and real-world utility, offering developers a more intuitive, responsive, and genuinely productive coding experience. Qwen3.6 Highlights This release delivers substantial upgrades, particularly in Agentic Coding: the model now handles frontend workflows and repository-level reasoning with greater fluency and precision. Thinking Preservation:...
Parameters 29.4B
Context 87K
License Apache-2.0
Architecture qwen3_5
Qwen3.5-27B-FP8
77BDCA79-81D6-4484-9CF4-93EE22B1B6DC
XIM Only Not Deployed
Qwen3.5-27B-FP8 Qwen Chat [!Note] This repository contains FP8-quantized model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. The quantization method is fine-grained fp8 quantization with block size of 128, and its performance metrics are nearly identical to those of the original model. Over recent months, we have intensified our focus on developing foundation models that deliver exceptional utility and performance. Qwen3.5 represents a significant leap forward, integrating breakthroughs in multimodal learning, architectural efficiency, reinforcement learning scale, and global accessibility to empower developers and enterprises with...
Parameters 29.4B
Context 262K
License Apache-2.0
Architecture qwen3_5

Qwen Models

Alibaba Cloud's multilingual language models

Models
Qwen3.5-35B-A3B
33F46EFB-365A-4437-B7AF-306F20EE8D16
XIM Only Popular
Qwen3.5-35B-A3B Qwen Chat [!Note] This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. [!Tip] For users seeking managed, scalable inference without infrastructure maintenance, the official Qwen API service is provided by Alibaba Cloud Model Studio. In particular, Qwen3.5-Flash is the hosted version corresponding to Qwen3.5-35B-A3B with more production features, e.g., 1M context length by default and official built-in tools. For more information, please refer to the User Guide. Over recent months, we have intensified our focus on developing foundation models that deliver exceptional utility and performance. Qwen3.5...
Parameters 3.7B
Context 138K
License Apache-2.0
Architecture qwen3_5_moe
Qwen3.6-35B-A3B
388E2AB6-AC7F-43DC-9458-E085516F668D
Shared Popular
Qwen3.6-35B-A3B Qwen Chat [!Note] This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. Following the February release of the Qwen3.5 series, we're pleased to share the first open-weight variant of Qwen3.6. Built on direct feedback from the community, Qwen3.6 prioritizes stability and real-world utility, offering developers a more intuitive, responsive, and genuinely productive coding experience. Qwen3.6 Highlights This release delivers substantial upgrades, particularly in Agentic Coding: the model now handles frontend workflows and repository-level reasoning with greater fluency and precision. Thinking...
Parameters 3.7B
Context 262K
License Unknown
Architecture qwen3_5_moe