Model Catalog - Xerotier

Gemma Models

Google's lightweight open language models

Models

gemma-3-4b-it

EA5D2652-3D78-4D47-A576-FCC3EA9879F6

XIM Only

Gemma 3 model card Model Page: Gemma Resources and Technical Documentation: [Gemma 3 Technical Report][g3-tech-report] [Responsible Generative AI Toolkit][rai-toolkit] [Gemma on Kaggle][kaggle-gemma] [Gemma on Vertex Model Garden][vertex-mg-gemma3] Terms of Use: [Terms][terms] Authors: Google DeepMind Model Information Summary description and brief definition of inputs and outputs. Description Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 has a large, 128K context window, multilingual support in over 140...

Parameters 4.3B

Context 15K

License Unknown

Architecture gemma3

Sign in to Deploy

Details

gemma-3-12b-it

C92FB30F-F677-4F29-92D3-6026BD715F2D

XIM Only Not Deployed

Gemma 3 model card Model Page: Gemma Resources and Technical Documentation: [Gemma 3 Technical Report][g3-tech-report] [Responsible Generative AI Toolkit][rai-toolkit] [Gemma on Kaggle][kaggle-gemma] [Gemma on Vertex Model Garden][vertex-mg-gemma3] Terms of Use: [Terms][terms] Authors: Google DeepMind Model Information Summary description and brief definition of inputs and outputs. Description Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 has a large, 128K context window, multilingual support in over 140...

Parameters 12.2B

Context 32K

License Unknown

Architecture gemma3

Sign in to Deploy

Details

gemma-3-27b-it

0422A634-819E-4F77-9899-3598D694BFE9

XIM Only Not Deployed

Gemma 3 model card Model Page: Gemma Resources and Technical Documentation: [Gemma 3 Technical Report][g3-tech-report] [Responsible Generative AI Toolkit][rai-toolkit] [Gemma on Kaggle][kaggle-gemma] [Gemma on Vertex Model Garden][vertex-mg-gemma3] Terms of Use: [Terms][terms] Authors: Google DeepMind Model Information Summary description and brief definition of inputs and outputs. Description Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models. Gemma 3 models are multimodal, handling text and image input and generating text output, with open weights for both pre-trained variants and instruction-tuned variants. Gemma 3 has a large, 128K context window, multilingual support in over 140...

Parameters 27.4B

Context Unknown

License Unknown

Architecture gemma3

Sign in to Deploy

Details

GLM Models

Zhipu AI's general language models

Models

GLM-4.7-Flash

B0545E60-0048-421A-8EE8-093B126F7340

XIM Only Not Deployed

GLM-4.7-Flash 👋 Join our Discord community. 📖 Check out the GLM-4.7 technical blog , technical report(GLM-4.5) . 📍 Use GLM-4.7-Flash API services on Z.ai API Platform. 👉 One click to GLM-4.7 . Introduction GLM-4.7-Flash is a 30B-A3B MoE model. As the strongest model in the 30B class, GLM-4.7-Flash offers a new option for lightweight deployment that balances performance and efficiency. Performances on Benchmarks | Benchmark | GLM-4.7-Flash | Qwen3-30B-A3B-Thinking-2507 | GPT-OSS-20B | |--------------------|---------------|-----------------------------|-------------| | AIME 25 | 91.6 | 85.0 | 91.7 | | GPQA | 75.2 | 73.4 | 71.5 | | LCB v6 | 64.0 | 66.0 | 61.0 | | HLE | 14.4 | 9.8 | 10.9 | | SWE-bench Verified | 59.2 | 22.0 | 34.0 | | τ²-Bench | 79.5 | 49.0 | 47.7 | | BrowseComp | 42.8 |...

Parameters 3.8B

Context 202K

License MIT

Architecture glm4_moe_lite

Sign in to Deploy

Details

Granite Models

IBM's enterprise-focused language models

Models

granite-4.1-8b

E304C47E-5880-45ED-830A-5F17F9F8619F

Shared

mof-class3-qualified Granite-4.1-8B Model Summary: Granite-4.1-8B is a 8B parameter long-context instruct model finetuned from Granite-4.1-8B-Base* using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. Granite 4.1 models have gone through an improved post-training pipeline, including supervised finetuning and reinforcement learning alignment, resulting in enhanced tool calling, instruction following, and chat capabilities. Developers: Granite Team, IBM HF Collection: Granite 4.1 Language Models HF Collection Technical Blog: Granite-4.1 Blog GitHub Repository: ibm-granite/granite-4.1-language-models Website: Granite Docs Release Date: April 29th, 2026 License: Apache 2.0 Supported Languages: English, German, Spanish,...

Parameters 11.6B

Context 49K

License Unknown

Architecture granite

Sign in to Deploy

Details

Granite Models

IBM's enterprise-focused language models

Models

granite-4.0-h-small

F879E17F-5521-4CCD-A6B6-C47E0036834E

XIM Only Not Deployed

mof-class3-qualified Granite-4.0-H-Small 📣 Update [10-07-2025]: Added a default system prompt* to the chat template to guide the model towards more professional, accurate, and safe* responses. Model Summary: Granite-4.0-H-Small is a 32B parameter long-context instruct model finetuned from Granite-4.0-H-Small-Base* using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction following (IF)* and tool-calling* capabilities, making them more effective in enterprise applications....

Parameters 11.6B

Context 131K

License Unknown

Architecture granitemoehybrid

Sign in to Deploy

Details

granite-4.0-h-tiny

C29307CD-71DD-43D4-B3AC-A271BC80BB9B

XIM Only

mof-class3-qualified Granite-4.0-H-Tiny 📣 Update [10-07-2025]: Added a default system prompt* to the chat template to guide the model towards more professional, accurate, and safe* responses. Model Summary: Granite-4.0-H-Tiny is a 7B parameter long-context instruct model finetuned from Granite-4.0-H-Tiny-Base* using a combination of open source instruction datasets with permissive license and internally collected synthetic datasets. This model is developed using a diverse set of techniques with a structured chat format, including supervised finetuning, model alignment using reinforcement learning, and model merging. Granite 4.0 instruct models feature improved instruction following (IF)* and tool-calling* capabilities, making them more effective in enterprise applications. Developers:...

Parameters 1.8B

Context 131K

License Unknown

Architecture granitemoehybrid

Sign in to Deploy

Details

Mixture of Experts

Mixture-of-experts architecture models

Models

LFM2-8B-A1B

7E32DB9D-FA97-40B4-A277-6F2D41E2C760

XIM Only

Try LFM • Docs • LEAP • Discord LFM2-8B-A1B LFM2 is a new generation of hybrid models developed by Liquid AI, specifically designed for edge AI and on-device deployment. It sets a new standard in terms of quality, speed, and memory efficiency. We're releasing the weights of our first MoE based on LFM2, with 8.3B total parameters and 1.5B active parameters. LFM2-8B-A1B is the best on-device MoE in terms of both quality (comparable to 3-4B dense models) and speed (faster than Qwen3-1.7B). Code and knowledge capabilities are significantly improved compared to LFM2-2.6B. Quantized variants fit comfortably on high-end phones, tablets, and laptops. Find more information about LFM2-8B-A1B in our blog post. 📄 Model details Due to their small size, we recommend fine-tuning LFM2 models on narrow...

Parameters 1.9B

Context 68K

License LFM Open License v1.0

Architecture lfm2_moe

Sign in to Deploy

Details

Llama Models

Meta's open-weight large language models

Models

Llama-4-Scout-17B-16E-Instruct-quantized.w4a16

4103842E-F281-41AF-AB47-7409DCE49B01

XIM Only Not Deployed

Llama-4-Scout-17B-16E-Instruct-quantized.w4a16 Model Overview Model Architecture: Llama4ForConditionalGeneration Input: Text / Image Output: Text Model Optimizations: Activation quantization: None Weight quantization: INT4 Release Date: 04/25/2025 Version: 1.0 Validated on: RHOAI 2.20, RHAIIS 3.0, RHELAI 1.5 Model Developers: Red Hat (Neural Magic) Model Optimizations This model was obtained by quantizing weights of Llama-4-Scout-17B-16E-Instruct to INT4 data type. This optimization reduces the number of bits used to represent weights from 16 to 4, reducing GPU memory requirements by approximately 75%. Weight quantization also reduces disk size requirements by approximately 75%. The llm-compressor library is used for quantization. Deployment This model can be deployed efficiently on...

Parameters 22.2B

Context 10485K

License Unknown

Architecture llama4

Sign in to Deploy

Details

Mistral Models

Mistral AI's efficient language models

Models

Ministral-3-14B-Reasoning-2512

85AC319C-D50B-45B2-B9C5-B3A0093A9885

XIM Only Popular

Ministral 3 14B Reasoning 2512 The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. A powerful and efficient language model with vision capabilities. This model is the reasoning post-trained version, trained for reasoning tasks, making it ideal for math, coding and stem related use cases. The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 14B can even be deployed locally, capable of fitting in 32GB of VRAM in BF16, and less than 24GB of RAM/VRAM when quantized. Learn more in our blog post and paper. Key Features Ministral 3 14B consists of two main architectural components: 13.5B Language Model 0.4B Vision...

Parameters 18.1B

Context 262K

License Unknown

Architecture mistral3

Sign in to Deploy

Details

Ministral-3-3B-Reasoning-2512

973063DC-E3A7-42F8-A0F6-4F3043473E7B

XIM Only Popular

Ministral 3 3B Reasoning 2512 The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities. This model is the reasoning post-trained version, trained for reasoning tasks, making it ideal for math, coding and stem related use cases. The Ministral 3 family is designed for edge deployment, capable of running on a wide range of hardware. Ministral 3 3B can even be deployed locally, fitting in 16GB of VRAM in BF16, and less than 8GB of RAM/VRAM when quantized. Learn more in our blog post and paper. Key Features Ministral 3 3B consists of two main architectural components: 3.4B Language Model 0.4B Vision Encoder The Ministral 3 3B Reasoning model offers the following capabilities: Vision: Enables the model to analyze images...

Parameters 4.7B

Context 99K

License Unknown

Architecture mistral3

Sign in to Deploy

Details

Devstral-Small-2-24B-Instruct-2512

F7E5ED05-1378-4DD8-9D41-C62CD37404CB

XIM Only

Devstral Small 2 24B Instruct 2512 Devstral is an agentic LLM for software engineering tasks. Devstral Small 2 excels at using tools to explore codebases, editing multiple files and power software engineering agents. The model achieves remarkable performance on SWE-bench. This model is an Instruct model in FP8, fine-tuned to follow instructions, making it ideal for chat, agentic and instruction based tasks for SWE use cases. For enterprises requiring specialized capabilities (increased context, domain-specific knowledge, etc.), we invite companies to reach out to us. Key Features The Devstral Small 2 Instruct model offers the following capabilities: Agentic Coding: Devstral is designed to excel at agentic coding tasks, making it a great choice for software engineering agents....

Parameters 18.1B

Context 82K

License Unknown

Architecture mistral3

Sign in to Deploy

Details

Community Models

User-shared models from the Xerotier community

Models

granite-embedding-311m-multilingual-r2

A223E2E0-A790-4776-B103-4FCE643525B5

Shared

Granite-Embedding-311M-Multilingual-R2 Model Summary: Granite-Embedding-311M-Multilingual-R2 is a 311M parameter dense embedding model from the Granite Embeddings collection for high-quality multilingual text embeddings. It produces 768-dimensional vectors with a context length of up to 32,768 tokens. The model supports 200+ languages (based on the multilingual pretraining corpus of the underlying encoder), with enhanced support for 52 languages and programming code that receive explicit retrieval-pair and cross-lingual training. All training data uses permissive, enterprise-friendly licenses, plus IBM-collected and IBM-generated datasets. Granite Embedding 311M Multilingual R2 shows strong performance across multilingual information retrieval benchmarks, code retrieval, long-document...

Parameters 610M

Context 8K

License Unknown

Architecture modernbert

Sign in to Deploy

Details

granite-embedding-reranker-english-r2

3B871E57-ED1D-4845-868F-3D538F06B2D5

Shared

granite-embedding-reranker-english-r2 Model Summary: granite-embedding-reranker-english-r2_ is a 149M parameter dense cross-encoder model from the Granite Embeddings collection that can be used to generate high quality text embeddings. This model produces embedding vectors of size 768 based on context length of upto 8192 tokens. Compared to most other open-source models, this model was only trained using open-source relevance-pair datasets with permissive, enterprise-friendly license, plus IBM collected and generated datasets. The granite-embedding-reranker-english-r2_ model uses a cross-encoder architecture to compute high-quality relevance scores between queries and documents by jointly encoding their text, enabling precise reranking based on contextual alignment. The model is trained...

Parameters 285M

Context 8K

License Unknown

Architecture modernbert

Sign in to Deploy

Details

granite-embedding-english-r2

C9776EBA-C4BF-4171-A376-7B43F0874EDE

XIM Only

Granite-Embedding-English-R2 Model Summary: Granite-embedding-english-r2 is a 149M parameter dense biencoder embedding model from the Granite Embeddings collection that can be used to generate high quality text embeddings. This model produces embedding vectors of size 768 based on context length of upto 8192 tokens. Compared to most other open-source models, this model was only trained using open-source relevance-pair datasets with permissive, enterprise-friendly license, plus IBM collected and generated datasets. The r2 models show strong performance across standard and IBM-built information retrieval benchmarks (BEIR, ClapNQ), code retrieval (COIR), long-document search benchmarks (MLDR, LongEmbed), conversational multi-turn (MTRAG), table retrieval (NQTables, OTT-QA, AIT-QA,...

Parameters 285M

Context 8K

License Unknown

Architecture modernbert

Sign in to Deploy

Details

Community Models

User-shared models from the Xerotier community

Models

nomic-embed-text-v1.5

CB042730-1149-40C8-BDB1-7574E33DDC30

XIM Only

nomic-embed-text-v1.5: Resizable Production Embeddings with Matryoshka Representation Learning Blog | Technical Report | AWS SageMaker | Nomic Platform Exciting Update!: nomic-embed-text-v1.5 is now multimodal! nomic-embed-vision-v1.5 is aligned to the embedding space of nomic-embed-text-v1.5, meaning any text embedding is multimodal! Usage Important: the text prompt must* include a task instruction prefix, instructing the model which task is being performed. For example, if you are implementing a RAG application, you embed your documents as searchdocument: and embed your user queries as searchquery: . Notice: From transformers v5.5.0 and sentence transformers v5.3.0, trustremotecode=True will no longer be necessary. This will only be possible with the text-only series as of now. Task...

Parameters 160M

Context 2K

License Unknown

Architecture nomic_bert

Sign in to Deploy

Details

Community Models

User-shared models from the Xerotier community

Models

gpt-oss-20b

A2CDFE3C-3F89-44C7-AD8D-D8AB6986E90D

XIM Only Not Deployed

Try gpt-oss · Guides · Model card · OpenAI blog Welcome to the gpt-oss series, OpenAI’s open-weight models designed for powerful reasoning, agentic tasks, and versatile developer use cases. We’re releasing two flavors of these open models: gpt-oss-120b — for production, general purpose, high reasoning use cases that fit into a single 80GB GPU (like NVIDIA H100 or AMD MI300X) (117B parameters with 5.1B active parameters) gpt-oss-20b — for lower latency, and local or specialized use cases (21B parameters with 3.6B active parameters) Both models were trained on our harmony response format and should only be used with the harmony format as it will not work correctly otherwise. [!NOTE] This model card is dedicated to the smaller gpt-oss-20b model. Check out gpt-oss-120b for the larger model....

Parameters 4.3B

Context Unknown

License Apache-2.0

Architecture Unknown

Sign in to Deploy

Details

Phi Models

Microsoft's compact high-performance models

Models

phi-4

96EFB1B3-84A1-4660-AF02-B4E25EDB2A5D

XIM Only Not Deployed

Phi-4 Model Card Phi-4 Technical Report Model Summary | | | |-------------------------|-------------------------------------------------------------------------------| | Developers | Microsoft Research | | Description | phi-4 is a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced reasoning. phi-4 underwent a rigorous enhancement and alignment process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures | | Architecture | 14B parameters, dense decoder-only Transformer...

Parameters 17.8B

Context 16K

License MIT

Architecture phi3

Sign in to Deploy

Details

Phi-4-mini-instruct

A8F06DE3-8E0F-4684-804E-20BE19BBD37A

XIM Only Not Deployed

🎉Phi-4: [mini-reasoning | reasoning] | [multimodal-instruct | onnx]; [mini-instruct | onnx] Model Summary Phi-4-mini-instruct is a lightweight open model built upon synthetic data and filtered publicly available websites - with a focus on high-quality, reasoning dense data. The model belongs to the Phi-4 model family and supports 128K token context length. The model underwent an enhancement process, incorporating both supervised fine-tuning and direct preference optimization to support precise instruction adherence and robust safety measures. 📰 Phi-4-mini Microsoft Blog 📖 Phi-4-mini Technical Report 👩‍🍳 Phi Cookbook 🏡 Phi Portal 🖥️ Try It Azure, Huggingface 🚀 Model paper Intended Uses Primary Use Cases The model is intended for broad multilingual commercial and research use. The model...

Parameters 6.1B

Context 131K

License MIT

Architecture phi3

Sign in to Deploy

Details

Qwen Models

Alibaba Cloud's multilingual language models

Models

Qwen3-0.6B

BCEF18DA-1F3B-4543-ACD4-B00598CCBD0F

XIM Only

Qwen3-0.6B Qwen3 Highlights Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features: Uniquely support of seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within single model, ensuring optimal performance across various scenarios. Significantly enhancement in its reasoning capabilities, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code...

Parameters 781M

Context 40K

License Apache-2.0

Architecture qwen3

Sign in to Deploy

Details

Qwen3-14B-AWQ

AC410C0A-F022-44CA-AE56-6CFED22E8E35

XIM Only Not Deployed

Qwen3-14B-AWQ Qwen3 Highlights Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features: Uniquely support of seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within single model, ensuring optimal performance across various scenarios. Significantly enhancement in its reasoning capabilities, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code...

Parameters 18.3B

Context 40K

License Apache-2.0

Architecture qwen3

Sign in to Deploy

Details

Qwen3-8B

B59EBB93-175E-4B1B-9802-4A9AC213B795

XIM Only Not Deployed

Qwen3-8B Qwen3 Highlights Qwen3 is the latest generation of large language models in Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features: Uniquely support of seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within single model, ensuring optimal performance across various scenarios. Significantly enhancement in its reasoning capabilities, surpassing previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code...

Parameters 10.9B

Context 40K

License Apache-2.0

Architecture qwen3

Sign in to Deploy

Details

Qwen Models

Alibaba Cloud's multilingual language models

Models

Qwen3.5-0.8B

714E40DE-1EC2-4F89-9AA0-A1E15E535C9A

XIM Only

Qwen3.5-0.8B Qwen Chat [!Note] This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. In light of its parameter scale, the intended use cases are prototyping, task-specific fine-tuning, and other research or development purposes. Over recent months, we have intensified our focus on developing foundation models that deliver exceptional utility and performance. Qwen3.5 represents a significant leap forward, integrating breakthroughs in multimodal learning, architectural efficiency, reinforcement learning scale, and global accessibility to empower developers and enterprises with unprecedented capability and efficiency....

Parameters 911M

Context 23K

License Apache-2.0

Architecture qwen3_5

Sign in to Deploy

Details

Qwen3.6-27B

95B3468A-B3FD-4FB5-939A-A9BB693FB8F4

XIM Only

Qwen3.6-27B Qwen Chat [!Note] This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. Following the February release of the Qwen3.5 series, we're pleased to share the first open-weight variant of Qwen3.6. Built on direct feedback from the community, Qwen3.6 prioritizes stability and real-world utility, offering developers a more intuitive, responsive, and genuinely productive coding experience. Qwen3.6 Highlights This release delivers substantial upgrades, particularly in Agentic Coding: the model now handles frontend workflows and repository-level reasoning with greater fluency and precision. Thinking Preservation:...

Parameters 29.4B

Context 87K

License Apache-2.0

Architecture qwen3_5

Sign in to Deploy

Details

Qwen3.5-27B-FP8

77BDCA79-81D6-4484-9CF4-93EE22B1B6DC

XIM Only Not Deployed

Qwen3.5-27B-FP8 Qwen Chat [!Note] This repository contains FP8-quantized model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. The quantization method is fine-grained fp8 quantization with block size of 128, and its performance metrics are nearly identical to those of the original model. Over recent months, we have intensified our focus on developing foundation models that deliver exceptional utility and performance. Qwen3.5 represents a significant leap forward, integrating breakthroughs in multimodal learning, architectural efficiency, reinforcement learning scale, and global accessibility to empower developers and enterprises with...

Parameters 29.4B

Context 262K

License Apache-2.0

Architecture qwen3_5

Sign in to Deploy

Details

Qwen Models

Alibaba Cloud's multilingual language models

Models

Qwen3.5-35B-A3B

33F46EFB-365A-4437-B7AF-306F20EE8D16

XIM Only Popular

Qwen3.5-35B-A3B Qwen Chat [!Note] This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. [!Tip] For users seeking managed, scalable inference without infrastructure maintenance, the official Qwen API service is provided by Alibaba Cloud Model Studio. In particular, Qwen3.5-Flash is the hosted version corresponding to Qwen3.5-35B-A3B with more production features, e.g., 1M context length by default and official built-in tools. For more information, please refer to the User Guide. Over recent months, we have intensified our focus on developing foundation models that deliver exceptional utility and performance. Qwen3.5...

Parameters 3.7B

Context 138K

License Apache-2.0

Architecture qwen3_5_moe

Sign in to Deploy

Details

Qwen3.6-35B-A3B

388E2AB6-AC7F-43DC-9458-E085516F668D

Shared Popular

Qwen3.6-35B-A3B Qwen Chat [!Note] This repository contains model weights and configuration files for the post-trained model in the Hugging Face Transformers format. These artifacts are compatible with Hugging Face Transformers, vLLM, SGLang, KTransformers, etc. Following the February release of the Qwen3.5 series, we're pleased to share the first open-weight variant of Qwen3.6. Built on direct feedback from the community, Qwen3.6 prioritizes stability and real-world utility, offering developers a more intuitive, responsive, and genuinely productive coding experience. Qwen3.6 Highlights This release delivers substantial upgrades, particularly in Agentic Coding: the model now handles frontend workflows and repository-level reasoning with greater fluency and precision. Thinking...

Parameters 3.7B

Context 262K

License Unknown

Architecture qwen3_5_moe

Sign in to Deploy

Details

Validated Models.

Gemma Models

GLM Models

Granite Models

Granite Models

Mixture of Experts

Llama Models

Mistral Models

Community Models

Community Models

Community Models

Phi Models

Qwen Models

Qwen Models

Qwen Models