I’m limited to 24GB of VRAM, and I need a fairly large context for my use case (20k+ tokens). I tried “Qwen3-14B-GGUF:Q6_K_XL,” but it doesn’t seem to like calling tools more than a couple of times, no matter how I prompt it.
I also tried “SuperThoughts-CoT-14B-16k-o1-QwQ-i1-GGUF:Q6_K” and “DeepSeek-R1-Distill-Qwen-14B-GGUF:Q6_K_L,” but Ollama (or LangGraph) throws an error saying these models don’t support tool calling.
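For anyone hitting the same error: before wiring a model into LangGraph, you can probe whether the Ollama server will even accept a tool definition for it. Here’s a minimal sketch using only the standard library; it assumes a local Ollama server on the default port, and the `get_weather` tool is just a made-up example schema. (In my experience Ollama rejects non-tool models with an HTTP 400, but treat that as an assumption.)

```python
import json
import urllib.error
import urllib.request

# Hypothetical example tool schema, only used to probe for support.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

def build_probe(model: str) -> bytes:
    """Build a one-shot /api/chat request body that includes a tool definition."""
    payload = {
        "model": model,
        "stream": False,
        "messages": [{"role": "user", "content": "Weather in Paris?"}],
        "tools": [WEATHER_TOOL],
    }
    return json.dumps(payload).encode()

def supports_tools(model: str, host: str = "http://localhost:11434") -> bool:
    """POST the probe; assumes Ollama answers 400 for models without tool support."""
    req = urllib.request.Request(
        f"{host}/api/chat",
        data=build_probe(model),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            json.load(resp)
        return True
    except urllib.error.HTTPError as err:
        if err.code == 400:
            return False  # e.g. "model does not support tools"
        raise
```

That lets you rule out the model template as the culprit before blaming LangGraph.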
Hmm, Devstral doesn’t call any tools for me in either the current stable Ollama version or the current release candidate; I wonder if it’s a bug in Ollama or LangChain. I’ve since tried “QwQ-32B-GGUF:Q3_K_XL,” which is a little better than Qwen3-14B:Q6, but it’s still not quite satisfactory, and it’s much slower and “thinks” too much.