LocalLLaMA and tool use with LLMs

Since the first release of LLaMA, a really vibrant and active community has formed around local LLM models: r/LocalLLaMA. The signal-to-noise ratio is quite high, so I highly recommend following it if you are interested in running open-weight models on your own hardware.

While playing around with these models and developing queryMT, one of my main focuses has been supporting local LLMs through Ollama.

Tools

More and more models support tool use, calling out to external tools during an interaction, and MCP has brought a lot of attention and buzz to this area.
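
To make this concrete, here is a minimal sketch of what tool calling looks like through the Ollama Python client. The weather tool, its schema, and the prompt are all made up for illustration; the model is one from the list later in this post.

```python
# Minimal tool-calling sketch using the ollama Python client.
# The get_weather tool and its schema are hypothetical.
import ollama

def get_weather(city: str) -> str:
    """Dummy tool; a real one would call a weather API."""
    return f"Sunny in {city}"

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = ollama.chat(
    model="hf.co/bartowski/Qwen_Qwen3-14B-GGUF:Q8_0",  # any tool-capable local model
    messages=[{"role": "user", "content": "What's the weather in Budapest?"}],
    tools=tools,
)

# If the model decided to use the tool, run it with the arguments it produced.
for call in response["message"].get("tool_calls") or []:
    if call["function"]["name"] == "get_weather":
        print(get_weather(**call["function"]["arguments"]))
```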

I’ve found the hyper-mcp project, which has a really nice approach to quickly deploying various MCP servers. I really like the WebAssembly plugin architecture of hyper-mcp, and it is just super elegant that the Wasm plugins are distributed as OCI containers; kudos to @Tuan Anh for making this. I must confess I have adopted this approach in queryMT as well.

Using hyper-mcp, or any other MCP server, I’ve tested the tool use of various LLM models. I went in quite sceptical, but the outcomes turned out better than I expected.
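
As an aside, connecting to an MCP server from code is pleasantly simple. Below is a sketch (not queryMT's actual code) that lists the tools a server exposes, using the official `mcp` Python SDK. The hyper-mcp binary name and its `--config` flag are my assumptions; check the project's README for the actual invocation.

```python
# Sketch: list the tools an MCP server exposes over stdio,
# using the official `mcp` Python SDK.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    server = StdioServerParameters(
        command="hyper-mcp",               # assumed binary name
        args=["--config", "config.json"],  # assumed flag and config path
    )
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            for tool in tools.tools:
                print(tool.name, "-", tool.description)

asyncio.run(main())
```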

Summarizing issues

Any project out there involves managing issues with the project itself, and this is not only true for software projects. Personally, I sometimes find it quite overwhelming to review and prioritize reported issues. A natural use case for LLMs is to have them summarize issues for you, and maybe even suggest how to fix them.
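
As a sketch of the idea, assuming the public GitHub REST API and a placeholder repository name, you could pull the open issues and hand them to a local model for a summary and a suggested priority order:

```python
import ollama
import requests

# Fetch open issues (the endpoint also returns PRs, so filter those out).
issues = requests.get(
    "https://api.github.com/repos/OWNER/REPO/issues",  # placeholder repo
    params={"state": "open", "per_page": 20},
    headers={"Accept": "application/vnd.github+json"},
    timeout=30,
).json()
issues = [i for i in issues if "pull_request" not in i]

digest = "\n".join(
    f"#{i['number']} {i['title']}: {(i.get('body') or '')[:500]}" for i in issues
)

response = ollama.chat(
    model="hf.co/bartowski/Qwen_Qwen3-14B-GGUF:Q8_0",  # any capable local model
    messages=[{
        "role": "user",
        "content": "Summarize these issues, group related ones, and suggest "
                   "a priority order with rough ideas for fixes:\n\n" + digest,
    }],
)
print(response["message"]["content"])
```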

Generating code by using a specific library

queryMT has been inspired a lot by the llm project written by Simon Willison; kudos for all his work in this area. I highly suggest following his blog, as there are a lot of nice updates and tricks with LLM models there. He introduced the concept of fragments, and the GitHub fragments plugin lets you do similar things to what I’ve mentioned above. A cool use case is to provide a library to the model and generate the code you would like to have based on that library. Although a model’s cut-off date is usually a year or so behind, most open-source projects will already be part of the model’s knowledge, so code generation grounded in a project’s source is more useful with private projects, which might live on GitHub, GitLab, or any other repository out there. There are a lot of MCP servers for the usual suspects; search for them, for example, here.
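
The fragment idea is easy to approximate in plain Python, too. The sketch below, with a placeholder repository and file path, fetches a library's source and asks a local model to generate example code built on it; a real setup would walk the whole repo tree, or use a fragments plugin or an MCP server instead of one hard-coded file.

```python
import ollama
import requests

# Placeholder repository and file path.
RAW = "https://raw.githubusercontent.com/OWNER/REPO/main/src/lib.py"
source = requests.get(RAW, timeout=30).text

response = ollama.chat(
    model="huggingface.co/bartowski/Qwen_Qwen3-32B-GGUF:Q4_K_L",
    messages=[{
        "role": "user",
        "content": "Here is a library's source code:\n\n" + source +
                   "\n\nWrite a small, self-contained example program that uses it.",
    }],
)
print(response["message"]["content"])
```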

Naturally, if the codebase is private, you may not want to share it with a cloud provider and would rather use a local model. To my surprise, with local models on consumer-grade hardware, things deteriorate quickly when trying to have them use tools. Here’s the list I’ve been experimenting with so far (a small test harness sketch follows the list):

  • ollama:hf.co/bartowski/Qwen_Qwen3-14B-GGUF:Q8_0
  • ollama:huggingface.co/bartowski/Qwen_Qwen3-32B-GGUF:Q4_K_L
  • ollama:huggingface.co/bartowski/Qwen_Qwen3-30B-A3B-GGUF:Q5_K_L
  • ollama:mistral-small3.1:latest
  • ollama:qwq:32b

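Here is the kind of harness I mean, assuming the ollama Python client and a dummy tool; it simply checks whether each model from the list above issues a tool call at all:

```python
# Tiny test harness: does each model issue a tool call when prompted?
# Model names are taken from the list above, without the ollama: provider prefix.
import ollama

MODELS = [
    "hf.co/bartowski/Qwen_Qwen3-14B-GGUF:Q8_0",
    "huggingface.co/bartowski/Qwen_Qwen3-32B-GGUF:Q4_K_L",
    "huggingface.co/bartowski/Qwen_Qwen3-30B-A3B-GGUF:Q5_K_L",
    "mistral-small3.1:latest",
    "qwq:32b",
]

tools = [{
    "type": "function",
    "function": {
        "name": "list_repo_files",  # dummy tool for the experiment
        "description": "List the files in the repository",
        "parameters": {"type": "object", "properties": {}},
    },
}]

for model in MODELS:
    response = ollama.chat(
        model=model,
        messages=[{"role": "user", "content": "What files does the repository contain?"}],
        tools=tools,
    )
    calls = response["message"].get("tool_calls") or []
    print(f"{model}: {'called tool' if calls else 'no tool call'}")
```
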
For example, you might want the model to review the codebase and create new examples using the library.

It does start off well, fetching the repository’s structure, but it does not go further and fetch the contents of the individual files.

For reference, I’ve tried using the same models via Alibaba’s official API, but even though the Qwen3 models are listed, requests against them just return a 400, as if no such model were available.

Using the gpt-4o model, the tool use goes a little deeper.

While o4 really takes it all the way.

Project management

Of course, one of the neatest use cases is to draft a plan for a sprint and get your model to actually define the issues and create them for the project.
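
A sketch of how that could look, assuming the GitHub REST API, a placeholder repository and token, and a made-up sprint plan: give the model one create_issue tool and execute whatever calls it makes.

```python
import ollama
import requests

def create_issue(title: str, body: str = "") -> None:
    """Create an issue via the GitHub REST API (placeholder repo and token)."""
    requests.post(
        "https://api.github.com/repos/OWNER/REPO/issues",
        json={"title": title, "body": body},
        headers={"Authorization": "Bearer YOUR_TOKEN"},
        timeout=30,
    ).raise_for_status()

tools = [{
    "type": "function",
    "function": {
        "name": "create_issue",
        "description": "Create a new issue in the project tracker",
        "parameters": {
            "type": "object",
            "properties": {"title": {"type": "string"}, "body": {"type": "string"}},
            "required": ["title"],
        },
    },
}]

plan = "Sprint draft: ship Wasm plugin cache; fix tool-call retries; docs pass."  # made up

response = ollama.chat(
    model="mistral-small3.1:latest",
    messages=[{"role": "user", "content": "Turn this sprint plan into concrete "
                                          "issues and create them:\n" + plan}],
    tools=tools,
)
for call in response["message"].get("tool_calls") or []:
    if call["function"]["name"] == "create_issue":
        create_issue(**call["function"]["arguments"])
```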

What’s Next for queryMT

I’m still polishing queryMT’s core features, but stay tuned for an official launch. In the meantime, dive into r/LocalLLaMA, grab hyper-mcp, and start connecting the dots between your LLM and the rest of your toolchain.

2025-05-25