> This is a known limitation with small LLMs (0.6B-1.2B) doing tool calling.
To me this is the nut to crack with tool calling and locally running inference. This seems like a really cool project and I'm going to dig around a little later, but if it's hallucinating on something as basic as this, it makes me think it's more at the POC stage right now (to echo other sentiment here).
That's a fair read. Tool-calling reliability with sub-4B models is genuinely the hardest unsolved problem in on-device AI right now. The inference engine (MetalRT) is production-grade and the pipeline architecture is solid, but models at this size are still the weak link for complex tool routing. Larger model support (where tool calling is much more reliable) is next on the roadmap. Please stay tuned!
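For what it's worth, a common mitigation for small models is to validate every emitted tool call against the registered tool schemas before executing anything, and fall back to plain text when the call doesn't check out. A minimal sketch of that guard (the tool names and the JSON shape here are hypothetical, not this project's actual API):

```python
import json

# Hypothetical registry: tool name -> set of required argument names.
TOOLS = {
    "get_weather": {"city"},
    "set_timer": {"seconds"},
}

def validate_tool_call(raw: str):
    """Parse a model-emitted tool call and reject anything that
    doesn't match a registered tool. Returns (name, args) or None."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return None  # model emitted malformed JSON
    name = call.get("name")
    args = call.get("arguments", {})
    if name not in TOOLS:
        return None  # hallucinated tool name
    if not TOOLS[name] <= set(args):
        return None  # missing required arguments
    return name, args

# A hallucinated call gets dropped instead of executed:
print(validate_tool_call('{"name": "delete_files", "arguments": {}}'))  # None
print(validate_tool_call('{"name": "get_weather", "arguments": {"city": "Oslo"}}'))
```

It doesn't make a 0.6B model call the right tool, but it at least keeps the wrong call from running.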
It needs a canonical source of truth, something isolated agents can't provide easily. There are tools out there like specularis that help you do that and keep specs in sync.
One example: I have the agent distill the essence of all previous discussions into a spec.md file, check it for completeness, and drop all previous context before continuing.
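The compaction step described above can be as simple as replacing the whole message history with the spec once it passes a completeness check. A rough sketch of the idea (the message format and the required-sections check are made up for illustration, not any particular agent framework's API):

```python
# Hypothetical completeness check: section headings the spec must contain.
REQUIRED_SECTIONS = ("## Goals", "## Constraints", "## Open questions")

def compact_context(messages: list[dict], spec_md: str) -> list[dict]:
    """Verify the distilled spec is complete, then drop the old
    conversation and continue from the spec alone."""
    missing = [s for s in REQUIRED_SECTIONS if s not in spec_md]
    if missing:
        raise ValueError(f"spec.md incomplete, missing: {missing}")
    # The spec becomes the single source of truth for the next run.
    return [{"role": "system", "content": spec_md}]

spec = "## Goals\n...\n## Constraints\n...\n## Open questions\n..."
history = [{"role": "user", "content": "long discussion..."}] * 40
print(len(compact_context(history, spec)))  # 1
```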
Does fastlane still hang for a bit before every command? I used to optimize build pipelines for a large company's iOS teams, and it always seemed to stall before doing the actual work. We eventually moved to Xcode Cloud (mainly to avoid code signing) and ran xcodebuild directly.
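If anyone wants to go the same route, calling xcodebuild directly is mostly a matter of assembling the right argument list and skipping the wrapper tooling. A sketch of the archive invocation (the workspace, scheme, and paths are placeholders):

```python
def xcodebuild_archive_cmd(workspace: str, scheme: str, archive_path: str) -> list[str]:
    """Build the argv for a direct `xcodebuild archive` invocation.
    All names/paths here are placeholders for your own project."""
    return [
        "xcodebuild", "archive",
        "-workspace", workspace,
        "-scheme", scheme,
        "-archivePath", archive_path,
        "-destination", "generic/platform=iOS",
    ]

cmd = xcodebuild_archive_cmd("App.xcworkspace", "App", "build/App.xcarchive")
# On macOS you'd hand this to subprocess.run(cmd, check=True).
print(" ".join(cmd))
```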
A locally running (macOS) iOS build size analysis tool. Been tinkering with Apple Intelligence, trying to implement an on-device size optimization engineer for my users. The small context window for Apple Intelligence has been tricky to get right.
I maintain an app size inspection tool that runs locally on your Mac, and I added the file inspector (Sunburst chart) for Gmail if anyone is interested in exploring its contents.
As others have pointed out, the main executable is huge (~300MB) and there are a lot of localization files, but not too much traditional asset duplication.