I think the more you can shift to compile time the better when it comes to agent...

g947o · 2026-03-02T19:21:08 1772479268

I think Rust is great for agents, for a reason that is rarely mentioned: unit tests are in the same file. This means that agents just "know" they should update the tests along with the source.

With other languages, whether it's TypeScript/Go/Python, even if you explicitly ask agents to write/run tests, after a while agents just forget to do that, unless they cause build failures. You have to constantly remind them to do that as the session goes. Never happens with Rust in my experience.
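For readers who haven't seen the same-file convention mentioned above, here is a minimal sketch. The `mean` function is a made-up example, not from any project in this thread; the `#[cfg(test)] mod tests` layout is the standard Rust convention for unit tests.

```rust
/// Returns the arithmetic mean of `values`, or `None` for an empty slice.
/// (Hypothetical function, used only to have something to test.)
pub fn mean(values: &[f64]) -> Option<f64> {
    if values.is_empty() {
        return None;
    }
    Some(values.iter().sum::<f64>() / values.len() as f64)
}

// The tests sit directly below the code they cover. `cargo test` compiles
// this module; a normal build skips it because of `#[cfg(test)]`.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn mean_of_empty_slice_is_none() {
        assert_eq!(mean(&[]), None);
    }

    #[test]
    fn mean_of_three_values() {
        assert_eq!(mean(&[1.0, 2.0, 3.0]), Some(2.0));
    }
}
```

Anyone (human or agent) editing `mean` has the tests in the same buffer, which is presumably the effect described above.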

0x3f · 2026-03-02T19:22:48 1772479368

You can add a callback to e.g. Claude to guarantee it does a cargo check and test.

unshavedyak · 2026-03-02T19:32:09 1772479929

Fwiw i used to do this (and with lints) - it was the only way to make Claude consistent in the early days when i first started using it (~August 2025).

For many months now though, Claude is nearly consistent with both calling test and check/clippy. Perhaps this is due to my global memory file, not sure to be honest.

What i do know, is that i never use those hooks, i have them disabled atm. Why? Because the benefit is almost nonexistent as i mentioned, and the cost is at times, quite high. It means i cannot work on a project piecemeal, aka "only focus on this file, it will not compile and that's okay", and instead forces claude to make complete edits which may be harder to review. Worst of all, i have seen it get into a loop and be unable to exit. Eg a test fails and claude says "that failure is not due to my changes" or w/e, and it just does that.. forever, on loop. Burns 100% of the daily tokens pretty quick if unmonitored.

Fwiw i've not looked to see if there's an alternate way to write hooks. It might be worth having the hook only suggest, rather than forcing claude. Alternatively, maybe i could spawn a subagent to review if stopping claude makes sense.. hmm.

0x3f · 2026-03-02T23:28:50 1772494130

I find this doesn't work automatically for me because the projects I'm on have a lot of conditional compilation feature flags that it doesn't quite understand how to cargo check properly, unless I tell it.

Maybe for your case you could create a /maybe-check command, and run that in the hook? Then specify the conditions under which a check/test is needed in there.

overfeed · 2026-03-03T06:35:53 1772519753

> if you explicitly ask agents to write/run tests, after a while agents just forget to do that

Add a single task using your project's preferred task-runner that performs all the checks you want the agent to adhere to: linting, test coverage, style checks, tests, etc., and add a rule in AGENTS.md that agents should always run this task after edits and fix any warnings or errors produced.

Add the same task to your version management's pre-merge checks, in case the agent (or a colleague) forgets to check before pushing. This was good practice since before LLMs, but I was never a fan of putting such checks in pre-commit hooks.
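One way to sketch such a single "run everything" task for a Rust project is a small helper program in the style of the `cargo xtask` pattern. The list of checks below is an assumption chosen for illustration, and `checks` and `run_all` are hypothetical names; adjust both to the project.

```rust
use std::process::Command;

/// The checks to run, in order. This particular list is an assumption.
pub fn checks() -> Vec<(&'static str, Vec<&'static str>)> {
    vec![
        ("cargo", vec!["fmt", "--check"]),
        ("cargo", vec!["clippy", "--all-targets", "--", "-D", "warnings"]),
        ("cargo", vec!["test"]),
    ]
}

/// Runs every check and stops at the first one that fails.
pub fn run_all() -> Result<(), String> {
    for (program, args) in checks() {
        let status = Command::new(program)
            .args(&args)
            .status()
            .map_err(|e| format!("could not start {}: {}", program, e))?;
        if !status.success() {
            return Err(format!("`{} {}` failed", program, args.join(" ")));
        }
    }
    Ok(())
}
```

Calling `run_all()` from a `main` function gives one entry point that both the AGENTS.md rule and the pre-merge check can use, so the two cannot drift apart.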

jimbokun · 2026-03-02T23:10:31 1772493031

Even LLMs know they should write tests but hate doing it.

wakawaka28 · 2026-03-02T21:08:43 1772485723

Unit tests in the same file waste context and make the whole thing hard to navigate for humans and machines alike.

dnautics · 2026-03-02T21:15:19 1772486119

nah, the agents jump around files anyways.

J_Shelby_J · 2026-03-02T21:21:32 1772486492

I’ve been doing the least amount of unit tests possible and doing debug asserts instead.

0x3f · 2026-03-02T23:30:05 1772494205

Normally I would put as many invariants in the types as possible, then tests cover the rest. I'm curious how you do this/what you use it for though. Would be cool if you had any examples.
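A small sketch of how the two approaches in this sub-thread can sit side by side, with one invariant carried by a type and another checked with `debug_assert!`. `Percent` and `discounted_price` are hypothetical names made up for the example; this is not anyone's actual code.

```rust
/// A percentage guaranteed to be in 0..=100 (hypothetical example type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Percent(u8);

impl Percent {
    /// The only constructor; out-of-range input is rejected here, once.
    pub fn new(value: u8) -> Option<Percent> {
        if value <= 100 {
            Some(Percent(value))
        } else {
            None
        }
    }

    pub fn get(self) -> u8 {
        self.0
    }
}

/// Applies a discount. The type already rules out discounts above 100%.
pub fn discounted_price(price_cents: u64, discount: Percent) -> u64 {
    let result = price_cents - price_cents * u64::from(discount.get()) / 100;
    // Debug-assert style: checked in debug builds, compiled out in release.
    debug_assert!(result <= price_cents, "a discount must never raise the price");
    result
}
```

The type-level invariant needs no test for the "discount above 100%" case because that value cannot be constructed; the debug assertion documents and checks a property of the computation itself.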

jimbokun · 2026-03-02T23:11:09 1772493069

It’s about the best possible documentation.

wakawaka28 · 2026-03-02T23:36:37 1772494597

It isn't documentation. It is example code, in the best case. That shit belongs in other files, not in the main file. There is also a reason why literate programming never took off in general. Good luck getting anything done when 80% (conservatively) of the stuff you have to scroll through contributes nothing to the actual execution of the program and might actually be giving you false impressions of how things need to be done.

g947o · 2026-03-03T11:47:51 1772538471

I have yet to see a single Rust file where the test comes before source and takes 80% of the file content.

wakawaka28 · 2026-03-03T14:25:34 1772547934

Probably because all the tests are trivial, and people have the bias to not add all the testing that is needed inline with the code.

jaggederest · 2026-03-02T19:18:20 1772479100

Haskell is great, for what it's worth, but as with any language you have to rein in the AI's use of excessive verbosity. It will stack abstractions to the moon even for simple projects, and haskell's strengths for humans in this regard are weaknesses for AI - different weaknesses than other languages, but still, TANSTAAFL

I am trying out building a toy language hosted on Haskell and it's been a nice combo - the toy language uses dependent typing for even more strictness, but simple regular syntax which is nicer for LLMs to use, and under the hood if you get into the interpreter you can use the full richness of Haskell with less safety guardrails of dependent typing. A bit like safe/unsafe Rust.

solomonb · 2026-03-02T19:26:08 1772479568

> Haskell is great, for what it's worth, but as with any language you have to rein in the AI's use of excessive verbosity. It will stack abstractions to the moon even for simple projects, and haskell's strengths for humans in this regard are weaknesses for AI - different weaknesses than other languages, but still, TANSTAAFL

I haven't had this problem with Opus 4.5+ and Haskell. In fact, I get the opposite problem and often wish it was more capable of using abstractions.

jaggederest · 2026-03-02T19:36:56 1772480216

I guess it might be something with the subject matter and how I'm prompting. I prefer somewhat more imperative haskell though so that's probably a taste thing.

siliconc0w · 2026-03-02T19:27:13 1772479633

+1 to Rust - if we're offloading the coding to the clankers, might as well front-load more complexity cost to offload operational cost. Sure, it isn't a particularly ergonomic or simple language but we're not the ones who have to use it.

headcanon · 2026-03-02T20:00:59 1772481659

I've been cruising on rust too, not just because it works great for LLMs but also the great interop:

- I can build SPAs with typescript and offload expensive operations to a rust implementation that targets wasm

- I can build a multi-platform bundled app with Tauri that uses TS for the frontend, rust for the main parts of the backend, and it can load a python sidecar for anything I need python for (ML stuff mainly)

- Haven't dived too much into games but bevy seems promising for making performant games without the overhead of using one of the big engines (first-class ECS is a big plus too)

It ended up solving the problem of wanting to use the best parts of all of these different languages without being stuck with the worst parts.

jnpnj · 2026-03-02T19:32:23 1772479943

Was asking on mastodon if people tried leveraging very concise and high level languages like haskell, prolog with 2025 llms.. I'm really really curious.

synergy20 · 2026-03-02T19:47:36 1772480856

the problem there might be limited training data?

bethekind · 2026-03-02T21:35:20 1772487320

Jane Street had a cool video about how you can address lack of training data in a programming language using llm patching. Video is called "Arjun Guha: How Language Models Model Programming Languages & How Programmers Model Language Models"

The big take away is that you can "patch" llms and steer them to correct answers in less trained programming languages, allowing for superior performance. Might work here. Not a clue how to implement, but stuff to llm-to-doc and the like makes me hopeful

esafak · 2026-03-02T19:53:51 1772481231

So you're saying we should be vibe coding more open source stuff in languages for discerning programmers ;)

sockaddr · 2026-03-02T19:14:03 1772478843

Exactly. Here's my experience using LLMs to produce code:

- Rust: nearly universally compiles and runs without fault.

- Python,JS: very often will run for some time and then crash

The reason I think is type safety and the richness of the compiler errors and warnings. Rust is absolutely king here.

lmf4lol · 2026-03-02T20:18:19 1772482699

I've just vibed a pretty complex Python+Next.js app for 2 weeks. I've forced Codex into TDD, so everything(!) has to be tested. So far, it is really really stable and type errors haven't been a thing yet.

Not wanting to disagree, I am sure with Rust, it would be even more stable.

9rx · 2026-03-02T19:31:03 1772479863

[flagged]

satvikpendem · 2026-03-02T19:36:20 1772480180

What will you use for dependent types, Idris 2? Lean? None are as popular as Rust especially counting the number of production level packages available.

valenterry · 2026-03-03T08:23:15 1772526195

Scala has dependent types (though inferior to Idris's) and has the whole JVM ecosystem.

sockaddr · 2026-03-02T20:24:46 1772483086

This is quite sad to see someone react to a comment they disagree with by assuming that different opinion is paid for. I'd love it if you dug into my comment history and found even a shred of evidence that I'm being paid to talk positively about my programming language of choice.

I hope there aren't many of your type on here.

9rx · 2026-03-02T20:43:01 1772484181

All comments are paid for in some way, even if only in "warm fuzzies". If that is sad, why are you choosing to be sad? But outlandish comments usually require greater payment to justify someone putting in the effort. If you're not being paid well, what's the motivation to post things you know don't make any sense to try and sell a brand?

sockaddr · 2026-03-03T05:52:19 1772517139

> If you're not being paid well, what's the motivation to post things you know don't make any sense to try and sell a brand?

Because it works. Because it does in fact make sense despite your frustration with the concept.

9rx · 2026-03-03T07:15:43 1772522143

Right. So why did you deny it earlier?

sockaddr · 2026-03-03T15:54:46 1772553286

I'm answering the part of your question about why I post about something that I find works.

Are you this dense in real life or is this an act?

9rx · 2026-03-03T21:12:57 1772572377

Is a computer dense in real life? Does that not go without saying?

Are you under the impression that HN is some kind of intelligent animal?

chillfox · 2026-03-02T20:56:49 1772485009

Isn’t dependent types replicating the object oriented inheritance problem in the type system?

9rx · 2026-03-02T21:04:14 1772485454

No, unless you mean the problem of over-engineering? In which case, yes, that is a realistic concern. In the real world, tests are quite often more than good enough. And since they are good enough they end up covering all the same cases a half-assed type system is able to assert anyway by virtue of the remaining logic needing to be tested, so the type system doesn't become all that important in the first place.

A half-assed type system is helpful for people writing code by hand. Then you get things like the squiggly lines in your editor and automated refactoring tools, which are quite beneficial for productivity. However, when an LLM is writing code none of that matters. It doesn't care one bit if the failure reports comes from the compiler or the test suite. It is all the same to it.

ses1984 · 2026-03-02T21:09:11 1772485751

I’m not sure they’re saying rust is king of types, they’re saying it’s king of llm targets.

hu3 · 2026-03-02T21:16:46 1772486206

Which it obviously can't be because it has an anemic standard library and depends on crates for basic things like error handling and async.

Not to mention it has one of the slowest compile times among recent languages, if not the slowest (maybe Kotlin is slower).

ses1984 · 2026-03-02T21:48:23 1772488103

But there is no language that is best in all of these dimensions (including ones described above).

Everything is a trade-off.

squeegmeister · 2026-03-02T19:15:28 1772478928

Have also wondered how Haskell would be. From my limited understanding it’s one of the few languages whose compiler enforces functional purity. I’ve always liked that idea in theory but never tried the language

ruszki · 2026-03-02T19:21:42 1772479302

You can write in it like in imperative languages. I did it when I first encountered it long time ago, and I didn’t know how to write, or why I should write code in a functional way. It’s like how you can write in an object oriented way in simple C. It’s possible, and it’s a good thought experiment, but it’s not recommended. So, it’s definitely not “enforced” in a strict sense.

squeegmeister · 2026-03-02T20:09:16 1772482156

Isn’t code in Haskell pure by default and you have to use special keywords to have code with side effects?

lock1 · 2026-03-02T21:23:07 1772486587

There's no special keyword, just a "generic" type `IO<T>` defined in standard library which has a similar "tainting" property like `async` function coloring.

Any side effect has to be performed inside `IO<T>` type, which means impure functions need to be marked as `IO<T>` return. And any function that tries to "execute" `IO<T>` side effect has to mark itself as returning `IO<T>` as well.

gf000 · 2026-03-02T21:30:59 1772487059

It's pure even with side effects.

You basically compose a description of the side effects and pass this value representing those to the main handler which is special in that it can execute the side effects.

For the rest of the codebase this is simply an ordinary value you can pass on/store etc.
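A rough Rust analogue of the "description of side effects as an ordinary value" idea described above. This is only a sketch of the concept; Haskell's `IO` is not literally implemented this way, and `Effect`, `greet`, and `run` are hypothetical names.

```rust
/// A description of a side effect. Building one performs nothing.
#[derive(Debug, PartialEq)]
pub enum Effect {
    Print(String),
    Log(String),
}

/// Pure: the same input always yields the same list of descriptions.
pub fn greet(name: &str) -> Vec<Effect> {
    vec![
        Effect::Log(format!("greeting {}", name)),
        Effect::Print(format!("Hello, {}!", name)),
    ]
}

/// The single place that actually performs effects, standing in for the
/// runtime that executes the value of `main` in Haskell.
pub fn run(effects: Vec<Effect>) {
    for effect in effects {
        match effect {
            Effect::Print(text) => println!("{}", text),
            Effect::Log(text) => eprintln!("[log] {}", text),
        }
    }
}
```

Because `greet` only returns data, it can be stored, compared, and tested without capturing any output; only `run` touches the outside world.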

0x3f · 2026-03-02T19:21:40 1772479300

I think the intersection of FP and current AI is quite interesting. Purity provides a really tightly scoped context, so it almost seems like you could have one 'architect' model design the call graph/type skeleton at a high level (function signatures, tests, perf requirements, etc.) then have implementers fill them out in parallel.

iddan · 2026-03-02T20:04:04 1772481844

Also LLMs don’t mind repeating params for each child call. Pretty neat

dnautics · 2026-03-02T21:14:43 1772486083

> I think the more you can shift to compile time the better when it comes to agents

not borne out by evidence. rust is bottom-mid tier on autocoderbenchmark. typescript is marginally better than js

shifting to compile time is not necessarily great, because the llm has to vibe its way through code in situ. if you have to have a compiler check your code it's already too late, and the llm does not have your codebase in its weights, a fetch to read the types of your functions is context expensive since it's nonlocal.

zozbot234 · 2026-03-02T21:18:42 1772486322

> if you have to have a compiler check your code it's already too late

If you're running good agentic AI it can read the compile errors just like a human and work to fix them until the build goes through.

hu3 · 2026-03-02T21:22:56 1772486576

Which is slow and heavy in Rust. All languages have that but faster (and simpler due to no lifetimes).

zozbot234 · 2026-03-02T21:25:08 1772486708

cargo check is fast. It's only slow when the build goes through (barring extreme use of compile-time proc macros, which is rare and crate-specific).

dnautics · 2026-03-02T21:25:24 1772486724

i mean as a first order approximation context (the key resource that seems to affect quality) doesn't depend on real compilation speed, presumably the agent is suspended and not burning context while waiting for compilation

dnautics · 2026-03-02T21:22:47 1772486567

how about not making the error in the first place

hnhn34 · 2026-03-02T23:49:25 1772495365

If you have an LLM that doesn't make errors ever, then you have an ASI, at which point the conversation is meaningless. In the meantime, having a lower error rate but more uncaught errors is less important than making incorrect code impossible to compile, and/or flagged by strict linters.

dnautics · 2026-03-03T19:00:04 1772564404

incorrect. having a higher caught error rate means that you consume more context on the way to your solution which makes for worse results, both by spending more time in the context danger zone, and by losing more on compaction handoffs.

given a system that can ascertain the same level of overall non-business logic errors as one that makes a ton of non-business logic errors that are all catchable, your LLM's ability to correctly implement business logic amid the noise will be greatly impaired along the way.

bensyverson · 2026-03-02T19:10:28 1772478628

I built an agent with Go for the exact reasons laid out in the article, but did consider Rust. I would prefer it to be Rust actually. But the #1 reason I chose Go is token efficiency. My intuitive sense was that the LLM would have to spend a lot of time reasoning about lifetimes, interpreting and fixing compiler warnings, etc.

llimllib · 2026-03-02T19:14:35 1772478875

I've built tools with both Go and Rust as LLM experiments, and it is a real advantage for Go that the test/compile cycle is much faster.

I've been successful with each, I think there's positives and negatives to both, just wanted to mention that particular one that stands out as making it relatively more pleasant to work with.

g947o · 2026-03-02T19:36:55 1772480215

"LLM would have to spend a lot of time reasoning about lifetimes"

Let's set aside the fact that Go is a garbage collected language while Rust is not for now...

Do you prefer to let the LLM reason about lifetimes, or to debug subtle errors yourself at runtime, like what happens with C++?

People who are familiar with the C++ safety discussion understand that lifetimes are like types -- they are part of the code and are just as important as the real logic. You cannot be ambiguous about lifetimes yet be crystal clear about the program's intended behavior.

gf000 · 2026-03-02T21:36:51 1772487411

For many (most) types of objects, lifetimes can be a runtime property just fine. For e.g. a list, in Rust/C/C++ you would have to make an explicit decision about how long it should be "alive", whereas a managed language's assumption that an object's lifetime lasts as long as it is reachable is completely correct, and it has the benefit of fluidly adapting to future code changes, lessening maintenance costs.

Of course there are types where this is not true (file handles, connections, etc), and managed languages usually don't have as good features to deal with these as C++/Rust (RAII).
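For reference, a minimal sketch of what RAII looks like in Rust for a resource of the kind mentioned above. `Connection` is a hypothetical stand-in that records events in a shared log instead of touching a real socket or file.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Stand-in for a file handle or connection (hypothetical type).
pub struct Connection {
    name: String,
    log: Rc<RefCell<Vec<String>>>,
}

impl Connection {
    pub fn open(name: &str, log: Rc<RefCell<Vec<String>>>) -> Connection {
        log.borrow_mut().push(format!("open {}", name));
        Connection { name: name.to_string(), log }
    }
}

impl Drop for Connection {
    // Runs deterministically when the value goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("close {}", self.name));
    }
}

/// Opens a connection, uses it, and returns the recorded events.
pub fn demo() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _conn = Connection::open("db", Rc::clone(&log));
        log.borrow_mut().push("query".to_string());
    } // `_conn` is dropped here, before the next statement runs
    log.borrow_mut().push("after scope".to_string());
    let events = log.borrow().clone();
    events
}
```

The close happens at a known point in the program text rather than whenever a collector gets around to it, which is the property that matters for handles and connections.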

bensyverson · 2026-03-02T19:52:16 1772481136

Fair point, and it depends on whether you're building code to last a decade, or creating a quick proof of concept.

zarzavat · 2026-03-02T19:31:21 1772479881

It's not a waste of time though. Those warnings and clippy lints are there to improve the quality of the code and to find bugs.

As a human I can just decide to write quality code (or not!), but LLMs don't understand when they're being lazy or stupid and so need to have that knowledge imposed on them by an external reviewer. Static analysis is cheap, and more importantly it's automatic. The alternative is to spend more time doing code review, but that's a bottleneck.

0x3f · 2026-03-02T19:13:38 1772478818

I've never actually seen it get a compiler issue arising from lifetimes, so it seems to one-shot that stuff just fine. Although my work is typically middle of the road, non-HFT trading applications, not super low-level.

littlestymaar · 2026-03-02T21:42:32 1772487752

That matches actual Rust use: I've worked with Rust since 2017 on multiple projects and the number of times I've used a lifetime annotation has been very limited.

It's actually rare to have to borrow something and keep the borrow in another object (which is where lifetime annotations come in); most of the time (95% at least, I'd say) you borrow something and then drop the borrow, or move the thing.
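A short sketch of the three cases described above: borrow and drop, move, and the rarer case of keeping a borrow inside another object. All names are hypothetical.

```rust
/// Borrow, use, return a sub-slice: no annotation needed, elision covers it.
pub fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

/// Moving the value instead of borrowing it: again no annotation.
pub fn shout(text: String) -> String {
    text.to_uppercase()
}

/// Keeping a borrow inside another object is where `'a` has to be written.
pub struct Highlighter<'a> {
    source: &'a str,
}

impl<'a> Highlighter<'a> {
    pub fn new(source: &'a str) -> Self {
        Highlighter { source }
    }

    /// Returns a slice that lives as long as the original source text.
    pub fn longest_word(&self) -> &'a str {
        self.source
            .split_whitespace()
            .max_by_key(|word| word.len())
            .unwrap_or("")
    }
}
```

Only `Highlighter` needs an explicit lifetime; the two free functions compile with none written.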

0x3f · 2026-03-02T23:41:31 1772494891

Yes, I basically do everything the lazy/thoughtless way for a first pass. I find in 99% of cases that's already performant enough and matches the intended data flow, but if you ever want to optimize it, you can. The same is also true with the types: you can bash out a prototype very quickly and then tighten them up later, using Clippy to easily find all the shortcuts you took.

bryanlarsen · 2026-03-02T19:23:04 1772479384

It certainly had to iterate on lifetimes prior to Claude 4.5, at least for me. Prior to Claude 4.0 it was pretty bad at Rust.

littlestymaar · 2026-03-02T21:46:08 1772487968

Most LLMs sucked at Rust at the beginning because there's much less Rust code available on the broad internet.

I suspect the providers started training specifically in it because it appeared proportionally much more in the actual LLM usage (obviously much less than more mainstream languages like Python or JavaScript, but I wouldn't be surprised if there was more LLM queries on Rust than on C, for demographic reasons).

Nowadays even small Qwens are decent at it in one-shot prompts, or at least much better than GPT-4 was.

b40d-48b2-979e · 2026-03-02T19:20:08 1772479208

LLMs don't "reason".

thot_experiment · 2026-03-02T19:29:52 1772479792

Why is this a meaningful distinction to you? What does "reason" mean here? Can we construct a test that cleanly splits what humans do from what LLMs do?

grey-area · 2026-03-02T19:49:34 1772480974

Sure, things like counting the ‘r’s in strawberry, for example (till they are retrained not to make that mistake).

thot_experiment · 2026-03-02T20:11:01 1772482261

There are humans that can't do that but are clearly capable of reasoning. Not a meaningful categorical split.

grey-area · 2026-03-02T23:40:30 1772494830

There are certainly humans with poor reasoning or even incapable of reasoning, I’m not sure what you think that proves?

thot_experiment · 2026-03-03T01:09:42 1772500182

Ok, but if you read my comment you would note that I constructed a category of humans who can reason but cannot count the r's in strawberry.

I think you don't know what it means to reason, and are dismissively claiming AI cannot reason as though it invalidates a point made earlier without even having a sturdy definition in your head. I think for you to say "LLMs can't reason" in this context is essentially a NOP.

grey-area · 2026-03-03T09:43:26 1772531006

It is hard to define reasoning or thinking, these are vague concepts. I use them to indicate there are areas where these machines take obviously wrong decisions, because they are above all probability weighing machines based on a corpus, that is not I hope you would agree thinking, so you must believe there is some emergent properties which constitute thinking since you're so confident these machines are in fact doing that.

AI companies use these terms (thinking, reasoning etc) to try to trick users into anthropomorphising pattern matching machines and so that people believe they are true general intelligence.

I don't think we've reached AGI yet, though we are closer than previously, and I'm skeptical LLMs will be the route - they are impressive, but they are better at tricking humans than at performing complex tasks they have not seen before IME.

Do you think we have seen AGI yet from LLMs? If not how would you define their limitations?

RedNifre · 2026-03-03T13:43:09 1772545389

They don't see the letters, so how could they possibly succeed at that? It's like asking a human how many infrared flowers they see.

grey-area · 2026-03-06T11:54:14 1772798054

I'm pointing out that they don't 'think' or 'reason' like humans, they're very impressive, but I don't think they've reached the bar for thinking yet, as simple logic puzzles or puzzles like this prove (until the LLM authors take note and add special workarounds for those particular use-cases).

I believe most LLMs no longer fail at this, because they've been given the tools to do so (for example, using Python under the hood to count letters), but it's an important observation because it shows us that they don't think like us.

bensyverson · 2026-03-02T19:53:22 1772481202

Take it up with OpenAI's API designers—it's their term

b40d-48b2-979e · 2026-03-05T13:56:05 1772718965

You are the one repeating their lies.

gf000 · 2026-03-02T21:08:45 1772485725

I absolutely love Rust, but due to the space it occupies there is simply more to specify in code, and more things to get wrong for a stochastic LLM.

Lifetimes are a global property and LLMs are not particularly good at reasoning about them compared to local ones.

Most applications don't need low level memory control, so this complexity is better pushed to runtime.

There are lots of managed languages with good/even stronger type systems than Rust, paired with a good modern GC.

zozbot234 · 2026-03-02T21:21:44 1772486504

> Lifetimes are a global property and LLMs are not particularly good at reasoning about them compared to local ones.

Huh? Lifetime analysis is a local analysis, same as any other kind of type checking. The semantics may have global implications, but exposing them locally is the whole point of having dedicated syntax for it.

gf000 · 2026-03-02T21:47:21 1772488041

> Lifetime analysis is a local analysis, same as any other kind of type checking

That's what the compiler is doing.

The developer (or LLM) is supposed to do the global reasoning so that what they end up writing down makes semantic sense.

Sure, throwing a bunch of variants at it and see what sticks is certainly an approach, but "lifetimes check out" only proves that the resulting code will be memory safe, not that it actually makes sense.

solomonb · 2026-03-02T19:23:45 1772479425

I've been using LLMs (Opus) heavily for writing Haskell, both at work and on personal projects, and it's shockingly effective.

I wouldn't use it for the galaxy brain libraries or explorations I like to do for my blog but for production Haskell Opus 4.5+ is really good. No other models have been effective for me.

lokl · 2026-03-02T19:40:02 1772480402

What about SPARK? Not enough training data?

chrismanning · 2026-03-02T19:35:12 1772480112

Haskell works pretty well with agents, particularly when the agent is LSP-capable and you set up haskell-language-server. Even less capable models do well with this combo. Without LSP works fine but the fast feedback loop after each edit really accelerates agents while the intent is still fresh in context

cortesoft · 2026-03-02T19:22:28 1772479348

I am guessing there is a balance between a language that has a lot of soundness checks (like Rust) and a language that has a ton of example code to train on (like Python). How much more valuable each aspect is I am not sure.

echelon · 2026-03-02T19:23:41 1772479421

Rust is the best language for AI:

- Rust code generates absolutely perfectly in Claude Code.

- Rust code will run without GC. You get that for free.

- Rust code has a low defect rate per LOC, at least measured by humans. Google gave a talk on this. The sum types + match and destructure make error handling ergonomic and more or less required by idiomatic code, which the LLM will generate.

I'd certainly pick Rust or Go over Python or TypeScript. I've had LLMs emit buggy dynamic code with type and parameter mismatches, but almost never statically typed code that fails to compile.
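To make the "sum types + match and destructure" point concrete, here is a minimal sketch. `ParseError`, `parse_port`, and `describe` are hypothetical names invented for the example.

```rust
/// Every way parsing can fail, as one sum type.
#[derive(Debug, PartialEq)]
pub enum ParseError {
    Empty,
    NotANumber(String),
    OutOfRange(i64),
}

/// Parses a TCP port, returning a typed error instead of panicking.
pub fn parse_port(input: &str) -> Result<u16, ParseError> {
    let trimmed = input.trim();
    if trimmed.is_empty() {
        return Err(ParseError::Empty);
    }
    let value: i64 = trimmed
        .parse()
        .map_err(|_| ParseError::NotANumber(trimmed.to_string()))?;
    if !(0..=i64::from(u16::MAX)).contains(&value) {
        return Err(ParseError::OutOfRange(value));
    }
    Ok(value as u16)
}

/// The match must cover every variant, or the code does not compile.
pub fn describe(input: &str) -> String {
    match parse_port(input) {
        Ok(port) => format!("port {}", port),
        Err(ParseError::Empty) => "no port given".to_string(),
        Err(ParseError::NotANumber(text)) => format!("'{}' is not a number", text),
        Err(ParseError::OutOfRange(n)) => format!("{} does not fit in a port", n),
    }
}
```

If a new variant is later added to `ParseError`, the `match` in `describe` stops compiling until it is handled, which is the kind of feedback an agent can act on directly.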

moritz · 2026-03-02T19:36:30 1772480190

https://arxiv.org/abs/2508.09101

In this benchmark, models can correctly solve Rust problems 61% on first pass — A far cry from other languages such as C# (88%) or Elixir (a “buggy dynamic language”) where they perform best (97%).

I wonder why that is, it’s quite surprising. Obviously details of their benchmark design matter, but this study doesn’t support your claims.

squeegmeister · 2026-03-02T20:07:49 1772482069

This is great, but aug 2025 is almost a lifetime ago with how fast these models are improving. Opus 4.5 came out November 2025 fwiw

xigoi · 2026-03-02T19:31:15 1772479875

The downside is that even simple Rust projects typically use hundreds of dependencies, and this is even worse with LLMs, who don’t understand the concept of “less is more”.

echelon · 2026-03-02T20:23:25 1772483005

Nobody forces dependencies on you. You can control that.

michaelbarton · 2026-03-02T22:19:59 1772489999

I wonder if then Idris would be even better than that since it has even more typing

nesarkvechnep · 2026-03-02T20:20:24 1772482824

Idris would be even better.

thot_experiment · 2026-03-02T19:27:07 1772479627

Of my friend group the two people I think of as standout in terms of getting useful velocity out of AI workflows in non-trivial domains (as opposed to SaaS plumbing or framework slop) primarily use Haskell with massive contexts and tight integration with the dev env to ground the model.