If the user manual fits into the context window, existing LLMs can already do an OK-but-not-great job. I hadn't previously heard of Tamarin; a quick Google suggests it's a domain where the standard is theoretically "you need to make zero errors" but in practice is "be better than your opponent, because neither of you is close to perfect"? Either way, have you tried giving the entire manual to the LLM's context window?
If the new system can be interacted with in a non-destructive manner at low cost and with useful responses, then existing AI can self-generate the training data.
If it merely takes a year, businesses will rush to get that training data even if they need to pay humans for a bit. Cars are an example of "real data is expensive or destructive": it's clearly taking a lot more than a year to get there, and there's a lot of investment in just that.
Pay 10,000 people USD 100,000 each for a year; that billion-dollar investment then gets reduced to 2.4 million/year in ChatGPT Plus subscription fees or whatever. Plenty of investors will take that deal… if you can actually be sure it will work.
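The back-of-the-envelope arithmetic, as a sketch (the USD 20/month subscription price is my assumption for "ChatGPT Plus or whatever"):

```python
# Rough cost comparison: paying humans to generate training data once
# vs. running subscriptions for the same headcount afterwards.
people = 10_000
salary = 100_000            # USD per person per year (figure from the comment)
subscription = 20 * 12      # USD per person per year (assumed $20/month plan)

upfront = people * salary          # one-off training-data cost
ongoing = people * subscription    # recurring inference cost

print(f"Upfront:  ${upfront:,}")      # $1,000,000,000
print(f"Ongoing:  ${ongoing:,}/year") # $2,400,000/year
```

So the recurring cost is roughly 0.24% of the one-off investment per year, which is the deal the comment describes.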