You're right about parts, which are mostly state machines. They have a defined input and output, so tests are straightforward to implement and adjust.
But recording and replaying matches? Taking screenshots and comparing the output? Just think about it: If you have recorded a match and change the hitpoints of a single creature, the test could possibly fail. And then? Re-record the match?
The same applies to screenshots: What happens if models, sprites or colors change?
In my experience, tests like this are annoying, because:
1) They take a long time to create and adjust/recreate.
2) They fail for minor reasons.
3) It takes time to understand what such tests even measure if someone else made them.
4) You need a large, self-made framework to support such tests.
5) They take a long time to run, because they are time-dependent.
6) They hinder you from making large changes.
7) It's cheaper to have some low-wage game testers play your game. Or better, make the game early access and let thousands of players test your game for free, while even making money off them.
Yes, when you are intentionally changing the output, you simply regenerate the gold file used as the reference (and yes, that should be easy). It's brittle for sure, but it does catch unintentional changes and should be used where relevant (if sparingly). There are definitely existing frameworks that do this (e.g. Jest calls this snapshot testing and has tooling to make it easy).
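The golden-file workflow described here can be sketched in a few lines. This is a minimal illustration, not Jest's actual implementation; the `render_stats` function and file name are hypothetical stand-ins for whatever output is under test:

```python
# Minimal sketch of the golden-file ("snapshot") pattern.
import os

GOLDEN_PATH = "stats.golden.txt"

def render_stats(creature):
    # Hypothetical output under test: a plain-text summary of game state.
    return f"{creature['name']}: hp={creature['hp']} atk={creature['atk']}"

def check_against_golden(output, path=GOLDEN_PATH, update=False):
    """Compare output to the stored golden file.

    Pass update=True (e.g. behind a --update-goldens CLI flag) to
    regenerate the file when a change in output is intentional.
    """
    if update or not os.path.exists(path):
        with open(path, "w") as f:
            f.write(output)
        return True  # new ground truth recorded
    with open(path) as f:
        return f.read() == output

creature = {"name": "goblin", "hp": 7, "atk": 2}
baseline = render_stats(creature)
check_against_golden(baseline, update=True)  # record the baseline once

print(check_against_golden(baseline))        # True: output unchanged
creature["hp"] = 8                           # the "single hitpoint change"
print(check_against_golden(render_stats(creature)))  # False: test flags it
```

The point is exactly the trade-off discussed above: a one-hitpoint change does fail the test, but fixing it is a one-flag regeneration rather than re-recording a whole match.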
I’m sorry your experiences with this kind of stuff have been bad. I’ve generally had good experiences in the machine learning space where we used it judiciously where appropriate but didn’t overdo it.
I don’t see how it can ever hinder you, though - you can always choose to say “I don’t care that the output has changed dramatically - it’s the new ground truth,” as long as you communicate that’s what’s happening in your commit. What it doesn’t let you do is have the output be different every time you run it, but that’s generally a positive (randomness should be intentionally injected deterministically).
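"Randomness injected deterministically" just means the test supplies its own seeded generator instead of relying on global, unseeded randomness. A minimal sketch, with a hypothetical `roll_damage` as the code under test:

```python
import random

def roll_damage(rng, sides=6, count=2):
    # Hypothetical damage roll under test; takes its PRNG as a parameter
    # rather than reaching for the global random module.
    return sum(rng.randint(1, sides) for _ in range(count))

# Each test run constructs its own seeded generator...
rng_a = random.Random(42)
rng_b = random.Random(42)

# ...so two runs with the same seed produce identical sequences,
# and a golden/replay test of this output cannot flake.
rolls_a = [roll_damage(rng_a) for _ in range(5)]
rolls_b = [roll_damage(rng_b) for _ in range(5)]
print(rolls_a == rolls_b)  # True
```

The same idea applies to a recorded match: replay it with the seed that was recorded alongside it, and the simulation stays reproducible.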