It works in this instance. On this run. It is not guaranteed to work next time. There is a error percentage here that makes it _INEVITABLE_ that eventually, with enough executions, the validation will pass when it should fail.
It will choose not to pass this to the validator, at some point in the future. It will create its own validator, at some point in the future. It will simply pretend like it did any of the above, at some point in the future.
This might be fine for your B2B use case. It is not fine for underlying infrastructure for a financial firm or communications.
Every time the LLM uses this tool, the response schema is validated--deterministically. The LLM will never see a non-integer value as output from the tool.
I write these as part of my job, I know how they work. I'm not going to spend more time explaining to you (and demonstrating!) what is in the spec. Read the spec and let the authors know that they don't understand what they wrote. I've run out of energy in this conversation.
llm tool call -> mcp client validates the schema -> mcp client calls the tool -> mcp server validates the schema -> mcp server responds with the result -> mcp client passes the tool result into llm
They often do fail, at the client level you can just feed the schema validation error message back into the LLM and it corrects itself most of the time.
If not the LLM throws itself into a loop until its caller times it out and it sends an error message back to the user.
At the server level it's just a good old JSON API at this point, and the server would send the usual error message it would send out to anyone.
It works in this instance. On this run. It is not guaranteed to work next time. There is a error percentage here that makes it _INEVITABLE_ that eventually, with enough executions, the validation will pass when it should fail.
It will choose not to pass this to the validator, at some point in the future. It will create its own validator, at some point in the future. It will simply pretend like it did any of the above, at some point in the future.
This might be fine for your B2B use case. It is not fine for underlying infrastructure for a financial firm or communications.