Makes me wonder if one could train a "neural net surgeon" model which can trace ...

		hnuser123456 3 days ago \| parent \| context \| favorite \| on: Training LLMs for honesty via confessions Makes me wonder if one could train a "neural net surgeon" model which can trace activations in another live model and manipulate it according to plain language instructions.