Well, all of the memory is attached to the IO die. I remember AMD docs outright recommending that you have the processor hide NUMA nodes from the workload, since trying to optimize for them might not do anything for a lot of workloads.
That AMD slide (in the conclusion) claims their switching fabric has some kind of bypass mode to improve latency when utilisation is low.
So they have been really optimising that IO die for latency.
NUMA is already workload sensitive: you need to benchmark your exact workload to know whether it's worth enabling, and this change is probably going to make it even less worthwhile. Sounds like you will need a workload that really pushes total memory bandwidth to make NUMA worthwhile.
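
For what it's worth, here is a rough Python sketch of the "benchmark it yourself" step. It assumes Linux, where exposed NUMA nodes show up as `node0`, `node1`, ... under `/sys/devices/system/node`. The helper names and the 5% threshold are my own placeholders, not anything from AMD. The idea is to time your real workload once with NUMA nodes hidden and once with them exposed (switching between the two is a firmware setting, so these are separate runs), then compare.

```python
import os
import re
import statistics
import time


def count_numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Count NUMA nodes Linux exposes (node0, node1, ...); 0 if the directory is missing."""
    if not os.path.isdir(sysfs_root):
        return 0
    return sum(1 for name in os.listdir(sysfs_root) if re.fullmatch(r"node\d+", name))


def median_runtime(workload, repeats=5):
    """Run the workload callable several times and return the median wall time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def numa_worth_enabling(hidden_s, exposed_s, min_gain=0.05):
    """True if the NUMA-exposed run beats the NUMA-hidden run by at least min_gain.

    min_gain is an arbitrary assumed threshold (5%); set it above your run-to-run noise.
    """
    return (hidden_s - exposed_s) / hidden_s >= min_gain
```

Log `count_numa_nodes()` next to each timing so you know which configuration a number came from. A gain smaller than your run-to-run noise is a sign that exposing NUMA is not buying you anything for that workload.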