
The P4's main problem wasn't so much the 30-stage pipeline. It was that the CPU went off into la-la land for 4000+ cycles when replays occurred. The trigger could be something as simple as a misaligned load, microcode for an 8-byte rep movs, or a page fault. As a result, the design's performance was very fragile and overly sensitive to extremely small changes in code and data layout, in large part because of how the features supporting high clock speeds all fit together. Performance tuning could often yield substantial improvements for complex code paths, but the optimizations were frequently useless on other microarchitectures that didn't have the same glass jaws.
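
To make the data-layout sensitivity concrete, here's a rough sketch of how one might check whether a load lands in the awkward spot. It assumes the costly misaligned case is an access that straddles a cache-line boundary, which the comment above doesn't spell out, and it assumes a 64-byte line. The names `LINE_SIZE`, `crosses_line`, and `split_offsets` are made up for illustration.

```python
# Sketch only: LINE_SIZE = 64 is an assumed cache-line size, and
# crosses_line / split_offsets are hypothetical helper names.
LINE_SIZE = 64


def crosses_line(addr, size, line_size=LINE_SIZE):
    """Return True if a `size`-byte access at `addr` spans two cache lines."""
    first_line = addr // line_size               # line holding the first byte
    last_line = (addr + size - 1) // line_size   # line holding the last byte
    return first_line != last_line


def split_offsets(size, line_size=LINE_SIZE):
    """List the offsets within a line at which a `size`-byte access splits."""
    return [off for off in range(line_size) if crosses_line(off, size, line_size)]
```

Under these assumptions, an 8-byte load splits at only 7 of the 64 possible offsets within a line, so shifting a struct field or buffer start by a single byte can move a hot load into or out of the slow case. That's the kind of tiny layout change being described.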

Hyperthreading was much less of a concern given that threading of software was only ramping up for mainstream x86.


