solving a small subset of problems in a way no one asked for

What do you mean? Having ROCm fused MoE and MLA kernels as counterparts to the existing CUDA kernels is very useful. AMD needs to provide this if they want to keep their accelerators competitive with new models.
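
To make the fused-vs-unfused point concrete, here is a minimal sketch of the dispatch pattern serving code tends to use: call a fused MoE kernel when the backend provides one, otherwise fall back to generic torch ops. The names here (moe_forward, fused_kernel) are placeholders, not the actual ROCm API.

    import torch

    def moe_forward(hidden, gate_logits, w1, w2, top_k=2, fused_kernel=None):
        if fused_kernel is not None:
            # One launch: routing, expert GEMMs and the weighted combine all
            # happen on-device without materializing intermediates.
            return fused_kernel(hidden, gate_logits, w1, w2, top_k)
        # Unfused fallback built from generic torch ops: correct but slower,
        # since every step is a separate kernel launch and memory round trip.
        weights, experts = torch.topk(torch.softmax(gate_logits, -1), top_k, -1)
        out = torch.zeros_like(hidden)
        for k in range(top_k):
            for e in range(w1.shape[0]):
                mask = experts[:, k] == e
                if mask.any():
                    x = hidden[mask]
                    out[mask] += weights[mask, k:k+1] * (
                        torch.nn.functional.silu(x @ w1[e]) @ w2[e])
        return out

The fused path exists precisely because the fallback loop above is what you get from "generic layers in torch", and it leaves a lot of performance on the table.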



Should the matrix multiplication at the core of this not live in a core library? Why are generic layers intermixed with LLM-specific kernels when they duplicate functionality already in torch?

Upstreaming that might actually help researchers doing new work, versus the narrow demographic of people speeding up LLMs on MI300Xs. A sketch of what that could look like is below.
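
Assuming the repo really does bundle a generic grouped GEMM alongside the LLM-specific kernels, the upstream-friendly shape of it would be something like a torch custom op with a plain-torch reference implementation that a ROCm or CUDA backend can override later. This is only an illustration of the idea; the op name and namespace are made up.

    import torch

    @torch.library.custom_op("moe_ext::grouped_gemm", mutates_args=())
    def grouped_gemm(x: torch.Tensor, w: torch.Tensor,
                     group_ids: torch.Tensor) -> torch.Tensor:
        # Reference implementation in generic torch ops: each row of x is
        # multiplied by the weight matrix of its group. A vendor kernel
        # would replace this loop with a single batched launch.
        out = x.new_empty(x.shape[0], w.shape[-1])
        for g in range(w.shape[0]):
            mask = group_ids == g
            if mask.any():
                out[mask] = x[mask] @ w[g]
        return out

    @grouped_gemm.register_fake
    def _(x, w, group_ids):
        # Shape-only "fake" kernel so meta tracing / torch.compile can
        # reason about the op without running the real implementation.
        return x.new_empty(x.shape[0], w.shape[-1])

Exposed that way, any model can call it (via torch.ops.moe_ext.grouped_gemm), which is the sort of thing that helps researchers more than a kernel buried inside one serving stack.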



