ST-MoE: Designing Stable and Transferable Sparse Expert Models (paperswithcode.com)
1 point by sharemywin on Feb 21, 2023 | 1 comment


From the paper: "Our work focuses on these issues and acts as a design guide. We conclude by scaling a sparse model to 269B parameters, with a computational cost comparable to a 32B dense encoder-decoder Transformer (Stable and Transferable Mixture-of-Experts or ST-MoE-32B)."
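
To see why compute stays roughly flat while the parameter count grows, here is a rough NumPy sketch of top-2 expert routing. This is illustrative only, not the paper's implementation; the function and variable names are made up. Each token runs through only k of the num_experts feed-forward blocks, so adding experts adds parameters but not per-token FLOPs.

    # Illustrative sketch of a sparse MoE layer with top-k routing (not ST-MoE's code).
    import numpy as np

    def sparse_moe_layer(x, gate_w, expert_w1, expert_w2, k=2):
        """x: [tokens, d_model]; gate_w: [d_model, num_experts];
        expert_w1: [num_experts, d_model, d_ff]; expert_w2: [num_experts, d_ff, d_model]."""
        logits = x @ gate_w                          # router scores per token
        top_k = np.argsort(logits, axis=-1)[:, -k:]  # indices of the k best experts
        # softmax over only the selected experts' logits
        sel = np.take_along_axis(logits, top_k, axis=-1)
        weights = np.exp(sel - sel.max(-1, keepdims=True))
        weights /= weights.sum(-1, keepdims=True)

        out = np.zeros_like(x)
        for e in range(gate_w.shape[1]):             # loop over experts
            token_idx, slot = np.nonzero(top_k == e) # tokens routed to expert e
            if token_idx.size == 0:
                continue
            h = np.maximum(x[token_idx] @ expert_w1[e], 0.0)  # ReLU FFN for this expert
            out[token_idx] += weights[token_idx, slot, None] * (h @ expert_w2[e])
        return out

    # toy usage: 8 experts, only 2 active per token
    rng = np.random.default_rng(0)
    tokens, d_model, d_ff, n_exp = 4, 16, 64, 8
    y = sparse_moe_layer(
        rng.normal(size=(tokens, d_model)),
        rng.normal(size=(d_model, n_exp)),
        rng.normal(size=(n_exp, d_model, d_ff)),
        rng.normal(size=(n_exp, d_ff, d_model)),
    )
    print(y.shape)  # (4, 16)

The routing mechanism itself is standard; per the title and abstract, ST-MoE's focus is on making this kind of sparse model train stably and transfer well to downstream tasks.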



