ST-MoE: Designing Stable and Transferable Sparse Expert Models | Hacker News


		ST-MoE: Designing Stable and Transferable Sparse Expert Models (paperswithcode.com)
		1 point by sharemywin on Feb 21, 2023 | 1 comment

sharemywin on Feb 21, 2023

From the paper: "Our work focuses on these issues and acts as a design guide. We conclude by scaling a sparse model to 269B parameters, with a computational cost comparable to a 32B dense encoder-decoder Transformer (Stable and Transferable Mixture-of-Experts or ST-MoE-32B)."
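
For anyone wondering how a model can hold far more parameters than it pays for in compute: a rough sketch, assuming a simple top-k routed expert layer where each token only runs through a few of the experts. The function names and layer sizes below are made up for illustration and are not the ST-MoE-32B configuration.

```python
def ffn_params(d_model: int, d_ff: int) -> int:
    """Weights in one two-matrix feed-forward block (biases ignored)."""
    return 2 * d_model * d_ff


def moe_layer_params(d_model: int, d_ff: int, num_experts: int, top_k: int):
    """Return (stored, active_per_token) weight counts for one sparse expert layer."""
    expert = ffn_params(d_model, d_ff)
    router = d_model * num_experts          # assumed: a single linear router
    stored = num_experts * expert + router  # every expert is kept in memory
    active = top_k * expert + router        # only top_k experts run per token
    return stored, active


# Hypothetical sizes, NOT taken from the paper.
stored, active = moe_layer_params(d_model=1024, d_ff=4096, num_experts=64, top_k=2)
print(f"stored: {stored:,}  active per token: {active:,}  ratio: {stored / active:.1f}x")
```

The stored count grows with the number of experts while the per-token count only grows with top_k, which is roughly the gap the quoted 269B vs. 32B figures describe.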
