Hacker News

Why does it matter in this case whether GPT-3 was trained compute-optimally or not? Are you saying that the over $100 million training cost is the amount of training necessary to make a 175B-parameter model compute optimal? And if they have the same number of parameters, why does GPT-4 have greater latency?
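For context on "compute optimal": under the Chinchilla rule of thumb (roughly 20 training tokens per parameter, with total training compute approximated as C ≈ 6·N·D FLOPs), a 175B-parameter model like GPT-3 would want ~3.5T tokens, while GPT-3 was actually trained on ~300B. A rough back-of-the-envelope sketch:

```python
# Back-of-the-envelope check using the common approximation C ~= 6 * N * D
# (N = parameters, D = training tokens) and the Chinchilla rule of thumb
# of roughly 20 tokens per parameter for compute-optimal training.
def training_flops(params: float, tokens: float) -> float:
    """Approximate total training compute in FLOPs: C ~= 6 * N * D."""
    return 6 * params * tokens

N = 175e9            # GPT-3 parameter count
D_actual = 300e9     # tokens GPT-3 was reportedly trained on
D_optimal = 20 * N   # ~3.5e12 tokens under the Chinchilla heuristic

print(f"actual FLOPs:   {training_flops(N, D_actual):.2e}")
print(f"'optimal' FLOPs: {training_flops(N, D_optimal):.2e}")
```

By this estimate GPT-3 was trained with roughly an order of magnitude less data (and compute) than the Chinchilla-optimal point for its size, which is why "compute optimal or not" changes what a given training budget tells you about parameter count.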

