
The real world is a space of continuous actions. To this day, Q-learning algorithms have mostly been built around discrete action outputs. I'd be surprised if a Q algorithm could handle the huge action space of language. Honestly, it's weird they'd consider the Q family at all; I figured we were done with that after PPO performed so well.
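Here is a minimal Python sketch of tabular Q-learning that shows where the discrete-action assumption bites. Both acting greedily and forming the update target take a max over every available action, so the cost grows with the size of the action set. The function names `greedy_action` and `q_update` and the default `alpha`/`gamma` values are illustrative assumptions, not taken from any library. PPO, by contrast, is a policy-gradient method that samples actions from a parameterized policy, so it never has to compute that max.

```python
from collections import defaultdict


def greedy_action(q, state, actions):
    # Greedy selection has to score every candidate action: O(len(actions)).
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.99):
    # Standard Q-learning target: r + gamma * max_a' Q(s', a').
    # The max ranges over the full discrete action set, which stops being
    # enumerable once "actions" are whole pieces of text instead of a few moves.
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Toy usage: two states, two actions, Q-values default to 0.0.
q = defaultdict(float)
acts = ["left", "right"]
q_update(q, "s0", "right", 1.0, "s1", acts)  # 0.0 + 0.1 * (1.0 + 0.99 * 0.0 - 0.0) = 0.1
print(greedy_action(q, "s0", acts))          # right
```

With two actions the max is trivial. If every complete reply of L tokens drawn from a V-token vocabulary were treated as a single action, the same max would range over V^L candidates.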

