
In reinforcement learning, Q* is the optimal (i.e., correct for the decision problem) action-value function: it gives the total expected reward of taking a given action in a given state and then acting optimally from then on.
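As an illustration, here is a minimal sketch that computes Q* by value iteration on the Bellman optimality equation, Q*(s, a) = E[r + γ max_a' Q*(s', a')]. The two-state MDP, its rewards, the discount factor γ = 0.9 and the function name `q_value_iteration` are all made up for the example.

```python
# Minimal sketch: compute Q* for a tiny, hypothetical 2-state MDP by
# repeatedly applying the Bellman optimality update until it converges.

# Hypothetical MDP: P[s][a] is a list of (probability, next_state, reward).
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(1.0, 1, 1.0)]},
    1: {"stay": [(1.0, 1, 2.0)], "go": [(1.0, 0, 0.0)]},
}
GAMMA = 0.9  # discount factor (assumed for the example)


def q_value_iteration(P, gamma, tol=1e-10):
    """Return Q*(s, a) as a dict keyed by (state, action)."""
    q = {(s, a): 0.0 for s in P for a in P[s]}
    while True:
        delta = 0.0
        new_q = {}
        for s in P:
            for a, outcomes in P[s].items():
                # Expected immediate reward plus the discounted value of
                # acting optimally (max over next actions) from the next state.
                new_q[(s, a)] = sum(
                    p * (r + gamma * max(q[(s2, a2)] for a2 in P[s2]))
                    for p, s2, r in outcomes
                )
                delta = max(delta, abs(new_q[(s, a)] - q[(s, a)]))
        q = new_q
        if delta < tol:
            return q


Q_STAR = q_value_iteration(P, GAMMA)
# Acting greedily with respect to Q* gives an optimal policy.
policy = {s: max(P[s], key=lambda a: Q_STAR[(s, a)]) for s in P}
print(policy)  # → {0: 'go', 1: 'stay'}
```

Checking by hand: staying in state 1 earns 2 per step forever, so Q*(1, stay) = 2 / (1 - 0.9) = 20, and Q*(0, go) = 1 + 0.9 · 20 = 19.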


That is just Q. The asterisk is new.


Depends on the notation. Sometimes Q* is used to denote optimality, for example here: https://www.cs.toronto.edu/~jlucas/teaching/csc411/lectures/... (page 31).
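For reference, in one common convention (Sutton and Barto write the same objects in lowercase, as q_π and q_*), the unstarred function belongs to a particular policy π, and the star marks the optimum over all policies. Here r is the immediate reward, s' the next state and γ the discount factor; the two Bellman equations differ only in whether the next action is averaged under π or maximized:

```latex
Q^{\pi}(s,a) = \mathbb{E}\Big[\, r + \gamma \sum_{a'} \pi(a' \mid s')\, Q^{\pi}(s',a') \;\Big|\; s, a \Big]

Q^{*}(s,a) = \mathbb{E}\Big[\, r + \gamma \max_{a'} Q^{*}(s',a') \;\Big|\; s, a \Big],
\qquad Q^{*}(s,a) = \max_{\pi} Q^{\pi}(s,a)
```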


The "star" in A* means optimal (as in actually proven to be optimal, so they could stop publishing A^[whatever] algorithms). I assume either Q* is considered optimal in a way regular Q-learning isn't, or they're mixing Q-learning with some A*-like search algorithm. (Or someone picked an undeserving name.)
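To make "A*-like search" concrete, here is a minimal A* sketch. The grid, the `astar` name and the unit step costs are hypothetical, chosen only for illustration. A*'s guarantee of returning a cheapest path rests on the heuristic being admissible (never overestimating the remaining cost); Manhattan distance on a 4-connected, unit-cost grid satisfies that.

```python
import heapq


def astar(grid, start, goal):
    """A* on a 4-connected grid ('#' = wall) with unit step cost.

    Returns the length of a shortest path, or None if the goal is unreachable.
    The Manhattan-distance heuristic is admissible here, so the first time
    the goal is popped from the frontier its cost is optimal.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best_g = {start: 0}
    frontier = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            return g
        if g > best_g.get(cell, float("inf")):
            continue  # stale entry: a cheaper path to this cell was found later
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] != "#":
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(frontier, (ng + h((nr, nc)), ng, (nr, nc)))
    return None


# Hypothetical grid: the wall forces a detour around the bottom.
GRID = [
    ".#..",
    ".#.#",
    "....",
]
print(astar(GRID, (0, 0), (0, 2)))  # → 6
```

The straight-line (Manhattan) estimate from start to goal is only 2, but because it never overestimates, A* still returns the true detour length.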



