DroPE: Extending the Context of LLMs by Dropping Their Positional Embeddings (sakana.ai)
5 points by hardmaru 9 days ago | 1 comment




> While the original motivation for causal masking was not to provide positional information, but instead to have efficient parallelizable training, it turns out that a consistent <bos> token + causal masking is enough to perfectly reconstruct token positions.

I wish this point were explained further instead of being relegated to a footnote. It seems like the central insight that makes this technique work, and it isn't obvious to me, maybe because I haven't implemented a transformer from scratch.
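
A rough sketch of why that can work (my own toy numpy example, not the paper's construction): with no positional embeddings, a head whose attention logits are content-independent spreads its weight uniformly over the causally visible prefix. Position i sees i+1 tokens, so the <bos> token receives weight 1/(i+1), and any coordinate that uniquely marks <bos> then encodes absolute position in the head's output.

    import numpy as np

    np.random.seed(0)
    seq_len, d = 8, 16

    # Embeddings: a distinctive <bos> vector followed by small random "content" tokens.
    # No positional embedding is added anywhere.
    bos = np.zeros(d)
    bos[0] = 1.0                                   # <bos> is marked along dimension 0
    x = np.vstack([bos, 0.01 * np.random.randn(seq_len - 1, d)])

    # A head with content-independent (all-zero) attention logits, so softmax is
    # uniform over whatever the causal mask lets each position see.
    scores = np.zeros((seq_len, seq_len))
    causal_mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores[causal_mask] = -np.inf                  # position i attends only to 0..i
    attn = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)

    out = attn @ x                                 # head output per position

    # The <bos>-marked coordinate of the output is ~1/(i+1): a monotone function
    # of absolute position i, so later layers can decode position from it.
    print(np.round(out[:, 0], 3))                  # ~[1.0, 0.5, 0.333, 0.25, ...]

So causal masking gives each position a different-sized visible prefix, and a consistent <bos> token gives a fixed anchor whose share of attention mass pins down where in that prefix you are; together that is enough to reconstruct absolute positions, even though neither mechanism was designed for it.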



