So a new type of neural network that has been proven to work well on regression tasks common in physics? And tested in practice to fit well on elementary algebra and compositions of complex functions. But no evidence at all if it works on the most basic machine learning tasks like MNIST, not to mention language models.
I mean it's great but at the current state it seems better suited for tasks where an explicit formula exists (though not known) and the goal is to predict it on unknown points (and be able to interpret the formula as a side effect). Deep learning tasks are more of a statistical nature (think models with a cross entropy loss - it's statistically predicting the frequency of different choices of the class/next token), it requires a specialized training procedure and it is designed to fit 100% rather than somewhat close (think linear algebra - it won't be good at it). It would very likely take a radically different idea to apply it to deep learning tasks. The recently updated "Author's note" also mentions this: "KANs are designed for applications where one cares about high accuracy and/or interpretability."
It's great but let's be patient before we see this improve LLM accuracy or be used elsewhere.
I mean it's great but at the current state it seems better suited for tasks where an explicit formula exists (though not known) and the goal is to predict it on unknown points (and be able to interpret the formula as a side effect). Deep learning tasks are more of a statistical nature (think models with a cross entropy loss - it's statistically predicting the frequency of different choices of the class/next token), it requires a specialized training procedure and it is designed to fit 100% rather than somewhat close (think linear algebra - it won't be good at it). It would very likely take a radically different idea to apply it to deep learning tasks. The recently updated "Author's note" also mentions this: "KANs are designed for applications where one cares about high accuracy and/or interpretability."
It's great but let's be patient before we see this improve LLM accuracy or be used elsewhere.