I wrote this blog post [link redacted] which seems to be a more brief introduction to some of these concepts. I guess the assistant API has changed the landscape but even that must be using some of these techniques under the hood, so I think it's still fascinating to study.
I used the assistant API for about 2 weeks before I realized I could do a better job with the raw completion API. For me, the Assistant API now feels like training wheels.
The manner in which long threads are managed over time will be domain-specific if we are seeking an ideal agent. I've got methods that can selectively omit data that is less relevant in our specific case. I doubt that OAI's solution can be this precise at scale.
I've noticed the assistents api is a lot slower and the fact you need to "poll" for when a run is completed is annoying.
There a few good points though, you can tweat the system document on the dashboard without needing to re start the app and you can switch which model is being used too.
> the fact you need to "poll" for when a run is completed
This is another good point. If everything happens in one synchronous call chain, it's likely to finish in a few seconds. With polling, I saw some threads take up to a minute.
I guess that's fair, it's more about the concepts. I will say that I would have liked to have read something like it before starting the project, it would have made the journey (which I have still only just started) quite a bit easier.