
Split to smaller chunks, summarize them. Then summarize summaries.
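A minimal sketch of that "summarize the summaries" idea, assuming a hypothetical `summarize` callable that stands in for whatever model call you use. The names `split_into_chunks` and `summarize_hierarchically` and the word-based chunk size are illustrative choices, not anything from the thread.

```python
from typing import Callable, List


def split_into_chunks(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, len(words), chunk_size)]


def summarize_hierarchically(text: str,
                             summarize: Callable[[str], str],
                             chunk_size: int = 200) -> str:
    """Summarize each chunk, then summarize the joined summaries, repeating
    until everything fits in a single chunk."""
    chunks = split_into_chunks(text, chunk_size)
    if len(chunks) <= 1:
        return summarize(text)
    combined = " ".join(summarize(chunk) for chunk in chunks)
    if len(combined.split()) >= len(text.split()):
        # The summarizer did not shrink the text; stop here rather than
        # recursing forever.
        return summarize(combined)
    return summarize_hierarchically(combined, summarize, chunk_size)


def first_words(text: str, n: int = 5) -> str:
    """Toy stand-in for a model call: keep only the first n words."""
    return " ".join(text.split()[:n])
```

In practice `summarize` would wrap an API call, and `chunk_size` would be chosen from the model's context limit measured in tokens rather than words.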


You might want to overlap the chunks in the first pass, since something could get lost at the chunk boundaries. I'm no expert on this sort of thing, but it seems like an obvious pitfall of working around the context length.
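One way to picture the overlap suggestion is a sliding window over the words, where each chunk repeats the tail of the previous one. This is only a sketch; `overlapping_chunks`, `size`, and `overlap` are made-up names, and real code would likely count tokens instead of words.

```python
from typing import List, Sequence


def overlapping_chunks(words: Sequence[str], size: int, overlap: int) -> List[Sequence[str]]:
    """Return windows of `size` items where consecutive windows share
    `overlap` items, so content at a boundary appears in both neighbours."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):
            # This window already reaches the end of the input.
            break
    return chunks
```

The cost is extra model input: with an overlap of half the window, roughly twice as many chunks get summarized.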


I really like this idea. It's basically applying the same principle used in image-based nets, i.e. sliding-window convolutional kernels, to text.


I built summarize.tech

Yes, it's a great idea, and I have a version that is basically a convolution over the transcript. It works much better than the current version: it can automatically create cohesive chapters and summaries of those chapters. However, it consumes an order of magnitude more ChatGPT API calls, making it uneconomical (for now!)


I'm inspired that this is a side project, given everything you run. Kudos.


Thanks for the kind words. I built it on a few cross-country plane rides and now I mostly just leave it alone. The infrastructure and tooling we have these days is so incredible.


Can you please ELI5 the difference between the old and the new version?


Sure. The old one just splits the transcript into 5-minute chunks and summarizes those. This sucks because each 5-minute chunk could contain multiple topics, or the same topic could be repeated across multiple chunks.

This dumb technique is actually pretty useful for a lot of people though, and has the advantages of being super easy to parallelize and requiring only 1 pass through the data.
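A rough sketch of that simple approach, assuming the transcript arrives as `(start_seconds, text)` pairs and that `summarize` is a hypothetical stand-in for the model call. None of these names come from summarize.tech itself. Because the chunks are independent, the calls can be dispatched concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

Segment = Tuple[float, str]  # (start time in seconds, spoken text); assumed format


def chunk_by_time(segments: List[Segment], window_seconds: float = 300.0) -> List[str]:
    """Group transcript segments into fixed time windows (5 minutes by default)."""
    buckets: Dict[int, List[str]] = {}
    for start, text in segments:
        buckets.setdefault(int(start // window_seconds), []).append(text)
    return [" ".join(buckets[key]) for key in sorted(buckets)]


def summarize_chunks_in_parallel(chunks: List[str],
                                 summarize: Callable[[str], str],
                                 max_workers: int = 4) -> List[str]:
    """Summarize every chunk independently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(summarize, chunks))
```

Threads are enough here on the assumption that the work is network-bound API calls; the single pass and the lack of dependencies between chunks are what make it cheap.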

The more advanced technique does a pass through large chunks of the transcript to create lists of chapters in each chunk. Then it combines them to a single canonical chapter list with timestamps (it usually takes a few tries for the model to get it right). Then it does a second pass through the transcript, summarizing the content for each chapter.

The end result is a lot more useful, but is way slower and more expensive.
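The two-pass flow described above could be sketched roughly like this. Everything here is an assumption for illustration: `propose_chapters` and `summarize` are hypothetical stand-ins for model calls, the transcript is assumed to be `(start_seconds, text)` pairs, and the merge step is a simple deterministic de-duplication rather than the model-driven, retried merge the comment describes.

```python
from typing import Callable, List, Tuple

Segment = Tuple[float, str]  # (start time in seconds, spoken text); assumed format
Chapter = Tuple[float, str]  # (start time in seconds, chapter title)


def merge_chapters(proposals: List[List[Chapter]]) -> List[Chapter]:
    """Flatten per-chunk proposals, sort by time, and drop a chapter whose
    title repeats the previous one (a topic that spans a chunk boundary)."""
    merged: List[Chapter] = []
    for start, title in sorted(ch for chunk in proposals for ch in chunk):
        if not merged or merged[-1][1].lower() != title.lower():
            merged.append((start, title))
    return merged


def summarize_by_chapter(segments: List[Segment],
                         chapters: List[Chapter],
                         summarize: Callable[[str], str]) -> List[Tuple[str, str]]:
    """Second pass: summarize the text between each chapter start and the next.
    Segments before the first chapter start are not covered by any chapter."""
    results = []
    for i, (start, title) in enumerate(chapters):
        end = chapters[i + 1][0] if i + 1 < len(chapters) else float("inf")
        body = " ".join(text for t, text in segments if start <= t < end)
        results.append((title, summarize(body)))
    return results


def two_pass_summary(segments: List[Segment],
                     propose_chapters: Callable[[List[Segment]], List[Chapter]],
                     summarize: Callable[[str], str],
                     chunk_len: int = 4) -> List[Tuple[str, str]]:
    """First pass proposes chapters per large chunk; second pass summarizes
    the transcript chapter by chapter."""
    big_chunks = [segments[i:i + chunk_len]
                  for i in range(0, len(segments), chunk_len)]
    proposals = [propose_chapters(chunk) for chunk in big_chunks]
    return summarize_by_chapter(segments, merge_chapters(proposals), summarize)
```

The extra cost is visible in the structure: every part of the transcript is sent to the model at least twice, and the second pass cannot start until the chapter list is settled.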


This is the standard practice already



