nawazgafar's comments

You beat me to the punch. I wrote a blog post[1] with the exact same title last week! Though, I went into a bit more detail with regard to embedding layers, so maybe my title is not accurate.

1. https://gafar.org/blog/text-to-tokens


Amazing, will have a read!


Author here, that sucks. I'd love to recreate this locally. Would you be willing to share the PDF?


As far as I am aware, the "hanging" issue remains unsolved to this day. The underlying problem is that LLMs sometimes get stuck in a loop where they repeat the same text again and again until they reach the token limit. You can break the loop by setting a repeat penalty, but when your image contains legitimately repeated text, such as in tables, the penalty pushes the LLM toward incorrect output just to avoid repeating itself.
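To make the trade-off concrete, here is a minimal sketch of one common way a repeat penalty is applied to logits (divide positive scores of already-seen tokens, multiply negative ones). The three-token vocabulary, the logit values, and the function names are made up for illustration; this is not the actual sampling code of Qwen2.5-VL or of any particular runtime.

```python
def apply_repeat_penalty(logits, generated_ids, penalty=1.3):
    """Return a copy of `logits` with already-generated tokens made less likely."""
    adjusted = list(logits)
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        # Shrink positive scores and push negative scores further down.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


def argmax(values):
    """Index of the largest value (greedy decoding)."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical toy vocabulary: 0 = "0.5", 1 = "0.6", 2 = "|"
logits = [2.0, 1.8, 0.1]  # the model correctly prefers "0.5" for the next cell
history = [0, 2]          # "0.5" and "|" were already emitted for the previous cell

plain_choice = argmax(logits)
penalized_choice = argmax(apply_repeat_penalty(logits, history))
```

Without the penalty, greedy decoding picks token 0 ("0.5"), which is the correct repeated cell. With the penalty, the score of token 0 drops below that of token 1, so the output flips to "0.6" even though the table really does contain the same value twice.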

Here is the corresponding GitHub issue for your default model (Qwen2.5-VL):

https://github.com/QwenLM/Qwen2.5-VL/issues/241

You can mitigate the fallout of this repetition issue to some degree by chopping up each page into smaller pieces (paragraphs, tables, images, etc.) with a page layout model. Then at least only part of the text is broken instead of the entire page.
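A rough sketch of that mitigation, assuming the layout model has already produced a list of regions and that some `ocr_region` callable wraps the vision-language model. Both the `Region` type and `ocr_region` are hypothetical stand-ins, and `looks_stuck` is only a simple heuristic for spotting looping output, not a robust detector.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Region:
    kind: str                       # e.g. "paragraph", "table", "image"
    box: Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def looks_stuck(text: str, min_period: int = 3, max_period: int = 50,
                min_repeats: int = 4) -> bool:
    """Heuristic: does the text end in the same chunk repeated back to back?"""
    for period in range(min_period, max_period + 1):
        span = period * min_repeats
        if len(text) < span:
            break
        tail = text[-span:]
        if tail == tail[-period:] * min_repeats:
            return True
    return False


def transcribe_page(
    regions: List[Region],
    ocr_region: Callable[[Region], str],
) -> List[Tuple[Region, Optional[str]]]:
    """Run OCR per region; a looping region yields None instead of ruining the page."""
    results = []
    for region in regions:
        text = ocr_region(region)
        results.append((region, None if looks_stuck(text) else text))
    return results


# Stand-in for the real model call: the table "hangs", the paragraph is fine.
def fake_ocr(region: Region) -> str:
    return "abc " * 50 if region.kind == "table" else "Hello world, this is a paragraph."


page = [Region("paragraph", (0, 0, 800, 200)), Region("table", (0, 220, 800, 600))]
results = transcribe_page(page, fake_ocr)
```

Here only the table region comes back as `None`, so it can be retried or flagged on its own while the paragraph text is kept.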

A better solution might be to train a model to estimate a heat map of character density for a page of text, then condition the vision-language model on that density by feeding it to the vision encoder. The model could also output character coordinates, which can be compared against the heat map to adjust token probabilities.
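As a sketch of what such a heat map could look like as a training target, the snippet below bins character centre points into a coarse grid. The grid layout, the function name, and the assumption that character coordinates are available as ground truth (and lie within the page) are all mine; nothing here has been tested against a real model.

```python
from typing import List, Tuple


def char_density_map(
    char_centers: List[Tuple[float, float]],
    page_w: float,
    page_h: float,
    grid_w: int,
    grid_h: int,
) -> List[List[int]]:
    """Count characters per grid cell; rows run top to bottom, columns left to right."""
    grid = [[0] * grid_w for _ in range(grid_h)]
    for x, y in char_centers:
        # Clamp so that points on the right or bottom edge land in the last cell.
        col = min(int(x / page_w * grid_w), grid_w - 1)
        row = min(int(y / page_h * grid_h), grid_h - 1)
        grid[row][col] += 1
    return grid


# Two characters near the top-left corner, one near the bottom-right corner.
density = char_density_map([(10, 10), (12, 11), (90, 90)], 100, 100, 2, 2)
```

A decoder that tracks where each output character lands could compare its running counts per cell against a map like this and down-weight tokens once a cell is already "full".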


Author here, I tested it with this PDF of a handwritten doc [1], and it converted both pages accurately.

1. https://github.com/pnshiralkar/text-to-handwriting/blob/mast...


Amazing, can't wait to try it!

FYI, your GitHub link tells me it's unable to render because the PDF is invalid.


Glad to hear it! What types of symbols did it miss?


Will do!


This is super cool, and really intuitive.

Not going to lie, I've put a few hundred hours into playing Cities: Skylines and have never finished a city. One night with this and I've actually made something I'm proud of.

Check out my city: https://i.imgur.com/10nCtrO.png


Haha, that's really cool. Glad you liked it.

