There are different extractors/services, and you can toggle them pretty easily. By default it screenshots everything, exports a PDF, saves several different HTML copies, and submits the link to the Wayback Machine. It also tries to extract the important text and stores that separately. You could easily configure it to extract only text, turn off some of the HTML extractors, or disable the PDF and screenshot captures if you want to prioritize disk space.
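The toggles live in ArchiveBox.conf in your data directory (or as environment variables with the same names). A rough text-only setup might look something like this, though option names can shift between versions, so check the output of `archivebox config` on your install:

    [ARCHIVE_METHOD_TOGGLES]
    SAVE_PDF = False            # skip the PDF export
    SAVE_SCREENSHOT = False     # skip full-page screenshots
    SAVE_DOM = False            # skip the rendered-DOM HTML dump
    SAVE_SINGLEFILE = False     # skip the self-contained HTML copy
    SAVE_MEDIA = False          # skip youtube-dl audio/video ripping (usually the biggest consumer)
    SAVE_READABILITY = True     # keep the extracted article text
    SAVE_ARCHIVE_DOT_ORG = True # still submit the link to the Wayback Machine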
That probably depends on the scope of what you're looking to archive. If you're looking to make a local backup of your bookmarks folder (which seems to be one of the intended uses), it's probably not an unreasonable amount of storage. Maybe a few GB at most for a moderate-to-large bookmarks folder, depending on how many sites there are and how heavy each one is?
That is an insane amount of storage for so few links. Is your setup somehow very greedy?
Saving only the article view (images + text) should probably do better.
I suspect your numbers come from JavaScript, CSS, etc.? Is there a way for ArchiveBox to share source files instead of downloading React 5000 times? Though the custom bundles most sites compile would probably defeat that most of the time. Just thinking out loud here.
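As far as I know ArchiveBox doesn't do this, but the generic version of the idea is content-addressed dedup: hash each asset and hardlink identical files to a single stored copy. A minimal sketch (paths are made up, and it assumes the store and archive live on one filesystem since hardlinks can't cross mounts):

    import hashlib
    import os
    import sys

    def dedupe(archive_root: str, store: str) -> None:
        """Hardlink duplicate files under archive_root to one canonical
        copy per SHA-256 digest, kept in a content-addressed store."""
        os.makedirs(store, exist_ok=True)
        for dirpath, _dirs, files in os.walk(archive_root):
            for name in files:
                path = os.path.join(dirpath, name)
                with open(path, "rb") as f:
                    digest = hashlib.sha256(f.read()).hexdigest()
                canonical = os.path.join(store, digest)
                if not os.path.exists(canonical):
                    os.link(path, canonical)  # first sighting becomes the stored copy
                elif not os.path.samefile(path, canonical):
                    os.remove(path)           # duplicate: replace with a hardlink
                    os.link(canonical, path)

    if __name__ == "__main__":
        dedupe(sys.argv[1], sys.argv[2])

Even then, as you say, two sites shipping their own webpack bundle of React will hash differently, so this would mostly catch assets served verbatim from shared CDNs.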
It's recommended to run it on a compressed filesystem like ZFS. Mine is using ~75 GB for ~3,000 URLs, so about 25 MB per URL on average. It varies greatly depending on the content; usually the vast majority of the storage is video/audio ripped with youtube-dl.
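For the ZFS route, it's just a matter of enabling compression on the dataset you point ArchiveBox at (the pool/dataset name here is made up):

    zfs create -o compression=lz4 tank/archivebox   # lz4 is cheap and a sane default
    zfs get compressratio tank/archivebox           # see how much it's actually saving

Worth noting that compression won't do much for the youtube-dl rips, since video and audio are already compressed; it mainly helps with the HTML/JS/text copies.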