Heh, I got a bit into hacking on python-docx last year (the original author seems to be focusing on other things than python-docx now) - I have a fork/branch where I tried to more properly implement external hyperlink functionality (https://github.com/icegreentea/python-docx/pull/7)
I realize now staring at this, that I might have broken API a little. You can't do "text = paragraph.text" anymore, but you can do "text = ''.join([run.text for run in paragraph.runs])" instead.
If you're curious at all why it breaks, it's because in the OOXML spec paragraphs are made up of a ordered list of runs or hyperlinks (and hyperlinks can then contain additional runs). The master branch just implements paragraphs as ordered list of runs (and ignores all hyperlinks).
This sounds amazing! Thanks for sharing it, I will try it to see if I can replace it with the main python-docx. For my use case it suffices to have full text of each paragraph (even if it includes a hyperlink) and heading but also be able to have each of them separated when needed.
I realize now staring at this, that I might have broken API a little. You can't do "text = paragraph.text" anymore, but you can do "text = ''.join([run.text for run in paragraph.runs])" instead.
If you're curious at all why it breaks, it's because in the OOXML spec paragraphs are made up of a ordered list of runs or hyperlinks (and hyperlinks can then contain additional runs). The master branch just implements paragraphs as ordered list of runs (and ignores all hyperlinks).