Parsing XML is relatively trivial--I'd never use regex, of course, but a basic recursive descent parser can do it pretty easily. I mean, the whole point of XML is that it's supposed to be easy to parse and generate!
Namespaces add a wrinkle, but it wasn't that hard to add. And I was able to add namespace aliasing in my API to handle the two separate "standard" namespaces that you're talking about.
But you're right about OPC/OOXML--those are massive specs and even the tiny slice that I'm handling has been error-prone. I haven't dealt with multiple internal files, so that's a future bug waiting for me. The good news is I'm building a nice library of test files for my regression tests!
It really isn't, and rolling your own parser is the diametric opposite of the "do the simplest thing" philosophy.
The XML v1.1 spec is 126 KB of text, and that doesn't even include XML Namespaces, which is a separate spec with 25 KB of text.
XML is only "simple" in the sense of being well-defined, which in turn makes interoperability simpler. Contrast this with ill-defined or implementation-defined text formats, where it's decidedly not simple to write an interoperable parser.
As an end-user of XML, the simplest thing is to use an off-the-shelf XML parser, one that's had the bugs beaten out of it by millions of users.
There are very few programming languages out there that don't have a convenient, full-featured XML parser library ready to use.
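To make that concrete, here's a rough sketch in Python using the standard library's xml.etree.ElementTree, including one possible way to treat two namespace URIs as aliases of each other. The document, the example.com URIs, the ALIASES table and the local_items helper are all made up for illustration; this is not anyone's actual API, just an assumption about what "namespace aliasing" might look like on top of a stock parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the same vocabulary published under two namespace URIs.
DOC = """<root xmlns:a="http://example.com/ns/v1" xmlns:b="http://example.com/ns/v2">
  <a:item>first</a:item>
  <b:item>second</b:item>
</root>"""

# Hypothetical alias table: both URIs map to one logical namespace name.
ALIASES = {
    "http://example.com/ns/v1": "item-ns",
    "http://example.com/ns/v2": "item-ns",
}


def local_items(xml_text):
    """Return the text of every <item> element in either aliased namespace."""
    root = ET.fromstring(xml_text)
    found = []
    for el in root.iter():
        # ElementTree spells qualified names as "{uri}local".
        if el.tag.startswith("{"):
            uri, local = el.tag[1:].split("}", 1)
        else:
            uri, local = None, el.tag
        if ALIASES.get(uri) == "item-ns" and local == "item":
            found.append(el.text)
    return found


print(local_items(DOC))
```

The parser deals with well-formedness, entities, encodings and prefix resolution; the only code you own is the dozen lines that decide which namespaces count as equivalent.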