Yes. I've got a toy language that I've been working on that uses a rope to store characters internally.
Everything works on "logical characters" - arbitrary vectors of codepoints, as you say. There are still a number of edge cases I have yet to work out as to what exactly counts as a character, though. (I just added support for a single code point encoding multiple characters, for example.)
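To make "logical character = vector of codepoints" concrete, here's a rough sketch of the grouping step (my own toy version, not the actual implementation - real grapheme segmentation per UAX #29 has many more rules, e.g. for emoji ZWJ sequences and Hangul jamo):

```python
import unicodedata

def logical_chars(s):
    """Group a string into "logical characters": each is a list of
    codepoints -- a base codepoint plus any trailing combining marks.
    (A rough sketch; names are made up for illustration.)"""
    groups = []
    for cp in s:
        if groups and unicodedata.combining(cp):
            groups[-1].append(cp)  # attach combining mark to preceding base
        else:
            groups.append([cp])   # start a new logical character
    return groups

# "e" + COMBINING ACUTE ACCENT + "x": 3 codepoints, 2 logical characters
print(len(logical_chars("e\u0301x")))  # → 2
```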
I'm not so sure that making things reliant on a font would be the best way to solve that, though. I'd intuitively say that there should be less coupling between rendering choices and internal encoding than that.
The twist is that each node in the rope can only store characters of the same physical length in bytes (and the same number of logical characters per physical character). This means that in the typical case (most characters require the same number of bytes to encode) it doesn't add too much overhead. Still not something I would consider as the base String type for a lower-level language, though.
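The payoff of the fixed-width constraint is O(1) indexing within a node. A minimal sketch of what a leaf might look like (names and the flat-bytes layout are my own, not the actual implementation):

```python
class FixedWidthLeaf:
    """Rope leaf where every character has the same encoded width in
    bytes, so the i-th character is a plain slice -- no scanning,
    unlike indexing into variable-width UTF-8."""
    def __init__(self, data: bytes, width: int):
        assert len(data) % width == 0
        self.data = data
        self.width = width

    def __len__(self):
        return len(self.data) // self.width

    def char_at(self, i: int) -> bytes:
        # O(1): the i-th character starts at byte offset i * width
        return self.data[i * self.width : (i + 1) * self.width]

# Three 2-byte characters in one node; a stray 1-byte character would
# force a node split (or a re-encoding trick, per the optimization below).
leaf = FixedWidthLeaf("αβγ".encode("utf-8"), width=2)
print(len(leaf), leaf.char_at(1).decode("utf-8"))  # → 3 β
```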
There are a few simple optimizations that I have yet to do: encoding smaller characters as what would ordinarily be invalid longer encodings when it makes sense (a single one-byte character in the middle of a run of two-byte characters, for example), that sort of thing.
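For the one-byte-among-two-byte case, the trick would presumably be UTF-8's "overlong" forms - bit patterns the standard forbids precisely because they re-encode a small codepoint in more bytes than necessary. A sketch, assuming internal decoders are taught to accept these:

```python
def overlong_2byte(ch: str) -> bytes:
    """Encode a 1-byte (ASCII) character using the 2-byte UTF-8 bit
    pattern (110xxxxx 10xxxxxx). The result is invalid UTF-8 on the
    wire, but inside a node of 2-byte characters it keeps every slot
    the same physical width. (Function name is my own.)"""
    cp = ord(ch)
    assert cp < 0x80, "only 1-byte (ASCII) characters need widening"
    return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])

# 'A' (0x41) widened to two bytes: 0xC1 0x81
print(overlong_2byte("A").hex())  # → c181
```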
It seems to work fairly well, so far. Or at least it tends to give "common-sensical" results, and avoids a large chunk of worst-case behavior that standard "prettified character array" strings have.