How does the number of backslashes grow as the string is repeatedly escaped?
1 \
2 \\
4 \\\\
8 \\\\\\\\
We see it grows exponentially: to escape each backslash, you need two. So I guess I’m surprised that this is a post about a 1.5GB string, and not about the exponentially increasing work on each navigation causing performance problems, or the session crashing from running out of memory when a user navigates too far. Maybe the service wasn’t being used much yet, or there’s some other reason strings only grew to ridiculous sizes and not OOM sizes. I’m curious why it didn’t show up sooner. (I saw a somewhat similar problem long ago, but that one was more like doing ‘this.json = this.messages_array.to_json()’ on every new message, so merely quadratic. I think it was noticed not long after a lot of messages were first added to one of the objects.)
The solution to this problem has always been known: don't use the same escape character in different formats. With non-overlapping escape characters, you don't need to escape more than once.
So we see that in C, the escape character for string literals consumed by the compiler is \, while the escape character for format strings consumed by printf is %.
It always boggles my mind that someone looked at this example of a working design and concluded that the escape character for a regular expression engine should be \, overlapping with the escape character for a literal string.
Computers are fast. I once worked on a service sitting at the top of a deep RPC stack. Stack traces were often folded into RPC error messages to help diagnostics. Well, if you have a failure at one layer, propagated to the next layer, and the next, and sometimes replicated for each looked-up item (to support partial-failure semantics), then you can end up with gigabytes of stack traces in memory for some fraction of the time. Very hard to figure out, until tasks started dying and leaving behind heap dumps during a widespread incident at a lower layer of the stack.
My back-of-the-envelope estimate for how long it would take to construct the 1.5GB string is about 200ms to 300ms (1.5GB written, 0.75GB read, plus extra from growing a buffer for the result; sequential R/W at 10GB/s, copying to resize the buffer at 20GB/s). Which is somewhat fast, but I’m still surprised that the latency increase wasn’t noticed first.
More importantly, the time and memory double with each further navigation, which means a user wouldn’t have had to navigate much farther to see much worse performance. Maybe this was a known issue that no one had debugged by then.
My best guess is that screens were arranged in a tree and users would navigate ‘back’ instead of ‘up’ so it was hard to get a long history.
I joined the team long after this was in production, but I can testify that the service has a lot of moving parts, and attributing reliability/efficiency costs to a specific issue was challenging.