On Sat, Dec 31, 2011 at 12:09 AM, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:
On 12/30/11 10:09 PM, Walter Bright wrote:
> On 12/30/2011 7:30 PM, Jonathan M Davis wrote:
>> Yes, diligent programmers will generally find such problems, but with the
>> current scheme, it's _so_ easy to use length when you shouldn't, that it's
>> pretty much a guarantee that it's going to happen.
>
> I'm not so sure about that. Timon Gehr's X macro tried to handle UTF-8
> correctly, but it turned out that the naive version that used [i] and
> .length worked correctly. This is typical, not exceptional.

The lower frequency of bugs makes them that much more difficult to spot. This is essentially similar to the UTF16/UCS-2 morass: the vast majority of the time, the programmer may treat UTF16 as an encoding with one code unit per code point (which is what UCS-2 is). The existence of surrogates didn't make much of a difference because, again, very often the wrong assumption just worked. Well, that didn't go over all that well.

We need .raw and we must abolish .length and [] for narrow strings.
Andrei
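
To make the point above concrete, here is a minimal D sketch of how .length on a narrow string counts code units rather than code points, and why the mismatch is easy to miss. The helper names codeUnits and codePoints are made up for illustration. The .raw property proposed in the thread is not used, since it is only a proposal here; std.string.representation is an existing Phobos function that serves as a rough stand-in for an explicit raw view.

```d
import std.range : walkLength;
import std.string : representation;

/// Number of UTF-8 code units in s; this is what .length reports.
/// (Hypothetical helper, named for illustration only.)
size_t codeUnits(string s)
{
    return s.length;
}

/// Number of code points in s, found by decoding the whole string.
/// (Hypothetical helper, named for illustration only.)
size_t codePoints(string s)
{
    return s.walkLength;
}

void main()
{
    // Pure ASCII: the naive assumption holds, so the bug stays hidden.
    assert(codeUnits("hello") == codePoints("hello"));

    // U+00E9 (e with acute accent) takes two UTF-8 code units.
    string s = "h\u00E9llo";
    assert(codeUnits(s) == 6);
    assert(codePoints(s) == 5);

    // An explicit view of the underlying bytes, with the same length.
    assert(s.representation.length == 6);

    // The UTF-16 analogue: U+1D11E lies outside the BMP, so it is
    // stored as a surrogate pair, i.e. two code units for one code point.
    wstring w = "\U0001D11E"w;
    assert(w.length == 2);
    assert(w.walkLength == 1);
}
```

The ASCII case is the one that "just works", which is why code relying on .length and [i] can pass every test until non-ASCII input shows up.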