On 12/30/11 10:09 PM, Walter Bright wrote:

On 12/30/2011 7:30 PM, Jonathan M Davis wrote:

Yes, diligent programmers will generally find such problems, but with the current scheme, it's _so_ easy to use length when you shouldn't that it's pretty much a guarantee that it's going to happen.
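A typical way of "using length when you shouldn't" is cutting text at a code-unit offset. For a D `string`, `.length` and `[i]` count UTF-8 code units (bytes), not characters. The sketch below uses Python only because it makes the byte view explicit through `.encode("utf-8")`. Both helper names are hypothetical. The "safe" variant is one simple fix that backs up to a code-point boundary. It is not a proposal for a Phobos API.

```python
def truncate_utf8_naive(text: str, max_units: int) -> bytes:
    # Cut at a raw code-unit (byte) offset, ignoring code-point boundaries.
    return text.encode("utf-8")[:max_units]


def truncate_utf8_safe(text: str, max_units: int) -> bytes:
    data = text.encode("utf-8")
    cut = min(max_units, len(data))
    # UTF-8 continuation bytes look like 0b10xxxxxx. If the byte right after
    # the cut is one, the cut splits a code point, so move back to its lead byte.
    while 0 < cut < len(data) and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


print(truncate_utf8_naive("naïve", 3))  # b'na\xc3' (dangling lead byte of "ï")
print(truncate_utf8_safe("naïve", 3))   # b'na'
```

The naive result is not valid UTF-8, but nothing fails until some later code tries to decode it. That delay is part of why such bugs slip through.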

I'm not so sure about that. Timon Gehr's X macro tried to handle UTF-8
correctly, but it turned out that the naive version that used [i] and
.length worked correctly. This is typical, not exceptional.
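The details of the X macro aren't given here. However, one general reason naive `[i]`/`.length` code often works on UTF-8 is that every byte of a multi-byte sequence is 0x80 or above. An ASCII byte therefore can never occur in the middle of a code point. The hypothetical helper below scans UTF-8 bytes for an ASCII separator one code unit at a time, as indexing a D `string` would, and still splits non-ASCII text correctly.

```python
def split_on_ascii(data: bytes, sep: int) -> list[bytes]:
    """Split UTF-8 bytes on an ASCII separator by scanning byte by byte.

    Safe because bytes inside multi-byte UTF-8 sequences are all >= 0x80,
    so an ASCII byte (< 0x80) always stands for itself.
    """
    assert sep < 0x80, "separator must be ASCII"
    parts = []
    start = 0
    for i in range(len(data)):  # index by code unit, like s[i] on a D string
        if data[i] == sep:
            parts.append(data[start:i])
            start = i + 1
    parts.append(data[start:])
    return parts


pieces = split_on_ascii("café,naïve,日本".encode("utf-8"), ord(","))
print([p.decode("utf-8") for p in pieces])  # ['café', 'naïve', '日本']
```

The same property stops holding once the code compares, counts, or cuts anything other than ASCII.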

The lower frequency of bugs makes them that much more difficult to spot. This is essentially similar to the UTF-16/UCS-2 morass: the vast majority of the time, the programmer may treat UTF-16 as an encoding with one code unit per code point (which is what UCS-2 is). The existence of surrogates didn't make much of a difference because, again, the wrong assumption very often just worked. Well, that didn't go over all that well.
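To make the UTF-16/UCS-2 point concrete: any code point in the Basic Multilingual Plane takes one UTF-16 code unit, exactly as in UCS-2. Anything above U+FFFF takes a surrogate pair. This is a small Python sketch; `utf16_units` is a hypothetical helper, and Python is used only because its `str` counts code points while `.encode("utf-16-le")` exposes the code units.

```python
def utf16_units(text: str) -> list[int]:
    """Return the UTF-16 code units of text (little-endian, no BOM)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


omega = "\u03a9"      # GREEK CAPITAL LETTER OMEGA, inside the BMP
clef = "\U0001D11E"   # MUSICAL SYMBOL G CLEF, outside the BMP

print([hex(u) for u in utf16_units(omega)])  # ['0x3a9']: one unit, same as UCS-2
print([hex(u) for u in utf16_units(clef)])   # ['0xd834', '0xdd1e']: a surrogate pair
print(len(clef))                             # 1 (one code point)

# Code that assumes one unit per character, e.g. reversing unit by unit,
# puts the low surrogate before the high one, which is no longer valid UTF-16.
print([hex(u) for u in utf16_units("a" + clef)[::-1]])  # ['0xdd1e', '0xd834', '0x61']
```

Because most text never leaves the BMP, the reversal bug above only shows up on inputs many test suites never include.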

We need .raw and we must abolish .length and [] for narrow strings.

Andrei

I don't know that Phobos would be an appropriate place for it, but offering some easy-to-access string data containing extensive, advanced Unicode samples, which users could easily add to their programs' unit tests, may help people ensure proper Unicode usage. Unicode seems to be one of those things where you either know it really well or know just enough to get yourself in trouble, so having test data written by Unicode experts could be very useful for the rest of us mortals.
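As a rough illustration of what such test data might look like, here is a Python sketch. The sample set, its keys, and the `check_code_point_length` harness are all hypothetical. A set curated by Unicode experts would be far larger and cover normalization, grapheme clusters, bidi text, and more. The point is that each sample makes at least one "one unit per character" assumption fail.

```python
import unicodedata

# Hypothetical sample set; a curated one would be much larger.
TRICKY_SAMPLES = {
    "ascii": "hello",
    "latin1_precomposed": "caf\u00e9",           # é as one code point
    "combining_mark": "cafe\u0301",              # e + COMBINING ACUTE ACCENT
    "cjk": "\u65e5\u672c\u8a9e",                 # 日本語, 3 UTF-8 bytes each
    "astral": "\U0001D11E",                      # needs a UTF-16 surrogate pair
    "emoji_zwj": "\U0001F468\u200D\U0001F469",   # man + ZWJ + woman: 3 code points
}


def unit_counts(text: str) -> dict[str, int]:
    # Code units in each encoding form versus code points.
    return {
        "utf8_units": len(text.encode("utf-8")),
        "utf16_units": len(text.encode("utf-16-le")) // 2,
        "code_points": len(text),
    }


def check_code_point_length(length_fn) -> None:
    # Hypothetical harness: length_fn is the function under test and is
    # expected to count code points for every sample.
    for name, s in TRICKY_SAMPLES.items():
        assert length_fn(s) == len(s), name


check_code_point_length(len)  # passes; a byte-counting length_fn would fail
print(unit_counts(TRICKY_SAMPLES["emoji_zwj"]))
# {'utf8_units': 11, 'utf16_units': 5, 'code_points': 3}

# Precomposed and decomposed forms differ in code points but are
# canonically equivalent after NFC normalization.
print(unicodedata.normalize("NFC", TRICKY_SAMPLES["combining_mark"])
      == TRICKY_SAMPLES["latin1_precomposed"])  # True
```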