October 02, 2006
Re: toString issue
Derek Parnell wrote:
> On Mon, 02 Oct 2006 00:52:44 -0600, Hasan Aljudy wrote:
> 
>> Sean Kelly wrote:
>>> How about toUtf8() for classes and structs :-)
>>>
>>> Sean
>> I think there's a fundamental problem with the way D deals with strings.
>> The spec claims that D natively supports strings through char[], at the 
>> same time, claims that D fully supports Unicode.
>> The fundamental issue is that UTF-8 is one encoding for Unicode strings, 
>> but it's not always the best choice. Phobos mostly only deals with 
>> char[], and mixing code that uses wchar[] with code that uses char[] 
>> isn't very straightforward.
>>
>> Consider the simple case of reading a text file and detecting "words". 
>> To detect a word, you must first recognize letters. No, not English 
>> letters; letters of any language, and for that purpose we have the 
>> isUniAlpha function. Now, if you encode the string as char[], then how 
>> are you gonna determine whether or not the next character is a Unicode 
>> alpha?
>>
>> The following definitely shouldn't work, since text[i] is a single 
>> UTF-8 code unit rather than a whole character:
>> //assuming text is char[]
>> for( int i = 0; i < text.length; i++ )
>> {
>>      bool isLetter = isUniAlpha( text[i] );
>>      ....
>> }
> 
>   foreach(int i, dchar c; text)
>   {
>        bool isLetter = isUniAlpha( c );
>        ...
>   }
> 
> 

I know, but that's still a work-around. What if you need to iterate back 
and forth? You're gonna need to convert it to dchar[] (or wchar[]).
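
A minimal sketch of that conversion, assuming a current D2 Phobos where std.utf.toUTF32 returns a dstring and std.uni.isAlpha plays the role of isUniAlpha. The helper lastLetterIndex is made up for illustration only:

```d
import std.utf : toUTF32;
import std.uni : isAlpha;

/// Hypothetical helper: returns the index (counted in code points) of the
/// last Unicode letter in `text`, or -1 if there is none.
ptrdiff_t lastLetterIndex(const(char)[] text)
{
    // One-time conversion: every element of `wide` is a whole code point,
    // so plain indexing works in both directions.
    dstring wide = toUTF32(text);
    for (ptrdiff_t i = cast(ptrdiff_t) wide.length - 1; i >= 0; --i)
    {
        if (isAlpha(wide[i]))
            return i;
    }
    return -1;
}

void main()
{
    // Code points are 'a', 'ñ', 'b', '!', so the last letter is at index 2.
    assert(lastLetterIndex("añb!") == 2);
}
```

The price is a full copy of the text at four bytes per code point.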

However, that brings up a good point:
Notice how foreach lets you iterate a string by Unicode characters 
(a.k.a. code points)? Shouldn't this kind of iteration be supported 
outside of foreach as well?
Sure, I know you can write your own String class and even an iterator, 
but that just proves that string support isn't really/fully built-in.
October 02, 2006
Re: toString issue
Hasan Aljudy wrote:
> Derek Parnell wrote:
>>   foreach(int i, dchar c; text)
>>   {
>>        bool isLetter = isUniAlpha( c );
>>        ...
>>   }
>>
>>
> 
> I know, but that's still a work-around. What if you need to iterate back 
> and forth? You're gonna need to convert it to dchar[] (or wchar[]).
> 
> However, that brings up a good point:
> Notice how foreach lets you iterate a string by Unicode characters 
> (a.k.a. code points)? Shouldn't this kind of iteration be supported 
> outside of foreach as well?

See std.utf.decode and std.utf.stride.
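
Roughly, those two let you walk a char[] one code point at a time without foreach. This is only a sketch under assumptions: it targets a current D2 Phobos, uses std.uni.isAlpha in place of isUniAlpha, and relies on std.utf.strideBack (not mentioned in this thread) for stepping backwards. countLetters and previousIndex are hypothetical names:

```d
import std.utf : decode, stride, strideBack;
import std.uni : isAlpha;

/// Hypothetical helper: counts the Unicode letters in `text` by walking
/// the UTF-8 code units manually instead of relying on foreach.
size_t countLetters(const(char)[] text)
{
    size_t count = 0;
    size_t i = 0;
    while (i < text.length)
    {
        // decode reads one code point starting at i and advances i past it.
        dchar c = decode(text, i);
        if (isAlpha(c))
            ++count;
    }
    return count;
}

/// Hypothetical helper: given an index `i` that sits on a code point
/// boundary, returns the start index of the preceding code point.
size_t previousIndex(const(char)[] text, size_t i)
{
    return i - strideBack(text, i);
}

void main()
{
    const(char)[] text = "héllo wörld";
    assert(countLetters(text) == 10);    // 'é' and 'ö' each take two code units
    assert(stride(text, 1) == 2);        // 'é' starts at index 1 and spans 2 units
    assert(previousIndex(text, 3) == 1); // step back from 'l' to the start of 'é'
}
```

The index is always a code unit offset, so it has to stay on a code point boundary for decode and strideBack to behave.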

/Oskar
October 02, 2006
Re: toString issue
Oskar Linde wrote:
> Hasan Aljudy wrote:
>> Derek Parnell wrote:
>>>   foreach(int i, dchar c; text)
>>>   {
>>>        bool isLetter = isUniAlpha( c );
>>>        ...
>>>   }
>>>
>>>
>>
>> I know, but that's still a work-around. What if you need to iterate 
>> back and forth? You're gonna need to convert it to dchar[] (or wchar[]).
>>
>> However, that brings up a good point:
>> Notice how foreach lets you iterate a string by Unicode characters 
>> (a.k.a. code points)? Shouldn't this kind of iteration be supported 
>> outside of foreach as well?
> 
> see std.utf.decode and std.utf.stride.
> 
> /Oskar

I have, and I know the functions are all there. But hey, the C 
standard library also has all sorts of string-processing functions.

I'm talking about the "built-in" string type, which doesn't really 
exist, even though the spec claims it does.