July 30, 2006
Re: To Walter, about char[] initialization by FF
Unknown W. Brackets wrote:
> Walter mentioned that "0 is a valid UTF-8 character."  Andrew asked what 
> this meant, so I explained that in this case (as you also clarified) it 
> doesn't make any difference.  Regardless, it's a valid [whatever it is] 
> and that meaning is not unclear.

I confess I often misuse the terminology.
July 30, 2006
Re: To Walter, about char[] initialization by FF
On Sat, 29 Jul 2006 20:37:56 -0400, Walter Bright  
<newshound@digitalmars.com> wrote:

> In D, char[] is a UTF-8 sequence. It's well defined, and therefore  
> portable. It supports every human language.

Even body language? :)
July 30, 2006
Re: To Walter, about char[] initialization by FF
On Sun, 30 Jul 2006 03:25:22 -0700, Paolo Invernizzi  
<arathorn@NOSPAM_fastwebnet.it> wrote:

> LOL!!!
>
> ---
> Paolo
>
> Walter Bright wrote:
>
>> "One Encoding to rule them all, One Encoding to replace them,
>> One Encoding to handle them all and in the darkness bind them"
>> -- UTF Tolkien


Okay, that clears things up. Now we know that UTF is a conspiracy for  
world domination. ;)

-JJR
July 30, 2006
Re: To Walter, about char[] initialization by FF
John Reimer wrote:
> On Sun, 30 Jul 2006 03:25:22 -0700, Paolo Invernizzi  
> <arathorn@NOSPAM_fastwebnet.it> wrote:
> 
>> LOL!!!
>>
>> ---
>> Paolo
>>
>> Walter Bright wrote:
>>
>>> "One Encoding to rule them all, One Encoding to replace them,
>>> One Encoding to handle them all and in the darkness bind them"
>>> -- UTF Tolkien
> 
> 
> 
> Okay, that clears things up. Now we know that UTF is a conspiracy for  
> world domination. ;)
> 
> -JJR


And created on the back of a napkin in a New Jersey diner ... way to go, Ken
July 31, 2006
Re: To Walter, about char[] initialization by FF
Maybe I missed the point here, correct me if I misunderstood.

This is how I see the problem with char[] as a UTF-8 *string*. The length 
of an array of chars is not always the count of characters, but rather the 
size of the array in bytes, which makes no sense to me. For that purpose I 
would like to see separate properties.

For example,
char[] str = "тест";
The word "test" in Russian is 4 Cyrillic characters, yet str.length gives 
you 8, which makes the length property useless unless you are sure 
that the string contains Latin characters only.
July 31, 2006
Re: To Walter, about char[] initialization by FF
Serg Kovrov wrote:
> Maybe I missed the point here, correct me if I misunderstood.

You have understood correctly.

> This is how I see the problem with char[] as a UTF-8 *string*. The length 
> of an array of chars is not always the count of characters, but rather the 
> size of the array in bytes, which makes no sense to me. For that purpose I 
> would like to see separate properties.

Having char[].length return something other than the actual number of 
char-units would break its array semantics.

> For example,
> char[] str = "тест";
> The word "test" in Russian is 4 Cyrillic characters, yet str.length gives 
> you 8, which makes the length property useless unless you are sure 
> that the string contains Latin characters only.

It is actually not very often that you need to count the number of 
characters as opposed to the number of (UTF-8) code units. Counting the 
number of characters is also a rather expensive operation. All the 
ordinary operations (searching, slicing, concatenation, sub-string 
search, etc) operate on code units rather than characters.

It is easy to implement your own character count though:

size_t count(char[] arr) {
	size_t n = 0;            // running character count
	foreach(dchar c;arr)     // iterating as dchar decodes UTF-8 one character at a time
		n++;
	return n;
}

assert("тест".count() == 4);

Also note that:

assert("тест"d.length == 4);

/Oskar
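
A minimal sketch of the distinction described above, assuming a current D2 compiler and Phobos rather than the 2006 compiler the thread was written against (in D2 string literals are immutable, hence `string` instead of `char[]`). It uses `std.utf.count` and `std.utf.validate`; the helper names `codePointCount` and `isValidUtf8` are made up for illustration.

```d
import std.utf : count, validate, UTFException;

// Hypothetical helper: number of characters (code points) in a UTF-8 string.
size_t codePointCount(const(char)[] s)
{
    return count(s);
}

// Hypothetical helper: true if the slice is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    string s = "тест";                  // 4 Cyrillic letters, 2 code units each

    assert(s.length == 8);              // .length counts UTF-8 code units
    assert(codePointCount(s) == 4);     // decoding gives the character count
    assert("тест"d.length == 4);        // dchar[] has one element per character

    // Slice indices are code-unit offsets: [0 .. 2] is exactly the first letter.
    assert(s[0 .. 2] == "т");
    assert(isValidUtf8(s[0 .. 2]));

    // A slice that stops in the middle of a letter is no longer valid UTF-8.
    assert(!isValidUtf8(s[0 .. 3]));
}
```

The last assertion is the practical cost of slicing by code units: the slice itself is cheap and always succeeds, but the caller has to pick indices that fall on character boundaries.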
July 31, 2006
Re: To Walter, about char[] initialization by FF
* Oskar Linde:
> Having char[].length return something other than the actual number
> of char-units would break its array semantics.

Yes, I see. That's why I do not much like char[] as a substitute for a 
string type.

> It is actually not very often that you need to count the number
> of characters as opposed to the number of (UTF-8) code units.

Why not use separate properties for that?

> Counting the number of characters is also a rather expensive
> operation. 

Indeed. Storing it once as a property (and updating it as needed) is better 
than calculating it each time you need it.

> All the ordinary operations (searching, slicing, concatenation, 
> sub-string  search, etc) operate on code units rather than
> characters.

Yes, that's a tough one. If you want to slice an array, use the array's 
unit count for that. But if you want to slice a *string* (substring, search, 
etc.), use the character count for that.

Maybe there should be interchangeable types, string and char[], with 
different length, slice, find, etc. behaviors? I mean it could be the same 
actual type, but with different contexts for its properties.

And besides, string as opposed to char[] is more pleasant to my eyes =)
July 31, 2006
Re: To Walter, about char[] initialization by FF
Serg Kovrov wrote:
> * Oskar Linde:
>> Counting the number of characters is also a rather expensive
>> operation. 
> 
> Indeed. Storing it once as a property (and updating it as needed) is better 
> than calculating it each time you need it.

Store where? You can't put it in the array data itself without breaking 
slicing, and putting it in the reference introduces problems with it 
getting out of date if the array is modified through another reference 
(without enforcing COW, that is).
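
A rough sketch of the stale-count problem described above, assuming current D2 and `std.utf.count`. `CountedString` is a hypothetical wrapper invented for this example, not an existing library type: it keeps a cached character count next to the array reference.

```d
import std.utf : count;

// Hypothetical wrapper: an array reference plus a cached character count.
struct CountedString
{
    char[] data;
    size_t cachedCount;

    this(char[] d)
    {
        data = d;
        cachedCount = count(d);   // computed once, at construction
    }
}

void main()
{
    char[] buf = "тест".dup;      // mutable copy: 8 code units, 4 characters
    auto s = CountedString(buf);
    assert(s.cachedCount == 4);

    // A second reference to the same memory. Overwrite the first two-byte
    // letter with two one-byte ASCII letters; the array length stays 8.
    char[] other = s.data;
    other[0] = 'a';
    other[1] = 'b';

    assert(s.data.length == 8);   // code-unit length is unchanged
    assert(count(s.data) == 5);   // now "abест": 5 characters
    assert(s.cachedCount == 4);   // the cached count is out of date
}
```

Nothing told `s` that its data had changed, which is why a cached count only stays correct if writes through other references are forbidden or trigger a copy.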
July 31, 2006
Re: To Walter, about char[] initialization by FF
Serg Kovrov wrote:
> * Oskar Linde:
> 
>> Having char[].length return something other than the actual number
>> of char-units would break its array semantics.
> 
> 
> Yes, I see. That's why I do not much like char[] as a substitute for a 
> string type.
> 
>> It is actually not very often that you need to count the number
>> of characters as opposed to the number of (UTF-8) code units.
> 
> 
> Why not use separate properties for that?
> 
>> Counting the number of characters is also a rather expensive
>> operation. 
> 
> 
> Indeed. Storing it once as a property (and updating it as needed) is better 
> than calculating it each time you need it.
> 
>> All the ordinary operations (searching, slicing, concatenation, 
>> sub-string  search, etc) operate on code units rather than
>> characters.
> 
> 
> Yes, that's a tough one. If you want to slice an array, use the array's 
> unit count for that. But if you want to slice a *string* (substring, search, 
> etc.), use the character count for that.
> 
> Maybe there should be interchangeable types, string and char[], with 
> different length, slice, find, etc. behaviors? I mean it could be the same 
> actual type, but with different contexts for its properties.
> 
> And besides, string as opposed to char[] is more pleasant to my eyes =)


I say this calls for a proper *standard* String class ... <g>
July 31, 2006
Re: To Walter, about char[] initialization by FF
* Frits van Bommel:
> Serg Kovrov wrote:
>> * Oskar Linde:
>>> Counting the number of characters is also a rather expensive
>>> operation. 
>>
>> Indeed. Storing it once as a property (and updating it as needed) is better 
>> than calculating it each time you need it.
> 
> Store where? You can't put it in the array data itself without breaking 
> slicing, and putting it in the reference introduces problems with it 
> getting out of date if the array is modified through another reference 
> (without enforcing COW, that is).

I have to say that I do not have any idea where to store it, nor where the 
current length property is stored. I'm really glad that the compiler does it 
for me.

As a language user I just want to be confident that the compiler does it 
wisely, and to focus on my domain problems.