June 29, 2004
On Tue, 29 Jun 2004 09:50:35 +0000 (UTC), Arcane Jill wrote:

> In article <cbr9e5$vai$1@digitaldaemon.com>, Derek Parnell says...
> 
>>Because that's not what is being meant. I'd like to differentiate between INITIALIZED and UNINITIALIZED vectors.
> 
> Why?
> 
> D's dynamic arrays are the same thing as C++ std::vectors (as I'm sure you realize). In C++, there is no such thing as an uninitialized vector. Why on Earth would you want them in D?
> 

I don't use C++, so I'm not aware of what std::vector does or does not provide.

Ok, off the top of my head...

I'm writing a library that will be used by other coders. It has a function that accepts a dynamic array. A zero-length array is a valid parameter. The caller, however, can pass an uninitialized parameter to tell my function that the user wishes to use the default values instead of supplying a value.

In short, an uninitialized variable contains information - namely the fact that it *is* uninitialized. And that information could be utilized by a coder - if they had the chance.
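For illustration, here is a minimal D sketch of that distinction (using present-day `is` syntax; the 2004 compiler spelled the identity test `===`). Whether the two cases are distinguishable is exactly what is being argued for, not a guarantee of the language:

```d
int[] a;                     // never assigned: a null handle
int[] b = new int[4];
b = b[0 .. 0];               // initialized, then sliced down to empty

// Both report zero length, so length alone can't tell them apart;
// only the null-ness of the handle carries the extra information.
assert(a.length == 0 && b.length == 0);
assert(a is null && b !is null);
```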

> 
>>This non-existent thing is a
>>red herring. 'empty' means initialized and length of zero. 'non-existent'
>>means not initialized yet.
> 
> Yeah - but nobody has yet answered WHY? Why would ANYONE want to allow uninitialized array handles (as opposed to array content) to exist in D. It makes no sense.

Ok, but it does to me. Sorry, I can't seem to explain why.

> Please, can someone who is arguing in favor of allowing a distinction between initialized and uninitialized dynamic array handles, explain exactly why you want such a distinction to exist?

Apparently not; sorry.

-- 
Derek
Melbourne, Australia
June 29, 2004
In article <12vwf4nkzjzxa.17ai9mojp3dpz$.dlg@40tude.net>, Derek says...
>
>Ok, off the top of my head...
>
>I'm writing a library that will be used by other coders. It has a function that accepts a dynamic array. A zero-length array is a valid parameter. The caller, however, can pass an uninitialized parameter to tell my function that the user wishes to use the default values instead of supplying a value.

I'd use two functions for this:
#    f(uint[] a);   // means, use the information in a
#    f();           // means, use the default value

...but only if an empty array was NOT the default. In many cases, I could probably get away with an empty array BEING the default, in which case, I could simply do:

#    f(uint[] a = null);
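Spelled out a little more fully (the `defaults` array is hypothetical, standing in for whatever the library's default data would be):

```d
uint[] defaults;    // hypothetical default data, set up elsewhere

void f(uint[] a)    // use the information in a
{
    // ... work with a ...
}

void f()            // use the default value
{
    f(defaults);
}

// Or, when an empty array can safely mean "use the default":
void g(uint[] a = null)
{
    if (a.length == 0)
        a = defaults;
    // ... work with a ...
}
```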


>In short, an uninitialized variable contains information - namely the fact that it *is* uninitialized.

It's a nice argument, but it could be applied equally well to ANY type. If I were supremely in favor of the notion that uninitializedness carries information (which I'm not), I might argue as follows:

>I'm writing a library that will be used by other coders. It has a function
>that accepts a bit. Zero is a valid parameter. The
>caller, however, can pass an uninitialized parameter to tell my function that
>the user wishes to use the default values instead of supplying a value.

If I believed that, I'd be arguing for a distinction between an uninitialized bit, and a bit containing zero. I happen not to believe that, however.



>> Why would ANYONE want to allow
>> uninitialized array handles (as opposed to array content) to exist in D. It
>> makes no sense.

>Ok, but it does to me. Sorry, I can't seem to explain why.

Yeah, human language is a bummer. Someone ought to invent telepathy.

Jill


June 29, 2004
Arcane Jill wrote:
> In article <12vwf4nkzjzxa.17ai9mojp3dpz$.dlg@40tude.net>, Derek says...
> 
>>Ok, off the top of my head...
>>
>>I'm writing a library that will be used by other coders. It has a function
>>that accepts a dynamic array. A zero-length array is a valid parameter. The
>>caller, however, can pass an uninitialized parameter to tell my function that
>>the user wishes to use the default values instead of supplying a value.
> 
> 
> I'd use two functions for this:
> #    f(uint[] a);   // means, use the information in a
> #    f();           // means, use the default value
> 
> ...but only if an empty array was NOT the default. In many cases, I could
> probably get away with an empty array BEING the default, in which case, I could
> simply do:
> 
> #    f(uint[] a = null);
Sure, but that sucks if there are a lot of them, and it's impossible if the function is variadic.
The ability to pass null to a function is very useful, I've switched from structs to classes more than once for this reason.
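A sketch of that struct-versus-class point (types hypothetical): structs are value types in D, so a struct parameter always exists, while a class reference can be null.

```d
struct PointS { int x, y; }   // value type: a PointS argument always exists
class  PointC { int x, y; }   // reference type: the handle itself can be null

void draw(PointC p)
{
    if (p is null)
    {
        // caller supplied "no value": fall back to a default position
    }
    else
    {
        // use p.x, p.y
    }
}
```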

Sam
June 29, 2004
Arcane Jill wrote:

> Such a pointer is never used for reading OR writing. It /is/, however, used in pointer comparison expressions, and in such context, is perfectly meaningful, and safe.

True, you have a point there - I really don't know what to think about it.

> But anyway, Farmer tells me I can write cast(elementtype*)a+n, so I'm
> happy.

Well - that's a workaround but not a clean solution.

June 29, 2004
In article <cbrfn4$1805$1@digitaldaemon.com>, Sam McCall says...

>[RANT]
>IMO, D (language, not libraries) isn't _really_ trying to be
>fully-unicode at all.
>What is the purpose of a char/wchar variable? How often do you actually
>need to be directly manipulating UTF8/16 fragments? (Hint: in a
>unicode-based language with good libraries, almost never).

Maybe not, but you still need something to store them in. Even if you let a library do all your UTF-8 work for you (which you should), then you still need a type designed to contain such sequences. In D, a char array is that type.

In other words, the type char exists in order that the type char[] might exist. I don't have a problem with that.
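To make the fragment/character distinction concrete (in the D of the time, where string literals were `char[]`, and assuming a UTF-8 source file): a single accented character occupies two elements of the array.

```d
char[] s = "é";            // U+00E9, one character
assert(s.length == 2);     // but two UTF-8 fragments: 0xC3, 0xA9
assert(s[0] == 0xC3 && s[1] == 0xA9);
```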


>*IF* D is going to be fully-unicode, that does have performance impacts. A single character must _always_ go in a dchar variable. So what is the advantage in having strings being char[] arrays?

Space.


>("knowing the encoding" doesn't count, the user shouldn't have to care).

In a strongly typed language, that would be true, but D is not a strongly typed language. Walter is on record as stating that all char types including dchar can be freely used as integers. If that's going to be true, you MUST care about the encoding.



>IMO, strings NEED to:
>	* Have only one type, or one base type.

And, to take that reasoning further, it should have other interesting properties too, like it should be IMPOSSIBLE IN ANY CIRCUMSTANCE to end up with a char containing a value outside the range U+000000 to U+10FFFF inclusive. However, I don't see this happening in D. The reason being that even a dchar is not a character in the Unicode sense. It is a UTF-32 encoding of a character. (The minor technical difference being that dchar values above 0x10FFFF exist, but are invalid, whereas Unicode characters beyond U+10FFFF do not even exist).
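The validity rule can be sketched as a predicate. (Later D releases provide a similar `std.utf.isValidDchar`; the surrogate exclusion below comes from the Unicode rules, not from the post itself.)

```d
// A dchar has 32 bits, but only a 21-bit range of values is valid:
// code points run from U+0000 to U+10FFFF, minus the surrogate
// range U+D800-U+DFFF, which UTF-16 reserves for encoding pairs.
bool isValidCodePoint(dchar c)
{
    return c <= 0x10FFFF && !(c >= 0xD800 && c <= 0xDFFF);
}
```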



>	* Expose character data as _characters_, not fragments.
>This means characters accessed must be dchars, indexing must be
>character, not fragment-based.

That depends on your point of view. Unicode may be viewed on many levels. I'm sure I could hold a reasonable argument in which I insisted that string data should be exposed as _glyphs_, not characters (characters are, after all, merely glyph fragments). Glyphs are what you see. If a string contains an e-acute glyph, should your application really /care/ which characters compose that glyph?

Somewhere along the line, you have to face the bottom level. That level is the level of character encoding. Language support is given to the encoding level. For anything above that, you use libraries. If such libraries don't exist yet, we can write them.
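One piece of decoding support does sit in the language itself: `foreach` with a `dchar` loop variable decodes a `char[]` on the fly, so counting characters rather than fragments needs no library call. A sketch:

```d
int countCodePoints(char[] s)
{
    int n = 0;
    foreach (dchar c; s)   // each step decodes one whole code point
        n++;
    return n;              // s.length would count UTF-8 bytes instead
}
```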




>The "char" type should be 32 bits wide. Anything else is confusing.

21 bits wide, and limited to the range 0-0x10FFFF. Anything else is confusing. But this is D, and D is practical.


>Now flame on, I'm sure that's not going to be too popular ;-)

Actually, I loved it, and I'm not flaming (and I hope nobody does). You've made some excellent observations. But it's way too late to shape D that way now. In the future, there may well be languages which handle characters as true, pure, Unicode characters, but the world isn't fully Unicode-aware yet.

To give an example of what I mean: Suppose you publish a web site containing a few musical symbols and a few exotic math symbols. (All valid Unicode). The sad fact is, such a website won't display properly on most people's browsers. To get them to display properly, it is currently the responsibility of VIEWERS (rather than publishers) of web sites, to "obtain", somehow, the relevant fonts to make it work. Usually, obtaining such fonts costs money, so who's going to bother? It'd be like buying a book and opening it to find half the characters looking like black blobs until you pay more money to a font-designer. And so, web site designers tend NOT to use such characters on their web sites, preferring gif images which everyone can view. It's a vicious circle.

In short, the world is not Unicode yet, and it's frustrating. Bits of it are still trying to catch up with other bits. Sometimes you just want to scream at the planet to get its act together right now. But we have to be realistic.

And realistically, things /are/ changing - but slowly. What D is doing is moving in the right direction. The shift to full Unicode support in all things is a long way off yet, and to get there, we must move in small steps.

Defining a char as a UTF-8 fragment may be a small step, but it is a very important and valuable one. At least we don't say "a char is a character in some unspecified encoding", like some other languages do.

Nice post, by the way. I enjoyed reading it.

Jill


June 29, 2004
> Frankly, yes, I use -1 as a "magic value" all the time, and do all sorts of ugly things when negative numbers are perfectly valid. This is

That's true. In Standard ML you could do

val index : 'a -> int option

Then if 'a exists return SOME(x), if not, return NONE. If a function has an option type as its domain it has to deal with both cases.

In D, you'd either use a magic value like -1 or encapsulate values in a class; then null is NONE and not null is SOME.
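The class version of SOME/NONE might look like this in D (names hypothetical):

```d
// null plays the role of NONE; a non-null reference plays SOME(x).
class OptionInt
{
    int value;
    this(int v) { value = v; }
}

OptionInt indexOf(char[] haystack, char needle)
{
    foreach (int i, char c; haystack)
        if (c == needle)
            return new OptionInt(i);   // SOME(i)
    return null;                       // NONE: no magic -1 needed
}
```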

> I think arrays should become fully reference types, for the same reason as strings above. Yes, this would probably mean double indirection, arrays would be a pointer to the (length,data pointer) struct that they currently are.

But you can go ahead and create a class for lists, no problem at all. Neither Phobos nor DTL has fully hatched yet, so we'll see what happens.


June 29, 2004
In article <cbrgtm$19gj$1@digitaldaemon.com>, Sam McCall says...
>
>PS: AJ, I'm not sure if you read the forums at dsource,

I do, but less frequently than this one as it's a slow-turnover list. I get notified when new posts are added to existing threads, but not when new threads are added.


>I posted a
>couple of deimos bugs:
>http://dsource.org/forums/viewtopic.php?t=224

Okay, I'm on it. I'll let you know when they're fixed.

Maybe we could start a "Bugs" thread on Deimos. That way I'll always get notified when anyone adds to it.

Jill


June 29, 2004
Arcane Jill wrote:

> In article <cbrfn4$1805$1@digitaldaemon.com>, Sam McCall says...
> 
> 
>>[RANT]
>>IMO, D (language, not libraries) isn't _really_ trying to be fully-unicode at all.
>>What is the purpose of a char/wchar variable? How often do you actually need to be directly manipulating UTF8/16 fragments? (Hint: in a unicode-based language with good libraries, almost never).
> 
> 
> Maybe not, but you still need something to store them in. Even if you let a
> library do all your UTF-8 work for you (which you should), then you still need a type
> designed to contain such sequences. In D, a char array is that type.
> 
> In other words, the type char exists in order that the type char[] might exist.
> I don't have a problem with that.
Sure, but given that the "user" shouldn't be touching chars without realising that they're more complicated than in C, byte[] would do?
Still, I'm not fussed about this.

>>*IF* D is going to be fully-unicode, that does have performance impacts. A single character must _always_ go in a dchar variable. So what is the advantage in having strings being char[] arrays?
> 
> 
> Space.
Sorry, I didn't mean char[] as opposed to dchar[], I meant char[] as opposed to something more opaque. The reasoning for not having a string class, IIRC, is "strings are lists of characters". Well, chars aren't characters.

>>("knowing the encoding" doesn't count, the user shouldn't have to care).
> In a strongly typed language, that would be true, but D is not a strongly typed
> language. Walter is on record as stating that all char types including dchar can
> be freely used as integers. If that's going to be true, you MUST care about the
> encoding.
If you don't use them as integers, then you don't have to care.
I'm not saying it shouldn't be well-defined, but Java doesn't require the user to understand the intricacies of unicode encodings to manipulate strings.
(Yes, java has efficiency problems with strings and presumably some problems with wide unicode characters due to a 16 bit char type, but I think that still makes sense).

>>IMO, strings NEED to:
>>	* Have only one type, or one base type.
> 
> 
> And, to take that reasoning further, it should have other interesting properties
> too, like it should be IMPOSSIBLE IN ANY CIRCUMSTANCE to end up with a char
> containing a value outside the range U+000000 to U+10FFFF inclusive. However, I
> don't see this happening in D. The reason being that even a dchar is not a
> character in the Unicode sense. It is a UTF-32 encoding of a character. (The
> minor technical difference being that dchar values above 0x10FFFF exist, but are
> invalid, whereas Unicode characters beyond U+10FFFF do not even exist).
Okay, I didn't realise dchars were 21 bits wide... if there's a way of doing this that's efficient, that'd be cool, dchar (or "char") could be 21 bits. If it's going to be hopelessly slow, you have to trust the programmer to some extent, what about "any library operation involving an out-of-range dchar is undefined"?

> That depends on your point of view. Unicode may be viewed on many levels. I'm
> sure I could hold a reasonable argument in which I insisted that string data
> should be exposed as _glyphs_, not characters (characters are, after all, merely
> glyph fragments). Glyphs are what you see. If a string contains an e-acute
> glyph, should your application really /care/ which characters compose that
> glyph?
Probably not, although if reading an encoded string and then writing it again doesn't produce the same byte-output, I'm sure I could find a contrived example... copy-pasting text invalidating a digital signature?
Either would be much better than what we've got now, and I think character is more likely (though still spectacularly unlikely), because it has an obvious, efficient representation (32 bit unsigned number). Am I right in assuming a glyph can be fairly complicated?

> Somewhere along the line, you have to face the bottom level. That level is the
> level of character encoding. Language support is given to the encoding level.
> For anything above that, you use libraries. If such libraries don't exist yet,
> we can write them.
Yeah. It's just a bit disappointing after hearing "Strings are character arrays and everything about them makes sense" to realise that you either have to grok UTF-N or treat these "characters" as opaque... the advantages over a class are gone, and a class has reference semantics and member functions.

>>The "char" type should be 32 bits wide. Anything else is confusing. 
> 21 bits wide, and limited to the range 0-0x10FFFF. Anything else is confusing.
It clearly is, because I assumed a unicode character was 32 bits wide, on the basis that that's what D had taught me :-\
> But this is D, and D is practical.
If it's going to be horribly inefficient to make it 21 bits, have the spec say "it's at least 21 bits" and alias it to uint.

> Actually, I loved it, and I'm not flaming (and I hope nobody does). You've made
> some excellent observations. But it's way too late to shape D that way now. In
> the future, there may well be languages which handle characters as true, pure,
> Unicode characters, but the world isn't fully Unicode-aware yet.
Yeah, it's the partly-there that's frustrating... my selfish side would be happy with just ASCII ;-). It just seems sometimes that if it's not easy and consistent to make things unicode-friendly, it won't happen. Especially in places where ASCII works fine, that's certainly easy and consistent! The current way seems to suggest that officially it's all unicode and happy, but (don't tell anyone) feel free to use ascii and assume chars are characters if you want. The standard library even does this, in std.string no less.

> To give an example of what I mean: Suppose you publish a web site containing a
> few musical symbols and a few exotic math symbols. (All valid Unicode). The sad
> fact is, such a website won't display properly on most people's browsers. To get
> them to display properly, it is currently the responsibility of VIEWERS (rather
> than publishers) of web sites, to "obtain", somehow, the relevant fonts to make
> it work. Usually, obtaining such fonts costs money, so who's going to bother?
> It'd be like buying a book and opening it to find half the characters looking
> like black blobs until you pay more money to a font-designer. And so, web site
> designers tend NOT to use such characters on their web sites, preferring gif
> images which everyone can view. It's a vicious circle.
Yeah, fonts are a problem. My ideal world would have a (huge!) complete system default font (or one each for serif, sans, and mono) supplied with the OS, that would be the fallback for nonexistent characters.

> And realistically, things /are/ changing - but slowly. What D is doing is moving
> in the right direction. The shift to full Unicode support in all things is a
> long way off yet, and to get there, we must move in small steps.
Yes. What gets me is that in 5 years we'll (hopefully) be far enough down the unicode road that D's approach will seem backward, and I'll have to wait for someone to reinvent a similar language, with a more thorough unicode integration.
Ah well, maybe we'll get a strong boolean next time <g>

> Defining a char as a UTF-8 fragment may be a small step, but it is a very
> important and valuable one. At least we don't say "a char is a character in some
> unspecified encoding", like some other languages do.
Yeah, definitely. I just wish it was easier to use and harder to ignore.

Sam
June 29, 2004
Bent Rasmussen wrote:

>>Frankly, yes, I use -1 as a "magic value" all the time, and do all sorts
>>of ugly things when negative numbers are perfectly valid. This is
> 
> 
> That's true. In Standard ML you could do
> 
> val index : 'a -> int option
> 
> Then if 'a exists return SOME(x), if not, return NONE. If a function has
> an option type as its domain it has to deal with both cases.
McCall's Law the First:
Every feature of a "traditional" language is a special case of a feature of every functional language.
McCall's Law the Second:
Every feature of every functional language is a special case of the only feature of Lisp.

> In D, you'd either use a magic value like -1 or encapsulate values in a
> class; then null is NONE and not null is SOME.
But this isn't ML. I will get some weird looks, and nobody will touch my libraries ;-)
Besides, that's exactly equivalent (AFAICS) to a reference type, assuming no pointer arithmetic and casting shenanigans. If this _is_ useful, is dereferencing one more pointer to access arrays really going to kill us? Or is there some case where the value-type-kinda nature of arrays is useful?
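The "value-type-kinda nature" shows up when an array is passed to a function: the (length, pointer) pair is copied, but the data behind it is not. A small sketch:

```d
void grow(int[] a) { a.length = 20; }  // resizes only the local copy
void poke(int[] a) { a[0] = 99; }      // writes through the shared pointer

// After grow(x), x.length is unchanged in the caller;
// after poke(x), the caller sees x[0] == 99.
```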

> But you can go ahead and create a class for lists, no problem at all.
> Neither Phobos nor DTL has fully hatched yet, so we'll see what happens.
I'm beginning to think this is the only answer. But lists are such a fundamental type, using a non-standard list type would be a pain. I can't see room for another list type, so I guess I'll end up using DTL's list everywhere, and hope everyone does the same. But it does seem a waste of such powerful arrays in the language.

Sam
June 29, 2004
In article <cbs0bj$1vhf$1@digitaldaemon.com>, Sam McCall says...
>
>I'm not saying it shouldn't be well-defined, but Java doesn't require the user to understand the intricacies of unicode encodings to manipulate strings.

Yes it does. Java chars operate in UTF-16. If you want to store the character U+012345 in a Java string, you need to worry about UTF-16.
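The same point in D terms, since `wchar[]` uses the identical UTF-16 encoding (the surrogate values below follow from the UTF-16 algorithm; treat the snippet as a sketch):

```d
wchar[] w = "\U00012345";   // one character beyond the BMP
assert(w.length == 2);      // stored as a surrogate pair:
assert(w[0] == 0xD808 && w[1] == 0xDF45);
```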


>Probably not, although if reading an encoded string and then writing it again doesn't produce the same byte-output, I'm sure I could find a contrived example... copy-pasting text invalidating a digital signature?

That's what normalization is for. We'll have that soon in a forthcoming version of etc.unicode.


>Am I right in assuming a glyph can be fairly complicated?

Very much so. Especially if you're a font designer, since Unicode allows you to munge any two glyphs together into a bigger glyph (a ligature). In practice, fonts only provide a small subset of all possible ligatures (as you can imagine!).


>Yeah. It's just a bit disappointing after hearing "Strings are character arrays and everything about them makes sense" to realise that you either have to grok UTF-N or treat these "characters" as opaque... the advantages over a class are gone, and a class has reference semantics and member functions.

Not really. So long as you remember that characters <= 0x7F are OK in a char, and that characters <= 0xFFFF are fine in a wchar, you're sorted.
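As a rule of thumb in code:

```d
char  a = 'A';            // fine: U+0041 <= 0x7F, one UTF-8 unit
wchar b = '\u00E9';       // fine: U+00E9 <= 0xFFFF, one UTF-16 unit
dchar c = '\U00012345';   // anything larger needs a dchar, or a
                          // multi-unit sequence in a char[]/wchar[]
```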


>Yeah, it's the partly-there that's frustrating... my selfish side would be happy with just ASCII ;-). It just seems sometimes that if it's not easy and consistent to make things unicode-friendly, it won't happen.

Right, but it's a question of where that support comes from. To demand it all of the language itself is asking /a lot/ from poor old Walter. If we can add it, piece by piece, in libraries, I'd say we're not doing too badly.



>Especially in places where ASCII works fine, that's certainly easy and consistent! The current way seems to suggest that officially it's all unicode and happy, but (don't tell anyone) feel free to use ascii

It /is/ okay to use ASCII. All valid ASCII also happens to be valid UTF-8. UTF-8 was designed that way.


>and assume chars are characters if you want. The standard library even does this, in std.string no less.

So long as they make no assumptions about characters > 0x7F, that's perfectly reasonable.


>Yeah, fonts are a problem. My ideal world would have a (huge!) complete system default font (or one each for serif, sans, and mono) supplied with the OS, that would be the fallback for nonexistent characters.

I absolutely agree. There are free fonts which do this, but they don't display well at small point-size because of something called "hinting", which apparently you can't do without paying someone royalties because of some stupid IP nonsense.


>Yes. What gets me is that in 5 years we'll (hopefully) be far enough down the unicode road that D's approach will seem backward, and I'll have to wait for someone to reinvent a similar language, with a more thorough unicode integration.

Yup. That's the way it goes. So what else shall we imagine for D++?

Jill