June 29, 2004
On Mon, 28 Jun 2004 16:33:25 -0700, Andy Friesen <andy@ikagames.com> wrote:
> Regan Heath wrote:
>> On Mon, 28 Jun 2004 12:50:08 -0700, Andy Friesen <andy@ikagames.com> wrote:
>>> You say that as though it is self-evident that strings must absolutely, unequivocally be, at all costs, reference types.  Why?
>> If it's not a reference type, then how can you signal non-existence (null)?
>
> You don't.

Thought so..

>> I have not used C++ containers. I program in C for a living, and C++ for a hobby. Is there a C++ container for strings that cannot tell the difference between non-existent and empty?
>
> Yeah, it's called std::string, and it's more or less the default.

And it's crap. IMNSHO.

>>> A 'null array' is a completely arbitrary concept that has been extrapolated from undefined behaviour. :)
>> It may be undefined, but I believe it is required.
>
> Why?  C++ gets along without them just fine, and every C derivative I know of gets along fine without allowing primitive type returns to signify nonexistence.
>
> Functions which return structs cannot return null either.

Which is why almost no one ever does this (in C). They all return a pointer to a struct instead.

>> The solution IMO is either to make the current behaviour official and consistent, or to change the behaviour, make that official and provide another way to tell null apart from an empty string.
>
> Farmer's test reports pretty consistent results if you suppose that comparing arrays to null is ill-formed:
>
>      empty1.length == 0    is true
>      empty1 == ""          is true
>      empty2.length == 0    is true
>      empty2 == ""          is true
>      empty3.length == 0    is true
>      empty3 == ""          is true
>
> Don't compare arrays to null.  Don't try to differentiate between empty and nonexistent.

Fine and dandy EXCEPT we *need* to differentiate between empty and non-existent strings.

> D arrays simply do not work that way.

In that case we need an array specialisation for strings, so I'll have to write my own. This defeats the purpose of char[] in the first place, which was to be a better, more consistent string handling method than is possible in C/C++.
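
A minimal sketch of the distinction being argued over, assuming a current D compiler (the 2004 behaviour that Farmer's test probed may differ). Equality treats a null string and an empty string as the same thing, while the identity operator `is` can still tell them apart. The variable names are illustrative only.

```d
import std.stdio : writeln;

void main()
{
    string never;        // default-initialized: null pointer, length 0
    string empty = "";   // empty literal: non-null pointer, length 0

    writeln(never.length == 0);   // true
    writeln(empty.length == 0);   // true
    writeln(never == empty);      // true: == compares contents only
    writeln(never is null);       // true
    writeln(empty is null);       // false: identity also compares the pointer
}
```

So the information is there, but only through `is`, which is the very comparison Andy advises against relying on.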

Regan.

June 29, 2004
Regan Heath wrote:
> ... we need an array specialisation for strings, so I'll have to write my own. This defeats the purpose of char[] in the first place, which was to be a better, more consistent string handling method than is possible in C/C++.

That would work, but it might be better to adjust your thinking to match the language instead of trying to shoehorn the way you're used to thinking onto an abstraction that clearly wasn't built for it.  Don't think in Java/C++/etc.  Think in D. :)

 -- andy
June 29, 2004
In article <opsab6o5rl5a2sq9@digitalmars.com>, Regan Heath says...
>
>> Yeah, it's called std::string, and it's more or less the default.
>
>And it's crap. IMNSHO.

You'll get no arguments from me there. D got it right in not having a string class. I didn't think that at first, but I've come round to the D way of thinking. The problem with a string class is that you can't add new member functions to it. (Oh, you may be able to subclass String, if it's not final. Oh wait - it /is/ final in Java). With char[] arrays, you CAN add new functions.
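
A minimal sketch of what "adding a function to char[]" can look like, assuming a current D compiler, where a free function whose first parameter is an array can be called with member syntax. `countSpaces` is a hypothetical helper invented for this illustration, not a library function.

```d
import std.stdio : writeln;

// Hypothetical helper: counts ASCII spaces in a string.
size_t countSpaces(const(char)[] s)
{
    size_t n = 0;
    foreach (char c; s)
        if (c == ' ')
            ++n;
    return n;
}

void main()
{
    char[] text = "a b c".dup;
    writeln(text.countSpaces());   // member-style call on a free function
    writeln(countSpaces(text));    // ordinary call, same function
}
```

Nothing about the array type itself is modified; the function lives wherever its author puts it.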

Besides which, what else can a char[] array possibly represent, other than a string? (Given that a char[] array MUST contain UTF-8, I mean). It's not the same as a byte[] array, which could mean anything.

>> Don't compare arrays to null.  Don't try to differentiate between empty and nonexistent.
>
>Fine and dandy EXCEPT we *need* to differentiate between empty and non-existent strings.

Why? Do we also need a way to differentiate between empty and non-existent ints?


In D, there is no such thing as a non-existent int; there is no such thing as a non-existent struct; and there is no such thing as a non-existent string.

Why not just start from the assumption that we DON'T need to differentiate between empty and non-existent strings, and take it from there?

Maybe the real solution would be to make it a compile error to assign null to an array, or to compare an array with null. This would then force people to say what they mean, and all such problems would go away.

(Anyway, you all KNOW my opinion that

#    // given char[] s
#    if (s)

should be a compile-time error anyway, because char[] is not boolean. But that's another story).

Jill



June 29, 2004
Arcane Jill wrote:

> In article <opr99w0st25a2sq9@digitalmars.com>, Regan Heath says...
>>> (1) given that a is an array of length n, the expression a[n..n] gives
>>> an array
>>> bounds exception,
> 
>>This (now?) works.
> 
> Indeed, I think it has always worked. It was just me misremembering the problem. I'll start again. What I MEANT was...
> 
> Given that a is an array of length n, the expression &a[n] gives an array bounds exception. And I don't believe it should. Taking the address of the first byte beyond the end of an array can be a very useful thing to do.

No, I disagree here. In general, that address would point to nothing. Reading there is pointless, writing is dangerous. If you want to append to a string by doing a low-level write to memory, then increment the length first and write afterwards.

The way you could phrase it: In some cases it would be convenient if it were not an error to take that address, if it is then not used afterward.

But still, I don't see that coding around that "limitation" is that much of an effort. It gives you a few if-clauses around expressions, so what?

June 29, 2004
In article <cbr57k$p0m$1@digitaldaemon.com>, Norbert Nemec says...
>
>> Given that a is an array of length n, the expression &a[n] gives an array bounds exception. And I don't believe it should. Taking the address of the first byte beyond the end of an array can be a very useful thing to do.
>
>No, I disagree here. In general, that address would point to nothing. Reading there is pointless, writing is dangerous.

Such a pointer is never used for reading OR writing. It /is/, however, used in pointer comparison expressions, and in such context, is perfectly meaningful, and safe.

But anyway, Farmer tells me I can write cast(elementtype*)a+n, so I'm happy.
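
A minimal sketch of the comparison-only use of a one-past-the-end pointer, assuming a current D compiler. It uses the array's `.ptr` property rather than the cast mentioned above; that choice, and the function name `sumByPointer`, are assumptions made for this illustration.

```d
import std.stdio : writeln;

// Sums the elements by walking raw pointers from the first element
// up to (but not including) the one-past-the-end address.
int sumByPointer(int[] a)
{
    int* p = a.ptr;                // first element (null if the array is null)
    int* end = a.ptr + a.length;   // one past the last element; only compared
    int total = 0;
    for (; p != end; ++p)
        total += *p;               // dereferenced only while p != end
    return total;
}

void main()
{
    writeln(sumByPointer([1, 2, 3]));
    writeln(sumByPointer(null));
}
```

The `end` pointer is never read from or written through, which is the usage described above.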



>If you want to append to
>a string by doing a low-level write to memory,

I never said I wanted to do any such thing.


Arcane Jill


June 29, 2004
On Tue, 29 Jun 2004 07:18:20 +0000 (UTC), Arcane Jill wrote:

> In article <opsab6o5rl5a2sq9@digitalmars.com>, Regan Heath says...
>>
>>> Yeah, it's called std::string, and it's more or less the default.
>>
>>And it's crap. IMNSHO.
> 
> You'll get no arguments from me there. D got it right in not having a string class. I didn't think that at first, but I've come round to the D way of thinking. The problem with a string class is that you can't add new member functions to it. (Oh, you may be able to subclass String, if it's not final. Oh wait - it /is/ final in Java). With char[] arrays, you CAN add new functions.
> 
> Besides which, what else can a char[] array possibly represent, other than a string? (Given that a char[] array MUST contain UTF-8, I mean). It's not the same as a byte[] array, which could mean anything.
> 
>>> Don't compare arrays to null.  Don't try to differentiate between empty and nonexistent.
>>
>>Fine and dandy EXCEPT we *need* to differentiate between empty and non-existent strings.
> 
> Why? Do we also need a way to differentiate between empty and non-existent ints?
> 
> In D, there is no such thing as a non-existent int; there is no such thing as a non-existent struct; and there is no such thing as a non-existent string.
> 
> Why not just start from the assumption that we DON'T need to differentiate between empty and non-existent strings, and take it from there?

Because that's not what is being meant. I'd like to differentiate between INITIALIZED and UNINITIALIZED vectors. This non-existent thing is a red herring. 'empty' means initialized and length of zero. 'non-existent' means not initialized yet.

# char[] x;
# void foo(char[] p)
# {
#    if (p.isInitialized == false)
#    {
#         InitTheDamnThing(p);
#    }
#
#    // Now deal with it.
# }

It's a workaround for the current (longer) way of handling this situation. It's no big deal but it would be 'nice to have'. Like a strict bool type would be nice to have.
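
Note that `isInitialized` above is a wished-for property, not something arrays provide. One way to approximate it, sketched here under the assumption that a current Phobos with `std.typecons.Nullable` is available, is to carry the "has been set" flag alongside the array. `ensureInitialized` is a hypothetical name used only for this illustration.

```d
import std.stdio : writeln;
import std.typecons : Nullable;

// Hypothetical helper: fills in a default only if nothing was ever assigned.
void ensureInitialized(ref Nullable!(char[]) p)
{
    if (p.isNull)
        p = "default".dup;
}

void main()
{
    Nullable!(char[]) unset;       // never assigned
    Nullable!(char[]) empty;
    empty = "".dup;                // assigned, but zero length

    ensureInitialized(unset);
    ensureInitialized(empty);

    writeln(unset.get);            // the default was filled in
    writeln(empty.get.length);     // the empty value was left alone
}
```

The flag lives in the wrapper, so the result does not depend on whether an empty array happens to have a null pointer.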
-- 
Derek
Melbourne, Australia
29/Jun/04 6:24:19 PM
June 29, 2004
In article <cbr9e5$vai$1@digitaldaemon.com>, Derek Parnell says...

>Because that's not what is being meant. I'd like to differentiate between INITIALIZED and UNINITIALIZED vectors.

Why?

D's dynamic arrays are the same thing as C++ std::vectors (as I'm sure you realize). In C++, there is no such thing as an uninitialized vector. Why on Earth would you want them in D?



>This non-existant thing is a
>red herring. 'empty' means initialized and length of zero. 'non-existent'
>means not initialized yet.

Yeah - but nobody has yet answered WHY? Why would ANYONE want to allow uninitialized array handles (as opposed to array content) to exist in D. It makes no sense.

Please, can someone who is arguing in favor of allowing a distinction between initialized and uninitialized dynamic array handles explain exactly why you want such a distinction to exist?


Arcane Jill


June 29, 2004
Arcane Jill wrote:
> In article <opsab6o5rl5a2sq9@digitalmars.com>, Regan Heath says...
> 
>>>Yeah, it's called std::string, and it's more or less the default.
>>
>>And it's crap. IMNSHO.
> 
> 
> You'll get no arguments from me there. D got it right in not having a string
> class. I didn't think that at first, but I've come round to the D way of
> thinking. 
I'm still getting there... I still don't see why toUpper("hello") is better than "hello".toUpper(), under the assumption that the OO way has any merit. (If it doesn't, why do we have it?)

> The problem with a string class is that you can't add new member
> functions to it. (Oh, you may be able to subclass String, if it's not final. Oh
> wait - it /is/ final in Java). With char[] arrays, you CAN add new functions.
I'm confused: is there a way of adding functions to array types that can't be used with classes?

> Besides which, what else can a char[] array possibly represent, other than a
> string? (Given that a char[] array MUST contain UTF-8, I mean). It's not the
> same as a byte[] array, which could mean anything.
In theory you're right. The problem is when people assume "a char array is a list of characters", which is perfectly logical, given the name.
In theory, you should only store a list of characters in a dchar[]. But it's not going to happen, see std.string.maketrans (char[] is a list) and translate (char[] is opaque).
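
A minimal sketch of the "list of characters" trap, assuming a current D compiler. A `char[]` is indexed by UTF-8 code units, so its length only matches the number of characters for pure ASCII text. `countCodePoints` is a hypothetical helper written for this illustration.

```d
import std.stdio : writeln;

// Counts code points by letting foreach decode the UTF-8 into dchar values.
size_t countCodePoints(const(char)[] s)
{
    size_t n = 0;
    foreach (dchar c; s)
        ++n;
    return n;
}

void main()
{
    string s = "h\u00E9llo";        // U+00E9 takes two UTF-8 code units
    writeln(s.length);              // number of code units
    writeln(countCodePoints(s));    // number of code points
}
```

Code that indexes `s[1]` here gets half of a character, which is the kind of mistake the "char array is a list of characters" assumption leads to.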

[RANT]
IMO, D (language, not libraries) isn't _really_ trying to be fully-unicode at all.
What is the purpose of a char/wchar variable? How often do you actually need to be directly manipulating UTF8/16 fragments? (Hint: in a unicode-based language with good libraries, almost never).
*IF* D is going to be fully-unicode, that does have performance impacts. A single character must _always_ go in a dchar variable. So what is the advantage in having strings being char[] arrays? ("knowing the encoding" doesn't count, the user shouldn't have to care).
IMO, strings NEED to:
	* Have only one type, or one base type.
I want to write a function that accepts a string. I don't want to write three functions, or use a template (that has to be manually instantiated).
	* Expose character data as _characters_, not fragments.
This means characters accessed must be dchars, indexing must be character, not fragment-based.
	* Be efficient in the common case.
At the moment, this probably means using UTF-8 internally. This could be changed in the future, or there could be multiple versions with the same base type, because all character data would be exposed at the character level.
	* Be fully reference types.
At the moment, if someone passes in a string, I can modify its data, which is shared, and its length, which is not. This makes sense if you understand the implementation, but why should foo~="bar" have the truly odd effects it does? Always passing strings inout is ugly and confusing in other cases.
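
A minimal sketch of the "data is shared, length is not" behaviour described in the last point, assuming a current D compiler, where `ref` plays the role that `inout` parameters play in this thread. The function names are hypothetical.

```d
import std.stdio : writeln;

// Receives a copy of the (pointer, length) pair: element writes reach the
// caller's data, but the append only changes the local copy's length.
void appendByValue(char[] s)
{
    s[0] = 'X';
    s ~= "bar";
}

// Receives the caller's array variable itself, so the append is visible.
void appendByRef(ref char[] s)
{
    s ~= "bar";
}

void main()
{
    char[] a = "foo".dup;

    appendByValue(a);
    writeln(a);   // first character changed, length still 3

    appendByRef(a);
    writeln(a);   // now the appended text is visible
}
```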

Based on this, the solution to me looks like a String interface that exposes character data, with UTF8String as the default implementation, which stores its data in a ubyte[]. String literals would create these.
There could then be a UTF32String implementation which would be more efficient for various other languages.
The "char" type should be 32 bits wide. Anything else is confusing. (Hey, they did it with "int"...).
[/RANT]

Now flame on, I'm sure that's not going to be too popular ;-)

>>>Don't compare arrays to null.  Don't try to differentiate between empty and nonexistent.
>>
>>Fine and dandy EXCEPT we *need* to differentiate between empty and non-existent strings.
> 
> Why? Do we also need a way to differentiate between empty and non-existent ints?
Frankly, yes, I use -1 as a "magic value" all the time, and do all sorts of ugly things when negative numbers are perfectly valid. This is necessary for pragmatic reasons of efficiency; I'd love chips to treat 0x8000... as NaN, like the NaN we have in IEEE floating point. (This'd also balance the range of integers). I'm not saying we can/should change the behaviour of ints, just that I don't think this argument has merit.

I think arrays should become fully reference types, for the same reason as strings above. Yes, this would probably mean double indirection: arrays would be a pointer to the (length, data pointer) struct that they currently are.

Sam
June 29, 2004
Arcane Jill wrote:

> In article <cbr9e5$vai$1@digitaldaemon.com>, Derek Parnell says...
> 
> 
>>Because that's not what is being meant. I'd like to differentiate between
>>INITIALIZED and UNINITIALIZED vectors.
> 
> 
> Why?
> 
> D's dynamic arrays are the same thing as C++ std::vectors (as I'm sure you
> realize). 
The difference is in C++ it's common to use a pointer to a class (and I presume, a vector).
In D, an array is a struct, not a class, so to get reference semantics you have to use a struct pointer. In C++ this would be no big deal, but this doesn't seem like the D way.
Reference semantics allow me to change the length of an array and have it reflected in the caller, and to store nulls.

> In C++, there is no such thing as an uninitialized vector. Why on
> Earth would you want them in D?
For the same reason you use null in other situations with reference types. I want accessing an uninitialised member array to give an error. I want to be able to use a null argument to a function to trigger special or default behaviour (optional arguments in any position).
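
A minimal sketch of the "null argument triggers default behaviour" use case, assuming a current D compiler. The function `greeting` and its messages are hypothetical. The sketch leans on the identity test `is null`, which is exactly the behaviour whose reliability is disputed in this thread, and operations that copy or slice an empty array may produce a null one.

```d
import std.stdio : writeln;

// Hypothetical function: a null name means "no name was supplied",
// while an empty name means "the name is known to be empty".
string greeting(string name = null)
{
    if (name is null)
        return "Hello, whoever you are";
    if (name.length == 0)
        return "Hello, nameless one";
    return "Hello, " ~ name;
}

void main()
{
    writeln(greeting());        // default: no name supplied
    writeln(greeting(""));      // empty literal, not null
    writeln(greeting("Sam"));
}
```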

Sam

PS: AJ, I'm not sure if you read the forums at dsource, I posted a couple of deimos bugs:
http://dsource.org/forums/viewtopic.php?t=224
June 29, 2004
Sean Kelly wrote:

> In article <cbprfd$1sq9$1@digitaldaemon.com>, Andy Friesen says...
> 
>>Something which just occurred to me that would resolve this issue would be to add two properties to array types: begin and end.  These properties would be pointer types which point to the beginning and end of the array's contents.  (exactly like C++ iterators)
> 
> 
> This might be very handy.  If so, I wouldn't mind seeing rbegin and rend
> parameters as well though.
Huh? They're pointers... wouldn't rbegin == end and rend == begin?
I think I missed the point...

> Plus, it raises the question of what they return for
> associative arrays.
The concept doesn't apply to associative arrays afaics, so they wouldn't exist.
Sam