October 19, 2009
Denis Koroskin wrote:
> On Mon, 19 Oct 2009 17:13:46 +0400, dsimcha <dsimcha@yahoo.com> wrote:
> 
>> This discussion originated in the T[new] thread, but I think it deserves its own
>> thread.
>>
>> == Quote from Denis Koroskin (2korden@gmail.com)'s article
>>> An Array!(T) is really just a different name to a T[new]. You'll have the
>>> same problem explaining difference between Array!(T) and T[].
>>> But you are also creating a nightmare for CTFE. Since you can't use "a ~=
>>> b;" anymore, you'll have to use "a = a ~ b;" which *always* allocates. Not
>>> only it is syntactically less pleasant, this way you render this function
>>> useless at run-time - who in the sane mind will use such an inefficient
>>> stuff?
>>
>> Maybe what we need is a version(ctfe) statement.  Stuff inside such a block would
>> be executed only if a function is being compile time evaluated.  When code is
>> generated for runtime evaluation the else block would be used.  This would allow
>> problems like this to be solved in a well-encapsulated way.  Example:
>>
>> uint[] findPrimes(uint maxPrime) {
>>     version(ctfe) {
>>         uint[] ret;
>>     } else {
>>         ArrayBuilder!uint ret;
>>     }
>>
>>     foreach(i; 0..maxPrime) {
>>         if(!isPrime(i)) {
>>              continue;
>>         }
>>
>>         version(ctfe) {
>>             ret = ret ~ i;
>>         } else {
>>             ret ~= i;
>>         }
>>     }
>> }
>>
>> Given that CTFE will likely never support everything that is supported at runtime,
>> this will likely make it much more useful.
> 
> It was suggested before and IIRC Walter said it is next to impossible to implement.

I had a bit of an attempt at it. version(ctfe) seems to be nearly impossible.
But I'm almost certain I could get a magic bool __ctfe to work:

if (__ctfe) {
  ...      // only contains D code
} else {
   asm { .... }
}

It would only be accessible inside unsafe modules (which are the only place you'd need it).

> What you suggest is almost the same as writing 2 different functions. The killer feature of CTFE is that you write it just once and use both at compile and run-time without modifications.

Yes, but there are problems. A few low-level, high-speed functions can't be used in CTFE because they use asm or nasty unions.

eg, std.math.poly()
October 19, 2009
== Quote from Don (nospam@nospam.com)'s article
> Denis Koroskin wrote:
> > On Mon, 19 Oct 2009 17:13:46 +0400, dsimcha <dsimcha@yahoo.com> wrote:
> >
> >> This discussion originated in the T[new] thread, but I think it
> >> deserves its own
> >> thread.
> >>
> >> == Quote from Denis Koroskin (2korden@gmail.com)'s article
> >>> An Array!(T) is really just a different name to a T[new]. You'll have
> >>> the
> >>> same problem explaining difference between Array!(T) and T[].
> >>> But you are also creating a nightmare for CTFE. Since you can't use
> >>> "a ~=
> >>> b;" anymore, you'll have to use "a = a ~ b;" which *always*
> >>> allocates. Not
> >>> only it is syntactically less pleasant, this way you render this
> >>> function
> >>> useless at run-time - who in the sane mind will use such an inefficient
> >>> stuff?
> >>
> >> Maybe what we need is a version(ctfe) statement.  Stuff inside such a
> >> block would
> >> be executed only if a function is being compile time evaluated.  When
> >> code is
> >> generated for runtime evaluation the else block would be used.  This
> >> would allow
> >> problems like this to be solved in a well-encapsulated way.  Example:
> >>
> >> uint[] findPrimes(uint maxPrime) {
> >>     version(ctfe) {
> >>         uint[] ret;
> >>     } else {
> >>         ArrayBuilder!uint ret;
> >>     }
> >>
> >>     foreach(i; 0..maxPrime) {
> >>         if(!isPrime(i)) {
> >>              continue;
> >>         }
> >>
> >>         version(ctfe) {
> >>             ret = ret ~ i;
> >>         } else {
> >>             ret ~= i;
> >>         }
> >>     }
> >> }
> >>
> >> Given that CTFE will likely never support everything that is supported
> >> at runtime,
> >> this will likely make it much more useful.
> >
> > It was suggested before and IIRC Walter said it is next to impossible to implement.
> I had a bit of an attempt at it. version(ctfe) seems to be nearly
> impossible.
> But I'm almost certain I could get a magic bool __ctfe to work:
> if (__ctfe) {
>    ...      // only contains D code
> } else {
>     asm { .... }
> }
> It would only be accessible inside unsafe modules (which are the only
> place you'd need it).
> > What you suggest is almost the same as writing 2 different functions. The killer feature of CTFE is that you write it just once and use both at compile and run-time without modifications.
> Yes, but there are problems. A few low-level, high-speed functions can't
> be used in CTFE because they use asm or nasty unions.
> eg, std.math.poly()

So this magic bool would be like a regular if statement in that it would create a new scope?  I guess if you could do it such that it worked with static if, you'd be able to get it to work with version.
October 19, 2009
Walter Bright wrote:
> Ary Borenszweig wrote:
>> I remember seeing a lot of CTFE code that created a dynamic array and then appended stuff to it, like for example to build a list of prime numbers. Would that still work with ArrayBuilder?
> 
> Probably not. But you can rewrite:
> 
>   a ~= stuff;
> 
> as:
> 
>   a = a ~ stuff;
> 
> to make it work.

Is there any reason the first can't be a short-hand for the second?
October 19, 2009
== Quote from downs (default_357-line@yahoo.de)'s article
> Walter Bright wrote:
> > Ary Borenszweig wrote:
> >> I remember seeing a lot of CTFE code that created a dynamic array and then appended stuff to it, like for example to build a list of prime numbers. Would that still work with ArrayBuilder?
> >
> > Probably not. But you can rewrite:
> >
> >   a ~= stuff;
> >
> > as:
> >
> >   a = a ~ stuff;
> >
> > to make it work.
> Is there any reason the first can't be a short-hand for the second?

Devil's advocate because I somewhat agree with you:  a = a ~ stuff; is so inefficient that it should be ugly.
October 19, 2009
dsimcha wrote:
> == Quote from Don (nospam@nospam.com)'s article
>> Denis Koroskin wrote:
>>> On Mon, 19 Oct 2009 17:13:46 +0400, dsimcha <dsimcha@yahoo.com> wrote:
>>>
>>>> This discussion originated in the T[new] thread, but I think it
>>>> deserves its own
>>>> thread.
>>>>
>>>> == Quote from Denis Koroskin (2korden@gmail.com)'s article
>>>>> An Array!(T) is really just a different name to a T[new]. You'll have
>>>>> the
>>>>> same problem explaining difference between Array!(T) and T[].
>>>>> But you are also creating a nightmare for CTFE. Since you can't use
>>>>> "a ~=
>>>>> b;" anymore, you'll have to use "a = a ~ b;" which *always*
>>>>> allocates. Not
>>>>> only it is syntactically less pleasant, this way you render this
>>>>> function
>>>>> useless at run-time - who in the sane mind will use such an inefficient
>>>>> stuff?
>>>> Maybe what we need is a version(ctfe) statement.  Stuff inside such a
>>>> block would
>>>> be executed only if a function is being compile time evaluated.  When
>>>> code is
>>>> generated for runtime evaluation the else block would be used.  This
>>>> would allow
>>>> problems like this to be solved in a well-encapsulated way.  Example:
>>>>
>>>> uint[] findPrimes(uint maxPrime) {
>>>>     version(ctfe) {
>>>>         uint[] ret;
>>>>     } else {
>>>>         ArrayBuilder!uint ret;
>>>>     }
>>>>
>>>>     foreach(i; 0..maxPrime) {
>>>>         if(!isPrime(i)) {
>>>>              continue;
>>>>         }
>>>>
>>>>         version(ctfe) {
>>>>             ret = ret ~ i;
>>>>         } else {
>>>>             ret ~= i;
>>>>         }
>>>>     }
>>>> }
>>>>
>>>> Given that CTFE will likely never support everything that is supported
>>>> at runtime,
>>>> this will likely make it much more useful.
>>> It was suggested before and IIRC Walter said it is next to impossible to
>>> implement.
>> I had a bit of an attempt at it. version(ctfe) seems to be nearly
>> impossible.
>> But I'm almost certain I could get a magic bool __ctfe to work:
>> if (__ctfe) {
>>    ...      // only contains D code
>> } else {
>>     asm { .... }
>> }
>> It would only be accessible inside unsafe modules (which are the only
>> place you'd need it).
>>> What you suggest is almost the same as writing 2 different functions.
>>> The killer feature of CTFE is that you write it just once and use both
>>> at compile and run-time without modifications.
>> Yes, but there are problems. A few low-level, high-speed functions can't
>> be used in CTFE because they use asm or nasty unions.
>> eg, std.math.poly()
> 
> So this magic bool would be like a regular if statement in that it would create a
> new scope?  I guess if you could do it such that it worked with static if, you'd
> be able to get it to work with version.

It'd just be a variable, which the CTFE interpreter would evaluate as 'true', but would be a normal bool variable everywhere else. Similar to the magic variable __dollar.
It'd only become false just before being sent to the code generator. Dead code elimination would remove the CTFE part.

The important thing is that the AST must be identical for both CTFE and run-time.
October 19, 2009
Walter Bright írta:
> The purpose of T[new] was to solve the problems T[] had with passing T[] to a function and then the function resizes the T[]. What happens with the original?

I've tried to follow the T[], T[new], T+slice discussion, but i'm lost,
I don't think if it's a good idea, to extend the language with all kind of decorated version of the same object ( scope  (*T)[scope new][1..$] :) )

"It will tell you for instance how to describe something that was about to happen to you in the past before you avoided it by time-jumping forward two days in order to avoid it. The event will be described differently according to whether you are talking about it from the standpoint of your own natural time, from a time in the further future, or a time in the further past and is further complicated by the possibility of conducting conversations whilst you are actually travelling from one time to another with the intention of becoming your own father or mother." Douglas Adams - The Hitchhiker's Guide to the Galaxy

Just an idea to simplify this whole thing, maybe it's whole wrong, but who knows... :)

Instead of creating new types of arrays we have already two: static and dynamic. A dynamic version can be resized (restructured) any time, the static has a fixed size, thus the dynamic version works as a class and the static version works as a struct for either in,out,ref/etc.

To create slices, slices those references the whole array i'd introduce a new type (as there is something like this): range!(T). A range is actually an inner class of the array implementing an IRange(T) interface with the usual popFront, front, etc. For a general class if one wants to add range, he just have to provide an onRange (onSlice) function.

As the range interface itself provides the opIndex, opAssign, opXAssign etc., they can be implemented as desired and no more opSliceAddAssign and other decorated function's are required. Even more as the range is an inner class of the array (and knows about the array), a finer tune of what_shall_happen_when_my_referrer_array_is_resize policy can be created.

As the range itself can work as an array, range[1] (or maybe even rang[1..2], calling the IRange!(T).onRange() ) it can be used as an array  whose structure (length) cannot be altered as desired for T[]

ex:
T[] a; // is a dynamic array as before
range(T) s1 = T[1..2]; // the slice from 1..2
range(T) s1 = T.opRange(); // the slice representing the whole range

foo( T[new] a ) is same as foo( T[] )
foo( T[] ) would be replaced by foo( range(T) ) and thus cannot be resized

furthermore

 foreach( a; array )
translates to something like ->
 foreach( a; a.onRange() )...

foreach( a; array[1..2] )
translates to something like ->
 foreach( a; a.onRange(1,2) )...

foreach( a; array[1..2,3..4] )
translates to something like ->
 foreach( a; a.onRange(1,2,3,4) )...

array[1..2] += 1
translates to something like ->
array.onRange(1,2).opAddAssign(1);


It'd be also great to create more then one onRange implementation of a class ex: matrix.opRangeRowWise matrix.opRangeColumnWise, but i don't
know how to select the appropriate one in a foreach loop "automatically":
foreach( a; mx.rowWise ) ->
 mx.selectRange(name : string) { mixin( "onRange" ~ name )  }
 foreach( a; ma.selectRange("rowWise")(1,2) )...


As a conclusion: please don't create any more array type, we have just enough. We have already all the features in the language to express both the desired and explained (above) behaviour. Only the syntax should be cleared out. Using the same notation multiple times with small decorations is not a good idea, and makes things confusing.

Thanks.

October 19, 2009
Andrei Alexandrescu, el 18 de octubre a las 20:16 me escribiste:
> Here's what I wrote to Walter:
> 
> ====================
> I'm going to suggest something terrible - let's get rid of T[new]. I
> know it's difficult to throw away work you've already done, but
> really things with T[new] start to look like a Pyrrhic victory. Here
> are some issues:
> 
> * The abstraction doesn't seem to come off as crisp and clean as we both wanted;
> 
> * There are efficiency issues, such as the two allocations that you valiantly tried to eliminate in a subset of cases;
> 
> * Explaining two very similar but subtly different types to newcomers is excruciatingly difficult (I'll send you a draft of the chapter - it looks like a burn victim who didn't make it);
> 
> * Furthermore, explaining people when to use one vs. the other is much more difficult than it seems. On the surface, it goes like this: "If you need to append stuff, use T[new]. If not, use T[]." Reality is much more subtle. For one thing, T[new] does not allow contraction from the left, whereas T[] does. That puts T[] at an advantage. So if you want to append stuff and also contract from the left, there's nothing our abstractions can help you with.

I think this is getting overcomplicated. I don't see it as complex, I see it like this:

2 types should be provided: array and slice.

array is a *real* type, storing and owning memory, it should be something
like this (conceptually):

class array(T)
{
	size_t length;
	size_t capacity;
	T[capacity] elements;
}

1) a pure reference type.
2) 1 allocation only (interior pointers are not a problem, the GC have to
   support them anyways).
3) easily appendable (capacity field).

slice should be something like this:

struct slice(T)
{
	size_t length;
	T* ptr;
}

1) a pure value type.
2) no allocation at all, *ever*.
3) not appendable at all.
4) You can change both ptr and length, and you can mutate the elements (if
   not immutable).

So a slice is a window to peek a chunk of data.

Then you have the syntax. In this discussion, T[new] is array and T[] is slice. I find that syntax very confusing. I think it could be even better to just put those 2 types in object.d (well, do public imports of those 2 types in object.d) and let the people write:

array!T a;
slice!T s;

The array literals should be immutable chunks of memory in the static
memory, as ClassInfo.init. The type of [1,2,3] should be, for example,
slice!(immutable int), so:
auto s = [1,2,3];
should create a slice!(immutable int) with length 3 and ptr pointing to
the static memory where the [1,2,3] was stored (this is what happens with
strings, right? So no surprises here, they are consistent).

slice.dup should return an array. If you want to just copy the length and ptr members, use assignment (it's a value type, remember)?

auto s = [1,2,3];
auto t = s;
assert(s.length == t.length);
assert(s.ptr == t.ptr);
assert(&s != &t);

Assignment of arrays is just a pointer assignment (is a reference type):

auto a = [1,2,3].dup;
auto b = a;
assert(a is b);

array.dup returns another array. array[] returns a slice though (you can allow implicit casting from array to slice but I don't see the point as array[] is short enough and more explicit). slices should not be convertible to arrays (use slice.dup for that).


Back to the syntax, I think T[new] is *really* confusing. It tell me
nothing about the type, it provides the same information as calling it
it wazzzaaap!T for me. I *really* think it would be better to call it
array!T. But if we have both types, T[] is not clear anymore that is
a slice, again, you can't figure that out. So maybe T[] should be used for
arrays, not slices; if you want some syntax sugar I think T[] is more
guessable to be an array than a slice, and it would be a little more
backward compatible. Maybe slices can be T[..], but I'm not sure is clear
enough, again I think we could live with something more explicit like
array!T and slice!T. But at least slices should be part of the language,
because that should be the type of "array literals" (damn! that didn't
sound right =P).


I'm missing something? Why this shouldn't work?

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
- Que hacés, ratita?
- Espero un ratito...
October 19, 2009
On Mon, 19 Oct 2009 18:57:45 +0400, Leandro Lucarella <llucax@gmail.com> wrote:

> Andrei Alexandrescu, el 18 de octubre a las 20:16 me escribiste:
>> Here's what I wrote to Walter:
>>
>> ====================
>> I'm going to suggest something terrible - let's get rid of T[new]. I
>> know it's difficult to throw away work you've already done, but
>> really things with T[new] start to look like a Pyrrhic victory. Here
>> are some issues:
>>
>> * The abstraction doesn't seem to come off as crisp and clean as we
>> both wanted;
>>
>> * There are efficiency issues, such as the two allocations that you
>> valiantly tried to eliminate in a subset of cases;
>>
>> * Explaining two very similar but subtly different types to
>> newcomers is excruciatingly difficult (I'll send you a draft of the
>> chapter - it looks like a burn victim who didn't make it);
>>
>> * Furthermore, explaining people when to use one vs. the other is
>> much more difficult than it seems. On the surface, it goes like
>> this: "If you need to append stuff, use T[new]. If not, use T[]."
>> Reality is much more subtle. For one thing, T[new] does not allow
>> contraction from the left, whereas T[] does. That puts T[] at an
>> advantage. So if you want to append stuff and also contract from the
>> left, there's nothing our abstractions can help you with.
>
> I think this is getting overcomplicated. I don't see it as complex, I see
> it like this:
>
> 2 types should be provided: array and slice.
>
> array is a *real* type, storing and owning memory, it should be something
> like this (conceptually):
>
> class array(T)
> {
> 	size_t length;
> 	size_t capacity;
> 	T[capacity] elements;
> }
>
> 1) a pure reference type.
> 2) 1 allocation only (interior pointers are not a problem, the GC have to
>    support them anyways).
> 3) easily appendable (capacity field).
>
> slice should be something like this:
>
> struct slice(T)
> {
> 	size_t length;
> 	T* ptr;
> }
>
> 1) a pure value type.
> 2) no allocation at all, *ever*.
> 3) not appendable at all.
> 4) You can change both ptr and length, and you can mutate the elements (if
>    not immutable).
>
> So a slice is a window to peek a chunk of data.
>
> Then you have the syntax. In this discussion, T[new] is array and T[] is
> slice. I find that syntax very confusing. I think it could be even better
> to just put those 2 types in object.d (well, do public imports of those
> 2 types in object.d) and let the people write:
>
> array!T a;
> slice!T s;
>
> The array literals should be immutable chunks of memory in the static
> memory, as ClassInfo.init. The type of [1,2,3] should be, for example,
> slice!(immutable int), so:
> auto s = [1,2,3];
> should create a slice!(immutable int) with length 3 and ptr pointing to
> the static memory where the [1,2,3] was stored (this is what happens with
> strings, right? So no surprises here, they are consistent).
>
> slice.dup should return an array. If you want to just copy the length and
> ptr members, use assignment (it's a value type, remember)?
>
> auto s = [1,2,3];
> auto t = s;
> assert(s.length == t.length);
> assert(s.ptr == t.ptr);
> assert(&s != &t);
>
> Assignment of arrays is just a pointer assignment (is a reference type):
>
> auto a = [1,2,3].dup;
> auto b = a;
> assert(a is b);
>
> array.dup returns another array. array[] returns a slice though (you can
> allow implicit casting from array to slice but I don't see the point as
> array[] is short enough and more explicit). slices should not be
> convertible to arrays (use slice.dup for that).
>
>
> Back to the syntax, I think T[new] is *really* confusing. It tell me
> nothing about the type, it provides the same information as calling it
> it wazzzaaap!T for me. I *really* think it would be better to call it
> array!T. But if we have both types, T[] is not clear anymore that is
> a slice, again, you can't figure that out. So maybe T[] should be used for
> arrays, not slices; if you want some syntax sugar I think T[] is more
> guessable to be an array than a slice, and it would be a little more
> backward compatible. Maybe slices can be T[..], but I'm not sure is clear
> enough, again I think we could live with something more explicit like
> array!T and slice!T. But at least slices should be part of the language,
> because that should be the type of "array literals" (damn! that didn't
> sound right =P).
>
>
> I'm missing something? Why this shouldn't work?
>

That's what was initially planned (except some nuances).

I also think we need a precedent of mapping built-in language type to a library type, starting with Array!(T) and Range!(T). We could then map V[K] to AArray!(K,V), T? to Nullable!(T) etc.

There is one big issue, though: classes and allocations via 'new' don't work with CTFE. I believe this is something that is planned (Walter once said that an entire subset of D called SafeD should work with CTFE), but it is *very* hard to implement, not feasible in a nearest future for sure.
October 19, 2009
Leandro Lucarella, el 19 de octubre a las 11:57 me escribiste:
> slice.dup should return an array.

To avoid making the language aware of the array type, slice.dup can be removed and use an array constructor instead:

auto a = slice.dup;

should be:

auto a = array!T(slice);

This way, I think the language only have to know about slices.

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
He cometido pecados, he hecho el mal, he sido víctima de la envidia, el
egoísmo, la ambición, la mentira y la frivolidad, pero siempre he sido
un padre argentino que quiere que su hijo triunfe en la vida.
	-- Ricardo Vaporeso
October 19, 2009
Leandro Lucarella wrote:
> Andrei Alexandrescu, el 18 de octubre a las 20:16 me escribiste:
>> Here's what I wrote to Walter:
>>
>> ====================
>> I'm going to suggest something terrible - let's get rid of T[new]. I
>> know it's difficult to throw away work you've already done, but
>> really things with T[new] start to look like a Pyrrhic victory. Here
>> are some issues:
>>
>> * The abstraction doesn't seem to come off as crisp and clean as we
>> both wanted;
>>
>> * There are efficiency issues, such as the two allocations that you
>> valiantly tried to eliminate in a subset of cases;
>>
>> * Explaining two very similar but subtly different types to
>> newcomers is excruciatingly difficult (I'll send you a draft of the
>> chapter - it looks like a burn victim who didn't make it);
>>
>> * Furthermore, explaining people when to use one vs. the other is
>> much more difficult than it seems. On the surface, it goes like
>> this: "If you need to append stuff, use T[new]. If not, use T[]."
>> Reality is much more subtle. For one thing, T[new] does not allow
>> contraction from the left, whereas T[] does. That puts T[] at an
>> advantage. So if you want to append stuff and also contract from the
>> left, there's nothing our abstractions can help you with.
> 
> I think this is getting overcomplicated. I don't see it as complex, I see
> it like this:
> 
> 2 types should be provided: array and slice.
> 
> array is a *real* type, storing and owning memory, it should be something
> like this (conceptually):
> 
> class array(T)
> {
> 	size_t length;
> 	size_t capacity;
> 	T[capacity] elements;
> }

At the point I introduce arrays I hadn't described classes.

One major problem with writing a "The X Programming Language" book is sequencing. It's very easy to explain a complicated feature if the listener knows all others. It is very difficult to introduce them serially.

> 1) a pure reference type.
> 2) 1 allocation only (interior pointers are not a problem, the GC have to
>    support them anyways).
> 3) easily appendable (capacity field).
> 
> slice should be something like this:
> 
> struct slice(T)
> {
> 	size_t length;
> 	T* ptr;
> }

structs and pointers have also not been introduced yet.

> I'm missing something? Why this shouldn't work?

It may work, but I was unable to pull it off reasonably well.


Andrei