October 19, 2009
Andrei Alexandrescu, el 19 de octubre a las 08:38 me escribiste:
> For relatively large chunks of memory, the GC keeps a control block. We could add a member size_t requestedSize that keeps the size that was requested with an allocation/reallocation call. The GC can take initiative in overallocating memory and expose primitives such as requestedSize (how much did the program ask for?) and capacity (how much is really allocated?).  With that API, it is possible to implement ~= efficiently.

That would mean moving the overhead of arrays to the GC, but that overhead will be present for *all* objects, arrays or not, so I don't think it will be a good idea...

Having that information could be used if present though, for example, to avoid scanning chunks of memory that are not used. That could help avoiding false positives (if that chunks of memory are not zeroed already).

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
Yo soy Peperino, mártir latino, venid al asado pero traed el vino.
	-- Peperino Pómoro
October 19, 2009
Leandro Lucarella, el 19 de octubre a las 12:13 me escribiste:
> Leandro Lucarella, el 19 de octubre a las 11:57 me escribiste:
> > slice.dup should return an array.
> 
> To avoid making the language aware of the array type, slice.dup can be removed and use an array constructor instead:
> 
> auto a = slice.dup;
> 
> should be:
> 
> auto a = array!T(slice);
> 
> This way, I think the language only have to know about slices.

Unless you want to have an appendable array type in CTFE. I guess the language should know about array anyways :S

(I should read the entire thread before posting)

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
Si el enemigo se siente abatido, recuérdele que su familia está lejos y
que mientras siga la pelea él se estará perdiendo el crecimiento de sus
hijos. Si está feliz, recuerdele de la existencia de su familia
política, en especial de los cuñados que tienen sexo con sus hermanas.
	-- Ricardo Vaporeso. De la desmoralización del enemigo.
October 19, 2009
Leandro Lucarella wrote:
> Andrei Alexandrescu, el 19 de octubre a las 08:38 me escribiste:
>> For relatively large chunks of memory, the GC keeps a control block.
>> We could add a member size_t requestedSize that keeps the size that
>> was requested with an allocation/reallocation call. The GC can take
>> initiative in overallocating memory and expose primitives such as
>> requestedSize (how much did the program ask for?) and capacity (how
>> much is really allocated?).  With that API, it is possible to
>> implement ~= efficiently.
> 
> That would mean moving the overhead of arrays to the GC, but that overhead
> will be present for *all* objects, arrays or not, so I don't think it will
> be a good idea...

I forgot to mention that ~= would also have a means to communicate the GC to overallocate. My point is simple: the problem is that slices don't  store enough info. That info could get in the memory control block.

> Having that information could be used if present though, for example, to
> avoid scanning chunks of memory that are not used. That could help
> avoiding false positives (if that chunks of memory are not zeroed
> already).

Good point.


Andrei
October 19, 2009
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
> Leandro Lucarella wrote:
> > Andrei Alexandrescu, el 19 de octubre a las 08:38 me escribiste:
> >> For relatively large chunks of memory, the GC keeps a control block. We could add a member size_t requestedSize that keeps the size that was requested with an allocation/reallocation call. The GC can take initiative in overallocating memory and expose primitives such as requestedSize (how much did the program ask for?) and capacity (how much is really allocated?).  With that API, it is possible to implement ~= efficiently.
> >
> > That would mean moving the overhead of arrays to the GC, but that overhead will be present for *all* objects, arrays or not, so I don't think it will be a good idea...
> I forgot to mention that ~= would also have a means to communicate the
> GC to overallocate. My point is simple: the problem is that slices don't
>   store enough info. That info could get in the memory control block.

This is the part of this thread I'm not understanding:  Isn't this info stored
already?  The only problem is querying it is slow.  See GC.sizeOf(), and bug 2900.
 (2900 is fixed, but it's still slow, and there's still bug 2093.)  What are you
proposing that's different from the status quo?

October 19, 2009
Andrei Alexandrescu, el 19 de octubre a las 10:18 me escribiste:
> Leandro Lucarella wrote:
> >Andrei Alexandrescu, el 18 de octubre a las 20:16 me escribiste:
> >>Here's what I wrote to Walter:
> >>
> >>====================
> >>I'm going to suggest something terrible - let's get rid of T[new]. I
> >>know it's difficult to throw away work you've already done, but
> >>really things with T[new] start to look like a Pyrrhic victory. Here
> >>are some issues:
> >>
> >>* The abstraction doesn't seem to come off as crisp and clean as we both wanted;
> >>
> >>* There are efficiency issues, such as the two allocations that you valiantly tried to eliminate in a subset of cases;
> >>
> >>* Explaining two very similar but subtly different types to newcomers is excruciatingly difficult (I'll send you a draft of the chapter - it looks like a burn victim who didn't make it);
> >>
> >>* Furthermore, explaining people when to use one vs. the other is much more difficult than it seems. On the surface, it goes like this: "If you need to append stuff, use T[new]. If not, use T[]." Reality is much more subtle. For one thing, T[new] does not allow contraction from the left, whereas T[] does. That puts T[] at an advantage. So if you want to append stuff and also contract from the left, there's nothing our abstractions can help you with.
> >
> >I think this is getting overcomplicated. I don't see it as complex, I see it like this:
> >
> >2 types should be provided: array and slice.
> >
> >array is a *real* type, storing and owning memory, it should be something
> >like this (conceptually):
> >
> >class array(T)
> >{
> >	size_t length;
> >	size_t capacity;
> >	T[capacity] elements;
> >}
> 
> At the point I introduce arrays I hadn't described classes.

You are talking about the book, right? You don't have to! You can explain
arrays without talking about classes at all! See the Python Tutorial for
example:
http://docs.python.org/tutorial/index.html

You see all kind of complex data structures (lists, dictionaries, tuples, strings, sets), input and output, and even exceptions! before classes. And I read it, and was a very easy and pleasant reading.

Talking about a class here was just to prove my point more easily because I *know* everybody here knows very well about classes semantics. I was not describing it for a book.

> One major problem with writing a "The X Programming Language" book is sequencing. It's very easy to explain a complicated feature if the listener knows all others. It is very difficult to introduce them serially.

I don't think an array would be the case, come on! =)

> >1) a pure reference type.
> >2) 1 allocation only (interior pointers are not a problem, the GC have to
> >   support them anyways).
> >3) easily appendable (capacity field).
> >
> >slice should be something like this:
> >
> >struct slice(T)
> >{
> >	size_t length;
> >	T* ptr;
> >}
> 
> structs and pointers have also not been introduced yet.

I don't know what this have to do with the book. Should be D designed based on how good are you to explain the concepts on the book?  I think this is getting ridiculous... seriously. I don't want to be rude, I know you're putting a lot of work in the book and we all appreciate that, but you didn't mention any technical point about what I wrote.

> >I'm missing something? Why this shouldn't work?
> 
> It may work, but I was unable to pull it off reasonably well.

What problems did you find?

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
DESCARTAN BIDET VIRTUAL PORQUE NO LAVA
	-- Cronista TV
October 19, 2009
Andrei Alexandrescu, el 19 de octubre a las 10:33 me escribiste:
> Leandro Lucarella wrote:
> >Andrei Alexandrescu, el 19 de octubre a las 08:38 me escribiste:
> >>For relatively large chunks of memory, the GC keeps a control block. We could add a member size_t requestedSize that keeps the size that was requested with an allocation/reallocation call. The GC can take initiative in overallocating memory and expose primitives such as requestedSize (how much did the program ask for?) and capacity (how much is really allocated?).  With that API, it is possible to implement ~= efficiently.
> >
> >That would mean moving the overhead of arrays to the GC, but that overhead will be present for *all* objects, arrays or not, so I don't think it will be a good idea...
> 
> I forgot to mention that ~= would also have a means to communicate the GC to overallocate. My point is simple: the problem is that slices don't  store enough info. That info could get in the memory control block.

The control block is not part of the object's memory, so I don't think it's something you want to manipulate frequently (because of locality issues). This works well for the GC because when you manipute control information often is when you are collecting, so for collection most of the control info should be cached and have good locality. In collections you have the exact opposite scheme, you don't look at the data, only the control info. When you manipulate an array you will be probably touching both. That's why I think you should keep all the array data in one place. Well, that and because I don't think is a good idea to transfer array complexities to the GC. It's just not the place for that.

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
----------------------------------------------------------------------
GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145  104C 949E BFB6 5F5A 8D05)
----------------------------------------------------------------------
Si por el chancho fuera, se autocomería con chimichurri Worshestershire!
October 19, 2009
Leandro Lucarella wrote:
> Andrei Alexandrescu, el 19 de octubre a las 10:18 me escribiste:
>>> 2 types should be provided: array and slice.
>>>
>>> array is a *real* type, storing and owning memory, it should be something
>>> like this (conceptually):
>>>
>>> class array(T)
>>> {
>>> 	size_t length;
>>> 	size_t capacity;
>>> 	T[capacity] elements;
>>> }
>> At the point I introduce arrays I hadn't described classes.
> 
> You are talking about the book, right? You don't have to! You can explain
> arrays without talking about classes at all!

I was just replying to your suggestion.

>> One major problem with writing a "The X Programming Language" book
>> is sequencing. It's very easy to explain a complicated feature if
>> the listener knows all others. It is very difficult to introduce
>> them serially.
> 
> I don't think an array would be the case, come on! =)
> 
>>> 1) a pure reference type.
>>> 2) 1 allocation only (interior pointers are not a problem, the GC have to
>>>   support them anyways).
>>> 3) easily appendable (capacity field).
>>>
>>> slice should be something like this:
>>>
>>> struct slice(T)
>>> {
>>> 	size_t length;
>>> 	T* ptr;
>>> }
>> structs and pointers have also not been introduced yet.
> 
> I don't know what this have to do with the book. Should be D designed
> based on how good are you to explain the concepts on the book?  I think
> this is getting ridiculous... seriously. I don't want to be rude, I know
> you're putting a lot of work in the book and we all appreciate that, but
> you didn't mention any technical point about what I wrote.
> 
>>> I'm missing something? Why this shouldn't work?
>> It may work, but I was unable to pull it off reasonably well.
> 
> What problems did you find?

I thought I explained that above and in several other posts. I have the feeling this is going in circles, but let me add one more thing. People would want to have a reasonable way of choosing between T[new] and T[]. The differences between them are subtle (I have two tables showing the primitives of T[] and T[new], which are very similar). That static decision concerns future appends to the array, which doesn't strike me as something you know from the get-go through future iterations of a design. Use of "auto" messes up things further: a nice function may choose to return T[new] because it just created an array (an implementation detail), but clients may find that unexpected.


Andrei
October 19, 2009
dsimcha wrote:
> == Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
>> Leandro Lucarella wrote:
>>> Andrei Alexandrescu, el 19 de octubre a las 08:38 me escribiste:
>>>> For relatively large chunks of memory, the GC keeps a control block.
>>>> We could add a member size_t requestedSize that keeps the size that
>>>> was requested with an allocation/reallocation call. The GC can take
>>>> initiative in overallocating memory and expose primitives such as
>>>> requestedSize (how much did the program ask for?) and capacity (how
>>>> much is really allocated?).  With that API, it is possible to
>>>> implement ~= efficiently.
>>> That would mean moving the overhead of arrays to the GC, but that overhead
>>> will be present for *all* objects, arrays or not, so I don't think it will
>>> be a good idea...
>> I forgot to mention that ~= would also have a means to communicate the
>> GC to overallocate. My point is simple: the problem is that slices don't
>>   store enough info. That info could get in the memory control block.
> 
> This is the part of this thread I'm not understanding:  Isn't this info stored
> already?  The only problem is querying it is slow.  See GC.sizeOf(), and bug 2900.
>  (2900 is fixed, but it's still slow, and there's still bug 2093.)  What are you
> proposing that's different from the status quo?

Sorry, I got confused a bit: you have the size in the array and you can get the capacity from the GC, just slowly. Let's sleep on this some more, it may not be impossible to find a growth scheme that works fast.

Andrei
October 19, 2009
Andrei Alexandrescu wrote:

> Leandro Lucarella wrote:
>> Andrei Alexandrescu, el 19 de octubre a las 10:18 me escribiste:
>>>> 2 types should be provided: array and slice.
>>>>
>>>> array is a *real* type, storing and owning memory, it should be
>>>> something like this (conceptually):
>>>>
>>>> class array(T)
>>>> {
>>>> size_t length;
>>>> size_t capacity;
>>>> T[capacity] elements;
>>>> }
>>> At the point I introduce arrays I hadn't described classes.
>> 
>> You are talking about the book, right? You don't have to! You can explain arrays without talking about classes at all!
> 
> I was just replying to your suggestion.
> 
>>> One major problem with writing a "The X Programming Language" book is sequencing. It's very easy to explain a complicated feature if the listener knows all others. It is very difficult to introduce them serially.
>> 
>> I don't think an array would be the case, come on! =)
>> 
>>>> 1) a pure reference type.
>>>> 2) 1 allocation only (interior pointers are not a problem, the GC have
>>>> to
>>>>   support them anyways).
>>>> 3) easily appendable (capacity field).
>>>>
>>>> slice should be something like this:
>>>>
>>>> struct slice(T)
>>>> {
>>>> size_t length;
>>>> T* ptr;
>>>> }
>>> structs and pointers have also not been introduced yet.
>> 
>> I don't know what this have to do with the book. Should be D designed based on how good are you to explain the concepts on the book?  I think this is getting ridiculous... seriously. I don't want to be rude, I know you're putting a lot of work in the book and we all appreciate that, but you didn't mention any technical point about what I wrote.
>> 
>>>> I'm missing something? Why this shouldn't work?
>>> It may work, but I was unable to pull it off reasonably well.
>> 
>> What problems did you find?
> 
> I thought I explained that above and in several other posts. I have the feeling this is going in circles, but let me add one more thing. People would want to have a reasonable way of choosing between T[new] and T[]. The differences between them are subtle (I have two tables showing the primitives of T[] and T[new], which are very similar). That static decision concerns future appends to the array, which doesn't strike me as something you know from the get-go through future iterations of a design. Use of "auto" messes up things further: a nice function may choose to return T[new] because it just created an array (an implementation detail), but clients may find that unexpected.
> 
> 
> Andrei

I think you are seeing a larger problem than their is. But consider this, who is D a language for, is it for beginers only? advanced users only? or everyone, if it is a language for everyone don't complicate the language for the advanced users by rejecting nice syntax just because it is hard to explain. The beginers already will just pick one and write ineficient code and I don't think one more type will manage to confuse them more (as most beginers I thaught is already maximaly confused).

October 19, 2009
On Mon, 19 Oct 2009 20:16:50 +0400, Andrei Alexandrescu <SeeWebsiteForEmail@erdani.org> wrote:

> I thought I explained that above and in several other posts. I have the feeling this is going in circles, but let me add one more thing. People would want to have a reasonable way of choosing between T[new] and T[]. The differences between them are subtle (I have two tables showing the primitives of T[] and T[new], which are very similar). That static decision concerns future appends to the array, which doesn't strike me as something you know from the get-go through future iterations of a design. Use of "auto" messes up things further: a nice function may choose to return T[new] because it just created an array (an implementation detail), but clients may find that unexpected.
>
>

Put it simple: T[] is a range, and T[new] is a container. They belong to different leagues.
Yes, there is a lot common between them, because T[] supports some subset of operations that T[new] support.

We are going in circles because you couldn't convince anyone that we don't need dynamic array type in language core. Not unless library types are usable in CTFE (and have nice syntax).