November 19, 2009
dsimcha wrote:
> == Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
>> Yes, it will be because the book has a few failing unittests. In fact, I
>> was hoping I could talk you or David into doing it :o).
>> Andrei
> 
> Unfortunately, I've come to hate the MRU idea because it would fail miserably for
> large arrays.  I've explained this before, but not particularly thoroughly, so
> I'll try to explain it more thoroughly here.  Let's say you have an array that
> takes up more than half of the total memory you are using.  You try to append to
> it and:
> 
> 1.  The GC runs.  The MRU cache is therefore cleared.
> 
> 2.  Your append succeeds, but the array is reallocated.
> 
> 3.  You try to append again.  Now, because you have a huge piece of garbage that
> you just created by reallocating on the last append, the GC needs to run again.
> The MRU cache is cleared again.
> 
> 4.  Goto 2.

This is not a matter of principles, but one of implementation. When you GC, you can adjust the cache instead of clearing it.
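A minimal sketch of what "adjust the cache instead of clearing it" could mean (all names here are hypothetical, not actual druntime API): after the sweep, the collector hands the cache a liveness test, and only entries whose blocks were actually freed get evicted.

```d
// Hypothetical sketch, not druntime code: the MRU cache survives a
// collection; only entries pointing into freed blocks are evicted.
struct MruEntry
{
    void* block;   // start of the memory block an array occupies
    size_t used;   // bytes currently in use by the array
}

struct MruCache
{
    MruEntry[8] entries;

    // Called by the GC after sweep: keep entries whose blocks survived,
    // drop only the ones pointing into freed memory.
    void adjust(bool delegate(void* block) stillAlive)
    {
        foreach (ref e; entries)
            if (e.block !is null && !stillAlive(e.block))
                e = MruEntry.init;   // evict just this entry
    }
}
```

Under this scheme the huge-array scenario above no longer thrashes: the reallocated array's cache entry survives the collection that frees its old block.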

> Basically, for really huge arrays, we will have the nasty surprise that arrays are
> reallocated almost every time.  Unless something can be done about this, my vote
> is as follows:
> 
> 1.  a ~= b -> syntactic sugar for a = a ~ b.  It's inefficient, but at least it's
> predictably inefficient and people won't use it if they care at all about performance.
> 
> 2.  .length always reallocates when increasing the length of the array.
> 
> 3.  Get a library type with identical syntax to slices that truly owns its
> contents and supports efficient appending, resizing, etc.  I'd be willing to write
> this (and even write it soon) if no one else wants to, especially since I think we
> might be able to use it to define unique ownership for arrays to allow essentially
> a safe assumeUnique for arrays and kill two birds w/ one stone.

UniqueArray is something we need regardless - for message passing.


Andrei
November 19, 2009
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
> dsimcha wrote:
> > == Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
> >> Yes, it will be because the book has a few failing unittests. In fact, I
> >> was hoping I could talk you or David into doing it :o).
> >> Andrei
> >
> > Unfortunately, I've come to hate the MRU idea because it would fail miserably for large arrays.  I've explained this before, but not particularly thoroughly, so I'll try to explain it more thoroughly here.  Let's say you have an array that takes up more than half of the total memory you are using.  You try to append to it and:
> >
> > 1.  The GC runs.  The MRU cache is therefore cleared.
> >
> > 2.  Your append succeeds, but the array is reallocated.
> >
> > 3.  You try to append again.  Now, because you have a huge piece of garbage that you just created by reallocating on the last append, the GC needs to run again. The MRU cache is cleared again.
> >
> > 4.  Goto 2.
> This is not a matter of principles, but one of implementation. When you GC, you can adjust the cache instead of clearing it.

Technically true, but what is a matter of principles is whether the implementation of arrays should be very tightly coupled to the implementation of the GC.  Fixing this issue would have massive ripple effects throughout the already spaghetti code-like GC, and might affect GC performance.  For every single object the GC freed, it would have to look through the MRU cache and remove it from there if present, too.

The point is that this **can** be done, but we probably don't **want** to introduce this kind of coupling, especially if we want our GC model to be sane enough that people might actually come along and write us a better GC one day.
November 19, 2009
grauzone wrote:
> Andrei Alexandrescu wrote:
>> grauzone wrote:
>>> Andrei Alexandrescu wrote:
>>>> 3. It was mentioned in this group that if getopt() does not work in SafeD, then SafeD may as well pack and go home. I agree. We need to make it work. Three ideas discussed with Walter:
>>>
>>> If that's such an issue, why don't you just change it and use a struct defined by the user? structs are natural name-value pairs that work at compile time, and they can always be returned from functions.
> 
> No reply to this one? I would have been curious.

You mean use a struct for the string-value pair? A struct cannot have a ref member, so it would still need to store a pointer. (But maybe I misunderstood the point.)

Andrei
November 19, 2009
Andrei Alexandrescu wrote:
> opBinary does solve the opIndex* morass because it only adds one function per category, not one function per operator. For example:
> 
> struct T {
>     // op can be "=", "+=", "-=" etc.
>     E opAssign(string op)(E rhs) { ... }
>     // op can be "=", "+=", "-=" etc.
>     E opIndexAssign(string op)(size_t i, E rhs) { ... }
> }

I see, that makes sense now. It shouldn't be "+=", though (rather "+"), because it makes chaining harder.
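For what it's worth, a sketch of dispatching on the bare operator string, with compound assignment as a separate hook; the hook names match what D2 eventually shipped (opBinary/opOpAssign), but the example itself is only illustrative:

```d
// Illustrative sketch: the template parameter carries the bare operator
// ("+", "-", ...); "a += b" lowers to a.opOpAssign!"+"(b).
struct Accumulator
{
    int value;

    // Handles a + b, a - b.
    Accumulator opBinary(string op)(Accumulator rhs)
        if (op == "+" || op == "-")
    {
        return Accumulator(mixin("value " ~ op ~ " rhs.value"));
    }

    // Handles a += b, a -= b; one function covers all compound operators.
    ref Accumulator opOpAssign(string op)(Accumulator rhs)
        if (op == "+" || op == "-")
    {
        mixin("value " ~ op ~ "= rhs.value;");
        return this;   // returning by ref is what keeps chaining cheap
    }
}
```

Passing "+" rather than "+=" means the mixin can splice the operator directly into both the binary and the compound form.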
November 19, 2009
Andrei Alexandrescu wrote:
> You mean use a struct for the string-value pair? A struct cannot have a ref member, so it would still need to store a pointer. (But maybe I misunderstood the point.)

Like this:

void main(string[] commandline) {
	struct Args {
		string param1 = "can even have default arguments";
		int param2;
	}
	Args args = getopt!(Args)(commandline);
	writefln("param1 = %s", args.param1);
}

No pointers. Instead of returning it, struct could be passed by ref, too.
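A rough sketch of how such a getopt could fill the struct via compile-time reflection; the getopt!(Args) name and behavior are the proposal's, not the actual std.getopt API, and the helper below is hypothetical:

```d
// Hypothetical sketch: walk the struct's fields at compile time and match
// each one against a "--fieldname value" pair on the command line.
import std.conv : to;

T parseArgs(T)(string[] args)
{
    T result;   // default field initializers act as default arguments
    foreach (i, ref field; result.tupleof)
    {
        enum name = "--" ~ __traits(identifier, T.tupleof[i]);
        foreach (j, arg; args)
            if (arg == name && j + 1 < args.length)
                field = to!(typeof(field))(args[j + 1]);
    }
    return result;
}
```

Because the field names are known statically, no addresses of locals are taken anywhere, which is exactly what the SafeD discussion needs.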
November 19, 2009
dsimcha wrote:
> == Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
>> dsimcha wrote:
>>> == Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
>>>> Yes, it will be because the book has a few failing unittests. In fact, I
>>>> was hoping I could talk you or David into doing it :o).
>>>> Andrei
>>> Unfortunately, I've come to hate the MRU idea because it would fail miserably for
>>> large arrays.  I've explained this before, but not particularly thoroughly, so
>>> I'll try to explain it more thoroughly here.  Let's say you have an array that
>>> takes up more than half of the total memory you are using.  You try to append to
>>> it and:
>>>
>>> 1.  The GC runs.  The MRU cache is therefore cleared.
>>>
>>> 2.  Your append succeeds, but the array is reallocated.
>>>
>>> 3.  You try to append again.  Now, because you have a huge piece of garbage that
>>> you just created by reallocating on the last append, the GC needs to run again.
>>> The MRU cache is cleared again.
>>>
>>> 4.  Goto 2.
>> This is not a matter of principles, but one of implementation. When you
>> GC, you can adjust the cache instead of clearing it.
> 
> Technically true, but what is a matter of principles is whether the implementation
> of arrays should be very tightly coupled to the implementation of the GC.  Fixing
> this issue would have massive ripple effects throughout the already spaghetti
> code-like GC, and might affect GC performance.  For every single object the GC
> freed, it would have to look through the MRU cache and remove it from there if
> present, too.
> 
> The point is that this **can** be done, but we probably don't **want** to
> introduce this kind of coupling, especially if we want our GC model to be sane
> enough that people might actually come along and write us a better GC one day.

I agree. But there might still be a solution out there that handles this issue better. We should keep looking. Maybe something that refines the MRU, or something that obviates it altogether.

Andrei
November 19, 2009
On Thu, 19 Nov 2009 12:01:25 -0500, dsimcha <dsimcha@yahoo.com> wrote:

> == Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
>> dsimcha wrote:
>> > == Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
>> >> Yes, it will be because the book has a few failing unittests. In fact, I was hoping I could talk you or David into doing it :o).
>> >> Andrei
>> >
>> > Unfortunately, I've come to hate the MRU idea because it would fail miserably for large arrays.  I've explained this before, but not particularly thoroughly, so I'll try to explain it more thoroughly here.  Let's say you have an array that takes up more than half of the total memory you are using.  You try to append to it and:
>> >
>> > 1.  The GC runs.  The MRU cache is therefore cleared.
>> >
>> > 2.  Your append succeeds, but the array is reallocated.
>> >
>> > 3.  You try to append again.  Now, because you have a huge piece of garbage that you just created by reallocating on the last append, the GC needs to run again.  The MRU cache is cleared again.
>> >
>> > 4.  Goto 2.
>> This is not a matter of principles, but one of implementation. When you
>> GC, you can adjust the cache instead of clearing it.
>
> Technically true, but what is a matter of principles is whether the implementation
> of arrays should be very tightly coupled to the implementation of the GC.  Fixing
> this issue would have massive ripple effects throughout the already spaghetti
> code-like GC, and might affect GC performance.  For every single object the GC
> freed, it would have to look through the MRU cache and remove it from there if
> present, too.

You perform the lookup via the MRU cache (after mark, before sweep).  I see it as a single function call at the right place in the GC.

> The point is that this **can** be done, but we probably don't **want** to
> introduce this kind of coupling, especially if we want our GC model to be sane
> enough that people might actually come along and write us a better GC one day.

What about implementing it as a hook "do this between mark and sweep"?  Then it becomes decoupled from the GC.
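The hook might look something like this (names are illustrative, not actual druntime API): the collector exposes a generic callback slot invoked after mark and before sweep, and the array runtime registers the MRU fix-up there, so the collector never learns anything about arrays.

```d
// Hypothetical sketch of a "between mark and sweep" hook.
// A hook receives a liveness test and may adjust its own data structures.
alias PostMarkHook = void function(bool function(void* block) isMarked);

__gshared PostMarkHook[] postMarkHooks;

void registerPostMarkHook(PostMarkHook h)
{
    postMarkHooks ~= h;
}

// Called by the collector between mark and sweep:
void runPostMarkHooks(bool function(void* block) isMarked)
{
    foreach (h; postMarkHooks)
        h(isMarked);   // e.g. the MRU cache evicts unmarked blocks here
}
```

The same slot could serve weak pointers and similar bookkeeping, so the coupling objection largely goes away: any replacement GC only has to call one function at one well-defined point.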

-Steve
November 19, 2009
grauzone wrote:
> Andrei Alexandrescu wrote:
>> You mean use a struct for the string-value pair? A struct cannot have a ref member, so it would still need to store a pointer. (But maybe I misunderstood the point.)
> 
> Like this:
> 
> void main(string[] commandline) {
>     struct Args {
>         string param1 = "can even have default arguments";
>         int param2;
>     }
>     Args args = getopt!(Args)(commandline);
>     writefln("param1 = %s", args.param1);
> }
> 
> No pointers. Instead of returning it, struct could be passed by ref, too.

I think that's a good idea, but we still need a means to express the names of the parameters, e.g. for mapping "--multi-word-option" to multiWordOption. Then I fear things will inevitably get thicker, and simplicity was the main attraction of getopt.

(FWIW, getopt is somewhat dear to me; it was one of the first pieces of code I wrote in D. I'd dreamed for such a simple facility for years, and I was very glad that it was doable in D. If getopt starts becoming bigger and more nontrivial to use, I think some of the elegance of its initial design would be lost.)


Andrei
November 19, 2009
Steven Schveighoffer wrote:
> On Thu, 19 Nov 2009 12:01:25 -0500, dsimcha <dsimcha@yahoo.com> wrote:
> 
>> == Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
>>> dsimcha wrote:
>>> > == Quote from Andrei Alexandrescu (SeeWebsiteForEmail@erdani.org)'s article
>>> >> Yes, it will be because the book has a few failing unittests. In fact, I was hoping I could talk you or David into doing it :o).
>>> >> Andrei
>>> >
>>> > Unfortunately, I've come to hate the MRU idea because it would fail miserably for large arrays.  I've explained this before, but not particularly thoroughly, so I'll try to explain it more thoroughly here.  Let's say you have an array that takes up more than half of the total memory you are using.  You try to append to it and:
>>> >
>>> > 1.  The GC runs.  The MRU cache is therefore cleared.
>>> >
>>> > 2.  Your append succeeds, but the array is reallocated.
>>> >
>>> > 3.  You try to append again.  Now, because you have a huge piece of garbage that you just created by reallocating on the last append, the GC needs to run again.  The MRU cache is cleared again.
>>> >
>>> > 4.  Goto 2.
>>> This is not a matter of principles, but one of implementation. When you
>>> GC, you can adjust the cache instead of clearing it.
>>
>> Technically true, but what is a matter of principles is whether the implementation
>> of arrays should be very tightly coupled to the implementation of the GC.  Fixing
>> this issue would have massive ripple effects throughout the already spaghetti
>> code-like GC, and might affect GC performance.  For every single object the GC
>> freed, it would have to look through the MRU cache and remove it from there if
>> present, too.
> 
> You perform the lookup via MRU cache (after mark, before sweep).  I see it as a single function call at the right place in the GC.
> 
>> The point is that this **can** be done, but we probably don't **want** to
>> introduce this kind of coupling, especially if we want our GC model to be sane
>> enough that people might actually come along and write us a better GC one day.
> 
> What about implementing it as a hook "do this between mark and sweep"?  Then it becomes decoupled from the GC.
> 
> -Steve

I think these are great ideas, but you'd need to transport certain information to the cache so it can adjust its pointers. Anyhow, I believe this is worth exploring, because it can help with a great many other things such as weak pointers and similar checks and adjustments. (There was a paper on GC assertions that I didn't have time to dig up right now. Aw what the heck, found it:

http://www.eecs.tufts.edu/~eaftan/gcassertions-mspc-2008.pdf)


Andrei
November 19, 2009
Andrei Alexandrescu wrote:
> We're entering the finale of D2 and I want to keep a short list of things that must be done and integrated in the release. It is clearly understood by all of us that there are many things that could and probably should be done.
> 
> 1. Currently Walter and Don are diligently fixing the problems marked on the current manuscript.
> 
> 2. User-defined operators must be revamped. Fortunately Don already put in an important piece of functionality (opDollar). What we're looking at is a two-pronged attack motivated by Don's proposal:
> 
> http://prowiki.org/wiki4d/wiki.cgi?LanguageDevel/DIPs/DIP7
> 
> The two prongs are:
> 
> * Encode operators by compile-time strings. For example, instead of the plethora of opAdd, opMul, ..., we'd have this:
> 
> T opBinary(string op)(T rhs) { ... }
> 
> The string is "+", "*", etc. We need to design what happens with read-modify-write operators like "+=" (should they be dispatch to a different function? etc.) and also what happens with index-and-modify operators like "[]=", "[]+=" etc. Should we go with proxies? Absorb them in opBinary? Define another dedicated method? etc.
> 
> * Loop fusion that generalizes array-wise operations. This idea of Walter is, I think, very good because it generalizes and democratizes "magic". The idea is that, if you do
> 
> a = b + c;
> 
> and b + c does not make sense but b and c are ranges for which a.front = b.front + c.front does make sense, to automatically add the iteration paraphernalia.
> 
> 3. It was mentioned in this group that if getopt() does not work in SafeD, then SafeD may as well pack and go home. I agree. We need to make it work. Three ideas discussed with Walter:
> 
> * Allow taking addresses of locals, but in that case switch allocation from stack to heap, just like with delegates. If we only do that in SafeD, behavior will be different than with regular D. In any case, it's an inefficient proposition, particularly for getopt() which actually does not need to escape the addresses - just fills them up.
> 
> * Allow @trusted (and maybe even @safe) functions to receive addresses of locals. Statically check that they never escape an address of a parameter. I think this is very interesting because it enlarges the common ground of D and SafeD.
> 
> * Figure out a way to reconcile "ref" with variadics. This is the actual reason why getopt chose to traffic in addresses, and fixing it is the logical choice and my personal favorite.
> 
> 4. Allow private members inside a template using the eponymous trick:
> 
> template wyda(int x) {
>    private enum geeba = x / 2;
>    alias geeba wyda;
> }
> 
> The names inside an eponymous template are only accessible to the current instantiation. For example, wyda!5 cannot access wyda!(4).geeba, only its own geeba. That way we elegantly avoid the issue "where is this symbol looked up?"
> 
> 5. Chain exceptions instead of having a recurrent exception terminate the program. I'll dedicate a separate post to this.
> 
> 6. There must be many things I forgot to mention, or that cause grief to many of us. Please add to/comment on this list.
> 
> 
> 
> Andrei

I kinda like this proposal, but I would rather name the templates like below:

T opInfix(string op)(T rhs) { ... }
T opPrefix(string op)() { ... }
T opPostfix(string op)() { ... }

and allow users to define their own operators (though it doesn't have to be done now).
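A sketch of how a unary hook along these lines could look; note that prefix and postfix operators take no right-hand operand, and D2 later settled on a single opUnary for them (the struct here is purely illustrative):

```d
// Illustrative sketch: one template handles all prefix operators by
// splicing the operator string into a mixin.
struct Counter
{
    int n;

    ref Counter opUnary(string op)()
        if (op == "++" || op == "--")
    {
        mixin(op ~ "n;");   // expands to "++n;" or "--n;"
        return this;
    }
}
```

The same string-dispatch idea would extend naturally to user-defined operator tokens, if the language ever chose to allow them.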

I know that quite a few people here don't like letting users define their own operators, because it might obfuscate code. But it doesn't have to be like that. Someone here already mentioned that it is not a real problem in C++ programs. Good libraries don't abuse this functionality.

User-defined operators would allow easy definition of Domain Specific Languages in D. I already wrote about it some time ago:

http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=81026
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=81352

BR
Marcin Kuszczak
(aarti_pl)