May 12, 2006
For what it's worth, I've been following a derailed thread on comp.lang.c++.moderated where Walter has been discussing foreach (curse the long Google Groups links):

http://groups.google.com/group/comp.lang.c++.moderated/browse_frm/thread/d6695737a74e1853/33810ced7bb0924c

It makes complete sense that the data structure should be considered the loop invariant in foreach and so this may be a lost cause, but I don't want to give up on it without a bit more research, mostly because the design of foreach is so useful that I very much prefer it over the alternatives from a semantic perspective.  If there is a way that optimizations of foreach could be made more flexible without compromising its design intent then fantastic.  If not... well, who knows.


Sean
May 12, 2006
Sean Kelly wrote:
> For what it's worth, I've been following a derailed thread on comp.lang.c++.moderated where Walter has been discussing foreach (curse the long Google Groups links):
> 
> http://groups.google.com/group/comp.lang.c++.moderated/browse_frm/thread/d6695737a74e1853/33810ced7bb0924c 
> 
> 
> It makes complete sense that the data structure should be considered the loop invariant in foreach and so this may be a lost cause, but I don't want to give up on it without a bit more research, mostly because the design of foreach is so useful that I very much prefer it over the alternatives from a semantic perspective.  If there is a way that optimizations of foreach could be made more flexible without compromising its design intent then fantastic.  If not... well, who knows.
> 
> 
> Sean

Straight from the, pardon the phrase, horse's mouth:

"That said, D will probably get some notion of const. But it will be one
that works, i.e. can be relied upon by both the programmer and the
optimizer."

And there was much rejoicing.

-- 
Kyle Furlong // Physics Undergrad, UCSB

"D is going wherever the D community wants it to go." - Walter Bright
May 12, 2006
Walter Bright wrote:
> Sean Kelly wrote:
>> *sigh*  I can see the reason for this, but it dramatically reduces the utility of foreach for me, as it's a very common idiom to want to remove elements during an iteration.  In fact, I already use this technique quite a bit in Thread and ThreadGroup.  That aside, does this restriction also exist for processing AA elements?
> 
> Yes - everything used as an aggregate in a foreach.
> 
>> I ask because this is done on a dynamically allocated list of references rather than against the AA itself, though I suppose that's implementation defined.
> 
> The idea is to enable the compiler to do aggressive loop optimizations, regardless of the aggregate type used, and regardless of whether it is user-defined or built-in.

To play the Devil's advocate... doesn't this imply that the body of an opApply function is also not allowed to make any modifications to the underlying data structure?  As a contrived example, suppose I have a SqlCursor object that performs lazy reads and caches this data once read so the read must only occur once, and that this data is stored in an SList which is used for the actual iteration.  From a code optimization standpoint this seems entirely reasonable.  However, it does seem to violate the general rule you've outlined for what is legal in foreach loops.

You've said that the point of foreach is to allow the compiler to treat the aggregate as a loop invariant, but my understanding of this term implies that only optimizations on the referent of v may be optimized. For example:

    int[] v;
    ...
    foreach( int i; v )
    {
        ...
    }

may be transformed into:

    t1 = v.ptr;
    t2 = v.len;
    t3 = 0;
L1: if( t2 <= t3 ) goto L2;
    ...
    t3 += 1;
    goto L1;
L2: ...

This would explain Derek's experience where sequence modifications were not observed within the loop.  However, without a viable 'const' signifier, I don't think any optimization regarding the sequence itself may be performed:

    t1 = v.ptr;
    t2 = v.len;
    t3 = 0;
    if( t2 <= 0  ) goto L2;
    t4   = alloca( t2 )
    t4[] = t1[0 .. t2]
L1: if( t2 <= t3 ) goto L2;
    ...
    t3 += 1;
    goto L1;
L2: ...

Is this correct?  The first case would explain why rehashing an AA within a foreach loop would likely result in undefined behavior, as would an append-type operation on dynamic arrays.  But more structurally cohesive data structures (ie. those for which modifications don't result in iterator invalidation in C++: map, list, etc) should be fairly compatible with foreach even in the face of such changes regardless of whether such detailed rules would be advisable within a language spec.

Regarding the 'const' label.  The only way I can see such an idea working in a way that supports compiler optimizations is to make it a dynamic storage attribute of sorts.  That is, the const-ness must be applied to the data, not to the reference.  I simply can't think of any option where the traditional approach would work:

    char[]       ptr = "";
    const char[] ref = ptr;
    ...
    foreach( char c; ref )
    {
        ...
    }

If ptr is ever passed by reference to an opaque function then there is no way to determine whether the data referenced by ref may change within the life of the program.  In fact, the same could be said if ref were ever passed by reference to an opaque function implemented in any language other than D, as one must assume that the const attribute could be ignored.  Frankly, this has me wondering whether it's possible to enforce any sort of const behavior without data movement in the face of aliasing.  For example, in instances where const data is being passed to an opaque code segment it should be possible to instead make a copy in a write-protected memory page and pass a reference to that instead, similar to how passing structs by value to functions works now.  This would result in a run-time error if a modification occurred to such data, similar to what happens for modifications to const data in D now.  But as far as I know it isn't possible to mark arbitrary memory ranges as read-only.

In the long-term, if transactional memory ever becomes a reality then it may be possible to exploit this as a lightweight way of enforcing const-ness: cache the data somewhere and if the cache is ever invalidated then signal an error.  But this may not be an option for const data that has a long lifetime.


Sean
May 12, 2006
Kyle Furlong wrote:
> 
> Straight from the, pardon the phrase, horse's mouth:
> 
> "That said, D will probably get some notion of const. But it will be one
> that works, i.e. can be relied upon by both the programmer and the
> optimizer."
> 
> And there was much rejoicing.

Personally, I'll wait to rejoice until this actually happens, as it seems a difficult problem to solve.  See the bottom of my other post for a few comments on why.


Sean
May 13, 2006
Sean Kelly wrote:
> Walter Bright wrote:
>> Sean Kelly wrote:
>>> *sigh*  I can see the reason for this, but it dramatically reduces the utility of foreach for me, as it's a very common idiom to want to remove elements during an iteration.  In fact, I already use this technique quite a bit in Thread and ThreadGroup.  That aside, does this restriction also exist for processing AA elements?
>>
>> Yes - everything used as an aggregate in a foreach.
>>
>>> I ask because this is done on a dynamically allocated list of references rather than against the AA itself, though I suppose that's implementation defined.
>>
>> The idea is to enable the compiler to do aggressive loop optimizations, regardless of the aggregate type used, and regardless of whether it is user-defined or built-in.
> 
> To play the Devil's advocate... doesn't this imply that the body of an opApply function is also not allowed to make any modifications to the underlying data structure?

Yes, it does imply that.

> As a contrived example, suppose I have a SqlCursor object that performs lazy reads and caches this data once read so the read must only occur once, and that this data is stored in an SList which is used for the actual iteration.  From a code optimization standpoint this seems entirely reasonable.  However, it does seem to violate the general rule you've outlined for what is legal in foreach loops.
> 
> You've said that the point of foreach is to allow the compiler to treat the aggregate as a loop invariant, but my understanding of this term implies that only optimizations on the referent of v may be optimized. For example:
> 
>     int[] v;
>     ...
>     foreach( int i; v )
>     {
>         ...
>     }
> 
> may be transformed into:
> 
>     t1 = v.ptr;
>     t2 = v.len;
>     t3 = 0;
> L1: if( t2 <= t3 ) goto L2;
>     ...
>     t3 += 1;
>     goto L1;
> L2: ...
> 
> This would explain Derek's experience where sequence modifications were not observed within the loop.  However, without a viable 'const' signifier, I don't think any optimization regarding the sequence itself may be performed:
> 
>     t1 = v.ptr;
>     t2 = v.len;
>     t3 = 0;
>     if( t2 <= 0  ) goto L2;
>     t4   = alloca( t2 )
>     t4[] = t1[0 .. t2]
> L1: if( t2 <= t3 ) goto L2;
>     ...
>     t3 += 1;
>     goto L1;
> L2: ...
> 
> Is this correct?  The first case would explain why rehashing an AA within a foreach loop would likely result in undefined behavior, as would an append-type operation on dynamic arrays.  But more structurally cohesive data structures (ie. those for which modifications don't result in iterator invalidation in C++: map, list, etc) should be fairly compatible with foreach even in the face of such changes regardless of whether such detailed rules would be advisable within a language spec.

If you don't mess with the iteration state, it will probably work. But the spec won't guarantee it will work, as the intention of foreach is to enable aggressive compiler optimizations. I want to leave the door as wide open to that as possible.

> If ptr is ever passed by reference to an opaque function then there is no way to determine whether the data referenced by ref may change within the life of the program.  In fact, the same could be said if ref were ever passed by reference to an opaque function implemented in any language other than D, as one must assume that the const attribute could be ignored.  Frankly, this has me wondering whether it's possible to enforce any sort of const behavior without data movement in the face of aliasing.  For example, in instances where const data is being passed to an opaque code segment it should be possible to instead make a copy in a write-protected memory page and pass a reference to that instead, similar to how passing structs by value to functions works now.  This would result in a run-time error if a modification occurred to such data, similar to what happens for modifications to const data in D now.  But as far as I know it isn't possible to mark arbitrary memory ranges as read-only.

It isn't possible to prevent the programmer from subverting the rules and modifying memory. The only thing that can be done is to label such as "undefined behavior".
May 13, 2006
Walter Bright wrote:
> 
> If you don't mess with the iteration state, it will probably work. But the spec won't guarantee it will work, as the intention of foreach is to enable aggressive compiler optimizations. I want to leave the door as wide open to that as possible.

Understood.

>> If ptr is ever passed by reference to an opaque function then there is no way to determine whether the data referenced by ref may change within the life of the program.  In fact, the same could be said if ref were ever passed by reference to an opaque function implemented in any language other than D, as one must assume that the const attribute could be ignored.  Frankly, this has me wondering whether it's possible to enforce any sort of const behavior without data movement in the face of aliasing.  For example, in instances where const data is being passed to an opaque code segment it should be possible to instead make a copy in a write-protected memory page and pass a reference to that instead, similar to how passing structs by value to functions works now.  This would result in a run-time error if a modification occurred to such data, similar to what happens for modifications to const data in D now.  But as far as I know it isn't possible to mark arbitrary memory ranges as read-only.
> 
> It isn't possible to prevent the programmer from subverting the rules and modifying memory. The only thing that can be done is to label such as "undefined behavior".

Exactly.  And this was the motivation for my suggestion above.  If there is no way to prevent the user from subverting a system built upon reference attributes, then the only alternative I can think of would be to build it upon the data itself.  D already has the const storage type which does much the same thing: if the user manages to outfox the compiler and modify data in ROM an error will occur.  A similar measure of protection could be extended to dynamic data using the page protection features supplied by the memory manager, but there are some obvious problems:

- To add data to a protected page the page must either be temporarily unprotected, making it vulnerable to modification by another thread, or it must be modified and the resulting error ignored (which is quite slow).

- Such protection is far from granular, as it requires copying data the compiler cannot guarantee will not be modified by user code into a protected memory page.  But as far as I'm aware, there is no other way to obtain low-level support for read-only data.

I can think of a few possible workarounds to the above problems, but all of them would require special handing of references to const data, and this seems unacceptable for a language like D.  But a data-oriented const system seems the only option for something that could actually be exploited by an optimizer.  It's just too easy to work around any such attempt built into the language syntax itself.  Heck, inline assembler is an obvious and huge back-door to any syntax-oriented protection mechanism.  There's simply no point in even considering that route.


Sean
May 14, 2006
"Chris Miller" <chris@dprogramming.com> wrote in message news:op.s9dvy3lhpo9bzi@moe...
>A new linked list module has been released at http://www.dprogramming.com/list.php
> You may be wondering why another linked list; check out the FAQ to see!

Yep, intrusive lists are way to go.

BTW: There was another one http://www.digitalmars.com/d/archives/digitalmars/D/dtl/378.html , also intrusive list with support of hierarchical collections - trees.

Andrew Fedoniouk.
http://terrainformatica.com


May 16, 2006
"Andrew Fedoniouk" <news@terrainformatica.com> wrote in news:e4689r$cmm$1@digitaldaemon.com:

> "Chris Miller" <chris@dprogramming.com> wrote in message news:op.s9dvy3lhpo9bzi@moe...
>>A new linked list module has been released at http://www.dprogramming.com/list.php
>> You may be wondering why another linked list; check out the FAQ to see!
> 
> Yep, intrusive lists are way to go.
> 
> BTW: There was another one http://www.digitalmars.com/d/archives/digitalmars/D/dtl/378.html , also intrusive list with support of hierarchical collections - trees.
> 
> Andrew Fedoniouk.
> http://terrainformatica.com
> 
> 
> 

Since I read an article from Jiri Soukup about intrusive data structures back in 1998, I'm a lttle bit biased towards them. My Nova project on dsource hosts currently implementations for

- intrusive single linked list
- intrusive double linked list
- intrusive doublelink
- intrusive red/black tree

See http://www.dsource.org/projects/nova/wiki/WikiStart

Previously I had implemented some of them in C++, but the template system of D and the support for delegates improves the usability a lot.

Further references for intrusive data structures are:

boost::intrusive - the idea to use accessor classes for Nova comes from this project.

In the meantime Jiri Soukup drives his own open source project dedicated exclusively to intrusive data structures. It is called incode and hosted on sourceforge (http://incode.sourceforge.net).


Klaus Oberhofer


May 16, 2006
Some updates to the linked list module:  http://www.dprogramming.com/list.php

filter() added to allow removing list items, since foreach does not allow it; ListHead added; and more.
May 16, 2006
Sean Kelly wrote:
> Exactly.  And this was the motivation for my suggestion above.  If there is no way to prevent the user from subverting a system built upon reference attributes, then the only alternative I can think of would be to build it upon the data itself.  D already has the const storage type which does much the same thing: if the user manages to outfox the compiler and modify data in ROM an error will occur.  A similar measure of protection could be extended to dynamic data using the page protection features supplied by the memory manager, but there are some obvious problems:
> 
> - To add data to a protected page the page must either be temporarily unprotected, making it vulnerable to modification by another thread, or it must be modified and the resulting error ignored (which is quite slow).
> 
> - Such protection is far from granular, as it requires copying data the compiler cannot guarantee will not be modified by user code into a protected memory page.  But as far as I'm aware, there is no other way to obtain low-level support for read-only data.
> 
> I can think of a few possible workarounds to the above problems, but all of them would require special handing of references to const data, and this seems unacceptable for a language like D.  But a data-oriented const system seems the only option for something that could actually be exploited by an optimizer.  It's just too easy to work around any such attempt built into the language syntax itself.  Heck, inline assembler is an obvious and huge back-door to any syntax-oriented protection mechanism.  There's simply no point in even considering that route.

I have been thinking along the lines of making 'const' a promise *by the programmer* not to modify the data, and then the compiler will *assume* the promise is kept. Your idea is a way to enforce the issue. The problem is that changing page protection attributes is very slow - but one could afford the cost in "debug compiles."

This would be a cool way to implement "const-correctness" without the hassle and cruft. There are still problems, though. You can't make part of an object readonly with hardware.