Round III, Brainstorm, Re: Immutable arrays for Walter and rest of us.

Let's say we decided that D needs
read-only references and read-only slices
(aka const references and const slices).

Let's say that the compiler can verify
constness with the same quality
as detecting that e.g. char[] type has
no property 'foobar'.

Then I would like to brainstorm about
the following subjects:

1) Reasonable model of "constness"

    E.g. implicit constness (C++ style)
    const typename& - can be applied
    to any type - a new const type is created behind
    the scenes.

    Or explicit constness. E.g. the author
    can define a new array type having
    no modifying members, i.e. define a new
    const type explicitly.

    Or Delphi style - a const parameter is
    only for the string (built-in) type.

    Or [ your choice is here ]

2) Notation. E.g. what a readonly slice will look like
     in code. Function parameters, etc.

3) How to avoid "clutter". This term has come up here
     more than once, which means the concern is
     reasonable, so we need to deal with it.

-----------------------------------
Please follow the rules of brainstorming in this topic:
1) No critique. Like "this is bad".
2) No applause. Like "this is cool".
3) Answers on answers shall contain
    only statements evolving original idea.
4) Please be as brief as possible.

This is an experiment. I have run brainstorms many times
in a team but never in a newsgroup.
In real life a BS session is limited to 16-20 minutes.
Here let's say it will be one day.

Who'll be the first?

July 04, 2005

Re: Round III, Brainstorm, Re: Immutable arrays for Walter and rest of us.

Posted by Uwe Salomon
in reply to Andrew Fedoniouk


This is too long, I know. Sorry. To sum up the contents:

* Model: Needs const variables and return types, not only const parameters.
* Notation: BasicType ConstModifier RefModifier, for example "char fixed[]".
* Clutter: You cannot avoid it with these requirements.




1) Reasonable model of "constness"

Most of you seem to want a mechanism that makes it possible for an object (read as: "some logical part of the program") to return some piece of information as a read-only reference. Examples:

* Slice of an array that must not be modified.
* Reference to a container of objects that only allows traversals and reading.


As the receiver of this information has to "prove" to the compiler that it will not modify the read-only reference, there must be a mechanism to declare and check that, too. In more complex programs the receiver often does not work on the information itself, but passes it to some other part of the program, which must in turn ensure that it will not modify it.


Any implementation of constness that does not provide the above points is of little help in designing and debugging a program. Thus it will not satisfy those of you who want exactly that, i.e. something like C++ const. It may be useful for program optimization (speed), but that is apparently rarely requested.



2) Notation.

As we need not only const function parameters, but also const return types and const variables (see above), there should be a syntax like this:

BasicType ConstModifier RefModifier

BasicType is int, char, ...
ConstModifier could be "immutable", "#", "fixed", ...
RefModifier is *, []


Examples:

char#[] is an array whose contents must not be changed. BUT the array length etc. may be changed.
int fixed* is a pointer to an integer that must not be changed. BUT the pointer address may be changed.
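
For comparison, here is a hedged sketch of how these two examples could be spelled in present-day D, whose const(T)[] and const(T)* forms postdate this thread. The function names tail and sumViaPointer are made up for illustration and are not part of the proposal.

```d
// Sketch only: present-day D spelling of "char#[]" and "int fixed*".
// All names here are illustrative assumptions.

// Takes a view whose contents are read-only, but whose bounds are not.
const(char)[] tail(const(char)[] view)
{
    // Writing through the view is rejected at compile time.
    static assert(!__traits(compiles, view[0] = 'x'));

    // The slice itself (start and length) may still be changed.
    view = view[1 .. $];
    return view;
}

// Reads two ints through one pointer that can be re-aimed but not written through.
int sumViaPointer(ref int a, ref int b)
{
    const(int)* p = &a;
    static assert(!__traits(compiles, *p = 0));

    int total = *p;
    p = &b;          // changing the pointer address is allowed
    total += *p;
    return total;
}

unittest
{
    char[] buf = "abc".dup;
    assert(tail(buf) == "bc");
    assert(buf == "abc");    // the owner's array is untouched

    int x = 2, y = 3;
    assert(sumViaPointer(x, y) == 5);
}
```

The checks can be run with something like "dmd -unittest -main -run file.d".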



I do not know how to make this syntax work with object references (as there is no RefModifier there, they are implicitly references).



3) Clutter.

In a well-designed program each part should be as independent from the others as possible. This means that each part implements not only the operations that are needed by the rest of the program, but the "full range of reasonable operations".


This makes it necessary for every container to provide a non-const API (containers whose contents cannot be changed are useless) and a const API (otherwise it cannot be passed to program parts that are not allowed to modify the contents).


This makes it necessary for every part of the program that does not modify some parameter to declare this parameter const.


This makes it necessary for every user-defined type to declare const all of its methods that do not change its contents and do not even make it possible to change them.


You cannot avoid clutter with these requirements.


Ciao
uwe

July 04, 2005

Explicit constness, variant I

Posted by Andrew Fedoniouk
in reply to Andrew Fedoniouk


(this is a formalization of the proposal I made before)

1) Reasonable model of "constness"

Preconditions:
a) It is already possible to implement "readonlyness"
for classes and structures "by hand" (see [1] and [2]), but
b) it is completely impossible to implement "readonlyness" for arrays (slices)
and pointers.

Proposal:

to introduce two new types: readonly array/slice
and readonly pointer.

The readonly array/slice type has the same set of attributes
as array/slice except for the 'mutating' methods: opIndexAssign
and length(int newLength).

A readonly pointer cannot be used for 'dereferencing
l-values':
*readonly_pointer = something  // invalid operation.

2) Notation.

As the current proposal is limited to
arrays and pointers, it is possible to
avoid the use of any keyword like 'const',
'readonly' or 'fixed'.

Proposal: to use "readonly brackets" and "readonly star" - special
tokens like:
typename #[]
typename #*
or
typename []#
typename *#

3) How to avoid "clutter".

    Usually "const clutter" means [3]:
    a) "const is a pain to write everywhere,"
    b) "If I use it in one place, I have to use it all the time."

As the given proposal does not force constness propagation,
problem b) is significantly smaller than
with C++ const.

The short notation (only one #) reduces problem a).
In any case one of the purposes of aliasing is to reduce
all sorts of clutter, e.g.:
     alias wchar#[] string;

---------------------------------------------------------------

[1] Ben Hinkle, MinTL:
struct Deque(..., bit ReadOnly = false, ...)
http://home.comcast.net/~benhinkle/mintl/

[2] Kris Bell, Mango Tree library
Dictionary, MutableDictionary
http://mango.dsource.org/Dictionary_8d-source.html

[3] Herb Sutter:
http://www.gotw.ca/publications/advice98.htm
Scott Meyers:
http://artima.com/intv/const.html

July 05, 2005

Explicit constness, variant II

Posted by Andrew Fedoniouk
in reply to Andrew Fedoniouk


This proposal is also based on the assumption that
a readonly type is a base type with its mutating methods
removed/disabled.

The idea is simple: allow a struct to derive from primitive types (including struct itself). With this it will be possible to define a readonly array as:

struct readonly(T:T[]): T[]
{
   private T opIndexAssign(T v, uint i) { return v; } // private -
                  // disabling it for the outer world.
   private uint length( uint nl ) { return length(); }  // private ...

   readonly opSlice() ....
   readonly opSlice(uint a, uint b) ....

}

so string (a.k.a. string-value) will look like

alias readonly!(char[]) string;

----------------------------------------
openrj.d collections might look like

class Recordest
{
    private Field[] _fields;

    alias readonly!(Field[]) Fields;

    Fields fields() {  return cast(Fields) _fields;  }
}
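
A rough approximation of this idea in present-day D, offered only as a sketch. Since a struct still cannot derive from T[], it swaps derivation for wrapping: the struct holds the array privately and simply leaves out the mutating members. The name ReadOnly and its members are assumptions for illustration, not the proposed syntax.

```d
// Sketch only: a read-only view built by wrapping, not by deriving from T[].
struct ReadOnly(T)
{
    private T[] data;   // the wrapped storage, still mutable for its owner

    this(T[] source) { data = source; }

    // Read access only: no opIndexAssign and no length setter are defined.
    T opIndex(size_t i) { return data[i]; }
    size_t length() { return data.length; }

    // Slicing a read-only view yields another read-only view.
    ReadOnly opSlice(size_t a, size_t b) { return ReadOnly(data[a .. b]); }
}

unittest
{
    int[] owned = [10, 20, 30];
    auto view = ReadOnly!int(owned);

    assert(view.length == 3);
    assert(view[1] == 20);
    assert(view[1 .. 3][0] == 20);

    // The mutating operations are simply absent.
    static assert(!__traits(compiles, view[0] = 1));
    static assert(!__traits(compiles, view.length = 1));

    // The owner can still write, and the view observes the change.
    owned[1] = 99;
    assert(view[1] == 99);
}
```

One caveat: private in D is module-wide, so code in the same module as ReadOnly can still reach data directly.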

-------------------------------------------
Question remains: what to do with pointers in this case?
--------------------------------------------

Andrew.

July 05, 2005

Re: Round III, Brainstorm, Re: Immutable arrays for Walter and rest of us.

Posted by Nod
in reply to Andrew Fedoniouk


Heh, I can never resist a brainstorm; my mind thrives in such weather :)

NB: I have not followed this thread from the beginning, so there may be rehashing, misunderstandings, and general unimplementability. Bash away.

>1) Reasonable model of "constness"

The model I suggest is like the Delphi model, only slightly different. This model does not give us general constness, only constness for array contents, and only when passing function boundaries.

I suggest a model where "constness" is only a contract like any other. The compiler makes a best effort to detect obvious violations at compile-time, and inserts thorough run-time checks for debug code. In this model, const violations can occur, but should be detected in the debugging phase.

This model may seem limited, but I believe it to be sufficient. Remember, we are not trying to emulate strict hardware read-only-ness, we are only trying to avoid accidental alteration by code which does not "own" the data.

>2) Notation. E.g. how readonly slice will look like
>     in code. Function parameters, etc.

When defining a function, we must be able to set constness both on entry and return of the function. One could use the in/inout keywords to mean readonly/readwrite respectively, though this would be quite unintuitive on the return value. Defining the keywords ro/rw to use for this may be better.

Example:
char[rw] toUpper(char[rw] s) { ... }
char[rw] toUpper(char[ro] s) { ... }

>3) How to avoid "clutter". This term appears here
>     not once - means that it is reasonable - so
>     we need to deal with it.

Default to the common case. The ro/rw keywords are not required. If they are left out, we default to whichever is most common.

char[] toUpper(char[] s) { ... } // uses defaults

The common case can shift depending on context, and on whether we are receiving or returning the array.

* When passing an array to a function, read-only is the common case. Thus, one has to explicitly use the "rw" keyword in the prototype if the function alters the array contents.

* When returning arrays from functions, and the array was passed to us, the common case is that we can return it as read-write. As such, no "rw" keyword is necessary, in the general case.

* When passing an array which is a member variable of the current class, it should be passed read-write between member functions.

This list is not exhaustive, of course, but the rest of the defaults are only logical, e.g. always return a private member as read-only.

>4) Please be as much brief as possible.

Well, at least I tried :)

-Nod-

July 05, 2005

Re: Round III, Brainstorm, Re: Immutable arrays for Walter and rest of us.

Posted by Eugene Pelekhay
in reply to Andrew Fedoniouk


Andrew Fedoniouk wrote:
> Let's say we decided that D needs
> read-only references and read-only slices
> (aka const references and const slices).
> 
Here is my proposal

Assume that by default reference/slice/pointer is immutable
: // mutable string
: var char[] s1;
: // immutable string
: char[] s2; 	
: // slices
: s2 = s1; // legal
: s1 = s2; // illegal
: s2 = s1[1..2]; // legal
: s1 = s2[1..2]; // illegal
: s1 ~= s2; // legal
: s2 ~= s1; // illegal

: // function returns mutable string
: var char[] func1(){}
: // function returns immutable string
: char[] func2(){}

: // function with argument of immutable string
: void func3(char[] s) {}
: // function with argument of reference to immutable string
: // reference assigned during function call and content can't be
: // changed in caller through returned reference
: void func4(out char[] s) {}
: // function with argument of initialized immutable string
: // that reference can be changed during function call
: void func5(inout char[] s) {}
: // function with argument of mutable string
: // content of string can be changed during function call
: void func6(var char[] s) {}
: // function with argument of reference to mutable string
: // content of string can be changed outside of function
: // through returned reference
: void func7(out var char[] s) {}
: // function with argument of reference to initialized mutable string
: // content of string can be changed outside of function or during call
: // through returned/passed reference
: void func8(inout var char[] s) {}

:
: // slightly modified code from Regan
: // news://news.digitalmars.com:119/opstczhfvz23k2f5@nrage.netwin.co.nz
: void foo(char[] string)
: var {
:    void* cmp = null; // defines variable for in/out contracts scope
: }
: in {
:     cmp = realloc(cmp,string.sizeof);
:     memcpy(cmp,&string,string.sizeof);
: }
: out {
:     assert(memcmp(cmp,&string,string.sizeof) == 0);
: }
: body {
:       //this causes the assert, remove it, no assert.
:     string.length = 20;
: }

: // for expression can be extended to support more than one type
: // of variable
: for(var int i=0; var float f=3.14; i<len; ++i) {}

July 05, 2005

Re: Round III, Brainstorm, Re: Immutable arrays for Walter and rest of us.

Posted by Ben Hinkle
in reply to Andrew Fedoniouk


>1) Reasonable model of "constness"

Do what Walter suggested for 'in' plus 3 things: 1) 'out' return values 2) 'final' local variables and 3) make violations warn and not error. I think it should be a warning because mixing final/non-final is roughly like passing a signed int to an unsigned int - something fishy is going on but we'll assume the user knows what they are doing until they ask for our advice. In case it isn't obvious an explicit out return value means the output shouldn't be modified by the caller (ie assign it to a final variable or an in parameter). A final local variable is "deep immutable".

>2) Notation. E.g. how readonly slice will look like
>     in code. Function parameters, etc.

void foo(in char[] str); // Walter's idea
out char[] foo(); // means output is read-only
final char[] str = "blah"; // str is read-only

>3) How to avoid "clutter". This term appears here
>     not once - means that it is reasonable - so
>     we need to deal with it.

Avoids clutter by making it a warning so you don't have to go nuts with final if you don't want to.

July 05, 2005

Re: Round III, Brainstorm, Re: Immutable arrays for Walter and rest of us.

Posted by xs0
in reply to Nod


> When defining a function, we must be able to set constness both on entry and
> return of the function. One could use the in/inout keywords to mean
> readonly/readwrite respectively, though this would be quite unintuitive on the
> return value. Defining the keywords ro/rw to use for this may be better.
> 
> Example:
> char[rw] toUpper(char[rw] s) { ... }
> char[rw] toUpper(char[ro] s) { ... }

I proposed this (with in/out/inout) some time ago, with not much response; anyhow, I still think it's a good idea, so +1 from me :) It may make sense to have write-only parameters, too..


>>3) How to avoid "clutter". This term appears here
>>    not once - means that it is reasonable - so
>>    we need to deal with it.
> 
> 
> Default to the common case. The ro/rw keywords are not required. If they are
> left out, we default to whichever is most common.
> 
> char[] toUpper(char[] s) { ... } // uses defaults
> 
> The common case can shift depending on context, and on whether we are receiving
> or returning the array.
> 
> * When passing an array to a function, read-only is the common case. Thus, one
> has to explicitly use the "rw" keyword in the prototype if the function alters
> the array contents.

Agreed.

> * When returning arrays from functions, and the array was passed to us, the
> common case is that we can return it as read-write. As such, no "rw" keyword is
> necessary, in the general case.

Not agreed. I think read-only arrays would be commonly returned, if such a facility existed. Furthermore, the COW principle suggests that it should be the consumer that .dups, not the producer. Even toUpper from above should not make a copy of the array, if it determines that all characters are upper-case already..
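
A minimal sketch of the copy-on-write toUpper described above, written in present-day D and limited to ASCII letters. The name upperCopyOnWrite and the const(char)[] spelling are assumptions for illustration; this is not the Phobos function.

```d
// Sketch only: returns the input slice itself when nothing has to change,
// and allocates a copy only at the first lower-case ASCII letter.
const(char)[] upperCopyOnWrite(const(char)[] s)
{
    foreach (i, c; s)
    {
        if (c >= 'a' && c <= 'z')
        {
            char[] copy = s.dup;            // first write needed: copy now
            foreach (ref ch; copy[i .. $])
            {
                if (ch >= 'a' && ch <= 'z')
                    ch = cast(char)(ch - 'a' + 'A');
            }
            return copy;
        }
    }
    return s;                               // already upper-case: no copy made
}

unittest
{
    const(char)[] shouting = "ABC";
    assert(upperCopyOnWrite(shouting).ptr is shouting.ptr);  // same memory

    const(char)[] mixed = "aBc";
    auto result = upperCopyOnWrite(mixed);
    assert(result == "ABC");
    assert(result.ptr !is mixed.ptr);                        // fresh copy
    assert(mixed == "aBc");                                  // input untouched
}
```

Because the result may alias the input, it has to come back read-only; a consumer that wants to write must .dup it.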

> * When passing an array which is a member variable of the current class, it
> should be passed read-write between member functions.

Why would you pass a member variable between member functions?

> This list is not exhaustive, of course, but the rest of the defaults are only
> logical, e.g. always return a private member as read-only.

Well, I think that the rules should be as simple as possible, so I'd say that the default for both function parameters and return values should be read-only (simply because it is the safest thing to do); the default for variables should be rw.


xs0

July 06, 2005

Re: Round III, Brainstorm, Re: Immutable arrays for Walter and rest of us.

Posted by Regan Heath
in reply to Andrew Fedoniouk


The idea I have been posting for the last few weeks...

> 1) Reasonable model of "constness"

'readonly' type modifier (*)
'in' (implicit/explicit) parameters are 'readonly'

(*) note, this type modifier does *not* create a distinct type, instead it flags a variable as being readonly. More on the flag below. This is the essential and important difference between this and C++ const.

> 2) Notation. E.g. how readonly slice will look like
>      in code. Function parameters, etc.

readonly char[] foo() {} //returns a readonly
char[] p = foo(); //p is readonly (*)
char[] s = p; //s is readonly (*)

void bar(char[] p) {} //p is readonly
void bar(in char[] p) {} //p is readonly

(*) note, the type modifier is not required on variable declarations, instead they become readonly by assignment from a readonly RHS. "readonly char[]" is not a distinct type, it is simply "char[]" with a readonly *compile time* flag set).

It's important to note that this readonly cannot be cast away like C++ const, the only way to get a mutable version of a readonly variable is to use dup, eg.

void foo(char[] p)  //p is readonly
{
  char[] s = p.dup;
  s[0] = 'a';
  ...
}

This essentially induces correct COW behaviour.
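
As a point of comparison, here is roughly how the dup rule looks in present-day D. Note one difference from the proposal: const(char)[] there is a distinct type, not a compile-time flag on char[]. The function name withFirstChar is hypothetical.

```d
// Sketch only: the sole way to get a writable array from a read-only
// parameter is to copy it.
char[] withFirstChar(const(char)[] p, char c)
{
    // Writing through the read-only parameter does not compile.
    static assert(!__traits(compiles, p[0] = c));

    char[] s = p.dup;    // dup yields a fresh, mutable copy
    if (s.length > 0)
        s[0] = c;
    return s;
}

unittest
{
    string original = "test";
    char[] changed = withFirstChar(original, 'b');
    assert(changed == "best");
    assert(original == "test");   // the caller's data is unchanged
}
```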

> 3) How to avoid "clutter". This term appears here
>      not once - means that it is reasonable - so
>      we need to deal with it.

The common cases require no notation, eg.

parameters - read only by default.
return values - writable by default.

The remaining 'clutter' isn't clutter IMO but a required declaration of the programmer's intent. i.e.

void bar(out char[] p) {}   //I will write to 'p'
void bar(inout char[] p) {} //I will read and write to 'p'
readonly char[] p = "test"; //p is readonly
readonly char[] foo() {}    //return is readonly

4) Suggested implementation

A large percentage of readonly violations can be detected at compile time by simply flagging variables during the compile phase, passing that flag on during assignment and giving an error when violations are detected.

For certain cases eg.

char[] s = condition?foo():"";
s[0] = 'a'; //? readonly, or not

The only solution I can imagine is a runtime readonly flag. This might be too much cost for very little additional gain. It could however be enabled/disabled much like unittest or other DBC features are.

In some cases i.e. functions with 'in' parameters runtime protection/detection can be achieved with the following DBC style code which copies and then compares the readonly variables ensuring no violation has occurred. eg.

void foo(char[] p)
in {
  copy = malloc(p.sizeof);
  memcpy(copy,&p,p.sizeof);
}
out() {
  assert(memcmp(copy,&p,p.sizeof) == 0);
}
body {
  //causes violation
  p.length = 20;
}

It's possible this sort of thing could be applied to other scope entrance/exit points.
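
A hedged sketch of the copy-and-compare check in present-day D. It departs from the listing above in two ways: it snapshots the array contents with dup rather than the slice descriptor, and it wraps the call in a helper template instead of in/out contracts, since the two contracts cannot share a local variable. All names are hypothetical.

```d
// Sketch only: run fn on arg and report whether the contents survived.
bool leavesContentsIntact(alias fn)(char[] arg)
{
    const snapshot = arg.dup;   // copy of the contents before the call
    fn(arg);
    return arg == snapshot;     // element-wise comparison afterwards
}

// Two hypothetical callees used to exercise the check.
void onlyReads(char[] s)
{
    // intentionally does nothing to s
}

void overwrites(char[] s)
{
    if (s.length > 0)
        s[0] = '!';
}

unittest
{
    assert(leavesContentsIntact!onlyReads("abc".dup));
    assert(!leavesContentsIntact!overwrites("abc".dup));
}
```

A debug build could call such a helper around suspect functions and leave it out of release builds, in the spirit of the enable/disable switch mentioned above.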

Regan

July 06, 2005

Re: Round III, Brainstorm, Re: Immutable arrays for Walter and rest

Posted by Nod
in reply to xs0


In article <dae7tv$9jv$1@digitaldaemon.com>, xs0 says...
>
>> When defining a function, we must be able to set constness both on entry and return of the function. One could use the in/inout keywords to mean readonly/readwrite respectively, though this would be quite unintuitive on the return value. Defining the keywords ro/rw to use for this may be better.
>> 
>> Example:
>> char[rw] toUpper(char[rw] s) { ... }
>> char[rw] toUpper(char[ro] s) { ... }
>
>I proposed this (with in/out/inout) some time ago, with not much response; anyhow, I still think it's a good idea, so +1 from me :) It may make sense to have write-only parameters, too..
>
>

Yeah, I thought it wasn't really a new idea. +1 for you too then, mate :)
I am not sure write-only parameters are needed though... in/out/inout do this
job well enough, methinks.

>>>3) How to avoid "clutter". This term appears here
>>>    not once - means that it is reasonable - so
>>>    we need to deal with it.
>> 
>> 
>> Default to the common case. The ro/rw keywords are not required. If they are left out, we default to whichever is most common.
>> 
>> char[] toUpper(char[] s) { ... } // uses defaults
>> 
>> The common case can shift depending on context, and on whether we are receiving or returning the array.
>> 
>> * When passing an array to a function, read-only is the common case. Thus, one has to explicitly use the "rw" keyword in the prototype if the function alters the array contents.
>
>Agreed.

Woot!

>
>> * When returning arrays from functions, and the array was passed to us, the common case is that we can return it as read-write. As such, no "rw" keyword is necessary, in the general case.
>
>Not agreed. I think read-only arrays would be commonly returned, if such a facility existed. Furthermore, the COW principle suggests that it should be the consumer that .dups, not the producer. Even toUpper from above should not make a copy of the array, if it determines that all characters are upper-case already..
>

Ah yes, but you are thinking about arrays returned from class methods and such (see below). For regular functions, we can usually return it as rw since when a function exits, it immediately stops owning the array.

>> * When passing an array which is a member variable of the current class, it should be passed read-write between member functions.
>
>Why would you pass a member variable between member functions?
>

Heh, if one has a big list to choose from perhaps? Ok, that was a bad example, but the same would be true for an array passed into an inner class method, which happens a bit more often.

The theory is that when arrays which are hidden by data encapsulation, i.e. private, are passed into a scope in which they are normally not visible, they should by default be passed as ro. Conversely, while being passed within scopes which normally *do* have access, the default should be to pass them as rw.

And no, I don't know how feasible it is for the compiler to keep track of all this :)

>> This list is not exhaustive, of course, but the rest of the defaults are only logical, e.g. always return a private member as read-only.
>
>Well, I think that the rules should be as simple as possible, so I'd say that the default for both function parameters and return values should be read-only (simply because it is the safest thing to do); the default for variables should be rw.
>

I agree that the rules should be as simple as possible, but one set of defaults just won't fit all situations. It's a subjective choice, of course, but I'd rather learn two sets of defaults, one for encapsulated data, and one for without, than have to explicitly specify rw-passing within all my classes.

>
>xs0

-Nod-
