July 13, 2005
Brad Beveridge wrote:

> The key thing being "all this happens transparently to callers", in D's take on COW the callers must explicitly manage their own COW.

Right, it's not that transparent for the routine that modifies a string.
(i.e. if you write to it, you need to .dup it - or similar - yourself)
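
To make the convention concrete, here is a minimal sketch in present-day D. The helper name capitalizeFirst is hypothetical, chosen only for illustration; the point is that the routine doing the write is the one responsible for the .dup, and it skips the copy when nothing changes.

```d
import std.stdio;

// Hypothetical helper: upper-cases the first character, copying only
// when a write is actually needed (copy-on-write by convention).
char[] capitalizeFirst(char[] s)
{
    if (s.length == 0 || s[0] < 'a' || s[0] > 'z')
        return s;                 // nothing to change: return the caller's array, no copy
    char[] copy = s.dup;          // about to write, so take a private copy first
    copy[0] = cast(char)(copy[0] - 'a' + 'A');
    return copy;
}

void main()
{
    char[] original = "hello".dup;
    char[] changed = capitalizeFirst(original);
    writeln(original);            // the caller's data is untouched
    writeln(changed);
    assert(changed.ptr !is original.ptr);

    // No write needed this time, so the same array comes back.
    assert(capitalizeFirst(changed).ptr is changed.ptr);
}
```

Nothing in the language enforces this: if the helper forgot the .dup, the caller's array would be changed silently.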

> Now, as a thought experiment imagine that COW in D was transparent. Imagine that when you returned an array in D it acted just as it does now, except that when you wrote to it, the D compiler/runtime automatically duped it and you wrote to your own array.  Would this make a difference to how we programmed?  Certainly it would make the idea that we needed immutable strings go away (all arrays are effectively immutable).  Now of course we would need some mechanism for actually returning a modifiable array.  Thoughts?

It would be nice, but is hard to implement outside the MMU sphere...
See this old post http://www.digitalmars.com/drn-bin/wwwnews?D/27605

--anders
July 13, 2005
On Wed, 13 Jul 2005 15:43:31 -0700, Brad Beveridge <brad@somewhere.net> wrote:
> Anders F Björklund wrote:
>
>> Seriously, D does have two types of strings too - it is "just"
>> that the compiler doesn't enforce any difference between them,
>> so one just has the Gentlemen's Agreement to honor the C-o-w*.
>>  (*See http://en.wikipedia.org/wiki/Copy-on-write, or Phobos doc)
>>
>
> Actually, the Wikipedia entry on COW disagrees with D's take on COW.  From Wikipedia "This fiction can be maintained until a caller tries to modify its "copy" of the resource, at which point a true private copy is created to prevent the changes becoming visible to everyone else. All of this happens transparently to the callers."
>
> The key thing being "all this happens transparently to callers", in D's take on COW the callers must explicitly manage their own COW.
> Now, as a thought experiment imagine that COW in D was transparent. Imagine that when you returned an array in D it acted just as it does now, except that when you wrote to it, the D compiler/runtime automatically duped it and you wrote to your own array.  Would this make a difference to how we programmed?  Certainly it would make the idea that we needed immutable strings go away (all arrays are effectively immutable).  Now of course we would need some mechanism for actually returning a modifiable array.  Thoughts?

I think the default case would be mutable, eg
  char[] foo() {}

returns a mutable array, however:
  readonly char[] foo() {}

returns an immutable one.

An array declared in the current scope would default to being mutable, eg.
  char[] bar;

it would be mutable and would not create a duplicate when you wrote to it (in this scope). Passing it to an 'in' parameter would treat it as immutable, causing a dup on write, eg.
  void foobar(char[] p) { p[0] = 'a'; }

Assigning an immutable reference to a mutable one would make the mutable one immutable, eg.
immutable char[] p;
char[] s;
s = p; //s is now immutable.

The question is how do you implement this sort of thing. It's essentially an Auto Pointer or Reference Counter, there seem to be 2 choices for implementation:

1. runtime mutable bit flag. (if false dup, set to true on copy)
2. 2 distinct types. (writing to immutable type, creates mutable copy)

Though it might be possible for the compiler to determine at compile time the cases where a dup is required and for it to silently insert them into the code during compile. Not sure if that makes sense, just floating the idea.
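
A rough sketch of option 1 (the runtime mutable bit) in present-day D, as a library type. CowString and its members are hypothetical names, not anything in the language or Phobos. Copying the wrapper is disabled because a correct copy would have to mark both sides as shared, and this sketch sidesteps that part.

```d
import std.stdio;

// Hypothetical wrapper: an array reference plus a runtime "mutable" bit.
struct CowString
{
    private char[] data;
    private bool mutable;     // false: data may be shared, so dup before writing

    @disable this(this);      // no copies of the wrapper in this sketch

    // Wrap existing data without copying it; it starts out as shared.
    static CowString share(char[] s)
    {
        return CowString(s, false);
    }

    // Reads never copy.
    char opIndex(size_t i) const
    {
        return data[i];
    }

    // The first write dups and sets the bit; later writes go straight through.
    void opIndexAssign(char c, size_t i)
    {
        if (!mutable)
        {
            data = data.dup;
            mutable = true;
        }
        data[i] = c;
    }

    const(char)[] text() const
    {
        return data;
    }
}

void main()
{
    char[] backing = "hello".dup;
    auto s = CowString.share(backing);
    s[0] = 'j';               // triggers the one and only dup
    s[4] = 'y';               // bit already set, written in place
    writeln(backing);
    writeln(s.text);
    assert(backing == "hello");
    assert(s.text == "jelly");
}
```

The cost is a flag test on every write, which is the runtime overhead that the two-distinct-types option would avoid.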

Regan

p.s. This is essentially the readonly idea I have posted before. What can I say, I think it works as an idea. The implementation is the difficult part. (as Anders has just posted)
July 13, 2005
Regan Heath wrote:

> I think the default case would be mutable, eg
>   char[] foo() {}
> 
> returns a mutable array, however:
>   readonly char[] foo() {}
> 
> returns an immutable one.

I think the default case should be immutable...
AFAIK, it already is - although not enforced ?

If nothing else, "const char *" is boring to type,
and I don't think "readonly char[]" would be less.

Better to have readwrite/mutable for "the other 10%",
than having a readonly/immutable applied to "most" ?

--anders
July 13, 2005
Anders F Björklund wrote:
> Brad Beveridge wrote:
> 
<snip>
> 
> It would be nice, but is hard to implement outside the MMU sphere...
> See this old post http://www.digitalmars.com/drn-bin/wwwnews?D/27605
> 
> --anders

Any particular reason that we need to move outside of the MMU sphere? About the only one that I can think of is the fact that MMUs work with fixed sized pages, so you will lose some memory due to page wastage.

I was thinking it should be possible to do something like

char[] dynamic = "Thlkjdhlaksdjfj";
char[] locked = dynamic.MMUlock;

locked.MMUunlock;

MMUlock dupes the array into a new page & uses OS specific functions to lock the MMU page down as read-only.  MMUunlock is not needed, except to release the page.  I don't know if the locked MMU page can be under GC control or not.  Also MMUlock would need to be synchronized.
In this way you will generate runtime exceptions when people accidentally write to the locked array, and you won't incur any runtime overhead.  I might try to implement something like this tonight.

Brad
July 13, 2005
Brad Beveridge wrote:

> Any particular reason that we need to move outside of the MMU sphere? About the only one that I can think of is the fact that MMUs work with fixed sized pages, so you will lose some memory due to page wastage.

I don't know, it just sounded like a portability nightmare - much like how the current handling of "null pointer dereferencing" works in D...?

> I might try to implement something like this tonight.

Let us know how it goes, it all brings back memories of doing hacks like
"video-on-segfault" to optimize emulator video performance... (Basilisk)

--anders
July 13, 2005
On Thu, 14 Jul 2005 01:30:11 +0200, Anders F Björklund <afb@algonet.se> wrote:
> Regan Heath wrote:
>
>> I think the default case would be mutable, eg
>>   char[] foo() {}
>>  returns a mutable array, however:
>>   readonly char[] foo() {}
>>  returns an immutable one.
>
> I think the default case should be immutable...
> AFAIK, it already is - although not enforced ?

Well, COW dictates you should be copying it before you write, so, yes, all arrays are immutable though not enforced.

I think it should be mutable because I think if you return something, most of the time it's the only copy, and you're not holding it anywhere i.e. char[] toString(int i);

Of course class members might return references or slices to internal data, then you'd use 'readonly'. I don't think this is as common as returning a new array, one that can safely be written to.

I might be wrong.

> If nothing else, "const char *" is boring to type,
> and I don't think "readonly char[]" would be less.

True. Which is why I chose 'mutable' as the default, I think it's more common, thus less typing. You obviously disagree. Only code analysis will tell us for sure, and then I suspect different types/areas will show a different result.

> Better to have readwrite/mutable for "the other 10%",
> than having a readonly/immutable applied to "most" ?

I disagree with 10%, and "most", obviously.

Regan
July 13, 2005
Anders F Björklund wrote:
> Brad Beveridge wrote:
> 
>> Any particular reason that we need to move outside of the MMU sphere? About the only one that I can think of is the fact that MMUs work with fixed sized pages, so you will lose some memory due to page wastage.
> 
> 
> I don't know, it just sounded like a portability nightmare - much like how the current handling of "null pointer dereferencing" works in D...?
> 
>> I might try to implement something like this tonight.
> 
> 
> Let us know how it goes, it all brings back memories of doing hacks like
> "video-on-segfault" to optimize emulator video performance... (Basilisk)
> 
> --anders
Well, I'm sure that there will be portability issues - but as far as I know, all OSes/CPUs that D will run on will support some sort of MMU protection.  The actual handling of the exception trap might be a little hairy (as you say, like the null pointer now) - but I think all debuggers will trap the exception if nothing else.

Brad
July 14, 2005
This is cool, except it's not documented.

Can I assume that this is how native-type-methods are implemented in D?
And is that behaviour intended or is it just a bug? (I'm referring to the behaviour that Andrew explained)

Walter? Are you reading this?

Andrew Fedoniouk wrote:
>>OTOH, a char[] isn't really a full fledged string, even with slicing. It's quite close, but not quite there.  To get to String status you need to do something like importing std.string. But char[]'s really build in most features that need to be built in.  Basically what should be added is some special magic so that if you import std.string, the std.string methods are all available as methods of variables/literals of type char[], so you would be able to do things like:
>>j = "This is a test".rfind("is");
> 
> 
> Try to write it:
> "This is a test".rfind("is");
> 
> It should run flawlessly. By some accident D allows this
> trick with array parameters.
> 
> 
> 
>>Even that isn't quite perfect.  It would be nice to be able to do things like:
>>s = "This is a test";
>>s.removechars("/is/");  //   N.B.:  Note the /'s this should mean
>>                        //       "interpret this as a pattern"
>>Notice that the syntax has changed a bit from the current usage as this is now a method invocation rather than a function call.
> 
> 
> As I mentioned, if removechars is declared as
> void removechars(char[],char[]) then you can call it as
> 
> s.removechars("/is/");
> 
> I wouldn't rely on this behavior though as it is not specified - "D's hidden treasure"
> 
> 
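
For reference, a small sketch of the call form discussed above. The function countChar is a hypothetical free function written for this example. In present-day D the method-style call is documented as Uniform Function Call Syntax (UFCS); at the time of this thread it was unspecified, as noted above.

```d
import std.stdio;

// Hypothetical free function whose first parameter is an array.
size_t countChar(const(char)[] s, char c)
{
    size_t n = 0;
    foreach (ch; s)
        if (ch == c)
            ++n;
    return n;
}

void main()
{
    // The ordinary call and the method-style call resolve to the same function.
    assert(countChar("This is a test", 's') == 3);
    assert("This is a test".countChar('s') == 3);
    writeln("This is a test".countChar('s'));
}
```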
July 14, 2005
Well, I have done a very simple proof of concept.  It only works under linux, and basically memory protecting pages in userspace isn't as nice or easy as I would like.  You basically need to create an actual file, and then mmap it, though I don't think anything actually gets written to the disk file.  To implement this properly you would need to be able to unlock memory pages, hook into the GC somehow, and reclaim empty areas of the memory mapped file.

In short, this is possible to do but pretty ugly to implement, and locking/unlocking may be expensive.  Though if the runtime implemented it then average joe programmers wouldn't see the ugliness.

Here is the code, do what you will with it :)

Brad

import std.stdio;
import std.c.linux.linuxextern;
import std.c.linux.linux;
import std.string;

int global_fd;

static this ()
{
    char[] junk = "t";
    global_fd = open ("tempfile", O_CREAT | O_RDWR, 0600);
    if (global_fd == -1)
    {
        writefln ("Error 0");
    }
    // don't know why I have to do this, but if the file is empty you
    // cannot write to it after it is mmaped
    write (global_fd,junk, 1);
}

static ~this ()
{
    close (global_fd);
    remove ("tempfile");
}

char[] MMU_lock(char[] array)
{
    void *p;
    void *r;
    char[] readonly;
    p = mmap(null, array.length, PROT_READ | PROT_WRITE, MAP_SHARED, global_fd, 0);
    if (p == MAP_FAILED)
    {
        writefln ("Error 1");
        return null;
    }
    writefln ("P is %x", p);
    memcpy (p, array, array.length);

    r = mmap(null, array.length, PROT_READ, MAP_SHARED, global_fd, 0);
    munmap(p, array.length);
    if (r == MAP_FAILED)
    {
        writefln ("Error 2");
        return null;
    }
    writefln ("R is %x", r);
    readonly = cast(char[])r[0 .. array.length];
    return readonly;
}

char[] MMU_unlock(char[] array)
{
    // Placeholder: releasing the page is not implemented in this proof of concept
    return null;
}

int main(char[][] args)
{
    char[] stat = "This is a static String";
    char[] d = stat.dup;
    d[0] = 't';
    char[] ro = MMU_lock(d);
    writefln ("ro string is :: %s %d %x", ro, ro.length, ro.ptr);
    writefln ("About to write to ro string");
    ro[0] = 'T';  // this causes a segfault
    writefln ("Done and finished");
    return 1;
}
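
A possible variant of the same idea, sketched in present-day D under the assumption of a POSIX system where anonymous mappings (MAP_ANON) and mprotect are available through core.sys.posix.sys.mman. It is not the code above: it swaps the temporary file for an anonymous mapping and tightens the protection in place. The names lockCopy and unlock are hypothetical.

```d
import core.stdc.string : memcpy;
import core.sys.posix.sys.mman;
import std.stdio;

// Hypothetical helper: copy `src` into fresh anonymous pages, then make
// those pages read-only. Returns null on failure or for empty input.
char[] lockCopy(const(char)[] src)
{
    if (src.length == 0)
        return null;
    void* p = mmap(null, src.length, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED)
        return null;
    memcpy(p, src.ptr, src.length);
    // Drop write permission on the pages we just filled.
    if (mprotect(p, src.length, PROT_READ) != 0)
    {
        munmap(p, src.length);
        return null;
    }
    return (cast(char*) p)[0 .. src.length];
}

// Hypothetical helper: release pages obtained from lockCopy.
void unlock(char[] locked)
{
    if (locked.ptr !is null)
        munmap(locked.ptr, locked.length);
}

void main()
{
    char[] ro = lockCopy("This is a static String");
    assert(ro !is null);
    writeln(ro);          // reading the locked copy is fine
    // ro[0] = 't';       // a write here would fault (SIGSEGV on Linux)
    unlock(ro);
}
```

Each locked array still occupies at least one whole page, so the page wastage and the GC integration questions raised earlier in the thread remain open.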
July 14, 2005
"brad beveridge" <brad@nowhere.com> wrote in message news:db4jab$1gnu$1@digitaldaemon.com...
> Well, I have done a very simple proof of concept.  It only works under linux, and basically memory protecting pages in userspace isn't as nice or easy as I would like.  You basically need to create an actual file, and then mmap it, though I don't think anything actually gets written to the disk file.  To implement this properly you would need to be able to unlock memory pages, hook into the GC somehow, and reclaim empty areas of the memory mapped file.
>
> In short, this is possible to do but pretty ugly to implement, and locking/unlocking may be expensive.  Though if the runtime implemented it then average joe programmers wouldn't see the ugliness.
>
> Here is the code, do what you will with it :)
>
> Brad
>
> import std.stdio;
> import std.c.linux.linuxextern;
> import std.c.linux.linux;
> import std.string;
>
> int global_fd;
>
> static this ()
> {
>     char[] junk = "t";
>     global_fd = open ("tempfile", O_CREAT | O_RDWR, 0600);
>     if (global_fd == -1)
>     {
>         writefln ("Error 0");
>     }
>     // don't know why I have to do this, but if the file is empty you
>     // cannot write to it after it is mmaped
>     write (global_fd,junk, 1);
> }
>
> static ~this ()
> {
>     close (global_fd);
>     remove ("tempfile");
> }
>
> char[] MMU_lock(char[] array)
> {
>     void *p;
>     void *r;
>     char[] readonly;
>     p = mmap(null, array.length, PROT_READ | PROT_WRITE, MAP_SHARED,
> global_fd, 0);
>     if (p == MAP_FAILED)
>     {
>         writefln ("Error 1");
>         return null;
>     }
>     writefln ("P is %x", p);
>     memcpy (p, array, array.length);
>
>     r = mmap(null, array.length, PROT_READ, MAP_SHARED, global_fd, 0);
>     munmap(p, array.length);
>     if (r == MAP_FAILED)
>     {
>         writefln ("Error 2");
>         return null;
>     }
>     writefln ("R is %x", r);
>     readonly = cast(char[])r[0 .. array.length];
>     return readonly;
> }
>
> char[] MMU_unlock(char[] array)
> {
>     // Placeholder: releasing the page is not implemented in this proof of concept
>     return null;
> }
>
> int main(char[][] args)
> {
>     char[] stat = "This is a static String";
>     char[] d = stat.dup;
>     d[0] = 't';
>     char[] ro = MMU_lock(d);
>     writefln ("ro string is :: %s %d %x", ro, ro.length, ro.ptr);
>     writefln ("About to write to ro string");
>     ro[0] = 'T';  // this causes a segfault
>     writefln ("Done and finished");
>     return 1;
> }

Thanks, Brad. Extremely cool.
Concept car looked nice but was not moving :)

BTW: on Windows you would use VirtualAlloc for that.
With the same result, though.

And try to imagine that you are passing something
like root node of some XML DOM into function
designed to return second child node....