The Atom Consists of Protons, Neutrons and Electrons - D Programming Language Discussion Forum

Thread overview

The Atom Consists of Protons, Neutrons and Electrons
Feb 05, 2013 Zach the Mystic
Feb 05, 2013 Marco Leise
Feb 05, 2013 Zach the Mystic
Feb 05, 2013 John Colvin
Feb 05, 2013 Zach the Mystic
Feb 06, 2013 John Colvin
Feb 06, 2013 Zach the Mystic
Feb 06, 2013 Andrei Alexandrescu
Feb 05, 2013 Era Scarecrow

February 05, 2013

The Atom Consists of Protons, Neutrons and Electrons

Posted by Zach the Mystic

Introduction:

A somewhat heated discussion between Steven Schveighoffer and me led him to challenge me to show not only how properties could be implemented as structs, but also why that is the best way for D to implement them.

The challenge is to do better, both in terms of functionality and in terms of syntax, than his proposal:

@property foo {
   int get(); // or opGet
   void set(int val); // or opSet

   opBinary(...)  // etc.
}

The @property namespace defined above implements all the operator overloads a struct can define, but with access to its surrounding scope. For example:

struct Goo
{
  int _foo;
  property foo { int get(){ return _foo; } }
}

Good stuff, and D would not be the worse off for a property implementation such as this. Note how I removed the @ in front of property, because if we go this far, we might as well just go all the way and add it to the language as a keyword.

My job, therefore, was to imagine how simple structs could, at least in theory, do all of this, plus some, thus providing a better product and saving D a keyword.

I hope I'm not too late with this proposal. I've used the metaphor of the atom (i.e. explicit properties), adhering to the theory that there's no need to provide atoms if you can provide all of the protons, neutrons, and electrons they consist of.

Part One: Neutrons

We'll start with the heaviest particle first.

Why is it that a struct nested inside a function is allowed access to its function's data, whereas a struct nested inside a struct receives no such privilege?

void func()
{
  int n;
  struct G { int getN() { return n; } } // Hey, no problem
}

struct foo
{
  int n;
  struct G { int getN() { return n; } } // Error: G cannot reach foo's n
}

Is it too much to ask that a struct gain access to an instance of its parent's data? Well, yes. First of all, the nested instance would have to hold a hidden pointer to a parent instance, not only bloating the nested instance but also risking losing track of the desired parent instance should the parent instance get moved in memory.
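The function-nested struct G in the first example above already carries such a hidden pointer, to its function's frame. A small sketch that measures that overhead with `.sizeof`; the function names are hypothetical, and it assumes the context pointer is the nested struct's only hidden field:

```d
// G's member function reads the local n, so D gives G a hidden
// context pointer to the enclosing function's frame.
size_t nestedStructSize()
{
    int n = 42;
    struct G
    {
        int getN() { return n; } // reaches into the enclosing frame
    }
    return G.sizeof;
}

// Marked static, the same shape of struct gets no context pointer
// (and consequently no access to the function's locals).
size_t staticStructSize()
{
    static struct S
    {
        int getN() { return 0; }
    }
    return S.sizeof; // an empty struct still occupies one byte
}
```

The nested version is one pointer wide even though it declares no fields; that per-instance pointer is what a struct nested in a struct would have to carry under the naive approach.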

But wait. A struct's member functions act upon its _own_ data just fine. That's because they are designed to receive a hidden pointer to an instance of their struct. And it's not actually the nested struct's data which would want to operate upon its parent struct, anyway. After all, data doesn't act upon data. The machine code containing the instructions to operate upon data is never identical to the data it needs to operate on (footnote: with the notable exception of the video game Yars' Revenge for the Atari 2600, in which the machine code was actually used to create an ad hoc random color palette - see the book "Racing the Beam" by Nick Montfort and Ian Bogost). Would it therefore be possible to allow the nested struct's _functions_ to operate upon an instance of its parent struct?

I think so. Here's how it would work. When a function is being compiled, the compiler keeps two lists, a short list of struct types to which it must include hidden pointers, and a stack of the functions currently being analyzed. It adds pointers to the list and attaches them to the symbols according to the following algorithm. If the symbol is not found in the function definition itself:

1. Look for it at the level of the enclosing struct.
2. If it is found there and it represents a data member, check the pointer list for that struct's type and add it if it's not there already. Attach the symbol to that instance; you're done.
3. If it is found and it represents a function, and semantic has already been run on the called function, add any hidden pointers it requires to your own list and attach them to the call. You're done.
4. If it represents a function and semantic has not been done, check the stack for the function represented by the symbol. If it is found, stop. It will take a second semantic pass to attach the right pointers. Otherwise, add the current function to the stack and analyze the function represented by the symbol. Add the hidden pointers it needs to your own list, attach them to the call, and you're done.
5. If it is not found and the struct is marked static, or if the struct in question is being defined at module level, you're done here. Continue to look up the symbol at the module and import levels.
6. If it is not found and the struct definition is nested inside another struct, look for it in that struct. Goto 2.
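The data-member half of this lookup (steps 1, 2, 5 and 6) can be modelled in a few lines of D. This is a toy sketch, not how DMD is actually structured: `StructScope` and `resolveField` are hypothetical names, and function symbols (steps 3 and 4) are left out.

```d
import std.algorithm.searching : canFind;

// Toy model of a chain of enclosing struct scopes (hypothetical names).
struct StructScope
{
    string name;          // struct type name
    string[] fields;      // data members declared directly in this struct
    bool isStatic;        // `static struct`: lookup must not leave it
    StructScope* parent;  // enclosing struct, or null at module level
}

// Walks outward until `symbol` is found as a field, recording each struct
// type that needs a hidden pointer. Returns the owning struct's name, or
// null when lookup should continue at module/import level.
string resolveField(const(StructScope)* s, string symbol,
                    ref string[] hiddenPointers)
{
    for (auto cur = s; cur !is null; cur = cur.parent) // step 6: go outward
    {
        if (cur.fields.canFind(symbol))
        {
            if (!hiddenPointers.canFind(cur.name))
                hiddenPointers ~= cur.name;             // step 2
            return cur.name;
        }
        if (cur.isStatic)
            break;                                      // step 5: static struct
    }
    return null;                                        // step 5: module level
}
```

Looking up `brain` from inside Tail walks Tail, then Body, then Dog, and records that a Dog pointer is required, which matches what the wagTheDog example below needs.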

Now we have a list of hidden pointers to enclosing structs which the function must take. The function uses these pointers invisibly, giving potential access to all members of all parent types. To refer to the 'this' pointer of any one of these, 'outer' may be used, then 'outer.outer', etc.
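For comparison, D classes nested inside classes already expose `outer`. The difference is cost: each inner object stores a hidden reference to its enclosing object, which is the per-instance bloat this proposal avoids for structs by supplying the pointers at the call site. A sketch with today's nested classes (the class names merely reuse the running example):

```d
class Dog
{
    bool asleep;

    class Tail
    {
        int wagSpeed;

        void wag()
        {
            if (++wagSpeed >= 7)
                this.outer.asleep = true; // explicit hop to the enclosing Dog
        }
    }

    Tail tail;

    this()
    {
        tail = new Tail; // created inside Dog, so this Dog becomes its outer
    }
}
```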

This is a complex new feature. I have therefore written an elaborate example to help to clarify how and when it might be used.

Meet Sparky(™), the most advanced electronic security dog the world will ever see. He's got a brain to house his advanced A.I. and a body to house his physics engine, which consists of a tail and a bladder. Sparky has stopped every intruder who ever crossed his path. He has no known weaknesses. Well, except for those pesky Jolt Brand Caffeinated Dog Biscuits. Feed him a Jolt and he just can't help himself. Here is his current implementation:

Dog sparky;
struct Dog {
  Brain brain;
  struct Brain
  {
    bool asleep = false;
    void think() {
      if(!asleep) {
        // Sparky has truly advanced A.I. and will stop
        // any intruder so long as he is awake
      }
    }
  }
  Body bodi;
  struct Body
  {
    bool broken = false;
    Bladder bladder;
    struct Bladder {
      void release() {
        // An absolutely fascinating implementation
      }
    }
    Tail tail;
    struct Tail {
      int wagSpeed = 0;
      void wag() { ++wagSpeed; }
    }
  }
  void jolt() {
     bodi.tail.wag;
     if (bodi.tail.wagSpeed >= 7) malfunction;
  }
  void malfunction() {
    bodi.broken = true;
    bodi.tail.wagSpeed = 0;
    bodi.bladder.release;
    brain.asleep = true;
  }
}

Note how function malfunction() is declared at the top level of struct Dog. It has to be, because its purpose is to respond to calamity by adjusting all the parts of the Dog. It would make more sense, however, to declare the functionality closer to its prime cause. This is how it would look with the new language feature. I have here renamed function malfunction to suit its new location:

Dog sparky;
struct Dog {
  . . .
  struct Body
  {
    . . .
    struct Tail {
      . . .
      // Used to be function malfunction()
      void wagTheDog()
      {
        wagSpeed = 0;
        broken = true;
        bladder.release;
        brain.asleep = true;
      }
    }
  }
  void jolt() {
     bodi.tail.wag;
     if (bodi.tail.wagSpeed >= 7) bodi.tail.wagTheDog;
  }
}

wagTheDog does not need to use the full names of bodi and tail, because they have been passed to it by hidden pointer in the original function call 'bodi.tail.wagTheDog'. In fact, this is the only way from the outside to call a nested struct function which uses its parents' data. The struct objects have no pointers to their parents, so they must be provided by fully naming them at the call site.

To illustrate more clearly, I'll show how the compiler rewrites function wagTheDog as a standard top-level function:

void wagTheDog(ref Dog __dog, ref Dog.Body __body, ref Dog.Body.Tail __tail)
{
  __tail.wagSpeed = 0;
  __body.broken = true;
  __body.bladder.release;
  __dog.brain.asleep = true;
}

Because it causes confusion both for the programmer and the compiler, calling a parent function from a nested function using an ad hoc struct object should probably be made illegal:

struct Dog {
  Brain brain;
  struct Body {
    Tail tail;
    struct Tail {
      void wagTheDog() { brain.asleep = true; }
      void tryToWag()
      {
        wagTheDog(); // Okay, fetches implicit pointers

        Tail tail;  // Ad hoc instance of Tail
        tail.wagTheDog(); // Error: Incomplete function call

        outer.tail = tail; // Okay, we've got a new tail
        wagTheDog(); // New tail attached. Works just fine

      }
    }
  }
}

wagTheDog detects that brain is a declaration two nests above and thus requires a Dog in order to be called. I think it is too much to demand that the compiler perform some kind of mix-and-match service as in the case of tail.wagTheDog(). It must simply detect this as a partial call and give an error. The workaround shown above is just as effective and not as confusing. Note also that tryToWag has inherited the need for a full set of pointers from the outside.

Just so you know how it works underneath, the compiler rewrites:

sparky.bodi.tail.wagTheDog;

as:

wagTheDog(sparky, sparky.bodi, sparky.bodi.tail);
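Nothing in this lowering needs new language support: written by hand in today's D, it is an ordinary function taking one `ref` per enclosing struct. A sketch, assuming a trimmed-down Dog; the `timesReleased` counter is an addition so the call has a visible effect:

```d
struct Dog
{
    struct Brain { bool asleep; }
    struct Body
    {
        bool broken;
        struct Bladder
        {
            int timesReleased; // added only to make release() observable
            void release() { ++timesReleased; }
        }
        struct Tail { int wagSpeed; }
        Bladder bladder;
        Tail tail;
    }
    Brain brain;
    Body bodi;
}

// Hand-written equivalent of the proposed nested Tail.wagTheDog():
// one explicit ref parameter per enclosing struct whose data it touches.
void wagTheDog(ref Dog dog, ref Dog.Body bod, ref Dog.Body.Tail tail)
{
    tail.wagSpeed = 0;
    bod.broken = true;
    bod.bladder.release();
    dog.brain.asleep = true;
}
```

Calling it as `wagTheDog(sparky, sparky.bodi, sparky.bodi.tail)` is exactly the rewrite shown above.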

That's the feature. So what would the impact to the D language be with this new implementation of (non-static) nested structs?

First of all, would any code break? Well, if you examine how the suggested feature works, you'll see that the only source of breakage comes from duplicating a symbol both at module and at parent struct levels.

int hmmm = 3;
struct A {
  int hmmm = 2;
  B b;
  struct B {
    int f() { return hmmm; }
  }
}
A a;
assert(a.b.f == 2);

While the shadowing of variables might be the occasional source of bugs, you don't have to worry about losing access to either symbol, because you can use 'outer' to reach a parent's 'this' and a leading '.' to reach the module scope. All told, it is an extraordinarily light form of code breakage, and I would not be surprised if it broke no code at all in most existing projects, since duplicating names inside nests is a bad practice anyway. Also, I don't know whether any ABI insists on passing a 'this' pointer to member functions even when they don't use the data it refers to, but if so, there will obviously be associated performance costs.

And no, it's not an earth-shaking feature, but it does have a certain elegance to it, in my opinion, adding some flexibility and even some fun to using nested structs.

Part Two: Protons

Having examined the largest particle, let's move on to the second largest.

I'm sure everyone at one point has wanted to define a single instance of a structure without having to come up with both the name of the type and the name of the instance. Either you just want to whip up something quickly or you know for sure that you only need one instance. A syntax that facilitates this isn't going to get in your way when it's time to get "responsible" and declare a full-fledged type. The body of the declaration remains the same. It's just the declaration signature which has to change.

In terms of implementation, I could be wrong, but it seems rather trivial. Just define a new hidden type and create an instance of it using the name provided.
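That lowering (a hidden type plus a single instance) can already be approximated in today's D with a string mixin. A sketch only: `Highlander` is a hypothetical helper, not a language or Phobos feature, and the generated type name `counter_Highlander` is an arbitrary choice:

```d
// Hypothetical helper: declares a struct type from a member list and
// exactly one instance of it, named `name`.
mixin template Highlander(string name, string members)
{
    mixin("struct " ~ name ~ "_Highlander { " ~ members ~ " } "
        ~ name ~ "_Highlander " ~ name ~ ";");
}

struct Goo
{
    // Expands to: struct counter_Highlander { ... } counter_Highlander counter;
    mixin Highlander!("counter", "int n; void bump() { ++n; }");
}
```

The generated type still has a user-visible name, which is exactly what a dedicated syntax would hide.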

So how might D go about doing this for structs?

Well, anonymous structs already exist in the language, so that's certainly a good start. How about we just write the anonymous struct and then put the name of the single instance of the struct after it like we'd do with most other declarations?

struct {} foo;

Looks good to me. Except, of course, for the obvious fact that structs are never this short in real life. That 'foo' could come two thousand lines into the file for a particularly vicious single-instance struct.

There's got to be a way to move the name to the top while not leaving the syntax ambiguous as to what is being defined. What if we did something like:

alias foo struct {}

That would work. People could get used to it and eventually know by heart that when they saw 'alias xxxxx struct', they were working with a single-instance structure.

But while it is elegant, it's still a little noisy. What if you just took 'alias' away?

foo struct {}

Would that actually work? Let's see, if the parser sees an identifier, then 'struct'… yes, I think it *would* work.

It's king of the hill. Yes, it's rather high and mighty, but then again, it's a type which only has one instance. Maybe it *deserves* to be high and mighty. After all, "there can be only one." So I called it a Highlander, and I think it's a good syntax, although, once again, not exactly an earth-shattering feature.

Part Three: Electrons

This last particle is easy.

When a struct emulates a built-in type through opCall(), the parentheses() leak into user code.

TrackedInt foo;
struct TrackedInt {
  private int _n;
  int timesAccessed;
  int opCall() { ++timesAccessed; return _n; }
}

foo;   // Without (), opCall is never invoked, so nothing is tracked
foo(); // Tracked, but this doesn't look like an int…

The workaround is to use 'alias this' on the function you actually want in place of opCall:

struct TrackedInt {
  int someRandomFunction() { … }
  alias someRandomFunction this;
}
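Filled in, the `alias this` workaround might look like this; `get` is a hypothetical stand-in for the elided function above. Every implicit use of `foo` as an int goes through `get()` and is counted:

```d
struct TrackedInt
{
    private int _n;
    int timesAccessed;

    // Every read goes through here; alias this lets a TrackedInt stand in for an int.
    int get() { ++timesAccessed; return _n; }
    alias get this;
}
```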

But this could be made nicer, if it turns out we're doing this a lot. Why not just add operator opGet to the list of a struct's operator overloads?

struct TrackedInt
{
  int opGet() { … }
}

foo; // Okay
foo(); // Error: no opCall defined!

And that's it for these little particles.

Conclusion:

Structs needed to be whipped into shape to see how well they could do as built-in properties. Let's see how they did. If I'm at module scope, the language already provides a mechanism for structs-as-properties. Look at the following (partial) definition of std.array.front in today's D:

import std.traits;
Front front;
struct Front
{
  alias someFunction this;
  ref T someFunction(T)(T[] a)
  if (!isNarrowString!(T[]) && !is(T[] == void[]))
  {
     assert(a.length, "Attempting to fetch the front of an empty array of " ~ typeof(a[0]).stringof);
     return a[0];
  }
}
assert([1,2,3].front == 1);

People are so used to the idea that structs operate on their own data that they don't realize that that's not the sine qua non of their existence. The compiler can easily figure out which pointers to which data it needs to include in its hidden fields. A property is a named set of overloaded operations on a piece of data which replaces the appearance of that data in code(™). Structs already perform this service for their own fields, and everyone seems to agree that this is a good thing. Why then should they not be expanded to be able to provide the same service for any data? It would spare the implementors from having to design a whole new mechanism, which wouldn't do anything that can't be done with structs anyway.

All three of the language features described above serve this function. The neutron makes it possible to nest the definition of 'front' above, so it can now access its parent struct's data:

struct DomesticatedArray(T)
{
  private T[] _data;
  Front front;
  struct Front
  {
    alias someFunction this;
    // No template parameters of its own, so T below is DomesticatedArray's T
    ref T someFunction()()
    if (!isNarrowString!(T[]) && !is(T[] == void[]))
    {
       assert(_data.length, "Attempting to fetch the front of an empty array of " ~ typeof(_data[0]).stringof);
       return _data[0];
    }
  }
}
DomesticatedArray!int neutron = { [1,2,3] };
assert(neutron.front == 1);

But it's kind of awkward to define. That's where the Highlander syntax comes in:

struct DomesticatedArray(T)
{
  T[] _data;
  front struct
  {
    alias someRandomFunction this;
    ref T someRandomFunction()()
    if (!isNarrowString!(T[]) && !is(T[] == void[]))
    {
       assert(_data.length, "Attempting to fetch the front of an empty array of " ~ typeof(_data[0]).stringof);
       return _data[0];
    }
  }
}
DomesticatedArray!int nucleus = { [1,2,3] };
assert(nucleus.front == 1);

opGet finishes the job:

struct DomesticatedArray(T)
{
  T[] _data;
  front struct
  {
    ref T opGet()()
    if (!isNarrowString!(T[]) && !is(T[] == void[]))
    {
       assert(_data.length, "Attempting to fetch the front of an empty array of " ~ typeof(_data[0]).stringof);
       return _data[0];
    }
  }
}
DomesticatedArray!int atom = { [1,2,3] };
assert(atom.front == 1);
static assert(!__traits(compiles, atom.front())); // no opCall, so front() must not compile

I think enhanced structs do pretty well as a replacement for explicit properties. Not only that, but each of the new features which make properties possible has a use or two of its own, totally apart from its effectiveness as a property replacement. I have attempted to prove that properties are nothing more than the sum of their component parts. The atom consists of protons, neutrons, and electrons.

Smash.

February 05, 2013

Re: The Atom Consists of Protons, Neutrons and Electrons

Posted by Marco Leise
in reply to Zach the Mystic

Just one note on your Sparky example (from the ASM point of view): Instead of passing in a pointer to each struct member (wagTheDog(sparky, sparky.bodi, sparky.bodi.tail);) you could pass in only 'sparky' as the rest are known offsets from sparky. It works because it is a POD. If bodi was a reference, sparky.bodi.tail would still have to be passed in.
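The point can be checked in today's D: because Body and Tail are stored inline, the tail's address is a compile-time offset from the Dog. A sketch with a trimmed-down Dog (`tailOffset` is a hypothetical name):

```d
struct Dog
{
    struct Brain { bool asleep; }
    struct Body
    {
        bool broken;
        struct Tail { int wagSpeed; }
        Tail tail;
    }
    Brain brain;
    Body bodi;
}

// Byte offset of bodi.tail inside a Dog, computed at compile time.
enum size_t tailOffset = Dog.bodi.offsetof + Dog.Body.tail.offsetof;

// Single-pointer variant: every part is reached through fixed offsets from dog.
void wagTheDog(ref Dog dog)
{
    dog.bodi.tail.wagSpeed = 0;
    dog.bodi.broken = true;
    dog.brain.asleep = true;
}
```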

February 05, 2013

Re: The Atom Consists of Protons, Neutrons and Electrons

Posted by Zach the Mystic
in reply to Marco Leise

On Tuesday, 5 February 2013 at 01:41:23 UTC, Marco Leise wrote:
> Just one note on your Sparky example (from the ASM point of
> view): Instead of passing in a pointer to each struct member
> (wagTheDog(sparky, sparky.bodi, sparky.bodi.tail);) you could
> pass in only 'sparky' as the rest are known offsets from
> sparky. It works because it is a POD. If bodi was a reference,
> sparky.bodi.tail would still have to be passed in.

I think this is a great optimization, provided Sparky has only one body and only one tail, which may actually be rather easy to verify statically.

February 05, 2013

Re: The Atom Consists of Protons, Neutrons and Electrons

Posted by John Colvin
in reply to Zach the Mystic

On Tuesday, 5 February 2013 at 00:23:42 UTC, Zach the Mystic wrote:
>
> Smash.

I'm no expert, but this appears to be quite a comprehensive solution. I like the emphasis on improving the underlying mechanisms of the language (i.e. structs) in order to facilitate the special case (properties).

Anyone else care to comment?

February 05, 2013

Re: The Atom Consists of Protons, Neutrons and Electrons

Posted by Zach the Mystic
in reply to John Colvin

On Tuesday, 5 February 2013 at 18:58:23 UTC, John Colvin wrote:
> On Tuesday, 5 February 2013 at 00:23:42 UTC, Zach the Mystic wrote:
>>
>> Smash.
>
> I'm no expert, but this appears to be quite a comprehensive solution. I like the emphasis on improving the underlying mechanisms of the language (i.e. structs) in order to facilitate the special case (properties).
>
> Anyone else care to comment?

I love you.

February 05, 2013

Re: The Atom Consists of Protons, Neutrons and Electrons

Posted by Era Scarecrow
in reply to John Colvin

On Tuesday, 5 February 2013 at 18:58:23 UTC, John Colvin wrote:
> On Tuesday, 5 February 2013 at 00:23:42 UTC, Zach the Mystic wrote:
>>
>> Smash.
>
> I'm no expert, but this appears to be quite a comprehensive solution. I like the emphasis on improving the underlying mechanisms of the language (i.e. structs) in order to facilitate the special case (properties).
>
> Anyone else care to comment?

 I had very little to offer in response. I agree the whole proposal seems quite good, similar to mine; it still has a C-like language feel to it compared to other proposals.

 It should refuse to compile when you return a nested struct that leaves the area of the struct's control or influence. This is to preserve stack integrity and remove obvious stack-related bugs. I think this covers most of the cases.

  struct X {
    struct S {}
    S xs;
  }

  X.S func(ref X.S s) {
    X.S s2;
    X.S s3 = s.xs; //s3 has s's hidden pointer
    X x;
    X* x2 = new X();
    return s;  //fine
    return s2; //should fail, exists only on this level
    return s3; //if it can confirm that s3's parent exists
               //outside this function. This will compile.
    return x.xs;  //fails (local stack)
    return s.xs;  //okay
    return x2.xs; //heap referenced, okay. Struct may have
                  //added internal pointer in struct rather
                  //than passed via silent function parameter.
                  //two versions of structs compiled for this case?
  }

  X.S func(X x) {
    return x.xs;  //fails, x is local copy
    return x;     //parent and all copied, fine.
  }

  X.S func(X.S s) {
    return s;     //passes; although s is a temp/local copy,
                  //its parent/hidden pointer is still valid
  }

  If auto ref is accepted, it should be disallowed for nested structs, because the temporary parent would cease to exist by the time the nested struct is returned.

  X.S func(auto ref X.S s) {
    return s;     //Fail!
  }

  X.S func(auto ref X x) {
    return x.xs;  //Also Fail!
  }

  //both are unsafe and uncertain.
  auto xs =  func(X().xs);
  auto xs2 = func(X());

 As for the alternate syntax for single-instance structs, it feels backwards, yet at the same time it feels correct. Is the struct name the same as the instance name? That seems to break both if it is. If it isn't, then...?

  struct S {
    x struct {
    }
  }

  void func(S.x x); //struct name is... ??

 Would hate to use templates to determine this, although if it's more a namespace than an actual struct then it may not be needed/wanted, and may just be a silent consequence of using the syntax. Maybe this was already in the proposal (I can't recall and don't have time to re-read it all to find out right now).

February 06, 2013

Re: The Atom Consists of Protons, Neutrons and Electrons

Posted by John Colvin
in reply to Zach the Mystic

On Tuesday, 5 February 2013 at 19:50:06 UTC, Zach the Mystic wrote:
> On Tuesday, 5 February 2013 at 18:58:23 UTC, John Colvin wrote:
>> On Tuesday, 5 February 2013 at 00:23:42 UTC, Zach the Mystic wrote:
>>>
>>> Smash.
>>
>> I'm no expert, but this appears to be quite a comprehensive solution. I like the emphasis on improving the underlying mechanisms of the language (i.e. structs) in order to facilitate the special case (properties).
>>
>> Anyone else care to comment?
>
> I love you.

Why thank you!

I think the title of this thread might be preventing it from getting the attention it deserves, unfortunately.

February 06, 2013

Re: The Atom Consists of Protons, Neutrons and Electrons

Posted by Zach the Mystic
in reply to John Colvin

On Wednesday, 6 February 2013 at 02:08:47 UTC, John Colvin wrote:
> On Tuesday, 5 February 2013 at 19:50:06 UTC, Zach the Mystic wrote:
>> On Tuesday, 5 February 2013 at 18:58:23 UTC, John Colvin wrote:
>>> On Tuesday, 5 February 2013 at 00:23:42 UTC, Zach the Mystic wrote:
>>>>
>>>> Smash.
>>>
>>> I'm no expert, but this appears to be quite a comprehensive solution. I like the emphasis on improving the underlying mechanisms of the language (i.e. structs) in order to facilitate the special case (properties).
>>>
>>> Anyone else care to comment?
>>
>> I love you.
>
> Why thank you!
>
> I think the title of this thread might be preventing it from getting the attention it deserves, unfortunately.

Actually, it did receive some attention from Andrei after a brief exchange of posts between him and me. Here's the page:

http://forum.dlang.org/thread/kel6c8$1h5d$1@digitalmars.com?page=21

At first I was upset.

Then I responded in greater detail.

There's a degree to which the phase of development D is in seriously affects the developers' willingness to try new things. And the concern that some dark corner of the suggested enhancements will limit their usefulness is certainly legitimate. For my part, I'm ready to commend the fate of this proposal to a higher power, be it human or divine.

But before I go, I want to say one brief thing. The last item I considered when analyzing how nested structs should work was how functions could inherit the needs of the nested functions they called. Because I didn't even realize the power of my own feature, I didn't fully utilize it when I wrote the example.

Here is the final version of Sparky(™) using the full power of nested structs:

Dog sparky;
struct Dog {
  Brain brain;
  struct Brain
  {
    bool asleep = false;
    void think() {
      if(!asleep) {
        // Sparky has truly advanced A.I. and will stop
        // any intruder so long as he is awake
      }
    }
  }
  Body bodi;
  struct Body
  {
    bool broken = false;
    Bladder bladder;
    struct Bladder {
      void release() {
        // An absolutely fascinating implementation
      }
    }
    Tail tail;
    struct Tail {
      int wagSpeed = 0;
      void wag() {
        ++wagSpeed;
        if (wagSpeed >= 7) wagTheDog;
      }
      void wagTheDog()
      {
        wagSpeed = 0;
        broken = true;
        bladder.release;
        brain.asleep = true;
      }
    }
  }
  void jolt() { bodi.tail.wag; }
}
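The inheritance of hidden pointers in this version can be spelled out by hand, in the same lowering style as before. A sketch (trimmed Dog with the bladder omitted; the lowered free functions are hypothetical): `wag()` only touches Tail, but because it may call `wagTheDog()` it must take all three pointers, and only `jolt()`, a Dog member, can build the full chain on its own.

```d
struct Dog
{
    struct Brain { bool asleep; }
    struct Body
    {
        bool broken;
        struct Tail { int wagSpeed; }
        Tail tail;
    }
    Brain brain;
    Body bodi;
}

// Needs Dog (brain), Body (broken) and Tail (wagSpeed).
void wagTheDog(ref Dog dog, ref Dog.Body bod, ref Dog.Body.Tail tail)
{
    tail.wagSpeed = 0;
    bod.broken = true;
    dog.brain.asleep = true;
}

// Only uses Tail directly, but inherits wagTheDog's requirements.
void wag(ref Dog dog, ref Dog.Body bod, ref Dog.Body.Tail tail)
{
    ++tail.wagSpeed;
    if (tail.wagSpeed >= 7)
        wagTheDog(dog, bod, tail);
}

// A Dog member: it has the Dog and can name every part below it.
void jolt(ref Dog dog)
{
    wag(dog, dog.bodi, dog.bodi.tail);
}
```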

February 06, 2013

Re: The Atom Consists of Protons, Neutrons and Electrons

Posted by Andrei Alexandrescu
in reply to Zach the Mystic

On 2/6/13 12:47 AM, Zach the Mystic wrote:
> At first I was upset.

I feared so.

> Then I responded in greater detail.

I'm glad you did!

> There's a degree to which the phase of development D is in seriously
> affects the developers' willingness to try new things. And the concern
> that some dark corner of the suggested enhancements will limit their
> usefulness is certainly legitimate. For my part, I'm ready to commend
> the fate of this proposal to a higher power, be it human or divine.
[snip]

So you think you stumbled upon a great idea, one that deserves being pursued. So then push it until it breaks, you break, or the others break :o).

One possible angle is that everybody has missed that particular point in fifty years of language design. I believe this has happened, but it is very rare, so you need a stronger argument than that.

I noticed that a good thing to do is prove or at least argue that the feature has a positive impact on a desirable objective. For example, there's this great quote by Bob Martin that goes like "all software engineering techniques address at the core dependency management". That applies a lot to languages, too, so if you show that your technique improves dependencies then it's a net win.

Consider e.g. "final switch" as a simple example. When we introduced it, the argument was that it improves modularity by making compilation fail whenever new members are added to the enum, thus forcing appropriate code updates. Thus, modularity is improved. So it would be great to find such an objective criterion and show how your feature makes good steps toward that.
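For readers unfamiliar with it, a minimal `final switch` in D; the enum and function names are made up for illustration:

```d
enum Signal { sit, stay, fetch }

string respond(Signal s)
{
    // final switch must handle every member of Signal, so no default is needed.
    final switch (s)
    {
        case Signal.sit:   return "sits";
        case Signal.stay:  return "stays";
        case Signal.fetch: return "fetches";
    }
}
```

Adding a member such as `roll` to `Signal` turns `respond` into a compile error until it handles the new case, which is the modularity argument above.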

Andrei

Copyright © 1999-2021 by the D Language Foundation