foreach iterator with closure
Jun 28
Denis
June 28
Is it possible to write an iterator that does the following, using a struct and some functions?

 - Operates in a foreach loop
 - Has BEGIN-like and END-like blocks or functions that are executed automatically, before and after the iterations
 - Initializes variables in the BEGIN block that are used in the other two. These variables are for internal use only, i.e. must not be accessible to the user of the foreach loop

I'd like to use the simplest solution while keeping the code clean. As a starting point, here's a contrived example using a struct with a range-style iterator:

  import std.stdio;

  struct letters {
    string str;
    int pos = 0;
    char front() { return str[pos]; }
    void popFront() { pos ++; }
    bool empty() {
      if (pos == 0) writeln(`BEGIN`);
      else if (pos == str.length) writeln("\nEND");
      return pos == str.length; }}

  void main() {
    foreach (letter; letters(`hello`)) {
      write(letter, ' '); }
    writeln(); }

The obvious problems with this code include:

(1) The user can pass a second argument, which will set the initial value of pos. This must not be allowed. (The real code will need to initialize a half dozen internal-only variables, and do some additional work, before the looping starts.)

(2) Sticking the code for the BEGIN and END blocks into the empty() function is ugly.

Can this iterator be written using a range-style struct? Or is something more complicated needed, like an OO solution?

I should add that the final version of this will be put in a separate module, possibly in a library, so I can call it from many programs. Not sure if that might help simplify things.

Thanks for your guidance.
June 27
On 6/27/20 8:19 PM, Denis wrote:

> Is it possible to write an iterator

It is arguable whether D's ranges are iterators but if nouns are useful, we call them ranges. :) (Iterators can be written in D as well and then it would really be confusing.)

>    struct letters {
>      string str;
>      int pos = 0;
>      char front() { return str[pos]; }
>      void popFront() { pos ++; }
>      bool empty() {
>        if (pos == 0) writeln(`BEGIN`);
>        else if (pos == str.length) writeln("\nEND");
>        return pos == str.length; }}
>
>    void main() {
>      foreach (letter; letters(`hello`)) {
>        write(letter, ' '); }
>      writeln(); }
>
> The obvious problems with this code include:
>
> (1) The user can pass a second argument, which will set the initial
> value of pos.

That problem can be solved by a constructor that takes a single string. Your BEGIN code would normally go there as well. And END goes into the destructor:

struct letters {
    this(string str) {
        this.str = str;
        this.pos = 0;  // Redundant
        writeln(`BEGIN`);
    }

    ~this() {
        writeln("\nEND");
    }

    // [...]
}

Note: You may want to either disallow copying of your type or write a copy constructor that does the right thing:

  https://dlang.org/spec/struct.html#struct-copy-constructor
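
Why copying matters here: every copy of a struct is a separate object, and each one runs the destructor when it goes out of scope, so END-style code in ~this() can run more than once. The sketch below is only an illustration under that assumption; Tracker, NoCopy, and endsAfterOneCopy are hypothetical names, and a counter stands in for the writeln call.

```d
import std.stdio;

// Hypothetical type: counts destructor runs instead of printing END.
struct Tracker {
    static int ended;
    ~this() { ++ended; }
}

// Same idea, but copying is rejected at compile time.
struct NoCopy {
    @disable this(this);
    ~this() {}
}

int endsAfterOneCopy() {
    Tracker.ended = 0;
    {
        Tracker a;
        Tracker b = a;   // a second object, so a second destructor call
    }
    return Tracker.ended;   // both a and b have been destroyed by now
}

// Copying a NoCopy value does not compile.
static assert(!__traits(compiles, { NoCopy a; NoCopy b = a; }));

void main() {
    writeln(endsAfterOneCopy());
}
```

Disabling the postblit is the blunt option; a copy constructor that tracks which object "owns" the END step is the alternative the linked spec section describes.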

However, it's common to construct a range object by a function. The actual range type can be kept as an implementation detail:

struct Letters {  // Note capital L
  // ...
}

auto letters(string str) {
  // ...
  return Letters(str);
}

struct Letters can be a private type of its module or even a nested struct inside letters(), in which case it's called a "Voldemort type".
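
A minimal sketch of the nested-struct variant, assuming the range only needs a string and a position. The member name text is a placeholder chosen for this example; the struct is declared static because it does not use any local variables of letters().

```d
import std.stdio;

// letters() is the only public entry point. The range type is declared
// inside the function, so callers cannot name it or construct it directly.
auto letters(string str) {
    static struct Letters {
        private string text;
        private size_t pos;   // internal state, always starts at 0

        bool empty() const { return pos == text.length; }
        char front() const { return text[pos]; }
        void popFront() { ++pos; }
    }

    writeln("BEGIN");      // setup work can run here, before the range is returned
    return Letters(str);
}

void main() {
    foreach (letter; letters("hello")) {
        write(letter, ' ');
    }
    writeln();
}
```

The caller only ever sees letters(string), so there is no second argument to pass. Note that private in D is module-wide, so the members are hidden only from code in other modules.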

Ali

June 28
Many thanks: your post has helped me get past the initial stumbling blocks I was struggling with. I do have a followup question.

First, here are my conclusions up to this point, based on your post above, some additional experimentation, and further research (for future reference, and for any other readers).

* foreach is the actual iterator, the instantiation of a struct is the range.
* When a constructor is not used, the arguments in the call to instantiate the range (in this case, `hello` in letters(`hello`)) are mapped sequentially to the member variables in the struct definition (i.e. to letters.str).
* When a constructor is used, the member variables in the struct definition are in essence private. The arguments in the call to instantiate the range are now mapped directly to the parameters in the definition of the "this" function.
* The syntax and conventions for constructors is difficult and non-intuitive for anyone who hasn't learned Java (or a derivative). The linked document provides a simplified explanation for the "this" keyword, which is helpful for the first read: https://docs.oracle.com/javase/tutorial/java/javaOO/thiskey.html.
* In some respects, the Java syntax is not very D-like. (For example, it breaks the well-established convention of "Do not use the same name to mean two different things".) However, it does need to be learned, because it is common in D source code.

Here is the complete revised code for the example (in condensed form):

  import std.stdio;

  struct letters {

    string str;
    int pos = 1;		// Assign here or in this())

    this(string param1) {	// cf. shadow str
      str = param1;		// cf. this.str = param1 / this.str = str
      writeln(`BEGIN`); }

    char front() { return str[pos]; }
    void popFront() { pos ++; }
    bool empty() { return pos == str.length; }

    ~this() { writeln("\nEND"); }}

  void main() {
    foreach (letter; letters(`hello`)) {
      write(letter, ' '); }}

At this point, I do have one followup question:

Why is the shadow str + "this.str = str" the more widely used syntax in D, when the syntax in the code above is unambiguous?

One possible reason that occurred to me is that "str = param1" might require additional GC, because they are different names. But I wouldn't think it'd make any difference to the compiler.

Denis
June 28
On 6/28/20 9:07 AM, Denis wrote:

> * foreach is the actual iterator,

Yes. foreach is "lowered" to the following equivalent:

  for ( ; !range.empty; range.popFront()) {
    // Use range.front here
  }
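
To make the lowering concrete, here is a small sketch that walks the same range both ways and collects the elements. Countdown, viaForeach, and viaFor are hypothetical names used only for this illustration.

```d
import std.stdio;

// Hypothetical range that counts down from n to 1.
struct Countdown {
    int n;
    bool empty() const { return n == 0; }
    int front() const { return n; }
    void popFront() { --n; }
}

// Iterate with foreach.
int[] viaForeach(Countdown r) {
    int[] result;
    foreach (x; r) result ~= x;
    return result;
}

// Iterate with the hand-written equivalent of the lowering.
int[] viaFor(Countdown r) {
    int[] result;
    for ( ; !r.empty; r.popFront()) result ~= r.front;
    return result;
}

void main() {
    writeln(viaForeach(Countdown(3)));
    writeln(viaFor(Countdown(3)));
}
```

Both functions return the same elements in the same order, which is all that empty, front, and popFront have to guarantee.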

A struct can support foreach iteration through its opApply() member function as well. opApply() takes the body of the foreach as a delegate. Because it's a function call, it can take full advantage of the function call stack. This may help with e.g. writing recursive iteration algorithms.


http://ddili.org/ders/d.en/foreach_opapply.html#ix_foreach_opapply.opApply
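
Since opApply() is an ordinary function, the BEGIN and END steps from the original question can sit before and after its inner loop. The following is a sketch of that idea only, not the one right way to do it; the Letters type here is an assumption made for the example.

```d
import std.stdio;

struct Letters {
    private string str;

    this(string str) { this.str = str; }

    // foreach passes the loop body in as the delegate dg.
    int opApply(scope int delegate(char) dg) {
        writeln("BEGIN");                // runs once, before the first iteration
        scope (exit) writeln("\nEND");   // runs once on the way out, even after a break
        foreach (c; str) {
            // A nonzero result means the loop body left early
            // (break, return, goto); it must be passed back unchanged.
            if (auto result = dg(c)) return result;
        }
        return 0;
    }
}

void main() {
    foreach (letter; Letters("hello")) {
        write(letter, ' ');
    }
}
```

Any variables declared inside opApply() are locals of that call, so the loop body has no way to reach them.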

> the instantiation of a struct is the
> range.

Yes.

> * When a constructor is not used, the arguments in the call to
> instantiate the range (in this case, `hello` in letters(`hello`)) are
> mapped sequentially to the member variables in the struct definition
> (i.e. to letters.str).

Yes, that is a very practical struct feature. I write my structs with as little as needed and provide a constructor only when it is necessary as in your case.

> * When a constructor is used, the member variables in the struct
> definition are in essence private.

Not entirely true. You can still make them public if you want.

  http://ddili.org/ders/d.en/encapsulation.html

> The arguments in the call to
> instantiate the range are now mapped directly to the parameters in the
> definition of the "this" function.

Yes.
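
The difference between the two initialization styles can be checked at compile time. This is a small sketch with hypothetical types WithoutCtor and WithCtor; the private attributes are added explicitly, because defining a constructor does not by itself hide the members.

```d
struct WithoutCtor {
    string str;
    int pos = 0;
}

struct WithCtor {
    private string str;
    private int pos = 0;

    this(string str) {
        this.str = str;   // left: the member; right: the parameter
    }

    int position() const { return pos; }
}

// Without a constructor, arguments are matched to members in declaration
// order, so a caller can set pos.
static assert(__traits(compiles, WithoutCtor("hello", 3)));

// Once a constructor exists, only its parameter list is accepted.
static assert(!__traits(compiles, WithCtor("hello", 3)));

void main() {
    auto a = WithoutCtor("hello", 3);
    assert(a.pos == 3);

    auto b = WithCtor("hello");
    assert(b.position == 0);
}
```

Keep in mind that private applies per module in D: code in the same file can still read and write the private members.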

> * The syntax and conventions for constructors are difficult and
> non-intuitive for anyone who hasn't learned Java (or a derivative).

C++ uses the name of the class as the constructor:

// C++ code
struct S {
  S();     // <-- Constructor
  S(int);  // <-- Another one
};

The problem with that syntax is having to rename more than one thing when the name of the struct changes, e.g. to Q:

struct Q {
  Q();
  Q(int);
};

And usually in the implementation:

Q::Q() {}
Q::Q(int) {}

D's choice of 'this' is productive.

> The
> linked document provides a simplified explanation for the "this"
> keyword, which is helpful for the first read:
> https://docs.oracle.com/javase/tutorial/java/javaOO/thiskey.html.

I like searching for keywords in my index. The "this, constructor" here links to the constructor syntax:

  http://ddili.org/ders/d.en/ix.html

> * In some respects, the Java syntax is not very D-like. (For example, it
> breaks the well-established convention of "Do not use the same name to
> mean two different things".)

Yes but it competes with another goal: Change as little code as possible when one thing needs to be changed. This is not only practical but helps with correctness.

> However, it does need to be learned,
> because it is common in D source code.

I like D. :p

> Here is the complete revised code for the example (in condensed form):
>
>    import std.stdio;
>
>    struct letters {
>
>      string str;
>      int pos = 1;        // Assign here or in this())
>
>      this(string param1) {    // cf. shadow str
>        str = param1;        // cf. this.str = param1 / this.str = str
>        writeln(`BEGIN`); }
>
>      char front() { return str[pos]; }
>      void popFront() { pos ++; }
>      bool empty() { return pos == str.length; }
>
>      ~this() { writeln("\nEND"); }}
>
>    void main() {
>      foreach (letter; letters(`hello`)) {
>        write(letter, ' '); }}
>
> At this point, I do have one followup question:
>
> Why is the shadow str + "this.str = str" the more widely used syntax in
> D, when the syntax in the code above is unambiguous?

Because one needs to come up with names like "param7", "str_", "_str", "s", etc. I like and follow D's standard here.

> One possible reason that occurred to me is that "str = param1" might
> require additional GC, because they are different names.

Not at all, because there is no memory allocation involved. strings are implemented as the equivalent of the following struct:

struct __D_native_string {
  size_t length_;
  char * ptr;
  // ...
}

So, the "str = param1" assignment is nothing but two 64-bit data transfers, which can easily be optimized away by the compiler in many cases.
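
This is easy to check. The sketch below assumes nothing beyond the built-in .sizeof, .ptr, and .length properties; sharesStorage is a hypothetical helper written for this example.

```d
import std.stdio;

// A string is a length plus a pointer, whatever the name of the variable.
static assert(string.sizeof == size_t.sizeof + (void*).sizeof);

// Assigning one string to another copies those two fields only;
// both variables then refer to the same characters in memory.
bool sharesStorage(string param1) {
    string str = param1;
    return str.ptr is param1.ptr && str.length == param1.length;
}

void main() {
    writeln(string.sizeof);            // size in bytes of the length/pointer pair
    writeln(sharesStorage("hello"));
}
```

No characters are copied and nothing is allocated, so the garbage collector is not involved in the assignment.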

> But I wouldn't
> think it'd make any difference to the compiler.

Yes. :)

>
> Denis

Ali

June 28
To keep this reply brief, I'll just summarize:

Lots of great takeaways from both of your posts, and a handful of topics you mentioned that I need to dig into further now. This is great (I too like D :)

I very much appreciate the extra insight into how things work and why certain design decisions were made: for me, this is essential for gaining fluency in a language.

Thanks again for all your help!
Denis