July 12, 2021

On Sunday, 11 July 2021 at 13:14:23 UTC, Steven Schveighoffer wrote:

> when I've done this kind of stuff, what I usually do is:
>
> struct Thing {
>   ... // actual struct
> }
>
> mixin("alias ", lstrStructureID, " = Thing;");
>
> the downside is that the actual struct name symbol will be Thing, or whatever you called it. But at least you are not writing lots of code using mixins.
>
> -Steve

Thanks for your tip, Steve. I ended up with something similar; I'll be posting my whole example below.
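Steve's tip can be extended to declare all three aliases from one place, which seems to be what the commented-out `mixin templateUGC` lines in the listing below were after. A minimal sketch, assuming a stand-in `gudtUGC` struct (only its template shape matters here) and a compiler recent enough for `static foreach` and multi-argument `mixin(...)`. It uses the UTF-16 = `wstring`, UTF-32 = `dstring` mapping:

```d
// Stand-in for the real gudtUGC; only the template shape matters here.
struct gudtUGC(typeStringUTF)
{
    typeStringUTF payload;
}

// Each pair is (alias name, string type); static foreach emits one alias per pair.
static foreach (pair; [["gudtUGC08", "string"], ["gudtUGC16", "wstring"], ["gudtUGC32", "dstring"]])
{
    mixin("alias ", pair[0], " = gudtUGC!(", pair[1], ");");
}
```

As Steve notes, the downside remains: diagnostics will still name the type `gudtUGC!(...)`, not the alias.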

July 12, 2021

On Sunday, 11 July 2021 at 05:54:48 UTC, Ali Çehreli wrote:

> Ali

Thanks primarily to Ali & Steve for their help. Be advised, this post will be somewhat ... long.

A bit of background to begin with: a week or so ago I posted asking for advice on code safety, and I still haven't replied to those who kindly answered. Reading some of the replies, and running into a code issue with string manipulation, I soon realized that I still lacked solid knowledge of many basic things in D, so I put the brakes on, went back to square one, and started reading and researching some things a bit more ... slowly.

One of the things that struck me this week is that Unicode string manipulation is in many cases more complex than I previously thought, because there is no precise concept of what a character is in Unicode, at least not the way we are used to with plain old ASCII. After reading a lot about it (this was good: https://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/) I learned about code units, code points, abstract graphemes, grapheme clusters, and the like.

I also learned the inner details of the UTF encodings: UTF-32 is best (almost required) for string processing (easier, faster, etc.), UTF-8 of course for persistent storage, and UTF-16 goes to the trashcan unless you need to interface with Windows (previously I was using UTF-8 for all the processing in my code).

So, in order to manipulate a string, say left(n), right(n), substr(n,m), i.e. the usual stuff in many languages/libraries, I need to operate on grapheme clusters, not on code points, and never ever on code units, at least for unpredictable text, i.e. incoming text, user input, etc., the things we cannot control beforehand.
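The three levels can be seen side by side in a short sketch (the helper names are made up). `.length` counts code units, `walkLength` on a D string counts code points because of Phobos' auto-decoding, and `byGrapheme` counts grapheme clusters. Even a `dstring` (UTF-32) counts code points, so it does not by itself give "characters" as a reader perceives them:

```d
import std.range : walkLength;
import std.uni : byGrapheme;

/// Code units: what `.length` counts (bytes for string, dchars for dstring).
size_t codeUnits(S)(S s) { return s.length; }

/// Code points: what walkLength counts on a D string, via auto-decoding.
size_t codePoints(S)(S s) { return s.walkLength; }

/// Grapheme clusters: closest to user-perceived characters.
size_t graphemes(S)(S s) { return s.byGrapheme.walkLength; }
```

For "noël" written with a combining diaeresis ("noe\u0308l"), the UTF-8 string has 6 code units, 5 code points, and 4 graphemes; the `dstring` version still has 5 code units.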

Both of the main D books, Andrei's and Ali's, as well as the D documentation, have plenty of examples, but they mainly focus on simple strings with nothing out of the ordinary. They manipulate strings mainly by slicing the source string (i.e. the char array) or with std.range functions like take, takeOne, etc.

I needed to settle these things once and for all for my code, so I decided to build a grapheme-aware UDT (user-defined type) that, once instantiated with any given string, provides the usual string manipulation functions so I can forget about the minutiae. The unittest at the bottom has many usage examples.

The whole UDT needed to be templated for the three string types (string, wstring, dstring, and nothing else), and this is what prompted this post to begin with. That issue was solved, not the way I would have liked, but solved. The code works, except for something that smells like a Phobos bug (#20483) when using foreach with grapheme arrays (foreach always misses the last one).

I ended up with the following (as usual, advice/suggestions welcome):

/// testing D on 2021-06~07

import std.algorithm : map, joiner;
import std.array : array;
import std.conv : to;
import std.range : walkLength, take, tail, drop, dropBack;
import std.stdio;
import std.uni : Grapheme, byGrapheme;

alias stringUGC = Grapheme;
alias stringUGC08 = gudtUGC!(stringUTF08);
alias stringUGC16 = gudtUGC!(stringUTF16);
alias stringUGC32 = gudtUGC!(stringUTF32);
alias stringUTF08 = string;  /// same as immutable(char )[];
alias stringUTF16 = dstring; /// same as immutable(dchar)[];
alias stringUTF32 = wstring; /// same as immutable(wchar)[];

void main() {}

//mixin templateUGC!(stringUTF08, r"gudtUGC08"w); /// if these were possible, there would be no need for the stringUGC## aliases in main()
//mixin templateUGC!(stringUTF16, r"gudtUGC16"w);
//mixin templateUGC!(stringUTF32, r"gudtUGC32"w);

//template templateUGC (
//   typeStringUTF,
//   alias lstrStructureID
//   ) {

public struct gudtUGC(typeStringUTF) { /// UniCode grapheme cluster‐aware string manipulation

   void popFront() { ++pintSequenceCurrent; }
   bool empty() { return pintSequenceCurrent == pintSequenceCount; }
   typeStringUTF front() { return toUTFtake(pintSequenceCurrent); }

   private stringUGC[] pugcSequence;
   private size_t pintSequenceCount = cast(size_t) 0;
   private size_t pintSequenceCurrent = cast(size_t) 0;

   @property public size_t count() { return pintSequenceCount; }

   this(scope const typeStringUTF lstrSequence) {

      decode(lstrSequence);

   }

   @safe public size_t decode(
      scope const typeStringUTF lstrSequence
      ) {

      scope size_t lintSequenceCount = cast(size_t) 0;

      if (lstrSequence is null) {

         pugcSequence = null;
         pintSequenceCount = cast(size_t) 0;
         pintSequenceCurrent = cast(size_t) 0;

      } else {

         pugcSequence = lstrSequence.byGrapheme.array;
         pintSequenceCount = pugcSequence.walkLength;
         pintSequenceCurrent = cast(size_t) 1;

         lintSequenceCount = pintSequenceCount;

      }

      return lintSequenceCount;

   }

   @safe public typeStringUTF encode() { /// UniCode grapheme cluster to UniCode UTF‐encoded string

      scope typeStringUTF lstrSequence = null;

      if (pintSequenceCount >= cast(size_t) 1) {

         lstrSequence = pugcSequence
            .map!((ref g) => g[])
            .joiner
            .to!(typeStringUTF)
            ;

      }

      return lstrSequence;

   }

   @safe public typeStringUTF toUTFtake( /// UniCode grapheme cluster to UniCode UTF‐encoded string
      scope const size_t lintStart,
      scope const size_t lintCount = cast(size_t) 1
      ) {

      scope typeStringUTF lstrSequence = null;

      if (lintStart <= lintStart + lintCount) {

         /// eg#1: toUTFtake(1,3) → range#1=start-1=1-1=0 and range#2=range#1+count=0+3=3 → 0..3
         /// eg#1: toUTFtake(6,3) → range#1=start-1=6-1=5 and range#2=range#1+count=5+3=8 → 5..8

         /// eg#2: toUTFtake(01,1) → range#1=start-1=01-1=00 and range#2=range#1+count=00+1=01 → 00..01
         /// eg#2: toUTFtake(50,1) → range#1=start-1=50-1=49 and range#2=range#1+count=49+1=50 → 49..50

         scope size_t lintRange1 = lintStart - cast(size_t) 1;
         scope size_t lintRange2 = lintRange1 + lintCount;

         if (lintRange1 >= cast(size_t) 0 && lintRange2 <= pintSequenceCount) {

            lstrSequence = pugcSequence[lintRange1..lintRange2]
               .map!((ref g) => g[])
               .joiner
               .to!(typeStringUTF)
               ;

         }

      }

      return lstrSequence;

   }

   @safe public typeStringUTF toUTFtakeL( /// UniCode grapheme cluster to UniCode UTF‐encoded string
      scope const size_t lintCount
      ) {

      scope typeStringUTF lstrSequence = null;

      if (lintCount <= pintSequenceCount) {

         lstrSequence = pugcSequence
            .take(lintCount)
            .map!((ref g) => g[])
            .joiner
            .to!(typeStringUTF)
            ;

      }

      return lstrSequence;

   }

   @safe public typeStringUTF toUTFtakeR( /// UniCode grapheme cluster to UniCode UTF‐encoded string
      scope const size_t lintCount
      ) {

      scope typeStringUTF lstrSequence = null;

      if (lintCount <= pintSequenceCount) {

         lstrSequence = pugcSequence
            .tail(lintCount)
            .map!((ref g) => g[])
            .joiner
            .to!(typeStringUTF)
            ;

      }

      return lstrSequence;

   }

   @safe public typeStringUTF toUTFchopL( /// UniCode grapheme cluster to UniCode UTF‐encoded string
      scope const size_t lintCount
      ) {

      scope typeStringUTF lstrSequence = null;

      if (lintCount <= pintSequenceCount) {

         lstrSequence = pugcSequence
            .drop(lintCount)
            .map!((ref g) => g[])
            .joiner
            .to!(typeStringUTF)
            ;

      }

      return lstrSequence;

   }

   @safe public typeStringUTF toUTFchopR( /// UniCode grapheme cluster to UniCode UTF‐encoded string
      scope const size_t lintCount
      ) {

      scope typeStringUTF lstrSequence = null;

      if (lintCount <= pintSequenceCount) {

         lstrSequence = pugcSequence
            .dropBack(lintCount)
            .map!((ref g) => g[])
            .joiner
            .to!(typeStringUTF)
            ;

      }

      return lstrSequence;

   }

   @safe public typeStringUTF toUTFpadL( /// UniCode grapheme cluster to UniCode UTF‐encoded string
      scope const size_t lintCount,
      scope const typeStringUTF lstrPadding = cast(typeStringUTF) r" "
      ) {

      scope typeStringUTF lstrSequence = null;

      if (lintCount > pintSequenceCount) {

         lstrSequence = null; /// pending

      }

      return lstrSequence;

   }

   @safe public typeStringUTF toUTFpadR( /// UniCode grapheme cluster to UniCode UTF‐encoded string
      scope const size_t lintCount,
      scope const typeStringUTF lstrPadding = cast(typeStringUTF) r" "
      ) {

      scope typeStringUTF lstrSequence = null;

      if (lintCount > pintSequenceCount) {

         lstrSequence = null; /// pending

      }

      return lstrSequence;

   }

   /*@safe public gudtUGC(typeStringUTF) take(
      scope const size_t lintStart,
      scope const size_t lintCount = cast(size_t) 1
      ) {

      /// the idea behind this new set of functions (returning a new object) is to enable the following one‐liner constructions:
      /// assert(lugcSequence3.take(35, 3).take(1,2).take(1,1).encode() == cast(stringUTF) r"日");

      /// ooops … error: function declaration without return type. (Note that constructors are always named `this`)
      /// ooops … error: no identifier for declarator `@safe gudtUGC(typeStringUTF)`

      scope gudtUGC(typeStringUTF) lugcSequence;

      if (lintStart <= lintStart + lintCount) {

         /// eg#1: toUTFtake(1,3) → range#1=start-1=1-1=0 and range#2=range#1+count=0+3=3 → 0..3
         /// eg#1: toUTFtake(6,3) → range#1=start-1=6-1=5 and range#2=range#1+count=5+3=8 → 5..8

         /// eg#2: toUTFtake(01,1) → range#1=start-1=01-1=00 and range#2=range#1+count=00+1=01 → 00..01
         /// eg#2: toUTFtake(50,1) → range#1=start-1=50-1=49 and range#2=range#1+count=49+1=50 → 49..50

         scope size_t lintRange1 = lintStart - cast(size_t) 1;
         scope size_t lintRange2 = lintRange1 + lintCount;

         if (lintRange1 >= cast(size_t) 0 && lintRange2 <= pintSequenceCount) {

            lugcSequence = gudtUGC(typeStringUTF)(pugcSequence[lintRange1..lintRange2]
               .map!((ref g) => g[])
               .joiner
               .to!(typeStringUTF)
               );

         }

      }

      return lugcSequence;

   }*/

}

//}

unittest {

   version (useUTF08) {
   scope stringUTF08 lstrSequence1 = r"12345678901234567890123456789012345678901234567890"c;
   scope stringUTF08 lstrSequence2 = r"1234567890АВГДЕЗИЙКЛABCDEFGHIJabcdefghijQRSTUVWXYZ"c;
   scope stringUTF08 lstrSequence3 = "äëåčñœß … russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese 😎"c;
   }

   version (useUTF16) {
   scope stringUTF16 lstrSequence1 = r"12345678901234567890123456789012345678901234567890"d;
   scope stringUTF16 lstrSequence2 = r"1234567890АВГДЕЗИЙКЛABCDEFGHIJabcdefghijQRSTUVWXYZ"d;
   scope stringUTF16 lstrSequence3 = "äëåčñœß … russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese 😎"d;
   }

   version (useUTF32) {
   scope stringUTF32 lstrSequence1 = r"12345678901234567890123456789012345678901234567890"w;
   scope stringUTF32 lstrSequence2 = r"1234567890АВГДЕЗИЙКЛABCDEFGHIJabcdefghijQRSTUVWXYZ"w;
   scope stringUTF32 lstrSequence3 = "äëåčñœß … russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese 😎"w;
   }

   scope size_t lintSequence1sizeUTF = lstrSequence1.length;
   scope size_t lintSequence2sizeUTF = lstrSequence2.length;
   scope size_t lintSequence3sizeUTF = lstrSequence3.length;

   scope size_t lintSequence1sizeUGA = lstrSequence1.walkLength;
   scope size_t lintSequence2sizeUGA = lstrSequence2.walkLength;
   scope size_t lintSequence3sizeUGA = lstrSequence3.walkLength;

   scope size_t lintSequence1sizeUGC = lstrSequence1.byGrapheme.walkLength;
   scope size_t lintSequence2sizeUGC = lstrSequence2.byGrapheme.walkLength;
   scope size_t lintSequence3sizeUGC = lstrSequence3.byGrapheme.walkLength;

   assert(lintSequence1sizeUGC == cast(size_t) 50);
   assert(lintSequence2sizeUGC == cast(size_t) 50);
   assert(lintSequence3sizeUGC == cast(size_t) 50);

   assert(lintSequence1sizeUGA == cast(size_t) 50);
   assert(lintSequence2sizeUGA == cast(size_t) 50);
   assert(lintSequence3sizeUGA == cast(size_t) 52);

   version (useUTF08) {
   assert(lintSequence1sizeUTF == cast(size_t) 50);
   assert(lintSequence2sizeUTF == cast(size_t) 60);
   assert(lintSequence3sizeUTF == cast(size_t) 91);
   }

   version (useUTF16) {
   assert(lintSequence1sizeUTF == cast(size_t) 50);
   assert(lintSequence2sizeUTF == cast(size_t) 50);
   assert(lintSequence3sizeUTF == cast(size_t) 52);
   }

   version (useUTF32) {
   assert(lintSequence1sizeUTF == cast(size_t) 50);
   assert(lintSequence2sizeUTF == cast(size_t) 50);
   assert(lintSequence3sizeUTF == cast(size_t) 57);
   }

   /// the following should be the same regardless of the encoding being used and is the whole point of this UDT being made:

   version (useUTF08) { alias stringUTF = stringUTF08; scope stringUGC08 lugcSequence3 = stringUGC08(lstrSequence3); }
   version (useUTF16) { alias stringUTF = stringUTF16; scope stringUGC16 lugcSequence3 = stringUGC16(lstrSequence3); }
   version (useUTF32) { alias stringUTF = stringUTF32; scope stringUGC32 lugcSequence3 = stringUGC32(lstrSequence3); }

   assert(lugcSequence3.encode() == lstrSequence3);

   assert(lugcSequence3.toUTFtake(21) == cast(stringUTF) r"р");
   assert(lugcSequence3.toUTFtake(27) == cast(stringUTF) r"й");
   assert(lugcSequence3.toUTFtake(35) == cast(stringUTF) r"日");
   assert(lugcSequence3.toUTFtake(37) == cast(stringUTF) r"語");
   assert(lugcSequence3.toUTFtake(21, 7) == cast(stringUTF) r"русский");
   assert(lugcSequence3.toUTFtake(35, 3) == cast(stringUTF) r"日本語");

   assert(lugcSequence3.toUTFtakeL(1) == cast(stringUTF) r"ä");
   assert(lugcSequence3.toUTFtakeR(1) == cast(stringUTF) r"😎");
   assert(lugcSequence3.toUTFtakeL(7) == cast(stringUTF) r"äëåčñœß");
   assert(lugcSequence3.toUTFtakeR(16) == cast(stringUTF) r"日本語 = japanese 😎");

   assert(lugcSequence3.toUTFchopL(10) == cast(stringUTF) r"russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese 😎");
   assert(lugcSequence3.toUTFchopR(21) == cast(stringUTF) r"äëåčñœß … russian = русский 🇷🇺");

   version (useUTF08) { scope stringUTF08 lstrSequence3reencoded; }
   version (useUTF16) { scope stringUTF16 lstrSequence3reencoded; }
   version (useUTF32) { scope stringUTF32 lstrSequence3reencoded; }

   for (
      size_t lintSequenceUGC = cast(size_t) 1;
      lintSequenceUGC <= lintSequence3sizeUGC;
      ++lintSequenceUGC
      ) {

      lstrSequence3reencoded ~= lugcSequence3.toUTFtake(lintSequenceUGC);

   }

   assert(lstrSequence3reencoded == lstrSequence3);

   lstrSequence3reencoded = null;

   version (useUTF08) { foreach (stringUTF08 lstrSequence3UGC; lugcSequence3) { lstrSequence3reencoded ~= lstrSequence3UGC; } }
   version (useUTF16) { foreach (stringUTF16 lstrSequence3UGC; lugcSequence3) { lstrSequence3reencoded ~= lstrSequence3UGC; } }
   version (useUTF32) { foreach (stringUTF32 lstrSequence3UGC; lugcSequence3) { lstrSequence3reencoded ~= lstrSequence3UGC; } }

   assert(lstrSequence3reencoded == lstrSequence3); /// ooops … missing last grapheme: possible bug # 20483

}
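It looks like the missing last grapheme may not be a Phobos problem but an off-by-one in the range primitives above. `pintSequenceCurrent` starts at 1 and `front` is one-based, yet `empty()` returns true as soon as `pintSequenceCurrent == pintSequenceCount`, which is exactly the position of the last grapheme. For an empty but non-null input, that `==` is effectively never reached. Separately, the commented-out `take` fails to parse because `gudtUGC(typeStringUTF)` is not template-instantiation syntax. It needs `gudtUGC!(typeStringUTF)`, or simply `typeof(this)` inside the struct. The sketch below is a hypothetical, cut-down stand-in (named `UGC`, not the original struct) applying both changes:

```d
import std.algorithm : joiner, map;
import std.array : array;
import std.conv : to;
import std.uni : Grapheme, byGrapheme;

/// Hypothetical, cut-down stand-in for gudtUGC (S = string, wstring or dstring).
struct UGC(S)
{
    private Grapheme[] graphemes;
    private size_t current = 1;            // one-based cursor, as in the original

    this(S text) { graphemes = text.byGrapheme.array; }
    private this(Grapheme[] slice) { graphemes = slice; }

    // Position `current` is valid while current <= length, so the range must
    // report empty only after moving past the last grapheme ('>' rather than '==').
    bool empty() const { return current > graphemes.length; }
    S front() { return concat(graphemes[current - 1 .. current]); }
    void popFront() { ++current; }

    /// One-based, like toUTFtake; returns an empty UGC when out of range.
    /// Inside the struct, typeof(this) names the current instantiation;
    /// outside it the type is spelled UGC!S, never UGC(S).
    typeof(this) take(size_t start, size_t count = 1)
    {
        // Validate before subtracting so that start - 1 cannot wrap around.
        if (start == 0 || start - 1 > graphemes.length
            || count > graphemes.length - (start - 1))
            return typeof(this).init;
        return typeof(this)(graphemes[start - 1 .. start - 1 + count]);
    }

    S encode() { return concat(graphemes); }

    private static S concat(Grapheme[] slice)
    {
        return slice.map!((ref g) => g[]).joiner.to!S;
    }
}
```

With this, a chain like `u.take(1, 3).take(2, 2).take(1, 1)` compiles, and foreach yields every grapheme.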
July 12, 2021
On 12.07.21 03:37, someone wrote:
> I ended up with the following (as usual advice/suggestions welcomed): 
[...]
> alias stringUTF16 = dstring; /// same as immutable(dchar)[];
> alias stringUTF32 = wstring; /// same as immutable(wchar)[];
Bug: You mixed up `wstring` and `dstring`. `wstring` is UTF-16. `dstring` is UTF-32.
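The mix-up can be caught at compile time, and the practical difference shows up with any character outside the Basic Multilingual Plane. A small sketch (the helper names are made up):

```d
// The aliases as defined in druntime: wstring is UTF-16, dstring is UTF-32.
static assert(is(wstring == immutable(wchar)[]));
static assert(is(dstring == immutable(dchar)[]));

/// Number of code units needed for the same text in each encoding.
size_t unitsUTF8(string s)   { return s.length; }
size_t unitsUTF16(wstring s) { return s.length; }
size_t unitsUTF32(dstring s) { return s.length; }
```

For "😎" (U+1F60E) this gives 4, 2 (a surrogate pair), and 1 code units respectively.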

[...]
> public struct gudtUGC(typeStringUTF) { /// UniCode grapheme cluster‐aware string manipulation

Style: `typeStringUTF` is a type, so it should start with a capital letter (`TypeStringUTF`).

[...]
>     private size_t pintSequenceCount = cast(size_t) 0;
>     private size_t pintSequenceCurrent = cast(size_t) 0;

Style: There's no need for the casts (throughout).

[...]
>     @safe public typeStringUTF encode() { /// UniCode grapheme cluster to UniCode UTF‐encoded string
> 
>        scope typeStringUTF lstrSequence = null;
[...]
>        return lstrSequence;
> 
>     }

Bug: `scope` makes no sense if you want to return `lstrSequence` (throughout).

>     @safe public typeStringUTF toUTFtake( /// UniCode grapheme cluster to UniCode UTF‐encoded string
>        scope const size_t lintStart,
>        scope const size_t lintCount = cast(size_t) 1
>        ) {

Style: `scope` does nothing on `size_t` parameters (throughout).

[...]
>        if (lintStart <= lintStart + lintCount) {
[...]
>           scope size_t lintRange1 = lintStart - cast(size_t) 1;

Possible bug: Why subtract 1?

>           scope size_t lintRange2 = lintRange1 + lintCount;
> 
>           if (lintRange1 >= cast(size_t) 0 && lintRange2 <= pintSequenceCount) {

Style: The first half of that condition is pointless. `lintRange1` is unsigned, so it will always be greater than or equal to 0. If you want to defend against overflow, you have to do it before subtracting.
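The overflow point can be made concrete. For an unsigned `size_t`, `0 - 1` wraps around to `size_t.max`, and no later `>= 0` test can detect it. The check has to run on the operand first. A sketch with made-up helper names:

```d
/// Unchecked conversion of a one-based position: for 0 the subtraction
/// wraps around to size_t.max, and a later `>= 0` test cannot detect it.
size_t zeroBasedUnchecked(size_t position)
{
    return position - 1;
}

/// Checked version: validate the operand *before* subtracting.
/// Returns true and sets `index` only when 1 <= position <= length.
bool zeroBasedChecked(size_t position, size_t length, ref size_t index)
{
    if (position == 0 || position > length)
        return false;
    index = position - 1;
    return true;
}
```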

[...]
>           }
> 
>        }
[...]
>     }
[...]
>     @safe public typeStringUTF toUTFpadL( /// UniCode grapheme cluster to UniCode UTF‐encoded string
>        scope const size_t lintCount,
>        scope const typeStringUTF lstrPadding = cast(typeStringUTF) r" "

Style: Cast is not needed (throughout).

>        ) {
[...]
>     }
[...]
> }
[...]
July 12, 2021

On Monday, 12 July 2021 at 05:33:22 UTC, ag0aep6g wrote:

> Bug: You mixed up `wstring` and `dstring`. `wstring` is UTF-16. `dstring` is UTF-32.

I can't believe this one ... these lines were introduced almost a week ago, LoL!

> Style: `typeStringUTF` is a type, so it should start with a capital letter (`TypeStringUTF`).

Style is a personal preference. I am not following D style conventions (if any), nor do I follow any other language's style conventions; I have my personal style and I apply it everywhere. I don't think it matters which style you use; what matters in the end is that you adhere to your chosen style all the time, unless, of course, you are contributing to a project that states its own style, in which case there's no choice but to follow it.

> >     private size_t pintSequenceCount = cast(size_t) 0;
> >     private size_t pintSequenceCurrent = cast(size_t) 0;
>
> Style: There's no need for the casts (throughout).

I know. I do it primarily out of muscle memory, and secondly because I try to write code with in mind that someone who doesn't know the language details may be porting it later, so I tend to state the obvious; besides, it won't hurt, and it helps me in many ways.

> >     @safe public typeStringUTF encode() {
> >
> >        scope typeStringUTF lstrSequence = null;
> [...]
> >        return lstrSequence;
> >
> >     }
>
> Bug: `scope` makes no sense if you want to return `lstrSequence` (throughout).

Teach me, please: if I declare a variable right after the function declaration, like this one ... isn't scope its default visibility? My understanding (I'm not quite sure right now whether it's correct) is that everything you declare without explicitly stating its visibility (public/private/whatever) becomes scope, i.e. what many languages call a local variable. What actually is the visibility of lstrSequence without my scope declaration?

> >     @safe public typeStringUTF toUTFtake(
> >        scope const size_t lintStart,
> >        scope const size_t lintCount = cast(size_t) 1
> >        ) {
>
> Style: `scope` does nothing on `size_t` parameters (throughout).

A week ago I was using [in] almost everywhere for parameters; isn't [in] an alias for [scope const]? Did I get it wrong? I'm not talking about style here, I'm talking about functionality that was unexpected (to me).

> >           scope size_t lintRange1 = lintStart - cast(size_t) 1;
> >           scope size_t lintRange2 = lintRange1 + lintCount;
>
> Possible bug: Why subtract 1?

Because ranges are zero-based for their first argument and one-based for their second; i.e. in something[n..m], m should always be one past the last element we want.

> >           if (lintRange1 >= cast(size_t) 0 && lintRange2 <= pintSequenceCount) {
>
> Style: The first half of that condition is pointless. `lintRange1` is unsigned, so it will always be greater than or equal to 0. If you want to defend against overflow, you have to do it before subtracting.

Indeed. I refactored the code (these used to be int parameters) and got stuck in the wrong place!

All in all, thank you very much for your detailed reply; this kind of stuff is what helps me most in understanding the language's nuances :)

July 12, 2021

On Monday, 12 July 2021 at 22:35:27 UTC, someone wrote:

> > Bug: `scope` makes no sense if you want to return `lstrSequence` (throughout).
>
> Teach me please: if I declare a variable right after the function declaration like this one ... ain't scope its default visibility ? I understand (not quite sure whether correct or not right now) that everything you declare without explicitly stating its visibility (public/private/whatever) becomes scope ie: what in many languages are called a local variable. What actually is the visibility of lstrSequence without my scope declaration ?

Local variables don't have a visibility in the sense of public or private. They do have a 'scope' in the general computer science sense, and a variable can be said to be in or out of scope at different points in a program, but this is the case without regard for whether the variable is declared with D's scope. What scope says is https://dlang.org/spec/attribute.html#scope

> For local declarations, scope ... means that the destructor for an object is automatically called when the reference to it goes out of scope.

The value of a normal, non-scope local variable has a somewhat indefinite lifetime: you have to examine the program and think about operations on the variable to be sure about that lifetime. Does it survive the function? Might it die even before the function completes? Does it live until the next GC collection or until the program ends? These are questions you can ask.

For a scope variable, the lifetime of its value ends with the scope of the variable.

Consider:

import std.stdio : writeln, writefln;
import std.conv : to;
import core.memory : pureMalloc, pureFree;

class Noisy {
    static int ids;
    int* id;
    this() {
        id = cast(int*) pureMalloc(int.sizeof);
        *id = ids++;
    }

    ~this() {
        writefln!"[%d] I perish."(*id);
        pureFree(id);
    }
}

Noisy f() {
    scope n = new Noisy;
    return n;
}

void main() {
    scope a = f();
    writeln("Checking a.n...");
    writefln!"a.n = %d"(*a.id);
}

Which has this output on my system:

[0] I perish.
Checking a.n...
Error: program killed by signal 11

Or with -preview=dip1000, this dmd output:

Error: scope variable `n` may not be returned

The lifetime of the Noisy object bound by scope n is the same as the scope of the variable, and the variable goes out of scope when the function returns, so the Noisy object is destroyed at that point.
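For contrast with the crashing example, a sketch of the two cases that do work without -preview=dip1000: returning an ordinary local, which needs no storage class at all, and using `scope` only for an object whose lifetime should end with the function. `Tracked`, `build` and `useTemporary` are made-up names:

```d
/// Hypothetical class whose destructor records that it ran.
class Tracked
{
    static int destroyed;   // thread-local count of destructor calls
    int value = 42;

    ~this() { ++destroyed; }
}

/// An ordinary local needs no storage class to be returned: the string's
/// data lives in GC memory and stays alive as long as it is referenced.
string build()
{
    string result = "abc";
    result ~= "def";
    return result;
}

/// A `scope` class reference: the object is destroyed when `temp` goes out
/// of scope, so only data copied out of it (here an int) may leave the function.
int useTemporary()
{
    scope temp = new Tracked;
    return temp.value;
}
```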

July 12, 2021
On 7/12/21 3:35 PM, someone wrote:

>>> private size_t pintSequenceCurrent = cast(size_t) 0;
>
>> Style: There's no need for the casts (throughout).
>
> [...] besides, it won't hurt, and it helps me in many ways.

I think you are doing it only for literal values but in general, casts can be very cumbersome and harmful.

For example, if we change the parameter from 'int' to 'long', the cast in the function body is a bug to be chased and fixed:

// Used to be 'int arg'
void foo(long arg) {
  // ...
  auto a = cast(int)arg;  // BUG?
  // ...
}

void main() {
  foo(long.max);
}

Ali
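To see what that `// BUG?` line actually does: a cast silently keeps only the low 32 bits, while `std.conv.to` checks the range at run time. A sketch (the function names are made up):

```d
import std.conv : to;

/// Narrowing with a cast: no check, the high bits are simply discarded.
int narrowWithCast(long arg) { return cast(int) arg; }

/// Narrowing with std.conv.to: throws ConvOverflowException if arg does not fit.
int narrowWithTo(long arg) { return arg.to!int; }
```

With `to`, changing the parameter from `int` to `long` turns a silently wrong value into an exception at the point of conversion.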

July 12, 2021

On Monday, 12 July 2021 at 22:35:27 UTC, someone wrote:

> On Monday, 12 July 2021 at 05:33:22 UTC, ag0aep6g wrote:
[...]
> Teach me please: if I declare a variable right after the function declaration like this one ... ain't scope its default visibility ? I understand (not quite sure whether correct or not right now) that everything you declare without explicitly stating its visibility (public/private/whatever) becomes scope ie: what in many languages are called a local variable. What actually is the visibility of lstrSequence without my scope declaration ?

scope is not a visibility level.

lstrSequence is local to the function, so visibility (public, private, ...) doesn't even apply.

Most likely, you don't have any use for scope at the moment. You're obviously not compiling with -preview=dip1000. And neither should you, because the feature is not ready for a general audience yet.

[...]

> > Style: `scope` does nothing on `size_t` parameters (throughout).
>
> A week ago I was using [in] almost everywhere for parameters, ain't [in] an alias for [scope const] ? Did I get it wrong ? I'm not talking style here, I'm talking unexpected (to me) functionality.

I'm not sure where we stand with in, but let's say that it means scope const. The scope part of scope const still does nothing to a size_t. These are all the same: in size_t, const size_t, scope const size_t.

> > >           scope size_t lintRange1 = lintStart - cast(size_t) 1;
> > >           scope size_t lintRange2 = lintRange1 + lintCount;
> >
> > Possible bug: Why subtract 1?
>
> Because ranges are zero-based for their first argument and one-based for their second; ie: something[n..m] where m should always be one-beyond than the one we want.

That doesn't make sense. A length of zero is perfectly fine. It's just an empty range. You're making lintStart one-based for no reason.
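D slices are half-open: `s[n .. m]` starts at zero-based index `n`, stops before `m`, and has `m - n` elements. A zero-based start therefore needs no adjustment, and `n == m` is simply an empty slice. A small sketch over a `dstring` (no grapheme handling; `window` is a made-up name):

```d
/// Returns the `count` elements starting at zero-based index `start`.
dstring window(dstring s, size_t start, size_t count)
{
    return s[start .. start + count];    // half-open: [start, start + count)
}
```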

July 12, 2021

On Monday, 12 July 2021 at 23:18:57 UTC, jfondren wrote:

> On Monday, 12 July 2021 at 22:35:27 UTC, someone wrote:
> > > Bug: `scope` makes no sense if you want to return `lstrSequence` (throughout).
> >
> > Teach me please: if I declare a variable right after the function declaration like this one ... ain't scope its default visibility ? I understand (not quite sure whether correct or not right now) that everything you declare without explicitly stating its visibility (public/private/whatever) becomes scope ie: what in many languages are called a local variable. What actually is the visibility of lstrSequence without my scope declaration ?
>
> Local variables don't have a visibility in the sense of public or private. They do have a 'scope' in the general computer science sense, and a variable can be said to be in or out of scope at different points in a program, but this is the case without regard for whether the variable is declared with D's scope. What scope says is https://dlang.org/spec/attribute.html#scope
>
> > For local declarations, scope ... means that the destructor for an object is automatically called when the reference to it goes out of scope.
>
> The value of a normal, non-scope local variable has a somewhat indefinite lifetime: you have to examine the program and think about operations on the variable to be sure about that lifetime. Does it survive the function? Might it die even before the function completes? Does it live until the next GC collection or until the program ends? These are questions you can ask.
>
> For a scope variable, the lifetime of its value ends with the scope of the variable.

> [...]

Some days ago I assumed scope was, as I previously stated, the default for locals, and explicitly added scope to all my local variables. Soon afterward I ran into a situation that gave me "program killed by signal 11", and I did not fully understand why it was happening at all, because it never occurred to me that it was connected to my earlier scope refactor. Now I understand.

Regarding -preview=dip1000 (and the explicit error message that could have helped me a lot back then): the DMD man page says the preview switch lists upcoming language features, so DIP1000 is, as I glimpsed somewhere some time ago, a D Improvement Proposal ... where are DIPs listed (docs, I mean)?

So, should every local variable within a chunk of code, say a function, be declared without anything else to avoid this type of behavior? I mean, anything in the code that is not private/public/etc.

Or, as I presume, should every local (auxiliary) variable that won't need to survive the function be declared scope, but not the one we are returning ... lstrSequence in my specific case?

Can I declare everything scope within the function and use lstrSequence.dup on the last line instead? dup/idup duplicate the array (the first allowing mutability, the second not), right?

Which of these approaches would you consider best practice if you were directed to explicitly state as much behavior as possible?

Your reply, with this example included, was very illuminating to me, right to the point.

Thanks a lot for your time :) !
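The `.dup`/`.idup` behaviour asked about can be confirmed with a short sketch. It only shows the copying semantics; it says nothing about whether a copy is needed to get data out of a `scope` local (the helper names are made up):

```d
/// `.dup` makes a mutable copy of an array; `.idup` makes an immutable one.
char[] mutableCopy(string s) { return s.dup; }
string immutableCopy(const(char)[] s) { return s.idup; }
```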

July 13, 2021
On Monday, 12 July 2021 at 23:25:13 UTC, Ali Çehreli wrote:
> On 7/12/21 3:35 PM, someone wrote:
>
> >>> private size_t pintSequenceCurrent = cast(size_t) 0;
> >
> >> Style: There's no need for the casts (throughout).
> >
> > [...] besides, it won't hurt, and it helps me in many ways.
>
> I think you are doing it only for literal values but in general, casts can be very cumbersome and harmful.

Cumbersome and harmful ... could you explain?

> For example, if we change the parameter from 'int' to 'long', the cast in the function body is a bug to be chased and fixed:
>
> // Used to be 'int arg'
> void foo(long arg) {
>   // ...
>   auto a = cast(int)arg;  // BUG?
>   // ...
> }

Nope, I would never do such a downcast UNLESS I had previously checked with if () {} that the value is within the int range. I use cast a lot, but mainly because I am used to strongly typed languages, etc. For example, if for whatever reason I have to do:

ushort a = 250;
ubyte b = cast(ubyte) a;

I'll do:

ushort a = 250;
ubyte b = cast(ubyte) 0; /// redundant of course; but we don't have nulls in D for ints so this is muscle-memory
if (a <= 255) { /// or ubyte.max instead of 255 (I think it is possible)
   b = cast(ubyte) a;
}

> void main() {
>   foo(long.max);
> }
>
> Ali
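`ubyte.max` does exist (every integral type has `.min` and `.max`), so the guard from the post can be written without the magic 255. The compiler does not use the `if` condition to narrow the value's range, so the cast inside the branch is still required; initialising `b` with the literal 0, on the other hand, needs no cast. `std.conv.to!ubyte` would be the checked alternative that throws instead of keeping the default. A sketch (`narrowGuarded` is a made-up name):

```d
/// The guarded narrowing from the post, with ubyte.max instead of 255.
ubyte narrowGuarded(ushort a)
{
    ubyte b = 0;              // integer literal that fits: converts implicitly
    if (a <= ubyte.max)
        b = cast(ubyte) a;    // cast still required: the check is not tracked
    return b;
}
```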


July 13, 2021

On Monday, 12 July 2021 at 23:28:29 UTC, ag0aep6g wrote:

> scope is not a visibility level.

Well, that explains why it is not listed among the visibility attributes to begin with, something that at first glance seemed weird to me.

> lstrSequence is local to the function, so visibility (public, private, ...) doesn't even apply.

Doesn't being local imply visibility too, regardless of scope not being a visibility attribute? I mean, scope prevents the variable from leaking outside the function/whatever, and to me that looks like being restricted from being seen from the outside. Please note that I am not making an argument against the implementation; I am just trying to understand why it is not classified as another visibility attribute, given that it has more or less the same concept as a local variable in other languages.

> Most likely, you don't have any use for scope at the moment.

I'm almost sure of it if you say so, given your vast knowledge of D compared to my humble first steps, LoL.

> You're obviously not compiling with -preview=dip1000.

Nope. I didn't even know it existed.

> And neither should you, because the feature is not ready for a general audience yet.

ACK.

> [...]

> > > Style: `scope` does nothing on `size_t` parameters (throughout).
> >
> > A week ago I was using [in] almost everywhere for parameters, ain't [in] an alias for [scope const] ? Did I get it wrong ? I'm not talking style here, I'm talking unexpected (to me) functionality.
>
> I'm not sure where we stand with in

You mean "we" as in the D developers?

> but let's say that it means scope const

I stated this because I read it somewhere in the docs; it was not my own assumption.

> The scope part of scope const still does nothing to a size_t.
> These are all the same:
>
> in size_t
> const size_t
> scope const size_t

OK, so for integers specifically it does nothing. But what about strings and whatever else? I put them there more or less as a general rule, or at least that was the idea when I replaced the in's in the parameters app-wide.

> > > >           scope size_t lintRange1 = lintStart - cast(size_t) 1;
> > > >           scope size_t lintRange2 = lintRange1 + lintCount;
> > >
> > > Possible bug: Why subtract 1?
> >
> > Because ranges are zero-based for their first argument and one-based for their second; ie: something[n..m] where m should always be one-beyond than the one we want.
>
> That doesn't make sense. A length of zero is perfectly fine. It's just an empty range. You're making lintStart one-based for no reason.

For a UDT like mine I think it makes a lot of sense, because when I think of a string and want to chop/count/whatever on it, my mind works one-based, not zero-based. Say I need the b in "abc": my mind works a lot more easily with mid("abc", 2, 1) than with mid("abc", 1, 1). Besides, I am not returning a range or a reference slice into a range or whatever; I am returning a whole new string. If I were returning a range, I would follow common sense, since of course I don't know what will be done with it afterwards.