July 13, 2021

On Monday, 12 July 2021 at 23:45:57 UTC, someone wrote:

> Regarding -preview=dip1000 (and the explicit error description that could have helped me a lot back then): the DMD man page says the preview switch lists upcoming language features, so DIP1000 is something like a D proposal, as I glimpsed somewhere some time ago... Where do DIPs get listed (in the docs, I mean)?

DIPs are handled in this repository:

https://github.com/dlang/DIPs

This is a list of every DIP that is going through or has gone through the review process:

https://github.com/dlang/DIPs/blob/master/DIPs/README.md

DIP1000 is here:

https://github.com/dlang/DIPs/blob/master/DIPs/other/DIP1000.md

But it doesn't describe the actual implementation, as the addendum explains:

https://github.com/dlang/DIPs/blob/master/DIPs/other/DIP1000.md#addendum

I don't know what all the differences are, as I haven't followed it.

> So, every local variable within a chunk of code, say, a function, should be declared without anything else to avoid this type of behavior? I mean, anything in code that is not private/public/etc.

Not "without anything", but without scope, unless you're using -preview=dip1000 or applying it to class references (see below).

> Or, as I presume, every local (meaning auxiliary) variable that won't need to survive the function should be declared scope, but not the one we are returning ... lstrSequence in my specific case?
>
> Can I declare everything scope within and use lstrSequence.dup on the last line instead? dup/idup duplicate the variable (the first allowing mutability while the second does not), right?
>
> Which one of the following approaches would you consider best practice if you were directed to explicitly state as much behavior as possible?

Consider this example, which demonstrates the original purpose of scope prior to DIP 1000:

```d
import std.stdio;

class C {
    int id;
    this(int id) { this.id = id; }
    ~this() { writeln("C destructor #", id); }
}

struct S {
    int id;
    this(int id) { this.id = id; }
    ~this() { writeln("S destructor #", id); }
}

void main()
{
    {
        C c1 = new C(1);
        scope c2 = new C(2);

        S s1 = S(1);
        S* s2 = new S(2);
        scope s3 = new S(3);

        writeln("The inner scope is exiting now.");
    }
    writeln("Main is exiting now.");
}

static ~this() { writeln("The GC will clean up after this point."); }
```

Classes are reference types, and their instances must be allocated. c1 is allocated on the GC heap and lives beyond its scope. Applying the scope attribute to c2 forces its destructor to execute when its scope exits. It is not allocated on the GC heap, but on the stack.

Structs are value types, so s1 is automatically allocated on the stack. Its destructor will always be called when the scope exits. s2 is a pointer to an instance allocated on the GC heap, so that instance's lifetime is managed by the GC and it exists beyond its scope. s3 is also of type S*. The scope attribute has no effect on it, and it is still managed by the GC. If you want stack allocation and RAII destructors for structs, just use the default behavior, as with s1.

You can run it here:
https://run.dlang.io/is/iu7QiO
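A variant of the same experiment that counts destructor calls instead of printing them, so the timing can be checked with asserts. The names (`Tracked`, `Value`, the counters, `destructorsAtScopeExit`) are made up for this sketch, and it only checks the two deterministic cases; when the GC finalizes something like c1 or *s2 is not something to assert on.

```d
import std.stdio : writeln;

// Hypothetical counters: each destructor bumps one so the timing can be asserted.
int classDtors;
int structDtors;

class Tracked {
    ~this() { ++classDtors; }
}

struct Value {
    int id;
    ~this() { ++structDtors; }
}

// Runs one inner block and reports how many destructors had run when it exited.
int[2] destructorsAtScopeExit() {
    classDtors = 0;
    structDtors = 0;
    {
        scope t = new Tracked; // scope class instance: destroyed when this block exits
        Value v = Value(1);    // struct value: destroyed when this block exits
    }
    return [classDtors, structDtors];
}

void main() {
    writeln(destructorsAtScopeExit()); // [1, 1]
}
```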

Someone else will have to explain what DIP 1000 actually does right now (if anyone really knows). What I'm certain about is that it prevents things like this:

```d
int* func() {
    int i = 10;
    int* pi = &i;
    return pi;
}
```

The compiler has always raised an error when it encountered something like return &i, but the above would slip by. With -preview=dip1000, that is also an error. But scope isn't needed on either variable for it to do so.

Beyond that, my knowledge of DIP 1000's implementation is limited. But I do know that scope has no effect on variables with no indirections. It's all about indirections (pointers & references).
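A small sketch of the indirection point, assuming a recent DMD: on a plain int, scope has nothing to restrict, while on a pointer parameter it limits what the function may do with the pointer. The function names are hypothetical.

```d
import std.stdio : writeln;

// No indirections: scope has nothing to restrict, the value is simply copied out.
@safe int copyOut() {
    scope int x = 42;
    return x; // fine with or without -preview=dip1000
}

// A pointer is an indirection, so here scope does matter: in @safe code
// compiled with -preview=dip1000, returning p or storing it in a global
// would be rejected, so this function only reads through it.
@safe int readThrough(scope int* p) {
    return *p;
}

void main() {
    int local = 7;
    writeln(copyOut());           // 42
    writeln(readThrough(&local)); // 7
}
```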

At any rate, DIP 1000 is not yet ready for prime time. Getting it to that state is a current priority of the language maintainers. So for now, you probably just shouldn't worry about scope at all.

July 13, 2021

On Tuesday, 13 July 2021 at 01:03:11 UTC, someone wrote:

> Being local to ... doesn't that imply visibility too, even though scope is not a visibility attribute? I mean, scope prevents the variable from being leaked outside the function/whatever, and to me that seems like preventing it from being seen from the outside. Please note that I am not making an argument against the implementation; I am just trying to understand why it is not classified as another visibility attribute, given that it has more or less the same concept as a local variable in other languages.
>
> OK. Specifically for integers, nothing then. But what about strings and whatever else? I used them more or less as a general rule, or so was the idea when I replaced the in's in the parameters app-wide.

Hopefully, my post above will shed some light on this.

July 13, 2021

On Tuesday, 13 July 2021 at 02:22:46 UTC, Mike Parker wrote:

> On Tuesday, 13 July 2021 at 01:03:11 UTC, someone wrote:
>
>> Being local to ... doesn't that imply visibility too, even though scope is not a visibility attribute? I mean, scope prevents the variable from being leaked outside the function/whatever, and to me that seems like preventing it from being seen from the outside.

And I meant to add... local variables are by default visible only inside the scope in which they are declared and, by extension, any inner scopes within that scope, and can never be visible outside.

```d
{
    // Scope A
    // x can never be visible here
    {
        // Scope B
        int x;
        {
            // Scope C
            // x is visible here
        }
    }
}
```

The only possible use for your concept of scope applying to visibility would be to prevent x from being visible in Scope C. But since we already have the private attribute, it would make more sense to use that instead, e.g., private int x would not be visible in Scope C.

I don't know of any language that has that kind of feature, or if it would even be useful. But at any rate, there's no need for a visibility attribute to prevent outer scopes from seeing a local variable, as that's already impossible.
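The visibility rule can be checked directly. In this sketch (the function name is made up), x is read from a nested block, and a `static assert` confirms that code using x after its block has ended does not compile.

```d
import std.stdio : writeln;

// Hypothetical function mirroring Scopes A/B/C: x is declared in the middle block.
int nestedVisibility() {
    int result;
    {
        int x = 5;
        {
            result = x * 2; // inner block: x is visible here
        }
    }
    // x is out of scope here; `result += x;` would not compile
    return result;
}

// Using x after the block that declared it has ended does not compile.
static assert(!__traits(compiles, { { int x; } x = 1; }));

void main() {
    writeln(nestedVisibility()); // 10
}
```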

July 13, 2021

On Tuesday, 13 July 2021 at 02:22:46 UTC, Mike Parker wrote:

> Hopefully, my post above will shed some light on this.

Yes Mike, a lot.

Your previous example was crystal clear; it makes a lot of sense for some class usage scenarios I have in mind, but not for what I did in my example.

Now I understand a couple of things more clearly. I was using scope thinking it was something else; now, glancing at my code, using scope the way I did is ... pointless, period. I am getting rid of all those statements.

Thanks a lot for your example and the links :) !

July 13, 2021

On Tuesday, 13 July 2021 at 02:34:07 UTC, Mike Parker wrote:

> On Tuesday, 13 July 2021 at 02:22:46 UTC, Mike Parker wrote:
>
>> On Tuesday, 13 July 2021 at 01:03:11 UTC, someone wrote:
>>
>>> Being local to ... doesn't that imply visibility too, even though scope is not a visibility attribute? I mean, scope prevents the variable from being leaked outside the function/whatever, and to me that seems like preventing it from being seen from the outside.
>
> And I meant to add... local variables are by default visible only inside the scope in which they are declared and, by extension, any inner scopes within that scope, and can never be visible outside.
>
> {
>     // Scope A
>     // x can never be visible here
>     {
>         // Scope B
>         int x;
>         {
>             // Scope C
>             // x is visible here
>         }
>     }
> }

Yes. This one I understood from the beginning; it is in Ali's book, and, IIRC, I saw it earlier in Andrei's too.

http://ddili.org/ders/d.en/name_space.html

What I suppose started my confusion was simply the lack of a keyword for it, nothing more; something like: whatever int x; ... It was more about form than concept.

> The only possible use for your concept of scope applying to visibility would be to prevent x from being visible in Scope C. But since we already have the private attribute, it would make more sense to use that instead, e.g., private int x would not be visible in Scope C.

No. My concept is/was the same as the one above. It was form, not function.

> I don't know of any language that has that kind of feature, or if it would even be useful. But at any rate, there's no need for a visibility attribute to prevent outer scopes from seeing a local variable, as that's already impossible.

Me neither.

July 12, 2021
On 7/12/21 5:42 PM, someone wrote:

> On Monday, 12 July 2021 at 23:25:13 UTC, Ali Çehreli wrote:
>> On 7/12/21 3:35 PM, someone wrote:
>>
>> >>> private size_t pintSequenceCurrent = cast(size_t) 0;
>> >
>> >> Style: There's no need for the casts (throughout).
>> >
>> > [...] besides, it won't hurt, and it helps me in many ways.
>>
>> I think you are doing it only for literal values but in general, casts
>> can be very cumbersome and harmful.
>
> Cumbersome and harmful ... could you explain ?

Cumbersome because one has to make sure existing casts are correct after changing a type.

Harmful because it bypasses the compiler's type checking.

>> For example, if we change the parameter from 'int' to 'long', the cast
>> in the function body is a bug to be chased and fixed:
>>
>> // Used to be 'int arg'
>> void foo(long arg) {
>>   // ...
>>   auto a = cast(int)arg;  // BUG?
>>   // ...
>> }
>
> nope, I'll never do such a downcast

The point was, nobody did a downcast in that code. The original parameter was 'int' so cast(int) was "correct" initially. Then somebody changed the parameter to "long" and the cast became potentially harmful.
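To make the scenario concrete, here is a sketch with hypothetical function names, assuming a caller passes a value that only fits in the widened `long` parameter: the leftover `cast(int)` still compiles and silently keeps only the low 32 bits, whereas `std.conv.to` checks the range and throws.

```d
import std.conv : to, ConvOverflowException;
import std.exception : assertThrown;
import std.stdio : writeln;

// The parameter used to be `int arg`; it was later widened to long.
int withCast(long arg) {
    return cast(int) arg;   // still compiles, silently keeps only the low 32 bits
}

int withTo(long arg) {
    return arg.to!int;      // range-checked: throws ConvOverflowException if arg doesn't fit
}

void main() {
    long big = 5_000_000_000;
    writeln(withCast(big));  // 705032704, i.e. 5_000_000_000 - 2^32
    assertThrown!ConvOverflowException(withTo(big));
}
```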

> UNLESS I previously tested with if
> () {} for proper int range; I use cast a lot, but this is mainly because
> I am used to strongly-typed languages etc etc,

Hm. I am used to strongly-typed languages as well and that's exactly why I *avoid* casts as much as possible. :)

> for example if for
> whatever reason I have to:
>
> ushort a = 250;
> ubyte b = cast(ubyte) a;
>
> I'll do:
>
> ushort a = 250;
> ubyte b = cast(ubyte) 0; /// redundant of course; but we don't have

We have a different way of looking at this. :) My first preference would be:

 ubyte b;

This alternative involves less typing than your method and makes the code easier to change, because 'ubyte' appears in only one place. (DRY principle.)

  auto b = ubyte(0);

Another alternative:

  auto b = ubyte.init;
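The three spellings side by side, as a sketch (the function name is made up); each one mentions `ubyte` exactly once and yields 0:

```d
import std.stdio : writeln;

// Hypothetical helper returning the three spellings; each names 'ubyte' once.
ubyte[3] threeZeroes() {
    ubyte b1;              // default initialization with ubyte.init, which is 0
    auto b2 = ubyte(0);    // typed value via constructor syntax
    auto b3 = ubyte.init;  // the type's .init property
    static assert(is(typeof(b2) == ubyte) && is(typeof(b3) == ubyte));
    return [b1, b2, b3];
}

void main() {
    writeln(threeZeroes()); // [0, 0, 0]
}
```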

Ali


July 13, 2021
On 13.07.21 03:03, someone wrote:
> On Monday, 12 July 2021 at 23:28:29 UTC, ag0aep6g wrote:
[...]
>> I'm not sure where we stand with `in`
> 
> You mean *we* = D developers ?

Yes. Let me rephrase and elaborate: I'm not sure what the current status of `in` is. It used to mean `const scope`. But DIP1000 changes the effects of `scope` and there was some discussion about its relation to `in`.

Checking the spec, it says that `in` simply means `const` unless you use `-preview=in`. The preview switch makes it `const scope` again, but that's not all. There's also something about passing by reference.

https://dlang.org/spec/function.html#in-params

[...]
> For a UDT like mine I think it makes a lot of sense, because when I think of a string and want to chop/count/whatever on it, my mind works one-based, not zero-based. Say I need the b in "abc": mid("abc", 2, 1) is a lot easier for my mind than mid("abc", 1, 1). Besides, I am *not* returning a range or a reference slice to a range or whatever; I am returning a whole new string construction. If I were returning a range I would follow common sense, since I don't know what will be done with it thereafter, of course.

I think you're setting yourself up for off-by-one bugs by going against the grain like that. Your functions are one-based. The rest of the D world, including the standard library, is zero-based. You're bound to forget to account for the difference.

But it's your code, and you can do whatever you want, of course. Just looked like it might be a mistake.
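As an illustration of the off-by-one risk: every one-based position has to be translated into D's zero-based, half-open slice, `s[start - 1 .. start - 1 + count]`. The helper below is hypothetical and slices code units, not grapheme clusters, so it only stands in for the idea, not for the grapheme-aware struct.

```d
import std.stdio : writeln;

// Hypothetical one-based helper; slices code units, so only meaningful for ASCII-like text.
string midOneBased(string s, size_t start, size_t count) {
    assert(start >= 1, "positions are one-based");
    immutable first = start - 1;       // translate to a zero-based index
    return s[first .. first + count];  // half-open: the end bound is one past the last element
}

void main() {
    writeln(midOneBased("abc", 2, 1)); // b
    writeln("abc"[1 .. 2]);            // b, the zero-based equivalent
}
```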
July 13, 2021
On Tuesday, 13 July 2021 at 05:26:56 UTC, Ali Çehreli wrote:

> Cumbersome because one has to make sure existing casts are correct after changing a type.

ACK.

> Harmful because it bypasses the compiler's type checking.

Hmmm ... I'll be reconsidering my cast usage approach then.

> >> For example, if we change the parameter from 'int' to 'long', the cast
> >> in the function body is a bug to be chased and fixed:
> >>
> >> // Used to be 'int arg'
> >> void foo(long arg) {
> >>   // ...
> >>   auto a = cast(int)arg;  // BUG?
> >>   // ...
> >> }
> >
> > nope, I'll never do such a downcast
>
> The point was, nobody did a downcast in that code. The original parameter was 'int' so cast(int) was "correct" initially. Then somebody changed the parameter to "long" and the cast became potentially harmful.

ACK.

> > UNLESS I previously tested with if
> > () {} for proper int range; I use cast a lot, but this is mainly because
> > I am used to strongly-typed languages etc etc,
>
> Hm. I am used to strongly-typed languages as well and that's exactly why I *avoid* casts as much as possible. :)
>
> > for example if for
> > whatever reason I have to:
> >
> > ushort a = 250;
> > ubyte b = cast(ubyte) a;
> >
> > I'll do:
> >
> > ushort a = 250;
> > ubyte b = cast(ubyte) 0; /// redundant of course; but we don't have
>
> We have a different way of looking at this. :) My first preference would be:
>
>  ubyte b;
>
> This alternative involves less typing than your method and makes the code easier to change, because 'ubyte' appears in only one place. (DRY principle.)
>
>   auto b = ubyte(0);
>
> Another alternative:
>
>   auto b = ubyte.init;

ACK. I'll be revisiting the whole matter. I just re-read your http://ddili.org/ders/d.en/cast.html chapter. I did not clearly understand the difference between to!(...) and cast(), for example; re-reading the sections on integer promotions and arithmetic conversions refreshed my knowledge on this point.
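On integer promotions: arithmetic on ubyte operands is carried out in int, so narrowing the result back is where `to!` (checked) and `cast` (unchecked) part ways. A minimal sketch with made-up values and a hypothetical helper:

```d
import std.conv : to, ConvOverflowException;
import std.exception : assertThrown;
import std.stdio : writeln;

// Hypothetical helper: ubyte operands are promoted to int, so the sum doesn't wrap.
auto promotedSum(ubyte a, ubyte b) {
    return a + b;
}

static assert(is(typeof(promotedSum(1, 2)) == int));
// Assigning the int result straight back to a ubyte is rejected at compile time.
static assert(!__traits(compiles, { ubyte a = 1, b = 2; ubyte c = promotedSum(a, b); }));

void main() {
    immutable sum = promotedSum(200, 100);
    writeln(sum);                                      // 300
    writeln(cast(ubyte) sum);                          // 44: unchecked, keeps the low 8 bits
    assertThrown!ConvOverflowException(sum.to!ubyte);  // checked: 300 doesn't fit in a ubyte
}
```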

> Ali


July 13, 2021
On Tuesday, 13 July 2021 at 05:37:49 UTC, ag0aep6g wrote:
> On 13.07.21 03:03, someone wrote:
>> On Monday, 12 July 2021 at 23:28:29 UTC, ag0aep6g wrote:
> [...]
>>> I'm not sure where we stand with `in`
>> 
>> You mean *we* = D developers ?
>
> Yes. Let me rephrase and elaborate: I'm not sure what the current status of `in` is. It used to mean `const scope`. But DIP1000 changes the effects of `scope` and there was some discussion about its relation to `in`.
>
> Checking the spec, it says that `in` simply means `const` unless you use `-preview=in`. The preview switch makes it `const scope` again, but that's not all. There's also something about passing by reference.
>
> https://dlang.org/spec/function.html#in-params

ACK. So for the time being I'll be reverting all my input parameters to const (unless they are ref or out, of course), and when the whole in DIP matter is resolved (one way or the other) I'll change them back (or not) accordingly. Parameters declared in read more naturally than const (and mirror out nicely), but that is form, not function, and function is what I need to get right right now.

>> For a UDT like mine I think it makes a lot of sense, because when I think of a string and want to chop/count/whatever on it, my mind works one-based, not zero-based. Say I need the b in "abc": mid("abc", 2, 1) is a lot easier for my mind than mid("abc", 1, 1). Besides, I am *not* returning a range or a reference slice to a range or whatever; I am returning a whole new string construction. If I were returning a range I would follow common sense, since I don't know what will be done with it thereafter, of course.
>
> I think you're setting yourself up for off-by-one bugs by going against the grain like that. Your functions are one-based. The rest of the D world, including the standard library, is zero-based. You're bound to forget to account for the difference.

And I think you have a good point. I'll reconsider.

> But it's your code, and you can do whatever you want, of course. Just looked like it might be a mistake.

All in all, the whole module was updated accordingly and it seems to be working as expected (further testing needed); in the meantime, I learned a lot by following the advice given by you, Ali, and others in this forum:

```d
/// implementation-bugs [-] using foreach (with this structure) always misses the last grapheme‐cluster … possible phobos bug # 20483 @ unittest's last line

/// implementation‐tasks [+] reconsider making this whole UDT zero‐based as suggested by ag0aep6g; he has a good point
/// implementation‐tasks [+] reconsider excessive cast usage as suggested by Ali: bypassing compiler checks could be potentially harmful … cast and integer promotion @ http://ddili.org/ders/d.en/cast.html
/// implementation‐tasks [-] for the time being input parameters are declared const instead of in; eventually they'll go back to in once the related DIP is settled for good; but definitely not scope const

/// implementation‐tasks‐possible [-] pad[L|R]
/// implementation‐tasks‐possible [-] replicate/repeat
/// implementation‐tasks‐possible [-] replace(string, string)
/// implementation‐tasks‐possible [-] translate(string, string) … same‐size strings matching one‐to‐one

/// usage: array slicing can be used for usual things like: left() right() substr() etc … mainly when grapheme‐clusters are not expected at all
/// usage: array slicing needs a zero‐based first range argument and a second one that is one‐based (i.e., one past the end), which is somewhat counter‐intuitive

module fw.types.UniCode;

import std.algorithm : map, joiner;
import std.array : array;
import std.conv : to;
import std.range : walkLength, take, tail, drop, dropBack; /// repeat, padLeft, padRight
import std.stdio;
import std.uni : Grapheme, byGrapheme;

/// within this file: gudtUGC



shared static this() { }  /// the following will be executed only‐once per‐app:
static this() { }         /// the following will be executed only‐once per‐thread:
static ~this() { }        /// the following will be executed only‐once per‐thread:
shared static ~this() { } /// the following will be executed only‐once per‐app:



alias stringUGC = Grapheme;
alias stringUGC08 = gudtUGC!(stringUTF08);
alias stringUGC16 = gudtUGC!(stringUTF16);
alias stringUGC32 = gudtUGC!(stringUTF32);
alias stringUTF08 = string;  /// same as immutable(char )[];
alias stringUTF16 = wstring; /// same as immutable(wchar)[];
alias stringUTF32 = dstring; /// same as immutable(dchar)[];

/// mixin templateUGC!(stringUTF08, r"gudtUGC08"d);
/// mixin templateUGC!(stringUTF16, r"gudtUGC16"d);
/// mixin templateUGC!(stringUTF32, r"gudtUGC32"d);
/// template templateUGC (typeStringUTF, alias lstrStructureID) { /// if these were possible there would be no need for stringUGC## aliases in main()

public struct gudtUGC(typeStringUTF) { /// UniCode grapheme‐cluster‐aware string manipulation (implemented for one‐based operations)

   /// provides: public property size_t count

   /// provides: public size_t decode(typeStringUTF strSequence)
   /// provides: public typeStringUTF encode()

   /// provides: public gudtUGC!(typeStringUTF) take(size_t intStart, size_t intCount = 1)
   /// provides: public gudtUGC!(typeStringUTF) takeL(size_t intCount)
   /// provides: public gudtUGC!(typeStringUTF) takeR(size_t intCount)
   /// provides: public gudtUGC!(typeStringUTF) chopL(size_t intCount)
   /// provides: public gudtUGC!(typeStringUTF) chopR(size_t intCount)
   /// provides: public gudtUGC!(typeStringUTF) padL(size_t intCount, typeStringUTF strPadding = r" ")
   /// provides: public gudtUGC!(typeStringUTF) padR(size_t intCount, typeStringUTF strPadding = r" ")

   /// provides: public typeStringUTF takeasUTF(size_t intStart, size_t intCount = 1)
   /// provides: public typeStringUTF takeLasUTF(size_t intCount)
   /// provides: public typeStringUTF takeRasUTF(size_t intCount)
   /// provides: public typeStringUTF chopLasUTF(size_t intCount)
   /// provides: public typeStringUTF chopRasUTF(size_t intCount)
   /// provides: public typeStringUTF padLasUTF(size_t intCount, typeStringUTF strPadding = r" ")
   /// provides: public typeStringUTF padRasUTF(size_t intCount, typeStringUTF strPadding = r" ")

   /// usage; eg: stringUGC32("äëåčñœß … russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese"d).take(35, 3).take(1,2).take(1,1).encode(); /// 日
   /// usage; eg: stringUGC32("äëåčñœß … russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese"d).take(35).encode(); /// 日
   /// usage; eg: stringUGC32("äëåčñœß … russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese"d).takeasUTF(35); /// 日

   void popFront() { ++pintSequenceCurrent; }
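   bool empty() { return pintSequenceCurrent == 0 || pintSequenceCurrent > pintSequenceCount; } /// one‐based: exhausted only once the last position (pintSequenceCount) has been read; 0 means nothing was decoded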
   typeStringUTF front() { return takeasUTF(pintSequenceCurrent); }

   private stringUGC[] pugcSequence;
   private size_t pintSequenceCount = cast(size_t) 0;
   private size_t pintSequenceCurrent = cast(size_t) 0;

   @property public size_t count() { return pintSequenceCount; }

   this(
      const typeStringUTF lstrSequence
      ) {

      /// (1) given UTF‐encoded sequence

      decode(lstrSequence);

   }

   @safe public size_t decode( /// UniCode (UTF‐encoded → grapheme‐cluster) sequence
      const typeStringUTF lstrSequence
      ) {

      /// (1) given UTF‐encoded sequence

      size_t lintSequenceCount = cast(size_t) 0;

      if (lstrSequence is null) {

         pugcSequence = null;
         pintSequenceCount = cast(size_t) 0;
         pintSequenceCurrent = cast(size_t) 0;

      } else {

         pugcSequence = lstrSequence.byGrapheme.array;
         pintSequenceCount = pugcSequence.walkLength;
         pintSequenceCurrent = cast(size_t) 1;

         lintSequenceCount = pintSequenceCount;

      }

      return lintSequenceCount;

   }

   @safe public typeStringUTF encode() { /// UniCode (grapheme‐cluster → UTF‐encoded) sequence

      typeStringUTF lstrSequence = null;

      if (pintSequenceCount >= cast(size_t) 1) {

         lstrSequence = pugcSequence
            .map!((ref g) => g[])
            .joiner
            .to!(typeStringUTF)
            ;

      }

      return lstrSequence;

   }

   @safe public gudtUGC!(typeStringUTF) take( /// UniCode (grapheme‐cluster → grapheme‐cluster) sequence
      const size_t lintStart,
      const size_t lintCount = cast(size_t) 1
      ) {

      /// (1) given start position >= 1
      /// (2) given count >= 1

      gudtUGC!(typeStringUTF) lugcSequence;

      if (lintStart >= cast(size_t) 1 && lintCount >= cast(size_t) 1) {

         /// eg#1: take(1,3) → range#1=start-1=1-1=0 and range#2=range#1+count=0+3=3 → 0..3
         /// eg#1: take(6,3) → range#1=start-1=6-1=5 and range#2=range#1+count=5+3=8 → 5..8

         /// eg#2: take(01,1) → range#1=start-1=01-1=00 and range#2=range#1+count=00+1=01 → 00..01
         /// eg#2: take(50,1) → range#1=start-1=50-1=49 and range#2=range#1+count=49+1=50 → 49..50

         size_t lintRange1 = lintStart - cast(size_t) 1;
         size_t lintRange2 = lintRange1 + lintCount;

         if (lintRange2 <= pintSequenceCount) {

            lugcSequence = gudtUGC!(typeStringUTF)(pugcSequence[lintRange1..lintRange2]
               .map!((ref g) => g[])
               .joiner
               .to!(typeStringUTF)
               );

         }

      }

      return lugcSequence;

   }

   @safe public gudtUGC!(typeStringUTF) takeL( /// UniCode (grapheme‐cluster → grapheme‐cluster) sequence
      const size_t lintCount
      ) {

      /// (1) given count >= 1

      gudtUGC!(typeStringUTF) lugcSequence;

      if (lintCount >= cast(size_t) 1 && lintCount <= pintSequenceCount) {

         lugcSequence = gudtUGC!(typeStringUTF)(pugcSequence
            .take(lintCount)
            .map!((ref g) => g[])
            .joiner
            .to!(typeStringUTF)
            );

      }

      return lugcSequence;

   }

   @safe public gudtUGC!(typeStringUTF) takeR( /// UniCode (grapheme‐cluster → grapheme‐cluster) sequence
      const size_t lintCount
      ) {

      /// (1) given count >= 1

      gudtUGC!(typeStringUTF) lugcSequence;

      if (lintCount >= cast(size_t) 1 && lintCount <= pintSequenceCount) {

         lugcSequence = gudtUGC!(typeStringUTF)(pugcSequence
            .tail(lintCount)
            .map!((ref g) => g[])
            .joiner
            .to!(typeStringUTF)
            );

      }

      return lugcSequence;

   }

   @safe public gudtUGC!(typeStringUTF) chopL( /// UniCode (grapheme‐cluster → grapheme‐cluster) sequence
      const size_t lintCount
      ) {

      /// (1) given count >= 1

      gudtUGC!(typeStringUTF) lugcSequence;

      if (lintCount >= cast(size_t) 1 && lintCount <= pintSequenceCount) {

         lugcSequence = gudtUGC!(typeStringUTF)(pugcSequence
            .drop(lintCount)
            .map!((ref g) => g[])
            .joiner
            .to!(typeStringUTF)
            );

      }

      return lugcSequence;

   }

   @safe public gudtUGC!(typeStringUTF) chopR( /// UniCode (grapheme‐cluster → grapheme‐cluster) sequence
      const size_t lintCount
      ) {

      /// (1) given count >= 1

      gudtUGC!(typeStringUTF) lugcSequence;

      if (lintCount >= cast(size_t) 1 && lintCount <= pintSequenceCount) {

         lugcSequence = gudtUGC!(typeStringUTF)(pugcSequence
            .dropBack(lintCount)
            .map!((ref g) => g[])
            .joiner
            .to!(typeStringUTF)
            );

      }

      return lugcSequence;

   }

   @safe public typeStringUTF takeasUTF( /// UniCode (grapheme‐cluster → UTF‐encoded) sequence
      const size_t lintStart,
      const size_t lintCount = cast(size_t) 1
      ) {

      /// (1) given start position >= 1
      /// (2) given count >= 1

      typeStringUTF lstrSequence = null;

      if (lintStart >= cast(size_t) 1 && lintCount >= cast(size_t) 1) {

         /// eg#1: takeasUTF(1,3) → range#1=start-1=1-1=0 and range#2=range#1+count=0+3=3 → 0..3
         /// eg#1: takeasUTF(6,3) → range#1=start-1=6-1=5 and range#2=range#1+count=5+3=8 → 5..8

         /// eg#2: takeasUTF(01,1) → range#1=start-1=01-1=00 and range#2=range#1+count=00+1=01 → 00..01
         /// eg#2: takeasUTF(50,1) → range#1=start-1=50-1=49 and range#2=range#1+count=49+1=50 → 49..50

         size_t lintRange1 = lintStart - cast(size_t) 1;
         size_t lintRange2 = lintRange1 + lintCount;

         if (lintRange2 <= pintSequenceCount) {

            lstrSequence = pugcSequence[lintRange1..lintRange2]
               .map!((ref g) => g[])
               .joiner
               .to!(typeStringUTF)
               ;

         }

      }

      return lstrSequence;

   }

   @safe public typeStringUTF takeLasUTF( /// UniCode (grapheme‐cluster → UTF‐encoded) sequence
      const size_t lintCount
      ) {

      /// (1) given count >= 1

      typeStringUTF lstrSequence = null;

      if (lintCount >= cast(size_t) 1 && lintCount <= pintSequenceCount) {

         lstrSequence = pugcSequence
            .take(lintCount)
            .map!((ref g) => g[])
            .joiner
            .to!(typeStringUTF)
            ;

      }

      return lstrSequence;

   }

   @safe public typeStringUTF takeRasUTF( /// UniCode (grapheme‐cluster → UTF‐encoded) sequence
      const size_t lintCount
      ) {

      /// (1) given count >= 1

      typeStringUTF lstrSequence = null;

      if (lintCount >= cast(size_t) 1 && lintCount <= pintSequenceCount) {

         lstrSequence = pugcSequence
            .tail(lintCount)
            .map!((ref g) => g[])
            .joiner
            .to!(typeStringUTF)
            ;

      }

      return lstrSequence;

   }

   @safe public typeStringUTF chopLasUTF( /// UniCode (grapheme‐cluster → UTF‐encoded) sequence
      const size_t lintCount
      ) {

      /// (1) given count >= 1

      typeStringUTF lstrSequence = null;

      if (lintCount >= cast(size_t) 1 && lintCount <= pintSequenceCount) {

         lstrSequence = pugcSequence
            .drop(lintCount)
            .map!((ref g) => g[])
            .joiner
            .to!(typeStringUTF)
            ;

      }

      return lstrSequence;

   }

   @safe public typeStringUTF chopRasUTF( /// UniCode (grapheme‐cluster → UTF‐encoded) sequence
      const size_t lintCount
      ) {

      /// (1) given count >= 1

      typeStringUTF lstrSequence = null;

      if (lintCount >= cast(size_t) 1 && lintCount <= pintSequenceCount) {

         lstrSequence = pugcSequence
            .dropBack(lintCount)
            .map!((ref g) => g[])
            .joiner
            .to!(typeStringUTF)
            ;

      }

      return lstrSequence;

   }

   @safe public typeStringUTF padLasUTF( /// UniCode (grapheme‐cluster → UTF‐encoded) sequence
      const size_t lintCount,
      const typeStringUTF lstrPadding = cast(typeStringUTF) r" "
      ) {

      /// (1) given count >= 1
      /// [2] given padding (default is a single blank space)

      typeStringUTF lstrSequence = null;

      if (lintCount >= cast(size_t) 1 && lintCount > pintSequenceCount) {

         lstrSequence = null; /// pending

      }

      return lstrSequence;

   }

   @safe public typeStringUTF padRasUTF( /// UniCode (grapheme‐cluster → UTF‐encoded) sequence
      const size_t lintCount,
      const typeStringUTF lstrPadding = cast(typeStringUTF) r" "
      ) {

      /// (1) given count >= 1
      /// [2] given padding (default is a single blank space)

      typeStringUTF lstrSequence = null;

      if (lintCount >= cast(size_t) 1 && lintCount > pintSequenceCount) {

         lstrSequence = null; /// pending

      }

      return lstrSequence;

   }

}

unittest {

   version (useUTF08) {
   stringUTF08 lstrSequence1 = r"12345678901234567890123456789012345678901234567890"c;
   stringUTF08 lstrSequence2 = r"1234567890АВГДЕЗИЙКЛABCDEFGHIJabcdefghijQRSTUVWXYZ"c;
   stringUTF08 lstrSequence3 = "äëåčñœß … russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese 😎"c;
   }

   version (useUTF16) {
   stringUTF16 lstrSequence1 = r"12345678901234567890123456789012345678901234567890"w;
   stringUTF16 lstrSequence2 = r"1234567890АВГДЕЗИЙКЛABCDEFGHIJabcdefghijQRSTUVWXYZ"w;
   stringUTF16 lstrSequence3 = "äëåčñœß … russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese 😎"w;
   }

   version (useUTF32) {
   stringUTF32 lstrSequence1 = r"12345678901234567890123456789012345678901234567890"d;
   stringUTF32 lstrSequence2 = r"1234567890АВГДЕЗИЙКЛABCDEFGHIJabcdefghijQRSTUVWXYZ"d;
   stringUTF32 lstrSequence3 = "äëåčñœß … russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese 😎"d;
   }

   size_t lintSequence1sizeUTF = lstrSequence1.length;
   size_t lintSequence2sizeUTF = lstrSequence2.length;
   size_t lintSequence3sizeUTF = lstrSequence3.length;

   size_t lintSequence1sizeUGA = lstrSequence1.walkLength;
   size_t lintSequence2sizeUGA = lstrSequence2.walkLength;
   size_t lintSequence3sizeUGA = lstrSequence3.walkLength;

   size_t lintSequence1sizeUGC = lstrSequence1.byGrapheme.walkLength;
   size_t lintSequence2sizeUGC = lstrSequence2.byGrapheme.walkLength;
   size_t lintSequence3sizeUGC = lstrSequence3.byGrapheme.walkLength;

   assert(lintSequence1sizeUGC == cast(size_t) 50);
   assert(lintSequence2sizeUGC == cast(size_t) 50);
   assert(lintSequence3sizeUGC == cast(size_t) 50);

   assert(lintSequence1sizeUGA == cast(size_t) 50);
   assert(lintSequence2sizeUGA == cast(size_t) 50);
   assert(lintSequence3sizeUGA == cast(size_t) 52);

   version (useUTF08) {
   assert(lintSequence1sizeUTF == cast(size_t) 50);
   assert(lintSequence2sizeUTF == cast(size_t) 60);
   assert(lintSequence3sizeUTF == cast(size_t) 91);
   }

   version (useUTF16) {
   assert(lintSequence1sizeUTF == cast(size_t) 50);
   assert(lintSequence2sizeUTF == cast(size_t) 50);
   assert(lintSequence3sizeUTF == cast(size_t) 57);
   }

   version (useUTF32) {
   assert(lintSequence1sizeUTF == cast(size_t) 50);
   assert(lintSequence2sizeUTF == cast(size_t) 50);
   assert(lintSequence3sizeUTF == cast(size_t) 52);
   }

   /// the following should be the same regardless of the encoding being used and is the whole point of this UDT being made:

   version (useUTF08) { alias stringUTF = stringUTF08; stringUGC08 lugcSequence3 = stringUGC08(lstrSequence3); }
   version (useUTF16) { alias stringUTF = stringUTF16; stringUGC16 lugcSequence3 = stringUGC16(lstrSequence3); }
   version (useUTF32) { alias stringUTF = stringUTF32; stringUGC32 lugcSequence3 = stringUGC32(lstrSequence3); }

   assert(lugcSequence3.encode() == lstrSequence3);

   assert(lugcSequence3.take(35, 3).take(1,2).take(1,1).encode() == cast(stringUTF) r"日");

   assert(lugcSequence3.take(21).encode() == cast(stringUTF) r"р");
   assert(lugcSequence3.take(27).encode() == cast(stringUTF) r"й");
   assert(lugcSequence3.take(35).encode() == cast(stringUTF) r"日");
   assert(lugcSequence3.take(37).encode() == cast(stringUTF) r"語");
   assert(lugcSequence3.take(21, 7).encode() == cast(stringUTF) r"русский");
   assert(lugcSequence3.take(35, 3).encode() == cast(stringUTF) r"日本語");

   assert(lugcSequence3.takeasUTF(21) == cast(stringUTF) r"р");
   assert(lugcSequence3.takeasUTF(27) == cast(stringUTF) r"й");
   assert(lugcSequence3.takeasUTF(35) == cast(stringUTF) r"日");
   assert(lugcSequence3.takeasUTF(37) == cast(stringUTF) r"語");
   assert(lugcSequence3.takeasUTF(21, 7) == cast(stringUTF) r"русский");
   assert(lugcSequence3.takeasUTF(35, 3) == cast(stringUTF) r"日本語");

   assert(lugcSequence3.takeL(1).encode() == cast(stringUTF) r"ä");
   assert(lugcSequence3.takeR(1).encode() == cast(stringUTF) r"😎");
   assert(lugcSequence3.takeL(7).encode() == cast(stringUTF) r"äëåčñœß");
   assert(lugcSequence3.takeR(16).encode() == cast(stringUTF) r"日本語 = japanese 😎");

   assert(lugcSequence3.takeLasUTF(1) == cast(stringUTF) r"ä");
   assert(lugcSequence3.takeRasUTF(1) == cast(stringUTF) r"😎");
   assert(lugcSequence3.takeLasUTF(7) == cast(stringUTF) r"äëåčñœß");
   assert(lugcSequence3.takeRasUTF(16) == cast(stringUTF) r"日本語 = japanese 😎");

   assert(lugcSequence3.chopL(10).encode() == cast(stringUTF) r"russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese 😎");
   assert(lugcSequence3.chopR(21).encode() == cast(stringUTF) r"äëåčñœß … russian = русский 🇷🇺");

   assert(lugcSequence3.chopLasUTF(10) == cast(stringUTF) r"russian = русский 🇷🇺 ≠ 🇯🇵 日本語 = japanese 😎");
   assert(lugcSequence3.chopRasUTF(21) == cast(stringUTF) r"äëåčñœß … russian = русский 🇷🇺");

   version (useUTF08) { stringUTF08 lstrSequence3reencoded; }
   version (useUTF16) { stringUTF16 lstrSequence3reencoded; }
   version (useUTF32) { stringUTF32 lstrSequence3reencoded; }

   for (
      size_t lintSequenceUGC = cast(size_t) 1;
      lintSequenceUGC <= lintSequence3sizeUGC;
      ++lintSequenceUGC
      ) {

      lstrSequence3reencoded ~= lugcSequence3.takeasUTF(lintSequenceUGC);

   }

   assert(lstrSequence3reencoded == lstrSequence3);

   lstrSequence3reencoded = null;

   version (useUTF08) { foreach (stringUTF08 lstrSequence3UGC; lugcSequence3) { lstrSequence3reencoded ~= lstrSequence3UGC; } }
   version (useUTF16) { foreach (stringUTF16 lstrSequence3UGC; lugcSequence3) { lstrSequence3reencoded ~= lstrSequence3UGC; } }
   version (useUTF32) { foreach (stringUTF32 lstrSequence3UGC; lugcSequence3) { lstrSequence3reencoded ~= lstrSequence3UGC; } }

   //assert(lstrSequence3reencoded == lstrSequence3); /// ooops … always missing last grapheme‐cluster: possible bug # 20483

}
```
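About the foreach note at the top of the module (the last grapheme cluster always missing): with one-based positions, position count is the last valid one, so a range is exhausted only once the position has moved past it. An `empty()` that compares with `==` stops one element early, which matches the symptom, so the cause may well be in the range primitives rather than in Phobos. A reduced sketch over plain ints; both structs are hypothetical:

```d
import std.stdio : writeln;

// Hypothetical one-based range: positions run 1 .. items.length.
struct OneBased {
    int[] items;
    size_t current = 1;    // one-based position of the next element

    bool empty() const { return current > items.length; } // past the last position
    int front() const { return items[current - 1]; }       // translate to a zero-based index
    void popFront() { ++current; }
}

// Same range with an `==` test: it reports empty at the last position, one element early.
struct OneBasedEarlyStop {
    int[] items;
    size_t current = 1;

    bool empty() const { return current == items.length; }
    int front() const { return items[current - 1]; }
    void popFront() { ++current; }
}

int[] collect(R)(R r) {
    int[] result;
    foreach (x; r) result ~= x;
    return result;
}

void main() {
    writeln(collect(OneBased([10, 20, 30])));          // [10, 20, 30]
    writeln(collect(OneBasedEarlyStop([10, 20, 30]))); // [10, 20]
}
```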