August 02, 2019
As mentioned in a previous post, this project has received hardly any attention. And since there was nothing to do, I did not post weekly.

=== Current State ===

--- Dmem* utilities ---

Fortunately, Nicholas Wilson has spent the last week helping me get my 2 Dmem*
PRs merged [1], [2].

I don't know of anything that these PRs still need, although I may have done something wrong in the documentation.

I don't know if / when they will get merged since they're awaiting review.
I hope to have enough reviews to merge at least memset in the next 2.5 weeks.
And again, thanks a lot Nicholas for your time.

--- core.thread ---

Since there was nothing to do, I asked if there was anything that I could
do in the meantime. It was proposed that I could refactor core.thread.
With some help from Nicholas, I made a PR [3].
I'm glad that people seem to care about this change. It's going well, I think.

=== Final 2.5 weeks ===

I honestly have no idea. Ideally, I would PR memmove() as well, but I think it's
better to try to get at least one of the other 2 PRs merged first.
Other than that, if the core.thread PR gets merged, I will finish that refactoring.

One thing I proposed for the time remaining is a cross-compiler SIMD module.
I will write about that in a separate thread, but the idea came from the fact
that, when writing the Dmem* utils, I could not find a way to use SIMD intrinsics
across compilers. So, I created something like a small SIMD library [4].
It is of course not really general, but it shows the idea.
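
To make the cross-compiler idea a bit more concrete, here is a minimal sketch. It is not the code from [4]; it takes a simpler route than per-compiler intrinsics and assumes that the generic `Vector` template from core.simd is available on the host compiler, falling back to scalar code otherwise. The function name `addFloats` is made up for illustration.

```d
import core.simd;

/// Hypothetical helper: element-wise dst = a + b.
/// All three slices are assumed to have the same length.
void addFloats(float[] dst, const(float)[] a, const(float)[] b)
{
    assert(dst.length == a.length && a.length == b.length);
    size_t i = 0;

    // Assumption: if the target supports 4-float vectors, the same
    // source compiles under DMD, LDC and GDC without compiler-specific
    // intrinsics. Otherwise this branch is compiled out.
    static if (is(Vector!(float[4])))
    {
        alias F4 = Vector!(float[4]);
        for (; i + 4 <= dst.length; i += 4)
        {
            F4 va, vb;
            // Go through .ptr so that no alignment is assumed for the slices.
            va.ptr[0 .. 4] = a[i .. i + 4];
            vb.ptr[0 .. 4] = b[i .. i + 4];
            F4 vr = va + vb;          // one vector add instead of four scalar adds
            dst[i .. i + 4] = vr.ptr[0 .. 4];
        }
    }

    // Scalar tail, which is also the whole implementation without vector support.
    for (; i < dst.length; ++i)
        dst[i] = a[i] + b[i];
}
```

A real module would need more than this (unaligned loads and stores, shuffles and so on), and that is where the compilers diverge.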

[1] memset: https://github.com/dlang/druntime/pull/2662
[2] memcpy: https://github.com/dlang/druntime/pull/2687
[3] core.thread: https://github.com/dlang/druntime/pull/2689
[4] Mini SIMD module: https://github.com/dlang/druntime/pull/2687/files#diff-c2fcd73761ae6659ef91245ce1195b6d
September 05, 2019
On Friday, 2 August 2019 at 14:51:25 UTC, Stefanos Baziotis wrote:
> [...]

Is this project dead in the water? Great, another dead project in the graveyard of dead projects.
September 05, 2019
On Thursday, 5 September 2019 at 15:53:20 UTC, 12345swordy wrote:
> Is this project dead in the water? Great, another dead project in the graveyard of dead projects.

A dead project is a project that hasn't achieved its goals. This project
achieved its goals twice, but both times the goals turned out not to be useful.
I explained that in the other thread [1].

Let's please only concern ourselves with constructive discussions from now on.

- Stefanos

[1] https://forum.dlang.org/post/triweshixkzzyxnaldlj@forum.dlang.org
September 05, 2019
On Thursday, 5 September 2019 at 16:26:08 UTC, Stefanos Baziotis wrote:
> On Thursday, 5 September 2019 at 15:53:20 UTC, 12345swordy wrote:
>> Is this project dead in the water? Great, another dead project in the graveyard of dead projects.
>
> A dead project is a project that hasn't achieved its goals. This project
> achieved its goals twice, but both times the goals turned out not to be useful.
> I explained that in the other thread [1].
>
> Let's please only concern ourselves with constructive discussions from now on.
>
> - Stefanos
>
> [1] https://forum.dlang.org/post/triweshixkzzyxnaldlj@forum.dlang.org

Is the implementation of memory allocation of the C standard library ever going to be achieved?

- Alex
September 05, 2019
On Thursday, 5 September 2019 at 17:30:54 UTC, 12345swordy wrote:
>
> Is the implementation of memory allocation of the C standard library ever going to be achieved?
>
> - Alex

It depends on what you mean by "achieved". Let me ask some questions:
- Why do you want that memory allocator?
- What should this allocator be able to achieve?
- Why is the libc one not appropriate for the job?
- Why is no other allocator appropriate for the job?
- Can we create and maintain this allocator?

These questions are asked humbly, and they are important.
The fact that I did not pose and answer such questions, first of all _to myself_,
for the first part of the project meant that I did the project twice,
yet all this work was just thrown away as far as the D community is concerned.

- Stefanos


September 05, 2019
On Thursday, 5 September 2019 at 17:30:54 UTC, 12345swordy wrote:
> On Thursday, 5 September 2019 at 16:26:08 UTC, Stefanos
>
> Is the implementation of memory allocation of the C standard library ever going to be achieved?
>
> - Alex

> ... mimalloc() of Microsoft:
> https://forum.dlang.org/thread/krsnngbaudausabfsqkn@forum.dlang.org
September 05, 2019
On Thursday, 5 September 2019 at 17:56:07 UTC, Stefanos Baziotis wrote:
> On Thursday, 5 September 2019 at 17:30:54 UTC, 12345swordy wrote:
>>
>> Is the implementation of memory allocation of the C standard library ever going to be achieved?
>>
>> - Alex
>
> It depends on what you mean by "achieved". Let me ask some questions:
> - Why do you want that memory allocator?
> - What should this allocator be able to achieve?
> - Why is the libc one not appropriate for the job?
> - Why is no other allocator appropriate for the job?
> - Can we create and maintain this allocator?
>
> These questions are asked humbly, and they are important.
> The fact that I did not pose and answer such questions, first of all _to myself_,
> for the first part of the project meant that I did the project twice,
> yet all this work was just thrown away as far as the D community is concerned.
>
> - Stefanos

- It is easier to debug and read in the D language than in the C language.
- I was shown faster memory allocation speed compared to libc.
- Other memory allocators are not part of the D language standard library.

Most importantly, this is yet another disappointing development I have seen in regard to the development of the D language.

- Alex

September 05, 2019
On Thu, Sep 05, 2019 at 08:16:24PM +0000, 12345swordy via Digitalmars-d wrote: [...]
> - It is easier to debug and read in the D language than in the C language.
> - I was shown faster memory allocation speed compared to libc.
> - Other memory allocators are not part of the D language standard library.
>
> Most importantly, this is yet another disappointing development I have seen in regard to the development of the D language.
[...]

Read the discussion that Stefanos referred to. Here are some of the key blocking issues:

- C library APIs like memcpy, memset, etc., are not only in the C
  library, but are often implemented as *intrinsics* in compilers. One
  of the most important effects of this is that optimizers recognize
  them and understand their semantics, and can sometimes produce better
  code because of that. For example:

	int x, y=5;
	memcpy(&x, &y, int.sizeof); // C version
	... // optimizer knows that now x==5.

  Using a D version of memcpy in the above code can mean that the
  optimizer does *not* recognize that x==5, which can lead to poorer
  performance.

- Even if the previous point isn't an issue, there's still the problem
  of maintenance: the D version of mem* needs to be continuously updated
  because hardware is constantly evolving, and it takes significant
  manpower to (1) port the implementation to every supported
  architecture, (2) make sure they take maximum advantage of the quirks
  of the targeted platform, and (3) check that they are actually
  faster than the C implementations (which are available on basically any
  new platform anyway).

- D already has syntax for abstractly representing a memcpy operation:
  a[] = b[]. This syntax is type-safe, memory-safe, and the compiler can
  lower it to whatever it likes, including memcpy, or a custom
  implementation specialized for the target platform. That's where such
  primitives really belong, actually. (Historically they went into the C
  library, but these days compilers are more and more building them into
  intrinsics that can drive various codegen strategies (inlining,
  arch-specific optimizations, etc). They're gradually becoming more
  like low-level compiler primitives than your average C library
  functions.)
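
To illustrate the first point, here is a small sketch comparing a call to the libc memcpy with a hand-written replacement. `naiveMemcpy` is a made-up stand-in for a custom D implementation, not the code from the PRs. Both functions return 5. The open question is only whether the optimizer can prove that at compile time, and that depends on the compiler, the flags, and whether the custom function gets inlined.

```d
import core.stdc.string : memcpy;

/// Copy through the libc memcpy, whose semantics LDC/GDC optimizers usually know.
int copyViaLibc()
{
    int x, y = 5;
    memcpy(&x, &y, int.sizeof);
    return x; // an optimizer that understands memcpy may fold this to `return 5`
}

/// Hypothetical custom replacement: a plain byte loop.
/// To the optimizer this is just another function with a loop in it.
void* naiveMemcpy(void* dst, const(void)* src, size_t n) @nogc nothrow
{
    auto d = cast(ubyte*) dst;
    auto s = cast(const(ubyte)*) src;
    foreach (i; 0 .. n)
        d[i] = s[i];
    return dst;
}

/// Same copy as above, but through the custom function.
int copyViaCustom()
{
    int x, y = 5;
    naiveMemcpy(&x, &y, int.sizeof);
    return x; // folding now depends on inlining and loop analysis
}
```

Comparing the generated assembly of the two functions under an optimizing build is one way to see whether the semantics survived.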

The current work Stefanos has produced has a big performance impact mainly in DMD, which is known to have a weak optimizer, and anyone who cares about runtime performance ought to be using GDC or LDC anyway. In GDC/LDC, using these custom D implementations winds up being worse because they defeat the respective optimizers (they no longer recognize memcpy/etc. semantics from these functions, so can't optimize based on that).  So a lot of the effort ended up being directed towards working around flaws in DMD's optimizer rather than producing *actual* improvement over C's mem* primitives. This is really the wrong way to go about things IMO; we should rather be fixing DMD's optimizer instead. But once that's done there's even less reason to implement mem* ourselves.

Note that this does not preclude the D compiler from, e.g., translating statements like `a[] = b[]` into target-optimized instructions instead of calling a function named 'memcpy'.  I'd argue that it's the compiler's job (more specifically, the optimizer's job) to do the best translation of a[] = b[] into machine code, not the standard library's problem to account for N versions of M platforms in a gigantic unmaintainable block of static if'd (or version'd) custom implementation, whose only real value is to be able to pat ourselves on the back that yes, we have our own memcpy/memset/etc. implementation that we wrote in D, just because we can.  Porting the D compiler to a new architecture already requires codegen work anyway, and work on memory-copying/moving primitives really should be included under that umbrella, rather than poorly reinvented in the runtime library.
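
For readers less familiar with the syntax, a minimal sketch of the slice copy being discussed. `copyOf` is a made-up helper name, and how the copy is lowered (a memcpy call, inline moves, etc.) is up to the compiler and runtime.

```d
/// Hypothetical helper: returns a fresh copy of src using the slice-copy syntax.
int[] copyOf(const(int)[] src)
{
    auto dst = new int[](src.length);
    // Typed, length-checked element copy. The source never names memcpy,
    // so the compiler is free to pick the lowering.
    dst[] = src[];
    return dst;
}
```

In non-release builds, a length mismatch between the two sides is reported at run time rather than silently overrunning a buffer, which is part of why this form is safer than calling memcpy directly.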


T

-- 
Curiosity kills the cat. Moral: don't be the cat.
September 05, 2019
On Thursday, 5 September 2019 at 20:16:24 UTC, 12345swordy wrote:
>
> - It is easier to debug and read in the D language than in the C language.
> - I was shown faster memory allocation speed compared to libc.
> - Other memory allocators are not part of the D language standard library.
>
> Most importantly, this is yet another disappointing development I have seen in regard to the development of the D language.
>
> - Alex

Sorry, but IMHO, these reasons are not enough for me to start an allocator project.
You may want to consider that they may not be enough for you and / or
the D community either.

The first one is subjective. Considering that we're part of the D community,
most of us would agree. But what is not subjective is how many people know
D vs e.g. C, meaning how many people can actually contribute.

For the second, I guess you mean "if you were shown". It's really very difficult
to create _and_ maintain an all-around equivalent of libc in performance (for all
archs etc.). And even then, it is probably not a useful goal. Most people will
have libc available if they care that much about performance.

Maybe a more useful goal would be to create a minimalistic allocator, which
is very different. And then you have to think about whether we actually need it.
I asked a person who was working on WASM, which
would be one target if this moved forward, and he told me that he could
do his job using std.experimental.allocator.
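
For reference, a minimal sketch of what using std.experimental.allocator looks like. It uses Mallocator purely for illustration. I don't know which allocator he actually used for WASM, and `sumOfSquares` is a made-up example function.

```d
import std.experimental.allocator : makeArray, dispose;
import std.experimental.allocator.mallocator : Mallocator;

/// Hypothetical example: sums 0^2 + 1^2 + ... + (n-1)^2 using a
/// temporary buffer that does not come from the GC.
int sumOfSquares(size_t n)
{
    int[] buf = Mallocator.instance.makeArray!int(n);
    if (buf is null)
        return 0; // n == 0 or allocation failure
    scope (exit) Mallocator.instance.dispose(buf);

    foreach (i, ref v; buf)
        v = cast(int)(i * i);

    int total = 0;
    foreach (v; buf)
        total += v;
    return total;
}
```

The point is only that swapping `Mallocator` for another allocator type leaves the rest of the code unchanged, so a separate libc-style allocator is not obviously required.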

For the third, I'll reply with a question: So? :)

- Stefanos
September 05, 2019
On Thursday, 5 September 2019 at 21:17:07 UTC, H. S. Teoh wrote:

Thanks for the descriptive comment! Some comments from me:

>
> Read the discussion that Stefanos referred to. Here are some of the key blocking issues:
>
> - C library APIs like memcpy, memset, etc., are not only in the C
>   library, but are often implemented as *intrinsics* in compilers. One
>   of the most important effects of this is that optimizers recognize
>   them and understand their semantics, and can sometimes produce better
>   code because of that. For example:
>
> 	int x, y=5;
> 	memcpy(&x, &y, int.sizeof); // C version
> 	... // optimizer knows that now x==5.
>
>   Using a D version of memcpy in the above code can mean that the
>   optimizer does *not* recognize that x==5, which can lead to poorer
>   performance.
>
> - Even if the previous point isn't an issue, there's still the problem
>   of maintenance: the D version of mem* needs to be continuously updated
>   because hardware is constantly evolving, and it takes significant
>   manpower to (1) port the implementation to every supported
>   architecture, (2) make sure they take maximum advantage of the quirks
>   of the targeted platform, and (3) check that they are actually
>   faster than the C implementations (which are available on basically any
>   new platform anyway).
>

For the first 2, let me again thank Manu and Johan, who helped me realize them!
Note also that we don't currently know of a way of informing LLVM or GCC
about the semantics and thus getting this optimization. The closest thing
we have is LLVM recognizing, by name, that a function does what e.g. memcpy()
does, which is a bad assumption to build upon.

> - D already has syntax for abstractly representing a memcpy operation:
>   a[] = b[]. This syntax is type-safe, memory-safe, and the compiler can
>   lower it to whatever it likes, including memcpy, or a custom
>   implementation specialized for the target platform. That's where such
>   primitives really belong, actually. (Historically they went into the C
>   library, but these days compilers are more and more building them into
>   intrinsics that can drive various codegen strategies (inlining,
>   arch-specific optimizations, etc). They're gradually becoming more
>   like low-level compiler primitives than your average C library
>   functions.)
>

AFAIK, this is implemented in the druntime. And the druntime
calls memcpy(). Essentially the goal of this project was to create
versions that would be used from the druntime, not the user. Other than that,
I agree!

> The current work Stefanos has produced has a big performance impact mainly only in DMD, which is known to have a weak optimizer,

Actually, when I was optimizing for DMD, I used assembly mainly because
I had to reach libc in performance. And using DMD, the only way to do
that is using assembly. A more useful goal would be to not try to reach
libc (certainly not in x86_64). Rather, create optimized versions
but using generic D. Meaning, to optimize purely based on algorithms,
with very few assumptions about the hardware. Much like MUSL.
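
As a rough sketch of what I mean by "generic D, optimized purely based on algorithms" (this is not the code from the PRs, and `genericMemset` is a made-up name): the only assumption about the hardware is that aligned word-sized stores are cheaper than byte stores.

```d
/// Hypothetical generic memset: no assembly, no SIMD.
void genericMemset(void* dst, ubyte value, size_t n) @nogc nothrow
{
    auto p = cast(ubyte*) dst;

    // Head: byte stores until the pointer is word-aligned.
    while (n > 0 && (cast(size_t) p % size_t.sizeof) != 0)
    {
        *p++ = value;
        --n;
    }

    // size_t.max / 0xFF is 0x0101...01, so the product repeats
    // `value` in every byte of the word.
    immutable size_t pattern = (size_t.max / 0xFF) * value;

    // Body: one aligned word store per iteration.
    auto w = cast(size_t*) p;
    for (; n >= size_t.sizeof; n -= size_t.sizeof)
        *w++ = pattern;

    // Tail: the remaining bytes, fewer than one word.
    p = cast(ubyte*) w;
    for (; n > 0; --n)
        *p++ = value;
}
```

This will not match a tuned libc on x86_64, but it should be portable to any target the compiler supports and is small enough to maintain.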

> and anyone who cares about runtime performance ought to be using GDC or LDC anyway. In GDC/LDC using these custom D implementations wind up being worse because they defeat the respective optimizers (they no longer recognize memcpy/etc. semantics from these functions, so can't optimize based on that).

Actually, this project matched libc on LDC and GDC in 1-1 benchmarks using D
and SIMD functions (but not ASM). The problem arises when the functions are used
in context, exactly for the reasons you described.


> So a lot of the effort ended up being directed towards working around flaws in DMD's optimizer rather than producing *actual* improvement over C's mem* primitives.

Yes, essentially that was one of my first objections. To counteract
the DMD flaws, you have to write ASM (if you want parity), which in turn
raises the question: then why do it? This is what libc already does.

> This is really the wrong way to go about things IMO; we should rather be fixing DMD's optimizer instead. But once that's done there's even less reason to implement mem* ourselves.

IMHO, I don't think that fixing the DMD optimizer is a good way to go.
Rather, as I said above, aim for a generic D implementation, _without_ SIMD,
based purely on algorithms. This can be useful for systems that don't
have libc, and since the DMD optimizer does not use intrinsics the way LLVM / GCC
do, the aforementioned problems are not problems there. Essentially, it's a win-win
situation.

- Stefanos