June 03, 2019
On Monday, 3 June 2019 at 22:45:28 UTC, Andrei Alexandrescu wrote:

> At 512 lines including tests, it seems on the involved side. The benchmarks ought to show a hefty improvement to match. Are there benchmark results available?
>
> Quoting the rationale from the motivation in another thread:
>
> 1) C’s implementations are not type-safe and memory-safe.
> 2) C’s implementations have accumulated a lot of cruft over the years.
> 3) Cross-compiling is more difficult as now one should have available and configured a C runtime and toolchain apart from the D runtime. This makes it difficult for D to create freestanding software.
>
> And then the listed advantages of using D for implementation (renumbered):
>
> 4) Type-safety and memory safety (bounds-checking etc.)
> 5) Templates to branch to an optimal implementation at compile-time.
> 6) Inlining, as the branching in C happens at runtime.
> 7) Compile-Time Function Execution (CTFE) and introspection (type info).
>
> My view on formulating motivation is simple: do it like a scientist. Argue the facts. If facts are not available, argue fundaments and universal principles. If such are not available, the motivation is too weak.
>
> (1) checks the "facts" box but has the obvious comeback "then how about a 2-line trusted wrapper over memcpy?" that needs to be explained. Related, obviously people who reach for memcpy() are often not looking for a safe primitive. a[] = b[] is safe, syntactically simple, and could lower to anything including memcpy.
>
> (2) is quite specious and really needs some evidence. Is cruft in memcpy really an issue? I looked at memcpy() implementations a while ago but didn't save bookmarks. Did a Google search just now and found https://github.com/gcc-mirror/gcc/blob/master/libgcc/memcpy.c, which is very far from cruft-ridden. I do remember elaborate implementations of memcpy, but so are (somewhat ironically) the 512 lines of the proposed implementation. I found one here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/lib/memcpy_64.S?id=HEAD
>
> No idea of its level of cruftiness, where it's used etc. The right way to argue (2) is to provide links to implementations that people can look at and decide without doubt, "yep, crufty".
>
> (3) is... odd. Doesn't every machine ever come with a C implementation including a ready-to-link standard library? If not, isn't that a rarity? Again, that should be argued preemptively by the motivation section.
>
> (4) brings again the wrapper argument
> (5) is nice if and only if confirmed by benchmarks
> (6) is also nice under the same conditions as (5)
> (7) again... what's wrong with a wrapper that does if (__ctfe)
>
> These considerations are built with memcpy() in mind. With malloc() we're looking at a completely different ballgame. Implementing malloc() from scratch is a very serious project that needs almost overwhelming motivation. The goal of std.experimental.allocator was to offer a flexible framework for implementing general and specialized allocators, but simply replacing malloc() is more difficult to argue. Also, achieving comparable performance will be difficult.

Stefanos, everything Andrei has said here is correct, but it is missing some perspective and does not consider everything we've discussed.  Please STAY THE COURSE!  Do not let this post discourage you.  The time for questioning the merits of this proposal was 2 months ago; not now.  Now that it is a full-fledged GSoC project you are tasked to do the best you can.

Andrei, I agree with everything you've said, but there's more to take into consideration.  I have a response to some of the items you've mentioned, and maybe I'll post that later.

For now allow me to express that I'm quite disappointed that you are questioning the merit of this proposal when the time to do so was 2 months ago when the GSoC projects were being reviewed, and you were supposed to participate.  The GSoC project is well underway and Stefanos now needs to see the project through to completion regardless of what anyone thinks about it.  Please don't undermine this project or diminish the morale of our students with such posts.

At the moment we need feedback on the actual memcpy implementation, not whether you think this project is a good idea or not.

Stefanos, please don't let this post discourage you.  Please STAY ON TASK.

Mike
June 03, 2019
On Sunday, 2 June 2019 at 11:19:20 UTC, Sebastiaan Koppe wrote:

> What I am trying to say is that you can avoid porting the whole thing.

Yes, that is understood.  Only what is required to implement a malloc replacement is within the scope of the project.

>> use the `IAllocator` interface.  Therefore, any allocator conforming to that interface could potentially serve as druntime's allocator.
>
> I am not a big fan of the IAllocator interface since it introduces a layer of indirection. There is no simple solution to avoid the indirection and get a pluggable allocator. Well, maybe a combination of ldc's @weak and LTO. Dunno...
>
> https://wiki.dlang.org/LDC-specific_language_changes#.40.28ldc.attributes.weak.29
> http://johanengelen.github.io/ldc/2016/11/10/Link-Time-Optimization-LDC.html

The project is pressed for time, so I'd like to stick with something known and well-documented.  Perhaps IAllocator is not the right solution in the end, but still, implementing it and seeing how it fits into druntime should inform future directions, and perhaps even elicit some new ideas.
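For readers less familiar with the indirection concern: with an interface-based allocator, every allocation goes through a function pointer (a vtable slot, in the case of a D `interface` such as `IAllocator`). Below is a minimal sketch of that dispatch pattern in C. The `rt_allocator` struct, `rt_alloc` and `rt_free` are hypothetical names, not the `IAllocator` API; the default simply forwards to libc's `malloc`/`free`.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocator interface: the runtime only sees these pointers. */
typedef struct rt_allocator {
    void *(*allocate)(void *ctx, size_t size);
    void (*deallocate)(void *ctx, void *p);
    void *ctx; /* allocator-specific state, unused by the libc default */
} rt_allocator;

/* Default implementation: forward to libc. */
static void *libc_allocate(void *ctx, size_t size)
{
    (void)ctx;
    return malloc(size);
}

static void libc_deallocate(void *ctx, void *p)
{
    (void)ctx;
    free(p);
}

/* A platform port could swap this out for a different allocator. */
static rt_allocator rt_current_allocator = { libc_allocate, libc_deallocate, NULL };

static void *rt_alloc(size_t size)
{
    /* Indirect call: one extra level of indirection per allocation. */
    return rt_current_allocator.allocate(rt_current_allocator.ctx, size);
}

static void rt_free(void *p)
{
    rt_current_allocator.deallocate(rt_current_allocator.ctx, p);
}
```

Because `rt_alloc` calls through a pointer that can change at run time, the compiler generally cannot inline the allocator into its callers. That cost is what gets weighed against pluggability.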

Mike
June 03, 2019
On 6/3/19 7:35 PM, Mike Franklin wrote:
> Andrei, I agree with everything you've said, but there's more to take into consideration.  I have a response to some of the items you've mentioned, and maybe I'll post that later.

My point was not to cast doubt and debate away in the forum (I think we really should avoid the "weak writeup/strong forum argument" pattern like the plague), but instead to help improve the writeup of the motivation. Ideally the pertinent parts should be used to improve that section in the project. Essentially someone reading the motivation should gather a good understanding and have most relevant questions preempted.
June 04, 2019
On Monday, 3 June 2019 at 23:35:21 UTC, Mike Franklin wrote:
>
> Stefanos, everything Andrei has said here is correct, but it is missing some perspective and does not consider everything we've discussed.  Please STAY THE COURSE!  Do not let this post discourage you.  The time for questioning the merits of this proposal was 2 months ago; not now.  Now that it is a full-fledged GSoC project you are tasked to do the best you can.
>
> Andrei, I agree with everything you've said, but there's more to take into consideration.  I have a response to some of the items you've mentioned, and maybe I'll post that later.
>
> For now allow me to express that I'm quite disappointed that you are questioning the merit of this proposal when the time to do so was 2 months ago when the GSoC projects were being reviewed, and you were supposed to participate.  The GSoC project is well underway and Stefanos now needs to see the project through to completion regardless of what anyone thinks about it.  Please don't undermine this project or diminish the morale of our students with such posts.
>
> At the moment we need feedback on the actual memcpy implementation, not whether you think this project is a good idea or not.
>
> Stefanos, please don't let this post discourage you.  Please STAY ON TASK.

Thank you very much Mike! Andrei, I hear you as well and thank you for the feedback!

I want to say this. Before about 5 days, _I_ was even unsure about the goals.
And in the last 5 days of writing code, I'm getting more and more unsure.
However, Mike is so ridiculously helpful that the last thing I want is to:
- Sound disheartening.
- Sound like the guy that was picked and doesn't believe in the project.
- Sound like the guy who has been writing D for 5 months and comes along to question Mike, jpf
  and anyone else involved in the project.

Please, for any constructive feedback, questions or anything that you're
unsure about, direct the message to the most relevant person. The points raised
are not my responsibility (hopefully, for the better), so you probably
want to ask the mentors.

But, my opinion was asked.
First of all, for memcpy et al.:
- To match memcpy, you have to write assembly, not D. In the end the code
  will be bigger than memcpy, because we will have the D improvements (what Mike
  has done) plus a memcpy-sized implementation. (The two implementations that
  you posted are not the versions that actually get called. You can check (the horror)
  by stepping in with a debugger. Mike had a link to what seems to be the
  actual implementation, but I can't find it.)
- Because of the above, that code will not be D, it will be assembly, which
  raises the question "Why not use the already-made asm versions for
  the assembly parts (like the libc version) rather than rewriting them yourself?".
- Matching memcpy's performance, although I'm getting good benchmarks, is next to impossible
  in half a summer. Yet this is what is expected.
- Personally, I don't use dynamic arrays. My D is mostly betterC.
  For me, if people use the D features, they would probably never use memcpy.
  And if they don't, then they would probably use a low-level (unsafer?) memcpy
  with pointers. However, this is targeted to the D Runtime, with which I don't
  have any experience. So, I trust the mentors.

- In my opinion, the best way to go about this is to get only the
  memcpy implementation linked (so remove the dependency on libc), create wrappers
  around it, something like Jonathan Marler's code _and_ use D for the small
  sizes, where it shines (as Mike's work has already shown). That way you
  leverage the work that has been put on memcpy, write idiomatic D, remove the
  libc dependency and make a (way) faster memcpy for small sizes.
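The hybrid described in the last point can be sketched roughly as follows. This is C rather than D, purely for illustration; `hybrid_copy` and the 32-byte cutoff are hypothetical, and a real cutoff would have to come from benchmarks.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cutoff; the right value depends on the target and benchmarks. */
#define SMALL_COPY_MAX 32

/* Small copies use a plain loop the compiler can inline (and unroll when n is
   a compile-time constant); larger copies forward to the platform's memcpy. */
static inline void *hybrid_copy(void *dst, const void *src, size_t n)
{
    if (n <= SMALL_COPY_MAX) {
        unsigned char *d = dst;
        const unsigned char *s = src;
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
        return dst;
    }
    return memcpy(dst, src, n);
}
```

In C the size is only known at compile time if the call is inlined with a constant `n`; in D, templates can make that size information explicit, which is where the small-size gains are expected to come from.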

But, who am I to question things? And I don't claim in any way that, because
my opinion differs from what we planned to do, I am
correct and "hey, they decide.. right?". No. I just trust that they know better than me.

For malloc:
- The initial plan was that a malloc() would be written. Having tried to write my
  own malloc(), I can say that I was pretty naive to think I could write a
  replacement from scratch in half a summer. Thankfully, something else was
  decided. The decision to use std.experimental.allocator was not mine. I learned
  about it probably less than a month ago.
  I can't say whether it's a good or bad decision, because I know too little to
  have any meaningful input. To me, it sounded good though. And again, I trust
  that the mentors know better (and it's not a final decision yet).

I don't want to sound rude. I'm grateful to the D community for giving me the
chance to work on something so challenging. But the project, the goals, the
motivation and the approach are not my responsibility.
I, of course, have opinions about those, but my opinion is also that, for something I'm not in charge of, it's better to try to help than to contradict, unless I think there's something _very_ wrong. And I already feel I contradicted a lot.
In the end, if the motivation is too weak, if the approach is wrong and
if the goals are not that desired, then why was the project picked?
And why are those things questioned after 2 months?

Last but not least, while this may not be the best place for "famous last words", I want again to thank the mentors and especially Mike(!). Seb as well. This project, well.. let's just say it didn't have exactly the warmest feedback and their
support is important.
June 04, 2019
On Monday, 3 June 2019 at 22:45:28 UTC, Andrei Alexandrescu wrote:

> At 512 lines including tests, it seems on the involved side. The benchmarks ought to show a hefty improvement to match. Are there benchmark results available?

I did some initial benchmarks at https://github.com/JinShil/memcpyD when I made the first feasibility study to see if this project was worth pursuing.  The initial results were encouraging, which is why we're taking it further in this project.

I'll work with Stefanos to get a more polished implementation that users can download and run for themselves.

> Quoting the rationale from the motivation in another thread:
>
> 1) C’s implementations are not type-safe and memory-safe.
> 2) C’s implementations have accumulated a lot of cruft over the years.
> 3) Cross-compiling is more difficult as now one should have available and configured a C runtime and toolchain apart from the D runtime. This makes it difficult for D to create freestanding software.

> 4) Type-safety and memory safety (bounds-checking etc.)
> 5) Templates to branch to an optimal implementation at compile-time.
> 6) Inlining, as the branching in C happens at runtime.
> 7) Compile-Time Function Execution (CTFE) and introspection (type info).
>
> My view on formulating motivation is simple: do it like a scientist. Argue the facts. If facts are not available, argue fundaments and universal principles. If such are not available, the motivation is too weak.

Yes, the motivation could be improved, but the time for motivating this project was 2 months ago, not now.  Now the project is underway, and we need to see it to completion.  The focus now should be on providing feedback on the implementations not the rationale/motivation.

> (1) checks the "facts" box but has the obvious comeback "then how about a 2-line trusted wrapper over memcpy?" that needs to be explained. Related, obviously people who reach for memcpy() are often not looking for a safe primitive. a[] = b[] is safe, syntactically simple, and could lower to anything including memcpy.

Part of the motivation is to make druntime no longer have a hard, intrinsic dependency on libc.  If you just wrap the libc function, you're not achieving that goal.
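For comparison, the wrapper approach looks roughly like this: checks on top, libc underneath. This C sketch uses a hypothetical `byte_slice` pair in place of a D slice and a hypothetical `slice_copy` that mirrors the length and overlap checks D performs for `a[] = b[]` (when those checks are enabled), returning -1 instead of throwing. It adds safety, but it keeps the hard dependency on libc's `memcpy`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical pointer+length pair standing in for a D slice. */
typedef struct {
    unsigned char *ptr;
    size_t len;
} byte_slice;

/* Checked copy: lengths must match and buffers must not overlap.
   On failure nothing is copied and -1 is returned. */
static int slice_copy(byte_slice dst, byte_slice src)
{
    if (dst.len != src.len)
        return -1;                 /* length mismatch */
    if (dst.len == 0)
        return 0;                  /* nothing to copy; avoids memcpy on NULL */
    uintptr_t d = (uintptr_t)dst.ptr;
    uintptr_t s = (uintptr_t)src.ptr;
    if (d < s + src.len && s < d + dst.len)
        return -1;                 /* overlapping buffers */
    memcpy(dst.ptr, src.ptr, dst.len); /* the actual work is still libc's */
    return 0;
}
```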

Now, that being said, it is way out of the scope of this project to provide a D implementation of memcpy for all platforms, architectures and microarchitectures that D supports.  So, we need to deal with that.

Before I elaborate further, it's important to understand that druntime is currently a monolith that is not architected or structured properly.  druntime is supposed to be the language implementation, not libc bindings, libc++ bindings, windows bindings, linux bindings, low-level code (whatever that means), etc.

The language implementation *will* require certain features of the underlying operating system and hardware. Some of those features may be provided by libc, but that decision should be made on a platform-by-platform basis.  So what we hope to achieve with this project is an idiomatic-D memory copy/compare interface.  On some platforms, that interface may simply forward to libc for those features that don't have an optimized D implementation.  Other platforms may choose to implement a highly optimized implementation in D.  Other platforms may choose to mix the two (e.g. an optimized D implementation for small copies, and forward to libc for large copies).  Others may choose to just implement a simple while-loop, either because they don't want to obtain a C toolchain (those cross-compiling to embedded targets) or because there isn't a C implementation available (new platforms like WASM).  This project aims to remove druntime's dependency on libc, but the platform port of druntime may still choose to depend on it.
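A rough C sketch of such a per-platform choice, using the preprocessor where druntime would use D's `version` blocks: a hypothetical `PLATFORM_HAS_LIBC` switch selects between forwarding to libc and the simple freestanding while-loop. `rt_memcpy` is an illustrative name, not a druntime symbol.

```c
#include <stddef.h>

#ifdef PLATFORM_HAS_LIBC
#include <string.h>

/* Port with a C runtime: forward to libc. */
static void *rt_memcpy(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}
#else
/* Freestanding port (no C toolchain, or none exists yet): a simple loop. */
static void *rt_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dst;
}
#endif
```

One caveat for the freestanding branch: GCC documents that the code it generates may call `memcpy`, `memmove`, `memset` and `memcmp` even in freestanding environments, and optimizers can recognize a loop like the one above and turn it back into a `memcpy` call. A libc-free runtime therefore has to provide those symbols itself either way.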

That being said you might be wondering why we are bothering to implement an entire memcpy in D for the x86_64 architecture.
1) because DMD's implementation is suboptimal,
2) to help motivate the entire project
3) to demonstrate D as a first-class systems programming language
4) to set an example and precedent for other platforms to potentially follow

Please keep in mind we're trying to expand D to more platforms, including resource-constrained embedded systems, OS programming, bare-metal applications, and new platforms such as WASM.  We want D to be more easily portable, and that is partially achieved by making a platform abstraction, independent of libc.  libc is a platform implementation detail.

> (2) is quite specious and really needs some evidence. Is cruft in memcpy really an issue? I looked at memcpy() implementations a while ago but didn't save bookmarks. Did a Google search just now and found https://github.com/gcc-mirror/gcc/blob/master/libgcc/memcpy.c, which is very far from cruft-ridden.

That is not the memcpy that is actually on your machine.  You can find the more elaborate implementations here:  https://sourceware.org/git/?p=glibc.git;a=tree;f=sysdeps/x86_64/multiarch;h=14ec2285c0f82b570bf872c5b9ff0a7f25724dfd;hb=HEAD

Another from intel:  https://github.com/DPDK/dpdk/blob/master/lib/librte_eal/common/include/arch/x86/rte_memcpy.h

> I do remember elaborate implementations of memcpy but so are (somewhat ironically) the 512 lines of the proposed implementation. I found one here:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/lib/memcpy_64.S?id=HEAD
>
> No idea of its level of cruftiness, where it's used etc. The right way to argue (2) is to provide links to implementations that people can look at and decide without doubt, "yep, crufty".

The more elaborate C implementations are typically written in assembly.  They are difficult to follow due to all of the various techniques to handle misalignment and the cleverness typically required to achieve the best performance.

It is my hope that this project will explore how D can improve such implementations by reducing the cleverness to small isolated inline assembly blocks surrounded by D to make it easier to see the flow control.  I think D can do that.
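A toy C illustration of that split, assuming GCC/Clang extended asm on x86-64: the only "clever" part is an isolated `rep movsb` block, while the size dispatch stays in ordinary code. The function names and the 16-byte cutoff are hypothetical, and real implementations also handle alignment around the core copy.

```c
#include <stddef.h>

/* The isolated asm block: x86-64 `rep movsb` copies RCX bytes from RSI to RDI.
   Other compilers/targets fall back to a plain byte loop. */
static void bulk_copy(unsigned char *d, const unsigned char *s, size_t n)
{
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ volatile("rep movsb"
                     : "+D"(d), "+S"(s), "+c"(n)
                     :
                     : "memory");
#else
    while (n--)
        *d++ = *s++;
#endif
}

/* Flow control stays readable: tiny copies use a loop,
   everything else goes through the asm core. */
static void *copy_with_asm_core(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    if (n < 16) {
        for (size_t i = 0; i < n; i++)
            d[i] = s[i];
    } else {
        bulk_copy(d, s, n);
    }
    return dst;
}
```

On targets without the asm path, the block compiles to a byte loop, so the surrounding control flow reads the same everywhere.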

> (3) is... odd. Doesn't every machine ever come with a C implementation including a ready-to-link standard library? If not, isn't that a rarity? Again, that should be argued preemptively by the motivation section.

Yes, it's a rarity, but nevertheless an artificial dependency for druntime.

druntime does not sufficiently utilize libc to justify the hard dependency.  It just needs a few memory utilities and an allocator.  I think it's worthwhile to see if D can do just as well without libc.  In fact, if I had my druthers, I'd remove libc's malloc altogether today and just add jemalloc to the druntime repository.  Maybe it could even be mechanically translated to D.

> (4) brings again the wrapper argument

For some platforms, it may just be a wrapper.

> (5) is nice if and only if confirmed by benchmarks

We've already demonstrated this with benchmarks. I'll work with Stefanos to make them available, but https://github.com/JinShil/memcpyD already shows the benefit.

> (6) is also nice under the same conditions as (5)

Yep, see my response to (5)

> (7) again... what's wrong with a wrapper that does if (__ctfe)

I think Stefanos is probably arguing in general about the design-by-introspection features of D which include CTFE and other metaprogramming features which is more-or-less the same as (5).  Those benefits have been demonstrated, and we'll work to make those more apparent in the near future.

That being said, there's nothing ruling out an `if (__ctfe)` block in the implementation if that's what is determined to be best.

> With malloc() we're looking at a completely different ballgame. Implementing malloc() from scratch is a very serious project that needs almost overwhelming motivation. The goal of std.experimental.allocator was to offer a flexible framework for implementing general and specialized allocators, but simply replacing malloc() is more difficult to argue. Also, achieving comparable performance will be difficult.

I agree to all of that, but we're going to try it anyway and see how it does.  If all we achieve in the end is just a wrapper that forwards to libc's malloc and friends, it will still be better than what we have now, because libc will then be simply an implementation detail.

Mike
June 03, 2019
On 6/3/19 9:11 PM, Mike Franklin wrote:
> Yes, the motivation could be improved, but the time for motivating this project was 2 months ago, not now.  Now the project is underway, and we need to see it to completion.  The focus now should be on providing feedback on the implementations not the rationale/motivation.

Mike, you must understand this is a terrible argument. It should never be made again. It is in fact the only part about your response that makes me genuinely worried. Benchmarks look good.

This is not a pregnancy. Any time is good for asking about the motivation, and the difference 2 months make, in favor of the student and mentor, is that the answers to questions about the motivation are only stronger, clearer, and more convincing. I've seen PhD candidates roasted over their motivation (literally their "thesis", which means "proposition") on their DEFENSE day after years of hard work. It is not the fault of the person asking.
June 04, 2019
On Tuesday, 4 June 2019 at 01:32:49 UTC, Andrei Alexandrescu wrote:
> On 6/3/19 9:11 PM, Mike Franklin wrote:
>> Yes, the motivation could be improved, but the time for motivating this project was 2 months ago, not now.
>
> Mike, you must understand this is a terrible argument. It should never be made again. It is in fact the only part about your response that makes me genuinely worried. Benchmarks look good.
>
> This is not a pregnancy. Any time is good for asking about the motivation, and the difference 2 months make, in favor of the student and mentor, is that the answers to questions about the motivation are only stronger, clearer, and more convincing.

I agree that a good time to ask is simply always. And that we should
always argue if something doesn't seem right. But it's true
that this was to be decided months ago.

Moreover, for me it's important to consider the other side, and I think that's what Mike meant.
Imagine that you're a GSoC student and you open the forum, 4 a.m.: "Aah... Here's a 1-page post about what you might be doing wrong with your project, already 1+ month working on it. Have a good night.." :p
This is a personal opinion, but to me: Of course contradict the bad, but also help / motivate the good.
With these people (meaning any mentors and any GSoC student), chances are they're doing
_something_ beneficial. For example..

> Benchmarks look good.

They're better now. ;)
June 04, 2019
On Tuesday, 4 June 2019 at 01:32:49 UTC, Andrei Alexandrescu wrote:
> On 6/3/19 9:11 PM, Mike Franklin wrote:
>> Yes, the motivation could be improved, but the time for motivating this project was 2 months ago, not now.  Now the project is underway, and we need to see it to completion.  The focus now should be on providing feedback on the implementations not the rationale/motivation.
>
> Mike, you must understand this is a terrible argument. It should never be made again. It is in fact the only part about your response that makes me genuinely worried. Benchmarks look good.

The point I'm trying to make is that we are in the coding stage of this project.  Right now, students should be focused on getting the assignment done, not justifying the project to everyone.  The project already went through a vetting process and was approved.

> This is not a pregnancy.

Thank you.  I'm glad you were able to fulfill your sarcasm quota.

> Any time is good for asking about the motivation

Asking for more information about the motivation is one thing.  Publicly doubting the motivation of a project that the D Language Foundation approved (a process you even participated in) is another.


June 04, 2019
TL;DR
Should we pay attention to WASM, where there are no system facilities (mmap, allocators) and memory is just an array of ints?

June 04, 2019
On Tuesday, 4 June 2019 at 08:31:54 UTC, KnightMare wrote:
> TL;DR
> Should we pay attention to WASM, where there are no system facilities (mmap, allocators) and memory is just an array of ints?

LDC can already compile code to WASM.