February 01, 2017
On Wednesday, 1 February 2017 at 10:20:45 UTC, Patrick Schluter wrote:
> On Wednesday, 1 February 2017 at 10:05:49 UTC, Richard Delorme wrote:
>>
>> //-----8<-------------------------------------------------------
>> #include <string.h>
>> #include <stdio.h>
>>
>> void* mymemcpy(void* restrict dest, const void* restrict src, size_t n) {
>> 	const char *s = src;
>> 	char *d = dest;
>> 	for (size_t i = 0; i < n; ++i) d[i] = s[i];
>> 	return d;
>> }
>>
>> void *copy(const void *c, size_t n) {
>> 	char d[16];
>> 	return mymemcpy(d, c, n);
>> }
>>
>> int main(void) {
>> 	char a[16] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
>> 	char *b = copy(a, 8);
>>
>> 	for (int i = 0; i < 16; ++i) printf("%d ", b[i]);
>> 	putchar('\n');
>> }
>> //-----8<-------------------------------------------------------
>> $ gcc mymemcpy.c -O2 -W
>> mymemcpy.c: In function 'copy':
>> mymemcpy.c:13:9: warning: function returns address of local variable [-Wreturn-local-addr]
>>   return mymemcpy(d, c, n);
>>          ^~~~~~~~~~~~~~~~~
>> memcpy4.c:12:7: note: declared here
>>   char d[16];
>>
>> clang (version 3.8.1) failed to find the error in this code.
>
> You have to define mymemcpy() in another source file and put only the prototype in this module. If the compiler sees the code, it can do the complete data flow analysis. With only the declaration it can't, and that is Walter's point. The annotations give the declaration the information that the compiler cannot deduce from the code itself, because the code is in another module (object file, library).

Right, if defined in another file, the compiler will not emit any warning. However, other tools can detect this kind of error. For instance, valgrind works great in this example, directly on the executable:

$ valgrind --track-origins=yes mymemcpy
[...]
==31041== Conditional jump or move depends on uninitialised value(s)
==31041==    at 0x4E843C7: vfprintf (in /usr/lib64/libc-2.23.so)
==31041==    by 0x4E8B9A8: printf (in /usr/lib64/libc-2.23.so)
==31041==    by 0x400682: main (main.c:15)
==31041==  Uninitialised value was created by a stack allocation
==31041==    at 0x400652: main (main.c:13)
[...]

Thus, I still have mixed feelings about attributes. In my humble opinion, it is wrong to put on the programmer the responsibility to make his program safer by stacking attributes on function declarations. I prefer to ask the compiler to detect as many defects as possible (but not more!), and to rely on external tools like valgrind, gdb, etc. to detect more subtle bugs.

February 01, 2017
On 2/1/2017 2:05 AM, Richard Delorme wrote:
> On Tuesday, 31 January 2017 at 23:30:04 UTC, Walter Bright wrote:
>> On 1/31/2017 3:00 PM, Richard Delorme wrote:
>> The thing about memcpy is compilers build in a LOT of information about it
>> that simply is not there in the declaration. I suggest retrying your example
>> for gcc/clang, but use your own memcpy, i.e.:
>>
>>    void* mymemcpy(void * restrict s1, const void * restrict s2, size_t n);
>>
>> Let us know what the results are!
>
> //-----8<-------------------------------------------------------
> #include <string.h>
> #include <stdio.h>
>
> void* mymemcpy(void* restrict dest, const void* restrict src, size_t n) {
>     const char *s = src;
>     char *d = dest;
>     for (size_t i = 0; i < n; ++i) d[i] = s[i];
>     return d;
> }
>
> void *copy(const void *c, size_t n) {
>     char d[16];
>     return mymemcpy(d, c, n);
> }
>
> int main(void) {
>     char a[16] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
>     char *b = copy(a, 8);
>
>     for (int i = 0; i < 16; ++i) printf("%d ", b[i]);
>     putchar('\n');
> }
> //-----8<-------------------------------------------------------
> $ gcc mymemcpy.c -O2 -W
> mymemcpy.c: In function 'copy':
> mymemcpy.c:13:9: warning: function returns address of local variable
> [-Wreturn-local-addr]
>   return mymemcpy(d, c, n);
>          ^~~~~~~~~~~~~~~~~
> memcpy4.c:12:7: note: declared here
>   char d[16];

Note that you included the source code for mymemcpy(). gcc is apparently able to examine the source code to determine that 'd' is returned. Please try it just using the declaration.

February 01, 2017
On 2/1/2017 8:12 AM, Chris Wright wrote:
> OTOH I haven't seen anyone distribute a D library with .di files

druntime uses a lot of header-only imports, including for D code. (The gc, for example, is not presented as source code to the compiler.)

February 01, 2017
On 2/1/2017 12:38 PM, Richard Delorme wrote:
> Right, if defined in another file, the compiler will not emit any warning.
> However, other tools can detect this kind of error. For instance, valgrind works
> great in this example, directly on the executable:
>
> $ valgrind --track-origins=yes mymemcpy
> [...]
> ==31041== Conditional jump or move depends on uninitialised value(s)
> ==31041==    at 0x4E843C7: vfprintf (in /usr/lib64/libc-2.23.so)
> ==31041==    by 0x4E8B9A8: printf (in /usr/lib64/libc-2.23.so)
> ==31041==    by 0x400682: main (main.c:15)
> ==31041==  Uninitialised value was created by a stack allocation
> ==31041==    at 0x400652: main (main.c:13)
> [...]
>
> Thus, I still have mixed feelings about attributes. In my humble opinion, it
> is wrong to put on the programmer the responsibility to make his program safer
> by stacking attributes on function declarations. I prefer to ask the compiler to
> detect as many defects as possible (but not more!), and to rely on external
> tools like valgrind, gdb, etc. to detect more subtle bugs.

You're right that valgrind can detect these sorts of things, and valgrind is such an amazing tool that I suspect it has almost single-handedly saved C from oblivion.

That said, there are limitations:

1. Valgrind does not detect errors if it isn't run, and not many people run it regularly.

2. Valgrind does not detect errors unless the error actually happens in the running code. This means you'll need a test suite with 100% coverage for valgrind to find all the errors.

3. The prevalence of memory safety errors in shipped code shows that valgrind is neither used enough nor effective enough.

4. Valgrind slows down the execution of code by an order of magnitude or two. This makes it impractical for many applications, and impossible to instrument code being run by the user. (It isn't run by the D autotester, for example.)

5. Bugs caught at compile time are far, far cheaper to fix than those caught by the test suite. This is well documented.

6. Valgrind isn't available on all platforms, like Windows, embedded systems, phones (?), etc.

7. There is value in having a guarantee that code doesn't suffer from certain kinds of bugs. Valgrind cannot offer such a guarantee.

It's much like the dynamic typing vs static typing debate. Do you prefer finding problems at run time or compile time?

And lastly, D does not require you to use these annotations.

February 01, 2017
On 2/1/2017 6:39 AM, Cody Laeder wrote:
> The _traditional_ C-like memcpy [3] in the stdlib. It is unsafe, and carries no
> side effects for the src buffer. It enforces type safety, but it cannot enforce
> memory safety as you can blow past the allocation size on your dst buffer (hence
> why it is unsafe).

It also does not guarantee the function does not save a copy of those pointers and dereference them later.

Programmers "know" this to be true for memcpy, but the compiler cannot know this from the Rust (or C) declaration. The D version does present this guarantee by annotating it with 'pure'.

This matters because such a saved pointer can become a dangling reference - a memory corruption bug waiting to happen.

[Note: in Rust, functions marked 'unsafe' may store copies of their arguments in globals. 'safe' functions may not access mutable global storage.]

February 01, 2017
On 2/1/2017 9:28 AM, Michael Howell wrote:
> This function signature *does* guarantee that src and self don't overlap, unlike
> the C and D versions. Personally, I think that's at least as important as
> whether the function's pure or not.

The overlap is handled in C with the 'restrict' annotation. D does not have an equivalent.

February 01, 2017
On 2/1/2017 9:28 AM, Michael Howell wrote:
> unsafe fn copy_nonoverlapping_ref<T>(src: &T, dest: &mut T, len: usize) {
>   std::ptr::copy_nonoverlapping(src, dest, len)
> }
>
> Again, it doesn't guarantee no side effects, it may guarantee that src isn't
> mutated, it does guarantee that they aren't stored away somewhere, and it
> guarantees that src and dest don't overlap.

What part of the signature guarantees non-overlap?

> It's still unsafe, because it
> doesn't do anything about len being possibly out of bounds, and I left out the
> Copy bound for the sake of flexibility.

Being marked 'unsafe' also includes the ability of the function to save the pointers in global variables.

February 01, 2017
On 2/1/2017 9:56 AM, Michael Howell wrote:
> Oops, forgot the "restrict" keyword. It is there in the C and D versions.

D doesn't have the 'restrict' annotation.

February 01, 2017
On 2/1/2017 9:22 AM, Tobias Müller wrote:
> You wouldn't use memcpy but just assign the slices.

I clearly made a mistake in this example. I wanted to show how a compiler learns things from the declaration by using a very familiar declaration. But it keeps getting diverted into what people (and some compilers) "know" about memcpy that is not in the declaration.

February 01, 2017
On Wednesday, 1 February 2017 at 21:31:33 UTC, Walter Bright wrote:
> What part of the signature guarantees non-overlap?

At the rate D is going, pretty soon the entire function body will be retold in the signature. What's the point when it is obvious that in practice, we can actually analyze the content and get BETTER coverage anyway?

(actually in the real world, it won't since nobody will care enough to write `pure public return const(T) hi(return scope T t) nothrow @nogc @safe @noincompetence @a_million_other_things { return t; }`)