Thread overview
Best interface for memcpy() (and the string.h family of functions)
May 29, 2019
Stefanos Baziotis
May 29, 2019
Jonathan Marler
May 29, 2019
Stefanos Baziotis
May 29, 2019
Jonathan Marler
May 29, 2019
Stefanos Baziotis
May 29, 2019
Jonathan Marler
May 29, 2019
Stefanos Baziotis
May 29, 2019
Jonathan Marler
May 29, 2019
Stefanos Baziotis
May 29, 2019
Jonathan Marler
May 29, 2019
Stefanos Baziotis
May 29, 2019
Jonathan Marler
May 30, 2019
Stefanos Baziotis
May 30, 2019
kinke
May 29, 2019
welkam
May 29, 2019
kinke
May 30, 2019
Stefanos Baziotis
May 30, 2019
kinke
May 30, 2019
Mike Franklin
May 30, 2019
Stefanos Baziotis
May 30, 2019
Mike Franklin
May 30, 2019
Mike Franklin
May 30, 2019
Stefanos Baziotis
May 30, 2019
Kagamin
May 29, 2019
I'm a GSoC student (I'll post an update this week) working on
the project "Independency of D from the C Standard Library".
Part of this project is a D implementation of the family of functions
memcpy(), memset(), etc.

What do you think is the best interface for, say, memcpy()?

My initial pick was `void memcpyD(T)(T* dst, const T* src)`, but it was suggested
that `ref` instead of pointers might be better.

Thanks,
Stefanos
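
For comparison, a minimal sketch of the two candidate shapes. `memcpyD` is the name from the post; `memcpyRef` is a hypothetical name for the `ref` variant, and the byte loop is only a placeholder body. In both, the size comes from `T.sizeof` instead of a length parameter.

```d
/// Pointer interface from the post: copies exactly one T, bit for bit.
/// The byte loop is a placeholder body, not a tuned implementation.
void memcpyD(T)(T* dst, const T* src)
{
    auto d = cast(ubyte*) dst;
    auto s = cast(const(ubyte)*) src;
    foreach (i; 0 .. T.sizeof)
        d[i] = s[i];
}

/// Hypothetical `ref` variant: no `&` at call sites and no null pointers.
/// It takes the addresses itself and reuses the pointer version.
void memcpyRef(T)(ref T dst, ref const T src)
{
    memcpyD(&dst, &src);
}
```

The `ref` form keeps `&` out of call sites and cannot receive null; the pointer form is closer to how C callers think about memcpy. Both copy raw bits without running postblits or copy constructors, as libc's memcpy does.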
May 29, 2019
On Wednesday, 29 May 2019 at 11:46:28 UTC, Stefanos Baziotis wrote:
> I'm a GSoC student (I'll post an update this week) working on
> the project "Independency of D from the C Standard Library".
> Part of this project is a D implementation of the family of functions
> memcpy(), memset(), etc.
>
> What do you think is the best interface for, say, memcpy()?
>
> My initial pick was `void memcpyD(T)(T* dst, const T* src)`, but it was suggested
> that `ref` instead of pointers might be better.
>
> Thanks,
> Stefanos

The default memcpy signature is still pretty useful in many cases.  The original signature should still be implemented and available as a non-template function:

void memcpy(void* dst, void* src, size_t length);

For D, you should also create a template so developers don't have to cast to `void*` all the time; it just forwards all calls to the real memcpy function like this:

void memcpy(T,U)(T* dst, U* src, size_t length)
{
    pragma(inline, true);
    memcpy(cast(void*)dst, cast(void*)src, length);
}

And there's no need for a different name like `memcpyD`. The function behaves the same as libc's memcpy, and when you have libc available, you should use that implementation instead so you can leverage other people's work.
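
A self-contained sketch of that layering, assuming the non-template version simply delegates to libc through druntime's `core.stdc.string` (imported under the hypothetical alias `libcMemcpy`). Unlike the simplified signature above, it uses libc's actual prototype, with a `void*` return and a `const` source. The template constraint is an addition so that calls already using `void*` resolve to the non-template overload.

```d
import core.stdc.string : libcMemcpy = memcpy;

/// Untyped entry point with libc's prototype. In a libc-free build this
/// body would hold a D implementation; here it simply delegates to libc.
void* memcpy(void* dst, const(void)* src, size_t length) @system nothrow @nogc
{
    return libcMemcpy(dst, src, length);
}

/// Typed convenience overload so callers need not cast to void*.
/// It only casts and forwards, so each instantiation inlines to one call.
/// The constraint keeps void*/const(void)* calls on the non-template.
void* memcpy(T, U)(T* dst, const(U)* src, size_t length) @system nothrow @nogc
if (!(is(T == void) && is(U == void)))
{
    pragma(inline, true);
    return memcpy(cast(void*) dst, cast(const(void)*) src, length);
}
```

A call such as `memcpy(b.ptr, a.ptr, a.sizeof)` with two `int*` arguments picks the template (an exact match), which casts and forwards to the single untyped function.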

However, we also want type-safety and bounds-checking when we can.  So we should also provide a set of templates that accept D arrays, verify type-safety and bounds, then forward the call to memcpy.

/**
acopy - Array Copy
*/
void acopy(T,U)(T dst, U src) @trusted
if (isArrayLike!T && isArrayLike!U && dst[0].sizeof == src[0].sizeof)
in { assert(dst.length >= src.length, "acopy: source length larger than destination"); } do
{
    pragma(inline, true);
    static assert (!__traits(isStaticArray, T), "acopy does not accept static arrays since they are passed by value");

    import whereever_memcpy_is: memcpy;
    memcpy(dst.ptr, src.ptr, src.length * ElementSizeForCopy!dst);
}
/// ditto
void acopy(T,U)(T dst, U src) @system
if (isArrayLike!T && isPointerLike!U && dst[0].sizeof == src[0].sizeof)
{
    pragma(inline, true);
    static assert (!__traits(isStaticArray, T), "acopy does not accept static arrays since they are passed by value");

    import whereever_memcpy_is: memcpy;
    memcpy(dst.ptr, src, dst.length * ElementSizeForCopy!dst);
}
/// ditto
void acopy(T,U)(T dst, U src) @system
if (isPointerLike!T && isArrayLike!U && dst[0].sizeof == src[0].sizeof)
{
    pragma(inline, true);

    import whereever_memcpy_is: memcpy;
    memcpy(dst, src.ptr, src.length * ElementSizeForCopy!dst);
}
/// ditto
void acopy(T,U)(T dst, U src, size_t size) @system
if (isPointerLike!T && isPointerLike!U && dst[0].sizeof == src[0].sizeof)
{
    pragma(inline, true);
    import whereever_memcpy_is: memcpy;
    memcpy(dst, src, size * ElementSizeForCopy!dst);
}


Note that isArrayLike, isPointerLike and ElementSizeForCopy would probably look something like this:


template isArrayLike(T)
{
    enum isArrayLike =
           is(typeof(T.init.length))
        && is(typeof(T.init.ptr))
        && is(typeof(T.init[0]));
}
template isPointerLike(T)
{
    enum isPointerLike =
           T.sizeof == (void*).sizeof
        && is(typeof(T.init[0]));
}

// The size of each array element.  If the actual size is 0, then it
// is assumed to be 1.
template ElementSizeForCopy(alias Array)
{
    static if (Array[0].sizeof == 0)
        enum ElementSizeForCopy = 1;
    else
        enum ElementSizeForCopy = Array[0].sizeof;
}

Note that everything here is an inlined template, so each call reduces to a single memcpy call plus some bounds checks.
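
A trimmed-down, self-contained sketch of the same idea for plain D slices only, assuming libc's `memcpy` from `core.stdc.string` as the forwarding target. Instead of an `in` contract, it slices `dst[0 .. src.length]`, so D's built-in bounds checking raises a `RangeError` when the destination is too short, as long as bounds checks are enabled for this code.

```d
import core.stdc.string : memcpy;

/// Copies src into the front of dst. Element sizes must match, mirroring
/// the `dst[0].sizeof == src[0].sizeof` constraint of acopy above.
void acopy(T, U)(T[] dst, const(U)[] src) @trusted
if (T.sizeof == U.sizeof)
{
    pragma(inline, true);
    // Bounds-checked slice: throws RangeError if dst is shorter than src
    // (unless bounds checking has been disabled).
    T[] target = dst[0 .. src.length];
    // Skip the call for empty input: passing null to memcpy is undefined in C.
    if (src.length != 0)
        memcpy(target.ptr, src.ptr, src.length * T.sizeof);
}
```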


May 29, 2019
On Wednesday, 29 May 2019 at 15:41:42 UTC, Jonathan Marler wrote:
>
> The default memcpy signature is still pretty useful in many cases.  The original signature should still be implemented and available as a non-template function:
>
> void memcpy(void* dst, void* src, size_t length);
>
> For D, you should also create a template so developers don't have to cast to `void*` all the time; it just forwards all calls to the real memcpy function like this:
>
> void memcpy(T,U)(T* dst, U* src, size_t length)
> {
>     pragma(inline, true);
>     memcpy(cast(void*)dst, cast(void*)src, length);
> }
>
> And there's no need for a different name like `memcpyD`. The function behaves the same as libc's memcpy, and when you have libc available, you should use that implementation instead so you can leverage other people's work.
>

I'm not sure about that. Does it really make sense to have such an
interface in the case where you don't have libc memcpy available?
There is a discussion about such fallback functions, but I'm not sure;
I feel like it will encourage bad practices.

Similarly, I'm not sure whether it should accept two different types.

> However, we also want type-safety and bounds-checking when we can.  So we should also provide a set of templates that accept D arrays, verify type-safety and bounds, then forward the call to memcpy.
>

Those are good ideas. But I think all this could be done explicitly with
(ref T[] dst, ref T[] source). That makes an array-specific version,
and again, I'm unsure whether special-casing is a good idea.

Generally, all those things are up for discussion, I don't pretend
to have some definitive answer.

The thing with all this code depending on libc memcpy is that, to my understanding,
the long-term goal is to remove libc, and this project is a step towards that,
by providing better D versions (meaning versions that leverage D features).
If the better version calls libc, then when libc
is finally removed, all this code will break. And because we encouraged
this bad practice, _a lot_ of code will break.
That will then force people to write their own D version of memcpy(void *dst, const void *src, size_t len),
which is of course bad, because suddenly we have lost all the D benefits and
all the work that has been put into libc.

Best regards,
Stefanos


May 29, 2019
On Wednesday, 29 May 2019 at 17:35:03 UTC, Stefanos Baziotis wrote:
> On Wednesday, 29 May 2019 at 15:41:42 UTC, Jonathan Marler wrote:
>>
>> The default memcpy signature is still pretty useful in many cases.  The original signature should still be implemented and available as a non-template function:
>>
>> void memcpy(void* dst, void* src, size_t length);
>>
>> For D, you should also create a template so developers don't have to cast to `void*` all the time; it just forwards all calls to the real memcpy function like this:
>>
>> void memcpy(T,U)(T* dst, U* src, size_t length)
>> {
>>     pragma(inline, true);
>>     memcpy(cast(void*)dst, cast(void*)src, length);
>> }
>>
>> And there's no need for a different name like `memcpyD`. The function behaves the same as libc's memcpy, and when you have libc available, you should use that implementation instead so you can leverage other people's work.
>>
>
> I'm not sure about that. Does it really make sense to have such an
> interface in the case where you don't have libc memcpy available?

Sure.  Any time you have a buffer whose type isn't known at compile-time and you need to copy between them.  For example, I have an audio program that copies buffers of audio, but the format of that buffer could be an array of floats or integers depending on the format that your audio hardware and OS support.
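
To make the audio case concrete, a minimal sketch assuming a hypothetical `SampleFormat` enum that is chosen at run time. Since the element type is not known at compile time, the copy can only be expressed as a byte count passed to the untyped `memcpy`.

```d
import core.stdc.string : memcpy;

/// Hypothetical sample formats; the real set depends on hardware and OS.
enum SampleFormat { int16, int32, float32 }

/// Size of one sample, looked up at run time.
size_t bytesPerSample(SampleFormat fmt)
{
    final switch (fmt)
    {
        case SampleFormat.int16:   return short.sizeof;
        case SampleFormat.int32:   return int.sizeof;
        case SampleFormat.float32: return float.sizeof;
    }
}

/// Copies `count` samples between buffers whose element type is only known
/// at run time, so the copy is expressed purely as a byte count.
void copySamples(void* dst, const(void)* src, size_t count, SampleFormat fmt)
{
    memcpy(dst, src, count * bytesPerSample(fmt));
}
```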

> There is a discussion about such fallback functions, but I'm not sure;
> I feel like it will encourage bad practices.
>
> Similarly, I'm not sure whether it should accept two different types.

Well that's why you have memcpy (for those who know what they're doing) and you have other functions for safe behavior.  But you don't want to instantiate a new version of memcpy for every type variation, that's why they all just forward the call to the real memcpy.

>
>> However, we also want type-safety and bounds-checking when we can.  So we should also provide a set of templates that accept D arrays, verify type-safety and bounds, then forward the call to memcpy.
>>
>
> Those are good ideas. But I think all this could be done explicitly with
> (ref T[] dst, ref T[] source). That makes an array-specific version,
> and again, I'm unsure whether special-casing is a good idea.
>

Yes, it could be done, but then you end up with N copies of your memcpy implementation, one for every combination of types.  Your code size is going to explode.  You can certainly support the signature you provided; I just wouldn't put the implementation inside that template.  Instead, you should cast and forward to memcpy.

> The thing with all this code depending on libc memcpy is that, to my understanding,
> the long-term goal is to remove libc, and this project is a step towards that,
> by providing better D versions (meaning versions that leverage D features).

Right, which is why you use the libc version by default, and only use your own when libc is disabled.  This is what I do in my standard library https://github.com/marler8997/mar which works with or without libc.  I went through several designs for how to go about this memcpy solution and what I've provided you is the result of that.

> If the better version calls libc, then when libc
> is finally removed, all this code will break. And because we encouraged
> this bad practice, _a lot_ of code will break.

How would it break?  If you remove libc, your module should now enable your implementation of memcpy.  And all the code that calls memcpy doesn't care whether it came from libc or from a D module.
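
A minimal sketch of that switch, assuming a hypothetical `NoLibc` version identifier (passed with `-version=NoLibc`). Call sites import this one module and call `memcpy` identically in both configurations; only the definition behind the name changes.

```d
// `NoLibc` is a hypothetical version identifier (compile with -version=NoLibc).
version (NoLibc)
{
    /// Pure-D fallback, used only when libc is unavailable. C linkage lets it
    /// stand in for libc's symbol. Caveat: some optimizers recognize this loop
    /// and turn it back into a call to memcpy, which a real implementation
    /// has to prevent.
    extern (C) void* memcpy(void* dst, const(void)* src, size_t n) nothrow @nogc
    {
        auto d = cast(ubyte*) dst;
        auto s = cast(const(ubyte)*) src;
        foreach (i; 0 .. n)
            d[i] = s[i];
        return dst;
    }
}
else
{
    /// Default: reuse the platform's tuned libc implementation.
    public import core.stdc.string : memcpy;
}
```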


May 29, 2019
On Wednesday, 29 May 2019 at 17:45:59 UTC, Jonathan Marler wrote:
>> I'm not sure about that. Does it really make sense to have such an
>> interface in the case where you don't have libc memcpy available?
>
> Sure.  Any time you have a buffer whose type isn't known at compile-time and you need to copy between them.  For example, I have an audio program that copies buffers of audio, but the format of that buffer could be an array of floats or integers depending on the format that your audio hardware and OS support.
>

So, you copy ubyte*.

>
> Well that's why you have memcpy (for those who know what they're doing) and you have other functions for safe behavior.  But you don't want to instantiate a new version of memcpy for every type variation, that's why they all just forward the call to the real memcpy.
>

You do want that, because instantiating and inlining for specific types is
what makes D memcpy fast. It is also what I hope will enable better error
messages and instrumentation. But that's yet to be seen; the most important
thing is performance.
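
A rough sketch of what type-specific instantiation can buy, assuming the single-value `ref` interface from the first post; `WordOf` is a hypothetical helper, and this is not the code in the linked memcpyD repository. Because `T.sizeof` and `T.alignof` are compile-time constants, each instantiation picks its code path with no runtime branching.

```d
/// Hypothetical helper: the unsigned integer type of exactly n bytes,
/// or void when there is none.
template WordOf(size_t n)
{
    static if (n == 1)      alias WordOf = ubyte;
    else static if (n == 2) alias WordOf = ushort;
    else static if (n == 4) alias WordOf = uint;
    else static if (n == 8) alias WordOf = ulong;
    else                    alias WordOf = void;
}

/// Copies one T, choosing the strategy at compile time.
void memcpyD(T)(ref T dst, ref const T src) @system
{
    pragma(inline, true);
    alias W = WordOf!(T.sizeof);
    static if (!is(W == void))
        enum bool oneWord = T.alignof >= W.alignof;
    else
        enum bool oneWord = false;

    static if (oneWord)
    {
        // Fits in one suitably aligned machine word: a single load and store.
        *cast(W*) &dst = *cast(const(W)*) &src;
    }
    else
    {
        // Other sizes: a fixed-length loop the optimizer can unroll.
        auto d = cast(ubyte*) &dst;
        auto s = cast(const(ubyte)*) &src;
        foreach (i; 0 .. T.sizeof)
            d[i] = s[i];
    }
}
```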

>
> Yes, it could be done, but then you end up with N copies of your memcpy implementation, one for every combination of types.  Your code size is going to explode.  You can certainly support the signature you provided; I just wouldn't put the implementation inside that template.  Instead, you should cast and forward to memcpy.
>

Actually, code size for arrays is a very good reminder, thanks.

>> The thing with all this code depending on libc memcpy is that, to my understanding,
>> the long-term goal is to remove libc, and this project is a step towards that,
>> by providing better D versions (meaning versions that leverage D features).
>
> Right, which is why you use the libc version by default, and only use your own when libc is disabled.  This is what I do in my standard library https://github.com/marler8997/mar which works with or without libc.  I went through several designs for how to go about this memcpy solution and what I've provided you is the result of that.
>
>> If the better version calls libc, then when libc
>> is finally removed, all this code will break. And because we encouraged
>> this bad practice, _a lot_ of code will break.
>
> How would it break?  If you remove libc, your module should now enable your implementation of memcpy.  And all the code that calls memcpy doesn't care whether it came from libc or from a D module.

My point is that you will write code differently depending on which memcpy
you have; that's why this "new memcpy" will have a different signature. To get
the best of both worlds, we would have to write our own
memcpy(void*, void*, size_t).
And so, if you encourage the use of this interface (because hey, even if you
eventually don't have libc, your code will not crash), then when libc is not
present, the code will be slow.
May 29, 2019
On Wednesday, 29 May 2019 at 17:55:49 UTC, Stefanos Baziotis wrote:
> On Wednesday, 29 May 2019 at 17:45:59 UTC, Jonathan Marler wrote:
>>> I'm not sure about that. Does it really make sense to have such an
>>> interface in the case where you don't have libc memcpy available?
>>
>> Sure.  Any time you have a buffer whose type isn't known at compile-time and you need to copy between them.  For example, I have an audio program that copies buffers of audio, but the format of that buffer could be an array of floats or integers depending on the format that your audio hardware and OS support.
>>
>
> So, you copy ubyte*.

It doesn't make a difference whether the final memcpy is `void*` or `byte*`.  The point is that it's one function, not a template, and you might as well use the same type that the real memcpy uses so you don't change the signature when you're not using libc.

>
>>
>> Well that's why you have memcpy (for those who know what they're doing) and you have other functions for safe behavior.
>>  But you don't want to instantiate a new version of memcpy for every type variation, that's why they all just forward the call to the real memcpy.
>>
>
> You do want that, because instantiating and inlining for specific types is
> what makes D memcpy fast. It is also what I hope will enable better error
> messages and instrumentation. But that's yet to be seen; the most important
> thing is performance.

You don't want to inline the memcpy implementation.  What makes you think that would be faster?

May 29, 2019
On Wednesday, 29 May 2019 at 18:00:57 UTC, Jonathan Marler wrote:
>
> It doesn't make a difference whether the final memcpy is `void*` or `byte*`.

Yes.

> The point is that it's one function, not a template, and you might as well use the same type that the real memcpy uses so you don't change the signature when you're not using libc.
>

This is what will prevent doing anything really useful in D.
This is what I meant: to have that, you have to implement
a D version of libc's memcpy.

>
> You don't want to inline the memcpy implementation.  What makes you think that would be faster?

CTFE / introspection I hope and currently, benchmarks.
May 29, 2019
On Wednesday, 29 May 2019 at 18:04:07 UTC, Stefanos Baziotis wrote:
>> You don't want to inline the memcpy implementation.  What makes you think that would be faster?
>
> CTFE / introspection I hope and currently, benchmarks.

You didn't answer the question.  How would inlining the implementation of memcpy be faster? The implementation of memcpy doesn't need to know which types it is copying, so every call to it can share the exact same implementation.  You only need one instance of the implementation.  This means you can fine-tune it; many libc implementations write it in assembly because it's used so often and, again, it doesn't need to know what types it is copying.  All it needs is two pointers and a size.  That's why in D, you should only create wrappers that ensure type-safety and bounds checking and then forward to the real implementation, and those wrappers should be inlined but not the memcpy implementation itself.

If you want to provide your own implementation of memcpy you can, but inlining your implementation into every call, when the implementation is truly type agnostic, just results in code bloat with no benefit.
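
A small sketch of that split, with hypothetical names: `copyBytes` is the single shared implementation (a byte loop standing in for a tuned routine) marked `pragma(inline, false)`, and `copyArray` is the thin per-type wrapper that is inlined at each call site.

```d
/// The single shared implementation: type-agnostic, compiled once and kept
/// out of line. The byte loop stands in for a tuned (possibly assembly) routine.
pragma(inline, false)
void copyBytes(void* dst, const(void)* src, size_t n) @system nothrow @nogc
{
    auto d = cast(ubyte*) dst;
    auto s = cast(const(ubyte)*) src;
    foreach (i; 0 .. n)
        d[i] = s[i];
}

/// Per-type wrapper: enforces matching element types and bounds, then
/// forwards. Only this thin shim is instantiated for each T.
void copyArray(T)(T[] dst, const(T)[] src) @trusted
{
    pragma(inline, true);
    // Bounds-checked slice: throws RangeError if dst is too short.
    T[] target = dst[0 .. src.length];
    copyBytes(target.ptr, src.ptr, src.length * T.sizeof);
}
```

However many element types call `copyArray`, the binary contains one `copyBytes`; each wrapper instance reduces to a bounds check and a call.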

May 29, 2019
On Wednesday, 29 May 2019 at 18:14:11 UTC, Jonathan Marler wrote:
>
> You didn't answer the question.
>

I don't know how "benchmarks" does not answer a question. For me, it's
the most important answer.

> How would inlining the implementation of memcpy be faster? The implementation of memcpy doesn't need to know which types it is copying, so every call to it can share the exact same implementation.  You only need one instance of the implementation.  This means you can fine-tune it; many libc implementations write it in assembly because it's used so often and, again, it doesn't need to know what types it is copying.  All it needs is two pointers and a size.  That's why in D, you should only create wrappers that ensure type-safety and bounds checking and then forward to the real implementation, and those wrappers should be inlined but not the memcpy implementation itself.
>
> If you want to provide your own implementation of memcpy you can, but inlining your implementation into every call, when the implementation is truly type agnostic, just results in code bloat with no benefit.

It is currently typed, and that has benefits. It's not the same for every type, and our
idea is not to just forward the size. By inlining, you can get considerably better
performance, precisely because you inline, you don't just forward the size, and
you know information about the type.
Check this: https://github.com/JinShil/memcpyD/blob/master/memcpyd.d
And preferably, run it and look at the generated asm.
Also, what should be considered is that types give you information about alignment,
and allow different implementations depending on that alignment.
May 29, 2019
On Wednesday, 29 May 2019 at 19:06:43 UTC, Stefanos Baziotis wrote:
> On Wednesday, 29 May 2019 at 18:14:11 UTC, Jonathan Marler wrote:
>>
>> You didn't answer the question.
>>
>
> I don't know how "benchmarks" does not answer a question. For me, it's
> the most important answer.

Yes, that would be an answer. I guess I got confused when you mentioned CTFE and introspection; I wasn't sure if "benchmarks" was referring to those features or to runtime benchmarks.  And it looks like @Mike posted the benchmarks on that github link you sent.


>
>> How would inlining the implementation of memcpy be faster? The implementation of memcpy doesn't need to know which types it is copying, so every call to it can share the exact same implementation.  You only need one instance of the implementation.  This means you can fine-tune it; many libc implementations write it in assembly because it's used so often and, again, it doesn't need to know what types it is copying.  All it needs is two pointers and a size.  That's why in D, you should only create wrappers that ensure type-safety and bounds checking and then forward to the real implementation, and those wrappers should be inlined but not the memcpy implementation itself.
>>
>> If you want to provide your own implementation of memcpy you can, but inlining your implementation into every call, when the implementation is truly type agnostic, just results in code bloat with no benefit.
>
> It is currently typed, and that has benefits. It's not the same for every type, and our
> idea is not to just forward the size. By inlining, you can get considerably better
> performance, precisely because you inline, you don't just forward the size, and
> you know information about the type.
> Check this: https://github.com/JinShil/memcpyD/blob/master/memcpyd.d
> And preferably, run it and look at the generated asm.
> Also, what should be considered is that types give you information about alignment,
> and allow different implementations depending on that alignment.

It's true that if you can assume pointers are aligned on a particular boundary, you can be faster than memcpy, which works with any alignment.  This must be what Mike is doing.  However, I would then create only a few instances of memcpy that assume alignment on boundaries like 4, 8, 16.  And if you have a pointer or an array to a particular type, you can probably assume that pointer/array is aligned on that type's "alignof" property.

I think I will use this in my library.
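
A sketch of the alignment-keyed approach described above, with hypothetical names `copyAligned` and `copyTyped`. It assumes, as the post does, that a `T[]` is aligned to `T.alignof`; 16-byte alignment is handled like 8 because the sketch only moves 64-bit words and uses no SIMD.

```d
/// One routine per alignment, assuming both pointers are aligned to `Align`
/// bytes. Only the few instances actually used get compiled.
void copyAligned(size_t Align)(void* dst, const(void)* src, size_t n) @system
{
    static if (Align >= 8)      alias W = ulong;  // 16 is treated like 8 here
    else static if (Align >= 4) alias W = uint;
    else                        alias W = ubyte;

    // Whole words first.
    auto dw = cast(W*) dst;
    auto sw = cast(const(W)*) src;
    immutable size_t words = n / W.sizeof;
    foreach (i; 0 .. words)
        dw[i] = sw[i];

    // Then any tail bytes that don't fill a whole word.
    auto db = cast(ubyte*) dst;
    auto sb = cast(const(ubyte)*) src;
    foreach (i; words * W.sizeof .. n)
        db[i] = sb[i];
}

/// Typed entry point: picks an instance from T.alignof. Inlined at each call.
void copyTyped(T)(T[] dst, const(T)[] src) @trusted
{
    pragma(inline, true);
    enum size_t alignment = T.alignof >= 8 ? 8 : T.alignof >= 4 ? 4 : 1;
    T[] target = dst[0 .. src.length]; // bounds-checked
    copyAligned!alignment(target.ptr, src.ptr, src.length * T.sizeof);
}
```

Every element type with the same alignment shares one `copyAligned` instance, so the number of copy routines stays small regardless of how many types are copied.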
