December 29, 2011
Re: System programming in D (Was: The God Language)
On Thu, 29 Dec 2011 17:20:22 +0200, Vladimir Panteleev  
<vladimir@thecybershadow.net> wrote:

> On Thursday, 29 December 2011 at 14:44:45 UTC, Don wrote:
>> I don't think the situation is any different with DMC. I think that if  
>> D isn't a systems programming language, neither is C or C++ without  
>> vendor-specific extensions.
>
> You're right... I've never extensively used a C/C++ compiler without  
> similar extensions, though. The fact that major vendors come up with  
> their own extensions that provide many of the same features shows that  
> those features might better have been standardized.

Well, I remember that at most one or two people supported me when I brought  
it up, and Walter dismissed it instantly.
December 29, 2011
Re: System programming in D (Was: The God Language)
On Thu, 29 Dec 2011 13:44:12 +0200, Alex Rønne Petersen  
<xtzgzorex@gmail.com> wrote:

> +1. D needs a way to force inlining. The compiler can, at best, do  
> heuristics. If D wants to cater to systems programmers -- that is,  
> programmers who *know their shit* -- it needs advanced features like  
> this. Same reason we have __gshared, for example.
>
> - Alex

The legitimate "D performs so badly in my example" posts that have appeared
in this forum almost always ended with the conclusion that D lacks a
controlled inlining mechanism.
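
For illustration, here is a minimal sketch of what a controlled inlining request can look like. It assumes the `pragma(inline, true)` syntax that later D compilers accept (it was not available when this thread was written); `mulAdd` and `dot4` are hypothetical names.

```d
// Sketch only: pragma(inline, true) asks the compiler to inline the next
// declaration instead of leaving the decision to its heuristics.
pragma(inline, true)
float mulAdd(float a, float b, float c)
{
    return a * b + c;
}

// Hypothetical hot loop that relies on mulAdd being expanded in place.
float dot4(const float[4] x, const float[4] y)
{
    float acc = 0;
    foreach (i; 0 .. 4)
        acc = mulAdd(x[i], y[i], acc);
    return acc;
}
```

Whether the request is honored, and what happens when it cannot be, depends on the compiler in use.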
December 29, 2011
Re: System programming in D (Was: The God Language)
On Thursday, 29 December 2011 at 14:44:45 UTC, Don wrote:
> I don't think the situation is any different with DMC. I think 
> that if D isn't a systems programming language, neither is C or 
> C++ without vendor-specific extensions.

C macros are a crude form of inlining. String mixins do not scale 
well in the same way as C macros (e.g. in the way they're used in 
said memcpy implementation).
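
As a rough sketch of the comparison, the following uses a string mixin the way a C macro would be used to paste a copy statement at each use site. `copyWord` and `copy4` are hypothetical names, not taken from the memcpy implementation.

```d
// Hypothetical helper: builds the source text for one word copy at a
// fixed index. It runs at compile time when used inside mixin().
string copyWord(string idx)
{
    return "dst[" ~ idx ~ "] = src[" ~ idx ~ "];";
}

void copy4(uint* dst, const(uint)* src)
{
    // Each mixin pastes the generated statement here, so there is no
    // function call left to inline, much like a C macro expansion.
    mixin(copyWord("0"));
    mixin(copyWord("1"));
    mixin(copyWord("2"));
    mixin(copyWord("3"));
}
```

The operands have to be spelled out as text inside the helper, which is where this approach gets awkward once the "macro" bodies grow.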
December 29, 2011
Re: System programming in D (Was: The God Language)
On 29.12.2011 16:07, Vladimir Panteleev wrote:
> On Thursday, 29 December 2011 at 14:44:45 UTC, Don wrote:
>>> http://www.danielvik.com/2010/02/fast-memcpy-in-c.html . It doesn't even
>>> use inline assembler or compiler intrinsics.
>>
>> Note that the memcpy described there is _far_ from optimal. Memcpy is
>> all about cache efficiency. DMD translates memcpy to the single
>> instruction "rep movsd" which you'd think would be optimal, but you
>> can actually beat it by a factor of four or more for long lengths.
>
> I've never seen DMD emit rep movsd. Does rep movsd even make sense when
> the memory areas do not have the same alignment? memcpy in snn.lib has a
> rep movsd instruction, but there's lots of other code (including what
> looks like Duff's device).

It's in the backend in cod2.c, line 3260. But on closer inspection --
you're right! It's in an if(0 && ...) block.
So it never does it, even when everything's aligned.

There's a _huge_ potential for improvement in that function.
December 29, 2011
Re: The God Language
On 12/29/2011 2:15 AM, Caligo wrote:
> If there is a God (I'm not saying there
> isn't, and I'm not saying there is), what language would he choose to create the
> universe?

Mathematics.
December 29, 2011
Re: System programming in D (Was: The God Language)
Especially because some 64-bit compilers provide intrinsics as the only way to access the processor.

Visual C++, for example, does not provide inline assembly support when targeting x64.

David Nadlinger Wrote:

> On 12/29/11 2:13 PM, a wrote:
> > void test(ref V a, ref V b)
> > {
> >      asm
> >      {
> >          movaps XMM0, a;
> >          addps  XMM0, b;
> >          movaps a, XMM0;
> >      }
> >      asm
> >      {
> >          movaps XMM0, a;
> >          addps  XMM0, b;
> >          movaps a, XMM0;
> >      }
> > }
> >
> > […]
> >
> > The needless loads and stores would make it impossible to write an efficient simd add function even if the functions containing asm blocks could be inlined.
> 
> Yes, this is indeed a problem, and as far as I'm aware, usually solved 
> in the gamedev world by using the (SSE) intrinsics your favorite C++ 
> compiler provides, instead of resorting to inline asm.
> 
> David
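
To make the contrast concrete, here is a small sketch of the vector-type approach in D. It assumes `core.simd` and its `float4` type are available on the target (for example x86_64); `add` and `addTwice` are hypothetical names.

```d
import core.simd;

// Hypothetical helper: adds two 4 x float vectors. No registers are named,
// so the compiler is free to choose them and to avoid reloading operands.
float4 add(float4 a, float4 b)
{
    return a + b;
}

// Mirrors the two asm blocks in the quoted test(): a += b, done twice.
float4 addTwice(float4 a, float4 b)
{
    return add(add(a, b), b);
}
```

Whether the intermediate value actually stays in a register is up to the compiler and its optimization settings; the point is only that nothing in the source forces a store and reload between the two additions.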
December 29, 2011
Re: System programming in D (Was: The God Language)
Vladimir Panteleev:

> The fact that major vendors 
> come up with their own extensions that provide many of the same features 
> shows that those features might better have been standardized.

Right. (This is why I once asked for computed gotos to be part of the D standard, explicitly marked as not implemented, even though DMD doesn't implement them; LDC and GDC could probably implement them quickly.)

On the other hand, D2 already standardizes several of the non-standard features of GNU C.

Bye,
bearophile
December 29, 2011
Re: The God Language
On 29.12.2011 11:15, Caligo wrote:
>
>
> On Thu, Dec 29, 2011 at 3:16 AM, Walter Bright
> <newshound2@digitalmars.com> wrote:
>
>     http://pastebin.com/AtuzJqh0
>
>
> This is somewhat of a serious question:  If there is a God (I'm not
> saying there isn't, and I'm not saying there is), what language would he
> choose to create the universe?  It would be hard for us mortals to
> imagine, but would it resemble a functional programming language more or
> something else?  And what type of hardware would the code run on?  I
> mean, there are computations happening all around us, e.g., when an
> apple falls or planets circle the sun, etc, so what's performing all the
> computation?

Declarative.
Program begins with void.
Let there be <thing>.
December 29, 2011
Re: System programming in D (Was: The God Language)
On 12/29/2011 3:19 AM, Vladimir Panteleev wrote:
> I'd like to invite you to translate Daniel Vik's C memcpy implementation to D:
> http://www.danielvik.com/2010/02/fast-memcpy-in-c.html

Challenge accepted.
------------------------
/********************************************************************
 ** File:     memcpy.c
 **
 ** Copyright (C) 1999-2010 Daniel Vik
 **
 ** This software is provided 'as-is', without any express or implied
 ** warranty. In no event will the authors be held liable for any
 ** damages arising from the use of this software.
 ** Permission is granted to anyone to use this software for any
 ** purpose, including commercial applications, and to alter it and
 ** redistribute it freely, subject to the following restrictions:
 **
 ** 1. The origin of this software must not be misrepresented; you
 **    must not claim that you wrote the original software. If you
 **    use this software in a product, an acknowledgment in the
 **    product documentation would be appreciated but is not
 **    required.
 **
 ** 2. Altered source versions must be plainly marked as such, and
 **    must not be misrepresented as being the original software.
 **
 ** 3. This notice may not be removed or altered from any source
 **    distribution.
 **
 **
 ** Description: Implementation of the standard library function memcpy.
 **             This implementation of memcpy() is ANSI-C89 compatible.
 **
 **             The following configuration options can be set:
 **
 **           LITTLE_ENDIAN   - Uses processor with little endian
 **                             addressing. Default is big endian.
 **
 **           PRE_INC_PTRS    - Use pre increment of pointers.
 **                             Default is post increment of
 **                             pointers.
 **
 **           INDEXED_COPY    - Copying data using array indexing.
 **                             Using this option, disables the
 **                             PRE_INC_PTRS option.
 **
 **           MEMCPY_64BIT    - Compiles memcpy for 64 bit
 **                             architectures
 **
 **
 ** Best Settings:
 **
 ** Intel x86:  LITTLE_ENDIAN and INDEXED_COPY
 **
 *******************************************************************/

module memcpy;


/********************************************************************
 ** Configuration definitions.
 *******************************************************************/

version = LITTLE_ENDIAN;
version = INDEXED_COPY;


/********************************************************************
 ** Includes for size_t definition
 *******************************************************************/



/********************************************************************
 ** Typedefs
 *******************************************************************/

alias ubyte       UInt8;
alias ushort      UInt16;
alias uint        UInt32;
alias ulong       UInt64;

version (D_LP64)
{
    alias UInt64   UIntN;
    enum TYPE_WIDTH = 8;
}
else
{
    alias UInt32 UIntN;
    enum TYPE_WIDTH = 4;
}


/********************************************************************
 ** Remove definitions when INDEXED_COPY is defined.
 *******************************************************************/

//#if defined (INDEXED_COPY)
//#if defined (PRE_INC_PTRS)
//#undef PRE_INC_PTRS
//#endif /*PRE_INC_PTRS*/
//#endif /*INDEXED_COPY*/



/********************************************************************
 ** Definitions for pre and post increment of pointers.
 *******************************************************************/

version (PRE_INC_PTRS)
{
    void START_VAL(ref UInt8* x)      { x--; }
    ref T INC_VAL(T)(ref T* x)        { return *++x; }
    UInt8* CAST_TO_U8(void* p, int o) { return cast(UInt8*)p + o + TYPE_WIDTH; }
    enum WHILE_DEST_BREAK  = (TYPE_WIDTH - 1);
    enum PRE_LOOP_ADJUST   = -(TYPE_WIDTH - 1);
    enum PRE_SWITCH_ADJUST = 1;
}
else
{
    void START_VAL(UInt8* x)          { }
    ref T INC_VAL(T)(ref T* x)        { return *x++; }
    UInt8* CAST_TO_U8(void* p, int o) { return cast(UInt8*)p + o; }
    enum WHILE_DEST_BREAK  = 0;
    enum PRE_LOOP_ADJUST   = 0;
    enum PRE_SWITCH_ADJUST = 0;
}







/********************************************************************
 **
 ** void *memcpy(void *dest, const void *src, size_t count)
 **
 ** Args:     dest        - pointer to destination buffer
 **           src         - pointer to source buffer
 **           count       - number of bytes to copy
 **
 ** Return:   A pointer to destination buffer
 **
 ** Purpose:  Copies count bytes from src to dest.
 **           No overlap check is performed.
 **
 *******************************************************************/

void *memcpy(void *dest, const void *src, size_t count)
{
    auto dst8 = cast(UInt8*)dest;
    auto src8 = cast(UInt8*)src;

    UIntN* dstN;
    UIntN* srcN;
    UIntN dstWord;
    UIntN srcWord;

    /********************************************************************
     ** Macros for copying words of different alignment.
     ** Uses incrementing pointers.
     *******************************************************************/

    void CP_INCR() {
	INC_VAL(dstN) = INC_VAL(srcN);
    }

    void CP_INCR_SH(int shl, int shr) {
	version (LITTLE_ENDIAN)
	{
	    dstWord   = srcWord >> shl;
	    srcWord   = INC_VAL(srcN);
	    dstWord  |= srcWord << shr;
	    INC_VAL(dstN) = dstWord;
	}
	else
	{
	    dstWord   = srcWord << shl;
	    srcWord   = INC_VAL(srcN);
	    dstWord  |= srcWord >> shr;
	    INC_VAL(dstN) = dstWord;
	}
    }



    /********************************************************************
     ** Macros for copying words of different alignment.
     ** Uses array indexes.
     *******************************************************************/

    void CP_INDEX(size_t idx) {
	dstN[idx] = srcN[idx];
    }

    void CP_INDEX_SH(size_t x, int shl, int shr) {
	version (LITTLE_ENDIAN)
	{
	    dstWord   = srcWord >> shl;
	    srcWord   = srcN[x];
	    dstWord  |= srcWord << shr;
	    dstN[x]  = dstWord;
	}
	else
	{
	    dstWord   = srcWord << shl;
	    srcWord   = srcN[x];
	    dstWord  |= srcWord >> shr;
	    dstN[x]  = dstWord;
	}
    }


    /********************************************************************
     ** Macros for copying words of different alignment.
     ** Uses incrementing pointers or array indexes depending on
     ** configuration.
     *******************************************************************/

    version (INDEXED_COPY)
    {
	void CP(size_t idx) { CP_INDEX(idx); }
	void CP_SH(size_t idx, int shl, int shr) { CP_INDEX_SH(idx, shl, shr); }

	void INC_INDEX(T)(ref T* p, size_t o) { p += o; }
    }
    else
    {
	void CP(size_t idx) { CP_INCR(); }
	void CP_SH(size_t idx, int shl, int shr) { CP_INCR_SH(shl, shr); }

	void INC_INDEX(T)(T* p, size_t o) { }
    }


    void COPY_REMAINING(size_t count) {
	START_VAL(dst8);
	START_VAL(src8);

	switch (count) {
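	// Each case deliberately falls through to the next; current D
	// compilers require the fall-through to be spelled out as "goto case".
	case 7: INC_VAL(dst8) = INC_VAL(src8); goto case;
	case 6: INC_VAL(dst8) = INC_VAL(src8); goto case;
	case 5: INC_VAL(dst8) = INC_VAL(src8); goto case;
	case 4: INC_VAL(dst8) = INC_VAL(src8); goto case;
	case 3: INC_VAL(dst8) = INC_VAL(src8); goto case;
	case 2: INC_VAL(dst8) = INC_VAL(src8); goto case;
	case 1: INC_VAL(dst8) = INC_VAL(src8); goto case;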
	case 0:
	default: break;
	}
    }

    void COPY_NO_SHIFT() {
	dstN = cast(UIntN*)(dst8 + PRE_LOOP_ADJUST);
	srcN = cast(UIntN*)(src8 + PRE_LOOP_ADJUST);
	size_t length = count / TYPE_WIDTH;

	while (length & 7) {
	    CP_INCR();
	    length--;
	}

	length /= 8;

	while (length--) {
	    CP(0);
	    CP(1);
	    CP(2);
	    CP(3);
	    CP(4);
	    CP(5);
	    CP(6);
	    CP(7);

	    INC_INDEX(dstN, 8);
	    INC_INDEX(srcN, 8);
	}

	src8 = CAST_TO_U8(srcN, 0);
	dst8 = CAST_TO_U8(dstN, 0);

	COPY_REMAINING(count & (TYPE_WIDTH - 1));
    }


    void COPY_SHIFT(int shift) {
	dstN  = cast(UIntN*)(((cast(UIntN)dst8) + PRE_LOOP_ADJUST) &
				 ~(TYPE_WIDTH - 1));
	srcN  = cast(UIntN*)(((cast(UIntN)src8) + PRE_LOOP_ADJUST) &
				 ~(TYPE_WIDTH - 1));
	size_t length  = count / TYPE_WIDTH;
	srcWord = INC_VAL(srcN);

	while (length & 7) {
	    CP_INCR_SH(8 * shift, 8 * (TYPE_WIDTH - shift));
	    length--;
	}

	length /= 8;

	while (length--) {
	    CP_SH(0, 8 * shift, 8 * (TYPE_WIDTH - shift));
	    CP_SH(1, 8 * shift, 8 * (TYPE_WIDTH - shift));
	    CP_SH(2, 8 * shift, 8 * (TYPE_WIDTH - shift));
	    CP_SH(3, 8 * shift, 8 * (TYPE_WIDTH - shift));
	    CP_SH(4, 8 * shift, 8 * (TYPE_WIDTH - shift));
	    CP_SH(5, 8 * shift, 8 * (TYPE_WIDTH - shift));
	    CP_SH(6, 8 * shift, 8 * (TYPE_WIDTH - shift));
	    CP_SH(7, 8 * shift, 8 * (TYPE_WIDTH - shift));

	    INC_INDEX(dstN, 8);
	    INC_INDEX(srcN, 8);
	}

	src8 = CAST_TO_U8(srcN, (shift - TYPE_WIDTH));
	dst8 = CAST_TO_U8(dstN, 0);

	COPY_REMAINING(count & (TYPE_WIDTH - 1));
    }


    if (count < 8) {
        COPY_REMAINING(count);
        return dest;
    }

    START_VAL(dst8);
    START_VAL(src8);

    while ((cast(UIntN)dst8 & (TYPE_WIDTH - 1)) != WHILE_DEST_BREAK) {
        INC_VAL(dst8) = INC_VAL(src8);
        count--;
    }

    final switch (((cast(UIntN)src8) + PRE_SWITCH_ADJUST) & (TYPE_WIDTH - 1)) {
    case 0: COPY_NO_SHIFT(); break;
    case 1: COPY_SHIFT(1);   break;
    case 2: COPY_SHIFT(2);   break;
    case 3: COPY_SHIFT(3);   break;
    static if (TYPE_WIDTH >= 4)
    {
	case 4: COPY_SHIFT(4);   break;
	case 5: COPY_SHIFT(5);   break;
	case 6: COPY_SHIFT(6);   break;
	case 7: COPY_SHIFT(7);   break;
    }
    }

    return dest;
}
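
A translation like this is easy to get subtly wrong at the alignment boundaries, so a brute-force check over source offset, destination offset and length is worth having. The following is only a sketch: `CopyFn`, `byteCopy` and `checkCopy` are hypothetical names, and `byteCopy` is a stand-in so that the example is self-contained.

```d
import core.stdc.string : memcmp;

// Hypothetical alias for a function with the same signature as the memcpy above.
alias CopyFn = void* function(void* dest, const void* src, size_t count);

// Byte-wise reference copy, present only to make this sketch self-contained.
void* byteCopy(void* dest, const void* src, size_t count)
{
    auto d = cast(ubyte*) dest;
    auto s = cast(const(ubyte)*) src;
    foreach (i; 0 .. count)
        d[i] = s[i];
    return dest;
}

// Returns true if copy reproduces the source bytes and returns dest for
// every combination of source offset, destination offset and length tried.
// It does not check for writes outside the destination range.
bool checkCopy(CopyFn copy)
{
    ubyte[256] src;
    foreach (i, ref b; src)
        b = cast(ubyte) (i * 7 + 1);

    foreach (srcOff; 0 .. 8)
        foreach (dstOff; 0 .. 8)
            foreach (len; 0 .. 200)
            {
                ubyte[256] dst;
                auto ret = copy(dst.ptr + dstOff, src.ptr + srcOff, len);
                if (ret !is dst.ptr + dstOff)
                    return false;
                if (memcmp(dst.ptr + dstOff, src.ptr + srcOff, len) != 0)
                    return false;
            }
    return true;
}
```

Calling `checkCopy(&byteCopy)` exercises the harness itself; passing the address of the function under test in its place runs the same sweep against it.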
December 29, 2011
Re: System programming in D (Was: The God Language)
On 12/29/11 1:47 PM, Walter Bright wrote:
> On 12/29/2011 3:19 AM, Vladimir Panteleev wrote:
>> I'd like to invite you to translate Daniel Vik's C memcpy
>> implementation to D:
>> http://www.danielvik.com/2010/02/fast-memcpy-in-c.html
>
> Challenge accepted.
[snip]

Benchmarks?

Andrei
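
A minimal timing harness for such a comparison might look like the sketch below. It assumes `std.datetime.stopwatch` from a reasonably recent Phobos; `nsPerCopy` is a hypothetical helper, and no measured numbers are implied here.

```d
import core.stdc.string : memcpy;
import std.datetime.stopwatch : AutoStart, StopWatch;

// Hypothetical helper: average nanoseconds per call when copying size bytes.
// copy is passed as an alias so that both extern(C) functions such as the C
// library memcpy and ordinary D functions can be measured.
double nsPerCopy(alias copy)(size_t size, size_t iterations)
{
    auto src = new ubyte[size];
    auto dst = new ubyte[size];

    auto sw = StopWatch(AutoStart.yes);
    foreach (_; 0 .. iterations)
        copy(dst.ptr, src.ptr, size);
    sw.stop();

    // Note: an optimizing compiler may drop copies whose result is unused,
    // so real measurements should inspect dst afterwards.
    return cast(double) sw.peek.total!"nsecs" / iterations;
}
```

For example, `nsPerCopy!memcpy(4096, 100_000)` times the C library version; instantiating the template with another implementation gives a figure to compare against, ideally across several sizes and alignments.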