May 27, 2016
On Friday, 27 May 2016 at 10:00:40 UTC, Era Scarecrow wrote:
> On Friday, 27 May 2016 at 09:51:56 UTC, rikki cattermole wrote:
>> struct Foo {
>>   int x;
>>
>>   void foobar() {
>>     asm {
>>       mov EAX, this;
>>       inc [EAX+Foo.x.offsetof];
>>     }
>>   }
>> }
>>
>> You have to reference the field via a register.
>
>  This is good progress. The inline assembler documentation doesn't have many examples of how to do things.

 Hmmm actually this is incorrect...

void main() {
  import std.stdio;

  Foo foo = Foo(-1);
  writeln(foo.x);
  foo.foobar;
  writeln(foo.x);
}

-1
-256

 It's obviously assuming a byte for the operand size. So this is the correct instruction:

  inc dword ptr [EAX+Foo.x.offsetof];

 However, trying it with a long and qword ptr shows it reverts to a byte again, meaning 64-bit instructions are inaccessible.
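
 For reference, a minimal sketch of the corrected Foo with the explicit operand size. The version guard and the plain ++x fallback are my additions (assumptions, not part of the original snippet) so that it also builds on targets without 32-bit x86 inline asm:

```d
import std.stdio;

struct Foo {
    int x;

    void foobar() {
        version (D_InlineAsm_X86) {
            asm {
                mov EAX, this;
                // Explicit 32-bit operand size; without "dword ptr" the
                // increment was applied to a single byte in the test above.
                inc dword ptr [EAX + Foo.x.offsetof];
            }
        } else {
            // Assumed fallback for targets without 32-bit x86 inline asm.
            ++x;
        }
    }
}

void main() {
    auto foo = Foo(-1);
    writeln(foo.x);
    foo.foobar();
    writeln(foo.x); // -1 + 1 == 0 on either path
}
```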
May 27, 2016
On Friday, 27 May 2016 at 10:14:31 UTC, Era Scarecrow wrote:
>   inc dword ptr [EAX+Foo.x.offsetof];


 So just tested it, and it didn't hang, meaning all unittests also passed.

 Final solution is:

  asm pure @nogc nothrow {
    mov EAX, this;
    add dword ptr [EAX+wideIntImpl.lo.offsetof], 1;
    adc dword ptr [EAX+wideIntImpl.lo.offsetof+4], 0;
    adc dword ptr [EAX+wideIntImpl.hi.offsetof], 0;
    adc dword ptr [EAX+wideIntImpl.hi.offsetof+4], 0;
  }
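
 As a cross-check, here is a portable sketch of the same carry chain. Limbs128 and incPortable are hypothetical names, and the four-limb layout (least significant first) is an assumption standing in for the lo/hi halves of wideIntImpl on a little-endian target:

```d
// Hypothetical stand-in: four 32-bit limbs, least significant first.
struct Limbs128 {
    uint[4] limb;
}

// Portable equivalent of "add ..., 1" followed by three "adc ..., 0".
void incPortable(ref Limbs128 v) pure nothrow @nogc @safe {
    uint carry = 1;                      // the "+ 1" enters as the initial carry
    foreach (ref l; v.limb) {
        ulong sum = cast(ulong) l + carry;
        l = cast(uint) sum;              // keep the low 32 bits
        carry = cast(uint)(sum >> 32);   // carry out feeds the next limb
    }
}

unittest {
    auto v = Limbs128([uint.max, uint.max, 0, 0]);
    incPortable(v);
    assert(v.limb == [0u, 0u, 1u, 0u]);  // carry rippled through two limbs
}
```

 The final carry out of the top limb is dropped, which matches the wrap-around of the asm version.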
May 28, 2016
On Friday, 27 May 2016 at 09:22:49 UTC, Guillaume Piolat wrote:
> You have to write your code three times, one for
>
> version(D_InlineAsm_X86)
> version (D_InlineAsm_X86_64)
> and a version without assembly.

 Rather than make a new thread, I wonder if struct inheritance wouldn't solve this. Trying to manage specific versions, the lack of versions, and checks for CTFE all became a headache, and bloated a 4-line function (2 lines of which were the opening/declaration) into something like 20 lines that looks like a huge mess.

 So...

 Let's assume structs as they are don't (otherwise) change.
 Let's assume structs can be inherited.
 Let's assume inherited structs change _behavior_ only (overridden functions as final), but don't add/expand any new data (non-polymorphic, no vtables).

 Then I could do something like this!

  //contains plain portable version
  struct base {}

  version(X86) {
    struct inherited : base {
      //only adds or replaces functions, no data changes
      //all asm injection is known to be 32bit x86
    }
  }
  version(X86_64) {
    ...
  }

 Truthfully going with my example, only a couple functions would be considered, namely multiply and divide as they would be the slowest ones, while everything else has very little to improve on, at least based on how wideint.d was implemented.
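
 D has no struct inheritance today, but something close to the idea can be approximated with alias this. The sketch below is only an illustration under that assumption; Base, Tuned and their members are hypothetical names, and the "tuned" inc is a placeholder that just calls the portable one:

```d
import std.stdio;

// Plain portable version.
struct Base {
    ulong lo, hi;

    void inc() {
        if (++lo == 0) ++hi;   // carry into the high half on wrap-around
    }

    bool isZero() const { return lo == 0 && hi == 0; }
}

// Adds or replaces functions only; the sole field is the wrapped Base,
// so the memory layout stays the same.
struct Tuned {
    Base base;
    alias base this;           // forwards anything Tuned does not define itself

    void inc() {
        // CPU-specific code would go here; this placeholder
        // simply calls the portable implementation.
        base.inc();
    }
}

static assert(Tuned.sizeof == Base.sizeof);

void main() {
    Tuned t;
    t.lo = ulong.max;          // field access is forwarded through alias this
    t.inc();                   // resolves to Tuned.inc, not Base.inc
    writeln(t.lo, " ", t.hi);  // 0 1
    writeln(t.isZero());       // false; isZero comes from Base
}
```

 One limitation of this approach: there is no virtual dispatch, so a call to inc() made from inside one of Base's own methods still reaches Base.inc.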
May 28, 2016
On Saturday, 28 May 2016 at 08:10:50 UTC, Era Scarecrow wrote:
> On Friday, 27 May 2016 at 09:22:49 UTC, Guillaume Piolat wrote:
>> You have to write your code three times, one for
>>
>> version(D_InlineAsm_X86)
>> version (D_InlineAsm_X86_64)
>> and a version without assembly.
>
>  Rather than make a new thread, I wonder if struct inheritance wouldn't solve this. Trying to manage specific versions, the lack of versions, and checks for CTFE all became a headache, and bloated a 4-line function (2 lines of which were the opening/declaration) into something like 20 lines that looks like a huge mess.
>
>  So...
>
>  Let's assume structs as they are don't (otherwise) change.
>  Let's assume structs can be inherited.
>  Let's assume inherited structs change _behavior_ only (overridden functions as final), but don't add/expand any new data (non-polymorphic, no vtables).
>
>  Then I could do something like this!
>
>   //contains plain portable version
>   struct base {}
>
>   version(X86) {
>     struct inherited : base {
>       //only adds or replaces functions, no data changes
>       //all asm injection is known to be 32bit x86
>     }
>   }
>   version(X86_64) {
>     ...
>   }
>
>  Truthfully going with my example, only a couple functions would be considered, namely multiply and divide as they would be the slowest ones, while everything else has very little to improve on, at least based on how wideint.d was implemented.

The great thing about D's UFCS is that it allows exactly that:

void main()
{
    WideInt myInt;
    myInt.inc(); // looks like a member function
    myInt++; // can be hidden behind operator overloading
}

struct WideInt
{
    ulong[2] data;

    ref WideInt opUnary(string s)() if (s == "++")
    {
        this.inc(); // resolved via UFCS to the free function inc() below
        return this;
    }
}

version(D_InlineAsm_X86_64)
{
    void inc(ref WideInt w) { /* 64-bit increment implementation */ }
}
else version(D_InlineAsm_X86)
{
    void inc(ref WideInt w) { /* 32-bit increment implementation */ }
}
else
{
    void inc(ref WideInt w) { /* generic increment implementation */ }
}

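Also, you can implement inc() in terms of ulong[2], i.e. void inc(ref ulong[2] w), which makes it applicable to other types with the same memory representation.
E.g. for a cent variable c: (*cast(ulong[2]*)&c).inc(); for an array ulong[] arr with at least two elements: (*cast(ulong[2]*)arr.ptr).inc(); and so on. Note the dereference: a ref parameter needs an lvalue of type ulong[2], not a pointer or a slice.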
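
A minimal sketch of that idea, assuming the least significant word is stored first. The generic carry logic is only a placeholder for whichever version-specific body is selected, and the array is viewed through a pointer cast because a ref parameter needs an lvalue:

```d
// Generic increment on the raw representation, least significant word first.
void inc(ref ulong[2] w) pure nothrow @nogc @safe {
    if (++w[0] == 0) ++w[1];   // carry into the upper word on wrap-around
}

struct WideInt {
    ulong[2] data;
}

void main() {
    WideInt a;
    a.data = [ulong.max, 0];
    a.data.inc();                        // UFCS call on the field
    assert(a.data == [0UL, 1UL]);

    ulong[] arr = [ulong.max, 5, 7];
    // View the first two elements as a fixed-size array; the caller
    // must guarantee arr.length >= 2.
    (*cast(ulong[2]*) arr.ptr).inc();
    assert(arr == [0UL, 6UL, 7UL]);
}
```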


May 28, 2016
On Saturday, 28 May 2016 at 10:10:19 UTC, ZombineDev wrote:
> The great thing about D's UFCS is that it allows exactly that:
>
> <snip>
>
> Also, you can implement inc() in terms of ulong[2], i.e. void inc(ref ulong[2] w), which makes it applicable to other types with the same memory representation. E.g. for a cent variable c: (*cast(ulong[2]*)&c).inc(); for an array ulong[] arr with at least two elements: (*cast(ulong[2]*)arr.ptr).inc(); and so on.

 Hmmm, if wideint weren't a template I'd agree with you. Then again, the way you have it listed, the increment would probably call the version that's generated and wouldn't require specific template instantiation to work.

 I don't know; personally, it makes more sense to me to replace functions rather than export them and add an unknown generated type. If inherited structs worked (as I have them listed) then you could move all the CPU-specific code to another file and never even have to know it exists. And if my impressions of code management and portability are accurate, then OS/architecture-specific details should be kept separate from what is openly shared.

 Besides, I'd like to leave the original source completely untouched if I can, while applying updates/changes that don't add any confusion to the existing source code. Plus it's more of an opt-in option at that point, where you can hopefully have both active at the same time to unittest one against the other (A is known to be correct, so B's output is tested against A).
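
 The A-versus-B testing could look roughly like this. U128, incRef and incCandidate are hypothetical names; the candidate here is just a second portable formulation standing in for the CPU-specific code:

```d
struct U128 { ulong lo, hi; }   // hypothetical stand-in type

// A: reference implementation, assumed correct.
void incRef(ref U128 v) pure nothrow @nogc @safe {
    if (++v.lo == 0) ++v.hi;
}

// B: candidate under test (the asm version would go here).
void incCandidate(ref U128 v) pure nothrow @nogc @safe {
    immutable old = v.lo;
    v.lo += 1;
    v.hi += (v.lo < old) ? 1 : 0;   // wrapped iff the new value is smaller
}

unittest {
    // Edge cases around the carry, plus one ordinary value.
    foreach (start; [U128(0, 0), U128(ulong.max, 0),
                     U128(ulong.max, ulong.max), U128(41, 7)]) {
        auto a = start, b = start;
        incRef(a);
        incCandidate(b);
        assert(a == b);             // B's output must match A's
    }
}
```

 Built with -unittest (plus -main if there is no main function), the test runs before the program starts.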
May 31, 2016
Am Fri, 27 May 2016 10:06:28 +0000
schrieb Guillaume Piolat <first.last@gmail.com>:

> Referencing EBP or ESP yourself is indeed dangerous. Not sure why the documentation would advise that. Using "this", names of parameters/locals/field offset is much safer.

DMD makes sure that the EBP relative access of parameters and stack variables works by copying everything to the stack that's in registers when you have an asm block in the function. Using var[EBP] or just plain var will then dereference that memory location.

-- 
Marco

May 31, 2016
Am Fri, 27 May 2016 10:16:48 +0000
schrieb Era Scarecrow <rtcvb32@yahoo.com>:

> On Friday, 27 May 2016 at 10:14:31 UTC, Era Scarecrow wrote:
> >   inc dword ptr [EAX+Foo.x.offsetof];
> 
> 
>   So just tested it, and it didn't hang, meaning all unittests
> also passed.
> 
>   Final solution is:
> 
>    asm pure @nogc nothrow {
>      mov EAX, this;
>      add dword ptr [EAX+wideIntImpl.lo.offsetof], 1;
>      adc dword ptr [EAX+wideIntImpl.lo.offsetof+4], 0;
>      adc dword ptr [EAX+wideIntImpl.hi.offsetof], 0;
>      adc dword ptr [EAX+wideIntImpl.hi.offsetof+4], 0;
>    }

The 'this' pointer is usually in some register already. On Linux 32-bit, for example, it is in EAX; on Linux 64-bit it is in RDI. When DMD encounters an asm block, it stores every parameter (including the implicit 'this') on the stack, and when you do "mov EAX, this;" it loads it back from there using EBP as the base pointer to the stack variables. The boilerplate will look like this on 32-bit Linux:

   push   EBP                     // Save what's currently in EBP
   mov    EBP,ESP                 // Remember current stack pointer as base for variables
   push   EAX                     // Save implicit 'this' parameter on the stack
   mov    EAX,DWORD PTR [EBP-0x4] // Load 'this' into EAX as you requested
   <add and adc code here>
   mov    ESP,EBP     // Restore stack to what it was before saving parameters and variables
   pop    EBP         // Restore EBP register
   ret                // Return from function

Remember that this works only for x86 32-bit in DMD and LDC. GDC passes inline asm right through to an arbitrary external assembler after doing some template replacements. GDC itself does not understand any of the asm you feed it; it just forwards the external assembler's error messages.

On the other hand GDC's and LDC's extended assemblers free you
from manually loading stuff into registers. You just use a
placeholder and tell the compiler to put 'this' into some
register. The compiler will realize it is already in EAX or
RDI and do nothing but use that register instead of EAX in
your code above. Sometimes that has the additional benefit that
the same asm code works on both 32-bit and 64-bit.
Also, extended asm is transparent to the optimizer. The code
can be inlined and already loaded variables reused.

By the way, you are right that 32-bit does not have access to 64-bit machine words (actually kind of obvious), but your idea wasn't far-fetched, since there is the x32 ABI, at least for Linux. It uses 64-bit machine words but 32-bit pointers, and allows for compact and fast programs.

-- 
Marco

May 31, 2016
On Tuesday, 31 May 2016 at 18:52:16 UTC, Marco Leise wrote:
> The 'this' pointer is usually in some register already. On Linux 32-bit for example it is in EAX, on Linux 64-bit is in RDI.

 The AX register seems like a bad choice, since you need the AX/DX registers when you do multiplication and division (although the other registers are general purpose, some instructions are still tied to specific registers). SI/DI seem like a much better choice.

> By the way, you are right that 32-bit does not have access to 64-bit machine words (actually kind of obvious), but your idea wasn't far-fetched, since there is the x32 ABI, at least for Linux. It uses 64-bit machine words but 32-bit pointers, and allows for compact and fast programs.

 As I recall, the switch to use the larger registers is a simple per-instruction prefix: 66h is the operand-size override (67h is the address-size override). I remember writing assembly programs for 16-bit DOS that used 32-bit registers via that trick (built into the assembler). Although in 32-bit code, using the 16-bit registers by themselves requires the same prefix, so...