Help needed on inline assembly
January 30, 2008
Hi,

I'd like to work with the SSE instructions in inline assembly. I wrote some
test routines (with my limited knowledge). Some of them work, others
don't, and I'd like to know why. Can someone help me out?


First thing:
#void main()
#{
#	float[4] array = [ 1f, 2f, 3f, 4f ];
#	float* a = &array[0];
#//	float t;
#
#	asm
#	{
#		mov EBX, a;
#		movaps XMM1, [EBX];
#	}
#}

This doesn't work. Uncomment the line
//	float t;

and it works. Why? Does the assembler code need to be aligned to something? If so, how can I do this without allocating another float on the stack?


Second thing: I don't know how to address a public const variable. With my limited knowledge I would do something like this:

#float[4] array = [ 1f, 2f, 3f, 4f ];
#
#void main()
#{
#	float* a = &array[0];
#
#	asm
#	{
#		mov EAX, [a];
#		movaps XMM1, [EAX];
#	}
#}


But I get a segfault. Why? I would interpret the code like this: a holds the address of the first array element. mov EAX, [a]; copies the address of the first array element to EAX, which is then used to access the array.

I'm using DMD 1.024 (since later versions broke Derelict on my platform).
January 30, 2008
> 
> #float[4] array = [ 1f, 2f, 3f, 4f ];
> #
> #void main()
> #{
> #	float* a = &array[0];
> #
> #	asm
> #	{
> #		mov EAX, [a];
> #		movaps XMM1, [EAX];
> #	}
> #}
> 
> 
> But I get a segfault. Why? I would interpret the code like this: a holds the address of the first array element. mov EAX, [a]; copies the address of the first array element to EAX.


Nope :)
Remember, [] dereferences. mov EAX, [a] copies the value, i.e. "a dereferenced", to EAX.
So EAX now contains the first value in array, 1f.
Trying to dereference that floating point number leads understandably to a segfault.

 --downs
January 30, 2008
downs wrote:
>> #float[4] array = [ 1f, 2f, 3f, 4f ];
>> #
>> #void main()
>> #{
>> #	float* a = &array[0];
>> #
>> #	asm
>> #	{
>> #		mov EAX, [a];
>> #		movaps XMM1, [EAX];
>> #	}
>> #}
>>
>>
>> But I get a segfault. Why? I would interpret the code like this: a holds the address of the first array element. mov EAX, [a]; copies the address of the first array element to EAX.
> 
> 
> Nope :)
> Remember, [] dereferences. mov EAX, [a] copies the value, i.e. "a dereferenced", to EAX.
> So EAX now contains the first value in array, 1f.
> Trying to dereference that floating point number leads understandably to a segfault.

OK, yeah, I didn't post the right example (I have a bunch of asm test files). But this doesn't work either:

float[4] array = [ 1f, 2f, 3f, 4f ];

void main()
{
	float* a = &array[0];

	asm
	{
		mov EAX, a;
		movaps XMM1, [EAX];
	}
}
January 30, 2008
The problem is that memory operands of movaps need to be aligned on a 16-byte boundary. That's what the 'a' in movaps means, "aligned".

So you can either use movups (unaligned), which is slower, or explicitly allocate your memory to lie on a 16-byte boundary.
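Whether a given pointer meets that requirement can be checked by looking at the low four bits of its address. The following is only a sketch in current D syntax (not the DMD 1.x used in this thread); `isAligned16` is a made-up helper name, not a library function.

```d
import std.stdio;

// Hypothetical helper: true if p lies on a 16-byte boundary.
bool isAligned16(const(void)* p)
{
    return (cast(size_t) p & 15) == 0;
}

void main()
{
    // Room for 8 floats plus 15 spare bytes, so a 16-byte boundary
    // with enough space after it always falls inside the buffer.
    ubyte[8 * float.sizeof + 15] raw;
    auto base = cast(size_t) raw.ptr;

    // Round the start address up to the next multiple of 16.
    auto aligned = cast(float*) ((base + 15) & ~cast(size_t) 15);

    writeln(isAligned16(aligned));     // true: usable with movaps
    writeln(isAligned16(aligned + 1)); // false: 4 bytes past the boundary, needs movups
}
```

A check like this could be used to pick between movaps and movups at run time, or just as an assert while debugging.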


Example:

> > import std.gc, std.stdio;
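> > // Over-allocate by 15 bytes, then round the address up to the next 16-byte boundary.
> > void* malloc_align16(size_t count) {
> >   void* res = malloc(count+15).ptr;
> >   // ~cast(size_t) 15 clears the low four bits on both 32- and 64-bit targets
> >   return cast(void*) ((cast(size_t)(res + 15)) & ~cast(size_t) 15);
> > }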
> >
> > float[4] array = [ 1f, 2f, 3f, 4f ];
> >
> > void main()
> > {
> > 	auto _array = (cast(float*)malloc_align16(4*float.sizeof))[0 .. 4];
> >         _array[] = array;
> >         auto a = &_array[0];
> > 	asm
> > 	{
> > 		mov EAX, a;
> > 		movaps XMM1, [EAX];
> > 	}
> >         writefln("Done");
> > }

Hope it helps.
 --downs
January 30, 2008
"Hendrik Renken" <funsheep@-[no-spam]-gmx.net> wrote in message news:fnprum$6oh$1@digitalmars.com...
>
> float[4] array = [ 1f, 2f, 3f, 4f ];
>
> void main()
> {
> float* a = &array[0];
>
> asm
> {
> mov EAX, a;
> movaps XMM1, [EAX];
> }
> }

If you're using the newest DMD, this should work, it does for me.  If you're using anything older than 1.023 (like, hm, 1.015?  GRRGH), this will probably fail.  1.023 made anything in the static data segment >= 16 bytes paragraph aligned, so that data is already aligned properly.

I don't know what GDC does in this case.

Another way to get an aligned allocation is to use a struct with the float[4] in it.

struct vec
{
    float[4] array;
}

void main()
{
    vec* v = new vec;

    // ptr will get you the pointer to the 0th element too
    float* a = v.array.ptr;

    asm
    {
        mov EAX, a;
        movaps XMM1, [EAX];
    }
}

This also doesn't rely on any standard library stuff.


January 30, 2008
"Jarrett Billingsley" <kb3ctd2@yahoo.com> wrote in message news:fnq21n$mvk$1@digitalmars.com...

> Another way to get an aligned allocation is to use a struct with the float[4] in it.
>
> struct vec
> {
>    float[4] array;
> }
>
> void main()
> {
>    vec* v = new vec;
>
>    // ptr will get you the pointer to the 0th element too
>    float* a = v.array.ptr;
>
>    asm
>    {
>        mov EAX, a;
>        movaps XMM1, [EAX];
>    }
> }
>
> This also doesn't rely on any standard library stuff.

A third way is to wrap the second way in a function, allowing you to allocate statically-sized arrays directly:

T* alloc(T)()
{
    struct S
    {
        T t;
    }

    return &(new S).t;
}

void main()
{
    float[4]* array = alloc!(float[4]);
    float* a = array.ptr;

    asm
    {
        mov EAX, a;
        movaps XMM1, [EAX];
    }
}
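Whether allocations like these really land on a 16-byte boundary depends on the compiler, runtime and platform, so it is worth measuring rather than assuming. A small sketch in current D syntax; `Vec` mirrors the struct from the earlier post and `misalignment` is a made-up helper name.

```d
import std.stdio;

// Same kind of wrapper struct as in the post above.
struct Vec
{
    float[4] array;
}

// Hypothetical helper: distance in bytes from the previous 16-byte boundary.
size_t misalignment(const(void)* p)
{
    return cast(size_t) p & 15;
}

void main()
{
    auto v = new Vec;               // heap allocation through the GC
    v.array = [1f, 2f, 3f, 4f];

    // 0 means v.array can be loaded with movaps; anything else calls for movups.
    writeln("offset from 16-byte boundary: ", misalignment(v.array.ptr));
}
```

If the printed offset is not 0 on some platform, the struct trick alone is not enough there and an explicitly aligned allocation is needed instead.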


January 30, 2008
Jarrett Billingsley wrote:
> "Hendrik Renken" <funsheep@-[no-spam]-gmx.net> wrote in message news:fnprum$6oh$1@digitalmars.com...
>> float[4] array = [ 1f, 2f, 3f, 4f ];
>>
>> void main()
>> {
>> float* a = &array[0];
>>
>> asm
>> {
>> mov EAX, a;
>> movaps XMM1, [EAX];
>> }
>> }
> 
> If you're using the newest DMD, this should work, it does for me.

Now I've updated to 1.026, and the above example still doesn't work (still not
aligned). movups works.



> If you're using anything older than 1.023 (like, hm, 1.015?  GRRGH), this will probably fail.

Yeah. I used 1.015 before...


> 1.023 made anything in the static data segment >= 16 bytes paragraph aligned, so that data is already aligned properly.

Doesn't seem to work for me (using DMD 1.026 on Linux). Or the alignment
is broken again in 1.026...



January 31, 2008
"Hendrik Renken" <funsheep@-[no-spam]-gmx.net> wrote in message news:fnqet1$1on0$1@digitalmars.com...

> Doesn't seem to work for me (using DMD 1.026 on Linux). Or the alignment
> is broken again in 1.026...

Ah, it's probably because of Linux.  That code works on Windows.  I forgot that DMD uses ELF on Linux like GDC.  DMD maybe can't control the alignment of the data there.

Either that, or it's a genuine bug.  :\


January 31, 2008
Jarrett Billingsley wrote:
> "Hendrik Renken" <funsheep@-[no-spam]-gmx.net> wrote in message news:fnqet1$1on0$1@digitalmars.com...
> 
>> Doesn't seem to work for me (using DMD 1.026 on Linux). Or the alignment
>> is broken again in 1.026...
> 
> Ah, it's probably because of Linux.  That code works on Windows.  I forgot that DMD uses ELF on Linux like GDC.  DMD maybe can't control the alignment of the data there.
> 
> Either that, or it's a genuine bug.  :\ 

I did some more testing. It seems that dynamically allocated data is aligned, so for that I can use movaps. Statically allocated data, however, is not aligned. But we can allocate 1 to 3 ints/floats/etc. before the data until it is aligned ;)
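The "1 to 3" comes straight from the arithmetic: a 4-byte-aligned address is 0, 4, 8 or 12 bytes past a 16-byte boundary, so at most three 4-byte elements of padding are needed. A sketch in current D syntax; `padFloats` is a made-up helper name, and the addresses are made-up values, not ones measured on any system.

```d
import std.stdio;

// Hypothetical helper: number of 4-byte elements to place before the data
// so that addr + 4 * result is a multiple of 16.
// Assumes addr is already 4-byte aligned.
size_t padFloats(size_t addr)
{
    return ((16 - addr % 16) % 16) / 4;
}

void main()
{
    writeln(padFloats(0x1000)); // 0: already on a boundary
    writeln(padFloats(0x1004)); // 3
    writeln(padFloats(0x1008)); // 2
    writeln(padFloats(0x100C)); // 1
}
```

Padding chosen this way is fragile, since the layout can shift whenever other static data is added or the compiler changes.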

Thanks for the help, got it working, and with a speedup factor of 240! Yeah: from 480 ms down to 2 ms with SSE instructions.

That rocks!

Regards
Hendrik
January 31, 2008
Hendrik Renken wrote:
> Jarrett Billingsley wrote:
>> "Hendrik Renken" <funsheep@-[no-spam]-gmx.net> wrote in message news:fnqet1$1on0$1@digitalmars.com...
>>
>>> Doesn't seem to work for me (using DMD 1.026 on Linux). Or the alignment
>>> is broken again in 1.026...
>>
>> Ah, it's probably because of Linux.  That code works on Windows.  I forgot that DMD uses ELF on Linux like GDC.  DMD maybe can't control the alignment of the data there.
>>
>> Either that, or it's a genuine bug.  :\ 
> 
> I did some more testing. It seems that dynamically allocated data is aligned, so for that I can use movaps. Statically allocated data, however, is not aligned. But we can allocate 1 to 3 ints/floats/etc. before the data until it is aligned ;)

That's what I found on Windows, and persuaded Walter to fix it. I didn't realise it wasn't working on Linux yet.
I hope that eventually we'll get stack data properly aligned; if it gets into the D ABI, then we only have to worry about callbacks from C, i.e., only extern() functions would need to align the stack.

> Thanks for the help, got it working, and with a speedup factor of 240! Yeah: from 480 ms down to 2 ms with SSE instructions.
> 
> That rocks!

Oh yeah!
DMD's floating-point code generation is very basic and does almost no optimisation, and its excellent support for inline asm makes asm particularly attractive.
But a factor of 240 is pretty extreme.