June 05, 2015 Re: string to char array? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Kyoji Klyden | Well, reading assembler is good enough:

void f(int[] a)
{
    a[0]=0;
    a[1]=1;
    a[2]=2;
}

Here the pointer is passed in the rsi register and the length in rdi:

void f(int[]):
    push rax
    test rdi, rdi
    je .LBB0_4
    mov dword ptr [rsi], 0
    cmp rdi, 1
    jbe .LBB0_5
    mov dword ptr [rsi + 4], 1
    cmp rdi, 2
    jbe .LBB0_6
    mov dword ptr [rsi + 8], 2
    pop rax
    ret
.LBB0_4:
    mov edi, 55
    mov esi, .L.str
    mov edx, 5
    call _d_arraybounds
.LBB0_5:
    mov edi, 55
    mov esi, .L.str
    mov edx, 6
    call _d_arraybounds
.LBB0_6:
    mov edi, 55
    mov esi, .L.str
    mov edx, 7
    call _d_arraybounds

You can play with the assembler generated for D code at http://ldc.acomirei.ru/
|
June 05, 2015 Re: string to char array? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Kyoji Klyden | On Friday, 5 June 2015 at 17:27:18 UTC, Kyoji Klyden wrote:
> On Thursday, 4 June 2015 at 22:28:50 UTC, anonymous wrote:
[...]
>> By the way, there are subtly different meanings of "array" and "string" which I hope you're aware of, but just to be sure:
>> "array" can refer to D array types, i.e. a pointer-length pair, e.g. char[]. Or it can refer to the general concept of a contiguous sequence of elements in memory.
>> And as a special case, "string" can refer to D's `string` type, which is an alias for `immutable(char)[]`. Or it can refer to a contiguous sequence of characters in memory.
>> And when ketmar writes: "it's a pointer to array of pointers to first chars of strings", then "array" and "string" are meant in the generic way, not in the D-specific way.
>
[...]
>
> So how does D store arrays in memory then?
>
> I know you already explained this part, but..
> Does the slice's pointer point to the slice's position in memory? Then if an array isn't sequential, is it atleast a sequence of pointers to the slice structs (& those are just in whatever spot in memory they could get?)
> There's a slice for each array index..right? Or is it only for the first element?

Oh boy, I think I might have added more confusion than I took away.

I didn't mean to say that D's arrays are not sequential. They are. Or more specifically, the elements are. "Array" is often meant to mean just that: a contiguous sequence of elements in memory. This is the same in C and in D.

If you have a pointer to such a sequence, and you know the number of elements (or which element is last), then you can access them all.

In C, the pointer and the length are usually passed separately.

But D groups them together and calls it a "dynamic array" or "slice": `ElementType[]`. And they're often simply called "arrays". This is done to confuse newbies, of course.

There's an article on slices (dynamic arrays):
http://dlang.org/d-array-article.html

C doesn't know about D's slice structure, of course. So when talking to C, it's often necessary to shuffle things around.

For example, where a D function has a parameter `char[] arr`, a C version may have two parameters: `size_t length, char* pointer_to_first`. And if you want to call that C version with a D slice, you'd do `cfun(dslice.length, dslice.ptr)`.

It gets interesting with arrays of arrays. In D that would be `char[][] arr`. And in C it could be `size_t length, char** pointers_to_firsts, size_t* lengths`.

Now what to do? A `char[][]` refers to a sequence of D slices, but the C function expects two sequences: one of pointers, and one of lengths. The memory layout is incompatible. You'd have to split the `char[][]` up into two arrays: a `char*[] ptrs` and a `size_t[] lengths`, and then call the function as `cfun(ptrs.length, ptrs.ptr, lengths.ptr)`.

(Hope I'm not making things worse with this.)
|
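The splitting step described in the post above can be sketched from the C side. This is only an illustration under stated assumptions: `char_slice` is a hypothetical C mirror of a D `char[]` (a length followed by a pointer), `cfun` is a stand-in for whatever C function is being called (here it merely sums the lengths), and `call_cfun_with_slices` is a made-up helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical C-side mirror of a D `char[]`: a length and a pointer. */
typedef struct {
    size_t length;
    char  *ptr;
} char_slice;

/* Stand-in for a C API that wants two parallel arrays: one of pointers
   and one of lengths. This placeholder just adds up the lengths. */
size_t cfun(size_t count, char **pointers_to_firsts, const size_t *lengths)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        if (pointers_to_firsts[i] != NULL)
            total += lengths[i];
    }
    return total;
}

/* Split a sequence of slices into a pointer array and a length array,
   then hand both to cfun. Returns 0 if allocation fails. */
size_t call_cfun_with_slices(const char_slice *slices, size_t count)
{
    assert(slices != NULL || count == 0);
    if (count == 0)
        return cfun(0, NULL, NULL);

    char  **ptrs    = malloc(count * sizeof *ptrs);
    size_t *lengths = malloc(count * sizeof *lengths);
    size_t  result  = 0;

    if (ptrs != NULL && lengths != NULL) {
        for (size_t i = 0; i < count; i++) {
            ptrs[i]    = slices[i].ptr;     /* pointer half of slice i */
            lengths[i] = slices[i].length;  /* length half of slice i  */
        }
        result = cfun(count, ptrs, lengths);
    }

    free(ptrs);
    free(lengths);
    return result;
}
```

The two temporary arrays exist only because the memory layouts differ: the slices interleave length and pointer, while the C function wants all pointers in one block and all lengths in another.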
June 05, 2015 Re: string to char array? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Kagamin | On Friday, 5 June 2015 at 18:30:53 UTC, Kagamin wrote:
> Well, reading assembler is good enough:
>
> void f(int[] a)
> {
> a[0]=0;
> a[1]=1;
> a[2]=2;
> }
>
> Here pointer is passed in rsi register and length - in rdi:
>
> void f(int[]):
> push rax
> test rdi, rdi
> je .LBB0_4
> mov dword ptr [rsi], 0
> cmp rdi, 1
> jbe .LBB0_5
> mov dword ptr [rsi + 4], 1
> cmp rdi, 2
> jbe .LBB0_6
> mov dword ptr [rsi + 8], 2
> pop rax
> ret
> .LBB0_4:
> mov edi, 55
> mov esi, .L.str
> mov edx, 5
> call _d_arraybounds
> .LBB0_5:
> mov edi, 55
> mov esi, .L.str
> mov edx, 6
> call _d_arraybounds
> .LBB0_6:
> mov edi, 55
> mov esi, .L.str
> mov edx, 7
> call _d_arraybounds
>
> You play with assembler generated for D code at http://ldc.acomirei.ru/
Never said I was good at asm but I'll give it a shot...
So push rax to the top of the memory stack, test if rdi == rdi since yes jump to .LBB0_4, in LBB0_4 move the value 55 into edi, then move .L.str (whatever that is) into esi, then 5 into edx, then call _d_arraybounds (something from Druntime maybe?) then LBB0_4 has nothing left so go back, move the value 0 into a 32-bit pointer (to rsi register), if rdi == 1 jump to LBB0_5 (pretty much the same as LBB0_4), then move 1 into the pointer (which points to rsi [+ 4 bytes cuz it's an int]), so on and so forth until we pop rax from the memory stack and return.
How did I do? :P (hopefully at least B grade)
I'm not really sure what .L.str or _d_arraybounds is, but I'm guessing it's the D runtime?
Also in the mov parts, is that moving 1 into the pointer or into the rsi register? And is rsi + 4, still in rsi, or does it move to a different register?
|
June 05, 2015 Re: string to char array? | ||||
---|---|---|---|---|
| ||||
Posted in reply to anonymous | On Friday, 5 June 2015 at 19:18:39 UTC, anonymous wrote:
> On Friday, 5 June 2015 at 17:27:18 UTC, Kyoji Klyden wrote:
>> On Thursday, 4 June 2015 at 22:28:50 UTC, anonymous wrote:
> [...]
>>> By the way, there are subtly different meanings of "array" and "string" which I hope you're aware of, but just to be sure:
>>> "array" can refer to D array types, i.e. a pointer-length pair, e.g. char[]. Or it can refer to the general concept of a contiguous sequence of elements in memory.
>>> And as a special case, "string" can refer to D's `string` type, which is an alias for `immutable(char)[]`. Or it can refer to a contiguous sequence of characters in memory.
>>> And when ketmar writes: "it's a pointer to array of pointers to first chars of strings", then "array" and "string" are meant in the generic way, not in the D-specific way.
>>
> [...]
>>
>> So how does D store arrays in memory then?
>>
>> I know you already explained this part, but..
>> Does the slice's pointer point to the slice's position in memory? Then if an array isn't sequential, is it atleast a sequence of pointers to the slice structs (& those are just in whatever spot in memory they could get?)
>> There's a slice for each array index..right? Or is it only for the first element?
>
> Oh boy, I think I might have put more confusion on top than taking away from it.
>
> I didn't mean to say that D's arrays are not sequential. They are. Or more specifically, the elements are. "Array" is often meant to mean just that: a contiguous sequence of elements in memory. This is the same in C and in D.
>
> If you have a pointer to such a sequence, and you know the number of elements (or what element is last), then you can access them all.
>
> In C, the pointer and the length are usually passed separately.
>
> But D groups them together and calls it a "dynamic array" or "slice": `ElementType[]`. And they're often simply called "arrays". This is done to confuse newbies, of course.
>
> There's an article on slices (dynamic arrays):
> http://dlang.org/d-array-article.html
>
> C doesn't know about D's slice structure, of course. So when talking to C, it's often necessary to shuffle things around.
>
> For example, where a D function has a parameter `char[] arr`, a C version may have two parameters: `size_t length, char* pointer_to_first`. And if you want to call that C version with a D slice, you'd do `cfun(dslice.length, dslice.ptr)`.
>
> It gets interesting with arrays of arrays. In D that would be `char[][] arr`. And in C it could be `size_t length, char** pointers_to_firsts, size_t* lengths`.
>
> Now what to do? A `char[][]` refers to a sequence of D slices, but the C function expects two sequences: one of pointers, and one of lengths. The memory layout is incompatible. You'd have to split the `char[][]` up into two arrays: a `char*[] ptrs` and a `size_t[] lengths`, and then call the function as `cfun(ptrs.length, ptrs.ptr, lengths.ptr)`
>
> (Hope I'm not makings things worse with this.)
Okay, so it's primarily an interfacing-with-C problem that started all this? (My brain is just completely scrambled at this point xP )
So pretty much the slice gives you the pointer to the start of the array in memory and also how many elements are in the array. Then depending on the array type it'll jump that many bytes for each element. (So 5 indexes in an int array would start at address 0xblahblah00, then go to 0xblahblah04, until it reaches 0xblahblah16(?) or something like that)
If I FINALLY have it right, then that makes a lot of sense actually.
|
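The address stepping described in the post above can be checked with a small C sketch. The helper name `int_element_offset` is made up for illustration, and the concrete numbers assume a 4-byte `int`, which is common but not guaranteed by C. Under that assumption the five elements sit at byte offsets 0, 4, 8, 12 and 16; in hexadecimal the last one is 0x10, so the final address would end in ...10 rather than ...16.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of element `index` from the start of an int array.
   `index` must not go further than one past the end of the array. */
size_t int_element_offset(const int *first, size_t index)
{
    assert(first != NULL);
    const char *base = (const char *)first;
    const char *elem = (const char *)(first + index); /* scaled by sizeof(int) */
    return (size_t)(elem - base);
}
```

For an array `int a[5]`, `int_element_offset(a, 4)` gives `4 * sizeof(int)`, the distance the slice pointer has to travel to reach the last element.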
June 05, 2015 Re: string to char array? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Kyoji Klyden | On Friday, 5 June 2015 at 19:30:58 UTC, Kyoji Klyden wrote:
> Okay, so it's primarily an interfacing with C problem that started all this? (My brain is just completely scrambled at this point xP )

Yeah, you wanted to call glShaderSource, which is a C function and as such it's not aware of D slices. So things get more complicated than they would be in D alone.

> So pretty much the slice gives you the pointer to the start of the array in memory and also how many elements are in the array.

Yes.

> Then depending on the array type it'll jump that many bytes for each element. (So 5 indexes in an int array, would start at address 0xblahblah00 , then go to 0xblahblah04, until it reaches 0xblahblah16(?) or something like that)

Yes.

> If I FINALLY have it right, then that makes alot of sense actually.

Sweet.
|
June 05, 2015 Re: string to char array? | ||||
---|---|---|---|---|
| ||||
Posted in reply to anonymous | On Friday, 5 June 2015 at 19:41:03 UTC, anonymous wrote:
> On Friday, 5 June 2015 at 19:30:58 UTC, Kyoji Klyden wrote:
>> Okay, so it's primarily an interfacing with C problem that started all this? (My brain is just completely scrambled at this point xP )
>
> Yeah, you wanted to call glShaderSource, which is a C function and as such it's not aware of D slices. So things get more complicated than they would be in D alone.
>
>> So pretty much the slice gives you the pointer to the start of the array in memory and also how many elements are in the array.
>
> Yes.
>
>> Then depending on the array type it'll jump that many bytes for each element. (So 5 indexes in an int array, would start at address 0xblahblah00 , then go to 0xblahblah04, until it reaches 0xblahblah16(?) or something like that)
>
> Yes.
>
>> If I FINALLY have it right, then that makes alot of sense actually.
>
> Sweet.
Awesome, thank you very much for all your help! (And of course everyone else who posted, too!)
|
June 06, 2015 Re: string to char array? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Kyoji Klyden | On Friday, 5 June 2015 at 19:19:23 UTC, Kyoji Klyden wrote:
> On Friday, 5 June 2015 at 18:30:53 UTC, Kagamin wrote:
>> Well, reading assembler is good enough:
>>
>> void f(int[] a)
>> {
>> a[0]=0;
>> a[1]=1;
>> a[2]=2;
>> }
>>
>> Here pointer is passed in rsi register and length - in rdi:
>>
>> void f(int[]):
>> push rax
>> test rdi, rdi
>> je .LBB0_4
>> mov dword ptr [rsi], 0
>> cmp rdi, 1
>> jbe .LBB0_5
>> mov dword ptr [rsi + 4], 1
>> cmp rdi, 2
>> jbe .LBB0_6
>> mov dword ptr [rsi + 8], 2
>> pop rax
>> ret
>> .LBB0_4:
>> mov edi, 55
>> mov esi, .L.str
>> mov edx, 5
>> call _d_arraybounds
>> .LBB0_5:
>> mov edi, 55
>> mov esi, .L.str
>> mov edx, 6
>> call _d_arraybounds
>> .LBB0_6:
>> mov edi, 55
>> mov esi, .L.str
>> mov edx, 7
>> call _d_arraybounds
>>
>> You play with assembler generated for D code at http://ldc.acomirei.ru/
>
> Never said I was good at asm but I'll give it a shot...
>
> So push rax to the top of the memory stack, test if rdi == rdi since yes jump to .LBB0_4, in LBB0_4 move the value 55 into edi, then move .L.str (whatever that is) into esi, then 5 into edx, then call _d_arraybounds (something from Druntime maybe?) then LBB0_4 has nothing left so go back, move the value 0 into a 32-bit pointer(to rsi register), if rdi == 1 jump to LBB0_5 (pretty much the same as LBB0_4), then move 1 into the pointer (which points to rsi[+ 4 bytes cuz it's an int]), so on and so forth until we pop rax from the memory stack and return.
>
> How did I do? :P (hopefully at least B grade)

Almost correct :-) The part of "has nothing left, so go back" is wrong. The call to _d_arraybounds doesn't return, because it throws an Error.

>
> I'm not really sure what .L.str or _d_arraybounds is, but I'm guessing it's the D runtime?

Yes, inside the `f` function, the compiler cannot know the length of the array during compilation. To keep you from accidentally accessing invalid memory (e.g. if the array has only two elements, but you're trying to access the third), it automatically inserts a check, and calls that runtime helper function to throw an Error if the check fails. .L.str is most likely the address of the error message or filename, and 55 is its length. The 5/6/7 values are the respective line numbers. You can disable this behaviour by compiling with `dmd -boundscheck=off`.

>
> Also in the mov parts, is that moving 1 into the pointer or into the rsi register? And is rsi + 4, still in rsi, or does it move to a different register?

It stores the `1` into the memory pointed to by `rsi`, or `rsi+4` etc. This is what the brackets [...] mean. Because it's an array of ints, and ints are 4 bytes in size, [rsi] is the first element, [rsi+4] the second, and [rsi+8] the third. `rsi+4` is just a temporary value that is only used during the store, it's not saved into a (named) register. This is a peculiarity of the x86 processors; they allow quite complex address calculations for memory accesses.
|
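The inserted checks described in the post above can be mimicked in C. This is a rough sketch, not what the D compiler actually emits: `f_checked` is a made-up name, the parameter order simply follows the registers in the listing (length in rdi, pointer in rsi), and instead of calling a runtime helper that throws, it returns the source line number of the first failing access (5, 6 or 7, as in the listing) or 0 on success.

```c
#include <assert.h>
#include <stddef.h>

/* Store 0, 1, 2 into the first three elements, checking the length
   before every store. Returns 0 on success, otherwise the line number
   of the first out-of-bounds access. */
int f_checked(size_t length, int *ptr)
{
    assert(ptr != NULL || length == 0);

    if (length == 0) return 5;  /* test rdi, rdi ; je .LBB0_4      */
    ptr[0] = 0;                 /* mov dword ptr [rsi], 0          */

    if (length <= 1) return 6;  /* cmp rdi, 1 ; jbe .LBB0_5        */
    ptr[1] = 1;                 /* mov dword ptr [rsi + 4], 1      */

    if (length <= 2) return 7;  /* cmp rdi, 2 ; jbe .LBB0_6        */
    ptr[2] = 2;                 /* mov dword ptr [rsi + 8], 2      */

    return 0;
}
```

Note that a two-element array still gets its first two elements written before the third check fails, which matches the order of stores and comparisons in the assembler output.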
June 06, 2015 Re: string to char array? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Marc Schütz | On Saturday, 6 June 2015 at 10:12:54 UTC, Marc Schütz wrote:
>> ...
>
> Almost correct :-) The part of "has nothing left, so go back" is wrong. The call to _d_arraybounds doesn't return, because it throws an Error.
>
>> ...
>
> Yes, inside the `f` function, the compiler cannot know the length of the array during compilation. To keep you from accidentally accessing invalid memory (e.g. if the array has only two elements, but you're trying to access the third), it automatically inserts a check, and calls that runtime helper function to throw an Error if the check fails. .L.str is most likely the address of the error message or filename, and 55 is its length. The 5/6/7 values are the respective line numbers. You can disable this behaviour by compiling with `dmd -boundscheck=off`.
>

Thanks for the reply!

So I'm a tad unsure of what exactly is happening in this asm, mainly because I'm only roughly familiar with the x86 instruction set.

_d_arraybounds throws an error because it can't access the runtime? Or because, as you said, the compiler can't know the length of the array?

For .L.str, 55 is the length of the address..?

>> Also in the mov parts, is that moving 1 into the pointer or into the rsi register? And is rsi + 4, still in rsi, or does it move to a different register?
>
> It stores the `1` into the memory pointed to by `rsi`, or `rsi+4` etc. This is what the brackets [...] mean. Because it's an array of ints, and ints are 4 bytes in size, [rsi] is the first element, [rsi+4] the second, and [rsi+8] the third. `rsi+4` is just a temporary value that is only used during the store, it's not saved into a (named) register. This is a peculiarity of the x86 processors; they allow quite complex address calculations for memory accesses.

Does the address just get calculated whenever the program runs this asm, then? :o
|
June 06, 2015 Re: string to char array? | ||||
---|---|---|---|---|
| ||||
Posted in reply to Kyoji Klyden | On Saturday, 6 June 2015 at 17:31:15 UTC, Kyoji Klyden wrote:
> On Saturday, 6 June 2015 at 10:12:54 UTC, Marc Schütz wrote:
>>> ...
>>
>> Almost correct :-) The part of "has nothing left, so go back" is wrong. The call to _d_arraybounds doesn't return, because it throws an Error.
>>
>>> ...
>>
>> Yes, inside the `f` function, the compiler cannot know the length of the array during compilation. To keep you from accidentally accessing invalid memory (e.g. if the array has only two elements, but you're trying to access the third), it automatically inserts a check, and calls that runtime helper function to throw an Error if the check fails. .L.str is most likely the address of the error message or filename, and 55 is its length. The 5/6/7 values are the respective line numbers. You can disable this behaviour by compiling with `dmd -boundscheck=off`.
>>
>
> Thanks for the reply!
>
> so I'm a tad unsure of what exactly is happening in this asm, mainly because I'm only roughly familiar with x86 instruction set.
>
> _d_arraybounds throws an error because it can't access the runtime? or because as you said the compiler can't know the length of the array?

_d_arraybounds() always throws an error because that's its purpose. It's implemented here:
https://github.com/D-Programming-Language/druntime/blob/master/src/core/exception.d#L640

My point was that _d_arraybounds never returns, instead it throws that Error object.

The compiler inserts the checks for the array length whenever you access an array element, _except_ if it can either prove that the array is always long enough (e.g. if it's a fixed-size array), in which case it can leave the check out because it's unnecessary, or if it can prove that the array is never long enough, in which case it may already print an error during compilation.

>
> for .L.str, 55 is the length of the address..?

No, the length of the string. It's roughly the equivalent of this pseudo-code:

extern void _d_arraybounds(void* filename_ptr, size_t filename_len, size_t line);

void f(void* a_ptr, size_t a_length) {
    if(a_length == 0) goto LBB0_4;
    *cast(int*) a_ptr = 0;          // line 5
    if(a_length <= 1) goto LBB0_5;
    *cast(int*) (a_ptr+4) = 1;      // line 6
    if(a_length <= 2) goto LBB0_6;
    *cast(int*) (a_ptr+8) = 2;      // line 7
    return;
LBB0_4:
    // (pretend this filename is 55 chars long)
    static string __FILE__ = "/path/to/your/source/file.d";
    _d_arraybounds(__FILE__.ptr, __FILE__.length, 5 /* line number */);
LBB0_5:
    _d_arraybounds(__FILE__.ptr, __FILE__.length, 6 /* line number */);
LBB0_6:
    _d_arraybounds(__FILE__.ptr, __FILE__.length, 7 /* line number */);
}

>
>
>>> Also in the mov parts, is that moving 1 into the pointer or into the rsi register? And is rsi + 4, still in rsi, or does it move to a different register?
>>
>> It stores the `1` into the memory pointed to by `rsi`, or `rsi+4` etc. This is what the brackets [...] mean. Because it's an array of ints, and ints are 4 bytes in size, [rsi] is the first element, [rsi+4] the second, and [rsi+8] the third. `rsi+4` is just a temporary value that is only used during the store, it's not saved into a (named) register. This is a peculiarity of the x86 processors; they allow quite complex address calculations for memory accesses.
>
> Does the address just get calculated whenever the program using this asm, then? :o

Yes, but it is extremely fast. I'm pretty sure accessing memory at [RSI] and [RSI+4] both take exactly the same time (but can't find a reference now).
|
June 06, 2015 Re: string to char array? | ||||
---|---|---|---|---|
| ||||
Posted in reply to anonymous | On Friday, 5 June 2015 at 19:18:39 UTC, anonymous wrote:
> If you have a pointer to such a sequence, and you know the number of elements (or what element is last), then you can access them all.
I never really worked with C or C++, but I'm sure you also need to know element size.
|
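On the element-size point raised above: with a typed pointer such as `int*`, C takes the element size from the pointer type, so it does not have to be passed around. Only when the pointer is untyped (`void*`) does the caller have to supply the size explicitly, as the standard `qsort` does. A minimal sketch of that second case follows; `get_element` is a hypothetical helper, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy element `index` out of an untyped sequence into `out`.
   Because `first` is a void*, the element size must be given. */
void get_element(const void *first, size_t elem_size, size_t index, void *out)
{
    assert(first != NULL && out != NULL);
    const unsigned char *bytes = first;               /* byte-wise view */
    memcpy(out, bytes + index * elem_size, elem_size);
}
```

Calling `get_element(a, sizeof a[0], 2, &x)` on an `int` array fetches the third element; the same function works unchanged for `double` or any other element type, since the stride is whatever size the caller passes in.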